This document describes the metagenomic analysis pipeline used in the paper PAPER_NAME_LINK. Steps included range start with quality control and end with predicting open read frames are included.
Metagenome fastq files can be found under NCBI BioProject PRJNA693524.
Use FastQC to evaluate the quality of the raw metagenome fastq files.
# FastQC v0.11.8
/path/to/fastqc /path/to/R1.fastq.gz /path/to/R2.fastq.gz -o /path/to/fastqc_output_raw
Use Trimmomatic.
For adapters, I used the Trimmomatic TruSeq3 sequences and added one additional sequence that was overrepresented based on the FastQC outputs. Reverse read files for all samples (except W56) had the overrpresented sequence (string of G’s).
>PrefixPE/1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE1
TACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PE1_rc
AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
>PE2
GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>PE2_rc
AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
>U_R2/2
GGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGGG
I ran Trimmomatic using the above adapter files and used Trimmomatic’s recommended parameters.
# Trimmomatic v0.38
java -jar /path/to/trimmomatic-0.38.jar PE -threads 4 -trim.log /path/to/trimmomatic.log.txt /path/to/R1.fastq.gz /path/to/R2/fastqc.gz -baseout /path/to/trimmomatic/output/trimmed.fastq.gz ILLUMINACLIP:/path/to/adapter.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
Evaluate Trimmomatic output with FastQC.
# FastQC v0.11.8
/path/to/fastqc /path/to/trimmed_1P.fastq.gz /path/to/trimmed_2P.fastq.gz /path/to/trimmed_1U.fastq.gz /path/to/trimmed_2U.fastq.gz -o /path/to/fastqc_output_trimmomatic
/path/to/R1.fastq.gz /path/to/R2.fastq.gz -o /path/to/fastqc_output_trimmomatic
I followed this tutorial by Matthias Scholz to remove host sequences using bowtie2 and samtools.
### Software Info ###
# Bowtie2 v2.2.2
# Samtools v0.1.19-44428cd
### Prepare host genome ###
# Download host genome
curl -O ftp://ftp.ncbi.nlm.nih.gov/genomes/all/GCA/000/236/235/GCA_000236235.1_SpeTri2.0/GCA_000236235.1_SpeTri2.0_assembly_structure/Primary_Assembly/unplaced_scaffolds/FASTA/unplaced.scaf.fna.gz
# Open downloaded file
gunzip unplaced.scaf.fna.gz
# Rename file
mv unplaced.scaf.fna 13liner.fna
# Build Host Database
bowtie2-build --seed 1 -p 10 13liner.fna 13liner.db
### Work with trimmomatic outputs ###
# Convert Trimmomatic output to SAM
bowtie2 --very-sensitive-local -p 4 --seed 1 -x /path/to/13liner.db -1 /path/to/trimmed_1P.fastq.gz -2 /path/to/trimmed_2P.fastq.gz -S /path/to/nohost.sam
# Convert SAM to BAM
samtools view -@ 4 -bS /path/to/nohost.sam > /path/to/nohost.bam
# Extract unmapped sequences (non-host sequences)
/path/to/samtools view -@ 4 -b -f 12 -F 256 /path/to/nohost.bam > /path/to/nohost.unmap.bam
# Sort BAM file to organized paired reads
/path/to/samtools sort -n /path/to/nohost.unmap.bam nohost.unmap.sorted.bam
# Convert BAM to fastq
/path/to/samtools bam2fq /path/to/nohost.unmap.sorted.bam > /path/to/nohost.fastq
Use FastQC to evaluate the quality of the metagenome after host DNA removal.
# FastQC v0.11.8
/path/to/fastqc /path/to/nohost.fastq -o /path/to/fastqc_output_nohost
Use metaSPAdes to assemble metagenome.
# SPAdes version 3.13.0
/path/to/SPAdes-3.13.0-Linux/bin/metaspades.py --pe1-12 /path/to/nohost.fastq -k 55,75,95 -t 20 -m 160 -o /path/to/metaSPAdes_output
Use MetaQUAST to evaluate the quality of the assembled metagenome.
# Quast v5.0.2
/home/GLBRCORG/echiang3/metag/quast-5.0.2/metaquast.py -m 1 -t 20 /path/to/metaSPAdes_output/contigs.fasta -o /path/to/fastqc_metaSPAdes --max-ref-number 0
### Software Info ###
# Bowtie2 v2.2.2
# Samtools v0.1.19-44428cd
### Prepare files ###
# Deinterleave metaSPAdes input file into forward and reverse read
# deinterleave_fastq.sh is by Nathan Watson-Haigh: https://gist.github.com/nathanhaigh/3521724
/path/to/deinterleave_fastq.sh < /path/to/nohost.fastq /path/to/nohost.R1.fastq /path/to/nohost.R2.fastq
# Convert metaSPAdes assembly into bowtie2 database
bowtie2-build --seed 1 /path/to/metaSPAdes_output/contigs.fasta /path/to/metaSPAdes.bowtie.database
### Map input reads to assembly ###
bowtie2 --sensitive-local -p 10 --seed 1 -x /path/to/metaSPAdes.bowtie.database -1 /path/to/nohost.R1.fastq -2 /path/to/nohost.R2.fastq -S /path/to/metaSPAdes.sam
Use prodigal to predict open reading frames.
# Prodigal v2.6.3
/path/to/prodigal -a /path/to/output.faa -d /path/to/output.ffn -i /path/to/metaSPAdes_output/contigs.fasta -o /path/to/prodigal.txt -p meta -s /path/to/genes.txt