QIIME2 tutorial

for help with installation, see the installation guide on the QIIME 2 website

to start conda and activate qiime2, run these commands

source ~/anaconda3/etc/profile.d/conda.sh
conda activate qiime2-amplicon-2024.5

Running conda and qiime2 on Metacentrum

always check current versions

module add mambaforge && mamba env list
export LANG=C.UTF-8
conda activate qiime2-amplicon-2024.5
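
A quick way to confirm that the environment is really active is to ask the shell whether the qiime command is on the PATH. This is only a minimal sketch; the variable name status is made up for illustration.

```shell
# Check whether the qiime command is available in the current shell
if command -v qiime >/dev/null 2>&1; then
  # qiime --version prints the q2cli version on its first line
  status="qiime found: $(qiime --version | head -n 1)"
else
  status="qiime not found, activate the environment first"
fi
echo "$status"
```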

Importing data (and demultiplexing)

import data to qiime, in my case demultiplexed paired-end fastq files

I don't have multiplexed reads or Casava format files, so I will use a manifest file (metadata.tsv) for the import

Making a metadata.tsv file

I make a tab-separated table with three columns: sample-id, forward-absolute-filepath, reverse-absolute-filepath

tip for making the metadata file easily:

#get all the paths to my fastq files using find

find /path_to_the_directory_with_fastq_files/ -name "KAP*_R1.fastq" > R1.txt

find /path_to_the_directory_with_fastq_files/ -name "KAP*_R2.fastq" > R2.txt

combine the sample IDs and the two path lists into one table with the three columns above and save it as metadata.tsv
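
The two path lists can be pasted together into the manifest. A minimal sketch, assuming file names of the form <sample>_R1.fastq and <sample>_R2.fastq. The demo directory with empty placeholder files and the helper file ids.txt are made up so that the example runs anywhere; point find at your own directory instead.

```shell
# Demo input: a throwaway directory with empty placeholder fastq files (hypothetical)
demo_dir="$(mktemp -d)/fastq_demo"
mkdir -p "$demo_dir"
touch "$demo_dir"/KAP1_R1.fastq "$demo_dir"/KAP1_R2.fastq \
      "$demo_dir"/KAP2_R1.fastq "$demo_dir"/KAP2_R2.fastq

# Sort both lists so that forward and reverse files line up row by row
find "$demo_dir" -name "KAP*_R1.fastq" | sort > R1.txt
find "$demo_dir" -name "KAP*_R2.fastq" | sort > R2.txt

# Sample IDs: file name without the directory and without the _R1.fastq suffix
sed -e 's|.*/||' -e 's|_R1\.fastq$||' R1.txt > ids.txt

# Header line plus one tab-separated row per sample
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > metadata.tsv
paste ids.txt R1.txt R2.txt >> metadata.tsv

cat metadata.tsv
```

The paths are absolute only because find is given an absolute starting directory, which the manifest format requires.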

use the Phred33V2 manifest format (quality scores encoded with Phred offset 33)

for other input formats and file types, see: https://docs.qiime2.org/2022.8/tutorials/importing/

qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path ./metadata.tsv \
--output-path ./paired-end-demux.qza \
--input-format PairedEndFastqManifestPhred33V2

Visualizing quality of reads

upload the .qzv file to view.qiime2.org

qiime demux summarize \
--i-data paired-end-demux.qza \
--o-visualization paired-end-demux.qzv

Trimming off primers

for primers 515F and 806R

qiime cutadapt trim-paired \
--i-demultiplexed-sequences paired-end-demux.qza \
--p-front-f GTGYCAGCMGCCGCGGTAA \
--p-front-r GGACTACNVGGGTWTCTAAT \
--p-discard-untrimmed \
--o-trimmed-sequences demux-paired-end-trimmed.qza \
--verbose

Denoising with DADA2

(denoising can be also done by Deblur)

it's important to choose the trimming and truncation parameters based on the data: truncate the reads where the sequence quality drops (check the quality plots from the previous step). Chimera removal is also part of this step.

This command will put 3 files in the directory DADA2_denoising_output: table.qza (per-sample ASV count table), representative_sequences.qza (ASV IDs and their sequences), and denoising_stats.qza (per-sample read counts at each denoising step)

qiime dada2 denoise-paired \
--i-demultiplexed-seqs demux-paired-end-trimmed.qza \
--p-trunc-len-f 200 \
--p-trunc-len-r 200 \
--p-chimera-method consensus \
--output-dir DADA2_denoising_output \
--verbose
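
When choosing --p-trunc-len-f and --p-trunc-len-r, it can help to check that the truncated reads still overlap enough to be merged. A rough sketch, assuming a primer-trimmed V4 amplicon of about 253 bp (typical for 515F/806R, but an assumption here, so check your own data) and DADA2's default minimum overlap of 12 bp. All variable names are illustrative.

```shell
# Assumed values: adjust to your own amplicon and truncation settings
amplicon_len=253   # assumed length of the primer-trimmed 515F/806R amplicon
trunc_f=200        # value passed to --p-trunc-len-f
trunc_r=200        # value passed to --p-trunc-len-r
min_overlap=12     # DADA2 default minimum overlap for merging read pairs

# Bases shared by the forward and reverse read after truncation
overlap=$(( trunc_f + trunc_r - amplicon_len ))
echo "expected overlap: ${overlap} bp"

if [ "$overlap" -lt "$min_overlap" ]; then
  verdict="too short"
else
  verdict="ok"
fi
echo "overlap check: $verdict"
```

If the check reports "too short", most read pairs will fail to merge and will be lost, so the truncation lengths need to be increased.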

Using single-end reads

if the quality of one of the reads is not good, you can use only one read direction. Import only those reads as single-end data; the metadata.tsv file then needs two columns: sample-id, absolute-filepath
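
If a paired-end manifest already exists, the single-end one can be derived from it instead of being written by hand. A minimal sketch, assuming the three-column paired-end layout described earlier and that the forward reads are the ones kept. The demo manifest, its paths, and the output name metadata-single.tsv are made up for illustration.

```shell
# Demo paired-end manifest (hypothetical paths) so that the sketch is runnable
printf 'sample-id\tforward-absolute-filepath\treverse-absolute-filepath\n' > paired-manifest.tsv
printf 'KAP1\t/data/KAP1_R1.fastq\t/data/KAP1_R2.fastq\n' >> paired-manifest.tsv
printf 'KAP2\t/data/KAP2_R1.fastq\t/data/KAP2_R2.fastq\n' >> paired-manifest.tsv

# Keep sample-id and the forward path, then rename the path column in the header
cut -f1,2 paired-manifest.tsv \
  | sed '1s/forward-absolute-filepath/absolute-filepath/' > metadata-single.tsv

cat metadata-single.tsv
```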

qiime tools import \
--type 'SampleData[SequencesWithQuality]' \
--input-path metadata.tsv \
--output-path single-end-demux.qza \
--input-format SingleEndFastqManifestPhred33V2

qiime demux summarize \
--i-data single-end-demux.qza \
--o-visualization single-end-demux.qzv

qiime dada2 denoise-single \
--i-demultiplexed-seqs single-end-demux.qza \
--p-trunc-len 200 \
--p-chimera-method consensus \
--output-dir DADA2_denoising_output \
--verbose

then copy the metadata file to the DADA2 folder and cd DADA2_denoising_output

Visualization of denoised results

generate visualizations for these results

qiime feature-table summarize \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file metadata.tsv

qiime feature-table tabulate-seqs \
--i-data representative_sequences.qza \
--o-visualization representative_sequences.qzv

qiime metadata tabulate \
--m-input-file denoising_stats.qza \
--o-visualization denoising_stats.qzv

Optional: Chimera removal using vsearch tool

qiime vsearch uchime-denovo \
--i-table table.qza \
--i-sequences representative_sequences.qza \
--output-dir uchime-dn-out

visualize stats for chimeric sequences

qiime metadata tabulate \
--m-input-file uchime-dn-out/stats.qza \
--o-visualization uchime-dn-out/stats.qzv

filter out table and sequences to exclude chimeras and “borderline chimeras”

qiime feature-table filter-features --i-table table.qza --m-metadata-file uchime-dn-out/nonchimeras.qza --o-filtered-table uchime-dn-out/table-nonchimeric_wo_borderline.qza

qiime feature-table filter-seqs --i-data representative_sequences.qza --m-metadata-file uchime-dn-out/nonchimeras.qza --o-filtered-data uchime-dn-out/rep-seqs-nonchimeric_wo_borderline.qza

qiime feature-table summarize --i-table uchime-dn-out/table-nonchimeric_wo_borderline.qza  --o-visualization uchime-dn-out/table-nonchimeric_wo_borderline.qzv

filter out table and sequences to exclude chimeras but retain “borderline chimeras”

qiime feature-table filter-features --i-table table.qza --m-metadata-file uchime-dn-out/chimeras.qza --p-exclude-ids --o-filtered-table uchime-dn-out/table-nonchimeric.qza

qiime feature-table filter-seqs --i-data representative_sequences.qza --m-metadata-file uchime-dn-out/chimeras.qza --p-exclude-ids --o-filtered-data uchime-dn-out/rep-seqs-nonchimeric.qza

qiime feature-table summarize --i-table uchime-dn-out/table-nonchimeric.qza  --o-visualization uchime-dn-out/table-nonchimeric.qzv

Assign taxonomy

do this step on a cluster

using a self-trained Silva Naive Bayes classifier; this and other classifiers can be downloaded from the QIIME 2 website

qiime feature-classifier classify-sklearn \
--i-reads /.../representative_sequences.qza \
--i-classifier /.../silva-138-99-515-806-nb-classifier.qza \
--output-dir /.../classified_sequences \
--verbose

Visualize classification

qiime metadata tabulate \
--m-input-file classification.qza \
--o-visualization classification.qzv

Rename sample or feature IDs

If you need to rename the sample-ids

Add a metadata column called “new-ID” that defines the new IDs to a metadata file (rename.tsv in the command below). Each original ID must map to a unique new ID. If strict mode is used, every ID in the original table must have a new ID.

qiime feature-table rename-ids \
--i-table table.qza \
--m-metadata-file rename.tsv \
--m-metadata-column new-ID \
--o-renamed-table table-renamed.qza
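
Because every new ID has to be unique, it is worth checking the mapping file for duplicates before running the rename. A minimal sketch, assuming a two-column layout (sample-id, new-ID). The file rename-demo.tsv and the IDs in it are made up; in practice the check would be run on your own rename.tsv.

```shell
# Demo mapping file (hypothetical IDs) with the columns sample-id and new-ID
printf 'sample-id\tnew-ID\n' > rename-demo.tsv
printf 'KAP1\tsite1_rep1\n' >> rename-demo.tsv
printf 'KAP2\tsite1_rep2\n' >> rename-demo.tsv
printf 'KAP3\tsite2_rep1\n' >> rename-demo.tsv

# Skip the header, take the new-ID column, and list values that occur more than once
duplicates="$(tail -n +2 rename-demo.tsv | cut -f2 | sort | uniq -d)"

if [ -z "$duplicates" ]; then
  echo "all new IDs are unique"
else
  echo "duplicated new IDs:"
  echo "$duplicates"
fi
```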

Filtering data based on taxonomy

#filter out unwanted taxa, e.g. contamination Vibrio spp.
qiime taxa filter-table \
--i-table table.qza \
--i-taxonomy classification.qza \
--p-exclude Vibrio \
--o-filtered-table table-no-contamination.qza
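
Before filtering, it can be useful to preview which features a search term would hit. The sketch below runs a plain case-insensitive substring search on the Taxon column of a taxonomy table in the three-column layout of an exported taxonomy.tsv (Feature ID, Taxon, Confidence). It only approximates how the default matching of --p-exclude and --p-include behaves, so treat it as a rough check. The demo table and its shortened taxonomy strings are made up.

```shell
# Demo taxonomy table (hypothetical features, shortened taxonomy strings)
printf 'Feature ID\tTaxon\tConfidence\n' > taxonomy-demo.tsv
printf 'asv1\td__Bacteria; p__Proteobacteria; f__Vibrionaceae; g__Vibrio\t0.99\n' >> taxonomy-demo.tsv
printf 'asv2\td__Bacteria; p__Proteobacteria; f__Vibrionaceae; g__Photobacterium\t0.95\n' >> taxonomy-demo.tsv
printf 'asv3\td__Archaea; p__Euryarchaeota; g__Methanobacterium\t0.98\n' >> taxonomy-demo.tsv

term="Vibrio"

# Case-insensitive substring search on the Taxon column (column 2), header skipped
matches="$(awk -F '\t' -v t="$term" 'NR > 1 && index(tolower($2), tolower(t)) { print $1 }' taxonomy-demo.tsv)"
echo "$matches"
```

In this demo the term also hits asv2 through f__Vibrionaceae, so a more specific term such as g__Vibrio may be the safer choice when only the genus should be removed.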

Keep only certain taxa

only include certain taxa, e.g. Methanoregula and Methanobacterium spp. and visualize

--p-exclude and --p-include can be combined in one command

#keep only certain genera
qiime taxa filter-table \
--i-table table.qza \
--i-taxonomy classification.qza \
--p-include "g__Methanoregula,g__Methanobacterium" \
--o-filtered-table table-Methanoregula-Methanobacterium.qza

#then you can visualize the filtered results
qiime taxa barplot \
--i-table table-Methanoregula-Methanobacterium.qza \
--i-taxonomy classification.qza \
--m-metadata-file metadata.tsv \
--o-visualization taxa-barplot-Methanoregula-Methanobacterium.qzv

#or keep only sequences of the family Syntrophaceae
#you have to add the f__ prefix (it stands for family in the taxonomy file)
qiime taxa filter-seqs \
--i-sequences representative_sequences.qza \
--i-taxonomy classification.qza \
--p-include f__Syntrophaceae \
--o-filtered-sequences Syntrophaceae-seqs.qza

#or keep only sequences of the phylum Euryarchaeota
qiime taxa filter-seqs \
--i-sequences representative_sequences.qza \
--i-taxonomy classification.qza \
--p-include p__Euryarchaeota \
--o-filtered-sequences Euryarchaeota-seqs.qza

Filter sequences based on a feature-table

you can also filter the sequences based on the edited feature table; the result can then be exported to a fasta file

qiime feature-table filter-seqs \
--i-data representative_sequences.qza \
--i-table table-Methanoregula-Methanobacterium.qza \
--o-filtered-data Methanoregula-Methanobacterium-seqs.qza

qiime tools export \
--input-path Methanoregula-Methanobacterium-seqs.qza \
--output-path export

Filtering based on sample-ID

qiime feature-table filter-samples \
--i-table merged-table.qza \
--m-metadata-file samples-to-keep.tsv \
--o-filtered-table id-filtered-table.qza
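
The file given to --m-metadata-file only needs a sample-id header and the IDs of the samples to keep. A minimal sketch of building such a file from a sample metadata table; the demo metadata, the site column, and the file names with the -demo suffix are made up for illustration.

```shell
# Demo sample metadata (hypothetical column and values)
printf 'sample-id\tsite\n' > metadata-demo.tsv
printf 'KAP1\tlake\n' >> metadata-demo.tsv
printf 'KAP2\tpond\n' >> metadata-demo.tsv
printf 'KAP3\tlake\n' >> metadata-demo.tsv

# Keep the header plus every sample whose site column equals "lake"
awk -F '\t' 'NR == 1 || $2 == "lake"' metadata-demo.tsv > samples-to-keep-demo.tsv

cat samples-to-keep-demo.tsv
```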

Merging data from different runs

(also possible for runs sequenced with different primers)

#merge feature tables
qiime feature-table merge \
--i-tables filepath1/table.qza \
--i-tables filepath2/table.qza \
--o-merged-table merged-table.qza

#merge representative sequences
qiime feature-table merge-seqs \
--i-data filepath1/representative_sequences.qza \
--i-data filepath2/representative_sequences.qza \
--o-merged-data merged-rep-seqs.qza

#combine assigned classification
qiime feature-table merge-taxa \
--i-data filepath1/classification.qza \
--i-data filepath2/classification.qza \
--o-merged-data merged-taxonomy.qza
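
As far as I know, qiime feature-table merge by default refuses to merge tables that share sample IDs (see the --p-overlap-method option in its --help output), so it can save time to compare the sample IDs of the runs first. A minimal sketch, assuming one sample ID per line in each list; the two demo lists are made up.

```shell
# Demo sample ID lists from two runs (hypothetical IDs, one per line)
printf 'KAP1\nKAP2\nKAP3\n' | sort > run1-samples.txt
printf 'KAP3\nKAP4\n' | sort > run2-samples.txt

# comm -12 prints only the lines that are present in both sorted files
shared="$(comm -12 run1-samples.txt run2-samples.txt)"

if [ -n "$shared" ]; then
  echo "sample IDs present in both runs:"
  echo "$shared"
else
  echo "no shared sample IDs"
fi
```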

Barplot

qiime taxa barplot --i-table table.qza --i-taxonomy classification.qza --m-metadata-file metadata.tsv --o-visualization taxa-barplot.qzv

Building a phylogeny

when working with amplicon data, we don't want to build de novo phylogenetic trees. Instead we can insert the sequences into a reference tree with qiime fragment-insertion sepp. The reference database can either be downloaded from the QIIME 2 website or you can make your own.

run fragment insertion tree on a cluster using this script insertion-tree.sh

#!/bin/bash
#SBATCH -t 24:00:00
#SBATCH -n 24
#SBATCH -N 1
#SBATCH --mem=200GB
#SBATCH -e errorfile_sepp_taxafiltered
#SBATCH --mail-type=ALL
#SBATCH --mail-user=
        
qiime fragment-insertion sepp \
--i-representative-sequences merged-rep-seqs-all.qza \
--i-reference-database sepp-refs-silva-128.qza \
--o-tree insertion-tree.qza \
--o-placements insertion-placements.qza

Tree visualization

import sample data in metadata.tsv and visualize the tree

qiime empress community-plot \
--i-tree insertion-tree.qza \
--i-feature-table merged-table-all.qza \
--m-sample-metadata-file metadata-all.tsv \
--m-feature-metadata-file merged-taxonomy.qza \
--o-visualization empress-tree.qzv \
--p-filter-missing-features \
--p-ignore-missing-samples

Data export

exporting from qiime2 to use in R

#export count table
qiime tools export \
--input-path table.qza \
--output-path export/table

convert the resulting .biom file (export/table/feature-table.biom) to .tsv

biom convert \
-i export/table/feature-table.biom \
-o table.tsv \
--to-tsv

export representative sequences to a fasta file (the output path is a directory; the file inside is named dna-sequences.fasta)

qiime tools export \
--input-path merged-rep-seqs.qza \
--output-path export/rep-seqs

export taxonomy/classification file to a tsv file

qiime tools export \
--input-path classification.qza \
--output-path export/taxonomy

export phylogenetic tree

qiime tools export \
--input-path insertion-tree.qza \
--output-path tree
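
The table.tsv written by biom convert starts with a comment line and a header beginning with "#OTU ID", which can get in the way when reading the file in R. A minimal sketch of cleaning it up first; the demo file only mimics that layout with made-up counts, and the column name feature-id and the file names with the -demo suffix are my own choice.

```shell
# Demo file mimicking the layout written by biom convert --to-tsv (values made up)
printf '# Constructed from biom file\n' > table-demo.tsv
printf '#OTU ID\tKAP1\tKAP2\n' >> table-demo.tsv
printf 'asv1\t10.0\t0.0\n' >> table-demo.tsv
printf 'asv2\t3.0\t7.0\n' >> table-demo.tsv

# Drop the first comment line and turn "#OTU ID" into a plain column name
tail -n +2 table-demo.tsv | sed '1s/^#OTU ID/feature-id/' > table-clean-demo.tsv

cat table-clean-demo.tsv
```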