Qiime2 Introductory Workshop
April, 2019
Robert W. Murdoch
Bioinformatics Resource Center, Center for Environmental Biotechnology, University of Tennessee, Knoxville
1 Overview
Qiime2 is a computing environment for processing and analyzing amplicon library data. Typically, these amplicon libraries are generated by targeting the 16S rRNA gene in a prokaryotic community in order to gain insight into the taxa present and their relative abundances. It is important to note that Qiime2 is a system in which other analyses operate, many of which are not designed by the Qiime2 developers at all. There is no single way to send data through the Qiime2 system; often, many different processing and analysis options are available.
This brief tutorial takes users through an analysis pipeline similar to the classic Qiime2 tutorials (https://docs.qiime2.org/2019.1/), with a handful of key differences in data format and analysis choices. These differences reflect the scenarios that the UT Bioinformatics Resource Center has most often encountered in teaching and in collaboration with researchers from across campus.
- This tutorial uses a very detailed (and narrow) step-by-step walkthrough strategy to guide users through unfamiliar territory.
- This tutorial takes a quick and dirty approach, treating qiime2 as a “black box” as much as possible. You won’t come out of this an expert! But you will get a feeling for qiime2 and be ready to work through other tutorials.
1.1 Goals and Outcome
- Familiarity with the Qiime2 system, how you can interact with it, how it processes files and what types of results it provides.
- Experience with command-line interfaces.
- Ability to use a valid, cutting-edge analysis method to generate Feature/Taxonomy tables (a.k.a. OTU tables)
2 Prerequisites
This tutorial uses Qiime2 version 2019.1. It has been developed on Ubuntu 18.04 machines and has not been tested in other environments, but given the structure of Qiime2, it will almost certainly work on any system where Qiime2 has been installed.
It is assumed that the user has only minimal experience with command-line work and little to no experience with Qiime2. That being said, this tutorial does not aim to teach the principles of amplicon library analysis or microbiome analysis.
2.1 How to follow this tutorial
While it is useful to keep a graphics-based file-explorer open during your analysis, everything is done in the command line. All commands you will use are contained in text boxes.
If you see anything in a text box like this
it is meant to be transcribed into the terminal
try to avoid mass copy/pasting!
typing it out will help to familiarize you with the system
2.2 Qiime2 View
Throughout the tutorial you will generate *.qzv files. These can be drag-and-dropped into the qiime2-view website: https://view.qiime2.org/
2.3 Help docs
Qiime2 is very well documented. Once it is installed and activated (we will do this in the first steps), you can always use the “--help” switch to get a help file on how to interact with any given command. For example
qiime tools import --help
(this won’t do anything yet!)
3 Preparation
Open a terminal! You can do this through the operating system graphical-user-interface (OS GUI) or on Ubuntu, hit ctrl+alt+t. On MacOS, it is easy to find via the magnifying glass search on the upper right or the launch button in the task bar at the bottom (search for “terminal”)
3.1 Installing Qiime2
It is assumed that “Miniconda” is already installed on your machine (it is already installed on BRC computers, where this workshop is held).
This page has downloads and instructions for Miniconda installation: https://docs.conda.io/en/latest/miniconda.html
Download and configure the Qiime2 conda package
3.1.1 On Linux machines (Ubuntu)
wget https://data.qiime2.org/distro/core/qiime2-2019.1-py36-linux-conda.yml
conda env create -n qiime2-2019.1 -f qiime2-2019.1-py36-linux-conda.yml
rm qiime2-2019.1-py36-linux-conda.yml
3.1.2 On MacOS
wget https://data.qiime2.org/distro/core/qiime2-2019.1-py36-osx-conda.yml
conda env create -n qiime2-2019.1 -f qiime2-2019.1-py36-osx-conda.yml
rm qiime2-2019.1-py36-osx-conda.yml
This will take some time to download and configure.
3.2 Activate Qiime2
You can check which conda environments are installed with:
conda env list
Activate (turn on) the qiime2 that you just installed
source activate qiime2-2019.1
Alternatively, if “source activate” fails, use this instead:
conda activate qiime2-2019.1
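A quick way to confirm that activation worked is to check whether the qiime command can be found. The following is only a minimal sketch: check_qiime is a hypothetical helper name made up for this example, not something Qiime2 provides.

```shell
# Hypothetical helper (not part of Qiime2): report whether the qiime command is reachable.
check_qiime() {
    if command -v qiime >/dev/null 2>&1; then
        qiime --version    # prints the installed command-line interface version
    else
        echo "qiime not found; activate the conda environment first" >&2
        return 1
    fi
}

check_qiime || echo "Activate the environment, then try again."
```

If the version is printed, the environment is active and the rest of the tutorial should work from this terminal.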
3.3 Setting up your project folder
Make a general “projects” folder in the home directory, then make a folder for this project
cd
mkdir Projects
cd Projects
mkdir qiime2.workshop.2019
cd qiime2.workshop.2019
From now on, always sit in this directory! All terminal commands are executed from here
4 Downloading data and metadata
Now make a sub-directory for the raw data to be placed in
mkdir reads
These commands download and decompress the reads into the correct directory
wget https://www.dropbox.com/s/9vpey3k0s3yjdmu/reads.tar.xz
tar -xvf reads.tar.xz -C reads
rm reads.tar.xz
Download the metadata
Metadata is very important and can be very tricky to configure correctly. We’re not focusing on this now, but for more reading, check out the main metadata tutorial at the qiime2 website: https://docs.qiime2.org/2019.1/tutorials/metadata/
5 Importing data
Your data has to be imported into qiime2 to start the process! This is a point where this tutorial differs substantially from the qiime2 tutorials: our data is in a different format because it has already been demultiplexed.
qiime tools import \
--type 'SampleData[PairedEndSequencesWithQuality]' \
--input-path reads \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path reads.qza
You generated a *.qza file (a Qiime2 “artifact”). It is just a zip archive that holds the data together with provenance information; in the OS GUI, try double-clicking on it to look inside. It is now ready to be passed to other qiime2 plugins.
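If the import fails on your own data, a common cause is file names that do not follow the Casava 1.8 convention that this input format expects (sample ID, barcode identifier, lane, read direction, set number, e.g. L2S357_15_L001_R1_001.fastq.gz). The sketch below checks the names in the reads directory before importing. The helper name is_casava_name is hypothetical, and the glob pattern is an approximation of the convention rather than the exact rule Qiime2 applies.

```shell
# Hypothetical helper: succeed if a bare file name looks like
#   <sample-id>_<barcode-id>_L<3-digit lane>_R<1 or 2>_001.fastq.gz
is_casava_name() {
    case "$1" in
        ?*_?*_L[0-9][0-9][0-9]_R[12]_001.fastq.gz) return 0 ;;
        *) return 1 ;;
    esac
}

# Report every file in reads/ whose name does not fit the pattern.
for path in reads/*; do
    [ -e "$path" ] || continue      # nothing to do if reads/ is empty or missing
    name="${path##*/}"              # strip the directory part
    is_casava_name "$name" || echo "Unexpected file name: $name"
done
```

No output means every file name fits the pattern.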
5.1 About the data
We are working with real data and metadata; it was generated by the Qiime2 developers and was analysed and described in the publication “Significant Impacts of Increasing Aridity on the Arid Soil Microbiome” (doi: 10.1128/mSystems.00195-16).
6 Visualizing your reads (QA)
qiime demux summarize \
--i-data reads.qza \
--o-visualization reads-QA.qzv
This produces a *.qzv file; the “v” stands for “visualization”. The .qzv format is used all over qiime2 for files that you can actually look at: tables, graphs, lists, etc.
drag and drop reads-QA.qzv at https://view.qiime2.org/
6.0.1 This QA analysis groups all the samples together. How do you think you might look at just one sample? Or generate a file for each sample?
7 Removing primers
qiime cutadapt trim-paired \
--i-demultiplexed-sequences reads.qza \
--p-cores 8 \
--p-front-f GTGYCAGCMGCCGCGGTAA \
--p-front-r GGACTACNVGGGTWTCTAAT \
--o-trimmed-sequences reads-cutadapt.qza \
--verbose > cutadapt_log.txt
It is important to note that the forward and reverse primers MUST be appropriate for your data set, i.e. the primers that were used in the original PCR.
This step produces a log file called cutadapt_log.txt. (The primers had already been removed from this data set, so you won’t see much trimming going on, but this is how you can go about it with your own data.)
8 Denoising
A.K.A making OTU tables
DADA2 is an algorithm that handles quality control, read error correction, and paired-end merging.
qiime dada2 denoise-paired \
--i-demultiplexed-seqs reads-cutadapt.qza \
--p-n-threads 6 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza
This has produced three new *.qza files; a table, a list of sequences, and a stats file. Take a quick look at these via the GUI to see what is inside them.
Generate visualization files
qiime feature-table summarize \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file metadata.tsv
qiime feature-table tabulate-seqs \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv
This generates two more visualization files; check them out at qiime2 view
9 Taxonomic Classification
Let’s taxonomically classify our representative sequences.
Download a pre-trained classifier based on the Silva database (note that I have pre-trained a simple genus-level classifier for this workshop; instructions for making your own classifier are at the end of this document. Alternatively, the qiime group offers pre-trained classifiers at their website)
wget https://www.dropbox.com/s/1qdhs7mz4f61t3k/classifier.qza
Note that what we downloaded is already a qza, a qiime2 artifact. It has been trimmed and trained on the 515/806 primer set. If you find yourself on your own project using different primers, you’ll have to download a different model or train your own classifier.
Classify your representative sequences
qiime feature-classifier classify-sklearn \
--i-classifier classifier.qza \
--p-n-jobs 6 \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza
and generate a qzv taxonomy file
qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv
Note that this is only the feature taxonomy and has no abundance information.
9.1 Create bar-plots
Next we will combine the feature table, feature taxonomy, and metadata to make taxonomy barplots
qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file metadata.tsv \
--o-visualization taxa-bar-plots.qzv
This is where everything starts to come together. Take some time to explore this visualization.
You can:
- visualize by any taxonomic level
- sort by any metadata feature
- download a particular visualization as an .svg file
- gather csv tables of any particular taxonomic aggregation
9.2 Exporting OTU tables
Qiime2 seems to be built under the assumption that you will continue analyzing within the Qiime2 environment. Within qiime2, you can do ordination, multivariate testing, differential abundance testing, diversity analyses, etc.
If you want to get your classic OTU tables immediately, from the bar-plot visualization, you can download csv tables.
Alternatively, you can export the feature table and feature taxonomy and work with them yourself in R packages or spreadsheet programs.
first, make a directory for your exports
mkdir exports
convert the table.qza object to a biom format file (this is a difficult format)
qiime tools export \
--input-path table.qza \
--output-path feature-table
convert the biom file to a simple tsv table
biom convert \
-i feature-table/feature-table.biom \
-o exports/feature-table.tsv \
--to-tsv
export the taxonomy file
qiime tools export \
--input-path taxonomy.qza \
--output-path exports
These two files, in the exports directory, can be integrated (in another program) to produce classic OTU tables.
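For those who prefer to stay in the terminal, the two exported files can also be joined with awk. This is a minimal sketch under stated assumptions: that feature-table.tsv starts with a "# Constructed from biom file" comment line followed by a "#OTU ID" header row, and that taxonomy.tsv has a header row with the feature ID in column 1 and the taxon in column 2. The function name merge_otu_table, the "Unassigned" label and the output file name otu-table.tsv are hypothetical choices made for this example.

```shell
# Hypothetical helper: append a Taxon column to the feature table.
#   $1 = taxonomy.tsv   $2 = feature-table.tsv   (result goes to stdout)
merge_otu_table() {
    awk -F'\t' -v OFS='\t' '
        NR == FNR {                        # first file: taxonomy
            if (FNR > 1) taxon[$1] = $2    # skip the header, store ID -> taxon
            next
        }
        /^# Constructed from biom file/ { next }   # drop the biom comment line
        /^#OTU ID/ { print $0, "Taxon"; next }     # extend the header row
        { print $0, (($1 in taxon) ? taxon[$1] : "Unassigned") }
    ' "$1" "$2"
}

# Only runs when both exported files are present.
if [ -f exports/taxonomy.tsv ] && [ -f exports/feature-table.tsv ]; then
    merge_otu_table exports/taxonomy.tsv exports/feature-table.tsv > exports/otu-table.tsv
fi
```

The result has one row per feature, one column per sample, and the taxonomy string in the last column, which is the layout most people mean by a classic OTU table.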
10 Scripting
In this recipe, you ran qiime2 commands one at a time.
What happens when you want to work with another data set? Or if somebody asks what you did?
Let’s handle essential issues like repeatability and documentation by making a script. A “script” in this case will simply be a text document with all of the steps in order which we can run directly from the command-line.
First, let’s move up one level and make a new directory
cd ..
mkdir qiime2_2019_scripting
cd qiime2_2019_scripting
Make a new text file
touch qiime2_workshop.sh
nano qiime2_workshop.sh
You are now in a very simple text editor called “nano”. Note that you can make text files using many different programs; nano is clumsy, but a nice core text editor that you should be aware of.
We are now making a “bash” or “shell” script. The only thing that makes this text file special is its first line, which must be this:
#!/bin/bash
Now type in or copy in each of the commands from the recipe, starting at the “Downloading data and metadata” step, all the way down to “Exporting OTU tables”.
Note that you can use the “#” character to make comments to yourself or to anybody who reads this script; anything following the “#” character will be ignored until the next new-line (hard return)
Your script will start with something that looks like this (don’t type this, it’s just an example)
#!/bin/bash
# April, 2019
## P. Herman
#DATA IMPORT
mkdir reads
wget https://website.com/l33tdataz.tar
etc
Hit CTRL+X to exit nano; be sure to hit “Y” on exiting in order to save your work.
Now you have what is hopefully a completed script which, with one command, will download the data, metadata, and taxonomic classifier, trim primers, denoise, classify, and produce tables and qzv files.
Note that the qiime2 environment is not activated from within the script; you must do this manually before running it.
Try running it:
bash qiime2_workshop.sh
Monitor the terminal output (at this stage this is called “stdout”) for progress, success (green lines) and any errors (red lines). If you run into any errors, hit CTRL+C to terminate the script and try to figure out what went wrong!
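Watching for errors by eye gets tedious with longer scripts. One option is to wrap each step so that the script announces what it is doing and stops at the first failure. The sketch below is only an illustration: run_step is a hypothetical helper name, and the harmless pwd command stands in for a real qiime step.

```shell
# Hypothetical helper: announce a step, run it, and report whether it succeeded.
run_step() {
    label="$1"
    shift                                   # the remaining arguments are the command
    echo "START:  $label"
    if "$@"; then
        echo "DONE:   $label"
    else
        rc=$?                               # exit status of the failed command
        echo "FAILED: $label (exit status $rc)" >&2
        return "$rc"
    fi
}

# Harmless demonstration; in a real script the command would be a qiime step.
run_step "show working directory" pwd || exit 1
```

In the workshop script, a line such as run_step "summarize reads" qiime demux summarize --i-data reads.qza --o-visualization reads-QA.qzv || exit 1 would stop the run as soon as that step fails. Putting set -e near the top of the script is a blunter built-in alternative.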
You now have a text file that, once qiime2 is installed and activated, can be adapted to new data, new primers, new classifiers, etc. Feel free to email this text file (i.e. bash script) to yourself.
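One way to make the script easier to adapt is to collect the values that change between projects at the top. The sketch below is an illustration rather than part of the recipe: the variable names and the run helper with its DRY_RUN switch are hypothetical, while the primers and core count are simply the ones used in the primer-removal step.

```shell
# Project-specific settings, gathered in one place (names are hypothetical).
FWD_PRIMER="GTGYCAGCMGCCGCGGTAA"
REV_PRIMER="GGACTACNVGGGTWTCTAAT"
THREADS=8
DRY_RUN=1    # 1 = only print each command, 0 = actually run it

# Hypothetical helper: print or execute a command depending on DRY_RUN.
run() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "$*"
    else
        "$@"
    fi
}

run qiime cutadapt trim-paired \
    --i-demultiplexed-sequences reads.qza \
    --p-cores "$THREADS" \
    --p-front-f "$FWD_PRIMER" \
    --p-front-r "$REV_PRIMER" \
    --o-trimmed-sequences reads-cutadapt.qza
```

With DRY_RUN=1 the command is only printed, which is a cheap way to proofread a script before committing to a long run; set it to 0 to execute for real.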
10.1 What makes it a bash script?
#!/bin/bash as the first line + “.sh” file-extension; that’s it! (…psst, the “.sh” isn’t even required)
11 Conclusions
11.1 Your results
Take some time to explore the various data files, both qza and qzv. There is a ton of useful information in there.
Can you find where PHRED score can be visualized?
Take a look at the two files in the “exports” folder. Do you know how to combine these to make a classic OTU table? Check out the “vlookup” function in Excel/LibreOffice.
11.2 Processing your own data
To adapt this basic recipe to your own data, you will need to consider:
- are these the appropriate primers
- is the classifier using the appropriate clustering percentage and primers (see last section for some details)
- is your data in the same format
- is your metadata ready to be imported
It is challenging to make the transition; there are many knobs and dials and switches that you will need to learn about, but you can do it! The qiime2 tutorials (https://docs.qiime2.org/2019.1/tutorials/) cover many of these issues, not always in the clearest way for a beginner, but it is all there. Remember “--help” also!
11.3 What comes next
Making the OTU table and classifying your sequences is the first step, which I would call data processing rather than analysis. After this comes the real battle!
Now that you have your bearings, check out the tutorials that the qiime2 group maintains (https://docs.qiime2.org/2019.1/tutorials/); their data formats are often a bit weird, so be prepared for that!
- Before you start learning at the qiime2 website, be sure you know the difference between multiplexed and demultiplexed data!
12 Extra Stuff: Training the feature classifier
This short tutorial used a classifier pre-trained on reference sequences clustered at 90% identity, which only takes us to genus. This was specifically so that the taxonomic classification would be FAST. The procedure used to train the model is given here; it can be altered to get a 97% or 99% model.
Download the Silva 128 collection of reference files
wget https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_128_release.tgz
tar -xvf Silva_128_release.tgz
rm Silva_128_release.tgz
Importing the 90% 16S-only data set sequences and taxonomy (this can be redirected to any rep set you want to work with)
qiime tools import \
--type 'FeatureData[Sequence]' \
--input-path SILVA_128_QIIME_release/rep_set/rep_set_16S_only/90/90_otus_16S.fasta \
--output-path 90_reps.qza
qiime tools import \
--type 'FeatureData[Taxonomy]' \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path SILVA_128_QIIME_release/taxonomy/16S_only/90/consensus_taxonomy_7_levels.txt \
--output-path 90_ref_taxonomy.qza
Training a classifier using the earth microbiome primer set (this can be substituted for your own primers or primers can simply be ignored)
First the target region is extracted based on the primer set
qiime feature-classifier extract-reads \
--i-sequences 90_reps.qza \
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer GGACTACHVGGGTWTCTAAT \
--p-min-length 100 \
--p-max-length 400 \
--o-reads ref-seqs.qza
Then the model is trained
qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs.qza \
--i-reference-taxonomy 90_ref_taxonomy.qza \
--o-classifier classifier.qza
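To train at a different clustering level, only the two input paths in the import commands need to change. The sketch below builds those paths from a single number. It assumes that the 97% and 99% files follow the same directory layout as the 90% files used above; check the unpacked release before relying on it. The helper name silva_paths is hypothetical.

```shell
# Hypothetical helper: print the reference FASTA path and the taxonomy path
# for a given clustering level (assumes the same layout as the 90% files).
silva_paths() {
    pct="$1"
    base="SILVA_128_QIIME_release"
    echo "${base}/rep_set/rep_set_16S_only/${pct}/${pct}_otus_16S.fasta"
    echo "${base}/taxonomy/16S_only/${pct}/consensus_taxonomy_7_levels.txt"
}

silva_paths 97
```

The first line printed is the value for --input-path in the sequence import, the second the value for --input-path in the taxonomy import. Expect training and classification to take noticeably longer at 97% or 99% than at 90%.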