Qiime2 Introductory Workshop

1 Overview
- 1.1 Goals and Outcome
2 Prerequisites
3 Preparation
4 Downloading data and metadata
5 Importing data
- 5.1 About the data
6 Visualizing your reads (QA)
7 Removing primers
8 Denoising
9 Taxonomic Classification
- 9.1 Create bar-plots
- 9.2 Exporting OTU tables
10 Scripting
- 10.1 What makes it a bash script?
11 Conclusions
12 Extra Stuff: Training the feature classifier

April, 2019
Robert W. Murdoch
Bioinformatics Resource Center, Center for Environmental Biotechnology, University of Tennessee, Knoxville

1 Overview

Qiime2 is a computing environment for processing and analyzing amplicon library data. Typically, these amplicon libraries are generated by targeting the 16S rRNA gene in a prokaryotic community in order to gain insight into the taxa present and their relative abundances. It is important to note that Qiime2 is a system in which other analyses operate, many of which are not designed by the Qiime2 developers at all. There is no single way to send data through the Qiime2 system; often, many different processing and analysis options are available.

This brief tutorial takes users through an analysis pipeline similar to the classic Qiime2 tutorials (https://docs.qiime2.org/2019.1/), with a handful of key differences in data format and analysis choices that reflect a more likely real-world scenario (based on the experience of the UT Bioinformatics Resource Center in teaching and collaborating with researchers from across campus).

This tutorial uses a very detailed (and narrow) step-by-step walkthrough strategy to guide users through unfamiliar territory.
It also takes a quick and dirty approach, treating qiime2 as much like a “black box” as possible. You won’t come out of this an expert! But you will get a feeling for qiime2 and be ready to work through other tutorials.

1.1 Goals and Outcome

Familiarity with the Qiime2 system, how you can interact with it, how it processes files and what types of results it provides.
This will also provide experience with command-line interfaces.
Ability to use a valid, cutting-edge analysis method to generate Feature/Taxonomy tables (a.k.a. OTU tables)

2 Prerequisites

This tutorial uses Qiime2 version 2019.1. It has been developed on Ubuntu 18.04 machines and has not been tested in other environments, but given the structure of Qiime2, it will almost certainly work on any system where Qiime2 has been installed.

It is assumed that the user has only minimal experience with command-line work and little to no experience with Qiime2. That being said, this tutorial does not aim to teach the principles of amplicon library analysis or microbiome analysis.

2.1 How to follow this tutorial

While it is useful to keep a graphics-based file-explorer open during your analysis, everything is done in the command line. All commands you will use are contained in text boxes.

If you see anything in a text box like this
it is meant to be transcribed into the terminal
try to avoid mass copy/pasting!
typing it out will help to familiarize you with the system

2.2 Qiime2 View

Throughout the tutorial you will generate *.qzv files. These can be drag-and-dropped into the qiime2-view website: https://view.qiime2.org/

2.3 Help docs

Qiime2 is very well documented. Once it is installed and activated (we will do this in the first steps), you can always use the “--help” switch to get a help file on how to interact with any given command. For example:

qiime tools import --help

(this won’t do anything yet!)

3 Preparation

Open a terminal! You can do this through the operating system's graphical user interface (OS GUI) or, on Ubuntu, by hitting Ctrl+Alt+T. On MacOS, it is easy to find via the magnifying glass search on the upper right or the launch button in the task bar at the bottom (search for “terminal”).

3.1 Installing Qiime2

It is assumed that “Miniconda” is already installed on your machine (it is already installed on BRC computers, where this workshop is held).

This page has downloads and instructions for MiniConda installation https://docs.conda.io/en/latest/miniconda.html

Download and configure the Qiime2 conda package

3.1.1 On Linux machines (Ubuntu)

wget https://data.qiime2.org/distro/core/qiime2-2019.1-py36-linux-conda.yml
conda env create -n qiime2-2019.1 -f qiime2-2019.1-py36-linux-conda.yml
rm qiime2-2019.1-py36-linux-conda.yml

3.1.2 On MacOS

wget https://data.qiime2.org/distro/core/qiime2-2019.1-py36-osx-conda.yml
conda env create -n qiime2-2019.1 -f qiime2-2019.1-py36-osx-conda.yml
rm qiime2-2019.1-py36-osx-conda.yml

This will take some time to download and configure.

3.2 Activate Qiime2

You can check which conda environments are installed with:

conda env list

Activate (turn on) the qiime2 that you just installed

source activate qiime2-2019.1

Alternatively, if “source activate” fails, use this instead:

conda activate qiime2-2019.1
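
A quick way to confirm that activation worked is to ask the shell whether it can find the qiime command. The following is a small sketch; the variable name qiime_status is only there for illustration.

```shell
# Report whether the qiime command is visible in this shell session.
if command -v qiime >/dev/null 2>&1; then
    qiime_status="available"
    qiime --version      # prints the installed version
else
    qiime_status="missing"
    echo "qiime not found: activate the conda environment first"
fi
echo "qiime is $qiime_status"
```

If this reports "missing", repeat the activation step before going any further.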

3.3 Setting up your project folder

Make a general “Projects” folder in the home directory, then make a folder for this project inside it.

cd
mkdir Projects
cd Projects
mkdir qiime2.workshop.2019
cd qiime2.workshop.2019

From now on, always sit in this directory! All terminal commands are executed from here

4 Downloading data and metadata

Now make a sub-directory for the raw data to be placed in

mkdir reads

These commands download and decompress the reads into the correct directory

wget https://www.dropbox.com/s/9vpey3k0s3yjdmu/reads.tar.xz
tar -xvf reads.tar.xz -C reads
rm reads.tar.xz

Download the metadata

wget https://www.dropbox.com/s/vxbdt56ifentbh5/metadata.tsv

Metadata is very important and can be very tricky to configure correctly. We’re not focusing on this now, but for more reading, check out the main metadata tutorial at the qiime2 website: https://docs.qiime2.org/2019.1/tutorials/metadata/
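
Two common metadata problems are duplicated sample IDs and rows with a different number of columns. The sketch below checks for both on a small made-up file (the sample names and columns are invented, and toy_meta, dup_ids and n_widths are illustrative names). It is a rough sanity check only, not a substitute for qiime2's own validation.

```shell
# Toy metadata file (made-up values) so the check can be tried anywhere.
toy_meta=$(mktemp)
printf '#SampleID\tsite\tdepth\n' >  "$toy_meta"
printf 'S1\tridge\t5\n'           >> "$toy_meta"
printf 'S2\tvalley\t10\n'         >> "$toy_meta"
printf 'S2\tvalley\t20\n'         >> "$toy_meta"   # duplicate ID on purpose

# Sample IDs that appear more than once in the first column
# (only the header line is skipped in this sketch).
dup_ids=$(tail -n +2 "$toy_meta" | cut -f1 | sort | uniq -d)

# Number of distinct column counts; 1 means every row has the same width.
n_widths=$(awk -F'\t' '{ print NF }' "$toy_meta" | sort -u | wc -l | tr -d ' ')

echo "duplicated IDs: ${dup_ids:-none}"
echo "distinct column counts: $n_widths"
```

To try it on the workshop file, point the two commands at metadata.tsv instead of the toy file.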

5 Importing data

Your data has to be imported into qiime2 to start the process! This is a point where this tutorial differs considerably from the qiime2 tutorials: our data is in a different format because it has already been demultiplexed.

qiime tools import \
--type SampleData[PairedEndSequencesWithQuality] \
--input-path reads \
--input-format CasavaOneEightSingleLanePerSampleDirFmt \
--output-path reads.qza

You generated a *.qza file; in the OS GUI, try double-clicking on it. It is just a compressed file with provenance information. It is ready to be passed off to other qiime2 widgets now.
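
The Casava 1.8 import format used here expects one forward (R1) and one reverse (R2) fastq.gz file per sample, with names such as sample_barcode_L001_R1_001.fastq.gz. If the import fails, a missing mate file is one thing worth ruling out. The sketch below shows the idea on a toy directory of empty placeholder files; the sample names are made up and the variable names are illustrative.

```shell
# Toy directory with empty placeholder files (made-up sample names).
toy_reads=$(mktemp -d)
touch "$toy_reads/soilA_1_L001_R1_001.fastq.gz" \
      "$toy_reads/soilA_1_L001_R2_001.fastq.gz" \
      "$toy_reads/soilB_2_L001_R1_001.fastq.gz"     # reverse file missing on purpose

n_paired=0
n_orphan=0
for fwd in "$toy_reads"/*_R1_001.fastq.gz; do
    # Derive the expected reverse-read file name from the forward one.
    rev=$(printf '%s\n' "$fwd" | sed 's/_R1_001\.fastq\.gz$/_R2_001.fastq.gz/')
    if [ -e "$rev" ]; then
        n_paired=$((n_paired + 1))
    else
        n_orphan=$((n_orphan + 1))
        echo "no reverse file for: $(basename "$fwd")"
    fi
done
echo "paired samples: $n_paired, forward files without a mate: $n_orphan"
```

Replacing "$toy_reads" with reads runs the same check on the workshop data.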

5.1 About the data

This is real data and metadata that we are working with; it was generated by the Qiime2 developers and was analysed/described in a publication, Significant Impacts of Increasing Aridity on the Arid Soil Microbiome (doi: 10.1128/mSystems.00195-16)

6 Visualizing your reads (QA)

qiime demux summarize \
--i-data reads.qza \
--o-visualization reads-QA.qzv

This produces a *.qzv file; the “v” stands for “visualization”. The .qzv format is used all over qiime2 for files that you can actually look at: tables, graphs, lists, etc.

drag and drop reads-QA.qzv at https://view.qiime2.org/

6.0.1 This QA analysis groups all the samples together. How do you think you might look at just one sample? Or generate a file for each sample?

7 Removing primers

qiime cutadapt trim-paired \
--i-demultiplexed-sequences reads.qza \
--p-cores 8 \
--p-front-f GTGYCAGCMGCCGCGGTAA \
--p-front-r GGACTACNVGGGTWTCTAAT \
--o-trimmed-sequences reads-cutadapt \
--verbose > cutadapt_log.txt

It is important to note that the forward and reverse primers MUST be appropriate for your data set; they must be the primers that were used to perform the original PCR reaction.

This step produces a log file called cutadapt_log.txt. (The primers had already been cleaned from this data set, so you won’t see much trimming going on, but this is how you can go about it with your own data.)
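
The primers contain degenerate bases (Y, M, N, V, W), each of which stands for a set of nucleotides under the IUPAC code. One rough way to see whether reads still begin with a primer is to expand the primer into a regular expression and count the reads that match. This is only a sketch: the function name iupac_to_regex and the two toy reads are invented for illustration, and cutadapt itself does considerably more (for example, it tolerates mismatches).

```shell
# Translate IUPAC degenerate bases into bracket expressions.
iupac_to_regex() {
    printf '%s\n' "$1" | sed \
        -e 's/R/[AG]/g'  -e 's/Y/[CT]/g'  -e 's/S/[CG]/g'  -e 's/W/[AT]/g' \
        -e 's/K/[GT]/g'  -e 's/M/[AC]/g'  -e 's/B/[CGT]/g' -e 's/D/[AGT]/g' \
        -e 's/H/[ACT]/g' -e 's/V/[ACG]/g' -e 's/N/[ACGT]/g'
}

fwd_regex=$(iupac_to_regex GTGYCAGCMGCCGCGGTAA)
echo "forward primer as a regex: $fwd_regex"

# Two made-up reads: the first starts with the primer, the second does not.
read_with_primer="GTGCCAGCAGCCGCGGTAATACGGAGGG"
read_without_primer="TACGGAGGGTGCAAGCGTTAATCGG"

# Count the reads that begin with the primer.
n_hits=$(printf '%s\n' "$read_with_primer" "$read_without_primer" | grep -Ec "^$fwd_regex")
echo "reads starting with the forward primer: $n_hits"
```

On real data, the sequence lines of a fastq.gz file can be pulled out with gunzip -c piped into awk 'NR % 4 == 2' and then fed to the same grep.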

8 Denoising

A.K.A making OTU tables

Dada2 is an algorithm that handles quality control, read error correction, and paired end merging.

qiime dada2 denoise-paired \
--i-demultiplexed-seqs reads-cutadapt.qza \
--p-n-threads 6 \
--p-trunc-len-f 150 \
--p-trunc-len-r 150 \
--o-table table.qza \
--o-representative-sequences rep-seqs.qza \
--o-denoising-stats denoising-stats.qza

This has produced three new *.qza files; a table, a list of sequences, and a stats file. Take a quick look at these via the GUI to see what is inside them.

Generate visualization files

qiime feature-table summarize \
--i-table table.qza \
--o-visualization table.qzv \
--m-sample-metadata-file metadata.tsv

qiime feature-table tabulate-seqs \
--i-data rep-seqs.qza \
--o-visualization rep-seqs.qzv

This generates two more visualization files; check them out at qiime2 view

9 Taxonomic Classification

Let’s taxonomically classify our representative sequences.

Download a pre-trained classifier based on the Silva database (note that I have pre-trained a simple genus-level classifier for this workshop; instructions for making your own classifier are at the end of this document. Alternatively, the qiime group offers pre-trained classifiers at their website)

wget https://www.dropbox.com/s/1qdhs7mz4f61t3k/classifier.qza

Note that what we downloaded is already a qza, a qiime2 artifact. It has been trimmed and trained on the 515/806 primer set. If you find yourself on your own project using different primers, you’ll have to download a different model or train your own classifier.

Classify your representative sequences

qiime feature-classifier classify-sklearn \
--i-classifier classifier.qza \
--p-n-jobs 6 \
--i-reads rep-seqs.qza \
--o-classification taxonomy.qza

and generate a qzv taxonomy file

qiime metadata tabulate \
--m-input-file taxonomy.qza \
--o-visualization taxonomy.qzv

Note that this is only the feature taxonomy and has no abundance information.

9.1 Create bar-plots

Next we will combine the feature table, feature taxonomy, and metadata to make taxonomy barplots.

qiime taxa barplot \
--i-table table.qza \
--i-taxonomy taxonomy.qza \
--m-metadata-file metadata.tsv \
--o-visualization taxa-bar-plots.qzv

This is where everything starts to come together. Take some time to explore this visualization.

You can:
- visualize by any taxonomic level
- sort by any metadata feature
- download a particular visualization as an .svg file
- gather csv tables of any particular taxonomic aggregation

9.2 Exporting OTU tables

Qiime2 seems to be built under the assumption that you will continue analyzing within the Qiime2 environment. Within qiime2, you can do ordination, multivariate testing, differential abundance testing, diversity analyses, etc.

If you want to get your classic OTU tables immediately, from the bar-plot visualization, you can download csv tables.

Alternately, you can export the feature table and feature taxonomy and work with them yourself in R packages or spreadsheet programs

first, make a directory for your exports

mkdir exports

convert the table.qza object to a biom format file (biom is a binary format that is hard to read directly)

qiime tools export \
--input-path table.qza \
--output-path feature-table

convert the biom file to a simple tsv table

biom convert \
-i feature-table/feature-table.biom \
-o exports/feature-table.tsv \
--to-tsv

export the taxonomy file

qiime tools export \
--input-path taxonomy.qza \
--output-path exports

These two files, feature-table.tsv and taxonomy.tsv in the exports directory, can be integrated (in another program) to produce classic OTU tables.
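
As one possible way to do that integration without leaving the terminal, the sketch below appends a taxonomy column to a feature table with awk. It runs on two tiny made-up files that mimic the layout of the exported ones (a "#OTU ID" header row in the table, and "Feature ID", "Taxon", "Confidence" columns in the taxonomy); the feature IDs, counts and taxa are invented for illustration.

```shell
# Toy stand-ins for exports/feature-table.tsv and exports/taxonomy.tsv.
workdir=$(mktemp -d)
printf '# Constructed from biom file\n#OTU ID\tS1\tS2\nf1\t10\t0\nf2\t3\t7\n' \
    > "$workdir/feature-table.tsv"
printf 'Feature ID\tTaxon\tConfidence\nf1\tD_0__Bacteria;D_1__Firmicutes\t0.99\nf2\tD_0__Bacteria;D_1__Proteobacteria\t0.87\n' \
    > "$workdir/taxonomy.tsv"

# First file: remember the taxon for each feature ID.
# Second file: drop the biom comment line, extend the header, append the taxon.
awk -F'\t' -v OFS='\t' '
    NR == FNR        { if (FNR > 1) taxon[$1] = $2; next }
    /^# Constructed/ { next }
    /^#OTU ID/       { print $0, "Taxon"; next }
                     { print $0, ($1 in taxon ? taxon[$1] : "Unassigned") }
' "$workdir/taxonomy.tsv" "$workdir/feature-table.tsv" > "$workdir/otu-table.tsv"

cat "$workdir/otu-table.tsv"
```

Pointing the awk command at the real files in exports should give one row per feature, with sample counts followed by the assigned taxonomy.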

10 Scripting

In this recipe, you ran qiime2 commands one at a time.

What happens when you want to work with another data set? Or if somebody asks what you did?

Let’s handle essential issues like repeatability and documentation by making a script. A “script” in this case will simply be a text document with all of the steps in order which we can run directly from the command-line.

First, let’s move up one level and make a new directory

cd ..
mkdir qiime2_2019_scripting
cd qiime2_2019_scripting

Make a new text file

touch qiime2_workshop.sh
nano qiime2_workshop.sh

You are now in a very simple text editor called “nano”. Note that you can make text files using many different programs; nano is clumsy, but a nice core text editor that you should be aware of.

We are now making a “bash” or “shell” script; the only thing that makes this text file special is its first line, which must be:

#!/bin/bash

Now type in / copy in each of the commands from the recipe, starting at “Downloading Data and Metadata” step, all the way down to “Exporting OTU Tables”.

Note that you can use the “#” character to make comments to yourself or to anybody who reads this script; anything following the “#” character will be ignored until the next new-line (hard return)

Your script will start with something that looks like this (don’t type this, it’s just an example)

#!/bin/bash

# April, 2019
## P. Herman

#DATA IMPORT

mkdir reads

wget https://website.com/l33tdataz.tar
etc

Hit CTRL+X to exit nano; be sure to hit “Y” on exiting in order to save your work.

Now you have what is hopefully a completed script which, with one command, will download the data, metadata, and taxonomic classifier, then trim primers, denoise, classify, and produce tables and qzv files.

Note that this script does not activate the qiime2 environment for you; activate it manually before running the script.

Try running it:

bash qiime2_workshop.sh

Monitor the terminal output (at this stage this is called “stdout”) for progress, success (green lines) and any errors (red lines). If you run into any errors, hit CTRL+C to terminate the script and try to figure out what went wrong!
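
By default, bash carries on with the next command even when one fails, so a single failed step can trigger a cascade of confusing errors. An optional addition, not part of the original recipe, is to put "set -euo pipefail" right after the first line so the script stops at the first failure. The sketch below demonstrates the effect with a throwaway script in which "false" stands in for a failing qiime command.

```shell
# Write a tiny demonstration script to a temporary file.
demo_script=$(mktemp)
cat > "$demo_script" <<'EOF'
#!/bin/bash
set -euo pipefail   # stop at the first error or unset variable
echo "step 1 ok"
false               # stands in for a qiime command that fails
echo "step 2 never runs"
EOF

# Run it, capturing both the printed output and the exit status.
output=$(bash "$demo_script") && status=0 || status=$?
echo "$output"
echo "exit status: $status"
```

Only the first step is printed, and the non-zero exit status signals that the script stopped early.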

You now have a text file that, once qiime2 is installed and activated, can be adapted to new data, new primers, new classifiers, etc. Feel free to email this text file (i.e. bash script) to yourself.
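
One way to make that adaptation easier is to collect the things that change between projects in variables at the top of the script. The sketch below does this for the primer-trimming step and simply prints the resulting command as a dry run; the variable names (FWD_PRIMER, REV_PRIMER, THREADS, READS_QZA, trim_cmd) are illustrative choices, not anything qiime2 requires.

```shell
# Settings gathered at the top of the script (names are illustrative).
FWD_PRIMER="GTGYCAGCMGCCGCGGTAA"
REV_PRIMER="GGACTACNVGGGTWTCTAAT"
THREADS=6
READS_QZA="reads.qza"

# Assemble the primer-trimming command from the settings and print it.
# This is a dry run: in a real script, write the command itself
# with the variables in place of the hard-coded values.
trim_cmd="qiime cutadapt trim-paired \
--i-demultiplexed-sequences $READS_QZA \
--p-cores $THREADS \
--p-front-f $FWD_PRIMER \
--p-front-r $REV_PRIMER \
--o-trimmed-sequences reads-cutadapt.qza"
echo "$trim_cmd"
```

Switching to a different primer pair then means editing two lines at the top rather than hunting through the whole script.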

10.1 What makes it a bash script?

#!/bin/bash as the first line + “.sh” file-extension; that’s it! (…psst, the “.sh” isn’t even required)
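
Where the first line really matters is when the file is run directly rather than through "bash file.sh": once the file is marked executable, the system reads that line to decide which interpreter to use. The sketch below shows this with a throwaway file that has no ".sh" extension; the file name hello is arbitrary, and the example assumes bash lives at /bin/bash.

```shell
# Create a small script without a .sh extension.
demo_dir=$(mktemp -d)
cat > "$demo_dir/hello" <<'EOF'
#!/bin/bash
echo "run by bash via the first line"
EOF

# Mark it executable, then run it directly by its path.
chmod +x "$demo_dir/hello"
greeting=$("$demo_dir/hello")
echo "$greeting"
```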

11 Conclusions

11.1 Your results

Take some time to explore the various data files, both qza and qzv. There is a ton of useful information in there.

Can you find where PHRED score can be visualized?
Take a look at the two files in “exports” folder. Do you know how to combine these to make a classic OTU table? Check out the “vlookup” function in Excel/LibreOffice

11.2 Processing your own data

To adapt this basic recipe to your own data, you will need to consider:

are these the appropriate primers
is the classifier using the appropriate clustering percentage and primers (see last section for some details)
is your data in the same format
is your metadata ready to be imported

It is challenging to make the transition; there are many knobs and dials and switches that you will need to learn about, but you can do it! The qiime2 tutorials (https://docs.qiime2.org/2019.1/tutorials/) cover many of these issues, not always in the most clear way for a beginner, but it is all there. Remember “-h” also!

11.3 What comes next

Making the OTU table and classifying your sequences is the first step, which I would call data processing rather than analysis. After this comes the real battle!

Now that you have your bearings, check out the tutorials that qiime2 group maintains (https://docs.qiime2.org/2019.1/tutorials/) ; their data formats are often a bit weird, so be prepared for that!

Before you start learning at the qiime2 website, be sure you know the difference between multiplexed and demultiplexed data!

12 Extra Stuff: Training the feature classifier

This short tutorial used a pre-trained classifier built from reference sequences clustered at a coarse level, 90% identity, which only takes us to genus. This was chosen specifically so that the taxonomic classification would be FAST. The procedure below was used to train the model and can be altered to produce a 97% or 99% model.

Download the Silva 128 collection of reference files

wget https://www.arb-silva.de/fileadmin/silva_databases/qiime/Silva_128_release.tgz
tar -xvf Silva_128_release.tgz
rm Silva_128_release.tgz

Importing the 90% 16S-only data set sequences and taxonomy (this can be redirected to any rep set you want to work with)

qiime tools import \
--type FeatureData[Sequence] \
--input-path SILVA_128_QIIME_release/rep_set/rep_set_16S_only/90/90_otus_16S.fasta \
--output-path 90_reps.qza

qiime tools import \
--type FeatureData[Taxonomy] \
--input-format HeaderlessTSVTaxonomyFormat \
--input-path SILVA_128_QIIME_release/taxonomy/16S_only/90/consensus_taxonomy_7_levels.txt \
--output-path 90_ref_taxonomy.qza

Training a classifier using the earth microbiome primer set (this can be substituted for your own primers or primers can simply be ignored)

First the target region is extracted based on the primer set

qiime feature-classifier extract-reads \
--i-sequences 90_reps.qza \
--p-f-primer GTGCCAGCMGCCGCGGTAA \
--p-r-primer GGACTACHVGGGTWTCTAAT \
--p-min-length 100 \
--p-max-length 400 \
--o-reads ref-seqs.qza

Then the model is trained

qiime feature-classifier fit-classifier-naive-bayes \
--i-reference-reads ref-seqs.qza \
--i-reference-taxonomy 90_ref_taxonomy.qza \
--o-classifier classifier.qza