Rmd for StringTie Tutorial, BIOL792, Spring 2019

StringTie Tutorial from Pertea, M. et. al., Nat. Protoc. 2016
- Welcome Back!
Video 3
Video 3.1
- Generation of TDF Files
Parting Words

University of Nevada, Reno Vintage Logo

StringTie Tutorial from Pertea, M. et. al., Nat. Protoc. 2016

Link to YouTube Video 3: Beginner’s RNA-Seq Tutorial Part 3 - Dr. Pedro Miura’s BIOL792 Course, University of Nevada, Reno

Link to YouTube Video 3.1: Beginner’s RNA-Seq Tutorial Part 3.1 - Dr. Pedro Miura’s BIOL792 Course, University of Nevada, Reno

Welcome Back!

Welcome back to the third installment in this beginner’s series introduction to RNA-Seq. Here we have another Rmd for you to use as a guide while watching the next two installments in the series. The first installment, Video 3, goes over the usage of StringTie and a brief introduction to IGV. The second installation, Video 3.1, takes a more indepth look at the generation of TDF files and viewing multiple TDF files side-by-side in IGV. I have also included screenshot of color coded, male and female TDF files not included in Video 3.1 to display the differences in expression patters of the XIST (X-inactive specific transcript) gene. The XIST gene is only expressed in mammalian females and is very important in the process of X inactivation, or silencing of one of the two X chromosomes. I have included the code for Video 3.1 at the very bottom of this Rmd and have also labeled it as such. Video 3 begins with the installation of IGV and IGVTools, but I have removed the code for this as I included it in the previous HISAT2 Rmd and this reference guide begins at the opening of IGV and loading of .bam and .bam.bai files. Without further ado, let’s begin!

Video 3

Working in IGV

To begin, we will open IGV and load .bam and .bam.bai files to view. In order to do this we enter the command `igv` from the terminal and use the GUI to load our reference genome, choose the X chromosome to view, then enter XIST as our locus of interest.

$ igv

Using StringTie to Assemble Transcripts

Here, we will begin using StringTie to assemble our transcripts for each sample .bam file we created. We use the `stringtie` command with the options: ‘-p 11’ to set our threads to 11, ‘-G’ to designate our reference annotation file, ‘-o’ to designate out output file destination (can include full path/ here) and format as [output.gtf], ‘-l’ to set the prefix label for the outputs, and finally [input.bam] to denote our input files.

$ stringtie -p 11 -G chrX_data/genes/chrX.gtf -o ERR188044_chrX.gtf -l ERR188044 ERR188044_chrX.bam

We have written a nifty `bash` script to automate this step for us.

#!/usr/bin/bash

#bash script for stringtie; assemble transcripts using stringtie

SAMPLES="ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916"

for SAMPLE in $SAMPLES; do
    stringtie -p 11 -G ~/chrX_data/genes/chrX.gtf -o ${SAMPLE}_chrX.gtf -l ${SAMPLE} ${SAMPLE}_chrX.bam
done

#this works

A neat little `perl` script I wrote for the same task.

#!/usr/bin/perl

#perl script for stringtie; assemble transcripts using stringtie

use warnings;

use strict;

my @samples = qw(ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916);

foreach(@samples){
    do {
        system("stringtie", "-p 11", "-G ~/chrX_data/genes/chrX.gtf", "-o ${_}_chrX.gtf", "-l ${_}" "${_}_chrX.bam");
    }
}

#perl works too

Using StringTie to Merge Transcripts

To merge our transcripts, we utilize the `stringtie` command with the options: ‘–merge’ to use the merge function of stringtie, ‘-p 11’ to use 11 threads, ‘-G’ to designate our reference annotation file, ‘-o’ to assign our output file name and format to [output.gtf], and ‘chrX_data/mergelist.txt’ to provide the program with all of our inputs. Note: if you do not run the transcript assembly command/script in the same directory you are running this command, you will need to generate the mergelist.txt file manually, making sure to include the full path/ to each .gtf file.

stringtie --merge -p 11 -G chrX_data/genes/chrX.gtf -o stringtie_merged.gtf chrX_data/mergelist.txt

Using Gffcompare

We can then look at how the transcripts compare with the reference annotation (this is optional). Here we installed Gffcompare using `conda install`, then utilized `gffcompare` with options: ‘-r’ to include our reference annotation file, ‘-G’ to tell `gffcompare` to compare all the transcripts in our input file [stringtie_merged.gtf], and ‘-o’ to denote out output prefix.

conda install gffcompare
gffcompare -r chrX_data/genes/chrX.gtf -G -o merged stringtie_merged.gtf

Finally, we will use StringTie to estimate transcript abundances and create table counts for Ballgown (this is for when we do our differential expression analysis in R). Here we again use `stringtie` with options: ‘-e’ to limit the estimation and output of transcripts to only those that match the reference annotation, ‘-B’ turns our outputs into Ballgown input table files (.ctab) with our desired coverage data for our reference transcripts, ‘-p 11’ to denote the use of 11 threads, ‘-G’ to designate our reference annotation file, ‘-o’ to denote the creation of a new ballgown/ directory with all our individaul /[output.gtf] files contained within their own individual /[output_prefix]/ directories, and finally we enter our [input.bam] files.

stringtie -e -B -p 11 -G stringtie_merged.gtf -o ballgown/ERR188044/ERR188044_chrX.gtf ERR188044_chrX.bam

We have also written a nifty `bash` script to automate this step for us.

#!/usr/bin/bash

#bash script for stringtie; calculate abundances and generate ballgown dir containing tables for use in R

SAMPLES="ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916"

for SAMPLE in $SAMPLES; do
    stringtie -e -B -p 11 -G stringtie_merged.gtf -o ~/ballgown/${SAMPLE}/${SAMPLE}_chrX.gtf ${SAMPLE}_chrX.bam
done

#this works

And a `perl` script for those of us who operate better under pressure.

#!/usr/bin/perl

#perl script for stringtie; calculate abundances and generate ballgown/ dir containing tables for use in R

use warnings;

use strict;

my @samples = qw(ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916);

foreach(@samples){
    do {
        system("stringtie", "-e", "-B", "-p 11", "-G stringtie_merged.gtf", "-o ~/ballgown/${_}/${_}_chrX.gtf", "${_}_chrX.bam");
    }
}

#perl works too

Video 3.1

Generation of TDF Files

Let’s use IGVTools to generate TDF (Tiled Data File) files using the `igvtools` command in our terminal so that we may view all of our samples in IGV. Here we use the `igvtools` command with the following options: ‘count’ which computes coverage density for an alignment file and provides either .tdf or .wig output files, ‘-z’ is the max-zoom, ‘-w’ denotes window size, ‘-e’ which extends the read by the designated number in bp before counting, our [input.bam] file comes next, followed by our [output.tdf] file, and finally the path to our reference genome assembly.

$ igvtools count -z 5 -w 25 -e 250 ERR188044_chrX.bam ERR188044_chrX.tdf ~/igv/genomes/hg38.genome

We have generated another life saving `bash` script to automate this process for us!

#!/usr/bin/bash

#bash script for igvtools; generate .tdf files using igvtools

SAMPLES="ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916"

for SAMPLE in $SAMPLES; do
    igvtools count -z 5 -w 25 -e 250 ${SAMPLE}_chrX.bam ${SAMPLE}_chrX.tdf ~/igv/genomes/hg38.genome
done

#this works

Here, we are under pressure once again, creating `perl` scripts!

#!/usr/bin/perl

#perl script for igvtools; generate .tdf files using igvtools

use warnings;

use strict;

my @samples = qw(ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916);

foreach(@samples){
    do {
        system("igvtools", "count", "-z 5", "-w 25", "-e 250", "${_}_chrX.bam", "${_}_chrX.tdf", "~/igv/genomes/hg38.genome");
    }
}

#perl works too

Parting Words

So, I hear that you have completed the StringTie/IGV tutorial! Congratulations on all your hard work! With a little more effort, you will be analyzing RNA-Seq datasets like a pro in no time! Next time we will transition to R for our differential expression analysis using Ballgown. You can follow along using this Rmd for the Ballgown tutorial video!

Rmd for StringTie Tutorial, BIOL792, Spring 2019

Alexander Selvey, University of Nevada, Reno

March 9th, 2019

StringTie Tutorial from Pertea, M. et. al., Nat. Protoc. 2016

Link to YouTube Video 3: Beginner’s RNA-Seq Tutorial Part 3 - Dr. Pedro Miura’s BIOL792 Course, University of Nevada, Reno

Link to YouTube Video 3.1: Beginner’s RNA-Seq Tutorial Part 3.1 - Dr. Pedro Miura’s BIOL792 Course, University of Nevada, Reno

Welcome Back!

Video 3

Working in IGV

To begin, we will open IGV and load .bam and .bam.bai files to view. In order to do this we enter the command `igv` from the terminal and use the GUI to load our reference genome, choose the X chromosome to view, then enter XIST as our locus of interest.

Using StringTie to Assemble Transcripts

We have written a nifty `bash` script to automate this step for us.

A neat little `perl` script I wrote for the same task.

Using StringTie to Merge Transcripts

Using Gffcompare

We have also written a nifty `bash` script to automate this step for us.

And a `perl` script for those of us who operate better under pressure.

Video 3.1

Generation of TDF Files

We have generated another life saving `bash` script to automate this process for us!

Here, we are under pressure once again, creating `perl` scripts!

Parting Words

Rmd for StringTie Tutorial, BIOL792, Spring 2019

Alexander Selvey, University of Nevada, Reno

March 9th, 2019

StringTie Tutorial from Pertea, M. et. al., Nat. Protoc. 2016

Link to YouTube Video 3: Beginner’s RNA-Seq Tutorial Part 3 - Dr. Pedro Miura’s BIOL792 Course, University of Nevada, Reno

Link to YouTube Video 3.1: Beginner’s RNA-Seq Tutorial Part 3.1 - Dr. Pedro Miura’s BIOL792 Course, University of Nevada, Reno

Welcome Back!

Video 3

Working in IGV

To begin, we will open IGV and load .bam and .bam.bai files to view. In order to do this we enter the command igv from the terminal and use the GUI to load our reference genome, choose the X chromosome to view, then enter XIST as our locus of interest.

Using StringTie to Assemble Transcripts

We have written a nifty bash script to automate this step for us.

A neat little perl script I wrote for the same task.

Using StringTie to Merge Transcripts

Using Gffcompare

We have also written a nifty bash script to automate this step for us.

And a perl script for those of us who operate better under pressure.

Video 3.1

Generation of TDF Files

We have generated another life saving bash script to automate this process for us!

Here, we are under pressure once again, creating perl scripts!

Parting Words

To begin, we will open IGV and load .bam and .bam.bai files to view. In order to do this we enter the command `igv` from the terminal and use the GUI to load our reference genome, choose the X chromosome to view, then enter XIST as our locus of interest.

We have written a nifty `bash` script to automate this step for us.

A neat little `perl` script I wrote for the same task.

We have also written a nifty `bash` script to automate this step for us.

And a `perl` script for those of us who operate better under pressure.

We have generated another life saving `bash` script to automate this process for us!

Here, we are under pressure once again, creating `perl` scripts!