The tools used in this workflow can be installed with the conda install command. In the third installment of this series, I walk through obtaining the bash script that downloads Miniconda and the commands that set up Bioconda. As I generated this Rmd file, I realized these steps would have been better laid out at the beginning so, lucky you, I have compiled everything you need to obtain the required tools before you begin. After installing Miniconda and Bioconda, run the code below to obtain HISAT2, SAMtools, IGV, IGVTools, StringTie, Gffcompare, R and RStudio (if you don’t already have them), and Ballgown (if you use conda to install R/RStudio).

Use the conda command for Ballgown below only if you install R/RStudio using Bioconda. If you have already installed R and RStudio from the command line using sudo apt-get install r-base and sudo apt-get install rstudio, or have compiled them from source code, install Ballgown from within R/RStudio instead. Ballgown is a Bioconductor package rather than a CRAN package, so install.packages('ballgown') on its own will not find it; run install.packages('BiocManager') followed by BiocManager::install('ballgown').

$ conda install hisat2
$ conda install samtools
$ conda install igv
$ conda install igvtools
$ conda install stringtie
$ conda install gffcompare
$ conda install r-base
$ conda install rstudio
$ conda install -c bioconda bioconductor-ballgown
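After the installs finish, it can help to confirm that each tool is actually on your PATH before starting the workflow. The following is a minimal sketch, not part of the original post: the function name check_tools is made up for illustration, and the executable names (for example igv and R) are assumptions about what the conda packages install.

```shell
#!/usr/bin/env bash
# Report which command-line tools are reachable on PATH.
# check_tools is a hypothetical helper; the tool names passed to it
# are assumed to match the executables the conda packages provide.
check_tools() {
    local tool missing=0
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            echo "found: $tool"
        else
            echo "missing: $tool"
            missing=$((missing + 1))
        fi
    done
    # Succeed only when nothing was missing.
    [ "$missing" -eq 0 ]
}

check_tools hisat2 samtools igv igvtools stringtie gffcompare R rstudio || true
```

Any line that starts with "missing:" points at a tool to reinstall or a conda environment that has not been activated.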
Install R and RStudio with conda only if you have not already installed them using a different method.

We use the UNIX command wget to pull the data off the FTP server hosting the data we will be working with. Use the command cd [Options] [Directory] to change into your desired ~/working_directory and then download the file.

$ wget ftp://ftp.ccb.jhu.edu/pub/RNAseq_protocol/chrX_data.tar.gz
Using the UNIX command tar xvzf, we can extract the .tar.gz file into our ~/working_directory. The option ‘x’ extracts the files, ‘v’ lists the processed files verbosely, ‘z’ filters the archive through gzip, and ‘f’ tells tar to read the archive file named next on the command line.

$ tar xvzf chrX_data.tar.gz
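To see the tar options in isolation, here is a small sketch that runs without the real download. It builds a tiny stand-in archive in a temporary directory (demo_data and reads.txt are hypothetical names, not part of the chrX data), lists it with ‘t’, and then extracts it with ‘x’.

```shell
#!/usr/bin/env bash
# Build a tiny stand-in archive so the example runs anywhere.
# demo_data and reads.txt are placeholder names for illustration.
workdir=$(mktemp -d)
mkdir -p "$workdir/demo_data/samples"
echo "placeholder" > "$workdir/demo_data/samples/reads.txt"
tar czf "$workdir/demo_data.tar.gz" -C "$workdir" demo_data
rm -r "$workdir/demo_data"

# 't' lists the archive contents without extracting anything.
tar tzf "$workdir/demo_data.tar.gz"

# 'x' extracts; -C chooses the directory to extract into.
tar xzf "$workdir/demo_data.tar.gz" -C "$workdir"
```

Listing first is a cheap way to check where the files will land before unpacking a large archive.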
We begin with hisat2 to denote the command we are using. The options entered here are ‘-p 8’ to use 8 threads, ‘--dta’ to report alignments tailored for transcript assemblers such as StringTie, ‘-x’ to give the base name of the indexed reference genome, ‘-1’ and ‘-2’ to give the forward and reverse read files of a paired-end sample, and ‘-S’ to name the file the SAM output is written to.

$ hisat2 -p 8 --dta -x chrX_data/indexes/chrX_tran -1 chrX_data/samples/ERR188044_chrX_1.fastq.gz -2 chrX_data/samples/ERR188044_chrX_2.fastq.gz -S ERR188044_chrX.sam
Here is a bash script for the above HISAT2 command, called hisat2.sh, that will run all the .fastq.gz files for you in a single run, one sample after another.

#!/usr/bin/env bash
# bash script for hisat2; align all .fastq.gz files to the indexed reference genome to generate .sam files
SAMPLES="ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916"
for SAMPLE in $SAMPLES; do
    hisat2 -p 11 --dta -x ~/chrX_data/indexes/chrX_tran -1 ~/chrX_data/samples/"${SAMPLE}_chrX_1.fastq.gz" -2 ~/chrX_data/samples/"${SAMPLE}_chrX_2.fastq.gz" -S "${SAMPLE}_chrX.sam"
done
#this works
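Before starting a long alignment run, it can be useful to print the commands the loop would execute and check the paths by eye. The sketch below is a dry run under stated assumptions: the helper hisat2_cmd and the variables DATA_DIR and THREADS are names introduced here for illustration, with defaults taken from the paths and thread count used above. It only prints text and never calls hisat2.

```shell
#!/usr/bin/env bash
# Dry run: print each hisat2 command instead of executing it.
# DATA_DIR, THREADS and hisat2_cmd are illustrative names, not part of HISAT2.
DATA_DIR="${DATA_DIR:-$HOME/chrX_data}"
THREADS="${THREADS:-8}"

hisat2_cmd() {
    local sample="$1"
    printf 'hisat2 -p %s --dta -x %s/indexes/chrX_tran -1 %s/samples/%s_chrX_1.fastq.gz -2 %s/samples/%s_chrX_2.fastq.gz -S %s_chrX.sam\n' \
        "$THREADS" "$DATA_DIR" "$DATA_DIR" "$sample" "$DATA_DIR" "$sample" "$sample"
}

for sample in ERR188044 ERR188104; do
    hisat2_cmd "$sample"
done
```

Once the printed commands look right, the same loop can call hisat2 directly, as hisat2.sh does.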
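Here is a perl script to do the same thing.

#!/usr/bin/perl
# perl script for hisat2; align all .fastq.gz files to the indexed reference genome to generate .sam files
use warnings;
use strict;
my @samples = qw(ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916);
# system() in list form bypasses the shell, so ~ is not expanded; build the path from $HOME instead
my $data = "$ENV{HOME}/chrX_data";
foreach my $sample (@samples) {
    # each option and its value is passed as a separate list element
    system("hisat2", "-p", "11", "--dta", "-x", "$data/indexes/chrX_tran",
           "-1", "$data/samples/${sample}_chrX_1.fastq.gz",
           "-2", "$data/samples/${sample}_chrX_2.fastq.gz",
           "-S", "${sample}_chrX.sam") == 0
        or warn "hisat2 failed for $sample\n";
}
#perl works too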
We use the samtools command with the options: ‘sort’ to sort the alignments by leftmost coordinate, ‘-@ 8’ to use 8 threads, and ‘-o’ to name the output file [out.bam], whose .bam extension tells samtools to write BAM. Finally, we enter our [input.sam] file.

$ samtools sort -@ 8 -o ERR188044_chrX.bam ERR188044_chrX.sam
Here is a bash script for the above command, called sort.sh, that will process all our .sam files in a single run, one after another.

#!/usr/bin/env bash
# bash script for samtools; sort .sam files and write them out as .bam files
SAMPLES="ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916"
for SAMPLE in $SAMPLES; do
    samtools sort -@ 11 -o "${SAMPLE}_chrX.bam" "${SAMPLE}_chrX.sam"
done
#this works
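As an alternative to hard-coding the sample list, the output name can be derived from each input name with shell parameter expansion. This is a sketch only: sam_to_bam is a hypothetical helper, and the example creates empty placeholder .sam files in a temporary directory so it runs without real data. The samtools command is printed with echo rather than executed.

```shell
#!/usr/bin/env bash
# Derive the .bam name from a .sam name with parameter expansion.
# sam_to_bam is an illustrative helper, not a samtools feature.
sam_to_bam() {
    local sam="$1"
    echo "${sam%.sam}.bam"   # strip the trailing .sam, then append .bam
}

# Empty placeholder files stand in for real alignments.
demo_sort=$(mktemp -d)
touch "$demo_sort/ERR188044_chrX.sam" "$demo_sort/ERR188104_chrX.sam"

for sam in "$demo_sort"/*_chrX.sam; do
    # echo prints the command; remove it to actually run samtools
    echo samtools sort -@ 8 -o "$(sam_to_bam "$sam")" "$sam"
done
```

Looping over the files that exist means a newly added sample is picked up without editing the script.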
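Here is the equivalent perl script.

#!/usr/bin/perl
# perl script for samtools; sort .sam files and write them out as .bam files
use warnings;
use strict;
my @samples = qw(ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916);
foreach my $sample (@samples) {
    # each option and its value is a separate list element, so no stray space ends up in the file name
    system("samtools", "sort", "-@", "11", "-o", "${sample}_chrX.bam", "${sample}_chrX.sam") == 0
        or warn "samtools sort failed for $sample\n";
}
#perl works too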
Using the samtools command with the ‘index’ option, we enter our [in.bam] files and receive [out.bam.bai] files. With these two files in hand, we can now view our data using IGV!

$ samtools index ERR188044_chrX.bam ERR188044_chrX.bam.bai
Here is a bash script called index.sh that will index all our .bam files and generate .bam.bai files in a single run, one after another.

#!/usr/bin/env bash
# bash script for samtools; index our .bam files to obtain .bam.bai files
SAMPLES="ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916"
for SAMPLE in $SAMPLES; do
    samtools index "${SAMPLE}_chrX.bam" "${SAMPLE}_chrX.bam.bai"
done
#this works
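Since IGV needs the index file alongside each BAM file, a quick check that every .bam has its .bam.bai can save a confusing load error later. The following is a minimal sketch: missing_indexes is a hypothetical helper, and the example uses empty placeholder files in a temporary directory rather than real alignments.

```shell
#!/usr/bin/env bash
# Print every .bam file in a directory that has no matching .bam.bai.
# missing_indexes is an illustrative helper, not a samtools command.
missing_indexes() {
    local dir="$1" bam
    for bam in "$dir"/*.bam; do
        [ -e "$bam" ] || continue          # the glob matched no .bam files
        [ -e "$bam.bai" ] || echo "$bam"   # report a BAM without an index
    done
}

# Empty placeholder files: a.bam is indexed, b.bam is not.
demo_index=$(mktemp -d)
touch "$demo_index/a.bam" "$demo_index/a.bam.bai" "$demo_index/b.bam"

missing_indexes "$demo_index"
```

An empty result means every BAM file in the directory has an index next to it.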
Here is the equivalent perl script.

#!/usr/bin/perl
# perl script for samtools; index our .bam files to obtain .bam.bai files
use warnings;
use strict;
my @samples = qw(ERR188044 ERR188104 ERR188234 ERR188245 ERR188257 ERR188273 ERR188337 ERR188383 ERR188401 ERR188428 ERR188454 ERR204916);
foreach my $sample (@samples) {
    system("samtools", "index", "${sample}_chrX.bam", "${sample}_chrX.bam.bai") == 0
        or warn "samtools index failed for $sample\n";
}
#perl works too