Removing MT reads

BEFORE you run bam2RPM, you will want to get rid of your MT reads so that they don’t skew the normalization (if not removed, MT will be overrepresented in RPM calculation). This can easily be down with samtools. I’m guessing if your lab has already done genomic analysis, this should already be on your computer but if not, it can be downloaded here. To install the unpacked file and run the following commands.

cd samtools1.3.X
./configure --prefix=/PATH/TO/SOMEWHERE #--prefix is the directory you want it to be installed
make
make install

After installing, add the $PATH of the repository to your bashrc (how to do this).

Samtools lets you manipulate sam/bam files, and in our case we want to use it to get a list of the chromosomes we want to keep (everything but ‘MT’). We’ll use a couple samtools commands to give us a list of all chromosomes except MT, then input another samtools command to output a bam with removed mitochondrial reads. There are two steps to this.

  1. Samtools index is a function of samtools that gives you a list of the chromosomes you’re aligning genome has and tells you the amount of reads that mapped to each chromosome (among some other non important things). We use it just to ge the first column of the file (a list of chromosomes).
samtools index file1.bam
  1. Next using a command (samtools idxstats) that refrences the index generated before we will input a list of wanted chromsomes for samtools to output us a new subsetted non-MT bamfile. Details on how that works later but the command is below.
samtools idxstats file1.bam | cut -f 1 | grep -v MT | xargs samtools view -b file1.bam > file1_rmMT.bam

For how part 2 works:

A. samtools idxstats file1.bam outputs a tab deleminated file of the chromososomes and how many reads aligned to each.

B. cut -f 1 gets the first column of that file (chromosomes)

C. grep -v MT grep -v is the oppisite of grep so every chromsome but ‘MT’ (NOTE: the mito Chromosome isn’t always ‘MT’, it depends on the genome aligned to. Run samtools idxstats file1.bam | cut -f 1 to get a list of all chromosome names).

D. xargs takes the output of the previous three piped commands and inputs it into the samtools view to say let me view the reads of the following chromosomes (all but MT).

Running bam2rpm.r

Download R and Download necessary genomic packages

First thing that should be done is to download R and the packages. R can be downloaded here. Make sure that the R version is 3.3.1 or later.

The next thing that needs to be done is to download a couple required packages. This can be done with the install.packages() command. The first package is argparse.

install.packages("argparse")

The other packages are from Bioconductor, the largest bioinformatics software respository. To do this you have to first source the bioconductor repository and then use a package installer (biocLite()).

source("http://bioconductor.org/biocLite.R")
biocLite("rtracklayer")
biocLite("GenomicAlignments")

Run bam2rpm

Once this is done you should be all set to run bam2rpm.r now. Type ./bam2rpm.r -h on the command line to see the parameters you have to implement. This function takes all .bam files and makes a bigwig and other intermediate files so be wary that if you store your removed-MT bams in the same folder as raw bam, the program will make output files FOR BOTH. I recommend storing the files you want to run bam2rpm.r on in their own directory. Finally, I’m still trying to tinker with the extsize and shift parameters so leave those as default for now.