Using Sleuth R package for differential expression analysis in Atmosphere

Sleuth analysis steps

set the working directory

base_dir <- "/home/kapeelc/sleuth_analysis"

set the number of CPU’s to use for the analysis; set it according to your instance size

options(mc.cores = 4L)

install ballgown R package along with its dependencies

source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
install.packages("devtools", repos = "http://cran.us.r-project.org")
library("httr")
set_config(config(ssl_verifypeer = 0L))
devtools::install_github("pachterlab/sleuth")
library("sleuth")

set the base directory where the kallisto output files are

base_dir <- "/home/kapeelc/sleuth_analysis"

The first step in a sleuth analysis is to specify where the kallisto results are stored. A variable is created for this purpose with

sample_id <- dir(file.path(base_dir,"kallisto_qaunt_output"))

The result can be displayed by typing

sample_id

## [1] "IS20351_DS_1_1" "IS20351_DS_2_1" "IS20351_DS_3_1" "IS20351_WW_1_1"
## [5] "IS20351_WW_2_1" "IS20351_WW_3_1"

In the box above, lines beginning with ## show the output of the command (in what follows we include the output that should appear with each command).

A list of paths to the kallisto results indexed by the sample IDs is collated with

kal_dirs <- sapply(sample_id, function(id) file.path(base_dir, "kallisto_qaunt_output",id))
kal_dirs

##                                                       IS20351_DS_1_1 
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_1_1" 
##                                                       IS20351_DS_2_1 
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_2_1" 
##                                                       IS20351_DS_3_1 
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_3_1" 
##                                                       IS20351_WW_1_1 
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_1_1" 
##                                                       IS20351_WW_2_1 
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_2_1" 
##                                                       IS20351_WW_3_1 
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_3_1"

The next step is to load an auxillary table that describes the experimental design and the relationship between the kallisto directories and the samples

s2c <- read.table(file.path(base_dir, "design_matrix"), header = TRUE, stringsAsFactors=FALSE)
s2c <- dplyr::select(s2c, sample = ID, condition, reps)
s2c

##           sample condition reps
## 1 IS20351_DS_1_1        DS    1
## 2 IS20351_DS_2_1        DS    2
## 3 IS20351_DS_3_1        DS    3
## 4 IS20351_WW_1_1        WW    1
## 5 IS20351_WW_2_1        WW    2
## 6 IS20351_WW_3_1        WW    3

Now the directories must be appended in a new column to the table describing the experiment. This column must be labeled path, otherwise sleuth will report an error. This is to ensure that samples can be associated with kallisto quantifications.

s2c <- dplyr::mutate(s2c, path = kal_dirs)
print(s2c)

##           sample condition reps
## 1 IS20351_DS_1_1        DS    1
## 2 IS20351_DS_2_1        DS    2
## 3 IS20351_DS_3_1        DS    3
## 4 IS20351_WW_1_1        WW    1
## 5 IS20351_WW_2_1        WW    2
## 6 IS20351_WW_3_1        WW    3
##                                                                 path
## 1 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_1_1
## 2 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_2_1
## 3 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_3_1
## 4 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_1_1
## 5 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_2_1
## 6 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_3_1

Include gene names to transcript level analysis. You can use Biomart function to get the transcipt to gene mapping or use custom gff parsing script. Here we already have the mapping file done, make sure the column names are as shown below

target_id	ens_gene
Sb10g000101.1	Sb10g000101
Sb10g000200.1	Sb10g000200
Sb10g000210.1	Sb10g000210
Sb10g000220.1	Sb10g000220
Sb10g000230.1	Sb10g000230

t2g <- read.table("/home/kapeelc/sleuth_analysis/t2g.txt", header = TRUE, stringsAsFactors=FALSE)

Next, the “sleuth object” can be constructed. This object will store not only the information about the experiment, but also details of the model to be used for differential testing, and the results. It is prepared and used with four commands that (1) load the kallisto processed data into the object (2) estimate parameters for the sleuth response error measurement (full) model (3) estimate parameters for the sleuth reduced model, and (4) perform differential analysis (testing) using the likelihood ratio test The sleuth object must first be initialized with

so <- sleuth_prep(s2c, ~ condition, target_mapping = t2g, extra_bootstrap_summary = TRUE)

## Warning in check_num_cores(num_cores): It appears that you are running Sleuth from within Rstudio.
## Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
## If you wish to take advantage of multiple cores, please consider running sleuth from the command line.

## reading in kallisto results

## dropping unused factor levels

## ......

##

## normalizing est_counts

## 22062 targets passed the filter

## normalizing tpm

## merging in metadata

## aggregating by column: ens_gene

## 20635 genes passed the filter

## summarizing bootstraps

## ......

##

Then the full model is fit with

so <- sleuth_fit(so)

## fitting measurement error models

## shrinkage estimation

## computing variance of betas

What this has accomplished is to “smooth” the raw kallisto abundance estimates for each sample using a linear model with a parameter that represents the experimental condition (in this case DS vs. WW). To test for transcripts that are differential expressed between the conditions, sleuth performs a second fit to a “reduced” model that presumes abundances are equal in the two conditions. To identify differential expressed transcripts sleuth will then identify transcripts with a significantly better fit with the “full” model.

Sleuth likelihood ratio test(LRT) The “reduced” model is fit with

so <- sleuth_fit(so, ~1, 'reduced')

## fitting measurement error models

## shrinkage estimation

## computing variance of betas

and the test is performed with

so <- sleuth_lrt(so, 'reduced', 'full')

The models that have been fit can always be examined with the models() function.

models(so)

## [  full  ]
## formula:  ~condition 
## coefficients:
##  (Intercept)
##      conditionWW
## [  reduced  ]
## formula:  ~1 
## coefficients:
##  (Intercept)

Results using Sleuth LRT test

sleuth_table <- sleuth_results(so, 'reduced:full', 'lrt', show_all = FALSE)
sleuth_significant <- dplyr::filter(sleuth_table, qval <= 0.05)
head(sleuth_significant, 20)

##      target_id test_stat         pval        qval       rss  sigma_sq
## 1  Sb02g029940  29.58808 5.343242e-08 0.001101830 39.422813 7.8576050
## 2  Sb09g029170  27.15144 1.881247e-07 0.001939660 10.557922 2.1108217
## 3  Sb01g027570  24.01893 9.539320e-07 0.003278505  5.618592 1.1224756
## 4  Sb01g049480  24.34469 8.054829e-07 0.003278505 12.923175 2.5750263
## 5  Sb02g029500  24.50740 7.402510e-07 0.003278505 10.706748 2.1375197
## 6  Sb06g030910  24.73321 6.583969e-07 0.003278505 11.669192 2.3314440
## 7  Sb06g029520  22.88469 1.720163e-06 0.004433935 17.065528 3.3890648
## 8  Sb10g030520  22.92296 1.686259e-06 0.004433935  6.138199 1.2239239
## 9  Sb03g032450  22.39862 2.215335e-06 0.005075825  7.215683 1.4382912
## 10 Sb01g017890  20.84643 4.976180e-06 0.005130690  8.474861 1.6939772
## 11 Sb02g000230  20.90889 4.816546e-06 0.005130690  7.698030 1.5370680
## 12 Sb02g043270  21.00442 4.582251e-06 0.005130690  4.388220 0.8739358
## 13 Sb03g030350  20.92360 4.779688e-06 0.005130690  8.481503 1.6917119
## 14 Sb04g009380  21.16492 4.214066e-06 0.005130690  5.816588 1.1621681
## 15 Sb04g022780  21.24101 4.050055e-06 0.005130690  7.173298 1.4294987
## 16 Sb04g023540  20.91532 4.800410e-06 0.005130690  3.504512 0.7001695
## 17 Sb05g023970  21.24897 4.033265e-06 0.005130690  4.871567 0.9706728
## 18 Sb06g031460  21.97864 2.757017e-06 0.005130690  7.486657 1.4968000
## 19 Sb06g032740  20.90139 4.835441e-06 0.005130690  5.646188 1.1290081
## 20 Sb09g022360  20.97467 4.653966e-06 0.005130690  4.714452 0.9409795
##        tech_var mean_obs   var_obs sigma_sq_pmax smooth_sigma_sq
## 1  0.0269575462 5.437268 7.8845625     7.8576050      0.10042321
## 2  0.0007626147 7.816658 2.1115843     2.1108217      0.09751482
## 3  0.0012428359 7.136449 1.1237184     1.1224756      0.08525765
## 4  0.0096086891 5.521946 2.5846350     2.5750263      0.09748964
## 5  0.0038299837 6.236925 2.1413497     2.1375197      0.08200889
## 6  0.0023944573 6.651215 2.3338385     2.3314440      0.08053236
## 7  0.0240408402 4.965810 3.4131056     3.3890648      0.12062126
## 8  0.0037157803 6.132523 1.2276397     1.2239239      0.08323863
## 9  0.0048453369 5.740736 1.4431366     1.4382912      0.09070462
## 10 0.0009949900 7.637174 1.6949722     1.6939772      0.09389996
## 11 0.0025378948 6.291518 1.5396059     1.5370680      0.08148414
## 12 0.0037082658 5.776490 0.8776440     0.8739358      0.08975347
## 13 0.0045887113 5.910535 1.6963006     1.6917119      0.08670741
## 14 0.0011494398 7.059961 1.1633176     1.1621681      0.08418323
## 15 0.0051609291 5.748936 1.4346596     1.4294987      0.09048182
## 16 0.0007329007 7.553536 0.7009024     0.7001695      0.09231031
## 17 0.0036406368 5.916088 0.9743134     0.9706728      0.08660073
## 18 0.0005313628 8.060806 1.4973313     1.4968000      0.10290909
## 19 0.0002294607 8.834977 1.1292376     1.1290081      0.12430690
## 20 0.0019108458 6.744899 0.9428903     0.9409795      0.08099911
##    final_sigma_sq degrees_free
## 1       7.8576050            1
## 2       2.1108217            1
## 3       1.1224756            1
## 4       2.5750263            1
## 5       2.1375197            1
## 6       2.3314440            1
## 7       3.3890648            1
## 8       1.2239239            1
## 9       1.4382912            1
## 10      1.6939772            1
## 11      1.5370680            1
## 12      0.8739358            1
## 13      1.6917119            1
## 14      1.1621681            1
## 15      1.4294987            1
## 16      0.7001695            1
## 17      0.9706728            1
## 18      1.4968000            1
## 19      1.1290081            1
## 20      0.9409795            1

Sleuth Wald test: This function computes the Wald test on one specific ‘beta’ coefficient on every transcript; this offers beta values which are biased folds changes values for pairwise comparisons

so <- sleuth_wt(so,'conditionWW','full')

Results using Sleuth Wald test

results_table <- sleuth_results(so, 'conditionWW', test_type = 'wt')
sleuth_significant <- dplyr::filter(results_table, qval <= 0.05)
head(sleuth_significant, 20)

##      target_id         pval         qval         b      se_b mean_obs
## 1  Sb02g029940 1.480105e-97 3.052125e-93  5.113227 0.2439366 5.437268
## 2  Sb06g030910 7.288914e-58 7.515235e-54 -2.767723 0.1726061 6.651215
## 3  Sb09g029170 1.659953e-54 1.140996e-50  2.646767 0.1702398 7.816658
## 4  Sb02g029500 6.363724e-48 3.280659e-44 -2.652629 0.1823849 6.236925
## 5  Sb01g049480 9.125599e-42 3.763579e-38  2.916635 0.2154148 5.521946
## 6  Sb06g029520 1.714308e-37 5.891791e-34 -3.340866 0.2610758 4.965810
## 7  Sb06g031460 6.880121e-37 2.026785e-33  2.206830 0.1739286 8.060806
## 8  Sb02g000230 5.338813e-31 1.125528e-27  2.232334 0.1928113 6.291518
## 9  Sb03g030350 4.500870e-31 1.125528e-27 -2.343266 0.2021373 5.910535
## 10 Sb04g009380 5.458164e-31 1.125528e-27  1.942344 0.1677918 7.059961
## 11 Sb01g017890 1.096846e-30 2.056188e-27  2.341897 0.2033618 7.637174
## 12 Sb01g027570 1.384164e-30 2.378571e-27 -1.930588 0.1679378 7.136449
## 13 Sb02g032815 3.040858e-29 4.823503e-26  1.907134 0.1698861 7.027473
## 14 Sb06g027520 3.836596e-28 5.651032e-25  2.868165 0.2607508 7.737337
## 15 Sb03g032450 7.029695e-28 9.663956e-25  2.177463 0.1989478 5.740736
## 16 Sb10g030520 1.173675e-27 1.512647e-24 -2.013254 0.1847302 6.132523
## 17 Sb04g022780 2.010690e-27 2.438967e-24  2.160773 0.1991633 5.748936
## 18 Sb05g001050 1.072051e-25 1.228153e-22  2.865335 0.2734204 5.363238
## 19 Sb06g032740 3.879851e-25 4.210863e-22  1.916693 0.1850582 8.834977
## 20 Sb09g022360 1.010259e-24 1.041627e-21 -1.752569 0.1707278 6.744899
##      var_obs     tech_var    sigma_sq smooth_sigma_sq final_sigma_sq
## 1  7.8845625 0.0269575462 0.024337209      0.06230006     0.06230006
## 2  2.3338385 0.0023944573 0.042294830      0.04226623     0.04229483
## 3  2.1115843 0.0007626147 0.011702703      0.04270977     0.04270977
## 4  2.1413497 0.0038299837 0.034191835      0.04606639     0.04606639
## 5  2.5846350 0.0096086891 0.031150422      0.05999660     0.05999660
## 6  3.4131056 0.0240408402 0.056821080      0.07820005     0.07820005
## 7  1.4973313 0.0005313628 0.044845374      0.04402345     0.04484537
## 8  1.5396059 0.0025378948 0.053226399      0.04539551     0.05322640
## 9  1.6963006 0.0045887113 0.056700543      0.05099744     0.05670054
## 10 1.1633176 0.0011494398 0.038234436      0.04108168     0.04108168
## 11 1.6949722 0.0009949900 0.061039015      0.04199324     0.06103902
## 12 1.1237184 0.0012428359 0.005717086      0.04106183     0.04106183
## 13 1.1257817 0.0012789022 0.042013051      0.04110683     0.04201305
## 14 2.5494999 0.0010455669 0.100940886      0.04236779     0.10094089
## 15 1.4431366 0.0048453369 0.021070718      0.05452499     0.05452499
## 16 1.2276397 0.0037157803 0.010887106      0.04747208     0.04747208
## 17 1.4346596 0.0051609291 0.037311354      0.05433813     0.05433813
## 18 2.5527531 0.0122589251 0.099879168      0.06436212     0.09987917
## 19 1.1292376 0.0002294607 0.033675849      0.05114035     0.05114035
## 20 0.9428903 0.0019108458 0.024889917      0.04181114     0.04181114

The easiest way to view and interact with the results is to generate the sleuth live site that allows for exploratory data analysis

sleuth_live(so)

Using Sleuth R package for differential expression analysis in Atmosphere

Kapeel Chougule

7/13/2017

Introduction

Sleuth analysis steps