sleuth is a tool for the analysis and comparison of multiple related RNA-Seq experiments. Key features include:
To use sleuth, RNA-Seq data must first be quantified with kallisto, which is a program for very fast RNA-Seq quantification based on pseudo-alignment. An important feature of kallisto is that it outputs bootstraps along with the estimates of transcript abundances. These can serve as proxies for technical replicates, allowing for an ascertainment of the variability in estimates due to the random processes underlying RNA-Seq as well as the statistical procedure of read assignment. kallisto can quantify 30 million human reads in less than 3 minutes on a Mac desktop computer using only the read sequences and a transcriptome index that itself takes less than 10 minutes to build. sleuth has been designed to work seamlessly and efficiently with kallisto, and therefore RNA-Seq analysis with kallisto and sleuth is tractable on a laptop computer in a matter of minutes. More details about kallisto and sleuth are provided the papers describing the methods:
Nicolas L Bray, Harold Pimentel, Páll Melsted and Lior Pachter, Near-optimal probabilistic RNA-seq quantification, Nature Biotechnology 34, 525–527 (2016), doi:10.1038/nbt.3519
Harold Pimentel, Nicolas L Bray, Suzette Puente, Páll Melsted and Lior Pachter, Differential analysis of RNA-seq incorporating quantification uncertainty, in press.
sleuth has been designed to facilitate the exploration of RNA-Seq data by utilizing the Shiny web application framework by RStudio. The worked example below illustrates how to load data into sleuth and how to open Shiny plots for exploratory data analysis. The code underlying all plots is available via the Shiny interface so that analyses can be fully “open source”.
Follow the steps; copy paste the command on the R studio console and hit enter to run the code
base_dir <- "/home/kapeelc/sleuth_analysis"
options(mc.cores = 4L)
source("http://bioconductor.org/biocLite.R")
biocLite("rhdf5")
install.packages("devtools", repos = "http://cran.us.r-project.org")
library("httr")
set_config(config(ssl_verifypeer = 0L))
devtools::install_github("pachterlab/sleuth")
library("sleuth")
base_dir <- "/home/kapeelc/sleuth_analysis"
sample_id <- dir(file.path(base_dir,"kallisto_qaunt_output"))
sample_id
## [1] "IS20351_DS_1_1" "IS20351_DS_2_1" "IS20351_DS_3_1" "IS20351_WW_1_1"
## [5] "IS20351_WW_2_1" "IS20351_WW_3_1"
In the box above, lines beginning with ## show the output of the command (in what follows we include the output that should appear with each command).
kal_dirs <- sapply(sample_id, function(id) file.path(base_dir, "kallisto_qaunt_output",id))
kal_dirs
## IS20351_DS_1_1
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_1_1"
## IS20351_DS_2_1
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_2_1"
## IS20351_DS_3_1
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_3_1"
## IS20351_WW_1_1
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_1_1"
## IS20351_WW_2_1
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_2_1"
## IS20351_WW_3_1
## "/home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_3_1"
s2c <- read.table(file.path(base_dir, "design_matrix"), header = TRUE, stringsAsFactors=FALSE)
s2c <- dplyr::select(s2c, sample = ID, condition, reps)
s2c
## sample condition reps
## 1 IS20351_DS_1_1 DS 1
## 2 IS20351_DS_2_1 DS 2
## 3 IS20351_DS_3_1 DS 3
## 4 IS20351_WW_1_1 WW 1
## 5 IS20351_WW_2_1 WW 2
## 6 IS20351_WW_3_1 WW 3
s2c <- dplyr::mutate(s2c, path = kal_dirs)
print(s2c)
## sample condition reps
## 1 IS20351_DS_1_1 DS 1
## 2 IS20351_DS_2_1 DS 2
## 3 IS20351_DS_3_1 DS 3
## 4 IS20351_WW_1_1 WW 1
## 5 IS20351_WW_2_1 WW 2
## 6 IS20351_WW_3_1 WW 3
## path
## 1 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_1_1
## 2 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_2_1
## 3 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_DS_3_1
## 4 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_1_1
## 5 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_2_1
## 6 /home/kapeelc/sleuth_analysis/kallisto_qaunt_output/IS20351_WW_3_1
target_id | ens_gene |
---|---|
Sb10g000101.1 | Sb10g000101 |
Sb10g000200.1 | Sb10g000200 |
Sb10g000210.1 | Sb10g000210 |
Sb10g000220.1 | Sb10g000220 |
Sb10g000230.1 | Sb10g000230 |
t2g <- read.table("/home/kapeelc/sleuth_analysis/t2g.txt", header = TRUE, stringsAsFactors=FALSE)
so <- sleuth_prep(s2c, ~ condition, target_mapping = t2g, extra_bootstrap_summary = TRUE)
## Warning in check_num_cores(num_cores): It appears that you are running Sleuth from within Rstudio.
## Because of concerns with forking processes from a GUI, 'num_cores' is being set to 1.
## If you wish to take advantage of multiple cores, please consider running sleuth from the command line.
## reading in kallisto results
## dropping unused factor levels
## ......
##
## normalizing est_counts
## 22062 targets passed the filter
## normalizing tpm
## merging in metadata
## aggregating by column: ens_gene
## 20635 genes passed the filter
## summarizing bootstraps
## ......
##
so <- sleuth_fit(so)
## fitting measurement error models
## shrinkage estimation
## computing variance of betas
Sleuth likelihood ratio test(LRT) The “reduced” model is fit with
so <- sleuth_fit(so, ~1, 'reduced')
## fitting measurement error models
## shrinkage estimation
## computing variance of betas
so <- sleuth_lrt(so, 'reduced', 'full')
models(so)
## [ full ]
## formula: ~condition
## coefficients:
## (Intercept)
## conditionWW
## [ reduced ]
## formula: ~1
## coefficients:
## (Intercept)
sleuth_table <- sleuth_results(so, 'reduced:full', 'lrt', show_all = FALSE)
sleuth_significant <- dplyr::filter(sleuth_table, qval <= 0.05)
head(sleuth_significant, 20)
## target_id test_stat pval qval rss sigma_sq
## 1 Sb02g029940 29.58808 5.343242e-08 0.001101830 39.422813 7.8576050
## 2 Sb09g029170 27.15144 1.881247e-07 0.001939660 10.557922 2.1108217
## 3 Sb01g027570 24.01893 9.539320e-07 0.003278505 5.618592 1.1224756
## 4 Sb01g049480 24.34469 8.054829e-07 0.003278505 12.923175 2.5750263
## 5 Sb02g029500 24.50740 7.402510e-07 0.003278505 10.706748 2.1375197
## 6 Sb06g030910 24.73321 6.583969e-07 0.003278505 11.669192 2.3314440
## 7 Sb06g029520 22.88469 1.720163e-06 0.004433935 17.065528 3.3890648
## 8 Sb10g030520 22.92296 1.686259e-06 0.004433935 6.138199 1.2239239
## 9 Sb03g032450 22.39862 2.215335e-06 0.005075825 7.215683 1.4382912
## 10 Sb01g017890 20.84643 4.976180e-06 0.005130690 8.474861 1.6939772
## 11 Sb02g000230 20.90889 4.816546e-06 0.005130690 7.698030 1.5370680
## 12 Sb02g043270 21.00442 4.582251e-06 0.005130690 4.388220 0.8739358
## 13 Sb03g030350 20.92360 4.779688e-06 0.005130690 8.481503 1.6917119
## 14 Sb04g009380 21.16492 4.214066e-06 0.005130690 5.816588 1.1621681
## 15 Sb04g022780 21.24101 4.050055e-06 0.005130690 7.173298 1.4294987
## 16 Sb04g023540 20.91532 4.800410e-06 0.005130690 3.504512 0.7001695
## 17 Sb05g023970 21.24897 4.033265e-06 0.005130690 4.871567 0.9706728
## 18 Sb06g031460 21.97864 2.757017e-06 0.005130690 7.486657 1.4968000
## 19 Sb06g032740 20.90139 4.835441e-06 0.005130690 5.646188 1.1290081
## 20 Sb09g022360 20.97467 4.653966e-06 0.005130690 4.714452 0.9409795
## tech_var mean_obs var_obs sigma_sq_pmax smooth_sigma_sq
## 1 0.0269575462 5.437268 7.8845625 7.8576050 0.10042321
## 2 0.0007626147 7.816658 2.1115843 2.1108217 0.09751482
## 3 0.0012428359 7.136449 1.1237184 1.1224756 0.08525765
## 4 0.0096086891 5.521946 2.5846350 2.5750263 0.09748964
## 5 0.0038299837 6.236925 2.1413497 2.1375197 0.08200889
## 6 0.0023944573 6.651215 2.3338385 2.3314440 0.08053236
## 7 0.0240408402 4.965810 3.4131056 3.3890648 0.12062126
## 8 0.0037157803 6.132523 1.2276397 1.2239239 0.08323863
## 9 0.0048453369 5.740736 1.4431366 1.4382912 0.09070462
## 10 0.0009949900 7.637174 1.6949722 1.6939772 0.09389996
## 11 0.0025378948 6.291518 1.5396059 1.5370680 0.08148414
## 12 0.0037082658 5.776490 0.8776440 0.8739358 0.08975347
## 13 0.0045887113 5.910535 1.6963006 1.6917119 0.08670741
## 14 0.0011494398 7.059961 1.1633176 1.1621681 0.08418323
## 15 0.0051609291 5.748936 1.4346596 1.4294987 0.09048182
## 16 0.0007329007 7.553536 0.7009024 0.7001695 0.09231031
## 17 0.0036406368 5.916088 0.9743134 0.9706728 0.08660073
## 18 0.0005313628 8.060806 1.4973313 1.4968000 0.10290909
## 19 0.0002294607 8.834977 1.1292376 1.1290081 0.12430690
## 20 0.0019108458 6.744899 0.9428903 0.9409795 0.08099911
## final_sigma_sq degrees_free
## 1 7.8576050 1
## 2 2.1108217 1
## 3 1.1224756 1
## 4 2.5750263 1
## 5 2.1375197 1
## 6 2.3314440 1
## 7 3.3890648 1
## 8 1.2239239 1
## 9 1.4382912 1
## 10 1.6939772 1
## 11 1.5370680 1
## 12 0.8739358 1
## 13 1.6917119 1
## 14 1.1621681 1
## 15 1.4294987 1
## 16 0.7001695 1
## 17 0.9706728 1
## 18 1.4968000 1
## 19 1.1290081 1
## 20 0.9409795 1
so <- sleuth_wt(so,'conditionWW','full')
results_table <- sleuth_results(so, 'conditionWW', test_type = 'wt')
sleuth_significant <- dplyr::filter(results_table, qval <= 0.05)
head(sleuth_significant, 20)
## target_id pval qval b se_b mean_obs
## 1 Sb02g029940 1.480105e-97 3.052125e-93 5.113227 0.2439366 5.437268
## 2 Sb06g030910 7.288914e-58 7.515235e-54 -2.767723 0.1726061 6.651215
## 3 Sb09g029170 1.659953e-54 1.140996e-50 2.646767 0.1702398 7.816658
## 4 Sb02g029500 6.363724e-48 3.280659e-44 -2.652629 0.1823849 6.236925
## 5 Sb01g049480 9.125599e-42 3.763579e-38 2.916635 0.2154148 5.521946
## 6 Sb06g029520 1.714308e-37 5.891791e-34 -3.340866 0.2610758 4.965810
## 7 Sb06g031460 6.880121e-37 2.026785e-33 2.206830 0.1739286 8.060806
## 8 Sb02g000230 5.338813e-31 1.125528e-27 2.232334 0.1928113 6.291518
## 9 Sb03g030350 4.500870e-31 1.125528e-27 -2.343266 0.2021373 5.910535
## 10 Sb04g009380 5.458164e-31 1.125528e-27 1.942344 0.1677918 7.059961
## 11 Sb01g017890 1.096846e-30 2.056188e-27 2.341897 0.2033618 7.637174
## 12 Sb01g027570 1.384164e-30 2.378571e-27 -1.930588 0.1679378 7.136449
## 13 Sb02g032815 3.040858e-29 4.823503e-26 1.907134 0.1698861 7.027473
## 14 Sb06g027520 3.836596e-28 5.651032e-25 2.868165 0.2607508 7.737337
## 15 Sb03g032450 7.029695e-28 9.663956e-25 2.177463 0.1989478 5.740736
## 16 Sb10g030520 1.173675e-27 1.512647e-24 -2.013254 0.1847302 6.132523
## 17 Sb04g022780 2.010690e-27 2.438967e-24 2.160773 0.1991633 5.748936
## 18 Sb05g001050 1.072051e-25 1.228153e-22 2.865335 0.2734204 5.363238
## 19 Sb06g032740 3.879851e-25 4.210863e-22 1.916693 0.1850582 8.834977
## 20 Sb09g022360 1.010259e-24 1.041627e-21 -1.752569 0.1707278 6.744899
## var_obs tech_var sigma_sq smooth_sigma_sq final_sigma_sq
## 1 7.8845625 0.0269575462 0.024337209 0.06230006 0.06230006
## 2 2.3338385 0.0023944573 0.042294830 0.04226623 0.04229483
## 3 2.1115843 0.0007626147 0.011702703 0.04270977 0.04270977
## 4 2.1413497 0.0038299837 0.034191835 0.04606639 0.04606639
## 5 2.5846350 0.0096086891 0.031150422 0.05999660 0.05999660
## 6 3.4131056 0.0240408402 0.056821080 0.07820005 0.07820005
## 7 1.4973313 0.0005313628 0.044845374 0.04402345 0.04484537
## 8 1.5396059 0.0025378948 0.053226399 0.04539551 0.05322640
## 9 1.6963006 0.0045887113 0.056700543 0.05099744 0.05670054
## 10 1.1633176 0.0011494398 0.038234436 0.04108168 0.04108168
## 11 1.6949722 0.0009949900 0.061039015 0.04199324 0.06103902
## 12 1.1237184 0.0012428359 0.005717086 0.04106183 0.04106183
## 13 1.1257817 0.0012789022 0.042013051 0.04110683 0.04201305
## 14 2.5494999 0.0010455669 0.100940886 0.04236779 0.10094089
## 15 1.4431366 0.0048453369 0.021070718 0.05452499 0.05452499
## 16 1.2276397 0.0037157803 0.010887106 0.04747208 0.04747208
## 17 1.4346596 0.0051609291 0.037311354 0.05433813 0.05433813
## 18 2.5527531 0.0122589251 0.099879168 0.06436212 0.09987917
## 19 1.1292376 0.0002294607 0.033675849 0.05114035 0.05114035
## 20 0.9428903 0.0019108458 0.024889917 0.04181114 0.04181114
The easiest way to view and interact with the results is to generate the sleuth live site that allows for exploratory data analysis
sleuth_live(so)