- Role of read depth in experiment design
- "Do you have enough reads?": Introducing subSeq software
- "Do people usually have enough reads?": subSeq results across many RNA-Seq experiments
- "Exactly how many reads do we need?": Predicting the curve
October 10, 2014
\[ Y_{i,j} \sim \mbox{Binom}(X_{i,j}, p) \]
library(subSeq) data(hammer) library(Biobase) countmatrix <- exprs(hammer)[, 1:4] countmatrix <- countmatrix[rowSums(countmatrix) >= 5, ] head(countmatrix)
## SRX020102 SRX020103 SRX020104 SRX020105 ## ENSRNOG00000000001 2 4 18 24 ## ENSRNOG00000000007 4 1 3 1 ## ENSRNOG00000000008 0 1 4 2 ## ENSRNOG00000000010 19 10 19 13 ## ENSRNOG00000000012 7 5 1 0 ## ENSRNOG00000000017 4 1 12 18
We compare a control group with rats treated with L5 SNL.
treatment <- phenoData(hammer)$protocol[1:4] treatment
## [1] control control L5 SNL L5 SNL ## Levels: control L5 SNL
We recommend choosing proportions on a logarithmic scale.
proportions <- 10^seq(-2, 0, .25) proportions
## [1] 0.01000 0.01778 0.03162 0.05623 0.10000 0.17783 0.31623 0.56234 1.00000
Choose one or more RNA-Seq differential expression methods for each step.
subsamples <- subsample(countmatrix, proportions, method=c("edgeR", "DESeq2"),
treatment=treatment)
subsamples.summ <- summary(subsamples) head(subsamples.summ)
## depth proportion method replication significant pearson spearman ## 1: 198567 0.01000 edgeR 1 201 0.3296 0.3892 ## 2: 352812 0.01778 edgeR 1 425 0.3920 0.4603 ## 3: 627759 0.03162 edgeR 1 827 0.4606 0.5373 ## 4: 1117230 0.05623 edgeR 1 1534 0.5503 0.6222 ## 5: 1985611 0.10000 edgeR 1 2468 0.6417 0.7127 ## 6: 3530798 0.17783 edgeR 1 3502 0.7456 0.8001 ## MSE estFDP rFDP percent ## 1: 2.3728 0.001191 0.000000 0.02403 ## 2: 2.0724 0.004430 0.007059 0.05046 ## 3: 1.7405 0.007126 0.008464 0.09805 ## 4: 1.3607 0.005091 0.003911 0.18271 ## 5: 1.0577 0.009132 0.009319 0.29236 ## 6: 0.7355 0.013354 0.013992 0.41289
| depth | proportion | significant | pearson | spearman | MSE | estFDP | rFDP | percent |
|---|---|---|---|---|---|---|---|---|
| 198567 | 0.0100 | 201 | 0.3296 | 0.3892 | 2.3728 | 0.0012 | 0.0000 | 0.0240 |
| 352812 | 0.0178 | 425 | 0.3920 | 0.4603 | 2.0724 | 0.0044 | 0.0071 | 0.0505 |
| 627759 | 0.0316 | 827 | 0.4606 | 0.5373 | 1.7405 | 0.0071 | 0.0085 | 0.0981 |
| 1117230 | 0.0562 | 1534 | 0.5503 | 0.6222 | 1.3607 | 0.0051 | 0.0039 | 0.1827 |
| 1985611 | 0.1000 | 2468 | 0.6417 | 0.7127 | 1.0577 | 0.0091 | 0.0093 | 0.2924 |
| 3530798 | 0.1778 | 3502 | 0.7456 | 0.8001 | 0.7355 | 0.0134 | 0.0140 | 0.4129 |
| 6280315 | 0.3162 | 5152 | 0.8323 | 0.8728 | 0.4479 | 0.0243 | 0.0219 | 0.6025 |
| 11172016 | 0.5623 | 6763 | 0.9194 | 0.9419 | 0.1967 | 0.0340 | 0.0271 | 0.7868 |
| 19862441 | 1.0000 | 8363 | 1.0000 | 1.0000 | 0.0000 | 0.0510 | 0.0000 | 1.0000 |
plot(subsamples.summ)
\[ G(p) = k\times \Phi(log(p - b), \mu, \sigma)\]
where