- Motivation
- Experimental design: batch effects and confounding
- Normalization: accounting for sample quality
- Clustering: resampling and sequential strategies
- Lineage reconstruction with single-cell RNA-seq
September 12, 2016
Wang et al. (2009) Nature Reviews Genetics 10, 57–63.
Owens (2012) Nature 491, 27–29.
www.fluidigm.com
The brain is made up of hundreds, if not thousands, of different cell types.
We need a rational way to identify and classify them.
Low sequencing depth: 192 cells per Illumina lane (average 1.2M reads per cell)
| | Olfactory | Cortical |
|---|---|---|
| Mice | 51 | 41 |
| C1 runs | 61 | 40 |
| Illumina lanes | 19 | 7 |
| Cells | 2,627 | 1,249 |
| Cells passing QC | 2,190 | 1,042 |
| Sequenced reads | 4,000 M | 1,500 M |
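As a quick sanity check on the table, the sketch below derives the QC pass rate and the average number of reads per captured cell. It assumes, for illustration only, that all sequenced reads are attributed to the captured cells; the dictionary keys are names chosen here, not part of any package.

```python
# Counts copied from the table above (reads in millions).
experiments = {
    "Olfactory": {"cells": 2627, "cells_pass_qc": 2190, "reads_millions": 4000},
    "Cortical": {"cells": 1249, "cells_pass_qc": 1042, "reads_millions": 1500},
}


def summarize(stats):
    """Return (QC pass rate, mean reads per captured cell in millions)."""
    pass_rate = stats["cells_pass_qc"] / stats["cells"]
    reads_per_cell = stats["reads_millions"] / stats["cells"]
    return pass_rate, reads_per_cell


summary = {name: summarize(stats) for name, stats in experiments.items()}
for name, (pass_rate, reads_per_cell) in summary.items():
    print(f"{name}: {pass_rate:.1%} pass QC, {reads_per_cell:.2f}M reads per cell")
```

Both experiments retain roughly 83% of cells after QC, and the per-cell depth (about 1.2M to 1.5M reads) is in line with the low-depth design stated above.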
Each level of the factor of interest, say layer of origin, is observed in each batch.
We can only isolate one cell type per animal / batch.
See also in bioRxiv:
Hicks et al. (2015) http://dx.doi.org/10.1101/025528
Need to have multiple batches per condition and account for batch effects in the design matrix (nested design).
Note that mouse and C1 run effects are still confounded!
See also in bioRxiv:
Tung et al. (2016) http://dx.doi.org/10.1101/062919
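The three situations discussed above (every condition in every batch, one batch per condition, several batches per condition) can be told apart from the sample annotation alone. The following is a minimal sketch; the function name, the labels it returns, and the toy layer/batch names are all hypothetical.

```python
from collections import defaultdict


def classify_design(samples):
    """Classify a layout given (condition, batch) pairs, one per sample.

    Returns 'crossed', 'nested', 'confounded', or 'mixed'.
    """
    batches_by_condition = defaultdict(set)
    conditions_by_batch = defaultdict(set)
    for condition, batch in samples:
        batches_by_condition[condition].add(batch)
        conditions_by_batch[batch].add(condition)

    all_conditions = set(batches_by_condition)
    # Crossed: every condition is observed in every batch.
    if all(conds == all_conditions for conds in conditions_by_batch.values()):
        return "crossed"
    # From here on we need each batch to hold exactly one condition.
    if all(len(conds) == 1 for conds in conditions_by_batch.values()):
        # Nested: several batches per condition, so batch effects are estimable.
        if all(len(b) >= 2 for b in batches_by_condition.values()):
            return "nested"
        # Confounded: some condition has a single batch.
        return "confounded"
    return "mixed"


crossed = [("L4", "b1"), ("L5", "b1"), ("L4", "b2"), ("L5", "b2")]
confounded = [("L4", "b1"), ("L4", "b1"), ("L5", "b2")]
nested = [("L4", "b1"), ("L4", "b2"), ("L5", "b3"), ("L5", "b4")]

for name, layout in [("crossed", crossed), ("confounded", confounded), ("nested", nested)]:
    print(name, "->", classify_design(layout))
```

A check like this only looks at one batch variable at a time; it cannot detect that two batch variables (such as mouse and C1 run) are confounded with each other.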
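To account for batch j(i) in condition i, we can model the log-expression of each sample k as

$$y_{ijk} = \mu + \alpha_i + \beta_{j(i)} + \varepsilon_{ijk},$$

subject to the n+1 constraints

$$\sum_{i=1}^{n} \alpha_i = 0; \qquad \sum_{j=1}^{n_i} \beta_{j(i)} = 0, \quad i = 1, \dots, n.$$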
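To make the nested model concrete, here is a small sketch that estimates μ, α_i and β_j(i) for a single gene from toy data. It relies on the fact that, under the unweighted sum-to-zero constraints, the least-squares fit reproduces each batch mean, so the estimates reduce to differences of means. The function name and the toy values are made up for illustration.

```python
from statistics import mean


def fit_nested(data):
    """Estimate (mu, alpha, beta) for one gene.

    data: {condition: {batch: [log-expression values]}}.
    Constraints: sum_i alpha_i = 0 and, within each condition i,
    sum_j beta_j(i) = 0 (unweighted sums).
    """
    batch_means = {
        cond: {batch: mean(values) for batch, values in batches.items()}
        for cond, batches in data.items()
    }
    # Condition mean = unweighted average of its batch means.
    condition_means = {
        cond: mean(list(bm.values())) for cond, bm in batch_means.items()
    }
    mu = mean(list(condition_means.values()))
    alpha = {cond: m - mu for cond, m in condition_means.items()}
    beta = {
        cond: {batch: m - condition_means[cond] for batch, m in bm.items()}
        for cond, bm in batch_means.items()
    }
    return mu, alpha, beta


# Toy data: two conditions, two batches each, two cells per batch.
toy = {
    "A": {"a1": [1.0, 3.0], "a2": [5.0, 7.0]},
    "B": {"b1": [9.0, 11.0], "b2": [13.0, 15.0]},
}
mu, alpha, beta = fit_nested(toy)
print(mu, alpha, beta)
```

Because β_j(i) is only defined relative to the other batches of the same condition, a condition with a single batch gets β = 0 and its batch effect is absorbed into α_i, which is exactly the confounded case.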
Absolute correlation between the principal components of log(TPM+1) and the QC scores.
PC1 of the QC score matrix stratified by batch.
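The kind of diagnostic described in these two captions can be sketched as follows: compute the first principal component of a cells-by-genes log-expression matrix and correlate its scores with a QC metric. This is a pure-Python illustration using power iteration; the toy matrix and the `mapped_fraction` metric are invented values, and a real analysis would use a proper PCA routine on many genes and several components.

```python
import math


def center_columns(matrix):
    """Subtract the column (gene) mean from every entry."""
    n = len(matrix)
    means = [sum(col) / n for col in zip(*matrix)]
    return [[x - m for x, m in zip(row, means)] for row in matrix]


def first_pc_scores(matrix, iterations=200):
    """PC1 scores (one per cell) via power iteration on X^T X."""
    x = center_columns(matrix)
    p = len(x[0])
    v = [1.0 / math.sqrt(p)] * p  # starting direction
    for _ in range(iterations):
        scores = [sum(a * b for a, b in zip(row, v)) for row in x]
        w = [sum(s * row[j] for s, row in zip(scores, x)) for j in range(p)]
        norm = math.sqrt(sum(c * c for c in w))
        v = [c / norm for c in w]
    return [sum(a * b for a, b in zip(row, v)) for row in x]


def abs_correlation(a, b):
    """Absolute Pearson correlation between two equal-length lists."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return abs(cov) / math.sqrt(va * vb)


# Toy log(TPM+1) matrix: 6 cells (rows) x 3 genes (columns).
log_expr = [
    [1.0, 2.0, 0.5],
    [1.5, 2.6, 0.4],
    [2.1, 3.1, 0.6],
    [2.4, 3.9, 0.5],
    [3.0, 4.4, 0.4],
    [3.6, 5.1, 0.6],
]
# Hypothetical QC metric for the same 6 cells.
mapped_fraction = [0.50, 0.58, 0.66, 0.71, 0.80, 0.88]

pc1 = first_pc_scores(log_expr)
pc1_qc_cor = abs_correlation(pc1, mapped_fraction)
print(f"|cor(PC1, mapped fraction)| = {pc1_qc_cor:.2f}")
```

A high absolute correlation, as in this constructed example, suggests that the leading axis of variation tracks sample quality rather than biology.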
RUV can be used to estimate the unwanted-variation term Wα using negative control genes.
Gagnon-Bartsch & Speed (2012) [http://dx.doi.org/10.1093/biostatistics/kxr034]
Risso et al. (2014) [http://dx.doi.org/10.1038/nbt.2931]
RUVSeq package [http://bioconductor.org/packages/RUVSeq/]
Apply a (combination of) normalization method(s).
Rank the normalizations using a set of performance scores.
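The two steps above can be sketched as a simple rank aggregation: each normalization is ranked on every performance metric, and the ranks are averaged. This is only an illustrative scheme under the assumption that higher scores are better for every metric; the normalization names, metric names, and numbers are placeholders, not results.

```python
def rank_normalizations(scores):
    """Rank normalizations by their mean rank across metrics.

    scores: {normalization: {metric: value}}, higher value = better.
    Returns (names sorted best first, {name: mean rank}).
    Ties within a metric are broken arbitrarily in this sketch.
    """
    names = list(scores)
    metrics = list(next(iter(scores.values())))
    mean_rank = {name: 0.0 for name in names}
    for metric in metrics:
        ordered = sorted(names, key=lambda n: scores[n][metric])  # worst first
        for rank, name in enumerate(ordered, start=1):
            mean_rank[name] += rank / len(metrics)
    ranking = sorted(names, key=lambda n: mean_rank[n], reverse=True)
    return ranking, mean_rank


# Placeholder scores for three hypothetical normalization schemes.
toy_scores = {
    "norm_A": {"batch_removal": 0.2, "qc_removal": 0.3, "biology_preserved": 0.9},
    "norm_B": {"batch_removal": 0.8, "qc_removal": 0.7, "biology_preserved": 0.6},
    "norm_C": {"batch_removal": 0.5, "qc_removal": 0.9, "biology_preserved": 0.7},
}
ranking, mean_rank = rank_normalizations(toy_scores)
print(ranking)
```

Averaging ranks rather than raw scores keeps metrics on different scales from dominating the comparison.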
Michael Cole, Nir Yosef, Sandrine Dudoit
Points color-coded by average score.
Heatmap of the 100 most variable genes
In the literature, most approaches can be summarized by three steps.
For each step there are many tuning parameters. E.g.,
Given a base cluster algorithm
Implemented in the R/Bioconductor package clusterExperiment: http://bioconductor.org/packages/clusterExperiment
Elizabeth Purdom
Given an underlying clustering strategy, e.g., k-means or PAM with a particular choice of k, we repeat the following:
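One common form of such a resampling loop, shown here as an assumption rather than as the exact procedure used in clusterExperiment, is: draw a random subsample of cells, cluster it with the base algorithm, and record how often each pair of cells lands in the same cluster. The sketch uses a tiny 1-D k-means as the base algorithm; all function names and the toy data are hypothetical.

```python
import random


def kmeans_1d(values, k=2, iterations=20):
    """Tiny 1-D k-means (k >= 2); centers start evenly spaced over the range."""
    lo, hi = min(values), max(values)
    centers = [lo + (hi - lo) * c / (k - 1) for c in range(k)]
    labels = [0] * len(values)
    for _ in range(iterations):
        labels = [min(range(k), key=lambda c: abs(v - centers[c])) for v in values]
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            if members:
                centers[c] = sum(members) / len(members)
    return labels


def co_clustering(values, n_resamples=50, fraction=0.8, k=2, seed=0):
    """Proportion of resamples in which each pair of cells co-clusters."""
    rng = random.Random(seed)
    n = len(values)
    together = [[0] * n for _ in range(n)]
    sampled = [[0] * n for _ in range(n)]
    size = max(k, int(round(fraction * n)))
    for _ in range(n_resamples):
        idx = rng.sample(range(n), size)
        labels = kmeans_1d([values[i] for i in idx], k)
        for a, i in enumerate(idx):
            for b, j in enumerate(idx):
                sampled[i][j] += 1
                together[i][j] += labels[a] == labels[b]
    # Pairs never sampled together get 0.0 in this sketch.
    return [
        [together[i][j] / sampled[i][j] if sampled[i][j] else 0.0 for j in range(n)]
        for i in range(n)
    ]


# Toy 1-D data with two well-separated groups of five cells each.
cells = [0.1, 0.2, 0.3, 0.2, 0.1, 5.0, 5.1, 5.2, 5.1, 5.0]
co = co_clustering(cells)
print(co[0][1], co[0][9])
```

The resulting co-clustering matrix can itself be clustered, so that only groups of cells that are consistently assigned together across resamples are reported.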
Our sequential clustering works as follows.
Inspired by the "tight clustering" algorithm
Tseng and Wong (2005) http://dx.doi.org/10.1111/j.0006-341X.2005.031032.x
Functions to
Available at http://bioconductor.org/packages/clusterExperiment
Liam Purvis, Elizabeth Purdom
GOAL: High-resolution view of transcriptional changes during differentiation and neurogenesis.
Questions:
Kelly Street
Russell Fletcher
Kelly Street
Flexible, supervised branching lineage reconstruction.
Input: scRNA-seq data after normalization, clustering and dimensionality reduction.
R package available at: https://github.com/kstreet13/slingshot
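To illustrate how clusters in a reduced-dimensional space can be turned into branching lineages, the sketch below connects cluster centers with a minimum spanning tree and reads off root-to-leaf paths from a user-chosen starting cluster (the supervised part). This is an assumption about the general approach and is not the slingshot API; the cluster names and coordinates are invented.

```python
import math


def mst_lineages(centers, start):
    """Lineages as root-to-leaf paths along a minimum spanning tree.

    centers: {cluster: coordinates in the reduced-dimensional space}.
    start: cluster chosen by the user as the root.
    """
    names = list(centers)
    in_tree = {start}
    children = {name: [] for name in names}
    # Prim's algorithm: repeatedly attach the closest cluster not yet in the tree.
    while len(in_tree) < len(names):
        parent, child = min(
            ((p, c) for p in in_tree for c in names if c not in in_tree),
            key=lambda edge: math.dist(centers[edge[0]], centers[edge[1]]),
        )
        children[parent].append(child)
        in_tree.add(child)
    # Depth-first walk from the root; each leaf ends one lineage.
    lineages, stack = [], [[start]]
    while stack:
        path = stack.pop()
        kids = children[path[-1]]
        if not kids:
            lineages.append(path)
        for kid in kids:
            stack.append(path + [kid])
    return sorted(lineages)


# Hypothetical cluster centers in a 2-D reduced space.
toy_centers = {
    "stem": (0.0, 0.0),
    "intermediate": (1.0, 0.0),
    "neuron": (2.0, 1.0),
    "support": (2.0, -1.5),
}
lineages = mst_lineages(toy_centers, start="stem")
for lineage in lineages:
    print(" -> ".join(lineage))
```

In this toy example the tree branches at the intermediate cluster, giving two lineages that share their first two clusters.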
Kelly Street, Elizabeth Purdom, Sandrine Dudoit
Kelly Street
Kelly Street, Russell Fletcher, Diya Das