Normalisation by deconvolution

To correct for systematic differences in sequencing coverage between libraries, we will normalise the dataset. This involves dividing all counts for each cell by a cell-specific scaling factor, often called a “size factor” (Anders and Huber 2010). The assumption here is that any cell-specific bias (e.g., in capture or amplification efficiency) affects all genes equally via scaling of the expected mean count for that cell. The size factor for each cell represents an estimate of the relative bias in that cell, so dividing its counts by its size factor should remove that bias.
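As a minimal illustration of the scaling idea, the base-R sketch below uses the simplest choice of size factor: each cell's library size divided by the mean library size, which is what `scuttle::librarySizeFactors()` computes. The toy count matrix is hypothetical. Its three cells differ only in sequencing depth, so after dividing each column by its size factor they become identical.

```r
# Hypothetical toy counts: 3 genes x 3 cells that differ only in depth
counts <- matrix(c(10, 20, 30,    # cell1
                   20, 40, 60,    # cell2 (twice as deep as cell1)
                    5, 10, 15),   # cell3 (half as deep as cell1)
                 nrow = 3,
                 dimnames = list(paste0("gene", 1:3), paste0("cell", 1:3)))

# Library-size factors, centred so that their mean is 1
lib_size <- colSums(counts)
size_factors <- lib_size / mean(lib_size)

# Divide every count in a column by that cell's size factor
normalised <- sweep(counts, 2, size_factors, "/")
round(normalised, 2)  # every column becomes 11.67, 23.33, 35.00
```

Library-size factors implicitly assume that most genes are not differentially expressed between cells. The next step relaxes that assumption.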

Specifically, we will use the deconvolution method available in the scran package, which takes composition biases between cells into account (Lun et al. 2016).
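The deconvolution idea works in three steps. First, pool cells together. Second, estimate a size factor for each pool against an average reference profile. Third, solve a linear system for the cell-level factors, because each pool's factor should roughly equal the sum of the factors of its cells. The base-R sketch below is heavily simplified and is not scran's implementation:

- Cells are ordered by library size and wrapped into a ring (scran's own ring arrangement differs).
- Pools are sliding windows of sizes `seq(21, 101, 5)`.
- Genes with a mean below 1 are dropped, which is a hypothetical threshold.
- There is no pre-clustering step.

The simulated data (`true_sf`, `gene_mu`) are hypothetical and contain no composition bias, so the example only checks that known factors are recovered. In practice scran exposes the method as `computeSumFactors()`, often preceded by `quickCluster()`.

```r
# Hypothetical simulation: counts driven by gene means and cell-specific biases
set.seed(42)
n_genes <- 500
n_cells <- 200
true_sf <- 2^rnorm(n_cells, sd = 0.5)               # hypothetical true cell biases
gene_mu <- rgamma(n_genes, shape = 2, rate = 0.5)   # hypothetical gene means
counts  <- matrix(rpois(n_genes * n_cells, outer(gene_mu, true_sf)),
                  nrow = n_genes)

# Simplified pooling-and-deconvolution sketch (not scran's code)
deconvolution_sf <- function(counts, sizes = seq(21, 101, 5), min_mean = 1) {
  n <- ncol(counts)
  ref <- rowMeans(counts)                  # average "pseudo-cell" used as reference
  keep <- ref >= min_mean                  # hypothetical abundance filter
  counts <- counts[keep, , drop = FALSE]
  ref <- ref[keep]
  ring <- order(colSums(counts))           # neighbours have similar library sizes

  design <- list()
  pool_sf <- numeric(0)
  for (w in sizes) {
    for (start in seq_len(n)) {
      # w consecutive cells on the ring, wrapping around at the end
      members <- ring[((start - 1 + seq_len(w) - 1) %% n) + 1]
      pooled <- rowSums(counts[, members, drop = FALSE])
      # Pool size factor: median ratio of pooled counts to the reference
      pool_sf <- c(pool_sf, median(pooled / ref))
      row <- numeric(n)
      row[members] <- 1                    # pool factor = sum of member factors
      design[[length(design) + 1]] <- row
    }
  }
  A <- do.call(rbind, design)

  # Deconvolve: least-squares solution of A %*% sf = pool_sf
  sf <- qr.solve(A, pool_sf)
  sf / mean(sf)                            # centre the factors at a mean of 1
}

est <- deconvolution_sf(counts)
cor(est, true_sf)  # close to 1 in this composition-bias-free simulation
```

Pooling sums many low counts, so the median ratio is computed on well-populated values rather than on the mostly-zero counts of single cells. That is the main motivation for the method.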

On top of normalisation, the data are also log-transformed. The log-transformation is useful because differences in log-values represent log-fold changes in expression. In other words, which is more interesting: a gene that is expressed at an average count of 50 in cell type A and 10 in cell type B, or a gene that is expressed at an average count of 1100 in A and 1000 in B? Log-transformation focuses on the former by promoting contributions from genes with strong relative differences.
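The two genes from the example above can be compared directly. The sketch uses `log2(x + 1)`, the same form (log base 2, pseudo-count 1) that scuttle's `logNormCounts()` applies by default to `count / size factor`. The size factors are left out here because the text gives only average counts.

```r
# Average counts of the two genes described in the text
gene1 <- c(A = 50,   B = 10)
gene2 <- c(A = 1100, B = 1000)

# Differences on the raw count scale favour the highly expressed gene
raw_diff <- c(gene1 = unname(gene1["A"] - gene1["B"]),
              gene2 = unname(gene2["A"] - gene2["B"]))      # 40 vs 100

# Differences after log2(x + 1) approximate log2-fold changes
log_diff <- c(gene1 = unname(log2(gene1["A"] + 1) - log2(gene1["B"] + 1)),
              gene2 = unname(log2(gene2["A"] + 1) - log2(gene2["B"] + 1)))
round(log_diff, 2)                                          # gene1 2.21, gene2 0.14
```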

Assess the impact of confounding factors

Variance Explained plots

Variable-level metrics are computed by the getVarianceExplained() function, both before and after normalisation. For each gene, it calculates the percentage of the variance in expression that is explained by each variable in the colData of the SingleCellExperiment object. We can use this to determine which experimental factors contribute most to the variance in expression. This is useful for diagnosing batch effects or for quickly verifying that a treatment has an effect.

The x axis shows the percentage of variance explained by each factor, and the y axis shows the density of these values (R-squared, as a percentage) across all genes.

The “total” label is the total number of molecules (UMIs) per cell, which correlates with the number of detected genes (“detected”).
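The base-R sketch below shows what a per-gene "percentage of variance explained" looks like. It fits a linear model of each gene's expression on one covariate and reports 100 times the R-squared, in the spirit of scater's `getVarianceExplained()` but not its implementation. The batch labels and the two genes are hypothetical. A constant gene has no variance to explain, so it returns `NA`.

```r
# Per-gene percentage of variance explained by one covariate (base-R sketch)
percent_var_explained <- function(logexpr, covariate) {
  apply(logexpr, 1, function(y) {
    if (var(y) == 0) return(NA_real_)  # constant genes have no variance to explain
    100 * summary(lm(y ~ covariate))$r.squared
  })
}

# Hypothetical data: 100 cells in two batches, one gene shifted in batch b2
set.seed(1)
batch <- factor(rep(c("b1", "b2"), each = 50))
null_gene  <- rnorm(100)
batch_gene <- rnorm(100) + ifelse(batch == "b2", 3, 0)
logexpr <- rbind(null_gene, batch_gene)

pve <- percent_var_explained(logexpr, batch)
round(pve, 1)  # batch_gene is high, null_gene is near zero
```

Computing one such value per gene and per variable, and then plotting their densities, gives the kind of variance-explained plot described above.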

Before normalisation

Before normalisation, most of the variance is expected to be explained by sequencing depth, i.e. the total number of UMIs and the number of detected genes.

## Warning: Removed 342 rows containing non-finite values (stat_density).

After normalisation

We can see that factors such as the number of detected genes or the number of counts now explain less of the variance.
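This drop can be reproduced on simulated data. The sketch simulates counts in which the only cell-to-cell differences are hypothetical size factors (`true_sf`) plus Poisson noise. It then compares each gene's R-squared against library size for log counts with and without library-size normalisation. This is a base-R illustration under those assumptions, not the getVarianceExplained() output for this dataset.

```r
# Hypothetical simulation: counts driven only by gene means and cell biases
set.seed(2)
n_genes <- 300
n_cells <- 200
true_sf <- 2^rnorm(n_cells, sd = 0.5)
gene_mu <- rgamma(n_genes, shape = 4, rate = 0.4)
counts  <- matrix(rpois(n_genes * n_cells, outer(gene_mu, true_sf)), nrow = n_genes)

lib_size <- colSums(counts)
sf <- lib_size / mean(lib_size)                    # library-size factors

log_before <- log2(counts + 1)                     # log counts, no normalisation
log_after  <- log2(sweep(counts, 2, sf, "/") + 1)  # log-normalised counts

# With a single numeric covariate, the linear-model R-squared equals cor^2
r2_vs_lib <- function(logexpr) apply(logexpr, 1, function(y) cor(y, lib_size)^2)

med_before <- median(r2_vs_lib(log_before), na.rm = TRUE)
med_after  <- median(r2_vs_lib(log_after),  na.rm = TRUE)
c(before = med_before, after = med_after)
```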

## Warning in self$trans$transform(x): NaNs produced

## Warning: Transformation introduced infinite values in continuous x-axis

## Warning: Removed 344 rows containing non-finite values (stat_density).

Dimensional reduction

Another way to assess the variance is with a PCA plot. Here again, we can see that sequencing depth (“sum”) explains most of the variance before normalisation. More accurate dimensional reductions will be computed in the next step, using only the most variable genes to reduce noise.
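The same effect shows up in PCA. The base-R sketch runs `prcomp()` on all genes of a hypothetical simulation, with no selection of variable genes. It reports how much variance the first principal component carries and how strongly PC1 tracks log library size, both before and after library-size normalisation. The simulation settings are assumptions chosen for illustration.

```r
# Hypothetical simulation: counts driven only by gene means and cell biases
set.seed(3)
n_genes <- 300
n_cells <- 200
true_sf <- 2^rnorm(n_cells, sd = 0.5)
gene_mu <- rgamma(n_genes, shape = 4, rate = 0.4)
counts  <- matrix(rpois(n_genes * n_cells, outer(gene_mu, true_sf)), nrow = n_genes)
lib_size <- colSums(counts)
sf <- lib_size / mean(lib_size)

# Share of variance on PC1 and how strongly PC1 tracks sequencing depth
pc1_summary <- function(logexpr) {
  pca <- prcomp(t(logexpr))                   # rows = cells, columns = genes
  c(frac_var    = pca$sdev[1]^2 / sum(pca$sdev^2),
    abs_cor_lib = abs(cor(pca$x[, 1], log(lib_size))))
}

before <- pc1_summary(log2(counts + 1))
after  <- pc1_summary(log2(sweep(counts, 2, sf, "/") + 1))
round(rbind(before, after), 3)
```

Before normalisation, PC1 is essentially a sequencing-depth axis. After normalisation, the leading component carries only a small share of the variance in this simulation.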

Other types of dimensional reduction are the non-linear UMAP and t-SNE methods.

Session Info


## R version 4.1.1 (2021-08-10)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 10 x64 (build 19041)
## 
## Matrix products: default
## 
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252 
## [2] LC_CTYPE=English_United Kingdom.1252   
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C                           
## [5] LC_TIME=English_United Kingdom.1252    
## 
## attached base packages:
## [1] parallel  stats4    stats     graphics  grDevices utils     datasets 
## [8] methods   base     
## 
## other attached packages:
##  [1] Matrix_1.3-4                scran_1.20.1               
##  [3] scater_1.20.1               ggplot2_3.3.5              
##  [5] scuttle_1.2.1               SingleCellExperiment_1.14.1
##  [7] SummarizedExperiment_1.22.0 Biobase_2.52.0             
##  [9] GenomicRanges_1.44.0        GenomeInfoDb_1.28.1        
## [11] IRanges_2.26.0              S4Vectors_0.30.0           
## [13] BiocGenerics_0.38.0         MatrixGenerics_1.4.2       
## [15] matrixStats_0.60.0          here_1.0.1                 
## 
## loaded via a namespace (and not attached):
##  [1] bitops_1.0-7              RcppAnnoy_0.0.19         
##  [3] rprojroot_2.0.2           tools_4.1.1              
##  [5] bslib_0.2.5.1             utf8_1.2.2               
##  [7] R6_2.5.0                  irlba_2.3.3              
##  [9] vipor_0.4.5               uwot_0.1.10              
## [11] DBI_1.1.1                 colorspace_2.0-2         
## [13] withr_2.4.2               tidyselect_1.1.1         
## [15] gridExtra_2.3             compiler_4.1.1           
## [17] BiocNeighbors_1.10.0      DelayedArray_0.18.0      
## [19] labeling_0.4.2            sass_0.4.0               
## [21] scales_1.1.1              stringr_1.4.0            
## [23] digest_0.6.27             rmarkdown_2.10           
## [25] XVector_0.32.0            pkgconfig_2.0.3          
## [27] htmltools_0.5.1.1         sparseMatrixStats_1.4.2  
## [29] highr_0.9                 limma_3.48.2             
## [31] rlang_0.4.11              DelayedMatrixStats_1.14.2
## [33] jquerylib_0.1.4           generics_0.1.0           
## [35] farver_2.1.0              jsonlite_1.7.2           
## [37] BiocParallel_1.26.1       dplyr_1.0.7              
## [39] RCurl_1.98-1.3            magrittr_2.0.1           
## [41] BiocSingular_1.8.1        GenomeInfoDbData_1.2.6   
## [43] Rcpp_1.0.7                ggbeeswarm_0.6.0         
## [45] munsell_0.5.0             fansi_0.5.0              
## [47] viridis_0.6.1             lifecycle_1.0.0          
## [49] stringi_1.7.3             yaml_2.2.1               
## [51] edgeR_3.34.0              zlibbioc_1.38.0          
## [53] Rtsne_0.15                grid_4.1.1               
## [55] dqrng_0.3.0               crayon_1.4.1             
## [57] lattice_0.20-44           cowplot_1.1.1            
## [59] beachmat_2.8.0            locfit_1.5-9.4           
## [61] metapod_1.0.0             knitr_1.33               
## [63] pillar_1.6.2              igraph_1.2.6             
## [65] codetools_0.2-18          ScaledMatrix_1.0.0       
## [67] glue_1.4.2                evaluate_0.14            
## [69] vctrs_0.3.8               gtable_0.3.0             
## [71] purrr_0.3.4               assertthat_0.2.1         
## [73] xfun_0.25                 rsvd_1.0.5               
## [75] RSpectra_0.16-0           viridisLite_0.4.0        
## [77] tibble_3.1.3              beeswarm_0.4.0           
## [79] cluster_2.1.2             bluster_1.2.1            
## [81] statmod_1.4.36            ellipsis_0.3.2

Normalisation

NadineBestard

08/03/2021

Set-up