Cell and gene QC

Bear in mind that this dataset is a subset of a larger dataset from which outliers have already been excluded.

Violin plots

Histograms

Scatter plots

Sample S1826 has more Oligo cells with low counts than the other samples. Setting a minimum of 5000 UMI counts will remove them. For the OPCs we can set the minimum at 2500.

We can also set a maximum of 10% mitochondrial gene counts for both cell types.
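The thresholds above can be written as a per-cell filter. The following is a minimal sketch in Python with NumPy, not the code used for this analysis: the function name `qc_keep_mask`, its argument names, and the toy values are hypothetical, and treating the thresholds as inclusive (`>=` and `<=`) is an assumption.

```python
import numpy as np

def qc_keep_mask(total_umi, pct_mito, cell_type,
                 min_umi_by_type=None, max_pct_mito=10.0):
    """Return a boolean mask of the cells that pass the QC thresholds."""
    if min_umi_by_type is None:
        # Thresholds quoted in the text: 5000 UMIs for Oligo, 2500 for OPCs.
        min_umi_by_type = {"Oligo": 5000, "OPCs": 2500}
    total_umi = np.asarray(total_umi, dtype=float)
    pct_mito = np.asarray(pct_mito, dtype=float)
    # Look up the minimum UMI count that applies to each cell's type.
    min_umi = np.array([min_umi_by_type[t] for t in cell_type], dtype=float)
    # Assumption: both thresholds are inclusive.
    return (total_umi >= min_umi) & (pct_mito <= max_pct_mito)

# Toy values for illustration only, not taken from the dataset.
total_umi = [6000, 4000, 3000, 2000, 7000]
pct_mito = [2.0, 1.0, 5.0, 3.0, 12.0]
cell_type = ["Oligo", "Oligo", "OPCs", "OPCs", "Oligo"]

keep = qc_keep_mask(total_umi, pct_mito, cell_type)
print(keep.tolist())  # [True, False, True, False, False]
```

Using a different UMI minimum for each cell type avoids discarding OPCs just because they have fewer counts than the Oligo cells.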

Dimensional reduction

This is the dimensional reduction computed on the whole dataset. It is less accurate than the one we will compute later using only the oligos and OPCs.

Tables

##       OPCs Oligo
## S1823   94   250
## S1824   18   445
## S1825   30   366
## S1826   61  1318
## S1827   20   362
## S1828   58   341

##    OPCs Oligo
## WT  142  1061
## KO  139  2021

Subset to the best-quality cells and remove undetected genes

## [1] "before filtering"

## [1] 18827  3363

## [1] "after filtering"

## [1] 17234  3061
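The gene-level part of this filter can be sketched as follows, in Python with NumPy. The exact detection criterion is not stated in this report, so the sketch assumes that a gene counts as detected when it has a non-zero count in at least `min_cells` of the retained cells. Both `drop_undetected_genes` and `min_cells` are hypothetical names, and the matrix is laid out as genes by cells, matching the dimensions printed above.

```python
import numpy as np

def drop_undetected_genes(counts, min_cells=1):
    """Keep the genes (rows) with non-zero counts in at least `min_cells` cells."""
    counts = np.asarray(counts)
    # Number of cells (columns) in which each gene is observed.
    n_cells_detected = (counts > 0).sum(axis=1)
    keep_genes = n_cells_detected >= min_cells
    return counts[keep_genes, :], keep_genes

# Toy genes x cells matrix for illustration only.
counts = np.array([
    [0, 0, 0],   # never detected, will be dropped
    [1, 0, 2],
    [0, 0, 5],
])
filtered, keep_genes = drop_undetected_genes(counts)
print(filtered.shape)  # (2, 3)
```

Filtering genes after the cells have been subset matters, because a gene detected only in discarded cells becomes all-zero in the retained data.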

result:

Feature selection and dimensional reduction

Quantify per-gene variation

We quantify per-gene variation by computing the variance of the log-normalized expression values (referred to as “log-counts” for simplicity) for each gene across all cells in the population (A. T. L. Lun, McCarthy, and Marioni 2016). We use modelGeneVar(), which also corrects for the abundance of each gene.
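To make the quantity concrete, here is a minimal sketch in Python with NumPy of the per-gene mean and variance of the log-counts. It is an illustration only and does not reproduce modelGeneVar(): the abundance correction is left out, and the library-size normalization, the pseudocount of 1, and the function names `log_normalize` and `per_gene_mean_var` are assumptions made for the example.

```python
import numpy as np

def log_normalize(counts):
    """Library-size normalize a genes x cells matrix, then log2-transform."""
    counts = np.asarray(counts, dtype=float)
    lib_size = counts.sum(axis=0)
    # Size factors scaled to average 1 across cells (assumed normalization).
    size_factors = lib_size / lib_size.mean()
    return np.log2(counts / size_factors + 1.0)

def per_gene_mean_var(logcounts):
    """Mean and sample variance of the log-counts for each gene (row)."""
    return logcounts.mean(axis=1), logcounts.var(axis=1, ddof=1)

# Toy genes x cells matrix in which every cell has the same library size.
counts = np.array([
    [4, 4, 4, 4],   # constant gene: zero variance
    [0, 8, 0, 8],   # variable gene
    [8, 0, 8, 0],   # variable gene, mirror image of the previous one
])
gene_mean, gene_var = per_gene_mean_var(log_normalize(counts))
print(gene_var)
```

The raw variance depends strongly on how abundant a gene is, which is why a correction for abundance is applied before ranking genes.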

Select the HVGs

The next step is to select the subset of HVGs to use in downstream analyses. The simplest HVG selection strategy is to take the top X genes with the largest values for the relevant variance metric. Here I select the top 15% of genes.

This leaves us with 1255 highly variable genes.
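The "top fraction" strategy can be sketched as below, in Python with NumPy. This is an illustration under assumptions: `top_fraction_genes` is a hypothetical helper, the metric values are toy numbers, and which genes enter the ranking (all of them, or only those that pass a prior filter on the metric) is not stated here and is left to the caller.

```python
import numpy as np

def top_fraction_genes(metric, gene_names, prop=0.15):
    """Return the names of the top `prop` fraction of genes ranked by `metric`."""
    metric = np.asarray(metric, dtype=float)
    # Number of genes to keep, at least one.
    n_top = max(1, int(round(prop * metric.size)))
    # Sort in decreasing order of the variance metric.
    order = np.argsort(-metric, kind="stable")
    return [gene_names[i] for i in order[:n_top]]

# Toy example: 20 genes whose metric increases with the gene index.
gene_names = [f"g{i}" for i in range(20)]
metric = np.arange(20, dtype=float)
hvgs = top_fraction_genes(metric, gene_names, prop=0.15)
print(hvgs)  # ['g19', 'g18', 'g17']
```

A proportion, rather than a fixed number of genes, lets the size of the HVG set scale with the number of genes being ranked.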

Run PCA and choose PCs

Here we recompute the dimensional reduction to better fit our subsetted oligo data. This discards the batch-corrected reduced dimensions computed earlier; the batch correction will be recomputed.
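As a rough illustration of what recomputing the PCA on the subset involves, the sketch below runs a PCA through a singular value decomposition in Python with NumPy. It is not the function used in this analysis: `pca_scores`, `n_components`, and the random toy matrix are hypothetical, and the input is assumed to be the log-counts restricted to the selected HVGs, laid out as genes by cells.

```python
import numpy as np

def pca_scores(logcounts_hvg, n_components=2):
    """PCA of a genes x cells matrix; returns cell scores and variance explained."""
    # Transpose to cells x genes and center each gene.
    x = np.asarray(logcounts_hvg, dtype=float).T
    x = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(x, full_matrices=False)
    # Coordinates of each cell on the leading principal components.
    scores = u[:, :n_components] * s[:n_components]
    # Fraction of the total variance captured by each component.
    var_explained = (s ** 2) / np.sum(s ** 2)
    return scores, var_explained[:n_components]

# Toy data for illustration only: 50 genes x 30 cells of random values.
rng = np.random.default_rng(0)
toy_logcounts = rng.normal(size=(50, 30))
scores, var_explained = pca_scores(toy_logcounts, n_components=2)
print(scores.shape)  # (30, 2)
```

Because the components are fitted only to the cells passed in, a PCA on the oligo and OPC subset captures variation within those populations rather than differences from the other cell types in the full dataset.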

subset oligos

Nadine Bestard

12/09/2021

Set-up

Import