Applications of data science to flow cytometry

Anjan Kr. Dasgupta

Dec 26 2020

A bit of history

Same principle, different outcomes

| Fluorescence spectrometer | Flow cytometer |
|---|---|
| Molecules, nanoparticles (\(\le 500\) nm) | Cells (\(\mu\)m) |
| No statistical information | Population behavior |
| Single-molecule data unavailable | Single-cell data |

Block Diagram: Fluorescence

Block Diagram: Scattering

Block Diagram

The cytometric platform (cyto\(\equiv\) cell) is a combination of

Flow cytometry basic algorithmic issues

Flow Cytometry - Commercial software and Data

Flow cytometers can now measure dozens of parameters. A typical flow cytometry experiment generates about 10,000 rows (events) with 8 to 24 columns (channels) of data.

Software packages

Data repository

- <https://flowrepository.org/public_experiment_representations?top=10>
- <http://docs.cytobank.org/wagn/Troubleshooting> 

Target Applications

FlowSOM

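# Read one exported FCS file from FlowRepository experiment FR-FCM-Z2KP
fname='./FlowRepository_FR-FCM-Z2KP_files/export_COVID19 samples 21_04_20_ST3_COVID19_ICU_005_A ST3 210420_080_Live_cells.fcs'
ff <- flowCore::read.FCS(fname)
colnames(ff)[1:5]   # first five channel names
## NULL
# Read the unstained compensation control of patient 17
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff1 <- flowCore::read.FCS(fname)
typeof(ff1)         # read.FCS returns a flowFrame, which is an S4 object
## [1] "S4"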

Understanding S4 Classes

An S4 class definition is quite different from a class definition in other programming languages such as C# and Python.

An S4 class declares its data fields (slots) in a call to setClass, but does not encapsulate its methods.

Instead, methods are defined separately by a pair of special R functions: setGeneric declares a generic function, and setMethod registers an implementation of that generic for a particular class. Ref: https://advanced-r-solutions.rbind.io/s4.html
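A minimal sketch of setClass, setGeneric and setMethod working together. The class ScatterEvents, its slots fsc and ssc, and the generic nEvents are hypothetical illustrations, not part of flowCore. flowCore's flowFrame is also an S4 class, which is why the code above reaches the expression matrix through the slot ff1@exprs.

```r
library(methods)

# Hypothetical class (not from flowCore): data fields are declared as slots
setClass("ScatterEvents", slots = c(fsc = "numeric", ssc = "numeric"))

# Declare a generic function; standardGeneric() dispatches on the class of x
setGeneric("nEvents", function(x) standardGeneric("nEvents"))

# Register the implementation of the generic for ScatterEvents
setMethod("nEvents", "ScatterEvents", function(x) length(x@fsc))

ev <- new("ScatterEvents", fsc = c(100, 250, 400), ssc = c(50, 80, 120))
nEvents(ev)   # 3
ev@fsc        # slots are read with @, as in ff1@exprs
```

Because the method lives outside the class, other packages can add methods for ScatterEvents (or new classes for nEvents) without touching the original definition.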

Plotting code

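# Expression matrix (events x channels); columns 1:6 are FSC-A, FSC-H, FSC-W, SSC-A, SSC-H, SSC-W.
# data.frame() turns channel names such as "FSC-A" into syntactic names such as FSC.A.
df=data.frame(ff1@exprs[,1:6])
attach(df)   # lets the plotting code refer to FSC.H, SSC.A, ... directly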

FSC.H vs FSC.A (forward scatter, height vs area)

plot(FSC.H,FSC.A,pch=16,cex=0.2)

SSC.H vs SSC.A (side scatter, height vs area)

plot(SSC.H,SSC.A,pch=16,cex=0.2)

FSC.H vs SSC.H

plot(FSC.H,SSC.H,pch=16,cex=0.2)

FSC.W vs FSC.A (forward scatter, width vs area)

plot(FSC.W,FSC.A,pch=16,cex=0.2)

flowClust gating (level = 0.1)

flowClust gating (level = 0.8)

Using the flowClust code

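library(flowClust)

# Stage 1: fit a 2-cluster model on FSC height vs area (B = maximum number of EM iterations)
res1 <- flowClust(df, varNames=c("FSC.H", "FSC.A"), K=2, B=100)

# Keep only the events that flowClust did not call outliers
df2 <- df[df %in% res1,]

# Stage 2: fit models with K = 1..6 clusters on FSC.A vs SSC.A and compare them by BIC
res2 <- flowClust(df2, varNames=c("FSC.A", "SSC.A"), K=1:6, B=100)
criterion(res2, "BIC")

# Inspect the K = 4 solution
summary(res2[[4]])

# Outlier rule: points outside the 95% quantile region of their cluster are outliers
ruleOutliers(res2[[4]]) <- list(level=0.95)
summary(res2[[4]])

# Add a posterior-probability rule: membership probability below 0.6 also means outlier
ruleOutliers(res2[[4]]) <- list(z.cutoff=0.6)
summary(res2[[4]])

# Draw the K = 4 clusters with boundaries at the 0.8 quantile level
plot(res2[[4]], data=df2, level=0.8, z.cutoff=0)

# Density of the fitted mixture model
res2.den <- density(res2[[4]], data=df2)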

A typical output of flowClust

The model-based clustering
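flowClust fits a finite mixture model (multivariate t components with a Box-Cox transformation) and compares models with different numbers of clusters K using BIC, as in criterion(res2, "BIC") above. The idea can be illustrated with a much simpler stand-in: a one-dimensional Gaussian mixture fitted by EM in base R, with K chosen by BIC. The function fit_gmm1d and the simulated channel are hypothetical; this is not flowClust's implementation, and BIC here uses the convention -2 log L + p log n, where smaller is better.

```r
# Hypothetical stand-in for model-based clustering: EM for a 1-D Gaussian mixture.
# (flowClust itself uses t mixtures with a Box-Cox transform.)
fit_gmm1d <- function(x, K, iter = 200) {
  n <- length(x)
  mu <- as.numeric(quantile(x, probs = (seq_len(K) - 0.5) / K))  # spread initial means
  sigma <- rep(sd(x), K)
  w <- rep(1 / K, K)
  comp_dens <- function() {
    # n x K matrix of weighted component densities w_k * N(x_i | mu_k, sigma_k)
    matrix(sapply(seq_len(K), function(k) w[k] * dnorm(x, mu[k], sigma[k])), nrow = n)
  }
  for (it in seq_len(iter)) {
    dens <- comp_dens()
    z <- dens / rowSums(dens)          # E-step: posterior membership probabilities
    nk <- colSums(z)                   # M-step: update weights, means and sds
    w <- nk / n
    mu <- colSums(z * x) / nk
    sigma <- sqrt(colSums(z * outer(x, mu, "-")^2) / nk)
  }
  dens <- comp_dens()
  loglik <- sum(log(rowSums(dens)))
  npar <- 3 * K - 1                    # K means, K sds, K - 1 free weights
  list(mu = mu, sigma = sigma, w = w, loglik = loglik,
       bic = -2 * loglik + npar * log(n),          # smaller is better here
       cluster = max.col(dens, ties.method = "first"))
}

# Simulated two-population channel (not data from the FCS files above)
set.seed(1)
x <- c(rnorm(500, mean = 200, sd = 20), rnorm(500, mean = 600, sd = 40))
fits <- lapply(1:3, function(K) fit_gmm1d(x, K))
bic <- sapply(fits, `[[`, "bic")
best <- fits[[which.min(bic)]]
```

Choosing K by an information criterion rather than by eye is the main point shared with flowClust; the penalty term p log n is what stops the fit from always preferring more clusters.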

flowPeaks

flowPeaks performs fast unsupervised clustering of flow cytometry data via K-means and density peak finding (Bioinformatics, 2012, 28(15):2052-8).

plot(fp)  # graphical alternative to summary(fp)
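The density-peak part of the idea can be sketched in one dimension with base R: estimate the density of a channel with stats::density and report its local maxima. The function find_density_peaks and the simulated data are assumptions for illustration; this is not the flowPeaks algorithm, which combines K-means with peak finding on the multivariate data.

```r
# Hypothetical sketch: locate the peaks of a kernel density estimate of one channel
find_density_peaks <- function(x, adjust = 1) {
  d <- stats::density(x, adjust = adjust)
  # a local maximum is where the slope of the density changes from + to -
  idx <- which(diff(sign(diff(d$y))) == -2) + 1
  d$x[idx]
}

# Simulated channel with two populations (not data from the FCS files above)
set.seed(2)
x <- c(rnorm(1000, mean = 200, sd = 20), rnorm(1000, mean = 600, sd = 40))
peaks <- find_density_peaks(x)
```

The number of peaks found depends on the bandwidth, so adjust plays a role similar to a resolution setting: smaller values reveal more, and noisier, peaks.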

The prescribed manual clustering

Origin of Doublets

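What is doublet discrimination? A doublet is a single recorded event that actually consists of two independent particles, for example two cells passing the laser together. Doublet discrimination is the removal of such events from the analysis.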

A non-rocket-science approach

When doublets have to be removed, conventional clustering-based gating fails, because the gate must separate events by whether they are singlets or doublets rather than by cell population. The common-sense rule used here is that events with a higher area-to-height ratio are more likely to be doublets: two particles passing the laser in tandem produce a longer pulse, so the pulse area grows while the peak height stays roughly the same. This holds for both forward and side scatter.

A simple combined algorithm
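One way such a combined rule could look, offered as a hedged sketch rather than the author's exact algorithm: an event is kept as a singlet only if its area/height ratio is not unusually high in either forward or side scatter. The cut-off median + k * MAD, the default k = 3, and the function name singlet_filter are assumptions made for illustration, and the example data are simulated.

```r
# Hypothetical combined doublet filter based on area/height ratios in FSC and SSC.
# An event is flagged when its ratio exceeds median + k * MAD (an assumed cut-off).
singlet_filter <- function(fsc_a, fsc_h, ssc_a, ssc_h, k = 3) {
  high_ratio <- function(a, h) {
    r <- a / h
    r > median(r) + k * mad(r)
  }
  # keep only events that are not flagged in either scatter channel
  !(high_ratio(fsc_a, fsc_h) | high_ratio(ssc_a, ssc_h))
}

# Simulated events (not data from the FCS files above)
set.seed(3)
n <- 1000
fsc_h <- rnorm(n, 50000, 5000); fsc_a <- fsc_h * rnorm(n, 1.2, 0.02)
ssc_h <- rnorm(n, 20000, 3000); ssc_a <- ssc_h * rnorm(n, 1.1, 0.02)
# make the first 50 events doublets: larger area, unchanged height
fsc_a[1:50] <- fsc_a[1:50] * 1.8
ssc_a[1:50] <- ssc_a[1:50] * 1.8
keep <- singlet_filter(fsc_a, fsc_h, ssc_a, ssc_h)
```

On the data frame attached earlier this could be tried as singlet_filter(FSC.A, FSC.H, SSC.A, SSC.H); the value of k would need tuning against a manually gated sample.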

Effect of doublet filtering on the FSC-SSC clusters

plot(SSC.A,FSC.A,pch=16,cex=0.5,col="red")
points(SSC.A[ik],FSC.A[ik],pch=16,cex=0.5,col="yellow")

Important sites for flow resources

- <https://dillonhammill.github.io/CytoRSuite/>
- <https://github.com/RGLab>

## All FCS files have the same following channels:
## FSC-A
## FSC-H
## FSC-W
## SSC-A
## SSC-H
## SSC-W
## Alexa Fluor 488-A
## PE-A
## APC-A
## PerCP-A
## APC-Cy7-A
## Time
## write Patient_12_120min_MP20.fcs to empty cdf slot...
## write Patient_17_120min_MP20.fcs to empty cdf slot...
## write Patient_17_Unstained.fcs to empty cdf slot...
## done!

Histogram gating (from easyGgplot2)

library(easyGgplot2)
## Loading required package: ggplot2
head(weight)
##      sex   weight
## 1 Female 63.79293
## 2 Female 65.27743
## 3 Female 66.08444
## 4 Female 62.65430
## 5 Female 65.42912
## 6 Female 65.50606

Plotting a histogram

fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff1 <- flowCore::read.FCS(fname)
df=data.frame(ff1@exprs[,1:6])
attach(df)

FSC histogram

ggplot2.histogram(FSC.H, xlab="Forward Scattering", fill="white", color="black",
                  addDensityCurve=TRUE, addMeanLine=TRUE, densityFill='#FF6666')
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
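A common form of histogram gating places a one-dimensional threshold in the valley between two populations. A hedged base-R sketch: take the two highest peaks of a kernel density estimate and put the threshold where the density is lowest between them. The function valley_gate and the simulated channel are hypothetical, and the procedure assumes the channel is at least bimodal.

```r
# Hypothetical 1-D histogram gate: threshold at the density minimum between the two highest peaks
valley_gate <- function(x) {
  d <- stats::density(x)
  pk <- which(diff(sign(diff(d$y))) == -2) + 1      # local maxima of the density
  if (length(pk) < 2) stop("fewer than two density peaks: no valley to gate on")
  top2 <- sort(pk[order(d$y[pk], decreasing = TRUE)][1:2])  # two highest peaks, left to right
  between <- top2[1]:top2[2]
  d$x[between[which.min(d$y[between])]]             # x position of the valley
}

# Simulated channel: a large low population and a smaller high one (not real data)
set.seed(4)
x <- c(rnorm(1500, mean = 200, sd = 20), rnorm(500, mean = 600, sd = 40))
thr <- valley_gate(x)
gated <- x[x > thr]   # events in the upper population
```

On the data above this could be tried as valley_gate(FSC.H), provided the FSC histogram actually shows two separated populations.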