Applications of data science to flow cytometry
Anjan Kr. Dasgupta adgcal@gmail.com
Dec 26 2020
A bit of history
- In 1954, Wallace Coulter proposed an instrument that could electronically measure cells suspended in a conductive liquid. This formed the basis of the flow cytometer.
- The first fluorescence-based flow cytometry device (ICP 11) was developed in 1968 by Wolfgang Göhde from the University of Münster, Germany.
- The technique was originally called pulse cytophotometry; the name was changed to flow cytometry in 1978.
- An instrument (1971) from Bio/Physics Systems Inc.
- PAS 8000 (1973) from Partec
- FACS instrument (1974) from Becton Dickinson
- ICP 22 (1975) from Partec/Phywe
- Epics (1977/78) from Coulter
Same principle, different outcomes
| Molecules, nanoparticles (\(\le 500\) nm) | Cells (\(\mu m\)) |
|---|---|
| No statistical information | Population behavior |
| Single-molecule data unavailable | Single-cell data |
Block diagram: fluorescence
Block diagram: scattering
Block diagram
Flow cytometry basic algorithmic issues
Flow Cytometry - Commercial software and Data
Flow cytometers can now measure dozens of parameters. A typical experiment generates about 10,000 rows (one per event) of 8 to 24 columns (channels) of data.
Software packages
Data repository
- <https://flowrepository.org/public_experiment_representations?top=10>
- <http://docs.cytobank.org/wagn/Troubleshooting>
Target Applications
- Cell Cycle Analysis
- Immunophenotyping
- Cell counting and sorting
- Biomarker detection
FlowSOM
fname='./FlowRepository_FR-FCM-Z2KP_files/export_COVID19 samples 21_04_20_ST3_COVID19_ICU_005_A ST3 210420_080_Live_cells.fcs'
ff <- flowCore::read.FCS(fname)
colnames(flowCore::exprs(ff))[1:5]  # channel names are the column names of the event matrix
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff1 <- flowCore::read.FCS(fname)
typeof(ff1)
## [1] "S4"
Understanding S4 Classes
An S4 class definition differs considerably from a class definition in other programming languages such as C# and Python.
An S4 class declares its data fields (slots) in a call to setClass, but does not encapsulate the class methods.
Instead, S4 methods are defined with a pair of special R functions, setGeneric and setMethod. Ref: https://advanced-r-solutions.rbind.io/s4.html
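To make the split between data and methods concrete, here is a minimal sketch using only R's methods package. The class `ToyFrame` and the generic `nEvents` are hypothetical names; the `exprs` slot loosely mirrors the slot in which flowCore's flowFrame keeps its event matrix, which is why `ff1@exprs` works on the file read above.

```r
library(methods)  # S4 machinery (attached by default in an interactive session)

# Data fields only: setClass declares slots, not methods.
setClass("ToyFrame", slots = c(exprs = "matrix", description = "list"))

# Methods live outside the class: first declare a generic...
setGeneric("nEvents", function(object) standardGeneric("nEvents"))

# ...then register an implementation for a specific class.
setMethod("nEvents", "ToyFrame", function(object) nrow(object@exprs))

m <- matrix(runif(20), nrow = 10,
            dimnames = list(NULL, c("FSC-A", "SSC-A")))
tf <- new("ToyFrame", exprs = m, description = list(source = "simulated"))

typeof(tf)       # "S4", as for the flowFrame read earlier
nEvents(tf)      # 10 events (rows)
tf@exprs[1:2, ]  # slots are accessed with @
```

Adding behaviour for a new class then means writing another `setMethod` call, without touching the class definition.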
Plotting code
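df <- data.frame(flowCore::exprs(ff1)[, 1:6])  # first six channels; data.frame() renames "FSC-H" to "FSC.H", etc.
attach(df)  # lets the columns be used by name (FSC.H, FSC.A, ...)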
FSC.H vs FSC.A (height vs area)
plot(FSC.H,FSC.A,pch=16,cex=0.2)

SSC.H vs SSC.A (height vs area)
plot(SSC.H,SSC.A,pch=16,cex=0.2)

FSC.H vs SSC.H
plot(FSC.H,SSC.H,pch=16,cex=0.2)

FSC.W vs FSC.A (width vs area)
plot(FSC.W,FSC.A,pch=16,cex=0.2)

flowClust gating (level = 0.1)
flowClust gating (level = 0.8)
Using the flowClust code
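library(flowClust)
# Step 1: two-cluster fit on FSC.H vs FSC.A; keep events assigned to a cluster (non-outliers)
res1 <- flowClust(df, varNames = c("FSC.H", "FSC.A"), K = 2, B = 100)
df2 <- df[df %in% res1, ]
# Step 2: fit models with K = 1 to 6 clusters on FSC.A vs SSC.A
res2 <- flowClust(df2, varNames = c("FSC.A", "SSC.A"), K = 1:6, B = 100)
criterion(res2, "BIC")                           # BIC of each fitted model
summary(res2[[4]])                               # inspect the K = 4 model
# Re-call outliers under a different rule without refitting
ruleOutliers(res2[[4]]) <- list(level = 0.95)    # outside the 95% quantile region
summary(res2[[4]])
ruleOutliers(res2[[4]]) <- list(z.cutoff = 0.6)  # posterior membership-probability cutoff
summary(res2[[4]])
plot(res2[[4]], data = df2, level = 0.8, z.cutoff = 0)
res2.den <- density(res2[[4]], data = df2)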
- BIC: the Bayesian Information Criterion for the fitted mixture model.
- ICL: the Integrated Completed Likelihood for the fitted mixture model.
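The two criteria can be sketched in base R as follows. This assumes the "larger is better" convention BIC = 2 log L - d log n (d free parameters, n events) and takes ICL as the BIC minus twice the classification entropy of the posterior membership matrix z, which is one common form. flowClust's internal definitions may differ in detail, so the function names and formulas here are illustrative only.

```r
# Hypothetical helpers; "larger is better" convention assumed.
bic_score <- function(loglik, n_par, n_obs) {
  2 * loglik - n_par * log(n_obs)
}

# z: n x K matrix of posterior cluster-membership probabilities (rows sum to 1).
icl_score <- function(loglik, n_par, z) {
  zz <- z[z > 0]                     # treat 0 * log(0) as 0
  entropy <- -sum(zz * log(zz))      # classification entropy, always >= 0
  bic_score(loglik, n_par, nrow(z)) - 2 * entropy
}

# Same likelihood, crisp vs fuzzy cluster memberships for 4 events.
z_crisp <- cbind(c(1, 1, 0, 0), c(0, 0, 1, 1))
z_fuzzy <- cbind(c(0.6, 0.6, 0.4, 0.4), c(0.4, 0.4, 0.6, 0.6))
bic_score(-10, 5, 4)
icl_score(-10, 5, z_crisp)   # equal to the BIC: no entropy penalty
icl_score(-10, 5, z_fuzzy)   # lower than the BIC
```

The entropy term is why ICL tends to favour models whose clusters are well separated, whereas BIC only weighs fit against the number of parameters.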
A typical output of flowClust

Model-based clustering
- The flowClust package makes extensive use of the GSL as well as BLAS. If an optimized BLAS library is available when the package is compiled, flowClust can run multi-threaded.
- GSL \(\equiv\) GNU Scientific Library
- BLAS \(\equiv\) Basic Linear Algebra Subprograms
- The flowClust() function performs automated clustering to identify cell populations in flow cytometry data.
- The approach is based on a t-mixture model with a Box-Cox transformation, which provides a unified framework for handling outlier identification and data transformation simultaneously.
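Two ingredients of this model can be sketched in a few lines of base R. The transform below is a sign-extended Box-Cox variant that accepts zero and negative intensities; we assume, without having checked the package source, that flowClust uses a transform of this kind. The outlier rule relies on the fact that, for a p-variate t distribution with nu degrees of freedom, the squared Mahalanobis distance divided by p follows an F(p, nu) distribution; this is one way a `level` such as 0.9 becomes a distance cutoff. The function names are hypothetical and nu = 4 is an assumed value.

```r
# Sign-extended Box-Cox transform (assumed form).
box_cox <- function(y, lambda) {
  if (lambda == 0) return(log(y))          # classical limit; requires y > 0
  (sign(y) * abs(y)^lambda - 1) / lambda
}

# Outlier call for one cluster of a p-variate t-mixture component:
# if x ~ t_nu(mu, Sigma) then d2 / p ~ F(p, nu), with d2 the squared
# Mahalanobis distance. Points outside the `level` quantile region are outliers.
t_outlier <- function(x, mu, Sigma, nu = 4, level = 0.9) {
  p  <- length(mu)
  d2 <- mahalanobis(x, center = mu, cov = Sigma)
  d2 > p * qf(level, df1 = p, df2 = nu)
}

mu    <- c(0, 0)
Sigma <- diag(2)
pts   <- rbind(c(0, 0), c(1, 1), c(6, 6))
t_outlier(pts, mu, Sigma, nu = 4, level = 0.9)   # FALSE FALSE TRUE
box_cox(c(-1, 1, 4), lambda = 0.5)               # -4 0 2
```

Raising `level` (e.g. to 0.95, as in the `ruleOutliers` call above) enlarges the quantile region, so fewer events are flagged.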
flowPeaks
flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics (2012) 28(15):2052-8.
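library(flowPeaks)
fp <- flowPeaks(as.matrix(df[, c("FSC.A", "SSC.A")]))  # input channels assumed for illustration
plot(fp)  # cluster plot; summary(fp) gives a tabular alternative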

The prescribed manual clustering

Origin of Doublets
What is doublet discrimination?
- A doublet is a single recorded event that actually consists of two independent particles.
- The cytometer classified these particles as a single event because they passed through the interrogation point very close to one another.
- In other words, the particles were so close together when they passed through the laser spot that the instrument could not distinguish them as individual events.
A non-rocket-science approach
For doublet elimination, conventional clustering-based gating fails, because the gate must separate singlets from doublets rather than distinct cell populations. The common-sense knowledge used here is that events with a higher area-to-height ratio are more likely to be doublets. This holds for both forward and side scatter.
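A toy simulation shows why the area-to-height ratio flags doublets. It assumes, as an idealisation, that each particle produces a Gaussian-shaped pulse as it crosses the laser and that two particles passing close together simply add their pulses. The pulse width, spacing and sampling step are arbitrary illustrative values, and `pulse` and `pulse_stats` are hypothetical helpers.

```r
dt <- 0.01
tt <- seq(-10, 10, by = dt)                    # time axis, arbitrary units

# Idealised pulse of one particle (Gaussian shape assumed).
pulse <- function(x, centre, width = 1) exp(-((x - centre) / width)^2 / 2)

singlet <- pulse(tt, 0)
doublet <- pulse(tt, -1.5) + pulse(tt, 1.5)    # two particles 3 units apart

pulse_stats <- function(signal, dt) {
  area   <- sum(signal) * dt   # integrated signal, analogous to the "-A" parameter
  height <- max(signal)        # peak signal, analogous to the "-H" parameter
  c(area = area, height = height, ratio = area / height)
}

stats <- rbind(singlet = pulse_stats(singlet, dt),
               doublet = pulse_stats(doublet, dt))
round(stats, 2)
```

In this setting the doublet's area doubles while its height barely changes, so its area-to-height ratio is close to twice that of the singlet.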
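A simple combined algorithm
- X = log(FSC.A) and Y = log(FSC.H)
- X1 = log(SSC.A) and Y1 = log(SSC.H)
- Fit the linear regressions Y ~ X and Y1 ~ X1, giving slopes and intercepts (m, c) and (m1, c1).
- Find the events ik for which Y > mX + c and Y1 > m1X1 + c1 hold simultaneously. These events lie above both fitted lines, i.e. have a lower area-to-height ratio than the overall trend, and are kept as singlets.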
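The four steps above can be sketched as a small base-R function. `singlet_index` is a hypothetical name, and the simulated data (900 singlets with area close to height, 100 doublets whose area is inflated by an assumed factor of 1.8) only serve to exercise the rule.

```r
# Indices of events lying above BOTH log-height ~ log-area regression lines.
singlet_index <- function(fsc_a, fsc_h, ssc_a, ssc_h) {
  ok <- fsc_a > 0 & fsc_h > 0 & ssc_a > 0 & ssc_h > 0   # log() needs positive values
  X  <- log(fsc_a[ok]);  Y  <- log(fsc_h[ok])
  X1 <- log(ssc_a[ok]);  Y1 <- log(ssc_h[ok])

  cf  <- unname(coef(lm(Y ~ X)))     # cf  = c(c,  m):  intercept, slope
  cf1 <- unname(coef(lm(Y1 ~ X1)))   # cf1 = c(c1, m1)

  above  <- Y  > cf[2]  * X  + cf[1]   # Y  > m  X  + c
  above1 <- Y1 > cf1[2] * X1 + cf1[1]  # Y1 > m1 X1 + c1
  which(ok)[above & above1]            # row numbers in the original vectors
}

# Simulated events: singlets first, then doublets.
set.seed(1)
n_s <- 900; n_d <- 100; n <- n_s + n_d
k <- c(rep(1, n_s), rep(1.8, n_d))     # area inflation factor (assumed)
fsc_h <- exp(rnorm(n, 10, 0.5)); fsc_a <- k * fsc_h * exp(rnorm(n, 0, 0.05))
ssc_h <- exp(rnorm(n, 9, 0.5));  ssc_a <- k * ssc_h * exp(rnorm(n, 0, 0.05))

ik <- singlet_index(fsc_a, fsc_h, ssc_a, ssc_h)
length(ik)       # number of events kept
any(ik > n_s)    # TRUE would mean a simulated doublet slipped through
```

With the attached data frame from the earlier slides, the equivalent call would be `ik <- singlet_index(FSC.A, FSC.H, SSC.A, SSC.H)`. Because the lines are fitted through all events, some genuine singlets that scatter just below them are discarded as well, so the rule trades yield for purity.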

Effect of doublet filtering on the FSC-SSC clusters
plot(SSC.A, FSC.A, pch = 16, cex = 0.5, col = "red")               # all events
points(SSC.A[ik], FSC.A[ik], pch = 16, cex = 0.5, col = "yellow")  # events kept as singlets

Useful sites for flow cytometry resources
- <https://dillonhammill.github.io/CytoRSuite/>
- <https://github.com/RGLab>
## All FCS files have the same following channels:
## FSC-A
## FSC-H
## FSC-W
## SSC-A
## SSC-H
## SSC-W
## Alexa Fluor 488-A
## PE-A
## APC-A
## PerCP-A
## APC-Cy7-A
## Time
## write Patient_12_120min_MP20.fcs to empty cdf slot...
## write Patient_17_120min_MP20.fcs to empty cdf slot...
## write Patient_17_Unstained.fcs to empty cdf slot...
## done!
Histogram gating (from easyGgplot2)
## sex weight
## 1 Female 63.79293
## 2 Female 65.27743
## 3 Female 66.08444
## 4 Female 62.65430
## 5 Female 65.42912
## 6 Female 65.50606
Plotting a histogram
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff1 <- flowCore::read.FCS(fname)
df <- data.frame(flowCore::exprs(ff1)[, 1:6])
if ("df" %in% search()) detach("df")  # drop the copy attached earlier to avoid masking warnings
attach(df)
FSC histogram
library(easyGgplot2)
ggplot2.histogram(FSC.H, xlab = "Forward scatter (FSC.H)", fill = "white", color = "black",
                  addDensityCurve = TRUE, addMeanLine = TRUE, densityFill = "#FF6666")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.
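Histogram gating typically places a threshold in the valley between two peaks of a single-channel distribution. The following base-R sketch assumes a bimodal channel and uses a kernel density estimate with R's default bandwidth (an arbitrary choice here): it finds the local maxima of the density and puts the gate at the lowest point between the two highest peaks. `valley_gate` is a hypothetical helper, not part of easyGgplot2, and the data are simulated.

```r
valley_gate <- function(x, ...) {
  d <- density(x, ...)                              # kernel density estimate
  y <- d$y
  peaks <- which(diff(sign(diff(y))) == -2) + 1     # local maxima of the density
  if (length(peaks) < 2) stop("fewer than two density peaks found")
  top <- sort(peaks[order(y[peaks], decreasing = TRUE)][1:2])  # two highest, left to right
  between <- top[1]:top[2]
  d$x[between[which.min(y[between])]]               # position of the valley
}

set.seed(2)
x <- c(rnorm(5000, mean = 2, sd = 0.5), rnorm(5000, mean = 6, sd = 0.7))
cut <- valley_gate(x)
cut              # threshold between the two modes
mean(x > cut)    # fraction of events in the upper gate
```

On real data, a call such as `valley_gate(FSC.H)` only makes sense when the channel is at least bimodal; otherwise the function stops with an error.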
