Applications of data science to flow cytometry

Flow cytometry for doctors

Anjan Kr Dasgupta (Calcutta University, Retd.) https://scholar.google.com/citations?hl=en&user=9Xc_04IAAAAJ&view_op=list_works&sortby=pubdate
2020-12-25

A bit of history

The basics of the basics

| Fluorescence spectrometer | Flow cytometer |
|---|---|
| Molecules, nanoparticles (\(\le 500\) nm) | Cells (\(\mu m\)) |
| No statistical information | Population behavior |
| Single-molecule data unavailable | Single-cell data |

Block diagram: fluorescence

Excitation by light leads to scattering and emission

Block diagram: scattering

Two major scattering classes

Block diagram for a flow cytometer

The flow cytometry scheme

The integration of cytometric platform with data science

Commercial software and Data

Software packages:

- FlowJo
- FCS Express

Public databases

Data repository

https://flowrepository.org/public_experiment_representations?top=10

https://www.cytobank.org/

Data mining in Flow Cytometry

R command to read an FCS file

library(flowCore)
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff <- read.FCS(fname)
typeof(ff)
[1] "S4"

What is S4?

From S4 to data frames

 [1] "FSC.A"             "FSC.H"             "FSC.W"            
 [4] "SSC.A"             "SSC.H"             "SSC.W"            
 [7] "Alexa.Fluor.488.A" "PE.A"              "APC.A"            
[10] "PerCP.A"           "APC.Cy7.A"         "Time"             
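For readers new to S4: it is R's formal class system, in which an object keeps its data in named slots reached with `@`. The sketch below uses a toy class (`ToyFrame`, a hypothetical stand-in and not the real flowCore object) with made-up values to show how a slot holding an event matrix becomes a data frame, and why a channel stored as `FSC-A` shows up as `FSC.A` afterwards.

```r
library(methods)

# Hypothetical stand-in for the object returned by read.FCS():
# a single slot that holds the event matrix
setClass("ToyFrame", representation(exprs = "matrix"))

# Three made-up events measured in two channels
m <- matrix(c(100, 200, 300, 10, 20, 30), ncol = 2,
            dimnames = list(NULL, c("FSC-A", "SSC-A")))
toy <- new("ToyFrame", exprs = m)

isS4(toy)        # TRUE: the object belongs to a formal S4 class
typeof(toy)      # "S4", as on the slide
slotNames(toy)   # the slots that can be reached with @

# data.frame() makes column names syntactic, so "-" becomes "."
df <- data.frame(toy@exprs)
colnames(df)
```

With a real file, flowCore's accessor `exprs(ff)` returns the same matrix as `ff@exprs` and is usually the preferred spelling.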

Two important computational problems specific to flow cytometry

A typical Flow plot for unlabelled samples

Typical target objectives in Flow analysis

Typical analyte: PBMC

Automated gating: flowClust (Box-Cox)

# Using the flowClust code to remove outliers
library(flowClust)
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff <- flowCore::read.FCS(fname)
df=data.frame(ff@exprs[,1:6])
res1 <- flowClust(df,
varNames=c("FSC.H", "SSC.H"), K=1) 
df2=Subset(df,res1)
res2<-flowClust(df2, varNames=c("FSC.H", "SSC.H"), K=1:6, B=100)
criterion(res2,"BIC")
[1] -1392214 -1378290 -1361787 -1354436 -1353362 -1350507
# BIC is largest at K = 6, so we take the 6-cluster model
plot(res2[[6]], data=df2, level=0.8, z.cutoff=0)

Rule of identifying outliers: 80% quantile
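The choice of K can be read directly off the BIC vector. The sketch below reuses the six values printed above and assumes, as the slide's choice of six clusters implies, that the larger BIC is the preferred one; the variable names are illustrative.

```r
# BIC values reported above for K = 1..6
bic <- c(-1392214, -1378290, -1361787, -1354436, -1353362, -1350507)

# Number of clusters with the largest BIC
best_k <- which.max(bic)

# Improvement in BIC contributed by each additional cluster
gain <- diff(bic)

best_k
gain
```

The BIC is still rising at K = 6, so the largest K tried wins. The gains are much smaller after K = 4, which is the usual argument for not going further.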
# Collect the row indices of the events assigned to each of the 6 clusters
ik1=which(res2[[6]]@label==1)
ik2=which(res2[[6]]@label==2)
ik3=which(res2[[6]]@label==3)
ik4=which(res2[[6]]@label==4)
ik5=which(res2[[6]]@label==5)
ik6=which(res2[[6]]@label==6)
attach(df2)

Gating out a single cluster

Now suppose we want to gate out one cluster and view the other five

plot(FSC.H,SSC.H,pch=16,cex=0.3,col="red")

points(FSC.H[ik4],SSC.H[ik4],pch=16,cex=0.3)

Gating the cell debris out

plot(FSC.H[-ik4],SSC.H[-ik4],pch=16,cex=0.3)
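Negative indexing with `which()` works here, but it has one pitfall worth knowing. The sketch below uses made-up labels and intensities (not the patient data) to compare it with a logical mask.

```r
# Toy stand-ins for the cluster labels and one scatter channel (made-up values)
label <- c(1, 4, 2, 4, NA, 3)
fsc   <- c(50, 5, 60, 7, 900, 55)

# Negative indexing, as on the slide: drop the events labelled 4
ik4 <- which(label == 4)
kept_by_index <- fsc[-ik4]

# Pitfall: if no event carries the label, -integer(0) selects nothing at all
ik9 <- which(label == 9)
length(fsc[-ik9])

# Logical mask: safe when nothing matches; events with a missing label are kept
keep <- is.na(label) | label != 4
kept_by_mask <- fsc[keep]
```

Both routes keep the same four events in this toy case. The mask is the safer habit when the cluster number comes from a variable and might match nothing.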

flowPeaks

Ge Y, Sealfon SC. flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics 2012; 28(15):2052-8

library(flowPeaks)
fp<-flowPeaks(asinh(ff@exprs[,c(1,2)]))
plot(fp) #an alternative of using summary(fp) 
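The call above applies `asinh()` before clustering. A short sketch with illustrative intensities shows why it is a convenient replacement for `log()` on cytometry data, where compensated channels can contain zero or negative values. The cofactor of 150 in the last line is only an assumed example value, not something taken from these slides.

```r
# Illustrative intensities, including a negative value and zero
x <- c(-51.45, 0, 213.15, 262143)

log_x   <- suppressWarnings(log(x))  # NaN for the negative value, -Inf for zero
asinh_x <- asinh(x)                  # finite for every value

# For large values asinh(x) behaves like log(2 * x)
asinh(262143) - log(2 * 262143)

# Dividing by a cofactor widens the near-linear region around zero
asinh_scaled <- asinh(x / 150)
```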

The prescribed manual clustering

Origin of Doublets

A non-rocket-science approach

As seen in the elimination of doublets, the conventional clustering approach to gating fails, because the gate here depends on whether the cells pass the laser as singlets or as doublets. The common-sense rule used instead is that events with a higher area-to-height ratio are more likely to be doublets. This holds for both forward and side scatter.

A simple combined algorithm
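The area-to-height rule can be sketched on simulated events. Everything below is an assumption made for illustration: the event counts, the singlet and doublet ratios, and the cutoff of median plus three MADs are not taken from the slides.

```r
set.seed(1)

# Simulated events (made-up numbers): 950 singlets and 50 doublets
n_single <- 950
n_double <- 50
height <- runif(n_single + n_double, 2e4, 1e5)

# Singlets: area is about 1.5 x height; doublets: a wider pulse, about 2.6 x height
ratio_true <- c(rnorm(n_single, 1.5, 0.05), rnorm(n_double, 2.6, 0.10))
area <- height * ratio_true
is_doublet <- rep(c(FALSE, TRUE), c(n_single, n_double))

# Area-to-height ratio per event
ratio <- area / height

# Flag events whose ratio lies far above the bulk (assumed cutoff)
cutoff <- median(ratio) + 3 * mad(ratio)
flagged <- ratio > cutoff

# Compare the flags with the simulated truth
table(truth = is_doublet, flagged = flagged)
```

On real data the same test could be run separately on the forward and side scatter channels and the two flags combined, which is one way to read the "combined" approach.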

Effect of doublet filtering on the FSC-SSC clusters

Compensation

An example

In this hypothetical experiment, cells were stained with FITC CD3 and with a PE isotype control, and collected at different compensation values to correct for the FITC spillover into the PE channel (panel 1 is uncompensated; panels 2-5 are for increasing compensation values).

The questions

- Which panel represents proper compensation?
- On what basis did you make this determination?

Two-color compensation

http://www.drmr.com/compensation/

Let's consider an experiment where we stain human peripheral blood lymphocytes with

- FITC CD3 \(\xrightarrow{\text{stains}}\) CD4 and CD8 T cells
- PE CD8 \(\xrightarrow{\text{stains}}\) CD8 T cells and, less brightly, NK cells

Four populations

Quantitative measures

Measured fluorescence for each cell

Mathematical formulae for two-color compensation

\[
\begin{aligned}
FL1_{measured} &= FITC_{true} + 0.01\cdot PE_{true}\\
FL2_{measured} &= 0.15\cdot FITC_{true} + PE_{true}
\end{aligned}
\]

Finding the true value of fluorescence

Let's apply these equations to our measured values for CD8 T cells (FL1 = 104, FL2 = 415). PE true and FITC true follow from solving the two equations simultaneously.
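The two equations form a 2 x 2 linear system, so the true values can be obtained with base R's `solve()`. The sketch below uses only the coefficients and the measured CD8 T cell values quoted above; the variable names are illustrative.

```r
# Spillover coefficients from the two equations
# rows: measured channel (FL1, FL2); columns: true signal (FITC, PE)
S <- matrix(c(1.00, 0.01,
              0.15, 1.00), nrow = 2, byrow = TRUE)

# Measured values for the CD8 T cells
measured <- c(104, 415)

# Solve S %*% true_signal = measured
true_signal <- solve(S, measured)
round(true_signal, 2)   # FITC true = 100, PE true = 400
```

Substituting back confirms the result: 100 + 0.01 x 400 = 104 and 0.15 x 100 + 400 = 415.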

Multicolor compensation

Multi-color compensation is a simple extension of two-color compensation:

M(1) = A(11) x F(1) + A(21) x F(2) + … + A(n1) x F(n)
M(2) = A(12) x F(1) + A(22) x F(2) + … + A(n2) x F(n)
…
M(n) = A(1n) x F(1) + A(2n) x F(2) + … + A(nn) x F(n)

where M(i) is the measured fluorescence in channel i, F(i) is the amount of fluorescent molecule i present on the cell of interest, and A(ij) is the ratio of the fluorescence of molecule i in channel j to its fluorescence in channel i, so that A(ii) = 1.
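In matrix form the system reads M = t(A) %*% F, so compensation amounts to solving a linear system. The sketch below uses a hypothetical 3-color spillover matrix and made-up dye amounts. It takes A(ij) as the signal of molecule i seen in channel j relative to its own channel i, which is the reading consistent with the two-color equations.

```r
# Hypothetical 3-color spillover matrix: A[i, j] is the signal of dye i seen in
# channel j, relative to its signal in its own channel i (diagonal = 1)
A <- matrix(c(1.00, 0.15, 0.02,
              0.01, 1.00, 0.10,
              0.00, 0.05, 1.00), nrow = 3, byrow = TRUE)

# Made-up true dye amounts, used only to generate measured values
F_true <- c(100, 400, 50)

# Forward model: M(j) = sum over i of A(ij) * F(i)
M <- as.vector(t(A) %*% F_true)

# Compensation: recover the dye amounts from the measured values
F_est <- solve(t(A), M)
```

The first two rows reuse the 0.15 and 0.01 coefficients of the two-color case; the remaining entries are invented for illustration.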

Immunophenotyping

A good link to start with: https://www.abcam.com/protocols/flow-cytometry-immunophenotyping

Basic Idea

FLOW CYTOMETRIC ANALYSIS OF LEUKEMIA - A case study

How leukemia is defined

Diagnostic method

Microscopy based assessment

Flow based guess with unlabelled samples

Algorithmic problem

- Matching each cluster to a cell type
- Choosing between the manual and the computational method (doctors like the first!)

Gating the blasts

Blood cells that are not fully developed are called blasts. Gating the blasts is therefore nontrivial.

Probes

The classic BLMG gates

The antibodies

A real-life leukemia file

      Time            FSC.A            FSC.H       
 Min.   :  53.6   Min.   : 10144   Min.   : 10024  
 1st Qu.: 565.5   1st Qu.: 60168   1st Qu.: 43537  
 Median :1085.2   Median : 79984   Median : 57508  
 Mean   :1083.5   Mean   : 93186   Mean   : 61014  
 3rd Qu.:1595.3   3rd Qu.:109668   3rd Qu.: 73634  
 Max.   :2111.9   Max.   :262143   Max.   :258709  
     SSC.A              SSC.H            FITC.A         
 Min.   :   155.4   Min.   :   156   Min.   :   -51.45  
 1st Qu.:  1407.0   1st Qu.:  1101   1st Qu.:   138.60  
 Median :  1939.3   Median :  1443   Median :   213.15  
 Mean   :  3243.5   Mean   :  1914   Mean   :   722.33  
 3rd Qu.:  2998.8   3rd Qu.:  1943   3rd Qu.:   348.60  
 Max.   :262143.0   Max.   :224332   Max.   :262143.00  
      PE.A          PerCP.Cy5.5.A         PE.Cy7.A       
 Min.   :   -81.9   Min.   :  -105.0   Min.   :   -90.3  
 1st Qu.:   444.1   1st Qu.:   536.5   1st Qu.:  1098.3  
 Median :   828.5   Median :  1029.0   Median :  2794.6  
 Mean   :  1237.3   Mean   :  1814.5   Mean   :  4506.2  
 3rd Qu.:  1395.5   3rd Qu.:  1857.5   3rd Qu.:  6019.6  
 Max.   :239286.6   Max.   :262143.0   Max.   :262143.0  
     APC.A              APC.H7.A            V450.A        
 Min.   :   -89.28   Min.   :  -224.1   Min.   :  -439.3  
 1st Qu.:    70.68   1st Qu.:   306.0   1st Qu.:   213.9  
 Median :   169.26   Median :   580.3   Median :   378.4  
 Mean   :  1781.22   Mean   :  1516.5   Mean   :  1291.5  
 3rd Qu.:   778.41   3rd Qu.:  1208.1   3rd Qu.:   670.5  
 Max.   :262143.00   Max.   :262143.0   Max.   :262143.0  
    V500c.A        
 Min.   :  -284.1  
 1st Qu.:   159.8  
 Median :   316.2  
 Mean   :  1187.2  
 3rd Qu.:   621.0  
 Max.   :262143.0  

CD45 (V500c.A), CD19 (PE.Cy7.A) and CD10 (APC.A)

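attach(dfL)
# Work on the log scale, where area and height are roughly linearly related
X=log(FSC.A)
Y=log(FSC.H)
X1=log(SSC.A)
Y1=log(SSC.H)
# Regress log(height) on log(area) for forward and for side scatter
g=lm(Y~X)
g1=lm(Y1~X1)   # side-scatter fit; not used in the filter below
# Keep events whose forward-scatter height lies above the fitted line,
# i.e. a lower-than-fitted area-to-height ratio (likely singlets)
ik=which(Y > coef(g)[2]*X + coef(g)[1])
par(mfrow=c(2,2))
plot(X,Y,pch=16,cex=0.2,col="blue",xlab='log(AREA)',ylab='log(HEIGHT)',main="FWD")
points(X[ik],Y[ik],pch=16,cex=0.2,col="brown")
# Scatter and marker values of the retained events
fsc=FSC.H[ik]
ssc=SSC.H[ik]
fcd45=V500c.A[ik]
fcd19=PE.Cy7.A[ik]
fcd10=APC.A[ik]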

[1] -988502.2 -986779.2 -986474.7 -986135.8

Rule of identifying outliers: 50% quantile

res1 <- flowClust(dfL,
varNames=c("APC.A","PE.Cy7.A"), K=1) 
df2=Subset(dfL,res1)
res2<-flowClust(df2, varNames=c("APC.A","PE.Cy7.A"), K=1:4, B=100)
criterion(res2,"BIC")
[1] -1141117 -1123068 -1119284 -1117204
# BIC is largest at K = 4, so the 4-cluster model is plotted
plot(res2[[4]], data=df2, level=0.5, z.cutoff=0)

Rule of identifying outliers: 50% quantile

flowPeaks

Ge Y, Sealfon SC. flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics 2012; 28(15):2052-8

library(flowPeaks)
fp<-flowPeaks(asinh(ffL@exprs[,c(13,4)]))
        step 0, set the intial seeds, tot.wss=1040.56
        step 1, do the rough EM, tot.wss=712.991 at 0.269617 sec
        step 2, do the fine transfer of Hartigan-Wong Algorithm
                 tot.wss=703.557 at 0.478351 sec

The beauty of SOM

Some other important applications

https://rpubs.com/ANJANKRDASGUPTA/706017

Take home lessons

Acknowledgements

Lastly