Applications of data science to flow cytometry

Flow cytometry for doctors

Anjan Kr Dasgupta (Calcutta University) https://scholar.google.com/citations?hl=en&user=9Xc_04IAAAAJ&view_op=list_works&sortby=pubdate
2020-12-18

A bit of history

Same principle, different outcomes

| Fluorescence Spectrometer | Flow Cytometer |
|---|---|
| Molecules, nanoparticles (\(\le 500\,nm\)) | Cells (\(\mu m\)) |
| No statistical information | Population behavior |
| Single-molecule data unavailable | Single-cell data |

Block Diagram Fluorescence

Excitation by light leads to scattering and emission

Block Diagram Scattering

Two major scattering classes

Block Diagram for a flow cytometer

The power of flow cytometry is illustrated in this diagram

The cytometric platform (cyto\(\equiv\) cell) is a combination of

Flow cytometry basic algorithmic issues

Commercial software and Data

Flow cytometers can now measure dozens of parameters. A typical experiment generates about 10000 rows (events) and 8 to 24 columns (channels) of data.

Software packages

- FlowJo
- FCS Express

Flow cytometry as a treasure hunt for biological big data

Data repository

https://flowrepository.org/public_experiment_representations?top=10

https://www.cytobank.org/

Data mining in Flow Cytometry

R command to read an FCS file

fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff <- flowCore::read.FCS(fname)

The S4 Classes

Converting an S4 object to an R data frame

library(flowCore)
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff <- flowCore::read.FCS(fname)
typeof(ff)
[1] "S4"
# Converting S4 to dataframe 
df=data.frame(ff@exprs[,1:6])
# typical structure of the data frame 
summary(df)
     FSC.A            FSC.H            FSC.W       
 Min.   :  2909   Min.   :  5009   Min.   : 36588  
 1st Qu.: 21116   1st Qu.: 19694   1st Qu.: 66662  
 Median : 46014   Median : 43265   Median : 71521  
 Mean   : 72403   Mean   : 64801   Mean   : 72133  
 3rd Qu.:128617   3rd Qu.:117858   3rd Qu.: 76619  
 Max.   :262143   Max.   :257923   Max.   :180780  
     SSC.A              SSC.H            SSC.W       
 Min.   :   569.5   Min.   :   600   Min.   : 42355  
 1st Qu.:  7990.4   1st Qu.:  7897   1st Qu.: 65152  
 Median : 19952.3   Median : 19438   Median : 66967  
 Mean   : 40328.2   Mean   : 37823   Mean   : 67712  
 3rd Qu.: 52786.3   3rd Qu.: 50038   3rd Qu.: 69002  
 Max.   :262143.0   Max.   :257327   Max.   :198544  
attach(df)
plot(FSC.H,SSC.H,pch=16,cex=0.2)

Flow Cytometry of PBMC

Targets in Flow analysis

Target Applications

- Cell Cycle Analysis
- Immunophenotyping
- Cell counting and sorting
- Biomarker detection

In the above case we have used unstained samples from PBMC.

A typical unlabelled flow cytometric phase space - FSC.H-SSC.H

plot(FSC.H,SSC.H,pch=16,cex=0.2)

Cluster = Function_of(CellType)

The above scatter plot shows a typical scattering cluster for an unlabelled (no fluorescence label) PBMC suspension. A major computational challenge is to identify the cell types and associate each of them with a flow cytometric cluster.
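As a rough illustration of assigning events to clusters, the sketch below runs base R `kmeans` on simulated two-population scatter data. The simulated means and spreads, and the names `sim` and `km`, are assumptions made for illustration only. k-means is used here as a simple stand-in, not as the model-based method used in the slides that follow.

```r
# Simulate two hypothetical scatter populations (values are illustrative only)
set.seed(1)
n <- 500
sim <- data.frame(
  FSC.H = c(rnorm(n, mean = 40000, sd = 4000), rnorm(n, mean = 120000, sd = 8000)),
  SSC.H = c(rnorm(n, mean = 15000, sd = 2000), rnorm(n, mean = 60000, sd = 6000))
)

# Standardise both channels so neither dominates the distance, then cluster
km <- kmeans(scale(sim), centers = 2, nstart = 10)

# Cluster sizes and per-cluster mean scatter values
table(km$cluster)
aggregate(sim, by = list(cluster = km$cluster), FUN = mean)
```

The cluster labels themselves carry no biological meaning. Linking a cluster to a cell type still needs prior knowledge of where that cell type falls in the FSC-SSC plane.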

Automated gating with flowClust

# Using the flowClust code to remove outliers
library(flowClust)
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff <- flowCore::read.FCS(fname)
df=data.frame(ff@exprs[,1:6])
res1 <- flowClust(df,
varNames=c("FSC.H", "SSC.H"), K=1) 
df2=Subset(df,res1)
res2<-flowClust(df2, varNames=c("FSC.H", "SSC.H"), K=1:6, B=100)
criterion(res2,"BIC")
[1] -1392214 -1378289 -1361787 -1354436 -1353362 -1350507
# BIC is largest at K = 6, so we keep the 6-cluster model
plot(res2[[6]], data=df2, level=0.8, z.cutoff=0)

Rule of identifying outliers: 80% quantile
# Now suppose we want to identify a particular cluster
# Let us identify the clusters 
ik1=which(res2[[6]]@label==1)
ik2=which(res2[[6]]@label==2)
ik3=which(res2[[6]]@label==3)
ik4=which(res2[[6]]@label==4)
ik5=which(res2[[6]]@label==5)
ik6=which(res2[[6]]@label==6)
attach(df2)
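The choice of six clusters can be read off the BIC values printed above. The sketch below copies those six numbers by hand and picks the K with the largest BIC. It assumes, as the slide's own choice implies, that flowClust reports BIC on a scale where larger is better. The names `bic` and `best_K` are illustrative.

```r
# BIC values for K = 1..6, copied from the criterion(res2, "BIC") output above
bic <- c(-1392214, -1378289, -1361787, -1354436, -1353362, -1350507)
K <- seq_along(bic)

# Pick the number of clusters with the largest BIC
best_K <- K[which.max(bic)]
best_K

# Gain in BIC from each additional cluster; small gains hint at diminishing returns
diff(bic)
```

Since the BIC is still rising at K = 6, a wider range of K could be worth trying before settling on a final model.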

Gating out a single cluster

Now suppose we want to gate out one cluster and view the other five.

plot(FSC.H,SSC.H,pch=16,cex=0.3,col="red")

points(FSC.H[ik4],SSC.H[ik4],pch=16,cex=0.3)

Gating the cell debris out

plot(FSC.H[-ik4],SSC.H[-ik4],pch=16,cex=0.3)

Flow peaks

flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics, 2012, 28(15):2052-8.

library(flowPeaks)
fp<-flowPeaks(asinh(ff@exprs[,c(1,2)]))
plot(fp) #an alternative of using summary(fp) 

The prescribed manual clustering

Origin of Doublets

A non-rocket-science approach

As seen in the elimination of doublets, the conventional clustering method of gating fails here, because this gate depends on whether the cells pass the laser as singlets or as doublets. The common-sense rule used instead is that events with a higher area-to-height ratio are more likely to be doublets. This holds for both forward and side scattering.
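The area-to-height rule can be sketched in a few lines of base R. The data below are simulated, and the cutoff (median plus three MADs of the ratio) is a hypothetical choice for illustration, not a threshold taken from the slides.

```r
# Simulated events: singlets have area roughly equal to height (ratio near 1),
# doublets have an inflated area for a similar height (illustrative values only)
set.seed(2)
n_single <- 900
n_double <- 100
height <- runif(n_single + n_double, min = 20000, max = 100000)
ratio_sim <- c(rnorm(n_single, mean = 1.0, sd = 0.03),
               rnorm(n_double, mean = 1.6, sd = 0.05))
ev <- data.frame(FSC.H = height, FSC.A = height * ratio_sim)

# Area-to-height ratio for every event
ev$ratio <- ev$FSC.A / ev$FSC.H

# Hypothetical cutoff: events far above the typical ratio are flagged as doublets
cutoff <- median(ev$ratio) + 3 * mad(ev$ratio)
ev$singlet <- ev$ratio <= cutoff

table(ev$singlet)
```

The same filter can be repeated on SSC.A and SSC.H, keeping only events that pass in both scatter channels.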

A simple combined algo

Effect of doublet filtering on the FSC-SSC clusters

Histogram gating (from easyGgplot2)

library(easyGgplot2)
# Let us construct data using three patients 
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff1 <- flowCore::read.FCS(fname)
df=data.frame(ff1@exprs[,1:6])
attach(df) 
ggplot2.histogram(FSC.H,xlab="Forward Scattering",fill="white", color="black",addDensityCurve=TRUE,addMeanLine=TRUE, densityFill='#FF6666')

Compensation

An example

In this hypothetical experiment, cells were stained with FITC CD3 and with a PE isotype control, and collected at different compensation values to correct for the FITC spillover into the PE channel (panel 1 is uncompensated; panels 2-5 show increasing compensation values).

The questions

- Which panel represents proper compensation?
- On what basis did you make this determination?

Two-color compensation

http://www.drmr.com/compensation/

Let’s consider an experiment where we stain human peripheral blood lymphocytes with

- FITC CD3 (stains CD4 and CD8 T cells)
- PE CD8 (stains CD8 T cells and, less brightly, NK cells).

Four populations

Quantitative Measures

Measured fluorescence for each cell

Mathematical formulae for two-color compensation

\[ FL1_{measured}=FITC_{true}+0.01\cdot PE_{true}\\ FL2_{measured}=0.15\cdot FITC_{true} +PE_{true} \]

Finding the true value of fluorescence

Let’s apply these equations to our measured values for CD8 T cells (FL1 = 104, FL2 = 415). With x1 = FITC true and x2 = PE true, we solve:

   1*x1 + 0.01*x2  =  104 
0.15*x1    + 1*x2  =  415 
     [,1]
[1,]  100
[2,]  400
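A minimal sketch of this calculation in base R, with the coefficients placed exactly as in the FL1 and FL2 equations (first row 1 and 0.01, second row 0.15 and 1). The names `coef_mat`, `measured` and `true_signal` are illustrative. With this placement the solution works out to FITC true = 100 and PE true = 400.

```r
# Coefficient matrix of the two equations: one row per equation (FL1, FL2),
# one column per unknown (FITC true, PE true)
coef_mat <- matrix(c(1.00, 0.01,
                     0.15, 1.00),
                   nrow = 2, byrow = TRUE)

# Measured values for the CD8 T cells
measured <- c(104, 415)

# Solve coef_mat %*% x = measured for x = (FITC true, PE true)
true_signal <- solve(coef_mat, measured)
round(true_signal, 2)
```

Swapping the two off-diagonal coefficients gives a different and incorrect answer, so it is worth checking the matrix against the equations row by row.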

Multicolor compensation

Multi-color compensation is a simple extension of two-color compensation:

\[ \begin{aligned} M_1 &= A_{11} F_1 + A_{21} F_2 + \dots + A_{n1} F_n \\ M_2 &= A_{12} F_1 + A_{22} F_2 + \dots + A_{n2} F_n \\ &\;\;\vdots \\ M_n &= A_{1n} F_1 + A_{2n} F_2 + \dots + A_{nn} F_n \end{aligned} \]

where \(M_i\) is the measured fluorescence in channel \(i\), \(F_i\) is the amount of fluorescent molecule \(i\) present on the cell of interest, and \(A_{ij}\) is the ratio of the fluorescence of molecule \(i\) in channel \(j\) to its fluorescence in channel \(i\) (so \(A_{ii} = 1\)).
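For many events at once, the same system can be written in matrix form and inverted in one step. The sketch below is an illustration under stated assumptions: the 3-dye spillover matrix and the event values are hypothetical, and each coefficient is taken as the signal of dye i in channel j relative to its own channel i, matching the two-color example (0.15 for FITC in FL2, 0.01 for PE in FL1).

```r
# Hypothetical 3-dye spillover matrix: row i = dye, column j = channel,
# each entry = signal of dye i in channel j relative to channel i (diagonal is 1)
A <- matrix(c(1.00, 0.15, 0.02,
              0.01, 1.00, 0.10,
              0.00, 0.03, 1.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(paste0("dye", 1:3), paste0("ch", 1:3)))

# Hypothetical true dye amounts for four events (rows = events, columns = dyes)
F_true <- matrix(c(100,   0,   0,
                   100, 400,   0,
                     0, 400, 250,
                    50,  50,  50),
                 ncol = 3, byrow = TRUE,
                 dimnames = list(NULL, rownames(A)))

# Forward model: M(j) = sum over i of A(ij) * F(i), i.e. M = F %*% A
M <- F_true %*% A

# Compensation: recover the dye amounts from the measured values
F_est <- M %*% solve(A)
round(F_est, 2)
```

The second simulated event reproduces the two-color case: true amounts of 100 and 400 give measured values of 104 and 415 in the first two channels.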

Immunophenotyping

A good link to start with: https://www.abcam.com/protocols/flow-cytometry-immunophenotyping

Basic Idea

FLOW CYTOMETRIC ANALYSIS OF LEUKEMIA - A case study

Flow cytometric characterization is more quantitative than microscopic detection.

The minimal approach

Immunophenotyping the leukemia cells

Blood cells are labeled with fluorescent antibodies at the time of sample preparation, and the sample tube is then placed in the flow cytometer.

Blast cells

The classic BLMG gates

Leukemia File CMC

      Time            FSC.A            FSC.H       
 Min.   :  53.6   Min.   : 10144   Min.   : 10024  
 1st Qu.: 565.5   1st Qu.: 60168   1st Qu.: 43537  
 Median :1085.2   Median : 79984   Median : 57508  
 Mean   :1083.5   Mean   : 93186   Mean   : 61014  
 3rd Qu.:1595.3   3rd Qu.:109668   3rd Qu.: 73634  
 Max.   :2111.9   Max.   :262143   Max.   :258709  
     SSC.A              SSC.H            FITC.A         
 Min.   :   155.4   Min.   :   156   Min.   :   -51.45  
 1st Qu.:  1407.0   1st Qu.:  1101   1st Qu.:   138.60  
 Median :  1939.3   Median :  1443   Median :   213.15  
 Mean   :  3243.5   Mean   :  1914   Mean   :   722.33  
 3rd Qu.:  2998.8   3rd Qu.:  1943   3rd Qu.:   348.60  
 Max.   :262143.0   Max.   :224332   Max.   :262143.00  
      PE.A          PerCP.Cy5.5.A         PE.Cy7.A       
 Min.   :   -81.9   Min.   :  -105.0   Min.   :   -90.3  
 1st Qu.:   444.1   1st Qu.:   536.5   1st Qu.:  1098.3  
 Median :   828.5   Median :  1029.0   Median :  2794.6  
 Mean   :  1237.3   Mean   :  1814.5   Mean   :  4506.2  
 3rd Qu.:  1395.5   3rd Qu.:  1857.5   3rd Qu.:  6019.6  
 Max.   :239286.6   Max.   :262143.0   Max.   :262143.0  
     APC.A              APC.H7.A            V450.A        
 Min.   :   -89.28   Min.   :  -224.1   Min.   :  -439.3  
 1st Qu.:    70.68   1st Qu.:   306.0   1st Qu.:   213.9  
 Median :   169.26   Median :   580.3   Median :   378.4  
 Mean   :  1781.22   Mean   :  1516.5   Mean   :  1291.5  
 3rd Qu.:   778.41   3rd Qu.:  1208.1   3rd Qu.:   670.5  
 Max.   :262143.00   Max.   :262143.0   Max.   :262143.0  
    V500c.A        
 Min.   :  -284.1  
 1st Qu.:   159.8  
 Median :   316.2  
 Mean   :  1187.2  
 3rd Qu.:   621.0  
 Max.   :262143.0  

CD45 (V500c.A), CD19 (PE.Cy7.A) and CD10 (APC.A)

[1] -988502.2 -986779.2 -986474.7 -986135.8

Rule of identifying outliers: 50% quantile
[1] -1141117 -1123068 -1119284 -1117204

Rule of identifying outliers: 50% quantile

Flow peaks

flowPeaks: a fast unsupervised clustering for flow cytometry data via K-means and density peak finding. Bioinformatics, 2012, 28(15):2052-8.

        step 0, set the intial seeds, tot.wss=1040.56
        step 1, do the rough EM, tot.wss=712.991 at 0.281295 sec
        step 2, do the fine transfer of Hartigan-Wong Algorithm
                 tot.wss=703.557 at 0.495917 sec

The Beauty of SOM

Lastly