A bit of history

In 1954, Wallace Coulter proposed an instrument that could electronically measure cells suspended in a conductive liquid. This formed the basis of the flow cytometer.
The first fluorescence-based cytometric device (ICP 11) was developed in 1968 by Wolfgang Göhde from the University of Münster, Germany.
Göhde’s major contribution to the field of flow cytometry was the creation of the flow chamber, which remains in use today.
The original name, pulse cytophotometry, was changed to flow cytometry in 1978, at …
The FACS instrument from Becton Dickinson emerged in 1974.
The Epics from Coulter came onto the market in 1977/78.
The future: CyTOF (mass cytometry).

The basics of basics

Fluorescence spectrometer	Flow cytometer
Molecules, nanoparticles (\(\le 500\) nm)	Cells (\(\mu\)m)
No statistical information	Population behavior
Single-molecule data unavailable	Single-cell data

Block Diagram Fluorescence

Excitation by light leads to scattering and emission.

Block Diagram Scattering

Block Diagram for a flow cytometer

The integration of cytometric platforms with data science

Instrument design: fluidics, optics, electronics
Synthesis of fluorescent probes, Qdots
Gaussian mixture models
Techniques such as:
- t-SNE (t-distributed stochastic neighbor embedding)
- Machine learning (supervised and K-free models)
- SPADE (spanning-tree progression analysis of density-normalized events)
- PCA (principal component analysis)
- FLOCK (FLOw Clustering without K)
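Of the techniques listed, PCA is the easiest to show in a few lines. The sketch below is an illustration under stated assumptions, not part of the original analysis: it simulates two hypothetical cell populations in four channels (the channel names, population means and the arcsinh cofactor of 150 are all illustrative choices), applies an arcsinh transform and runs base R's prcomp().

```r
set.seed(11)
n   <- 500
grp <- rep(1:2, each = n / 2)                 # two hypothetical populations
raw <- cbind(FSC.H = rnorm(n, c(5e4, 9e4)[grp], 8e3),
             SSC.H = rnorm(n, c(2e3, 6e3)[grp], 8e2),
             CD3   = rnorm(n, c(200, 3000)[grp], 300),
             CD19  = rnorm(n, c(2500, 150)[grp], 300))

# arcsinh transform with a hypothetical cofactor of 150
x <- asinh(raw / 150)

# PCA on centred and scaled channels
pc <- prcomp(x, center = TRUE, scale. = TRUE)
summary(pc)$importance["Proportion of Variance", ]

# The two simulated populations separate along PC1
plot(pc$x[, 1], pc$x[, 2], col = grp, pch = 16, cex = 0.4,
     xlab = "PC1", ylab = "PC2")
```

Scaling matters here because scatter and fluorescence channels live on very different ranges; without scale. = TRUE the largest-valued channel would dominate PC1.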

Commercial software and Data

Software packages: FlowJo, FCS Express

Public databases

Data repository

https://flowrepository.org/public_experiment_representations?top=10

https://www.cytobank.org/

Data mining in Flow Cytometry

The R platform has been a pioneer in computational flow cytometry.
The most widely accepted data format on flow cytometry platforms is .FCS (Flow Cytometry Standard).
In R, .FCS files are read as S4 objects. Here is a simple R command to read a flow cytometry file:

R command to read an FCS file

library(flowCore)                 # Bioconductor package for FCS input/output
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff <- read.FCS(fname)             # a flowFrame, i.e. an S4 object
typeof(ff)

[1] "S4"

What is S4?

The object returned above is of S4 type.
An S4 class definition is quite a bit different from a class definition in other programming languages such as C# and Python.
An S4 class encapsulates data fields inside a setClass function, but doesn’t encapsulate class methods.

Instead, S4 class methods are defined by pairs of special R functions named setMethod and setGeneric.
https://advanced-r-solutions.rbind.io/s4.html
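A toy example may make the split between setClass, setGeneric and setMethod concrete. The class MiniFrame and the generic nEvents are hypothetical names invented for illustration; they only loosely mimic how flowCore keeps the event matrix in the exprs slot of a flowFrame.

```r
library(methods)

# Data fields (slots) are declared in setClass ...
setClass("MiniFrame", slots = c(exprs = "matrix", description = "list"))

# ... but behaviour lives outside the class: a generic, then a method for it
setGeneric("nEvents", function(object) standardGeneric("nEvents"))
setMethod("nEvents", "MiniFrame", function(object) nrow(object@exprs))

set.seed(1)
m  <- matrix(runif(20), ncol = 2, dimnames = list(NULL, c("FSC.H", "SSC.H")))
mf <- new("MiniFrame", exprs = m, description = list(source = "simulated"))

typeof(mf)    # "S4"
nEvents(mf)   # 10 events (20 values in 2 columns)
mf@exprs[1:3, ]   # slots are accessed with @, as in ff@exprs
```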

S4 to data frames

 [1] "FSC.A"             "FSC.H"             "FSC.W"            
 [4] "SSC.A"             "SSC.H"             "SSC.W"            
 [7] "Alexa.Fluor.488.A" "PE.A"              "APC.A"            
[10] "PerCP.A"           "APC.Cy7.A"         "Time"
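The dotted names above are what one would expect after converting the event matrix to a data frame, as is done later with data.frame(ff@exprs[,1:6]): data.frame() runs make.names() on the column names, turning spaces and hyphens into dots. A minimal sketch, assuming the raw FCS channel names used hyphens and spaces (e.g. "FSC-A", "Alexa Fluor 488-A"); the matrix is simulated so the example runs without a file.

```r
# Hypothetical raw channel names, as an FCS file might store them
raw_names <- c("FSC-A", "SSC-A", "Alexa Fluor 488-A", "APC-Cy7-A", "Time")

# Simulated event matrix standing in for ff@exprs (5 events)
set.seed(1)
m <- matrix(abs(rnorm(5 * length(raw_names), 5e4, 1e4)),
            ncol = length(raw_names),
            dimnames = list(NULL, raw_names))

# data.frame() applies make.names(): '-' and ' ' become '.'
df <- data.frame(m)
names(df)
# "FSC.A" "SSC.A" "Alexa.Fluor.488.A" "APC.Cy7.A" "Time"
```

Passing check.names = FALSE to data.frame() keeps the raw names, at the cost of needing backticks to refer to them as variables.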

Two important computational problems specific to flow cytometry

Gating
Compensation

A typical Flow plot for unlabelled samples

Typical target objectives in Flow analysis

Immunophenotyping
Cell cycle analysis
Cell counting and sorting
Biomarker detection
Chemotherapy drug assessment
Stress (ROS) analysis

Typical analyte: PBMC (peripheral blood mononuclear cells)

Automated gating with flowClust (Box-Cox)

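# Use flowClust to remove outliers, then cluster FSC.H vs SSC.H
library(flowClust)
fname <- './Compensation Controls/Patient_17_Unstained.fcs'
ff <- flowCore::read.FCS(fname)
df <- data.frame(ff@exprs[, 1:6])          # the six scatter channels (FSC/SSC A, H, W)
# K = 1: a single mixture component, used to flag outliers
res1 <- flowClust(df, varNames = c("FSC.H", "SSC.H"), K = 1)
df2 <- Subset(df, res1)                    # drop the outliers
# Fit K = 1..6 clusters (B = maximum EM iterations) and compare BIC
res2 <- flowClust(df2, varNames = c("FSC.H", "SSC.H"), K = 1:6, B = 100)
criterion(res2, "BIC")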

[1] -1392214 -1378290 -1361787 -1354436 -1353362 -1350507
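flowClust reports BIC on a scale where larger (less negative) is better, so the number of clusters is chosen by taking the maximum. A quick check in R using the six values printed above:

```r
# BIC values printed by criterion(res2, "BIC") for K = 1..6
bic <- c(-1392214, -1378290, -1361787, -1354436, -1353362, -1350507)
K   <- 1:6

best_K <- K[which.max(bic)]   # larger BIC is better
best_K                        # 6

diff(bic)                     # 13924 16503 7351 1074 2855
```

The gain per extra cluster shrinks sharply beyond K = 4, and BIC is still rising at K = 6, so K = 6 is the best choice within the range tried rather than a confirmed optimum.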

# BIC is largest at K = 6, so plot the 6-cluster model
plot(res2[[6]], data=df2, level=0.8, z.cutoff=0)

Rule for identifying outliers: 80% quantile

# Indices of the events assigned to each of the 6 clusters
ik1=which(res2[[6]]@label==1)
ik2=which(res2[[6]]@label==2)
ik3=which(res2[[6]]@label==3)
ik4=which(res2[[6]]@label==4)
ik5=which(res2[[6]]@label==5)
ik6=which(res2[[6]]@label==6)
attach(df2)

Gating out a single cluster

Now suppose we want to gate out one cluster and view the other five.

plot(FSC.H,SSC.H,pch=16,cex=0.3,col="red")

points(FSC.H[ik4],SSC.H[ik4],pch=16,cex=0.3)

Gating the cell debris out

plot(FSC.H[-ik4],SSC.H[-ik4],pch=16,cex=0.3)

flowPeaks

flowPeaks is a fast unsupervised clustering method for flow cytometry data via K-means and density peak finding (Bioinformatics, 2012, 28(15):2052-8).

library(flowPeaks)
fp<-flowPeaks(asinh(ff@exprs[,c(1,2)]))
plot(fp)   # graphical alternative to summary(fp)
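The two ideas in the flowPeaks title, over-clustering with K-means and then merging clusters that share a density peak, can be illustrated in base R. This is a simplified sketch, not the flowPeaks implementation: the bandwidth h, the neighbour radius r, K = 20 and the simulated data are all hypothetical choices, and the density is a plain weighted Gaussian kernel over the K-means centers.

```r
set.seed(42)
# Two simulated, well-separated 2-D populations (hypothetical data)
x <- rbind(matrix(rnorm(400, mean = 0, sd = 1), ncol = 2),
           matrix(rnorm(400, mean = 8, sd = 1), ncol = 2))

# Step 1: over-cluster with K-means (many more centers than populations)
km  <- kmeans(x, centers = 20, nstart = 5, iter.max = 50)
ctr <- km$centers
w   <- km$size / sum(km$size)

# Step 2: smoothed density at each center (weighted Gaussian kernel)
h    <- 1.5                                     # hypothetical bandwidth
d    <- as.matrix(dist(ctr))
dens <- as.vector(exp(-d^2 / (2 * h^2)) %*% w)

# Step 3: each center points to its densest neighbour within radius r,
# if that neighbour is denser than itself (one hill-climbing step)
r      <- 2 * h
parent <- seq_len(nrow(ctr))
for (i in seq_len(nrow(ctr))) {
  nb <- which(d[i, ] <= r & dens > dens[i])
  if (length(nb) > 0) parent[i] <- nb[which.max(dens[nb])]
}

# Step 4: follow pointers uphill; centers reaching the same peak merge
find_peak <- function(i) {
  while (parent[i] != i) i <- parent[i]
  i
}
peak   <- vapply(seq_along(parent), find_peak, integer(1))
labels <- match(peak, unique(peak))[km$cluster]
table(labels)
```

Because each step moves strictly uphill in density, the pointer chains cannot cycle, and the number of final clusters is decided by the data rather than fixed in advance.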

The prescribed manual clustering

Origin of Doublets

A doublet is a single event that actually consists of 2 independent particles.
The cytometer classified these particles as a single event because they passed through the interrogation point very close to one another.
In other words, the particles were so close together when they passed through the laser spot that the instrument could not distinguish them as individual events or particles.

A non-rocket-science approach

When eliminating doublets, conventional clustering-based gating fails, because the gate has to separate singlets from doublets rather than distinct cell populations. The common-sense rule used here is that events with a higher area-to-height ratio are more likely to be doublets. This holds for both forward and side scatter.

A simple combined algorithm

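X = log(FSC.A) and Y = log(FSC.H)
X1 = log(SSC.A) and Y1 = log(SSC.H)
Fit Y ~ X and Y1 ~ X1 by linear regression to find the slopes and intercepts (m, c) and (m1, c1).
Keep the events ik for which (Y > mX + c) and (Y1 > m1X1 + c1) are simultaneously satisfied; these have a relatively low area-to-height ratio and are treated as singlets.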
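The steps above can be tried on simulated data. This is a minimal sketch under stated assumptions: the event counts, the log-normal heights and the area/height ratios (1.3 for singlets, 2.6 for doublets) are hypothetical values chosen only so that a doublet has about twice the area of a singlet at similar height.

```r
set.seed(7)
n_s <- 900; n_d <- 100
is_doublet <- rep(c(FALSE, TRUE), c(n_s, n_d))
ratio <- ifelse(is_doublet, 2.6, 1.3)        # hypothetical area/height ratios

# Simulated heights and areas for forward and side scatter
FSC.H <- rlnorm(n_s + n_d, meanlog = 10.5, sdlog = 0.8)
SSC.H <- rlnorm(n_s + n_d, meanlog = 7.5,  sdlog = 0.8)
FSC.A <- FSC.H * ratio * exp(rnorm(n_s + n_d, 0, 0.05))
SSC.A <- SSC.H * ratio * exp(rnorm(n_s + n_d, 0, 0.05))

# Step 1: log area (X) and log height (Y) for both scatter channels
X  <- log(FSC.A); Y  <- log(FSC.H)
X1 <- log(SSC.A); Y1 <- log(SSC.H)

# Step 2: slopes and intercepts of the fits Y ~ X and Y1 ~ X1
cf  <- unname(coef(lm(Y ~ X)));   c0 <- cf[1];  m  <- cf[2]
cf1 <- unname(coef(lm(Y1 ~ X1))); c1 <- cf1[1]; m1 <- cf1[2]

# Step 3: keep events above BOTH lines (low area/height ratio = singlets)
ik <- which(Y > m * X + c0 & Y1 > m1 * X1 + c1)
c(kept = length(ik), doublets_kept = sum(is_doublet[ik]))
```

Because the doublets pull the fitted lines downward, the cut removes them cleanly in this simulation, but some singlets also fall below the lines and are discarded; that trade-off is inherent to this simple rule.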

Effect of doublet filtering on the FSC-SSC clusters

Compensation

All fluorochromes have excitation and emission spectra. The excitation spectrum is a range of light wavelengths that add energy to a fluorochrome, causing it to emit light in another range of wavelengths, the emission spectrum.
Within a flow cytometer, the appropriate ranges of excitation and emission wavelengths are selected by bandpass filters.
However, when emission spectra overlap, fluorescence from more than one fluorochrome may be detected.
To correct for this spectral overlap, a process of fluorescence compensation is used. This ensures that the fluorescence detected in a particular detector derives from the fluorochrome that is being measured.
In the example below, following excitation with 488 nm light, PE emission is largely detected in the detector specific for PE but the emission tail lies within the range of the bandpass filter used for detection of PE-Cy5.
This will be seen as “false positive” signals in the PE-Cy5 channel and fluorescence compensation is needed to correct for this overlap.

An example

In this hypothetical experiment, cells were stained with FITC CD3 and with a PE isotype control, and collected at different compensation values to correct for the FITC spillover into the PE channel (panel 1 is uncompensated; panels 2-5 have increasing compensation values).

The questions
- Which panel represents proper compensation?
- On what basis did you make this determination?

Two-color compensation

http://www.drmr.com/compensation/

Let’s consider an experiment where we stain human peripheral blood lymphocytes with:
- FITC CD3, which stains CD4 and CD8 T cells
- PE CD8, which stains CD8 T cells and, less brightly, NK cells.

Four populations

We should have four major populations of cells:
- B cells, which will be unstained
- CD4 T cells, which will have only fluorescein (CD3) fluorescence
- CD8 T cells, which will have both fluorescein and PE fluorescence
- NK cells, which will only have PE fluorescence (they do not express CD3).

Quantitative measures

Assume CD3 fluorescence (FITC) on T cells gives 100 units of true fluorescence (i.e., the amount of fluorescence in FL1 when no compensation is set and no PE had been included in the stain);
CD8 fluorescence on CD8 T cells is 400 units.
CD8 on NK is 50 units.

Measured fluorescence for each cell

Let us assume that the spillover constants, measured from the compensation samples, are 15% (FITC signal appearing in FL2) and 1% (PE signal appearing in FL1).
For B cells, it will be zero in both FL1 and FL2 (remember, we are ignoring autofluorescence for the time being).
CD4 T cells will have 100 in FL1, and 15 in FL2 (i.e., 15% of the 100 units of fluorescein signal will appear in FL2).
NK cells will have 50 units in FL2, and 0.5 units in FL1 (i.e., 1% of the 50 units of PE signal).
CD8 T cells will have 104 units in FL1 (100 units of fluorescein CD3 signal plus 1% of the 400 units of PE CD8 signal), and 415 units FL2 (400 units of PE signal plus 15% of the 100 units of fluorescein signal).

Mathematical formulae for two-color compensation

\[ FL1_{measured}=FITC_{true}+0.01\cdot PE_{true}\\ FL2_{measured}=0.15\cdot FITC_{true} +PE_{true} \]

Finding the true value of fluorescence

Let’s apply these equations to our measured values for CD8 T cells (FL1 = 104, FL2 = 415). We may find \(PE_{true}\) and \(FITC_{true}\) by solving the two equations simultaneously.
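Solving the two equations for the CD8 T cells gives \(FITC_{true} = (104 - 0.01\cdot 415)/(1 - 0.01\cdot 0.15) = 99.85/0.9985 = 100\) and \(PE_{true} = (415 - 0.15\cdot 104)/(1 - 0.01\cdot 0.15) = 399.4/0.9985 = 400\), recovering the assumed true values. The same inversion can be done for all four populations at once with base R's solve(); the matrix layout (rows FL1/FL2, columns FITC/PE) is an illustrative choice that mirrors the equations above.

```r
# Spillover matrix from the equations: rows = detectors, columns = dyes
S <- matrix(c(1,    0.01,    # FL1 = 1.00*FITC + 0.01*PE
              0.15, 1),      # FL2 = 0.15*FITC + 1.00*PE
            nrow = 2, byrow = TRUE,
            dimnames = list(c("FL1", "FL2"), c("FITC", "PE")))

# Measured (FL1, FL2) for each population, taken from the text
measured <- cbind(B_cells = c(0, 0),
                  CD4_T   = c(100, 15),
                  NK      = c(0.5, 50),
                  CD8_T   = c(104, 415))

# Compensation = solving S %*% true = measured for the true signals
true_fl <- solve(S, measured)
round(true_fl, 6)
```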

Multicolor compensation

Multi-color compensation is a simple extension of two-color compensation:

M(1) = A(11) x F(1) + A(21) x F(2) + … + A(n1) x F(n)
M(2) = A(12) x F(1) + A(22) x F(2) + … + A(n2) x F(n)
…
M(n) = A(1n) x F(1) + A(2n) x F(2) + … + A(nn) x F(n)

where M(i) is the measured fluorescence in channel i; F(i) is the amount of fluorescent molecule (i) present on the cell of interest; and A(ij) is the ratio of the fluorescence of molecule (i) in channel (j) to the fluorescence of molecule (i) in channel (i). In the two-color example, A(12) = 0.15 and A(21) = 0.01.
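Writing each event as a row vector, the system M(j) = Σ A(ij) F(i) is M = F A, so compensation is F = M A⁻¹. A minimal sketch with a hypothetical three-color spillover matrix and simulated true signals; in practice flowCore's compensate() applies a spillover matrix to a flowFrame, and the numbers below are illustrative only.

```r
# Hypothetical spillover matrix: A[i, j] = signal of dye i in channel j,
# relative to its signal in its own channel i (so the diagonal is 1)
A <- matrix(c(1.00, 0.15, 0.02,
              0.01, 1.00, 0.10,
              0.00, 0.05, 1.00),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("FITC", "PE", "PE-Cy5"), c("FL1", "FL2", "FL3")))

# Simulated true signals F for 5 events (rows = events, columns = dyes)
set.seed(3)
F_true <- matrix(round(runif(15, 0, 500)), nrow = 5,
                 dimnames = list(NULL, rownames(A)))

# Forward model: M(j) = sum_i A(ij) F(i), i.e. M = F %*% A for row vectors
M <- F_true %*% A

# Compensation inverts the forward model
F_hat <- M %*% solve(A)
all.equal(unname(F_hat), unname(F_true))   # TRUE
```

The two-color case is the special case A = [[1, 0.15], [0.01, 1]]; with more colors only the size of the matrix changes.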

Immunophenotyping

A good link to start with https://www.abcam.com/protocols/flow-cytometry-immunophenotyping

Basic Idea

Antibodies are used to identify cells by detecting specific antigens (markers) expressed by these cells.
These markers are usually functional membrane proteins involved in cell communication, adhesion, or metabolism.
Immunophenotyping using flow cytometry has become the method of choice in identifying and sorting cells within complex populations, for example the analysis of immune cells in a blood sample.

FLOW CYTOMETRIC ANALYSIS OF LEUKEMIA - A case study

How is leukemia defined?

Leukemia is a group of blood cancers that usually begin in the bone marrow and result in high numbers of abnormal blood cells. They are considered tumors of the hematopoietic and lymphoid tissues.
Symptoms may include bleeding and bruising, fatigue, fever, and an increased risk of infections.
Diagnosis is typically made by blood tests or bone marrow biopsy, and by flow cytometry.
There are four main types of leukemia:
- acute lymphoblastic leukemia (ALL)
- acute myeloid leukemia (AML)
- chronic lymphocytic leukemia (CLL)
- chronic myeloid leukemia (CML).[10][11]

Diagnostic method

During sample preparation, blood cells are stained with fluorescent antibodies for immunophenotyping, and the sample tube is then placed in the flow cytometer.

Microscopy-based assessment

Flow-based guess with unlabelled samples

Algorithmic problem
- Identifying the clusters with cell types
- Choosing between the manual and computational methods (doctors like the first!)

Gating the Blast

Immature (not fully developed) blood cells are called blasts. Gating the blasts is nontrivial.

Probes

Construct an SSC-H vs CD45 plot to gate the blast population.
Next, construct a CD19 vs CD10 scatterplot. CD19 is a B cell-specific antigen expressed on chronic lymphocytic leukemia (CLL) cells; the human CD10 antigen is present in common acute lymphoblastic leukemia as a cancer-specific antigen.
From these plots, the presence and type of leukemia, and the efficacy of chemotherapy over time, can be determined.

The classic BLMG gates

The antibodies

A real leukemia file

      Time            FSC.A            FSC.H       
 Min.   :  53.6   Min.   : 10144   Min.   : 10024  
 1st Qu.: 565.5   1st Qu.: 60168   1st Qu.: 43537  
 Median :1085.2   Median : 79984   Median : 57508  
 Mean   :1083.5   Mean   : 93186   Mean   : 61014  
 3rd Qu.:1595.3   3rd Qu.:109668   3rd Qu.: 73634  
 Max.   :2111.9   Max.   :262143   Max.   :258709  
     SSC.A              SSC.H            FITC.A         
 Min.   :   155.4   Min.   :   156   Min.   :   -51.45  
 1st Qu.:  1407.0   1st Qu.:  1101   1st Qu.:   138.60  
 Median :  1939.3   Median :  1443   Median :   213.15  
 Mean   :  3243.5   Mean   :  1914   Mean   :   722.33  
 3rd Qu.:  2998.8   3rd Qu.:  1943   3rd Qu.:   348.60  
 Max.   :262143.0   Max.   :224332   Max.   :262143.00  
      PE.A          PerCP.Cy5.5.A         PE.Cy7.A       
 Min.   :   -81.9   Min.   :  -105.0   Min.   :   -90.3  
 1st Qu.:   444.1   1st Qu.:   536.5   1st Qu.:  1098.3  
 Median :   828.5   Median :  1029.0   Median :  2794.6  
 Mean   :  1237.3   Mean   :  1814.5   Mean   :  4506.2  
 3rd Qu.:  1395.5   3rd Qu.:  1857.5   3rd Qu.:  6019.6  
 Max.   :239286.6   Max.   :262143.0   Max.   :262143.0  
     APC.A              APC.H7.A            V450.A        
 Min.   :   -89.28   Min.   :  -224.1   Min.   :  -439.3  
 1st Qu.:    70.68   1st Qu.:   306.0   1st Qu.:   213.9  
 Median :   169.26   Median :   580.3   Median :   378.4  
 Mean   :  1781.22   Mean   :  1516.5   Mean   :  1291.5  
 3rd Qu.:   778.41   3rd Qu.:  1208.1   3rd Qu.:   670.5  
 Max.   :262143.00   Max.   :262143.0   Max.   :262143.0  
    V500c.A        
 Min.   :  -284.1  
 1st Qu.:   159.8  
 Median :   316.2  
 Mean   :  1187.2  
 3rd Qu.:   621.0  
 Max.   :262143.0

CD45 (V500c.A), CD19 (PE.Cy7.A) and CD10 (APC.A)

attach(dfL)
X=log(FSC.A)
Y=log(FSC.H)
X1=log(SSC.A)
Y1=log(SSC.H)
g=lm(Y~X)
g1=lm(Y1~X1)
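# keep events above both fitted lines (FSC and SSC), i.e. the singlets
ik=which( Y >g$coefficients[2]*X+g$coefficients[1] &
          Y1>g1$coefficients[2]*X1+g1$coefficients[1])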
par(mfrow=c(2,2))
plot(X,Y,pch=16,cex=0.2,col="blue",xlab='log(AREA)',ylab='log(HEIGHT)',main="FWD")
points(X[ik],Y[ik],pch=16,cex=0.2,col="brown")
fsc=FSC.H[ik]
ssc=SSC.H[ik]
fcd45=V500c.A[ik]
fcd19=PE.Cy7.A[ik]
fcd10=APC.A[ik]

[1] -988502.2 -986779.2 -986474.7 -986135.8

Rule for identifying outliers: 50% quantile

res1 <- flowClust(dfL,
varNames=c("APC.A","PE.Cy7.A"), K=1) 
df2=Subset(dfL,res1)
res2<-flowClust(df2, varNames=c("APC.A","PE.Cy7.A"), K=1:4, B=100)
criterion(res2,"BIC")

[1] -1141117 -1123068 -1119284 -1117204

# BIC is largest at K = 4, so plot the 4-cluster model
plot(res2[[4]], data=df2, level=0.5, z.cutoff=0)

Rule for identifying outliers: 50% quantile


flowPeaks

flowPeaks is a fast unsupervised clustering method for flow cytometry data via K-means and density peak finding (Bioinformatics, 2012, 28(15):2052-8).

library(flowPeaks)
# columns 13 and 4 of this file are V500c.A (CD45) and SSC.A
fp<-flowPeaks(asinh(ffL@exprs[,c(13,4)]))

        step 0, set the intial seeds, tot.wss=1040.56
        step 1, do the rough EM, tot.wss=712.991 at 0.269617 sec
        step 2, do the fine transfer of Hartigan-Wong Algorithm
                 tot.wss=703.557 at 0.478351 sec

The Beauty of SOM

Some other important applications

Exploring cell death: https://rpubs.com/ANJANKRDASGUPTA/706527
Use of flow cytometry in ROS detection: https://rpubs.com/ANJANKRDASGUPTA/707350

https://rpubs.com/ANJANKRDASGUPTA/706017

Take home lessons

Flow cytometry is a powerful tool for complex diagnostic problems.
It poses many technological, computational and data science challenges in solving complex biomedical and cell biology problems.
New methodologies will demand computational platforms even more, as commercial flow cytometry software such as FlowJo may not handle the challenges that may emerge from newer techniques like CyTOF (mass cytometry, which takes advantage of the low spectral overlap between heavy metal isotopes) or microfluidics-based flow cytometry.
Multidimensional gating and compensation in multicolor flow cytometry are computationally challenging problems.

Acknowledgements

Prof. Bhaswati Ganguly (CU), who introduced me to R
Dr. Arnab Chattopadhyay, who introduced me to immunophenotyping
Prof. Senroy (CU), Dr. Jyoti Shaw and Dr. Moumita Chattapadhya (AU), for initiating some work on flow cytometry based clustering.

Applications of data science to flow cytometry

Lastly