Flow cytometry for doctors
Fluorescence Spectrometer | Flow Cytometer |
---|---|
Molecules, nanoparticles (\(\le 500\) nm) | Cells (\(\mu m\)) |
No statistical information | Population behavior |
Single-molecule data unavailable | Single-cell data |
Software packages:
- FlowJo
- FCS Express
Public database
Data repository: https://flowrepository.org/public_experiment_representations?top=10
library(flowCore)
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff <- read.FCS(fname)
typeof(ff)
[1] "S4"
The object returned above is an S4 object.
An S4 class definition is quite different from a class definition in other programming languages such as C# and Python.
An S4 class declares its data fields (slots) in a call to setClass, but it does not encapsulate the class methods.
Instead, S4 class methods are defined by pairs of special R functions named setMethod and setGeneric.
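The split between data and methods can be seen in a minimal sketch. The class `ToyEvent` and the generic `scatterRatio` are hypothetical names invented for this illustration; they are not part of flowCore.

```r
library(methods)  # S4 machinery

# The data fields (slots) are declared in setClass().
setClass("ToyEvent", slots = c(fsc = "numeric", ssc = "numeric"))

# setGeneric() declares the function name and its dispatch argument.
setGeneric("scatterRatio", function(object) standardGeneric("scatterRatio"))

# setMethod() attaches an implementation for one particular class.
setMethod("scatterRatio", "ToyEvent", function(object) object@ssc / object@fsc)

ev <- new("ToyEvent", fsc = 200, ssc = 50)
typeof(ev)        # "S4", as for the flowFrame above
scatterRatio(ev)  # 0.25
```

Slots are read with `@`, which is why the flowClust code in these notes reaches the raw event matrix as `ff@exprs`.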
The channel (column) names in this file are:
[1] "FSC.A" "FSC.H" "FSC.W"
[4] "SSC.A" "SSC.H" "SSC.W"
[7] "Alexa.Fluor.488.A" "PE.A" "APC.A"
[10] "PerCP.A" "APC.Cy7.A" "Time"
# Using the flowClust code to remove outliers
library(flowClust)
fname='./Compensation Controls/Patient_17_Unstained.fcs'
ff <- flowCore::read.FCS(fname)
df=data.frame(ff@exprs[,1:6])
res1 <- flowClust(df,
varNames=c("FSC.H", "SSC.H"), K=1)
df2=Subset(df,res1)
res2<-flowClust(df2, varNames=c("FSC.H", "SSC.H"), K=1:6, B=100)
criterion(res2,"BIC")
[1] -1392214 -1378290 -1361787 -1354436 -1353362 -1350507
# BIC is largest at K = 6, so six clusters are used
plot(res2[[6]], data=df2, level=0.8, z.cutoff=0)
Rule of identifying outliers: 80% quantile
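The choice of K can also be read off the BIC values programmatically. The sketch below copies the numbers printed by `criterion(res2, "BIC")` above and assumes that a larger BIC is better, which is consistent with the choice of six clusters here.

```r
# BIC values copied from the criterion() output above, for K = 1..6.
bic <- c(-1392214, -1378290, -1361787, -1354436, -1353362, -1350507)

# Assumption: larger BIC means a better model.
best_k <- which.max(bic)
best_k     # 6

# Gain in BIC for each additional cluster; small gains suggest a plateau.
diff(bic)
```

Because the BIC is still rising at K = 6, the largest K tried, it may be worth checking a wider range of K before settling on six.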
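Now suppose we want to gate out one cluster and view the other five.
# ik4 is assumed to hold the indices of the events in the gated-out cluster; its definition is not shown
plot(FSC.H[-ik4],SSC.H[-ik4],pch=16,cex=0.3)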
flowPeaks is a fast unsupervised clustering method for flow cytometry data via K-means and density peak finding (Bioinformatics, 2012, 28(15):2052-8).
library(flowPeaks)
fp<-flowPeaks(asinh(ff@exprs[,c(1,2)]))
plot(fp) # an alternative to summary(fp)
A doublet is a single event that actually consists of 2 independent particles.
The cytometer classified these particles as a single event because they passed through the interrogation point very close to one another.
In other words, the particles were so close together when they passed through this laser spot, that the instrument was incapable of distinguishing them as individual events or particles.
When doublets have to be eliminated, conventional clustering methods for gating fail, because the gate here depends on whether the cells pass as singlets or doublets and not on the cell type. The working rule is that events with a higher area-to-height ratio are more likely to be doublets. This holds for both forward and side scatter.
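The area-to-height rule can be sketched on simulated data. Everything below is an assumption made for illustration: the pulse heights, the ratios of about 1.3 for singlets and 2.2 for doublets, and the cutoff of median plus four robust standard deviations. None of these numbers are measured values.

```r
set.seed(1)
n_single <- 950
n_double <- 50

# Simulated pulse heights and area/height ratios (arbitrary units).
height <- c(rnorm(n_single, 50000, 5000), rnorm(n_double, 52000, 5000))
true_ratio <- c(rnorm(n_single, 1.3, 0.03), rnorm(n_double, 2.2, 0.10))
area <- height * true_ratio

# Flag events whose area/height ratio is far above the bulk of the data.
ratio <- area / height
cutoff <- median(ratio) + 4 * mad(ratio)
is_doublet <- ratio > cutoff

table(is_doublet)
singlets <- data.frame(area = area, height = height)[!is_doublet, ]
```

The median and MAD are used because they are barely affected by the doublets themselves, so the cutoff is set by the singlet population.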
# An example
In this hypothetical experiment, cells were stained with FITC CD3 and with a PE isotype control, and collected at different compensation values to correct for the FITC spillover into the PE channel (panel 1 is uncompensated; panels 2-5 are for increasing compensation values).
## The questions
- Which panel represents proper compensation?
- On what basis did you make this determination?
http://www.drmr.com/compensation/
Let’s consider an experiment where we stain human peripheral blood lymphocytes with:
- FITC CD3 \(\xrightarrow{\text{stains}}\) CD4 and CD8 T cells
- PE CD8 \(\xrightarrow{\text{stains}}\) CD8 T cells and, less brightly, NK cells
Let us assume that the spillover constants, measured from the compensation samples, are 15% (fluorescein signal appearing in FL2) and 1% (PE signal appearing in FL1).
For B cells, it will be zero in both FL1 and FL2 (remember, we are ignoring autofluorescence for the time being).
CD4 T cells will have 100 in FL1, and 15 in FL2 (i.e., 15% of the 100 units of fluorescein signal will appear in FL2).
NK cells will have 50 units in FL2, and 0.5 units in FL1 (i.e., 1% of the 50 units of PE signal).
CD8 T cells will have 104 units in FL1 (100 units of fluorescein CD3 signal plus 1% of the 400 units of PE CD8 signal), and 415 units FL2 (400 units of PE signal plus 15% of the 100 units of fluorescein signal).
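The four populations above can be reproduced numerically. This sketch uses only the true signals and spillover constants quoted in the text; the object names are chosen for illustration.

```r
# True signal per population (arbitrary units), taken from the text above.
true_signal <- rbind(
  B_cell = c(FITC = 0,   PE = 0),
  CD4_T  = c(FITC = 100, PE = 0),
  NK     = c(FITC = 0,   PE = 50),
  CD8_T  = c(FITC = 100, PE = 400)
)

# Spillover: rows are fluorochromes, columns are detectors.
spill <- rbind(
  FITC = c(FL1 = 1,    FL2 = 0.15),  # 15% of fluorescein appears in FL2
  PE   = c(FL1 = 0.01, FL2 = 1)      # 1% of PE appears in FL1
)

# Each detector sums the contributions of both fluorochromes.
measured <- true_signal %*% spill
measured
```

The CD8 T cell row comes out as FL1 = 104 and FL2 = 415, and the NK row as FL1 = 0.5 and FL2 = 50, matching the values worked out by hand.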
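\[ \begin{aligned} FL1_{measured} &= FITC_{true} + 0.01\cdot PE_{true}\\ FL2_{measured} &= 0.15\cdot FITC_{true} + PE_{true} \end{aligned} \]
## Finding the true value of fluorescence
Let’s apply these equations to our measured values for CD8 T cells (FL1 = 104, FL2 = 415). We may find \(PE_{true}\) and \(FITC_{true}\) by substituting these values for the measured signals and solving the two equations simultaneously.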
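As a minimal sketch, the two equations can be solved as a 2 x 2 linear system with base R's `solve()`. The matrix `A` simply holds the coefficients of the equations above; the variable names are illustrative.

```r
# Coefficients of the two equations: rows are detectors, columns are dyes.
A <- rbind(
  FL1 = c(FITC = 1,    PE = 0.01),
  FL2 = c(FITC = 0.15, PE = 1)
)

# Measured values for CD8 T cells.
measured_cd8 <- c(FL1 = 104, FL2 = 415)

# Solve A %*% true = measured for the true signals.
true_cd8 <- solve(A, measured_cd8)
true_cd8   # FITC = 100, PE = 400 (up to floating-point rounding)
```

This recovers the 100 units of fluorescein and 400 units of PE that were assumed for CD8 T cells.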
Multi-color compensation is a simple extension of two-color compensation:
\[ \begin{aligned} M_1 &= A_{11}F_1 + A_{21}F_2 + \dots + A_{n1}F_n\\ M_2 &= A_{12}F_1 + A_{22}F_2 + \dots + A_{n2}F_n\\ &\;\;\vdots\\ M_n &= A_{1n}F_1 + A_{2n}F_2 + \dots + A_{nn}F_n \end{aligned} \]
where \(M_i\) is the measured fluorescence in channel \(i\), \(F_i\) is the amount of fluorescent molecule \(i\) present on the cell of interest, and \(A_{ij}\) is the ratio of the fluorescence of molecule \(i\) in channel \(j\) to its fluorescence in its own channel \(i\) (so \(A_{ii} = 1\), and in the two-color example \(A_{12} = 0.15\) and \(A_{21} = 0.01\)).
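In matrix form the system reads M = t(A) F, so compensation amounts to solving a linear system. The sketch below uses a hypothetical three-color spillover matrix; its off-diagonal entries and the true amounts are invented for illustration and are not measured values.

```r
# Hypothetical spillover matrix: A[i, j] is the signal of dye i seen in
# channel j, relative to its signal in its own channel i (diagonal = 1).
A <- matrix(c(1.00, 0.15, 0.02,
              0.01, 1.00, 0.10,
              0.00, 0.05, 1.00),
            nrow = 3, byrow = TRUE)

f_true <- c(100, 400, 50)          # illustrative true amounts F(1..3)

# Forward model: M(j) = sum over i of A(ij) * F(i).
m <- as.vector(t(A) %*% f_true)
m                                  # 104.0 417.5 92.0

# Compensation: recover F from M by solving the linear system.
f_hat <- solve(t(A), m)
f_hat
```

The same call works for any number of colors as long as the spillover matrix is square and invertible.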
A good link to start with: https://www.abcam.com/protocols/flow-cytometry-immunophenotyping
# Algorithmic problem
- Identifying the clusters with cell types
- Choosing between the manual and the computational method (doctors like the first!)
Blood cells that are not fully developed are called blasts. Gating out the blasts is nontrivial.
Construct an SSC-H vs CD45 scatterplot to gate the blast population.
Next, construct a CD19 vs CD10 scatterplot. CD19 is a B cell-specific antigen expressed on chronic lymphocytic leukemia (CLL) cells; the human CD10 antigen is present in common acute lymphoblastic leukemia as a cancer-specific antigen.
From these plots, the propensity for and type of leukemia, and the efficacy of chemotherapy over time, can be determined.
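The two-step gating described above can be sketched with a simple rectangular gate. Everything here is an assumption made for illustration: the data are simulated on an arbitrary log-like scale, the helper `rect_gate` is hypothetical, and the gate limits are not clinical thresholds. The blast-like population is simulated as dim in CD45 with low side scatter.

```r
set.seed(42)

# Simulated intensities (not patient data): 700 mature-like events
# followed by 300 blast-like events.
events <- data.frame(
  SSC.H = c(rnorm(700, 4.5, 0.20), rnorm(300, 3.2, 0.15)),
  CD45  = c(rnorm(700, 4.0, 0.20), rnorm(300, 2.8, 0.15)),
  CD19  = c(rnorm(700, 1.5, 0.30), rnorm(300, 3.5, 0.30)),
  CD10  = c(rnorm(700, 1.5, 0.30), rnorm(300, 3.4, 0.30))
)

# Rectangular gate: TRUE where both coordinates fall inside the box.
rect_gate <- function(x, y, xlim, ylim) {
  x >= xlim[1] & x <= xlim[2] & y >= ylim[1] & y <= ylim[2]
}

# Step 1: gate the blast-like events on CD45 vs SSC-H (hypothetical limits).
in_blast <- rect_gate(events$CD45, events$SSC.H,
                      xlim = c(2.2, 3.4), ylim = c(2.6, 3.8))
blasts <- events[in_blast, ]

# Step 2: inside that gate, summarise CD19 and CD10.
nrow(blasts)
colMeans(blasts[, c("CD19", "CD10")])
```

Calling `plot(blasts$CD19, blasts$CD10)` would then draw the CD19 vs CD10 scatterplot for the gated events only.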
Time FSC.A FSC.H
Min. : 53.6 Min. : 10144 Min. : 10024
1st Qu.: 565.5 1st Qu.: 60168 1st Qu.: 43537
Median :1085.2 Median : 79984 Median : 57508
Mean :1083.5 Mean : 93186 Mean : 61014
3rd Qu.:1595.3 3rd Qu.:109668 3rd Qu.: 73634
Max. :2111.9 Max. :262143 Max. :258709
SSC.A SSC.H FITC.A
Min. : 155.4 Min. : 156 Min. : -51.45
1st Qu.: 1407.0 1st Qu.: 1101 1st Qu.: 138.60
Median : 1939.3 Median : 1443 Median : 213.15
Mean : 3243.5 Mean : 1914 Mean : 722.33
3rd Qu.: 2998.8 3rd Qu.: 1943 3rd Qu.: 348.60
Max. :262143.0 Max. :224332 Max. :262143.00
PE.A PerCP.Cy5.5.A PE.Cy7.A
Min. : -81.9 Min. : -105.0 Min. : -90.3
1st Qu.: 444.1 1st Qu.: 536.5 1st Qu.: 1098.3
Median : 828.5 Median : 1029.0 Median : 2794.6
Mean : 1237.3 Mean : 1814.5 Mean : 4506.2
3rd Qu.: 1395.5 3rd Qu.: 1857.5 3rd Qu.: 6019.6
Max. :239286.6 Max. :262143.0 Max. :262143.0
APC.A APC.H7.A V450.A
Min. : -89.28 Min. : -224.1 Min. : -439.3
1st Qu.: 70.68 1st Qu.: 306.0 1st Qu.: 213.9
Median : 169.26 Median : 580.3 Median : 378.4
Mean : 1781.22 Mean : 1516.5 Mean : 1291.5
3rd Qu.: 778.41 3rd Qu.: 1208.1 3rd Qu.: 670.5
Max. :262143.00 Max. :262143.0 Max. :262143.0
V500c.A
Min. : -284.1
1st Qu.: 159.8
Median : 316.2
Mean : 1187.2
3rd Qu.: 621.0
Max. :262143.0
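# dfL is assumed to be the data frame summarised above; its construction is not shown
attach(dfL)
# Work on log scales so that the area-height relation is close to linear
X=log(FSC.A)
Y=log(FSC.H)
X1=log(SSC.A)
Y1=log(SSC.H)
g=lm(Y~X)     # forward scatter: log(height) regressed on log(area)
g1=lm(Y1~X1)  # side scatter: same fit (not used in the gate below)
# Keep events above the fitted line, i.e. a high height for their area (low area-height ratio)
ik=which( Y >g$coefficients[2]*X+g$coefficients[1])
par(mfrow=c(2,2))
plot(X,Y,pch=16,cex=0.2,col="blue",xlab='log(AREA)',ylab='log(HEIGHT)',main="FWD")
# Retained events are overplotted in brown
points(X[ik],Y[ik],pch=16,cex=0.2,col="brown")
# Scatter and marker channels for the retained events
fsc=FSC.H[ik]
ssc=SSC.H[ik]
fcd45=V500c.A[ik]
fcd19=PE.Cy7.A[ik]
fcd10=APC.A[ik]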
[1] -988502.2 -986779.2 -986474.7 -986135.8
Rule of identifying outliers: 50% quantile
res1 <- flowClust(dfL,
varNames=c("APC.A","PE.Cy7.A"), K=1)
df2=Subset(dfL,res1)
res2<-flowClust(df2, varNames=c("APC.A","PE.Cy7.A"), K=1:4, B=100)
criterion(res2,"BIC")
[1] -1141117 -1123068 -1119284 -1117204
# BIC is largest at K = 4, so four clusters are used
plot(res2[[4]], data=df2, level=0.5, z.cutoff=0)
Rule of identifying outliers: 50% quantile
flowPeaks is a fast unsupervised clustering method for flow cytometry data via K-means and density peak finding (Bioinformatics, 2012, 28(15):2052-8).
step 0, set the intial seeds, tot.wss=1040.56
step 1, do the rough EM, tot.wss=712.991 at 0.269617 sec
step 2, do the fine transfer of Hartigan-Wong Algorithm
tot.wss=703.557 at 0.478351 sec
# The Beauty of SOM
https://rpubs.com/ANJANKRDASGUPTA/706017