FLOWFIT TUTORIAL

The Quah and Parish dataset

Here we will use the raw data from figure 2 of the paper: New and improved methods for measuring lymphocyte proliferation in vitro and in vivo using CFSE-like fluorescent dyes to test the flowFit library.

In figure 2 of the Papaer the authors analyze the quality of three different proliferation tracking dyes:

  1. Download the Supplementary FCS file and the annotation.txt file: https://dl.dropboxusercontent.com/u/40644496/S1.zip
  2. Put them in your current working directory.

We first load the necessary Bioconductor packages:

library(flowCore)
library(flowFit)

Then we load the raw data using the R flowCore function flowSet. We don't apply any transformation on the data, we will use a workflow object to define the transformations:

flowData <- read.flowSet(path = ".", phenoData = "annotation.txt", transformation = FALSE)
wf <- workFlow(flowData, name = "TUTORIAL LOG workflow")
trunTrans <- truncateTransform(transformationId = "truncate", a = 1)
tr <- transformList(c("<FITC-A>", "<PE-A>", "<PE-Cy5-A>", "<Alexa Fluor 405-A>", 
    "<Alexa Fluor 430-A>", "<APC-A>", "<APC-Cy7-A>"), trunTrans, transformationId = "truncate")
add(wf, tr)

logTrans <- logTransform(transformationId = "log10-transformation", logbase = 10, 
    r = 1, d = 1)
tf <- transformList(c("<FITC-A>", "<PE-A>", "<PE-Cy5-A>", "<Alexa Fluor 405-A>", 
    "<Alexa Fluor 430-A>", "<APC-A>", "<APC-Cy7-A>"), logTrans, transformationId = "log")
add(wf, tf, parent = "truncate")

mDataLog <- Data(wf[["log"]])

We have applied 2 tranforamtions on the raw data:

For convenience, we save the resulting transformed dataset in a new object called mDataLog.

Let's explore the resulting flowSet object:

mDataLog
## A flowSet with 12 experiments.
## 
## An object of class 'AnnotatedDataFrame'
##   rowNames: Fig 2a All B cell Nonstim.fcs Fig 2a All CD4 T
##     Nonstim.fcs ... Fig 2a CTV CD8 T Stim.fcs (12 total)
##   varLabels: name
##   varMetadata: labelDescription
## 
##   column names:
##   FSC-A FSC-H FSC-W SSC-A SSC-H SSC-W <FITC-A> <PE-A> <PE-Cy5-A> <Alexa Fluor 405-A> <Alexa Fluor 430-A> <APC-A> <APC-Cy7-A>

And we are mainly interested in these flowFrame objects:

With these transformations, we have prepared the dataset for the proliferation analysis. In this case, in order to reduce file size, the dataset were already gated for CD4+ cells. In custom experiments, after the transformations you will need to gate the data for CD4+ cells. See also the flowCore HowTo.

Let's start with the parentFitting function to model the parental population for the three samples.

PARENT CFSE

parent.fitting.cfse <- parentFitting(mDataLog[[2]], "<FITC-A>", logDecades = 5.417616, 
    verbose = TRUE, binning = TRUE)
plot(parent.fitting.cfse)

plot of chunk parentFittingCFSE

PARENT CTV

parent.fitting.ctv <- parentFitting(mDataLog[[2]], "<Alexa Fluor 405-A>", logDecades = 5.417616, 
    verbose = TRUE, binning = TRUE)
plot(parent.fitting.ctv)

plot of chunk parentFittingCTV

PARENT CPD

parent.fitting.cpd <- parentFitting(mDataLog[[2]], "<APC-A>", logDecades = 5.417616, 
    verbose = TRUE, binning = TRUE)
plot(parent.fitting.cpd)

plot of chunk parentFittingCPD

FITTING ON THE PROLIFERATING POPULATIONS

Now that we have a parent population model, for each stain we can apply the inferred parameters (parent peak poisiton and size) to the proliferating population.

CFSE

Let's apply the proliferation fitting model on the cell population stained with CFSE and proliferating:

fitting.cfse <- proliferationFitting(mDataLog[[5]], "<FITC-A>", parent.fitting.cfse@parentPeakPosition, 
    parent.fitting.cfse@parentPeakSize, logDecades = 5.417616, binning = TRUE)

While plotting flowFit objects, you can use the option which to specify which plot I want to see:

plot(fitting.cfse, which = 3, main = "CFSE")

plot of chunk unnamed-chunk-1

CTV

Let's apply the proliferation fitting model on the cell population stained with CTV and proliferating:

fitting.ctv <- proliferationFitting(mDataLog[[11]], "<Alexa Fluor 405-A>", parent.fitting.ctv@parentPeakPosition, 
    parent.fitting.ctv@parentPeakSize, logDecades = 5.417616, binning = TRUE)

While plotting flowFit objects, you can use the option which to specify which diagnostic plot I want to see:

plot(fitting.ctv, which = 3, main = "CTV")

plot of chunk unnamed-chunk-2

CPD

Let's apply the proliferation fitting model on the cell population stained with CPD and proliferating:

fitting.cpd.dynamic <- proliferationFitting(mDataLog[[8]], "<APC-A>", parent.fitting.cpd@parentPeakPosition, 
    parent.fitting.cpd@parentPeakSize, logDecades = 5.417616, binning = TRUE)

While plotting flowFit objects, you can use the option which to specify which diagnostic plot I want to see:

plot(fitting.cpd.dynamic, which = 2, main = "CPD dynamic model")

plot of chunk unnamed-chunk-3

You may notice that the fitting on the CPD sample have a bad quality. In fact, if we compare the CPD dataset with the CFSE dataset, we can notice that in the CPD dataset there are not discernible peaks (this is mainly due to the staining characteristics):

plot(fitting.cpd.dynamic, which = 1, main = "CPD raw data")

plot of chunk unnamed-chunk-4

To improve our fitting we can use the fixedModel: In the fixed model only the height of the peaks are modeled, while one or more of these parameters

  1. Parent Peak Position 2, Parent Peak Size
  2. Generations Distance

Will be keeped fixed (it will be used as a constant in the model).

We can improve our model starting from the observation of the model behaviour:

plot(fitting.cpd.dynamic, which = 2, main = "CPD dynamic model")

plot of chunk unnamed-chunk-5

Looking at the model (red line), you can notice that the first peak (the parent population) has been moved to the left by the model.

We can avoid this behaviour keeping fixed the Parent Peak Position:

fitting.cpd <- proliferationFitting(mDataLog[[8]], "<APC-A>", parent.fitting.cpd@parentPeakPosition, 
    parent.fitting.cpd@parentPeakSize, logDecades = 5.417616, binning = TRUE, 
    fixedModel = TRUE, fixedPars = list(M = parent.fitting.cpd@parentPeakPosition))

The resulting fitting:

plot(fitting.cpd, which = 3)

plot of chunk unnamed-chunk-7

COMPARING PROLIFERATION WITH FLOWFIT

Now that we have performed the fitting, we can explore the percentage of cells for generation estimated by the model in the 3 samples:

par(mfrow = c(2, 2))
plot(fitting.cfse, which = 4, legend = FALSE, main = "CFSE")
plot(fitting.ctv, which = 4, legend = FALSE, main = "CTV")
plot(fitting.cpd, which = 4, legend = FALSE, main = "CPD")

plot of chunk unnamed-chunk-8

We can also notice that the scale in three samples is different, let change this:

To access the generations in a flowFit object you can use the SLOT generations, that return a vector of percentages:

fitting.cfse@generations
##  [1] 11.42455 20.90417 21.95311 22.59750 19.31936 11.53001  4.06857
##  [8]  0.93412  0.06989  0.01892

Wth the function getGenerations() you can retrieve the same information in a list() format.

getGenerations(fitting.cfse)
## $generation_1
## [1] 11.42
## 
## $generation_2
## [1] 20.9
## 
## $generation_3
## [1] 21.95
## 
## $generation_4
## [1] 22.6
## 
## $generation_5
## [1] 19.32
## 
## $generation_6
## [1] 11.53
## 
## $generation_7
## [1] 4.069
## 
## $generation_8
## [1] 0.9341
## 
## $generation_9
## [1] 0.06989
## 
## $generation_10
## [1] 0.01892

To have the same scale, is sufficient to add some 0 to the CFSE vector of generations, and change the numberOfPeaks slot:

fitting.cfse@generations <- c(fitting.cfse@generations, rep(0, 6))
fitting.cfse@numberOfPeaks = 16

Replotting we can appreciate the difference:

par(mfrow = c(2, 2))
plot(fitting.cfse, which = 4, legend = FALSE, main = "CFSE")
plot(fitting.ctv, which = 4, legend = FALSE, main = "CTV")
plot(fitting.cpd, which = 4, legend = FALSE, main = "CPD")

plot of chunk unnamed-chunk-12

Now the distribution of cells per generation for the three samples is easier to compare (is in the same scale).

PROLIFERATION INDEX

The proliferation index is a measure of the proliferation in the sample: is calculated as the sum of the cells in all generations – including the parental – divided by the computed number of original parent cells theoretically present at the beginning of the experiment.

The proliferation index in the threee samples analyzed can be calculated with the proliferationIndex function:

proliferationIndex(fitting.cfse)
## [1] 3.545
proliferationIndex(fitting.ctv)
## [1] 3.517
proliferationIndex(fitting.cpd)
## [1] 3.285