require(tidyverse)
Loading required package: tidyverse
Registered S3 methods overwritten by 'dbplyr':
method from
print.tbl_lazy
print.tbl_sql
── Attaching packages ──────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.1 ──
✓ ggplot2 3.3.5 ✓ purrr 0.3.4
✓ tibble 3.1.3 ✓ dplyr 1.0.7
✓ tidyr 1.1.3 ✓ stringr 1.4.0
✓ readr 2.0.0 ✓ forcats 0.5.1
── Conflicts ─────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
x dplyr::filter() masks stats::filter()
x dplyr::lag() masks stats::lag()
require(flowCore)
Loading required package: flowCore
require(flowClust)
Loading required package: flowClust
Attaching package: ‘flowClust’
The following object is masked from ‘package:graphics’:
box
The following object is masked from ‘package:base’:
Map
require(open)
Loading required package: open
Warning in library(package, lib.loc = lib.loc, character.only = TRUE, logical.return = TRUE, :
there is no package called ‘open’
require(ggcyto)
Loading required package: ggcyto
Loading required package: ncdfFlow
Loading required package: RcppArmadillo
Loading required package: BH
Loading required package: flowWorkspace
As part of improvements to flowWorkspace, some behavior of
GatingSet objects has changed. For details, please read the section
titled "The cytoframe and cytoset classes" in the package vignette:
vignette("flowWorkspace-Introduction", "flowWorkspace")
require(gridExtra)
Loading required package: gridExtra
Attaching package: ‘gridExtra’
The following object is masked from ‘package:dplyr’:
combine
Follow Metzger et al. 2015 (PMID: 25778704) to learn how to analyze flow cytometry data in R
To do so, I have the following tasks:
file.name <- system.file("extdata","0877408774.B08", package="flowCore")
x <- read.FCS(file.name, transformation=FALSE)
summary(x)
keyword(x, paste0("$P",1:4,"E"))
summary(read.FCS(file.name))
fcs.dir <- system.file("extdata",
"compdata",
"data",
package="flowCore")
frames <- lapply(dir(fcs.dir, full.names=TRUE), read.FCS)
fs <- as(frames, "flowSet")
fs
sampleNames(fs)
names(frames) <- sapply(frames, keyword, "SAMPLE ID")
fs <- as(frames, "flowSet")
fs
sampleNames(fs)
phenoData(fs)$Filename <- fsApply(fs, keyword, "$FIL")
pData(phenoData(fs))
fs <- read.flowSet(path = fcs.dir)
x <- read.FCS(file.name, column.pattern = "-H")
ggcyto(x, aes(x = `FL1-H`, `FL2-H`)) + geom_hex(bins = 128)
autoplot(x, "FL1-H")
data.path = "data/FCS-files/20210402-LS-JZ-chimeric-Pho4/"
fs <- read.flowSet(path = data.path, transformation = FALSE, # the original values are already linearized.
emptyValue = FALSE, alter.names = TRUE, # change parameter names to R format
column.pattern = ".H|FSC|SSC") # only load the height variables for the fluorescent parameters
Metadata
sample <- read_csv("data/sample-list/20210402-LS-JZ-chimeric-Pho4-sample.csv")
Rows: 29 Columns: 7
── Column specification ─────────────────────────────────────────────────────────────────────────────────────────────────────────────
Delimiter: ","
chr (6): Sample, Group, Pho4, Tag, Content, PHO5RFP
lgl (1): Pho2
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
df <- data.frame(name = sampleNames(fs)) %>% mutate(strain = str_split(name, "-", simplify = TRUE)[,1]) %>% left_join(sample, by = c("strain" = "Sample"))
rownames(df) <- df$name
metaData <- data.frame(labelDescription = c("filename", "sample name", "experimenter", "Pho4 source", "Fluorescent protein tag", "Pho4 chimera genotype", "Pho2 present", "PHO5pr reporter location"))
pData(fs) <- df
varMetadata(fs) <- metaData
The code below demonstrates how to subset a flowSet, how to apply logicle (or other) transformations in ggcyto() (not on the original dataset)
p1 <- autoplot(fs[sampleNames(fs) %in% c("yH156-1.fcs","177w-1.fcs")], "BL1.H") + scale_x_logicle() + facet_wrap(~name, nrow = 2)
p2 <- autoplot(fs[sampleNames(fs) %in% c("yH156-1.fcs","177w-1.fcs")], "YL2.H") + scale_x_logicle() + facet_wrap(~name, nrow = 2)
grid.arrange(as.ggplot(p1), as.ggplot(p2), ncol = 2)
Next we use a series of plots to guide our gating strategy for identifying the population we want to work with. ### Remove outliers We first gate on FSC.H and SSC.H to remove outliers (events that are too big or too small). The Attune instrument we use can record six decades (100-106), with the first two decades mostly occupied by electronic noise. Here we will use a higher cutoff for the lower bound for FSC.H primarily because of at least one sample showed an abnormal pattern (179w-3). Note that this bound likely needs to be optimized if we decide to change the voltage on the FSC channel latter.
Let’s first define a gate and visualize it in a plot before adding it to a GatingSet.
outlier.gate <- rectangleGate(filterId = "-outlier", "FSC.H" = c(1e5, 1e6), "SSC.H" = c(1e2, 1e6))
ggcyto(fs[sampleNames(fs) %in% c("yH156-1.fcs", "177w-1.fcs", "179w-1.fcs", "179w-3.fcs")], aes(x = FSC.H, y = SSC.H), subset = "root") +
geom_hex(bins = 64) + geom_gate(outlier.gate) + facet_wrap(~name, ncol = 2)# + ggcyto_par_set(limits = "instrument")
Add this gate to the GatingSet
gs <- GatingSet(fs) # create a GatingSet
gs_pop_add(gs, outlier.gate, parent = "root")
[1] 2
recompute(gs)
done!
Let’s examine how this gate intersected with the FSC.H vs FSC.W plot (for singlets)
p1 <- ggcyto(gs[[1]], aes(x = FSC.H, y = FSC.W), subset = "root") + geom_hex(bins = 64)
p2 <- ggcyto(gs[[1]], aes(x = FSC.H, y = FSC.W), subset = "-outlier") + geom_hex(bins = 64)
grid.arrange(as.ggplot(p1), as.ggplot(p2), ncol = 2)
Next let’s remove multiplets on FSC.H vs FSC.W. To do this, we could either manually set up a polygon gate, or use the automatic clustering function provided by the flowClust
package. Note that in the original implementation, the flowClust()
function or the tmixFilter()
version that was supposed to allow for integration with the flowCore
package, both were designed with different downstream actions in mind than what I want to do here (visualize with ggcyto() + geom_gate()
). The openCyto
package written by the same group of authors who created flowClust
and ggcyto
has a helper function to make this possible. See this post for a discussion on alternative ways to achieve this.
ex <- Subset(fs[[1]], outlier.gate)
singlet.gate <- gate_flowclust_2d(ex, "FSC.H", "FSC.W", filterId = "singlet", K = 1, quantile = 0.95)
The prior specification has no effect when usePrior=no
Using the serial version of flowClust
ggcyto(ex, aes(x = FSC.H, y = FSC.W)) + geom_hex(bins = 64) + geom_gate(singlet.gate) + geom_stats()
To add the flowClust gates to the GatingSet, realize that the same clustering operations need to be conducted for each flowframe inside the flowSet individually. The following code is based on the example listed under the section 8.5 in the flowCore
manual
ggcyto(gs, aes(x = FSC.H, y = FSC.W), subset = "-outlier") + geom_hex(bins = 64) + geom_gate("singlet") + geom_stats(type = c("percent", "count")) + facet_wrap(~name, ncol = 10)# + scale_x_logicle() + scale_y_logicle()
When I plotted the singlet events on GFP-RFP 2d space, I noticed a few samples that show more than one population of cells, where the main population appeared to be “induced” while one or more subpopulations are less or not induced. While the biological reasons behind require further investigation and may be very interesting (heterogeneity), for this analysis we will use flowClust to identify the main population and move forward.
ggcyto(gs[grepl("177|179", sampleNames(gs))], aes(x = BL1.H, y = YL2.H), subset = "singlet") + geom_hex(bins = 64) +
facet_wrap(~name, ncol = 4) + scale_x_logicle() + scale_y_logicle() + theme_bw()
Be careful when working with the GatingSet and GatingHierarchy objects – these are strictly reference classes, meaning that most of the operations work by pointers and the operations will change the underlying data. For example, the first line of code below obtains a pointer to the underlying data rather than making a copy of that data. any operations on it will change the original data as a result.
#ex <- gs_pop_get_data(gs, "singlet")[[1]]
ex <- fs[["B3-1.fcs"]]
#lgcl <- estimateLogicle(ex, channels = c("BL1.H", "YL2.H"))
lgcl <- logicleTransform("induction")
ex <- transform(ex, lgBL1.H = lgcl(`BL1.H`), lgYL2.H = lgcl(`YL2.H`))
fluo.gate <- gate_flowclust_2d(ex, "lgBL1.H", "lgYL2.H", K = 2, quantile = 0.98)
The prior specification has no effect when usePrior=no
Using the serial version of flowClust
ggcyto(ex, aes(x = lgBL1.H, y = lgYL2.H)) + geom_hex(bins = 64) + geom_gate(fluo.gate) + geom_stats()# + scale_x_logicle() + scale_y_logicle()
Even though flowClust is supposed to perform its own transformation (modified Box-Cox), empirically I found the clustering seem to work better on logicle transformed data for the two fluorescent channels. Therefore I’m transforming the underlying data of the GatingSet. Note that it seems to be difficult to “create new parameters” to store the transformed data, while keeping the original data intact. Instead, the transformation functions constructed using the constructor logicle_trans()
stores the inverse transformation functions, which can be used to perform the inverse transformation when needed. Followed the manual for GatingSet here
lgcl <- logicle_trans()
transList <- transformerList(c(lgBL1.H = "BL1.H", lgclYL2.H = "YL2.H"), lgcl)
transform(gs, transList)
A GatingSet with 78 samples
to obtain the original data, use
gs_pop_get_data(gs[[1]], inverse.transform = TRUE)
Now we can do the flowClust gating
dat <- gs_pop_get_data(gs, "singlet") # get parent data
inductionGate <- fsApply(dat, function(fr)
openCyto::gate_flowclust_2d(fr, "BL1.H", "YL2.H", K = 2, quantile = 0.95)
)
gs_pop_add(gs, inductionGate, parent = "singlet", name = "induction")
recompute(gs)
ggcyto(gs, aes(x = BL1.H, y = YL2.H), subset = "singlet") + geom_hex(bins = 64) + geom_gate("induction") + geom_stats() +
facet_wrap(~name, ncol = 10)# + scale_x_logicle() + scale_y_logicle()
Here are the cleaned data
ggcyto(gs, aes(x = BL1.H, y = YL2.H), subset = "induction") + geom_hex(bins = 64) + facet_wrap(~name, ncol = 10)# + scale_x_logicle()
My original idea was to follow the Metzger et al 2015 paper’s protocol and normalize both fluorescent channels by the cell size, approximated by the FSC signal. I had been a bit unsure about how they came up with the formula, i.e. \(\frac{logRFP^2/logFSC^3}{logGFP^2/logFSC^3}\). My best guess is that they explored different transformations of the two variables to look for the most linear correlations between the two measures (since FSC is not a direct measure of cell size), and came up with the cubic and square on each. But looking at the GFP vs RFP plot above, I suddenly realized that in our work what we really cared about is the activity of the chimeric Pho4 as defined by RFP / GFP (reporter protein production per cell normalized by chimeric Pho4 expression level). So that’s what I’ll be using to calculate the median.
See below for the new normalized variable and how it is no longer correlated with FSC
ex <- gs[grepl("177", sampleNames(gs))]; attr(ex, "subset") <- "induction"
ggplot(ex, aes(x = FSC.H, y = YL2.H/BL1.H)) + geom_hex(bins = 64) + facet_wrap(~name, ncol = 3) + scale_fill_gradientn(colours = rev(RColorBrewer::brewer.pal(11, "Spectral"))) + stat_smooth(method = "lm")
`geom_smooth()` using formula 'y ~ x'
The goal is to export the gated events and calculate the RFP/GFP and take the median, which will be used in downstream analyses.
# get the population stats
stats <- gs_pop_get_stats(gs) %>%
as_tibble() %>%
mutate(pop = gsub(".*/", "", pop), pop = gsub("-outlier", "cells", pop)) %>%
pivot_wider(names_from = pop, names_prefix = "n_", values_from = count)
# get population data
fs.out <- gs_pop_get_data(gs, y = "induction", inverse.transform = TRUE) # get the inverse transformed data
# calculate ratio
norm.data <- fsApply(fs.out, function(cf) {
cf <- cbind(cf, ratio = cf[,"YL2.H"] / cf[, "BL1.H"])
apply(cf[, c("BL1.H", "YL2.H", "ratio")], 2, median)
}, use.exprs = TRUE) %>%
as_tibble(rownames = "name")
Export the data
# pull all info together in a single tibble
final <- left_join(as_tibble(pData(fs)), stats, by = c("name" = "sample")) %>%
left_join(norm.data, by = "name")
write_tsv(final, "output/20210402-jz-lfs-gated-median-out.txt")