This is an R Markdown Notebook, to explain the use of the Bioconductor OpenCyto Framework for basic plotting and gating.

In the following I will use exemplary data of phytoplankton cultures, that I gathered during the lab work for my master thesis at the Alfred-Wegener-Institute, under the supervision of Sebastian Rokitta & Björn Rost. In order to perform your own exemplary analysis using my data, you can download it here: Data Download as Zip

Setting gates in FlowJ

The gating of raw data can be done within R, but there is (as far as i know) no good graphical interface to do so, and setting the numerical limits manually is quite a pain. However, there is the possiblity to use FlowJo to do the gating graphically and then read the fully gated workspace into R using the flowWorkspace library.

You can download FlowJo for a free 30-day trial here. Performing the gating is relatively straight-forward, and they have good documentation. What is important to keep in mind is that coherent naming of the gates, and syncing them between your different samples, keeps everything tidy and allows you to easily access your gating hierarchy (from gates to subgates) using R.

Once the Gating is done, you should export your FlowJo file as an XML Workspace (i.e. .xml or .wsp files), so that R can read it. (More on that here)

Start working in R

First of all we have to make sure that all necessary packages are installed:

source("https://bioconductor.org/biocLite.R")
biocLite("openCyto") #this installs the OpenCyto framework
biocLite("ggcyto") #this installs added plotting funtionality

and then load the packages into the workspace:

library(openCyto)
library(ggcyto)

Reading FlowJo Workspace

If all went well, we can start reading our data:

getwd() #this gives you your current working directory
[1] "/Users/bp/Desktop/FlowCytoDATA_bp"

Now you have to change it to the place that you downloaded and unzipped the data:

#change this to the place where the data is
setwd('/Users/bp/Desktop/FlowCytoDATA_bp/')

Now we can read the XML Workspace from FlowJo and start exploring it:

(The official bioconductor documentation for this can be found here)

flowDataPath <- "FlowJoWorkspace_0.2.wsp"
ws <- openWorkspace(flowDataPath)

Now let’s look at the contents of our FlowJo workspace:

print(ws)
FlowJo Workspace Version  20.0 
File location:  . 
File name:  FlowJoWorkspace_0.2.wsp 
Workspace is open. 

Groups in Workspace

Above you can see all the samples in the workspace according to the group names that have been assigned in FlowJo.

Now let’s have a look at a couple of different components of this workspace:

getSamples(ws) #see all samples
getSampleGroups(ws)$groupName #see all groups
 [1] All Samples   All Samples   All Samples   All Samples   All Samples   All Samples   All Samples  
 [8] All Samples   All Samples   All Samples   All Samples   All Samples   All Samples   All Samples  
[15] All Samples   All Samples   All Samples   All Samples   All Samples   All Samples   All Samples  
[22] All Samples   All Samples   All Samples   All Samples   All Samples   All Samples   All Samples  
[29] All Samples   All Samples   All Samples   All Samples   beads         beads         beads        
[36] beads         beads         beads         Micromonas    Micromonas    Micromonas    Micromonas   
[43] Micromonas    Micromonas    Phaeocystis   Phaeocystis   Phaeocystis   Phaeocystis   Phaeocystis  
[50] Phaeocystis   Phaeocystis   Phaeocystis   Phaeocystis   Phaeocystis   Phaeocystis   Phaeocystis  
[57] Phaeocystis   Thalassiosira Thalassiosira Thalassiosira Thalassiosira Thalassiosira Thalassiosira
Levels: All Samples beads Micromonas Phaeocystis Thalassiosira
#access xml keywords
sn <- "A01 MilliQBeads_2211_2200.fcs"
getKeywords(ws, sn)[8]  # <- get $VOL
$`$VOL`
[1] "100015"

For example, this tells me that exactly 100.015 ml were sampled by the Accuri Flow Cytometer according to it’s sensors. You can have a look at the first five XML metadata using:

getKeywords(ws, sn)[1:5]
$`#SPACERS`
[1] "00000"

$`$ENDANALYSIS`
[1] "0"

$`$BEGINANALYSIS`
[1] "0"

$`$ENDDATA`
[1] "38186"

$`$BEGINDATA`
[1] "2570"

parse FCS files within Workspace

As of now, we have only been dealing with metadata, now let’s get started with linking this metadata to our actual FACS sample data. To do this we have to parse the workspace and point it to the correct path that our “.fcs” files reside in.

The parseWorkspace function takes three inputs:

Here I parse the workspace for Name #4, which references Phaeocystis

gs <- parseWorkspace(ws,name = 4, path = '20180219_163432/')
Parsing 13 samples
windows version of flowJo workspace recognized.
version X
Creating ncdfFlowSet...
All FCS files have the same following channels:
FSC-A
SSC-A
FL1-A
FL2-A
FL3-A
FL4-A
FSC-H
SSC-H
FL1-H
FL2-H
FL3-H
FL4-H
Width
Time
done!
loading data: 20180219_163432//D01 EXPIIPh0_2211_2200.fcs
No compensation
gating ...
write D01 EXPIIPh0_2211_2200.fcs_1408 to empty cdf slot...
loading data: 20180219_163432//D02 EXPIIPh0_2311_1700.fcs
No compensation
gating ...
write D02 EXPIIPh0_2311_1700.fcs_1857 to empty cdf slot...
loading data: 20180219_163432//D03 EXPIIPh0_2411_1500.fcs
No compensation
gating ...
write D03 EXPIIPh0_2411_1500.fcs_1608 to empty cdf slot...
loading data: 20180219_163432//D04 EXPIIPh0_2511_1740.fcs
No compensation
gating ...
write D04 EXPIIPh0_2511_1740.fcs_1615 to empty cdf slot...
loading data: 20180219_163432//D05 EXPIIPh0_2611_2000.fcs
No compensation
gating ...
write D05 EXPIIPh0_2611_2000.fcs_2422 to empty cdf slot...
loading data: 20180219_163432//D06 EXPIIPh0_2711_2000.fcs
No compensation
gating ...
write D06 EXPIIPh0_2711_2000.fcs_2526 to empty cdf slot...
loading data: 20180219_163432//G04 EXPIIPhx_2511_1740.fcs
No compensation
gating ...
write G04 EXPIIPhx_2511_1740.fcs_1333 to empty cdf slot...
loading data: 20180219_163432//H01 EXPIIPhx_2211_2200.fcs
No compensation
gating ...
write H01 EXPIIPhx_2211_2200.fcs_1076 to empty cdf slot...
loading data: 20180219_163432//H02 EXPIIPhx_2311_1700.fcs
No compensation
gating ...
write H02 EXPIIPhx_2311_1700.fcs_1372 to empty cdf slot...
loading data: 20180219_163432//H03 EXPIIPhx_2411_1500.fcs
No compensation
gating ...
write H03 EXPIIPhx_2411_1500.fcs_1260 to empty cdf slot...
loading data: 20180219_163432//H04 EXPIIPhx_2511_1740.fcs
No compensation
gating ...
write H04 EXPIIPhx_2511_1740.fcs_1317 to empty cdf slot...
loading data: 20180219_163432//H05 EXPIIPhx_2611_2000.fcs
No compensation
gating ...
write H05 EXPIIPhx_2611_2000.fcs_1560 to empty cdf slot...
loading data: 20180219_163432//H06 EXPIIPhx_2711_2000.fcs
No compensation
gating ...
write H06 EXPIIPhx_2711_2000.fcs_1526 to empty cdf slot...
done!

If all worked correctly you should now be able to access the raw data, and plot the gates on top of the data quite easily:

getPopStats(gs) # <- these are the raw cell counts within the fcs files

Plotting the data with gates

Although this entire framework can seem quite unintuitive at times, when it comes to plotting it is easy to produce some usable results. Here I plot the FCS data in the first sample, together with the gates, which were defined for FL3-A (fluoresence) vs FSC-A (forward scatter)

plotGate(gs[[1]], c("AllPh-beads","normal cells","bigger cells","biggest cells"))

in addition to the plotGate function there is also the useful plotting function autoplot:

plot <- autoplot(gs[[1]], c("normal cells","bigger cells","biggest cells"), bins=100)+ 
            ggcyto_par_set(limits = "data")+
            xlim(c(0.6,1))
plot$`normal cells`

Extracting raw counts from gated data

Make sure to have these other libraries installed:

library(ggplot2)
library(cowplot) # stylesheet for ggplot2 -> makes the plot look much nicer
library(scales) # needed for time conversions later on

Now we can actually start working with the flow counts. The gating set gs contains flow cytometry data from Phaeocystis cultures, hence the Ph abbreviation.

We can extract the count data using the getPopStats function:

PhPop <- getPopStats(gs)[]

And these raw counts are unfortunately a bit messy, so we will convert this into a data frame and clean it up a bit:

PhCounts <- data.frame(PhPop)
PhCounts <- PhCounts[PhCounts$name != "G04 EXPIIPhx_2511_1740.fcs_1333",] #remove one measurement which was botched and repeated
#the following three lines clean up the data frame structure (in particular the row names) after deletion of one sample
PhCounts <- PhCounts[with(PhCounts, order(name)), ]
row.names(PhCounts) <- 1:nrow(PhCounts)
PhnumGate = length(levels((factor(PhCounts$Population))))

Now that we have extracted the raw data and put it into a usable data frame format, we can normalize all counts using the control bead counts, that were added to every sample. For this I take a subset of my samples (one experimental run), then calculate the mean bead concentration of the samples and divide all other counts by it. (note: this might not be the best way to do it)

Ph0Counts <- PhCounts[1:(PhnumGate*6),] #this takes the subset (one experimental run)
meanBeadsPerML_Ph0 <- mean(Ph0Counts[Ph0Counts$Population == "beads",]$Count) * 10 #this gives the mean beads per ml
# now we have to normalize all values with the mean bead concentration:
Ph0Counts$NormCount <- Ph0Counts$Count / rep(Ph0Counts[Ph0Counts$Population == "beads",]$Count,each=PhnumGate) * meanBeadsPerML_Ph0
#and finally we can show the data.frame with the normalized counts:
Ph0Counts[,c(1,2,6)]

To this data, I will add the time of sampling, which i noted down in my labbook, and saved in a csv file:

FloCounts <- read.csv('FloCountsRough.csv')
FloCounts$Time <- as.POSIXct(FloCounts$Time,format="%d.%m.%Y %H:%M:%S")
FloCounts$Time <- FloCounts$Time-1800000 #converting actual date into number of days
TimeVector <- FloCounts$Time[1:6]
Ph0Counts$Time <- rep(TimeVector,each=PhnumGate) #add time to all gates & counts, this is why the structure had to be corrected before

Now that we have normalized counts and time for one species over 6 days, we get to plotting it. To do this in ggplot2, we have to clean the data up some more and convert it into a long format (which is done by data.table::melt).

Ph0Counts_filt <- Ph0Counts[Ph0Counts$Population %in% c('normal cells','bigger cells','biggest cells'),]
Ph0Data <- data.table::melt(Ph0Counts_filt, id.vars = c("Population", "Time")
                           , measure.vars = c("NormCount")) 
Ph0Data$Population <- factor(Ph0Data$Population, levels = c("biggest cells","bigger cells","normal cells"))

Finally, this allows us to plot the normalized cell counts over time, overlayed with the size fraction as taken from Forward Scatter and Fluoresence data. (Which luckily here, does not show too big of a trend towards aggregation of cells)

ggplot(data = Ph0Data, aes(x = Time, y = value, fill = Population)) +
  geom_area() + ggtitle('Ph0 FlowCount + Size Fraction')+
  scale_x_datetime(labels=date_format('%d'), date_breaks = '1 day')+ xlab('days') + ylab('cells/ml')

Additionally:

The bioconductor OpenCyto Framework has much more functions and packages which i did not use.. for example there are extensive capabilites for calculating spillover matrices and normalizing fluoresence data (which often requires a careful experimental setup, including multiple normalization beads).

For now, I hope this serves as a quick overview of the capabilites, feel free to contact me for specific questions at benjamin.post@leibniz-zmt.de

---
title: "Flow-Cyto Data in R, using the Bioconductor OpenCyto Framework"
output: html_notebook
author: Benjamin Post
---

This is an [R Markdown](http://rmarkdown.rstudio.com) Notebook, to explain the use of the Bioconductor OpenCyto Framework for basic plotting and gating.

In the following I will use exemplary data of phytoplankton cultures, that I gathered during the lab work for my master thesis at the Alfred-Wegener-Institute, under the supervision of Sebastian Rokitta & Björn Rost.
In order to perform your own exemplary analysis using my data, you can download it here: [Data Download as Zip](https://drive.google.com/file/d/1jTkX5XNIa8o9PGRMLfRgRAQDQIDepDSw/view?usp=sharing)


## Setting gates in FlowJ
The gating of raw data can be done within R, but there is (as far as i know) no good graphical interface to do so, and setting the numerical limits manually is quite a pain. However, there is the possiblity to use [FlowJo](https://www.flowjo.com) to do the gating graphically and then read the fully gated workspace into R using the *flowWorkspace* library.

You can download FlowJo for a free 30-day trial [here](https://www.flowjo.com/solutions/flowjo/free-trial).
Performing the gating is relatively straight-forward, and they have good [documentation](http://docs.flowjo.com/vx/graphs-and-gating/gw-gating/). What is important to keep in mind is that coherent naming of the gates, and syncing them between your different samples, keeps everything tidy and allows you to easily access your gating hierarchy (from gates to subgates) using R.

Once the Gating is done, you should export your FlowJo file as an XML Workspace (i.e. .xml or .wsp files), so that R can read it. (More on that [here](http://docs.flowjo.com/d2/workspaces-and-samples/ribbons-and-tabs/ws-ribbon-band-debug/vxworkspaces-and-samplesribbons-and-tabsws-ribbon-band-debugworkspace-xml/))

## Start working in R
First of all we have to make sure that all necessary packages are installed:
```{r,echo=T, eval=F}
source("https://bioconductor.org/biocLite.R")
biocLite("openCyto") #this installs the OpenCyto framework
biocLite("ggcyto") #this installs added plotting funtionality
```
and then load the packages into the workspace:
```{r}
library(openCyto)
library(ggcyto)

```

## Reading FlowJo Workspace
If all went well, we can start reading our data:
```{r}
getwd() #this gives you your current working directory
```
Now you have to change it to the place that you downloaded and unzipped the data:

```{r, results='hide'}
#change this to the place where the data is
setwd('/Users/bp/Desktop/FlowCytoDATA_bp/')
```

Now we can read the XML Workspace from FlowJo and start exploring it:

(The official bioconductor documentation for this can be found [here](https://www.bioconductor.org/packages/devel/bioc/vignettes/flowWorkspace/inst/doc/flowWorkspace-Introduction.html))
```{r}
flowDataPath <- "FlowJoWorkspace_0.2.wsp"
ws <- openWorkspace(flowDataPath)
```

Now let's look at the contents of our FlowJo workspace:
```{r}
print(ws)
```
Above you can see all the samples in the workspace according to the group names that have been assigned in FlowJo.

Now let's have a look at a couple of different components of this workspace:
```{r}
getSamples(ws) #see all samples
getSampleGroups(ws)$groupName #see all groups

#access xml keywords
sn <- "A01 MilliQBeads_2211_2200.fcs"
getKeywords(ws, sn)[8]  # <- get $VOL
```
For example, this tells me that exactly 100.015 ml were sampled by the Accuri Flow Cytometer according to it's sensors. You can have a look at the first five XML metadata using:
```{r}
getKeywords(ws, sn)[1:5]
```
## parse FCS files within Workspace
As of now, we have only been dealing with *metadata*, now let's get started with linking this metadata to our actual FACS sample data. To do this we have to parse the workspace and point it to the correct path that our ".fcs" files reside in.

The ``parseWorkspace`` function takes three inputs:

* ws = our FlowJo XML workspace

* name = group within workspace, get number from ```print(ws)```

* path = folder including all ".fsc" files referenced within your workspace

Here I parse the workspace for Name #4, which references Phaeocystis

```{r}
gs <- parseWorkspace(ws,name = 4, path = '20180219_163432/')
```

If all worked correctly you should now be able to access the raw data, and plot the gates on top of the data quite easily:

```{r}
getPopStats(gs) # <- these are the raw cell counts within the fcs files
```
#Plotting the data with gates
Although this entire framework can seem quite unintuitive at times, when it comes to plotting it is easy to produce some usable results. Here I plot the FCS data in the first sample, together with the gates, which were defined for FL3-A (fluoresence) vs FSC-A (forward scatter)
```{r}
plotGate(gs[[1]], c("AllPh-beads","normal cells","bigger cells","biggest cells"))
```

in addition to the ```plotGate``` function there is also the useful plotting function ```autoplot```:

```{r}
plot <- autoplot(gs[[1]], c("normal cells","bigger cells","biggest cells"), bins=100)+ 
            ggcyto_par_set(limits = "data")+
            xlim(c(0.6,1))
plot$`normal cells`
```

#Extracting raw counts from gated data

Make sure to have these other libraries installed:
```{r}
library(ggplot2)
library(cowplot) # stylesheet for ggplot2 -> makes the plot look much nicer
library(scales) # needed for time conversions later on
```

Now we can actually start working with the flow counts. The gating set ```gs``` contains flow cytometry data from Phaeocystis cultures, hence the Ph abbreviation.

We can extract the count data using the ```getPopStats``` function:
```{r}
PhPop <- getPopStats(gs)[]
```
And these raw counts are unfortunately a bit messy, so we will convert this into a data frame and clean it up a bit:
```{r}
PhCounts <- data.frame(PhPop)
PhCounts <- PhCounts[PhCounts$name != "G04 EXPIIPhx_2511_1740.fcs_1333",] #remove one measurement which was botched and repeated

#the following three lines clean up the data frame structure (in particular the row names) after deletion of one sample
PhCounts <- PhCounts[with(PhCounts, order(name)), ]
row.names(PhCounts) <- 1:nrow(PhCounts)
PhnumGate = length(levels((factor(PhCounts$Population))))
```
Now that we have extracted the raw data and put it into a usable data frame format, we can normalize all counts using the control bead counts, that were added to every sample. For this I take a subset of my samples (one experimental run), then calculate the mean bead concentration of the samples and divide all other counts by it. *(note: this might not be the best way to do it)*
```{r}
Ph0Counts <- PhCounts[1:(PhnumGate*6),] #this takes the subset (one experimental run)


meanBeadsPerML_Ph0 <- mean(Ph0Counts[Ph0Counts$Population == "beads",]$Count) * 10 #this gives the mean beads per ml
# now we have to normalize all values with the mean bead concentration:
Ph0Counts$NormCount <- Ph0Counts$Count / rep(Ph0Counts[Ph0Counts$Population == "beads",]$Count,each=PhnumGate) * meanBeadsPerML_Ph0
#and finally we can show the data.frame with the normalized counts:
Ph0Counts[,c(1,2,6)]
```
To this data, I will add the time of sampling, which i noted down in my labbook, and saved in a csv file:
```{r}
FloCounts <- read.csv('FloCountsRough.csv')

FloCounts$Time <- as.POSIXct(FloCounts$Time,format="%d.%m.%Y %H:%M:%S")
FloCounts$Time <- FloCounts$Time-1800000 #converting actual date into number of days
TimeVector <- FloCounts$Time[1:6]

Ph0Counts$Time <- rep(TimeVector,each=PhnumGate) #add time to all gates & counts, this is why the structure had to be corrected before
```
Now that we have normalized counts and time for one species over 6 days, we get to plotting it. To do this in ggplot2, we have to clean the data up some more and convert it into a long format (which is done by ```data.table::melt```).
```{r}
Ph0Counts_filt <- Ph0Counts[Ph0Counts$Population %in% c('normal cells','bigger cells','biggest cells'),]

Ph0Data <- data.table::melt(Ph0Counts_filt, id.vars = c("Population", "Time")
                           , measure.vars = c("NormCount")) 

Ph0Data$Population <- factor(Ph0Data$Population, levels = c("biggest cells","bigger cells","normal cells"))
```
Finally, this allows us to plot the normalized cell counts over time, overlayed with the size fraction as taken from Forward Scatter and Fluoresence data. (Which luckily here, does not show too big of a trend towards aggregation of cells)
```{r}
ggplot(data = Ph0Data, aes(x = Time, y = value, fill = Population)) +
  geom_area() + ggtitle('Ph0 FlowCount + Size Fraction')+
  scale_x_datetime(labels=date_format('%d'), date_breaks = '1 day')+ xlab('days') + ylab('cells/ml')
```

### Additionally:
The bioconductor OpenCyto Framework has much more functions and packages which i did not use.. for example there are extensive capabilites for calculating spillover matrices and normalizing fluoresence data (which often requires a careful experimental setup, including multiple normalization beads).

For now, I hope this serves as a quick overview of the capabilites, feel free to contact me for specific questions at
benjamin.post@leibniz-zmt.de