phenoScreen

phenoScreen is a collection of R functions I created for personal use for phenotypic screening studies, though the majority are applicable to any work with multi-well plates.

Link to Github page:
https://www.github.com/Swarchal/phenoScreen

The code for plotting multi-well plates in ggplot2 was adapted from Brian Connely’s blog.

Well labels

Sometimes lab equipment uses well number rather than the alpha-numeric labelling (eg. A01 - H12). To convert from numeric to more traditional well-identifiers in a row-wise order, use num_to_well:

numbers <- 1:96
num_to_well(numbers)

##  [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11"
## [12] "A12" "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10"
## [23] "B11" "B12" "C01" "C02" "C03" "C04" "C05" "C06" "C07" "C08" "C09"
## [34] "C10" "C11" "C12" "D01" "D02" "D03" "D04" "D05" "D06" "D07" "D08"
## [45] "D09" "D10" "D11" "D12" "E01" "E02" "E03" "E04" "E05" "E06" "E07"
## [56] "E08" "E09" "E10" "E11" "E12" "F01" "F02" "F03" "F04" "F05" "F06"
## [67] "F07" "F08" "F09" "F10" "F11" "F12" "G01" "G02" "G03" "G04" "G05"
## [78] "G06" "G07" "G08" "G09" "G10" "G11" "G12" "H01" "H02" "H03" "H04"
## [89] "H05" "H06" "H07" "H08" "H09" "H10" "H11" "H12"

And works with 384-well plates, as well as and partial plates:

numbers <- 320:384
num_to_well(numbers, plate = 384)

##  [1] "N08" "N09" "N10" "N11" "N12" "N13" "N14" "N15" "N16" "N17" "N18"
## [12] "N19" "N20" "N21" "N22" "N23" "N24" "O01" "O02" "O03" "O04" "O05"
## [23] "O06" "O07" "O08" "O09" "O10" "O11" "O12" "O13" "O14" "O15" "O16"
## [34] "O17" "O18" "O19" "O20" "O21" "O22" "O23" "O24" "P01" "P02" "P03"
## [45] "P04" "P05" "P06" "P07" "P08" "P09" "P10" "P11" "P12" "P13" "P14"
## [56] "P15" "P16" "P17" "P18" "P19" "P20" "P21" "P22" "P23" "P24"

The opposite function is well_to_num:

wells <- c("A01", "D08", "H12")
well_to_num(wells)

## [1]  1 44 96

Both num_to_well and well_to_num order the wells by starting each new row at the left hand side of the plate, i.e …A11, A12, B01, B02… However, some automated systems count wells in a way to maximise speed, and snake across the rows, i.e …A11, A12, B12, B11… To change to this behaviour use the style argument.

wells <- c("A11", "A12", "B01", "B02")
well_to_num(wells)

## [1] 11 12 13 14

well_to_num(wells, style = "snake")

## [1] 11 12 23 24

Annotating plates: Often experimenters produce a plate map, with data describing concentration or treatments in the same layout as their plate. However, most software outputs experimental data in the form of columns, for example:

well	value
A01	3
A02	8
…	…

The read_map function can annotate data such as this with values from a platemap (in the same spatial layout as the plate). The easiest way to do this is create the platemap as a .csv file in a spreadsheet programme, and then import this into R.

This will then annotate the correct well dependent on the position of the values in the platemap.

Plate data

Plate heatmaps

To produce a heatmap of numerical data in the layout of the plate, the function z_map (‘z’ for z-score) takes alpha-numerical well identifiers, and numerical values, nomalises and centers the numerical values, and plots them in a platelayout in the spatial context of the well identifiers.

# example data
wells <- num_to_well(1:96)
vals <- rnorm(96)
df96 <- data.frame(wells, vals)

z_map(data = df96$vals,
      well = df96$wells,
      title = "Example plot")

This also works with 384-well plates, as well as missing wells:

# example data
wells <- num_to_well(26:322, plate = 384)
vals <- rnorm(length(wells))
df384 <- data.frame(wells, vals)

z_map(data = df384$vals,
      well = df384$wells,
      plate = 384,
      title = "Example  partial 384-well plate")

The colours in the plot can be changed to any RColorBrewer palette:

z_map(data = df96$vals,
      well = df96$wells,
      palette = "RdGy",
      title = "Alternative colour palette")

Identifying ‘hits’

To highlight only wells above/below a certain number of standard deviations from the plate mean, use the function hit_map:

hit_map(data = df96$vals,
        well = df96$wells,
        title = "Hit map")

The default threshold for defining a hit is \(\pm\) 2 standard deviations, though this can be adjusted with the threshold argument:

hit_map(data = df96$vals,
        well = df96$well,
        threshold = 1,
        title = "Hit map: 1 standard deviation")

Spatial normalisation and Edge effects

To counter edge effects, as well as row/column artefacts, the B-score normalisation procedure can be used.

Here is a real-world example showing a clear spatial effect with total cell area decreasing from left to right across the plate.

z_map(data = df_inout$Cell..Total.Area_Average_FITC..SW.morphology.,
      well = df_inout$well,
      title = "Plate effects: Total cell area")

We can use the the B-score normalisation procedure with b_score to normalise the numerical values, or use b_map to plot the normalised values:

b_map(data = df_inout$Cell..Total.Area_Average_FITC..SW.morphology.,
      well = df_inout$well,
      title = "Plate effects: Countered with B-score")

## 1: 3084.597
## 2: 2990.15
## Final: 2964.185

It’s also possible to look at ‘hits’ that have been normalised with the B-score using the bhit_map function in the same way as hit_map.

Using multivariate data

These are some more bespoke functions for multivariate/high-content screening.

Summarising multiple features on a plate heatmap

To summarise multivariate data in a plate heatmap, I have used principal component analysis - specifically plotting a z-score of the first principal component (the linear combination of features that captures the largest proportion of the variance within the data).

# example data with 3 values per well
vals1 <- rnorm(96); vals2 <- rnorm(96); vals3 <- rnorm(96)
wells <- num_to_well(1:96)
df_mv <- data.frame(wells,
                    vals1,
                    vals2,
                    vals3)

pc_map(data = df_mv[, 2:4], # 3 columns of random numbers
       well = df_mv$wells,
       title = "Heatmap of first principal component ")

To identify wells that have a first principal component above/below a certain number of standard deviations away from the plate average (i.e hit_map of multivariate data) use the pchit_map function.

pchit_map(data = df_mv[, 2:4], # 3 columns of random numbers
          well = df_mv$wells,
          threshold = 1.5,
          title = "Hit map of multivariate data")

Bivariate confidence bands

In compound screens, a typical approach is to reduce a large number of variables into 2 dimensions using PCA. Then look for data points (compounds) that fall outside of the main cluster. For this I produce a bi-variate confidence band from a 2D kernal density estimation, and can quantify the confidence interval around the centre of the cluster.

The confidence band can be altered with the confidence argument, with the default set at 95% confidence.

# example data
x <- rnorm(5000)
y <- rnorm(5000)

par(mfrow = c(1,2))
plot_confidence(x, y, title = "Example confidence plot (95%)")

## numeric(0)

plot_confidence(x, y, title = "Example confidence plot (99%)", confidence = 0.99)

## numeric(0)

Quality control and Z’ factors

To test how robust a screening assay is, the separation between the postive and negative controls are calculated with a Z’ factor.

\[ Z = 1 - \dfrac{3(\sigma_p + \sigma_n)}{| \mu_p - \mu_n|} \]

Where the standard deviations (\(\sigma\)) and means (\(\mu\)) of the positive (\(x_p\)) and negative (\(x_n\)) are compared. An assay with a Z’ factor of above 0.5 is typically considered robust.

We can use the z_factor to calculate a Z’ factor between two treatments in a univariate setting, or z_factor_scan to calculate a Z’ factor for a number of variables, and chose to either return a specified number of variables with the highest Z’ factor or those above a cut-off value.

For this example I wil re-use the dataset from the plate effects, and compare the inner 60 wells vs the outer 36 wells, and return 10 variables with the highest Z-factor.

z_factor_scan(data = df_inout[,3:50], # numerical values + column of 'treatments'
              treatments = c("inner", "outer"),
              n = 10)

##                                                Feature  Z_factor
## 3    Features.Count_Sum_granules.Total..SW.morphology. -2.588910
## 4  Features.Count_Sum_granules.Total..SW.morphology..1 -2.588910
## 1      Features.Count_Sum_nuclei.Total..SW.morphology. -3.400522
## 2    Features.Count_Sum_nuclei.Total..SW.morphology..1 -3.400522
## 40           Cell..Area_Average_nuclei..SW.morphology. -5.806960
## 41         Cell..Area_Average_nuclei..SW.morphology..1 -5.806960
## 5        Features.Count_Sum_FITC.Total..SW.morphology. -8.432364
## 6      Features.Count_Sum_FITC.Total..SW.morphology..1 -8.432364
## 45                          Cell.Count..SW.morphology. -8.432364
## 47                        Cell.Count..SW.morphology..1 -8.432364

In this case, the Z’ factor is producing negative values, this means there is no clear separation between the two ‘treatments’, this is expected as this is just inner vs outer wells of untreated cells.

To plot the results, we use the plot to indicate we want produce a plot, and plotline to produce a vertical line indicating a cut-off value.

z_factor_scan(data = df_inout[,3:50], # numerical values + column of 'treatments'
              treatments = c("inner", "outer"),
              n = 10,
              plot = TRUE,
              plotline = 0,
              title = "Example Z factor plot")