phenoScreen
is a collection of R functions I created for personal use for phenotypic screening studies, though the majority are applicable to any work with multi-well plates.
Link to Github page:
https://www.github.com/Swarchal/phenoScreen
The code for plotting multi-well plates in ggplot2 was adapted from Brian Connely’s blog.
num_to_well
:numbers <- 1:96
num_to_well(numbers)
## [1] "A01" "A02" "A03" "A04" "A05" "A06" "A07" "A08" "A09" "A10" "A11"
## [12] "A12" "B01" "B02" "B03" "B04" "B05" "B06" "B07" "B08" "B09" "B10"
## [23] "B11" "B12" "C01" "C02" "C03" "C04" "C05" "C06" "C07" "C08" "C09"
## [34] "C10" "C11" "C12" "D01" "D02" "D03" "D04" "D05" "D06" "D07" "D08"
## [45] "D09" "D10" "D11" "D12" "E01" "E02" "E03" "E04" "E05" "E06" "E07"
## [56] "E08" "E09" "E10" "E11" "E12" "F01" "F02" "F03" "F04" "F05" "F06"
## [67] "F07" "F08" "F09" "F10" "F11" "F12" "G01" "G02" "G03" "G04" "G05"
## [78] "G06" "G07" "G08" "G09" "G10" "G11" "G12" "H01" "H02" "H03" "H04"
## [89] "H05" "H06" "H07" "H08" "H09" "H10" "H11" "H12"
And works with 384-well plates, as well as and partial plates:
numbers <- 320:384
num_to_well(numbers, plate = 384)
## [1] "N08" "N09" "N10" "N11" "N12" "N13" "N14" "N15" "N16" "N17" "N18"
## [12] "N19" "N20" "N21" "N22" "N23" "N24" "O01" "O02" "O03" "O04" "O05"
## [23] "O06" "O07" "O08" "O09" "O10" "O11" "O12" "O13" "O14" "O15" "O16"
## [34] "O17" "O18" "O19" "O20" "O21" "O22" "O23" "O24" "P01" "P02" "P03"
## [45] "P04" "P05" "P06" "P07" "P08" "P09" "P10" "P11" "P12" "P13" "P14"
## [56] "P15" "P16" "P17" "P18" "P19" "P20" "P21" "P22" "P23" "P24"
well_to_num
:wells <- c("A01", "D08", "H12")
well_to_num(wells)
## [1] 1 44 96
Both num_to_well
and well_to_num
order the wells by starting each new row at the left hand side of the plate, i.e …A11, A12, B01, B02… However, some automated systems count wells in a way to maximise speed, and snake across the rows, i.e …A11, A12, B12, B11… To change to this behaviour use the style
argument.
wells <- c("A11", "A12", "B01", "B02")
well_to_num(wells)
## [1] 11 12 13 14
well_to_num(wells, style = "snake")
## [1] 11 12 23 24
well | value |
---|---|
A01 | 3 |
A02 | 8 |
… | … |
The read_map
function can annotate data such as this with values from a platemap (in the same spatial layout as the plate). The easiest way to do this is create the platemap as a .csv file in a spreadsheet programme, and then import this into R.
This will then annotate the correct well dependent on the position of the values in the platemap.
To produce a heatmap of numerical data in the layout of the plate, the function z_map
(‘z’ for z-score) takes alpha-numerical well identifiers, and numerical values, nomalises and centers the numerical values, and plots them in a platelayout in the spatial context of the well identifiers.
# example data
wells <- num_to_well(1:96)
vals <- rnorm(96)
df96 <- data.frame(wells, vals)
z_map(data = df96$vals,
well = df96$wells,
title = "Example plot")
This also works with 384-well plates, as well as missing wells:
# example data
wells <- num_to_well(26:322, plate = 384)
vals <- rnorm(length(wells))
df384 <- data.frame(wells, vals)
z_map(data = df384$vals,
well = df384$wells,
plate = 384,
title = "Example partial 384-well plate")
The colours in the plot can be changed to any RColorBrewer palette:
z_map(data = df96$vals,
well = df96$wells,
palette = "RdGy",
title = "Alternative colour palette")
To highlight only wells above/below a certain number of standard deviations from the plate mean, use the function hit_map
:
hit_map(data = df96$vals,
well = df96$wells,
title = "Hit map")
The default threshold for defining a hit is \(\pm\) 2 standard deviations, though this can be adjusted with the threshold
argument:
hit_map(data = df96$vals,
well = df96$well,
threshold = 1,
title = "Hit map: 1 standard deviation")
To counter edge effects, as well as row/column artefacts, the B-score normalisation procedure can be used.
Here is a real-world example showing a clear spatial effect with total cell area decreasing from left to right across the plate.
z_map(data = df_inout$Cell..Total.Area_Average_FITC..SW.morphology.,
well = df_inout$well,
title = "Plate effects: Total cell area")
We can use the the B-score normalisation procedure with b_score
to normalise the numerical values, or use b_map
to plot the normalised values:
b_map(data = df_inout$Cell..Total.Area_Average_FITC..SW.morphology.,
well = df_inout$well,
title = "Plate effects: Countered with B-score")
## 1: 3084.597
## 2: 2990.15
## Final: 2964.185
It’s also possible to look at ‘hits’ that have been normalised with the B-score using the bhit_map
function in the same way as hit_map
.
These are some more bespoke functions for multivariate/high-content screening.
# example data with 3 values per well
vals1 <- rnorm(96); vals2 <- rnorm(96); vals3 <- rnorm(96)
wells <- num_to_well(1:96)
df_mv <- data.frame(wells,
vals1,
vals2,
vals3)
pc_map(data = df_mv[, 2:4], # 3 columns of random numbers
well = df_mv$wells,
title = "Heatmap of first principal component ")
hit_map
of multivariate data) use the pchit_map
function.pchit_map(data = df_mv[, 2:4], # 3 columns of random numbers
well = df_mv$wells,
threshold = 1.5,
title = "Hit map of multivariate data")
In compound screens, a typical approach is to reduce a large number of variables into 2 dimensions using PCA. Then look for data points (compounds) that fall outside of the main cluster. For this I produce a bi-variate confidence band from a 2D kernal density estimation, and can quantify the confidence interval around the centre of the cluster.
The confidence band can be altered with the confidence
argument, with the default set at 95% confidence.
# example data
x <- rnorm(5000)
y <- rnorm(5000)
par(mfrow = c(1,2))
plot_confidence(x, y, title = "Example confidence plot (95%)")
## numeric(0)
plot_confidence(x, y, title = "Example confidence plot (99%)", confidence = 0.99)
## numeric(0)
To test how robust a screening assay is, the separation between the postive and negative controls are calculated with a Z’ factor.
\[ Z = 1 - \dfrac{3(\sigma_p + \sigma_n)}{| \mu_p - \mu_n|} \]
Where the standard deviations (\(\sigma\)) and means (\(\mu\)) of the positive (\(x_p\)) and negative (\(x_n\)) are compared. An assay with a Z’ factor of above 0.5 is typically considered robust.
We can use the z_factor
to calculate a Z’ factor between two treatments in a univariate setting, or z_factor_scan
to calculate a Z’ factor for a number of variables, and chose to either return a specified number of variables with the highest Z’ factor or those above a cut-off value.
For this example I wil re-use the dataset from the plate effects, and compare the inner 60 wells vs the outer 36 wells, and return 10 variables with the highest Z-factor.
z_factor_scan(data = df_inout[,3:50], # numerical values + column of 'treatments'
treatments = c("inner", "outer"),
n = 10)
## Feature Z_factor
## 3 Features.Count_Sum_granules.Total..SW.morphology. -2.588910
## 4 Features.Count_Sum_granules.Total..SW.morphology..1 -2.588910
## 1 Features.Count_Sum_nuclei.Total..SW.morphology. -3.400522
## 2 Features.Count_Sum_nuclei.Total..SW.morphology..1 -3.400522
## 40 Cell..Area_Average_nuclei..SW.morphology. -5.806960
## 41 Cell..Area_Average_nuclei..SW.morphology..1 -5.806960
## 5 Features.Count_Sum_FITC.Total..SW.morphology. -8.432364
## 6 Features.Count_Sum_FITC.Total..SW.morphology..1 -8.432364
## 45 Cell.Count..SW.morphology. -8.432364
## 47 Cell.Count..SW.morphology..1 -8.432364
In this case, the Z’ factor is producing negative values, this means there is no clear separation between the two ‘treatments’, this is expected as this is just inner vs outer wells of untreated cells.
To plot the results, we use the plot
to indicate we want produce a plot, and plotline
to produce a vertical line indicating a cut-off value.
z_factor_scan(data = df_inout[,3:50], # numerical values + column of 'treatments'
treatments = c("inner", "outer"),
n = 10,
plot = TRUE,
plotline = 0,
title = "Example Z factor plot")