[BACKGROUND]
The Z-Prime (or Z’) score is a common statistical measure of signal-to-noise (S/N) ratio used to determine the dynamic range, or quality, of an assay. It combines the separation between the means of a positive control and a negative control with the precision (standard deviation) of each control: Z’ = 1 - 3*(SD_pos + SD_neg) / |mean_pos - mean_neg|. The maximum Z’ score is 1, and the minimum is negative infinity.
0.5 < Z’ < 1 is considered indicative of a robust dynamic range, with excellent separation between controls
0.2 < Z’ < 0.5 is considered a sub-optimal DR, but may be workable in certain circumstances
Z’ < 0.2 is a red flag that the DR is too low and the assay should NOT be used with the current parameters - it needs additional optimization to boost the S/N ratio
[APPLICATION]
Often what we do in pharmacology is a “small molecule screen”. This is a fancy, science-y way of saying “try a bunch of drugs in some model of disease, and see if anything works to treat the disease”. Screening dose is an essential optimization problem in the development of a robust screen. It’s important to select a screening dose that induces enough of the disease phenotype to reliably differentiate it from negative, non-disease controls (Z’ >= 0.5), but not so much that the phenotype is functionally irreversible by the inhibitor library being screened. Therefore, the lowest dose that yields Z’ >= 0.5 should be taken as the screening dose.
[PART1] Getting Ready
First, we will load the dplyr package for data wrangling and the ggplot2 package for plotting, both part of the popular tidyverse collection.
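If either package is not already installed, both can be installed from CRAN first (a one-time step):
install.packages(c("dplyr", "ggplot2"))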
library(dplyr)
library(ggplot2)
We also need to write a small helper function for easy calculation of Z Prime values.
ZPrime <- function(a, b) {
  #a, b: numeric vectors of replicate readouts for the two controls
  ZP <- (1 - 3*(sd(a) + sd(b))/abs(mean(a) - mean(b)))
  return(ZP)
}
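As a quick sanity check with made-up toy numbers (not screen data), tight, well-separated controls should give a score near 1, while noisy, overlapping controls should drive the score negative:
set.seed(1) #for reproducible toy numbers
ZPrime(rnorm(6, mean = 100, sd = 2), rnorm(6, mean = 10, sd = 2))   #typically close to 1
ZPrime(rnorm(6, mean = 100, sd = 20), rnorm(6, mean = 90, sd = 20)) #typically well below 0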
Finally, let’s load the desired data into R
ZPrime_Screen_Example...Data <- read.csv("~/Documents/R - Data/ZPrime_Screen_Example - Data.csv")
#Take a peek at the first few rows of data
print(head(ZPrime_Screen_Example...Data))
## Row Column NucCnt AggObj_Num AggObj_IntSum AggObj_Area Cell_Area NumFields
## 1 B 2 1593 18612 33700000 44231 1476947 30
## 2 C 2 1518 18835 35100000 42458 1542905 30
## 3 D 2 1494 30457 56300000 68089 1468619 30
## 4 E 2 1504 21684 43300000 42796 1544650 30
## 5 F 2 1526 20675 41000000 38545 1603921 30
## 6 G 2 1826 22082 43600000 45807 1392844 30
## Concentration Weeks
## 1 0 4
## 2 0 4
## 3 0 4
## 4 0 4
## 5 0 4
## 6 0 4
We have two different time points included here: one study endpoint at 4 weeks, and another at 7 weeks. Let’s take a closer look at the shorter time course, 4 weeks, to determine if the disease phenotype at this time is robust enough to screen with.
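One quick way to confirm which time points are present, and how many wells each contributes, is a simple count of the Weeks column:
table(ZPrime_Screen_Example...Data$Weeks)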
DR_4wk <- filter(ZPrime_Screen_Example...Data, Weeks == 4)
#Designate negative control values - [0] nM, no disease treatment, in column 2
NegControl <- filter(DR_4wk, Column == '2')
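Before scoring anything, it can also be worth eyeballing the spread of the negative-control readout (AggObj_IntSum, the readout used for all of the Z’ calculations below):
summary(NegControl$AggObj_IntSum)
sd(NegControl$AggObj_IntSum)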
[PART2] Automated Z Prime score and minimum screen dose calculation
#extract all unique disease-inducing concentrations tested
concentrations <- unique(DR_4wk$Concentration)
#Create an empty list to collect a Z Prime score for each concentration
ZP_scores <- vector(mode = "list", length = length(concentrations))
for (i in seq_along(concentrations)) {
  #take each dose as a hypothetical screen dose (the positive control)
  PosControl <- filter(DR_4wk, Concentration == concentrations[i])
  ZP_score <- ZPrime(PosControl$AggObj_IntSum, NegControl$AggObj_IntSum)
  ZP_scores[i] <- ZP_score
}
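For reference, the same calculation can also be written without an explicit loop. This sketch with sapply() is equivalent to the loop above, except that it returns a plain numeric vector rather than a list:
ZP_scores <- sapply(concentrations, function(conc) {
  PosControl <- filter(DR_4wk, Concentration == conc)
  ZPrime(PosControl$AggObj_IntSum, NegControl$AggObj_IntSum)
})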
#create a data frame of the Z prime score calculations for each hypothetical screen dose
ZPrime_table <- data.frame('Dose_nM' = concentrations,
'Z_Prime' = as.numeric(ZP_scores))
print(ZPrime_table[order(ZPrime_table$Dose_nM),])
## Dose_nM Z_Prime
## 1 0.000000 -Inf
## 2 0.015625 -3.0989839
## 3 0.031250 -0.9400752
## 4 0.062500 0.0739885
## 5 0.125000 0.3430446
## 6 0.250000 0.6491655
## 7 0.500000 0.7214076
## 8 1.000000 0.7831168
## 9 2.000000 0.6691486
## 10 4.000000 0.6908844
We can also quickly find the lowest dose of disease-inducing material that yields a strong phenotype (Z’ > 0.5) vs. the untreated negative control as follows:
#get min screen dose that gives Z Prime > 0.5, the threshold for a robust screen
print(ZPrime_table[min(which(ZPrime_table[,2] > 0.5)), 1])
## [1] 0.25
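Since dplyr is already loaded, the same lookup can be written a little more readably; this is just an equivalent restatement of the line above, and the output column name is only a label:
ZPrime_table %>%
  filter(Z_Prime > 0.5) %>%
  summarise(Min_Screen_Dose_nM = min(Dose_nM))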
[PART3] Plotting Data
Finally, let’s generate a visual summary of our findings
P <- ggplot(data = ZPrime_table,
            aes(x = Dose_nM, y = Z_Prime, color = Z_Prime)) +
geom_smooth(se= FALSE, method = NULL, colour="black", linetype = "dashed", size=0.5)+
geom_point(size = 3)+
geom_hline(yintercept = 0.5, color="green", linetype = "dashed")+
scale_colour_gradientn(colours=c('red','green'), limits=c(0,0.5), oob = scales::squish)+
xlab('[Dose] nM')+
scale_x_log10(breaks = scales::log_breaks(n = 10)) +
annotation_logticks(sides = "b")+
ylab('Z Prime')+
ylim(-0.5,1)+
labs(color='Z Prime')
P
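If you want to keep a copy of the figure alongside the analysis, it can be written to disk with ggsave(); the file name and dimensions below are arbitrary example values:
ggsave("ZPrime_4wk_screen_summary.png", plot = P, width = 6, height = 4, dpi = 300)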
Here, we can see that Z Prime is roughly logarithmic in dose: there are diminishing returns to increasing the screen dose too much. We can run the screen at the shorter 4-week timecourse, since treating cells with disease-inducing material at roughly [0.2] nM or higher gives a robust Z Prime > 0.5 (the intersection of the geom_smooth line of best fit with the dashed green horizontal line at Z’ = 0.5); 0.25 nM is the lowest tested dose that clears the threshold.
[NOTE]
During assay development, Z’ is calculated from large sample sizes in order to best approximate the real, population Z’ value of the assay; here we used n = 6 wells per hypothetical dose. While the assay is ‘active’, each run is subsequently added to the Z’ calculation, generating an ever-more precise estimate of the ‘real’ S/N ratio / DR of the assay. But what if, for some reason, some of the individual assay ‘runs’ fail? Z’ can also be calculated from smaller individual runs performed on separate microplates, in separate batches, etc., to measure the quality of individual screening assay runs within an ongoing screen. If the global average Z’ score for an ongoing screen is > 0.5, but one week three microplates from a bad neuron culture prep are run with an average Z’ = 0.11, we know something was ‘off’ that week and will usually opt to repeat that batch run. Therefore the usual Z’ workflow goes: