Hypothetical screen Z Prime score calculator from dose-response data [R]

[BACKGROUND]

The Z-Prime (or Z’) score is a common statistical measure of an assay's signal-to-noise (S/N) ratio, used to judge the dynamic range (DR), or quality, of the assay. It compares the difference between the mean signals of a positive control and a negative control with the precision (standard deviation) of each control. The maximum Z’ score is 1, reached only when both controls have zero variability, and there is no lower bound: Z’ falls toward -infinity as the two control means converge.
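For reference, the Z’ formula implemented by the ZPrime() function in Part 1 can be written as below, where μ and σ are the mean and standard deviation of the positive (p) and negative (n) control readouts. The worked numbers are hypothetical, chosen only to show the arithmetic.

```latex
Z' = 1 - \frac{3\,(\sigma_p + \sigma_n)}{\lvert \mu_p - \mu_n \rvert}

% Hypothetical example: \mu_p = 100,\ \sigma_p = 5,\ \mu_n = 20,\ \sigma_n = 5
Z' = 1 - \frac{3\,(5 + 5)}{\lvert 100 - 20 \rvert} = 1 - \frac{30}{80} = 0.625
```

Because standard deviations cannot be negative, the fraction is never below 0, so Z’ cannot exceed 1; as the mean difference shrinks toward 0 with nonzero spread, the fraction grows without bound and Z’ heads toward -infinity.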

0.5 ≤ Z’ ≤ 1 is considered indicative of a robust dynamic range, with excellent separation between controls
0.2 ≤ Z’ < 0.5 is considered a sub-optimal DR, but may be workable in certain circumstances
Z’ < 0.2 is a red flag that the DR is too low and the assay should NOT be used with the current parameters; it needs additional optimization to boost the S/N ratio
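To apply these bands programmatically, a minimal sketch using base R's cut() could look like the following. classify_zprime() is a hypothetical helper name, and the handling of the boundary values (0.2 and 0.5 counted in the higher band) is an assumption.

```r
# Hypothetical helper: label Z' scores using the quality bands listed above.
# Assumption: 0.2 and 0.5 belong to the higher band, i.e. [0.2, 0.5) and [0.5, 1].
classify_zprime <- function(z) {
  cut(z,
      breaks = c(-Inf, 0.2, 0.5, 1),
      labels = c("fail", "sub-optimal", "robust"),
      right = FALSE,          # intervals closed on the left: [-Inf, 0.2), [0.2, 0.5), ...
      include.lowest = TRUE)  # with right = FALSE this closes the top interval, so Z' = 1 is "robust"
}

classify_zprime(c(-Inf, 0.07, 0.34, 0.65, 1))
```

The result is a factor, so it can be added straight to a results table as an extra column.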

[APPLICATION]

Often what we do in pharmacology is a “small molecule screen”. This is a fancy science-y way of saying “try a bunch of drugs in some model of disease, and see if anything works to treat the disease”. Choosing the screening dose of the disease-inducing material is an essential optimization step in developing a robust screen. The dose must induce enough of the disease phenotype to reliably separate treated wells from non-disease negative controls (Z’ ≥ 0.5), but not so much that the phenotype becomes functionally irreversible by the inhibitor library being screened. Therefore, the lowest dose that yields Z’ ≥ 0.5 should be taken as the screening dose.

[PART1] Getting Ready

First, we load the dplyr package for data wrangling and the ggplot2 package for plotting, both part of the popular tidyverse collection (if they are not installed yet, run install.packages(c("dplyr", "ggplot2")) once).

library(dplyr)
library(ggplot2)

We also need to write a function for easy calculation of Z Prime values.

ZPrime <- function(a, b) {
  #Z' = 1 - 3 * (sd of group a + sd of group b) / |mean of a - mean of b|
  ZP <- 1 - 3 * (sd(a) + sd(b)) / abs(mean(a) - mean(b))
  return(ZP)
}
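A quick sanity check of ZPrime() on small, hypothetical control vectors (invented numbers, not from the dataset) shows its behaviour at both ends of the scale. The function is repeated so the snippet runs on its own.

```r
# Same ZPrime() definition as above, repeated so this snippet is self-contained
ZPrime <- function(a, b) {
  1 - 3 * (sd(a) + sd(b)) / abs(mean(a) - mean(b))
}

# Hypothetical control readouts, for illustration only
pos <- c(98, 100, 102)   # mean 100, sd 2
neg <- c(19, 20, 21)     # mean 20,  sd 1

ZPrime(pos, neg)         # 1 - 3 * (2 + 1) / 80 = 0.8875
ZPrime(neg, pos)         # same value: abs() makes the argument order irrelevant
ZPrime(neg, neg)         # identical groups: zero mean difference gives -Inf
```

The last case is presumably why the 0 nM row of the dose table in Part 2 reads -Inf: at 0 nM the “positive” wells appear to be the same untreated wells used as the negative control, so the mean difference is zero.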

Finally, let’s load the data into R.

ZPrime_Screen_Example...Data <- read.csv("~/Documents/R - Data/ZPrime_Screen_Example - Data.csv")
#Take a peek at the first 6 rows of data (head() returns 6 rows by default)
print(head(ZPrime_Screen_Example...Data))

##   Row Column NucCnt AggObj_Num AggObj_IntSum AggObj_Area Cell_Area NumFields Concentration Weeks
## 1   B      2   1593      18612      33700000       44231   1476947        30             0     4
## 2   C      2   1518      18835      35100000       42458   1542905        30             0     4
## 3   D      2   1494      30457      56300000       68089   1468619        30             0     4
## 4   E      2   1504      21684      43300000       42796   1544650        30             0     4
## 5   F      2   1526      20675      41000000       38545   1603921        30             0     4
## 6   G      2   1826      22082      43600000       45807   1392844        30             0     4

We have two different time points included here: one study endpoint at 4 weeks, and another at 7 weeks. Let’s take a closer look at the shorter time course, 4 weeks, to determine if the disease phenotype at this time is robust enough to screen with.

DR_4wk <- filter(ZPrime_Screen_Example...Data, Weeks == 4)

#Designate the negative control: column 2, 0 nM, no disease-inducing treatment
NegControl <- filter(DR_4wk, Column == 2)

[PART2] Automated Z Prime score and minimum screen dose calculation

#Parse data for a list of all disease-inducing concentrations tested
concentrations <- unique(DR_4wk$Concentration)
#Pre-allocate a numeric vector to hold one Z Prime score per concentration
ZP_scores <- numeric(length(concentrations))

for (i in seq_along(concentrations)) {
  #Treat each dose as a hypothetical screen dose (positive control)
  PosControl <- filter(DR_4wk, Concentration == concentrations[i])
  #Score aggregate integrated intensity against the untreated negative control
  ZP_scores[i] <- ZPrime(PosControl$AggObj_IntSum, NegControl$AggObj_IntSum)
}

#create a data frame of the Z prime score calculations for each hypothetical screen dose
ZPrime_table <- data.frame('Dose_nM' = concentrations,
                           'Z_Prime' = as.numeric(ZP_scores))
print(ZPrime_table[order(ZPrime_table$Dose_nM),])

##     Dose_nM    Z_Prime
## 1  0.000000       -Inf
## 2  0.015625 -3.0989839
## 3  0.031250 -0.9400752
## 4  0.062500  0.0739885
## 5  0.125000  0.3430446
## 6  0.250000  0.6491655
## 7  0.500000  0.7214076
## 8  1.000000  0.7831168
## 9  2.000000  0.6691486
## 10 4.000000  0.6908844
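The loop above works, but the same per-dose scores can also be computed without an explicit loop. A minimal base-R sketch, assuming a hypothetical toy data frame `toy` with the same `Column`, `Concentration` and `AggObj_IntSum` columns as `DR_4wk` (all values invented for illustration):

```r
# Hypothetical toy data with the same column names as DR_4wk (values invented)
toy <- data.frame(
  Column        = c(2, 2, 2, 3, 3, 3, 4, 4, 4),
  Concentration = c(0, 0, 0, 0.5, 0.5, 0.5, 1, 1, 1),
  AggObj_IntSum = c(19, 20, 21, 58, 60, 62, 98, 100, 102)
)

# Same Z' formula as ZPrime() above
ZPrime <- function(a, b) 1 - 3 * (sd(a) + sd(b)) / abs(mean(a) - mean(b))

neg <- toy$AggObj_IntSum[toy$Column == 2]   # negative control wells

# split() groups the readout by dose; vapply() scores each group against neg
by_dose <- split(toy$AggObj_IntSum, toy$Concentration)
zp <- vapply(by_dose, ZPrime, numeric(1), b = neg)

data.frame(Dose_nM = as.numeric(names(zp)), Z_Prime = unname(zp))
```

vapply() is used rather than sapply() because it enforces a single numeric result per dose, and split() returns the groups ordered by dose.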

We can also quickly find the lowest dose of disease-inducing material that yields a strong phenotype (Z’ > 0.5) vs. the untreated negative control as follows:

#Get the lowest screen dose whose Z Prime clears the 0.5 threshold for a robust screen
#(taking the min of the qualifying doses does not depend on the table's row order)
print(min(ZPrime_table$Dose_nM[ZPrime_table$Z_Prime > 0.5]))

## [1] 0.25
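Taking the minimum of the qualifying doses directly does not depend on row order, but if no dose clears the threshold, min() over an empty vector returns Inf with a warning. A small wrapper that returns NA instead is sketched below; min_screen_dose() is a hypothetical name, and the input values are copied from the 4 week table above.

```r
# Hypothetical helper: lowest dose whose Z' clears a threshold, or NA if none does
min_screen_dose <- function(doses, z, threshold = 0.5) {
  ok <- !is.na(z) & z > threshold   # same strict "> 0.5" test used above
  if (!any(ok)) return(NA_real_)
  min(doses[ok])
}

# Values copied from the 4 week Z Prime table, deliberately out of dose order
doses <- c(4, 0.25, 0.125, 1, 0.0625)
zp    <- c(0.6908844, 0.6491655, 0.3430446, 0.7831168, 0.0739885)

min_screen_dose(doses, zp)        # 0.25
min_screen_dose(doses, zp, 0.9)   # NA: no dose reaches Z' > 0.9
```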

[PART3] Plotting Data

Finally, let’s generate a visual summary of our findings.

P <- ggplot(data = ZPrime_table,
            aes(x = Dose_nM, y = Z_Prime, color = Z_Prime)) +
  #Dashed loess trend line through the Z Prime scores (no confidence band)
  geom_smooth(se = FALSE, method = "loess", formula = y ~ x,
              colour = "black", linetype = "dashed", linewidth = 0.5) +
  geom_point(size = 3) +
  #Robust-screen threshold at Z' = 0.5
  geom_hline(yintercept = 0.5, color = "green", linetype = "dashed") +
  #Colour points from red (Z' <= 0) to green (Z' >= 0.5)
  scale_colour_gradientn(colours = c('red', 'green'), limits = c(0, 0.5), oob = scales::squish) +
  #Log-scale dose axis with log tick marks along the bottom
  scale_x_log10(breaks = scales::log_breaks(n = 10)) +
  annotation_logticks(sides = "b") +
  #Restrict the y-axis to -0.5..1; finite scores below -0.5 are removed
  ylim(-0.5, 1) +
  labs(x = '[Dose] nM', y = 'Z Prime', color = 'Z Prime')
P

Here, we can see Z Prime climbs steeply with log dose and then levels off at roughly 0.65 to 0.78 from 0.25 nM upward, so there are diminishing returns to increasing the screen dose beyond that point. We can therefore run the screen at the shorter 4 week time course: treating cells with disease-inducing material at roughly 0.2 nM or higher gives a robust Z Prime > 0.5 (where the dashed geom_smooth trend line crosses the dashed green horizontal line at Z’ = 0.5), and 0.25 nM is the lowest tested dose that clears this threshold.
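The ~0.2 nM figure is read off where the loess trend line meets the dashed threshold. To put a number on the crossing without reading the plot, one option is linear interpolation on log10(dose) between the two tested doses that bracket Z’ = 0.5. This is a swapped-in technique, not the loess fit used in the plot, so the two estimates can differ somewhat; crossing_dose() is a hypothetical helper name.

```r
# Hypothetical helper: estimate the dose where Z' first reaches a threshold by
# linear interpolation on log10(dose) between the two bracketing tested doses.
crossing_dose <- function(doses, z, threshold = 0.5) {
  ord <- order(doses)
  d <- doses[ord]
  z <- z[ord]
  keep <- d > 0 & is.finite(z)         # log10() needs positive doses; drop -Inf scores
  d <- d[keep]
  z <- z[keep]
  above <- which(z >= threshold)
  if (length(above) == 0) return(NA_real_)
  j <- above[1]                        # first tested dose at or above the threshold
  if (j == 1) return(d[1])             # already above threshold at the lowest dose
  frac <- (threshold - z[j - 1]) / (z[j] - z[j - 1])
  10^(log10(d[j - 1]) + frac * (log10(d[j]) - log10(d[j - 1])))
}

# 4 week Z Prime table from above
doses <- c(0, 0.015625, 0.03125, 0.0625, 0.125, 0.25, 0.5, 1, 2, 4)
zp    <- c(-Inf, -3.0989839, -0.9400752, 0.0739885, 0.3430446,
           0.6491655, 0.7214076, 0.7831168, 0.6691486, 0.6908844)

crossing_dose(doses, zp)   # about 0.178 nM, between the 0.125 and 0.25 nM doses
```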

[NOTE]

During assay development, Z’ is calculated from large sample sizes in order to best approximate the true, population Z’ value of the assay; here we used n=6 per hypothetical dose. While the screen is ‘active’, the data from each run are added to the Z’ calculation, giving an ever more precise estimate of the ‘real’ S/N ratio and DR of the assay. But what if some individual assay ‘runs’ fail? Z’ can also be calculated from smaller individual runs (separate microplates, separate batches, etc.) to measure the quality of individual runs within an ongoing screen. If the global average Z’ for an ongoing screen is > 0.5, but one week three microplates from a bad neuron culture prep are run with an average Z’ = 0.11, we know something was ‘off’ that week and will usually opt to repeat that batch. Therefore, the usual Z’ workflow goes:

Check the batch, run, or plate Z’ score
IF Z’ < 0.5: repeat run
ELSE: add run stats to the global Z’ average; proceed with data processing
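The plate-level check described in this note can be sketched in R as follows. qc_plates() and the plates data frame are hypothetical (the signal values are invented, with plate P3 standing in for a bad culture prep), and averaging the accepted per-plate Z’ values as the “global” score is an assumption about how runs are pooled.

```r
# Same Z' formula as ZPrime() above
ZPrime <- function(a, b) 1 - 3 * (sd(a) + sd(b)) / abs(mean(a) - mean(b))

# Invented example: 3 plates, each with 4 positive and 4 negative control wells
plates <- data.frame(
  Plate   = rep(c("P1", "P2", "P3"), each = 8),
  Control = rep(rep(c("pos", "neg"), each = 4), times = 3),
  Signal  = c(100, 102, 98, 101,  20, 21, 19, 20,   # P1: tight controls
              99, 101, 100, 100,  21, 20, 19, 20,   # P2: tight controls
              70, 40, 90, 55,     35, 20, 10, 30)   # P3: noisy "bad prep" plate
)

# Hypothetical per-plate QC: compute Z' per plate, flag plates below the cut-off
qc_plates <- function(df, cutoff = 0.5) {
  ids <- unique(df$Plate)
  z <- vapply(ids, function(p) {
    d <- df[df$Plate == p, ]
    ZPrime(d$Signal[d$Control == "pos"], d$Signal[d$Control == "neg"])
  }, numeric(1))
  data.frame(Plate = ids, Z_Prime = unname(z),
             Action = ifelse(z < cutoff, "repeat run", "accept"))
}

qc <- qc_plates(plates)
qc
# Running (global) Z' average over the accepted plates only
mean(qc$Z_Prime[qc$Action == "accept"])
```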