Dose-response curve fitting for EC50 calculation [R]

[BACKGROUND]

Biological activity can be induced by adding some concentration of compound. One of the core tenants of pharmacology is that this relationship between [Compound], or agonist, and response, follows a semi-log sigmoidal curve described by the Hill equation. This means at first, biological response to the dose (DR) is slow. Then as more compound is added, it becomes roughly linear. FInally, the response is fully saturated and plateaus around some value.

Excellent background resource: https://www.graphpad.com/guides/prism/latest/curve-fitting/reg_the_ec50.htm

To describe the curve, 3 key parameters are estimated.

First, the [compound] that evokes 50% of the response is called the EC50, which is a common measure of how potent a compound is. Lower EC50 = high potency!
Second, the curve top, or maximal saturating biological response that can be induced by the compound is a common measure of the compound’s efficacy. Here, higher max = greater efficacy.
Finally, if we allow the slope of the linear region near EC50 to vary, the slope through the EC50, called the hill slope, can be taken as an estimate of how fast the dose-response occurs. Higher hill slope = smaller [compound] range from the bottom, EC0, to the top, EC100.

The drc library in R works fairly well with R base plots to generate automated dose-response curves. dplyr will help us parse microplate result files (csv results per well in a 96 wp)

Loading data (in this case, the csv results file consists of results from 2 DR curves across two plates- one per plate)

DR_Example...Data <- read.csv("~/Documents/R - Data/ZPrime_Screen_Example - Data.csv")
#Take a peek at 1st 10 rows of data
print(head(DR_Example...Data),10)

##   Row Column NucCnt AggObj_Num AggObj_IntSum AggObj_Area Cell_Area NumFields
## 1   B      2   1593      18612      33700000       44231   1476947        30
## 2   C      2   1518      18835      35100000       42458   1542905        30
## 3   D      2   1494      30457      56300000       68089   1468619        30
## 4   E      2   1504      21684      43300000       42796   1544650        30
## 5   F      2   1526      20675      41000000       38545   1603921        30
## 6   G      2   1826      22082      43600000       45807   1392844        30
##   Concentration Weeks
## 1             0     4
## 2             0     4
## 3             0     4
## 4             0     4
## 5             0     4
## 6             0     4

nrow(DR_Example...Data)

## [1] 120

We can see the data frame has 120 rows in total.

As a quality control pre-processing step, we will check for wells that are outliers in terms of cell growth/health and remove them from the results.

Well_out<- boxplot.stats(DR_Example...Data$Cell_Area)$out 
print(Well_out)

## [1] 974327

One outlier was identified, with a cell area of 9.7E+5

print(median(DR_Example...Data$Cell_Area))

## [1] 1438254

This outlier has about half the cell area compared to the median value accross the two plates 14E+5.

Let’s grab the row index and use it to update the data frame, removing this outlier.

out_ind <- which(DR_Example...Data$Cell_Area %in% c(Well_out)) 
print(out_ind)

## [1] 7

DR_Example_clean <- DR_Example...Data[-c(out_ind),]

nrow(DR_Example_clean)

## [1] 119

Sanity check: the cleaned data frame has 119 rows.

We will specify which row had no compound treatment a sour negatve control, and subtract out this non-specific signal.

NegControl = filter(DR_Example_clean, Column=='2')

Next, parse the csv file for # of plates. In this case, ‘weeks’ is an indicator of which plate is which.

plates = (unique(DR_Example_clean['Weeks']))
weeks = unique((DR_Example_clean['Weeks']))
print(weeks)

##    Weeks
## 1      4
## 61     7

plate_cnt= length(unique((DR_Example_clean['Weeks'])[[1]]))
print(plate_cnt)

## [1] 2

To subtract out the nonspecific backgrouynd signal from the concnetration = wells, individually for each plate:

Parsing for unique curves (timepoints!)

curves = unique((DR_Example_clean['Weeks'])) #List of compounds tested
curve_cnt = length(unique(DR_Example_clean[['Weeks']])) #Length = # of compounds tested

Finally, let’s fit the data to DR curves!

#DR curve fitting 
DR.m <- drm(AggObj_IntSum ~ Concentration, 
            data= DF_norm,
            Weeks, #fit separate curves for each timepoint
            robust = 'mean', #non-robust least squares estimation ("mean")
            fct = LL.4(names = c("Hill slope", "Min", "Max", "EC50")))
print(DR.m)

## 
## A 'drc' model.
## 
## Call:
## drm(formula = AggObj_IntSum ~ Concentration, curveid = Weeks,     data = DF_norm, fct = LL.4(names = c("Hill slope", "Min",         "Max", "EC50")), robust = "mean")
## 
## Coefficients:
## Hill slope:4  Hill slope:7         Min:4         Min:7         Max:4  
##   -7.716e-01    -5.803e-01     2.489e+07    -2.996e+07     3.010e+09  
##        Max:7        EC50:4        EC50:7  
##    5.174e+09     6.257e+00     2.777e+00

drc makes it a breeze to quickly check best-fit parameters of interest between curves

compParm(DR.m, "EC50", "-")

## 
## Comparison of parameter 'EC50' 
## 
##     Estimate Std. Error t-value p-value
## 4-7   3.4804     7.0259  0.4954  0.6213

compParm(DR.m, "Max", "-")

## 
## Comparison of parameter 'Max' 
## 
##        Estimate  Std. Error t-value p-value
## 4-7 -2164062554  1562488231  -1.385  0.1688

summary <- summary(DR.m)[[3]] #Coeffs are stored as 3rd element in list obj

here, we can see the 7-week timecourse had a lower EC50 and higher max compared to the 4 week timecourse- although the results are n.s. statistically using t tests.

We can also take a peek at the confidence intervals of best-fit parameters

confint(DR.m, level = 0.90)

##                        5 %          95 %
## Hill slope:4 -1.009631e+00 -5.335364e-01
## Hill slope:7 -6.441313e-01 -5.164970e-01
## Min:4        -2.061536e+07  7.040501e+07
## Min:7        -8.427547e+07  2.436285e+07
## Max:4         6.000949e+08  5.419623e+09
## Max:7         4.220024e+09  6.127819e+09
## EC50:4       -5.272151e+00  1.778644e+01
## EC50:7        1.077937e+00  4.475646e+00

Hmm, it looks like the 90% CI is quite wide for EC50. This is quite typical of curves where [compound] was not high enough to characterize the top of the curve.

Finally, let’s plot the curves side- by side

cols <- brewer.pal(4,'Set2') #from RColorBrewer

par(mfrow=c(1,2)) #(n row, n col) 

for (i in 1:curve_cnt){ #note curves list index starts at 1
  DR = filter(DF_norm, Weeks==weeks[i,1]) 
  DR.mi <- drm(AggObj_IntSum ~ Concentration, 
              data= DR,
              robust = 'mean', #non-robust least squares estimation ("mean")
              fct = LL.4(names = c("Hill slope", "Min", "Max", "EC50")))
  plot(DR.mi,
       type= "all",
       col=cols[i], #auto color selection for curves
       pch=16,
       lwd=1,
       cex.axis=0.8,
       xlim = c(1e-4, 20),
       ylim = c(0, 5E+09),
       legend = TRUE,
       legendText = (paste(toString(weeks[i,1]), 'weeks')), 
       xlab= "[Compound] (nM)",
       ylab = "Intensity Sum")
  plot(DR.mi,
       col = cols[i],
       xlim = c(1e-4, 20),
       ylim = c(0, 5E+09),
       add=TRUE,
       type='confidence')}

As suspected, the curves does not reach a steady plateau. Therefore we can’t quantitatively compare best-fit curve EC50 and top values. However, there is a clear trend- the longer timecourse (7 weeks) has a stronger response in terms of both potency of compound (smaller EC50) and higher max (non-existent plateau). This can be easily visualized if we plot the suves on top of each other this time.

par(mfrow=c(1,1)) #(n row, n col) 

for (i in 1:curve_cnt){ #note curves list index starts at 1
  DR = filter(DF_norm, Weeks==weeks[i,1]) 
  DR.mi <- drm(AggObj_IntSum ~ Concentration, 
              data= DR,
              robust = 'mean', #non-robust least squares estimation ("mean")
              fct = LL.4(names = c("Hill slope", "Min", "Max", "EC50")))
  plot(DR.mi,
       type= "all",
       col=cols[i], #auto color selection for curves
       pch=16,
       lwd=1,
       cex.axis=0.8,
       xlim = c(1e-4, 20),
       ylim = c(0, 5E+09),
       legend = FALSE,
       xlab= "[Compound] (nM)",
       ylab = "Intensity Sum")
  plot(DR.mi,
       col = cols[i],
       xlim = c(1e-4, 20),
       ylim = c(0, 5E+09),
       add=TRUE,
       type='confidence')
  par(new=TRUE)} #Add new=TRUE arg to overlay graphs