In oncology drug response datasets, IC50 (the concentration that cuts tumor cell viability by half) and dose–response AUC (the overall effect of the drug across all tested doses) do not have a linear relationship. Typically it is sigmoidal. This is because different tumor cells are uniquely sensitive (or not sensitive) to a given drug. That’s why the IC50–AUC scatter is typically heteroskedastic (uneven spread): due to tumor cell properties, the dose-response AUC varies more in some IC50 regions than others.
A standard four-parameter, S-shaped (4PS) model logistic, a form of NONLINEAR regression, is used to fit the dose–response data as seen below:
4Ps equation (with dose \(x\)):
\[\text{Response}(x) = \text{Bottom} + \frac{\text{Top} - \text{Bottom}}{1 + \left(\frac{x}{\text{IC50}}\right)^{\text{Hill}}}\]
This model is particularly well-suited for oncology assays because:
100% | Top ←---------------
| \
| \ ← Hill slope controls steepness
50% | × ← IC50
| \
| \
0% | Bottom →---------
|________________________
Low dose High dose
AUC = Shaded area under this sigmoidal curve
The selection of litronesib for this quantile regression analysis followed these steps:
Step 1a: Dataset creation
Two source files from depmap portal (https://depmap.org/portal/) were merged using an inner join in R to retain only records with both dose-response data (AUC, IC50) and suffiicnet tumor type information. Unnecessary columns were culled.
Step 1b: Heteroskedasticity screening - all drugs
All 1343 drugs, each with response and concentration data in up to 22 tumor types, were tested for heteroskedasticity using two methods:
Fit OLS for predictor (AUC) and response(IC50) - both log10 transformed for easier interpretation - variables of ineterst
Breusch-Pagan test: Tests whether error variance is a linear function of fitted values, detecting “fan-shaped” residuals
\[H_0: \text{Var}(\varepsilon_i) = \sigma^2 \text{ (homoskedasticity)}\] \[H_1: \text{Var}(\varepsilon_i) = \sigma^2 h(x_i) \text{ (heteroskedasticity)}\]
\[Test statistic: $BP = n \cdot R^2_{aux} \sim \chi^2_p\]
\[\text{IQR Ratio} = \frac{IQR(IC50 | AUC \in Q_4)}{IQR(IC50 | AUC \in Q_1)}\]
Step 1c: B-P and IQR results: Selection of litronesib
BP and IQR generation for litronesib for deomnstration purposes ONLY (this was done for the full dataset of all 1343 drugs in making the drug selection):
# packages
library(readxl) #to read my onc dataset
library(quantreg) #for QR fit #Koenker 2023 implements quantile regression using linear programming methods
library(lmtest) # for Breusch-Pagan test
# litronesib data
litro = read_xlsx("C:/Users/amcewen/OneDrive - Bentley University/Desktop/quantile regression data/litronesib_only - xlsx.xlsx")
# log10 transformations for ease of interpretation
litro$ic50_log10 = log10(litro$ic50)
litro$auc_log10 = log10(litro$auc)
# Remove rows with NA values
litro_clean = litro[complete.cases(litro$ic50_log10, litro$auc_log10), ]
# OLS
ols_model = lm(ic50_log10 ~ auc_log10, data = litro_clean)
# B-P
bp_test = bptest(ols_model)
# IQR (quantiles by AUC)
quartiles = quantile(litro_clean$auc_log10, probs = c(0, 0.25, 0.5, 0.75, 1), na.rm = TRUE)
# Q1 IC50s
q1_data = litro_clean[litro_clean$auc_log10 >= quartiles[1] & litro_clean$auc_log10 <= quartiles[2], ]
iqr_q1 = IQR(q1_data$ic50_log10, na.rm = TRUE)
# Q4 IC50s
q4_data = litro_clean[litro_clean$auc_log10 >= quartiles[4] & litro_clean$auc_log10 <= quartiles[5], ]
iqr_q4 = IQR(q4_data$ic50_log10, na.rm = TRUE)
# IQR ratio
iqr_ratio = iqr_q4 / iqr_q1
# overall IQR for IC50
overall_iqr = IQR(litro_clean$ic50_log10, na.rm = TRUE)
# results
het_results = data.frame(
Metric = c("Breusch-Pagan statistic",
"Breusch-Pagan p-value",
"Overall IQR",
"IQR ratio (Q4/Q1)"),
Value = c(round(bp_test$statistic, 2),
format.pval(bp_test$p.value, digits = 3),
round(overall_iqr, 3),
round(iqr_ratio, 2))
)
print(het_results)
## Metric Value
## 1 Breusch-Pagan statistic 71.65
## 2 Breusch-Pagan p-value <2e-16
## 3 Overall IQR 0.618
## 4 IQR ratio (Q4/Q1) 4.81
This analysis examines the relationship between the IC50 (drug concentration at 50% potency) and AUC (dose-response area under the curve) for litronesib across tumor types. Heteroskedasticity, and subsequently the need for a non-mean based regression apporoach, is suspected based on examination of the dataset, but first this section checks what OLS fit looks like so we can see if we still suspect that quantile regression would be best.
simple linear regression model is:
\[ y_i = \beta_0 + \beta_1 x_i + \varepsilon_i \]
where:
In litronesib drug concentration-response data:
Step 2a: Load data and create log10 variables
# litronesib data
litro = read_xlsx("C:/Users/amcewen/OneDrive - Bentley University/Desktop/quantile regression data/litronesib_only - xlsx.xlsx")
litro$ic50_log10 = log10(litro$ic50)
litro$auc_log10 = log10(litro$auc)
Step 2b: Fit the OLS model
ols_model = lm(ic50_log10 ~ auc_log10, data = litro)
summary(ols_model)
##
## Call:
## lm(formula = ic50_log10 ~ auc_log10, data = litro)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.13595 -0.33931 -0.03473 0.22448 1.99912
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.50258 0.04895 -30.70 <2e-16 ***
## auc_log10 1.78673 0.11354 15.74 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.523 on 545 degrees of freedom
## Multiple R-squared: 0.3124, Adjusted R-squared: 0.3112
## F-statistic: 247.6 on 1 and 545 DF, p-value: < 2.2e-16
Step 2c: Plot the data with OLS line
# parameters
intercept = round(coef(ols_model)[1], 3)
slope = round(coef(ols_model)[2], 3)
r2 = round(summary(ols_model)$r.squared, 3)
# plot
plot(litro$auc_log10, litro$ic50_log10,
xlab = "AUC(log10) - Predictor",
ylab = "IC50(log10) - Response",
main = "Litronesib: OLS fit for all tumor types",
pch = 16, col = "gray")
abline(ols_model, col = "red", lwd = 2)
Step 2d: What does this tell us?
From the model output:
Looking at the output, The model suggetsts AUC explains 31.2% of the variance in IC50, leaving a lot of the relationship on the table -AUC is a meaningful predictor, but about 69% of the variation in IC50 remains unexplained by AUC alone.
Additonaly that maximum residual of 2 suggests some observations are poorly predicted by OLS fit.
Upon visual inspection, substantial scatter appears around the regression line, espeically at higher AUC/IC50, suggesting a mean-based regression will not fully capture the relationship between IC50 and AUC for litronesib.
Step 2e: Check OLS by tumor
Now let’s examine the extent to which heteroskedasticity may be tumor-driven: Different tumor types have different drug response curves
Find the top 4 tumor types
Fit separate OLS models for the top 4 tumor/cell types
# top tumors
tumor_counts = sort(table(litro$OncotreeLineage), decreasing = TRUE)
top_4 = c("Lung", "Skin", "CNS/Brain", "Esophagus/Stomach")
litro_top4 = litro[litro$OncotreeLineage %in% top_4, ]
tumor_results = data.frame()
# fit each tumor
for(tumor in top_4) {
# get data for this tumor
tumor_data = litro_top4[litro_top4$OncotreeLineage == tumor, ]
tumor_model = lm(ic50_log10 ~ auc_log10, data = tumor_data)
new_row = data.frame(
Tumor = tumor,
n = nrow(tumor_data),
Intercept = round(coef(tumor_model)[1], 3),
Slope = round(coef(tumor_model)[2], 3),
R_squared = round(summary(tumor_model)$r.squared, 3)
)
tumor_results = rbind(tumor_results, new_row)
}
print(tumor_results)
## Tumor n Intercept Slope R_squared
## (Intercept) Lung 119 -1.703 1.409 0.268
## (Intercept)1 Skin 44 -1.301 2.058 0.374
## (Intercept)2 CNS/Brain 36 -1.287 2.165 0.405
## (Intercept)3 Esophagus/Stomach 45 -1.808 1.310 0.279
Plot each tumor type scatter/regression line separately
# plots
par(mfrow = c(2, 2))
for(i in 1:4) {
tumor = top_4[i]
tumor_data = litro_top4[litro_top4$OncotreeLineage == tumor, ]
plot(tumor_data$auc_log10, tumor_data$ic50_log10,
xlab = "AUC(log10) - Predictor",
ylab = "IC50(log10) - Response",
main = tumor,
pch = 16, col = "gray")
tumor_reds = c("red", "firebrick", "darkred", "indianred")
tumor_model = lm(ic50_log10 ~ auc_log10, data = tumor_data)
abline(tumor_model, col = tumor_reds[i], lwd = 2)
}
Step 2f: Compare to overall model
# plot all one plot
plot(litro$auc_log10, litro$ic50_log10,
col = "lightgray", pch = 16,
xlab = "AUC(log10) - Predictor",
ylab = "IC50(log10) - Response",
main = "All vs Top 4 Tumors")
# tumor lines
tumor_reds = c("firebrick", "darkred", "indianred", "salmon")
for(i in 1:4) {
tumor_data = litro_top4[litro_top4$OncotreeLineage == top_4[i], ]
tumor_model = lm(ic50_log10 ~ auc_log10, data = tumor_data)
abline(tumor_model, col = tumor_reds[i], lwd = 2)
# OLS
abline(ols_model, col = "red", lwd = 3)
}
legend("topleft", c("All (OLS)", top_4),
col = c("red", tumor_reds),
lwd = 2, cex = 0.7)
Steo 2g: Results and interpretation
To determine if heteroskedasticity is tumor-driven vs general, the analysis examined:
Findings:
Finding 1: Weak relationships across all four tumor types
All four tumor types show low R² values (0.268 to 0.405), so AUC explains less than half of the variance in IC50 - even in the best case (CNS/Brain at 40.5%). This substantial unexplained variance confirms heteroskedasticity is generally present across litronesib’s AUC:IC50 relationship.
Finding 2: STILL, some tumor-driven differences exist
While all relationships are weak, CNS/Brain shows the strongest fit (R² = 0.405) and Lung shows the weakest (R² = 0.268). The slopes also vary: CNS/Brain and Skin show steeper slopes (around 2.0-2.2) compared to Lung and Esophagus (around 1.3-1.4), indicating tumor-specific drug responses.
Conclusions:
Heteroskedasticity in litronesib appears both general (present across all tumors) and partially tumor-driven (more pronounced in some tumor types). The consistently low R² values and visible scatter suggest that OLS regression, which models only the mean, is insufficient. This motivates quantile regression, which can model the entire distribution of IC50 values at different AUC levels.
Quantile regression allows modeling of the predictor-response relationships at different points in the distribution, not just the average, which can better account for the changing dynamic between IC50 and AUC.
The quantile regression model for the τ-th quantile is:
\[ Q_\tau(y_i | x_i) = \beta_{0,\tau} + \beta_{1,\tau} x_i \]
where:
Key difference from OLS:
For the litronesib data:
This gives 5 different regression lines:
\[ \begin{aligned} Q_{0.10}(\text{IC50} | \text{AUC}) &= \beta_{0, 0.10} + \beta_{1, 0.10} \times \text{AUC} \quad \text{(10th percentile: high-sensitivity cells)} \\ Q_{0.25}(\text{IC50} | \text{AUC}) &= \beta_{0, 0.25} + \beta_{1, 0.25} \times \text{AUC} \quad \text{(25th percentile)} \\ Q_{0.50}(\text{IC50} | \text{AUC}) &= \beta_{0, 0.50} + \beta_{1, 0.50} \times \text{AUC} \quad \text{(50th percentile: median)} \\ Q_{0.75}(\text{IC50} | \text{AUC}) &= \beta_{0, 0.75} + \beta_{1, 0.75} \times \text{AUC} \quad \text{(75th percentile)} \\ Q_{0.90}(\text{IC50} | \text{AUC}) &= \beta_{0, 0.90} + \beta_{1, 0.90} \times \text{AUC} \quad \text{(90th percentile: low-sensitivity cells)} \end{aligned} \]
What this tells us:
Step 3a: Fit quantile regression model
The analysis fits models at 5 quantiles: τ = 0.10, 0.25, 0.50, 0.75, 0.90
# quantiles
quantiles = c(0.10, 0.25, 0.50, 0.75, 0.90)
# results
qr_results = data.frame()
qr_results_ci = data.frame()
# quantile regression for each tau
for(tau in quantiles) {
qr_model = rq(ic50_log10 ~ auc_log10, data = litro, tau = tau)
# 95% CIs using bootstrap (1000 reps)
qr_summary = summary(qr_model, se = "boot", R = 1000)
# slope estimates and 95% CIs
slope_est = qr_summary$coefficients[2, 1]
slope_se = qr_summary$coefficients[2, 2]
slope_lower = slope_est - 1.96 * slope_se
slope_upper = slope_est + 1.96 * slope_se
# results
new_row = data.frame(
Quantile = tau,
Intercept = round(coef(qr_model)[1], 3),
Slope = round(coef(qr_model)[2], 3)
)
qr_results = rbind(qr_results, new_row)
# results with bootstrap CIs
new_row_ci = data.frame(
Quantile = tau,
CI_Lower = round(slope_lower, 3),
CI_Upper = round(slope_upper, 3)
)
qr_results_ci = rbind(qr_results_ci, new_row_ci)
}
print(qr_results)
## Quantile Intercept Slope
## (Intercept) 0.10 -2.508 0.586
## (Intercept)1 0.25 -2.185 0.832
## (Intercept)2 0.50 -1.786 1.266
## (Intercept)3 0.75 -1.130 2.075
## (Intercept)4 0.90 -0.342 3.031
print(qr_results_ci)
## Quantile CI_Lower CI_Upper
## 1 0.10 0.295 0.877
## 2 0.25 0.636 1.029
## 3 0.50 1.137 1.395
## 4 0.75 1.669 2.482
## 5 0.90 2.655 3.408
Step 3b: What do these coefficients mean?
From the results above, different equations exist for different quantiles:
If the slopes are different across quantiles, this confirms heteroskedasticity - the strength of the AUC-IC50 relationship depends on where a cell line falls in the sensitivity distribution.
Bootstrap confidence interval interpretations:
The bootstrap CIs provide evidence that these slope differences are real, not just sampling variation. Looking at the bootstrap CI results:
These CIs do not overlap,particularly between τ = 0.10 and τ = 0.90. This non-overlap means: 1. The slope differences are statistically significant at the 95% CI 2. The drug behaves fundamentally differently for sensitive versus resistant cell lines (tumor types) 3. Using a single OLS slope would be misleading - it would overestimate the relationship for sensitive cells and underestimate it for resistant cells
Step 3c: Visualizing the quantile regression lines
# plot
plot(litro$auc_log10, litro$ic50_log10,
col = "lightgray", pch = 16,
xlab = "AUC(log10) - Predictor",
ylab = "IC50(log10) - Response",
main = "Quantile Regression: Litronesib IC50 vs AUC")
# lines
abline(ols_model, col = "red", lwd = 2)
qr_colors = c("lightblue", "skyblue", "blue", "darkblue", "navy")
for(i in 1:length(quantiles)) {
qr_model = rq(ic50_log10 ~ auc_log10, data = litro, tau = quantiles[i])
abline(qr_model, col = qr_colors[i], lwd = 2)
}
# legend
legend("topleft",
c(paste("Quantile τ =", quantiles), "OLS mean"),
col = c(qr_colors, "red"),
lty = c(rep(1, 5), 2),
lwd = 2, cex = 0.7)
Step 3d: Compare slopes across quantiles
plot(qr_results$Quantile, qr_results$Slope,
type = "b", col = "blue", lwd = 2, pch = 16,
xlab = "Quantile", ylab = "Slope",
main = "How Slopes Change Across Quantiles")
# legend
legend("topright", c("Quantile slopes"),
col = c("blue"), lwd = 2, pch = c(16, NA))
Are slopes different across quantiles? If slopes at τ = 0.90 and τ = 0.10 differ substantially, this confirms our suspision that high-sensitivity and low-sensitivity cell lines respond differently to changes in drug exposure.
Does the median slope (τ = 0.50) differ from the OLS mean slope? If yes, the distribution is skewed, and OLS may give misleading average results.
Both IC50 and AUC are derived from the same dose-response curve, making their relationship correlational rather than causal. Neither metric truly predicts the other; they represent different summary statistics of the same underlying drug response.
The R quantreg package assumes linear relationships at each quantile on the log10 scale. While this appears reasonable for litronesib, non-linear patterns at extreme quantiles might be missed. Smoothing-based quantile regression methods could reveal more complex relationships.
This analysis treats all cell lines equally, but some tumor types are overrepresented with 98 lung samples versus 33 esophagus/stomach samples. This imbalance might bias the overall quantile estimates toward patterns in better-represented tumor types. While we demonstrate heteroskedasticity exists and varies by tumor type, we don’t identify the biological mechanisms driving this variance. The varying spread could result from differences in drug uptake, metabolism, target expression, or resistance mechanisms that aren’t measured in this dataset.