10.1.4 Main Finding
Non-commercial vs Commercial sponsors:
- Effect: 1% longer completion time
- p-value: 0.8073
- Decision: FAIL TO REJECT null hypothesis - no significant difference
Does trial sponsor type (commercial vs non-commercial) predict how long Phase 2 oncology trials take to complete?
We examine trials initiated during March 2019 through March 2021, a period spanning the onset of COVID-19, where it is theorized that this period might inflate the effect of sponsor on time-to-completion. To do this we will take a sample from the AACT (ClinicalTrials.gov) database.
Regular linear regression (OLS) assumes:
Time-to-trial completion violates both:
A GLM has three components:
# Load required packages
library(dplyr) # Data manipulation
library(ggplot2) # Plotting
library(scales) # Nice axis labels
# Load the data
load("C:/Users/amcewen/OneDrive - Bentley University/Documents/quant pres #2/phase2_solid_tumor_trials.RData")
# Define the COVID start date for our timing variable
covid_start <- as.Date("2020-03-01")
# Create analysis dataset
mydata <- trials
# Create sponsor type: Commercial vs Non-Commercial
mydata$sponsor <- ifelse(mydata$sponsor_type == "INDUSTRY",
"Commercial",
"Non-Commercial")
mydata$sponsor <- factor(mydata$sponsor, levels = c("Commercial", "Non-Commercial"))
# Calculate time-to-completion in days
mydata$start_date <- as.Date(mydata$start_date)
mydata$completion_date <- as.Date(mydata$completion_date)
mydata$completion_time <- as.numeric(mydata$completion_date - mydata$start_date)
# Calculate months from COVID start
mydata$months_from_covid <- as.numeric(mydata$start_date - covid_start) / 31
# Make num_sites numeric
mydata$num_sites <- as.numeric(mydata$num_sites)
We only include trials that have actually completed (not ongoing or terminated).
# Keep only completed trials with valid data
mydata <- mydata[mydata$overall_status == "COMPLETED", ]
mydata <- mydata[mydata$completion_time > 0, ]
mydata <- mydata[!is.na(mydata$num_sites), ]
mydata <- mydata[!is.na(mydata$enrollment), ]
mydata <- mydata[!is.na(mydata$number_of_arms), ]
# How many trials do we have?
nrow(mydata)
## [1] 444
The final sample is 444 completed trials.
The continuous variables are on very different scales:
Scaling (standardizing) them helps the model converge and makes coefficients comparable.
# Scale continuous variables (subtract mean, divide by SD)
mydata$num_sites_z <- scale(mydata$num_sites)[,1]
mydata$enrollment_z <- scale(mydata$enrollment)[,1]
mydata$arms_z <- scale(mydata$number_of_arms)[,1]
mydata$covid_timing_z <- scale(mydata$months_from_covid)[,1]
# What does "1 SD" mean in original units?
cat("1 SD of num_sites =", round(sd(mydata$num_sites), 1), "sites\n")
## 1 SD of num_sites = 69.6 sites
cat("1 SD of enrollment =", round(sd(mydata$enrollment), 1), "participants\n")
## 1 SD of enrollment = 71.7 participants
cat("1 SD of number_of_arms =", round(sd(mydata$number_of_arms), 2), "arms\n")
## 1 SD of number_of_arms = 1.18 arms
cat("1 SD of months_from_covid =", round(sd(mydata$months_from_covid), 1), "months\n")
## 1 SD of months_from_covid = 7.5 months
Note: Scaling does NOT affect our main predictive variable (sponsor type) because it’s categorical.
table(mydata$sponsor)
##
## Commercial Non-Commercial
## 114 330
ggplot(mydata, aes(x = completion_time)) +
geom_histogram(bins = 30, fill = "steelblue", color = "white") +
labs(title = "Distribution of Trial Completion Time",
subtitle = "Notice the right skew - most trials finish within 1500 days but some take much longer",
x = "Completion Time (days)",
y = "Count")
Key observation: The distribution is right-skewed. This is why we need Gamma or Inverse Gaussian, not Normal.
ggplot(mydata, aes(x = sponsor, y = completion_time, fill = sponsor)) +
geom_boxplot() +
labs(title = "Completion Time by Sponsor Type",
x = "Sponsor Type",
y = "Completion Time (days)") +
theme(legend.position = "none")
# Summary statistics by sponsor type
aggregate(completion_time ~ sponsor, data = mydata,
FUN = function(x) c(Mean = mean(x), Median = median(x), SD = sd(x), N = length(x)))
## sponsor completion_time.Mean completion_time.Median completion_time.SD
## 1 Commercial 1176.2368 1148.5000 429.6435
## 2 Non-Commercial 1223.4667 1236.0000 460.3835
## completion_time.N
## 1 114.0000
## 2 330.0000
For Gamma and Inverse Gaussian to be appropriate, variance should increase with the mean.
# Split data into bins and calculate mean and variance for each
mydata$bin <- cut(mydata$completion_time, breaks = 10)
mv <- aggregate(completion_time ~ bin, data = mydata,
FUN = function(x) c(mean = mean(x), var = var(x)))
mv <- data.frame(mean = mv$completion_time[,1], var = mv$completion_time[,2])
ggplot(mv, aes(x = mean, y = var)) +
geom_point(size = 3, color = "blue") +
geom_smooth(method = "lm", se = FALSE, color = "green") +
labs(title = "Mean-Variance Relationship",
subtitle = "Upward slope confirms variance increases with mean",
x = "Mean Completion Time (days)",
y = "Variance")
Conclusion: Variance increases with the mean, supporting use of Gamma or Inverse Gaussian.
What is it?
Key assumption: Variance is proportional to the mean squared: Var(Y) ∝ μ²
When to use: Moderate right skew, no extreme outliers
What is it?
Key assumption: Variance is proportional to the mean cubed: Var(Y) ∝ μ³
When to use: Heavy right tails, extreme outliers, when data represents “time to reach a goal”
Both models use a log link:
\[\log(\mu) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + ...\]
This means:
\[\mu = e^{\beta_0 + \beta_1 X_1 + \beta_2 X_2 + ...}\]
Why log link?
Part 1 — Distribution:
\[\text{CompletionTime}_i \sim \text{Gamma}(\mu_i, \phi)\]
This says: Each trial’s completion time follows a Gamma distribution with mean μᵢ and dispersion parameter φ.
Part 2 — Link Function and Linear Predictor:
\[\log(\mu_i) = \beta_0 + \beta_1(\text{NonCommercial}_i) + \beta_2(\text{NumSites}_i) + \beta_3(\text{Enrollment}_i) + \beta_4(\text{Arms}_i) + \beta_5(\text{CovidTiming}_i)\]
This says: The log of the expected completion time is a linear combination of our predictors.
Putting it together:
\[\mu_i = e^{\beta_0 + \beta_1(\text{NonCommercial}_i) + \beta_2(\text{NumSites}_i) + ...}\]
The Gamma distribution determines the shape of the error (right-skewed, variance ∝ μ²). The log link ensures predictions are always positive. The β coefficients tell us the effect of each predictor.
# Fit Gamma GLM with log link
gamma_model <- glm(completion_time ~ sponsor + num_sites_z + enrollment_z + arms_z + covid_timing_z,
data = mydata,
family = Gamma(link = "log"))
# View results
summary(gamma_model)
##
## Call:
## glm(formula = completion_time ~ sponsor + num_sites_z + enrollment_z +
## arms_z + covid_timing_z, family = Gamma(link = "log"), data = mydata)
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.086976 0.035362 200.413 < 2e-16 ***
## sponsorNon-Commercial 0.010169 0.041675 0.244 0.80733
## num_sites_z 0.050988 0.017317 2.944 0.00341 **
## enrollment_z -0.021633 0.018233 -1.187 0.23607
## arms_z 0.009169 0.017765 0.516 0.60602
## covid_timing_z -0.078083 0.017377 -4.493 8.98e-06 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.129226)
##
## Null deviance: 88.442 on 443 degrees of freedom
## Residual deviance: 84.048 on 438 degrees of freedom
## AIC: 6764.9
##
## Number of Fisher Scoring iterations: 5
For a log link, we exponentiate coefficients to get multiplicative effects:
# Get coefficients
coefs <- coef(gamma_model)
pvals <- summary(gamma_model)$coefficients[, 4]
# Create interpretation table
results_gamma <- data.frame(
Variable = names(coefs),
Coefficient = round(coefs, 4),
Exp_Coefficient = round(exp(coefs), 4),
Percent_Change = round((exp(coefs) - 1) * 100, 1),
P_Value = round(pvals, 4)
)
print(results_gamma)
## Variable Coefficient Exp_Coefficient
## (Intercept) (Intercept) 7.0870 1196.2851
## sponsorNon-Commercial sponsorNon-Commercial 0.0102 1.0102
## num_sites_z num_sites_z 0.0510 1.0523
## enrollment_z enrollment_z -0.0216 0.9786
## arms_z arms_z 0.0092 1.0092
## covid_timing_z covid_timing_z -0.0781 0.9249
## Percent_Change P_Value
## (Intercept) 119528.5 0.0000
## sponsorNon-Commercial 1.0 0.8073
## num_sites_z 5.2 0.0034
## enrollment_z -2.1 0.2361
## arms_z 0.9 0.6060
## covid_timing_z -7.5 0.0000
How to interpret:
# Residuals vs Fitted plot
par(mfrow = c(1, 2))
# Plot 1: Residuals vs Fitted
plot(fitted(gamma_model), residuals(gamma_model, type = "deviance"),
xlab = "Fitted Values", ylab = "Deviance Residuals",
main = "Gamma: Residuals vs Fitted")
abline(h = 0, col = "red", lty = 2)
# Plot 2: Q-Q Plot
qqnorm(residuals(gamma_model, type = "deviance"), main = "Gamma: Q-Q Plot")
qqline(residuals(gamma_model, type = "deviance"), col = "red")
par(mfrow = c(1, 1))
What we look for:
cat("Gamma Model Fit Statistics\n")
## Gamma Model Fit Statistics
cat("==========================\n")
## ==========================
cat("AIC:", AIC(gamma_model), "\n")
## AIC: 6764.899
cat("BIC:", BIC(gamma_model), "\n")
## BIC: 6793.57
cat("Null Deviance:", gamma_model$null.deviance, "\n")
## Null Deviance: 88.44229
cat("Residual Deviance:", gamma_model$deviance, "\n")
## Residual Deviance: 84.04837
cat("Deviance Explained:", round((1 - gamma_model$deviance/gamma_model$null.deviance) * 100, 1), "%\n")
## Deviance Explained: 5 %
Part 1 — Distribution:
\[\text{CompletionTime}_i \sim \text{InverseGaussian}(\mu_i, \lambda)\]
This says: Each trial’s completion time follows an Inverse Gaussian distribution with mean μᵢ and shape parameter λ.
Part 2 — Link Function and Linear Predictor:
\[\log(\mu_i) = \beta_0 + \beta_1(\text{NonCommercial}_i) + \beta_2(\text{NumSites}_i) + \beta_3(\text{Enrollment}_i) + \beta_4(\text{Arms}_i) + \beta_5(\text{CovidTiming}_i)\]
Key difference from Gamma: The Inverse Gaussian has a heavier right tail and assumes variance ∝ μ³ (instead of μ²). Everything else stays the same.
# Fit Inverse Gaussian GLM with log link
# We increase max iterations because IG can be harder to fit
ig_model <- glm(completion_time ~ sponsor + num_sites_z + enrollment_z + arms_z + covid_timing_z,
data = mydata,
family = inverse.gaussian(link = "log"),
control = glm.control(maxit = 100))
# View results
summary(ig_model)
##
## Call:
## glm(formula = completion_time ~ sponsor + num_sites_z + enrollment_z +
## arms_z + covid_timing_z, family = inverse.gaussian(link = "log"),
## data = mydata, control = glm.control(maxit = 100))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.302e+01 5.472e+17 0 1
## sponsorNon-Commercial 2.630e+01 6.682e+17 0 1
## num_sites_z -2.024e+01 1.678e+17 0 1
## enrollment_z 2.311e+01 3.560e+17 0 1
## arms_z -2.613e+02 2.199e+17 0 1
## covid_timing_z 1.367e+01 3.191e+17 0 1
##
## (Dispersion parameter for inverse.gaussian family taken to be 4.240697e+52)
##
## Null deviance: 1.8086e-01 on 443 degrees of freedom
## Residual deviance: 2.8573e+36 on 438 degrees of freedom
## AIC: 45162
##
## Number of Fisher Scoring iterations: 16
# Get coefficients
coefs_ig <- coef(ig_model)
pvals_ig <- summary(ig_model)$coefficients[, 4]
# Create interpretation table
results_ig <- data.frame(
Variable = names(coefs_ig),
Coefficient = round(coefs_ig, 4),
Exp_Coefficient = round(exp(coefs_ig), 4),
Percent_Change = round((exp(coefs_ig) - 1) * 100, 1),
P_Value = round(pvals_ig, 4)
)
print(results_ig)
## Variable Coefficient Exp_Coefficient
## (Intercept) (Intercept) 23.0231 9972869242
## sponsorNon-Commercial sponsorNon-Commercial 26.3008 264426562036
## num_sites_z num_sites_z -20.2377 0
## enrollment_z enrollment_z 23.1149 10931830637
## arms_z arms_z -261.3229 0
## covid_timing_z covid_timing_z 13.6734 867563
## Percent_Change P_Value
## (Intercept) 9.972869e+11 1
## sponsorNon-Commercial 2.644266e+13 1
## num_sites_z -1.000000e+02 1
## enrollment_z 1.093183e+12 1
## arms_z -1.000000e+02 1
## covid_timing_z 8.675620e+07 1
par(mfrow = c(1, 2))
# Plot 1: Residuals vs Fitted
plot(fitted(ig_model), residuals(ig_model, type = "deviance"),
xlab = "Fitted Values", ylab = "Deviance Residuals",
main = "Inverse Gaussian: Residuals vs Fitted")
abline(h = 0, col = "red", lty = 2)
# Plot 2: Q-Q Plot
qqnorm(residuals(ig_model, type = "deviance"), main = "Inverse Gaussian: Q-Q Plot")
qqline(residuals(ig_model, type = "deviance"), col = "red")
par(mfrow = c(1, 1))
cat("Inverse Gaussian Model Fit Statistics\n")
## Inverse Gaussian Model Fit Statistics
cat("=====================================\n")
## =====================================
cat("AIC:", AIC(ig_model), "\n")
## AIC: 45161.98
cat("BIC:", BIC(ig_model), "\n")
## BIC: 45190.65
cat("Null Deviance:", ig_model$null.deviance, "\n")
## Null Deviance: 0.1808587
cat("Residual Deviance:", ig_model$deviance, "\n")
## Residual Deviance: 2.857272e+36
cat("Deviance Explained:", round((1 - ig_model$deviance/ig_model$null.deviance) * 100, 1), "%\n")
## Deviance Explained: -1.579837e+39 %
cat("MODEL COMPARISON\n")
## MODEL COMPARISON
cat("================\n\n")
## ================
cat("AIC (lower is better):\n")
## AIC (lower is better):
cat(" Gamma:", round(AIC(gamma_model), 1), "\n")
## Gamma: 6764.9
cat(" Inverse Gaussian:", round(AIC(ig_model), 1), "\n")
## Inverse Gaussian: 45162
cat(" Winner:", ifelse(AIC(gamma_model) < AIC(ig_model), "Gamma", "Inverse Gaussian"), "\n\n")
## Winner: Gamma
cat("BIC (lower is better):\n")
## BIC (lower is better):
cat(" Gamma:", round(BIC(gamma_model), 1), "\n")
## Gamma: 6793.6
cat(" Inverse Gaussian:", round(BIC(ig_model), 1), "\n")
## Inverse Gaussian: 45190.7
cat(" Winner:", ifelse(BIC(gamma_model) < BIC(ig_model), "Gamma", "Inverse Gaussian"), "\n")
## Winner: Gamma
cat("SPONSOR TYPE EFFECT\n")
## SPONSOR TYPE EFFECT
cat("===================\n\n")
## ===================
# Gamma model
gamma_effect <- round((exp(coef(gamma_model)["sponsorNon-Commercial"]) - 1) * 100, 1)
gamma_p <- round(summary(gamma_model)$coefficients["sponsorNon-Commercial", 4], 4)
# IG model
ig_effect <- round((exp(coef(ig_model)["sponsorNon-Commercial"]) - 1) * 100, 1)
ig_p <- round(summary(ig_model)$coefficients["sponsorNon-Commercial", 4], 4)
cat("Gamma Model:\n")
## Gamma Model:
cat(" Non-commercial trials take", gamma_effect, "% longer than commercial\n")
## Non-commercial trials take 1 % longer than commercial
cat(" p-value:", gamma_p, "\n\n")
## p-value: 0.8073
cat("Inverse Gaussian Model:\n")
## Inverse Gaussian Model:
cat(" Non-commercial trials take", ig_effect, "% longer than commercial\n")
## Non-commercial trials take 2.644266e+13 % longer than commercial
cat(" p-value:", ig_p, "\n")
## p-value: 1
Does the effect of sponsor type depend on how big the trial is (number of sites)?
Maybe commercial sponsors are especially efficient at running large multi-site trials, but the difference disappears for small trials.
Part 1 — Distribution:
\[\text{CompletionTime}_i \sim \text{[Gamma or InverseGaussian]}(\mu_i, \phi)\]
We use whichever distribution fit better in the comparison above.
Part 2 — Link Function and Linear Predictor (with interaction):
\[\log(\mu_i) = \beta_0 + \beta_1(\text{NonCommercial}_i) + \beta_2(\text{NumSites}_i) + \beta_3(\text{NonCommercial}_i \times \text{NumSites}_i) + \beta_4(\text{Enrollment}_i) + \beta_5(\text{Arms}_i) + \beta_6(\text{CovidTiming}_i)\]
What’s new? The term \(\beta_3(\text{NonCommercial} \times \text{NumSites})\)
How to interpret:
If β₃ is significant, the lines in our plot will NOT be parallel — meaning the sponsor type effect depends on trial size.
What’s new? The term \(\beta_3(\text{NonCommercial} \times \text{NumSites})\)
How to interpret:
If β₃ is significant, the lines in our plot will NOT be parallel — meaning the sponsor type effect depends on trial size.
We use whichever distribution won the comparison above.
# Use the better-fitting distribution
if (AIC(gamma_model) <= AIC(ig_model)) {
best_family <- Gamma(link = "log")
best_name <- "Gamma"
} else {
best_family <- inverse.gaussian(link = "log")
best_name <- "Inverse Gaussian"
}
cat("Using", best_name, "distribution for interaction model\n\n")
## Using Gamma distribution for interaction model
# Fit model with interaction (sponsor * num_sites)
interaction_model <- glm(completion_time ~ sponsor * num_sites_z + enrollment_z + arms_z + covid_timing_z,
data = mydata,
family = best_family,
control = glm.control(maxit = 100))
summary(interaction_model)
##
## Call:
## glm(formula = completion_time ~ sponsor * num_sites_z + enrollment_z +
## arms_z + covid_timing_z, family = best_family, data = mydata,
## control = glm.control(maxit = 100))
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 7.063874 0.036032 196.045 < 2e-16 ***
## sponsorNon-Commercial 0.031207 0.042042 0.742 0.45830
## num_sites_z 0.269399 0.089675 3.004 0.00282 **
## enrollment_z -0.027341 0.018315 -1.493 0.13620
## arms_z 0.005772 0.017728 0.326 0.74488
## covid_timing_z -0.080548 0.017338 -4.646 4.49e-06 ***
## sponsorNon-Commercial:num_sites_z -0.228690 0.091275 -2.505 0.01259 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for Gamma family taken to be 0.1279069)
##
## Null deviance: 88.442 on 443 degrees of freedom
## Residual deviance: 83.228 on 437 degrees of freedom
## AIC: 6762.4
##
## Number of Fisher Scoring iterations: 5
We compare the main effects model to the interaction model:
# Get the main effects model (whichever was better)
main_model <- if(AIC(gamma_model) <= AIC(ig_model)) gamma_model else ig_model
# Likelihood ratio test
lr_test <- anova(main_model, interaction_model, test = "F")
print(lr_test)
## Analysis of Deviance Table
##
## Model 1: completion_time ~ sponsor + num_sites_z + enrollment_z + arms_z +
## covid_timing_z
## Model 2: completion_time ~ sponsor * num_sites_z + enrollment_z + arms_z +
## covid_timing_z
## Resid. Df Resid. Dev Df Deviance F Pr(>F)
## 1 438 84.048
## 2 437 83.228 1 0.82033 6.4135 0.01167 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Get p-value
interaction_pvalue <- lr_test$`Pr(>F)`[2]
cat("\n")
cat("INTERACTION TEST RESULT\n")
## INTERACTION TEST RESULT
cat("=======================\n")
## =======================
cat("F-test p-value:", round(interaction_pvalue, 4), "\n\n")
## F-test p-value: 0.0117
if (interaction_pvalue < 0.05) {
cat("CONCLUSION: The interaction IS significant (p < 0.05)\n")
cat("The effect of sponsor type DOES depend on trial size.\n")
} else {
cat("CONCLUSION: The interaction is NOT significant (p >= 0.05)\n")
cat("The effect of sponsor type does NOT depend on trial size.\n")
cat("The simpler main effects model is preferred.\n")
}
## CONCLUSION: The interaction IS significant (p < 0.05)
## The effect of sponsor type DOES depend on trial size.
# Create prediction data
pred_data <- expand.grid(
sponsor = c("Commercial", "Non-Commercial"),
num_sites_z = seq(-0.5, 2, by = 0.1),
enrollment_z = 0,
arms_z = 0,
covid_timing_z = 0
)
# Get predictions
pred_data$predicted <- predict(interaction_model, newdata = pred_data, type = "response")
# Convert num_sites back to original scale for plotting
pred_data$num_sites <- pred_data$num_sites_z * sd(mydata$num_sites) + mean(mydata$num_sites)
# Plot
ggplot(pred_data, aes(x = num_sites, y = predicted, color = sponsor)) +
geom_line(size = 1.2) +
labs(title = "Predicted Completion Time by Sponsor Type and Trial Size",
subtitle = "Other variables held at their means",
x = "Number of Sites",
y = "Predicted Completion Time (days)",
color = "Sponsor Type") +
theme(legend.position = "bottom")
# Determine best model
if (AIC(gamma_model) <= AIC(ig_model)) {
best_model <- gamma_model
best_dist <- "Gamma"
} else {
best_model <- ig_model
best_dist <- "Inverse Gaussian"
}
# Get sponsor effect from best model
sponsor_effect <- round((exp(coef(best_model)["sponsorNon-Commercial"]) - 1) * 100, 1)
sponsor_p <- round(summary(best_model)$coefficients["sponsorNon-Commercial", 4], 4)
interaction_p <- round(interaction_pvalue, 4)
Does sponsor type predict Phase 2 oncology trial completion time?
Non-commercial vs Commercial sponsors:
if (sponsor_p < 0.05 & sponsor_effect > 0) {
cat("Non-commercial trials (academic and government sponsors) take significantly\n")
cat("longer to complete than commercial (industry) trials. Specifically, they take\n")
cat("approximately", sponsor_effect, "% longer (p =", sponsor_p, ").\n\n")
cat("This may reflect:\n")
cat("- Differences in operational infrastructure and dedicated staff\n")
cat("- Resource constraints in academic settings\n")
cat("- The pandemic's disproportionate impact on academic medical centers\n")
} else if (sponsor_p < 0.05 & sponsor_effect < 0) {
cat("Non-commercial trials complete significantly FASTER than commercial trials.\n")
} else {
cat("No significant difference was found between commercial and non-commercial\n")
cat("sponsors in trial completion time (p =", sponsor_p, ").\n")
}
## No significant difference was found between commercial and non-commercial
## sponsors in trial completion time (p = 0.8073 ).
if (best_dist == "Gamma") {
cat("The Gamma distribution provided better fit because:\n\n")
cat("1. The completion time distribution has moderate right skew\n")
cat("2. There are not extreme outliers that require the heavier tail of Inverse Gaussian\n")
cat("3. The Gamma's variance assumption (Var proportional to mean squared) matched our data\n")
} else {
cat("The Inverse Gaussian distribution provided better fit because:\n\n")
cat("1. The completion time distribution has a heavy right tail\n")
cat("2. Some trials take much longer than expected (extreme values)\n")
cat("3. Trial completion can be thought of as 'time to reach a goal' which\n")
cat(" is exactly what the Inverse Gaussian models (first passage time)\n")
}
## The Gamma distribution provided better fit because:
##
## 1. The completion time distribution has moderate right skew
## 2. There are not extreme outliers that require the heavier tail of Inverse Gaussian
## 3. The Gamma's variance assumption (Var proportional to mean squared) matched our data