Prediction Accuracy
Comparacion de la exactitud de la estimaciĂłn entre modelo parametrico vs KM
Comparacion de la exactitud de los dos mejores modelos y el modelo Kaplan-Meier
Mediana del tiempo de vida predicho mediante metodos parametricos para un individuo con 44 repeticiones - Modelo Loglogistic Median and IC95. Male Female
## [1] 41.49976 43.80286 46.23377
## [1] 46.23755 48.40292 50.66970
## Call: survfit(formula = Surv(onset, status) ~ 1, data = sub_hd_female)
##
## 4 observations deleted due to missingness
## time n.risk n.event survival std.err lower 95% CI upper 95% CI
## 1 8 0 1.000 0.000 1.000 1
## 30 8 1 0.875 0.117 0.673 1
Variance explained by predictive models based on cross-validation (VEcv), and Pearson’s correlation coefficient between actual values and predicted values
## exponential gaussian logistic loglogistic lognormal weibull
## vecv 24.46505 0.654203 6.291917 40.4155 29.92314 21.80015
## pearson_r 0.4994575 0.3666552 0.4853607 0.6424356 0.5627177 0.5344158
Summary of the Loglogistic model (the best model)
##
## Call:
## survreg(formula = Surv(survival_time, status) ~ big_allele +
## sex, data = survival_data, na.action = na.omit, dist = model_name[i])
## Value Std. Error z p
## (Intercept) 6.11098 0.18963 32.2 <2e-16
## big_allele -0.05071 0.00422 -12.0 <2e-16
## sex -0.09986 0.03568 -2.8 0.0051
## Log(scale) -2.04978 0.07204 -28.4 <2e-16
##
## Scale= 0.129
##
## Log logistic distribution
## Loglik(model)= -529.2 Loglik(intercept only)= -578.9
## Chisq= 99.42 on 2 degrees of freedom, p= 2.6e-22
## Number of Newton-Raphson Iterations: 6
## n= 167
Bootstrap estimate of VEcv IC95
### Crossvalidation function: returns the variance explained by the model
crossval_function <- function(input_data, distribution = "loglogistic"){
## "input_data" must have 4 columns: onset, status, sex, big_allele
uncensored_list <- which(!is.na(input_data$onset))
uncensored_survival <- input_data$onset[uncensored_list]
predictions <- rep(0, length(uncensored_list))
for(j in seq_along(uncensored_list)){
loo <- uncensored_list[j]
training_data <- input_data[-loo,]
test_data <- input_data[loo,]
xval_fit <- survreg(Surv(training_data$onset, training_data$status) ~ big_allele + sex,
dat = training_data, na.action = na.omit, dist = distribution)
xval_pred <- predict(xval_fit, newdata = test_data)
predictions[j] <- xval_pred
}
## VARIANCE EXPLAINED-CROSS VALIDATION
vecv <- (1-sum((uncensored_survival-predictions)^2)/sum((uncensored_survival-mean(uncensored_survival))^2))*100
pred_cor <- cor(uncensored_survival,predictions )
return(list(vecv = vecv, pearson_r = pred_cor))
}
### Bootstrap function: Returns the new data
boots_surv <- function(data){
idx <- sample(x = 1:nrow(data), size = nrow(data), replace = T)
return(data[idx,])
}
### xval function on a single random bootstrap sample: Returns an estimate of the VEcv on resampled data
xval_vecv_var <- function(survdata, distrib = "loglogistic"){
newdata <- boots_surv(survdata)
res <- crossval_function(input_data = newdata, distribution = distrib )
return(res)
}
### Sanity check
#xval_vecv_var(survdata = survival_data)
### Bootstrap-estimate the variance of cvce and pearson's r2
nreplicates = 500
bootstrap_vecv_estimates <- replicate(n = nreplicates, xval_vecv_var(survdata = survival_data))
vecv_bootstrap <- unlist(bootstrap_vecv_estimates[1,])
r_bootstrap <- unlist(bootstrap_vecv_estimates[2,])
plot(density(vecv_bootstrap), main = "VEcv estimates", lwd = 3, xlab = "VEcv (%)", cex.lab = 1.2)
#vecv_boots_975 <- mean(vecv_bootstrap + 1.959964*sd(vecv_bootstrap))
#vecv_boots_025 <- mean(vecv_bootstrap - 1.959964*sd(vecv_bootstrap))
vecv_boots_025 <- sort(vecv_bootstrap)[floor(0.025*nreplicates)]
vecv_boots_975 <- sort(vecv_bootstrap)[ceiling(0.975*nreplicates)]
vecv_boots_median<- sort(vecv_bootstrap)[round(0.50*nreplicates)]
grid()
abline(v = c(vecv_boots_025, vecv_boots_median, vecv_boots_975),
lwd = 3, col=adjustcolor("dodgerblue4", alpha.f = 0.5), lty = 2)
text(x = c(vecv_boots_025, vecv_boots_median, vecv_boots_975) + 7, y = 0.02,
lab = paste(round(c(vecv_boots_025, vecv_boots_median, vecv_boots_975), 1), "%"),
col = adjustcolor("dodgerblue4", alpha.f = 0.5), cex = 1.2)
summary(vecv_bootstrap)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.001 33.684 41.848 40.544 49.299 67.631