# Set seed for random process
set.seed(1)
# Loading packages
pacman::p_load("dplyr", # Data wrangling
"ggplot2", # Data visualization
"ggExtra", # Add-ons for ggplot2
"mice", # Multiple imputation of data
"pROC", # C-statistic for logistic models
"splines", # Natural cubic splines
"survival", # Developing Cox prediction models
"rio", # Importing data
"rms" # Developing prediction models and restricted cubic splines
)PREDICT - Prediction Practical 1
Analysis and Reporting of Prediction Modelling Studies
1 Introduction
In this practical, we will see what are the minimal requirements to develop a prediction model in R (i.e. not from a statistical point of view) and develop a prediction model ourselves.
At the end of this practical, you will know what is required to develop and validate a prediction model in R. Additionally, you will understand the differences between different packages for prediction modelling.
2 Setting up R
Before we get into the fun part, of course we have to load a number of packages that will help us. Note that we use pacman::p_load() for this, instead of library(), because it loads the package, installs it if missing, and it can call multiple packages at once.
Note that we use {pacman} instead of the usual approach with install.packages() and library(), but that this requires a one-time prior installation of {pacman}:
3 Loading data
For this practical, we will use two datasets.
3.1 Traumatic brain injury (TBI) data
This dataset contains 2,159 patients from the international and US Tirilazad trials (distributed here for didactic purposes only). The primary outcome was the Glasgow Outcome (range 1 through 5) at 6 months.
The TBI dataset was sent to you together with this practical and can be loaded from your local device:
# Specify path of TBI dataset
path <- "C:/users/avid_PREDICT_student/documents/pint/TBI.txt"
# Load TBI data
tbi <- import(path)Warning in (function (input = "", file = NULL, text = NULL, cmd = NULL, :
Detected 24 column names but the data has 25 columns (i.e. invalid file). Added
an extra default column name for the first column which is guessed to be row
names or an index. Use setnames() afterwards if this guess is not correct, or
fix the file write command that created the file to create a valid file.
The warning we receive is the result of row numbers being present in the .txt file (as also guessed by import()). Given that import() guessed correct, we can ignore this warning and treat the extra column V1 as individual identifiers.
3.2 NCCTG lung cancer data
The NCCTG lung cancer dataset contains 228 patients from the North Central Cancer Treatment group, with information on time until death and performance scores.
The NCCTG lung cancer dataset is provided by the {survival} package and can be loaded from within R:
More formally, we could call the lung dataset with:
However, this will show all related cancer datasets available in {survival} as promises (i.e. available to you but not loaded into memory). If you run lung once, it will be loaded into your global environment:
Nonetheless, the other datasets remain visible as promises in your global environment, which can be rather messy.
Below are the codebooks for both the TBI and the lung data.
| TBI | Lung | ||
| Variable | Description | Variable | Description |
| V1 | Identifier | inst | Institution code |
| trial | Trial identification | time | Survival time (days) |
| d.gos | Glasgow Outcome Scale at 6 months
|
status | Censoring status:
|
| d.mort | Mortality at 6 months | age | Age (yrs) |
| d.unfav | Unfavourable outcome at 6 months | ||
| cause | Cause of injury
|
ph.ecog | ECOG performance score as rated by the physician
|
| age | Age (yrs) | ph.karno | Karnofsky performance score rated by the physician (range 0-100) |
| d.motor | Admission motor score (range 1-6) | pat.karno | Karnofsky performance score as rated by patient |
| d.pupil | Pupillary reactivity
|
meal.cal | Calories consumed at meals |
| hypoxia | Hypoxia before or at admission | ||
| hypotens | Hypotension before or at admission | ||
| ctclass | Marshall CT classification (range 1-6) | ||
| tsah | tSAH at CT | ||
| edh | EDH at CT | ||
| cisterns | Compressed cisterns at CT
|
||
| shift | Midline shift >5 mm at CT | ||
| glucose | Glucose at admission (mmol/L) | ||
| glucoset | Truncated glucose values (mmol/L) | ||
| ph | pH | ||
| sodium | Sodium (mmol/L) | ||
| hb | Hemoglobin (g/dL) | ||
| hbt | Truncated hemoglobin (g/dL) |
4 Predicting a binary unfavourable outcome
In this practical, we will develop a prediction model for the risk of an unfavourable outcome after a TBI (d.unfav). As a first step, let’s better understand our data.
4.1 Getting to know our data
Suppose we consulted some TBI experts, who told us that the most important predictors we want to include are motor activity (d.motor), pupillary reactivity (d.pupil), TBI cause (cause), and age (age).
First, let’s inspect the data available on the outcome and predictors chosen based on clinical expertise.
# Summarize key variables from the TBI dataset
# Outcome: d.unfav
# Predictors: d.motor, d.pupil, cause, age
summary(tbi[, c("d.unfav", "d.motor", "d.pupil", "cause", "age")]) d.unfav d.motor d.pupil cause
Min. :0.0000 Min. :1.000 Length:2159 Length:2159
1st Qu.:0.0000 1st Qu.:3.000 Class :character Class :character
Median :0.0000 Median :4.000 Mode :character Mode :character
Mean :0.3942 Mean :3.991
3rd Qu.:1.0000 3rd Qu.:5.000
Max. :1.0000 Max. :6.000
age
Min. :14.00
1st Qu.:22.00
Median :30.00
Mean :33.21
3rd Qu.:43.00
Max. :79.00
This shows us that the outcome variable d.unfav is binary, the predictors d.motor has six categories, and age is a continuous variable ranging from 14 to 79 years. Let’s further inspect d.pupil and cause.
both reactive no reactive pupils one reactive
1430 327 279
Assault domestic/fall Motorbike
134 370 420
other Road traffic accident
387 848
The predictors d.pupil and cause are categorical variables with three and five levels, respectively. Let’s further inspect the outcome. The data summary indicates that these variables are currently stored as character types. To use them effectively in our analysis, we will convert them to factors, which allows for proper handling of categorical information in modeling.
For completeness, we will also convert d.motor to a factor, as this is necessary for proper analysis. It is important to always code categorical variables as factors in R. Otherwise, they may be treated as continuous variables in the model, resulting in a single coefficient that assumes equal effects between levels, an assumption that is typically inappropriate for categorical data.
Let’s further inspect the outcome.
0 1
1308 851
0 1
60.5836 39.4164
Thus, we have 851 cases (39.4%). The likelihood that a prediction model is overfitted depends partly on the number of cases and non-cases. Fortunately, the number of cases is quite substantial in our dataset. Using the predictors suggested by the clinical expert, we estimate 13 coefficients in the model, including the intercept. This is because motor activity, pupillary reactivity, and cause are categorical variables, and each uses one reference category. You can verify the number of coefficients in the next section. Given the relatively high number of events, we are not too concerned about overfitting.
4.2 Dealing with missingness
Let’s inspect the number of missings.
d.unfav d.motor d.pupil cause age
0 0 123 0 0
This confirms that the outcome variable d.unfav has no missing values. All predictor variables are complete, except for d.pupil, which is missing in 123 cases. For the purposes of this practical exercise, we will apply single imputation for simplicity. Although single imputation is often unsatisfactory, for our current didactic purposes it is enough.
1single.imputation <- mice::mice(tbi[, c("d.unfav",
"d.motor",
"d.pupil",
"cause",
"age")],
2 m = 1,
3 seed = 1)- 1
- We include only the necessary variables in the imputation model. Note that these variables may differ from those in your (initial) prediction model.
- 2
- For now, we’ve set the number of imputations to one, but in practice, this should typically be greater than one.
- 3
- Because imputation is a random process, we set a seed to ensure reproducible results. Without a seed, each run would produce different imputed values.
iter imp variable
1 1 d.pupil
2 1 d.pupil
3 1 d.pupil
4 1 d.pupil
5 1 d.pupil
R is teling us that it performed a single imputation with five iterations on the variable d.pupil. These iterations are used by the mice algorithm to refine and stabilize the imputed values through chained equations before finalizing the result. To be able to use the single imputed data set in further analysis, we need to extract it form the mice object.
4.3 Method I: glm()
4.3.1 Fit an initial prediction model
We are now ready to develop our prediction model. Since the outcome variable is binary, we will use logistic regression. Logistic regression belongs to the family of generalized linear models (GLMs), so we can use the glm() function, as demonstrated earlier. By specifying the family = binomial, we indicate that the model should use a logistic link function appropriate for binary outcomes. For didactic purposes, let’s first use the data set with missing values for d.pupil.
# Fit the prediction model
fit.glm.tbi <- stats::glm(
1 d.unfav ~
2 age +
d.motor +
d.pupil +
cause,
data = tbi,
3 family = binomial)- 1
-
Fit a logistic regression model with the outcome
d.unfav. - 2
- Add age as a linear predictor for now (we will inspect non-linearity next).
- 3
- Specify that the data source and the family to define a logistic model.
Let’s see what the output looks like:
Call:
stats::glm(formula = d.unfav ~ age + d.motor + d.pupil + cause,
family = binomial, data = tbi)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.164267 0.624123 -0.263 0.7924
age 0.037503 0.004037 9.289 < 2e-16 ***
d.motor2 0.648468 0.621264 1.044 0.2966
d.motor3 0.057519 0.614843 0.094 0.9255
d.motor4 -0.660139 0.609902 -1.082 0.2791
d.motor5 -1.304401 0.610827 -2.135 0.0327 *
d.motor6 -1.628588 0.680168 -2.394 0.0166 *
d.pupil1 -0.733449 0.185156 -3.961 7.46e-05 ***
d.pupil2 -1.285761 0.148476 -8.660 < 2e-16 ***
cause2 -0.020937 0.145428 -0.144 0.8855
cause3 0.143078 0.153424 0.933 0.3510
cause4 -0.183664 0.231363 -0.794 0.4273
cause5 0.251604 0.146468 1.718 0.0858 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2726.7 on 2035 degrees of freedom
Residual deviance: 2251.5 on 2023 degrees of freedom
(123 observations deleted due to missingness)
AIC: 2277.5
Number of Fisher Scoring iterations: 4
The attentive reader might notice that in one of the last lines, the following message is printed:
(123 observations deleted due to missingness)
It is important to realize that, like much statistical software, individuals with missing values are excluded by default. However, if the missingness is informative (which is often the case), this likely results in a biased prediction model (and we lose some power). Therefore, we have performed single imputation to deal with missingness.
If we refit the model with the imputed values for d.pupil, we get:
# Fit the prediction model
fit.glm.tbi <- stats::glm(d.unfav ~
age +
d.motor +
d.pupil +
cause,
data = imputed.tbi,
family = binomial)
# Print output
summary(fit.glm.tbi)
Call:
stats::glm(formula = d.unfav ~ age + d.motor + d.pupil + cause,
family = binomial, data = imputed.tbi)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) -0.02818 0.61014 -0.046 0.96316
age 0.03794 0.00389 9.752 < 2e-16 ***
d.motor2 0.45496 0.60584 0.751 0.45268
d.motor3 -0.08903 0.59987 -0.148 0.88202
d.motor4 -0.77002 0.59508 -1.294 0.19567
d.motor5 -1.40657 0.59582 -2.361 0.01824 *
d.motor6 -1.81714 0.66589 -2.729 0.00636 **
d.pupil1 -0.77994 0.17883 -4.362 1.29e-05 ***
d.pupil2 -1.29120 0.14347 -9.000 < 2e-16 ***
cause2 -0.03586 0.14089 -0.255 0.79906
cause3 0.05621 0.14879 0.378 0.70558
cause4 -0.12692 0.22010 -0.577 0.56418
cause5 0.26330 0.14136 1.863 0.06251 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2895.5 on 2158 degrees of freedom
Residual deviance: 2410.5 on 2146 degrees of freedom
AIC: 2436.5
Number of Fisher Scoring iterations: 4
The message on observations removed due to missingness is now gone (because no individuals were removed).
4.3.2 Consider non-linearities for continuous variables
Age is a continuous variable, and therefore we should investigate whether adding it as a linear term is sufficient or if we model it non-linearly (read more here). We can do this by modelling the outcome as a function of age and visually inspecting the results.
# Use a logistic regression model to model the outcome as a function of age
fit.glm.tbi.nonlinear <- stats::glm(
1 d.unfav ~ rms::rcs(age, 4) +
d.motor +
d.pupil +
cause,
data = imputed.tbi,
family = binomial
)
# Plot age on log odds scale
2ggplot2::ggplot(
data = imputed.tbi,
ggplot2::aes(x = age, y = fit.glm.tbi.nonlinear[["linear.predictors"]])
) +
3 ggplot2::geom_point() +
4 ggplot2::geom_smooth(
method = "loess",
se = TRUE,
color = "blue",
fill = "lightblue",
formula = "y ~ x"
) +
5 ggplot2::xlab("Age (yrs)") + ggplot2::ylab("Log odds") +
6 ggplot2::theme_bw()- 1
-
Fit a logistic regression model with the outcome
d.unfavas a non-linear transformation ofageand including all other predictors using restricted cubic splines (read more about rcs here). - 2
-
Create the basis for a plot, where age is on the x-axis and the linear predictor (
linear.predictors) from the logistic regression model are on the y-axis. - 3
- Draw points in the plot based on the data.
- 4
- Add a smoothed curve to the plot using a locally estimated scatterplot smoothing (LOESS) curve.
- 5
- Change the axes labels.
- 6
- Change the look of the plot using a pre-specified theme.
This plot is not the most visually appealing; an improved alternative is presented in the following section.
From the figure we observe that age is somewhat linearly associated with the outcome, even if we allow for non-linearities. Still, we want to formally test whether a non-linear model for age might be more appropriate.
# perform likelihood ratio test between the nested models
anova(fit.glm.tbi, fit.glm.tbi.nonlinear, test = "Chisq")Analysis of Deviance Table
Model 1: d.unfav ~ age + d.motor + d.pupil + cause
Model 2: d.unfav ~ rms::rcs(age, 4) + d.motor + d.pupil + cause
Resid. Df Resid. Dev Df Deviance Pr(>Chi)
1 2146 2410.5
2 2144 2406.8 2 3.7058 0.1568
The likelihood ratio test (p-value = 0.1568) indicates there is no significant linearity. Thus, we do not need to apply any further non-linear transformation to age.
Please note, when fitting the non-linear model, we use three coefficients:
Call:
stats::glm(formula = d.unfav ~ rms::rcs(age, 4) + d.motor + d.pupil +
cause, family = binomial, data = imputed.tbi)
Coefficients:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.990484 0.806998 1.227 0.21968
rms::rcs(age, 4)age -0.009789 0.024986 -0.392 0.69523
rms::rcs(age, 4)age' 0.281545 0.154575 1.821 0.06854 .
rms::rcs(age, 4)age'' -0.520271 0.295252 -1.762 0.07805 .
d.motor2 0.430456 0.606625 0.710 0.47796
d.motor3 -0.113539 0.600783 -0.189 0.85010
d.motor4 -0.792502 0.595839 -1.330 0.18350
d.motor5 -1.427317 0.596828 -2.392 0.01678 *
d.motor6 -1.854870 0.666942 -2.781 0.00542 **
d.pupil1 -0.760307 0.179220 -4.242 2.21e-05 ***
d.pupil2 -1.289541 0.143464 -8.989 < 2e-16 ***
cause2 -0.055930 0.141515 -0.395 0.69268
cause3 0.040592 0.149166 0.272 0.78553
cause4 -0.118230 0.221658 -0.533 0.59376
cause5 0.253768 0.141560 1.793 0.07303 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
(Dispersion parameter for binomial family taken to be 1)
Null deviance: 2895.5 on 2158 degrees of freedom
Residual deviance: 2406.8 on 2144 degrees of freedom
AIC: 2436.8
Number of Fisher Scoring iterations: 4
Thus, while the linear model (fit) initially used 13 coefficients, the non-linear model employed 15. Even though we ultimately choose the linear model, it’s important to note that 15 degrees of freedom were used to explore potential non-linear effects. Therefore, when assessing the risk of overfitting, we should account for all 15 degrees of freedom.
4.3.3 Validating the prediction model
Now that we made a prediction model, we have to make sure it actually works. To do this, we compare the observed and the predicted values. To this end, we will have to somehow get the predicted values from the prediction model.
There are two easy ways to do that:
# Get predicted values from model fit object and add to imputed tbi data
1imputed.tbi[["preds"]] <- fit.glm.tbi[["fitted.values"]]
# Get predicted values from function
2preds_fun <- predict(fit.glm.tbi, type = "response")
# Compare values
3table(imputed.tbi[["preds"]] == preds_fun)- 1
- The predicted values are already stored in the model fit object. This makes it easy for us to extract and use those values.
- 2
-
We can also use the predict function. The predict function contains many subclasses that recognize what type of model we fitted. In this case, R automatically detects it should use the subclass
predict.glm(). By defining ‘response’, we tell R that we want the predicted probabilities. The advantage ofpredict()is that we can also supply thenewdataargument. By supplyingnewdatawith a data frame with column names corresponding to the predictors, thepredict()function calculates predictions for this new data. - 3
- We can see that the predictions are the same for each individual.
TRUE
2159
With these predictions, we can plot an ROC curve and calculate the AUC, which is equivalent to the C-statistic for binary outcomes.
# create ROC object
ROC_obj <- pROC::roc(response = imputed.tbi[["d.unfav"]],
predictor = imputed.tbi[["preds"]])- 3
-
Flips the x-axis from 1→0 to 0→1 using
legacy.axes = TRUE. - 4
- Add the 45-degree line.
- 5
- Add the AUC in the Figure.
- 6
- Change the axes labels.
- 7
- Change the look of the plot using a pre-specified theme.
Setting levels: control = 0, case = 1
Setting direction: controls < cases
# Calculate AUC
AUC <- pROC::auc(ROC_obj)
cat("AUC:", round(AUC, 3))
# plot the ROC curve
3pROC::ggroc(ROC_obj, legacy.axes = TRUE) +
4 ggplot2::geom_abline(linetype = "dashed", color = "gray") +
5 ggplot2::annotate("text", x = 0.6, y = 0.2,
label = paste("AUC =", round(AUC, 3)),
size = 5) +
6 ggplot2::labs(title = "ROC Curve",
x = "False Positive Rate (1 - Specificity)",
y = "True Positive Rate (Sensitivity)") +
7 ggplot2::theme_bw()AUC: 0.765
Thus, the discrimination of our model is is 0.765.
To calculate our calibration intercept and calibration slope, we will need the to extract the linear predictor from the fit object.
# Get linear predictors from model fit for calibration intercept and slope
1imputed.tbi[["lps"]] <- fit.glm.tbi[["linear.predictors"]]
# Calculate calibration intercept
2fit.offest.lp <- glm(d.unfav ~ offset(lps),
data = imputed.tbi, family = binomial)
3cal.int <- fit.offest.lp[["coefficients"]][["(Intercept)"]]
# Calculate calibration slope
4fit.lp <- glm(d.unfav ~ lps,
data = imputed.tbi, family = binomial)
5cal.slope <- fit.lp[["coefficients"]][["lps"]]
# Print intercept and slope
6cat("Intercept:", format(round(cal.int, 3), nsmall = 3),
"\nSlope:", format(round(cal.slope, 3), nsmall = 3))- 1
-
As with getting the predicted values, we can use the model fit, or use
predict(fit, type = "link"), which has the advantage of allowing new data. - 2
- To calculate the calibration intercept, we fit a model using the outcome while treating the original linear predictor as an offset. This forces its coefficient to remain fixed at one.
- 3
- The intercept from the model in step 2 is the calibration intercept.
- 4
- To calculate the calibration slope, we fit a model predicting the outcome using the linear predictor.
- 5
- The coefficient of the linear predictor of the model in step 2 is the calibration slope.
- 6
-
Print the intercept and the slope (i.e. the regression coefficient for the linear predictors).
round()rounds the values andformat()prints the decimals up to the value supplied innsmall, even if they are only 0’s.
Intercept: 0.000
Slope: 1.000
As expected, the calibration intercept is 0 and the calibration slope is 1, since the model is being validated on the same data it was developed with. If these values differ, it suggests an issue with the model fitting process. In unseen data, the calibration intercept and slope are expected to differ from zero and one, respectively.
Finally, we can create a calibration plot:
# Create calibration plot
ggplot2::ggplot(imputed.tbi,
1 ggplot2::aes(x = preds, y = d.unfav)) +
2 ggplot2::geom_abline(
linewidth = 1,
colour = "black",
alpha = 0.33
) +
3 ggplot2::geom_point(alpha = 0.33) +
4 ggplot2::geom_smooth(
colour = "#785EF0",
fill = "#785EF0",
method = "loess",
formula = "y ~ x"
) +
5 ggplot2::xlab("Predicted probability") +
ggplot2::ylab("Observed probability") +
6 ggplot2::theme_bw()- 1
- We create a base layer with the data for the calibration plot.
- 2
-
We add the unity line that shows perfect calibration, which is 67% transparent (
alpha = 0.33). - 3
- We draw points on the plot that represent the predicted vs. observed values for each individual.
- 4
- We draw a locally estimated scatterplot smoothing (LOESS) curve which shows the calibration over the range of predicted values.
- 5
- We define the labels for the axes.
- 6
- We change the look of the plot based on a prespecified theme.
We can also add the histogram of predicted probabilities using ggMarginal() from {ggExtras}.
# Create calibration plot
1p <- ggplot2::ggplot(imputed.tbi, ggplot2::aes(
x = preds,
y = d.unfav,
colour = as.factor(d.unfav)
)) +
ggplot2::geom_abline(linewidth = 1,
colour = "black",
alpha = 0.33) +
ggplot2::geom_point(alpha = 0.33) +
ggplot2::geom_smooth(
colour = "#785EF0",
fill = "#785EF0",
method = "loess",
formula = "y ~ x"
) +
2 ggplot2::scale_colour_manual(
values = c("#648FFF", "#FFB000"),
labels = c("Favourable outcome", "Unfavourable outcome")
) +
ggplot2::xlab("Predicted probability") +
ggplot2::ylab("Observed probability") +
ggplot2::theme_bw() +
3 ggplot2::theme(
legend.position = "bottom",
legend.title = ggplot2::element_blank()
)
# Add histograms
4ggExtra::ggMarginal(
p,
type = "histogram",
margins = "x",
binwidth = 0.01,
groupFill = TRUE
) - 1
-
To have two separately coloured histograms for individuals with and without the outcome, we need to specify on what variable to stratify colour. As a side effect, this also colours the points drawn with
geom_point(). - 2
- With a manual scale, we define the colours and labels corresponding to each colour.
- 3
- We slightly adjust the theme by moving the legend to the bottom of the plot and removing the title of the legend.
- 4
- We add the histograms to the plot.
4.4 Method II: lrm()
We now used the glm() function from one of the base R packages ({stats}). However, there is another package that is often used to develop prediction models: {rms}.
4.4.1 Consider non-linearities for continuous variables
To develop our prediction model using {rms}, we use the function lrm() for a logistic regression model. Note that we are directly using the imputed data set. First, let’s check if age should be modeled non-linearly. The {rms} package requires to tell R what the data distributions are using datadist().
# define data distributions
dd <- rms::datadist(imputed.tbi)
options(datadist="dd")
# fit a model allowing age to be non-linearly related to the outcome
fit.rms.tbi.nonlinear <- rms::lrm(
d.unfav ~ rms::rcs(age, 4) +
d.motor +
d.pupil +
cause,
data = imputed.tbi,
1 x = TRUE, y = TRUE)- 1
-
We have to specify x and y as TRUE to use the functions to validate our data using the
{rms}.
We can obtain the hazard ratios from an rms object very easily.
1summary(fit.rms.tbi.nonlinear)
2summary(fit.rms.tbi.nonlinear, age = c(20, 30))
3summary(fit.rms.tbi.nonlinear, age = c(30, 40))- 1
-
By default, hazard ratios are calculated for a change from the 25th to the 75th percentile. This is why we used
datadist()to inform R about the data distribution. - 2
- You can specify a custom range for the odds ratio. For example, changing age from 20 to 30 gives an odds ratio of 1.2356.
- 3
- Because age was modeled non-linearly, the odds ratio for a change from 30 to 40 is different, namely 1.7141.
Effects Response : d.unfav
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
age 22 43 21 0.892280 0.13817 0.621470 1.16310
Odds Ratio 22 43 21 2.440700 NA 1.861700 3.19980
d.motor - 1:5 5 1 NA 1.427300 0.59683 0.257560 2.59710
Odds Ratio 5 1 NA 4.167500 NA 1.293800 13.42400
d.motor - 2:5 5 2 NA 1.857800 0.17196 1.520700 2.19480
Odds Ratio 5 2 NA 6.409500 NA 4.575600 8.97830
d.motor - 3:5 5 3 NA 1.313800 0.14598 1.027700 1.59990
Odds Ratio 5 3 NA 3.720200 NA 2.794500 4.95250
d.motor - 4:5 5 4 NA 0.634820 0.12540 0.389040 0.88059
Odds Ratio 5 4 NA 1.886700 NA 1.475600 2.41230
d.motor - 6:5 5 6 NA -0.427550 0.32365 -1.061900 0.20679
Odds Ratio 5 6 NA 0.652100 NA 0.345800 1.22970
d.pupil - 0:2 3 1 NA 1.289500 0.14346 1.008400 1.57070
Odds Ratio 3 1 NA 3.631100 NA 2.741100 4.81010
d.pupil - 1:2 3 2 NA 0.529230 0.14372 0.247550 0.81092
Odds Ratio 3 2 NA 1.697600 NA 1.280900 2.25000
cause - 2:1 1 2 NA -0.055930 0.14152 -0.333290 0.22143
Odds Ratio 1 2 NA 0.945610 NA 0.716560 1.24790
cause - 3:1 1 3 NA 0.040592 0.14917 -0.251770 0.33295
Odds Ratio 1 3 NA 1.041400 NA 0.777420 1.39510
cause - 4:1 1 4 NA -0.118230 0.22166 -0.552670 0.31621
Odds Ratio 1 4 NA 0.888490 NA 0.575410 1.37190
cause - 5:1 1 5 NA 0.253770 0.14156 -0.023683 0.53122
Odds Ratio 1 5 NA 1.288900 NA 0.976600 1.70100
Effects Response : d.unfav
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
age 20 30 10 0.211590 0.10493 0.0059241 0.41726
Odds Ratio 20 30 10 1.235600 NA 1.0059000 1.51780
d.motor - 1:5 5 1 NA 1.427300 0.59683 0.2575600 2.59710
Odds Ratio 5 1 NA 4.167500 NA 1.2938000 13.42400
d.motor - 2:5 5 2 NA 1.857800 0.17196 1.5207000 2.19480
Odds Ratio 5 2 NA 6.409500 NA 4.5756000 8.97830
d.motor - 3:5 5 3 NA 1.313800 0.14598 1.0277000 1.59990
Odds Ratio 5 3 NA 3.720200 NA 2.7945000 4.95250
d.motor - 4:5 5 4 NA 0.634820 0.12540 0.3890400 0.88059
Odds Ratio 5 4 NA 1.886700 NA 1.4756000 2.41230
d.motor - 6:5 5 6 NA -0.427550 0.32365 -1.0619000 0.20679
Odds Ratio 5 6 NA 0.652100 NA 0.3458000 1.22970
d.pupil - 0:2 3 1 NA 1.289500 0.14346 1.0084000 1.57070
Odds Ratio 3 1 NA 3.631100 NA 2.7411000 4.81010
d.pupil - 1:2 3 2 NA 0.529230 0.14372 0.2475500 0.81092
Odds Ratio 3 2 NA 1.697600 NA 1.2809000 2.25000
cause - 2:1 1 2 NA -0.055930 0.14152 -0.3332900 0.22143
Odds Ratio 1 2 NA 0.945610 NA 0.7165600 1.24790
cause - 3:1 1 3 NA 0.040592 0.14917 -0.2517700 0.33295
Odds Ratio 1 3 NA 1.041400 NA 0.7774200 1.39510
cause - 4:1 1 4 NA -0.118230 0.22166 -0.5526700 0.31621
Odds Ratio 1 4 NA 0.888490 NA 0.5754100 1.37190
cause - 5:1 1 5 NA 0.253770 0.14156 -0.0236830 0.53122
Odds Ratio 1 5 NA 1.288900 NA 0.9766000 1.70100
Effects Response : d.unfav
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
age 30 40 10 0.538920 0.10936 0.324570 0.75326
Odds Ratio 30 40 10 1.714100 NA 1.383400 2.12390
d.motor - 1:5 5 1 NA 1.427300 0.59683 0.257560 2.59710
Odds Ratio 5 1 NA 4.167500 NA 1.293800 13.42400
d.motor - 2:5 5 2 NA 1.857800 0.17196 1.520700 2.19480
Odds Ratio 5 2 NA 6.409500 NA 4.575600 8.97830
d.motor - 3:5 5 3 NA 1.313800 0.14598 1.027700 1.59990
Odds Ratio 5 3 NA 3.720200 NA 2.794500 4.95250
d.motor - 4:5 5 4 NA 0.634820 0.12540 0.389040 0.88059
Odds Ratio 5 4 NA 1.886700 NA 1.475600 2.41230
d.motor - 6:5 5 6 NA -0.427550 0.32365 -1.061900 0.20679
Odds Ratio 5 6 NA 0.652100 NA 0.345800 1.22970
d.pupil - 0:2 3 1 NA 1.289500 0.14346 1.008400 1.57070
Odds Ratio 3 1 NA 3.631100 NA 2.741100 4.81010
d.pupil - 1:2 3 2 NA 0.529230 0.14372 0.247550 0.81092
Odds Ratio 3 2 NA 1.697600 NA 1.280900 2.25000
cause - 2:1 1 2 NA -0.055930 0.14152 -0.333290 0.22143
Odds Ratio 1 2 NA 0.945610 NA 0.716560 1.24790
cause - 3:1 1 3 NA 0.040592 0.14917 -0.251770 0.33295
Odds Ratio 1 3 NA 1.041400 NA 0.777420 1.39510
cause - 4:1 1 4 NA -0.118230 0.22166 -0.552670 0.31621
Odds Ratio 1 4 NA 0.888490 NA 0.575410 1.37190
cause - 5:1 1 5 NA 0.253770 0.14156 -0.023683 0.53122
Odds Ratio 1 5 NA 1.288900 NA 0.976600 1.70100
Furthermore, we can easily plot the effects of the predictors using plot(Predict()).
From the figure we see that age is somewhat linearly associated with the outcome, even if we allow for non-linearities. If we want to formally test for this, we can use the anova() function.
Wald Statistics Response: d.unfav
Factor Chi-Square d.f. P
age 99.16 3 <.0001
Nonlinear 3.72 2 0.1553
d.motor 159.78 5 <.0001
d.pupil 83.82 2 <.0001
cause 4.99 4 0.2884
TOTAL 369.68 14 <.0001
The anova() function gives the result of a Wald test, which is similar but slightly different from a likelihood ratio test which we performed before. The Wald test (p-value = 0.2884) indicates there is no significant linearity. Thus, we do not need to apply any further non-linear transformation to age.
4.4.2 Validating the prediction model
The {rms} package offers an easy function validate prediction models: validate().
# Validate the model
val.model <- rms::validate(
fit.rms.tbi.linear,
1 method = "boot",
2 B = 40,
3 bw = FALSE)- 1
-
The
validate()function was configured to use bootstrapping to estimate optimism. A bootstrap sample is created by resampling the original dataset with replacement until it reaches the same size as the original. For each bootstrap sample, the model is refitted and performance metrics (e.g., AUC) are calculated and compared to the original data. This process is repeated and the results are aggregated across all bootstraps. This provides insight in how the model might perform on slightly different data. - 2
- We set the number of bootstraps to 40, meaning 40 bootstrap samples are generated, 40 models are fitted, and their performance metrics are combined across these 40 iterations.
- 3
-
Currently, we’ve used the default settings for
validate(). However, if you move on to more advanced modeling and apply backward selection, ensure thatbw = TRUE, you specify the stopping rule (e.g.,rule = "aic"), and the selection criterion (e.g., only include predictors with a p-value below 0.05:sls = 0.05). This is important because the selection process needs to be repeated within each bootstrap sample to properly account for optimism.
How do we interpret this output? The validate() function applies resampling, which is a method to account for overfitting, with the results being printed in the index.corrected column. However, if we are just interested in the apparent performance, we can just look at index.orig. To calculate the C-statistic, we can use \(c = \frac{Dxy}{2} + 0.5\). In our case, the C-statistic would be:
cat(" Apparent C-statistic",
round(val.model["Dxy", "index.orig"]/2 + 0.5, 3), "\n",
"Optimism-corrected C-statistic",
round(val.model["Dxy", "index.corrected"]/2 + 0.5, 3)) Apparent C-statistic 0.765
Optimism-corrected C-statistic 0.759
The intercept and slope are given under ‘Intercept’ and ‘Slope’. To get the calibration plot, we can use the calibrate() function (which aims to give overfitting-corrected estimates):
n=2159 Mean absolute error=0.005 Mean squared error=3e-05
0.9 Quantile of absolute error=0.009
Another useful function val.prob() gives the calibration plot with all the performance measures:
# Calibration plot
1rms::val.prob(p = plogis(predict(fit.rms.tbi.linear, type = "lp")),
y = imputed.tbi[["d.unfav"]]) - 1
-
The linear predictors (LP) are first obtained from the
fit_rmsobject. Subsequently, theplogisfunction applies the inverse logit transformation (\[P(Y = 1 \mid X) = \text{logistic}(LP) = \frac{1}{1 + e^{-LP}}\]) to the linear predictor to obtain probabilities.
Dxy C (ROC) R2 D D:Chi-sq
5.298920e-01 7.649460e-01 2.724802e-01 2.241995e-01 4.850466e+02
D:p U U:Chi-sq U:p Q
0.000000e+00 -9.263548e-04 -3.183231e-12 1.000000e+00 2.251258e-01
Brier Intercept Slope Emax E90
1.880106e-01 -6.394705e-16 1.000000e+00 9.129085e-03 6.044859e-03
Eavg S:z S:p
3.120131e-03 -4.152230e-02 9.668795e-01
4.5 Method I or Method II?
So why would we pick one method over the other? Well, both methods have their advantages and disadvantages. It may be clear that {rms} has more native (i.e. inherent in the package) support for prediction modelling in the terms of validation with functions such as validate(), calibate(), and val.prob(). Additionally, it makes it easy to get optimism-corrected performance measures. However, {rms} forces you to keep using the functions from its package, which are not always user intuitive. Moreover, only a few packages have been created as add-ons to {rms}, which makes it difficult to integrate into other analyses, or analyses that were not implemented/supported by the author. Conversely, glm() is part of the {stats} package, one of the base packages of R. A huge amount of packages have been developed which built on results from glm(). Additionally, as a base function of R, there is more focus on intuitive use and error-proofing.
4.6 The curse of the model fit object
Whether we develop a prediction model using glm() or lrm(), functions such as predict(), validate(), and calibrate(), require us to supply the model fit object: the R object which contains the fitted model. This makes it much easier for the developer of the prediction model (who has the model fit object) to determine performance. However, validation becomes much more difficult. Either the validator needs to be supplied with the model fit object (meaning a certain level of skill in the corresponding programming language of the model fit object is required), or the validator needs to use functions that do not require the model fit.
Because the model fit object is often not supplied, the validator has to resort to the functions that do not require the model fit. For the C-statistic, this could be roc() from the {pROC} package. However, to get the linear predictors, the modeller could use predict(), but the validator will have to calculate these based on the coefficients. This can mean a lot of (unnecessary) manual work, including the risk of error when writing down the predictor coefficients.
5 Predicting survival in advanced lung cancer
We will now also develop a survival prediction model based on Cox regression using the lung dataset. We will use the {survival} package, which is part of the core R packages and which is closely related to the glm() function. As a second method, we will use the cph() function from {rms} package.
5.1 Getting to know our data
We (again hypothetically) consulted some lung cancer experts, who told us that the most important predictors we want to include are age (age), biological sex (sex), and weight loss in the last six months (wt.loss) and that we should predict death within two years.
First, let’s inspect the number of missing values.
# remove previous variables from other data set
rm(list=ls())
# Set seed for random process
set.seed(1)
# load data from survival package
lung <- survival::lung[, c("time", "status", "age", "sex", "wt.loss")]
summary(lung) time status age sex
Min. : 5.0 Min. :1.000 Min. :39.00 Min. :1.000
1st Qu.: 166.8 1st Qu.:1.000 1st Qu.:56.00 1st Qu.:1.000
Median : 255.5 Median :2.000 Median :63.00 Median :1.000
Mean : 305.2 Mean :1.724 Mean :62.45 Mean :1.395
3rd Qu.: 396.5 3rd Qu.:2.000 3rd Qu.:69.00 3rd Qu.:2.000
Max. :1022.0 Max. :2.000 Max. :82.00 Max. :2.000
wt.loss
Min. :-24.000
1st Qu.: 0.000
Median : 7.000
Mean : 9.832
3rd Qu.: 15.750
Max. : 68.000
NA's :14
Some survival times go beyond two years, so let’s cap those survival times.
# Cap survival times to two years
lung <- lung |>
# Change variables
dplyr::mutate(# Set status to censored if we observed individuals longer than two years
1 status = ifelse(time > 365.25 * 2, 1, status),
# Cap time to two years
2 time = ifelse(time > 365.25 * 2, 365.25 * 2, time))- 1
-
If individuals died after two years, we did not observe an event within two years, therefore they become censored. Two years is defined by
365.25 * 2. - 2
- We change the time until event to a maximum of two years. If the time is already shorter than two years, we keep that time.
Note that we use 365.25 days in a year. Although formally this should be 365.24, the second decimal is negligible for most follow-up durations in medical studies.
The degree to which a prediction model is likely not overfitted depends in part on the number of cases and non-cases:
# First, recode status to 0 and 1
lung[["status"]] <- lung[["status"]] - 1
# Counts of non-cases and cases
table(lung[["status"]])
0 1
69 159
0 1
30.26316 69.73684
In the lung dataset, we can see that we have more cases than non-cases. In this case, we look at the number of non-cases to get an idea of the risk of overfitting. Given the already small sample size and the small number of non-cases (n = 69, 30.3%), we likely have a high risk of overfitting. Given the scope of this practical, we will leave that be for now, but recommend that in real research settings, you evaluate this risk and adjust your statistical analysis accordingly.
5.2 Dealing with missingness
Because weight loss had some missings, we used single imputation. Again, given the didactic purpose of this practical, this is fine, but in a real research setting, multiple imputation should be used.
# Single imputation of data
1imputed.lung <- mice::complete(mice::mice(lung, m = 1, seed = 1))- 1
-
As explained above, we immediately extract the single imputation from the
miceobject, which was configured for one imputation and a fixed seed to ensure reproducibility.
iter imp variable
1 1 wt.loss
2 1 wt.loss
3 1 wt.loss
4 1 wt.loss
5 1 wt.loss
5.3 Consider non-linearities for continuous variables
Because age and weight loss are continuous, we should investigate whether adding them as a linear term is sufficient or if we should model them non-linearly. This time, we will first use the rms package, and the alternative survival package will be used subsequently.
5.4 Method I: cph()
# define data distributions
dd <- rms::datadist(imputed.lung)
options(datadist="dd")
# fit a model allowing age to be non-linearly related to the outcome
fit.rms.lung.nonlinear <- rms::cph(
1 survival::Surv(time, status) ~
2 rms::rcs(age, 4) +
sex +
rms::rcs(wt.loss, 4),
3 data = imputed.lung)
# show model
4summary(fit.rms.lung.nonlinear)- 1
- Fit a Cox regression model, with the outcome being a survival object containing the event and time to event.
- 2
- Add predictors non-linear age, linear sex, and non-linear weight loss.
- 3
- We use a single imputed dataset.
- 4
- We present the hazard ratios using default settings.
Effects Response : survival::Surv(time, status)
Factor Low High Diff. Effect S.E. Lower 0.95 Upper 0.95
age 56 69.00 13.00 0.040185 0.21948 -0.38999 0.47036
Hazard Ratio 56 69.00 13.00 1.041000 NA 0.67706 1.60060
sex 1 2.00 1.00 -0.525160 0.17187 -0.86202 -0.18829
Hazard Ratio 1 2.00 1.00 0.591460 NA 0.42231 0.82838
wt.loss 0 15.25 15.25 0.249890 0.19062 -0.12372 0.62350
Hazard Ratio 0 15.25 15.25 1.283900 NA 0.88363 1.86540
Instead of the predicted probability, we plot the log relative hazard as a function of the predictor, using the plot(Predict()) function from {rms}.
Visually, it seems that age and wt.loss are non-linearly related to the outcome. Let’s formally test this.
Wald Statistics Response: survival::Surv(time, status)
Factor Chi-Square d.f. P
age 6.11 3 0.1062
Nonlinear 2.02 2 0.3636
sex 9.34 1 0.0022
wt.loss 4.42 3 0.2200
Nonlinear 4.34 2 0.1141
TOTAL NONLINEAR 6.40 4 0.1714
TOTAL 21.35 7 0.0033
However, the Wald test shows that the non-linear terms are not significant, so age and weight loss can be modeled as linearly.
5.4.1 Validating the prediction model
# fit linear model
fit.rms.lung.linear <- rms::cph(
survival::Surv(time, status) ~
age +
sex +
wt.loss,
x = TRUE,
y = TRUE,
data = imputed.lung
)
# internal bootstrap validation
rms::validate(fit.rms.lung.linear) index.orig training test optimism index.corrected n
Dxy 0.2091 0.2228 0.1918 0.0310 0.1781 40
R2 0.0667 0.0830 0.0588 0.0242 0.0425 40
Slope 1.0000 1.0000 0.8847 0.1153 0.8847 40
D 0.0100 0.0127 0.0087 0.0040 0.0059 40
U -0.0014 -0.0013 0.0010 -0.0024 0.0010 40
Q 0.0113 0.0141 0.0077 0.0064 0.0049 40
g 0.3795 0.4194 0.3516 0.0679 0.3117 40
5.5 Method II: coxph()
To fit the prediction model, we can use the coxph() function from the {survival} package. This works just like glm() in how we specify the prediction model, except that we specify the outcome using the Surv() function.
# Fit the non-linear prediction model
fit.rms.lung.nonlinear <- survival::coxph(
1 survival::Surv(time, status) ~
2 age + sex +
3 splines::ns(wt.loss, knots = c(1, 12)),
data = imputed.lung
) - 1
- Fit a Cox regression model, with the outcome being a survival object containing the event and time to event.
- 2
- Add predictors age (linearly) and sex.
- 3
-
Add weight loss as a natural cubic spline using the
ns()function from{splines}. We put knots at 1 and 12 based on the previous graph. The advantage ofrcs()is that it does not transform the underlying variable (i.e. not a B-spline matrix basis for a natural cubic spline) and is therefore easier to interpret and write down in a concise formula. However, we are forced to use at least 3 knots meaning more coefficients are estimated, whilstns()allows us to use a single knot.
Let’s see what the linear model looks like:
Call:
survival::coxph(formula = survival::Surv(time, status) ~ age +
sex + splines::ns(wt.loss, knots = c(1, 12)), data = imputed.lung)
n= 228, number of events= 159
coef exp(coef) se(coef) z
age 0.017394 1.017546 0.009493 1.832
sex -0.541317 0.581981 0.171290 -3.160
splines::ns(wt.loss, knots = c(1, 12))1 -0.314830 0.729913 0.483386 -0.651
splines::ns(wt.loss, knots = c(1, 12))2 -3.237030 0.039280 1.603606 -2.019
splines::ns(wt.loss, knots = c(1, 12))3 -1.264345 0.282424 0.930900 -1.358
Pr(>|z|)
age 0.06691 .
sex 0.00158 **
splines::ns(wt.loss, knots = c(1, 12))1 0.51485
splines::ns(wt.loss, knots = c(1, 12))2 0.04353 *
splines::ns(wt.loss, knots = c(1, 12))3 0.17440
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
exp(coef) exp(-coef) lower .95
age 1.01755 0.9828 0.998789
sex 0.58198 1.7183 0.416013
splines::ns(wt.loss, knots = c(1, 12))1 0.72991 1.3700 0.283017
splines::ns(wt.loss, knots = c(1, 12))2 0.03928 25.4580 0.001695
splines::ns(wt.loss, knots = c(1, 12))3 0.28242 3.5408 0.045553
upper .95
age 1.0367
sex 0.8142
splines::ns(wt.loss, knots = c(1, 12))1 1.8825
splines::ns(wt.loss, knots = c(1, 12))2 0.9103
splines::ns(wt.loss, knots = c(1, 12))3 1.7510
Concordance= 0.611 (se = 0.025 )
Likelihood ratio test= 19.33 on 5 df, p=0.002
Wald test = 18.66 on 5 df, p=0.002
Score (logrank) test = 19.19 on 5 df, p=0.002
We see that weight loss has three coefficients, one for the lowest value through 1 (1), one for 1 through 12 (2), and one for 12 through the highest value (3).
Now, let’s formally test if wt.loss needs to be modelled non-linearly using natural cubic splines.
# Fit the linear prediction model
fit.rms.lung.linear <- survival::coxph(survival::Surv(time, status) ~
age + sex +
wt.loss,
data = imputed.lung)
# likelihood ratio test
anova(fit.rms.lung.linear, fit.rms.lung.nonlinear, test = "Chisq")Analysis of Deviance Table
Cox model: response is survival::Surv(time, status)
Model 1: ~ age + sex + wt.loss
Model 2: ~ age + sex + splines::ns(wt.loss, knots = c(1, 12))
loglik Chisq Df Pr(>|Chi|)
1 -729.18
2 -727.36 3.6258 2 0.1632
This illustrates that weight loss does not need to be modelled linearly.
In the summary of our model, please note that we do not have an intercept. To calculate the final predicted risk of a Cox proportional hazards model, we need to use the baseline hazard instead of the intercept. However, the baseline hazard is different for each timepoint.
We can use basehaz() to calculate the baseline hazard at our timepoint of interest (2 years):
# Extract baseline hazard at 2 years
h0 <- dplyr::filter(survival::basehaz(fit.rms.lung.linear),
time == 365.25 * 2)[["hazard"]]
print(h0)[1] 2.137266
Discrimination of this model can be calculated using Harrell’s C-statistic: * The function cindex() from the {dynpred} package (does not require model fit, but does require development formula). * The function rcorr.cens() from the {Hmisc} package (does not require model fit, but may be slow). * The function cIndex() from the {intsurv} package.
The calibration slope can be calculated as before, using coxph() and a calibration can also be drawn using the same methods as discussed before. In the case of competing risks, take note of validation measures for competing risk settings. More on that can be found in Van Geloven et al. 2021 and Ramspek et al. 2021.
5.6 The curse of the model fit object revisited
We already saw that with the model fit object, validation of a prediction model becomes quite easy.
predict()
If we use predict() to calculate the survival probability for an individual based on a Cox regression model (thus using predict(fit, type = "survival"), note that the survival probability is not the probability at our specified time horizon, but at the end of their observed follow-up. Therefore, we cannot use this to calculate predictions (see also this issue opened by the author of this practical, who turned out to be wrong but learned along the way)
To get individual predictions, instead use predict(fit, type = "lp") and calculate the survival probability from the linear predictor using \(\hat{y}(t) = 1 - S_0(t)^{e^{LP_i}}\) where \(S_0(t) = e^{-H_0(t)}\) and \(H_0(t)\) is the baseline hazard.
However, in the absence of a model fit object, the validator may struggle more, especially for Cox regression models. By default, both coxph() and cph() estimate a centered Cox regression model. This means that the reference individual for the model is not someone without any binary predictor and with all continuous variables at 0, but someone without any binary predictor and with all continuous variables at the mean of that variable. If we then want to calculate an individual’s linear predictor, we also should have the linear predictor of the sample (\(LP_{sample}\)). To calculate an individual’s linear predictor for the final prediction, we substract \(LP_{sample}\) from their individual linear predictor \(LP_{individual}\): \(\hat{y}(t) = 1 - S_0(t)^{e^{LP_{individual} - LP_{sample}}}\). More on this can be found in this blogpost.
6 Final words
We have now taken a look at the basics for prediction modelling in R. Although many more packages are available for our goals (e.g. {riskRegression), we introduced the two most commonly used packages. If you start developing or validating a prediction model on your own, look back to this practical and never be afraid to ask, whether that be an experienced modeller, the internet, or (with due caution applied) a large language model.
7 Postscript
This practical was developed on:
R version 4.5.0 (2025-04-11 ucrt)
Platform: x86_64-w64-mingw32/x64
Running under: Windows 11 x64 (build 22631)
Matrix products: default
LAPACK version 3.12.1
locale:
[1] LC_COLLATE=English_United States.utf8
[2] LC_CTYPE=English_United States.utf8
[3] LC_MONETARY=English_United States.utf8
[4] LC_NUMERIC=C
[5] LC_TIME=English_United States.utf8
time zone: Europe/Amsterdam
tzcode source: internal
attached base packages:
[1] splines stats graphics grDevices utils datasets methods
[8] base
other attached packages:
[1] rms_8.0-0 Hmisc_5.2-3 rio_1.2.4 survival_3.8-3 pROC_1.19.0.1
[6] mice_3.18.0 ggExtra_0.11.0 ggplot2_3.5.2 dplyr_1.1.4
loaded via a namespace (and not attached):
[1] tidyselect_1.2.1 farver_2.1.2 R.utils_2.13.0 fastmap_1.2.0
[5] TH.data_1.1-3 pacman_0.5.1 promises_1.3.3 digest_0.6.37
[9] rpart_4.1.24 mime_0.13 lifecycle_1.0.4 cluster_2.1.8.1
[13] magrittr_2.0.3 compiler_4.5.0 rlang_1.1.6 tools_4.5.0
[17] yaml_2.3.10 data.table_1.17.2 knitr_1.50 labeling_0.4.3
[21] htmlwidgets_1.6.4 RColorBrewer_1.1-3 multcomp_1.4-28 miniUI_0.1.2
[25] polspline_1.1.25 withr_3.0.2 foreign_0.8-90 purrr_1.1.0
[29] R.oo_1.27.1 nnet_7.3-20 grid_4.5.0 jomo_2.7-6
[33] xtable_1.8-4 colorspace_2.1-1 scales_1.4.0 iterators_1.0.14
[37] MASS_7.3-65 mvtnorm_1.3-3 cli_3.6.5 rmarkdown_2.29
[41] reformulas_0.4.1 generics_0.1.4 rstudioapi_0.17.1 minqa_1.2.8
[45] stringr_1.5.1 base64enc_0.1-3 vctrs_0.6.5 boot_1.3-31
[49] glmnet_4.1-9 Matrix_1.7-3 sandwich_3.1-1 jsonlite_2.0.0
[53] SparseM_1.84-2 mitml_0.4-5 Formula_1.2-5 htmlTable_2.4.3
[57] foreach_1.5.2 tidyr_1.3.1 glue_1.8.0 nloptr_2.2.1
[61] pan_1.9 codetools_0.2-20 stringi_1.8.7 shape_1.4.6.1
[65] gtable_0.3.6 later_1.4.4 lme4_1.1-37 tibble_3.2.1
[69] pillar_1.10.2 htmltools_0.5.8.1 quantreg_6.1 R6_2.6.1
[73] Rdpack_2.6.4 evaluate_1.0.3 shiny_1.11.1 lattice_0.22-6
[77] R.methodsS3_1.8.2 rbibutils_2.3 backports_1.5.0 broom_1.0.8
[81] httpuv_1.6.16 MatrixModels_0.5-4 Rcpp_1.0.14 gridExtra_2.3
[85] nlme_3.1-168 checkmate_2.3.2 mgcv_1.9-1 xfun_0.52
[89] zoo_1.8-14 pkgconfig_2.0.3