RM2 2025 S2 Assignment 3
HTML Version: https://rpubs.com/bfrps/1366550
Abstract
Using Cox proportional hazards regression, we model the association between various factors and all-cause mortality.
The multivariable model demonstrated good discriminative performance (c-index = 0.733). Age had a strong linear association (HR = 1.09 per year, 95% CI: 1.08-1.11). Female gender was highly protective (HR = 0.46, 95% CI: 0.37-0.57). Diabetes conferred substantial mortality risk (HR = 2.64, 95% CI: 1.64-4.25). Obesity conferred moderate mortality risk (HR = 1.38, 95% CI: 1.02-1.89). Time-dependent angina was associated with 83% increased mortality hazard (HR = 1.83, 95% CI: 1.40-2.39). Education showed no significant association after multivariable adjustment.
These findings quantify key cardiovascular risk factors and support targeted interventions for high-risk populations.
Introduction
The Framingham Heart Study is a landmark longitudinal cardiovascular epidemiological study that began in 1948 and continues today, originally designed to identify risk factors for heart disease in a community-based population in Framingham, Massachusetts1.
This analysis investigates associations between key demographic and clinical factors (age, sex, BMI, diabetes, education, and angina) and all-cause mortality, using data from the Framingham Heart Study’s original 1948 cohort.
A critical aspect of this analysis is the treatment of angina as a time-dependent covariate. Unlike traditional risk factors that are measured at baseline, angina status can change during the follow-up period. Participants may develop angina at any point during observation, or may have been diagnosed prior to study enrolment. This time-varying nature requires the use of survival analysis techniques to account for the changing risk profile over time.
Understanding these associations allows us to estimate the relative cardiovascular risk of the various covariates and quantify their individual contributions to mortality outcomes. This analysis provides hazard ratio estimates that can be used to inform clinical decision-making, patient counselling, and potentially guide targeted interventions for high-risk populations.
Methods
Statistical Analysis
Cox proportional hazards regression was used to examine the association between covariates and mortality, accounting for the time-dependent nature of angina. The model included age, sex, BMI category, diabetes, education, and angina. Univariate and multi-variate analysis were performed.
Covariate Processing
Angina was modeled as a time-dependent covariate. Each participant’s follow-up was divided at the point of angina occurrence, creating separate rows for distinct risk periods with different angina status, as outlined in Table 3.
BMI categories were collapsed from six original categories into three categories (Underweight/Healthy Weight, Overweight, and Obese) to ensure adequate sample sizes and focus on clinically meaningful thresholds.
Results
Baseline Characteristics
The study population demonstrated typical characteristics of a middle-aged to older adult cohort. The mean age at baseline reflects the demographic profile expected for cardiovascular risk assessment, with representation across educational levels. Due to low numbers in some BMI categories necessitated collapsing the original six categories into three groups due to small numbers in some individual categories, particularly in the extreme BMI ranges (underweight and highest obesity categories). The diabetes prevalence and mortality rate during follow-up are consistent with expected outcomes for this population and follow-up duration.
Survival Analysis
Univariate Analysis
The univariate analyses revealed significant crude associations for most covariates examined:
- Age had a strong linear association with mortality - each additional year associated with 8.9% increased hazard (HR = 1.089, 95% CI: 1.075-1.103, p < 0.001)
- Women had 39% lower mortality hazard compared to men (HR = 0.609, 95% CI: 0.494-0.750, p < 0.001)
- Having diabetes substantially increased mortality hazard (estimated more than four-fold!) (HR = 4.057, 95% CI: 2.552-6.450, p < 0.001)
- Education had a mild protective effect with high school completion (HR = 0.641, 95% CI: 0.497-0.827, p = 0.001) and some college (HR = 0.614, 95% CI: 0.444-0.849, p = 0.003) showing significant protection. College degree showed non-significant protective trend (HR = 0.819, 95% CI: 0.583-1.151, p = 0.250)
- Angina had a strong association with increased mortality risk - more than 2.6-fold increased hazard following angina onset (HR = 2.617, 95% CI: 2.021-3.390, p < 0.001)
- Obesity demonstrated significantly increased mortality risk (HR = 1.867, 95% CI: 1.390-2.508, p < 0.001) compared to underweight/healthy weight reference. Overweight status showed non-significant trend toward increased risk (HR = 1.231, 95% CI: 0.977-1.550, p = 0.078) compared to underweight/healthy weight reference.
Multivariable Analysis
The multivariable Cox proportional hazards model had strong overall performance with a concordance index of 0.733, indicating good discriminative ability. The likelihood ratio test (χ² = 266.7, df = 9, p < 0.001) confirmed the model significantly improved prediction compared to a null model.
The results of the multivariable analysis revealed several key findings:
- Age was a strong predictor of mortality, with each additional year associated with a 9% increase in hazard (HR = 1.09, 95% CI: 1.08-1.11, p < 0.001)
- Gender was a very strong predictor of mortality, with women having 54% lower hazard of death compared to men (HR = 0.46, 95% CI: 0.37-0.57, p < 0.001).
- Diabetes was the strongest association, with those with diabetes having a 2.64-fold increase in mortality risk (HR = 2.64, 95% CI: 1.64-4.25, p < 0.001). The magnitude of the association was substantially attenuated in the multivariable model (univariate HR = 4.06 vs multivariable HR = 2.64), indicating that age and other factors partially mediate the association of diabetes and all-cause mortality.
- Obesity significantly elevated mortality risk by 38% compared to underweight/healthy weight participants (HR = 1.38, 95% CI: 1.02-1.89, p = 0.038). Overweight status showed no significant association with mortality (HR = 0.92, 95% CI: 0.73-1.17, p = 0.49)
- The development of angina during follow-up was associated with an 83% increase in mortality hazard (HR = 1.83, 95% CI: 1.40-2.39, p < 0.001), highlighting the clinical effects of incident cardiovascular symptoms on future cardiovascular risk.
- Despite showing significant protective effects in univariate analysis (high school: HR = 0.64, p = 0.001; some college: HR = 0.61, p = 0.003), education level showed no significant association with mortality after adjustment for other covariates (global p = 0.5), suggesting that socioeconomic effects may be mediated through other measured risk factors.
Model Performance
The model demonstrated good discriminative performance (c-index = 0.733) and satisfied key statistical assumptions, including proportional hazards. The comprehensive diagnostic assessment revealed some evidence of potential non-linearity in age effects, but sensitivity analyses using splined age terms confirmed that linear modeling provides adequate fit without significantly changing substantive conclusions.
Conclusion
This analysis of Framingham Heart Study data demonstrates strong independent associations between age, sex, diabetes, obesity, and incident angina with all-cause mortality. The time-dependent modeling captured angina’s dynamic impact on mortality risk.
The findings’ generalizability is limited by the study’s restriction to the predominantly white 1948 Framingham cohort, which may not reflect contemporary populations or diverse demographic groups.
The model’s good discriminative performance supports its utility for risk stratification, though broader validation is needed for generalizability to modern clinical practice.
References
Appendices
Appendix 1: Univariate Model Results
Appendix 2: Multivariate Model Results
Appendix 3: Kaplan-Meier Curves
Appendix 4: Time-Dependent Angina Covariate Structure
Appendix 5: Study Participant Characteristics
Appendix 6: Model Diagnostics
Appendix 7: Alternative Model - Restricted Cubic Splines for Age
for reference, not reported due to worse fit
Further analysis of appropriate model specification accommodating for systematic patterns in deviance residuals is considered trivial and left as an exercise for the reader.
Appendix 8: R Code
Show code
blockquote {
font-size: inherit;
}
knitr::opts_chunk$set(echo = TRUE, message = FALSE, warning = FALSE)
# Load required libraries
library(knitr)
library(tidyverse)
library(survival)
library(survminer)
library(gridExtra)
library(kableExtra)
library(gtsummary)
library(broom.helpers)
library(parameters)
library(rms)
library(broom)
library(gt)
# Set ggplot theme
theme_set(theme_minimal())
# Load the Framingham data
data <- read.csv("assn3_2025_S2.csv")
# Convert categorical variables to factors
data <- data |>
mutate(
sex = factor(sex, levels = c(1, 2), labels = c("Male", "Female")),
bmicat = factor(bmicat, levels = 1:6,
labels = c("Underweight", "Healthy Weight", "Overweight",
"Obese I", "Obese II", "Obese III")),
diabetes = factor(diabetes, levels = c(0, 1), labels = c("No", "Yes")),
educ = factor(educ, levels = 1:4,
labels = c("0-11 years", "High School", "Some College", "College degree")),
death = factor(death, levels = c(0, 1), labels = c("Censored", "Death"))
) |>
# Combine BMI categories
mutate(
bmicat_collapsed = case_when(
bmicat %in% c("Underweight", "Healthy Weight") ~ "Underweight/Healthy Weight",
bmicat == "Overweight" ~ "Overweight",
bmicat %in% c("Obese I", "Obese II", "Obese III") ~ "Obese",
TRUE ~ as.character(bmicat)
),
bmicat_collapsed = factor(bmicat_collapsed,
levels = c("Underweight/Healthy Weight", "Overweight", "Obese"))
)
# Create time-dependent dataset for angina scenarios
tdc_data <- data |>
rowwise() |>
mutate(
rows = case_when(
# Case 1: Angina at baseline (timeap = 0)
timeap == 0 ~ list(tibble(
randid = randid,
tstart = 0,
tstop = timedth,
death = as.numeric(death) - 1,
age = age,
sex = sex,
bmicat = bmicat,
bmicat_collapsed = bmicat_collapsed,
diabetes = diabetes,
educ = educ,
angina = 1
)),
# Case 2: Angina during follow-up (timeap != 0)
timeap <= timedth ~ list(bind_rows(
# Before angina
tibble(
randid = randid,
tstart = 0,
tstop = timeap,
death = 0,
age = age,
sex = sex,
bmicat = bmicat,
bmicat_collapsed = bmicat_collapsed,
diabetes = diabetes,
educ = educ,
angina = 0
),
# After angina
tibble(
randid = randid,
tstart = timeap,
tstop = timedth,
death = as.numeric(death) - 1,
age = age,
sex = sex,
bmicat = bmicat,
bmicat_collapsed = bmicat_collapsed,
diabetes = diabetes,
educ = educ,
angina = 1
)
)),
# Case 3: No angina during study
TRUE ~ list(tibble(
randid = randid,
tstart = 0,
tstop = timedth,
death = as.numeric(death) - 1,
age = age,
sex = sex,
bmicat = bmicat,
bmicat_collapsed = bmicat_collapsed,
diabetes = diabetes,
educ = educ,
angina = 0
))
)
) |>
# Expand nested rows and clean up
select(rows) |>
unnest(rows) |>
ungroup()
# Create survival objects
surv_obj <- Surv(data$timedth, as.numeric(data$death) - 1)
surv_tdc <- Surv(tdc_data$tstart, tdc_data$tstop, tdc_data$death)
# Univariate models
uni_models <- list(
age = coxph(surv_tdc ~ age, data = tdc_data),
sex = coxph(surv_tdc ~ sex, data = tdc_data),
bmicat_collapsed = coxph(surv_tdc ~ bmicat_collapsed, data = tdc_data),
diabetes = coxph(surv_tdc ~ diabetes, data = tdc_data),
educ = coxph(surv_tdc ~ educ, data = tdc_data),
angina = coxph(surv_tdc ~ angina, data = tdc_data)
)
# Create univariate results table
uni_results <- map_dfr(names(uni_models), ~{
model <- uni_models[[.x]]
tidy(model, exponentiate = TRUE, conf.int = TRUE) |>
mutate(variable = .x)
}) |>
select(variable, term, estimate, conf.low, conf.high, p.value)
kable(uni_results,
col.names = c("Variable", "Term", "HR", "95% CI Lower", "95% CI Upper", "P-value"),
digits = 3)
# Fit multivariable model
mv_model <- coxph(surv_tdc ~ age + sex + bmicat_collapsed + diabetes + educ + angina,
data = tdc_data)
# Display results using gtsummary
tbl_regression(mv_model,
exponentiate = TRUE,
add_estimate_to_reference_rows = TRUE,
label = list(
age ~ "Age (per year)",
sex ~ "Sex",
bmicat_collapsed ~ "BMI Category",
diabetes ~ "Diabetes",
educ ~ "Education Level",
angina ~ "Angina (Time-dependent)"
)) |>
add_n(location = "level") |>
add_nevent(location = "level") |>
add_global_p() |>
add_glance_source_note(
label = list(
concordance = "Concordance (c-index)",
statistic.log = "Likelihood Ratio Test",
p.value.log = "LR Test p-value",
AIC = "AIC",
BIC = "BIC"
),
include = c(statistic.log, p.value.log, concordance)
) |>
bold_p(t = 0.05) |>
bold_labels() |>
as_gt()
# KM curves by different variables
km_plot_sex <- ggsurvplot(survfit(surv_obj ~ sex, data = data),
data = data, pval = TRUE, conf.int = TRUE,
title = "Survival by Sex", xlab = "Time (days)")$plot
km_plot_diabetes <- ggsurvplot(survfit(surv_obj ~ diabetes, data = data),
data = data, pval = TRUE, conf.int = TRUE,
title = "Survival by Diabetes", xlab = "Time (days)")$plot
km_plot_education <- ggsurvplot(survfit(surv_obj ~ educ, data = data),
data = data, pval = TRUE, conf.int = TRUE,
title = "Survival by Education", xlab = "Time (days)")$plot
km_plot_bmi <- ggsurvplot(survfit(surv_obj ~ bmicat_collapsed, data = data),
data = data, pval = TRUE, conf.int = TRUE,
title = "Survival by BMI Category", xlab = "Time (days)")$plot
grid.arrange(km_plot_sex, km_plot_diabetes, km_plot_education, km_plot_bmi, ncol = 2)
angina_table <- data.frame(
"Participant Group" = c("Angina at baseline (timeap = 0)",
"Develop angina during follow-up",
"Never develop angina"),
"# of Intervals" = c("1", "2", "1"),
"Interval Description" = c("Single interval with angina = 1",
"Before angina onset (angina = 0) & after onset (angina = 1)",
"Single interval with angina = 0"),
check.names = FALSE
)
kable(angina_table,
format = "markdown",
align = c("l", "c", "l"))
baseline_data <- data |>
select(age, sex, bmicat_collapsed, diabetes, educ, death, timedth) |>
mutate(
death_binary = ifelse(death == "Death", 1, 0),
follow_up_years = timedth / 365.25
)
tbl_summary(baseline_data,
include = c(age, sex, bmicat_collapsed, diabetes, educ, death_binary, follow_up_years),
statistic = list(
all_continuous() ~ "{mean} ({sd})",
all_categorical() ~ "{n} ({p}%)"
),
label = list(
age ~ "Age (years)",
sex ~ "Sex",
bmicat_collapsed ~ "BMI Category",
diabetes ~ "Diabetes",
educ ~ "Education Level",
death_binary ~ "Deaths",
follow_up_years ~ "Follow-up (years)"
)) |>
add_n() |>
bold_labels()
# Histograms for continuous variables
histogram_age <- ggplot(data, aes(x = age)) +
geom_histogram(bins = 30, fill = "thistle2", color = "black", alpha = 0.7) +
labs(title = "Distribution of Age", x = "Age (years)", y = "Frequency")
histogram_timedth <- ggplot(data, aes(x = timedth)) +
geom_histogram(bins = 30, fill = "lightblue2", color = "black", alpha = 0.7) +
labs(title = "Distribution of Follow-up Time",
x = "Time to Death/Censoring (days)", y = "Frequency")
histogram_timeap <- ggplot(data, aes(x = timeap)) +
geom_histogram(bins = 30, fill = "lightgreen", color = "black", alpha = 0.7) +
labs(title = "Distribution of Time to Angina",
x = "Time to Angina Diagnosis (days)", y = "Frequency")
grid.arrange(histogram_age, histogram_timedth, histogram_timeap, ncol = 3)
# Boxplots for continuous variables
boxplot_age <- ggplot(data, aes(x = age)) +
geom_boxplot(fill = "thistle2", alpha = 0.7) +
labs(title = "Boxplot of Age", x = "Age (years)") +
coord_flip()
boxplot_timedth <- ggplot(data, aes(x = timedth)) +
geom_boxplot(fill = "lightblue2", alpha = 0.7) +
labs(title = "Boxplot of Follow-up Time", x = "Time (days)") +
coord_flip()
boxplot_timeap <- ggplot(data, aes(x = timeap)) +
geom_boxplot(fill = "lightgreen", alpha = 0.7) +
labs(title = "Boxplot of Time to Angina", x = "Time (days)") +
coord_flip()
grid.arrange(boxplot_age, boxplot_timedth, boxplot_timeap, ncol = 3)
# Stacked bar charts for categorical variables
bar_sex <- ggplot(data, aes(x = "", fill = sex)) +
geom_bar(position = "stack", alpha = 0.7, color = "black", width = 0.7) +
geom_text(aes(label = after_stat(sprintf("%d\n(%.1f%%)", count, 100*count/sum(count)))),
stat = 'count', position = position_stack(vjust = 0.5), size = 3.5) +
labs(title = "Distribution of Sex", x = "", y = "Frequency", fill = "Sex")
bar_diabetes <- ggplot(data, aes(x = "", fill = diabetes)) +
geom_bar(position = "stack", alpha = 0.7, color = "black", width = 0.7) +
geom_text(aes(label = after_stat(sprintf("%d\n(%.1f%%)", count, 100*count/sum(count)))),
stat = 'count', position = position_stack(vjust = 0.5), size = 3.5) +
labs(title = "Distribution of Diabetes", x = "", y = "Frequency", fill = "Diabetes")
bar_educ <- ggplot(data, aes(x = "", fill = educ)) +
geom_bar(position = "stack", alpha = 0.7, color = "black", width = 0.7) +
geom_text(aes(label = after_stat(sprintf("%d\n(%.1f%%)", count, 100*count/sum(count)))),
stat = 'count', position = position_stack(vjust = 0.5), size = 3.5) +
labs(title = "Distribution of Education",
x = "", y = "Frequency", fill = "Education Level")
bar_bmicat <- ggplot(data, aes(x = "", fill = bmicat)) +
geom_bar(position = "stack", alpha = 0.7, color = "black", width = 0.7) +
geom_text(aes(label = after_stat(sprintf("%d\n(%.1f%%)", count, 100*count/sum(count)))),
stat = 'count', position = position_stack(vjust = 0.5), size = 3) +
labs(title = "Distribution of BMI Category",
x = "", y = "Frequency", fill = "BMI Category") +
theme(legend.text = element_text(size = 8))
bar_death <- ggplot(data, aes(x = "", fill = death)) +
geom_bar(position = "stack", alpha = 0.7, color = "black", width = 0.7) +
geom_text(aes(label = after_stat(sprintf("%d\n(%.1f%%)", count, 100*count/sum(count)))),
stat = 'count', position = position_stack(vjust = 0.5), size = 3.5) +
labs(title = "Distribution of Death Status",
x = "", y = "Frequency", fill = "Death Status")
# Combine BMI categories
data <- data |>
mutate(
bmicat_collapsed = case_when(
bmicat %in% c("Underweight", "Healthy Weight") ~ "Underweight/Healthy Weight",
bmicat == "Overweight" ~ "Overweight",
bmicat %in% c("Obese I", "Obese II", "Obese III") ~ "Obese",
TRUE ~ as.character(bmicat)
),
bmicat_collapsed = factor(bmicat_collapsed,
levels = c("Underweight/Healthy Weight", "Overweight", "Obese"))
)
# Stacked bar chart of clapsed BMI Category
bar_bmicat_collapsed <- ggplot(data, aes(x = "", fill = bmicat_collapsed)) +
geom_bar(position = "stack", alpha = 0.7, color = "black", width = 0.7) +
geom_text(aes(label = after_stat(sprintf("%d\n(%.1f%%)", count, 100*count/sum(count)))),
stat = 'count', position = position_stack(vjust = 0.5), size = 3.5) +
labs(title = "Distribution of Collapsed BMI Category",
x = "", y = "Frequency", fill = "BMI Category") +
theme_minimal()
grid.arrange(bar_sex, bar_diabetes, bar_educ, bar_bmicat, bar_bmicat_collapsed, bar_death, ncol = 3)
# Proportional hazards assumption test
ph_test <- cox.zph(mv_model)
print(ph_test)
# Plot Schoenfeld residuals
ggcoxzph(ph_test, font.main = 12)
# Martingale residuals to test for linearity of age
mart_resid <- residuals(mv_model, type = "martingale")
# Martingale residuals plot for age
plot(tdc_data$age, mart_resid,
xlab = "Age (years)",
ylab = "Martingale Residuals",
main = "Assessment of Age Linearity")
lines(lowess(tdc_data$age, mart_resid), col = "thistle2", lwd = 2)
abline(h = 0, lty = 2, col = "gray")
# Deviance residuals
dev_resid <- residuals(mv_model, type = "deviance")
par(mfrow = c(2, 2))
# Deviance residuals plot for linear predictor
plot(predict(mv_model), dev_resid,
xlab = "Linear Predictor", ylab = "Deviance Residuals",
main = "Deviance Residuals vs Linear Predictor")
abline(h = 0, col = "thistle2", lty = 2)
# Deviance residuals plot for age
plot(tdc_data$age, dev_resid,
xlab = "Age", ylab = "Deviance Residuals",
main = "Deviance Residuals vs Age")
abline(h = 0, col = "thistle2", lty = 2)
# DFBETA plot for age
dfbeta_vals <- residuals(mv_model, type = "dfbeta")
plot(dfbeta_vals[,1], ylab = "DFBETA for Age",
main = "Influential Observations (Age)")
abline(h = c(-2/sqrt(nrow(tdc_data)), 2/sqrt(nrow(tdc_data))),
col = "thistle2", lty = 2)
# DFBETA plot for angina
angina_col <- which(colnames(dfbeta_vals) == "angina")
if(length(angina_col) > 0) {
plot(dfbeta_vals[,angina_col], ylab = "DFBETA for Angina",
main = "Influential Observations (Angina)")
abline(h = c(-2/sqrt(nrow(tdc_data)), 2/sqrt(nrow(tdc_data))),
col = "thistle2", lty = 2)
}
par(mfrow = c(1, 1))
# Fit model with restricted cubic splines for age
dd <- datadist(tdc_data)
options(datadist = 'dd')
# Fit spline model
mv_model_spline <- coxph(surv_tdc ~ rcs(age, 4) + sex + bmicat_collapsed +
diabetes + educ + angina, data = tdc_data)
# Likelihood ratio test
anova(mv_model, mv_model_spline)
# Display spline model coefficients
tbl_regression(mv_model_spline,
exponentiate = TRUE,
label = list(
"rcs(age, 4)" ~ "Age (splined)",
sex ~ "Sex",
bmicat_collapsed ~ "BMI Category",
diabetes ~ "Diabetes",
educ ~ "Education Level",
angina ~ "Angina (Time-dependent)"
)) |>
add_glance_source_note(
label = list(concordance = "Concordance"),
include = concordance
) |>
bold_p(t = 0.05) |>
bold_labels() |>
as_gt() |>
tab_footnote("Spline model not selected due to non-significant improvement (LR test p = 0.34)")
# Spline plot for age
age_vals <- seq(min(tdc_data$age), max(tdc_data$age), length.out = 100)
# Create prediction data
pred_data <- data.frame(
age = age_vals,
sex = factor("Male", levels = levels(tdc_data$sex)),
bmicat_collapsed = factor("Underweight/Healthy Weight",
levels = levels(tdc_data$bmicat_collapsed)),
diabetes = factor("No", levels = levels(tdc_data$diabetes)),
educ = factor("High School", levels = levels(tdc_data$educ)),
angina = 0
)
# Get predictions
linear_pred <- predict(mv_model, newdata = pred_data, type = "lp")
spline_pred <- predict(mv_model_spline, newdata = pred_data, type = "lp")
# Plot comparison
plot_data <- data.frame(
age = age_vals,
linear = exp(linear_pred - linear_pred[1]),
spline = exp(spline_pred - spline_pred[1])
) |>
pivot_longer(cols = c(linear, spline), names_to = "model", values_to = "hr")
ggplot(plot_data, aes(x = age, y = hr, color = model)) +
geom_line(size = 1.2) +
labs(x = "Age (years)",
y = "Relative Hazard Ratio",
title = "Age-Mortality Relationship: Linear vs Splined",
color = "Model") +
scale_color_manual(values = c("linear" = "lightblue2", "spline" = "thistle2"),
labels = c("Linear", "Splined (4 knots)"))
codebook <- tribble(
~Variable, ~Description,
"randid", "Identifying ID",
"age", "Age in years",
"sex", "Sex (1=male, 2=female)",
"bmicat", "BMI category (1-6: Underweight to Obese III)",
"diabetes", "Diabetes diagnosis (0=no, 1=yes)",
"educ", "Education (1-4: 0-11 years to College degree)",
"timeap", "Time to angina diagnosis (days)",
"death", "Death (0=censored, 1=death)",
"timedth", "Time to death/censoring (days)"
)
codebook |>
gt() |>
cols_label(Variable = "Variable", Description = "Description")