Mehraveh_H

Importing the file:

library(readr)
whas <- read_csv("~/Documents/UW/PHARM 656/8th Assignment/whas.csv")
Rows: 481 Columns: 12
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
dbl (12): id, age, sex, cpk, sho, chf, miord, mitype, lenfol, fstat, vartime...

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
View(whas)

1. Plot a Kaplan-Meier Curve

options(repos = c(CRAN = "https://cran.r-project.org"))
install.packages("survival")

The downloaded binary packages are in
    /var/folders/ml/9gd13rnj3h58_bnp0zq09v5r0000gn/T//RtmplftjFE/downloaded_packages
library(survival)
install.packages("survminer")

The downloaded binary packages are in
    /var/folders/ml/9gd13rnj3h58_bnp0zq09v5r0000gn/T//RtmplftjFE/downloaded_packages
library(survminer)
Loading required package: ggplot2
Loading required package: ggpubr

Attaching package: 'survminer'
The following object is masked from 'package:survival':

    myeloma
km_fit <- survfit(Surv(lenfol, fstat) ~ 1, data = whas)

# Plot the survival curve
ggsurvplot(km_fit, 
           data = whas,
           xlab = "Days",
           ylab = "Survival Probability",
           title = "Kaplan-Meier Survival Curve")

The Kaplan-Meier survival curve shows how survival chances change over time. At the start of the study, everyone is alive (100% survival), but as time goes on, the survival rate gradually decreases. About 75% of participants are still alive at 1,000 days, around half make it to 3,000 days, and roughly 25% survive beyond 4,000 days. The shaded area around the curve shows the 95% confidence intervals, giving an idea of the uncertainty in the estimates. These intervals get wider as time goes on because there are fewer participants to track later in the study. Overall, the curve paints a clear picture of a steady decline in survival rates throughout the study.

2. Fit a Proportional Hazard Model with AGE and SEX

# Cox model with AGE and SEX
cox_fit <- coxph(Surv(lenfol, fstat) ~ age + sex, data = whas)
summary(cox_fit)
Call:
coxph(formula = Surv(lenfol, fstat) ~ age + sex, data = whas)

  n= 481, number of events= 249 

        coef exp(coef) se(coef)     z Pr(>|z|)    
age 0.042907  1.043841 0.005533 7.755  8.8e-15 ***
sex 0.084890  1.088597 0.132682 0.640    0.522    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
age     1.044     0.9580    1.0326     1.055
sex     1.089     0.9186    0.8393     1.412

Concordance= 0.665  (se = 0.018 )
Likelihood ratio test= 70.79  on 2 df,   p=4e-16
Wald test            = 69.09  on 2 df,   p=1e-15
Score (logrank) test = 69.68  on 2 df,   p=7e-16

The Cox proportional hazards model reveals that age plays a key role in predicting survival. Each additional year of age increases the risk of death by 4.4% (hazard ratio: 1.044, 95% confidence interval: 1.033–1.055, p < 0.001). On the other hand, sex (male versus female) does not significantly affect survival (hazard ratio: 1.089, 95% confidence interval: 0.839–1.412, p = 0.522). The model’s concordance index of 0.665 indicates a moderate ability to predict survival times based on the factors studied. In summary, age is a strong and significant predictor of survival, whereas sex does not appear to have a meaningful impact in this group.

3. Test the Significance of Interaction Between AGE and SEX

# Cox model with interaction
cox_fit_interaction <- coxph(Surv(lenfol, fstat) ~ age * sex, data = whas)
summary(cox_fit_interaction)
Call:
coxph(formula = Surv(lenfol, fstat) ~ age * sex, data = whas)

  n= 481, number of events= 249 

             coef exp(coef)  se(coef)      z Pr(>|z|)    
age      0.051817  1.053183  0.007301  7.097 1.28e-12 ***
sex      1.549953  4.711246  0.792489  1.956   0.0505 .  
age:sex -0.020350  0.979856  0.010885 -1.869   0.0616 .  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

        exp(coef) exp(-coef) lower .95 upper .95
age        1.0532     0.9495    1.0382     1.068
sex        4.7112     0.2123    0.9967    22.269
age:sex    0.9799     1.0206    0.9592     1.001

Concordance= 0.669  (se = 0.017 )
Likelihood ratio test= 74.26  on 3 df,   p=5e-16
Wald test            = 68.96  on 3 df,   p=7e-15
Score (logrank) test = 70.03  on 3 df,   p=4e-15

The Cox proportional hazards model, including an interaction between age and sex, confirms that age is a strong predictor of survival. For every additional year of age, the risk of death increases by 5.3% (hazard ratio: 1.053, 95% confidence interval: 1.038–1.068, p < 0.001). While the effect of sex on survival is close to being significant (p = 0.0505), the hazard ratio of 4.711 (95% CI: 0.997–22.269) suggests a potential higher risk for females compared to males, although the wide confidence interval highlights uncertainty in this estimate. The interaction term between age and sex is not statistically significant (p = 0.0616), with a hazard ratio of 0.980 (95% CI: 0.959–1.001), indicating that the impact of age on survival does not differ substantially between males and females. The model’s concordance index of 0.669 suggests a moderate ability to predict survival times. In summary, while age remains a strong predictor of survival, adding the interaction between age and sex does not significantly improve the model’s predictive power.

4. Hazard Ratio for SEX (Female vs Male) at Age 50

# Calculate hazard ratio at age 50
coef <- coef(cox_fit_interaction)
hr_sex <- exp(coef["sex"] + coef["age:sex"] * 50)
hr_sex
     sex 
1.703143 

The hazard ratio for SEX (female vs male) for a person 50 years old is :

1.703143 

5. Likelihood Ratio Test for Including MITYPE

# Model with MITYPE
cox_with_mitype <- coxph(Surv(lenfol, fstat) ~ age + sex + mitype, data = whas)

# Likelihood ratio test
anova(cox_fit, cox_with_mitype, test = "LRT")
Analysis of Deviance Table
 Cox model: response is  Surv(lenfol, fstat)
 Model 1: ~ age + sex
 Model 2: ~ age + sex + mitype
   loglik  Chisq Df Pr(>|Chi|)
1 -1384.6                     
2 -1383.8 1.6108  1     0.2044

The deviance analysis compares two Cox proportional hazards models: Model 1 includes age and sex, while Model 2 adds myocardial infarction type (mitype) as a variable. Although the log-likelihood slightly improves in Model 2 (from -1384.6 to -1383.8), the likelihood ratio test indicates that this improvement is not statistically significant (Chi-squared = 1.6108, df = 1, p = 0.2044). In simpler terms, adding mitype does not meaningfully enhance the model’s ability to predict survival outcomes compared to using age and sex alone. As a result, including mitype in the model is not supported by this analysis.

6. Hazard Ratio and CI for 10 Unit Increase in AGE

Fit a model with age and compute the hazard ratio for a 10-year increase.

# Cox model with AGE
cox_age <- coxph(Surv(lenfol, fstat) ~ age, data = whas)
summary(cox_age)
Call:
coxph(formula = Surv(lenfol, fstat) ~ age, data = whas)

  n= 481, number of events= 249 

        coef exp(coef) se(coef)     z Pr(>|z|)    
age 0.043918  1.044897 0.005302 8.284   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

    exp(coef) exp(-coef) lower .95 upper .95
age     1.045      0.957     1.034     1.056

Concordance= 0.664  (se = 0.018 )
Likelihood ratio test= 70.38  on 1 df,   p=<2e-16
Wald test            = 68.63  on 1 df,   p=<2e-16
Score (logrank) test = 69.08  on 1 df,   p=<2e-16
# Hazard ratio for 10-unit increase
hr_10yr <- exp(coef(cox_age)["age"] * 10)
ci_10yr <- exp(confint(cox_age)["age", ] * 10)
list(HR = hr_10yr, CI = ci_10yr)
$HR
     age 
1.551441 

$CI
   2.5 %   97.5 % 
1.398327 1.721321 

The Cox proportional hazards model highlights age as a highly significant factor in predicting survival (p < 2e-16). For each additional year of age, the risk of death increases by about 4.5% (hazard ratio: 1.045, 95% CI: 1.034–1.056). The model’s concordance index of 0.664 reflects a moderate ability to predict survival times based on age. Statistical tests, including the likelihood ratio, Wald, and log-rank tests, all strongly confirm the importance of age as a predictor. Furthermore, a 10-year increase in age is associated with a 55% higher risk of death (hazard ratio: 1.55, 95% CI: 1.40–1.72), emphasizing the significant impact of age on survival outcomes. In summary, age is a powerful and reliable predictor of survival in this population.

7. Adjusted Survival Curves for CHF and CPK

# Model with AGE, CHF, and CPK
cox_chf_cpk <- coxph(Surv(lenfol, fstat) ~ age + chf + cpk, data = whas)

# Adjusted survival curves
new_data <- data.frame(age = mean(whas$age, na.rm = TRUE),
                       cpk = mean(whas$cpk, na.rm = TRUE),
                       chf = c(0, 1))
surv_fit <- survfit(cox_chf_cpk, newdata = new_data)

# Plot adjusted curves
ggsurvplot(surv_fit, 
           data = whas,
           legend.title = "CHF",
           legend.labs = c("No CHF", "CHF"))

The survival curves compare patients with and without congestive heart failure (CHF). Patients without CHF (red curve) consistently show higher survival rates throughout the study than those with CHF (blue curve). At every time point, patients with CHF have lower survival probabilities, clearly indicating that CHF is linked to a worse outcome. The shaded areas around the curves represent the 95% confidence intervals, and these intervals remain separate for the two groups, further confirming the difference in survival. In summary, having CHF significantly reduces survival chances in this group.

8. Check Proportional Hazard Assumption

# Schoenfeld residuals for CHF
cox.zph(cox_chf_cpk)
        chisq df       p
age     0.136  1   0.712
chf     5.508  1   0.019
cpk    25.568  1 4.3e-07
GLOBAL 32.140  3 4.9e-07
# Time-dependent interaction term for MIORD
whas$miord_time <- whas$miord * whas$lenfol
cox_time <- coxph(Surv(lenfol, fstat) ~ age + chf + miord + miord_time, data = whas)
summary(cox_time)
Call:
coxph(formula = Surv(lenfol, fstat) ~ age + chf + miord + miord_time, 
    data = whas)

  n= 481, number of events= 249 

                 coef  exp(coef)   se(coef)      z Pr(>|z|)    
age         0.0302328  1.0306944  0.0056368  5.363 8.16e-08 ***
chf         0.5794898  1.7851273  0.1358585  4.265 2.00e-05 ***
miord       2.0558255  7.8132849  0.1871876 10.983  < 2e-16 ***
miord_time -0.0012323  0.9987684  0.0001395 -8.835  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

           exp(coef) exp(-coef) lower .95 upper .95
age           1.0307     0.9702    1.0194     1.042
chf           1.7851     0.5602    1.3678     2.330
miord         7.8133     0.1280    5.4138    11.276
miord_time    0.9988     1.0012    0.9985     0.999

Concordance= 0.786  (se = 0.013 )
Likelihood ratio test= 266  on 4 df,   p=<2e-16
Wald test            = 216.9  on 4 df,   p=<2e-16
Score (logrank) test = 250  on 4 df,   p=<2e-16

The Cox proportional hazards model identifies several key factors that significantly impact survival. Age is a significant predictor, with each additional year increasing the risk of death by 3.1% (hazard ratio: 1.031, 95% CI: 1.019–1.042). Congestive heart failure (CHF) greatly increases the risk, raising the hazard of death by 78.5% (HR: 1.785, 95% CI: 1.368–2.330). The order of myocardial infarction (MI) also has a strong effect, with recurrent MIs associated with a much higher risk of death compared to the first MI (HR: 7.813, 95% CI: 5.414–11.276). Additionally, the time-dependent interaction term for MI order (miord_time) shows that the hazard slightly decreases over time (HR: 0.999, 95% CI: 0.998–0.999). The overall model is highly significant (p < 2e-16), and its concordance index of 0.786 indicates strong predictive power. In summary, age, CHF, and MI-related factors play a critical role in influencing survival outcomes.