We import the required data set.
larynxCancer <- read.csv("larynxCancer.csv")
larynxCancer %>% head() %>% kable()
| stage | time | age | year | death |
|---|---|---|---|---|
| 1 | 0.6 | 77 | 76 | 1 |
| 1 | 1.3 | 53 | 71 | 1 |
| 1 | 2.4 | 45 | 71 | 1 |
| 1 | 2.5 | 57 | 78 | 0 |
| 1 | 3.2 | 58 | 74 | 1 |
| 1 | 3.2 | 51 | 77 | 0 |
According to the question, we know there are 90 observations. The meanings of each variable are as follows:
stage means the stage fo the patient’s cancer, where
\(i\) represents stage \(i\);
time means the years between first treatment and either
death or the end of the study;
age means the patient’s age at the time of
diagnosis;
year means the year of diagnosis;
death means the state of patient, where 0 represents
censoring and 1 represents observation.
We fit the Cox proportional model to the time to death of the male larynx cancer patients with only stage in the model: \[\lambda(t|\text{stage})=\lambda_0(t)\mbox{exp}\big\{\beta_1I(\text{stage}=2)+\beta_2I(\text{stage}=3)+\beta_3I(\text{stage}=4)\big\}.\] The corresponding R code and output are as follows:
model1 <- coxph(Surv(time, death)~factor(stage), data = larynxCancer)
summary(model1)
## Call:
## coxph(formula = Surv(time, death) ~ factor(stage), data = larynxCancer)
##
## n= 90, number of events= 50
##
## coef exp(coef) se(coef) z Pr(>|z|)
## factor(stage)2 0.06481 1.06696 0.45843 0.141 0.8876
## factor(stage)3 0.61481 1.84930 0.35519 1.731 0.0835 .
## factor(stage)4 1.73490 5.66838 0.41939 4.137 3.52e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## factor(stage)2 1.067 0.9372 0.4344 2.62
## factor(stage)3 1.849 0.5407 0.9219 3.71
## factor(stage)4 5.668 0.1764 2.4916 12.90
##
## Concordance= 0.668 (se = 0.037 )
## Likelihood ratio test= 16.49 on 3 df, p=9e-04
## Wald test = 19.24 on 3 df, p=2e-04
## Score (logrank) test = 22.88 on 3 df, p=4e-05
According to the output above, we know the hazard ratio of Stage II vs. Stage I is \(\text{exp}(\hat{\beta}_1)=1.06696\); the hazard ratio of Stage III vs. Stage I is \(\text{exp}(\hat{\beta}_2)=1.84930\); the hazard ratio of Stage IV vs. Stage I is \(\text{exp}(\hat{\beta}_3)=5.66838\).
We fit the Cox proportional model with different stages of disease and the age of patients: \[\lambda(t|\text{stage})=\lambda_0(t)\mbox{exp}\big\{\beta_1I(\text{stage}=2)+\beta_2I(\text{stage}=3)+\beta_3I(\text{stage}=4)+\beta_4\text{age}\big\}.\]
model2 <- coxph(Surv(time, death)~factor(stage)+age, data = larynxCancer)
We are interest in testing the hypothesis: \[H_0:\beta_1=\beta_2=\beta_3=0 \ \text{ against } \ H_1:\text{at least on of } \ \beta_1,\beta_2,\beta_3 \ \text{is nonzero.}\] We conduct the Wald test as follows:
sub_beta <- model2$coefficients[1:3]
I_11 <- model2$var[1:3, 1:3]
z_wald <- (t(sub_beta) %*% solve(I_11) %*% sub_beta)[1,1]
p_value <- 1 - pchisq(z_wald, df = 3)
cat("The p value is", p_value)
## The p value is 0.0004566683
Because the \(p\)-value is \(0.00046<0.05\), we must reject the null hypothesis. That is, we accept the alternative hypothesis \(H_1:\) at least one of \(\beta_1,\beta_2,\beta_3\) is nonzero. Therefore, we think at least one of stage 2,3,4 is significantly different from stage I.
Then, we use likelihood ratio test to this hypothesis. We fit the Cox proportional model with only age of patients: \[\lambda(t|\text{stage})=\lambda_0(t)\mbox{exp}\big\{\beta_4\text{age}\big\}.\]
model3 <- coxph(Surv(time, death)~age, data = larynxCancer)
And we conduct the likelihood ratio test as follows:
anova(model2, model3)
## Analysis of Deviance Table
## Cox model: response is Surv(time, death)
## Model 1: ~ factor(stage) + age
## Model 2: ~ age
## loglik Chisq Df Pr(>|Chi|)
## 1 -187.71
## 2 -195.55 15.681 3 0.001318 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We find that \(p\)-value is \(0.001318<0.05\), we must reject the null hypothesis. Therefore, we obtain the same conclusion with the Wald test.
We are interest in testing the hypothesis: \[H_0:\beta_1=\beta_2=\beta_3 \ \text{ against } \ H_1:\beta_1,\beta_2,\beta_3 \ \text{are not all equal.}\] The null hypothesis \(H_0\) is equivalent to \(\beta_1-\beta_2=0\) and \(\beta_2-\beta_3=0\), which can be expression as \[\boldsymbol{C}\boldsymbol{\beta}=0,\] where \(\boldsymbol{\beta}=(\beta_1,\beta_2,\beta_3,\beta_4)^\top\) and \[ \boldsymbol{C}=\left( \begin{matrix} 1 & -1 & 0 & 0 \\ 0 & 1 & -1 & 0 \end{matrix} \right). \] According to question 4, we know the Wald test statistic is \[Z_W^2=\widehat{\boldsymbol{\beta}}^\top\boldsymbol{C}^\top\big[\boldsymbol{C}I^{-1}(\widehat{\boldsymbol{\beta}})\boldsymbol{C}^\top\big]^{-1}\boldsymbol{C}\widehat{\boldsymbol{\beta}}\sim\chi^2(q).\] Therefore, the corresponding R code and output is as follows:
C <- matrix(c(1,0,-1,1,0,-1,0,0), nrow = 2)
beta <- model2$coefficients %>% as.numeric()
I <- model2$var
z_wald <- (t(C %*% beta) %*% solve(C %*% I %*% t(C)) %*% C %*% beta)[1,1]
p_value <- 1 - pchisq(z_wald, df = 2)
cat("The p value is", p_value)
## The p value is 0.00429843
Because the \(p\)-value is \(0.00429843<0.05\), we must reject the null hypothesis. That is, we accept the alternative hypothesis \(H_1:\beta_1,\beta_2,\beta_3\) are not all equal. Therefore, we think the stage 2,3,4 are not all the same.
Then, we use likelihood ratio test to this hypothesis.
for (i in 1:90) {
if (larynxCancer$stage[i] != 1) {
larynxCancer$stage[i] <- 0
}
}
model4 <- coxph(Surv(time, death)~factor(stage) + age, data = larynxCancer)
anova(model2, model4)
## Analysis of Deviance Table
## Cox model: response is Surv(time, death)
## Model 1: ~ factor(stage) + age
## Model 2: ~ factor(stage) + age
## loglik Chisq Df Pr(>|Chi|)
## 1 -187.71
## 2 -192.74 10.057 2 0.006548 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
We find that \(p\)-value is \(0.006548<0.05\), we must reject the null hypothesis. Therefore, we obtain the same conclusion with the Wald test.