According to the National Office of Vital Statistics in the US Department of Health, the average number of accidental drownings per year in the United States is 3.0 per 100,000 persons in the population. Use the Poisson distribution to calculate the probability that in a city of 200,000 people there will be:
\[ \begin{aligned} 3 \text{ per 100,000} & \implies\lambda = 6 \text{ per 200,000}\\ &\text{thus we use Poisson distribution:}\\ f\left(x\right) &=Pr(X=x) =\frac{e^{-\lambda}\lambda^x}{x!} =\frac{e^{-6}6^x}{x!}\\ Pr(X=2) & =\frac{e^{-6}6^2}{2!}=0.0446 \end{aligned} \]
\[ \begin{aligned} Pr(3\leq X\leq 6) & =\frac{e^{-6}6^3}{3!}+\frac{e^{-6}6^4}{4!}+\frac{e^{-6}6^5}{5!}+\frac{e^{-6}6^6}{6!}\\ Pr(3\leq X\leq 6) &=0.5443 \end{aligned} \]
\[ \begin{aligned} Pr(X\lt 3) & =\frac{e^{-6}6^0}{0!}+\frac{e^{-6}6^1}{1!}+\frac{e^{-6}6^2}{2!}\\ Pr(X\lt 3) &=0.0620 \end{aligned} \]
\[ \begin{aligned} P(X\geq 3) & =1-P(X\lt 3)\\ P(X\geq 3) & =1-\left(\frac{e^{-6}6^0}{0!}+\frac{e^{-6}6^1}{1!}+\frac{e^{-6}6^2}{2!}\right)\\ P(X\geq 3) & =1-0.0620\\ P(X\geq 3) & =0.938 \end{aligned} \]
Using the “melanom” data found in the package ISwR, conduct a survival analysis using R software.
But before conducting the analysis, what is malignant melanoma? From literature, what are some of the key determinants of survival for patients who have undergone an operation in malignant melanoma?
Malignant melanoma, also known as melanoma is a type of cancer occurs on the skin. Melanoma develops from melanocytes; melanocytes are pigment-producing cells. In rare cases melanoma may also occur in the eyes, mouth or intestines. In men, melanoma commonly occur in the back while in women they commonly occur on the legs. One of the causes of melanoma is ultraviolet light (UV) exposure that may be triggered by low levels of melanin, a skin pigment. Other predisposing factors include: many moles, genes, compromised immune system among others.
Diagnosis is done via biopsy.
Avoiding UV light and use of sunscreen may prevent melanoma. Treatment is removal of cancerous tissue through surgery. Biologic therapy, radiation therapy, immunotherapy or chemotherapy is use for treating advanced melanoma and may improve survival.
Key determinants of the survival of patients who have undergone surgery in melanoma include: tumor thickness, metastasis rate and whether or not the overlying skin has broken down (commonly referred as ulceration).
Consider the “melanom” data. Which variables do we find in the data?
What is the structure (string, factor, numeric, etc) of the variables in
the data? Convert ulceration and sex to factor variables and check that
the structure of the variables in the data recognizes the two as
factors. To check the structure of the variables, use the R code:
str(melanom)
Explain what the data is all about. Explain in general, the role of
covariates in the melanoma data in relation to M. melanoma.
knitr::opts_chunk$set(echo = TRUE)
library(ISwR)
data("melanom")
names(melanom)
## [1] "no" "status" "days" "ulc" "thick" "sex"
str(melanom)
## 'data.frame': 205 obs. of 6 variables:
## $ no : int 789 13 97 16 21 469 685 7 932 944 ...
## $ status: int 3 3 2 3 1 1 1 1 3 1 ...
## $ days : int 10 30 35 99 185 204 210 232 232 279 ...
## $ ulc : int 1 2 2 2 1 1 1 1 1 1 ...
## $ thick : int 676 65 134 290 1208 484 516 1288 322 741 ...
## $ sex : int 2 2 2 1 2 2 2 2 1 1 ...
#?melanom() # get detailed description of "melanom" data
melanom$ulc <- as.factor(melanom$ulc) # convert to factor
melanom$sex <- as.factor(melanom$sex) # convert to factor
class(melanom$sex) # check the variable type
## [1] "factor"
class(melanom$ulc) # check the variable type
## [1] "factor"
The melanom data frame has 205 rows and 7 columns. It contains data relating to the survival of patients after an operation for malignant melanoma, collected at Odense University Hospital by K.T. Drzewiecki. The data frame has the following variables:
Survival following operation may depend on a number of factors for example, tumor thickness. These factors generally become the covariates when fitting a Cox PH model to the data. For example, \(\texttt{thick}\), \(\texttt{sex}\), \(\texttt{ulc}\) in the “melanom” data. However, their effect on the survival should be tested to check if they are significant.
You wish to test the effect of covariate sex on post operation survival for malignant melanoma. Determine the hazard ratio associated with the covariate “sex.” What is its confidence interval? Based on the associated p-value, do you think that males are more likely to die after an operation from M. melanoma?
knitr::opts_chunk$set(echo = TRUE)
library(survival)
##
## Attaching package: 'survival'
## The following object is masked from 'package:ISwR':
##
## lung
(cox.fit <- coxph(Surv(days, status == 1) ~ sex, data = melanom)) # Fit a cox regression model with covariate sex
## Call:
## coxph(formula = Surv(days, status == 1) ~ sex, data = melanom)
##
## coef exp(coef) se(coef) z p
## sex2 0.6622 1.9390 0.2651 2.498 0.0125
##
## Likelihood ratio test=6.15 on 1 df, p=0.01314
## n= 205, number of events= 57
summary(cox.fit)
## Call:
## coxph(formula = Surv(days, status == 1) ~ sex, data = melanom)
##
## n= 205, number of events= 57
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sex2 0.6622 1.9390 0.2651 2.498 0.0125 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sex2 1.939 0.5157 1.153 3.26
##
## Concordance= 0.59 (se = 0.034 )
## Likelihood ratio test= 6.15 on 1 df, p=0.01
## Wald test = 6.24 on 1 df, p=0.01
## Score (logrank) test = 6.47 on 1 df, p=0.01
The model fit above is significant (Likelihood ratio test = 6.15 on 1 d.f, p-value = 0.01). The hazard ratio associated with the covariate “\(\texttt{sex}\)” is 1.939, with a confidence interval of (1.153, 3.26) and p-value of 0.0125. Thus the effect of sex on post operation survival for malignant melanoma is significant.
Interpretation: males are \(93.9\%\) more likely to die following operation per unit time as compared to females.
Adjust for Tumour thickness, in the model in 2 above (model the logarithm of thickness instead of thickness in the model). How does the effect of sex change? What is its effect of tumour thickness itself on the post operation survival to M. melanoma?
knitr::opts_chunk$set(echo = TRUE)
# Fit a cox regression model with sex and thick as covariates
library(survival)
melanom$logthick <- log(melanom$thick)
(cox.fit2 <- coxph(Surv(days, status == 1) ~ sex + logthick, data = melanom))
## Call:
## coxph(formula = Surv(days, status == 1) ~ sex + logthick, data = melanom)
##
## coef exp(coef) se(coef) z p
## sex2 0.4580 1.5809 0.2687 1.705 0.0883
## logthick 0.7809 2.1834 0.1573 4.963 6.94e-07
##
## Likelihood ratio test=33.45 on 2 df, p=5.453e-08
## n= 205, number of events= 57
summary(cox.fit2)
## Call:
## coxph(formula = Surv(days, status == 1) ~ sex + logthick, data = melanom)
##
## n= 205, number of events= 57
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sex2 0.4580 1.5809 0.2687 1.705 0.0883 .
## logthick 0.7809 2.1834 0.1573 4.963 6.94e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sex2 1.581 0.6326 0.9337 2.677
## logthick 2.183 0.4580 1.6040 2.972
##
## Concordance= 0.749 (se = 0.033 )
## Likelihood ratio test= 33.45 on 2 df, p=5e-08
## Wald test = 31 on 2 df, p=2e-07
## Score (logrank) test = 32.52 on 2 df, p=9e-08
The model fit above is significant (Likelihood ratio test = 33.45 on 2 d.f, p-value = <0.001). Adjusting for tumor thickness, the hazard ratio associated with the covariate “sex” now changes to 1.581, with a confidence interval of (0.9337, 2.677) and p-value of 0.09. Thus the effect of sex on post operation survival for malignant melanoma is no longer significant after adjusting for tumor thickness.
Interpretation: Adjusting for tumor thickness, males are 58.1% more likely to die per unit time as compared to females.
Adjusting for sex, the hazard ratio associated with the covariate log of “thick” is 2.183, with a confidence interval of (1.604, 2.972) and p-value of <0.001. Thus the effect of tumor thickness on post operation survival for malignant melanoma is significant after adjusting for sex.
Interpretation: Adjusting for sex, a person is 2.183 times likely to die per unit time for every additional unit in log of tumor thickness.
To the model in 3 above, add ulceration as a stratification covariate. Is there any effect modification on the covariates sex and thickness of tumour?
knitr::opts_chunk$set(echo = TRUE)
# Fit a cox regression model with covariates sex and thick; and ulceration as a stratification covariate
library(survival)
(cox.fit3 <- coxph(Surv(days, status == 1) ~ sex + logthick + strata(ulc), data = melanom))
## Call:
## coxph(formula = Surv(days, status == 1) ~ sex + logthick + strata(ulc),
## data = melanom)
##
## coef exp(coef) se(coef) z p
## sex2 0.3600 1.4333 0.2702 1.332 0.1828
## logthick 0.5599 1.7505 0.1784 3.139 0.0017
##
## Likelihood ratio test=13.3 on 2 df, p=0.001296
## n= 205, number of events= 57
summary(cox.fit3)
## Call:
## coxph(formula = Surv(days, status == 1) ~ sex + logthick + strata(ulc),
## data = melanom)
##
## n= 205, number of events= 57
##
## coef exp(coef) se(coef) z Pr(>|z|)
## sex2 0.3600 1.4333 0.2702 1.332 0.1828
## logthick 0.5599 1.7505 0.1784 3.139 0.0017 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## sex2 1.433 0.6977 0.844 2.434
## logthick 1.750 0.5713 1.234 2.483
##
## Concordance= 0.673 (se = 0.039 )
## Likelihood ratio test= 13.3 on 2 df, p=0.001
## Wald test = 12.88 on 2 df, p=0.002
## Score (logrank) test = 12.98 on 2 df, p=0.002
The model above is significant (Likelihood ratio test = 13.3 on 2 d.f, p-value = 0.001). Adding a stratification variable \(\texttt{ulc}\) modifies the effect of \(\texttt{sex}\) and \(\texttt{tumor thickness}\) on survival for malignant melanoma. \(\texttt{sex}\) is no longer significant in predicting survival for malignant melanoma (p-value = 0.18). However, tumor thickness is still significant (p-value = 0.0017) with hazards ratio 1.750 and confidence interval (1.234, 2.483).
Adjusting for sex and stratifying by ulceration, a person is \(75\%\) more likely to die per unit time for every additional unit in log of tumor thickness.
Adjusting for tumor thickness and stratifying by ulceration, a male patient is \(43.3\%\) more likely to die per unit time as compared to a female patient. However, the effect of sex is not statistically significant.
Compare the above models using the Likelihood Ratio statistics.
knitr::opts_chunk$set(echo = TRUE)
# To test for significance of model fit, the p-value of the likelihood ratio test statistic can be obtained as follows:
#anova(cox.fit, cox.fit2, test="LRT") # compare model 1 with model 2
#anova(cox.fit2, cox.fit3, test="LRT") # compare model 2 with model 3
# Alternative R commands
library(zoo)
##
## Attaching package: 'zoo'
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(lmtest)
lrtest(cox.fit, cox.fit2)
## Likelihood ratio test
##
## Model 1: Surv(days, status == 1) ~ sex
## Model 2: Surv(days, status == 1) ~ sex + logthick
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 1 -280.12
## 2 2 -266.48 1 27.299 1.743e-07 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
lrtest(cox.fit2, cox.fit3)
## Likelihood ratio test
##
## Model 1: Surv(days, status == 1) ~ sex + logthick
## Model 2: Surv(days, status == 1) ~ sex + logthick + strata(ulc)
## #Df LogLik Df Chisq Pr(>Chisq)
## 1 2 -266.48
## 2 2 -229.74 0 73.475 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Check the proportional hazard assumption for all covariates in the melanoma data.
The proportional hazards (PH) assumption can be checked using statistical tests and graphical diagnostics based on the scaled Schoenfeld residuals. In principle, the Schoenfeld residuals are independent of time. A plot that shows a non-random pattern against time is evidence of violation of the PH assumption. The function cox.zph() [in the survival package] provides a convenient solution to test the proportional hazards assumption for each covariate included in a Cox refression model fit. For each covariate, the function cox.zph() correlates the corresponding set of scaled Schoenfeld residuals with time, to test for independence between residuals and time. Additionally, it performs a global test for the model as a whole. The proportional hazard assumption is supported by a non-significant relationship between residuals and time, and refuted by a significant relationship.
knitr::opts_chunk$set(echo = TRUE)
library(survival)
library(ggplot2)
library(survminer)
## Loading required package: ggpubr
##
## Attaching package: 'survminer'
## The following object is masked from 'package:survival':
##
## myeloma
cox.fit3 <- coxph(Surv(days, status == 1) ~ sex + logthick + strata(ulc), data = melanom)
# test for the proportional-hazards (PH) assumption
(test.ph <- cox.zph(cox.fit3))
## chisq df p
## sex 0.754 1 0.385
## logthick 4.469 1 0.035
## GLOBAL 4.755 2 0.093
From the output above, the test is not statistically significant for each of the covariates, and the global test is also not statistically significant. Therefore, we can assume the proportional hazards.
It’s possible to do a graphical diagnostic using the function ggcoxzph() [in the survminer package], which produces, for each covariate, graphs of the scaled Schoenfeld residuals against the transformed time.
knitr::opts_chunk$set(echo = TRUE)
library(survival)
library(ggplot2)
library(survminer)
ggcoxzph(test.ph)
In the figure above, the solid line is a smoothing spline fit to the plot, with the dashed lines representing a +/- 2-standard-error band around the fit. Systematic departures from a horizontal line are indicative of non-proportional hazards, since proportional hazards assumes that estimates \(\beta_1, \beta_2\) do not vary much over time. From the graphical inspection, there is no pattern with time. The assumption of proportional hazards appears to be supported for the covariates sex and tumor thickness.
Are there any influential observations in any of the covariates?
To test influential observations or outliers, we can visualize either:
The function ggcoxdiagnostics()[in survminer package] provides a convenient solution for checkind influential observations. The simplified format is as follow:
ggcoxdiagnostics(fit, type = , linear.predictions = TRUE)
Specifying the argument type = “dfbeta”, plots the estimated changes in the regression coefficients upon deleting each observation in turn; likewise, type=“dfbetas” produces the estimated changes in the coefficients divided by their standard errors.
knitr::opts_chunk$set(echo = TRUE)
ggcoxdiagnostics(cox.fit3, type = "dfbeta",
linear.predictions = FALSE, ggtheme = theme_bw())
## `geom_smooth()` using formula = 'y ~ x'
(Index plots of dfbeta for the Cox regression of survival time post
melanoma operation on sex and tumor thickness)
The above index plots show that comparing the magnitudes of the largest
dfbeta values to the regression coefficients suggests that none of the
observations is terribly influential individually, even though some of
the dfbeta values for sex and log of tumor thickness are large compared
with the others.
It’s also possible to check outliers by visualizing the deviance residuals. The deviance residual is a normalized transform of the martingale residual. These residuals should be roughtly symmetrically distributed about zero with a standard deviation of 1.
Example of deviance residuals:
knitr::opts_chunk$set(echo = TRUE)
ggcoxdiagnostics(cox.fit3, type = "deviance",
linear.predictions = FALSE, ggtheme = theme_bw())
## `geom_smooth()` using formula = 'y ~ x'
The pattern looks fairly symmetric around 0.
For the continuous variable “tomour thickness,” check if the linearity assumption holds and if not, which functional form is suitable for modeling thickness (logarithm, square root, etc)?
Plotting the Martingale residuals against continuous covariates is a
common approach used to detect nonlinearity or, in other words, to
assess the functional form of a covariate. For a given continuous
covariate, patterns in the plot may suggest that the variable is not
properly fit.
Martingale residuals may present any value in the range \((-\infty, +1)\):
To assess the functional form of a continuous variable in a Cox
proportional hazards model, we use the function ggcoxfunctional() [in
the survminer R package].
The function ggcoxfunctional() displays graphs of continuous covariates
against martingale residuals of null cox proportional hazards model.
This might help to properly choose the functional form of continuous
variable in the Cox model. Fitted lines with lowess function should be
linear to satisfy the Cox proportional hazards model assumptions.
knitr::opts_chunk$set(echo = TRUE)
ggcoxfunctional(Surv(days, status == 1) ~ thick + log(thick) + sqrt(thick), data = melanom)
## Warning: arguments formula is deprecated; will be removed in the next version;
## please use fit instead.
It appears that, nonlinearity is slightly here. Logarithm appears to be the most suitable functional form for modelling tumor thickness.
Now consider the ovarian data in survival package in R.
What is the data about? Explain the covariates and the survival and status variable outcome in the data?
knitr::opts_chunk$set(echo = TRUE)
library(survival) # load the survival package
data("ovarian") # Get ovarian data in survival package
## Warning in data("ovarian"): data set 'ovarian' not found
str(ovarian) #
## 'data.frame': 26 obs. of 6 variables:
## $ futime : num 59 115 156 421 431 448 464 475 477 563 ...
## $ fustat : num 1 1 1 0 1 0 1 1 0 1 ...
## $ age : num 72.3 74.5 66.5 53.4 50.3 ...
## $ resid.ds: num 2 2 2 2 2 1 2 2 2 1 ...
## $ rx : num 1 1 1 2 1 1 2 2 1 2 ...
## $ ecog.ps : num 1 1 2 1 1 2 2 2 1 2 ...
#help("ovarian") # Get description of ovarian data
Ovarian Cancer Survival Data
Description
Survival in a randomised trial comparing two treatments for ovarian
cancer
“ovarian” data provides survival status of ovarian cancer patients. Survival may depend on various factors such as age, treatment, and presence of residual disease. These factors generally become the covariates when fitting a Cox PH model to the data. However, their effect on the survival should be tested to check if they are significant.
For a CoxPH model with age and ECOG performance as covariates, test for proportional hazards assumption in the covariates.
Cox regression model with covariates age and ECOG
performance
knitr::opts_chunk$set(echo = TRUE)
# Fit a cox regression model with covariates age and ECOG performance
library(survival)
(cox.model <- coxph(Surv(futime, fustat == 1) ~ age + factor(ecog.ps), data = ovarian))
## Call:
## coxph(formula = Surv(futime, fustat == 1) ~ age + factor(ecog.ps),
## data = ovarian)
##
## coef exp(coef) se(coef) z p
## age 0.16150 1.17527 0.04992 3.235 0.00122
## factor(ecog.ps)2 0.01866 1.01884 0.59908 0.031 0.97515
##
## Likelihood ratio test=14.29 on 2 df, p=0.000787
## n= 26, number of events= 12
summary(cox.model)
## Call:
## coxph(formula = Surv(futime, fustat == 1) ~ age + factor(ecog.ps),
## data = ovarian)
##
## n= 26, number of events= 12
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.16150 1.17527 0.04992 3.235 0.00122 **
## factor(ecog.ps)2 0.01866 1.01884 0.59908 0.031 0.97515
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.175 0.8509 1.0657 1.296
## factor(ecog.ps)2 1.019 0.9815 0.3149 3.296
##
## Concordance= 0.784 (se = 0.085 )
## Likelihood ratio test= 14.29 on 2 df, p=8e-04
## Wald test = 10.54 on 2 df, p=0.005
## Score (logrank) test = 12.26 on 2 df, p=0.002
The model above is significant (Likelihood ratio test = 14.29 on 2 d.f, p-value <0.001)
The proportional hazard ratio for covariate age is 1.175 with confidence interval (1.066, 1.296) and p-value = 0.001. Thus age is significant in predicting survival adjusting for ECOG performance. Hence adjusting for ECOG performance, an individual is 17.5% more likely to die per unit time for every additional year. ECOG performance is not significant (p-value = 0.975) adjusting for age.
Proportional hazards
The proportional hazards (PH) assumption can be checked using
statistical tests and graphical diagnostics based on the scaled
Schoenfeld residuals. In principle, the Schoenfeld residuals are
independent of time. A plot that shows a non-random pattern against time
is evidence of violation of the PH assumption. The function cox.zph()
[in the survival package] provides a convenient solution to test the
proportional hazards assumption for each covariate included in a Cox
refression model fit. For each covariate, the function cox.zph()
correlates the corresponding set of scaled Schoenfeld residuals with
time, to test for independence between residuals and time. Additionally,
it performs a global test for the model as a whole. The proportional
hazard assumption is supported by a non-significant relationship between
residuals and time, and refuted by a significant relationship.
knitr::opts_chunk$set(echo = TRUE)
# test for the proportional-hazards (PH) assumption for covariates age and ECOG performance
(test.ph2 <- cox.zph(cox.model))
## chisq df p
## age 0.698 1 0.40
## factor(ecog.ps) 2.371 1 0.12
## GLOBAL 3.633 2 0.16
From the output above, the test is not statistically significant for each of the covariates, and the global test is also not statistically significant. Therefore, we can assume the proportional hazards.
It’s possible to do a graphical diagnostic using the function ggcoxzph() [in the survminer package], which produces, for each covariate, graphs of the scaled Schoenfeld residuals against the transformed time.
knitr::opts_chunk$set(echo = TRUE)
ggcoxzph(test.ph2)
In the figure above, the solid line is a smoothing spline fit to the plot, with the dashed lines representing a +/- 2-standard-error band around the fit. Systematic departures from a horizontal line are indicative of non-proportional hazards, since proportional hazards assumes that estimates \(\beta_1, \beta_2\) do not vary much over time. From the graphical inspection, there is no pattern with time. The assumption of proportional hazards appears to be supported for the covariates age and ecog.
Construct a similar model with an interaction term between age and ECOG performance. Test for proportional hazards assumption again. What do you observe?
Cox regression model with interaction between age ECOG
performance
knitr::opts_chunk$set(echo = TRUE)
# Fit a cox regression model with covariates age and ECOG performance and interaction term between the two covariates
library(survival)
(cox.model2 <- coxph(Surv(futime, fustat == 1) ~ age * ecog.ps, data = ovarian))
## Call:
## coxph(formula = Surv(futime, fustat == 1) ~ age * ecog.ps, data = ovarian)
##
## coef exp(coef) se(coef) z p
## age 0.159995 1.173505 0.134564 1.189 0.234
## ecog.ps -0.042843 0.958061 5.140546 -0.008 0.993
## age:ecog.ps 0.001008 1.001008 0.083669 0.012 0.990
##
## Likelihood ratio test=14.29 on 3 df, p=0.00253
## n= 26, number of events= 12
summary(cox.model2)
## Call:
## coxph(formula = Surv(futime, fustat == 1) ~ age * ecog.ps, data = ovarian)
##
## n= 26, number of events= 12
##
## coef exp(coef) se(coef) z Pr(>|z|)
## age 0.159995 1.173505 0.134564 1.189 0.234
## ecog.ps -0.042843 0.958061 5.140546 -0.008 0.993
## age:ecog.ps 0.001008 1.001008 0.083669 0.012 0.990
##
## exp(coef) exp(-coef) lower .95 upper .95
## age 1.1735 0.8521 9.015e-01 1.528
## ecog.ps 0.9581 1.0438 4.034e-05 22752.743
## age:ecog.ps 1.0010 0.9990 8.496e-01 1.179
##
## Concordance= 0.784 (se = 0.083 )
## Likelihood ratio test= 14.29 on 3 df, p=0.003
## Wald test = 10.55 on 3 df, p=0.01
## Score (logrank) test = 12.27 on 3 df, p=0.007
The model is significant after including interaction between age and ECOG performance (Likelihood ratio test = 14.29 p-value = 0.003) though p-value has changed from <0.001 to 0.003. However, after adding interaction term the covariates are no longer significant (p-values > 0.05)
Proportional Hazards
knitr::opts_chunk$set(echo = TRUE)
# test for the proportional-hazards (PH) assumption
(test.ph3 <- cox.zph(cox.model2))
## chisq df p
## age 0.732 1 0.392
## ecog.ps 3.443 1 0.064
## age:ecog.ps 2.285 1 0.131
## GLOBAL 4.444 3 0.217
From the output above, the test is not statistically significant for each of the covariates, including the interaction term, and the global test is also not statistically significant. Therefore, we can assume the proportional hazards.
knitr::opts_chunk$set(echo = TRUE)
ggcoxzph(test.ph3)
In the figure above, the solid line is a smoothing spline fit to the plot, with the dashed lines representing a +/- 2-standard-error band around the fit. Systematic departures from a horizontal line are indicative of non-proportional hazards, since proportional hazards assumes that estimates \(\beta_1, \beta_2, \beta_3\) do not vary much over time. From the graphical inspection, there is no pattern with time. The assumption of proportional hazards appears to be supported for the covariates age and ecog and interaction term.
Is the linear functional form for age suitable in this model? If not (or rather assume so), what functional form is appropriate for the covariate age in this model?
Plotting the Martingale residuals against continuous covariates is used to detect nonlinearity or, in other words, to assess the functional form of a covariate. For a given continuous covariate (age in this case), patterns in the plot may suggest that the variable is not properly fit. Martingale residuals may present any value in the range \((-\infty, +1)\):
knitr::opts_chunk$set(echo = TRUE)
ggcoxfunctional(Surv(futime, fustat == 1) ~ age + log(age) + sqrt(age), data = ovarian)
## Warning: arguments formula is deprecated; will be removed in the next version;
## please use fit instead.
Linear form for age maybe suitable for for the model. However we may take logarithm or squareroot functional form of age to eliminate non-linearility.
Are there any influential observations in any of the covariates that may lead to over or under - estimation of the effect of any of the covariates?
To test influential observations or outliers, we can visualize either:
The function ggcoxdiagnostics()[in survminer package] provides a convenient solution for checkind influential observations. The simplified format is as follow:
ggcoxdiagnostics(fit, type = , linear.predictions = TRUE)
Specifying the argument type = “dfbeta”, plots the estimated changes in the regression coefficients upon deleting each observation in turn; likewise, type=“dfbetas” produces the estimated changes in the coefficients divided by their standard errors.
knitr::opts_chunk$set(echo = TRUE)
# model without interaction between age and ECOG
ggcoxdiagnostics(cox.model, type = "dfbeta",
linear.predictions = FALSE, ggtheme = theme_bw())
## `geom_smooth()` using formula = 'y ~ x'
# model with interaction term between age and ECOG
ggcoxdiagnostics(cox.model2, type = "dfbeta",
linear.predictions = FALSE, ggtheme = theme_bw())
## `geom_smooth()` using formula = 'y ~ x'
(First plot: Index plots of dfbeta for the Cox regression of survival
time on age and ECOG performance)
(Second plot: Index plots of dfbeta for the Cox regression of survival
time on age and ECOG performance and the interaction term)
The above index plots show that comparing the magnitudes of the largest
dfbeta values to the regression coefficients suggests that none of the
observations is terribly influential individually, even though some of
the dfbeta values for age and ECOG performance and interaction term are
large compared with the others.
It’s also possible to check outliers by visualizing the deviance residuals. The deviance residual is a normalized transform of the martingale residual. These residuals should be roughtly symmetrically distributed about zero with a standard deviation of 1.
knitr::opts_chunk$set(echo = TRUE)
# model with interaction term between age and ECOG performance
ggcoxdiagnostics(cox.model2, type = "deviance",
linear.predictions = FALSE, ggtheme = theme_bw())
## `geom_smooth()` using formula = 'y ~ x'
# model without interaction term
ggcoxdiagnostics(cox.model, type = "deviance",
linear.predictions = FALSE, ggtheme = theme_bw())
## `geom_smooth()` using formula = 'y ~ x'
For both models, the pattern looks fairly symmetric around 0.