- Regression models are useful for modelling the impacts of different factors separately
- This is important as lives are often not homogeneous, i.e. lives with different characteristics have different levels of mortality.
Cox Regression Assumption
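- The hazard rate for the \(i\)th life with covariates \(Z_i\) is:
- Note that because the Cox model takes the exponential of the covariates, each \(\beta_j\) acts multiplicatively on the hazard rate (a proportional change) rather than additively (an absolute change): a one-unit increase in covariate \(j\) multiplies the hazard rate by \(\exp(\beta_j)\).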
\[ \lambda(t;Z_i) = \lambda_0(t)\exp(\beta Z_i^\top) = \lambda_0(t)\exp\left(\sum^p_{j=1}\beta_j z_{ij}\right) \]
- Where:
- \(\lambda_0(t)\) is the baseline hazard function, aka when all covariates are 0
- Note this is dependent on \(t\) but not the covariates
- \(\beta_1, \beta_2, ..., \beta_p\) are the regression parameters
- Note these are constant in \(t\); the term \(\exp(\beta Z_i^\top)\) they form depends on the covariates but not on \(t\)
- \(z_{i1},z_{i2}, ..., z_{ip}\) are the covariates for the \(i\)th subject
Relative Risk
- The ratio of hazard rates for two different lives \(x\) and \(y\) is constant at all times, which explains why the model is considered a ‘proportional hazard model’. This ratio is called the relative risk ratio:
\[ \frac{\lambda(t;Z_x)}{\lambda(t;Z_y)} = \exp\left(\sum^p_{j=1}\beta_j(z_{xj}-z_{yj})\right)\]
- \(\lambda_0(t)\) is a non-linear function of \(t\), which is scaled up or down by the exponential of the covariates.
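As a small illustration of the relative risk formula, the sketch below computes the ratio for two lives. The coefficient values and the covariates (a smoker indicator and age in decades) are hypothetical and chosen only for the example.

```python
import math

def relative_risk(beta, z_x, z_y):
    """Hazard ratio lambda(t; Z_x) / lambda(t; Z_y); the baseline hazard cancels."""
    return math.exp(sum(b * (zx - zy) for b, zx, zy in zip(beta, z_x, z_y)))

# Hypothetical coefficients for (smoker indicator, age in decades).
beta = [0.7, 0.3]
life_x = [1, 5.0]  # smoker aged 50
life_y = [0, 5.0]  # non-smoker aged 50

rr = relative_risk(beta, life_x, life_y)
print(f"relative risk: {rr:.4f}")
```

Since the two lives differ only in the smoker indicator, the ratio is \(\exp(\beta_1)\) at every \(t\).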
Partial Likelihood Estimate of \(\beta\)
- We estimate \(\mathbf{\beta}\) by maximising the partial likelihood under the assumptions of noninformative censoring and independent lives.
- Note that \(R(t_j)\) is the set of the labels of all lives who are still under study at a time just prior to \(t_j\)
- This is called a partial likelihood because only the deaths contribute factors to the likelihood; censored lives enter only through the risk sets in the denominator
Case 1 - One Death at each Observed time
Assume that only one death occurs at each \(t_j\), aka \(d_j=1\) for each \(t_j\)
The partial likelihood for the death at \(t_j\) is:
\[ P[\text{ Individual j dies at }t_j|\text{First death in }R(t_j)\text{ occurs at }t_j\text{ and is the only death at }t_j]\]
- Therefore the whole likelihood is the product of these probabilities:
\[ L = \prod^k_{j=1}\frac{\exp(\beta Z^\top_{(j)})}{\sum_{i\in R(t_j)}\exp(\beta Z_i^\top)}\]
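The product above can be evaluated on the log scale. The following is a minimal sketch, assuming no tied death times; the function name and the four-life data set are hypothetical.

```python
import math

def log_partial_likelihood(beta, times, events, covariates):
    """Log partial likelihood, assuming at most one death at each time."""
    # Linear predictor beta . Z_i for every life.
    scores = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    total = 0.0
    for j, (t_j, died) in enumerate(zip(times, events)):
        if not died:
            continue  # censored lives contribute only through the risk sets
        # R(t_j): lives still under study just prior to t_j.
        risk_sum = sum(math.exp(scores[i])
                       for i, t_i in enumerate(times) if t_i >= t_j)
        total += scores[j] - math.log(risk_sum)
    return total

# Hypothetical data: four lives, one covariate, the second life censored at time 2.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
covariates = [[0], [1], [1], [0]]

ll_at_zero = log_partial_likelihood([0.0], times, events, covariates)
ll_at_half = log_partial_likelihood([0.5], times, events, covariates)
print(ll_at_zero, ll_at_half)
```

At \(\beta = 0\) every life in a risk set is equally likely to be the one that dies, so each death contributes \(-\ln|R(t_j)|\).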
Case 2 - Possible ties in the data
- Consider ties in the data, that is some \(d_j > 1\) or some observations are censored at an observed lifetime
- This complicates the partial likelihood as one needs to include the lives censored at time \(t_j\) in the risk set \(R(t_j)\) and all permutations of simultaneous events
- Partial Likelihood:
- Note \(s_j\) is the sum of the covariate vectors \(Z\) of the \(d_j\) lives observed to die at time \(t_j\). This is a vector with one summed entry per covariate, i.e. one for each respective \(\beta\)
- This approximation works well when the number of ties is relatively small
\[ L = \prod^k_{j=1}\frac{\exp(\beta s_j^\top)}{\left[\sum_{i\in R(t_j)}\exp(\beta Z_i^\top)\right]^{d_j}} \]
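The tied-data formula (commonly known as Breslow's approximation) can be sketched in the same way as the untied case. The function name and the data are hypothetical; two lives die together at time 1.

```python
import math
from collections import defaultdict

def tied_log_partial_likelihood(beta, times, events, covariates):
    """Log of the approximate partial likelihood when deaths may be tied."""
    scores = [sum(b * z for b, z in zip(beta, row)) for row in covariates]
    # Group the indices of the lives that die at each distinct death time t_j.
    deaths_at = defaultdict(list)
    for i, (t_i, died) in enumerate(zip(times, events)):
        if died:
            deaths_at[t_i].append(i)
    total = 0.0
    for t_j, dead in deaths_at.items():
        d_j = len(dead)
        beta_dot_s_j = sum(scores[i] for i in dead)  # beta . s_j
        risk_sum = sum(math.exp(scores[i])
                       for i, t_i in enumerate(times) if t_i >= t_j)
        total += beta_dot_s_j - d_j * math.log(risk_sum)
    return total

# Hypothetical data: two deaths tied at time 1, one censoring, one later death.
times = [1, 1, 2, 3]
events = [1, 1, 0, 1]
covariates = [[1], [0], [1], [0]]

ll_tied_zero = tied_log_partial_likelihood([0.0], times, events, covariates)
ll_tied_half = tied_log_partial_likelihood([0.5], times, events, covariates)
print(ll_tied_zero, ll_tied_half)
```

When every \(d_j = 1\) this reduces to the Case 1 likelihood.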
Properties of Maximum Partial Likelihood Estimator
The maximum partial likelihood estimator has the same properties as an ordinary maximum likelihood estimator:
- Asymptotically unbiased and asymptotically multivariate normally distributed
- The estimate is the solution obtained by setting the first derivatives of the log partial likelihood to 0
- Variance matrix is the inverse of the information matrix, \((I(\hat{\beta}))^{-1}\), where \(I(\hat{\beta})\) is given by:
\[ I(\hat{\beta}) = \left(-\frac{\partial^2\ln L(\beta)}{\partial\beta_i\partial\beta_j}|_{\beta=\hat{\beta}}\right)\]
- Confidence interval is therefore:
\[ \hat{\beta}\pm z_{1-\alpha/2}\sqrt{Var(\hat{\beta})} \]
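To make the steps concrete, here is a sketch for a single covariate: Newton-Raphson is used to solve the first-derivative equation, and the inverse of the information at the solution gives the variance for the confidence interval. Newton-Raphson is one common choice of solver, not something the notes prescribe, and the data, function names and iteration count are hypothetical.

```python
import math
from statistics import NormalDist

def score_and_information(beta, times, events, z):
    """First derivative and negative second derivative of the log partial likelihood."""
    score, info = 0.0, 0.0
    for j, (t_j, died) in enumerate(zip(times, events)):
        if not died:
            continue
        at_risk = [z[i] for i, t_i in enumerate(times) if t_i >= t_j]
        weights = [math.exp(beta * z_i) for z_i in at_risk]
        s0 = sum(weights)
        mean = sum(w * z_i for w, z_i in zip(weights, at_risk)) / s0
        mean_sq = sum(w * z_i * z_i for w, z_i in zip(weights, at_risk)) / s0
        score += z[j] - mean          # observed minus weighted average covariate
        info += mean_sq - mean ** 2   # weighted variance of the covariate in R(t_j)
    return score, info

def fit(times, events, z, iterations=25):
    """Newton-Raphson for one coefficient, starting from beta = 0."""
    beta = 0.0
    for _ in range(iterations):
        score, info = score_and_information(beta, times, events, z)
        beta += score / info
    return beta

# Hypothetical data: six deaths, no censoring, one binary covariate.
times = [1, 2, 3, 4, 5, 6]
events = [1, 1, 1, 1, 1, 1]
z = [1, 0, 1, 0, 0, 1]

beta_hat = fit(times, events, z)
score_hat, info_hat = score_and_information(beta_hat, times, events, z)
variance = 1.0 / info_hat                  # inverse information
z_crit = NormalDist().inv_cdf(0.975)       # z_{1 - alpha/2} for alpha = 0.05
half_width = z_crit * math.sqrt(variance)
ci = (beta_hat - half_width, beta_hat + half_width)
print(beta_hat, ci)
```

With several covariates the score becomes a vector and the information a matrix, so the update step requires solving a linear system instead of a scalar division.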
Significance Testing of Extra \(\beta\)
- When testing whether it is worth adding a set of q \(\beta\)’s to an existing set of p \(\beta\)’s, the null hypothesis is:
\[ H_0: \beta_{p+1}=\beta_{p+2}=...=\beta_{p+q}=0\]
Likelihood Ratio Test
- The test statistic is:
\[ T = -2[\log(L_{p})-\log(L_{p+q})]\]
- The test statistic has \(\chi^2\) distribution with \(q\) degrees of freedom for large n under the null hypothesis
Wald Test
- The Wald statistic is:
\[ T = (\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})\underbrace{[Cov(\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})]^{-1}}_{q*q\text{ matrix}}(\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})^\top\]
- The test statistic has \(\chi^2\) distribution with \(q\) degrees of freedom for large n under the null hypothesis
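Both statistics can be sketched for the simplest case of one extra parameter (\(q = 1\)), where the \(\chi^2_1\) tail probability has a closed form via the complementary error function. All numbers below (log-likelihoods, estimate and variance) are hypothetical inputs rather than fitted values; for general \(q\) a \(\chi^2_q\) tail function from a statistics library would be needed.

```python
import math

def chi2_sf_1df(t):
    """P(chi-square with 1 degree of freedom > t), via the standard normal tail."""
    return math.erfc(math.sqrt(t / 2.0))

def likelihood_ratio_statistic(loglik_p, loglik_p_plus_q):
    """T = -2 [log L_p - log L_{p+q}]."""
    return -2.0 * (loglik_p - loglik_p_plus_q)

def wald_statistic_single(beta_hat, variance):
    """Wald statistic for q = 1: beta_hat * Var(beta_hat)^{-1} * beta_hat."""
    return beta_hat ** 2 / variance

# Hypothetical maximised log partial likelihoods without and with the extra covariate.
loglik_p = -12.40
loglik_p_plus_1 = -10.15
# Hypothetical estimate and variance of the extra coefficient.
beta_extra, var_extra = 0.9, 0.2

lrt = likelihood_ratio_statistic(loglik_p, loglik_p_plus_1)
wald = wald_statistic_single(beta_extra, var_extra)
print("LRT ", lrt, chi2_sf_1df(lrt))
print("Wald", wald, chi2_sf_1df(wald))
```

A small p-value from either test is evidence against \(H_0\), i.e. in favour of keeping the extra covariate.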
Estimation of Survival Function
- We need to fit a proportional hazards model to the data and obtain the maximum partial likelihood estimators.
- We also need to estimate the baseline cumulative hazard rate:
\[ \Lambda_0(t)=\int^t_0\lambda_0(s)ds \]
- Breslow’s estimator of the baseline cumulative hazard rate is:
- Note that each life in the risk set is weighted by \(\exp(\hat{\beta}Z_i^\top)\) in the denominator; if all lives were weighted equally, it would be the same estimate as the Nelson-Aalen (NA) estimate
\[ \hat{\Lambda}_0(t) = \sum_{t_j\leq t}\frac{d_j}{\sum_{i\in R(t_j)}\exp(\hat{\beta}Z_i^\top)}\]
- Therefore since \(\hat{S}_0(t) = \exp(-\hat{\Lambda}_0(t))\):
\[ \hat{S}(t) = \exp(-\hat{\Lambda}_0(t)*\exp(\mathbf{\hat{\beta}}Z^\top)) = \hat{S}_0(t)^{\exp(\mathbf{\hat{\beta}}Z^\top)}\]
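The two formulas above can be sketched as follows. The data and the value used for \(\hat{\beta}\) are hypothetical stand-ins; in practice \(\hat{\beta}\) would come from maximising the partial likelihood.

```python
import math

def linear_predictor(beta, z):
    return sum(b * z_k for b, z_k in zip(beta, z))

def breslow_cumulative_hazard(t, beta, times, events, covariates):
    """Breslow's estimate of the baseline cumulative hazard at time t."""
    weights = [math.exp(linear_predictor(beta, row)) for row in covariates]
    death_times = sorted({t_i for t_i, died in zip(times, events)
                          if died and t_i <= t})
    total = 0.0
    for t_j in death_times:
        d_j = sum(1 for t_i, died in zip(times, events) if died and t_i == t_j)
        denominator = sum(w for w, t_i in zip(weights, times) if t_i >= t_j)
        total += d_j / denominator
    return total

def survival(t, z, beta, times, events, covariates):
    """Estimated survival function for a life with covariates z."""
    baseline = breslow_cumulative_hazard(t, beta, times, events, covariates)
    return math.exp(-baseline * math.exp(linear_predictor(beta, z)))

# Hypothetical data and a hypothetical stand-in for the fitted coefficient.
times = [1, 2, 3, 4]
events = [1, 0, 1, 1]
covariates = [[0], [1], [1], [0]]
beta_hat = [0.5]

na_like = breslow_cumulative_hazard(3, [0.0], times, events, covariates)
s_baseline = survival(3, [0], beta_hat, times, events, covariates)
s_exposed = survival(3, [1], beta_hat, times, events, covariates)
print(na_like, s_baseline, s_exposed)
```

Setting \(\beta = 0\) gives every life weight 1, which recovers the Nelson-Aalen estimate (here \(1/4 + 1/2\) at \(t = 3\)).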
Cox-Snell Residuals
- Cox-Snell residuals are defined as:
\[ e_j = -\log(\hat{S}(X_j;z_j)) = \hat{\Lambda}_0(X_j)\exp\left(\sum^p_{k=1}\hat{\beta}_kz_{jk}\right)\]
- If the model is correct and \(\hat{\beta}\) are approximately equal to the true values of \(\beta\), then the Cox-Snell residuals \(e_j\) behave as a censored sample from a unit exponential distribution.
- Therefore to check the goodness of fit of a Cox model, we check whether the Cox-Snell residuals behave as samples from a unit exponential random variable
- To check the behavior we need to:
- Compute the NA estimator of the cumulative hazard rate of the \(e_j\)’s: \(\Lambda_E(e_j)\)
- Plot \(\Lambda_E(e_j)\) against \(e_j\). This should be roughly a straight line from origin with a slope of 1.
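The check above can be sketched numerically. The residual values below are hypothetical (in practice they come from the fitted model), and the NA helper assumes no tied residuals.

```python
import math

def cox_snell_residual(baseline_cum_hazard, beta, z):
    """e_j = Lambda_0(X_j) * exp(beta . z_j)."""
    return baseline_cum_hazard * math.exp(sum(b * z_k for b, z_k in zip(beta, z)))

def nelson_aalen(residuals, events):
    """Return (e_j, Lambda_E(e_j)) at each residual that corresponds to a death."""
    order = sorted(range(len(residuals)), key=lambda i: residuals[i])
    n = len(residuals)
    cumulative = 0.0
    points = []
    for rank, i in enumerate(order):
        if events[i]:
            cumulative += 1.0 / (n - rank)  # n - rank residuals still at risk
            points.append((residuals[i], cumulative))
    return points

# Hypothetical residuals; the third life is censored.
residuals = [0.2, 0.5, 0.9, 1.4]
events = [1, 1, 0, 1]

points = nelson_aalen(residuals, events)
for e_j, cum_hazard in points:
    # Under a well-fitting model these pairs lie close to the line y = x.
    print(f"e_j = {e_j:.2f}, Lambda_E = {cum_hazard:.4f}")
```

Plotting the second value of each pair against the first gives the diagnostic plot; with only four points this is purely illustrative.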
Assessment of Proportionality Assumption
- To check for the proportional hazard assumption for a given covariate \(Z_1\) after adjusting for all other relevant covariates:
- Consider \(Z = (z_1, Z_2)\) where \(Z_2\) is the remaining \(p-1\) covariates
- Assume no interaction between \(z_1\) and other covariates
- Assume that \(z_1\) has \(K\) possible values
- Fit a Cox model stratified on each value of \(z_1\), and let \(\hat{H}_{g0}(t)\) be the estimated cumulative baseline hazard rate in the \(g\)th stratum
- So we have \(K\) models that should be proportional for the assumption to be valid with respect to covariate \(z_1\)
Graphical Diagnostic Tools
- Plot \(\ln[\hat{H}_{10}(t)], \ln[\hat{H}_{20}(t)], ..., \ln[\hat{H}_{K0}(t)]\) versus t
The difference of any two of them should not be dependent on \(t\), where \(z_{1,g_k}\) is the value of \(z_1\) in stratum \(g_k\):
\[ \ln\left(\frac{e^{\beta_1z_{1,g_1}}}{e^{\beta_1z_{1,g_2}}}\right) = \beta_1(z_{1,g_1} - z_{1,g_2})\]
Therefore if the assumption holds these curves should be approximately parallel.
- Could also plot the differences \(\ln[\hat{H}_{20}(t)] - \ln[\hat{H}_{10}(t)], …, \ln[\hat{H}_{K0}(t)] - \ln[\hat{H}_{10}(t)]\) vs \(t\)
- This corresponds to plotting the expression above
- If the assumption holds, each curve should be roughly constant
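The difference plot can be sketched numerically. The cumulative baseline hazard values below are hypothetical and constructed so that the second stratum is exactly 1.5 times the first, so the differences are exactly constant; real estimates would only be roughly constant.

```python
import math

def log_hazard_differences(h_reference, h_stratum):
    """ln[H_g0(t)] - ln[H_10(t)] on a common grid of times."""
    return [math.log(h_g) - math.log(h_1)
            for h_1, h_g in zip(h_reference, h_stratum)]

# Hypothetical estimated cumulative baseline hazards on a common time grid.
grid = [1, 2, 3, 4]
h_10 = [0.05, 0.12, 0.20, 0.31]
h_20 = [1.5 * h for h in h_10]

differences = log_hazard_differences(h_10, h_20)
for t, diff in zip(grid, differences):
    print(t, round(diff, 4))
```

A clear trend in the differences over \(t\) would suggest the proportional hazards assumption does not hold for \(z_1\).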