Semi-Parametric

Regression models are useful for modelling the impacts of different factors separately.
- This is important because lives are often not homogeneous: lives with different characteristics have different levels of mortality.

Cox Regression Assumption

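The hazard rate for the $i$th life with covariates $Z_i$ is:
- Note that because the Cox model takes the exponential of the covariate terms, each $\beta_j$ acts multiplicatively: a one-unit increase in $z_{ij}$ multiplies the hazard rate by $e^{\beta_j}$. It therefore represents a percentage change rather than an absolute change in the hazard rate.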

\[ \lambda(t;Z_i) = \lambda_0(t)\exp(\beta Z_i^\top) = \lambda_0(t)\exp\left(\sum^p_{j=1}\beta_j z_{ij}\right) \]

Where:
- $\lambda_0(t)$ is the baseline hazard function, i.e. the hazard when all covariates are 0
  - Note this depends on $t$ but not on the covariates
- $\beta_1, \beta_2, \dots, \beta_p$ are the regression parameters
  - Note these are the coefficients of the covariates; they are the same for all lives and do not depend on $t$
- $z_{i1}, z_{i2}, \dots, z_{ip}$ are the covariates for the $i$th life

Relative Risk

The ratio of the hazard rates of two different lives $x$ and $y$ is the same at all times $t$, which is why the model is called a ‘proportional hazards model’. This ratio is called the relative risk ratio:

\[ \frac{\lambda(t;Z_x)}{\lambda(t;Z_y)} = \exp\left(\sum^p_{j=1}\beta_j(z_{xj}-z_{yj})\right)\]

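The baseline hazard $\lambda_0(t)$ is left unspecified and may be any (typically non-linear) function of $t$; the covariates only shift it up or down by the multiplicative factor $\exp(\beta Z_i^\top)$. Leaving $\lambda_0(t)$ unspecified while modelling the covariate effects parametrically is what makes the model semi-parametric.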
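
As a quick numerical check of the proportional hazards property, the sketch below evaluates $\lambda(t;Z)$ for two lives at several times. The baseline hazard $\lambda_0(t) = 0.015\sqrt{t}$, the covariates (a smoker indicator and age minus 60) and the coefficients are all hypothetical; any positive $\lambda_0(t)$ would give the same ratio, because it cancels.

```python
import math
from typing import Sequence


def baseline_hazard(t: float) -> float:
    """Hypothetical baseline hazard lambda_0(t) = 0.015 * sqrt(t); any positive function would do."""
    return 0.015 * math.sqrt(t)


def cox_hazard(t: float, beta: Sequence[float], z: Sequence[float]) -> float:
    """lambda(t; Z) = lambda_0(t) * exp(sum_j beta_j * z_j)."""
    linear_predictor = sum(b * zj for b, zj in zip(beta, z))
    return baseline_hazard(t) * math.exp(linear_predictor)


def relative_risk(beta: Sequence[float], z_x: Sequence[float], z_y: Sequence[float]) -> float:
    """exp(sum_j beta_j * (z_xj - z_yj)); note that t does not appear."""
    return math.exp(sum(b * (a - c) for b, a, c in zip(beta, z_x, z_y)))


# Hypothetical coefficients for (smoker indicator, age - 60)
beta = [0.7, 0.05]
life_x = [1.0, 5.0]  # smoker aged 65
life_y = [0.0, 0.0]  # non-smoker aged 60, i.e. the baseline life

for t in (1.0, 5.0, 20.0):
    ratio = cox_hazard(t, beta, life_x) / cox_hazard(t, beta, life_y)
    print(f"t = {t:4.1f}: hazard ratio = {ratio:.4f}")
print(f"relative risk = {relative_risk(beta, life_x, life_y):.4f}")
```

Each printed hazard ratio should equal the relative risk $\exp(0.7 \times 1 + 0.05 \times 5) = e^{0.95} \approx 2.586$, whatever the value of $t$.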

Partial Likelihood Estimate of $\beta$

We estimate $\mathbf{\beta}$ by maximising the partial likelihood, under the assumptions of non-informative censoring and independent lives.
Here $R(t_j)$ is the risk set: the labels of all lives still under study just prior to $t_j$.
This is called a partial likelihood because only the deaths contribute factors to it; censored observations enter only through the risk sets in the denominators.

Case 1 - One Death at each Observed time

Assume that only one death occurs at each $t_j$, i.e. $d_j=1$ for each $t_j$.
The contribution of the death at $t_j$ to the partial likelihood is the conditional probability:

\[ P[\text{ Individual j dies at }t_j|\text{First death in }R(t_j)\text{ occurs at }t_j\text{ and is the only death at }t_j]\]

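Therefore the partial likelihood is the product of these probabilities over the $k$ distinct death times, where $Z_{(j)}$ is the covariate vector of the life that dies at $t_j$: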

\[ L = \prod^k_{j=1}\frac{\exp(\beta Z^\top_{(j)})}{\sum_{i\in R(t_j)}\exp(\beta Z_i^\top)}\]
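
A minimal sketch of the Case 1 log partial likelihood $\ln L(\beta)$, assuming no tied death times. The function name and the four-life data set are hypothetical; covariates are passed as one list per life, and $R(t_j)$ is taken as every life whose observed time is at least $t_j$.

```python
import math
from typing import Sequence


def log_partial_likelihood_no_ties(
    beta: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    covariates: Sequence[Sequence[float]],
) -> float:
    """ln L(beta) for the Cox model, assuming at most one death at each time.

    events[i] is 1 if life i died at times[i] and 0 if it was censored there.
    R(t_j) is taken to be every life whose observed time is >= t_j.
    """
    # Linear predictors beta Z_i^T for every life
    eta = [sum(b * z for b, z in zip(beta, z_i)) for z_i in covariates]
    loglik = 0.0
    for j, (t_j, d_j) in enumerate(zip(times, events)):
        if d_j != 1:
            continue  # censored lives only appear in the risk-set sums below
        risk_sum = sum(math.exp(e) for t_i, e in zip(times, eta) if t_i >= t_j)
        # log of exp(beta Z_(j)^T) / sum_{i in R(t_j)} exp(beta Z_i^T)
        loglik += eta[j] - math.log(risk_sum)
    return loglik


# Hypothetical data: four lives, one binary covariate
times = [2.0, 3.0, 5.0, 7.0]
events = [1, 1, 0, 1]
z = [[1.0], [0.0], [1.0], [0.0]]
print(log_partial_likelihood_no_ties([0.0], times, events, z))
```

At $\beta = 0$ every life has the same weight, so each factor is just $1/|R(t_j)|$; here the risk sets have sizes 4, 3 and 1, giving $\ln L(0) = -\ln 12$.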

Case 2 - Possible ties in the data

Now allow ties in the data, that is some $d_j > 1$, or some observations censored at an observed death time
- This complicates the partial likelihood: lives censored at $t_j$ are included in the risk set $R(t_j)$, and an exact likelihood would have to account for all possible orderings of the simultaneous deaths
Breslow's approximation to the partial likelihood is:
- Here $s_j$ is the sum of the covariate vectors $Z_i$ of the $d_j$ lives observed to die at time $t_j$; it is a $p$-vector whose $k$th entry is the sum of the $k$th covariate over those lives, one entry per $\beta_k$
- This approximation works well when the number of ties is relatively small

\[ L = \prod^k_{j=1}\frac{\exp(\beta s_j^\top)}{\left[\sum_{i\in R(t_j)}\exp(\beta Z_i^\top)\right]^{d_j}} \]
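
The tied-data likelihood above can be sketched the same way. This assumes the Breslow approximation shown, with deaths grouped by distinct time; the function name and data are hypothetical. Since $\beta s_j^\top$ is linear in $s_j$, it equals the sum of the dying lives' linear predictors $\beta Z_i^\top$.

```python
import math
from collections import defaultdict
from typing import Sequence


def log_partial_likelihood_breslow(
    beta: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    covariates: Sequence[Sequence[float]],
) -> float:
    """ln L(beta) with Breslow's approximation for tied death times.

    For each distinct death time t_j the contribution is
        beta s_j^T - d_j * ln( sum_{i in R(t_j)} exp(beta Z_i^T) ),
    where s_j is the sum of the covariate vectors of the d_j deaths at t_j.
    """
    eta = [sum(b * z for b, z in zip(beta, z_i)) for z_i in covariates]

    # t_j -> [d_j, beta s_j^T]; beta s_j^T is the sum of the deaths' linear predictors
    deaths = defaultdict(lambda: [0, 0.0])
    for t, d, e in zip(times, events, eta):
        if d == 1:
            deaths[t][0] += 1
            deaths[t][1] += e

    loglik = 0.0
    for t_j, (d_j, beta_s_j) in deaths.items():
        # Lives censored at t_j are kept in the risk set (t >= t_j)
        risk_sum = sum(math.exp(e) for t, e in zip(times, eta) if t >= t_j)
        loglik += beta_s_j - d_j * math.log(risk_sum)
    return loglik


# Hypothetical data with two deaths tied at t = 1 and a censoring at t = 3
times = [1.0, 1.0, 2.0, 3.0]
events = [1, 1, 1, 0]
z = [[1.0], [0.0], [1.0], [0.0]]
print(log_partial_likelihood_breslow([0.5], times, events, z))
```

When every $d_j = 1$ this reduces to the Case 1 likelihood.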

Properties of Maximum Partial Likelihood Estimator

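The maximum partial likelihood estimator has the same asymptotic properties as an ordinary maximum likelihood estimator:
- It is asymptotically unbiased and multivariate normally distributed
- $\hat{\beta}$ is the solution of the score equations, i.e. the first derivatives of $\ln L(\beta)$ set equal to 0: $\partial \ln L(\beta)/\partial\beta_j = 0$ for $j = 1, \dots, p$
- Its variance matrix is estimated by the inverse of the information matrix, $(I(\hat{\beta}))^{-1}$, where $I(\hat{\beta})$ is given by:
\[ I(\hat{\beta}) = \left(-\frac{\partial^2\ln L(\beta)}{\partial\beta_i\partial\beta_j}|_{\beta=\hat{\beta}}\right)\]
- An approximate $100(1-\alpha)\%$ confidence interval for each $\beta_k$ is therefore:
\[ \hat{\beta}_k\pm z_{1-\alpha/2}\sqrt{Var(\hat{\beta}_k)} \]
  where $Var(\hat{\beta}_k)$ is the $k$th diagonal element of $(I(\hat{\beta}))^{-1}$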
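
One way to solve the score equation numerically is Newton-Raphson, $\beta \leftarrow \beta + U(\beta)/I(\beta)$, where $U$ is the score. The sketch below assumes a single covariate (so $I$ is a scalar) and Breslow's handling of ties; `score_and_information`, `fit_cox_one_covariate`, `confidence_interval` and the data are hypothetical illustrations, not a library API, and a production fit would also need safeguards such as step-halving.

```python
import math
from statistics import NormalDist
from typing import Sequence, Tuple


def score_and_information(
    beta: float, times: Sequence[float], events: Sequence[int], z: Sequence[float]
) -> Tuple[float, float]:
    """U(beta) = d lnL/d beta and I(beta) = -d^2 lnL/d beta^2 for one covariate (Breslow ties)."""
    score, info = 0.0, 0.0
    for t_j in sorted({t for t, d in zip(times, events) if d == 1}):
        dead = [zi for t, d, zi in zip(times, events, z) if d == 1 and t == t_j]
        d_j, s_j = len(dead), sum(dead)
        # Sums over the risk set R(t_j), weighted by exp(beta * z_i)
        at_risk = [(zi, math.exp(beta * zi)) for t, zi in zip(times, z) if t >= t_j]
        s0 = sum(w for _, w in at_risk)
        s1 = sum(zi * w for zi, w in at_risk)
        s2 = sum(zi * zi * w for zi, w in at_risk)
        mean = s1 / s0  # risk-weighted mean of z over R(t_j)
        score += s_j - d_j * mean
        info += d_j * (s2 / s0 - mean * mean)  # d_j times the weighted variance of z
    return score, info


def fit_cox_one_covariate(times, events, z, tol=1e-10, max_iter=50):
    """Newton-Raphson: beta <- beta + U(beta) / I(beta), starting from beta = 0."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = score_and_information(beta, times, events, z)
        step = score / info
        beta += step
        if abs(step) < tol:
            break
    _, info = score_and_information(beta, times, events, z)
    return beta, 1.0 / info  # (beta_hat, estimated Var(beta_hat) = I(beta_hat)^-1)


def confidence_interval(beta_hat, variance, alpha=0.05):
    """beta_hat -/+ z_{1-alpha/2} * sqrt(Var(beta_hat))."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    half_width = z_crit * math.sqrt(variance)
    return beta_hat - half_width, beta_hat + half_width


# Hypothetical data: z = 1 for one group, 0 for the other
times = [2, 3, 5, 7, 8, 10, 12, 15]
events = [1, 1, 0, 1, 1, 0, 1, 1]
z = [1, 0, 1, 1, 0, 0, 1, 0]
beta_hat, var_hat = fit_cox_one_covariate(times, events, z)
print(beta_hat, var_hat, confidence_interval(beta_hat, var_hat))
```

Replacing z by 1 - z should return $-\hat{\beta}$ with the same variance, since a constant shift in a covariate cancels from every ratio in the partial likelihood.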

Significance Testing of Extra $\beta$

When testing whether it is worth adding a set of $q$ extra $\beta$’s to an existing set of $p$ $\beta$’s, the null hypothesis is:

\[ H_0: \beta_{p+1}=\beta_{p+2}=\dots=\beta_{p+q}=0\]

Likelihood Ratio Test

The test statistic is:

\[ T = -2[\log(L_{p})-\log(L_{p+q})]\]

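where $L_p$ and $L_{p+q}$ are the maximised partial likelihoods of the models with $p$ and $p+q$ parameters. Under the null hypothesis, $T$ has approximately a $\chi^2$ distribution with $q$ degrees of freedom for large $n$.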
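
A sketch of the likelihood ratio test, assuming the two maximised log partial likelihoods are already available from fitted nested models. To stay within the standard library, the hypothetical helper `chi2_sf` evaluates the $\chi^2_q$ upper tail probability through the regularised upper incomplete gamma function; in practice a statistics library would normally provide this.

```python
import math


def chi2_sf(x: float, k: int) -> float:
    """P(X > x) for X ~ chi-square(k), integer k >= 1, using only the standard library.

    Uses the upper regularised gamma function Q(k/2, x/2) and the recurrence
    Q(a + 1, y) = Q(a, y) + y**a * exp(-y) / Gamma(a + 1).
    """
    y = x / 2.0
    if k % 2 == 0:
        # Q(m, y) = exp(-y) * sum_{i=0}^{m-1} y**i / i!  for integer m = k/2
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    # Odd k: start from Q(1/2, y) = erfc(sqrt(y)) and step a = 1/2, 3/2, ...
    total = math.erfc(math.sqrt(y))
    for i in range(k // 2):
        a = i + 0.5
        if y > 0:
            total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
    return total


def likelihood_ratio_test(loglik_p: float, loglik_p_plus_q: float, q: int):
    """T = -2[ln L_p - ln L_{p+q}] and its approximate chi-square(q) p-value."""
    t_stat = -2.0 * (loglik_p - loglik_p_plus_q)
    return t_stat, chi2_sf(t_stat, q)


# Hypothetical maximised log partial likelihoods of the two nested models
t_stat, p_value = likelihood_ratio_test(-100.0, -97.0, q=2)
print(t_stat, p_value)  # T = 6.0, p-value = exp(-3), about 0.0498
```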

Wald Test

The Wald statistic is:

\[ T = (\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})\underbrace{[Cov(\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})]^{-1}}_{q\times q\text{ matrix}}(\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})^\top\]

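where the covariance matrix is the $q\times q$ block of $(I(\hat{\beta}))^{-1}$ corresponding to the extra parameters, estimated under the full model. Under the null hypothesis, $T$ has approximately a $\chi^2$ distribution with $q$ degrees of freedom for large $n$.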
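
A sketch of the Wald statistic for $q$ extra coefficients. It assumes `cov_extra` is the $q\times q$ block of $(I(\hat{\beta}))^{-1}$ for the extra parameters, taken from the full model (a block of the inverse, not the inverse of a block of $I(\hat{\beta})$); the helper names and numbers are hypothetical. Solving $Cx = b$ avoids forming the matrix inverse explicitly, and the same standard-library $\chi^2$ tail function is used for the p-value.

```python
import math
from typing import List, Sequence, Tuple


def chi2_sf(x: float, k: int) -> float:
    """P(X > x) for X ~ chi-square(k), integer k >= 1, via Q(k/2, x/2)."""
    y = x / 2.0
    if k % 2 == 0:
        term, total = 1.0, 1.0
        for i in range(1, k // 2):
            term *= y / i
            total += term
        return math.exp(-y) * total
    total = math.erfc(math.sqrt(y))  # Q(1/2, y)
    for i in range(k // 2):
        a = i + 0.5
        if y > 0:
            total += math.exp(a * math.log(y) - y - math.lgamma(a + 1))
    return total


def solve(matrix: Sequence[Sequence[float]], vector: Sequence[float]) -> List[float]:
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(vector)
    a = [[float(v) for v in row] + [float(b)] for row, b in zip(matrix, vector)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n + 1):
                a[r][c] -= factor * a[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (a[r][n] - sum(a[r][c] * x[c] for c in range(r + 1, n))) / a[r][r]
    return x


def wald_test(beta_extra: Sequence[float], cov_extra: Sequence[Sequence[float]]) -> Tuple[float, float]:
    """T = b C^{-1} b^T for the q extra estimates b with q x q covariance C."""
    x = solve(cov_extra, beta_extra)  # x = C^{-1} b without forming the inverse
    t_stat = sum(b * xi for b, xi in zip(beta_extra, x))
    return t_stat, chi2_sf(t_stat, len(beta_extra))


# Hypothetical estimates of two extra coefficients and their covariance block
t_stat, p_value = wald_test([0.3, -0.2], [[0.02, 0.005], [0.005, 0.03]])
print(t_stat, p_value)
```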

Estimation of Survival Function

To estimate the survival function of a life with covariates $Z$, we first fit the proportional hazards model and obtain the maximum partial likelihood estimate $\hat{\beta}$.
We also need to estimate the baseline cumulative hazard rate:

\[ \Lambda_0(t)=\int^t_0\lambda_0(s)ds \]

- Breslow’s estimator of the baseline cumulative hazard rate is:
  - Note that each life in the risk set is weighted by its own relative risk $\exp(\hat{\beta}Z_i^\top)$ in the denominator; if all lives were weighted equally, the estimate would be the same as the Nelson-Aalen (NA) estimate

\[ \hat{\Lambda}_0(t) = \sum_{t_j\leq t}\frac{d_j}{\sum_{i\in R(t_j)}\exp(\hat{\beta}Z_i^\top)}\]

Therefore since $\hat{S}_0(t) = \exp(-\hat{\Lambda}_0(t))$:

\[ \hat{S}(t;Z) = \exp\left(-\hat{\Lambda}_0(t)\exp(\hat{\beta}Z^\top)\right) = \hat{S}_0(t)^{\exp(\hat{\beta}Z^\top)}\]
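
A sketch of Breslow's estimator and the resulting survival curve, assuming $\hat{\beta}$ has already been estimated (the value 0.4 and the three-life data set are hypothetical). $\hat{\Lambda}_0$ is stored as a step function that jumps at each distinct death time.

```python
import math
from typing import List, Sequence, Tuple


def breslow_cumulative_hazard(
    beta: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    covariates: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """Breslow's Lambda_0_hat(t_j) at each distinct death time, as (t_j, value) steps."""
    risk = [math.exp(sum(b * z for b, z in zip(beta, z_i))) for z_i in covariates]
    steps, cumulative = [], 0.0
    for t_j in sorted({t for t, d in zip(times, events) if d == 1}):
        d_j = sum(1 for t, d in zip(times, events) if d == 1 and t == t_j)
        # Each life in R(t_j) is weighted by its relative risk exp(beta_hat Z_i^T)
        cumulative += d_j / sum(r for t, r in zip(times, risk) if t >= t_j)
        steps.append((t_j, cumulative))
    return steps


def survival(t: float, z: Sequence[float], beta: Sequence[float], steps) -> float:
    """S_hat(t; Z) = S0_hat(t) ** exp(beta_hat Z^T), where S0_hat(t) = exp(-Lambda_0_hat(t))."""
    lambda0 = 0.0
    for t_j, value in steps:  # right-continuous step function
        if t_j <= t:
            lambda0 = value
    return math.exp(-lambda0) ** math.exp(sum(b * zk for b, zk in zip(beta, z)))


# Hypothetical fitted coefficient and data
beta_hat = [0.4]
times = [1.0, 2.0, 3.0]
events = [1, 1, 1]
z = [[0.0], [1.0], [0.0]]
steps = breslow_cumulative_hazard(beta_hat, times, events, z)
print(steps)
print(survival(2.5, [1.0], beta_hat, steps))
```

With $\hat{\beta} = 0$ every weight is 1 and the steps coincide with the Nelson-Aalen estimate, here $1/3$, $1/3 + 1/2$ and $1/3 + 1/2 + 1$.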

Cox-Snell Residuals

Cox-Snell residuals are defined as:

\[ e_j = -\log(\hat{S}(X_j;Z_j)) = \hat{\Lambda}_0(X_j)\exp\left(\sum^p_{k=1}\hat{\beta}_kz_{jk}\right)\]

where $X_j$ is the observed (death or censoring) time of life $j$.

If the model is correct and the estimates $\hat{\beta}$ are close to the true values of $\beta$, then the Cox-Snell residuals $e_j$ behave like a censored sample from a unit exponential distribution.
- Therefore to check the goodness of fit of a Cox model, we check whether the Cox-Snell residuals behave as samples from a unit exponential random variable
To check the behavior we need to:
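- Compute the NA estimator of the cumulative hazard rate of the $e_j$’s, $\hat{\Lambda}_E(e_j)$, treating $e_j$ as censored whenever life $j$ was censored
- Plot $\hat{\Lambda}_E(e_j)$ against $e_j$. This should be roughly a straight line through the origin with a slope of 1, since the cumulative hazard of a unit exponential is $\Lambda(e) = e$.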
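
The residual check can be sketched as follows, assuming a fitted $\hat{\beta}$ is supplied (0.78 below is a hypothetical value, as is the data). Breslow's $\hat{\Lambda}_0$ is recomputed inside the function, and each residual keeps its life's censoring indicator when the NA estimate $\hat{\Lambda}_E$ is formed.

```python
import math
from typing import List, Sequence, Tuple


def cox_snell_plot_points(
    beta: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    covariates: Sequence[Sequence[float]],
) -> List[Tuple[float, float]]:
    """(e_j, Lambda_E_hat(e_j)) pairs for plotting; they should lie near the line y = x."""
    risk = [math.exp(sum(b * z for b, z in zip(beta, z_i))) for z_i in covariates]
    death_times = sorted({t for t, d in zip(times, events) if d == 1})

    def breslow(x: float) -> float:
        """Breslow's Lambda_0_hat(x)."""
        total = 0.0
        for t_j in death_times:
            if t_j > x:
                break
            d_j = sum(1 for t, d in zip(times, events) if d == 1 and t == t_j)
            total += d_j / sum(r for t, r in zip(times, risk) if t >= t_j)
        return total

    # e_j = Lambda_0_hat(X_j) * exp(beta_hat Z_j^T); each keeps its life's censoring flag
    residuals = [breslow(x) * r for x, r in zip(times, risk)]

    # Nelson-Aalen estimate of the cumulative hazard of the residuals
    points, cumulative = [], 0.0
    for e in sorted({e for e, d in zip(residuals, events) if d == 1}):
        d_e = sum(1 for r, d in zip(residuals, events) if d == 1 and r == e)
        at_risk = sum(1 for r in residuals if r >= e)
        cumulative += d_e / at_risk
        points.append((e, cumulative))
    return points


# Hypothetical data and fitted coefficient
times = [2.0, 3.0, 5.0, 7.0, 8.0, 10.0, 12.0, 15.0]
events = [1, 1, 0, 1, 1, 0, 1, 1]
z = [[1.0], [0.0], [1.0], [1.0], [0.0], [0.0], [1.0], [0.0]]
for e, h in cox_snell_plot_points([0.78], times, events, z):
    print(f"{e:.3f}  {h:.3f}")
```

The returned pairs would be plotted with $e_j$ on the horizontal axis; a good fit gives points close to the line $y = x$.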

Assessment of Proportionality Assumption

To check the proportional hazards assumption for a given covariate $z_1$ after adjusting for all other relevant covariates:
- Consider $Z = (z_1, Z_2)$, where $Z_2$ contains the remaining $p-1$ covariates
- Assume there is no interaction between $z_1$ and the other covariates
- Assume that $z_1$ has $K$ possible values
- Fit a Cox model in $Z_2$ stratified on the values of $z_1$ (a common $\beta$ for $Z_2$ but a separate baseline hazard in each stratum), and let $\hat{H}_{g0}(t)$ be the estimated cumulative baseline hazard in the $g$th stratum
  - This gives $K$ baseline cumulative hazards, which should be proportional to one another if the assumption is valid for $z_1$

Graphical Diagnostic Tools

Plot $\ln[\hat{H}_{10}(t)], \ln[\hat{H}_{20}(t)], \dots, \ln[\hat{H}_{K0}(t)]$ versus $t$
- If the assumption holds, $H_{g0}(t) = H_0(t)\exp(\beta_1 z_{1,g})$, so the difference between any two of the log curves does not depend on $t$, where $z_{1,g_1}$ and $z_{1,g_2}$ are the values of $z_1$ in strata $g_1$ and $g_2$:
  
  \[ \ln\left(\frac{e^{\beta_1z_{1,g_1}}}{e^{\beta_1z_{1,g_2}}}\right) = \beta_1(z_{1,g_1} - z_{1,g_2})\]
- Therefore if the assumption holds these curves should be approximately parallel.

We could also plot the differences $\ln[\hat{H}_{20}(t)] - \ln[\hat{H}_{10}(t)], \dots, \ln[\hat{H}_{K0}(t)] - \ln[\hat{H}_{10}(t)]$ vs $t$
- This corresponds to plotting the expression above
- If the assumption holds, each curve should be roughly constant
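
A sketch of the stratified log cumulative baseline hazards and their differences from a reference stratum. It assumes the common coefficient vector for $Z_2$ has already been estimated from the stratified fit and is passed in; the function names, the stratum labels and the toy data are hypothetical. Breslow's estimator is applied within each stratum using only that stratum's risk set.

```python
import math
from typing import Dict, List, Sequence


def stratified_log_cum_hazards(
    beta: Sequence[float],
    times: Sequence[float],
    events: Sequence[int],
    z2: Sequence[Sequence[float]],
    strata: Sequence,
    grid: Sequence[float],
) -> Dict[object, List[float]]:
    """ln H_g0_hat(t) on a time grid for each stratum g of z_1.

    beta is the common coefficient vector for Z_2 from a stratified fit (taken as given).
    Breslow's estimator is computed within each stratum using only that stratum's risk set.
    """
    curves = {}
    for g in sorted(set(strata)):
        idx = [i for i, s in enumerate(strata) if s == g]
        risk = {i: math.exp(sum(b * z for b, z in zip(beta, z2[i]))) for i in idx}
        death_times = sorted({times[i] for i in idx if events[i] == 1})
        values = []
        for t in grid:
            h = 0.0
            for t_j in death_times:
                if t_j > t:
                    break
                d_j = sum(1 for i in idx if events[i] == 1 and times[i] == t_j)
                h += d_j / sum(risk[i] for i in idx if times[i] >= t_j)
            # ln H is undefined before the first death in the stratum
            values.append(math.log(h) if h > 0 else float("nan"))
        curves[g] = values
    return curves


def differences_from_reference(curves, reference):
    """ln H_g0_hat(t) - ln H_ref0_hat(t) for every other stratum g; roughly constant under PH."""
    base = curves[reference]
    return {g: [a - b for a, b in zip(v, base)] for g, v in curves.items() if g != reference}


# Hypothetical toy data: two strata of z_1 and a single Z_2 covariate fixed at 0
times = [1.0, 2.0, 1.0, 2.0, 3.0, 4.0]
events = [1, 1, 1, 1, 1, 1]
z2 = [[0.0]] * 6
strata = ["A", "A", "B", "B", "B", "B"]
curves = stratified_log_cum_hazards([0.0], times, events, z2, strata, grid=[1.0, 2.0])
print(differences_from_reference(curves, "A"))
```

Plotting each list returned by `differences_from_reference` against the grid gives the difference curves described above; roughly flat curves support the proportional hazards assumption for $z_1$.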