Semi-Parametric

Jake Warby

01/04/2022

  • Regression models are useful for modelling the impact of different factors on mortality separately
    • This is important as lives are often not homogeneous, i.e. lives with different characteristics have different levels of mortality.

Cox Regression Assumption

  • The hazard rate for the \(i\)th life with covariates \(Z_i\) is:
    • Note that because the Cox model applies the exponential function to the linear combination of covariates, each \(\beta_j\) represents a multiplicative (percentage) change in the hazard rate rather than an absolute change.

\[ \lambda(t;Z_i) = \lambda_0(t)\exp(\beta Z_i^\top) = \lambda_0(t)\exp\left(\sum^p_{j=1}\beta_j z_{ij}\right) \]

  • Where:
    • \(\lambda_0(t)\) is the baseline hazard function, i.e. the hazard when all covariates are 0
      • Note this depends on \(t\) but not on the covariates
    • \(\beta_1, \beta_2, ..., \beta_p\) are the regression parameters
      • Note the factor \(\exp(\beta Z_i^\top)\) depends on the covariates but not on \(t\)
    • \(z_{i1},z_{i2}, ..., z_{ip}\) are the covariates for the \(i\)th subject
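  • Hypothetical example: with covariates \(z_{i1}\) = years of age above 50 and \(z_{i2}\) = smoker indicator, and illustrative parameters \(\beta_1 = 0.05\), \(\beta_2 = 0.4\), a 55-year-old smoker has hazard \(\lambda(t;Z_i) = \lambda_0(t)\exp(0.05\times 5 + 0.4\times 1) = \lambda_0(t)\,e^{0.65}\).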

Relative Risk

  • The ratio of hazard rates for two different lives \(x\) and \(y\) is constant at all times, which is why the model is called a ‘proportional hazards model’. This ratio is called the relative risk:

\[ \frac{\lambda(t;Z_x)}{\lambda(t;Z_y)} = \exp\left(\sum^p_{j=1}\beta_j(z_{xj}-z_{yj})\right)\]

  • \(\lambda_0(t)\) can be an arbitrary (generally non-linear) function of \(t\); each life’s hazard is this baseline scaled up or down by the exponential of its covariates.
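  • Continuing the hypothetical smoker example above: two lives identical except for smoking status have relative risk \(\exp(0.4\times(1-0)) = e^{0.4} \approx 1.49\), so the smoker’s hazard is about 49% higher at every time \(t\).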

Partial Likelihood Estimate of \(\beta\)

  • We estimate \(\mathbf{\beta}\) by maximising the partial likelihood, under the assumptions of non-informative censoring and independent lives.
  • Note that \(R(t_j)\) is the set of labels of all lives who are still under study just prior to \(t_j\)
  • This is called a partial likelihood because only the deaths contribute terms to the likelihood, while censored observations contribute only through the denominators (the risk sets)

Case 1 - One Death at each Observed time

  • Assume that only one death occurs at each \(t_j\), i.e. \(d_j=1\) for each observed death time \(t_j\)

  • The partial likelihood contribution of the death at \(t_j\) (label the dying life \((j)\)) is:

\[ P[\text{individual }(j)\text{ dies at }t_j \mid \text{the first death among }R(t_j)\text{ occurs at }t_j\text{ and is the only death at }t_j]\]

  • Since the baseline hazard \(\lambda_0(t_j)\) is common to every life in \(R(t_j)\), it cancels in this conditional probability, leaving the term \(\exp(\beta Z_{(j)}^\top)\big/\sum_{i\in R(t_j)}\exp(\beta Z_i^\top)\)
  • The whole partial likelihood is therefore the product of these terms over the \(k\) observed death times (a numpy sketch follows the formula):

\[ L = \prod^k_{j=1}\frac{\exp(\beta Z^\top_{(j)})}{\sum_{i\in R(t_j)}\exp(\beta Z_i^\top)}\]
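  • A minimal numpy sketch of this log partial likelihood under Case 1 (no tied death times); the function name, array layout, and the convention that lives censored at \(t_j\) remain in \(R(t_j)\) are illustrative assumptions:

```python
import numpy as np

def cox_partial_log_likelihood(beta, times, events, Z):
    """Log partial likelihood assuming no tied death times (Case 1).

    beta   : (p,)  regression parameters
    times  : (n,)  observed times (death or censoring)
    events : (n,)  1 if a death was observed, 0 if censored
    Z      : (n,p) covariate matrix, one row per life
    """
    eta = Z @ beta                       # linear predictors beta Z_i^T
    loglik = 0.0
    for j in np.where(events == 1)[0]:   # loop over the observed deaths
        in_risk_set = times >= times[j]  # R(t_j): lives still under study just before t_j
        loglik += eta[j] - np.log(np.sum(np.exp(eta[in_risk_set])))
    return loglik
```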

Case 2 - Possible ties in the data

  • Now consider ties in the data, i.e. some \(d_j > 1\), or some observations censored at an observed lifetime
    • This complicates the partial likelihood, as one must include the lives censored at time \(t_j\) in the risk set \(R(t_j)\) and, strictly, consider all possible orderings of the simultaneous deaths
  • Partial likelihood (see the sketch after the formula):
    • Note \(s_j\) is the sum of the covariate vectors \(Z\) of the \(d_j\) lives observed to die at time \(t_j\), i.e. a vector of sums, one per covariate
    • This (Breslow-style) approximation works well when the number of ties is relatively small

\[ L = \prod^k_{j=1}\frac{\exp(\beta s_j^\top)}{\left[\sum_{i\in R(t_j)}\exp(\beta Z_i^\top)\right]^{d_j}} \]
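  • The earlier sketch extended to tied data using this approximation; again a rough illustration rather than a production implementation:

```python
def breslow_partial_log_likelihood(beta, times, events, Z):
    """Approximate log partial likelihood allowing tied death times (Case 2)."""
    eta = Z @ beta
    loglik = 0.0
    for t_j in np.unique(times[events == 1]):        # distinct observed death times
        dying = (times == t_j) & (events == 1)        # the d_j lives dying at t_j
        d_j = dying.sum()
        s_j_term = eta[dying].sum()                   # beta s_j^T: summed linear predictors of the dying
        risk_sum = np.sum(np.exp(eta[times >= t_j]))  # denominator over R(t_j)
        loglik += s_j_term - d_j * np.log(risk_sum)
    return loglik
```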

Properties of Maximum Partial Likelihood Estimator

  • The maximum partial likelihood estimator has the same asymptotic properties as an ordinary maximum likelihood estimator

    • Asymptotically unbiased and multivariate normally distributed
    • The estimate is the solution of the score equations, i.e. the first derivatives of the log partial likelihood set to 0
    • The asymptotic variance matrix is the inverse of the information matrix, \((I(\hat{\beta}))^{-1}\), where \(I(\hat{\beta})\) is given by:

    \[ I(\hat{\beta}) = \left(-\frac{\partial^2\ln L(\beta)}{\partial\beta_i\partial\beta_j}|_{\beta=\hat{\beta}}\right)\]

    • An approximate \(100(1-\alpha)\%\) confidence interval for an individual \(\beta_i\) is therefore (a scipy sketch follows):

    \[ \hat{\beta}_i\pm z_{1-\alpha/2}\sqrt{\widehat{Var}(\hat{\beta}_i)} \]
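  • A rough scipy sketch of obtaining \(\hat{\beta}\), an approximate covariance matrix, and confidence intervals by numerically maximising the partial likelihood; it assumes the `cox_partial_log_likelihood` sketch above and data arrays `times`, `events`, `Z`, and uses the BFGS inverse-Hessian approximation in place of the exact \((I(\hat{\beta}))^{-1}\):

```python
from scipy.optimize import minimize

# Minimise the negative log partial likelihood (illustrative only).
neg_loglik = lambda b: -cox_partial_log_likelihood(b, times, events, Z)
res = minimize(neg_loglik, x0=np.zeros(Z.shape[1]), method="BFGS")

beta_hat = res.x                 # maximum partial likelihood estimates
cov_hat = res.hess_inv           # BFGS approximation to (I(beta_hat))^{-1}
var_hat = np.diag(cov_hat)

# Approximate 95% confidence intervals: beta_hat_i +/- z_{0.975} * se_i
ci_lower = beta_hat - 1.96 * np.sqrt(var_hat)
ci_upper = beta_hat + 1.96 * np.sqrt(var_hat)
```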

Significance Testing of Extra \(\beta\)

  • When testing whether it is worth adding a set of \(q\) extra \(\beta\)’s to an existing set of \(p\) \(\beta\)’s, the null hypothesis is:

\[ H_0: \beta_{p+1}=\beta_{p+2}=...=\beta_{p+q}=0\]

Likelihood Ratio Test

  • The test statistic is:

\[ T = -2[\log(L_{p})-\log(L_{p+q})]\]

  • Under the null hypothesis, the test statistic has a \(\chi^2\) distribution with \(q\) degrees of freedom for large \(n\)
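  • Hypothetical illustration: if \(\log L_p = -210.4\) and \(\log L_{p+q} = -207.1\) with \(q = 2\) added covariates, then \(T = -2[(-210.4)-(-207.1)] = 6.6 > 5.991 = \chi^2_2(0.95)\), so the added covariates are jointly significant at the 5% level.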

Wald Test

  • The Wald statistic is:

\[ T = (\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})\underbrace{[Cov(\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})]^{-1}}_{q\times q\text{ matrix}}(\hat{\beta}_{p+1},...,\hat{\beta}_{p+q})^\top\]

  • Under the null hypothesis, the test statistic has a \(\chi^2\) distribution with \(q\) degrees of freedom for large \(n\) (a numpy sketch of the computation follows)
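  • A small numpy sketch of this quadratic form, assuming `beta_hat` and `cov_hat` from the optimisation sketch above, with integers `p` and `q` defined and the last `q` entries of `beta_hat` being the added coefficients (hypothetical layout):

```python
from scipy.stats import chi2

extra = slice(p, p + q)                    # indices of the q added coefficients
b_extra = beta_hat[extra]
V_extra = cov_hat[extra, extra]            # q x q covariance sub-matrix

T_wald = b_extra @ np.linalg.inv(V_extra) @ b_extra   # Wald statistic
p_value = chi2.sf(T_wald, df=q)            # compare against a chi-square_q distribution
```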

Estimation of Survival Function

  • We need to fit a proportional hazards model to the data and obtain the maximum partial likelihood estimators.
  • We also need to estimate the baseline cumulative hazard rate:

\[ \Lambda_0(t)=\int^t_0\lambda_0(s)ds \]

  • Breslow’s estimator of the baseline cumulative hazard rate is given below (a numpy sketch follows at the end of this section):
    • Note that individuals in the denominator are weighted by \(\exp(\hat{\beta}Z_i^\top)\) rather than treated identically; otherwise it would reduce to the Nelson-Aalen (NA) estimate

\[ \hat{\Lambda}_0(t) = \sum_{t_j\leq t}\frac{d_j}{\sum_{i\in R(t_j)}\exp(\hat{\beta}Z_i^\top)}\]

  • Therefore since \(\hat{S}_0(t) = \exp(-\hat{\Lambda}_0(t))\):

\[ \hat{S}(t) = \exp\left(-\hat{\Lambda}_0(t)\exp(\mathbf{\hat{\beta}}Z^\top)\right) = \hat{S}_0(t)^{\exp(\mathbf{\hat{\beta}}Z^\top)}\]
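  • A numpy sketch of Breslow’s estimator and the resulting survival estimate for a life with covariate vector `z_new`; the function names and the convention that lives censored exactly at \(t_j\) remain in the risk set are illustrative assumptions:

```python
import numpy as np

def breslow_cumulative_hazard(beta_hat, times, events, Z, t):
    """Breslow estimate of the baseline cumulative hazard Lambda0_hat(t)."""
    eta = Z @ beta_hat
    total = 0.0
    for t_j in np.unique(times[(events == 1) & (times <= t)]):  # death times up to t
        d_j = np.sum((times == t_j) & (events == 1))            # number of deaths at t_j
        risk_sum = np.sum(np.exp(eta[times >= t_j]))            # weighted risk set R(t_j)
        total += d_j / risk_sum
    return total

def survival_estimate(beta_hat, times, events, Z, t, z_new):
    """Estimated survival S_hat(t) for a life with covariates z_new."""
    Lam0 = breslow_cumulative_hazard(beta_hat, times, events, Z, t)
    return np.exp(-Lam0 * np.exp(z_new @ beta_hat))
```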

Cox-Snell Residuals

  • Cox-Snell residuals are defined as:

\[ e_j = -\log(\hat{S}(X_j;z_j)) = \hat{\Lambda}_0(X_j)\exp\left(\sum^p_{k=1}\hat{\beta}_kz_{jk}\right)\]

  • If the model is correct and \(\hat{\beta}\) are approximately equal to the true values of \(\beta\), then the Cox-Snell residuals \(e_j\) behave as a censored sample from a unit exponential distribution.
    • Therefore to check the goodness of fit of a Cox model, we check whether the Cox-Snell residuals behave as samples from a unit exponential random variable
  • To check this behaviour (see the sketch after this list):
    • Compute the NA estimate of the cumulative hazard rate of the \(e_j\)’s: \(\hat{\Lambda}_E(e_j)\)
    • Plot \(\hat{\Lambda}_E(e_j)\) against \(e_j\); this should be roughly a straight line through the origin with a slope of 1.
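  • A short numpy sketch of the residuals, reusing the `breslow_cumulative_hazard` sketch above; the NA estimate and the plot of \(\hat{\Lambda}_E(e_j)\) against \(e_j\) would then be produced from these values, carrying over the original censoring indicators:

```python
def cox_snell_residuals(beta_hat, times, events, Z):
    """Cox-Snell residuals e_j = Lambda0_hat(X_j) * exp(beta_hat Z_j^T)."""
    eta = Z @ beta_hat
    Lam0 = np.array([breslow_cumulative_hazard(beta_hat, times, events, Z, x)
                     for x in times])
    return Lam0 * np.exp(eta)
```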

Assessment of Proportionality Assumption

  • To check the proportional hazards assumption for a given covariate \(z_1\) after adjusting for all other relevant covariates:
    • Write \(Z = (z_1, Z_2)\), where \(Z_2\) contains the remaining \(p-1\) covariates
    • Assume no interaction between \(z_1\) and the other covariates
    • Assume that \(z_1\) has \(K\) possible values
    • Fit a Cox model stratified on the values of \(z_1\), and let \(\hat{H}_{g0}(t)\) be the estimated cumulative baseline hazard in the \(g\)th stratum
      • This gives \(K\) estimated baseline hazards, which should be proportional to one another if the assumption holds with respect to \(z_1\)

Graphical Diagnostic Tools

  • Plot \(\ln[\hat{H}_{10}(t)], \ln[\hat{H}_{20}(t)], ..., \ln[\hat{H}_{K0}(t)]\) versus \(t\) (see the plotting sketch at the end of this section)
    • The difference between any two of these curves should not depend on \(t\): for strata \(g_1\) and \(g_2\) with covariate values \(z_{1,g_1}\) and \(z_{1,g_2}\), the model implies

      \[ \ln\left(\frac{e^{\beta_1z_{1,g_1}}}{e^{\beta_1z_{1,g_2}}}\right) = \beta_1(z_{1,g_1} - z_{1,g_2})\]

    • Therefore if the assumption holds these curves should be approximately parallel.

  • Could also plot the differences \(\ln[\hat{H}_{20}(t)] - \ln[\hat{H}_{10}(t)], ..., \ln[\hat{H}_{K0}(t)] - \ln[\hat{H}_{10}(t)]\) versus \(t\)
    • This corresponds to plotting the expression above
    • If the assumption holds, each curve should be roughly constant
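  • A rough matplotlib sketch of the first plot, assuming an array `z1` of stratum values, that `beta_hat` has been estimated from a model stratified on `z1` (with `z1` itself excluded from the covariate matrix `Z`), and reusing the `breslow_cumulative_hazard` sketch above:

```python
import matplotlib.pyplot as plt

for g in np.unique(z1):                                        # one curve per stratum of z1
    in_g = (z1 == g)
    t_deaths = np.sort(np.unique(times[in_g & (events == 1)])) # death times in stratum g
    log_H_g0 = [np.log(breslow_cumulative_hazard(beta_hat, times[in_g],
                                                 events[in_g], Z[in_g], t))
                for t in t_deaths]
    plt.step(t_deaths, log_H_g0, where="post", label=f"z1 = {g}")

plt.xlabel("t")
plt.ylabel("log estimated baseline cumulative hazard")
plt.legend()
plt.show()
```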