David Springate

March 2014

But can be used anywhere you want to know what factors affect the time for an event to occur:

- Germination timing
- Arrival of a migrant or parasite
- Dispersal of seeds or offspring
- Failure time in mechanical systems
- Response to stimulus

Where the date of death is unknown but is after some known date

e.g.

- Date of death is after the end of the study
- Subject is removed from the study (patient withdraws, animal escapes, plant gets eaten etc.)

Survival analysis can account for this kind of censoring

Occurs when a subject's survival time is incomplete on the left side of the follow-up period

- e.g. Following up a patient after being tested for an infection, we don't know the exact time of exposure
- Less common
- Survival analysis can account for this (see ref 4)

\[ S(t)=Pr(T>t) \]

The Survival function (\( S \)) is the probability that the time of death (\( T \)) is greater than some specified time (\( t \))

It is composed of:

- The underlying
*Hazard function*(How the risk of death per unit time changes over time at baseline covariates) - The effect parameters (How the hazard varies in response to the covariates)

The most common model used to determine the effects of covariates on survival

\[ h_i(t)=h_0(t)exp(\beta_{1}x_{i1} + \beta_{2}x_{ik} + ... + \beta_{2}x_{ik} ) \]

It is a semi-parametric model:

- The baseline hazard function is unspecified
- The effects of the covariates are multiplicative
- Doesn't make arbitrary assumptions about the shape/form of the baseline hazard function

- Covariates multiply the hazard by some constant
- e.g. a drug may halve a subjects risk of death at any time \( t \)
- The effect is the same at any time point

Uses the `survival`

package.

The response variable is a `Surv`

object taking in

*start time*(after study start)*stop time*(after study start)- whether or not an
*event*occurred

Otherwise, the model is specified in the same way as for a standard regression, using the `coxph`

function

**Toolkit**

`Surv()`

: Define a survival object

`coxph()`

: Run a cox PH regression

`survfit()`

: Fit a survival curve to a model or formula

From the electronic medical records of 13 million UK patients, I identified:

- Patients aged 40-85 with a first diagnosis of colon cancer between 1997 and 2006
- Classified according to exposure to \( \beta \)-blockers in the 1 year period prior to diagnosis
- Patients with heart disease, stroke, asthma, diabetes, kidney disease etc. were excluded
- controlling for sex, age, year of diagnosis, numer of drugs being taken, smoking status, level of social deprivation

1104 patients not on \( \beta \)-blockers - CONTROL

695 patients on \( \beta \)-blockers - TREATMENT

- Start time: time (days) of diagnosis date after the study start date
- Stop time: time to whichever comes first from
- Study end
- patient dies
- patient leaves practice

- event: binary does the patient die?

```
# Create the response variable:
S <- Surv(
time = colon$start,
time2 = colon$stop,
event = colon$death)
```

This includes the *right censoring* information with the survival time

```
model <- coxph(S ~ sex + treatment + age +
year + deprivation +
num_drugs + smoking_status,
data = colon)
```

- Use the
`coxph`

function - The new response variable goes on the LHS
- Otherwise, it uses the same syntax as
`lm()`

,`glm()`

etc.

The

`exp(coef)`

variable is the

- HR = 1: No effect
- HR < 1: Reduction in the hazard
- HR > 1: Increase in Hazard

```
# Plot the baseline survival function
plot(survfit(model),
xscale = 365.25,
xlab = "Years after diagnosis",
ylab = "Proportion survived",
main = "Baseline Hazard Curve")
```

First, build a dataframe for the effect levels you want to look at

Other covariates are held at their lowest level (or means if they are continuous)

```
treat <- with(colon,
data.frame(
treatment = levels(treatment),
age = rep(levels(age)[1], 2),
sex = rep(levels(sex)[1], 2),
year = rep(levels(year)[1], 2),
deprivation = rep(levels(deprivation)[1], 2),
num_drugs = rep(levels(num_drugs)[1], 2),
smoking_status = rep(
levels(smoking_status)[1], 2)))
```

First, build a dataframe for the effect levels you want to look at

Other covariates are held at their lowest level (or means if they are continuous)

```
str(treat)
```

```
'data.frame': 2 obs. of 7 variables:
$ treatment : Factor w/ 2 levels "BBs","No BBs": 2 1
$ age : Factor w/ 1 level "55-65": 1 1
$ sex : Factor w/ 1 level "M": 1 1
$ year : Factor w/ 1 level "1997-1998": 1 1
$ deprivation : Factor w/ 1 level "0": 1 1
$ num_drugs : Factor w/ 1 level "0-4": 1 1
$ smoking_status: Factor w/ 1 level "0": 1 1
```

Then add the `newdata`

arg to call to `survfit`

:

```
plot(survfit(model, newdata = treat),
xscale = 365.25,
conf.int = TRUE,
xlab = "Years after diagnosis",
ylab = "Proportion survived",
col = c("red", "green"))
legend(8, 0.9,
legend = c("Beta blockers",
"No beta blockers"),
lty = 1,
col = c("green", "red"))
```

Hypothesis test for proportional hazards:

```
# Note this is very conservative!
cox.zph(model)
```

It is often better to check the graphs for systematic trends:

```
a <- cox.zph(model)
par(mfrow = c(3, 1))
plot(a[1], main = "Sex")
plot(a[6], main = "diagnosis year")
plot(a[18], main = "Smoking status")
```

Include an interaction with time for the variables:

- This factors time out of the main effect
- Only use if it makes sense to have a linear interaction between the covariate and time (look at the graphs!)

```
mod2 <- coxph(S ~ sex + treatment + age +
year + deprivation +
num_drugs + year + year:stop + smoking_status,
data = colon)
```

Note you don't need to include `stop`

as a main effect since it is already factored in `S`

Alternatively, Stratify the data by that covariate

```
mod3 <- coxph(S ~ sex + treatment + age +
deprivation +
num_drugs + strata(year) +
smoking_status,
data = colon)
```

- Each strata has a different baseline hazard function but the remaining covariates are assumed to be constant
- Does not assume a linear relationship
**But**you cannot examine the effects of the stratification variable- though you can plot them …

`survfit`

plots a survival curve for each strata

```
plot(survfit(mod3), col = 1:5)
legend(11, 1,
legend = levels(colon$year),
lty = 1,
col = 1:5,
title = "Diagnosis year")
```

e.g. several plants in a family, patients in practices etc.

Deal with this by using `cluster(variable)`

:

```
mod3 <- coxph(S ~ sex + treatment + age +
deprivation +
cluster(practice) +
num_drugs + year +
smoking_status,
data = colon)
```

- Robust standard errors are produced taking into account the clusters
- Simpler than a full mixed-effects model

- A Package for Survival Analysis in S
- Cox Proportional-Hazards Regression for Survival Data
- CRAN Task View: Survival Analysis
- Use Software R to do Survival Analysis and Simulation. A tutorial
- Using Time Dependent Covariates and Time Dependent Coefficients in the Cox Model
- PermAlgo: Permutational algorithm to simulate survival data