Class 1. Introduction

STA779: Applied Survival Analysis

Sarah C. Lotspeich, PhD.

A morbid curiosity

After 150+ years of bubonic plague, King Henry VIII ordered priests to report plague deaths in 1518.

  • Bills of mortality: London’s weekly mortality statistics.
  • Over time, these shifted to recording deaths from all causes, creating the world’s first continuous public health database.

Figure 1. Read more: Birch, Thomas. Collection of Yearly Bills of Mortality, from 1657 to 1758 Inclusive. 1759.

Fathers of survival

In the mid- to late-1600s, two Englishmen decided to start counting up everyone who had died (and sometimes why).

Figure 2. (Alleged) portrait of Captain John Graunt (1620-1674).

Figure 3. Portrait of Edmond Halley (1656-1742).

It all starts with the end

More than 360 years ago, “survival analysis” was born out of seventeenth-century mortality studies.

  • 1662: It started with John Graunt and the London Bills of Mortality, and
  • 1693: It continued with Edmond Halley and his Breslau lifetables.

Figure 4. Breslau lifetable (1687–1691)

Their datasets differed slightly, as did their intentions.

Time to death \(\rightarrow\) time to anything

Graunt and Halley started out by measuring time to death, but in the centuries since, survival analysis methods have been applied far more widely.

  1. During World War II (1939–1945), the reliability of military equipment became a critical issue. This led to the study of the durability or “lifetime” of industrial devices (rather than people).
  2. After the war, cancer researchers adopted what engineers called “lifetime analysis” and rebranded it as “survival analysis.”
  3. Today, survival analysis is one of the most frequently used methods in many disciplines, including medicine, epidemiology, environmental health, criminology, marketing, and astronomy.

Examples of time-to-event data

Survival analysis methods are most often employed with time-to-event or failure-time data.

Examples might include:

  • Time to death (in days),
  • Time to onset of a disease (in days),
  • Time to relapse of a disease (in days), or
  • Length of stay in a hospital, i.e., time to discharge (in days).

And really, it doesn’t have to be “time” at all

More generally, survival analysis methods can be used on any positive, real-valued random variable, like

  • Money paid by health insurance (in dollars),
  • Viral load labs for patients with HIV (in copies/milliliter),
  • Patients’ weight after starting a treatment (in pounds), or
  • Distance run in a 1-hour period (in miles or kilometers).

However, the “time” variables are still the most common.

Your turn: Examples of “survival” data

Come up with some examples (from any discipline) of time-to-event outcomes that we could model with survival analysis methods.

Your turn: Examples of “survival” data (…)

For inspiration, here are some more of Dr. L’s examples:

  • Bridgerton-inspired: The time (in weeks) between when a debutante “debuts” in society and secures a marriage proposal from an eligible suitor.
  • Netflix binge: How long (in hours or days) it takes you to finish watching all the episodes of a show on Netflix.
  • Classroom meta: How long (in minutes) it takes a student to complete their survival analysis exam.

Your turn: Examples of “survival” data (…)

  • Reality TV: How many episodes until the first in-villa breakup on a season of the TV show “Love Island.”
  • Childhood development: How long (in weeks) from birth until a child begins talking (or rolling over, or walking, etc.).
  • Publishing success: How many books an author writes until they write their first NYT Bestseller.

Course objectives

In this course, we will discuss how to do the following.

  • Diagnose different types of survival data and choose appropriate methods to analyze them.
  • Identify basic quantities and models, and compare them between samples or subgroups.
  • Conduct basic inference for the associations between time-to-event outcomes and covariates.
  • Know and test all of the assumptions needed for valid inference, and apply methods to correct for violations.
  • Apply all of these concepts in R.

Connection to discrete outcomes

As the name time-to-event suggests, we are often interested in the absence or presence of an event as our outcome.

  • Such an outcome is captured as a binary variable like \[\Delta = \textrm{I}(\textrm{Event Happened}),\] where \(\textrm{I}(\cdot)\) is the indicator function that equals \(1\) if the condition is true and \(0\) otherwise.
  • If data were collected over a fixed time period (like 2 years) and you were interested in comparing 2-year mortality between subgroups, you might fit a logistic regression.

Connection to discrete outcomes (…)

  • But we often don’t collect data this way and/or aren’t interested in these questions.
  • Plus, this model throws away potentially valuable information about how quickly people in different subgroups died. You’d need survival methods to capture that.
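The following minimal sketch makes the point about discarded information concrete. It is written in Python for illustration only; the course itself works in R. The event times are made-up values for two hypothetical subgroups, and every time is fully observed (no censoring). The helper names `two_year_indicator` and `mean` are hypothetical. Both subgroups have the same 2-year mortality, so \(\Delta = \textrm{I}(X \le 2)\) alone cannot tell them apart, yet the early deaths in subgroup A happen much sooner.

```python
# Hypothetical, fully observed event times (in years) for two made-up subgroups.
times_a = [0.2, 0.4, 0.5, 3.0, 4.0]
times_b = [1.6, 1.8, 1.9, 3.5, 5.0]


def two_year_indicator(times, window=2.0):
    """Delta = I(X <= window): 1 if the event happened within the window, else 0."""
    return [1 if x <= window else 0 for x in times]


def mean(values):
    return sum(values) / len(values)


delta_a = two_year_indicator(times_a)
delta_b = two_year_indicator(times_b)

# Same 2-year mortality in both subgroups: 3 of 5 died within 2 years.
print(mean(delta_a), mean(delta_b))  # 0.6 0.6

# But among those who died within 2 years, subgroup A died much sooner.
early_a = [x for x in times_a if x <= 2.0]
early_b = [x for x in times_b if x <= 2.0]
print(round(mean(early_a), 2), round(mean(early_b), 2))
```

The average among early deaths is only a crude illustration, not a survival method. Once some times are censored, naive summaries like this become unreliable. That is part of why dedicated survival methods are needed.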

Data collection

Survival data can be collected in many ways, but most commonly through:

  1. Clinical trials: “A research study in which one or more human subjects are prospectively assigned to one or more interventions… to evaluate the effects of those interventions on health-related biomedical or behavioral outcomes” (National Institutes of Health)
  2. Prospective cohort studies: “A research study that follows over time groups of individuals who are alike in many ways but differ by a certain characteristic… and compares them for a particular outcome” (National Cancer Institute)
  3. Retrospective cohort studies: “A research study in which the medical records of groups of individuals who are alike in many ways but differ by a certain characteristic are compared for a particular outcome” (National Cancer Institute)

There is one major way in which survival data tend to be unique: they are often not fully observed, but rather censored or truncated.

Your turn: Data collection examples

Researchers are interested in whether smoking (an exposure) increases patients’ risk of developing lung cancer (an outcome).

Describe how each of the following ways to collect data could be used to answer this question.

  1. Clinical trials:
  2. Prospective cohort studies:
  3. Retrospective cohort studies:

Defining a survival outcome

Defining a failure-time variable \(X\) requires the following:

  1. A clear time origin (i.e., when you started counting \(X\) or what \(X = 0\) represents),

  2. A set time scale (i.e., how you’re going to count or increment \(X\)),

  3. A definition of the failure event (i.e., what you’re interested in or where \(X\) is intended to “stop”).

We decide these things based on our data/context.
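Here is a minimal Python sketch of how the three ingredients combine into a single number \(X\). It assumes a hypothetical mortality study in which the origin is the enrollment date, the scale is days (or years), and the event is death. The function `failure_time`, the dates, and the 365.25-days-per-year conversion are illustrative choices, not part of the course materials. Changing any one ingredient, such as using diagnosis date as the origin instead of enrollment, defines a different failure-time variable.

```python
from datetime import date

# Assumed convention for converting days to years (accounts for leap years on average).
DAYS_PER_YEAR = 365.25


def failure_time(origin: date, event: date, scale: str = "days") -> float:
    """Return X = time from the origin (X = 0) to the failure event on the chosen scale."""
    if event < origin:
        raise ValueError("the event cannot occur before the time origin")
    days = (event - origin).days
    if scale == "days":
        return float(days)
    if scale == "years":
        return days / DAYS_PER_YEAR
    raise ValueError(f"unsupported scale: {scale}")


# Hypothetical subject: enrolled (origin) on 2020-03-01, died (event) on 2021-03-01.
enrolled = date(2020, 3, 1)
died = date(2021, 3, 1)
print(failure_time(enrolled, died))  # 365.0
print(round(failure_time(enrolled, died, "years"), 3))
```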

Your turn: Identifying key factors

Consider a cohort of people living with HIV. Researchers are interested in studying the time between antiretroviral treatment initiation and AIDS diagnosis (in years). (They are interested in comparing this between subgroups; we’ll get to that later.)

Define a failure-time outcome \(X\) that would help them answer this question. Specifically, define \(X\)’s…

  1. Origin:
  2. Scale:
  3. Event:

Figure 5. Example of (right) censored survival data.

Other typical features of survival data

Figure 5 illustrates other typical features of survival data:

  • Staggered entry: Individuals can enter the study at different times.
    • Ex: The study may start when the funding comes through, but individuals are recruited (and therefore, enter the study) on a rolling basis thereafter.

Other typical features of survival data (…)

  • Censoring: Not everyone had the event during the study, but they (probably) aren’t immune.
    • Ex: Someone still alive at the end of a 10-year study of mortality (probably) isn’t immortal. They will one day die, but that day must be after the study ends.
    • Subjects 2-4 are all censored, but at different times/for different reasons.
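Staggered entry and right censoring can be combined in a minimal Python sketch. It uses the common notation \(Y = \min(X, C)\) and \(\Delta = \textrm{I}(X \le C)\), where \(C\) is a censoring time. The study end date, the `Subject` fields, and the three subjects are hypothetical. Each subject's time origin is their own entry date rather than the study start, so subjects who enter later can be followed for less time. A subject is censored either by loss to follow-up or by the study ending first.

```python
from datetime import date
from typing import NamedTuple, Optional, Tuple

# Hypothetical calendar date when follow-up ends for everyone (administrative censoring).
STUDY_END = date(2030, 1, 1)


class Subject(NamedTuple):
    entry: date                     # calendar date of enrollment (staggered entry)
    event: Optional[date] = None    # calendar date of the event, if observed
    dropout: Optional[date] = None  # calendar date of loss to follow-up, if any


def observed_data(s: Subject) -> Tuple[int, int]:
    """Return (Y, Delta): days from entry to event or censoring, and the event indicator."""
    # Censoring time C: loss to follow-up or the end of the study, whichever is first.
    censor = STUDY_END if s.dropout is None else min(s.dropout, STUDY_END)
    if s.event is not None and s.event <= censor:
        return (s.event - s.entry).days, 1  # event seen: Y = X, Delta = 1
    return (censor - s.entry).days, 0       # right-censored: Y = C, Delta = 0


subjects = [
    Subject(entry=date(2020, 1, 1), event=date(2023, 6, 1)),    # event during follow-up
    Subject(entry=date(2021, 6, 1)),                            # event-free at study end
    Subject(entry=date(2022, 1, 1), dropout=date(2024, 1, 1)),  # lost to follow-up
]
for s in subjects:
    print(observed_data(s))
```

The second and third subjects are both censored, but at different times and for different reasons. This mirrors the subjects in the figure.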

Activity: Solving a Sudoku puzzle

Sudoku is a logic puzzle played on a 9 × 9 grid that is further divided into nine smaller 3 × 3 boxes.

Goal: Fill the grid with the numbers 1 to 9 so that every row, column, and 3 × 3 box contains each number exactly once.

Activity: Solving a Sudoku puzzle (…)

Let’s design an experiment about how quickly people finish Sudoku puzzles. How would you collect these data? What are some explanatory variables / predictors that might be of interest?

Activity: Solving a Sudoku puzzle (…)

We are going to conduct a randomized experiment and collect these data for ourselves.

Plot twist: There are four different versions of the Sudoku puzzles! They differ in the types of characters used (above).

Activity: Solving a Sudoku puzzle (…)

  1. Without looking, take a Sudoku puzzle.
  2. Once “start” is called, begin the puzzle.
  3. Once you finish, check the clock and write down the time.
  4. Record your time and puzzle version in the Google Sheet.

Once everyone has finished, we will analyze the results!
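As a preview of one way to summarize the results, here is a minimal Python sketch. The rows, the version labels "A" and "B", and the column layout are all hypothetical. The real Google Sheet may be organized differently, and only two of the four versions are shown for brevity. Because every solver finishes, no times are censored (every \(\Delta = 1\)). The share of solvers still working at time \(t\) is therefore just the fraction of finishing times greater than \(t\), computed separately for each version.

```python
from collections import defaultdict

# Hypothetical rows from the class sheet: (minutes to finish, puzzle version).
rows = [
    (12.5, "A"), (9.0, "A"), (15.0, "A"),
    (20.0, "B"), (18.5, "B"), (25.0, "B"),
]


def group_times(rows):
    """Collect finishing times (in minutes) by puzzle version."""
    times = defaultdict(list)
    for minutes, version in rows:
        times[version].append(minutes)
    return dict(times)


def still_working(times, t):
    """Share of solvers who had NOT finished by t minutes.
    Valid here only because every solver finished (no censored times)."""
    return sum(1 for x in times if x > t) / len(times)


by_version = group_times(rows)
for version, times in sorted(by_version.items()):
    print(version, still_working(times, 15.0))
```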