Fake Fleas
The Scenario
I’m going to imagine that you have two groups of 60 fleas each. The treatment group has been prepared to withstand salt. I’ll imagine further that once you expose the fleas to salt, you check on them every six hours. Thus for each flea you’ll know its survival time only to within a six-hour interval:
- 0-6 hours
- 6-12 hours …
- and so on until
- 42-48 hours
- 48 hours and up (if the flea is still alive by the end of the study)
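To see where these intervals come from, here is a small sketch using made-up “true” death times for five hypothetical fleas. The names `true_death` and `checks` are illustrative only. `findInterval()` finds the last check at or before each death, which is the last time the flea was seen alive; the next check is the first time it was seen dead.

```r
# Hypothetical true death times (hours) for five fleas; in a real study these are never seen.
true_death <- c(3.2, 14.9, 29.5, 47.5, 55.1)
checks <- seq(0, 48, by = 6)          # observation times: 0, 6, 12, ..., 48

k <- findInterval(true_death, checks) # index of the last check at or before death
time1 <- checks[k]                    # last check at which the flea was alive
# First check at which the flea was dead; NA if it outlived the 48-hour study.
time2 <- ifelse(true_death >= 48, NA, checks[k + 1])

data.frame(true_death, time1, time2)
```

The last flea outlives the study, so all we record for it is that it was alive at 48 hours.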
The Data
Here is some fake data:
Each row is a flea in the study. The grp variable says whether the flea is from treatment or control.
The time1 variable is the left endpoint of the interval in which the flea died, and the time2 variable is the right endpoint. If the flea is still alive at the end of the study, its time1 is 48 and its time2 is blank (missing).
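If you want a data set in this layout to experiment with, here is one way to simulate one. This is a sketch under made-up assumptions, not the data shown above: death times are drawn from exponential distributions with hypothetical means of 20 hours (control) and 40 hours (treatment), and the seed is arbitrary.

```r
set.seed(1)                          # arbitrary seed, for reproducibility
n <- 60                              # fleas per group
checks <- seq(0, 48, by = 6)         # observation times in hours

# Hypothetical true death times; the means (20 h and 40 h) are invented for illustration.
true_death <- c(rexp(n, rate = 1 / 20), rexp(n, rate = 1 / 40))
k <- findInterval(true_death, checks)

fleas <- data.frame(
  grp   = rep(c("control", "treatment"), each = n),
  time1 = checks[k],                                   # last check alive
  time2 = ifelse(true_death >= 48, NA, checks[k + 1])  # first check dead (NA if survived)
)

head(fleas)
table(fleas$grp, survived = is.na(fleas$time2))
```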
The Model: Kaplan-Meier Estimation
We’ll use the survival package:
We now make Kaplan-Meier estimates for the survival curves.
First, we create the survival object:
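library(survival)

# Surv() status codes for type = "interval" (see the help file for Surv):
#   0 = right-censored, 1 = exact event time,
#   2 = left-censored,  3 = interval-censored.
# A flea that died is interval-censored: it died between time1 and time2.
# A flea still alive at 48 hours has no time2, so it is coded as
# right-censored at time1 = 48 instead of being dropped as missing.
fleas$event <- ifelse(is.na(fleas$time2), 0, 3)
survObj <- with(fleas, Surv(time = time1, time2 = time2,
                            event = event, type = "interval"))
fleas$survObj <- survObj
Now the model: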
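The model code itself did not survive, but the plotting call in the next section uses an object named `kmByGroup`. Here is a minimal sketch, assuming that object is a `survfit()` fit with one curve per group. Because each death time is known only to within an interval, `survfit()` computes Turnbull’s estimate, the generalization of Kaplan-Meier to interval-censored data. The toy data frame is hypothetical.

```r
library(survival)

# Hypothetical toy data in the layout described above (not the article's data).
fleas <- data.frame(
  grp   = rep(c("control", "treatment"), each = 5),
  time1 = c(0, 6, 6, 12, 24,  12, 24, 36, 48, 48),
  time2 = c(6, 12, 12, 18, 30, 18, 30, 42, NA, NA)
)
fleas$event <- ifelse(is.na(fleas$time2), 0, 3)   # 3 = interval-censored, 0 = alive at 48 h
fleas$survObj <- with(fleas, Surv(time = time1, time2 = time2,
                                  event = event, type = "interval"))
print(fleas$survObj)   # shows how each flea's outcome was encoded

# Assumed form of the missing model line: one survival curve per level of grp.
kmByGroup <- survfit(survObj ~ grp, data = fleas)
print(kmByGroup)
```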
Your Basic Plot
For plotting we’ll use the excellent ggsurvplot() function from the survminer package:
The plot below shows estimated survival rates at various times, for each of the two groups.
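library(survminer)
library(ggplot2)   # for theme_bw()

ggsurvplot(fit = kmByGroup, data = fleas,
           conf.int = TRUE,                      # shade 95% confidence intervals
           palette = c("#E7B800", "#2E9FDF"),    # one colour per group
           ggtheme = theme_bw())
The shaded areas around each curve are approximate 95% confidence “bands”. Strictly speaking they are pointwise intervals: at each time, we are 95% confident, based on the data, that the “real” survival rate for that group (the one we would see if we could put ALL fleas into the vat instead of just 60) lies within the band. So the real survival curve for each group should lie more or less within the band surrounding its estimated curve.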
The band for the treatment group lies mostly above the band for the control group. Although this is an informal, visual comparison, it is strong evidence that survival rates for treated fleas are higher than survival rates for untreated fleas.
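To read the numbers behind the plot instead of eyeballing the bands, you can call `summary()` on a `survfit` object with a `times` argument. Here is a minimal sketch on hypothetical toy data. The chosen hours are arbitrary, and times past the last observation in a group are simply left out for that group.

```r
library(survival)

# Hypothetical toy data, coded as in the survival-object step.
fleas <- data.frame(
  grp   = rep(c("control", "treatment"), each = 5),
  time1 = c(0, 6, 6, 12, 24,  12, 24, 36, 48, 48),
  time2 = c(6, 12, 12, 18, 30, 18, 30, 42, NA, NA)
)
fleas$event <- ifelse(is.na(fleas$time2), 0, 3)
kmByGroup <- survfit(Surv(time1, time2, event, type = "interval") ~ grp,
                     data = fleas)

# Estimated survival per group at selected hours
# (the printout includes confidence limits when the fit provides them).
s <- summary(kmByGroup, times = c(12, 24, 36, 48))
s
```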
Notes
The File We Start With
You can create an Excel file like the data table I showed above, with one row for each flea and three columns: one for the group (grp), another for the last time the flea was found alive (time1), and a third for the first time the flea was found dead (time2). You can leave the time2 entry blank if the flea is still alive after 48 hours.
Save your Excel file as a csv file.
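Once the file is saved, `read.csv()` reads it into a data frame, and blank cells in a numeric column come in as `NA`. That is exactly how the survivors’ missing time2 values are represented. In practice you would call something like `read.csv("fleas.csv")`, where the file name is hypothetical. The sketch below writes a tiny made-up CSV to a temporary file so that it runs on its own.

```r
# Hypothetical CSV contents; in practice this is the file you saved from Excel.
csv_path <- tempfile(fileext = ".csv")
writeLines(c("grp,time1,time2",
             "control,0,6",
             "control,18,24",
             "treatment,42,48",
             "treatment,48,"),      # blank time2: flea still alive at 48 hours
           csv_path)

fleas <- read.csv(csv_path)
str(fleas)   # time2 is numeric, with NA where the cell was blank
```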
Time Interval
The more frequent the observations, the better. One every hour would be great, but as you can see, we can handle coarser time intervals. You don’t even have to space the intervals evenly. Kaplan-Meier can deal with all of that.
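For example, here is a sketch with hypothetical fleas checked at uneven times (0, 2, 5, 12, 24 and 48 hours). The code is the same as for evenly spaced checks. Only the interval endpoints in the data change.

```r
library(survival)

# Hypothetical fleas checked at uneven times: 0, 2, 5, 12, 24 and 48 hours.
irregular <- data.frame(
  time1 = c(0, 2, 2, 5, 12, 24, 48, 48),
  time2 = c(2, 5, 5, 12, 24, 48, NA, NA)
)
# Same coding as before: 3 = died between time1 and time2, 0 = alive at 48 h.
irregular$event <- ifelse(is.na(irregular$time2), 0, 3)

fit <- survfit(Surv(time1, time2, event, type = "interval") ~ 1,
               data = irregular)
summary(fit)
```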