I think Sean’s analysis should use survival statistics. Let me try to prove that point.



First the data.

library(dplyr)
mortality_rate = 1 / 1000
cancer_rate = 1 / 2000
effect_of_treatment = 2
n <- 50000

control_lifespan   <- rweibull(n, 5, 1/mortality_rate)
treatment_lifespan <- rweibull(n, 5, 1/(mortality_rate * effect_of_treatment))

control_onset   <- rexp(n, cancer_rate)
treatment_onset <- rexp(n, cancer_rate * effect_of_treatment)

data <- data.frame(
  group =    c(rep("control", n), 
               rep("treated", n)),
  lifespan = c(round(control_lifespan), 
               round(treatment_lifespan)),
  onset =    c(round(control_onset), 
               round(treatment_onset))
) %>%
  mutate(
    cancer = onset < lifespan
  )

head(data)
    group lifespan onset cancer
1 control      998   807   TRUE
2 control      890  1890  FALSE
3 control      626   209   TRUE
4 control     1099   944   TRUE
5 control      640   989  FALSE
6 control      564  2070  FALSE

We have simulated two populations, one treated, one control. The treated population has double the mortality rate of the control and double the cancer rate. We estimate the date of cancer onset for a sample from an exponential distribution. If cancer onset occurs before death then we label the animal as having cancer at death.



Let’s show off the survival curves.

library(ggplot2)
library(scales)

ggplot(data, aes(lifespan, color=cancer)) + 
  stat_ecdf() + 
  facet_grid(group ~ .) +
  scale_y_reverse() +
  theme_bw()



Both controls and treated groups have approximately equal rates of cancer at death.

table(data$group, data$cancer)
         
          FALSE  TRUE
  control 31736 18264
  treated 31790 18210



So a logistic regression shows no effect of treatment.

model <- glm(cancer ~ group,
             family = "binomial",
             data=data)

round(coef(summary(model)), 3)
             Estimate Std. Error z value Pr(>|z|)
(Intercept)    -0.553      0.009 -59.489    0.000
grouptreated   -0.005      0.013  -0.355    0.723

But, we know from our setup that treatment doubles the cancer rate. This is not apparent in the data because treatment also doubled the overall mortality rate.



We can see the increased risk if we estimate the rate of lifetime cancer incidence divided by the the number of days at risk.

data %>%
  group_by(group) %>%
  summarize(rate=weighted.mean(cancer / lifespan, lifespan))
Source: local data frame [2 x 2]

    group         rate
1 control 0.0003979414
2 treated 0.0007927002

Now treatment risks are doubled, as expected.



We can also use regression to find this value by using a cox proportional hazards model with interval censoring. Cliff AB on Stats Exchange gave a great explanation:

with current status data, if we inspect subject i at time ci, and the event has occurred, we record their interval as (0,ci) (i.e. we know the event occurred before ci), but if it has not occurred, we say (ci,∞) (i.e. all we know is that the event has not occurred by time ci). (source)

library(icenReg)

# Add censoring information (like Cliff suggested)
data <- data %>%
  mutate(left_censor = ifelse(cancer, 0, lifespan),
         right_censor = ifelse(cancer, lifespan, Inf))
head(data, 3)
    group lifespan onset cancer left_censor right_censor
1 control      998   807   TRUE           0          998
2 control      890  1890  FALSE         890          Inf
3 control      626   209   TRUE           0          626
# Model at will
model <- ic_par(
  Surv(left_censor, right_censor, type='interval2') ~ group, 
  data=data
)

summary(model)

Model:  Cox PH 
Baseline:  weibull 
Call: ic_par(formula = Surv(left_censor, right_censor, type = "interval2") ~ 
    group, data = data)

             Estimate Exp(Est) Std.Error z-value      p
log_shape    -0.02631    0.974   0.02345  -1.122 0.2617
log_scale     7.27500 1443.000   0.01916 379.600 0.0000
grouptreated  0.67030    1.955   0.01901  35.260 0.0000

final llk =  -64623.86 
Iterations =  42 

Lo! Treatment increases risk 2 fold. Cliff is correct. Thanks Cliff!