The rats data set in the survival package in R can be loaded by typing “data(rats)”. It contains data on 300 rats-3 selected from each of 100 litters-that were followed for tumor incidence. The variable time is the follow-up times and the variable status is the event indicator. a. How many rats had tumors? What was the total rat-time contributed by all the animals? What was the incidence rate for tumors?

library(survival)
##data(rats, package="survival")

rats1 <- rats[ which(rats$status=='1'), ] ##no. of rats who had tumor
##nrows(rats1)
sum(rats$time) ##total rat time
## [1] 27132

42 rats had tumors incidence rate=42/27132=0.00154

  1. Calculate a Wald 95% confidence interval for the true incidence rate lambda. (which is also the exponential rate parameter and hazard).
## m=42
## T=27132
0.00154+1.96*sqrt(42/(27132^2))
## [1] 0.002008165
0.00154-1.96*sqrt(42/(27132^2))
## [1] 0.001071835
  1. Calculate a log-transformed 95% confidence interval for lambda.
(42/27132)*exp(1.96*sqrt(1/42))
## [1] 0.002094658
(42/27132)*exp(-1.96*sqrt(1/42))
## [1] 0.001143989
  1. Use the function survreg to fit an exponential distribution and calculate the 95% confidence interval for lambda. Does it agree with the Wald or log-transformed confidence intervals?
expfit<-survreg(Surv(time, status) ~1, data=rats, dist="exponential")
summary(expfit)
## 
## Call:
## survreg(formula = Surv(time, status) ~ 1, data = rats, dist = "exponential")
##             Value Std. Error    z      p
## (Intercept) 6.471      0.154 41.9 <2e-16
## 
## Scale fixed at 1 
## 
## Exponential distribution
## Loglik(model)= -313.8   Loglik(intercept only)= -313.8
## Number of Newton-Raphson Iterations: 5 
## n= 300
exp(-coef(expfit))
## (Intercept) 
## 0.001547988
confint(expfit)
##               2.5 %   97.5 %
## (Intercept) 6.16837 6.773229
exp(-confint(expfit))
##                   2.5 %      97.5 %
## (Intercept) 0.002094646 0.001143995

It agrees more with log-transformed confidence intervals.

  1. Fit a Weibull distribution and get point estimates and 95% confidence intervals for the rate and shape parameters. Is the exponential distribution a good model for this data?
weibfit<-survreg(Surv(time, status) ~1, data=rats, dist="weibull")
summary(weibfit)
## 
## Call:
## survreg(formula = Surv(time, status) ~ 1, data = rats, dist = "weibull")
##               Value Std. Error     z      p
## (Intercept)  5.0791     0.0825 61.55 <2e-16
## Log(scale)  -1.2995     0.1425 -9.12 <2e-16
## 
## Scale= 0.273 
## 
## Weibull distribution
## Loglik(model)= -287.1   Loglik(intercept only)= -287.1
## Number of Newton-Raphson Iterations: 9 
## n= 300
lambda<-exp(-coef(weibfit))
lambda
## (Intercept) 
## 0.006225671
confint(weibfit)
##                2.5 %   97.5 %
## (Intercept) 4.917331 5.240818
exp(-confint(weibfit))
##                   2.5 %      97.5 %
## (Intercept) 0.007318641 0.005295925

Compared to Weibull distribution, exponential distribution is a good model for this data.