Question 2. Using the built-in data set rats in the survival package, we will investigate the effect of sex on the onset of tumors. Since sex is encoded as m for males and f for females, you can use sex == “m” to get males and sex == “f” to get females. In a regression model, R will use females as the reference category because “f” comes before “m” alphabetically. To remind you of this, it labels the coefficient comparing males to females sexm in the output. a. Calculate the tumor incidence rates among if for female rats and m for male rats. Get a point estimate and 95% confidence interval for the incidence rate ratio p = lambdam=lambdaf .
library(survival)
rats<-rats
###Q2a
tum_ratf<- subset(rats, sex=="f" & status==1)
ratf<-subset(rats, sex=="f")
sum(ratf$time)
## [1] 13414
##incidence rate for female
40/13414
## [1] 0.002981959
tum_ratm<- subset(rats, sex=="m" & status==1)
ratm<-subset(rats, sex=="m")
sum(ratm$time)
## [1] 13718
##incidence rate for male
2/13718
## [1] 0.0001457938
##IRR
0.0001457938/0.002981959
## [1] 0.04889195
#95% CI
0.04889195 * exp(1.96*0.7245688)
## [1] 0.2023032
0.04889195 * exp((-1.96)*0.7245688)
## [1] 0.01181604
####Q2b
lexpfit<-survreg(Surv(time,status) ~ sex, data=rats, dist="exponential")
summary(lexpfit)
##
## Call:
## survreg(formula = Surv(time, status) ~ sex, data = rats, dist = "exponential")
## Value Std. Error z p
## (Intercept) 5.815 0.158 36.78 < 2e-16
## sexm 3.018 0.725 4.17 3.1e-05
##
## Scale fixed at 1
##
## Exponential distribution
## Loglik(model)= -292.3 Loglik(intercept only)= -313.8
## Chisq= 43 on 1 degrees of freedom, p= 5.5e-11
## Number of Newton-Raphson Iterations: 7
## n= 300
exp(-coef(lexpfit))
## (Intercept) sexm
## 0.002981959 0.048891968
exp(-sum(coef(lexpfit)))
## [1] 0.0001457938
exp(-confint(lexpfit))
## 2.5 % 97.5 %
## (Intercept) 0.004065261 0.002187333
## sexm 0.202298016 0.011816352
Question 4. a. Add the predicted cumulative hazard functions for male and female rats from the exponential AFT model in Exercise 2. Use the options color and/or lty to differentiate these lines from the Nelson-Aalen estimates.
###Q4a
ratsNA <- survfit(Surv(time, status) ~ sex, data = rats)
plot(ratsNA,
fun = "cumhaz", mark.time = FALSE, lty = c("solid", "dashed"),
main = "Cumulative hazard estimates",
xlab = "Days after treatment", ylab = "Cumulative hazard"
)
legend("topleft", lty = c("solid", "dashed"), legend = c("Female", "Male"))
H<- function(t, lambda) exp(-lambda*t)
lambda0<-exp(-coef(lexpfit)["(Intercept)"])
x<-seq(0,120,.1)
lines(x, H(x,lambda0), lty="solid", col="gray")
#Q4B
wfit<-survreg(Surv(time, status) ~ sex, data = rats, dist = "weibull")
summary(wfit)
##
## Call:
## survreg(formula = Surv(time, status) ~ sex, data = rats, dist = "weibull")
## Value Std. Error z p
## (Intercept) 4.8904 0.0613 79.73 < 2e-16
## sexm 0.8231 0.2262 3.64 0.00027
## Log(scale) -1.3129 0.1409 -9.32 < 2e-16
##
## Scale= 0.269
##
## Weibull distribution
## Loglik(model)= -264.8 Loglik(intercept only)= -287.1
## Chisq= 44.58 on 1 degrees of freedom, p= 2.4e-11
## Number of Newton-Raphson Iterations: 12
## n= 300
coef(wfit)
## (Intercept) sexm
## 4.8904421 0.8230543
confint(wfit)
## 2.5 % 97.5 %
## (Intercept) 4.7702270 5.010657
## sexm 0.3797503 1.266358
vcov(wfit)
## (Intercept) sexm Log(scale)
## (Intercept) 0.003762031 0.003257871 0.006225618
## sexm 0.003257871 0.051157259 0.016159501
## Log(scale) 0.006225618 0.016159501 0.019852297
exp(-coef(wfit))
## (Intercept) sexm
## 0.007518098 0.439088484
gamma <- 1 / wfit$scale
gamma
## [1] 3.716782
ratsNA <- survfit(Surv(time, status) ~ sex, data = rats)
plot(ratsNA,
fun = "cumhaz", mark.time = FALSE, lty = c("solid", "dashed"),
main = "Cumulative hazard estimates",
xlab = "Days after treatment", ylab = "Cumulative hazard"
)
legend("topleft", lty = c("solid", "dashed"), legend = c("Female", "Male"))
HW<-function(t, lambda, gamma) (lambda*t)^gamma
lambda0<-exp(-coef(wfit) ["(Intercept)"])
lamda1<-exp(-sum(coef(wfit)))
x<-seq(0,120,0.1)
lines(x, HW(x, lambda0, gamma), lty="solid", col="gray")
(c)Fit a log-logistic AFT model. Get point estimates for the rates lambdaf and lambdam and the shape. Add the predicted cumulative hazard functions to the Nelson-Aalen plot.
##Q4C
llogfit<- survreg(Surv(time, status) ~ sex, data = rats, dist = "loglogistic")
summary(llogfit)
##
## Call:
## survreg(formula = Surv(time, status) ~ sex, data = rats, dist = "loglogistic")
## Value Std. Error z p
## (Intercept) 4.8271 0.0606 79.66 < 2e-16
## sexm 0.8021 0.2114 3.79 0.00015
## Log(scale) -1.3880 0.1381 -10.05 < 2e-16
##
## Scale= 0.25
##
## Log logistic distribution
## Loglik(model)= -265.1 Loglik(intercept only)= -287.2
## Chisq= 44.2 on 1 degrees of freedom, p= 3e-11
## Number of Newton-Raphson Iterations: 6
## n= 300
coef(llogfit)
## (Intercept) sexm
## 4.8270805 0.8020612
confint(llogfit)
## 2.5 % 97.5 %
## (Intercept) 4.7083082 4.945853
## sexm 0.3876451 1.216477
vcov(llogfit)
## (Intercept) sexm Log(scale)
## (Intercept) 0.003672265 0.001997738 0.005439163
## sexm 0.001997738 0.044707157 0.014438887
## Log(scale) 0.005439163 0.014438887 0.019068761
exp(-coef(llogfit))
## (Intercept) sexm
## 0.008009872 0.448403766
exp(-confint(llogfit))
## 2.5 % 97.5 %
## (Intercept) 0.009020025 0.007112847
## sexm 0.678653160 0.296272012
exp(-sum(coef(llogfit)))
## [1] 0.003591657
gamma <- 1 / llogfit$scale
gamma
## [1] 4.00668
vcov(llogfit)
## (Intercept) sexm Log(scale)
## (Intercept) 0.003672265 0.001997738 0.005439163
## sexm 0.001997738 0.044707157 0.014438887
## Log(scale) 0.005439163 0.014438887 0.019068761
vcov(llogfit)["Log(scale)", "Log(scale)"]
## [1] 0.01906876
ratsNA <- survfit(Surv(time, status) ~ sex, data = rats)
plot(ratsNA,
fun = "cumhaz", mark.time = FALSE, lty = c("solid", "dashed"),
main = "Cumulative hazard estimates",
xlab = "Days after treatment", ylab = "Cumulative hazard"
)
legend("topleft", lty = c("solid", "dashed"), legend = c("Female", "Male"))
HL<-function(t, lambda, gamma) log(1+(lambda*t)^gamma)
lambda0<-exp(-coef(llogfit) ["(Intercept)"])
lamda1<-exp(-sum(coef(llogfit)))
x<-seq(0,120,0.1)
lines(x, HL(x, lambda0, gamma), lty="solid", col="gray")
(e)One way to compare regression models is the Akaike Information Cri-terion (AIC). To get the AIC for an AFT model called model, use AIC(model). The model with the lowest AIC fits the best. Do the AIC values reflect the results of the plots?
AIC(lexpfit)
## [1] 588.5472
AIC(wfit)
## [1] 535.6206
AIC(lexpfit)
## [1] 588.5472