PHC_6177_HW5_question4

Question 2. Using the built-in data set rats in the survival package, we will investigate the effect of sex on the onset of tumors. Since sex is encoded as m for males and f for females, you can use sex == “m” to get males and sex == “f” to get females. In a regression model, R will use females as the reference category because “f” comes before “m” alphabetically. To remind you of this, it labels the coefficient comparing males to females sexm in the output. a. Calculate the tumor incidence rates among if for female rats and m for male rats. Get a point estimate and 95% confidence interval for the incidence rate ratio p = lambdam=lambdaf .

library(survival)
rats<-rats

###Q2a
tum_ratf<- subset(rats, sex=="f" & status==1)
ratf<-subset(rats, sex=="f")
sum(ratf$time)

## [1] 13414

##incidence rate for female
40/13414

## [1] 0.002981959

tum_ratm<- subset(rats, sex=="m" & status==1)
ratm<-subset(rats, sex=="m")
sum(ratm$time)

## [1] 13718

##incidence rate for male
2/13718

## [1] 0.0001457938

##IRR
0.0001457938/0.002981959

## [1] 0.04889195

#95% CI
0.04889195 * exp(1.96*0.7245688)

## [1] 0.2023032

0.04889195 * exp((-1.96)*0.7245688)

## [1] 0.01181604

Fit an exponential AFT model for the effect of sex on tumor onset. Calculate point estimates and 95% confidence intervals for lambdaf and lambdam. Compare these to the results of part (a).

####Q2b
lexpfit<-survreg(Surv(time,status) ~ sex, data=rats, dist="exponential")
summary(lexpfit)

## 
## Call:
## survreg(formula = Surv(time, status) ~ sex, data = rats, dist = "exponential")
##             Value Std. Error     z       p
## (Intercept) 5.815      0.158 36.78 < 2e-16
## sexm        3.018      0.725  4.17 3.1e-05
## 
## Scale fixed at 1 
## 
## Exponential distribution
## Loglik(model)= -292.3   Loglik(intercept only)= -313.8
##  Chisq= 43 on 1 degrees of freedom, p= 5.5e-11 
## Number of Newton-Raphson Iterations: 7 
## n= 300

exp(-coef(lexpfit))

## (Intercept)        sexm 
## 0.002981959 0.048891968

exp(-sum(coef(lexpfit)))

## [1] 0.0001457938

exp(-confint(lexpfit))

##                   2.5 %      97.5 %
## (Intercept) 0.004065261 0.002187333
## sexm        0.202298016 0.011816352

Question 4. a. Add the predicted cumulative hazard functions for male and female rats from the exponential AFT model in Exercise 2. Use the options color and/or lty to differentiate these lines from the Nelson-Aalen estimates.

###Q4a

ratsNA <- survfit(Surv(time, status) ~ sex, data = rats)
plot(ratsNA,
     fun = "cumhaz", mark.time = FALSE, lty = c("solid", "dashed"),
     main = "Cumulative hazard estimates",
     xlab = "Days after treatment", ylab = "Cumulative hazard"
 )
legend("topleft", lty = c("solid", "dashed"), legend = c("Female", "Male"))

H<- function(t, lambda) exp(-lambda*t)
lambda0<-exp(-coef(lexpfit)["(Intercept)"])
x<-seq(0,120,.1)
lines(x, H(x,lambda0), lty="solid", col="gray")

Fit a Weibull AFT model. Get point estimates for the rates lmbdaf and lambdam and the shape. Add the predicted cumulative hazard functions to the Nelson-Aalen plot.

#Q4B
wfit<-survreg(Surv(time, status) ~ sex, data = rats, dist = "weibull")
summary(wfit)

## 
## Call:
## survreg(formula = Surv(time, status) ~ sex, data = rats, dist = "weibull")
##               Value Std. Error     z       p
## (Intercept)  4.8904     0.0613 79.73 < 2e-16
## sexm         0.8231     0.2262  3.64 0.00027
## Log(scale)  -1.3129     0.1409 -9.32 < 2e-16
## 
## Scale= 0.269 
## 
## Weibull distribution
## Loglik(model)= -264.8   Loglik(intercept only)= -287.1
##  Chisq= 44.58 on 1 degrees of freedom, p= 2.4e-11 
## Number of Newton-Raphson Iterations: 12 
## n= 300

coef(wfit)

## (Intercept)        sexm 
##   4.8904421   0.8230543

confint(wfit)

##                 2.5 %   97.5 %
## (Intercept) 4.7702270 5.010657
## sexm        0.3797503 1.266358

vcov(wfit)

##             (Intercept)        sexm  Log(scale)
## (Intercept) 0.003762031 0.003257871 0.006225618
## sexm        0.003257871 0.051157259 0.016159501
## Log(scale)  0.006225618 0.016159501 0.019852297

exp(-coef(wfit))

## (Intercept)        sexm 
## 0.007518098 0.439088484

gamma <- 1 / wfit$scale
gamma

## [1] 3.716782

ratsNA <- survfit(Surv(time, status) ~ sex, data = rats)
plot(ratsNA,
     fun = "cumhaz", mark.time = FALSE, lty = c("solid", "dashed"),
     main = "Cumulative hazard estimates",
     xlab = "Days after treatment", ylab = "Cumulative hazard"
 )
legend("topleft", lty = c("solid", "dashed"), legend = c("Female", "Male"))

HW<-function(t, lambda, gamma) (lambda*t)^gamma
lambda0<-exp(-coef(wfit) ["(Intercept)"])
lamda1<-exp(-sum(coef(wfit)))
x<-seq(0,120,0.1)

lines(x, HW(x, lambda0, gamma), lty="solid", col="gray")

(c)Fit a log-logistic AFT model. Get point estimates for the rates lambdaf and lambdam and the shape. Add the predicted cumulative hazard functions to the Nelson-Aalen plot.

##Q4C
llogfit<- survreg(Surv(time, status) ~ sex, data = rats, dist = "loglogistic")
summary(llogfit)

## 
## Call:
## survreg(formula = Surv(time, status) ~ sex, data = rats, dist = "loglogistic")
##               Value Std. Error      z       p
## (Intercept)  4.8271     0.0606  79.66 < 2e-16
## sexm         0.8021     0.2114   3.79 0.00015
## Log(scale)  -1.3880     0.1381 -10.05 < 2e-16
## 
## Scale= 0.25 
## 
## Log logistic distribution
## Loglik(model)= -265.1   Loglik(intercept only)= -287.2
##  Chisq= 44.2 on 1 degrees of freedom, p= 3e-11 
## Number of Newton-Raphson Iterations: 6 
## n= 300

coef(llogfit)

## (Intercept)        sexm 
##   4.8270805   0.8020612

confint(llogfit)

##                 2.5 %   97.5 %
## (Intercept) 4.7083082 4.945853
## sexm        0.3876451 1.216477

vcov(llogfit)

##             (Intercept)        sexm  Log(scale)
## (Intercept) 0.003672265 0.001997738 0.005439163
## sexm        0.001997738 0.044707157 0.014438887
## Log(scale)  0.005439163 0.014438887 0.019068761

exp(-coef(llogfit))

## (Intercept)        sexm 
## 0.008009872 0.448403766

exp(-confint(llogfit))

##                   2.5 %      97.5 %
## (Intercept) 0.009020025 0.007112847
## sexm        0.678653160 0.296272012

exp(-sum(coef(llogfit)))

## [1] 0.003591657

gamma <- 1 / llogfit$scale
gamma

## [1] 4.00668

vcov(llogfit)

##             (Intercept)        sexm  Log(scale)
## (Intercept) 0.003672265 0.001997738 0.005439163
## sexm        0.001997738 0.044707157 0.014438887
## Log(scale)  0.005439163 0.014438887 0.019068761

vcov(llogfit)["Log(scale)", "Log(scale)"]

## [1] 0.01906876

ratsNA <- survfit(Surv(time, status) ~ sex, data = rats)
plot(ratsNA,
     fun = "cumhaz", mark.time = FALSE, lty = c("solid", "dashed"),
     main = "Cumulative hazard estimates",
     xlab = "Days after treatment", ylab = "Cumulative hazard"
 )
legend("topleft", lty = c("solid", "dashed"), legend = c("Female", "Male"))

HL<-function(t, lambda, gamma) log(1+(lambda*t)^gamma)
lambda0<-exp(-coef(llogfit) ["(Intercept)"])
lamda1<-exp(-sum(coef(llogfit)))
x<-seq(0,120,0.1)

lines(x, HL(x, lambda0, gamma), lty="solid", col="gray")

Which of the three models seems to fit the best based on the plots? Ans:

(e)One way to compare regression models is the Akaike Information Cri-terion (AIC). To get the AIC for an AFT model called model, use AIC(model). The model with the lowest AIC fits the best. Do the AIC values reflect the results of the plots?

AIC(lexpfit)

## [1] 588.5472

AIC(wfit)

## [1] 535.6206

AIC(lexpfit)

## [1] 588.5472

PHC_6177_HW5_question4

Kunal Dolas

9/27/2021