library(plyr)
library(reshape2) ## for melt()
library(ggplot2); theme_set(theme_bw())
library(mgcv) ## for gam()
Looked at some SAS documentation, which does convince me that the exponentiated cloglog GLM parameter is in fact the hazard ratio. Here’s the argument:
If we define \(\lambda=1-\exp(-\exp(\eta))\) as the discrete-time hazard, with an implicit exposure time of 1 unit, then \(\eta\) is the continuous-time log-hazard; \(\gamma=\exp(\eta)\) is the continuous-time hazard, so the probability of mortality is \(1-\exp(-\gamma t)\). This is consistent with taking \(\lambda=1-\exp(-\exp(\eta+\log(\tau)))\) where \(\tau\) is the exposure time. Since adding \(\alpha\) to the linear predictor \(\eta\) multiplies \(\gamma=\exp(\eta)\), and hence the continuous-time hazard, by \(\exp(\alpha)\), this implies the statement above: the exponential of a GLM parameter is the hazard ratio.
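A quick simulation check of this (my own sketch, with made-up parameter values, not part of the original argument): simulate discrete-time deaths from a continuous-time hazard, let exposure time enter as an offset of \(\log(\tau)\), and confirm that the exponentiated GLM coefficient recovers the hazard ratio.
set.seed(101)
n <- 5e4
gamma0 <- 0.2; HR <- 2.5  ## assumed baseline hazard and hazard ratio
grp <- rep(0:1,each=n)
tau <- sample(1:3,2*n,replace=TRUE)  ## varying exposure times
p_die <- 1-exp(-gamma0*HR^grp*tau)   ## per-interval mortality probability
dead <- rbinom(2*n,size=1,prob=p_die)
sim_fit <- glm(dead~grp+offset(log(tau)),
family=binomial(link="cloglog"))
exp(coef(sim_fit)["grp"])  ## should be close to HR = 2.5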
An alternative (perhaps more confusing!) way to think about this: The complementary log-log link is \(\eta=\log(-\log(1-p))\); the inverse link is \(p=1-\exp(-\exp(\eta))\). Therefore if we have a baseline probability of \(p_0\) and an effect size of \(\alpha\), and \(\beta=\exp(\alpha)\), then
(note formatting below renders correctly with RStudio “knit HTML” button, but not with knitr::knit2html)
\[ \begin{split} \eta_0 & = \log(-\log(1-p_0)) \\ \eta_1 & = \log(-\log(1-p_0))+\alpha \\ & = \log(-\log(1-p_0))+\log(\beta) \\ p_1 & = 1-\exp(-\exp(\log(-\log(1-p_0))+\log(\beta))) \\ & = 1-\exp(-\beta(-\log(1-p_0))) \\ & = 1-\exp(\beta \log(1-p_0)) \\ & = 1-(1-p_0)^\beta \end{split} \]
Thus if we have an effect of \(\alpha\) on the complementary log-log scale, its effect on the original scale is to raise the survival (complementary) probability to the power \(\exp(\alpha)\). (If \(\alpha\) is small there should be some useful simple approximation, but I can’t figure it out at the moment; \(\exp(\alpha) \approx 1+\alpha\) in that case, so \((1-p)^{1+\alpha} \approx (1-p)\cdot(1-p)^\alpha\) …)
f <- binomial(link="cloglog")
cloglog <- f$linkfun
invcloglog <- f$linkinv
p0 <- 0.3
alpha <- 1.6
eta0 <- cloglog(p0)
invcloglog(eta0+alpha)
## [1] 0.8290907
1-(1-p0)^exp(alpha)
## [1] 0.8290907
So the survival probability is raised to the power \(\exp(\alpha) \approx 4.95\); rounding to 5, we can see that this is about right:
(1-p0)^5
## [1] 0.16807
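Picking up the small-\(\alpha\) parenthetical above (my own numerical sketch, not part of the original argument): since \((1-p)^{\exp(\alpha)} \approx (1-p)^{1+\alpha} = (1-p)\cdot(1-p)^{\alpha} \approx (1-p)(1+\alpha\log(1-p))\), the mortality probability should shift by roughly \(\alpha(1-p)(-\log(1-p))\) when \(\alpha\) is small:
alpha_small <- 0.05
c(exact=1-(1-p0)^exp(alpha_small),
approx=p0+alpha_small*(1-p0)*(-log(1-p0)))
## the two values should agree closely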
It may be useful in general to look at the inverse link function to get an idea of the scaling of the link function (i.e. the x axis here shows “cloglog units”):
par(las=1,bty="l")
curve(invcloglog(x),from=-3,to=3)
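For orientation, a few reference values along this curve (my own addition):
invcloglog(c(-2,-1,0,1))
## approximately 0.13, 0.31, 0.63, 0.93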
Then I decided to play with the example on the SAS web page, for fun.
beetles <- read.table(text="
time m20 f20 m32 f32 m50 f50 m80 f80
1 3 0 7 1 5 0 4 2
2 11 2 10 5 8 4 10 7
3 10 4 11 11 11 6 8 15
4 7 8 16 10 15 6 14 9
5 4 9 3 5 4 3 8 3
6 3 3 2 1 2 1 2 4
7 2 0 1 0 1 1 1 1
8 1 0 0 1 1 4 0 1
9 0 0 1 1 0 0 0 0
10 0 0 0 0 0 0 1 1
11 0 0 0 0 1 1 0 0
12 1 0 0 0 0 1 0 0
13 1 0 0 0 0 1 0 0
14 101 126 19 47 7 17 2 4",
header=TRUE)
b2 <- transform(melt(beetles,id.var="time"),
sex=substr(variable,1,1),
conc=as.numeric(substr(variable,2,3))/100)
b2 <- subset(b2,select=-variable) ## don't need this any more
## compute number at risk (exposed) at the start of each interval;
## the 'time 14' row apparently holds the individuals still alive at the end,
## so it counts toward the totals but is dropped from the fit (time<14)
b3 <- subset(ddply(b2,c("sex","conc"),
transform,
tot=sum(value)-
c(0,head(cumsum(value),-1))),
## head(x,-1) == x[-length(x)] == all but last value
time<14)
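To make the risk-set trick above concrete, here's a toy check (my own illustration, with made-up numbers): with deaths of 3, 2, and 1 out of 10 individuals, the numbers at risk at the start of each interval should be 10, 7, 5.
deaths <- c(3,2,1)
10-c(0,head(cumsum(deaths),-1))
## [1] 10  7  5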
options(contrasts=c("contr.SAS","contr.poly"))
summary(fit1 <- glm(value/tot~sex+conc+factor(time),
family=binomial(link="cloglog"),
weight=tot,data=b3))
##
## Call:
## glm(formula = value/tot ~ sex + conc + factor(time), family = binomial(link = "cloglog"),
## data = b3, weights = tot)
##
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -2.9424 -0.8720 -0.4057 0.6594 2.9201
##
## Coefficients:
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -5.660206 0.713378 -7.934 2.12e-15 ***
## sexf -0.565081 0.114179 -4.949 7.46e-07 ***
## conc 3.091822 0.231186 13.374 < 2e-16 ***
## factor(time)1 1.163751 0.740745 1.571 0.116170
## factor(time)2 2.220041 0.721443 3.077 0.002089 **
## factor(time)3 2.696645 0.718061 3.755 0.000173 ***
## factor(time)4 3.099828 0.716508 4.326 1.52e-05 ***
## factor(time)5 2.603172 0.725483 3.588 0.000333 ***
## factor(time)6 1.989159 0.745246 2.669 0.007605 **
## factor(time)7 1.124771 0.801253 1.404 0.160389
## factor(time)8 1.303467 0.790595 1.649 0.099206 .
## factor(time)9 -0.058914 1.000433 -0.059 0.953041
## factor(time)10 -0.039923 0.997448 -0.040 0.968073
## factor(time)11 -0.018043 0.999671 -0.018 0.985600
## factor(time)12 -0.007843 1.000270 -0.008 0.993744
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for binomial family taken to be 1)
##
## Null deviance: 632.78 on 103 degrees of freedom
## Residual deviance: 137.19 on 89 degrees of freedom
## AIC: 361.61
##
## Number of Fisher Scoring iterations: 6
arm::coefplot(fit1)
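Following the hazard-ratio interpretation above, exponentiating the sex and concentration coefficients gives their estimated hazard ratios: roughly \(\exp(-0.57) \approx 0.57\) for females relative to males, and \(\exp(3.09) \approx 22\) per unit of the rescaled conc variable.
exp(coef(fit1)[c("sexf","conc")])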
b3$pred <- predict(fit1,type="response")
ggplot(b3,aes(time,value/tot,colour=conc,shape=sex))+
geom_point(aes(size=tot),alpha=0.5)+
geom_line(aes(y=pred,group=conc))+
facet_wrap(~sex)
b4 <- ddply(b3,c("sex","conc"),mutate,
dmax=tot[1],
ptot=tot/dmax)
ggplot(b4,aes(time,ptot,colour=conc,lty=sex,shape=sex))+
geom_line(aes(group=interaction(conc,sex),size=dmax))+
## facet_wrap(~sex)+
scale_size_continuous(range=c(1,3))
From the picture, there should be a sex \(\times\) concentration interaction, and possibly a sex \(\times\) concentration \(\times\) time interaction …
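One quick way to check (a sketch, not run here) would be to add the two-way interaction and do a likelihood-ratio test:
fit1_int <- update(fit1,.~.+sex:conc)
anova(fit1,fit1_int,test="Chisq")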
Can we do better by fitting a smooth curve to the mortality as a function of time? (This is interesting but doesn’t resolve the non-additivity …)
fit2 <- gam(value/tot~sex+conc+s(time),
family=binomial(link="cloglog"),
weight=tot,data=b3)
pframe <- expand.grid(time=seq(0,13,length=51),
conc=unique(b3$conc),sex=unique(b3$sex))
pframe$pred <- predict(fit2,newdata=pframe,
type="response")
ggplot(b3,aes(time,value/tot,colour=conc,shape=sex))+
geom_point(aes(size=tot),alpha=0.5)+
geom_line(data=pframe,aes(y=pred,group=conc))+
facet_wrap(~sex)
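For what it's worth, the smooth-in-time and factor-in-time fits could also be compared by AIC (a sketch, results not shown):
AIC(fit1,fit2)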