Playing with complementary log-log links

library(plyr)
library(reshape2) ## for melt()
library(ggplot2); theme_set(theme_bw())
library(mgcv) ## for gam()

Looked at some SAS documentation, which does convince me that the exponentiated cloglog GLM parameter is in fact the hazard ratio. Here’s the argument:

If we define \(\lambda=1-\exp(-\exp(\eta))\) as the discrete-time hazard, with an implicit exposure time of 1 unit, then \(\eta\) is the continuous-time log-hazard; \(\gamma=\exp(\eta)\) is the continuous-time hazard, so the probability of mortality is \(1-\exp(-\gamma t)\). This is consistent with taking \(\lambda=1-\exp(-\exp(\eta+\log(\tau)))\) where \(\tau\) is the exposure time. This implies the statement above, that the exponential of a GLM parameter is the hazard ratio.

An alternative (perhaps more confusing!) way to think about this: The complementary log-log link is \(\eta=\log(-\log(1-p))\); the inverse link is \(p=1-\exp(-\exp(\eta))\). Therefore if we have a baseline probability of \(p_0\) and an effect size of \(\alpha\), and \(\beta=\exp(\alpha)\), then

(note formatting below renders correctly with RStudio “knit HTML” button, but not with knitr::knit2html)

\[ \begin{split} \eta_0 & = \log(-\log(1-p_0)) \\ \eta_1 & = \log(-\log(1-p_0))+\alpha \\ & = \log(-\log(1-p_0))+\log(\beta) \\ p_1 & = 1-\exp(-\exp(\log(-\log(1-p_0))+\log(\beta))) \\ & = 1-\exp(-\beta(-\log(1-p_0))) \\ & = 1-\exp(\beta \log(1-p_0)) \\ & = 1-(1-p_0)^\beta \end{split} \]

Thus if we have an effect of \(\alpha\) on the complementary log-log scale, its effect on the original scale will be to raise survival (complementary) probability to a power \(\exp(\alpha)\). (If \(\alpha\) is small there should be some useful simple approximation, but I can’t figure it out at the moment; \(\exp(\alpha) \approx 1+\alpha\) in that case, so \((1-p)^(1+\alpha) \approx (1-p)\cdot(1-p)^\alpha\) …)

f <- binomial(link="cloglog")
cloglog <- f$linkfun
invcloglog <- f$linkinv
p0 <- 0.3
alpha <- 1.6
eta0 <- cloglog(p0)
invcloglog(eta0+alpha)
## [1] 0.8290907
1-(1-p0)^exp(alpha)
## [1] 0.8290907

So the probability of survival is raised to the \(\exp(\alpha)\)=4.95: rounding, we can see that this is about right:

(1-p0)^5
## [1] 0.16807

It may be useful in general to look at the inverse link function to get an idea of the scaling of the link function (i.e. the x axis here shows “cloglog units”):

par(las=1,bty="l")
curve(invcloglog(x),from=-3,to=3)

Then I decided to play with the example on the SAS web page, for fun.

beetles <- read.table(text="
time m20 f20 m32 f32 m50 f50 m80 f80                  
 1   3   0  7  1  5  0  4  2
 2  11   2 10  5  8  4 10  7
 3  10   4 11 11 11  6  8 15
 4   7   8 16 10 15  6 14  9
 5   4   9  3  5  4  3  8  3
 6   3   3  2  1  2  1  2  4
 7   2   0  1  0  1  1  1  1
 8   1   0  0  1  1  4  0  1
 9   0   0  1  1  0  0  0  0
10   0   0  0  0  0  0  1  1
11   0   0  0  0  1  1  0  0
12   1   0  0  0  0  1  0  0
13   1   0  0  0  0  1  0  0
14 101 126 19 47  7 17  2  4",
 header=TRUE)
b2 <- transform(melt(beetles,id.var="time"),
          sex=substr(variable,1,1),
          conc=as.numeric(substr(variable,2,3))/100)
b2 <- subset(b2,select=-variable) ## don't need this any more
## compute number exposed at each time
b3 <- subset(ddply(b2,c("sex","conc"),
                   transform,
                   tot=sum(value)-
                      c(0,head(cumsum(value),-1))),
                   ## head(x,-1) == x[-length(x)] == all but last value
             time<14)
options(contrasts=c("contr.SAS","contr.poly"))
summary(fit1 <- glm(value/tot~sex+conc+factor(time),
           family=binomial(link="cloglog"),
           weight=tot,data=b3))
## 
## Call:
## glm(formula = value/tot ~ sex + conc + factor(time), family = binomial(link = "cloglog"), 
##     data = b3, weights = tot)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9424  -0.8720  -0.4057   0.6594   2.9201  
## 
## Coefficients:
##                 Estimate Std. Error z value Pr(>|z|)    
## (Intercept)    -5.660206   0.713378  -7.934 2.12e-15 ***
## sexf           -0.565081   0.114179  -4.949 7.46e-07 ***
## conc            3.091822   0.231186  13.374  < 2e-16 ***
## factor(time)1   1.163751   0.740745   1.571 0.116170    
## factor(time)2   2.220041   0.721443   3.077 0.002089 ** 
## factor(time)3   2.696645   0.718061   3.755 0.000173 ***
## factor(time)4   3.099828   0.716508   4.326 1.52e-05 ***
## factor(time)5   2.603172   0.725483   3.588 0.000333 ***
## factor(time)6   1.989159   0.745246   2.669 0.007605 ** 
## factor(time)7   1.124771   0.801253   1.404 0.160389    
## factor(time)8   1.303467   0.790595   1.649 0.099206 .  
## factor(time)9  -0.058914   1.000433  -0.059 0.953041    
## factor(time)10 -0.039923   0.997448  -0.040 0.968073    
## factor(time)11 -0.018043   0.999671  -0.018 0.985600    
## factor(time)12 -0.007843   1.000270  -0.008 0.993744    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 632.78  on 103  degrees of freedom
## Residual deviance: 137.19  on  89  degrees of freedom
## AIC: 361.61
## 
## Number of Fisher Scoring iterations: 6
arm::coefplot(fit1)

b3$pred <- predict(fit1,type="response")
ggplot(b3,aes(time,value/tot,colour=conc,shape=sex))+
    geom_point(aes(size=tot),alpha=0.5)+
    geom_line(aes(y=pred,group=conc))+
    facet_wrap(~sex)

b4 <- ddply(b3,c("sex","conc"),mutate,
            dmax=tot[1],
            ptot=tot/dmax)
ggplot(b4,aes(time,ptot,colour=conc,lty=sex,shape=sex))+
    geom_line(aes(group=interaction(conc,sex),size=dmax))+
    ## facet_wrap(~sex)+
    scale_size_continuous(range=c(1,3))

From the picture, there should be a sex \(\times\) concentration interaction, and possibly a sex \(\times\) concentration \(\times\) time interaction …

Can we do better by fitting a smooth curve to the mortality as a function of time? (This is interesting but doesn’t resolve the non-additivity …)

fit2 <- gam(value/tot~sex+conc+s(time),
           family=binomial(link="cloglog"),
           weight=tot,data=b3)
pframe <- expand.grid(time=seq(0,13,length=51),
                         conc=unique(b3$conc),sex=levels(b3$sex))
pframe$pred <- predict(fit2,newdata=pframe,
                    type="response")
ggplot(b3,aes(time,value/tot,colour=conc,shape=sex))+
    geom_point(aes(size=tot),alpha=0.5)+
    geom_line(data=pframe,aes(y=pred,group=conc))+
    facet_wrap(~sex)