Back-transformation and meanings of back-transformed parameters

(This isn’t particularly fleshed out, but I’ll post it until (if and when) I get around to updating it)

Let’s set up a really simple example: two groups. We’ll fit with a no-intercept model so that we can just think about the issues of back-transforming coefficients, not go through the additional step of computing predictions. (However, the basic issues are exactly the same.)

Generalized linear models

set.seed(101)
d <- data.frame(f=paste0("f",rep(1:2,each=20)),
                y=rpois(40,lambda=rep(c(3,6),each=20)))

Compute means:

## base R:
get_mean <- function(d) with(d,tapply(y,list(f),mean))
get_mean(d)

##   f1   f2 
## 2.95 6.15

## alternatively, with the 'new hotness':
library("dplyr")
d %>% group_by(f) %>% summarise(m=mean(y))

## Source: local data frame [2 x 2]
## 
##    f    m
## 1 f1 2.95
## 2 f2 6.15

Alternatively:

get_mean_model <- function(d,invlink,...) {
    g1 <- glm(y~f-1,data=d,...)
    invlink(coef(g1))
}
get_mean_model(d,exp,family="poisson")

##  ff1  ff2 
## 2.95 6.15

This works perfectly for (simple) GLMs. It would get more complicated in the case of GLMMs, where we would have to worry about

Now let’s suppose we have logit-Normally distributed data:

link <- qlogis    ## logit
invlink <- plogis ## logistic
d2 <- data.frame(f=paste0("f",rep(1:2,each=20)),
                y=invlink(rnorm(40,mean=rep(c(3,6),each=20),sd=1)))
m1 <- get_mean(d2)
m2 <- get_mean_model(transform(d2,y=link(y)),
               family="gaussian",
               invlink=invlink)

g1 <- lm(link(y)~f-1,data=d2)
backtransf <- function(lmfit) {
    ## second derivative
    d2fun <- deriv(D(expression(1/(1+exp(-x))),"x"),"x",
                   function.arg=TRUE)
    x <- coef(lmfit)
    d2val <- c(attr(d2fun(x),"gradient"))
    s <- summary(lmfit)$sigma
    return(plogis(x)+d2val*s^2/2)
}
m3 <- backtransf(g1)

The other way to solve this problem is to back-transform simulated data:

backtransf_sim <- function(lmfit,n=100000) {
    x <- coef(lmfit)
    s <- summary(lmfit)$sigma
    sim <- lapply(x,rnorm,n=n,sd=s)
    sapply(sim,function(x) mean(plogis(x)))
}
m4 <- backtransf_sim(g1)
cbind(raw=m1,bt=m2,bt_corr=m3,bt_corr_sim=m4)

##          raw        bt   bt_corr bt_corr_sim
## f1 0.9540271 0.9680887 0.9446481   0.9407790
## f2 0.9947408 0.9978494 0.9961176   0.9952491