## Back-transformation and meanings of back-transformed parameters

(This isn’t particularly fleshed out, but I’ll post it until (if and when) I get around to updating it)

Let’s set up a really simple example: two groups. We’ll fit with a no-intercept model so that we can just think about the issues of back-transforming coefficients, not go through the additional step of computing predictions. (However, the basic issues are exactly the same.)

### Generalized linear models

set.seed(101)
d <- data.frame(f=paste0("f",rep(1:2,each=20)),
y=rpois(40,lambda=rep(c(3,6),each=20)))

Compute means:

## base R:
get_mean <- function(d) with(d,tapply(y,list(f),mean))
get_mean(d)
##   f1   f2
## 2.95 6.15
## alternatively, with the 'new hotness':
library("dplyr")
d %>% group_by(f) %>% summarise(m=mean(y))
## Source: local data frame [2 x 2]
##
##    f    m
## 1 f1 2.95
## 2 f2 6.15

Alternatively:

get_mean_model <- function(d,invlink,...) {
g1 <- glm(y~f-1,data=d,...)
}
get_mean_model(d,exp,family="poisson")
##  ff1  ff2
## 2.95 6.15

This works perfectly for (simple) GLMs. It would get more complicated in the case of GLMMs, where we would have to worry about

Now let’s suppose we have logit-Normally distributed data:

link <- qlogis    ## logit
d2 <- data.frame(f=paste0("f",rep(1:2,each=20)),
m1 <- get_mean(d2)
family="gaussian",
invlink=invlink)
g1 <- lm(link(y)~f-1,data=d2)
backtransf <- function(lmfit) {
## second derivative
d2fun <- deriv(D(expression(1/(1+exp(-x))),"x"),"x",
function.arg=TRUE)
x <- coef(lmfit)
s <- summary(lmfit)$sigma return(plogis(x)+d2val*s^2/2) } m3 <- backtransf(g1) The other way to solve this problem is to back-transform simulated data: backtransf_sim <- function(lmfit,n=100000) { x <- coef(lmfit) s <- summary(lmfit)$sigma
cbind(raw=m1,bt=m2,bt_corr=m3,bt_corr_sim=m4)
##          raw        bt   bt_corr bt_corr_sim
## f2 0.9947408 0.9978494 0.9961176   0.9952491