Interpreting parameters of linear models

Coding for categorical predictors: contrasts

What do the parameters of a linear model mean?
Start with categorical variables, because they're potentially more confusing (“intercept and slope” isn't too hard)
Default R behaviour: treatment contrasts. $ \beta_1 $ = expected value in baseline group (= first level of the factor variable, by default the first in alphabetical order); $ \beta_i $ = expected difference between group $ i $ and the first group.

For example, if we took the ants example from our permutations and ran a linear model, the results would look like this:

pr <- function(m) printCoefmat(coef(summary(m)),digits=3,signif.stars=FALSE)
pr(lm1 <- lm(colonies~place,data=ants))

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    10.75       0.98   10.97  4.2e-06
## placeforest    -3.75       1.27   -2.96    0.018

(pr is just a utility function I made up for printing compact summaries of the estimates from a linear regression: summary() would give the same information plus a lot more I'm not interested in right now.) The (Intercept) row refers to $ \beta_1 $, which is the mean density in the “field” sites (“field” comes before “forest”). The placeforest row tells us we are looking at the effect of the place variable on the forest level, i.e. the difference between the “forest” and “field” sites. (The only ways we could know that “field” is the baseline site are (1) to remember, or look at levels(ants$place) or (2) to notice which level is missing from the list of parameter estimates.)

R's behaviour may seem annoyng at first – it seems like the estimated values of the groups are what we're really interested in – but it is really designed for testing differences among groups. To get the estimates per group (what SAS calls LSMEANS), you could:

use a regression formula colonies~place-1, or equivalently colonies~place+0, to suppress the implicit intercept term:

##             Estimate Std. Error t value Pr(>|t|)
## placefield     10.75       0.98   10.97  4.2e-06
## placeforest     7.00       0.80    8.75  2.3e-05

Use the predict function:

predict(lm1,newdata=data.frame(place=c("field","forest")),interval="confidence")

Use the effects package:

library("effects")
summary(allEffects(lm1))

You can (and should) look at graphical summaries of your results via

plot(allEffects(lm1))

library("coefplot2")
coefplot2(lm0,intercept=TRUE)

(or coefplot2(lm1)).

When you use the colonies~place-1 formula, the meanings of the parameters change: $ \beta_1 $ is the same (mean of “field”), but $ \beta_2 $ is 'mean of “field”' rather than (“(field)-(forest)”).

Things get slightly more interesting/complicated when we have more than two levels of a categorical variable. I'll look at some data on lizard perching behaviour, from the brglm package (and before that from McCullagh and Nelder Generalized Linear Models 1989, ultimately from Schoener 1970 Ecology 51:408-418). I'm going to ignore the fact that these data might best be fitted with generalized linear models.

lizards <- read.csv("lizards.csv")

A quick look at the data: response is number of Anolis grahami lizards found on a perches in particular conditions. plot of chunk unnamed-chunk-5

For a moment we're going to just look at the time variable. If we leave the factors as is (alphabetical) then $ \beta_1 $=“early”, $ \beta_2 $=“late”-“early”, $ \beta_3 $=“midday”-“early”. At the very least, it probably makes sense to change the order of the levels:

lizards$time <- factor(lizards$time,levels=c("early","midday","late"))

All this does (since we haven't changed the baseline factor) is swap the definitions of $ \beta_2 $ and $ \beta_3 $.

In a linear model, we could also use so-called sum-to-zero contrasts:

pr(lm(grahami~time,data=lizards,contrasts=list(time=contr.sum)))

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    19.30       3.53    5.47  2.4e-05
## time1          -1.67       4.93   -0.34     0.74
## time2          12.85       5.10    2.52     0.02

Now the (Intercept) parameter is the overall mean: time1 and time2 are the deviations of the first (“early”) and second (“midday”) groups from the overall mean. (The names are useless: the car package offers a slightly better alternative called contr.Sum). There are other ways to change the contrasts (i.e., use the contrasts() function to change the contrasts for a particular variable permanently, or use options(contrasts=c("contr.sum","contr.poly"))) to change the contrasts for all variables), but the way shown above may be the most transparent.

There are other options for contrasts such as MASS::contr.sdif(), which gives the successive differences between levels.

library("MASS")
pr(lm(grahami~time,data=lizards,contrasts=list(time=contr.sdif)))

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    19.30       3.53    5.47  2.4e-05
## time2-1        14.52       8.74    1.66    0.112
## time3-2       -24.02       8.74   -2.75    0.012

You might have particular contrasts in mind (e.g. “control” vs. all other treatments, then “low” vs “high” within treatments), in which case it is probably worth learning how to set contrasts. (We will talk about testing all pairwise differences later, when we discuss multiple comparisons. This approach is probably not as useful as it is common.)

Multiple treatments and interactions

Let's consider the light variable in addition to time.

pr(lmTL1 <- lm(grahami~time+light,data=lizards))

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    27.29       5.63    4.85  0.00011
## timemidday     13.14       7.11    1.85  0.08010
## timelate       -9.50       6.85   -1.39  0.18174
## lightsunny    -19.32       5.73   -3.37  0.00321

Here's a graphical interpretation of the parameters:

plot of chunk lizardcontrasts1

$ \beta_1 $ is the intercept (“early”,“sunny”); $ \beta_2 $ and $ \beta_3 $ are the differences from the baseline level (“early”) of the first variable (time) in the baseline level of the other parameter(s) (light=“shady”); $ \beta_4 $ is the difference from the baseline level (“sunny”) of the second variable (light) in the baseline level of time (“early”).

Now let's look at an interaction model:

pr(lmTL2 <- lm(grahami~time*light,data=lizards))

##                       Estimate Std. Error t value Pr(>|t|)
## (Intercept)              23.50       5.38    4.37  0.00042
## timemidday               27.75       7.60    3.65  0.00198
## timelate                -12.75       7.60   -1.68  0.11180
## lightsunny              -11.75       7.60   -1.55  0.14061
## timemidday:lightsunny   -32.83      11.19   -2.93  0.00927
## timelate:lightsunny       6.50      10.75    0.60  0.55343

plot of chunk lizardcontrasts2

Parameters $ \beta_1 $ to $ \beta_4 $ have the same meanings as before. Now we also have $ \beta_5 $ and $ \beta_6 $, labelled “timemidday:lightsunny” and “timelate:lightsunny”, which describe the difference between the expected mean value of these treatment combinations based on the additive model (which are $ \beta_1 + \beta_2 + \beta_4 $ and $ \beta_1 + \beta_3 + \beta_4 $ respectively) and their actual values.

Now re-do this for sum-to-zero contrasts … the fits are easy:

pr(lmTL1S <- update(lmTL1,contrasts=list(time=contr.sum,light=contr.sum)))

##             Estimate Std. Error t value Pr(>|t|)
## (Intercept)    18.84       2.87    6.57  2.7e-06
## time1          -1.21       4.01   -0.30   0.7654
## time2          11.93       4.15    2.87   0.0097
## light1          9.66       2.87    3.37   0.0032

pr(lmTL2S <- update(lmTL2,contrasts=list(time=contr.sum,light=contr.sum)))

##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)    18.236      2.255    8.09  3.1e-07
## time1          -0.611      3.146   -0.19  0.84830
## time2          10.722      3.271    3.28  0.00444
## light1         10.264      2.255    4.55  0.00028
## time1:light1   -4.389      3.146   -1.39  0.18100
## time2:light1   12.028      3.271    3.68  0.00187

(The intercept doesn't stay exactly the same when we add the interaction because the data are unbalanced: try with(lizards,table(light,time)))

The pictures will be a pain – maybe draw by hand?

Other refs

http://sas-and-r.blogspot.com/2010/10/example-89-contrasts.html
gmodels::fit.contrast() (show parameter estimates based on re-fitting models with new contrasts), rms::contrast.rms() (ditto, for rms-based fits)
http://www.ats.ucla.edu/stat/r/library/contrast_coding.htm
Crawley Statistical Computing: An Introduction to Data Analysis using S-PLUS, chapter 18

library("wwiki")
options(MWUser="Bb")
pushWiki("contrasts2.rmd",page="linear model parameters",wiki="bio_708")