Edit: I expanded the middle section substantially to better help convey what the numbers/estimates meant in each of the different types of models and cleaned up my writing to hopefully better convey my thoughts.

Background/Introduction

This demonstration arose from a long discussion we had in our lab (over several months) about the utility of using sum contrasts in regression models that contain interaction terms.

As a cognitive psychology grad student, I deal with categorical data in most cases. While I have mostly used ANOVAs and t-tests in the past, I’ve begun using linear mixed-effects (LME) models a lot more in recent years because of their ability to deal with missing data and unequal cell sizes. However, as with most linear models, the default LME model uses dummy/treatment coding. This caused some headache for me initially when I examined interaction effects, as the LME models would display marginal effects (the effect of one factor at the reference level of the other, often called simple effects) rather than main effects.

The purpose of this demonstration is mainly to document my experience and understanding of how to interpret LM or LM-like outputs for categorical IVs in the presence (and absence) of an interaction term, as well as how to translate marginal effects to main effects using sum contrasts. I will also demonstrate how the model estimates map onto condition means (including calculations) and how they relate to one another, as well as how the output of an LM compares to ANOVA output.

I want to emphasize here that a significant portion of what I show here is based on my own learning experience in stats classes and on playing around with the models/numbers/calculations myself. Where I had trouble, I tried to read up on other people’s explanations to gain a better understanding; this post and this post were both helpful for understanding how to interpret the interaction term when a sum contrast is used. All this is to say that this documentation/demonstration is by no means perfect, nor am I an expert on the subject.

For the purposes of this demonstration, I will be using a fake data set (courtesy of my supervisor) with 2 categorical variables. The first factor is “Gender” with two levels, “Woman” (level 1) and “Man” (level 2). The second factor is “Strategy” which also has 2 levels: “Hide” (level 1) and “Seek” (level 2). The first level of each factor is also the reference level. I will be constructing linear regression models using this example data set, but the same interpretation is generalizable to other types of regression models (e.g., glm, lmer, etc).

Note that this is a very basic example of how to interpret LM outputs with and without sum contrasts; for categorical variables with more than 2 levels, the interpretation would be slightly different.

Here is the full data set:

##    Suj Gender Strategy        RT
## 1    1    Man     Seek 621.06016
## 2    2    Man     Seek 142.51360
## 3    3    Man     Seek 786.46244
## 4    4    Man     Seek 890.81441
## 5    5    Man     Hide  19.92714
## 6    6    Man     Hide 860.28039
## 7    7    Man     Hide 254.22702
## 8    8    Man     Hide  43.81335
## 9    9  Woman     Seek 201.76769
## 10  10  Woman     Seek 952.44865
## 11  11  Woman     Seek 252.15705
## 12  12  Woman     Seek 761.46274
## 13  13  Woman     Hide 727.73781
## 14  14  Woman     Hide 460.28642
## 15  15  Woman     Hide 170.63794
## 16  16  Woman     Hide 569.04296

And here is a graph showing the means for each of the 2x2 conditions:
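For readers who want to check the numbers by hand, here is a minimal Python sketch (not the R code that produced the outputs in this post) that computes the four condition means from the data listed above. The variable names are my own and purely illustrative.

```python
from collections import defaultdict

# (Gender, Strategy, RT) rows copied from the data set shown above.
data = [
    ("Man", "Seek", 621.06016), ("Man", "Seek", 142.51360),
    ("Man", "Seek", 786.46244), ("Man", "Seek", 890.81441),
    ("Man", "Hide", 19.92714), ("Man", "Hide", 860.28039),
    ("Man", "Hide", 254.22702), ("Man", "Hide", 43.81335),
    ("Woman", "Seek", 201.76769), ("Woman", "Seek", 952.44865),
    ("Woman", "Seek", 252.15705), ("Woman", "Seek", 761.46274),
    ("Woman", "Hide", 727.73781), ("Woman", "Hide", 460.28642),
    ("Woman", "Hide", 170.63794), ("Woman", "Hide", 569.04296),
]

# Accumulate the sum and count of RT within each Gender x Strategy cell.
sums = defaultdict(float)
counts = defaultdict(int)
for gender, strategy, rt in data:
    sums[(gender, strategy)] += rt
    counts[(gender, strategy)] += 1

cell_means = {cell: sums[cell] / counts[cell] for cell in sums}
for (gender, strategy), mean in sorted(cell_means.items()):
    print(f"{gender:5s} / {strategy:4s}: {mean:.2f}")
```

These four means are the values that the rest of the calculations in this post build on.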

   

Default Linear Model (Dummy/Treatment Coded)

By default, when a linear regression is constructed, it will use what is known as dummy or treatment coding. In this type of coding, the reference level is coded as 0 by default. In this example, Gender = Woman is coded as 0 (Man as 1) and Strategy = Hide is coded as 0 (Seek as 1).

First, let’s construct a linear regression with Gender, Strategy, and their interaction term as predictors, and Response Time as the criterion variable.

Dependent variable: RT

| Predictors | Estimates | CI | Statistic | p |
|---|---|---|---|---|
| (Intercept) | 481.93 | 113.78 – 850.08 | 2.85 | 0.015 |
| Gender [Man] | -187.36 | -708.01 – 333.28 | -0.78 | 0.448 |
| Strategy [Seek] | 60.03 | -460.61 – 580.67 | 0.25 | 0.806 |
| Gender [Man] * Strategy [Seek] | 255.62 | -480.68 – 991.92 | 0.76 | 0.464 |
| Observations | 16 | | | |
| R2 / R2 adjusted | 0.139 / -0.077 | | | |

 

Let’s break down each component in the table:

At the top of the table, we have Predictors, which are the names of each of the predictors. Note that there is a row dedicated to the intercept/constant in the model. This is not something you see in an ANOVA output, and I’ll explain what this is in the context of the model shortly. Estimates represent the estimated unstandardized beta coefficients of each predictor. CI is the 95% confidence interval. Statistic here refers to the t-value of the estimate and p is self-explanatory.

In this model, the Intercept corresponds to the reference condition. That is, the group mean when Gender = Woman (level 1) and Strategy = Hide (level 1). If we look at the graph, we can see that the estimate for the intercept (481.93) matches the mean for that group (481.93). In regression models, all of the remaining estimates are interpreted relative to the reference level.

The estimate associated with Gender [Man] is the marginal effect of the Man/Hide condition compared to the reference condition. That is, the difference between these two cells. If we add the intercept to the estimate associated with Gender [Man], we get 481.93 + (-187.36) = 294.56 (up to rounding), which is the group mean for the Man/Hide condition.

Similarly, the estimate for Strategy [Seek] corresponds to the marginal effect of the Woman/Seek condition compared to the reference condition; adding this estimate to the intercept, we get 481.93 + 60.03 = 541.96, which is again the correct mean for the Woman/Seek condition.

The interaction term, Gender [Man]:Strategy [Seek], represents the “difference between the differences” across conditions. For example, if we take the difference between the Hide and Seek levels for Women (541.96 - 481.93) and subtract it from the difference between the Hide and Seek levels for Men (610.21 - 294.56), we obtain the estimate for the interaction term. In other words, (610.21 - 294.56) - (541.96 - 481.93) = 255.62. Using the estimate for the interaction term, we can get the mean for the last group (Man/Seek). To do this, we add up all four estimates: 481.93 + (-187.36) + 60.03 + 255.62 = 610.21 (up to rounding).
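The arithmetic in this section can be summarized in a few lines. The following is only an illustrative Python sketch that starts from the rounded condition means quoted in the text, so the last digit can differ slightly from the model table; the variable names are my own.

```python
# Condition means as reported in the text (rounded to 2 decimals).
woman_hide, woman_seek = 481.93, 541.96
man_hide, man_seek = 294.56, 610.21

# Dummy/treatment coding: everything is relative to the reference
# cell (Gender = Woman, Strategy = Hide).
intercept = woman_hide
gender_man = man_hide - woman_hide        # effect of Gender at Strategy = Hide
strategy_seek = woman_seek - woman_hide   # effect of Strategy at Gender = Woman
# "Difference between the differences".
interaction = (man_seek - man_hide) - (woman_seek - woman_hide)

# Adding all four estimates recovers the Man/Seek mean.
man_seek_rebuilt = intercept + gender_man + strategy_seek + interaction

print(f"Intercept       {intercept:8.2f}")
print(f"Gender [Man]    {gender_man:8.2f}")
print(f"Strategy [Seek] {strategy_seek:8.2f}")
print(f"Interaction     {interaction:8.2f}")
print(f"Man/Seek mean   {man_seek_rebuilt:8.2f}")
```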

In this example model, we can see that when we include an interaction term in a (default) regression model, the estimates represent marginal effects relative to the reference level.

 

Linear Models with Sum Contrast

What if you’re not interested in marginal effects and want to see if there are main effects of Gender or Strategy? This is where you can use sum contrasts (also known as deviation coding). As mentioned earlier, a default regression model uses dummy coding. This means that if there are 2 levels within a categorical factor, they would be coded as 0 (first or reference level) and 1 (second level). In sum contrasts, these numerical codings are changed such that the first level becomes 1 and the second level becomes -1 (this is what R’s contr.sum does, and it is the coding used for the models below; some people use 0.5 and -0.5 instead, in which case the estimate is the full difference between the levels rather than half of it). Although in a sum contrast coded model there isn’t an explicit 0 level, 0 is still used as the reference point; but rather than representing the mean of any particular condition, the value at 0 now represents the “mean” across the levels of that factor. If you are familiar with using continuous predictors in regression models, sum contrasts are similar to mean-centering a continuous predictor.
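To make the two coding schemes concrete, here is a small self-contained Python sketch that builds the design matrix under each coding and fits ordinary least squares by solving the normal equations. It is not how R fits the model internally, and the helper names (`solve`, `ols`, `design_row`) are my own. It assumes the first level of each factor is coded 1 and the second -1, which is the sum coding that reproduces the estimates in this post.

```python
# (Gender, Strategy, RT) rows copied from the data set shown earlier.
data = [
    ("Man", "Seek", 621.06016), ("Man", "Seek", 142.51360),
    ("Man", "Seek", 786.46244), ("Man", "Seek", 890.81441),
    ("Man", "Hide", 19.92714), ("Man", "Hide", 860.28039),
    ("Man", "Hide", 254.22702), ("Man", "Hide", 43.81335),
    ("Woman", "Seek", 201.76769), ("Woman", "Seek", 952.44865),
    ("Woman", "Seek", 252.15705), ("Woman", "Seek", 761.46274),
    ("Woman", "Hide", 727.73781), ("Woman", "Hide", 460.28642),
    ("Woman", "Hide", 170.63794), ("Woman", "Hide", 569.04296),
]

def solve(A, b):
    """Solve A x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(n):
            if r != col:
                f = M[r][col] / M[col][col]
                M[r] = [a - f * c for a, c in zip(M[r], M[col])]
    return [M[i][n] / M[i][i] for i in range(n)]

def ols(X, y):
    """Least-squares coefficients from the normal equations X'X b = X'y."""
    k = len(X[0])
    XtX = [[sum(r[i] * r[j] for r in X) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def design_row(gender, strategy, g_code, s_code):
    """Columns: intercept, Gender, Strategy, Gender x Strategy."""
    g, s = g_code[gender], s_code[strategy]
    return [1.0, g, s, g * s]

codings = {
    "dummy": ({"Woman": 0.0, "Man": 1.0}, {"Hide": 0.0, "Seek": 1.0}),
    "sum": ({"Woman": 1.0, "Man": -1.0}, {"Hide": 1.0, "Seek": -1.0}),
}

y = [rt for _, _, rt in data]
fits = {}
for name, (g_code, s_code) in codings.items():
    X = [design_row(g, s, g_code, s_code) for g, s, _ in data]
    fits[name] = ols(X, y)
    print(name, [round(b, 2) for b in fits[name]])
```

The data and the fitted condition means are identical in both cases; only the numeric codes, and therefore the meaning of each coefficient, change.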

Below is a linear model wherein sum contrasts have been applied to both categorical factors:

Dependent variable: RT

| Predictors | Estimates | CI | Statistic | p |
|---|---|---|---|---|
| (Intercept) | 482.16 | 298.09 – 666.24 | 5.71 | <0.001 |
| Gender1 | 29.78 | -154.30 – 213.85 | 0.35 | 0.731 |
| Strategy1 | -93.92 | -278.00 – 90.15 | -1.11 | 0.288 |
| Gender1 : Strategy1 | 63.90 | -120.17 – 247.98 | 0.76 | 0.464 |
| Observations | 16 | | | |
| R2 / R2 adjusted | 0.139 / -0.077 | | | |

 

Again, we have 4 predictors with names that are virtually identical to the names in the previous model, except that the notation after Gender and Strategy has changed to 1. The 1 here is the index of the contrast column, which for a two-level factor corresponds to the first level of that factor (Woman for Gender, Hide for Strategy). Let’s again break down what the estimates mean for each of these 4 predictors.

When sum contrasts are used for all categorical factors, we’ve effectively “mean-centered” all of these categorical variables. As such, the Intercept in a regression model that uses sum contrasts becomes the grand mean of all of the cells. We can confirm that this is the case by averaging across the means for all 4 conditions: (481.93 + 541.96 + 294.56 + 610.21)/4 = 482.16. As with the previous model/example, the remaining estimates need to be interpreted relative to the intercept/reference value.

The estimate for Gender1 is the “main effect” of Gender. I used quotation marks around the term “main effect” here because main effects are traditionally interpreted as the difference between levels of a given factor, collapsed across all other factors. I’ve plotted the main effect of Gender here:

 

As we can see, the main effect of Gender (Woman - Man) should be 511.94 - 452.39 = 59.56 (computed from the unrounded means). But that’s not what’s shown for the estimate for Gender1. In fact, the estimate (29.78) is exactly half of that. This is because this “main effect” needs to be interpreted relative to the intercept of the model. That is, 29.78 is the difference between the mean score for Women and the grand mean. We can test this by adding the estimate for Gender1 to the grand mean: 482.16 + 29.78 = 511.94. To get the mean for the Man condition, simply subtract the estimate from the grand mean.

Similarly, Strategy1 is the (halved) “main effect” of Strategy. Again, we can see that if we add this estimate to the grand mean, we get the group mean for the first level of Strategy (i.e., the Hide condition): 482.16 + (-93.92) = 388.24.

One thing worth noting is that the idea of a “halved main effect” only works if there are 2 levels for a given categorical variable; when there are more than 2 levels, this interpretation would not make sense.
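As a quick check of the grand mean and the “halved main effects”, here is an illustrative Python sketch starting from the rounded condition means quoted in the text. It assumes a balanced 2x2 design with the first level of each factor coded 1 and the second coded -1; the variable names are my own.

```python
# Condition means as reported in the text (rounded to 2 decimals).
cell = {
    ("Woman", "Hide"): 481.93, ("Woman", "Seek"): 541.96,
    ("Man", "Hide"): 294.56, ("Man", "Seek"): 610.21,
}

# Intercept under sum contrasts: the mean of the four cell means.
grand_mean = sum(cell.values()) / 4

# Means of each level, collapsed across the other factor.
woman_mean = (cell[("Woman", "Hide")] + cell[("Woman", "Seek")]) / 2
man_mean = (cell[("Man", "Hide")] + cell[("Man", "Seek")]) / 2
hide_mean = (cell[("Woman", "Hide")] + cell[("Man", "Hide")]) / 2
seek_mean = (cell[("Woman", "Seek")] + cell[("Man", "Seek")]) / 2

# Each estimate is the first level's mean minus the grand mean,
# which equals half of the difference between the two levels.
gender1 = woman_mean - grand_mean
strategy1 = hide_mean - grand_mean

print(f"grand mean {grand_mean:.2f}")
print(f"Gender1    {gender1:.2f}  (half of {woman_mean - man_mean:.2f})")
print(f"Strategy1  {strategy1:.2f}  (half of {hide_mean - seek_mean:.2f})")
```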

 

Compared to the previous regression model, the Gender1:Strategy1 interaction term is less straightforward to interpret. It is the difference between half the effect of Factor A (e.g., Woman vs. Man) at the first level of Factor B (e.g., the Hide level of Strategy) and half the main effect of Factor A. Let’s calculate this using an example. Half the difference between Woman and Man at the ‘Hide’ level of Strategy is (481.93 - 294.56)/2 = 93.68. Half the difference between Woman and Man collapsed across Strategy is (511.94 - 452.39)/2 = 29.78. Subtracting the halved main effect from the halved marginal effect gives 93.68 - 29.78 = 63.90. You can also try it yourself using the other factor (half the effect of Strategy for Women minus half the main effect of Strategy), and it will give you the same result of 63.90; if you use the second level of a factor instead of the first, you get the same magnitude with the opposite sign.

One thing to note is that regardless of whether it’s a default LM or one with sum contrasts, the t- and p-values for the interaction term remain the same even though the estimate changes. In both of the example models, the t-value for the interaction term is 0.76 and the p-value is 0.464. (This does not hold for the Gender and Strategy rows, which test different things under the two coding schemes.)
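The interaction calculation can be written out from several starting points. This is a hedged Python sketch using the rounded condition means from the text and assuming the 1/-1 coding with Woman and Hide as the first levels; the names are my own.

```python
# Condition means: w/m = Woman/Man, h/s = Hide/Seek.
wh, ws, mh, ms = 481.93, 541.96, 294.56, 610.21

# Halved main effects (collapsed across the other factor).
half_main_gender = ((wh + ws) / 2 - (mh + ms) / 2) / 2
half_main_strategy = ((wh + mh) / 2 - (ws + ms) / 2) / 2

# Halved effect of Gender at each level of Strategy.
half_gender_at_hide = (wh - mh) / 2
half_gender_at_seek = (ws - ms) / 2
# Halved effect of Strategy for Women.
half_strategy_at_woman = (wh - ws) / 2

# At the first level (coded 1): marginal effect minus main effect.
via_hide = half_gender_at_hide - half_main_gender
via_woman = half_strategy_at_woman - half_main_strategy
# At the second level (coded -1) the sign flips.
via_seek = half_main_gender - half_gender_at_seek
# Equivalent shortcut: a quarter of the "difference of differences".
via_cells = (wh - ws - mh + ms) / 4

for label, value in [("via Hide", via_hide), ("via Woman", via_woman),
                     ("via Seek", via_seek), ("via cells", via_cells)]:
    print(f"{label:10s} {value:.2f}")
```

The last line also shows how the two parameterizations relate: with this coding, the sum-contrast interaction is one quarter of the dummy-coded interaction (255.62 / 4).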

 

Regression Model Without Interaction Terms

Some people might take the approach of constructing 2 models (one with an interaction term and one without) in order to obtain estimates of both the interaction and the main effects without having to use sum contrasts. I would argue that this might not be the best approach for a few reasons.

Let’s take a look at a regression model without an interaction term:

Dependent variable: RT

| Predictors | Estimates | CI | Statistic | p |
|---|---|---|---|---|
| (Intercept) | 418.02 | 107.14 – 728.90 | 2.90 | 0.012 |
| Gender [Man] | -59.56 | -418.53 – 299.42 | -0.36 | 0.726 |
| Strategy [Seek] | 187.84 | -171.13 – 546.82 | 1.13 | 0.279 |
| Observations | 16 | | | |
| R2 / R2 adjusted | 0.098 / -0.041 | | | |

 

In a regression model without an interaction term, the estimate for the Gender [Man] predictor does in fact reflect the main effect of Gender. As demonstrated in the section above, the difference between the means of the two levels of Gender is 59.56 (the estimate is negative in this model because Gender = Man is the second level, which means that relative to the reference level, Men’s response time is 59.56 units lower).

Likewise, the estimate for Strategy [Seek] is the difference between the average score for Seek and the average score for Hide, collapsed across levels of Gender.

In fact, these two estimates may be a more accurate representation of the main effects of Gender and Strategy. So what’s wrong with using this approach?

The first thing to point out is the value for the Intercept. As explained previously, the intercept always reflects the estimated value when all predictors are 0 (i.e., when all factors are at their reference level). But if we look at the intercept value (418.02), it’s way off from the condition mean for the Woman/Hide condition (481.93). The reason for this difference is that in this additive model, the intercept was estimated based on the assumption that there is no interaction. Put differently, this model assumes that the effect of one factor is the same at every level of the other. Let’s make a graph based on the estimated means. The intercept is the estimate for the reference condition (Woman/Hide); adding Gender [Man] to the intercept gives the estimate for the Man/Hide condition; adding Strategy [Seek] to the intercept gives the estimate for the Woman/Seek condition; and we can obtain the estimate for the last condition by adding all 3 estimates (we don’t require any other information since we’ve assumed that the 2 categorical variables do not interact).

 

In the graph above, I’ve overlaid the estimated condition means (open circles, dotted lines) based on the additive model on top of the actual condition means. The estimated cell means are printed in brackets. Notice that the lines between the estimated condition means in the additive model are parallel to each other.

One thing you will notice is that if you add the interaction term we estimated in the sum contrast model to the intercept of the additive model, you do get the correct value for the Woman/Hide mean (418.02 + 63.90 = 481.93, up to rounding).

The second thing to point out is that while the estimated main effects may be correct, the t- and p-values are not the same in the models with and without the interaction.

For comparison’s sake, let’s look at this same linear model, but use sum contrasts for each of the categorical predictors. Let’s then compare it with a sum contrast model that includes an interaction term (the same model from the above section).
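The same relationship holds for all four cells, not just the reference cell. Here is a small Python sketch that uses the estimates from the additive model table and the sum-contrast interaction estimate (63.90) to rebuild the actual condition means. It assumes a balanced design and the 1/-1 coding (Woman and Hide coded 1); the names are my own, and the results match the reported means only up to rounding.

```python
# Estimates from the additive (no interaction) dummy-coded model.
intercept, gender_man, strategy_seek = 418.02, -59.56, 187.84
# Gender1:Strategy1 from the sum-contrast model with the interaction.
interaction_sum = 63.90

# Condition means implied by the additive model (the parallel lines).
additive = {
    ("Woman", "Hide"): intercept,
    ("Man", "Hide"): intercept + gender_man,
    ("Woman", "Seek"): intercept + strategy_seek,
    ("Man", "Seek"): intercept + gender_man + strategy_seek,
}

# Product of the two sum codes for each cell.
sign = {
    ("Woman", "Hide"): +1, ("Man", "Hide"): -1,
    ("Woman", "Seek"): -1, ("Man", "Seek"): +1,
}

# Adding (or subtracting) the interaction recovers the actual means.
rebuilt = {c: additive[c] + sign[c] * interaction_sum for c in additive}
for c in additive:
    print(f"{c[0]:5s}/{c[1]:4s} additive {additive[c]:7.2f}"
          f"  rebuilt {rebuilt[c]:7.2f}")
```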

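Dependent variable: RT in both models. The left block is the model without the interaction term; the right block is the model with it.

| Predictors | Estimates (no interaction) | CI | Statistic | p | Estimates (with interaction) | CI | Statistic | p |
|---|---|---|---|---|---|---|---|---|
| (Intercept) | 482.16 | 302.68 – 661.65 | 5.80 | <0.001 | 482.16 | 298.09 – 666.24 | 5.71 | <0.001 |
| Gender1 | 29.78 | -149.71 – 209.27 | 0.36 | 0.726 | 29.78 | -154.30 – 213.85 | 0.35 | 0.731 |
| Strategy1 | -93.92 | -273.41 – 85.57 | -1.13 | 0.279 | -93.92 | -278.00 – 90.15 | -1.11 | 0.288 |
| Gender1 : Strategy1 | | | | | 63.90 | -120.17 – 247.98 | 0.76 | 0.464 |
| Observations | 16 | | | | 16 | | | |
| R2 / R2 adjusted | 0.098 / -0.041 | | | | 0.139 / -0.077 | | | |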

 

We can see here that the intercepts for the two models are the same: both represent the grand mean. We can also see that the estimates for both Gender and Strategy are the same. However, we can also see that the confidence intervals as well as the t- and p-values, while close, are not the same between the two models.
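One way to see where the differing t-values come from is to compute the standard errors by hand. The Python sketch below is my own illustration, not the R code behind the tables. It relies on the fact that in a balanced design with 1/-1 codes the predictor columns are orthogonal, so each estimate is the mean of code times RT and every standard error is sqrt(MSE / n). The two models share the same estimates but have different residual sums of squares and residual degrees of freedom.

```python
import math

# (Gender, Strategy, RT) rows copied from the data set shown earlier.
data = [
    ("Man", "Seek", 621.06016), ("Man", "Seek", 142.51360),
    ("Man", "Seek", 786.46244), ("Man", "Seek", 890.81441),
    ("Man", "Hide", 19.92714), ("Man", "Hide", 860.28039),
    ("Man", "Hide", 254.22702), ("Man", "Hide", 43.81335),
    ("Woman", "Seek", 201.76769), ("Woman", "Seek", 952.44865),
    ("Woman", "Seek", 252.15705), ("Woman", "Seek", 761.46274),
    ("Woman", "Hide", 727.73781), ("Woman", "Hide", 460.28642),
    ("Woman", "Hide", 170.63794), ("Woman", "Hide", 569.04296),
]
n = len(data)
g = {"Woman": 1, "Man": -1}   # sum codes for Gender
s = {"Hide": 1, "Seek": -1}   # sum codes for Strategy

# Orthogonal columns: each coefficient is mean(code * RT).
b0 = sum(rt for _, _, rt in data) / n
b_gender = sum(g[a] * rt for a, _, rt in data) / n
b_strategy = sum(s[b] * rt for _, b, rt in data) / n
b_inter = sum(g[a] * s[b] * rt for a, b, rt in data) / n

def std_error(with_interaction):
    """Common standard error of the estimates for the chosen model."""
    k = 4 if with_interaction else 3          # number of coefficients
    sse = 0.0
    for a, b, rt in data:
        fitted = b0 + b_gender * g[a] + b_strategy * s[b]
        if with_interaction:
            fitted += b_inter * g[a] * s[b]
        sse += (rt - fitted) ** 2
    mse = sse / (n - k)                       # residual variance
    return math.sqrt(mse / n)                 # because X'X = n * I

for label, flag in [("no interaction", False), ("with interaction", True)]:
    se = std_error(flag)
    print(f"{label:17s} SE {se:6.2f}  t(Gender1) {b_gender / se:5.2f}"
          f"  t(Strategy1) {b_strategy / se:5.2f}")
```

Dropping the interaction moves its sum of squares into the residual but also frees up one degree of freedom, which is why the standard errors end up close but not identical.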

 

ANOVA vs Linear Regression

The last thing I want to highlight is the idea that ANOVA is just a special case of regression with categorical predictors. That is, the output of an ANOVA model is the same as if you ran a regression model and used sum contrasts. To illustrate, below is an ANOVA table using the same data set.

## Warning: Converting "Suj" to factor for ANOVA.
## Coefficient covariances computed by hccm()
## $ANOVA
##            Effect DFn DFd       SSn     SSd         F         p p<.05
## 1          Gender   1  12  14187.36 1370405 0.1242321 0.7306020      
## 2        Strategy   1  12 141138.04 1370405 1.2358802 0.2880435      
## 3 Gender:Strategy   1  12  65340.52 1370405 0.5721566 0.4639950      
##          ges
## 1 0.01024659
## 2 0.09337348
## 3 0.04550982
## 
## $`Levene's Test for Homogeneity of Variance`
##   DFn DFd     SSn      SSd         F         p p<.05
## 1   3  12 46565.4 506895.3 0.3674558 0.7778481

 

If you take the square root of the F values, you can see that they are identical to the (absolute) t-values for Gender1, Strategy1, and the interaction term in the regression model that uses sum contrasts:

## [1] 0.352 1.112 0.756

The p-values are also the same:

## [1] 0.731 0.288 0.464
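For completeness, the F values themselves can be rebuilt from the sums of squares in the ANOVA table, and their square roots compared with the regression t-values. This is a minimal Python sketch using the numbers printed above; note that the square root drops the sign, so Strategy comes out as 1.112 rather than -1.112.

```python
import math

# Sums of squares for each effect (SSn) from the ANOVA table above.
ss_effect = {
    "Gender": 14187.36,
    "Strategy": 141138.04,
    "Gender:Strategy": 65340.52,
}
ss_resid = 1370405      # SSd
df_effect = 1           # DFn
df_resid = 12           # DFd

t_from_f = []
for effect, ss in ss_effect.items():
    # F = mean square of the effect / mean square of the residual.
    f_value = (ss / df_effect) / (ss_resid / df_resid)
    t_from_f.append(math.sqrt(f_value))
    print(f"{effect:16s} F = {f_value:.4f}  sqrt(F) = {t_from_f[-1]:.3f}")
```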

Concluding Remarks

Hopefully with all of the explanations and calculations I’ve provided using these example models, I’ve demonstrated the utility of using sum contrasts to interpret main effects for regression models with categorical predictors.

Main takeaways:

  1. The intercept reflects the estimate when all predictors equal 0; for dummy coded variables, this refers to the reference, or first level, of all factors; for sum contrast coded variables, this refers to the mid-point between the categories.

  2. For a dummy coded LM that includes an interaction term, the estimate for each predictor represents the marginal effect of a given condition relative to the reference condition.

  3. We can switch from having estimates that reflect marginal effects to estimates that reflect main effects by using sum contrasts.