Comparing coefficients in linear models

This tutorial shows a few examples on how to interpret and compare coefficients of fitted linear models. It might be useful to also look through the notes of a previous R session:

2013-03-28 : A linear model exercise

Pupae data (lm)

The pupae data are described in the R manual, and we used this dataset many times during the R course. We first read the data and make sure Gender and CO2 treatment are factors (they are both coded as numbers).

pupae <- read.csv("pupae.csv")
pupae$Gender <- as.factor(pupae$Gender)
pupae$CO2_treatment <- as.factor(pupae$CO2_treatment)

The first question we ask is whether frass production by the pupae depends on Gender. Naively, we perform a t-test.

# Frass differs by Gender, but this is fishy?
t.test(Frass ~ Gender, data=pupae)

## 
##  Welch Two Sample t-test
## 
## data:  Frass by Gender
## t = -4.056, df = 67.94, p-value = 0.0001309
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.5499 -0.1873
## sample estimates:
## mean in group 0 mean in group 1 
##           1.670           2.038

And yes, it is different. However, we then realize an important covariate is missing, pupal weight. It turns out that this covariate explains a lot of variation in frass production, and we might want to rephrase our question to : does Gender have an effect on Frass production in addition to pupal weight?

# Always first visualize the data
palette(c("blue","red"))
with(pupae, plot(PupalWeight, Frass, pch=19, col=Gender))
legend("topleft", levels(pupae$Gender), col=palette(), pch=19)

plot of chunk unnamed-chunk-3

# Fit an Ancova type model, where Gender is a factor and PupalWeight a continuous covariate.
fit <- lm(Frass ~ Gender * PupalWeight,data=pupae)
anova(fit)

## Analysis of Variance Table
## 
## Response: Frass
##                    Df Sum Sq Mean Sq F value  Pr(>F)    
## Gender              1   2.59   2.593   22.41 1.1e-05 ***
## PupalWeight         1   3.01   3.012   26.04 2.6e-06 ***
## Gender:PupalWeight  1   0.06   0.061    0.53    0.47    
## Residuals          73   8.45   0.116                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Although the interaction is not significant, we leave it in the model for the current exercise.

The anova result shows that Gender is significant, but Gender:PupalWeight is not, which means Gender has an effect on the intercept of the relationship, but not on the slope. The intercept is obviously the Frass production when PupalWeight is zero, but why would we be interested in that? It would make more sense to test the difference at a more common pupal weight, for example, around the mean of the dataset.

This is what ‘least-square means’ do, as well as effect from the effects package. But first, we estimate this mean by hand, using predict.

# Make dataframe with values for the predictors that we want to esimate Frass at.
# This includes all these combinations (expand.grid gives all combinations of the vectors you give it).
pdata <- with(pupae, 
              expand.grid(Gender = levels(Gender), PupalWeight = mean(PupalWeight)))         

# The predict method, with se=TRUE, gives standard errors as well as estimates at mean pupal weight.
predict(fit, newdata = pdata, se = TRUE)

## $fit
##     1     2 
## 1.810 1.838 
## 
## $se.fit
##       1       2 
## 0.06654 0.07756 
## 
## $df
## [1] 73
## 
## $residual.scale
## [1] 0.3401

This shows the estimates of frass production for both Genders (fit), as well as the standard error (se). That’s pretty cool, but this does not allow a direct hypothesis test. You could of course construct confidence intervals using the degrees of freedom and standard errors shown above, and assuming a t-distribution, but we don’t have to do this by hand.

Instead let’s look at lsmeans from the lsmeans package (Warning : the lmerTest package also has an lsmeans function so be careful to use the right one!). We also use the effects package to calculate pretty much the same thing (the underlying details differ a bit).

# Least-square means ('marginal means')
library(lsmeans)
lsmeans(fit, "Gender")

## NOTE: Results may be misleading due to involvement in interactions

##  Gender lsmean      SE df lower.CL upper.CL
##  0       1.802 0.06527 73    1.672    1.932
##  1       1.828 0.07924 73    1.670    1.986
## 
## Confidence level used: 0.95

# Or adjusted effects (effects package)
library(effects)

## Loading required package: lattice
## Loading required package: grid
## Loading required package: colorspace

as.data.frame(effect("Gender", fit))

## NOTE: Gender is not a high-order term in the model

##   Gender   fit      se lower upper
## 1      0 1.802 0.06518 1.672 1.932
## 2      1 1.827 0.07936 1.669 1.985

Note we have to use as.data.frame in the above to also output the confidence intervals (this is not obvious!). It is clear that the confidence intervals overlap a lot, so Gender has clearly no effect on frass production. Contrast this again with the anova statement above, which would have led to a much different conclusion!

More interactions

We oversimplified the analysis a bit in the above, because the pupae were also subjected to CO2 and temperature treatments. We’d like to know whether those treatments had an effect on frass production, after accounting for pupal weight, and whether there might be an interaction.

First let’s look at the treatments. Recall from above that although CO2_treatment is represented by numbers (the CO2 concentration in ppm), it really is a factor.

with(pupae, tapply(Frass, list(CO2_treatment, T_treatment), mean, na.r=TRUE))

##     ambient elevated
## 280   1.954    1.480
## 400   2.122    1.912

For the purpose of illustration, I leave Gender in the model as well, although you could probably get rid of it (since we showed above it did almost nothing).

# A very full model
lmfit <- lm(Frass ~ PupalWeight * CO2_treatment * Gender * T_treatment, data=pupae)
anova(lmfit)

## Analysis of Variance Table
## 
## Response: Frass
##                                              Df Sum Sq Mean Sq F value
## PupalWeight                                   1   5.59    5.59   97.22
## CO2_treatment                                 1   1.65    1.65   28.65
## Gender                                        1   0.08    0.08    1.36
## T_treatment                                   1   1.13    1.13   19.65
## PupalWeight:CO2_treatment                     1   0.20    0.20    3.53
## PupalWeight:Gender                            1   0.06    0.06    1.06
## CO2_treatment:Gender                          1   0.35    0.35    6.01
## PupalWeight:T_treatment                       1   0.01    0.01    0.23
## CO2_treatment:T_treatment                     1   0.97    0.97   16.95
## Gender:T_treatment                            1   0.02    0.02    0.26
## PupalWeight:CO2_treatment:Gender              1   0.17    0.17    2.96
## PupalWeight:CO2_treatment:T_treatment         1   0.04    0.04    0.66
## PupalWeight:Gender:T_treatment                1   0.00    0.00    0.02
## CO2_treatment:Gender:T_treatment              1   0.01    0.01    0.18
## PupalWeight:CO2_treatment:Gender:T_treatment  1   0.33    0.33    5.70
## Residuals                                    61   3.51    0.06        
##                                               Pr(>F)    
## PupalWeight                                  3.0e-14 ***
## CO2_treatment                                1.4e-06 ***
## Gender                                       0.24748    
## T_treatment                                  3.9e-05 ***
## PupalWeight:CO2_treatment                    0.06493 .  
## PupalWeight:Gender                           0.30661    
## CO2_treatment:Gender                         0.01714 *  
## PupalWeight:T_treatment                      0.63073    
## CO2_treatment:T_treatment                    0.00012 ***
## Gender:T_treatment                           0.60884    
## PupalWeight:CO2_treatment:Gender             0.09035 .  
## PupalWeight:CO2_treatment:T_treatment        0.41993    
## PupalWeight:Gender:T_treatment               0.89626    
## CO2_treatment:Gender:T_treatment             0.67629    
## PupalWeight:CO2_treatment:Gender:T_treatment 0.02005 *  
## Residuals                                               
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

That’s a monster of an anova table, and a bunch of things are significant, but what does it all mean? I will focus on the CO2 x T interaction, which turns out to be significant. But how, and what does it mean? And of the four CO2 x T combinations, which are different from which?

# First the effects package, useful to illustrate but we don't get confidence intervals!
# These effects are calculated by keeping other covariates in the model constant,
# at the average values of those covariates. 
# multiline=TRUE allows overplotting of multiple lines.
library(effects)
plot(effect("CO2_treatment:T_treatment", lmfit), multiline=TRUE, ylim=c(0,3))

plot of chunk unnamed-chunk-8

# Perhaps nicer is the visreg package
library(visreg)
visreg(lmfit, "CO2_treatment", by="T_treatment", overlay=TRUE)

plot of chunk unnamed-chunk-8

Those results already tell us visually what’s going on, but we’d like p-values to place in our manuscript. We can do a multiple comparison using least-square means as in the first example. This time, we have to realize that all other covariates in the model will be accounted for first. This is done differently for factors and numeric variables in the model. For factors, the average response is taken (first, an estimate at the first level is taken, than at the second, and the estimates are averaged). For continuous variables, like we saw with PupalWeight above, the fitted model is evaluated around (ca.) the mean of the variable in the dataset.

It is important to know this, so we can interpret the following results accordingly.

# Calculate the least-square means for two factors
lsm <- lsmeans(lmfit, c("CO2_treatment","T_treatment"))

## NOTE: Results may be misleading due to involvement in interactions

lsm

##  CO2_treatment T_treatment lsmean      SE df lower.CL upper.CL
##  280           ambient      2.308 0.16831 61    1.972    2.645
##  400           ambient      1.860 0.07219 61    1.715    2.004
##  280           elevated     1.496 0.06136 61    1.373    1.618
##  400           elevated     1.938 0.08421 61    1.770    2.106
## 
## Results are averaged over the levels of: Gender 
## Confidence level used: 0.95

# We can now make a 'compact letter display', useful for plotting!
cld(lsm, Letters=letters)

##  CO2_treatment T_treatment lsmean      SE df lower.CL upper.CL .group
##  280           elevated     1.496 0.06136 61    1.373    1.618  a    
##  400           ambient      1.860 0.07219 61    1.715    2.004   b   
##  400           elevated     1.938 0.08421 61    1.770    2.106   b   
##  280           ambient      2.308 0.16831 61    1.972    2.645   b   
## 
## Results are averaged over the levels of: Gender 
## Confidence level used: 0.95 
## P value adjustment: tukey method for a family of 4 means 
## significance level used: alpha = 0.05

Note that in the above, we make sure to use letters rather than numbers to code the differences. Many R users seem to be unaware that letters is a built-in object containing the alphabet (and LETTERS its capitalized version).

You should check whether those difference make sense when comparing it to the visreg plot above.

Memory data (glm)

The second example is for the memory data, which was also included in the R course. Basically, old and young people were tested on how many words they could remember, when a different process was used to memorize them. First let’s take a quick look at Process in the dataset

# Raw data is tab-delimied
memory <- read.table("eysenck.txt", header=T)
plot(Words ~ Process, data=memory)

plot of chunk unnamed-chunk-10

I prefer to reorder the levels so they are in order of number of words remembered. This is fine since Process in the dataset has no particular order.

memory$Process <- with(memory, reorder(Process, Words, mean))
plot(Words ~ Process, data=memory)

plot of chunk unnamed-chunk-11

We may ask, which processes are different from which? What about the interaction with Age (which is a factor with two levels, younger and older).

First a simplified example, using only the Younger data. It is a good idea for complex data to first analyze a subset so you are confident you know what you are doing.

In this example I fit a generalized linear model (glm), mostly to show this works for lm as well as glm (and even mostly lmer as will see soon).

# assume first we only have Age=younger in the model
youngones <- subset(memory, Age == "Younger")

# For count data, the poisson distribution is more appropriate.
m0 <- glm(Words ~ Process, data=youngones, family=poisson)

# A standard Tukey test on Process (a `multiple comparison`)
# Don't forget the summary bit otherwise it prints very little info.
library(multcomp)

## Loading required package: mvtnorm
## Loading required package: survival
## Loading required package: splines
## Loading required package: TH.data

summary(glht(m0, linfct=mcp(Process = "Tukey")))

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: glm(formula = Words ~ Process, family = poisson, data = youngones)
## 
## Linear Hypotheses:
##                              Estimate Std. Error z value Pr(>|z|)    
## Rhyming - Counting == 0        0.1563     0.1689    0.93     0.88    
## Adjective - Counting == 0      0.8228     0.1488    5.53   <1e-04 ***
## Imagery - Counting == 0        0.9961     0.1451    6.86   <1e-04 ***
## Intentional - Counting == 0    1.0883     0.1434    7.59   <1e-04 ***
## Adjective - Rhyming == 0       0.6665     0.1411    4.72   <1e-04 ***
## Imagery - Rhyming == 0         0.8398     0.1373    6.12   <1e-04 ***
## Intentional - Rhyming == 0     0.9320     0.1354    6.88   <1e-04 ***
## Imagery - Adjective == 0       0.1733     0.1115    1.55     0.52    
## Intentional - Adjective == 0   0.2655     0.1093    2.43     0.10    
## Intentional - Imagery == 0     0.0922     0.1042    0.88     0.90    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

This shows all we really need to know, each level of the single factor in the model (Process) is tested against all other levels. We can get a compact letter display with cld, as we did for lsmeans above.

cldm0 <- cld(glht(m0, linfct=mcp(Process = "Tukey")))
cldm0

##    Counting     Rhyming   Adjective     Imagery Intentional 
##         "a"         "a"         "b"         "b"         "b"

Although this result looks like a nice character vector, it is far from that (check str(cldm0) to see for yourself). After some digging we find the actual significance letters are buried in there, and we can use them to label a boxplot.

plot(Words ~ Process, data=youngones, ylim=c(0,30), at=1:5)
text(1:5, 27.5, cldm0$mcletters$Letters, font=2)

plot of chunk unnamed-chunk-14

Including the Age interaction

Now that we are comfortable performing a multiple comparison on the levels of a single factor, we will add another factor as an interaction.

This time, we’d like to ask whether Younger is different from Older, and whether the interaction is significant (which we would interpret as age effects on memory differ by process).

We first fit the model and inspect the anova. If you were naive, that’s where you would stop.

m1 <- glm(Words ~ Age*Process, data=memory, family=poisson)
summary(m1)

## 
## Call:
## glm(formula = Words ~ Age * Process, family = poisson, data = memory)
## 
## Deviance Residuals: 
##    Min      1Q  Median      3Q     Max  
## -2.290  -0.492  -0.199   0.562   2.377  
## 
## Coefficients:
##                               Estimate Std. Error z value Pr(>|z|)    
## (Intercept)                     1.9459     0.1195   16.28  < 2e-16 ***
## AgeYounger                     -0.0741     0.1723   -0.43  0.66703    
## ProcessRhyming                 -0.0144     0.1696   -0.08  0.93241    
## ProcessAdjective                0.4520     0.1529    2.96  0.00311 ** 
## ProcessImagery                  0.6493     0.1475    4.40  1.1e-05 ***
## ProcessIntentional              0.5390     0.1504    3.58  0.00034 ***
## AgeYounger:ProcessRhyming       0.1707     0.2394    0.71  0.47577    
## AgeYounger:ProcessAdjective     0.3708     0.2133    1.74  0.08218 .  
## AgeYounger:ProcessImagery       0.3468     0.2069    1.68  0.09378 .  
## AgeYounger:ProcessIntentional   0.5493     0.2078    2.64  0.00821 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for poisson family taken to be 1)
## 
##     Null deviance: 227.503  on 99  degrees of freedom
## Residual deviance:  60.994  on 90  degrees of freedom
## AIC: 501.3
## 
## Number of Fisher Scoring iterations: 4

library(car)

## 
## Attaching package: 'car'
## 
## The following object is masked from 'package:effects':
## 
##     Prestige

Anova(m1)

## Analysis of Deviance Table (Type II tests)
## 
## Response: Words
##             LR Chisq Df Pr(>Chisq)    
## Age             20.8  1    5.2e-06 ***
## Process        137.5  4    < 2e-16 ***
## Age:Process      8.3  4      0.082 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Note we used Anova from car, because the standard anova gives limited output. The naive conclusion here is that Age has a significant effect on memory, but since the interaction is not significant (p > 0.08), we expect that age has the same additive effect on memory regardless of the process. If we dig a little deeper we find this conclusion to be incorrect.

# Visualize!
library(visreg)
visreg(m1, "Process", by="Age", overlay=TRUE)

plot of chunk unnamed-chunk-16

This clearly shows that Age has a significant effect on memory (nr of words remembered) for some Processes, but not others. Just because the overall p-value for the interaction was not significant does not mean there are no individual differences between factor combinations!

Now let’s try to figure out differences between levels of Process when we keep the Age interaction in the model. As before, we can use lsmeans, which will average over all other predictors in the model. The next example shows visually what this means.

library(visreg)
visreg(m1, "Process", by="Age", overlay=TRUE)

# Extract the least-square means as a vector
l <- lsmeans::lsmeans(m1, "Process")

## NOTE: Results may be misleading due to involvement in interactions

lsmeans <- summary(l)$lsmean

# Plot them on top of the visreg plot (I had to guess at the appropriate X values here!)
points(seq(0.08,0.92,length=5), lsmeans, pch=4, cex=2, col="red", lwd=3)

plot of chunk unnamed-chunk-17

Note that I also make sure here to use lsmeans from the package of the same name, otherwise it might load that from lmerTest, if you happen to have that loaded.

The plot shows that the least-square means estimates are right in between the estimates for the two levels of Age, and confirms our expectation that lsmeans averages over the factor levels.

Multiple comparisons of all combinations

Next I would like to test all the combinations of Age and Process against each other. The most straightforward way to do this is to make a new variable using interaction, and treating that as a single factor variable as we have done in the above example.

# New variable, makes a factor with all combinations
memory$AgeProcess <- with(memory, interaction(Age,Process))

# Fit simple model
m2 <- lm(Words ~ AgeProcess, data=memory)

# Do Tukey test and store significance letters
gl <- glht(m2, linfct=mcp(AgeProcess="Tukey"))
summary(gl)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Multiple Comparisons of Means: Tukey Contrasts
## 
## 
## Fit: lm(formula = Words ~ AgeProcess, data = memory)
## 
## Linear Hypotheses:
##                                              Estimate Std. Error t value
## Younger.Counting - Older.Counting == 0          -0.50       1.27   -0.39
## Older.Rhyming - Older.Counting == 0             -0.10       1.27   -0.08
## Younger.Rhyming - Older.Counting == 0            0.60       1.27    0.47
## Older.Adjective - Older.Counting == 0            4.00       1.27    3.16
## Younger.Adjective - Older.Counting == 0          7.80       1.27    6.16
## Older.Imagery - Older.Counting == 0              6.40       1.27    5.05
## Younger.Imagery - Older.Counting == 0           10.60       1.27    8.37
## Older.Intentional - Older.Counting == 0          5.00       1.27    3.95
## Younger.Intentional - Older.Counting == 0       12.30       1.27    9.71
## Older.Rhyming - Younger.Counting == 0            0.40       1.27    0.32
## Younger.Rhyming - Younger.Counting == 0          1.10       1.27    0.87
## Older.Adjective - Younger.Counting == 0          4.50       1.27    3.55
## Younger.Adjective - Younger.Counting == 0        8.30       1.27    6.55
## Older.Imagery - Younger.Counting == 0            6.90       1.27    5.45
## Younger.Imagery - Younger.Counting == 0         11.10       1.27    8.76
## Older.Intentional - Younger.Counting == 0        5.50       1.27    4.34
## Younger.Intentional - Younger.Counting == 0     12.80       1.27   10.10
## Younger.Rhyming - Older.Rhyming == 0             0.70       1.27    0.55
## Older.Adjective - Older.Rhyming == 0             4.10       1.27    3.24
## Younger.Adjective - Older.Rhyming == 0           7.90       1.27    6.24
## Older.Imagery - Older.Rhyming == 0               6.50       1.27    5.13
## Younger.Imagery - Older.Rhyming == 0            10.70       1.27    8.45
## Older.Intentional - Older.Rhyming == 0           5.10       1.27    4.03
## Younger.Intentional - Older.Rhyming == 0        12.40       1.27    9.79
## Older.Adjective - Younger.Rhyming == 0           3.40       1.27    2.68
## Younger.Adjective - Younger.Rhyming == 0         7.20       1.27    5.68
## Older.Imagery - Younger.Rhyming == 0             5.80       1.27    4.58
## Younger.Imagery - Younger.Rhyming == 0          10.00       1.27    7.89
## Older.Intentional - Younger.Rhyming == 0         4.40       1.27    3.47
## Younger.Intentional - Younger.Rhyming == 0      11.70       1.27    9.23
## Younger.Adjective - Older.Adjective == 0         3.80       1.27    3.00
## Older.Imagery - Older.Adjective == 0             2.40       1.27    1.89
## Younger.Imagery - Older.Adjective == 0           6.60       1.27    5.21
## Older.Intentional - Older.Adjective == 0         1.00       1.27    0.79
## Younger.Intentional - Older.Adjective == 0       8.30       1.27    6.55
## Older.Imagery - Younger.Adjective == 0          -1.40       1.27   -1.11
## Younger.Imagery - Younger.Adjective == 0         2.80       1.27    2.21
## Older.Intentional - Younger.Adjective == 0      -2.80       1.27   -2.21
## Younger.Intentional - Younger.Adjective == 0     4.50       1.27    3.55
## Younger.Imagery - Older.Imagery == 0             4.20       1.27    3.32
## Older.Intentional - Older.Imagery == 0          -1.40       1.27   -1.11
## Younger.Intentional - Older.Imagery == 0         5.90       1.27    4.66
## Older.Intentional - Younger.Imagery == 0        -5.60       1.27   -4.42
## Younger.Intentional - Younger.Imagery == 0       1.70       1.27    1.34
## Younger.Intentional - Older.Intentional == 0     7.30       1.27    5.76
##                                              Pr(>|t|)    
## Younger.Counting - Older.Counting == 0         1.0000    
## Older.Rhyming - Older.Counting == 0            1.0000    
## Younger.Rhyming - Older.Counting == 0          1.0000    
## Older.Adjective - Older.Counting == 0          0.0633 .  
## Younger.Adjective - Older.Counting == 0        <0.001 ***
## Older.Imagery - Older.Counting == 0            <0.001 ***
## Younger.Imagery - Older.Counting == 0          <0.001 ***
## Older.Intentional - Older.Counting == 0        0.0057 ** 
## Younger.Intentional - Older.Counting == 0      <0.001 ***
## Older.Rhyming - Younger.Counting == 0          1.0000    
## Younger.Rhyming - Younger.Counting == 0        0.9970    
## Older.Adjective - Younger.Counting == 0        0.0205 *  
## Younger.Adjective - Younger.Counting == 0      <0.001 ***
## Older.Imagery - Younger.Counting == 0          <0.001 ***
## Younger.Imagery - Younger.Counting == 0        <0.001 ***
## Older.Intentional - Younger.Counting == 0      0.0014 ** 
## Younger.Intentional - Younger.Counting == 0    <0.001 ***
## Younger.Rhyming - Older.Rhyming == 0           0.9999    
## Older.Adjective - Older.Rhyming == 0           0.0512 .  
## Younger.Adjective - Older.Rhyming == 0         <0.001 ***
## Older.Imagery - Older.Rhyming == 0             <0.001 ***
## Younger.Imagery - Older.Rhyming == 0           <0.001 ***
## Older.Intentional - Older.Rhyming == 0         0.0043 ** 
## Younger.Intentional - Older.Rhyming == 0       <0.001 ***
## Older.Adjective - Younger.Rhyming == 0         0.1962    
## Younger.Adjective - Younger.Rhyming == 0       <0.001 ***
## Older.Imagery - Younger.Rhyming == 0           <0.001 ***
## Younger.Imagery - Younger.Rhyming == 0         <0.001 ***
## Older.Intentional - Younger.Rhyming == 0       0.0260 *  
## Younger.Intentional - Younger.Rhyming == 0     <0.001 ***
## Younger.Adjective - Older.Adjective == 0       0.0962 .  
## Older.Imagery - Older.Adjective == 0           0.6731    
## Younger.Imagery - Older.Adjective == 0         <0.001 ***
## Older.Intentional - Older.Adjective == 0       0.9986    
## Younger.Intentional - Older.Adjective == 0     <0.001 ***
## Older.Imagery - Younger.Adjective == 0         0.9830    
## Younger.Imagery - Younger.Adjective == 0       0.4578    
## Older.Intentional - Younger.Adjective == 0     0.4578    
## Younger.Intentional - Younger.Adjective == 0   0.0211 *  
## Younger.Imagery - Older.Imagery == 0           0.0409 *  
## Older.Intentional - Older.Imagery == 0         0.9830    
## Younger.Intentional - Older.Imagery == 0       <0.001 ***
## Older.Intentional - Younger.Imagery == 0       0.0012 ** 
## Younger.Intentional - Younger.Imagery == 0     0.9410    
## Younger.Intentional - Older.Intentional == 0   <0.001 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

siglets <- cld(gl)

# Get some averages with tapply
x <- with(memory, tapply(Words,AgeProcess,mean))

# Make a barplot. Store the result of barplot into b, which contains the X values of the bars.
# We can then use that to overlay text.
par(mar=c(8,4,4,4), cex.axis=0.9)
b <- barplot(x, las=2, ylim=c(0,22), col=c("dimgrey","royalblue"))
text(as.vector(b), 20, siglets$mcletters$Letters)

plot of chunk unnamed-chunk-18

Multiple comparisons of some combinations

The previous figure is probably overkill, in practice we are not interested in comparing each and every combination of factor levels. More likely, given some research question, we want to make some comparisons but not others. We can use ‘contrasts’ to test parameter combinations. This is a bit tricky to set up, but offers the most flexibility.

We can write any linear model like,

\[Y = b_0 + b_1*X_1 + b_2*X_2 + b_3*X_3\]

Where the \(X_1\) and so on are predictors (these could be factor levels, continuous variables, or interactions).

If we want to test certain parameters against each other, we can use so-called contrasts, which are either -1, 0 or 1, and are multiplied with the coefficients. The resulting equation is then tested against zero. For example, suppose I want to test \(b_1\) against \(b_2\), I would multiply the coefficients with these contrasts:

\[0*b_0 + -1*b_1 + 1*b_2 + 0*b_3 = 0\]

If we simplify this equation, we find that we are testing,

\[b_2 - b_1 = 0\]

which is akin to testing \(b_1 = b_2\).

In the case of the memory linear model, we first want to refit the model so that each factor level is estimated by itself, rather than as the difference from the intercept (the first level of the factor). We can do that by simply removing the intercept from the model and refitting,

m3 <- lm(Words ~ AgeProcess-1, data=memory)

coef(m3)

##      AgeProcessOlder.Counting    AgeProcessYounger.Counting 
##                           7.0                           6.5 
##       AgeProcessOlder.Rhyming     AgeProcessYounger.Rhyming 
##                           6.9                           7.6 
##     AgeProcessOlder.Adjective   AgeProcessYounger.Adjective 
##                          11.0                          14.8 
##       AgeProcessOlder.Imagery     AgeProcessYounger.Imagery 
##                          13.4                          17.6 
##   AgeProcessOlder.Intentional AgeProcessYounger.Intentional 
##                          12.0                          19.3

The latter shows that we have 10 coefficients. Let’s test Younger vs. Older for each Process separately. We do this by setting up a so-called ‘contrast matrix’, which consists of a bunch of vectors of length 10 (in this case). We insert -1 and 1 for the two coefficients we want to test against each other, and keep the rest as 0.

In this case our matrix becomes:

ctr <- rbind(counting =    c(-1,1,0,0,0,0,0,0,0,0),
             rhyming =     c(0,0,-1,1,0,0,0,0,0,0),
             adjective =   c(0,0,0,0,-1,1,0,0,0,0),
             imagery =     c(0,0,0,0,0,0,-1,1,0,0),
             intentional = c(0,0,0,0,0,0,0,0,-1,1)
             )

We can now send this matrix to glht.

glres <- glht(m3, ctr)
summary(glres)

## 
##   Simultaneous Tests for General Linear Hypotheses
## 
## Fit: lm(formula = Words ~ AgeProcess - 1, data = memory)
## 
## Linear Hypotheses:
##                  Estimate Std. Error t value Pr(>|t|)    
## counting == 0       -0.50       1.27   -0.39   0.9972    
## rhyming == 0         0.70       1.27    0.55   0.9867    
## adjective == 0       3.80       1.27    3.00   0.0173 *  
## imagery == 0         4.20       1.27    3.32   0.0066 ** 
## intentional == 0     7.30       1.27    5.76   <1e-04 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## (Adjusted p values reported -- single-step method)

Here you can make sure you set up the contrast matrix correctly by looking at the Estimate column, and comparing that to the coef(m3) statement from the above, to make sure you are comparing the right coefficients.

Finally we may use this information in an enhanced barplot.

# Extract P-values
Pvals <- summary(glres)$test$pvalues

# Set up '*' for significant, '' otherwise.
sigcode <- ifelse(Pvals < 0.05, "*", "")

# Get means by Age and Process as a matrix
# We can send this to barplot, which will make grouped bars.
grmeans <- with(memory, tapply(Words, list(Age,Process),mean))

# Grouped bar plot
par(mar=c(8,4,4,4), cex.axis=0.9)
Cols <- c("dimgrey","royalblue")
b <- barplot(grmeans, beside=TRUE, las=2, ylim=c(0,25), col=Cols)
X <- apply(b, 2, mean)   # get average X value of bar pairs.
text(X, 22.5,sigcode,font=2,cex=2)
legend("topleft", levels(memory$Age), fill=Cols,title="Age")

plot of chunk unnamed-chunk-22