Multiple explanatory variables: R code for Chapter 18 examples

Note: This document was converted to R-Markdown from this page by M. Drew LaMar. You can download the R-Markdown here.

Download the R code on this page as a single file here

New methods

Hover over a function argument for a short description of its meaning. The variable names are plucked from the examples further below.

Fit a linear model for two fixed factors:

algaeFullModel <- lm(sqrtArea ~ height * herbivores, data = algae)

Residual plot:

plot(residuals(moleRatNoInteractModel) ~ fitted(moleRatNoInteractModel))

Other new methods:

Linear model for randomized block design.
Linear model for one categorical and one numeric variable (ANCOVA).
Interaction plot.
Visualize fit of null and full models to the data.
Type I and Type III sums of squares.

Figure 18.1-1 Modeling with linear regression

Compare the fits of the null and univariate regression models to data on the relationship between stability of plant biomass production and the initial number of plant species assigned to plots.
The data are from Example 17.3.

Read and inspect data.

prairie <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter17/chap17e3PlantDiversityAndStability.csv"))
head(prairie)

##   nSpecies biomassStability
## 1        1             7.47
## 2        1             6.74
## 3        1             6.61
## 4        1             6.40
## 5        1             5.67
## 6        1             5.26

Take the log-transformation of stability.

prairie$logStability <- log(prairie$biomassStability)
head(prairie)

##   nSpecies biomassStability logStability
## 1        1             7.47     2.010895
## 2        1             6.74     1.908060
## 3        1             6.61     1.888584
## 4        1             6.40     1.856298
## 5        1             5.67     1.735189
## 6        1             5.26     1.660131

Fit the null model to the data, which in simple linear regression is a line of 0 slope.

prairieNullModel <- lm(logStability ~ 1, data = prairie)

Fit the full model, which includes the treatment variable.

prairieRegression <- lm(logStability ~ nSpecies, data = prairie)

Scatter plot to compare models visually.

plot(logStability ~ nSpecies, data = prairie, bty = "l", col="firebrick", pch = 1, las = 1, cex = 1.5, xlim = c(0,16), xaxp = c(0,16,8), xlab = "Species number treatment", ylab = "Log-transformed ecosystem stability")
abline(prairieNullModel, lty = 2)
abline(prairieRegression)

The F-test of improvement in fit of the full (regression) model.

anova(prairieRegression)

## Analysis of Variance Table
## 
## Response: logStability
##            Df  Sum Sq Mean Sq F value    Pr(>F)    
## nSpecies    1  5.5169  5.5169  45.455 2.733e-10 ***
## Residuals 159 19.2980  0.1214                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Figure 18.1-2. Generalizing linear regression

Compare the fits of the null and single-factor ANOVA model to data on phase shift in the circadian rhythm of melatonin production in participants given alternative light treatments.
The data are from Example 15.1.

Read and inspect data.

circadian <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter15/chap15e1KneesWhoSayNight.csv"))

Set the order of groups for tables and graphs.

circadian$treatment <- factor(circadian$treatment, levels = c("control", "knee","eyes"))

Fit the null model to the data. In single-factor ANOVA, this fits a constant (grand mean) to all groups.

circadianNullModel <- lm(shift ~ 1, data = circadian)

Fit the full model to the data, which includes the treatment effect.

circadianAnova <- lm(shift ~ treatment, data = circadian)

Strip chart to compare models visually. The dashed line is the null model. The value adjustAmount is used to control the width of the horizontal line segments.

par(bty="l")
adjustAmount <- 0.15
stripchart(shift ~ treatment, data = circadian, method = "jitter", vertical = TRUE, las = 1, pch = 1, xlab = "Light treatment", ylab = "Shift in circadian rhythm (h)", col = "firebrick", cex = 1.2)

xpts <- as.numeric(circadian$treatment)
ypts <- predict(circadianNullModel)
segments(xpts - adjustAmount, ypts, xpts + adjustAmount, ypts, lty = 2)

xpts <- as.numeric(circadian$treatment)
ypts <- predict(circadianAnova)
segments(xpts - adjustAmount, ypts, xpts + adjustAmount, ypts, col = "firebrick")

\(F\)-test of improvement in fit of the full model.

anova(circadianAnova)

## Analysis of Variance Table
## 
## Response: shift
##           Df Sum Sq Mean Sq F value   Pr(>F)   
## treatment  2 7.2245  3.6122  7.2894 0.004472 **
## Residuals 19 9.4153  0.4955                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Example 18.2. Zooplankton depredation

Analyze data from a randomized block experiment, which measured the effects of fish abundance (the factor of interest) on zooplankton diversity. Treatments were repeated at 5 locations in a lake (blocks).

Read and inspect data.

zooplankton <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter18/chap18e2ZooplanktonDepredation.csv"))
head(zooplankton)

##   treatment diversity block
## 1   control       4.1     1
## 2       low       2.2     1
## 3      high       1.3     1
## 4   control       3.2     2
## 5       low       2.4     2
## 6      high       2.0     2

Set the order of groups in tables and plots.

zooplankton$treatment <- factor(zooplankton$treatment, levels = c("control","low","high"))

Table of the data (Table 18.2-1). One measurement is available from each combination of treatment and block

tapply(zooplankton$diversity, list(Treatment = zooplankton$treatment,  Location = zooplankton$block), unique)

##          Location
## Treatment   1   2   3   4   5
##   control 4.1 3.2 3.0 2.3 2.5
##   low     2.2 2.4 1.5 1.3 2.6
##   high    1.3 2.0 1.0 1.0 1.6

A blocking variable is typically analyzed as a random effect. We need to load the nlme package.

library(nlme)

Fit the null model, which includes block but leaves out the treatment variable.

zoopNullModel <- lme(diversity ~ 1, random = ~ 1| block, data = zooplankton)

Fit the full model, which includes treatment.

zoopRBModel <- lme(diversity ~ treatment, random = ~ 1| block, data = zooplankton)

Visualize model fits, beginning with the null model. The result here differs from that in the book, which shows block means. Lines are coincident when we analyze the data using lme, which estimates hardly any variance among the block means.

interaction.plot(zooplankton$treatment, zooplankton$block, response = predict(zoopNullModel), ylim = range(zooplankton$diversity), trace.label = "Block", las = 1)

points(zooplankton$treatment, zooplankton$diversity, pch = as.numeric(zooplankton$block))

Visualize model fits, continuing with the full model. With the treatment term included, lme estimates that there is indeed variation among block means.

interaction.plot(zooplankton$treatment, zooplankton$block, response = predict(zoopRBModel), ylim = range(zooplankton$diversity), trace.label = "Block", las = 1)

points(zooplankton$treatment, zooplankton$diversity, pch = as.numeric(zooplankton$block))

\(F\)-test of improvement in fit of the full model. This represents the test of treatment effect. Notice that R will not test the random (block) effect of an lme model.

anova(zoopRBModel)

##             numDF denDF   F-value p-value
## (Intercept)     1     8 116.69518  <.0001
## treatment       2     8  16.36595  0.0015

Example 18.3 Intertidal interaction zone

Analyze data from a factorial experiment investigating the effects of herbivore presence, height above low tide, and the interaction between these factors, on abundance of a red intertidal alga using two-factor ANOVA.

Read and inspect the data.

algae <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter18/chap18e3IntertidalAlgae.csv"))
head(algae)

##   height herbivores  sqrtArea
## 1    low      minus  9.405573
## 2    low      minus 34.467736
## 3    low      minus 46.673485
## 4    low      minus 16.642139
## 5    low      minus 24.377498
## 6    low      minus 38.350604

Fit the null model having both main effects but no interaction term. Note that we use lm because both factors are fixed effects.

algaeNoInteractModel <- lm(sqrtArea ~ height + herbivores, data = algae)

Fit the full model, with interaction term included.

algaeFullModel <- lm(sqrtArea ~ height * herbivores, data = algae)

Visualize the model fits, beginning with the no-interaction model.

interaction.plot(algae$herbivores, algae$height, response = predict(algaeNoInteractModel), ylim = range(algae$sqrtArea), trace.label = "Height", las = 1, ylab = "Square root surface area (cm)", xlab = "Herbivore treatment")

adjustAmount = 0.05

points(sqrtArea ~ c(jitter(as.numeric(herbivores), factor = 0.2) + adjustAmount), data = subset(algae, height == "low"))

points(sqrtArea ~ c(jitter(as.numeric(herbivores), factor = 0.2) - adjustAmount), data = subset(algae, height == "mid"), pch = 16)

Visualize the model fits, continuing with the full model including the interaction term.

interaction.plot(algae$herbivores, algae$height, response = predict(algaeFullModel), ylim = range(algae$sqrtArea), trace.label = "Height", las = 1, ylab = "Square root surface area (cm)", xlab = "Herbivore treatment")

adjustAmount = 0.05

points(sqrtArea ~ c(jitter(as.numeric(herbivores), factor = 0.2) + adjustAmount), data = subset(algae, height == "low"))

points(sqrtArea ~ c(jitter(as.numeric(herbivores), factor = 0.2) - adjustAmount), data = subset(algae, height == "mid"), pch = 16)

Test the improvement in fit of the model including the interaction term.

anova(algaeNoInteractModel, algaeFullModel)

## Analysis of Variance Table
## 
## Model 1: sqrtArea ~ height + herbivores
## Model 2: sqrtArea ~ height * herbivores
##   Res.Df   RSS Df Sum of Sq      F   Pr(>F)   
## 1     61 16888                                
## 2     60 14270  1      2617 11.003 0.001549 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Test all terms in the model in a single ANOVA table. Most commonly, this is done using either “Type III” sums of squares (see footnote 5 on p 618 of the book) or “Type I” sums of squares (which is the default in R). In the present example the two methods give the same answer, because the design is completely balanced, but this will not generally happen when the design is not balanced.

Here is how to test all model terms using “Type III” sums of squares. We need to include a contrasts argument for the two categorical variables in the lm command. Then we need to load the car package and use its Anova command. Note that “A” is in upper case in Anova - a very subtle difference.

algaeFullModelTypeIII <- lm(sqrtArea ~ height * herbivores, data = algae, contrasts = list(height = contr.sum, herbivores = contr.sum))

library(car)

Anova(algaeFullModelTypeIII, type = "III") # note "A" in Anova is capitalized

## Anova Table (Type III tests)
## 
## Response: sqrtArea
##                   Sum Sq Df  F value    Pr(>F)    
## (Intercept)        33381  1 140.3509 < 2.2e-16 ***
## height                89  1   0.3741  0.543096    
## herbivores          1512  1   6.3579  0.014360 *  
## height:herbivores   2617  1  11.0029  0.001549 ** 
## Residuals          14271 60                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here is how we test all model terms using “Type I” (sequential) sums of squares. Note that “a” is in lower case in anova.

anova(algaeFullModel)

## Analysis of Variance Table
## 
## Response: sqrtArea
##                   Df  Sum Sq Mean Sq F value   Pr(>F)   
## height             1    89.0   88.97  0.3741 0.543096   
## herbivores         1  1512.2 1512.18  6.3579 0.014360 * 
## height:herbivores  1  2617.0 2616.96 11.0029 0.001549 **
## Residuals         60 14270.5  237.84                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

A residual plot from the full model.

plot( residuals(algaeFullModel) ~ fitted(algaeFullModel) )
abline(0,0)

A normal quantile plot of the residuals.

qqnorm(residuals(algaeNoInteractModel), pch = 16, col = "firebrick", las = 1, ylab = "Residuals", xlab = "Normal quantile", main = "")

Example 18.4 Mole-rat layabouts

Analyze a factor while adjusting for a covariate, comparing energy expenditure of two castes of naked mole-rat while adjusting for differences in body mass using analysis of covariance ANCOVA.

Read and inspect the data.

moleRat <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter18/chap18e4MoleRatLayabouts.csv"))
head(moleRat)

##    caste   lnMass lnEnergy
## 1 worker 3.850148 3.688879
## 2 worker 3.988984 3.688879
## 3 worker 4.110874 3.688879
## 4 worker 4.174387 3.663562
## 5 worker 4.248495 3.871201
## 6 worker 4.262680 3.850148

We are going to sort the data according to the value of the ln of body mass. This simplifies graphing of the model fits. The graph commands below assume that the data are sorted in this way.

moleRatSorted <- moleRat[ order(moleRat$lnMass), ]

Scatter plot of the data.

plot(lnEnergy ~ lnMass, data = moleRat, type = "n", las = 1, bty = "l")

points(lnEnergy ~ lnMass, data = subset(moleRatSorted, caste == "worker"), pch = 1, col = "firebrick")
points(lnEnergy ~ lnMass, data = subset(moleRatSorted, caste == "lazy"), pch = 16, col = "firebrick")

Fit models to the data, beginning with the model lacking an interaction term. Use lm because caste and mass are fixed effects. Save the predicted values in the data frame.

moleRatNoInteractModel <- lm(lnEnergy ~ lnMass + caste, data = moleRatSorted)
moleRatSorted$predictedNoInteract <- predict(moleRatNoInteractModel)

Fit the full model, which includes the interaction term. Again, save the predicted values in the data frame.

moleRatFullModel <- lm(lnEnergy ~ lnMass * caste, data = moleRatSorted)
moleRatSorted$predictedInteract <- predict(moleRatFullModel)

Visualize the model fits, beginning with the fit of the no-interaction model. Redraw the scatter plot (see commands above), if necessary, before issuing the following commands.

# BEGIN: Redrawing scatter plot
plot(lnEnergy ~ lnMass, data = moleRat, type = "n", las = 1, bty = "l")

points(lnEnergy ~ lnMass, data = subset(moleRatSorted, caste == "worker"), pch = 1, col = "firebrick")
points(lnEnergy ~ lnMass, data = subset(moleRatSorted, caste == "lazy"), pch = 16, col = "firebrick")
# END: Redrawing scatter plot

lines(predictedNoInteract ~ lnMass, data = subset(moleRatSorted, caste == "worker"), lwd = 1.5)
lines(predictedNoInteract ~ lnMass, data = subset(moleRatSorted, caste == "lazy"), lwd = 1.5)

Visualize the fit of the full model, including the interaction term. Redraw the scatter plot, if necessary, before issuing the following commands.

# BEGIN: Redrawing scatter plot
plot(lnEnergy ~ lnMass, data = moleRat, type = "n", las = 1, bty = "l")

points(lnEnergy ~ lnMass, data = subset(moleRatSorted, caste == "worker"), pch = 1, col = "firebrick")
points(lnEnergy ~ lnMass, data = subset(moleRatSorted, caste == "lazy"), pch = 16, col = "firebrick")
# END: Redrawing scatter plot

lines(predictedInteract ~ lnMass, data = subset(moleRatSorted, caste == "worker"), lwd = 1.5, lty = 2)
lines(predictedInteract ~ lnMass, data = subset(moleRatSorted, caste == "lazy"), lwd = 1.5, lty = 2)

Test the improvement in fit of the full model, including the interaction term. This is a test of the interaction term only.

anova(moleRatNoInteractModel, moleRatFullModel)

## Analysis of Variance Table
## 
## Model 1: lnEnergy ~ lnMass + caste
## Model 2: lnEnergy ~ lnMass * caste
##   Res.Df    RSS Df Sum of Sq      F Pr(>F)
## 1     32 2.8145                           
## 2     31 2.7249  1  0.089557 1.0188 0.3206

Test for differences in ln body mass between castes, assuming that no interaction term is present in the mole rat population (i.e., assuming that the two castes hve equal slopes). Most commonly this is done using either “Type III” sums of squares (see footnote 5 on p 618 of the book) or “Type I” sums of squares (the default in R). The two methods do not give identical answers here because the design is not balanced (in a balanced design, each value of the x-variable would have the same number of y-observations from each group).

Test using “Type III” sums of squares. We need to include a contrasts argument for the categorical variable in the lm command. Then we need to load the car package and use its Anova command. Note that “A” is in upper case in Anova().

moleRatNoInteractModelTypeIII <- lm(lnEnergy ~ lnMass + caste, data = moleRat, contrasts = list(caste = contr.sum))

library(car)
Anova(moleRatNoInteractModelTypeIII, type = "III") # note "A" in Anova is capitalized

## Anova Table (Type III tests)
## 
## Response: lnEnergy
##              Sum Sq Df F value    Pr(>F)    
## (Intercept) 0.00111  1  0.0126    0.9112    
## lnMass      1.88152  1 21.3923 5.887e-05 ***
## caste       0.63747  1  7.2478    0.0112 *  
## Residuals   2.81450 32                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Test using “Type I” (sequential) sums of squares. Make sure that the covariate (lnMass) comes before the factor (caste) in the lm formula, as shown. Note that “a” is in lower case in anova.

moleRatNoInteractModel <- lm(lnEnergy ~ lnMass + caste, data = moleRat)
anova(moleRatNoInteractModel)

## Analysis of Variance Table
## 
## Response: lnEnergy
##           Df  Sum Sq Mean Sq F value    Pr(>F)    
## lnMass     1 1.31061 1.31061 14.9013 0.0005178 ***
## caste      1 0.63747 0.63747  7.2478 0.0111984 *  
## Residuals 32 2.81450 0.08795                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual plot from the linear model.

plot( residuals(moleRatNoInteractModel) ~ fitted(moleRatNoInteractModel) )
abline(0,0)

Normal quantile plot of residuals.

qqnorm(residuals(moleRatNoInteractModel), pch = 16, col = "firebrick", las = 1, ylab = "Residuals", xlab = "Normal quantile", main = "")

Multiple explanatory variables: R code for Chapter 18 examples

Michael Whitlock and Dolph Schluter

New methods

Figure 18.1-1 Modeling with linear regression

Figure 18.1-2. Generalizing linear regression

Example 18.2. Zooplankton depredation

Example 18.3 Intertidal interaction zone

Example 18.4 Mole-rat layabouts