Multiple explanatory variables (cont'd)

M. Drew LaMar
April 27, 2016

Class announcements

Student evaluations on blackboard (http://evals.wm.edu)
Reading for Friday: https://qubeshub.org/collections/post/1233
Projects

Factorial designs using GLMs

Linear model	Other name	Example study design
*\( Y = \mu + A + B + AB \)**	Two-way, fixed- effects ANOVA	Factorial experiment
\( Y = \mu + A + b + A*b \)	Two-way, mixed- effects ANOVA	Factorial experiment
\( Y = \mu + X + A \)	Analysis of covariance (ANCOVA)	Observational study

The explanatory variables are called factors, as they represent treatments of direct interest.

\( \mathrm{A} \) and \( \mathrm{B} \) are called main effects; they represent effects of each factor alone, when averaged over the categories of the other factor.

Factorial designs using GLMs

Linear model	Other name	Example study design
*\( Y = \mu + A + B + AB \)**	Two-way, fixed- effects ANOVA	Factorial experiment
\( Y = \mu + A + b + A*b \)	Two-way, mixed- effects ANOVA	Factorial experiment
\( Y = \mu + X + A \)	Analysis of covariance (ANCOVA)	Observational study

The explanatory variables are called factors, as they represent treatments of direct interest.

\( \mathrm{A*B} \) is the interaction term.

Factorial designs using GLMs

Example 18.3: Interaction zone

Factorial designs using GLMs

Harley (2003) investigated how herbivores affect the abundance of plants living in the intertidal habitat of coastal Washington using field transplants of a red alga, Mazzaella parksii. The experiment also examined whether the effect of herbivores on the algae depended on where in the intertidal zone the plants were growing. Thirty-two study plots were established just above the low-tide mark, and another 32 plots were set at mid-height between the low- and high-tide marks. Using copper fencing, herbivores were excluded from a randomly chosen half of the plots at each height.

Factorial designs using GLMs

\[ \begin{align} \mathrm{ALGAE} = & \mathrm{CONSTANT} + \mathrm{HERBIVORY} + \mathrm{HEIGHT} +\\ & \mathrm{HERBIVORY*HEIGHT} \end{align} \]

Factorial designs using GLMs

\[ \begin{align} \mathrm{ALGAE} = & \mathrm{CONSTANT} + \mathrm{HERBIVORY} + \mathrm{HEIGHT} +\\ & \mathrm{HERBIVORY*HEIGHT} \end{align} \]

We need to examine the improvement in fit of the model to the data with and without each term (i.e. main effects and interaction effect).

Question: Does herbivory have an effect on mean algal cover?

Null Model (Type I): \[ \mathrm{ALGAE} = \mathrm{CONSTANT} \]

Alt Model: \[ \mathrm{ALGAE} = \mathrm{CONSTANT} + \mathrm{HERBIVORY} \]

Factorial designs using GLMs

\[ \begin{align} \mathrm{ALGAE} = & \mathrm{CONSTANT} + \mathrm{HERBIVORY} + \mathrm{HEIGHT} +\\ & \mathrm{HERBIVORY*HEIGHT} \end{align} \]

We need to examine the improvement in fit of the model to the data with and without each term (i.e. main effects and interaction effect).

Question: Does height have an effect on mean algal cover?

Null Model (Type I): \[ \mathrm{ALGAE} = \mathrm{CONSTANT} \]

Alt Model: \[ \mathrm{ALGAE} = \mathrm{CONSTANT} + \mathrm{HEIGHT} \]

Factorial designs using GLMs

\[ \begin{align} \mathrm{ALGAE} = & \mathrm{CONSTANT} + \mathrm{HERBIVORY} + \mathrm{HEIGHT} +\\ & \mathrm{HERBIVORY*HEIGHT} \end{align} \]

We need to examine the improvement in fit of the model to the data with and without each term (i.e. main effects and interaction effect).

Question: Does effect of herbivory on mean algal cover depend on height?

Null Model: \[ \mathrm{ALGAE} = \mathrm{CONSTANT} + \mathrm{HERBIVORY} + \mathrm{HEIGHT} \]

Alt Model: \[ \begin{align} \mathrm{ALGAE} = & \mathrm{CONSTANT} + \mathrm{HERBIVORY} + \mathrm{HEIGHT} +\\ & \mathrm{HERBIVORY*HEIGHT} \end{align} \]

Factorial designs using GLMs

All of these terms get tested at one time using the following notation in R.

algaeFullModel <- lm(sqrtArea ~ height * herbivores, data = algae)

Caution: height * herbivores refers to all terms, while height : herbivores refers to the interaction term, i.e. the following performs the same analysis:

algaeFullModel <- lm(sqrtArea ~ height + herbivores + height : herbivores, data = algae)

Factorial designs using GLMs

Let's load the data.

'data.frame':   64 obs. of  3 variables:
 $ height    : Factor w/ 2 levels "low","mid": 1 1 1 1 1 1 1 1 1 1 ...
 $ herbivores: Factor w/ 2 levels "minus","plus": 1 1 1 1 1 1 1 1 1 1 ...
 $ sqrtArea  : num  9.41 34.47 46.67 16.64 24.38 ...

Factorial designs using GLMs

anova(algaeFullModel)

Analysis of Variance Table

Response: sqrtArea
                  Df  Sum Sq Mean Sq F value   Pr(>F)   
height             1    89.0   88.97  0.3741 0.543096   
herbivores         1  1512.2 1512.18  6.3579 0.014360 * 
height:herbivores  1  2617.0 2616.96 11.0029 0.001549 **
Residuals         60 14270.5  237.84                    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Factorial designs using GLMs

algaeFullModel$coefficients

             (Intercept)                heightmid           herbivoresplus 
                32.91450                -10.43090                -22.51075 
heightmid:herbivoresplus 
                25.57809

Term	Estimate
(Intercept)	32.9145029
heightmid	-10.430905
herbivoresplus	-22.5107478
heightmid:herbivoresplus	25.5780939

Factorial designs using GLMs

Term	Estimate
(Intercept)	32.9145029
heightmid	-10.430905
herbivoresplus	-22.5107478
heightmid:herbivoresplus	25.5780939

plot of chunk unnamed-chunk-8

ANCOVA as a GLM

Linear model	Other name	Example study design
\( Y = \mu + A + B + A*B \)	Two-way, fixed- effects ANOVA	Factorial experiment
\( Y = \mu + A + b + A*b \)	Two-way, mixed- effects ANOVA	Factorial experiment
\( Y = \mu + X + A \)	Analysis of covariance (ANCOVA)	Observational study

\[ \begin{align} \mathrm{RESPONSE} = & \mathrm{CONSTANT} + \mathrm{COVARIATE} + \\ & \mathrm{TREATMENT} \end{align} \]

Used to investigate if linear regressions fit to data with two or more groups have the same slope.

For observational studies, the numerical explanatory is called a confounding variable.

ANCOVA as a GLM

Example 18.4: Mole-rat layabouts

ANCOVA as a GLM

“Mole rats are the only known mammals with distinct social castes. A single queen and a small number of males are the only reproducing individuals in a colony. Remaining individuals, called workers, gather food, defend the colony, care for the young, and maintain the burrows. Recently, it was discovered that there might be two worker castes in the Damaraland mole rat. "Frequent workers” do almost all of the work in the colony, whereas “infrequent workers” do little work except on rare occasions… Scantlebury et al. (2006) compared daily energy expenditures of wild mole rats during a dry season. Energy expenditure appears to vary with body mass in both groups, but infrequent workers are heavier than frequent workers.

ANCOVA as a GLM

Open circles: “frequent workers”
Solid circles: “infrequent workers”

Question: How different is mean energy expenditure between the two groups when adjusted for differences in body mass?

ANCOVA as a GLM

Step 0: Load data

moleRat <- read.csv(url("http://www.zoology.ubc.ca/~schluter/WhitlockSchluter/wp-content/data/chapter18/chap18e4MoleRatLayabouts.csv"))
moleRatSorted <- moleRat %>% arrange(lnMass)
str(moleRatSorted)

'data.frame':   35 obs. of  3 variables:
 $ caste   : Factor w/ 2 levels "lazy","worker": 2 2 2 2 2 2 2 2 2 2 ...
 $ lnMass  : num  3.85 3.95 3.99 3.99 4.11 ...
 $ lnEnergy: num  3.69 4.11 3.69 4.19 3.69 ...

ANCOVA as a GLM

Step 1: Test for interaction

moleRatNoInteractModel <- lm(lnEnergy ~ lnMass + caste, data = moleRatSorted)

moleRatFullModel <- lm(lnEnergy ~ lnMass * caste, data = moleRatSorted)

anova(moleRatNoInteractModel, moleRatFullModel)

Analysis of Variance Table

Model 1: lnEnergy ~ lnMass + caste
Model 2: lnEnergy ~ lnMass * caste
  Res.Df    RSS Df Sum of Sq      F Pr(>F)
1     32 2.8145                           
2     31 2.7249  1  0.089557 1.0188 0.3206

Discuss: Conclusion?

ANCOVA as a GLM

Step 2: Test for differences in castes, assuming no interaction

anova(moleRatNoInteractModel)

Analysis of Variance Table

Response: lnEnergy
          Df  Sum Sq Mean Sq F value    Pr(>F)    
lnMass     1 1.31061 1.31061 14.9013 0.0005178 ***
caste      1 0.63747 0.63747  7.2478 0.0111984 *  
Residuals 32 2.81450 0.08795                      
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Discuss: Conclusion?

ANCOVA as a GLM

plot of chunk unnamed-chunk-12

ANCOVA as a GLM

Effect size: \( \ \hat{Y}_{\mathrm{frequent}}-\hat{Y}_{\mathrm{infrequent}} = 0.39334 \)

ANCOVA as a GLM

Estimation of effect size

moleRatNoInteractModel


Call:
lm(formula = lnEnergy ~ lnMass + caste, data = moleRatSorted)

Coefficients:
(Intercept)       lnMass  casteworker  
   -0.09687      0.89282      0.39334

confint(moleRatNoInteractModel)

                  2.5 %    97.5 %
(Intercept) -2.01627751 1.8225407
lnMass       0.49961872 1.2860115
casteworker  0.09573442 0.6909503