2024-01-29
In a traditional framework, one would have
yield ~ fertiliseryield ~ block + fertilisersq.blooms ~ water + shadeb.weight ~ av.foodb.weight ~ av.food + g.size + areaAnd what if one categorical EV and one continuous EV?
LM brings these all under one framework.
This example is from Crawley (2015), Statistics: an introduction using R. Wiley.
How does grazing affect seed production in scarlet gilia, Ipomopsis aggregata?
grazing is a two-level factor: grazed or ungrazed (protected by fences);root);seed.wt) at the end of the growing season.Ipomopsis <- read.csv(file = "data/Ipomopsis.csv", stringsAsFactors=T) names(Ipomopsis) # check variable names
## [1] "root" "seed.wt" "grazing"
levels(Ipomopsis$grazing) # check names of `grazing` levels
## [1] "Grazed" "Ungrazed"
# set "Ungrazed" as reference level, and set sum contrasts:
Ipomopsis <- within(Ipomopsis, {
grazing <- factor(grazing, levels = c("Ungrazed", "Grazed"))
contrasts(grazing) <- contr.sum
})
m1 <- lm(seed.wt ~ root, Ipomopsis) # LM for regression
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | -41.28612 | 10.72259 | -3.85039 | 0.00044 |
| root | 14.02235 | 1.46317 | 9.58354 | 0.00000 |
m2 <- lm(seed.wt ~ grazing, Ipomopsis)
| Estimate | Std. Error | t value | Pr(>|t|) | |
|---|---|---|---|---|
| (Intercept) | 59.4105 | 3.70219 | 16.04741 | 0.00000 |
| grazing1 | -8.5300 | 3.70219 | -2.30404 | 0.02678 |
This seems very strange… can this be right?
grazing on seed.wt.root size on seed.wt.Yo, I’ll tell you what I want, what I really, really want So tell me what you want, what you really, really want I’ll tell you what I want, what I really, really want So tell me what you want, what you really, really want I wanna, (ha) I wanna, (ha) I wanna, (ha) I wanna, (ha) I wanna really, really, really wanna zigazig AH-N-COVA
Spice Girls, Wannabe (1996)
We want to account for both EVs in a single model:
root) on seed.wt when comparing ungrazed and grazedgrazing groups, ungrazed and grazed.root is the “covariate” or “nuissance variable”: similar to blocking!LM framework has us covered again:
lm(seed.wt ~ root + grazing)
We get an ANCOVA simply by running anova() on the lm() fit.
m3 <- lm(seed.wt ~ root + grazing, Ipomopsis) anova(m3)
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| root | 1 | 16795.0 | 16795.0 | 368.9 | 0 |
| grazing | 1 | 5264.4 | 5264.4 | 115.6 | 0 |
| Residuals | 37 | 1684.5 | 45.5 |
m3.b <- lm(seed.wt ~ grazing + root, Ipomopsis) anova(m3.b)
| Df | Sum Sq | Mean Sq | F value | Pr(>F) | |
|---|---|---|---|---|---|
| grazing | 1 | 2910.4 | 2910.4 | 63.9 | 0 |
| root | 1 | 19148.9 | 19148.9 | 420.6 | 0 |
| Residuals | 37 | 1684.5 | 45.5 |
Use adjusted SSQ.
summary(m3)
## ## Call: ## lm(formula = seed.wt ~ root + grazing, data = Ipomopsis) ## ## Residuals: ## Min 1Q Median 3Q Max ## -17.1920 -2.8224 0.3223 3.9144 17.3290 ## ## Coefficients: ## Estimate Std. Error t value Pr(>|t|) ## (Intercept) -109.778 8.318 -13.20 1.45e-15 ## root 23.560 1.149 20.51 < 2e-16 ## grazing1 18.052 1.679 10.75 6.11e-13 ## ## Residual standard error: 6.747 on 37 degrees of freedom ## Multiple R-squared: 0.9291, Adjusted R-squared: 0.9252 ## F-statistic: 242.3 on 2 and 37 DF, p-value: < 2.2e-16
Remember that parameter estimates are independent of variable order in model, so m3 and m3.b give the same.
| Coef | In R output | Estimate | Std. Error | t value | Pr(>|t|) |
|---|---|---|---|---|---|
| \(\mu\) | (Intercept) | -110.0 | 8.32 | -13.2 | 0 |
| \(\alpha\) | root | 23.6 | 1.15 | 20.5 | 0 |
| \(\beta\) | grazing1 | 18.1 | 1.68 | 10.8 | 0 |
\[y = \mu + \alpha\times\mathtt{root} + \begin{bmatrix} \mathtt{grazing} & \mathrm{Deviation} \\ 1 & \beta \\ 2 & -\beta \end{bmatrix} + \epsilon\]
seed.wt in ungrazed and grazed plants (grand mean), adjusted for a root diameter of zero (!) — Whaaat? (Note the negative value: fictive…)seed.wt per unit change in root diameter (in either group).seed.wt in ungrazed \((+)\) and grazed \((-)\) plants from the grand mean, for any given root-stock diameter. Can we get a more informative value for \(\mu\) (instead of mean for a fictive root stock diameter of zero…)? — Yes:
if we subtract mean(root) from root before fitting the model, \(\mu\) will be the mean seed.wt for an average-sized root! root.c now gives deviation from average root size.
This is known as centering a continuous EV on its mean (“means-centering”):
Ipomopsis <- within(Ipomopsis, {
root.c <- root - mean(root) # centered version of `root`
})
m3.c <- update(m3, .~ root.c + grazing) # Re-fit the model
| Coef | In R output | Estimate | Std. Error | t value | Pr(>|t|) |
|---|---|---|---|---|---|
| \(\mu\) | (Intercept) | 59.4 | 1.07 | 55.7 | 0 |
| \(\alpha\) | root.c | 23.6 | 1.15 | 20.5 | 0 |
| \(\beta\) | grazing1 | 18.1 | 1.68 | 10.8 | 0 |
\[y = \mu + \alpha\times\mathtt{root.c} + \begin{bmatrix} \mathtt{grazing} & \mathrm{Deviation} \\ 1 & \beta \\ 2 & -\beta \end{bmatrix} + \epsilon\]
seed.wt in grazed and in ungrazed plants (grand mean), for plants that have average root diameter.seed.wt per unit change in root diameter.seed.wt in ungrazed \((+)\) and grazed \((-)\) plants from the grand mean, for any given root-stock diameter.\[y = \mu + \alpha\times\mathtt{root.c} + \beta\times\mathtt{grazing} + \epsilon\quad \mathrm{with}\quad \mathtt{grazing} = \begin{cases} +1&\mathrm{if\ ungrazed}\\ -1&\mathrm{if\ grazed}\end{cases}\]
What is the predicted seet weight in ungrazed and grazed Ipomopsis with 6.5 cm roothead diameter?
root, so we need to express rootstock diameter relative to the mean:( my.x <- 6.5 - mean(Ipomopsis$root) )
## [1] -0.68115
The parallel lines look reasonable enough here.
This is how we have specified the model: there is only one regression slope in the model, so the lines cannot be but parallel.
Next week we will relax this assumption:
The effect of (ignoring) multivariable effects are hard to predict! We will look at some scenarios in the next session.