In this exercise, we are going to fit a two-way ANOVA using the data set ‘soil’ in the provided soil.csv file. This data set contains soil phosphorus concentrations (\(mg\ kg^{-1}\)) derived from two different soil types: an alluvial soil (soil in proximity to a river experiencing regular flooding) and soil from an upland terrace (no soil nutrient replenishment through flooding). Soil samples were taken from the A-, B- and C-horizon. In statistical terms we have a continuous response variable and two categorical predictors (soil type ‘loc’ and soil horizon ‘horizon’). Because we have two predictor variables, the resulting model is called a two-way ANOVA.
emmeans
package.## First: Set your working directory (in the menu bar click on 'Session' => 'Set Working Directory' => 'Choose Directory...'. Copy working directory from the console and paste it into the RMarkdown script.)
## Read in the data set
soilp <- read.csv("soil.csv")
## Check structure and get data summary
head(soilp)
## loc horizon p depth.cm
## 1 alluvial A 345.45 0-10
## 2 alluvial A 220.64 0-10
## 3 alluvial A 224.98 0-10
## 4 alluvial A 246.72 0-10
## 5 alluvial A 286.60 0-10
## 6 alluvial A 283.31 0-10
str(soilp)
## 'data.frame': 54 obs. of 4 variables:
## $ loc : Factor w/ 2 levels "alluvial","terrace": 1 1 1 1 1 1 1 1 1 1 ...
## $ horizon : Factor w/ 3 levels "A","B","C": 1 1 1 1 1 1 1 1 1 2 ...
## $ p : num 345 221 225 247 287 ...
## $ depth.cm: Factor w/ 3 levels "0-10","20-30",..: 1 1 1 1 1 1 1 1 1 3 ...
summary(soilp)
## loc horizon p depth.cm
## alluvial:27 A:18 Min. : 36.72 0-10 :18
## terrace :27 B:18 1st Qu.:117.07 20-30 :18
## C:18 Median :156.34 Oct-20:18
## Mean :164.05
## 3rd Qu.:201.25
## Max. :345.45
## Graphical data exploration
library(ggplot2)
ggplot(data = soilp, mapping = aes(x = horizon, y = p)) + geom_boxplot() + facet_wrap(facets = ~ loc)
## Fit a linear model
m1 <- lm(p ~ loc * horizon, data = soilp)
par(mfrow = c(2, 2))
plot(m1)
summary(m1)
##
## Call:
## lm(formula = p ~ loc * horizon, data = soilp)
##
## Residuals:
## Min 1Q Median 3Q Max
## -67.208 -34.988 -0.961 27.502 91.298
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 260.49 14.56 17.892 < 2e-16 ***
## locterrace -105.55 20.59 -5.126 5.23e-06 ***
## horizonB -90.38 20.59 -4.390 6.21e-05 ***
## horizonC -115.27 20.59 -5.599 1.02e-06 ***
## locterrace:horizonB 85.06 29.12 2.921 0.0053 **
## locterrace:horizonC 64.26 29.12 2.207 0.0321 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 43.68 on 48 degrees of freedom
## Multiple R-squared: 0.572, Adjusted R-squared: 0.5274
## F-statistic: 12.83 on 5 and 48 DF, p-value: 6.159e-08
summary.aov(m1)
## Df Sum Sq Mean Sq F value Pr(>F)
## loc 1 41993 41993 22.015 2.28e-05 ***
## horizon 2 62683 31342 16.431 3.66e-06 ***
## loc:horizon 2 17697 8848 4.639 0.0144 *
## Residuals 48 91561 1908
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
pf(q = 4.639, df1 = 2, df2 = 48, lower.tail = F)
## [1] 0.01439068
library(emmeans)
slice <- emmeans(object = m1, specs = "horizon", by = "loc")
slice
## loc = alluvial:
## horizon emmean SE df lower.CL upper.CL
## A 260.4856 14.55842 48 231.21389 289.7572
## B 170.1056 14.55842 48 140.83389 199.3772
## C 145.2167 14.55842 48 115.94501 174.4883
##
## loc = terrace:
## horizon emmean SE df lower.CL upper.CL
## A 154.9389 14.55842 48 125.66723 184.2105
## B 149.6222 14.55842 48 120.35056 178.8939
## C 103.9278 14.55842 48 74.65612 133.1994
##
## Confidence level used: 0.95
pairs(slice)
## loc = alluvial:
## contrast estimate SE df t.ratio p.value
## A - B 90.380000 20.58871 48 4.390 0.0002
## A - C 115.268889 20.58871 48 5.599 <.0001
## B - C 24.888889 20.58871 48 1.209 0.4538
##
## loc = terrace:
## contrast estimate SE df t.ratio p.value
## A - B 5.316667 20.58871 48 0.258 0.9639
## A - C 51.011111 20.58871 48 2.478 0.0436
## B - C 45.694444 20.58871 48 2.219 0.0781
##
## P value adjustment: tukey method for comparing a family of 3 estimates