In the last mini-lecture, we introduced randomised designs as a way to guard against selection biases and unconscious patterns.
— Can we do better still?
2024-01-29
We may have information about (pattern of) confounds:
                 Low water level
       +-------+-------+-------+-------+
    |  ¦       ¦       ¦       ¦       ¦
    |  ¦   A   ¦   A   ¦   A   ¦   A   ¦
    |  ¦       ¦       ¦       ¦       ¦
  G |  +-------+-------+-------+-------+
  r |  ¦       ¦       ¦       ¦       ¦
  a |  ¦   B   ¦   B   ¦   B   ¦   B   ¦
  d |  ¦       ¦       ¦       ¦       ¦
  i |  +-------+-------+-------+-------+
  e |  ¦       ¦       ¦       ¦       ¦
  n |  ¦   C   ¦   C   ¦   C   ¦   C   ¦
  t |  ¦       ¦       ¦       ¦       ¦
    |  +-------+-------+-------+-------+
    |  ¦       ¦       ¦       ¦       ¦
    |  ¦   D   ¦   D   ¦   D   ¦   D   ¦
    V  ¦       ¦       ¦       ¦       ¦
       +-------+-------+-------+-------+
                 High water level
E.g., a gradient across plots.
This design is bad: confounds fertiliser with water level.
We could simply randomise.
But can we explicitly account for known confounds?
Yes: by blocking.
With blocking, we’re conducting the same ‘mini-experiment’ at four water levels:
                 Low water level
       +-------+-------+-------+-------+
    |  ¦       ¦       ¦       ¦       ¦
    |  ¦   A   ¦   B   ¦   C   ¦   D   ¦  Block 1
    |  ¦       ¦       ¦       ¦       ¦
  G |  +-------+-------+-------+-------+
  r |  ¦       ¦       ¦       ¦       ¦
  a |  ¦   D   ¦   A   ¦   B   ¦   C   ¦  Block 2
  d |  ¦       ¦       ¦       ¦       ¦
  i |  +-------+-------+-------+-------+
  e |  ¦       ¦       ¦       ¦       ¦
  n |  ¦   C   ¦   B   ¦   A   ¦   D   ¦  Block 3
  t |  ¦       ¦       ¦       ¦       ¦
    |  +-------+-------+-------+-------+
    |  ¦       ¦       ¦       ¦       ¦
    |  ¦   D   ¦   C   ¦   A   ¦   B   ¦  Block 4
    V  ¦       ¦       ¦       ¦       ¦
       +-------+-------+-------+-------+
                 High water level
- Each block contains all fertiliser treatments.
- Randomise the fertiliser order within each block.
- Model both block and fertiliser effects: lm(yield ~ block + fertiliser)
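A randomised block layout like the one above can be generated in a few lines of R. This is only a sketch: the seed, the object names (`treatments`, `layout`) and the 4 × 4 size are assumptions chosen to mirror the diagram, not part of the original experiment.

```r
set.seed(1)                          # arbitrary seed, only for reproducibility
treatments <- c("A", "B", "C", "D")  # the four fertilisers
n_blocks   <- 4                      # one block per water level

# Draw an independent random order of the treatments for each block;
# replicate() returns one permutation per column, so transpose to get
# one block per row.
layout <- t(replicate(n_blocks, sample(treatments)))
dimnames(layout) <- list(paste("Block", 1:n_blocks), paste("Plot", 1:4))
print(layout)

# Sanity check: every block contains every treatment exactly once
stopifnot(all(apply(layout, 1, function(r) setequal(r, treatments))))
```

Randomising separately within each block is what distinguishes this from a fully randomised design, where all 16 plots would be shuffled together and a block could end up with several plots of the same fertiliser.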
The Beans dataset is from a similar experiment:
- yield of 6 varieties of bean;
- 4 blocks of 6 plots each.
We can see the difference between a fully randomised design and a randomised block design by analysing the data in two ways:
- ignoring block in the analysis (= pretending it was fully randomised): lm(yield ~ bean)
- including block in the analysis: lm(yield ~ block + bean)
yield.m1 <- lm(yield ~ bean,                       # ignoring blocks
               data = Beans,
               contrasts = list(bean = contr.sum)
               )
anova(yield.m1)
## Analysis of Variance Table
##
## Response: yield
##           Df Sum Sq Mean Sq F value    Pr(>F)
## bean       5 444.43  88.887  14.586 8.579e-06 ***
## Residuals 18 109.69   6.094
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
yield.m2 <- lm(yield ~ block + bean,               # including blocks info
               data = Beans,
               contrasts = list(block = contr.sum, bean = contr.sum)
               )
anova(yield.m2)
## Analysis of Variance Table
##
## Response: yield
##           Df Sum Sq Mean Sq F value    Pr(>F)
## block      3  52.90  17.632  4.6567   0.01713 *
## bean       5 444.44  88.887 23.4757 1.341e-06 ***
## Residuals 15  56.79   3.786
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: the small \(P\)-value for block in this ANOVA table (0.017) already indicates that blocks account for some of the observed variation, so blocking was worthwhile…
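The link between the two tables can be checked by hand. The sketch below simply re-enters the printed sums of squares and degrees of freedom (no new data; the variable names are made up for illustration) and shows that the block term is carved out of the old residual, which shrinks the residual mean square and so raises the F statistic for bean.

```r
# Values copied from the two ANOVA tables above
ss_resid_m1 <- 109.69; df_resid_m1 <- 18   # residual of yield ~ bean
ss_block    <- 52.90;  df_block    <- 3    # block term of yield ~ block + bean
ms_bean     <- 88.887                      # bean mean square (same in both fits)

# The block term is taken out of the old residual
ss_resid_m2 <- ss_resid_m1 - ss_block      # matches the printed 56.79
df_resid_m2 <- df_resid_m1 - df_block      # 15

# F for bean = bean mean square / residual mean square
f_m1 <- ms_bean / (ss_resid_m1 / df_resid_m1)
f_m2 <- ms_bean / (ss_resid_m2 / df_resid_m2)
round(c(ignoring_blocks = f_m1, with_blocks = f_m2), 2)
```

The two ratios agree with the printed F values (14.586 and 23.4757) up to the rounding of the table entries.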
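By taking account of variation between blocks in our model, we have
- increased the \(F\) statistic for bean: \(F=14.586 \longrightarrow 23.4757\);
- decreased the \(P\)-value for bean: \(P=8.579\times 10^{-6} \longrightarrow 1.341\times 10^{-6}\).
An ANOVA that factors in the blocks
- first splits the total SSQ and DoF into
  - between block: here, 3 df; and
  - within block: here, 20 df;
- then splits the within-block SSQ and DoF into
  - treatment: here, 5 df (here, treatment is the bean variety) and
  - error: here, 15 df.
Error \((\epsilon)\) is the unexplained scatter around the fitted means, that is, around the estimates / predicted values.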
Aim of analysis is to separate the signal and the noise
\[\epsilon \sim \mathcal{N}(0, \sigma^2)\]
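Written out for the randomised block design, signal and noise could be expressed as below. The symbols \(\mu\), \(\beta_i\) and \(\tau_j\) are notation introduced here for illustration (overall mean, effect of block \(i\), effect of treatment \(j\)); only \(\epsilon\) and \(\sigma^2\) come from the lecture itself.

\[y_{ij} = \underbrace{\mu + \beta_i + \tau_j}_{\text{signal}} + \underbrace{\epsilon_{ij}}_{\text{noise}}, \qquad \epsilon_{ij} \sim \mathcal{N}(0, \sigma^2)\]

With sum-to-zero contrasts (contr.sum), the effects are constrained so that \(\sum_i \beta_i = 0\) and \(\sum_j \tau_j = 0\), which makes \(\mu\) the mean of the cell means.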
Blocks should…
block.
It’s not just about counting beans on plots of land!
What to do if there are two blocking factors?
           C o l u m n   b l o c k s
       +-------+-------+-------+-------+
       ¦       ¦       ¦       ¦       ¦
       ¦   A   ¦   B   ¦   C   ¦   D   ¦
       ¦       ¦       ¦       ¦       ¦
  R    +-------+-------+-------+-------+
  o    ¦       ¦       ¦       ¦       ¦
  w    ¦   B   ¦   C   ¦   D   ¦   A   ¦
       ¦       ¦       ¦       ¦       ¦
       +-------+-------+-------+-------+
  B    ¦       ¦       ¦       ¦       ¦
  l    ¦   C   ¦   D   ¦   A   ¦   B   ¦
  o    ¦       ¦       ¦       ¦       ¦
  c    +-------+-------+-------+-------+
  k    ¦       ¦       ¦       ¦       ¦
  s    ¦   D   ¦   A   ¦   B   ¦   C   ¦
       ¦       ¦       ¦       ¦       ¦
       +-------+-------+-------+-------+
Model becomes
lm(y ~ row + column + treat)
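A layout with row and column blocks, and the corresponding fit, can be sketched as follows. Everything here is illustrative: the cyclic construction, the shuffling of rows and columns, the simulated yields and all effect sizes are assumptions, not data from the lecture.

```r
set.seed(42)                                   # arbitrary seed
treat_levels <- c("A", "B", "C", "D")
k <- length(treat_levels)

# Cyclic square: cell (i, j) gets treatment number (i + j - 2) mod k + 1,
# so each treatment occurs once in every row and once in every column
idx <- outer(1:k, 1:k, function(i, j) (i + j - 2) %% k + 1)
# Shuffling whole rows and whole columns keeps that property
idx <- idx[sample(k), sample(k)]
square <- matrix(treat_levels[idx], nrow = k)
print(square)

# Long format, one line per plot (as.vector() reads the matrix column by column)
d <- data.frame(
  row    = factor(rep(1:k, times = k)),
  column = factor(rep(1:k, each  = k)),
  treat  = factor(as.vector(square))
)

# Simulated response: made-up row, column and treatment effects plus noise
treat_effect <- c(A = 0, B = 1, C = 3, D = 4)
d$y <- 20 + 2 * as.integer(d$row) + 1.5 * as.integer(d$column) +
  unname(treat_effect[as.character(d$treat)]) + rnorm(k^2, sd = 1)

fit <- lm(y ~ row + column + treat, data = d)
anova(fit)
```

With 16 plots, rows, columns and treatments each use 3 df, leaving 6 df for the residual.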