2024-01-29

Blocking

Controlling for known confounds

In the last mini-lecture, we introduced randomised designs as a way to guard against selection biases and unconscious patterns.

Can we do better still?

We may have information about confounds, or about the pattern they follow:

Low water level
      +-------+-------+-------+-------+
  |   ¦       ¦       ¦       ¦       ¦
  |   ¦   A   ¦   A   ¦   A   ¦   A   ¦
  |   ¦       ¦       ¦       ¦       ¦
G |   +-------+-------+-------+-------+
r |   ¦       ¦       ¦       ¦       ¦
a |   ¦   B   ¦   B   ¦   B   ¦   B   ¦
d |   ¦       ¦       ¦       ¦       ¦
i |   +-------+-------+-------+-------+
e |   ¦       ¦       ¦       ¦       ¦
n |   ¦   C   ¦   C   ¦   C   ¦   C   ¦
t |   ¦       ¦       ¦       ¦       ¦
  |   +-------+-------+-------+-------+
  |   ¦       ¦       ¦       ¦       ¦
  |   ¦   D   ¦   D   ¦   D   ¦   D   ¦
  V   ¦       ¦       ¦       ¦       ¦
      +-------+-------+-------+-------+
High water level
  • E.g., gradient across plots.

  • This design is bad: confounds fertiliser with water level.

  • We could simply randomise.

  • But can we explicitly account for – model – the effect of water level?

Yes: by blocking.

Blocking for water level

With blocking, we’re conducting the same ‘mini-experiment’ at four water levels:

Low water level
      +-------+-------+-------+-------+
  |   ¦       ¦       ¦       ¦       ¦
  |   ¦   A   ¦   B   ¦   C   ¦   D   ¦ Block 1
  |   ¦       ¦       ¦       ¦       ¦
G |   +-------+-------+-------+-------+
r |   ¦       ¦       ¦       ¦       ¦
a |   ¦   D   ¦   A   ¦   B   ¦   C   ¦ Block 2
d |   ¦       ¦       ¦       ¦       ¦
i |   +-------+-------+-------+-------+
e |   ¦       ¦       ¦       ¦       ¦
n |   ¦   C   ¦   B   ¦   A   ¦   D   ¦ Block 3
t |   ¦       ¦       ¦       ¦       ¦
  |   +-------+-------+-------+-------+
  |   ¦       ¦       ¦       ¦       ¦
  |   ¦   D   ¦   C   ¦   A   ¦   B   ¦ Block 4
  V   ¦       ¦       ¦       ¦       ¦
      +-------+-------+-------+-------+
High water level
  • Each block is a ‘mini-experiment’ with all four fertiliser treatments.
  • Randomise fertiliser order within each block.
  • For each plot, we record block and fertiliser.
  • Analyse all blocks in one LM, to estimate block and fertiliser effects:

lm(yield ~ block + fertiliser)
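
The randomisation step above can be sketched in R. This is only an illustration: the object names (`treatments`, `n_blocks`, `layout`) and the seed are assumptions, not part of the original experiment.

```r
set.seed(1)  # assumption: arbitrary seed, only to make the layout reproducible

treatments <- c("A", "B", "C", "D")  # the four fertilisers
n_blocks   <- 4                      # one block per water level

# Shuffle the fertiliser order independently within each block
layout <- do.call(rbind, lapply(seq_len(n_blocks), function(b) {
  data.frame(block      = b,
             position   = seq_along(treatments),
             fertiliser = sample(treatments))
}))
layout$block      <- factor(layout$block)
layout$fertiliser <- factor(layout$fertiliser)

# Each fertiliser should appear exactly once in every block
table(layout$block, layout$fertiliser)
```

Once yields have been recorded in a (hypothetical) `yield` column of `layout`, the model above would be fitted with `data = layout`.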

Bean counting

The advantages of blocked design

The Beans dataset is from a similar experiment:

  • Comparing the yield of 6 varieties of bean;
  • 24 plots, in 4 blocks of 6 plots each.

We can see the difference between a fully randomised design and a randomised block design by analysing the data in two ways:

  1. Ignoring block in the analysis (i.e., pretending the design was fully randomised):
    lm(yield ~ bean)
    Are there differences in yield between the bean varieties?
  2. Including block in the analysis:
    lm(yield ~ block + bean)
    Are there differences in yield between the bean varieties, once any differences between the blocks have been accounted for?

Bean counting

Analysed as a fully randomised design

yield.m1 <- lm(yield ~ bean,  # ignoring blocks
               data = Beans,
               contrasts = list(bean=contr.sum)
               )
anova(yield.m1)
## Analysis of Variance Table
## 
## Response: yield
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## bean       5 444.43  88.887  14.586 8.579e-06 ***
## Residuals 18 109.69   6.094                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Bean counting

Analysed as a randomised block design

yield.m2 <- lm(yield ~ block + bean,  # including blocks info
               data = Beans,
               contrasts = list(block=contr.sum, bean=contr.sum)
               )
anova(yield.m2)
## Analysis of Variance Table
## 
## Response: yield
##           Df Sum Sq Mean Sq F value    Pr(>F)    
## block      3  52.90  17.632  4.6567   0.01713 *  
## bean       5 444.44  88.887 23.4757 1.341e-06 ***
## Residuals 15  56.79   3.786                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The gains from blocking

Note: the small \(P\)-value for block in the ANOVA table (\(P = 0.017\)) already indicates that blocks account for some of the observed variation, so blocking was worthwhile.

By taking account of variation between blocks in our model, we have

  • reduced the Error MSQ, \(6.094 \longrightarrow 3.786\),
    despite lower Error DoF \(18 \longrightarrow 15\);
  • increased F for bean: \(F=14.586 \longrightarrow 23.4757\);
  • increased the strength of our evidence for an effect of bean:
    \(P=8.579\times 10^{-6} \longrightarrow 1.341\times 10^{-6}\).
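
As a check on where these numbers come from, the F statistics and \(P\)-values can be recomputed from the mean squares in the two ANOVA tables. This is a small sketch; the variable names are made up for illustration, and because the inputs are the rounded table values the results should agree with the tables only up to rounding.

```r
# Values copied from the two ANOVA tables
msq_bean   <- 88.887
df_bean    <- 5
msq_err_m1 <- 6.094; df_err_m1 <- 18   # blocks ignored
msq_err_m2 <- 3.786; df_err_m2 <- 15   # blocks included

# F = treatment mean square / error mean square
F_m1 <- msq_bean / msq_err_m1
F_m2 <- msq_bean / msq_err_m2

# Upper-tail probability of the F distribution gives the P-value
p_m1 <- pf(F_m1, df_bean, df_err_m1, lower.tail = FALSE)
p_m2 <- pf(F_m2, df_bean, df_err_m2, lower.tail = FALSE)

c(F_m1 = F_m1, F_m2 = F_m2)
c(p_m1 = p_m1, p_m2 = p_m2)
```

The bean mean square is the same in both analyses; only the denominator (the error mean square) and its degrees of freedom change.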

Partitioning SSQs in blocked design

An ANOVA that factors in the blocks

  • partitions the total SSQ and DoF into
    • between-block: here, 3 df; and
    • within-block: here, 20 df
  • further partitions the within-block SSQ and DoF into
    • treatment: here, 5 df (here, treatment is the bean variety) and
    • error: 15 df
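
The partition can be checked against the Beans ANOVA tables with a little arithmetic. The sketch below uses only the rounded sums of squares printed above; the variable names are illustrative.

```r
# Sums of squares and degrees of freedom from the blocked ANOVA table
ssq_block <- 52.90;  df_block <- 3
ssq_bean  <- 444.44; df_bean  <- 5
ssq_error <- 56.79;  df_error <- 15

# Within-block = treatment + error
ssq_within <- ssq_bean + ssq_error     # 501.23
df_within  <- df_bean + df_error       # 20

# Total = between-block + within-block
ssq_total <- ssq_block + ssq_within    # 554.13
df_total  <- df_block + df_within      # 23 = 24 plots - 1

# The residual SSQ of the unblocked analysis (109.69 on 18 df)
# is what the blocked analysis splits into block + error
ssq_block + ssq_error                  # 109.69
df_block + df_error                    # 18
```

So including block does not change the bean SSQ; it moves 52.90 of the former residual SSQ (and 3 df) into the block term.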

Another perspective

Error \((\epsilon)\) is the unexplained scatter around the fitted means, i.e., around the estimated / predicted values.

The aim of the analysis is to separate the signal from the noise:

  • (differences in) fitted means estimate the signal in the data
  • the error variance \(\sigma^2\) quantifies the noise in the data

\[\epsilon \sim \mathcal{N}(0, \sigma^2)\]

  • Blocking allows us to ‘push’ some of the variance from the error term onto known confounds, and thus gives us a clearer picture of the signal.
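
A small simulation can illustrate this. The data below are simulated, not the Beans data, and every number in the set-up (block effects, treatment effects, noise level, seed) is an assumption chosen for illustration.

```r
set.seed(42)  # assumption: arbitrary seed, for reproducibility only

# Hypothetical experiment: 4 blocks x 6 treatments, one plot per combination
sim <- expand.grid(block = factor(1:4), treat = factor(LETTERS[1:6]))
block_effect <- c(-3, -1, 1, 3)      # assumed confound (e.g. water level)
treat_effect <- c(0, 1, 2, 3, 4, 5)  # assumed signal

sim$y <- 20 +
  block_effect[as.integer(sim$block)] +
  treat_effect[as.integer(sim$treat)] +
  rnorm(nrow(sim), mean = 0, sd = 1)  # true error: sigma = 1

m_ignore <- lm(y ~ treat, data = sim)          # block variation stays in the error
m_block  <- lm(y ~ block + treat, data = sim)  # block variation is modelled

# Estimated error variance = residual SSQ / residual df
msq_ignore <- deviance(m_ignore) / df.residual(m_ignore)
msq_block  <- deviance(m_block)  / df.residual(m_block)
c(ignoring_block = msq_ignore, with_block = msq_block)
```

With block effects this large relative to the noise, the error variance estimated without block is inflated well above the true value of 1, while the blocked model should recover something close to it.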

Blocked designs

Blocks should…

  1. represent a factor known/believed to affect the response.
  2. be internally as homogeneous as possible, and therefore as different from one another as possible: this maximises the SSQ that can be accounted for by block.

Other applications

It’s not just about counting beans on plots of land!

  • Changes over time: if you cannot do the entire experiment in one session / day / week, time becomes a blocking variable.
  • Differences between batches or litters.

Latin Squares

What if there are two possible confounds? We can block for both!

    C o l u m n   b l o c k s
+-------+-------+-------+-------+
¦       ¦       ¦       ¦       ¦
¦   A   ¦   B   ¦   C   ¦   D   ¦
¦       ¦       ¦       ¦       ¦  R
+-------+-------+-------+-------+  o
¦       ¦       ¦       ¦       ¦  w
¦   B   ¦   C   ¦   D   ¦   A   ¦ 
¦       ¦       ¦       ¦       ¦ 
+-------+-------+-------+-------+  B
¦       ¦       ¦       ¦       ¦  l
¦   C   ¦   D   ¦   A   ¦   B   ¦  o
¦       ¦       ¦       ¦       ¦  c
+-------+-------+-------+-------+  k
¦       ¦       ¦       ¦       ¦  s
¦   D   ¦   A   ¦   B   ¦   C   ¦ 
¦       ¦       ¦       ¦       ¦
+-------+-------+-------+-------+
  • Need not be plots on a field
  • N treatments = N rows = N columns
  • Fails if gradient is diagonal
    (i.e., row:column interaction)

Model becomes

lm(y ~ row + column + treat)
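
One way to generate such a layout is to start from the cyclic square shown above and shuffle its rows and columns. This is a minimal sketch, not a full randomisation over all possible Latin squares; the object names and the seed are assumptions.

```r
set.seed(7)  # assumption: arbitrary seed, for reproducibility only

treatments <- c("A", "B", "C", "D")
n <- length(treatments)

# Cyclic square: cell (i, j) gets treatment number ((i + j - 2) mod n) + 1
idx    <- outer(seq_len(n), seq_len(n), function(i, j) (i + j - 2) %% n + 1)
square <- matrix(treatments[idx], nrow = n)

# Shuffling whole rows and whole columns keeps the Latin-square property
square <- square[sample(n), sample(n)]
square

# Long format: one line per plot, ready for recording the response y
design <- data.frame(row    = factor(as.vector(row(square))),
                     column = factor(as.vector(col(square))),
                     treat  = factor(as.vector(square)))
head(design)
```

Every treatment appears exactly once in each row and once in each column, so row, column and treatment effects can be estimated separately in the model above.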