2024-02-01

The Factorial Principle

A first introduction to Interactions

More than one EV

We have already met examples of models with more than one explanatory variable (EV); ANCOVA is one of them. But let’s go back to designed experiments with categorical variables only. We have met models such as

  • yield ~ treatment (fully randomised)
  • yield ~ block + treatment (randomised block)
  • yield ~ row + col + treatment (latin squares)

In all these, there is only really one EV, treatment — we are not interested in the effect of block.

But what if we are interested in two EVs?

Example

Farmers can choose between two varieties of wheat,
V1 (SupaGrow®) and V2 (YieldsRUs®).

As a farmer, you want to maximise your yield and find out

  1. Which variety will give the greater yields, V1 or V2?
  2. What is the best sowing density, out of four densities, S1 to S4?

How to design the experiment?

Experimental Design 1

  1. Compare sowing densities in one variety only, V1: use 16 plots, with the 4 densities randomly assigned to 4 plots each.

  2. Compare the two varieties at density S1 only: use 8 plots, with the 2 varieties randomly assigned to 4 plots each.

Assume this result:

  1. Sowing density S4 is best.
  2. Variety V2 is better than V1.
  • Can you be sure that sowing V2 at S4 will give the best yields?
  • No…

The catch with Design 1

Essentially, these are two separate experiments:

    Experiment 1              Experiment 2

        S1  S2  S3  S4            S1  S2  S3  S4
       +---+---+---+---+         +---+---+---+---+
    V1 | 4 | 4 | 4 | 4 |      V1 | 4 |   |   |   |
       +---+---+---+---+         +---+---+---+---+
    V2 |   |   |   |   |      V2 | 4 |   |   |   |
       +---+---+---+---+         +---+---+---+---+
  • Neither experiment tested V2 at S4!
  • It is quite conceivable that the two varieties respond differently to changes in sowing density!
  • Note: Total of 24 plots used. Can we make better use of the 24 plots (more efficient)? Yes.
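
To see how Design 1 can mislead, here is a minimal sketch using made-up true mean yields. The numbers in `means` are purely hypothetical, chosen so that the two varieties respond differently to sowing density. The code applies the Design 1 reasoning and compares its recommendation with the best of all eight cells.

```r
# Hypothetical true mean yields; all values are invented for illustration only.
means <- matrix(c(5.0, 5.5, 6.0, 6.5,    # V1 keeps improving up to S4
                  5.6, 6.8, 6.2, 5.9),   # V2 peaks at S2
                nrow = 2, byrow = TRUE,
                dimnames = list(c("V1", "V2"), c("S1", "S2", "S3", "S4")))

# Experiment 1 sees only the V1 row; Experiment 2 sees only the S1 column.
best.density <- names(which.max(means["V1", ]))
best.variety <- names(which.max(means[, "S1"]))

# Mean yield of the combination that Design 1 would recommend
naive.yield <- means[best.variety, best.density]

# Best combination over all eight cells, which only a factorial design covers
pos <- which(means == max(means), arr.ind = TRUE)
true.best <- c(variety  = rownames(means)[pos[1, 1]],
               sow.dens = colnames(means)[pos[1, 2]])

best.density
best.variety
naive.yield
true.best
```

With these invented numbers, Design 1 points to V2 at S4, although the highest mean yield belongs to V2 at S2.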

Experimental Design 2

One single factorial experiment:

              S1  S2  S3  S4
             +---+---+---+---+
          V1 | 3 | 3 | 3 | 3 |
             +---+---+---+---+
          V2 | 3 | 3 | 3 | 3 |
             +---+---+---+---+
  • Sow both varieties at all four densities — all combinations covered: factorial design.
  • Three plots (replicates) per combination — still only 24 plots!

We can now find the best combination of sowing density and variety.
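
As a rough sketch, the layout of Design 2 could be generated in R as follows. It assumes a fully randomised allocation without blocks; the object name `design`, the column names, and the seed are illustrative choices, not part of the original experiment.

```r
set.seed(1)  # arbitrary seed so the allocation is reproducible

# All 2 x 4 combinations, each replicated three times: 24 rows, one per plot
design <- expand.grid(variety  = c("V1", "V2"),
                      sow.dens = c("S1", "S2", "S3", "S4"),
                      rep      = 1:3)

# Assign the 24 treatment combinations to plots 1..24 in random order
design$plot <- sample(nrow(design))
design <- design[order(design$plot), ]

# Number of replicates for each variety-by-density combination
table(design$variety, design$sow.dens)
```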

A first introduction to Interactions

In fact, in a factorial design we are asking three questions in a single experiment:

  1. How does sowing density affect yield?
  2. How does variety affect yield?
  3. Does sowing density have the same effect in both varieties?

Question 3 in statistical terminology: “Is there an interaction between sowing density and variety in how they affect yield?”

We get all three answers from a single design + analysis.
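
To make Question 3 concrete, the sketch below builds two sets of hypothetical mean yields: one without an interaction and one with. All effect sizes are invented for illustration. Without an interaction, the difference between the varieties is the same at every density; with an interaction, it changes from one density to another.

```r
# Hypothetical effects on mean yield; every number here is invented.
dens.effect <- c(S1 = 0, S2 = 0.4, S3 = 0.9, S4 = 1.2)
var.effect  <- c(V1 = 0, V2 = 0.6)

# No interaction: each cell mean is baseline + variety effect + density effect
additive <- 5 + outer(var.effect, dens.effect, "+")

# Interaction: V2 does worse at S4 than the additive pattern predicts
interacting <- additive
interacting["V2", "S4"] <- interacting["V2", "S4"] - 1

# Variety difference (V2 minus V1) at each density
diff.additive    <- additive["V2", ] - additive["V1", ]
diff.interacting <- interacting["V2", ] - interacting["V1", ]

diff.additive     # the same at all four densities
diff.interacting  # differs at S4
```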

Advantages of factorial design I

              S1  S2  S3  S4
             +---+---+---+---+
          V1 | 3 | 3 | 3 | 3 |
             +---+---+---+---+
          V2 | 3 | 3 | 3 | 3 |
             +---+---+---+---+

If there is an interaction (if the effect of sowing density on yield depends on variety), only a factorial design can find it.

Advantages of factorial design II

              S1  S2  S3  S4
             +---+---+---+---+
          V1 | 3 | 3 | 3 | 3 |
             +---+---+---+---+
          V2 | 3 | 3 | 3 | 3 |
             +---+---+---+---+

If there is no interaction (if yield responds to sow.dens in the same way in both varieties), it is still the better design:

  • In Design 1, there were 4 replicates of each sowing density and 4 replicates of each variety.
  • In Design 2, there are effectively 6 replicates of each sowing density and 12 replicates of each variety.

Factorial designs afford hidden replication: each replicate (plot) takes part in two comparisons. Because the design is orthogonal,

  • we can interpret the 3+3+3+3 plots for each variety independently of density;
  • we can interpret the 3+3 plots for each density independently of variety.
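
The hidden replication can be counted directly. The sketch below rebuilds the 24-plot layout (object and column names are illustrative) and tabulates the plots per level of each factor. The final check is one simple way to confirm balance: every cell count equals row total times column total divided by the grand total.

```r
# The 2 x 4 factorial layout with three replicates per combination
design <- expand.grid(variety  = c("V1", "V2"),
                      sow.dens = c("S1", "S2", "S3", "S4"),
                      rep      = 1:3)

# Plots available for each main-effect comparison
n.per.variety <- table(design$variety)    # 3+3+3+3 plots per variety
n.per.density <- table(design$sow.dens)   # 3+3 plots per density
n.per.variety
n.per.density

# Balance check: observed cell counts match the proportional counts
cells    <- table(design$variety, design$sow.dens)
expected <- outer(rowSums(cells), colSums(cells)) / sum(cells)
balanced <- all(cells == expected)
balanced
```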

Analysis of factorial experiments

How do we best analyse our factorial wheat yield experiment?

A ‘dumb’ approach…

There are 8 combinations of sow.dens and variety, each with 3 replicates (plots). Assume we have blocked the experiment, i.e., 3 blocks of 8 plots, each block with one replicate of each combination.

In principle, we could treat each of the 8 combinations of sow.dens and variety as a different treatment (trtmt), thus

  • trtmt = 1: variety V1 at sowing density S1
  • trtmt = 2: variety V2 at sowing density S1
  • trtmt = 3: variety V1 at sowing density S2 etc…

and then fit the LM

m.1 <- lm(yield ~ block + trtmt, Wheat)
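
The Wheat data set itself is not reproduced here, so the following sketch uses a simulated stand-in, `Wheat.sim`, with random yields that carry no real signal. It shows one possible way to build the combined factor `trtmt` with `interaction()` and to fit the same kind of model; the name `m.1.sim` is likewise hypothetical.

```r
set.seed(42)  # arbitrary seed; the yields below are simulated, not real data

# Stand-in for the Wheat data: 3 blocks x 4 densities x 2 varieties = 24 plots
Wheat.sim <- expand.grid(block    = factor(1:3),
                         sow.dens = factor(c("S1", "S2", "S3", "S4")),
                         variety  = factor(c("V1", "V2")))
Wheat.sim$yield <- rnorm(nrow(Wheat.sim), mean = 6, sd = 0.6)

# One combined factor with 8 levels; variety changes fastest,
# so the levels run V1.S1, V2.S1, V1.S2, ...
Wheat.sim$trtmt <- interaction(Wheat.sim$variety, Wheat.sim$sow.dens)
levels(Wheat.sim$trtmt)

# Blocked one-way model on the combined treatment factor
m.1.sim <- lm(yield ~ block + trtmt, data = Wheat.sim)
anova(m.1.sim)
```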

Blocked one-way ANOVA

anova(m.1)
## Analysis of Variance Table
## 
## Response: yield
##           Df Sum Sq Mean Sq F value  Pr(>F)
## block      2 0.3937 0.19683  0.5193 0.60599
## trtmt      7 8.0776 1.15394  3.0442 0.03623
## Residuals 14 5.3069 0.37906

What does this tell us?

  • There are significant differences between the treatments, that is, some combinations of variety and density give higher yields than others.

How do we get the 3 answers?

OK, fair enough — valid answer; but this one-way ANOVA does not answer the three separate questions we asked in our design!

So we need a model with three model terms (plus block):

  • we keep the two main effects sow.dens and variety as two separate terms in the model (not rolled into one trtmt), and
  • we include an interaction term: sow.dens:variety.

m.2 <- lm(yield ~ block + sow.dens + variety + sow.dens:variety, Wheat)
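
In R's formula syntax, `sow.dens * variety` is shorthand for `sow.dens + variety + sow.dens:variety`. The sketch below checks this on simulated stand-in data (`Wheat.sim`, with random yields; all object names here are illustrative), since the Wheat data set is not reproduced in these notes.

```r
set.seed(42)  # arbitrary seed; the yields below are simulated, not real data

# Stand-in for the Wheat data: 3 blocks x 4 densities x 2 varieties = 24 plots
Wheat.sim <- expand.grid(block    = factor(1:3),
                         sow.dens = factor(c("S1", "S2", "S3", "S4")),
                         variety  = factor(c("V1", "V2")))
Wheat.sim$yield <- rnorm(nrow(Wheat.sim), mean = 6, sd = 0.6)

# Long form: main effects and interaction written out
m.colon <- lm(yield ~ block + sow.dens + variety + sow.dens:variety,
              data = Wheat.sim)

# Short form: the * operator expands to the same three terms
m.star <- lm(yield ~ block + sow.dens * variety, data = Wheat.sim)

# Both formulas yield the same model terms and the same fit
same.terms <- identical(attr(terms(m.colon), "term.labels"),
                        attr(terms(m.star),  "term.labels"))
same.fit   <- isTRUE(all.equal(fitted(m.colon), fitted(m.star)))
same.terms
same.fit
```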

Factorial ANOVA

anova(m.2)
## Analysis of Variance Table
## 
## Response: yield
##                  Df Sum Sq Mean Sq F value  Pr(>F)
## block             2 0.3937 0.19683  0.5193 0.60599
## sow.dens          3 5.8736 1.95786  5.1650 0.01303
## variety           1 2.1474 2.14742  5.6651 0.03206
## sow.dens:variety  3 0.0566 0.01887  0.0498 0.98470
## Residuals        14 5.3069 0.37906
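
Comparing the two tables, the block and residual lines are identical, and the 7 degrees of freedom for trtmt in m.1 have been split into the two main effects and the interaction in m.2. The arithmetic can be checked with the values printed above (the variable names are only for illustration):

```r
# Values copied from the anova(m.1) and anova(m.2) tables
ss.trtmt <- 8.0776
df.trtmt <- 7

ss <- c(sow.dens = 5.8736, variety = 2.1474, "sow.dens:variety" = 0.0566)
df <- c(sow.dens = 3,      variety = 1,      "sow.dens:variety" = 3)

# The three factorial terms add up to the single trtmt term
sum(ss)
sum(df)
ss.match <- isTRUE(all.equal(sum(ss), ss.trtmt, tolerance = 1e-4))
df.match <- sum(df) == df.trtmt
ss.match
df.match
```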

Factorial ANOVA - Answer 1

                 Df Sum Sq Mean Sq F value Pr(>F)
block             2 0.3940  0.1970  0.5190 0.6060
sow.dens          3 5.8700  1.9600  5.1600 0.0130
variety           1 2.1500  2.1500  5.6700 0.0321
sow.dens:variety  3 0.0566  0.0189  0.0498 0.9850
Residuals        14 5.3100  0.3790
  1. Does the yield of the two varieties respond differently to sow.dens? No \((P=0.985)\).

Note that we start our interpretation with the interaction; we’ll come back to this. In a nutshell, the main reason is this:

If the interaction is significant, we already know that both EVs affect yield, so the remaining two questions become obsolete.

Factorial ANOVA - Answer 2

  2. Does variety affect yield, when differences in sow.dens have been taken into account? Yes \((P=0.0321)\).

Factorial ANOVA - Answer 3

  3. Does sow.dens affect yield, when differences in variety have been taken into account? Yes \((P=0.013)\).
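
Each F value in the table is the term's mean square divided by the residual mean square, and each P value is the upper tail of the corresponding F distribution. As a rough check, the sketch below recomputes both from the mean squares printed by anova(m.2); the variable names are illustrative, and small rounding differences from the printed table are to be expected.

```r
# Mean squares and degrees of freedom copied from the anova(m.2) output
ms <- c(block = 0.19683, sow.dens = 1.95786, variety = 2.14742,
        "sow.dens:variety" = 0.01887)
df <- c(block = 2, sow.dens = 3, variety = 1, "sow.dens:variety" = 3)
ms.resid <- 0.37906
df.resid <- 14

# F ratio for each term, tested against the residual mean square
F.value <- ms / ms.resid

# Upper-tail probability of the F distribution with (df, df.resid) df
p.value <- pf(F.value, df1 = df, df2 = df.resid, lower.tail = FALSE)

round(cbind(Df = df, F.value, p.value), 4)
```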

OK, so…

In this example,

  • both main effects are significant,
  • but they do not interact.

In the next lecture, we will

  • consider all possible outcome scenarios, and
  • learn how to interpret them