Practical 4 - AESC40180 25/26

Author

Virginia Morera-Pujol

Introduction

The goal of this practical is for you to run a two-way ANOVA, plot the data, and interpret the results. If you accomplish this easily, you can extend it and run an ANOVA with 3 factors, and check what added complications arise, if any. The data for this practical comes from an experiment in which investigators examined how primary diet (barley, oats, wheat) and diet supplement (control, agrimore, supergain, supersupp) affected the growth (gain) of chickens. The data is contained in the file growth.csv.

Housekeeping

Here you will clean your environment, set the working directory if necessary, and load the data needed for the analysis (growth.csv), and the package tidyverse which contains everything you need for data wrangling, display, and analysis. I will call my dataset “growth” but you can call it whatever you want.

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.2     ✔ tibble    3.2.1
✔ lubridate 1.9.4     ✔ tidyr     1.3.1
✔ purrr     1.0.4     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Rows: 48 Columns: 4
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (3): supplement, diet, sex
dbl (1): gain

ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data cleaning

There’s not much to do with this dataset as it is already formatted for the ANOVA analysis (in the “long” format, remember last week), however, you do need to make sure all categorical variables are factors. You can check with the function str(), and all variables that are of type chr need to be converted. Remember, we’ve done this before by using the function mutate() and as.factor but I’ve given you a hint below as well.

growth <- growth %>% 
  mutate(supplement = as.factor(supplement)) %>% 
  ...

1-way ANOVAS

I will not provide the code for this, as you have done all (or most) of it in last week’s practical. We first will perform two separate 1-way ANOVAS, to check the effect of both supplements and diet separately.

Diet

Plot the data

First we use a boxplot to display differences in growth between the different diets:

Which diet seems like it causes the most weight gain in chickens?
Which seems to cause the least?
Can you tell what the effect of supplements is from this?

Run ANOVA

            Df Sum Sq Mean Sq F value Pr(>F)    
diet         2  287.2  143.59   41.11  7e-11 ***
Residuals   45  157.2    3.49                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Are the differences in gain between diets significant?

Tukey test

I will give you the code for this once, as I did not show you this last week. You perform the Tukey test on top of the aov() function. You can do this all in one line by “concatenating” the two functions using pipes (like below), or by storing the result of the aov function using <- and a name of your choice, and then using that name as the argument of the TukeyHSD function (try it!)

aov(gain ~ diet, data = growth) %>% 
  TukeyHSD()

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = gain ~ diet, data = growth)

$diet
                  diff       lwr       upr     p adj
oats-barley  -3.092817 -4.694242 -1.491391 0.0000773
wheat-barley -5.990298 -7.591723 -4.388872 0.0000000
wheat-oats   -2.897481 -4.498906 -1.296055 0.0002006

Are the differences between diets significant?
Can you tell the values for the mean gain of each diet? (hint, you may need a different function, check last week’s script).

Model diagnostics

Remember how we plotted the diagnostic plots for the one way ANOVA before? There was a little piece of code before the line actually producing the plots that allowed you to plot them in a 2x2 grid. I forgot to tell you last time, you need to “undo” that and turn the grid back into a 1x1 grid or the rest of your plots will come out weird. So, below, the code for plotting the diagnostics, and then you will have to repeat this for the other ANOVAs you’ll be doing throughout the tutorial

par(mfrow = c(2,2)) # turn plotting area into a 2x2 grid 
plot(aov(gain ~ diet, data = growth))

par(mfrow = c(1,1)) # turn the plotting area back into a 1x1

Supplements

Now repeat the process above for the different supplements. See below what each output should look like so you can check your code has worked.

Boxplot

ANOVA

            Df Sum Sq Mean Sq F value Pr(>F)  
supplement   3   91.9  30.627   3.823 0.0161 *
Residuals   44  352.5   8.011                 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Tukey

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = gain ~ supplement, data = growth)

$supplement
                          diff        lwr        upr     p adj
control-agrimore    -2.6967005 -5.7818024  0.3884014 0.1058141
supergain-agrimore  -3.3814586 -6.4665605 -0.2963567 0.0267429
supersupp-agrimore  -0.7273521 -3.8124540  2.3577498 0.9220050
supergain-control   -0.6847581 -3.7698601  2.4003438 0.9337824
supersupp-control    1.9693484 -1.1157536  5.0544503 0.3336830
supersupp-supergain  2.6541065 -0.4309954  5.7392084 0.1142704

Diagnostics

2-way ANOVA

Now we are going to test whether the interaction between diet and supplement is causing a significant difference in weigh gain in chickens.

What are the three hypotheses being tested by this model?

Plot the data

First of all, we need to plot the data. This becomes a slightly more complicated boxlot, since now we’re “mapping” gain to the y-axis, supplement to the x-axis, and diet to the fill colour of the boxes:

library(viridis)

Loading required package: viridisLite

ggplot(growth) + 
  geom_boxplot(aes(x = supplement, y = gain, fill = diet)) +
  scale_fill_viridis_d() + 
  theme_bw()

Can you think of a different way to display this data in a boxplot? Try it!
Can you make the axes labels look a bit nicer? What about a plot title?

We can also add lines connecting the means, to see if they cross or not (suggestive of interactions). I’ve added the code below if you’re curious, but you don’t necessarily need to learn to do this

Code

ggplot(growth) + 
  geom_boxplot(aes(x = supplement, y = gain, fill = diet)) + 
  stat_summary(aes(x = supplement, y = gain, col = diet, group = diet),
               position = position_dodge(width = 0.2),
               fun = mean,
               geom = "line") +
  theme_bw()

Do you think we’ll find an effect of the interaction between supplement and diet? Why?

Perform the ANOVA

Again, I will show you the code to do this as this is the first time you’re dealing with an interaction between two terms in a model. In order for R to know that the terms are interacting, you need to link them with a * like so:

summary(aov(gain ~ supplement*diet, data = growth))

                Df Sum Sq Mean Sq F value   Pr(>F)    
supplement       3  91.88   30.63   17.82 2.95e-07 ***
diet             2 287.17  143.59   83.52 3.00e-14 ***
supplement:diet  6   3.41    0.57    0.33    0.917    
Residuals       36  61.89    1.72                     
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

What do you think would happen if we coded this as supplement + diet instead? Try it!
Do we reject, or fail to reject, our three null hypotheses?
Can we, all together, write the equation for this model?
After writing the equation, can you remember what line of code to use to obtain the values of the betas?


Call:
lm(formula = gain ~ supplement * diet, data = growth)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.48756 -1.00368 -0.07452  1.03496  2.68069 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    26.3485     0.6556  40.191  < 2e-16 ***
supplementcontrol              -3.0518     0.9271  -3.292 0.002237 ** 
supplementsupergain            -3.8824     0.9271  -4.187 0.000174 ***
supplementsupersupp            -0.7732     0.9271  -0.834 0.409816    
dietoats                       -3.0501     0.9271  -3.290 0.002248 ** 
dietwheat                      -6.7094     0.9271  -7.237 1.61e-08 ***
supplementcontrol:dietoats      0.2471     1.3112   0.188 0.851571    
supplementsupergain:dietoats    0.2470     1.3112   0.188 0.851652    
supplementsupersupp:dietoats   -0.6650     1.3112  -0.507 0.615135    
supplementcontrol:dietwheat     0.8183     1.3112   0.624 0.536512    
supplementsupergain:dietwheat   1.2557     1.3112   0.958 0.344601    
supplementsupersupp:dietwheat   0.8024     1.3112   0.612 0.544381    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.311 on 36 degrees of freedom
Multiple R-squared:  0.8607,    Adjusted R-squared:  0.8182 
F-statistic: 20.22 on 11 and 36 DF,  p-value: 3.295e-12

Based on this result, can you fill in a table like the one we did in the lecture, with one treatment in the rows, one treatment in the columns, and the corresponding beta(s) in each cell? Feel free to send this to me after the lab, or call me over during the lab, to make sure you got this right and receive feedback if you want it.

Tukey test

You have the code to perform a Tukey test above, so I’m not going to repeat it here, but this is the result you’d expect

  Tukey multiple comparisons of means
    95% family-wise confidence level

Fit: aov(formula = gain ~ supplement * diet, data = growth)

$supplement
                          diff       lwr        upr     p adj
control-agrimore    -2.6967005 -4.138342 -1.2550592 0.0000764
supergain-agrimore  -3.3814586 -4.823100 -1.9398173 0.0000015
supersupp-agrimore  -0.7273521 -2.168993  0.7142892 0.5326710
supergain-control   -0.6847581 -2.126399  0.7568832 0.5817637
supersupp-control    1.9693484  0.527707  3.4109897 0.0040534
supersupp-supergain  2.6541065  1.212465  4.0957478 0.0000972

$diet
                  diff       lwr       upr p adj
oats-barley  -3.092817 -4.225918 -1.959715 3e-07
wheat-barley -5.990298 -7.123399 -4.857196 0e+00
wheat-oats   -2.897481 -4.030582 -1.764379 1e-06

$`supplement:diet`
                                          diff         lwr        upr     p adj
control:barley-agrimore:barley    -3.051827710  -6.2878410  0.1841856 0.0797364
supergain:barley-agrimore:barley  -3.882353990  -7.1183673 -0.6463407 0.0081992
supersupp:barley-agrimore:barley  -0.773175055  -4.0091883  2.4628382 0.9993538
agrimore:oats-agrimore:barley     -3.050093860  -6.2861072  0.1859194 0.0800774
control:oats-agrimore:barley      -5.854812782  -9.0908261 -2.6187995 0.0000156
supergain:oats-agrimore:barley    -6.685474160  -9.9214875 -3.4494609 0.0000011
supersupp:oats-agrimore:barley    -4.488243097  -7.7242564 -1.2522298 0.0012832
agrimore:wheat-agrimore:barley    -6.709404652  -9.9454179 -3.4733914 0.0000010
control:wheat-agrimore:barley     -8.942959440 -12.1789727 -5.7069461 0.0000000
supergain:wheat-agrimore:barley   -9.336046197 -12.5720595 -6.1000329 0.0000000
supersupp:wheat-agrimore:barley   -6.680136725  -9.9161500 -3.4441234 0.0000011
supergain:barley-control:barley   -0.830526280  -4.0665396  2.4054870 0.9987590
supersupp:barley-control:barley    2.278652655  -0.9573606  5.5146659 0.3964466
agrimore:oats-control:barley       0.001733850  -3.2342794  3.2377471 1.0000000
control:oats-control:barley       -2.802985073  -6.0389984  0.4330282 0.1431702
supergain:oats-control:barley     -3.633646450  -6.8696597 -0.3976332 0.0168823
supersupp:oats-control:barley     -1.436415388  -4.6724287  1.7995979 0.9157324
agrimore:wheat-control:barley     -3.657576943  -6.8935902 -0.4215636 0.0157690
control:wheat-control:barley      -5.891131730  -9.1271450 -2.6551184 0.0000139
supergain:wheat-control:barley    -6.284218487  -9.5202318 -3.0482052 0.0000039
supersupp:wheat-control:barley    -3.628309015  -6.8643223 -0.3922957 0.0171405
supersupp:barley-supergain:barley  3.109178935  -0.1268344  6.3451922 0.0691463
agrimore:oats-supergain:barley     0.832260130  -2.4037532  4.0682734 0.9987355
control:oats-supergain:barley     -1.972458793  -5.2084721  1.2635545 0.6078579
supergain:oats-supergain:barley   -2.803120170  -6.0391335  0.4328931 0.1431270
supersupp:oats-supergain:barley   -0.605889108  -3.8419024  2.6301242 0.9999375
agrimore:wheat-supergain:barley   -2.827050663  -6.0630640  0.4089626 0.1356315
control:wheat-supergain:barley    -5.060605450  -8.2966187 -1.8245922 0.0002063
supergain:wheat-supergain:barley  -5.453692208  -8.6897055 -2.2176789 0.0000577
supersupp:wheat-supergain:barley  -2.797782735  -6.0337960  0.4382306 0.1448433
agrimore:oats-supersupp:barley    -2.276918805  -5.5129321  0.9590945 0.3975577
control:oats-supersupp:barley     -5.081637727  -8.3176510 -1.8456244 0.0001928
supergain:oats-supersupp:barley   -5.912299105  -9.1483124 -2.6762858 0.0000130
supersupp:oats-supersupp:barley   -3.715068043  -6.9510813 -0.4790547 0.0133696
agrimore:wheat-supersupp:barley   -5.936229597  -9.1722429 -2.7002163 0.0000120
control:wheat-supersupp:barley    -8.169784385 -11.4057977 -4.9337711 0.0000000
supergain:wheat-supersupp:barley  -8.562871142 -11.7988844 -5.3268578 0.0000000
supersupp:wheat-supersupp:barley  -5.906961670  -9.1429750 -2.6709484 0.0000132
control:oats-agrimore:oats        -2.804718922  -6.0407322  0.4312944 0.1426161
supergain:oats-agrimore:oats      -3.635380300  -6.8713936 -0.3993670 0.0167992
supersupp:oats-agrimore:oats      -1.438149237  -4.6741625  1.7978641 0.9151137
agrimore:wheat-agrimore:oats      -3.659310792  -6.8953241 -0.4232975 0.0156910
control:wheat-agrimore:oats       -5.892865580  -9.1288789 -2.6568523 0.0000138
supergain:wheat-agrimore:oats     -6.285952337  -9.5219656 -3.0499390 0.0000038
supersupp:wheat-agrimore:oats     -3.630042865  -6.8660562 -0.3940296 0.0170562
supergain:oats-control:oats       -0.830661377  -4.0666747  2.4053519 0.9987572
supersupp:oats-control:oats        1.366569685  -1.8694436  4.6025830 0.9382729
agrimore:wheat-control:oats       -0.854591870  -4.0906052  2.3814214 0.9983978
control:wheat-control:oats        -3.088146658  -6.3241600  0.1478666 0.0728783
supergain:wheat-control:oats      -3.481233415  -6.7172467 -0.2452201 0.0258844
supersupp:wheat-control:oats      -0.825323942  -4.0613372  2.4106894 0.9988273
supersupp:oats-supergain:oats      2.197231062  -1.0387822  5.4332444 0.4500973
agrimore:wheat-supergain:oats     -0.023930493  -3.2599438  3.2120828 1.0000000
control:wheat-supergain:oats      -2.257485280  -5.4934986  0.9785280 0.4101108
supergain:wheat-supergain:oats    -2.650572037  -5.8865853  0.5854413 0.1989133
supersupp:wheat-supergain:oats     0.005337435  -3.2306759  3.2413507 1.0000000
agrimore:wheat-supersupp:oats     -2.221161555  -5.4571748  1.0148517 0.4340350
control:wheat-supersupp:oats      -4.454716342  -7.6907296 -1.2187030 0.0014257
supergain:wheat-supersupp:oats    -4.847803100  -8.0838164 -1.6117898 0.0004093
supersupp:wheat-supersupp:oats    -2.191893627  -5.4279069  1.0441197 0.4537097
control:wheat-agrimore:wheat      -2.233554788  -5.4695681  1.0024585 0.4258079
supergain:wheat-agrimore:wheat    -2.626641545  -5.8626548  0.6093717 0.2089804
supersupp:wheat-agrimore:wheat     0.029267928  -3.2067454  3.2652812 1.0000000
supergain:wheat-control:wheat     -0.393086757  -3.6291001  2.8429265 0.9999993
supersupp:wheat-control:wheat      2.262822715  -0.9731906  5.4988360 0.4066453
supersupp:wheat-supergain:wheat    2.655909472  -0.5801038  5.8919228 0.1967179

You can also plot the results of the tukey test. It involves a bit more coding. First you need to save the tukey test results as an object (I’ve named the object tuk, then extract the results of the interaction (remember the use of the operator $?). And then turn that into a data frame that ggplot will understand for plotting. Let’s see how below:

tuk <- TukeyHSD(aov(gain ~ supplement*diet, data = growth)) # this stores the tukey results in an object called tuk (check it in your environment)

tuk_int <- tuk$`supplement:diet` # this stores the results of the tukey test on the interaction in an object called tuk_int

tuk_int <- as.data.frame(tuk_int) # this turns the object into a dataframe, necessary for ggplot plotting

# what is being compared needs to be added as a column in the dataset, that we create with the function mutate
tuk_int <- tuk_int %>% 
  mutate(comparison = row.names(tuk_int))

Now we can plot the points and errorbars but it’s going to be a very confusing plot!

ggplot(tuk_int) + 
  geom_point(aes(x = diff, y = comparison)) + 
  geom_errorbar(aes(xmin = lwr, xmax = upr, y = comparison)) + 
  geom_vline(aes(xintercept = 0), col = "red", linetype = 2) + 
  theme_bw()

Is this plot helpful at all to you?

Diagnostics

Do you think these look better, or worse, than the diagnostics for the 1-way ANOVA?
Why do you think that is?

Since the interaction is not significant, let’s reduce noise and try the ANOVA without the interaction. Here I’m only going to plot the diagnostics but feel free to try it and visualise the results

par(mfrow = c(2,2)) 
plot(aov(gain ~ supplement + diet, data = growth))

par(mfrow = c(1,1))

Does this make it any better?

Another “diagnostic tool” at your disposal is to see the % of variability in your data explained by your model (adjusted R²)

summary(lm(gain ~ supplement, data = growth))


Call:
lm(formula = gain ~ supplement, data = growth)

Residuals:
    Min      1Q  Median      3Q     Max 
-5.1309 -2.2142 -0.2459  1.7644  5.9339 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          23.0953     0.8170  28.267  < 2e-16 ***
supplementcontrol    -2.6967     1.1555  -2.334  0.02423 *  
supplementsupergain  -3.3815     1.1555  -2.926  0.00541 ** 
supplementsupersupp  -0.7274     1.1555  -0.629  0.53228    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 2.83 on 44 degrees of freedom
Multiple R-squared:  0.2068,    Adjusted R-squared:  0.1527 
F-statistic: 3.823 on 3 and 44 DF,  p-value: 0.01614

summary(lm(gain ~ diet, data = growth))


Call:
lm(formula = gain ~ diet, data = growth)

Residuals:
    Min      1Q  Median      3Q     Max 
-3.4922 -1.1353 -0.1414  1.0647  4.6075 

Coefficients:
            Estimate Std. Error t value Pr(>|t|)    
(Intercept)  24.4216     0.4672  52.269  < 2e-16 ***
dietoats     -3.0928     0.6608  -4.681 2.64e-05 ***
dietwheat    -5.9903     0.6608  -9.066 1.02e-11 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.869 on 45 degrees of freedom
Multiple R-squared:  0.6463,    Adjusted R-squared:  0.6306 
F-statistic: 41.11 on 2 and 45 DF,  p-value: 6.998e-11

summary(lm(gain ~ supplement + diet, data = growth))


Call:
lm(formula = gain ~ supplement + diet, data = growth)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.30792 -0.85929 -0.07713  0.92052  2.90615 

Coefficients:
                    Estimate Std. Error t value Pr(>|t|)    
(Intercept)          26.1230     0.4408  59.258  < 2e-16 ***
supplementcontrol    -2.6967     0.5090  -5.298 4.03e-06 ***
supplementsupergain  -3.3815     0.5090  -6.643 4.72e-08 ***
supplementsupersupp  -0.7274     0.5090  -1.429     0.16    
dietoats             -3.0928     0.4408  -7.016 1.38e-08 ***
dietwheat            -5.9903     0.4408 -13.589  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.247 on 42 degrees of freedom
Multiple R-squared:  0.8531,    Adjusted R-squared:  0.8356 
F-statistic: 48.76 on 5 and 42 DF,  p-value: < 2.2e-16

summary(lm(gain ~ supplement * diet, data = growth))


Call:
lm(formula = gain ~ supplement * diet, data = growth)

Residuals:
     Min       1Q   Median       3Q      Max 
-2.48756 -1.00368 -0.07452  1.03496  2.68069 

Coefficients:
                              Estimate Std. Error t value Pr(>|t|)    
(Intercept)                    26.3485     0.6556  40.191  < 2e-16 ***
supplementcontrol              -3.0518     0.9271  -3.292 0.002237 ** 
supplementsupergain            -3.8824     0.9271  -4.187 0.000174 ***
supplementsupersupp            -0.7732     0.9271  -0.834 0.409816    
dietoats                       -3.0501     0.9271  -3.290 0.002248 ** 
dietwheat                      -6.7094     0.9271  -7.237 1.61e-08 ***
supplementcontrol:dietoats      0.2471     1.3112   0.188 0.851571    
supplementsupergain:dietoats    0.2470     1.3112   0.188 0.851652    
supplementsupersupp:dietoats   -0.6650     1.3112  -0.507 0.615135    
supplementcontrol:dietwheat     0.8183     1.3112   0.624 0.536512    
supplementsupergain:dietwheat   1.2557     1.3112   0.958 0.344601    
supplementsupersupp:dietwheat   0.8024     1.3112   0.612 0.544381    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 1.311 on 36 degrees of freedom
Multiple R-squared:  0.8607,    Adjusted R-squared:  0.8182 
F-statistic: 20.22 on 11 and 36 DF,  p-value: 3.295e-12

Which model explains more of the total variability in the data?
Does the interaction improve the model?
The one-way ANOVAs testing the effects of supplement and diet independently have the better looking diagnostic plots. Does that mean they are the best models to fit to this data? Why?

Your task

Now it’s time to see what happens if you add sex as a third factor to the model. Does it fix our diagnostic issues? I am going to get you started with some data exploration, but after that you’re on your own to play around with the model and ask questions to me and the demonstrators.

First of all, we need to be aware that, as we keep adding factors to the model, the number of sample units in each treatment (sex-diet-supplement) decreases. Remember the group_by and tally functions? Use them here to see how many rows per treatment we have:

# A tibble: 24 × 4
# Groups:   supplement, diet [12]
   supplement diet   sex       n
   <fct>      <fct>  <fct> <int>
 1 agrimore   barley f         2
 2 agrimore   barley m         2
 3 agrimore   oats   f         2
 4 agrimore   oats   m         2
 5 agrimore   wheat  f         2
 6 agrimore   wheat  m         2
 7 control    barley f         2
 8 control    barley m         2
 9 control    oats   f         2
10 control    oats   m         2
11 control    wheat  f         2
12 control    wheat  m         2
13 supergain  barley f         2
14 supergain  barley m         2
15 supergain  oats   f         2
16 supergain  oats   m         2
17 supergain  wheat  f         2
18 supergain  wheat  m         2
19 supersupp  barley f         2
20 supersupp  barley m         2
21 supersupp  oats   f         2
22 supersupp  oats   m         2
23 supersupp  wheat  f         2
24 supersupp  wheat  m         2

Do you think this is enough? Why?
How might this affect the “leverage” of each data point (sampling unit)?

Still, let’s continue to play around with it for a bit. Since we now have three factors, it is going to be more complicated to plot the data using ggplot. So, I am going to introduce “facets”. With facets, you can create a “gridded” plot so that you can plot an extra variable, like so:

ggplot(growth) + 
  geom_boxplot(aes(x = supplement, y = gain, fill = diet)) +
  facet_grid(.~sex) + 
  theme_bw()

Play around with mapping sex to colour and supplement to the grid (for example), and different combinations to see which one is more helpful for visualising the data.

However, we know that there’s only 2 sample units in each treatment, so these boxplots are constructed with only 2 observations in each. So maybe we can just plot the points? See below:

ggplot(growth) + 
  geom_point(aes(y = gain, x = supplement, col = diet, shape = sex), 
             position = position_dodge(0.3)) + 
  theme_bw()

Overall, who has gained more weight during the experiment, males or females?
Do you think sex is interacting with either diet or supplements?
Play around with three way ANOVAs, with different interactions, and see if you can improve in the models we’ve run so far. Feel free to show me/email me the code of your bestest ANOVA if you want any feedback.