Topic 3: ANOVAs in jamovi


These are the solutions for DA Computer Lab 3.

Please make sure to go over these after the lab session, and finish off any questions you may have missed during the lab.


Preparations: Wolf River Data

No answer required.

In this question we prepared to assess data on the distribution of toxic substances in the Wolf River in Tennessee, USA (Jaffe et al., 1982).

The Wolf River data set contains recorded values for variables including:

  • Aldrin: Concentration level of Aldrin (ng/L)
  • HCB: Concentration level of HCB (ng/L)
  • Depth: Depth level at which recordings were taken (1 = surface, 2 = mid-depth, 3 = bottom).

Note that Depth is a fixed factor since recordings have been made at specific depth levels.


Figure 0.1: Note. From File:Wolf-river-york-tn1.jpg, by Brian Stansberry, 2009, Wikimedia Commons (https://commons.wikimedia.org/). CC BY 3.0 DEED

a.

No answer required.

b.

No answer required.


1 Wolf River ANOVA 🌱

No answer required.

1.1

No answer required.

1.2

Let \(\mu_1\), \(\mu_2\) and \(\mu_3\) denote the population mean Aldrin concentration levels (ng/L) at the surface, mid-depth and bottom of Wolf River, respectively.

Then, we have

\[H_0: \mu_1 = \mu_2 = \mu_3 \text{ versus } H_1: \text{ Not all means are equal}\]

1.2.1

1.2.2

The sample mean Aldrin levels increase as the river depth level increases, from the surface to the bottom.

Based solely on the descriptive outputs, there does appear to be a difference in means. But remember, we cannot use descriptive statistics or graphs to conclude statistical significance - we need to conduct a formal statistical test.

1.2.3

We have \(c = 3\) categories, and \(n = 30\).

Therefore, \(df_1 = c - 1 = 2\) and \(df_2 = n-c = 27\). We can verify this by checking the jamovi one-way ANOVA output.
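As a quick check on this arithmetic, the degrees of freedom can also be computed in a few lines of Python. This is only a minimal sketch and is not part of jamovi; the function name `one_way_anova_df` is a hypothetical helper introduced here for illustration.

```python
def one_way_anova_df(n_total: int, n_groups: int) -> tuple[int, int]:
    """Return (df1, df2) for a one-way between-subjects ANOVA."""
    if n_groups < 2 or n_total <= n_groups:
        raise ValueError("need at least 2 groups and more observations than groups")
    df_between = n_groups - 1       # df1 = c - 1
    df_within = n_total - n_groups  # df2 = n - c
    return df_between, df_within


# Wolf River Aldrin data: n = 30 observations across c = 3 depth levels
df1, df2 = one_way_anova_df(n_total=30, n_groups=3)
print(df1, df2)  # 2 27
```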

1.2.4

Based on the Levene’s test \(p\)-value of 0.054, we can use Fisher’s \(F\)-test here. However, the \(p\)-value is close to \(0.05\), so if you decided to use Welch’s \(F\)-test to be on the safe side, that is ok.

1.2.5

The ANOVA assumption of normality appears to be satisfied - the Shapiro-Wilk normality test has a high \(p\)-value of 0.789, and the Normal Q-Q plot does not exhibit large variations from the theoretical line, especially taking into account the relatively small sample size.

1.2.6

We observe that the pairwise comparison between Surface and Bottom depth levels is statistically significant, with \(p = 0.005 < \alpha\). The Surface vs Middepth \((p = 0.274)\) and Middepth vs Bottom \((p = 0.158)\) pairwise comparisons are not statistically significant at the \(\alpha = 0.05\) level.

1.2.7

A one-way ANOVA was conducted to determine if there was a difference in the mean Aldrin concentration levels (ng/L) at the surface (\(M = 4.189\), \(SD = 0.671\), \(n=10\)), mid-depth (\(M = 5.019\), \(SD = 1.104\), \(n=10\)) and bottom (\(M = 6.021\), \(SD = 1.582\), \(n=10\)) of Wolf River.

There was a statistically significant difference in the mean Aldrin concentration levels at the \(\alpha = 0.05\) level of significance, with Fisher’s \(F(2, 27) = 6.051\), \(p = 0.007 < 0.05\). Fisher’s \(F\)-test was used because Levene’s test for homogeneity of variances was not statistically significant \((p = 0.054)\).

The Tukey post-hoc pairwise comparisons test indicated that the mean difference between Aldrin concentration levels at the surface and bottom of Wolf River was statistically significant \((p = 0.005)\). No other pairwise comparisons were statistically significant.
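The reported \(p\)-value for Fisher’s \(F(2, 27) = 6.051\) can be checked by hand. When the numerator degrees of freedom equal 2, the upper-tail probability of the \(F\) distribution has the closed form \((1 + 2F/df_2)^{-df_2/2}\). The Python sketch below uses that special case only; the function name `f_upper_tail_df1_is_2` is a hypothetical helper, and for other degrees of freedom you would rely on the jamovi output instead.

```python
def f_upper_tail_df1_is_2(f_value: float, df2: int) -> float:
    """Upper-tail probability P(F > f_value) for an F(2, df2) distribution.

    Uses the closed form (1 + 2f/df2)^(-df2/2), which is valid only when
    the numerator degrees of freedom equal 2.
    """
    if f_value < 0 or df2 <= 0:
        raise ValueError("f_value must be non-negative and df2 positive")
    return (1.0 + 2.0 * f_value / df2) ** (-df2 / 2.0)


# Wolf River Aldrin one-way ANOVA: F(2, 27) = 6.051
p_value = f_upper_tail_df1_is_2(6.051, 27)
print(round(p_value, 3))  # 0.007
```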

2 Wolf River ANOVA #2 🌱

No answer required.

2.1

2.1.1

Results for the \(F\) and \(p\) values are found in the last two columns of the output, and match previous results.

2.1.2

\(\eta^2 = 0.31\) is a large effect size, which suggests the test results are clinically significant.
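For a one-way ANOVA, \(\eta^2\) can be recovered from the \(F\) statistic and its degrees of freedom, since \(\eta^2 = SS_{between}/(SS_{between} + SS_{within}) = (F \times df_1)/(F \times df_1 + df_2)\). The Python sketch below applies this to the earlier result \(F(2, 27) = 6.051\); the function name `eta_squared_from_f` is a hypothetical helper, and the identity is assumed to apply only to the one-way between-subjects design.

```python
def eta_squared_from_f(f_value: float, df1: int, df2: int) -> float:
    """Eta squared for a one-way between-subjects ANOVA.

    Derived from F = (SS_between / df1) / (SS_within / df2), which gives
    eta^2 = SS_between / (SS_between + SS_within) = F*df1 / (F*df1 + df2).
    """
    return (f_value * df1) / (f_value * df1 + df2)


# Wolf River Aldrin one-way ANOVA: F(2, 27) = 6.051
eta_sq = eta_squared_from_f(6.051, 2, 27)
print(round(eta_sq, 2))  # 0.31
```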

2.1.3

The Normal Q-Q plot is identical to the previous result. The histogram of the residuals appears somewhat symmetric, but we may have concerns about the normality of the residuals, since we observe many more residuals around values of 0 and 1 than we would expect.

3 Wolf River Kruskal-Wallis test 🌱

No answer required.

3.1

To begin, write out an appropriate null hypothesis and alternative hypothesis, given we are testing Aldrin levels across different Depth levels.

We have

\[H_0: \text{ Aldrin level population distributions are equal across depth levels}\] \[\text{ versus } \] \[H_1: \text{ Aldrin level population distributions are not equal across depth levels}\]

3.2

3.2.1

A Kruskal-Wallis test was conducted to compare the distribution of Aldrin concentration levels (ng/L) across three depth levels in Wolf River.

A statistically significant difference in Aldrin concentration level distributions across depths was observed at the \(\alpha = 0.05\) level of significance, with \(\chi^2 = 9.254\), \(df = 2\), \(N = 30\), \(p = 0.01 < 0.05\), \(n_1 = n_2 = n_3 = 10\).

Post-hoc pairwise comparisons using the DSCF test indicated a statistically significant difference in Aldrin concentration level distributions between the Surface and Bottom depths \((p = 0.009)\), but not between other pairs (Surface-Middepth \(p = 0.217\), Middepth-Bottom \(p = 0.302\)).
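The Kruskal-Wallis \(p\)-value can be checked from the reported \(\chi^2\) statistic. For a chi-square distribution with \(df = 2\), the upper-tail probability is simply \(e^{-\chi^2/2}\). The Python sketch below uses only that special case; the function name `chi2_upper_tail_df2` is a hypothetical helper introduced for illustration.

```python
import math


def chi2_upper_tail_df2(chi2_value: float) -> float:
    """Upper-tail probability P(X > chi2_value) for chi-square with df = 2.

    The closed form exp(-x / 2) is valid only for 2 degrees of freedom.
    """
    if chi2_value < 0:
        raise ValueError("chi2_value must be non-negative")
    return math.exp(-chi2_value / 2.0)


# Wolf River Kruskal-Wallis test: chi-square = 9.254, df = 2
kw_p = chi2_upper_tail_df2(9.254)
print(round(kw_p, 3))  # 0.01
```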


4 Noise Data Repeated Measures ANOVA 🌱


Figure 4.1: Note. From File:Colour soundwave.svg, by KMpumlwana (WMF), 2022, Wikimedia Commons (https://commons.wikimedia.org/). CC0 1.0 DEED

No answer required.

In this question we assessed data on people’s performance of a task under various levels of background noise (Walker, 2008). The variables in the data include:

  • Subject: The subject number assigned to an individual in the study
  • Sex: The sex of the individual (1 = male, 2 = female)
  • None: Performance with no background noise
  • Low: Performance with low background noise
  • Medium: Performance with medium background noise
  • High: Performance with high background noise

4.1

No answer required.

4.2

  • Let \(\mu_{none}\) denote the population mean performance score with no background noise
  • Let \(\mu_{low}\) denote the population mean performance score with low background noise
  • Let \(\mu_{medium}\) denote the population mean performance score with medium background noise
  • Let \(\mu_{high}\) denote the population mean performance score with high background noise

Then we have:

\[H_0: \mu_{none} = \mu_{low} = \mu_{medium} = \mu_{high} \text{ versus } H_1: \text{ Not all means are equal}\]

4.3

The mean performance scores across the 4 noise levels show that average performance was actually lowest when there was no noise, and highest when there was medium noise.

4.4

4.4.1

We have \(c=4\) categories, and \(n = 20\) (20 individuals, sampled 4 times each).

Therefore \(df_1 = c - 1 = 3\) and \(df_2 = (n-1) \times (c-1) = (20-1) \times (4-1) = 19 \times 3 = 57\). We observe these values in the jamovi output.
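The same degrees of freedom can be reproduced with a short Python sketch. It is not part of jamovi, and the function name `repeated_measures_df` is a hypothetical helper; it assumes a one-way repeated measures design in which every subject is measured under every condition.

```python
def repeated_measures_df(n_subjects: int, n_conditions: int) -> tuple[int, int]:
    """Return (df1, df2) for a one-way repeated measures ANOVA."""
    if n_subjects < 2 or n_conditions < 2:
        raise ValueError("need at least 2 subjects and 2 conditions")
    df_effect = n_conditions - 1                     # df1 = c - 1
    df_error = (n_subjects - 1) * (n_conditions - 1)  # df2 = (n - 1)(c - 1)
    return df_effect, df_error


# Noise data: 20 individuals, each measured under 4 noise levels
print(repeated_measures_df(n_subjects=20, n_conditions=4))  # (3, 57)
```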

4.4.2

Since the Mauchly’s Sphericity test \(p\)-value \(= 0.244 > 0.05\), we do not reject the sphericity assumption. As a result, we use the default \(F\) test here, and do not need to apply a correction.

4.4.3

The \(\eta^2\) effect size is \(.606\), which is a large effect size. This suggests the test results are clinically significant.

4.4.4

At the 5% level of significance, the two methods identify the same pairwise comparisons as being statistically significant.

The None-Low \((p_{tukey} = 0.008,\, p_{bonferroni} = 0.009)\), None-Medium \((p_{tukey} < .001,\, p_{bonferroni} < .001)\), Low-Medium \((p_{tukey} < .001,\, p_{bonferroni} < .001)\) and Medium-High \((p_{tukey} < .001,\, p_{bonferroni} < .001)\) comparisons are all statistically significant.

4.4.5

A one-way repeated measures ANOVA was conducted to evaluate whether individuals’ mean performance scores for a particular test varied depending on the level of background noise. The sample mean scores differed across noise levels (\(M_{none} = 43.050\), \(M_{low} = 53.100\), \(M_{medium} = 67.650\), \(M_{high} = 46.000\)), as did the standard deviations (\(SD_{none} = 6.517\), \(SD_{low} = 9.781\), \(SD_{medium} = 9.016\), \(SD_{high} = 5.252\)).

The results of the ANOVA indicated a significant difference in mean performance scores, with \(F(3,57) = 39.494\), \(p < .001\). Mauchly’s sphericity test indicated no sphericity violations, and the large \(\eta^2\) effect size of \(.606\) indicated the ANOVA results were also clinically significant.

Post hoc comparisons using both the Tukey HSD and Bonferroni correction methods identified several statistically significant pairwise comparisons:

  • The mean performance score in a no noise environment was 10.050 units lower than in a low noise environment, and that difference was statistically significant \((p_{tukey} = 0.008,\, p_{bonferroni} = 0.009)\).
  • The mean performance score in a no noise environment was 24.6 units lower than in a medium noise environment \((p_{tukey} < .001,\, p_{bonferroni} < .001)\).
  • The mean performance score in a low noise environment was 14.55 units lower than in a medium noise environment \((p_{tukey} < .001,\, p_{bonferroni} < .001)\).
  • The mean performance score in a medium noise environment was 21.65 units higher than in a high noise environment \((p_{tukey} < .001,\, p_{bonferroni} < .001)\).
  • There was no significant difference between the mean performance score in a no noise environment and a high noise environment \((p_{tukey} = .388,\, p_{bonferroni} = .721)\) or between a low noise environment and a high noise environment \((p_{tukey} = .071,\, p_{bonferroni} = .098)\).
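The mean differences quoted in the list above follow directly from the four sample means reported earlier in this write-up. The Python sketch below recomputes every pairwise difference from those means; the function name `pairwise_mean_differences` is a hypothetical helper. The None-High and Low-High differences are not quoted in the write-up, so treat them as arithmetic on the reported means rather than as jamovi output.

```python
from itertools import combinations

# Sample means reported in the repeated measures ANOVA write-up
means = {"None": 43.050, "Low": 53.100, "Medium": 67.650, "High": 46.000}


def pairwise_mean_differences(group_means: dict[str, float]) -> dict[tuple[str, str], float]:
    """Return second-minus-first mean differences for every pair of groups."""
    return {
        (first, second): round(group_means[second] - group_means[first], 3)
        for first, second in combinations(group_means, 2)
    }


differences = pairwise_mean_differences(means)
for (first, second), diff in differences.items():
    print(f"{second} - {first}: {diff:+.3f}")
```

A positive value means the second-named noise level had the higher mean performance score.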


5 Noise Data Friedman Test 🌱

No answer required.

5.1

We have:

\[H_0: \text{Population performance scores distributions are equal across noise levels}\] \[\text{ versus }\] \[H_1: \text{Population performance scores distributions are not all equal across noise levels}\]

5.2

5.2.1

A Friedman test was conducted to compare the distributions of performance scores under 4 different noise levels.

The test indicated a statistically significant difference in distributions of performance scores across the noise levels at the \(\alpha = 0.05\) level of significance, with \(\chi^2 = 40.894\), \(df=3\), \(p < .001\). This is similar to what we saw from the Repeated Measures ANOVA test. The post-hoc Durbin-Conover pairwise comparisons showed a similar pattern to that observed in the Repeated Measures ANOVA post-hoc pairwise comparisons, but the results are not identical. Namely, all pairwise comparisons were statistically significant here, with None-Low, None-Medium, Low-Medium, Low-High and Medium-High all having \(p < .001\), while None-High had \(p = 0.023\).
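To see just how small the Friedman \(p\)-value is, the upper-tail probability of a chi-square distribution with \(df = 3\) can be written in closed form as \(\text{erfc}(\sqrt{x/2}) + \sqrt{2x/\pi}\, e^{-x/2}\). The Python sketch below evaluates this for the reported \(\chi^2 = 40.894\); the function name `chi2_upper_tail_df3` is a hypothetical helper, and the formula applies only when \(df = 3\).

```python
import math


def chi2_upper_tail_df3(chi2_value: float) -> float:
    """Upper-tail probability P(X > chi2_value) for chi-square with df = 3.

    Closed form: erfc(sqrt(x/2)) + sqrt(2x/pi) * exp(-x/2).
    Valid only for 3 degrees of freedom.
    """
    if chi2_value < 0:
        raise ValueError("chi2_value must be non-negative")
    half = chi2_value / 2.0
    return math.erfc(math.sqrt(half)) + math.sqrt(2.0 * chi2_value / math.pi) * math.exp(-half)


# Noise data Friedman test: chi-square = 40.894, df = 3
friedman_p = chi2_upper_tail_df3(40.894)
print(friedman_p < 0.001)  # True
```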


6 Pea Plant Data 🌳

No answer required.


Figure 6.1: Note. From File:Leaves of Pisum sativum (2).JPG, by Chmee2, 2011, Wikimedia Commons (https://commons.wikimedia.org/). CC BY 3.0 DEED

Background Information

For the experiment, each pea plant seedling was assigned to one of three groups, and then carefully sprayed:

  • C: a control group, sprayed with water
  • TA: a treatment group, sprayed with a 25 mg/L solution of GA
  • TB: a treatment group, sprayed with a 50 mg/L solution of GA

Figure 6.2: Pea Plant Raw Data

6.1

No answer required.

6.2

Example results and summary are shown below.

A one-way ANOVA was conducted to compare mean pea plant seedling height across the treatments received. The normality assumption was not violated (Shapiro-Wilk \(p = 0.856\)). However, Levene’s test gave \(p < .001\), so the assumption of equal variances was violated and Welch’s \(F\)-test was used, with \(F(2, 50.044) = 93.697\), \(p < .001\). The difference in mean seedling height across treatments was both statistically significant and clinically significant, with \(\eta^2 = 0.611\).

Post-hoc comparisons using the Games-Howell test suggest that there are statistically significant mean height differences in seedlings given treatments C vs TA (mean difference \(=179.152\) mm, \(p < .001\)), and in seedlings given treatments C vs TB (mean difference \(=170.039\) mm, \(p < .001\)). The mean height difference in seedlings given treatments TA vs TB was not statistically significant (mean difference \(=9.113\) mm, \(p = 0.890\)).

6.3

No answer required. Discuss this with other students and/or your lab demonstrator.


7 Wolf River HCB ANOVA 🌳

No answer required - use the answers in 1 and/or 2 as a guide.


References

Jaffe, P. R., Parker, F. L., and Wilson, D. J. (1982). Distribution of toxic substances in rivers. Journal of the Environmental Engineering Division, 108(4), 639-649.

Walker, I. (2008). Repeated-measures/split-plot ANOVA: noisedata [Data file]. https://web.archive.org/web/20210506222824/https://people.bath.ac.uk/pssiw/stats2/page16/page16.html


These notes have been prepared by Rupert Kuveke and other members of the Department of Mathematical and Physical Sciences. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with the Department of Environment and Genetics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.