Topic 7: One-way ANOVA


In this computer lab, we will extend our understanding of one-way ANOVAs by practising what we have learnt in Topic 7.


1 One-way ANOVA

One-way ANOVAs are used to test for differences in means between two or more independent groups. For this question, we will assess the life expectancy of people living on the different continents around the world (excluding Antarctica!), using data from the gapminder R package (Gapminder 2021a, 2021b).

This data set contains life expectancy, population size, and GDP per capita data for 142 countries across 5 continents, recorded every five years from 1952 to 2007.

For this question, we are interested only in the following variables:

  • Dependent variable: Life Expectancy (years at birth)
  • Independent variable: Continent\(^\dagger\) (Africa, Americas, Asia, Europe, Oceania)

\(^\dagger\) Note that for this data set, North America and South America have been combined into the (super) continent Americas, while Oceania includes Australia and New Zealand.

1.1

Open up RStudio and create a new script file. Before we can start analysing the data, we first need to install and load the required package. Run the code below to get started:

install.packages("gapminder") # Install package
library(gapminder) # Load package
data(gapminder) # Load gapminder data

1.2

Before we conduct our one-way ANOVA, it is important to carry out some exploratory data analysis. To begin, visualise our data by constructing a boxplot of people’s life expectancy at birth, separated by continent. Make sure to label your boxplot clearly.
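If you are unsure how to build a labelled, grouped boxplot, the following is a minimal sketch of the technique. It uses R's built-in PlantGrowth data set (an assumption made purely for illustration, so that the gapminder question is still yours to answer); the titles and colour are arbitrary choices.

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)

# Formula interface: numeric response ~ grouping factor
bp <- boxplot(weight ~ group, data = PlantGrowth,
              main = "Dried plant weight by treatment group",
              xlab = "Treatment group",
              ylab = "Dried plant weight",
              col  = "lightblue")
```

For the lab question, you would swap in the gapminder data set and the two variables of interest, and adjust the labels accordingly.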

1.3

Compute the sample mean and standard deviation of people’s life expectancy by continent. Also note down the sample size for each continent.

Hint: Recall that we performed similar calculations in Computer Lab 7. If you are not sure how to proceed with this question, the following code will give you a head start.

# Compute mean life expectancy 
tapply(gapminder$lifeExp, gapminder$continent, mean)
# Compute the sample size (number of observations) for each continent
table(gapminder$continent)
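The same tapply pattern works for any summary function, including sd. As a hedged sketch, the code below collects the group sizes, means and standard deviations into one table, again using the built-in PlantGrowth data set as a stand-in (an assumption for illustration; the object names are hypothetical).

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)

# One summary value per group: tapply(values, grouping factor, function)
group_mean <- tapply(PlantGrowth$weight, PlantGrowth$group, mean)
group_sd   <- tapply(PlantGrowth$weight, PlantGrowth$group, sd)
group_n    <- table(PlantGrowth$group)

# Collect the three summaries into a single table, one row per group
group_summary <- data.frame(n    = as.vector(group_n),
                            mean = as.vector(group_mean),
                            sd   = as.vector(group_sd),
                            row.names = names(group_mean))
print(group_summary)
```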

1.4

What do you observe from your results for 1.2 and 1.3?

1.5

We would like to test, at the \(5\%\) level of significance, whether people’s average life expectancy at birth differs depending upon the continent in which they live. In order to carry out our one-way ANOVA, we first need to clearly define our null and alternative hypotheses.

Suppose that we let \(\mu_1\) denote the true average life expectancy at birth of people born in Africa.

Using this notation as a guide, define an appropriate \(H_0\) and \(H_1\).
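As a reminder of the general form only: for a one-way ANOVA comparing \(k\) groups with true means \(\mu_1, \ldots, \mu_k\) (the symbol \(k\) and this labelling are assumptions for illustration), the hypotheses can be sketched as

\[
H_0: \mu_1 = \mu_2 = \cdots = \mu_k
\qquad \text{versus} \qquad
H_1: \text{not all of } \mu_1, \ldots, \mu_k \text{ are equal.}
\]

You will need to decide on \(k\) and state what each \(\mu_i\) represents for this data set.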

1.6

We can carry out the one-way ANOVA described in part 1.5 above in R, using the aov function.

The aov function has the following structure:

example_anova <- aov(y_variable ~ x_variable, data = example_data)

Here, we model a chosen y variable from our data set against a chosen x variable. We specify the data set to use via the data = argument. Because the data set is specified this way, we don’t need to write e.g. gapminder$gdpPercap to include the gdpPercap variable in our ANOVA; we can simply write gdpPercap in the y_variable or x_variable position, as desired.

Using this information, try to conduct a one-way ANOVA of average life expectancy across continents, using the gapminder data, as discussed in 1.5.
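To see the aov template filled in, here is a minimal sketch on R's built-in PlantGrowth data set (an assumption for illustration, not the gapminder data; plant_anova is a hypothetical object name).

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)

# weight is the numeric response, group is the grouping factor
plant_anova <- aov(weight ~ group, data = PlantGrowth)

# Printing the fitted object shows the sums of squares and degrees of freedom
print(plant_anova)
```

Note that the grouping variable should be a factor (as continent is in gapminder), otherwise R will treat it as a numeric predictor.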

1.7

Assess your ANOVA results using the summary R command, and note the following:

  • The degrees of freedom \(d1\) and \(d2\);
  • The p-value;
  • The test statistic (F value)

Hint: You can use the summary function as shown in the R code below - you will need to extrapolate from this example:

# Summarise results stored in object `example_anova`
summary(example_anova)
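The requested quantities can be read straight off the printed summary table. If you would rather extract them as numbers, the sketch below shows one way, using the built-in PlantGrowth data set as a stand-in (an assumption for illustration; all object names are hypothetical).

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)
plant_anova <- aov(weight ~ group, data = PlantGrowth)

plant_summary <- summary(plant_anova)
print(plant_summary)

# The first element of the summary is the ANOVA table itself.
# Row 1 is the grouping factor, row 2 is the residuals.
anova_table <- plant_summary[[1]]
d1      <- anova_table[["Df"]][1]       # between-groups df: groups - 1
d2      <- anova_table[["Df"]][2]       # within-groups df: observations - groups
f_value <- anova_table[["F value"]][1]  # test statistic
p_value <- anova_table[["Pr(>F)"]][1]   # p-value

# With 3 groups and 30 observations, d1 = 2 and d2 = 27
c(d1 = d1, d2 = d2, F = f_value, p = p_value)
```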

1.8

Write a brief statement that summarises your results.

1.9

So far, we have proceeded assuming that the one-way ANOVA test assumptions were satisfied for our analysis. We should check these assumptions now.

We know that the data are numeric, and we can assume that the observations are independent between continents. However, we still need to test for the equality of variances between the groups.

To check this, use the leveneTest R command to carry out Levene’s test for equal variances.

Note that you will need to install (if necessary) and load the car package in order to use this test.

Using the p-value to support your decision, what do you conclude?
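It can help to see what Levene's test actually computes. The sketch below, on the built-in PlantGrowth data set (an assumption for illustration), runs a one-way ANOVA on each observation's absolute deviation from its group median. This median-centred version is the default used by car::leveneTest; Levene's original proposal centred on the group mean. The object names are hypothetical.

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)

# Absolute deviation of each observation from its own group's median
group_median <- ave(PlantGrowth$weight, PlantGrowth$group, FUN = median)
dev_data <- data.frame(abs_dev = abs(PlantGrowth$weight - group_median),
                       group   = PlantGrowth$group)

# Levene's test is a one-way ANOVA on these absolute deviations
levene_by_hand <- anova(lm(abs_dev ~ group, data = dev_data))
print(levene_by_hand)

# The same test via the car package, if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(weight ~ group, data = PlantGrowth))
}
```

A small p-value is evidence against equal variances; a large p-value means the test found no evidence that the variances differ.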

1.10

Finally, we need to check the normality of the residuals for our one-way ANOVA.

We can access these using ...$residuals (where you will need to replace the ...s with the name you chose for your ANOVA, e.g. example_anova$residuals, using the name specified in the example in 1.6).

Create a histogram of the residuals, and overlay a normal curve, using the residuals data to inform your choice of mean and standard deviation. Also create a Normal Q-Q plot of the residuals.

Based upon visual inspection of these plots, what do you conclude?
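The two plots can be sketched as follows, using the built-in PlantGrowth data set as a stand-in (an assumption for illustration; object names and plot titles are hypothetical). The key detail is freq = FALSE, which puts the histogram on the density scale so that the normal curve is comparable.

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)
plant_anova <- aov(weight ~ group, data = PlantGrowth)
res <- plant_anova$residuals

# Use the residuals themselves to choose the normal curve's parameters
res_mean <- mean(res)
res_sd   <- sd(res)

# Histogram on the density scale, with the fitted normal curve overlaid
hist(res, freq = FALSE,
     main = "Histogram of ANOVA residuals",
     xlab = "Residual")
curve(dnorm(x, mean = res_mean, sd = res_sd), add = TRUE, lwd = 2)

# Normal Q-Q plot with a reference line
qqnorm(res, main = "Normal Q-Q plot of residuals")
qqline(res)
```

Roughly normal residuals give a histogram that follows the curve and Q-Q points that lie close to the reference line.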

1.11

To support your decision, it is important to carry out a formal statistical test. Use the Shapiro-Wilk test to assess the normality of the residuals.

What do you find? Does this support your answer to 1.10?
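As a hedged sketch of the call, the base R function shapiro.test takes the vector of residuals directly. The example again uses the built-in PlantGrowth data set (an assumption for illustration; object names are hypothetical).

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)
plant_anova <- aov(weight ~ group, data = PlantGrowth)

# H0: the residuals come from a normal distribution
sw <- shapiro.test(plant_anova$residuals)
print(sw)

# The p-value can also be extracted on its own
sw$p.value
```

Keep in mind that with large samples this test can flag even minor departures from normality, so it is best read alongside the plots from 1.10.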

1.12

Regardless of your conclusion above, we will proceed under the assumption that our one-way ANOVA test assumptions have been safely met. Our next step is to conduct a Tukey HSD post-hoc test.

Use the TukeyHSD R function to carry out this test in R for our selected data.
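TukeyHSD takes the fitted aov object as its input. A minimal sketch on the built-in PlantGrowth data set (an assumption for illustration; object names are hypothetical) looks like this:

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)
plant_anova <- aov(weight ~ group, data = PlantGrowth)

# All pairwise comparisons, with family-wise 95% confidence intervals
plant_tukey <- TukeyHSD(plant_anova, conf.level = 0.95)
print(plant_tukey)

# The results are stored in a matrix named after the grouping factor:
# one row per pair, columns diff, lwr, upr and p adj
plant_tukey$group
```

Here 3 groups give 3 pairwise comparisons; with 5 groups there are 10. A comparison is statistically significant at the 5% level when its adjusted p-value is below 0.05, or equivalently when its interval excludes zero.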

1.12.1

Interpret the results of the Tukey HSD post-hoc test for any 2 of the 10 comparisons. Which, if any, comparisons are statistically significant? Are there any comparisons that are not statistically significant?

1.12.2

To conclude, use the etaSquared function from the lsr R package to calculate the \(\eta^2\) effect size for our one-way ANOVA test.

Interpret the effect size.

Hint: You can use the etaSquared function in a similar manner to how you used the summary function in 1.7.
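For a one-way ANOVA, \(\eta^2\) is the between-groups sum of squares divided by the total sum of squares. The sketch below computes it by hand on the built-in PlantGrowth data set (an assumption for illustration; object names are hypothetical), and then calls lsr::etaSquared if that package is installed.

```r
# Illustration on R's built-in PlantGrowth data (NOT the gapminder data)
data(PlantGrowth)
plant_anova <- aov(weight ~ group, data = PlantGrowth)

# Sums of squares from the ANOVA table: row 1 = groups, row 2 = residuals
anova_table <- summary(plant_anova)[[1]]
ss_between  <- anova_table[["Sum Sq"]][1]
ss_within   <- anova_table[["Sum Sq"]][2]

# Proportion of total variation explained by group membership
eta_sq <- ss_between / (ss_between + ss_within)
eta_sq

# The same quantity via the lsr package, if it is installed
if (requireNamespace("lsr", quietly = TRUE)) {
  print(lsr::etaSquared(plant_anova))
}
```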

2 Practice

If you have time, repeat Question 1, but this time use the variables:

  • Dependent variable: GDP per capita
  • Independent variable: Continent (Africa, Americas, Asia, Europe, Oceania)


References

Gapminder. 2021a. “Happiness Score (WHR) [.csv File].” 2021. http://gapm.io/dhapiscore_whr.
———. 2021b. “Income Per Person [.csv File].” 2021. http://gapm.io/dgdppc.


These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematics and Statistics and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.