In this computer lab, we will extend our understanding of one-way ANOVAs by practising what we learnt in Topic 7.
🏡 A one-way ANOVA is used to test for differences in means between two or more independent groups.
In this question, we will assess the average life expectancy of people living on the different continents around the world (excluding Antarctica!), using a subset of data from the gapminder R package (see Gapminder 2021a, also Gapminder 2021b).
The data set we will analyse, gapminder_2002.csv, contains data from 142 different countries, and was recorded for the year 2002. In this question, we will focus solely on the following variables:
Dependent variable: Life Expectancy (Numeric: The average life expectancy at birth, in years, of individuals from a country)
Independent variable: Continent\(^\dagger\) (Categorical: The continent from which the data was obtained. One of Africa, Americas, Europe, Asia_Oceania)
\(^\dagger\) Note that for this data set, North America and South America have been combined into the (super) continent Americas, while Asia_Oceania includes Asia, Australia and New Zealand.
💻 Download the gapminder_2002.csv file from the LMS, and save it in a relevant location on your device.
💻 Open RStudio and create a new script file. Set your working directory to the folder in which you saved the gapminder_2002.csv file, then import the data set by running the code below:
gapminder_2002 <- read.csv("gapminder_2002.csv", header = TRUE)
💻 Let’s start by carrying out some exploratory data analysis. To begin, visualise our data by constructing a box plot of people’s Life Expectancy, separated by Continent. Make sure to label your box plot clearly.
Hint: Recall that we covered how to create box plots of one variable, separated by another variable, in previous core computer labs (see e.g. Computer Lab 7).
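As a reminder of the general pattern, here is a minimal sketch of a labelled box plot of one variable separated by another. It uses R's built-in PlantGrowth data (numeric weight, factor group) as a stand-in so that it runs without the lab file; the object name bp is chosen purely for illustration. For the lab, the formula would presumably become lifeExp ~ continent with data = gapminder_2002, and the labels should describe life expectancy and continent.

```r
# Stand-in data: R's built-in PlantGrowth (numeric `weight`, factor `group`).
# For the lab, swap in lifeExp ~ continent and data = gapminder_2002.
bp <- boxplot(weight ~ group, data = PlantGrowth,
              main = "Dried plant weight by treatment group",
              xlab = "Treatment group",
              ylab = "Dried weight")

# boxplot() also (invisibly) returns the numbers behind the plot
bp$n      # number of observations in each group
bp$stats  # five-number summary for each group, one column per group
```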
💻 Compute the sample mean and standard deviation of people’s Life Expectancy by Continent. Also note down the sample size for each Continent.
Hint: Recall that we performed similar calculations in Computer Lab 7. If you are not sure how to proceed with this question, check the following code.
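# Compute mean life expectancy by continent
tapply(gapminder_2002$lifeExp, gapminder_2002$continent, mean)
# Compute standard deviation of life expectancy by continent
tapply(gapminder_2002$lifeExp, gapminder_2002$continent, sd)
# Compute sample size (number of countries) for each continent
table(gapminder_2002$continent)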
🏡 We would like to test, at the \(5\%\) level of significance, whether people’s Life Expectancy value differs depending upon the Continent in which they live. In order to carry out our one-way ANOVA, we first need to clearly define our null and alternative hypotheses (\(H_0\) and \(H_1\) respectively).
Suppose that we let \(\mu_1\) denote the true (population) average life expectancy at birth of people born in the continent Africa.
Using this notation as a guide, define similar notation for the other Continent categories, and use these to define an appropriate \(H_0\) and \(H_1\).
Hint: Check the Topic 7 readings if you are unsure how to proceed.
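A sketch of how the hypotheses could be written, assuming (the labelling is arbitrary and any consistent assignment would do) that \(\mu_2\), \(\mu_3\) and \(\mu_4\) denote the true average life expectancy at birth for Americas, Europe and Asia_Oceania respectively:

\[
H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4
\qquad \text{versus} \qquad
H_1: \text{not all of } \mu_1, \mu_2, \mu_3, \mu_4 \text{ are equal.}
\]

Note that \(H_1\) does not claim that every mean differs from every other, only that at least one mean differs from the rest.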
💻 In R, we can use the aov function to carry out the one-way ANOVA described in part 1.4 above.
The aov function has the following structure:
example_anova <- aov(y_variable ~ x_variable, data = example_data)
Here, we model a chosen y_variable from our data set against a chosen x_variable. We specify the data set to use via the data = argument.
Using this information, conduct a one-way ANOVA of Life Expectancy across Continents, using the gapminder_2002 data, as discussed in 1.4.
Note: Since we specify the data set via the data = argument, we don’t need to write e.g. gapminder_2002$gdpPercap if we want to include the gdpPercap variable in our ANOVA. We can simply write gdpPercap in the y_variable or x_variable position, as desired.
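To make the aov structure concrete, here is a minimal sketch fitted to R's built-in PlantGrowth data, used as a stand-in so the code runs without the lab file. The object name plant_anova is illustrative only. Given the column names used in the earlier tapply code, the lab version would presumably be of the form aov(lifeExp ~ continent, data = gapminder_2002).

```r
# Stand-in data: built-in PlantGrowth (numeric `weight`, factor `group`).
# y_variable = weight, x_variable = group, example_data = PlantGrowth
plant_anova <- aov(weight ~ group, data = PlantGrowth)

# Printing the fitted object shows the sums of squares and degrees of freedom;
# the F statistic and p-value only appear once summary() is applied.
plant_anova
```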
💻 Assess your ANOVA results using the summary R command, and note the following:
Hint: You can use the summary function as shown in the R code below - you will need to extrapolate from this example:
# Summarise results stored in object `example_anova`
summary(example_anova)
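If it helps to see where the individual numbers live, the sketch below pulls the values usually reported for a one-way ANOVA (degrees of freedom, F statistic, p-value) out of the summary table. It again uses the built-in PlantGrowth data as a stand-in, and the object names are illustrative assumptions rather than names the lab requires.

```r
# Stand-in model on built-in data; in the lab, use your own ANOVA object
plant_anova <- aov(weight ~ group, data = PlantGrowth)

# summary() of an aov object is a list; its first element is the ANOVA table
anova_table <- summary(plant_anova)[[1]]
anova_table

# Row 1 is the grouping variable, row 2 is the residuals
df_between <- anova_table[["Df"]][1]
df_within  <- anova_table[["Df"]][2]
f_value    <- anova_table[["F value"]][1]
p_value    <- anova_table[["Pr(>F)"]][1]

# Compare the p-value against the 5% significance level
p_value < 0.05
```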
🏡 Write a brief statement that summarises your results.
🏡 So far, we have proceeded assuming that the one-way ANOVA test assumptions were satisfied for our analysis. We should check these assumptions now.
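Similar to the independent samples \(t\)-test, we have 4 test assumptions to check:
(1) The dependent variable is numeric.
(2) The observations are independent.
(3) The variances are equal between the groups.
(4) The residuals are normally distributed.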
We know that the data are numeric (1), and we can assume that the observations are independent between continents (2). Therefore, we next need to test for the equality of variances between the groups (3).
To check this, run the code library(car), and then use the leveneTest R command to carry out the Levene’s Test for equal variances.
What do you conclude? Provide a simple sentence, using the \(p\)-value you obtain from the Levene’s Test to support your decision.
Hint: We can interpret the Levene’s Test output just as we did in the independent samples \(t\)-test scenario in Computer Lab 7.
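For intuition about what the test is doing, the sketch below reproduces the median-centred Levene's Test (the default centring in car's leveneTest) by hand: it is a one-way ANOVA on each observation's absolute deviation from its own group median. The built-in PlantGrowth data stands in for the lab data, and the names abs_dev, levene_by_hand and levene_p are illustrative. A \(p\)-value above \(0.05\) means there is no evidence against equal variances at the \(5\%\) level.

```r
# Stand-in data: built-in PlantGrowth (numeric `weight`, factor `group`)

# Absolute deviation of each observation from its own group's median
abs_dev <- abs(PlantGrowth$weight -
               ave(PlantGrowth$weight, PlantGrowth$group, FUN = median))

# One-way ANOVA on those deviations: its F test is the Levene's Test
levene_by_hand <- anova(lm(abs_dev ~ group, data = PlantGrowth))
levene_p <- levene_by_hand[["Pr(>F)"]][1]
levene_p

# The same test via the car package, if it is installed
if (requireNamespace("car", quietly = TRUE)) {
  print(car::leveneTest(weight ~ group, data = PlantGrowth))
}
```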
🏡 We also need to check the normality of the residuals produced for our one-way ANOVA (4).
We can access these using ...$residuals (where you will need to replace the ... with the name you chose for your ANOVA, e.g. example_anova$residuals).
Complete the following:
Based upon visual inspection of these plots, what do you conclude?
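A minimal sketch of two plots commonly used for this check, assuming the plots intended are a histogram and a normal Q-Q plot of the residuals. The model is fitted to the built-in PlantGrowth data as a stand-in, and the object names are illustrative. Roughly speaking, a symmetric, bell-shaped histogram and Q-Q points lying close to the reference line are consistent with normality.

```r
# Stand-in model on built-in data; in the lab, use your own ANOVA object
plant_anova <- aov(weight ~ group, data = PlantGrowth)
anova_resid <- plant_anova$residuals

# Histogram of the residuals
hist(anova_resid,
     main = "Histogram of ANOVA residuals",
     xlab = "Residual")

# Normal Q-Q plot with a reference line
qqnorm(anova_resid, main = "Normal Q-Q plot of ANOVA residuals")
qqline(anova_resid)
```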
🏡 To support your conclusion to 1.6.2, it is important to carry out a formal statistical test. Use the Shapiro-Wilk test to assess the normality of the residuals.
What do you find? Does this support your answer to 1.6.2?
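The Shapiro-Wilk test is available in base R as shapiro.test. The sketch below applies it to the residuals of a stand-in model fitted to the built-in PlantGrowth data; the object names are illustrative. The null hypothesis is that the residuals come from a normal distribution, so a \(p\)-value above \(0.05\) gives no evidence against normality at the \(5\%\) level.

```r
# Stand-in model on built-in data; in the lab, use your own ANOVA object
plant_anova <- aov(weight ~ group, data = PlantGrowth)

# Shapiro-Wilk test on the ANOVA residuals
sw <- shapiro.test(plant_anova$residuals)
sw

# The test statistic W and the p-value can be extracted individually
sw$statistic
sw$p.value
```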
🏡 Regardless of your conclusions above in 1.6, we will proceed under the assumption that our one-way ANOVA test assumptions have been safely met. Our next step is to conduct a Tukey HSD post-hoc test.
Use the TukeyHSD R function to carry out this test for our selected data.
🏡 Interpret the results of the Tukey HSD post-hoc test for any 2 of the various comparisons. Which comparisons, if any, are statistically significant? Are there any comparisons that are not statistically significant?
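As a guide to reading the output, here is a sketch of TukeyHSD applied to a stand-in model on the built-in PlantGrowth data (object names are illustrative). Each row is one pairwise comparison: diff is the difference in sample means, lwr and upr are the limits of the 95% family-wise confidence interval, and p adj is the adjusted \(p\)-value. A comparison is statistically significant at the \(5\%\) level when p adj is below \(0.05\), which corresponds to the interval excluding zero.

```r
# Stand-in model on built-in data; in the lab, use your own ANOVA object
plant_anova <- aov(weight ~ group, data = PlantGrowth)

# Tukey HSD post-hoc test (95% family-wise confidence level by default)
tukey <- TukeyHSD(plant_anova)
tukey

# The comparisons are stored as a matrix named after the grouping variable
tukey$group

# Which comparisons are significant at the 5% level?
tukey$group[, "p adj"] < 0.05
```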
🏡 To conclude, we should also check the effect size for our one-way ANOVA.
Use the etaSquared function from the lsr R package to calculate the \(\eta^2\) effect size, and provide an interpretation of this effect size.
Hint: You can use the etaSquared function in a similar manner to how you used the summary function in 1.5.1.
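For a one-way ANOVA, \(\eta^2 = SS_{\text{between}} / SS_{\text{total}}\), the proportion of the total variation in the dependent variable that is explained by the grouping variable. The sketch below computes this by hand from the ANOVA table of a stand-in model on the built-in PlantGrowth data, so that the etaSquared output can be checked against it. The object names are illustrative.

```r
# Stand-in model on built-in data; in the lab, use your own ANOVA object
plant_anova <- aov(weight ~ group, data = PlantGrowth)
anova_table <- summary(plant_anova)[[1]]

# Between-groups sum of squares (row 1) over total sum of squares (all rows)
ss_between <- anova_table[["Sum Sq"]][1]
ss_total   <- sum(anova_table[["Sum Sq"]])
eta_sq     <- ss_between / ss_total
eta_sq

# The same quantity via the lsr package (column eta.sq), if it is installed
if (requireNamespace("lsr", quietly = TRUE)) {
  print(lsr::etaSquared(plant_anova))
}
```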
💻 The gapminder_2002.csv file also includes the variable GDP per capita, which records the 2002 Gross Domestic Product (GDP) per capita for the different countries.
If you have time, repeat Question 1, but this time use the following structure:
Dependent variable: GDP per capita
Independent variable: Continent (Africa, Americas, Asia, Europe, Oceania)
These notes have been prepared by Rupert Kuveke. The copyright for the material in these notes resides with the author named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.