Topic 7: One-way ANOVA


These are the solutions for Computer Lab 8.


1 One-way ANOVA

1.1

No solutions required

1.2

1.3

We observe that the average life expectancy appears to be different across the continents. The data appear to be similarly spread out for the continents Africa and Asia_Oceania, whereas the data have a narrower spread for the Americas and Europe. This is supported by an assessment of the standard deviation values.

The mean life expectancy for people in Africa is much lower than for people in other continents.

The sample size for each continent is different, being 52, 25, 35, and 30 for Africa, the Americas, Asia_Oceania, and Europe respectively.

1.4

Appropriate notation is as follows:

  • Let \(\mu_1\) denote the population average life expectancy at birth of people born in Africa.
  • Let \(\mu_2\) denote the population average life expectancy at birth of people born in the Americas.
  • Let \(\mu_3\) denote the population average life expectancy at birth of people born in Asia_Oceania.
  • Let \(\mu_4\) denote the population average life expectancy at birth of people born in Europe.

Using this notation, we can define: \[H_0: \mu_1 = \mu_2 = \mu_3 = \mu_4\] versus \[H_1: \text{not all of the } \mu_i \text{ are equal, for } i = 1, \ldots, 4.\]

1.5

Compare your results with the following R output:

##              Df Sum Sq Mean Sq F value Pr(>F)    
## continent     3  13321    4440   77.17 <2e-16 ***
## Residuals   138   7941      58                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
  • We note here that \(d_1 = 3\) (the number of groups, i.e. continents, minus 1) and \(d_2 = 138\) (the number of observations, 142, minus the number of groups, 4).

  • The \(p\)-value is reported as \(< 2 \times 10^{-16}\), i.e. effectively 0, which is much less than \(0.05\), so we reject \(H_0\).

  • The test statistic is \(F=77.17\).

1.6

To summarise, we can write:

There was a significant difference in the average life expectancy at birth (in years) between people born on different continents \(\left[F(3, 138) = 77.17, p < 0.001\right]\).

1.7

Since the \(p\)-value is close to 0, and much smaller than \(0.05\), we reject the null hypothesis that the variances are equal across the continents, so we cannot assume equal variances. This is not a surprising result, given our box plot observations earlier.

1.8

The histogram of residuals shows that the residuals appear to be at least approximately normally distributed. The Normal Q-Q plot shows some deviation from the qqline for low theoretical quantile values, but this is not extreme enough to cause major concern.

1.9

Since the \(p\)-value \(= 0.005\), which is less than \(0.05\), the Shapiro-Wilk test rejects the null hypothesis that the residuals are normally distributed, despite our visual inspection suggesting otherwise.

Let’s consider why we have obtained this unexpected result.

With a large sample (here \(n = 142\)), the Shapiro-Wilk test has high power, so it can return a small \(p\)-value for minor departures from normality, even when the data are still approximately normal. Given the symmetry of the residuals and the large sample size (so that, by the Central Limit Theorem, the group means are approximately normally distributed), we do not have any concerns about the normality assumption here, despite the result of the Shapiro-Wilk test.
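The effect of sample size on the Shapiro-Wilk test can be illustrated with a small simulation. This is only a sketch on simulated data, not the lab data: it assumes samples drawn from a \(t\) distribution with 8 degrees of freedom (symmetric and close to, but not exactly, normal), and the sample sizes, number of repetitions and seed are arbitrary choices. The helper `rejection_rate` is a hypothetical name introduced here.

```r
set.seed(1)  # arbitrary seed, used only for reproducibility

# Proportion of simulated samples of size n for which shapiro.test()
# rejects normality at the 5% level.
# Assumption: data come from a t distribution with `df` degrees of freedom.
rejection_rate <- function(n, reps = 500, df = 8) {
  p_values <- replicate(reps, shapiro.test(rt(n, df = df))$p.value)
  mean(p_values < 0.05)
}

rate_small <- rejection_rate(n = 20)     # small samples
rate_large <- rejection_rate(n = 1000)   # large samples

print(c(n_20 = rate_small, n_1000 = rate_large))
```

The same mild departure from normality should be flagged far more often in the large samples, which is why the test result needs to be read alongside the histogram and Q-Q plot.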

1.10

Compare your results with the following R output:

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = lifeExp ~ continent, data = gapminder_2002)
## 
## $continent
##                            diff       lwr       upr     p adj
## Americas-Africa       19.096809 14.295731 23.897888 0.0000000
## Asia_Oceania-Africa   16.508998 12.195902 20.822093 0.0000000
## Europe-Africa         23.375369 18.852544 27.898194 0.0000000
## Asia_Oceania-Americas -2.587811 -7.753601  2.577978 0.5627338
## Europe-Americas        4.278560 -1.063587  9.620707 0.1638481
## Europe-Asia_Oceania    6.866371  1.958116 11.774627 0.0021688

1.11

We note that the Asia_Oceania-Americas comparison is not statistically significant, since we have \(p = 0.563 > 0.05\) (from the R output, we can also see that the confidence interval includes 0). That is, we have no evidence of a difference in the average life expectancy of people born in these two continents.

All other comparisons except Europe-Americas (\(p = 0.164\)) show a statistically significant difference in average life expectancy between the two continents under consideration. For example, we note that:

  • People born in the Americas have an average life expectancy which is 19.097 years greater than the average life expectancy of people born in Africa. We have \(p < 0.001\) (and hence \(p < 0.05\)), and the \(95\%\) confidence interval for this difference is roughly \((14.30, 23.90)\). This confidence interval does not contain 0.

  • People born in Asia_Oceania have an average life expectancy which is 16.509 years greater than the average life expectancy of people born in Africa. We have \(p < 0.001\) (and hence \(p < 0.05\)), and the \(95\%\) confidence interval for this difference is roughly \((12.20, 20.82)\). This confidence interval does not contain 0.
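The link between the adjusted \(p\)-values and the confidence intervals can be checked directly from the Tukey output. The sketch below copies the printed values into a data frame by hand; the data frame and column names are illustrative. Adjusted \(p\)-values printed as 0.0000000 are entered as 0, which is an approximation of a very small value.

```r
# Values copied from the Tukey output in 1.10
tukey <- data.frame(
  comparison = c("Americas-Africa", "Asia_Oceania-Africa", "Europe-Africa",
                 "Asia_Oceania-Americas", "Europe-Americas",
                 "Europe-Asia_Oceania"),
  lwr   = c(14.295731, 12.195902, 18.852544, -7.753601, -1.063587, 1.958116),
  upr   = c(23.897888, 20.822093, 27.898194, 2.577978, 9.620707, 11.774627),
  p_adj = c(0, 0, 0, 0.5627338, 0.1638481, 0.0021688)
)

# A 95% interval excludes 0 when both limits have the same sign
tukey$ci_excludes_zero <- tukey$lwr > 0 | tukey$upr < 0

# Significance at the 5% family-wise level
tukey$significant <- tukey$p_adj < 0.05

print(tukey[, c("comparison", "ci_excludes_zero", "significant")])
```

The two logical columns agree for every row: the four significant comparisons are exactly those whose confidence interval excludes 0.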

N.B. These data are from 2002, so the differences may no longer be as large as suggested here.

1.12

We obtain an \(\eta^2\) value of roughly \(0.627\), which is considered large. This makes sense given our previous results: a large proportion (roughly 63%) of the variation in life expectancy can be attributed to continent, i.e. the continent in which a person is born is strongly associated with their life expectancy.
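This value can be reproduced from the ANOVA table in 1.5, assuming \(\eta^2\) is defined in the usual way for a one-way ANOVA, as the between-groups sum of squares divided by the total sum of squares. The lab itself may have used a package function to obtain it; the sketch below is just a hand calculation with illustrative variable names.

```r
# Rounded sums of squares, copied from the ANOVA table in 1.5
ss_between <- 13321   # continent
ss_within  <- 7941    # residuals

# Assumed definition: eta squared = SS_between / SS_total
ss_total <- ss_between + ss_within
eta_sq   <- ss_between / ss_total

print(round(eta_sq, 3))   # 0.627
```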

2 Practice

Follow the steps outlined above, with lifeExp replaced by gdpPercap as the dependent variable.


That’s everything! If there were any parts you were unsure about, take a look back over the relevant sections of the Topic 7 material.


These notes have been prepared by Amanda Shaker and Rupert Kuveke. The copyright for the material in these notes resides with the authors named above, with the Department of Mathematical and Physical Sciences and with La Trobe University. Copyright in this work is vested in La Trobe University including all La Trobe University branding and naming. Unless otherwise stated, material within this work is licensed under a Creative Commons Attribution-Non Commercial-Non Derivatives License BY-NC-ND.