Michaela Patton | s3872421
Last updated: 17 October, 2020
The modern Summer Olympic Games have been held every four years since 1896, except in times of war and pandemic.
Nations bid to host the Olympics. Hosting requires significant investment and planning, but the rewards include retained infrastructure, an opportunity to showcase national pride on a global stage, and a potential increase in tourism. It has been said that host nations tend to do well at their home Games.5
1,2,3,4
This investigation aims to determine whether a country’s medal tally at its host Games is typical of its performance in the non-hosting years before it.
Using data from the Summer Olympics from 1896 to 2012, this analysis will focus on the seven nations that hosted the Games from 1984 to 2012: Great Britain, China, Greece, Australia, USA, Spain and South Korea.
The sample of these countries’ medal tallies from years prior to hosting will be compared to the medal tally of their host Games.
After visual analysis, each country’s pre-hosting sample will be compared with its host-year tally using a one-sample t-test and a critical value approach, to determine whether the pre-hosting mean differs significantly from the host-year medal tally.
The summer.csv open source data was collected from Kaggle, and details the following variables:
The data was provided by the IOC Research and Reference Service and published by The Guardian’s Datablog. It can be found at: https://www.kaggle.com/the-guardian/olympic-games?select=summer.csv 7
Medal and Gender were converted to factors. The following required transformation:
When a duo, trio, team, etc wins a medal, the source dataset lists each athlete as a medalist. In Olympic medal tallies, countries are only awarded one medal per event, not per person. Therefore, if the data were grouped into medals per country, it would count multiple medals for team events.
This was addressed by grouping the number of medals (“n”) per Year, Event and Country:
If n=1, that value remained a 1;
If n>2, n was imputed with a 1, with the assumption that it was unlikely that there was a three-way tie between compatriots in an individual event;
If n=2, events were classified as either an individual event (i.e. a tie between two compatriots) or a team event. If it was a team event, n was imputed with a 1. If it was a tie in an individual event, both medals were awarded and it remained n=2;
Only one three-way tie between compatriots occurred in an individual event: the men’s Pommel Horse in the 1948 gymnastics, when three Finnish men tied for Gold.8 As the 3 medals had been imputed with a 1, 2 medals were manually added back.
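The counting rules above can be sketched in Python as follows. This is a minimal illustration, not the code used in the analysis: the function names, the generic event labels and the athlete rows are all hypothetical, and the set of hand-checked individual ties stands in for the manual filtering step described above.

```python
from collections import Counter, defaultdict


def medals_per_event(rows, individual_ties=frozenset(), manual_additions=None):
    """Collapse athlete-level rows to one medal count per (year, event, country).

    rows: iterable of (year, event, country, athlete) tuples.
    individual_ties: keys checked by hand and known to be two compatriots
        sharing an individual event (assumed to be supplied by the analyst).
    manual_additions: optional {key: extra_medals} for one-off corrections.
    """
    counts = Counter((year, event, country) for year, event, country, _ in rows)
    tally = {}
    for key, n in counts.items():
        if n == 2 and key in individual_ties:
            tally[key] = 2  # genuine tie between two compatriots
        else:
            tally[key] = 1  # n = 1, a team event, or n > 2
    for key, extra in (manual_additions or {}).items():
        tally[key] = tally.get(key, 0) + extra
    return tally


def medals_per_country_year(event_tally):
    """Sum event-level counts into one tally per (year, country)."""
    totals = defaultdict(int)
    for (year, _event, country), n in event_tally.items():
        totals[(year, country)] += n
    return dict(totals)


# Hypothetical rows for a made-up country "AAA"; not taken from summer.csv.
rows = [
    (2000, "team event A", "AAA", "athlete 1"),
    (2000, "team event A", "AAA", "athlete 2"),
    (2000, "team event A", "AAA", "athlete 3"),
    (2000, "pairs event B", "AAA", "athlete 4"),
    (2000, "pairs event B", "AAA", "athlete 5"),
    (2000, "individual event C", "AAA", "athlete 6"),
    (2000, "individual event C", "AAA", "athlete 7"),
    (2000, "individual event D", "AAA", "athlete 8"),
]
ties = {(2000, "individual event C", "AAA")}

event_tally = medals_per_event(rows, individual_ties=ties)
totals = medals_per_country_year(event_tally)
print(totals)
```

A case such as the 1948 Pommel Horse tie would be handled through `manual_additions`, by adding 2 to the key for that year, event and country.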
The dataset was then subset to four variables: Year, City, Country, and the new variable:
As the data were grouped from the original source listing each medalist, countries that won no medals in an Olympiad had no observation for that year:
Each host country’s “Year” variable was checked against official records to determine whether they attended those Olympics and earned no medals, or if they did not attend.
Greece, Australia, Spain and South Korea all had years where they earned no medals,9 and these were manually added into the data.
Two additional years for Australia were also imputed under country code “ANZ” (Australasia), under which they competed in 1908 and 1912.9
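The zero-medal step can be sketched as below. This is an illustrative outline only: the function name and the example values are hypothetical, and the attendance sets are assumed to have been compiled by hand from official records, as described above.

```python
def fill_zero_years(tally, attended):
    """Add an explicit 0 for every Games a country attended without a medal.

    tally: {(country, year): medals} built from the medalist data.
    attended: {country: set of years attended}, compiled from official records.
    Years a country did not attend are left out rather than set to 0.
    """
    filled = dict(tally)
    for country, years in attended.items():
        for year in years:
            filled.setdefault((country, year), 0)
    return filled


# Hypothetical example for a made-up country "AAA".
tally = {("AAA", 1960): 3, ("AAA", 1968): 1}
attended = {"AAA": {1960, 1964, 1968}}  # competed in 1964 but won nothing

filled = fill_zero_years(tally, attended)
print(sorted(filled.items()))
```

Keeping non-attendance as a missing observation, rather than a zero, avoids pulling a country's pre-hosting mean down for Games it never entered.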
After preprocessing, the data were filtered to include only the seven host countries from 1984 to 2012:
| Country | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| AUS | 0 | 4.50 | 13.0 | 26.00 | 58 | 17.48 | 16.86 | 27 | 0 |
| CHN | 28 | 45.50 | 55.5 | 70.00 | 100 | 59.38 | 25.43 | 8 | 0 |
| ESP | 0 | 1.00 | 1.5 | 9.75 | 22 | 5.95 | 7.61 | 22 | 0 |
| GBR | 2 | 15.50 | 21.0 | 33.00 | 143 | 29.41 | 26.83 | 27 | 0 |
| GRE | 0 | 0.00 | 1.0 | 2.50 | 45 | 4.04 | 9.07 | 27 | 0 |
| KOR | 0 | 2.00 | 12.5 | 28.50 | 33 | 15.38 | 13.92 | 16 | 0 |
| USA | 19 | 71.75 | 95.0 | 104.00 | 233 | 92.62 | 41.16 | 26 | 0 |
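Summary statistics of this kind can be reproduced with Python's standard library, as sketched below. The function name and input values are hypothetical, and the quartile convention is an assumption: the `inclusive` method interpolates linearly between order statistics, which may or may not match the convention used for the table above.

```python
import statistics


def summarise(tallies):
    """Return the summary statistics shown per country in the table."""
    q1, median, q3 = statistics.quantiles(tallies, n=4, method="inclusive")
    return {
        "Min": min(tallies),
        "Q1": q1,
        "Median": median,
        "Q3": q3,
        "Max": max(tallies),
        "Mean": statistics.mean(tallies),
        "SD": statistics.stdev(tallies),  # sample standard deviation (n - 1)
        "n": len(tallies),
    }


# Hypothetical medal tallies, for illustration only.
example = summarise([1, 2, 3, 4, 5])
print(example)
```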
A coloured line graph was used to visualise the medal tallies of seven recent host nations:
\[H_0: \mu_1 = \mu_2\]
\[H_A: \mu_1 < \mu_2\]
The same hypotheses will be used for the seven host countries from 1984 to 2012.
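For reference, the standard one-sample form of this test is sketched here. It assumes that \(\mu_1\) is the mean of the pre-hosting medal tallies and that \(\mu_2\) is the fixed host-year tally, written \(\mu_0\) below; the symbols \(\mu_0\), \(s\), \(n\) and \(\alpha\) are introduced for illustration only. With sample mean \(\bar{x}\), sample standard deviation \(s\) and \(n\) pre-hosting Games, the test statistic is

\[t = \frac{\bar{x} - \mu_0}{s / \sqrt{n}}, \qquad df = n - 1\]

Under the critical value approach for a lower-tailed test at significance level \(\alpha\), \(H_0\) is rejected when

\[t < -t_{\alpha,\, n-1}\]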
Great Britain hosted the Olympic Games in London in 1908, 1948 and 2012.
A lower-tailed, one-sample t-test was used to test Great Britain’s 1896-2008 medal tallies against their 68 medals earned in 2012. A critical value approach was also applied, with 24 degrees of freedom.
Decision: Reject \(H_0\): \(\mu\) = 68 as \(p\)-value < .001 and 95% CI [0, 27.387] did not capture \(H_0\).
Conclusion: The estimated mean medal tally, based on the pre-host year sample, was \(\bar{x}\) = 23.32, 95% CI [0, 27.387]. The one-sample \(t\)-test found that the medal tallies of previous years were significantly lower than the 2012 medal tally of 68, \(t\)(\(df\)=24) = -18.793, \(p\) < .001.
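As a rough consistency check, the reported figures can be tied together in a few lines of Python. This sketch uses only the numbers reported above plus one outside value: the critical value 1.7109 is taken from a standard t table (one-tailed, 5% level, 24 degrees of freedom) and is an assumption, not a figure from the analysis. The function name is hypothetical. The standard error is recovered from the upper confidence bound, and the t statistic is then recomputed from it.

```python
X_BAR = 23.32      # reported pre-hosting sample mean
MU_0 = 68          # 2012 host-year tally (hypothesised value)
CI_UPPER = 27.387  # reported upper bound of the one-sided 95% CI
T_CRIT = 1.7109    # assumed: t table value, alpha = 0.05, df = 24


def t_from_upper_bound(x_bar, mu_0, ci_upper, t_crit):
    """Recover the standard error from the CI bound, then compute t."""
    standard_error = (ci_upper - x_bar) / t_crit
    t_stat = (x_bar - mu_0) / standard_error
    return t_stat, standard_error


t_stat, standard_error = t_from_upper_bound(X_BAR, MU_0, CI_UPPER, T_CRIT)

# Critical value approach for a lower-tailed test: reject H0 if t < -t_crit.
reject_h0 = t_stat < -T_CRIT

print(f"standard error = {standard_error:.3f}")
print(f"t = {t_stat:.3f}, reject H0: {reject_h0}")
```

The recomputed statistic agrees with the reported value of -18.793 to within the rounding of the inputs, and it falls far below the negative critical value, which matches the decision to reject \(H_0\).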
Great Britain also hosted the Games in 1908 and 1948. The samples prior to these years were tested for normality, but they departed too far from a normal distribution to be analysed with a t-test.
Australia hosted the Olympic Games in Melbourne in 1956 and in Sydney in 2000.
A lower-tailed, one-sample \(t\)-test was used to test Australia’s sample from 1948-1996 against their host medal tally of 58 in 2000. A critical value approach was also applied, with 11 degrees of freedom.
Decision: Reject \(H_0\): \(\mu\) = 58 as \(p\)-value < .001 and 95% CI [0, 22.121] did not capture \(H_0\).
Conclusion: The estimated mean medal tally, based on the pre-host year sample, was \(\bar{x}\) = 17.75, 95% CI [0, 22.121]. The one-sample \(t\)-test found that the medal tallies of previous years were significantly lower than the 2000 medal tally of 58, \(t\)(\(df\)=11) = -16.444, \(p\) < .001.
Australia also hosted the Games in 1956. The sample prior to this year was tested for normality, but the data were found not to be normally distributed and were not suitable for analysis.
China has only competed at the Olympics since 1984, so the sample was too small to assess normality with any confidence.
After inspecting a boxplot, Q-Q plot and histogram density curve with normal overlay, China’s sample could not be classified as normal, and therefore could not be tested using a t-test or critical value approach.
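The coordinates behind a normal Q-Q plot can be computed with the standard library, as sketched below. The sample values are hypothetical rather than China's actual tallies, the function name is illustrative, and the plotting positions \((i - 0.5)/n\) are one common convention that may differ from the one used to draw the plots in this analysis.

```python
from statistics import NormalDist


def qq_points(sample):
    """Pair each ordered observation with its standard normal quantile."""
    n = len(sample)
    ordered = sorted(sample)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


# Hypothetical sample of eight medal tallies, for illustration only.
sample = [30, 45, 50, 55, 60, 65, 90, 100]
points = qq_points(sample)
for theoretical_q, observed in points:
    print(f"{theoretical_q:6.3f}  {observed}")
```

If the sample were close to normal, these points would fall near a straight line. With only eight observations, departures are hard to distinguish from noise, which is consistent with the difficulty described above.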
Visually, the plot of China’s medal tallies since 1984 shows an upward trend in the lead-up to their peak medal tally at the Beijing games in 2008.
More data are needed before a t-test can be justified, and therefore before it can be determined whether the pre-hosting mean differs significantly from China’s host-year tally in 2008.
Spain’s sample (1900-1988) and South Korea’s sample (1948-1984) were checked for normality using a boxplot, Q-Q plot and histogram density curve with normal overlay. Neither sample could be classified as normal, so neither could be tested using a t-test or critical value approach.
Graphically, both countries show an upward trend of medal hauls towards their peak medal tally in their host years, 1992 and 1988 respectively, after which both countries’ medal tallies declined.
A larger sample is needed before the normality assumption can be met and the pre-hosting tallies can be tested for a statistically significant difference from the host-year tallies.
Greece and the USA have hosted the Summer Games multiple times: Greece in 1896 and 2004, and the USA in 1904, 1932, 1984 and 1996.
Pre-hosting samples were tested for normality via a boxplot, Q-Q plot and normal curve overlay. A Box-Cox transformation of the USA’s sample was also attempted, but neither country’s samples were normally distributed, so they could not be tested using a t-test or critical value approach.
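For context, the Box-Cox transformation itself is a simple power transform, sketched below for a given \(\lambda\). The function name and the example values are hypothetical. The value of \(\lambda\) used in the analysis is not stated, and statistical software normally estimates it from the data, so the value here is illustrative only.

```python
import math


def box_cox(values, lam):
    """Apply the Box-Cox transform for a fixed lambda.

    Uses (x**lam - 1) / lam, or log(x) when lam is 0.
    All values must be strictly positive.
    """
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1) / lam for v in values]


# Hypothetical tallies, for illustration only.
transformed = box_cox([1.0, 4.0, 9.0], 0.5)
print(transformed)
```

The positivity requirement holds for the USA, whose minimum tally in the summary table is 19. It would not hold for a sample containing zero-medal years unless the values were first shifted.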
Visually, Greece achieved their highest medal tallies in their host years, 1896 and 2004.
The USA achieved their highest medal tallies in host years 1904 and 1984, with a local maximum in 1932. Their medal tally did not peak when hosting in Atlanta in 1996, so that Games does not support the research hypothesis.
As the samples increase in size, the statistical significance of these host Olympics can be tested.
1 Encyclopaedia Britannica 2020, Athens 1896 Olympic Games, viewed 13 October 2020, https://www.britannica.com/event/Athens-1896-Olympic-Games.
2 Australian Olympic Committee 2020, Los Angeles 1984, viewed 13 October 2020, https://www.olympics.com.au/games/los-angeles-1984/.
3 Multiplex 2020, Stadium Australia - Sydney, viewed 13 October 2020, https://www.multiplex.global/projects/stadium-australia-sydney-australia/.
4 Tokyo 2020, Venues: Olympic Stadium, viewed 13 October 2020, https://tokyo2020.org/en/venues/olympic-stadium.
5 EuroNews 2016, Hosts with the most: why home advantage brings more Olympic medals, viewed 29 September 2020, https://www.britannica.com/event/Athens-1896-Olympic-Games.
6 International Olympic Committee 2020, The Olympic Programme comprises sports, disciplines and events, viewed 17 October 2020, https://www.olympic.org/faq/sports-programme-and-results/the-olympic-programme-comprises-sports-disciplines-and-events-what-is-the-difference-between-the-three.
7 The Guardian 2016, Olympic Sports and Medals, 1896-2014, viewed 23 September 2020, https://www.kaggle.com/the-guardian/olympic-games.
8 Wikimedia Foundation 2020, List of ties for medals at the Olympics, viewed 1 October 2020, https://en.wikipedia.org/wiki/List_of_ties_for_medals_at_the_Olympics.
9 Wikimedia Foundation 2020, List of participating nations at the Summer Olympic Games, viewed 1 October 2020, https://en.wikipedia.org/wiki/List_of_participating_nations_at_the_Summer_Olympic_Games.
10 Council on Foreign Relations 2018, The economics of hosting the Olympic Games, viewed 11 October 2020, https://www.cfr.org/backgrounder/economics-hosting-olympic-games.