Michaela Patton | s3872421
Last updated: 23 October, 2020
The modern Summer Olympic Games have been held every four years since 1896, except in times of war and pandemic.
Nations bid to host the Olympics, and while it requires significant investment and planning, rewards include retained infrastructure, an opportunity to invoke national pride on a global stage, and a potential increase in tourism. It has been said that host nations tend to do well at their home Games.5
1,2,3,4
This investigation aims to determine if the medal tally of a country’s host Games is typical of their performance in non-hosting years prior.
Using data from the Summer Olympics from 1896 to 2012, this analysis will focus on the seven nations that hosted the Games from 1984 to 2012: Great Britain, China, Greece, Australia, USA, Spain and South Korea.
The sample of these countries’ medal tallies from years prior to hosting will be compared to the medal tally of their host Games.
After visual analysis, each country’s pre-host sample will be compared with its host-year medal tally using a one-sample t-test and a critical value approach, to determine whether the sample mean differs significantly from the host-year tally.
The summer.csv open source data was collected from Kaggle, and details the following variables:
The data was provided by the IOC Research and Reference Service and published by The Guardian’s Datablog. It can be found at: https://www.kaggle.com/the-guardian/olympic-games?select=summer.csv 7
Medal and Gender were factored into their levels. The following required transformation:
When a duo, trio, team, etc wins a medal, the source dataset lists each athlete as a medalist. In Olympic medal tallies, countries are only awarded one medal per event, not per person. Therefore, if the data were grouped into medals per country, it would count multiple medals for team events.
This was addressed by grouping the number of medals (“n”) per Year, Event and Country:
If n=1, that value remained a 1;
If n>2, n was imputed with a 1, with the assumption that it was unlikely that there was a three-way tie between compatriots in an individual event;
If n=2, events were classified as either an individual event (i.e. there was a tie between two compatriots) or a team event. If it was a team event, n was imputed with a 1. If it was a tie in an individual event, both medals were awarded and it remained n=2;
There was only one case of a three-way tie between three compatriots in an individual event, which was the men’s Pommel Horse in the 1948 gymnastics, when three Finnish men tied for Gold.8 As the 3 medals had been imputed with a 1, 2 medals were manually added.
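The rules above can be sketched in base R as follows. This is an illustration only, not the code used for the report: the toy `medalists` data, the country code `XYZ` and the `Team_event` flag are all hypothetical, and `Medal` is added to the grouping as an assumption so that a gold and a silver won by one country in the same event are not mistaken for a tie.

```r
# Toy data: one row per medalist (hypothetical values, not taken from summer.csv).
# `Team_event` is an assumed flag; the report does not say how team events were identified.
medalists <- data.frame(
  Year       = 2000,
  Country    = "XYZ",
  Event      = c("100m", rep("4x100m relay", 4), rep("Doubles", 2), rep("High jump", 2)),
  Medal      = c("Gold", rep("Silver", 4), rep("Bronze", 2), rep("Bronze", 2)),
  Team_event = c(FALSE, rep(TRUE, 4), rep(TRUE, 2), rep(FALSE, 2))
)

# Count medalists per Year, Country, Event and Medal
medalists$n <- 1
counts <- aggregate(n ~ Year + Country + Event + Medal + Team_event,
                    data = medalists, FUN = sum)

# n = 1 stays 1; n > 2 becomes 1; n = 2 becomes 1 for a team event
# and stays 2 for a tie between two compatriots in an individual event
counts$medals <- ifelse(counts$n == 2 & !counts$Team_event, 2, 1)

# Medal tally per country and year
tally <- aggregate(medals ~ Year + Country, data = counts, FUN = sum)
print(tally)
```

A genuine three-way tie in an individual event is collapsed to 1 by this rule, which is why the 1948 Pommel Horse result needed a manual correction.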
The dataset was subsetted to include four variables: Year, City, Country, and the new variable:
As the data were grouped from the original source listing each medalist, countries that won no medals in an Olympiad had no observation for that year:
Each host country’s “Year” variable was checked against official records to determine whether they attended those Olympics and earned no medals, or if they did not attend.
Greece, Australia, Spain and South Korea all had years where they earned no medals,9 and these were manually added into the data.
Two additional years for Australia were also imputed under country code “ANZ”, under which they competed in 1904 and 1908.9
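One way to add the zero-medal years is to join the tallies onto a table of the Games each country attended and fill the gaps with 0. The sketch below uses base R and entirely hypothetical values (the country code `XYZ`, the years and the counts are made up); in the report the attendance records came from official lists and the rows were added manually.

```r
# Medal tallies derived from the medalist data: years with no medals are absent
tally <- data.frame(Year = c(1988, 1996), Country = "XYZ", n = c(5, 3))

# Games the country attended (hypothetical attendance record)
attended <- data.frame(Year = c(1988, 1992, 1996), Country = "XYZ")

# Left join: keep every attended Games, then replace missing tallies with 0
filled <- merge(attended, tally, by = c("Year", "Country"), all.x = TRUE)
filled$n[is.na(filled$n)] <- 0
print(filled)
```

Games a country did not attend are left out of `attended`, so they stay absent rather than being recorded as 0.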
After preprocessing, the data were filtered to include only the seven host countries from 1984 to 2012:
Host_countries <- Olympics6 %>% filter(Country %in%
c("USA", "AUS", "CHN", "GRE", "KOR", "ESP", "GBR"))
Host_countries %>% group_by(Country) %>%
  summarise(Min = min(n, na.rm = TRUE),
            Q1 = quantile(n, probs = .25, na.rm = TRUE),
            Median = median(n, na.rm = TRUE),
            Q3 = quantile(n, probs = .75, na.rm = TRUE),
            Max = max(n, na.rm = TRUE),
            Mean = mean(n, na.rm = TRUE),
            SD = sd(n, na.rm = TRUE),
            # Count missing values before `n` is overwritten by the group size
            Missing = sum(is.na(n)),
            n = n()) %>%
  select(Country:SD, n, Missing) -> table1

| Country | Min | Q1 | Median | Q3 | Max | Mean | SD | n | Missing |
|---|---|---|---|---|---|---|---|---|---|
| AUS | 0 | 4.50 | 13.0 | 26.00 | 58 | 17.481482 | 16.862271 | 27 | 0 |
| CHN | 28 | 45.50 | 55.5 | 70.00 | 100 | 59.375000 | 25.433035 | 8 | 0 |
| ESP | 0 | 1.00 | 1.5 | 9.75 | 22 | 5.954546 | 7.606246 | 22 | 0 |
| GBR | 2 | 15.50 | 21.0 | 33.00 | 143 | 29.407407 | 26.826020 | 27 | 0 |
| GRE | 0 | 0.00 | 1.0 | 2.50 | 45 | 4.037037 | 9.065919 | 27 | 0 |
| KOR | 0 | 2.00 | 12.5 | 28.50 | 33 | 15.375000 | 13.918214 | 16 | 0 |
| USA | 19 | 71.75 | 95.0 | 104.00 | 233 | 92.615385 | 41.160250 | 26 | 0 |
A coloured line graph was used to visualise the medal tallies of seven recent host nations:
Host_countries_plot <- Host_countries %>%
  ggplot(mapping = aes(x = Year, y = n, color = Country)) +
  labs(y = "Total medals") +
  expand_limits(x = max(Host_countries$Year), y = 0:max(Host_countries$n)) +
  scale_x_continuous(breaks = seq(1896, 2012, by = 4)) +
  scale_y_continuous(breaks = seq(0, 230, by = 25)) +
  geom_line() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Medal tallies of 7 Olympic host nations, 1896-2012") +
  theme(plot.title = element_text(hjust = 0.5))
Host_countries_plot
The following code was used to plot each country’s sample, marking host years and the year with the most medals:
GBR_historic_plot <- GBR_historic %>%
  ggplot(mapping = aes(x = Year, y = n)) +
  labs(y = "Total medals") +
  # Parentheses make the y-axis span 0 to (max + 10); `0:max(n) + 10` would start at 10
  expand_limits(x = max(GBR_historic$Year) + 4, y = 0:(max(GBR_historic$n) + 10)) +
  scale_x_continuous(breaks = seq(1896, 2012, by = 4)) +
  scale_y_continuous(breaks = seq(0, 160, by = 10)) +
  geom_line() +
  geom_point(color = "black", size = 2) +
  geom_text(aes(label = n), size = 3.5, color = "blue", hjust = -0.2, vjust = 1) +
  # Highlight the year with the most medals (red) and the host years (green)
  geom_point(data = subset(GBR_historic, Most_medals == "Yes"), color = "red", size = 4) +
  geom_point(data = subset(GBR_historic, Host_city == "Yes"), color = "darkgreen", size = 2) +
  geom_text(data = subset(GBR_historic, Most_medals == "Yes"),
            aes(label = "Most medals"), hjust = 0.5, vjust = -2.5, color = "red") +
  geom_text(data = subset(GBR_historic, Host_city == "Yes"),
            aes(label = "Host city"), hjust = 0.5, vjust = -1, color = "darkgreen") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
  ggtitle("Medals won by Great Britain at the Summer Olympics, 1896-2012") +
  theme(plot.title = element_text(hjust = 0.5), legend.position = "none")
\[H_0: \mu_1 = \mu_2\]
\[H_A: \mu_1 < \mu_2\]
Here \(\mu_1\) is the mean medal tally of the years prior to hosting and \(\mu_2\) is the medal tally of the host Games. The same hypotheses will be used for the seven host countries from 1984 to 2012.
Great Britain’s samples were checked for normality using a boxplot, Q-Q Plot and histogram with overlay. After removing one extreme outlier (1908), the pre-2012 sample was near-normal. The pre-1908 and pre-1948 samples were non-normal and not analysed.
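A boxplot flags outliers with the 1.5 × IQR rule, which base R exposes through `boxplot.stats()`. The sketch below shows how such a point could be identified and removed before re-checking normality. The vector `x` is made up for illustration and is not Great Britain’s data, and the default `coef = 1.5` is an assumption about the rule applied.

```r
# Hypothetical medal tallies with one very large value
x <- c(20, 24, 18, 30, 22, 26, 28, 21, 25, 140)

# Values beyond 1.5 * IQR from the hinges, as drawn by boxplot()
out <- boxplot.stats(x, coef = 1.5)$out

# Drop the flagged values before re-checking normality
x_clean <- x[!x %in% out]

print(out)
print(x_clean)
```

The cleaned vector can then be passed to `boxplot()`, `qqnorm()` and `hist()` to repeat the visual checks.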
A lower, one-tailed, one-sample t-test was used to test Great Britain’s 1896-2008 medal tally against their 68 medals in 2012. A critical value approach was also applied for the data with 24 degrees of freedom.
Decision: Reject \(H_0\): \(\mu\) = 68 as \(p\)-value < .001 and 95% CI [0, 27.387] did not capture \(H_0\).
Conclusion: The estimated mean medal tally, based on the pre-host year sample, was \(\bar{x}\) = 23.32, 95% CI [0, 27.387]. The results of the one-sample t-test found that the medal tallies of previous years were significantly lower than the 2012 medal tally of 68, \(t\)(\(df\)=24) = -18.793, \(p\) < .001.
##
## One Sample t-test
##
## data: GBR_clean$n
## t = -18.793, df = 24, p-value = 3.659e-16
## alternative hypothesis: true mean is less than 68
## 95 percent confidence interval:
## -Inf 27.38758
## sample estimates:
## mean of x
## 23.32
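The critical value approach reaches the same decision as the \(p\)-value: \(H_0\) is rejected when the test statistic falls below the lower-tail critical value. The sketch below computes both by hand and compares them with `t.test()`. The helper `lower_tail_t()` and the vector `x` are hypothetical, written only for illustration, and are not the report’s code or data.

```r
# Lower-tailed one-sample t-test computed by hand (illustrative helper)
lower_tail_t <- function(x, mu0, alpha = 0.05) {
  n      <- length(x)
  se     <- sd(x) / sqrt(n)              # standard error of the mean
  t_stat <- (mean(x) - mu0) / se         # test statistic
  t_crit <- qt(alpha, df = n - 1)        # lower-tail critical value
  list(t      = t_stat,
       df     = n - 1,
       t_crit = t_crit,
       p      = pt(t_stat, df = n - 1),  # lower-tail p-value
       upper  = mean(x) + qt(1 - alpha, df = n - 1) * se,  # one-sided CI bound
       reject = t_stat < t_crit)
}

x   <- c(15, 22, 30, 18, 25, 27, 20, 24)  # made-up pre-host tallies
res <- lower_tail_t(x, mu0 = 68)
ref <- t.test(x, mu = 68, alternative = "less")

print(res$t_crit)
print(c(by_hand = res$t, t.test = unname(ref$statistic)))
```

For 24 degrees of freedom, `qt(0.05, 24)` gives a critical value of about -1.711, so the reported statistic of -18.793 lies well inside the rejection region.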
Samples were checked for normality using a boxplot, Q-Q Plot and histogram with overlay. While pre-2000 and pre-1956 samples were non-normal, a sample of 1948-1996 was normal after the two Q-Q Plot outliers were removed.
Aus_mean_clean_1948_2000 <- mean(Aus_clean_1948_2000$n)
Aus_sd_clean_1948_2000 <- sd(Aus_clean_1948_2000$n)
# Histogram on the density scale, with the sample density and a fitted normal curve overlaid
hist(Aus_clean_1948_2000$n, density = 80, breaks = 20, prob = TRUE)
lines(density(Aus_clean_1948_2000$n), lwd = 2, col = "orange")
curve(dnorm(x, mean = Aus_mean_clean_1948_2000, sd = Aus_sd_clean_1948_2000),
      col = "darkblue", lwd = 2, add = TRUE, yaxt = "n")
A lower, one-tailed, one-sample \(t\)-test was used to test Australia’s 1948-1996 sample against their 58 medals in 2000. A critical value approach was also applied for the sample with 11 degrees of freedom.
Decision: Reject \(H_0\): \(\mu\) = 58 as \(p\)-value < .001 and 95% CI [0, 22.121] did not capture \(H_0\).
Conclusion: The estimated mean medal tally, based on the pre-host year sample, was \(\bar{x}\) = 17.75, 95% CI [0, 22.121]. The results of the one-sample \(t\)-test found that the medal tallies of previous years were significantly lower than the 2000 medal tally of 58, \(t\)(\(df\)=11) = -16.534, \(p\) < .001.
##
## One Sample t-test
##
## data: Aus_clean_1948_2000$n
## t = -16.534, df = 11, p-value = 2.036e-09
## alternative hypothesis: true mean is less than 58
## 95 percent confidence interval:
## -Inf 22.12184
## sample estimates:
## mean of x
## 17.75
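As a check on the critical value approach, the figures in the output above can be reproduced by hand. This is a sketch under stated assumptions: the standard error is not reported in the output, so it is back-calculated here from the reported mean and \(t\) statistic, and the critical value \(t_{0.05,\,11} \approx -1.796\) is taken from standard \(t\) tables.

\[SE = \frac{\bar{x} - \mu_0}{t} = \frac{17.75 - 58}{-16.534} \approx 2.434\]
\[t = -16.534 < t_{0.05,\,11} \approx -1.796 \quad \Rightarrow \quad \text{reject } H_0\]
\[\text{upper 95\% bound} = \bar{x} + 1.796 \times SE \approx 17.75 + 4.37 = 22.12\]

The upper bound agrees with the 22.12184 reported by the test.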
China: Small sample, non-normal
Greece: Larger sample, non-normal
Both sample plots show an upward trend in the lead-up to their peak medal tally at their host games.
Spain: Density plot vs normal curve:
South Korea: Boxplot & Q-Q Plot
## [1] 9 4
USA samples from before 1996, 1984 and 1932 were tested for normality via boxplot, Q-Q Plot and normal curve overlay, and all were non-normal. A Box-Cox transformation was also attempted, but the transformed data remained non-normal.
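For reference, a Box-Cox transformation maps positive data \(x\) to \((x^\lambda - 1)/\lambda\), or to \(\log x\) when \(\lambda = 0\), with \(\lambda\) usually chosen to maximise the profile log-likelihood. The sketch below implements this directly in base R. The function names and the vector `x` are hypothetical, and this is not necessarily the routine used for the report.

```r
# Box-Cox transform for strictly positive data
box_cox <- function(x, lambda) {
  stopifnot(all(x > 0))
  if (abs(lambda) < 1e-8) log(x) else (x^lambda - 1) / lambda
}

# Profile log-likelihood of lambda (up to an additive constant)
box_cox_loglik <- function(lambda, x) {
  y      <- box_cox(x, lambda)
  n      <- length(x)
  sigma2 <- sum((y - mean(y))^2) / n   # maximum-likelihood variance estimate
  -n / 2 * log(sigma2) + (lambda - 1) * sum(log(x))
}

x <- c(19, 47, 63, 71, 74, 95, 99, 104, 110, 174)  # made-up medal tallies

# Search for the lambda that maximises the log-likelihood on [-2, 2]
best <- optimize(box_cox_loglik, interval = c(-2, 2), x = x, maximum = TRUE)
x_bc <- box_cox(x, best$maximum)

print(best$maximum)
```

The transformed values in `x_bc` would then go through the same boxplot, Q-Q Plot and histogram checks as the raw sample. The transformation is only defined for positive values, so any sample containing a zero-medal year would first need a shift.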
The USA achieved their highest medal tallies in 1904 and 1984, with a local maximum in 1932. However, it did not achieve a maximum in Atlanta 1996, suggesting this sample does not support the research hypothesis. A larger or normally distributed sample is required to test for statistical significance.
1 Encyclopaedia Britannica 2020, Athens 1896, viewed 13 October 2020, https://www.britannica.com/event/Athens-1896-Olympic-Games.
2 Australian Olympic Committee 2020, Los Angeles 1984, viewed 13 October 2020, https://www.olympics.com.au/games/los-angeles-1984/.
3 Multiplex 2020, Stadium Australia - Sydney, viewed 13 October 2020, https://www.multiplex.global/projects/stadium-australia-sydney-australia/.
4 Tokyo 2020, Venues: Olympic Stadium, viewed 13 October 2020, https://tokyo2020.org/en/venues/olympic-stadium.
5 EuroNews 2016, Hosts with the most: why home advantage brings more Olympic medals, viewed 29 September 2020, https://www.britannica.com/event/Athens-1896-Olympic-Games.
6 IOC 2020, The Olympic Programme comprises sports, disciplines and events, viewed 17 October 2020, https://www.olympic.org/faq/sports-programme-and-results/the-olympic-programme-comprises-sports-disciplines-and-events-what-is-the-difference-between-the-three.
7 The Guardian 2016, Olympic Sports and Medals, 1896-2014, viewed 23 September 2020, https://www.kaggle.com/the-guardian/olympic-games.
8 Wikimedia Foundation 2020, List of ties for medals at the Olympics, viewed 1 October 2020, https://en.wikipedia.org/wiki/List_of_ties_for_medals_at_the_Olympics.
9 Wikimedia Foundation 2020, List of participating nations at the Summer Olympic Games, viewed 1 October 2020, https://en.wikipedia.org/wiki/List_of_participating_nations_at_the_Summer_Olympic_Games.
10 Council on Foreign Relations 2018, The economics of hosting the Olympic Games, viewed 11 October 2020, https://www.cfr.org/backgrounder/economics-hosting-olympic-games.