Home ground advantage at the Olympic Games

Are the medal tallies of nations typically higher when they host the Olympics?

Michaela Patton | s3872421

Last updated: 23 October, 2020

Problem Statement

Are medal tallies of nations typically higher when they host the Games?

Data

The summer.csv open source data was collected from Kaggle, and details the following variables:

The data was provided by the IOC Research and Reference Service and published by The Guardian’s Datablog. It can be found at: https://www.kaggle.com/the-guardian/olympic-games?select=summer.csv [7]

Data: Preprocessing

Medal and Gender were converted to factors. One further issue required transformation:

When a duo, trio, team, etc. wins a medal, the source dataset lists each athlete as a medalist. In Olympic medal tallies, countries are awarded one medal per event, not one per person. If the raw rows were simply counted per country, team events would therefore be counted several times.

This was addressed by grouping the number of medals (“n”) per Year, Event and Country:
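The original code for this step is not shown, so the following is only a minimal sketch of the idea in base R. The data frame `medalists` and its rows are hypothetical; the column names follow those mentioned in the text. Including Medal in the key is an assumption that keeps two different medals won by one country in the same event as separate medals. Tied medals of the same colour in one event would be collapsed by this sketch.

```r
# Hypothetical rows in the shape described above: one row per medalist.
medalists <- data.frame(
  Year    = c(2012, 2012, 2012, 2012, 2012),
  Country = c("GBR", "GBR", "GBR", "GBR", "USA"),
  Event   = c("Team Pursuit", "Team Pursuit", "Team Pursuit", "Keirin", "Keirin"),
  Medal   = c("Gold", "Gold", "Gold", "Gold", "Silver"),
  stringsAsFactors = FALSE
)

# Keep one row per Year/Country/Event/Medal so a team counts once.
events <- unique(medalists[, c("Year", "Country", "Event", "Medal")])

# Count medals per country per Games and call the count n.
tally <- aggregate(Medal ~ Year + Country, data = events, FUN = length)
names(tally)[names(tally) == "Medal"] <- "n"
tally
```

In this toy example the three team pursuit riders contribute a single medal, so GBR ends up with n = 2 rather than 4.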

Data: Preprocessing

The dataset was subset to include four variables: Year, City, Country, and the new medal count, n:

As the data were grouped from the original source listing each medalist, countries that won no medals in an Olympiad had no observation for that year:
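How the missing country-years were handled is not shown in the source. One plausible approach, sketched here in base R with hypothetical data, is to build every Year and Country combination and fill the gaps with zero. Note the assumption baked in: a country that did not compete at all also receives 0, which may or may not be what the analysis intended.

```r
# Hypothetical medal tally: country BBB has no row for 2008.
tally <- data.frame(
  Year    = c(2004, 2008, 2004),
  Country = c("AAA", "AAA", "BBB"),
  n       = c(3, 5, 1),
  stringsAsFactors = FALSE
)

# Every Year x Country combination that should exist.
grid <- expand.grid(
  Year    = unique(tally$Year),
  Country = unique(tally$Country),
  stringsAsFactors = FALSE
)

# Left join onto the grid, then treat "no row" as zero medals.
full <- merge(grid, tally, all.x = TRUE)
full$n[is.na(full$n)] <- 0
full <- full[order(full$Country, full$Year), ]
full
```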

After preprocessing, the data were filtered to include only the seven host countries from 1984 to 2012:

Descriptive Statistics

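library(dplyr)       # filter, group_by, summarise, relocate
library(kableExtra)  # kbl, kable_styling

# Keep the seven nations that hosted the Games between 1984 and 2012
Host_countries <- Olympics6 %>%
  filter(Country %in% c("USA", "AUS", "CHN", "GRE", "KOR", "ESP", "GBR"))

# Summary statistics of medals per Games (n) for each country
table1 <- Host_countries %>%
  group_by(Country) %>%
  summarise(Min = min(n, na.rm = TRUE),
            Q1 = quantile(n, probs = .25, na.rm = TRUE),
            Median = median(n, na.rm = TRUE),
            Q3 = quantile(n, probs = .75, na.rm = TRUE),
            Max = max(n, na.rm = TRUE),
            Mean = mean(n, na.rm = TRUE),
            SD = sd(n, na.rm = TRUE),
            Missing = sum(is.na(n)),  # count NAs before n is overwritten by the row count
            n = n()) %>%
  relocate(n, .before = Missing)

kbl(table1, booktabs = TRUE) %>%
  kable_styling(font_size = 13, latex_options = "striped")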
Country  Min     Q1  Median      Q3  Max       Mean         SD   n  Missing
AUS        0   4.50    13.0   26.00   58  17.481482  16.862271  27        0
CHN       28  45.50    55.5   70.00  100  59.375000  25.433035   8        0
ESP        0   1.00     1.5    9.75   22   5.954546   7.606246  22        0
GBR        2  15.50    21.0   33.00  143  29.407407  26.826020  27        0
GRE        0   0.00     1.0    2.50   45   4.037037   9.065919  27        0
KOR        0   2.00    12.5   28.50   33  15.375000  13.918214  16        0
USA       19  71.75    95.0  104.00  233  92.615385  41.160250  26        0

Visualisation

A coloured line graph was used to visualise the medal tallies of seven recent host nations:

# Line graph of medal tallies, one coloured line per host nation
Host_countries_plot <- Host_countries %>%
  ggplot(mapping = aes(x = Year, y = n, color = Country)) +
  geom_line() +
  expand_limits(x = max(Host_countries$Year), y = 0:max(Host_countries$n)) +
  scale_x_continuous(breaks = seq(1896, 2012, by = 4)) +
  scale_y_continuous(breaks = seq(0, 230, by = 25)) +
  labs(y = "Total medals") +
  ggtitle("Medal tallies of\n7 Olympic host nations, 1896-2012") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5))
Host_countries_plot

Visualisation

The following code was used to plot each country’s sample with host years and most medals:

# Medal tally per Games, with host years and the highest tally highlighted
GBR_historic_plot <- GBR_historic %>%
  ggplot(mapping = aes(x = Year, y = n)) +
  labs(y = "Total medals") +
  # 0:max(n)+10 parses as (0:max(n))+10, which drops 0 from the limits; state the range explicitly
  expand_limits(x = max(GBR_historic$Year) + 4, y = c(0, max(GBR_historic$n) + 10)) +
  scale_x_continuous(breaks = seq(1896, 2012, by = 4)) +
  scale_y_continuous(breaks = seq(0, 160, by = 10)) +
  geom_line() +
  geom_point(color = "black", size = 2) +
  geom_text(aes(label = n), size = 3.5, color = "blue", hjust = -0.2, vjust = 1) +
  # Red marks the Games with the most medals, green marks host years
  geom_point(data = subset(GBR_historic, Most_medals == "Yes"), color = "red", size = 4) +
  geom_point(data = subset(GBR_historic, Host_city == "Yes"), color = "darkgreen", size = 2) +
  geom_text(data = subset(GBR_historic, Most_medals == "Yes"),
            label = "Most medals", hjust = 0.5, vjust = -2.5, color = "red") +
  geom_text(data = subset(GBR_historic, Host_city == "Yes"),
            label = "Host city", hjust = 0.5, vjust = -1, color = "darkgreen") +
  ggtitle("Medals won by Great Britain at the Summer Olympics, 1896-2012") +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),
        plot.title = element_text(hjust = 0.5), legend.position = "none")

Hypothesis Testing: Methods and assumptions

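\[H_0: \mu_1 = \mu_2 \]

\[H_A: \mu_1 < \mu_2\]

Here \(\mu_1\) is the mean medal tally of a country's pre-host sample and \(\mu_2\) is its medal tally in the host year, which is treated as a fixed value in the one-sample tests. The same hypotheses will be used for the seven host countries from 1984 to 2012.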

Great Britain

Great Britain hosted the Olympic Games in 1908, 1948 and 2012, all in London.

Great Britain’s samples were checked for normality using a boxplot, Q-Q Plot and histogram with overlay. After removing one extreme outlier (1908), the pre-2012 sample was near-normal. The pre-1908 and pre-1948 samples were non-normal and not analysed.

GBR_mean = mean(GBR_clean$n)
GBR_sd = sd(GBR_clean$n)
hist(GBR_clean$n, density=80, breaks=25,
     prob=TRUE)
lines(density(GBR_clean$n),
      lwd = 2, col = "orange")
curve(dnorm(x, mean=GBR_mean, sd=GBR_sd), 
col="darkblue", lwd=2, add=TRUE, yaxt="n")
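The boxplot and Q-Q checks mentioned above are not shown in the source. The sketch below illustrates the general idea in base R on hypothetical tallies, not the real Great Britain sample. The Q-Q correlation is an added numeric summary used here only for illustration; the original analysis appears to have judged the plots by eye.

```r
# Hypothetical medal tallies standing in for a pre-host sample.
tallies <- c(12, 15, 18, 20, 21, 22, 24, 25, 27, 30, 143)

# Boxplot rule: points more than 1.5 * IQR beyond the hinges are flagged.
outliers <- boxplot.stats(tallies)$out

# Q-Q coordinates: theoretical normal quantiles (x) against sample values (y).
qq_all <- qqnorm(tallies, plot.it = FALSE)
qq_cor_all <- cor(qq_all$x, qq_all$y)

# Repeat after dropping the flagged points; a correlation closer to 1
# means the Q-Q points lie closer to a straight line.
clean <- tallies[!tallies %in% outliers]
qq_clean <- qqnorm(clean, plot.it = FALSE)
qq_cor_clean <- cor(qq_clean$x, qq_clean$y)

c(before = qq_cor_all, after = qq_cor_clean)
```

Calling `boxplot(tallies)` and `qqnorm(tallies); qqline(tallies)` draws the corresponding plots.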

Results: Great Britain

A lower-tailed, one-sample t-test was used to compare Great Britain’s 1896-2008 medal tallies (excluding the 1908 outlier) with their 68 medals in 2012. A critical value approach was also applied, with 24 degrees of freedom.

Decision: Reject \(H_0\): \(\mu\) = 68, as the \(p\)-value < .001 and the 95% CI [0, 27.388] did not capture 68.

Conclusion: The estimated mean medal tally, based on the pre-host year sample, was \(\bar{x}\) = 23.32, 95% CI [0, 27.388]. The one-sample t-test found that the mean medal tally of previous years was significantly lower than the 2012 medal tally of 68, \(t\)(\(df\)=24) = -18.793, \(p\) < .001.

t.test(GBR_clean$n, mu = 68, 
       conf.level = .95, alternative = "less")
## 
##  One Sample t-test
## 
## data:  GBR_clean$n
## t = -18.793, df = 24, p-value = 3.659e-16
## alternative hypothesis: true mean is less than 68
## 95 percent confidence interval:
##      -Inf 27.38758
## sample estimates:
## mean of x 
##     23.32
qt(0.025, df = 25-1,
   lower.tail = FALSE)
## [1] 2.063899
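As a consistency check, the test statistic can be recovered from the printed summary values alone, using \(t = (\bar{x} - \mu_0) / SE\) and the fact that the upper bound of a one-sided 95% interval is \(\bar{x} + t_{0.95,\,df} \times SE\). This is a sketch that uses only the numbers printed above, not the underlying data; the variable names are illustrative.

```r
# Values copied from the t.test() output above.
x_bar <- 23.32      # sample mean
mu_0  <- 68         # 2012 tally used as the null value
df    <- 24
upper <- 27.38758   # upper bound of the one-sided 95% CI

# Solve the CI formula for the standard error.
se <- (upper - x_bar) / qt(0.95, df)

# Test statistic and lower-tail p-value.
t_stat <- (x_bar - mu_0) / se
p_val  <- pt(t_stat, df)

round(t_stat, 2)
```

The recovered statistic agrees with the reported \(t\) value to rounding.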

Australia

Australia hosted the Olympic Games in Melbourne in 1956 and in Sydney in 2000.

Samples were checked for normality using a boxplot, Q-Q Plot and histogram with overlay. While pre-2000 and pre-1956 samples were non-normal, a sample of 1948-1996 was normal after the two Q-Q Plot outliers were removed.

Aus_mean_clean_1948_2000 = mean(Aus_clean_1948_2000$n)
Aus_sd_clean_1948_2000 = sd(Aus_clean_1948_2000$n)
hist(Aus_clean_1948_2000$n, density=80, 
     breaks=20, prob=TRUE)
lines(density(Aus_clean_1948_2000$n),
      lwd = 2, col = "orange")
curve(dnorm(x, mean=Aus_mean_clean_1948_2000, 
            sd=Aus_sd_clean_1948_2000), 
      col="darkblue", lwd=2, add=TRUE, yaxt="n")

Results: Australia

A lower-tailed, one-sample \(t\)-test was used to compare Australia’s 1948-1996 sample with their 58 medals in 2000. A critical value approach was also applied, with 11 degrees of freedom.

Decision: Reject \(H_0\): \(\mu\) = 58, as the \(p\)-value < .001 and the 95% CI [0, 22.122] did not capture 58.

Conclusion: The estimated mean medal tally, based on the pre-host year sample, was \(\bar{x}\) = 17.75, 95% CI [0, 22.122]. The one-sample \(t\)-test found that the mean medal tally of previous years was significantly lower than the 2000 medal tally of 58, \(t\)(\(df\)=11) = -16.534, \(p\) < .001.

t.test(Aus_clean_1948_2000$n, mu = 58,
       conf.level = .95, alternative = "less")
## 
##  One Sample t-test
## 
## data:  Aus_clean_1948_2000$n
## t = -16.534, df = 11, p-value = 2.036e-09
## alternative hypothesis: true mean is less than 58
## 95 percent confidence interval:
##      -Inf 22.12184
## sample estimates:
## mean of x 
##     17.75
qt(0.025, df = 12-1,
   lower.tail = FALSE)
## [1] 2.200985
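The `qt()` call above returns the upper 2.5% point, which is the cutoff for a two-tailed test at \(\alpha\) = .05. For a lower-tailed test at the same level, the cutoff is the 5% quantile of the \(t\) distribution, which is negative. The sketch below assumes \(\alpha\) = .05 (implied by `conf.level = .95`) and compares the two; the decision to reject is the same under either cutoff.

```r
alpha  <- 0.05
df_aus <- 12 - 1
t_obs  <- -16.534   # statistic from the t.test() output above

# Cutoff for a lower-tailed test: reject when t_obs falls below it.
crit_lower <- qt(alpha, df = df_aus)

# Two-tailed cutoff, as computed in the original code.
crit_two <- qt(alpha / 2, df = df_aus, lower.tail = FALSE)

reject_lower <- t_obs < crit_lower
reject_two   <- abs(t_obs) > crit_two
c(crit_lower = crit_lower, crit_two = crit_two)
```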

China & Greece

China’s sample only starts from 1984, whereas Greece has competed since 1896. Both samples tested as non-normal via boxplot, Q-Q Plot and a histogram with density curve and normal overlay.

China: Small sample, non-normal

Greece: Larger sample, non-normal

Both sample plots show an upward trend in the lead-up to their peak medal tally at their host games.

Spain & South Korea

Spain and South Korea’s samples were checked for normality using a boxplot, Q-Q Plot and histogram density curve with normal overlay. Neither was considered normal, so neither was tested using a t-test or critical value approach.

Spain: Density plot vs normal curve:

Spain_mean = mean(Spain_clean$n)
Spain_sd = sd(Spain_clean$n)
hist(Spain_clean$n, density=80, breaks=20, prob=TRUE)
lines(density(Spain_clean$n), lwd = 2, col = "orange")
curve(dnorm(x, mean=Spain_mean, sd=Spain_sd), 
      col="darkblue", lwd=2, add=TRUE, yaxt="n")

South Korea: Boxplot & Q-Q Plot


USA

USA samples from before 1996, 1984 and 1932 were tested for normality via boxplot, Q-Q Plot and normal curve overlay, and all were non-normal. A Box-Cox transformation was also attempted, but the transformed sample remained non-normal.

boxcox_USA <- BoxCox(USA_pre_1996$n,lambda = "auto")
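For reference, the standard one-parameter Box-Cox transformation of a positive value \(y\) is shown below. With `lambda = "auto"` the package is understood to choose \(\lambda\) from the data; the exact selection method is not stated in the source.

```latex
\[
y^{(\lambda)} =
\begin{cases}
\dfrac{y^{\lambda} - 1}{\lambda}, & \lambda \neq 0 \\[6pt]
\ln y, & \lambda = 0
\end{cases}
\]
```

The transformation requires \(y > 0\), which holds for the USA sample since its minimum tally is 19.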

The USA achieved its highest medal tallies in 1904 and 1984, with a local maximum in 1932. However, it did not reach a maximum at Atlanta 1996, which suggests this sample does not support the research hypothesis. A larger or normally distributed sample would be needed to test for statistical significance.

Discussion: Results

Discussion: Context

References

1 Encyclopaedia Britannica 2020, Athens 1896, viewed 13 October 2020, https://www.britannica.com/event/Athens-1896-Olympic-Games.

2 Australian Olympic Committee 2020, Los Angeles 1984, viewed 13 October 2020, https://www.olympics.com.au/games/los-angeles-1984/.

3 Multiplex 2020, Stadium Australia - Sydney, viewed 13 October 2020, https://www.multiplex.global/projects/stadium-australia-sydney-australia/.

4 Tokyo 2020, Venues: Olympic Stadium, viewed 13 October 2020, https://tokyo2020.org/en/venues/olympic-stadium.

5 EuroNews 2016, Hosts with the most: why home advantage brings more Olympic medals, viewed 29 September 2020, https://www.britannica.com/event/Athens-1896-Olympic-Games.

6 IOC 2020, The Olympic Programme comprises sports, disciplines and events, viewed 17 October 2020, https://www.olympic.org/faq/sports-programme-and-results/the-olympic-programme-comprises-sports-disciplines-and-events-what-is-the-difference-between-the-three.

7 The Guardian 2016, Olympic Sports and Medals, 1896-2014, viewed 23 September 2020, https://www.kaggle.com/the-guardian/olympic-games.

8 Wikimedia Foundation 2020, List of ties for medals at the Olympics, viewed 1 October 2020, https://en.wikipedia.org/wiki/List_of_ties_for_medals_at_the_Olympics.

9 Wikimedia Foundation 2020, List of participating nations at the Summer Olympic Games, viewed 1 October 2020, https://en.wikipedia.org/wiki/List_of_participating_nations_at_the_Summer_Olympic_Games.

10 Council on Foreign Relations 2018, The economics of hosting the Olympic Games, viewed 11 October 2020, https://www.cfr.org/backgrounder/economics-hosting-olympic-games.