This report investigates whether certain factors determine a country’s success in the Summer Olympics. The “Summer Olympics Medals (1976-2008)” data set by Divyansh Agrawal, retrieved from Kaggle, provides a good starting point. I have wrangled this data set to reduce it to the number of medals by country and by year. We consider all Summer Olympic Games from 1992 to 2008. Note that only countries that won at least one medal are included in this set. Plot 1 gives a quick overview of the total number of Summer Olympic medals won by each country over this period. At first sight, the most successful countries seem to be large countries and countries in richer parts of the world. The dominance of the United States also stands out.
Plot 1. Total number of medals won at the Summer Olympic Games from 1992 to 2008 by country, plotted on a world map.
There seem to be some interesting relationships here, which we will investigate further. I chose three variables that intuitively seem like possible explanations for Olympic success. First, the wealth of a country will probably play a role; I will use Gross Domestic Product (GDP) to measure this. A second, related variable is the development of a country, so I will include the Human Development Index (HDI). Finally, regardless of income and development, absolute population numbers are likely to play a role. It is simply easier to find exceptional athletes when you have many more people.
I have extended the data set with these three variables. The GDP and population data originate from a Kaggle data set by Greesh Magirish, who extracted the data from the World Bank website. The HDI values were obtained from the United Nations Development Programme website. The HDI data is only available from 1990 onward, which is why we only consider the Summer Olympic Games starting from 1992. There are a few missing values in the HDI, but we will ignore these. We have a total of 365 observations.
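The wrangling and merging steps described so far can be sketched as follows. This is only an illustration under assumptions: the column names (`country`, `year`, `medal`, `gdp`, `population`, `hdi`) and the toy rows are hypothetical and do not reproduce the actual headers or values of the Kaggle and UNDP files.

```python
import pandas as pd

# Hypothetical toy rows: one row per medal won (placeholder countries, not real results).
medal_rows = pd.DataFrame({
    "country": ["Country A", "Country A", "Country A", "Country A", "Country B"],
    "year": [1992, 1992, 1996, 1996, 1996],
    "medal": ["Gold", "Silver", "Gold", "Bronze", "Bronze"],
})

# Hypothetical GDP and population figures per country and year.
world_bank = pd.DataFrame({
    "country": ["Country A", "Country A", "Country B"],
    "year": [1992, 1996, 1996],
    "gdp": [6.5e12, 8.1e12, 3.9e11],
    "population": [2.5e8, 2.7e8, 9.6e8],
})

# Hypothetical HDI values; Country B is deliberately missing.
hdi = pd.DataFrame({
    "country": ["Country A", "Country A"],
    "year": [1992, 1996],
    "hdi": [0.87, 0.88],
})


def build_panel(medal_rows, world_bank, hdi):
    """Count medals per country and year, then attach GDP, population and HDI."""
    medals = (
        medal_rows.groupby(["country", "year"])
        .size()
        .reset_index(name="medals")
    )
    # Left joins keep every medal-winning country-year, even when a covariate is missing.
    panel = medals.merge(world_bank, on=["country", "year"], how="left")
    panel = panel.merge(hdi, on=["country", "year"], how="left")
    return panel


panel = build_panel(medal_rows, world_bank, hdi)
print(panel)
```

Using left joins means a country-year without an HDI value stays in the table with a missing entry instead of being dropped, which matches the choice to keep observations with missing HDI.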
Table 1 shows an overview of our variables and a short description where needed.
| Variable | Description |
|---|---|
| Country | - |
| Year | Every 4 years between 1992 and 2008 |
| Number of medals | A country is only included if it won at least one medal |
| GDP | in current USD |
| Population | Total population of the country |
| HDI | Human Development Index, between 0 and 1. It is the geometric mean of indices for health, knowledge and standard of living. |
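For reference, the geometric mean mentioned in Table 1 can be written out as below. The symbols for the three dimension indices are my own shorthand, not official UNDP notation.

$$
\mathrm{HDI} = \left( I_{\text{health}} \cdot I_{\text{knowledge}} \cdot I_{\text{living}} \right)^{1/3}
$$

Because the mean is geometric rather than arithmetic, a low score on one dimension cannot be fully offset by a high score on another.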
We start by taking a look at the relation between GDP and Olympic success. The interactive plot 2 shows the GDP of each country against the number of Olympic medals won in that year. We can use the slider to switch between years and hover over points to see more detailed information. Note that some vertical jittering is applied because the number of medals is a count, so many points would otherwise overlap. This is the reason why some countries seem to have won fewer than one medal. Because both medals and GDP range from very low to very high values, both axes use a logarithmic scale. In every year, we see a clear upward relation between GDP and medals. However, variation is quite high as well. Some countries perform below what you might expect based on their GDP.
The size of the points indicates the population of each country. It becomes clear that countries that underperform relative to their GDP are usually small countries (in terms of population), for example Israel, Belgium or Finland. An important exception is India: its number of Olympic medals is a lot lower than what you would expect based on GDP and population. In contrast, there are also some countries that outperform despite their small population size.
Plot 2. GDP plotted against the number of medals won for each country. The slider allows switching between years. The size of the points indicates the population of the country. Both axes are on a logarithmic scale and a little vertical jittering is applied.
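To make the jittering concrete, here is a minimal sketch of how vertical jitter on a count variable might be applied before plotting on a logarithmic axis. The function name `jitter_counts`, the jitter width of 0.3 and the example counts are assumptions chosen for illustration; the plot itself may use a different amount of jitter.

```python
import numpy as np


def jitter_counts(counts, width=0.3, seed=0):
    """Add uniform noise in [-width, width] to integer counts (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    counts = np.asarray(counts, dtype=float)
    return counts + rng.uniform(-width, width, size=counts.shape)


# Hypothetical medal counts; several countries share the same low count.
medals = np.array([1, 1, 1, 2, 5, 30, 100])
jittered = jitter_counts(medals)

# A width below 1 keeps every jittered value positive, so the log scale stays defined.
log_y = np.log10(jittered)
print(jittered)
```

A point whose true count is 1 can land anywhere between 0.7 and 1.3 with this width, which is why some countries appear to sit below the one-medal line.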
I now move on to the development of a country; maybe this can explain the case of India. Plot 3 shows the Human Development Index against the number of medals. Again, the size of the points illustrates population and the y-axis is on a logarithmic scale. The relation is less clear, but it does seem that highly developed countries usually win more medals. However, this is clearly not always the case, again mostly for countries with a low population.
In this plot, the case of India becomes a bit clearer. While it has a large population and a high GDP, it scores quite low on the HDI, which may explain why its success is lower. Another interesting case is China, which clearly outperforms what you would expect based on its HDI.
Finally, in both this and the previous graph, the United States stands out. On the world map it was clear that the USA is a great, if not the best, performer in the Olympics. This is not so surprising now that we know that it also stands out in terms of GDP, HDI and population size.
Plot 3. The Human Development Index (HDI) plotted against the number of medals won by each country, by year. The size of the points indicates the population size. The y-axis is on a logarithmic scale and a little vertical jittering is applied.
In this final section, I return to plot 2 and take a look at the complete data set (all years together). Plot 4 shows the four variables: number of medals, GDP, HDI and population size. We see that HDI is not a very clear determinant; however, the upper right part seems to be dominated by darker spots.
Finally, we fit two very simple linear models to the data. The first is a simple linear regression with GDP as the predictor, shown as the red line on the plot. It seems to capture some of the trend in the data, with an associated R-squared value of 0.565. Next, a quadratic function of GDP is fitted to the data (the blue line). This model seems to capture the relation a little better, especially at the higher end of GDP. The associated R-squared value is 0.578.
Plot 4. GDP against the number of medals won by each country in the period 1992-2008. The size of the points indicates population size and the color indicates the HDI. Both axes are on a logarithmic scale and a little vertical jittering is applied. The red line is a simple linear regression model fitted to the data, with its 95 percent confidence interval. The blue line is the corresponding fit for a quadratic function of GDP.
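The comparison of the two fits can be sketched as below. This is a sketch on simulated data, not a reproduction of the reported numbers: it assumes both variables are log-transformed before fitting (the report only states that the axes are logarithmic), and the names `fit_polynomial`, `log_gdp` and `log_medals` are hypothetical.

```python
import numpy as np


def r_squared(y, y_hat):
    """Share of the variance in y explained by the fitted values."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_polynomial(x, y, degree):
    """Least-squares polynomial fit; returns the coefficients and in-sample R-squared."""
    coeffs = np.polyfit(x, y, degree)
    return coeffs, r_squared(y, np.polyval(coeffs, x))


# Simulated stand-in for the real data (assumption): log10 GDP and log10 medals.
rng = np.random.default_rng(42)
log_gdp = rng.uniform(9, 13, size=200)
log_medals = 0.4 * (log_gdp - 9) + rng.normal(0, 0.4, size=200)

_, r2_linear = fit_polynomial(log_gdp, log_medals, 1)
_, r2_quadratic = fit_polynomial(log_gdp, log_medals, 2)
print(r2_linear, r2_quadratic)
```

Because the linear model is a special case of the quadratic one, the in-sample R-squared of the quadratic fit can never be lower. A small increase, such as from 0.565 to 0.578, is therefore only weak evidence that the extra term is needed.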
We can conclude that a country’s wealth, population and development are clearly related to its success in the Summer Olympics. This is not a very surprising result. The USA is a prime example of this.
However, we have also discovered some interesting facts. For example, India in theory has great potential in the Olympics based on its population size, but in reality it underperforms. The Human Development Index is one possible explanation for this phenomenon. Compared to other big countries, India scores quite low on this index. It is intuitive that a country might not be very preoccupied with investing in athletes when it is still developing in other areas. Note also that India’s most popular sport, cricket, is not in the Olympics. There are many more interesting countries to study in more detail.
Finally, we have fitted a very basic model to the data, using only GDP as a variable. This already gave some good results, and it would be very interesting to expand this research by including more variables and applying more refined models. Some important factors have been ignored in this report for simplicity. For example, a good econometric model should take the count nature of the medals data into account. The choice of variables should also be investigated further; for example, GDP per capita might be more telling than GDP. Finally, there are probably other important determinants of a country’s Olympic success.
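As one possible way to respect the count nature of the medals, a Poisson regression with a log link could replace the linear model. The sketch below fits it with iteratively reweighted least squares on simulated data. It is not part of the analysis in this report, and the function `fit_poisson`, the predictor `x` and the simulated coefficients are all assumptions made for illustration.

```python
import numpy as np


def fit_poisson(x, y, n_iter=25):
    """Poisson regression with log link, fitted by iteratively reweighted least squares."""
    X = np.column_stack([np.ones(len(y)), x])  # add an intercept column
    beta = np.zeros(X.shape[1])
    beta[0] = np.log(np.mean(y))               # start from the intercept-only model
    for _ in range(n_iter):
        eta = X @ beta                         # linear predictor
        mu = np.exp(eta)                       # expected count
        z = eta + (y - mu) / mu                # working response
        w = mu                                 # working weights
        beta = np.linalg.solve(X.T @ (w[:, None] * X), X.T @ (w * z))
    return beta


# Simulated data (assumption): x plays the role of a centered log GDP.
rng = np.random.default_rng(1)
x = rng.uniform(-2, 2, size=2000)
true_beta = np.array([1.0, 0.5])
y = rng.poisson(np.exp(true_beta[0] + true_beta[1] * x))

beta_hat = fit_poisson(x, y)
print(beta_hat)
```

Since the data set only contains countries with at least one medal, a plain Poisson model would still be an approximation. A zero-truncated count model may be a better fit for data of this kind.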
Divyansh Agrawal. 2020; January. Summer Olympics Medals (1976-2008), Version 2. Retrieved October 21, 2021 from https://www.kaggle.com/divyansh22/summer-olympics-medals.
Greesh Magirish. 2019; December. WorldBank Data on GDP, Population and Military, Version 2. Retrieved October 21, 2021 from https://www.kaggle.com/greeshmagirish/worldbank-data-on-gdp-population-and-military/metadata.
United Nations Development Programme. No date. Human Development Reports - Human Development Index (HDI). Retrieved October 27, 2021 from http://hdr.undp.org/en/indicators/137506#.