Introduction

Data Pre-processing and Exploration

The first step is to bring in our 2014 dataset and look at the medal counts.

#Read in the file
olympic2014 <- read_excel("OlympicMedalsbyEvent2014.xlsx")

#Add a column for total medal count and group by country
medal_count_2014 <- olympic2014 %>% group_by(Country) %>% mutate(medal_count = 1) %>% summarize(total_2014_medals = sum(medal_count))

#Plot the medal count
ggplot(medal_count_2014, aes(x = reorder(Country, total_2014_medals), y = total_2014_medals)) + 
  geom_bar(stat="identity") + 
  coord_flip() +
   ggtitle("Winter Olympics 2014 Total Medals Won") +
  xlab("Country") + 
  ylab("Total Medals Won") +
  theme_few()

Here we can see the distribution of total medals. Next we will compare the medal distribution against three different country specific variables: latitude, gross domestic product (GDP) per capita, and average annual temperature.

  • Latitude-the angular distance of a place north or south of the earth’s equator, or of a celestial object north or south of the celestial equator, usually expressed in degrees and minutes.

  • GDP per Capita-This is the “average” output of the economy per person measured in a base year prices. This ratio is often used as a measure of standard of living in comparisons over time of one country, or between different countries when measured in the same currency.

  • Average Temperature- mean monthly and annual temperatures by country for the period 1961-1999. Values are in degrees Celsius.

The first variable to bring in and compare are the countries’ latitudes.

#Read in file
countries <- read.csv("countries.csv")

#Change column name for joining to original dataset
colnames(countries)[colnames(countries)=="name"] <- "Country"

#Create new dataset combining latitude and medals won
medal_loc_2014 <- inner_join(medal_count_2014, countries, by = "Country") %>% select(country, Country, total_2014_medals, latitude)

#Plot latitude and medals won
ggplot(medal_loc_2014, aes(x = latitude, y = total_2014_medals)) + 
  geom_point(size = 5, alpha = 0.5) +
  ggtitle("Country Latitude vs Number of 2014 Medals") +
  xlab("Latitude (degrees)") + 
  ylab("Total Medals Won") +
  theme_few()

Australia makes this plot look off since it’s latitude is so much different from the other countries. However, focusing on the other countries, we can see that latitude does not appear to strongly correlate with number of medals won. We will next check on how GDP per capita corrlates with our medal data.

#Bring in data file
gdp <- read.csv("UNdata_Export_GDP_per_Capita.csv")

#Filter for only 2014 values
gdp_2014 <- gdp %>% filter(Year == 2014) 

#Change column name to join to original data set.
colnames(gdp_2014)[colnames(gdp_2014)=="Country.or.Area"] <- "Country"

#Join the datasets
medal_gdp_2014 <- left_join(medal_count_2014, gdp_2014, by = "Country") %>% select(Country, total_2014_medals, Value)

#Plot the GDP/Capita compared to medals
ggplot(medal_gdp_2014, aes(x = Value, y = total_2014_medals)) + 
  geom_point(size = 5, alpha = 0.5) +
  geom_smooth(method = "lm") +
  ggtitle("2014 GDP/Capita vs Number of 2014 Medals") +
  xlab("GDP/Capita ($/person)") + 
  ylab("Total Medals Won") +
  theme_few()

Here we can see with the added smoothing that there is definitely a trend, as countries with a higher GDP/capita tended to win more medals. This makes intuitive sense since presumably having a stronger economy would allow for more money to invest in training atheletes. The last variable to compare is average annual temperature.

#Bring in temperature dataset
avg_temp <- read_xls("cckp_historical_data_0.xls", sheet = "Country_temperatureCRU") 

#The tempareture data uses ISO country codes, bring in dataset to map ISO code to country name
country_codes <- read.csv("country code list.csv")

#Change column name for joining tables
colnames(country_codes)[colnames(country_codes)=="ISO.ALPHA.3.Code"] <- "ISO_3DIGIT"

#Join to country name mapping data set to include country name in temperature dataset
avg_temp <- left_join(avg_temp, country_codes, by = "ISO_3DIGIT")

#Change column name to join to original dataset
colnames(avg_temp)[colnames(avg_temp)=="Country.or.Area.Name"] <- "Country"

#Join temperature data to medal data
medal_temp_2014 <- inner_join(medal_count_2014, avg_temp, by = "Country") %>% select(Country, total_2014_medals, Annual_temp)

#Plot medals copmpared to average tempareture
ggplot(medal_temp_2014, aes(x = Annual_temp, y = total_2014_medals)) +
  geom_point(size = 5, alpha = 0.5) +
  geom_smooth(method = "lm") +
  ggtitle("Average Temperature vs Number of 2014 Medals") +
  xlab("*Avg Temp (degrees C)") + 
  ylab("Total Medals Won") +
  labs(caption = "*Average temperature of timeframe 1961 through 1999") +
  theme_few()

Here we can see that there is a corrlation between average temperature and medals won. Overall countries with lower averge temperatures won more medals in 2014, this is expected as winter events would be harder to train for in warmer places.

Conclusion

There were some challenges I had with this project, the first of which were around finding data that was applicable to all or most of the medal winning countries and data that was available for the right timefraemes. Restrictions around the data also dictated some of my analyis. For example, the tempearture data was an average of temperatures from 1961 to 1999, so I couldn’t stray too far from that range. The GDP data also started in 1970, so I could not go back farther than that.

The data cleaning and combining was also very time consuming, espcially when some data has ISO codes while others used country name. Country names also change, which caused some headaches when tying all three variables (medals, GDP/capita, and avg temp) to the same country. The GDP data also only had more recent country names, so for example I couldn’t go back to look at the USSR or East and West Germany since I did not have GDP data for those entities. Data that would have been convienient would be more specific temperature data by country and GDP data that went back farther and also covered older country names.