#Loading necessary libraries:
library(ggplot2)
library(knitr)
library(plyr)
library(tidyr)
library(ggvis)
library(stringr)
library(pander)
library(dplyr)
In order to tell an adequate story about the 2014 Winter Olympics, many different datasets needed to be discovered and tidied. The Olympic dataset was provided to me. Much of the economic, demographic, geographic, and weather data was more difficult to locate; the economic information came from the , the
Many of the datasets needed to be manipulated both in excel and then again in R. Some country names needed to be updated to keep consistency and to have a variable to base dataset merging on, the country names were all modified to match what was in the Olympics dataset.
Variables:
GDP - Came from World Bank Data GDP per capita is gross domestic product divided by midyear population. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars. Data available was for 1990, 2000, and 2008 through 2017.
Total population - Came from World Bank Data and they are based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates. Data available is annual midyear estimates from 2000 through 2015.
Geographical Location Came from Internet World Stats and categorized the geographic location of each country that paritcipated in the 2014 Winter Olympics. However, it was easier to manually add this column into the Population dataset. As a caveat, the UK was considered a part of hte European Union in this region designation, which is fine for comparison because in 2014, and when that year’s Winter Olympics took place, they were still a part of the EU.
## [1] "C:/Users/Marissa.Valente/Documents/RStuff"
F_Data <- myfulldata
After the datasets have been given one column names in common, in this case “Country” they can be merged together into the dataset I will use to explore a number of questions and determine the story-line for the 2014 Winter Olympics. In order to help form a narrative I will pose a question to explore. After exploration of this question I will check the historical information on the Winter Olympics to either prove the answer to the following question, or disprove it, or conclude that more evidence is needed.
##Do countries with better economic status (higher GDP per capita) perform better in the Olympics?
dim(F_Data)
## [1] 295 10
F_Data <- as.data.frame(F_Data)
str(F_Data)
## 'data.frame': 295 obs. of 10 variables:
## $ Country : chr "Australia" "Australia" "Australia" "Austria" ...
## $ ï..Sport : chr "Freestyle Skiing" "Snowboard" "Freestyle Skiing" "Alpine Skiing" ...
## $ Event : chr "Ladies' Aerials" "Ladies' Halfpipe" "Men's Aerials" "Men's Slalom" ...
## $ Medal : chr "B" "S" "S" "S" ...
## $ Athlete.Team : chr "Lydia LASSILA" "Torah BRIGHT" "David MORRIS" "Marcel HIRSCHER" ...
## $ Date : chr "2/14/2014" "2/12/2014" "2/17/2014" "2/22/2014" ...
## $ Country.Code : chr "AUS" "AUS" "AUS" "AUT" ...
## $ X2014_GDP : num 62328 62328 62328 51705 51705 ...
## $ Geographic.Region: chr "Oceania" "Oceania" "Oceania" "European Union" ...
## $ X2014_POP : int 23504138 23504138 23504138 8546356 8546356 8546356 8546356 8546356 8546356 8546356 ...
GDP2014 <- as.integer(F_Data$GDP2014)
names(F_Data) <- c("Country", "Sport", "Event", "Medal", "AthleteTeam", "Date", "C_Code", "GDP2014", "Region", "Pop2014")
names(F_Data)
## [1] "Country" "Sport" "Event" "Medal" "AthleteTeam"
## [6] "Date" "C_Code" "GDP2014" "Region" "Pop2014"
F_Data %>% ggplot(aes(x=Country, y = GDP2014)) +
geom_point(color="red", size=3) + theme(axis.text.x = element_text(angle = 90, hjust = 1))+ylab("2014 GDP per capita")
As we can see from Figure 1, the top 3 countries based on GDP per capita are Norway, Switzerland, and Sweden. We will see if these countries outperform the bottom 3 countries, Ukraine, China, and Belarus when it comes to how many gold medals they earn.
F_Data %>% arrange(desc(GDP2014)) %>% slice(1:10) %>% select(Country, Medal, GDP2014, Pop2014, Region)
## # A tibble: 10 x 5
## Country Medal GDP2014 Pop2014 Region
## <chr> <chr> <dbl> <int> <chr>
## 1 Norway G 97199.92 5137232 Europe
## 2 Norway G 97199.92 5137232 Europe
## 3 Norway S 97199.92 5137232 Europe
## 4 Norway G 97199.92 5137232 Europe
## 5 Norway B 97199.92 5137232 Europe
## 6 Norway B 97199.92 5137232 Europe
## 7 Norway B 97199.92 5137232 Europe
## 8 Norway G 97199.92 5137232 Europe
## 9 Norway B 97199.92 5137232 Europe
## 10 Norway B 97199.92 5137232 Europe
Gold <- filter(F_Data, Medal=="G")
head(Gold)
## Country Sport Event Medal AthleteTeam
## 1 Austria Alpine Skiing Men's Slalom G Mario MATT
## 2 Austria Alpine Skiing Women's Super-G G Anna FENNINGER
## 3 Austria Snowboard Ladies' Parallel Slalom G Julia DUJMOVITS
## 4 Austria Alpine Skiing Men's Downhill G Matthias MAYER
## 5 Belarus Biathlon Women 12.5 km Mass Start G Darya DOMRACHEVA
## 6 Belarus Freestyle Skiing Ladies' Aerials G Alla TSUPER
## Date C_Code GDP2014 Region Pop2014
## 1 2/22/2014 AUT 51704.541 European Union 8546356
## 2 2/15/2014 AUT 51704.541 European Union 8546356
## 3 2/22/2014 AUT 51704.541 European Union 8546356
## 4 2/9/2014 AUT 51704.541 European Union 8546356
## 5 2/17/2014 BLR 8318.513 Europe 9474511
## 6 2/14/2014 BLR 8318.513 Europe 9474511
#Creating a dataframe that shows Country, Medal, GDP2014 all by the Gold Medals
freq_df <- aggregate(cbind(MedalsCount= Medal) ~Country,
data=Gold,
FUN=function(x){NROW(x)})
pander(freq_df)
| Country | MedalsCount |
|---|---|
| Austria | 4 |
| Belarus | 5 |
| Canada | 10 |
| China | 3 |
| Czech Republic | 2 |
| Finland | 1 |
| France | 4 |
| Germany | 8 |
| Great Britain | 1 |
| Japan | 1 |
| Korea | 3 |
| Netherlands | 8 |
| Norway | 11 |
| Poland | 4 |
| Russian Fed. | 13 |
| Slovakia | 1 |
| Slovenia | 2 |
| Sweden | 2 |
| Switzerland | 5 |
| Ukraine | 1 |
| United States | 9 |
bar <- ggplot(freq_df2, aes(Country, count))+
geom_bar(stat="identity", fill="#FF6666")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ylab("Gold Medals")
bar
Figure 2 shows us that the country with the most gold medals is Russia, then Norway and then Canada. If we were only evaluating the top three highest gdp per capita countries, we could see that there is a relationship between gold medals and gdp per capita for only one country, Norway. However, let’s look across both variables in one graph to see if there seems to be more of a relatioship than just one country.
GDP_Gold <- ggplot(F_Data, aes(x=Country, Medal, color=as.numeric(GDP2014)))+
geom_point(size=5)+
labs(x = "Country", y="Type of Medal", color="GDP per capita", fill="GDP")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))
GDP_Gold
Figure 3 does not seem to provide a clear picture as to the relationship between GDP per capita and type of medal.
Gold_Thou <- mutate(Gold, NGDP2014 = GDP2014/1000)
head(Gold_Thou)
## Country Sport Event Medal AthleteTeam
## 1 Austria Alpine Skiing Men's Slalom G Mario MATT
## 2 Austria Alpine Skiing Women's Super-G G Anna FENNINGER
## 3 Austria Snowboard Ladies' Parallel Slalom G Julia DUJMOVITS
## 4 Austria Alpine Skiing Men's Downhill G Matthias MAYER
## 5 Belarus Biathlon Women 12.5 km Mass Start G Darya DOMRACHEVA
## 6 Belarus Freestyle Skiing Ladies' Aerials G Alla TSUPER
## Date C_Code GDP2014 Region Pop2014 NGDP2014
## 1 2/22/2014 AUT 51704.541 European Union 8546356 51.704541
## 2 2/15/2014 AUT 51704.541 European Union 8546356 51.704541
## 3 2/22/2014 AUT 51704.541 European Union 8546356 51.704541
## 4 2/9/2014 AUT 51704.541 European Union 8546356 51.704541
## 5 2/17/2014 BLR 8318.513 Europe 9474511 8.318513
## 6 2/14/2014 BLR 8318.513 Europe 9474511 8.318513
#aggregating to have one GDP value per country and one aggregated gold medal value
Combined <- Gold_Thou %>% group_by(Country) %>% select(Country, Medal, Region, NGDP2014)
Use <- Combined %>% distinct(NGDP2014)
GDP_Gold_df <- inner_join(Use,freq_df,by="Country")
GG_Final <- ggplot(GDP_Gold_df, aes(Country, NGDP2014))+
geom_point()+
labs(x = "Country", y="2014 GDP per capita\n(in thousands)", color="MedalsCount", fill="MedalsCount")+
theme(text = element_text(size=7),
axis.text.x = element_text(angle=90, hjust=1))+
geom_point(aes(color="", y=MedalsCount), size=3)+
geom_text(aes(label=MedalsCount, y = MedalsCount),vjust=0,color="black", size=2)
GG_Final2 <- GG_Final + scale_y_continuous(sec.axis=sec_axis(~.*1, name="Number of Gold Medals"), limits=c(0,110))
GG_Final2
Figure 4 shows us that as GDP increases, the number of gold medals doesn’t have a clear pattern that follows the trend up or deviates down. It does not seem that there is a relationship between GDP and gold medals. However, this is just GDP per capita for one year compared to one year’s worth of Winter Olympics gold medals. In order to get a clearer idea if a relationship exists we will need to look at historical data to see if any trends become clear.
Before importing the Historical Olympics file, some data tidying was done in excel to include the country column as well as limiting the type of medal to gold. Additionally, the 2010 Winter Olympics medal by country information came from Wikipedia and was formatted into excel and included in the Historical Olympics data file. As a caveat, Ukraine did participate at the 2010 Winter Olympics, they just did not win any medals, because of this we will not use 2010 as a data source. Lastly, since the World Bank doesn’t provide GDP per capita for all countries and all years that the Winter Olympics took place, another data source was used Comstat Datahub.For simplicity all the dollar amounts for the historical comparisons across 1998, 2002, and 2014 are in current ($2018) U.S. dollars.
H_Data <- as.data.frame(H_Data)
dim(H_Data)
## [1] 107 9
H_data2 <- filter(H_Data, Year <="2002" )
H_data2 <- as.data.frame(H_data2)
dim(H_data2)
## [1] 107 9
H_data2 %>% filter(Year == "1998") %>% ggplot(aes(x=Country, y = GDP)) +
geom_point(color="red", size=3) + theme(axis.text.x = element_text(angle = 90, hjust = 1))+ylab("1998 GDP per capita")
H_data2 %>% filter(Year == "2002") %>% ggplot(aes(x=Country, y = GDP)) +
geom_point(color="red", size=3) + theme(axis.text.x = element_text(angle = 90, hjust = 1))+ylab("2002 GDP per capita")
#creating a dataframe for just the country and gold medal counts - for both 1998 and 2002
H_data2_df <- aggregate(cbind(MedalsCount= Medal) ~Country,
data=H_data2,
FUN=function(x){NROW(x)})
pander(H_data2_df)
| Country | MedalsCount |
|---|---|
| Australia | 2 |
| Austria | 6 |
| Canada | 13 |
| China | 2 |
| Czech Republic | 2 |
| Finland | 6 |
| France | 6 |
| Great Britain | 1 |
| Italy | 6 |
| Japan | 5 |
| Korea | 5 |
| Norway | 23 |
| Russian Fed. | 14 |
| United States | 16 |
#making a year dataframe for 1998
Y1998 <-filter(H_Data, Year=="1998")
head(Y1998)
## Year Sport Discipline NOC Event Event.gender Medal
## 1 1998 Skiing Alpine Skiing AUT alpine combined M Gold
## 2 1998 Skiing Alpine Skiing AUT giant slalom M Gold
## 3 1998 Skiing Alpine Skiing AUT super-G M Gold
## 4 1998 Bobsleigh Bobsleigh CAN two-man M Gold
## 5 1998 Curling Curling CAN curling W Gold
## 6 1998 Skating Short Track S. CAN 5000m relay M Gold
## Country GDP
## 1 Austria 27399.1
## 2 Austria 27399.1
## 3 Austria 27399.1
## 4 Canada 20974.4
## 5 Canada 20974.4
## 6 Canada 20974.4
#Creating a dataframe that shows Country, Medal, GDP all by the Gold Medals for 1998
h_gold98_df <- aggregate(cbind(MedalsCount= Medal) ~Country,
data=Y1998,
FUN=function(x){NROW(x)})
pander(h_gold98_df)
| Country | MedalsCount |
|---|---|
| Austria | 3 |
| Canada | 6 |
| Czech Republic | 1 |
| Finland | 2 |
| France | 2 |
| Italy | 2 |
| Japan | 5 |
| Korea | 3 |
| Norway | 10 |
| Russian Fed. | 9 |
| United States | 6 |
#making a year dataframe for 2002
Y2002 <-filter(H_Data, Year=="2002")
head(Y2002)
## Year Sport Discipline NOC Event Event.gender Medal
## 1 2002 Skating Short Track S. AUS 1000m M Gold
## 2 2002 Skiing Freestyle Ski. AUS aerials W Gold
## 3 2002 Skiing Alpine Skiing AUT downhill M Gold
## 4 2002 Skiing Alpine Skiing AUT giant slalom M Gold
## 5 2002 Skiing Cross Country S AUT 30km mass start M Gold
## 6 2002 Ice Hockey Ice Hockey CAN ice hockey M Gold
## Country GDP
## 1 Australia 21635.9
## 2 Australia 21635.9
## 3 Austria 26482.8
## 4 Austria 26482.8
## 5 Austria 26482.8
## 6 Canada 24208.8
#Creating a dataframe that shows Country, Medal, GDP all by the Gold Medals for 2002
h_gold02_df <- aggregate(cbind(MedalsCount= Medal) ~Country,
data=Y2002,
FUN=function(x){NROW(x)})
pander(h_gold02_df)
| Country | MedalsCount |
|---|---|
| Australia | 2 |
| Austria | 3 |
| Canada | 7 |
| China | 2 |
| Czech Republic | 1 |
| Finland | 4 |
| France | 4 |
| Great Britain | 1 |
| Italy | 4 |
| Korea | 2 |
| Norway | 13 |
| Russian Fed. | 5 |
| United States | 10 |
year <- H_data2_df$Year
bar3 <- ggplot(h_gold98_df, aes(Country, MedalsCount))+
geom_bar(stat="identity", fill="#FF6666")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ylab("Gold Medals")
bar3
bar4 <- ggplot(h_gold02_df, aes(Country, MedalsCount))+
geom_bar(stat="identity", fill="#FF6666")+
theme(axis.text.x = element_text(angle = 90, hjust = 1))+
ylab("Gold Medals")
bar4
Combined3 <- Y1998 %>% group_by(Country) %>% select(Medal, Country, GDP)
Use3 <- Combined3 %>% distinct(GDP)
head(Use3)
## # A tibble: 6 x 2
## # Groups: Country [6]
## Country GDP
## <chr> <dbl>
## 1 Austria 27399.1
## 2 Canada 20974.4
## 3 Czech Republic 6453.4
## 4 Finland 26054.2
## 5 France 25949.8
## 6 Italy 22282.2
h_gold98_df2 <- inner_join(Use3,h_gold98_df,by="Country")
head(h_gold98_df2)
## # A tibble: 6 x 3
## # Groups: Country [6]
## Country GDP MedalsCount
## <chr> <dbl> <int>
## 1 Austria 27399.1 3
## 2 Canada 20974.4 6
## 3 Czech Republic 6453.4 1
## 4 Finland 26054.2 2
## 5 France 25949.8 2
## 6 Italy 22282.2 2
Thou_98 <- mutate(h_gold98_df2, GDP = GDP/1000)
head(Thou_98)
## # A tibble: 6 x 3
## # Groups: Country [6]
## Country GDP MedalsCount
## <chr> <dbl> <int>
## 1 Austria 27.3991 3
## 2 Canada 20.9744 6
## 3 Czech Republic 6.4534 1
## 4 Finland 26.0542 2
## 5 France 25.9498 2
## 6 Italy 22.2822 2
Gold_98 <- ggplot(Thou_98, aes(Country, GDP))+
geom_point()+
labs(x = "Country", y="1998 GDP per capita\n(in thousands)", color="Gold Medals", fill="MedalsCount")+
theme(text = element_text(size=7),
axis.text.x = element_text(angle=90, hjust=1))+
geom_point(aes(color="", y=MedalsCount), size=5)+
geom_text(aes(label=MedalsCount, y = MedalsCount),vjust=0,color="black", size=2)
Gold_98_2 <- Gold_98 + scale_y_continuous(sec.axis=sec_axis(~.*1, name="Number of Gold Medals"), limits=c(0,60))
Gold_98_2
Figure 9 shows us that in 1998, there didn’t seem to be a strong relationship between GDP per capita and the number of gold medals a country received. Although, just like in 2014 Norway and Russia are the top gold medal winners and Norway has the highest GDP per capita. Although the data appears to follow a similar up and down trend the relationship doesn’t seem strong.
Gold_98_3 <- ggplot(Thou_98, aes(x=GDP, y=MedalsCount))+
geom_point()+
labs(x = "GDP per capita\n(in thousands)", y="Number of Gold Medals", color="", fill="MedalsCount")+
ylim(0,15)
Gold_98_3
After revising Figure 9, to remove the country label/axis, it becomes more clear that there is a relationship, although it does not seem like a strong relationship, between GDP per capita and the number of gold medals a country won in the 1998 Winter Olympics. Especially after a certain level of GDP per capita, around $26,000.
Combined4 <- Y2002 %>% group_by(Country) %>% select(Medal, Country, GDP)
Use4 <- Combined3 %>% distinct(GDP)
head(Use3)
## # A tibble: 6 x 2
## # Groups: Country [6]
## Country GDP
## <chr> <dbl>
## 1 Austria 27399.1
## 2 Canada 20974.4
## 3 Czech Republic 6453.4
## 4 Finland 26054.2
## 5 France 25949.8
## 6 Italy 22282.2
h_gold02_df2 <- inner_join(Use4,h_gold02_df,by="Country")
head(h_gold02_df2)
## # A tibble: 6 x 3
## # Groups: Country [6]
## Country GDP MedalsCount
## <chr> <dbl> <int>
## 1 Austria 27399.1 3
## 2 Canada 20974.4 7
## 3 Czech Republic 6453.4 1
## 4 Finland 26054.2 4
## 5 France 25949.8 4
## 6 Italy 22282.2 4
Thou_02 <- mutate(h_gold02_df2, GDP = GDP/1000)
head(Thou_02)
## # A tibble: 6 x 3
## # Groups: Country [6]
## Country GDP MedalsCount
## <chr> <dbl> <int>
## 1 Austria 27.3991 3
## 2 Canada 20.9744 7
## 3 Czech Republic 6.4534 1
## 4 Finland 26.0542 4
## 5 France 25.9498 4
## 6 Italy 22.2822 4
Gold_02 <- ggplot(Thou_02, aes(Country, GDP))+
geom_point()+
labs(x = "Country", y="2002 GDP per capita\n(in thousands)", color="Gold Medals", fill="MedalsCount")+
theme(text = element_text(size=7),
axis.text.x = element_text(angle=90, hjust=1))+
geom_point(aes(color="", y=MedalsCount), size=5)+
geom_text(aes(label=MedalsCount, y = MedalsCount),vjust=0,color="black", size=2)
Gold_02_2 <- Gold_02 + scale_y_continuous(sec.axis=sec_axis(~.*1, name="Number of Gold Medals"), limits=c(0,60))
Gold_02_2
Figure 10, which shows Winter Olympics 2002 gold medals by country and GDP per capita for those countries, allows us to see that there is once again a similarity in the pattern of data points going up and down as GDP per capita goes down the number of gold medals seems to go down as well, but this relatinship is not a strong one.
Gold_02_3 <- ggplot(Thou_02, aes(x=GDP, y=MedalsCount))+
geom_point()+
labs(x = "GDP per capita\n(in thousands)", y="Number of Gold Medals", color="", fill="MedalsCount")+
ylim(0,15)
Gold_02_3
Similarly to the 1998 data, after revising Figure 10, to remove the country labels/axis, it becomes much more clear that there is a relationship, although again this relationship does not seem strong based on the data used between GDP per capita and the number of gold medals a country won in the 2002 Winter Olympics.
Overall, the conclusion for looking at GDP per capita and the number of gold medals a country received for the Winter Olympics for 1998, 2002, and 2014 was inconclusive. If we had to use just the data explored in this assignment it indicates there is not a strong relationship between country GDP per capita and the number of gold medals a country takes home in the Winter Olympics. As always, it would be ideal to have more data for more years to look at the long-term trends to see if there is ever a relationship between these two variables. Additionally, maybe keeping all medals and not just gold would help develop the broader picture of if GDP per capita can be an indicator of winning countries in the Winter Olympics. There is likely another variable that may be able to tell a better story about what drives a country’s success in the Winter Olympic games, perhaps climate or geographic location would be of interest to see if athletes from countries with colder climates would outperform athletes from countries with warmer climates.
Even after revisions, my prior conclusion above remains.