#Loading necessary libraries:
library(ggplot2)
library(knitr)

library(plyr)
library(tidyr)
library(ggvis)
library(stringr)
library(pander)
library(dplyr)

Winter Olympics 2014 a Look at Who Wins More Gold Medals

In order to tell an adequate story about the 2014 Winter Olympics, many different datasets needed to be discovered and tidied. The Olympic dataset was provided to me. Much of the economic, demographic, geographic, and weather data was more difficult to locate; the economic information came from the , the

Many of the datasets needed to be manipulated both in excel and then again in R. Some country names needed to be updated to keep consistency and to have a variable to base dataset merging on, the country names were all modified to match what was in the Olympics dataset.

Variables:

GDP - Came from World Bank Data GDP per capita is gross domestic product divided by midyear population. GDP is the sum of gross value added by all resident producers in the economy plus any product taxes and minus any subsidies not included in the value of the products. It is calculated without making deductions for depreciation of fabricated assets or for depletion and degradation of natural resources. Data are in current U.S. dollars. Data available was for 1990, 2000, and 2008 through 2017.

Total population - Came from World Bank Data and they are based on the de facto definition of population, which counts all residents regardless of legal status or citizenship. The values shown are midyear estimates. Data available is annual midyear estimates from 2000 through 2015.

Geographical Location Came from Internet World Stats and categorized the geographic location of each country that paritcipated in the 2014 Winter Olympics. However, it was easier to manually add this column into the Population dataset. As a caveat, the UK was considered a part of hte European Union in this region designation, which is fine for comparison because in 2014, and when that year’s Winter Olympics took place, they were still a part of the EU.

## [1] "C:/Users/Marissa.Valente/Documents/RStuff"
F_Data <- myfulldata

After the datasets have been given one column names in common, in this case “Country” they can be merged together into the dataset I will use to explore a number of questions and determine the story-line for the 2014 Winter Olympics. In order to help form a narrative I will pose a question to explore. After exploration of this question I will check the historical information on the Winter Olympics to either prove the answer to the following question, or disprove it, or conclude that more evidence is needed.

##Do countries with better economic status (higher GDP per capita) perform better in the Olympics?

dim(F_Data)
## [1] 295  10
F_Data <- as.data.frame(F_Data)
str(F_Data)
## 'data.frame':    295 obs. of  10 variables:
##  $ Country          : chr  "Australia" "Australia" "Australia" "Austria" ...
##  $ ï..Sport         : chr  "Freestyle Skiing" "Snowboard" "Freestyle Skiing" "Alpine Skiing" ...
##  $ Event            : chr  "Ladies' Aerials" "Ladies' Halfpipe" "Men's Aerials" "Men's Slalom" ...
##  $ Medal            : chr  "B" "S" "S" "S" ...
##  $ Athlete.Team     : chr  "Lydia LASSILA" "Torah BRIGHT" "David MORRIS" "Marcel HIRSCHER" ...
##  $ Date             : chr  "2/14/2014" "2/12/2014" "2/17/2014" "2/22/2014" ...
##  $ Country.Code     : chr  "AUS" "AUS" "AUS" "AUT" ...
##  $ X2014_GDP        : num  62328 62328 62328 51705 51705 ...
##  $ Geographic.Region: chr  "Oceania" "Oceania" "Oceania" "European Union" ...
##  $ X2014_POP        : int  23504138 23504138 23504138 8546356 8546356 8546356 8546356 8546356 8546356 8546356 ...
GDP2014 <- as.integer(F_Data$GDP2014)
names(F_Data) <- c("Country", "Sport", "Event", "Medal", "AthleteTeam", "Date", "C_Code", "GDP2014", "Region", "Pop2014")
names(F_Data)
##  [1] "Country"     "Sport"       "Event"       "Medal"       "AthleteTeam"
##  [6] "Date"        "C_Code"      "GDP2014"     "Region"      "Pop2014"

Question to explore: Do countries with better economics status win more medals in the Olympics?

Figure 1. 2014 GDP per Capita by Country

F_Data %>% ggplot(aes(x=Country, y = GDP2014)) +
  geom_point(color="red", size=3) + theme(axis.text.x = element_text(angle = 90, hjust = 1))+ylab("2014 GDP per capita") 

As we can see from Figure 1, the top 3 countries based on GDP per capita are Norway, Switzerland, and Sweden. We will see if these countries outperform the bottom 3 countries, Ukraine, China, and Belarus when it comes to how many gold medals they earn.

F_Data %>% arrange(desc(GDP2014)) %>% slice(1:10) %>% select(Country, Medal, GDP2014, Pop2014, Region)
## # A tibble: 10 x 5
##    Country Medal  GDP2014 Pop2014 Region
##      <chr> <chr>    <dbl>   <int>  <chr>
##  1  Norway     G 97199.92 5137232 Europe
##  2  Norway     G 97199.92 5137232 Europe
##  3  Norway     S 97199.92 5137232 Europe
##  4  Norway     G 97199.92 5137232 Europe
##  5  Norway     B 97199.92 5137232 Europe
##  6  Norway     B 97199.92 5137232 Europe
##  7  Norway     B 97199.92 5137232 Europe
##  8  Norway     G 97199.92 5137232 Europe
##  9  Norway     B 97199.92 5137232 Europe
## 10  Norway     B 97199.92 5137232 Europe
Gold <- filter(F_Data, Medal=="G")
head(Gold)
##   Country            Sport                    Event Medal      AthleteTeam
## 1 Austria    Alpine Skiing             Men's Slalom     G       Mario MATT
## 2 Austria    Alpine Skiing          Women's Super-G     G   Anna FENNINGER
## 3 Austria        Snowboard  Ladies' Parallel Slalom     G  Julia DUJMOVITS
## 4 Austria    Alpine Skiing           Men's Downhill     G   Matthias MAYER
## 5 Belarus         Biathlon Women 12.5 km Mass Start     G Darya DOMRACHEVA
## 6 Belarus Freestyle Skiing          Ladies' Aerials     G      Alla TSUPER
##        Date C_Code   GDP2014         Region Pop2014
## 1 2/22/2014    AUT 51704.541 European Union 8546356
## 2 2/15/2014    AUT 51704.541 European Union 8546356
## 3 2/22/2014    AUT 51704.541 European Union 8546356
## 4  2/9/2014    AUT 51704.541 European Union 8546356
## 5 2/17/2014    BLR  8318.513         Europe 9474511
## 6 2/14/2014    BLR  8318.513         Europe 9474511
#Creating a dataframe that shows Country, Medal, GDP2014 all by the Gold Medals
freq_df <- aggregate(cbind(MedalsCount= Medal) ~Country,
                     data=Gold,
                     FUN=function(x){NROW(x)})
pander(freq_df)
Country MedalsCount
Austria 4
Belarus 5
Canada 10
China 3
Czech Republic 2
Finland 1
France 4
Germany 8
Great Britain 1
Japan 1
Korea 3
Netherlands 8
Norway 11
Poland 4
Russian Fed. 13
Slovakia 1
Slovenia 2
Sweden 2
Switzerland 5
Ukraine 1
United States 9

Figure 2. Gold Medals by Country

bar <- ggplot(freq_df2, aes(Country, count))+
geom_bar(stat="identity", fill="#FF6666")+
 theme(axis.text.x = element_text(angle = 90, hjust = 1))+
   ylab("Gold Medals") 
bar

Figure 2 shows us that the country with the most gold medals is Russia, then Norway and then Canada. If we were only evaluating the top three highest gdp per capita countries, we could see that there is a relationship between gold medals and gdp per capita for only one country, Norway. However, let’s look across both variables in one graph to see if there seems to be more of a relatioship than just one country.

Figure 3. GDP per capita by Type of Medal and Country

GDP_Gold <- ggplot(F_Data, aes(x=Country, Medal, color=as.numeric(GDP2014)))+
  geom_point(size=5)+
  labs(x = "Country", y="Type of Medal", color="GDP per capita", fill="GDP")+   
  theme(axis.text.x = element_text(angle = 90, hjust = 1))

GDP_Gold

Figure 3 does not seem to provide a clear picture as to the relationship between GDP per capita and type of medal.

Gold_Thou <- mutate(Gold, NGDP2014 = GDP2014/1000)
head(Gold_Thou)
##   Country            Sport                    Event Medal      AthleteTeam
## 1 Austria    Alpine Skiing             Men's Slalom     G       Mario MATT
## 2 Austria    Alpine Skiing          Women's Super-G     G   Anna FENNINGER
## 3 Austria        Snowboard  Ladies' Parallel Slalom     G  Julia DUJMOVITS
## 4 Austria    Alpine Skiing           Men's Downhill     G   Matthias MAYER
## 5 Belarus         Biathlon Women 12.5 km Mass Start     G Darya DOMRACHEVA
## 6 Belarus Freestyle Skiing          Ladies' Aerials     G      Alla TSUPER
##        Date C_Code   GDP2014         Region Pop2014  NGDP2014
## 1 2/22/2014    AUT 51704.541 European Union 8546356 51.704541
## 2 2/15/2014    AUT 51704.541 European Union 8546356 51.704541
## 3 2/22/2014    AUT 51704.541 European Union 8546356 51.704541
## 4  2/9/2014    AUT 51704.541 European Union 8546356 51.704541
## 5 2/17/2014    BLR  8318.513         Europe 9474511  8.318513
## 6 2/14/2014    BLR  8318.513         Europe 9474511  8.318513
#aggregating to have one GDP value per country and one aggregated gold medal value

Combined <- Gold_Thou %>% group_by(Country) %>% select(Country, Medal, Region, NGDP2014)

Use <- Combined %>% distinct(NGDP2014)
GDP_Gold_df <- inner_join(Use,freq_df,by="Country")

Figure 4. Number of Gold Medals by 2014 GDP per Capita

GG_Final <- ggplot(GDP_Gold_df, aes(Country, NGDP2014))+
  geom_point()+
  labs(x = "Country", y="2014 GDP per capita\n(in thousands)", color="MedalsCount", fill="MedalsCount")+   
  theme(text = element_text(size=7),
        axis.text.x = element_text(angle=90, hjust=1))+
  geom_point(aes(color="", y=MedalsCount), size=3)+
   geom_text(aes(label=MedalsCount, y = MedalsCount),vjust=0,color="black", size=2)


GG_Final2 <- GG_Final + scale_y_continuous(sec.axis=sec_axis(~.*1, name="Number of Gold Medals"), limits=c(0,110))
GG_Final2

Figure 4 shows us that as GDP increases, the number of gold medals doesn’t have a clear pattern that follows the trend up or deviates down. It does not seem that there is a relationship between GDP and gold medals. However, this is just GDP per capita for one year compared to one year’s worth of Winter Olympics gold medals. In order to get a clearer idea if a relationship exists we will need to look at historical data to see if any trends become clear.

Historical View of if Countries with Higher GDP per Capita Win More Gold Medals.

Before importing the Historical Olympics file, some data tidying was done in excel to include the country column as well as limiting the type of medal to gold. Additionally, the 2010 Winter Olympics medal by country information came from Wikipedia and was formatted into excel and included in the Historical Olympics data file. As a caveat, Ukraine did participate at the 2010 Winter Olympics, they just did not win any medals, because of this we will not use 2010 as a data source. Lastly, since the World Bank doesn’t provide GDP per capita for all countries and all years that the Winter Olympics took place, another data source was used Comstat Datahub.For simplicity all the dollar amounts for the historical comparisons across 1998, 2002, and 2014 are in current ($2018) U.S. dollars.

H_Data <- as.data.frame(H_Data)
dim(H_Data)
## [1] 107   9
H_data2 <- filter(H_Data, Year <="2002" )

H_data2 <- as.data.frame(H_data2)
dim(H_data2)
## [1] 107   9

Figure 5. 1998 GDP per Capita by Country

H_data2 %>% filter(Year == "1998") %>% ggplot(aes(x=Country, y = GDP)) +
  geom_point(color="red", size=3) + theme(axis.text.x = element_text(angle = 90, hjust = 1))+ylab("1998 GDP per capita") 

Figure 6. 2002 GDP per Capita by Country

H_data2 %>% filter(Year == "2002") %>% ggplot(aes(x=Country, y = GDP)) +
  geom_point(color="red", size=3) + theme(axis.text.x = element_text(angle = 90, hjust = 1))+ylab("2002 GDP per capita") 

#creating a dataframe for just the country and gold medal counts - for both 1998 and 2002
H_data2_df <- aggregate(cbind(MedalsCount= Medal) ~Country,
                     data=H_data2,
                     FUN=function(x){NROW(x)})


pander(H_data2_df)
Country MedalsCount
Australia 2
Austria 6
Canada 13
China 2
Czech Republic 2
Finland 6
France 6
Great Britain 1
Italy 6
Japan 5
Korea 5
Norway 23
Russian Fed. 14
United States 16
#making a year dataframe for 1998

Y1998 <-filter(H_Data, Year=="1998")
head(Y1998)
##   Year     Sport     Discipline NOC           Event Event.gender Medal
## 1 1998    Skiing  Alpine Skiing AUT alpine combined            M  Gold
## 2 1998    Skiing  Alpine Skiing AUT    giant slalom            M  Gold
## 3 1998    Skiing  Alpine Skiing AUT         super-G            M  Gold
## 4 1998 Bobsleigh      Bobsleigh CAN         two-man            M  Gold
## 5 1998   Curling        Curling CAN         curling            W  Gold
## 6 1998   Skating Short Track S. CAN     5000m relay            M  Gold
##   Country     GDP
## 1 Austria 27399.1
## 2 Austria 27399.1
## 3 Austria 27399.1
## 4  Canada 20974.4
## 5  Canada 20974.4
## 6  Canada 20974.4
#Creating a dataframe that shows Country, Medal, GDP all by the Gold Medals for 1998
h_gold98_df <- aggregate(cbind(MedalsCount= Medal) ~Country,
                     data=Y1998,
                     FUN=function(x){NROW(x)})
pander(h_gold98_df)
Country MedalsCount
Austria 3
Canada 6
Czech Republic 1
Finland 2
France 2
Italy 2
Japan 5
Korea 3
Norway 10
Russian Fed. 9
United States 6
#making a year dataframe for 2002

Y2002 <-filter(H_Data, Year=="2002")
head(Y2002)
##   Year      Sport      Discipline NOC           Event Event.gender Medal
## 1 2002    Skating  Short Track S. AUS           1000m            M  Gold
## 2 2002     Skiing  Freestyle Ski. AUS         aerials            W  Gold
## 3 2002     Skiing   Alpine Skiing AUT        downhill            M  Gold
## 4 2002     Skiing   Alpine Skiing AUT    giant slalom            M  Gold
## 5 2002     Skiing Cross Country S AUT 30km mass start            M  Gold
## 6 2002 Ice Hockey      Ice Hockey CAN      ice hockey            M  Gold
##     Country     GDP
## 1 Australia 21635.9
## 2 Australia 21635.9
## 3   Austria 26482.8
## 4   Austria 26482.8
## 5   Austria 26482.8
## 6    Canada 24208.8
#Creating a dataframe that shows Country, Medal, GDP all by the Gold Medals for 2002
h_gold02_df <- aggregate(cbind(MedalsCount= Medal) ~Country,
                     data=Y2002,
                     FUN=function(x){NROW(x)})
pander(h_gold02_df)
Country MedalsCount
Australia 2
Austria 3
Canada 7
China 2
Czech Republic 1
Finland 4
France 4
Great Britain 1
Italy 4
Korea 2
Norway 13
Russian Fed. 5
United States 10
year <- H_data2_df$Year

Figure 7. Gold Medals by Country in 1998

bar3 <- ggplot(h_gold98_df, aes(Country, MedalsCount))+
geom_bar(stat="identity", fill="#FF6666")+
 theme(axis.text.x = element_text(angle = 90, hjust = 1))+
   ylab("Gold Medals") 
bar3

Figure 8. Gold Medals by Country in 2002

bar4 <- ggplot(h_gold02_df, aes(Country, MedalsCount))+
geom_bar(stat="identity", fill="#FF6666")+
 theme(axis.text.x = element_text(angle = 90, hjust = 1))+
   ylab("Gold Medals") 
bar4

Combined3 <- Y1998 %>% group_by(Country) %>% select(Medal, Country, GDP)

Use3 <- Combined3 %>% distinct(GDP) 
head(Use3)
## # A tibble: 6 x 2
## # Groups:   Country [6]
##          Country     GDP
##            <chr>   <dbl>
## 1        Austria 27399.1
## 2         Canada 20974.4
## 3 Czech Republic  6453.4
## 4        Finland 26054.2
## 5         France 25949.8
## 6          Italy 22282.2
h_gold98_df2 <- inner_join(Use3,h_gold98_df,by="Country")
head(h_gold98_df2)
## # A tibble: 6 x 3
## # Groups:   Country [6]
##          Country     GDP MedalsCount
##            <chr>   <dbl>       <int>
## 1        Austria 27399.1           3
## 2         Canada 20974.4           6
## 3 Czech Republic  6453.4           1
## 4        Finland 26054.2           2
## 5         France 25949.8           2
## 6          Italy 22282.2           2
Thou_98 <- mutate(h_gold98_df2, GDP = GDP/1000) 
head(Thou_98)
## # A tibble: 6 x 3
## # Groups:   Country [6]
##          Country     GDP MedalsCount
##            <chr>   <dbl>       <int>
## 1        Austria 27.3991           3
## 2         Canada 20.9744           6
## 3 Czech Republic  6.4534           1
## 4        Finland 26.0542           2
## 5         France 25.9498           2
## 6          Italy 22.2822           2

Figure 9. Number of Gold Medals by 1998 GDP per Capita

Gold_98 <- ggplot(Thou_98, aes(Country, GDP))+
  geom_point()+
  labs(x = "Country", y="1998 GDP per capita\n(in thousands)", color="Gold Medals", fill="MedalsCount")+   
  theme(text = element_text(size=7),
        axis.text.x = element_text(angle=90, hjust=1))+
  geom_point(aes(color="", y=MedalsCount), size=5)+
   geom_text(aes(label=MedalsCount, y = MedalsCount),vjust=0,color="black", size=2)


Gold_98_2 <- Gold_98 + scale_y_continuous(sec.axis=sec_axis(~.*1, name="Number of Gold Medals"), limits=c(0,60))
Gold_98_2

Figure 9 shows us that in 1998, there didn’t seem to be a strong relationship between GDP per capita and the number of gold medals a country received. Although, just like in 2014 Norway and Russia are the top gold medal winners and Norway has the highest GDP per capita. Although the data appears to follow a similar up and down trend the relationship doesn’t seem strong.

Figure 9. Revised.Number of Gold Medals by 1998 GDP per Capita

Gold_98_3 <- ggplot(Thou_98, aes(x=GDP, y=MedalsCount))+
  geom_point()+
  labs(x = "GDP per capita\n(in thousands)", y="Number of Gold Medals", color="", fill="MedalsCount")+
  ylim(0,15)

Gold_98_3

After revising Figure 9, to remove the country label/axis, it becomes more clear that there is a relationship, although it does not seem like a strong relationship, between GDP per capita and the number of gold medals a country won in the 1998 Winter Olympics. Especially after a certain level of GDP per capita, around $26,000.

Combined4 <- Y2002 %>% group_by(Country) %>% select(Medal, Country, GDP)

Use4 <- Combined3 %>% distinct(GDP) 
head(Use3)
## # A tibble: 6 x 2
## # Groups:   Country [6]
##          Country     GDP
##            <chr>   <dbl>
## 1        Austria 27399.1
## 2         Canada 20974.4
## 3 Czech Republic  6453.4
## 4        Finland 26054.2
## 5         France 25949.8
## 6          Italy 22282.2
h_gold02_df2 <- inner_join(Use4,h_gold02_df,by="Country")
head(h_gold02_df2)
## # A tibble: 6 x 3
## # Groups:   Country [6]
##          Country     GDP MedalsCount
##            <chr>   <dbl>       <int>
## 1        Austria 27399.1           3
## 2         Canada 20974.4           7
## 3 Czech Republic  6453.4           1
## 4        Finland 26054.2           4
## 5         France 25949.8           4
## 6          Italy 22282.2           4
Thou_02 <- mutate(h_gold02_df2, GDP = GDP/1000) 
head(Thou_02)
## # A tibble: 6 x 3
## # Groups:   Country [6]
##          Country     GDP MedalsCount
##            <chr>   <dbl>       <int>
## 1        Austria 27.3991           3
## 2         Canada 20.9744           7
## 3 Czech Republic  6.4534           1
## 4        Finland 26.0542           4
## 5         France 25.9498           4
## 6          Italy 22.2822           4

Figure 10. Number of Gold Medals by 2002 GDP per Capita

Gold_02 <- ggplot(Thou_02, aes(Country, GDP))+
  geom_point()+
  labs(x = "Country", y="2002 GDP per capita\n(in thousands)", color="Gold Medals", fill="MedalsCount")+   
  theme(text = element_text(size=7),
        axis.text.x = element_text(angle=90, hjust=1))+
  geom_point(aes(color="", y=MedalsCount), size=5)+
   geom_text(aes(label=MedalsCount, y = MedalsCount),vjust=0,color="black", size=2)


Gold_02_2 <- Gold_02 + scale_y_continuous(sec.axis=sec_axis(~.*1, name="Number of Gold Medals"), limits=c(0,60))
Gold_02_2

Figure 10, which shows Winter Olympics 2002 gold medals by country and GDP per capita for those countries, allows us to see that there is once again a similarity in the pattern of data points going up and down as GDP per capita goes down the number of gold medals seems to go down as well, but this relatinship is not a strong one.

Figure 10 Revised.Number of Gold Medals by 2002 GDP per Capita

Gold_02_3 <- ggplot(Thou_02, aes(x=GDP, y=MedalsCount))+
  geom_point()+
  labs(x = "GDP per capita\n(in thousands)", y="Number of Gold Medals", color="", fill="MedalsCount")+
  ylim(0,15)

Gold_02_3

Similarly to the 1998 data, after revising Figure 10, to remove the country labels/axis, it becomes much more clear that there is a relationship, although again this relationship does not seem strong based on the data used between GDP per capita and the number of gold medals a country won in the 2002 Winter Olympics.

Conclusion:

Overall, the conclusion for looking at GDP per capita and the number of gold medals a country received for the Winter Olympics for 1998, 2002, and 2014 was inconclusive. If we had to use just the data explored in this assignment it indicates there is not a strong relationship between country GDP per capita and the number of gold medals a country takes home in the Winter Olympics. As always, it would be ideal to have more data for more years to look at the long-term trends to see if there is ever a relationship between these two variables. Additionally, maybe keeping all medals and not just gold would help develop the broader picture of if GDP per capita can be an indicator of winning countries in the Winter Olympics. There is likely another variable that may be able to tell a better story about what drives a country’s success in the Winter Olympic games, perhaps climate or geographic location would be of interest to see if athletes from countries with colder climates would outperform athletes from countries with warmer climates.

Even after revisions, my prior conclusion above remains.