Prepare the Athlete Events data for analysis.
# Import athletes_events csv file into R and the read the first 6 lines using the head() function
athletes<-read.csv("athlete_events.csv")
head(athletes)
## ID Name Sex Age Height Weight Team NOC
## 1 1 A Dijiang M 24 180 80 China CHN
## 2 2 A Lamusi M 23 170 60 China CHN
## 3 3 Gunnar Nielsen Aaby M 24 NA NA Denmark DEN
## 4 4 Edgar Lindenau Aabye M 34 NA NA Denmark/Sweden DEN
## 5 5 Christine Jacoba Aaftink F 21 185 82 Netherlands NED
## 6 5 Christine Jacoba Aaftink F 21 185 82 Netherlands NED
## Games Year Season City Sport
## 1 1992 Summer 1992 Summer Barcelona Basketball
## 2 2012 Summer 2012 Summer London Judo
## 3 1920 Summer 1920 Summer Antwerpen Football
## 4 1900 Summer 1900 Summer Paris Tug-Of-War
## 5 1988 Winter 1988 Winter Calgary Speed Skating
## 6 1988 Winter 1988 Winter Calgary Speed Skating
## Event Medal
## 1 Basketball Men's Basketball <NA>
## 2 Judo Men's Extra-Lightweight <NA>
## 3 Football Men's Football <NA>
## 4 Tug-Of-War Men's Tug-Of-War Gold
## 5 Speed Skating Women's 500 metres <NA>
## 6 Speed Skating Women's 1,000 metres <NA>
# Convert Year from the atlhete_events.csv file to a factor
athletes$Year <-factor(athletes$Year)
# remove all the NA from the Medals column only
athletes <- subset(athletes, !is.na(Medal) & Medal != "")
Summary: Data Preparation - All the Athlete Events data from the Olympics from Summer of 1896 to Summer or 2016 has been loaded for future Analysis. In reviewing the Olympic data, the year had to be converted to a factor. Since the assigned analysis task are focused on medals all NA values in the medals column were removed.
Provide analysis based on the assigned task.
# Make Team USA a subset() and assign to USA.
USA <- subset(athletes, Team == "United States")
head(USA)
## ID Name Sex Age Height Weight Team NOC
## 187 84 Stephen Anthony Abas M 26 165 55 United States USA
## 279 145 Jeremy Abbott M 28 175 70 United States USA
## 284 150 Margaret Ives Abbott (-Dunne) F 23 NA NA United States USA
## 287 153 Monica Cecilia Abbott F 23 191 88 United States USA
## 312 165 Nia Nicole Abdallah F 20 175 56 United States USA
## 610 351 Julius Shareef Abdur-Rahim M 23 202 104 United States USA
## Games Year Season City Sport
## 187 2004 Summer 2004 Summer Athina Wrestling
## 279 2014 Winter 2014 Winter Sochi Figure Skating
## 284 1900 Summer 1900 Summer Paris Golf
## 287 2008 Summer 2008 Summer Beijing Softball
## 312 2004 Summer 2004 Summer Athina Taekwondo
## 610 2000 Summer 2000 Summer Sydney Basketball
## Event Medal
## 187 Wrestling Men's Featherweight, Freestyle Silver
## 279 Figure Skating Mixed Team Bronze
## 284 Golf Women's Individual Gold
## 287 Softball Women's Softball Silver
## 312 Taekwondo Women's Featherweight Silver
## 610 Basketball Men's Basketball Gold
Summary: In order to focus on the next 2 analysis task, a subset of the athlete_events data was created with just the United States as the Team.
# Choose Cran Mirror, Install dplyr package needed to group by, summarize and arrange
chooseCRANmirror(ind = 69)
install.packages("dplyr")
## Installing package into 'C:/Users/Joe/AppData/Local/R/win-library/4.2'
## (as 'lib' is unspecified)
## package 'dplyr' successfully unpacked and MD5 sums checked
## Warning: cannot remove prior installation of package 'dplyr'
## Warning in file.copy(savedcopy, lib, recursive = TRUE): problem copying
## C:\Users\Joe\AppData\Local\R\win-library\4.2\00LOCK\dplyr\libs\x64\dplyr.dll to
## C:\Users\Joe\AppData\Local\R\win-library\4.2\dplyr\libs\x64\dplyr.dll:
## Permission denied
## Warning: restored 'dplyr'
##
## The downloaded binary packages are in
## C:\Users\Joe\AppData\Local\Temp\Rtmpw9Ki25\downloaded_packages
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
# Using the USA data set group by Sports, Summarize the Total Medal count for Each Sport and Arrange Total Medal Count in descending order.
USA_Sport <- USA %>%
group_by(Sport) %>%
summarize(Total_Medal_Count = n()) %>%
arrange(desc(Total_Medal_Count))
# View top medal count per Sport.
head (USA_Sport)
## # A tibble: 6 × 2
## Sport Total_Medal_Count
## <chr> <int>
## 1 Athletics 1071
## 2 Swimming 1066
## 3 Basketball 341
## 4 Rowing 333
## 5 Ice Hockey 276
## 6 Shooting 193
Summary: Based on the analysis you can see that Top Medal Count are in Athletics and Swimming for the United States
# Using the USA data set make a subset for USA Swimming
usa_swim <- subset(USA, Sport == "Swimming")
# Team USA data that is a Swimming Sport
head(usa_swim)
## ID Name Sex Age Height Weight Team NOC
## 1463 813 Edgar Holmes Adams M 36 NA NA United States USA
## 1844 1017 Nathan Ghar-Jun Adrian M 19 198 100 United States USA
## 1845 1017 Nathan Ghar-Jun Adrian M 23 198 100 United States USA
## 1846 1017 Nathan Ghar-Jun Adrian M 23 198 100 United States USA
## 1847 1017 Nathan Ghar-Jun Adrian M 23 198 100 United States USA
## 1848 1017 Nathan Ghar-Jun Adrian M 27 198 100 United States USA
## Games Year Season City Sport
## 1463 1904 Summer 1904 Summer St. Louis Swimming
## 1844 2008 Summer 2008 Summer Beijing Swimming
## 1845 2012 Summer 2012 Summer London Swimming
## 1846 2012 Summer 2012 Summer London Swimming
## 1847 2012 Summer 2012 Summer London Swimming
## 1848 2016 Summer 2016 Summer Rio de Janeiro Swimming
## Event Medal
## 1463 Swimming Men's Plunge For Distance Silver
## 1844 Swimming Men's 4 x 100 metres Freestyle Relay Gold
## 1845 Swimming Men's 100 metres Freestyle Gold
## 1846 Swimming Men's 4 x 100 metres Freestyle Relay Silver
## 1847 Swimming Men's 4 x 100 metres Medley Relay Gold
## 1848 Swimming Men's 50 metres Freestyle Bronze
Summary: The USA Swim subset displays all USA Medals in the Swimming Sport.
# List the Top 10 Medal winning countries. First you have to get the Total Medal Count by Team
country_summary <- athletes %>%
group_by(Team) %>%
summarize(Total_Medal_Count_Team = n()) %>%
ungroup()
# Using the country_summary and Total_Medal_Count_Team list the top 10 medal Winning countries. I added in descending order so I can see the true Top in order of most.
top_10_countries <- country_summary %>%
top_n(10, Total_Medal_Count_Team) %>%
arrange(desc(Total_Medal_Count_Team))
# View Top Ten medal winning countries. Go USA
top_10_countries
## # A tibble: 10 × 2
## Team Total_Medal_Count_Team
## <chr> <int>
## 1 United States 5219
## 2 Soviet Union 2451
## 3 Germany 1984
## 4 Great Britain 1673
## 5 France 1550
## 6 Italy 1527
## 7 Sweden 1434
## 8 Australia 1306
## 9 Canada 1243
## 10 Hungary 1127
Summary: Based on this analysis the United States holds the top Medal Count at 5219 and Hungry is the 10th Medal Count at 1127 Medals.
# Using dataset athletes display only the Gold Medal counts for each Team in descending order.
gold_medal_count <- athletes %>%
group_by(Team) %>%
filter(Medal == "Gold") %>%
summarize(Total_Gold_Medal_Count = n()) %>%
arrange(desc(Total_Gold_Medal_Count))
# Gold Medal Count by Team.
gold_medal_count
## # A tibble: 242 × 2
## Team Total_Gold_Medal_Count
## <chr> <int>
## 1 United States 2474
## 2 Soviet Union 1058
## 3 Germany 679
## 4 Italy 535
## 5 Great Britain 519
## 6 France 455
## 7 Sweden 451
## 8 Hungary 432
## 9 Canada 422
## 10 East Germany 369
## # ℹ 232 more rows
Summary: United States holds the top Gold Medal count at 2,474 Gold Medals and the Soviet Union comes in second with a Gold Medal count of 1,058.
# Using the athletes data sent filter out Female and year 2016 then count the Total Medals won by Women.
women_medals_won_2016 <- athletes %>%
filter(Sex == "F", Year == 2016) %>%
summarize(Total_Medals_Won_by_Women = n())
# The Total Medals Won by Women in 2016
women_medals_won_2016
## Total_Medals_Won_by_Women
## 1 969
Summary: The total number of Medals won by Women in 2016 are 969 medals.