You will be working on the olympic_gymnasts dataset. Please DO NOT change the code below:
olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
olympic_gymnasts <- olympics %>%
filter(!is.na(age)) %>% # only keep athletes with known age
filter(sport == "Gymnastics") %>% # keep only gymnasts
mutate(
medalist = case_when( # add column for success in medaling
is.na(medal) ~ FALSE, # NA values go to FALSE
!is.na(medal) ~ TRUE # non-NA values (Gold, Silver, Bronze) go to TRUE
)
)
More information about the dataset can be found at
https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.
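# Caution: this selects from the full olympics data and keeps the raw medal column,
# whereas the question asks for olympic_gymnasts and medalist.
# The output printed below reflects the code as written.
df <- olympics %>%
select(name, sex, age, team, year, medal)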
print(df)
## # A tibble: 271,116 × 6
## name sex age team year medal
## <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 A Dijiang M 24 China 1992 <NA>
## 2 A Lamusi M 23 China 2012 <NA>
## 3 Gunnar Nielsen Aaby M 24 Denmark 1920 <NA>
## 4 Edgar Lindenau Aabye M 34 Denmark/Sweden 1900 Gold
## 5 Christine Jacoba Aaftink F 21 Netherlands 1988 <NA>
## 6 Christine Jacoba Aaftink F 21 Netherlands 1988 <NA>
## 7 Christine Jacoba Aaftink F 25 Netherlands 1992 <NA>
## 8 Christine Jacoba Aaftink F 25 Netherlands 1992 <NA>
## 9 Christine Jacoba Aaftink F 27 Netherlands 1994 <NA>
## 10 Christine Jacoba Aaftink F 27 Netherlands 1994 <NA>
## # ℹ 271,106 more rows
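A minimal sketch of the selection as Question 1 words it, starting from a gymnast-only table and keeping medalist rather than medal. The tibble `toy_gymnasts` and every value in it are made up for illustration and only stand in for olympic_gymnasts; `df_sketch` is likewise a hypothetical name, and no output from the real dataset is implied.

```r
library(dplyr)

# Toy stand-in for olympic_gymnasts (all values are invented for illustration)
toy_gymnasts <- tibble(
  name     = c("Athlete A", "Athlete B", "Athlete C"),
  sex      = c("F", "M", "F"),
  age      = c(19, 24, 21),
  team     = c("Team X", "Team Y", "Team X"),
  year     = c(2008, 2012, 2016),
  sport    = "Gymnastics",
  medal    = c("Gold", NA, NA),
  medalist = c(TRUE, FALSE, FALSE)
)

# Keep only the six columns named in the question
df_sketch <- toy_gymnasts %>%
  select(name, sex, age, team, year, medalist)

print(df_sketch)
```

On the real data, the same `select()` call applied to olympic_gymnasts would presumably give a much smaller table than the one printed above, since only gymnasts with a known age remain.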
Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.
df2 <- df %>%
filter(year == 2008 | year == 2012 | year == 2016)
print(df2)
## # A tibble: 40,210 × 6
## name sex age team year medal
## <chr> <chr> <dbl> <chr> <dbl> <chr>
## 1 A Lamusi M 23 China 2012 <NA>
## 2 Ragnhild Margrethe Aamodt F 27 Norway 2008 Gold
## 3 Andreea Aanei F 22 Romania 2016 <NA>
## 4 Jamale (Djamel-) Aarrass (Ahrass-) M 30 France 2012 <NA>
## 5 Abdelhak Aatakni M 24 Morocco 2012 <NA>
## 6 Moonika Aava F 28 Estonia 2008 <NA>
## 7 Nstor Abad Sanjun M 23 Spain 2016 <NA>
## 8 Nstor Abad Sanjun M 23 Spain 2016 <NA>
## 9 Nstor Abad Sanjun M 23 Spain 2016 <NA>
## 10 Nstor Abad Sanjun M 23 Spain 2016 <NA>
## # ℹ 40,200 more rows
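The chained `==` comparisons can also be written with `%in%`, which tends to stay readable as the list of years grows. This is only a sketch on an invented tibble; `toy`, `target_years`, and `toy_subset` are hypothetical names that do not appear in the assignment.

```r
library(dplyr)

# Invented data: one row per year, including a year that should be dropped
toy <- tibble(
  name = c("A", "B", "C", "D"),
  year = c(2004, 2008, 2012, 2016)
)

# Years to keep, stored once so the filter condition stays short
target_years <- c(2008, 2012, 2016)

# Equivalent to year == 2008 | year == 2012 | year == 2016
toy_subset <- toy %>%
  filter(year %in% target_years)

print(toy_subset)
```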
Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean of the age.
by_year <- group_by(df2, year)
summarise(by_year, Mean = mean(age, na.rm = TRUE))
## # A tibble: 3 × 2
## year Mean
## <dbl> <dbl>
## 1 2008 25.7
## 2 2012 26.0
## 3 2016 26.2
Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean of the age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)
oly_year <- olympic_gymnasts %>%
group_by(year) %>%
summarise(age_mean = mean(age))
print(oly_year)
## # A tibble: 29 × 2
## year age_mean
## <dbl> <dbl>
## 1 1896 24.3
## 2 1900 22.2
## 3 1904 25.1
## 4 1906 24.7
## 5 1908 23.2
## 6 1912 24.2
## 7 1920 26.7
## 8 1924 27.6
## 9 1928 25.6
## 10 1932 23.9
## # ℹ 19 more rows
min(oly_year$age_mean)
## [1] 19.86606
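Calling `min()` on an entire numeric data frame scans every column, so it returns the average age here only because ages happen to be smaller than years. Naming the column is safer, and `slice_min()` also shows which year the minimum belongs to. The sketch below uses a small hypothetical tibble, `toy_oly_year`, whose three rows copy the first rows printed above; it does not reproduce the full result.

```r
library(dplyr)

# Hypothetical three-row stand-in for oly_year
toy_oly_year <- tibble(
  year     = c(1896, 1900, 1904),
  age_mean = c(24.3, 22.2, 25.1)
)

# Minimum of the age column only
lowest_mean <- min(toy_oly_year$age_mean)

# Row (year and mean) where the average age is lowest
youngest_year <- toy_oly_year %>%
  slice_min(age_mean, n = 1)

print(lowest_mean)
print(youngest_year)
```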
Question 5: This question is open ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.
The question is: Which sport has, on average, the oldest gold medal winners?
# I want to find the sports whose gold medal winners have the highest mean age. To do this, I filtered to gold medals only, selected age and sport, grouped by sport, and computed the mean age. I then arranged the result in descending order and kept the top 5.
gold_winners <- olympics %>%
filter(medal == "Gold") %>%
select(age, sport) %>%
group_by(sport) %>%
summarise(mean_age = mean(age, na.rm = TRUE)) %>%
arrange(desc(mean_age)) %>%
head(5)
gold_winners
## # A tibble: 5 × 2
## sport mean_age
## <chr> <dbl>
## 1 Roque 64
## 2 Art Competitions 41.2
## 3 Alpinism 38.8
## 4 Polo 36.0
## 5 Equestrianism 35.3
Discussion: Your discussion of results here.
In my analysis, the five sports with the oldest gold medal winners on average are Roque, Art Competitions, Alpinism, Polo, and Equestrianism. As I expected when first thinking about the possible answers, the sports with the highest mean ages tend to be less athletically demanding or to depend on another's athleticism, as in equestrianism.
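One check that may help when reading a ranking like this is how many known ages sit behind each mean, since a sport with very few winners can top the list on the strength of a single value. The sketch below adds such a count next to the mean. It runs on invented data: `toy_gold`, `toy_summary`, the column `n_known`, and all values are hypothetical and say nothing about the real counts in the olympics data.

```r
library(dplyr)

# Invented gold medal rows: Sport A has a single, unusually old winner
toy_gold <- tibble(
  sport = c("Sport A", "Sport B", "Sport B", "Sport B", "Sport C", "Sport C"),
  age   = c(64, 30, 35, NA, 22, 26)
)

toy_summary <- toy_gold %>%
  group_by(sport) %>%
  summarise(
    mean_age = mean(age, na.rm = TRUE),  # mean over known ages only
    n_known  = sum(!is.na(age))          # number of ages behind that mean
  ) %>%
  arrange(desc(mean_age))

print(toy_summary)
```

In this toy example the top-ranked sport rests on one observation, which is the kind of pattern the count column is meant to expose.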