Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

# subset dataset with only the requested columns
df <- olympic_gymnasts %>%
  select(name, sex, age, team, year, medalist) # Select columns
head(df) # preview first 6 rows

## # A tibble: 6 × 6
##   name                    sex     age team     year medalist
##   <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
## 1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
## 4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
## 6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

# Filter df for years 2008, 2012, and 2016
df2 <- df %>% 
  filter(year %in% c(2008, 2012, 2016))
head(df2) # preview first 6 rows

## # A tibble: 6 × 6
##   name              sex     age team   year medalist
##   <chr>             <chr> <dbl> <chr> <dbl> <lgl>   
## 1 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 2 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 3 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 4 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 5 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 6 Nstor Abad Sanjun M        23 Spain  2016 FALSE
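
The filtering step can be illustrated on a small example. The sketch below is a minimal illustration, assuming dplyr is installed; the toy tibble and the names toy, kept, requested, and missing_years are hypothetical and not part of the assignment data. It shows that %in% keeps rows whose year equals any listed value, and that setdiff() can confirm every requested year actually appears in the result.

```r
library(dplyr)

# Hypothetical toy data, invented for illustration (not from the Olympics dataset)
toy <- tibble(
  name = c("A", "B", "C", "D"),
  year = c(2004, 2008, 2012, 2016)
)

# Years we intend to keep
requested <- c(2008, 2012, 2016)

# %in% keeps rows whose year equals any value in the vector
kept <- toy %>%
  filter(year %in% requested)

# Requested years that are absent from the filtered result
missing_years <- setdiff(requested, unique(kept$year))
```

A value in the vector that does not occur in the data matches nothing and raises no error, so an empty missing_years is a quick sanity check that each requested year was found.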

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

# Group by year and summarize mean age
df2 %>%
  group_by(year) %>% 
  summarise(mean_age = mean(age, na.rm = TRUE))

## # A tibble: 2 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  2012     21.9
## 2  2016     22.2

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year; call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

# Group by year and find mean age for each year
oly_year <- olympic_gymnasts %>% 
  group_by(year) %>%
  summarise(mean_age = mean(age, na.rm = TRUE)) 

head(oly_year) # preview first 6 rows

## # A tibble: 6 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  1896     24.3
## 2  1900     22.2
## 3  1904     25.1
## 4  1906     24.7
## 5  1908     23.2
## 6  1912     24.2

# optional: find the year with minimum average age
oly_year %>% 
  filter(mean_age == min(mean_age))

## # A tibble: 1 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  1988     19.9
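
As an alternative to filtering on min(), dplyr's slice_min() returns the row or rows with the smallest value of a column. The sketch below is a minimal illustration, assuming dplyr 1.0.0 or later; the tibble toy_year and its values are invented for the example and are not results from the Olympics data.

```r
library(dplyr)

# Hypothetical per-year averages, invented for illustration
toy_year <- tibble(
  year = c(2008, 2012, 2016),
  mean_age = c(21.5, 20.0, 22.5)
)

# Keep the row with the smallest mean_age (ties are kept by default)
youngest <- toy_year %>%
  slice_min(mean_age, n = 1)
```

Both approaches return every tied row by default, so the choice is mostly a matter of readability.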

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

# How many gymnasts won medals in 2016, and what was their average age?
medal_2016 <- df2 %>%
  filter(year == 2016, medalist == TRUE) %>%
  summarise(
    count = n(),
    average_age = mean(age, na.rm = TRUE)
  )
medal_2016

## # A tibble: 1 × 2
##   count average_age
##   <int>       <dbl>
## 1    66        21.8

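Discussion: I wanted to see how many medals gymnasts won in 2016 and how old the medalists were on average. I first filtered df2 for 2016 medalists, then counted the rows and calculated their mean age. Because the same athlete can appear on several rows (as the preview of df in Question 1 shows), the count of 66 is the number of medal-winning rows rather than the number of distinct gymnasts. The medalists' average age of 21.8 is slightly below the 22.2 average for all 2016 rows found in Question 3.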
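
The preview of df in Question 1 shows the same athlete on several rows, so n() counts rows rather than people. The sketch below is a minimal illustration of that difference, assuming dplyr is installed; the tibble toy_medals and the names in it are hypothetical and not taken from the Olympics data.

```r
library(dplyr)

# Hypothetical toy data: gymnast "A" appears twice (two events)
toy_medals <- tibble(
  name = c("A", "A", "B", "C"),
  age = c(20, 20, 24, 22),
  medalist = c(TRUE, TRUE, TRUE, FALSE)
)

toy_summary <- toy_medals %>%
  filter(medalist) %>%                     # keep medal-winning rows only
  summarise(
    medal_entries = n(),                   # number of rows
    distinct_gymnasts = n_distinct(name)   # number of unique names
  )
```

Applying n_distinct(name) to the 2016 medalists in the same way would give the number of individual gymnasts, which may be smaller than the row count.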