Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
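
As a side note, the two-branch case_when() in the setup chunk amounts to negating is.na(). The sketch below checks this on a small made-up vector; `toy_medal` is hypothetical and not taken from the dataset, and dplyr is assumed to be installed. It illustrates the logic only and is not a suggestion to alter the required chunk.

```r
library(dplyr)

# Hypothetical medal values, for illustration only
toy_medal <- c("Gold", NA, "Bronze", NA, "Silver")

# Same two-branch logic as in the setup chunk
via_case_when <- case_when(
  is.na(toy_medal)  ~ FALSE,   # no medal recorded
  !is.na(toy_medal) ~ TRUE     # Gold, Silver, or Bronze
)

# Shorter equivalent: TRUE wherever a medal is recorded
via_negation <- !is.na(toy_medal)

# Compare the two results element by element
identical(via_case_when, via_negation)
```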

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts %>%
  select(name, sex, age, team, year, medalist)

head(df)

## # A tibble: 6 × 6
##   name                    sex     age team     year medalist
##   <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
## 1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
## 4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
## 6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE

Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.

df2 <- df %>%
  filter(year %in% c(2008, 2012, 2016))

head(df2)

## # A tibble: 6 × 6
##   name              sex     age team   year medalist
##   <chr>             <chr> <dbl> <chr> <dbl> <lgl>   
## 1 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 2 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 3 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 4 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 5 Nstor Abad Sanjun M        23 Spain  2016 FALSE   
## 6 Nstor Abad Sanjun M        23 Spain  2016 FALSE
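
The filter for Question 2 relies on %in% rather than ==. A minimal base R sketch of why that matters, using a made-up vector (`years` and `wanted` are hypothetical names and values, not part of the homework data):

```r
# Hypothetical vector of years, for illustration only
years  <- c(2008, 2008, 2012, 2016, 2016, 2004)
wanted <- c(2008, 2012, 2016)

# %in% asks, for each element, "is this value one of the wanted years?"
keep_in <- years %in% wanted

# == recycles `wanted` to c(2008, 2012, 2016, 2008, 2012, 2016) and
# compares position by position, so most matches are missed
keep_eq <- years == wanted

sum(keep_in)  # 5 rows would be kept
sum(keep_eq)  # only 1 row would be kept
```

Because the length of `years` here is a multiple of the length of `wanted`, R recycles silently and gives no warning, which makes this mistake easy to overlook.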

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

df2_summary <- df2 %>%
  group_by(year) %>%
  summarize(mean_age = mean(age, na.rm = TRUE))

df2_summary

## # A tibble: 3 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  2008     21.6
## 2  2012     21.9
## 3  2016     22.2

Question 4: Using the olympic_gymnasts dataset, group by year and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts %>%
  group_by(year) %>%
  summarize(mean_age = mean(age, na.rm = TRUE))

# Display dataset

oly_year

## # A tibble: 29 × 2
##     year mean_age
##    <dbl>    <dbl>
##  1  1896     24.3
##  2  1900     22.2
##  3  1904     25.1
##  4  1906     24.7
##  5  1908     23.2
##  6  1912     24.2
##  7  1920     26.7
##  8  1924     27.6
##  9  1928     25.6
## 10  1932     23.9
## # ℹ 19 more rows

# Year with minimum average age

oly_year %>%
  filter(mean_age == min(mean_age))

## # A tibble: 1 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  1988     19.9
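
One possible alternative for the optional part is dplyr's slice_min(), which picks the row with the smallest value directly. The sketch below assumes dplyr 1.0.0 or later and uses `oly_sample`, a hypothetical four-row table built from rounded values copied from the printed output above, so that it runs without downloading the data.

```r
library(dplyr)

# A few rows copied from the printed output above (rounded values)
oly_sample <- tibble(
  year     = c(1896, 1900, 1904, 1988),
  mean_age = c(24.3, 22.2, 25.1, 19.9)
)

# slice_min() keeps the row(s) with the smallest mean_age;
# ties are kept by default (with_ties = TRUE), like the filter() version
youngest <- oly_sample %>%
  slice_min(mean_age, n = 1)

youngest
```

On the full oly_year table, the same call would be `oly_year %>% slice_min(mean_age, n = 1)`.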

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

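# Which teams had the oldest gymnasts on average in 2016?
oldest_team_2016 <- df %>%
  filter(year == 2016) %>%                           # keep only the 2016 Games
  group_by(team) %>%                                 # one group per team
  summarize(mean_age = mean(age, na.rm = TRUE)) %>%  # average age per team
  arrange(desc(mean_age))                            # oldest teams first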

oldest_team_2016

## # A tibble: 60 × 2
##    team        mean_age
##    <chr>          <dbl>
##  1 Uzbekistan      35  
##  2 Greece          30  
##  3 Venezuela       30  
##  4 Israel          29  
##  5 North Korea     28.3
##  6 Chile           28  
##  7 Armenia         27  
##  8 Romania         26.8
##  9 Vietnam         25.3
## 10 Egypt           25  
## # ℹ 50 more rows

Discussion: I wanted to find out which teams had the oldest gymnasts on average at the 2016 Olympics. I first filtered the dataset to the year 2016, then grouped by team and calculated the mean age. Sorting in descending order made it easy to identify the teams with the oldest gymnasts. The pipeline uses four verbs, filter(), group_by(), summarize(), and arrange(), which satisfies the requirement of at least two.
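
One caveat about this approach: as the head(df) output shows, df has one row per athlete per event, so a gymnast who entered several events is counted several times in the team mean. A minimal sketch of the difference follows, using made-up rows (`toy`, "Athlete A", "Athlete B", and "Team X" are hypothetical) and dplyr's distinct() to keep one row per athlete before averaging.

```r
library(dplyr)

# Hypothetical rows in the same shape as df: one row per athlete per event
toy <- tibble(
  name = c("Athlete A", "Athlete A", "Athlete A", "Athlete B"),
  team = c("Team X", "Team X", "Team X", "Team X"),
  age  = c(30, 30, 30, 20)
)

# Row-level mean: Athlete A is counted three times
row_mean <- toy %>%
  group_by(team) %>%
  summarize(mean_age = mean(age))

# Athlete-level mean: keep one row per athlete first
athlete_mean <- toy %>%
  distinct(name, team, age) %>%
  group_by(team) %>%
  summarize(mean_age = mean(age))

row_mean$mean_age      # 27.5
athlete_mean$mean_age  # 25
```

Which version is appropriate depends on the question being asked. The row-level mean weights athletes by the number of events they entered, while the athlete-level mean treats each gymnast once.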