Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- select(olympic_gymnasts, c("name", "sex", "age", "team", "year", "medalist"))

Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.

df2 <- df %>% 
  filter(year == 2008 | year == 2012 | year == 2016)
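
The chained `|` comparisons above can also be written with `%in%`, which stays readable as the list of years grows. The following is a minimal sketch on a small made-up table; `toy_df` and its names are hypothetical stand-ins for df, not values from the Olympics data.

```r
library(dplyr)

# Hypothetical stand-in for df (made-up rows, for illustration only)
toy_df <- tibble(
  name = c("A", "B", "C", "D"),
  year = c(2004, 2008, 2012, 2016)
)

# Version used in the answer: one comparison per year, joined with |
by_or <- toy_df %>%
  filter(year == 2008 | year == 2012 | year == 2016)

# Equivalent version: keep rows whose year is in the given set
by_in <- toy_df %>%
  filter(year %in% c(2008, 2012, 2016))

# Both versions keep the same rows
all(by_or$year == by_in$year)
```

On the real data, the same idea would read `df %>% filter(year %in% c(2008, 2012, 2016))`.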

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean of the age in each group.

df2 %>%
  group_by(year) %>%
  summarise(mean_age = mean(age, na.rm = TRUE))
## # A tibble: 3 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  2008     21.6
## 2  2012     21.9
## 3  2016     22.2

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts %>%
  group_by(year) %>%
  summarise(mean_age = mean(age, na.rm = TRUE))


# Display results
oly_year
## # A tibble: 29 × 2
##     year mean_age
##    <dbl>    <dbl>
##  1  1896     24.3
##  2  1900     22.2
##  3  1904     25.1
##  4  1906     24.7
##  5  1908     23.2
##  6  1912     24.2
##  7  1920     26.7
##  8  1924     27.6
##  9  1928     25.6
## 10  1932     23.9
## # ℹ 19 more rows
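
For the optional part of the question, one possible approach is `slice_min()`, which returns the row with the smallest value of a column. The sketch below runs on `oly_year_head`, a hypothetical name for a table holding only the ten rows printed above, so its result is not the answer for the full 29-row table.

```r
library(dplyr)

# First ten rows of oly_year, copied from the printed output above.
# The full table has 29 rows, so this is only an illustration.
oly_year_head <- tibble(
  year     = c(1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932),
  mean_age = c(24.3, 22.2, 25.1, 24.7, 23.2, 24.2, 26.7, 27.6, 25.6, 23.9)
)

# Keep the row with the lowest mean age
youngest <- oly_year_head %>%
  slice_min(mean_age, n = 1)

youngest
```

Among these ten rows the lowest mean age is 22.2, in 1900. Running `oly_year %>% slice_min(mean_age, n = 1)` on the full dataset would give the overall minimum.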

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

# Count the distinct teams that have competed in gymnastics
country_count <- olympic_gymnasts %>%
  distinct(team) %>%
  count()
  
country_count
## # A tibble: 1 × 1
##       n
##   <int>
## 1   108

Discussion: My question was how many countries have competed in gymnastics over the course of Olympic history. I used the distinct() verb to keep one row per team (the column used here to stand in for country), then used count() to tally the remaining rows.
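
The same count can be produced in one step with `n_distinct()` inside `summarise()`. This is a minimal sketch on made-up data; `toy_gymnasts` and its rows are hypothetical and only show that the two approaches agree.

```r
library(dplyr)

# Hypothetical stand-in for olympic_gymnasts with repeated teams
toy_gymnasts <- tibble(
  name = c("A", "B", "C", "D", "E"),
  team = c("Japan", "Japan", "Italy", "France", "Italy")
)

# Two-step approach from the answer: one row per team, then count the rows
two_step <- toy_gymnasts %>%
  distinct(team) %>%
  count()

# One-step alternative: count the unique teams directly
one_step <- toy_gymnasts %>%
  summarise(n = n_distinct(team))

# Both give the same number of teams
two_step$n == one_step$n
```

The two-step version keeps the intermediate table of teams available for inspection, while the one-step version is shorter when only the count is needed.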