Do not change anything in the following chunk

You will be working on olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
head(df)
## # A tibble: 6 × 6
##   name                    sex     age team     year medalist
##   <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
## 1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
## 4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
## 6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

df2 <- df |>
  filter(year %in% c(2008, 2012, 2016))   # %in% tests membership; == would recycle the vector
head(df2)
## # A tibble: 6 × 6
##   name                        sex     age team     year medalist
##   <chr>                       <chr> <dbl> <chr>   <dbl> <lgl>   
## 1 Nstor Abad Sanjun           M        23 Spain    2016 FALSE   
## 2 Nstor Abad Sanjun           M        23 Spain    2016 FALSE   
## 3 Katja Abel                  F        25 Germany  2008 FALSE   
## 4 Denis Mikhaylovich Ablyazin M        19 Russia   2012 TRUE    
## 5 Denis Mikhaylovich Ablyazin M        19 Russia   2012 FALSE   
## 6 Denis Mikhaylovich Ablyazin M        24 Russia   2016 TRUE
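
Keeping rows from several years is a membership test, and `==` against a longer vector does something different: R recycles the vector and compares position by position. The following is a minimal base-R sketch of that difference. The `years` values are made up for illustration and are not taken from the dataset.

```r
# Hypothetical year values, made up for illustration (not from the dataset)
years  <- c(2008, 2012, 2016, 2016, 2008, 2012, 2004)
target <- c(2008, 2012, 2016)

# `==` recycles `target` along `years` and compares position by position.
# R would also warn here because 7 is not a multiple of 3.
recycled <- suppressWarnings(years == target)

# `%in%` asks, for each element of `years`, whether it appears anywhere in `target`
member <- years %in% target

sum(recycled)  # number of values kept by the recycled comparison
sum(member)    # number of values kept by the membership test
```

In this toy vector six values belong to the three target years, but the recycled comparison keeps only the three that happen to line up with the repeating pattern.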

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

group_df <- df2 |>
  group_by(year) |>
  summarize(mean(age))
group_df
## # A tibble: 3 × 2
##    year `mean(age)`
##   <dbl>       <dbl>
## 1  2008        21.7
## 2  2012        22.0
## 3  2016        22.2

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarize(mean_age = mean(age))
oly_year
## # A tibble: 29 × 2
##     year mean_age
##    <dbl>    <dbl>
##  1  1896     24.3
##  2  1900     22.2
##  3  1904     25.1
##  4  1906     24.7
##  5  1908     23.2
##  6  1912     24.2
##  7  1920     26.7
##  8  1924     27.6
##  9  1928     25.6
## 10  1932     23.9
## # ℹ 19 more rows
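
For the optional part, one possible approach is `slice_min()`, which keeps the row with the smallest value of a column. The sketch below only uses the ten rows printed above (the name `oly_year_head` is made up for this example), so its result is the minimum among those ten rows, not necessarily the minimum over all 29 years. Running the same `slice_min()` call on `oly_year` itself would answer the question for the full dataset.

```r
library(dplyr)

# First ten rows of oly_year, copied from the printed output above
oly_year_head <- tibble(
  year     = c(1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932),
  mean_age = c(24.3, 22.2, 25.1, 24.7, 23.2, 24.2, 26.7, 27.6, 25.6, 23.9)
)

# Keep the single row with the lowest mean age
youngest <- oly_year_head |>
  slice_min(mean_age, n = 1)
youngest
```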

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

# Which team (country) won the most gymnastics medals in 2016?
medals <- olympic_gymnasts |>
  filter(year == 2016) |>   # year is numeric, so compare with a number
  filter(medalist) |>       # medalist is already logical (TRUE/FALSE)
  group_by(team) |>
  count(medalist) |> 
  arrange(desc(n))
medals
## # A tibble: 12 × 3
## # Groups:   team [12]
##    team          medalist     n
##    <chr>         <lgl>    <int>
##  1 Russia        TRUE        16
##  2 United States TRUE        16
##  3 China         TRUE        10
##  4 Japan         TRUE         7
##  5 Great Britain TRUE         6
##  6 Brazil        TRUE         3
##  7 Germany       TRUE         2
##  8 Ukraine       TRUE         2
##  9 Greece        TRUE         1
## 10 Netherlands   TRUE         1
## 11 North Korea   TRUE         1
## 12 Switzerland   TRUE         1

Discussion: I used four verbs: filter, group_by, count, and arrange. I first used filter to keep only the rows from 2016, and then filtered again to keep only the medal-winning entries. Next, I used group_by to group those entries by team and count to see how many medals each team won. Finally, I used arrange with desc to sort the teams from the most medals to the fewest. The result shows that Russia and the United States tied for first place, with 16 gymnastics medals each at the 2016 Olympics.
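
Because the top of the table is a tie, reading only the first row would miss one of the leaders. A minimal sketch of one way to pull out every team tied for the maximum is `slice_max()`, whose `with_ties` argument defaults to `TRUE`. The data below are the top five rows copied from the printed medal table, and the names `medals_top` and `leaders` are made up for this example.

```r
library(dplyr)

# Top five rows of the medal table, copied from the printed output above
medals_top <- tibble(
  team = c("Russia", "United States", "China", "Japan", "Great Britain"),
  n    = c(16L, 16L, 10L, 7L, 6L)
)

# with_ties = TRUE (the default) keeps every team tied for the highest count
leaders <- medals_top |>
  slice_max(n, n = 1, with_ties = TRUE)
leaders
```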