Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)   # team was requested in the question
df
## # A tibble: 25,528 × 6
##    name                    sex     age team     year medalist
##    <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  7 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  8 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  9 Paavo Johannes Aaltonen M        32 Finland  1952 FALSE   
## 10 Paavo Johannes Aaltonen M        32 Finland  1952 TRUE    
## # ℹ 25,518 more rows

Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.

df2 <- df |>
  filter(year %in% c(2008, 2012, 2016))   # year is numeric, so compare against numbers

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

df2 |>
  group_by(year) |>
  summarise(
    mean_age = mean(age)
  )
## # A tibble: 3 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  2008     21.6
## 2  2012     21.9
## 3  2016     22.2

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarise(
    mean_age = mean(age)
  )

oly_year
## # A tibble: 29 × 2
##     year mean_age
##    <dbl>    <dbl>
##  1  1896     24.3
##  2  1900     22.2
##  3  1904     25.1
##  4  1906     24.7
##  5  1908     23.2
##  6  1912     24.2
##  7  1920     26.7
##  8  1924     27.6
##  9  1928     25.6
## 10  1932     23.9
## # ℹ 19 more rows
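
For the optional part of Question 4, one possible approach is dplyr's slice_min(), which keeps the row with the smallest value in a column. The sketch below is only an illustration: oly_year_head is a hypothetical stand-in that holds just the ten rows printed above, so its result is not necessarily the minimum over all 29 years. On the real data, the same call would be applied to oly_year.

```r
library(dplyr)

# Hypothetical stand-in: only the ten rows of oly_year printed above,
# NOT the full 29-year summary.
oly_year_head <- tibble(
  year     = c(1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932),
  mean_age = c(24.3, 22.2, 25.1, 24.7, 23.2, 24.2, 26.7, 27.6, 25.6, 23.9)
)

# Keep the single row with the smallest mean age
youngest_year <- oly_year_head |>
  slice_min(mean_age, n = 1)

youngest_year
```

An equivalent option would be arrange(mean_age) followed by head(1). slice_min() has the advantage of returning every tied row by default.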

Question 5: This question is open ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

Question: Filter the olympic_gymnasts dataset to Games hosted in Asian cities and save the result as a new dataset titled “oly_asia”. Then add a column showing age status (minor or adult).

unique(olympic_gymnasts$city)
##  [1] "London"         "Helsinki"       "Antwerpen"      "Rio de Janeiro"
##  [5] "Sydney"         "Munich"         "Beijing"        "Roma"          
##  [9] "Berlin"         "Stockholm"      "Mexico City"    "Tokyo"         
## [13] "Moskva"         "Los Angeles"    "Amsterdam"      "Seoul"         
## [17] "Melbourne"      "Barcelona"      "Athina"         "Atlanta"       
## [21] "St. Louis"      "Montreal"       "Paris"
oly_asia <- olympic_gymnasts |>
  filter(city %in% c("Tokyo", "Beijing", "Seoul")) |>
  mutate(age_status = ifelse(age < 18, "minor", "adult"))
oly_asia
## # A tibble: 3,695 × 17
##       id name     sex     age height weight team  noc   games  year season city 
##    <dbl> <chr>    <chr> <dbl>  <dbl>  <dbl> <chr> <chr> <chr> <dbl> <chr>  <chr>
##  1   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
##  2   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
##  3   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
##  4   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
##  5   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
##  6   610 Ginko A… F        26    148     46 Japan JPN   1964…  1964 Summer Tokyo
##  7   610 Ginko A… F        26    148     46 Japan JPN   1964…  1964 Summer Tokyo
##  8   610 Ginko A… F        26    148     46 Japan JPN   1964…  1964 Summer Tokyo
##  9   610 Ginko A… F        26    148     46 Japan JPN   1964…  1964 Summer Tokyo
## 10   610 Ginko A… F        26    148     46 Japan JPN   1964…  1964 Summer Tokyo
## # ℹ 3,685 more rows
## # ℹ 5 more variables: sport <chr>, event <chr>, medal <chr>, medalist <lgl>,
## #   age_status <chr>

Discussion: I saw the wide variety of host cities in the dataset and realized they can be grouped by continent. I wanted to narrow the dataset down to Games held in Asia. I checked every city returned by unique() and found that the ones located in Asia are Tokyo, Beijing, and Seoul. I used filter() with %in% to keep only those cities and saved the result as a new dataset, oly_asia. Next, I noticed that there is a wide range of ages in the dataset. To flag whether each athlete is a minor or an adult, I used mutate() to add a column, age_status, built with an ifelse() statement: if age is less than 18, the value is “minor”; otherwise, it is “adult”.
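
As a possible follow-up, the new age_status column can be tallied with count(). The sketch below uses toy_asia, a hypothetical five-row table with made-up ages, so that it runs on its own. With the real data, the count() step would be applied to oly_asia. It also uses if_else(), dplyr's stricter variant of ifelse(), purely as an assumption about style. The rule is the same as above.

```r
library(dplyr)

# Hypothetical toy rows standing in for oly_asia (made-up ages, not real athletes)
toy_asia <- tibble(
  city = c("Tokyo", "Beijing", "Seoul", "Beijing", "Tokyo"),
  age  = c(16, 25, 17, 18, 30)
)

toy_counts <- toy_asia |>
  mutate(age_status = if_else(age < 18, "minor", "adult")) |>  # same rule: under 18 is a minor
  count(age_status)                                            # number of rows per status

toy_counts
```

Because each row in olympic_gymnasts is one athlete in one event, counts taken on oly_asia would be counts of entries, not of distinct athletes. Applying distinct(id, year, age_status) before count() would be one way to count each athlete once per Games.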