You will be working on the olympic_gymnasts dataset. Please DO NOT change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- subset(olympic_gymnasts, select = c("name", "sex", "age", "team", "year", "medalist"))
df
## # A tibble: 25,528 × 6
##    name                    sex     age team     year medalist
##    <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  7 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  8 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  9 Paavo Johannes Aaltonen M        32 Finland  1952 FALSE   
## 10 Paavo Johannes Aaltonen M        32 Finland  1952 TRUE    
## # ℹ 25,518 more rows
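
The same column subset can also be written with dplyr's select(), which takes bare column names. This is a minimal sketch on a small made-up tibble: `toy` and `toy_df` are hypothetical names, and the values are invented for illustration rather than taken from the Olympics data.

```r
library(dplyr)

# Hypothetical stand-in for olympic_gymnasts (values are made up for illustration)
toy <- tibble(
  name     = c("A", "B"),
  sex      = c("F", "M"),
  age      = c(19, 25),
  height   = c(160, 170),
  team     = c("X", "Y"),
  year     = c(2008, 2012),
  medalist = c(TRUE, FALSE)
)

# Keep only the six columns the question asks for; height is dropped
toy_df <- toy %>% select(name, sex, age, team, year, medalist)
names(toy_df)
```

Applied to the real data, the same call would be olympic_gymnasts %>% select(name, sex, age, team, year, medalist).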

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

df2 <- subset(df, year == 2008 | year == 2012 | year == 2016)
df2
## # A tibble: 2,703 × 6
##    name              sex     age team     year medalist
##    <chr>             <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  2 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  3 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  4 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  5 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  6 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  7 Katja Abel        F        25 Germany  2008 FALSE   
##  8 Katja Abel        F        25 Germany  2008 FALSE   
##  9 Katja Abel        F        25 Germany  2008 FALSE   
## 10 Katja Abel        F        25 Germany  2008 FALSE   
## # ℹ 2,693 more rows
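
The three year comparisons can be collapsed into one membership test with %in%, here combined with dplyr's filter(). This is a minimal sketch on a made-up tibble: `toy_df` and `toy_df2` are hypothetical names used only for illustration.

```r
library(dplyr)

# Hypothetical stand-in for df (values are made up for illustration)
toy_df <- tibble(
  name = c("A", "B", "C", "D"),
  year = c(2004, 2008, 2012, 2016)
)

# %in% tests membership in a set of years, replacing the chained == comparisons
toy_df2 <- toy_df %>% filter(year %in% c(2008, 2012, 2016))
toy_df2$year
```

This form scales better than chaining | when more years need to be kept, since only the vector of years changes.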

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age.

df2 %>% group_by(year) %>% summarize(mean = mean(age), n = n())
## # A tibble: 3 × 3
##    year  mean     n
##   <dbl> <dbl> <int>
## 1  2008  21.6   994
## 2  2012  21.9   848
## 3  2016  22.2   861
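
As the printed rows for df and df2 show, each row is one athlete-event entry, so a gymnast who entered several events contributes to the mean several times. If a per-athlete mean is wanted instead, one possible approach is to keep one row per athlete and year with distinct() before summarizing. This is only a sketch on made-up data: `toy_df2`, `per_row`, and `per_athlete` are hypothetical names, and it assumes that name plus year is enough to identify an athlete at one Games.

```r
library(dplyr)

# Hypothetical stand-in for df2: athlete "A" entered three events, "B" one
toy_df2 <- tibble(
  name = c("A", "A", "A", "B"),
  age  = c(20, 20, 20, 30),
  year = c(2008, 2008, 2008, 2008)
)

# Mean over every row (athlete "A" is counted three times)
per_row <- toy_df2 %>%
  group_by(year) %>%
  summarize(mean = mean(age))

# Mean over one row per athlete and year
per_athlete <- toy_df2 %>%
  distinct(name, year, .keep_all = TRUE) %>%
  group_by(year) %>%
  summarize(mean = mean(age))

per_row$mean
per_athlete$mean
```

In this toy example the per-row mean is 22.5 and the per-athlete mean is 25, which shows how repeated entries can shift the average.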

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

# Question 4 asks for every Olympic year, so start from olympic_gymnasts rather than df2
oly_year <- olympic_gymnasts %>% group_by(year) %>% summarize(mean = mean(age), n = n())
oly_year
# Optional part: the lowest yearly mean age
min(oly_year$mean)
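
min() returns the smallest mean but not the year it belongs to. One way to recover the whole row is dplyr's slice_min(). This is a minimal sketch using the rounded three-year means printed in the Question 3 output, so `year_means` is an illustrative stand-in rather than the full oly_year, and `youngest` is a hypothetical name.

```r
library(dplyr)

# Rounded yearly means copied from the Question 3 output above
year_means <- tibble(
  year = c(2008, 2012, 2016),
  mean = c(21.6, 21.9, 22.2)
)

# slice_min() keeps the row(s) with the smallest value of the mean column
youngest <- year_means %>% slice_min(mean, n = 1)
youngest$year
```

For these three years the result is 2008. The same call on oly_year would return the year with the lowest mean age across all Games in the data.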

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

# Find the mean age of female Olympic gymnasts and of male Olympic gymnasts
my_data <- olympic_gymnasts %>% group_by(sex) %>% summarize(mean = mean(age))
my_data
## # A tibble: 2 × 2
##   sex    mean
##   <chr> <dbl>
## 1 F      19.2
## 2 M      24.7

Discussion:

I was interested in seeing whether the average age of female gymnasts differed from the average age of male gymnasts, and it turns out that it does. To answer this, I grouped olympic_gymnasts by sex with group_by() and then computed the mean age of each group with summarize(). At 24.7 years, the average age of male Olympic gymnasts is about 5.5 years, or roughly 29%, higher than the 19.2-year average for female Olympic gymnasts. This might be because males enter and exit puberty at a slightly older age than females, meaning that they reach ideal Olympic-gymnast form at a slightly older age than females.
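
The size of the gap can be checked directly from the group means. This is a small sketch that uses the rounded values printed in the output above, so the results are approximate. `mean_f`, `mean_m`, `gap_years`, and `gap_percent` are hypothetical names introduced for the check.

```r
# Rounded group means copied from the printed output above
mean_f <- 19.2
mean_m <- 24.7

gap_years   <- mean_m - mean_f           # absolute difference in years
gap_percent <- 100 * gap_years / mean_f  # difference relative to the female mean

round(gap_years, 1)
round(gap_percent, 1)
```

With these inputs the gap is about 5.5 years, or about 28.6% of the female mean.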