Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
df

## # A tibble: 25,528 × 6
##    name                    sex     age team     year medalist
##    <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  7 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  8 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  9 Paavo Johannes Aaltonen M        32 Finland  1952 FALSE   
## 10 Paavo Johannes Aaltonen M        32 Finland  1952 TRUE    
## # ℹ 25,518 more rows

Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.

df2 <- olympic_gymnasts |>
  filter(year %in% c(2008, 2012, 2016))
df2

## # A tibble: 2,703 × 16
##       id name     sex     age height weight team  noc   games  year season city 
##    <dbl> <chr>    <chr> <dbl>  <dbl>  <dbl> <chr> <chr> <chr> <dbl> <chr>  <chr>
##  1    51 Nstor A… M        23    167     64 Spain ESP   2016…  2016 Summer Rio …
##  2    51 Nstor A… M        23    167     64 Spain ESP   2016…  2016 Summer Rio …
##  3    51 Nstor A… M        23    167     64 Spain ESP   2016…  2016 Summer Rio …
##  4    51 Nstor A… M        23    167     64 Spain ESP   2016…  2016 Summer Rio …
##  5    51 Nstor A… M        23    167     64 Spain ESP   2016…  2016 Summer Rio …
##  6    51 Nstor A… M        23    167     64 Spain ESP   2016…  2016 Summer Rio …
##  7   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
##  8   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
##  9   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
## 10   396 Katja A… F        25    165     55 Germ… GER   2008…  2008 Summer Beij…
## # ℹ 2,693 more rows
## # ℹ 4 more variables: sport <chr>, event <chr>, medal <chr>, medalist <lgl>
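
The question asks for df2 to be built from df rather than from olympic_gymnasts. Filtering df would select the same rows but keep only its six columns. A minimal sketch, assuming a small made-up tibble (toy_df, with invented values) in place of the real df so that it runs on its own:

```r
library(dplyr)

# Hypothetical stand-in for df (Question 1); the values are invented for illustration.
toy_df <- tibble(
  name     = c("A", "B", "C", "D"),
  sex      = c("F", "M", "F", "M"),
  age      = c(19, 24, 22, 27),
  team     = c("Spain", "Germany", "Japan", "Brazil"),
  year     = c(2004, 2008, 2012, 2016),
  medalist = c(FALSE, TRUE, FALSE, TRUE)
)

# Keep only rows from the three requested Games; the column set is unchanged.
toy_df2 <- toy_df |>
  filter(year %in% c(2008, 2012, 2016))

toy_df2
```

The `%in%` operator tests each year against the whole vector, which is shorter than chaining three `==` comparisons with `|`.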

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

df3 <- df2 %>%                                # start from the three-year subset
  group_by(year) %>%                          # one group per Olympic year
  summarise(mean = mean(age, na.rm = TRUE))   # mean age within each year
df3

## # A tibble: 3 × 2
##    year  mean
##   <dbl> <dbl>
## 1  2008  21.6
## 2  2012  21.9
## 3  2016  22.2

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts %>%
  group_by(year) %>%
  mutate(mean_age = mean(age, na.rm = TRUE))  # adds each year's mean age to every row
oly_year

## # A tibble: 25,528 × 17
## # Groups:   year [29]
##       id name     sex     age height weight team  noc   games  year season city 
##    <dbl> <chr>    <chr> <dbl>  <dbl>  <dbl> <chr> <chr> <chr> <dbl> <chr>  <chr>
##  1    17 Paavo J… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  2    17 Paavo J… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  3    17 Paavo J… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  4    17 Paavo J… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  5    17 Paavo J… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  6    17 Paavo J… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  7    17 Paavo J… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  8    17 Paavo J… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  9    17 Paavo J… M        32    175     64 Finl… FIN   1952…  1952 Summer Hels…
## 10    17 Paavo J… M        32    175     64 Finl… FIN   1952…  1952 Summer Hels…
## # ℹ 25,518 more rows
## # ℹ 5 more variables: sport <chr>, event <chr>, medal <chr>, medalist <lgl>,
## #   mean_age <dbl>
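
Note that mutate() keeps all 25,528 rows and appends mean_age to each one. If one row per year is wanted, summarise() collapses each group instead, and the optional minimum can then be read off with slice_min(). A minimal sketch on a made-up tibble (toy_gymnasts, with invented ages), since the real yearly means are not reproduced here:

```r
library(dplyr)

# Hypothetical stand-in for olympic_gymnasts; ages are invented for illustration.
toy_gymnasts <- tibble(
  year = c(2008, 2008, 2012, 2012, 2016, 2016),
  age  = c(20, 22, 19, 21, 24, 26)
)

# One row per year holding that year's mean age.
toy_oly_year <- toy_gymnasts |>
  group_by(year) |>
  summarise(mean_age = mean(age, na.rm = TRUE))

# Optional part: the year with the lowest average age.
youngest_year <- toy_oly_year |>
  slice_min(mean_age, n = 1)

youngest_year
```

With the real data, the same two steps applied to olympic_gymnasts should give a 29-row table (one row per year) and the single year with the lowest mean age.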

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

# Which gymnasts competed for Team USA, and in which events?

team_usa <- olympic_gymnasts |>
  select(name, games, team, event) |>   # keep only the columns needed
  filter(team == "United States")       # keep only Team USA rows
team_usa

## # A tibble: 1,830 × 4
##    name                      games       team          event                    
##    <chr>                     <chr>       <chr>         <chr>                    
##  1 James Kanati Allen        1968 Summer United States Gymnastics Men's Individ…
##  2 James Kanati Allen        1968 Summer United States Gymnastics Men's Team Al…
##  3 James Kanati Allen        1968 Summer United States Gymnastics Men's Floor E…
##  4 James Kanati Allen        1968 Summer United States Gymnastics Men's Horse V…
##  5 James Kanati Allen        1968 Summer United States Gymnastics Men's Paralle…
##  6 James Kanati Allen        1968 Summer United States Gymnastics Men's Horizon…
##  7 James Kanati Allen        1968 Summer United States Gymnastics Men's Rings   
##  8 James Kanati Allen        1968 Summer United States Gymnastics Men's Pommell…
##  9 William Peter Andelfinger 1904 Summer United States Gymnastics Men's Individ…
## 10 William Peter Andelfinger 1904 Summer United States Gymnastics Men's Individ…
## # ℹ 1,820 more rows

Discussion: If I knew how to keep the same name from repeating, even though the dataset includes the different events and Olympic Games each gymnast competed in, I would incorporate that here. I like that I was able to narrow a large dataset down to the results I wanted and work from there, instead of having to look through the whole thing.
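
One possible way to avoid the repeated names, offered as a sketch rather than the only approach: distinct() keeps one row per unique value of the listed columns, and count() collapses each gymnast to a single row while recording how many rows stood behind it. The tibble below (toy_usa) is made up for illustration and is not taken from the dataset.

```r
library(dplyr)

# Hypothetical stand-in for the Team USA subset; rows are invented for illustration.
toy_usa <- tibble(
  name  = c("Gymnast A", "Gymnast A", "Gymnast A", "Gymnast B", "Gymnast B"),
  games = c("1968 Summer", "1968 Summer", "1972 Summer", "1904 Summer", "1904 Summer"),
  team  = "United States",
  event = c("Floor", "Rings", "Floor", "All-Around", "Vault")
)

# One row per gymnast: the event-level detail is dropped entirely.
unique_names <- toy_usa |>
  distinct(name)

# One row per gymnast, with column n counting that gymnast's event entries.
entries_per_gymnast <- toy_usa |>
  count(name)

entries_per_gymnast
```

distinct(name) answers "who competed", while count(name) keeps a summary of how often each person appears, so the choice depends on how much of the event detail should survive.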