Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)   # keep only the six requested columns
df

## # A tibble: 25,528 × 6
##    name                    sex     age team     year medalist
##    <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  7 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  8 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  9 Paavo Johannes Aaltonen M        32 Finland  1952 FALSE   
## 10 Paavo Johannes Aaltonen M        32 Finland  1952 TRUE    
## # ℹ 25,518 more rows

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

df2 <- df[df$year %in% c(2008, 2012, 2016), ]
df2

## # A tibble: 2,703 × 6
##    name              sex     age team     year medalist
##    <chr>             <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  2 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  3 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  4 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  5 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  6 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  7 Katja Abel        F        25 Germany  2008 FALSE   
##  8 Katja Abel        F        25 Germany  2008 FALSE   
##  9 Katja Abel        F        25 Germany  2008 FALSE   
## 10 Katja Abel        F        25 Germany  2008 FALSE   
## # ℹ 2,693 more rows
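
The base R subsetting above can also be written with dplyr's filter(). The block below is a minimal sketch on a made-up stand-in for df (toy_df and its rows are hypothetical, not taken from the dataset), assuming dplyr is installed:

```r
library(dplyr)

# Toy stand-in for df; the rows are made up for illustration only
toy_df <- tibble(
  name = c("A", "B", "C", "D"),
  year = c(2004, 2008, 2012, 2016)
)

# dplyr equivalent of toy_df[toy_df$year %in% c(2008, 2012, 2016), ]
toy_df2 <- toy_df |>
  filter(year %in% c(2008, 2012, 2016))

toy_df2
```

On the real data the same call would be df |> filter(year %in% c(2008, 2012, 2016)).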

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

df2 |>
  group_by(year) |>
  summarize(
    mean_age = mean(age, na.rm = TRUE),
    n = n()
  )

## # A tibble: 3 × 3
##    year mean_age     n
##   <dbl>    <dbl> <int>
## 1  2008     21.6   994
## 2  2012     21.9   848
## 3  2016     22.2   861

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarize(
    n = n(),
    mean_age = mean(age, na.rm = TRUE),
    max_age = max(age, na.rm = TRUE)
  )

oly_year

## # A tibble: 29 × 4
##     year     n mean_age max_age
##    <dbl> <int>    <dbl>   <dbl>
##  1  1896    73     24.3      31
##  2  1900    33     22.2      31
##  3  1904   317     25.1      37
##  4  1906    70     24.7      35
##  5  1908   240     23.2      49
##  6  1912   310     24.2      38
##  7  1920   206     26.7      45
##  8  1924   499     27.6      38
##  9  1928   561     25.6      39
## 10  1932   140     23.9      34
## # ℹ 19 more rows
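
The optional part of Question 4 (the minimum average age) is not computed above. One possible way to do it is slice_min(), sketched here on only the first five rows of oly_year as printed above, so the result is the minimum of that subset and not of all 29 years. The name oly_year_head is introduced just for this illustration, and the mean_age values are the rounded ones from the printout.

```r
library(dplyr)

# First five rows of oly_year as printed above (mean_age is rounded in the printout)
oly_year_head <- tibble(
  year     = c(1896, 1900, 1904, 1906, 1908),
  mean_age = c(24.3, 22.2, 25.1, 24.7, 23.2)
)

# slice_min() keeps the row(s) with the smallest mean_age
youngest <- oly_year_head |>
  slice_min(mean_age, n = 1)

youngest
```

Applied to the full summary, the same idea would be oly_year |> slice_min(mean_age, n = 1).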

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

# Keep only the rows where the medal is Gold.
# Note: this overwrites the df2 created in Question 2.
df2 <- olympic_gymnasts |>
  filter(medal == "Gold")

df2

## # A tibble: 785 × 16
##       id name     sex     age height weight team  noc   games  year season city 
##    <dbl> <chr>    <chr> <dbl>  <dbl>  <dbl> <chr> <chr> <chr> <dbl> <chr>  <chr>
##  1    17 "Paavo … M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  2    17 "Paavo … M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  3    17 "Paavo … M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
##  4   521 "Isak A… M        21     NA     NA Norw… NOR   1912…  1912 Summer Stoc…
##  5   697 "Fausto… M        22     NA     NA Swed… SWE   1920…  1920 Summer Antw…
##  6  1109 "Lavini… F        16    148     40 Roma… ROU   1984…  1984 Summer Los …
##  7  1211 "Estell… F        19     NA     NA Neth… NED   1928…  1928 Summer Amst…
##  8  1483 "Nobuyu… M        25    154     53 Japan JPN   1960…  1960 Summer Roma 
##  9  1483 "Nobuyu… M        25    154     53 Japan JPN   1960…  1960 Summer Roma 
## 10  2347 "Georg … M        30     NA     NA Denm… DEN   1920…  1920 Summer Antw…
## # ℹ 775 more rows
## # ℹ 4 more variables: sport <chr>, event <chr>, medal <chr>, medalist <lgl>

Discussion: In this step, I filtered the dataset to include only rows where the medal type is “Gold”. This creates a new dataset, df2, with 785 rows. Each row is one gold-medal result, so a gymnast who won several golds appears more than once. Using filter() is useful here because it quickly narrows the data down to just the results I’m interested in studying. Now, instead of looking at all gymnasts, I can focus on the performances of gold medalists.
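
Question 5 asks for at least two verbs, while the chunk above uses only filter(). One way it might be extended, shown as a sketch only, is to ask which team has the most gold-medal rows by adding count() and arrange(). The data below (toy_gymnasts) is made up for illustration and does not reflect real results.

```r
library(dplyr)

# Made-up rows standing in for olympic_gymnasts; not real results
toy_gymnasts <- tibble(
  team  = c("Japan", "Japan", "Romania", "Finland", "Romania", "Japan"),
  medal = c("Gold", "Silver", "Gold", "Gold", NA, "Gold")
)

gold_by_team <- toy_gymnasts |>
  filter(medal == "Gold") |>        # verb 1: keep gold-medal rows (NA rows are dropped)
  count(team, name = "golds") |>    # verb 2: tally rows per team
  arrange(desc(golds))              # verb 3: largest tally first

gold_by_team
```

On the real data the pipeline would start from olympic_gymnasts. Because each athlete on a winning team has their own row, the tally counts gold-medal rows, not distinct events.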

zebidian debele