Homework 3

Do not change anything in the following chunk

You will be working on the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
df

## # A tibble: 25,528 × 6
##    name                    sex     age team     year medalist
##    <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  7 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  8 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  9 Paavo Johannes Aaltonen M        32 Finland  1952 FALSE   
## 10 Paavo Johannes Aaltonen M        32 Finland  1952 TRUE    
## # ℹ 25,518 more rows

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

df2 <- df |> filter(year %in% c(2008, 2012, 2016))   # refer to the column directly inside filter()
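
The choice of `%in%` matters for this step. As a minimal sketch on made-up data (the `toy` tibble is hypothetical and not taken from the Olympic data), comparing with `==` against a vector recycles that vector position by position and silently drops matching rows:

```r
library(dplyr)

# Hypothetical toy data, not taken from the olympics dataset
toy <- tibble(
  name = c("A", "B", "C", "D", "E", "F"),
  year = c(2008, 2008, 2012, 2016, 2016, 2004)
)

# %in% tests each year for membership in the set of target years
kept_in <- toy |> filter(year %in% c(2008, 2012, 2016))

# == recycles c(2008, 2012, 2016) along the column and compares
# element by element, so a row only survives if its year happens
# to line up with the recycled value at that position
kept_eq <- toy |> filter(year == c(2008, 2012, 2016))

nrow(kept_in)
nrow(kept_eq)
```

The two row counts differ, and R gives no warning here because the column length is a multiple of the vector length.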

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

df2 |> group_by(year) |> summarise(avg_age = mean(age))

## # A tibble: 3 × 2
##    year avg_age
##   <dbl>   <dbl>
## 1  2008    21.6
## 2  2012    21.9
## 3  2016    22.2
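
Each row in df is one athlete-event entry (the same athlete appears several times in the Question 1 output), so these means weight athletes by the number of events they entered. A hedged sketch on made-up data of how a per-athlete mean could be computed instead, assuming that `name` is enough to identify an athlete (the `toy` tibble and its values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: one row per athlete-event entry
toy <- tibble(
  name = c("A", "A", "A", "B"),
  year = c(2008, 2008, 2008, 2008),
  age  = c(30, 30, 30, 20)
)

# Mean over entries: athlete A is counted three times
by_entry <- toy |>
  group_by(year) |>
  summarise(avg_age = mean(age))

# Mean over athletes: collapse to one row per athlete and year first
# (assumption: name uniquely identifies an athlete)
by_athlete <- toy |>
  distinct(name, year, age) |>
  group_by(year) |>
  summarise(avg_age = mean(age))

by_entry
by_athlete
```

Which version is appropriate depends on whether the question is about entries or about people; the homework prompt does not say, so the entry-level mean above is a reasonable reading.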

Question 4: Using the olympic_gymnasts dataset, group by year and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year = olympic_gymnasts |> group_by(year) |> summarise(avg_age = mean(age))
oly_year[which.min(oly_year$avg_age), ]

## # A tibble: 1 × 2
##    year avg_age
##   <dbl>   <dbl>
## 1  1988    19.9
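
The base R indexing with `which.min()` works; a dplyr alternative is `slice_min()`, which keeps the lookup inside the pipe. A minimal sketch on a made-up table (the `toy_year` values are hypothetical, not the real yearly means):

```r
library(dplyr)

# Hypothetical yearly means, for illustration only
toy_year <- tibble(
  year    = c(2000, 2004, 2008),
  avg_age = c(21.4, 20.1, 22.3)
)

# Keep the row with the smallest avg_age
youngest <- toy_year |> slice_min(avg_age, n = 1)

youngest
```

One difference worth knowing: `slice_min()` returns all tied rows by default, while `which.min()` returns only the first minimum.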

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs, and write code that answers it. Then, below the chunk, reflect on your question choice and coding procedure.

Which team has the highest mean age of medalists?

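# Keep medalist rows only, then average age within each team.
# filter() needs a logical condition; `medalist = TRUE` is a named argument, which dplyr rejects.
avg_ages <- olympic_gymnasts |>
  filter(medalist) |>
  group_by(team) |>
  summarise(avg_age = mean(age))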


summary(avg_ages$avg_age)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   10.00   20.42   22.47   22.25   23.85   30.00

avg_ages[which.max(avg_ages$avg_age), ]

## # A tibble: 1 × 2
##   team    avg_age
##   <chr>     <dbl>
## 1 Bohemia      30

Discussion:

After filtering the dataset to medalists only and grouping by team, the team-level mean ages span a surprisingly wide range of 20 years, from 10 to 30. The team with the highest mean medalist age is Bohemia, at 30 years.
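
A wide range like this can arise when some teams have very few entries, so that a single athlete sets the team mean. Whether that applies here has not been checked; one way to look into it, sketched on made-up data, is to carry a row count alongside the mean and require a minimum number of entries. The `toy` tibble, the `n_entries` column name, and the `min_entries` threshold are all hypothetical choices for illustration:

```r
library(dplyr)

# Hypothetical toy data, not taken from the olympics dataset
toy <- tibble(
  team     = c("X", "Y", "Y", "Y", "Z", "Z"),
  age      = c(30, 22, 24, 23, 21, 40),
  medalist = c(TRUE, TRUE, TRUE, TRUE, TRUE, FALSE)
)

# Mean medalist age per team, plus how many rows each mean is based on
team_ages <- toy |>
  filter(medalist) |>
  group_by(team) |>
  summarise(avg_age = mean(age), n_entries = n())

# Arbitrary illustrative threshold; the right value is a judgment call
min_entries <- 2

# Highest mean age among teams with at least min_entries medalist rows
robust <- team_ages |>
  filter(n_entries >= min_entries) |>
  slice_max(avg_age, n = 1)

team_ages
robust
```

In the toy data, team X has the highest mean but rests on a single row, so it drops out once the threshold is applied.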

Rebecca Murphy