Homework 3

Do not change anything in the following chunk

You will be working on the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )
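As a side note on how the medalist column is built: the two case_when() branches together amount to a single logical test, !is.na(medal). The sketch below checks this on a small made-up data frame (toy and its values are illustrative assumptions, not rows from the real dataset); it is not a replacement for the required chunk above.

```r
library(dplyr)

# Toy data, invented for illustration only
toy <- data.frame(
  name  = c("A", "B", "C"),
  medal = c("Gold", NA, "Bronze")
)

toy_flagged <- toy |>
  mutate(
    # Same two-branch rule as in the homework chunk
    medalist_case = case_when(
      is.na(medal)  ~ FALSE,
      !is.na(medal) ~ TRUE
    ),
    # One-step equivalent: TRUE when a medal is recorded
    medalist_short = !is.na(medal)
  )

# Compare the two columns
identical(toy_flagged$medalist_case, toy_flagged$medalist_short)
```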

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

df2 <- filter(df, year %in% c(2008, 2012, 2016))
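To illustrate what %in% does in the filter above, the sketch below compares it with the longer chain of == tests joined by |. The data frame toy_years is made up for the example and is not part of the homework data.

```r
library(dplyr)

# Toy data, invented for illustration only
toy_years <- data.frame(year = c(2004, 2008, 2012, 2016, 2020))

# Keep rows whose year matches any value in the set
with_in <- filter(toy_years, year %in% c(2008, 2012, 2016))

# Equivalent, but more verbose, using == and |
with_or <- filter(toy_years, year == 2008 | year == 2012 | year == 2016)

with_in$year
with_or$year
```

Both versions keep the same three rows; %in% is simply easier to extend when the set of years grows.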

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

df2 |>
  group_by(year) |>
  summarize(mean(age))

## # A tibble: 3 × 2
##    year `mean(age)`
##   <dbl>       <dbl>
## 1  2008        21.6
## 2  2012        21.9
## 3  2016        22.2
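The summary column above is labeled `mean(age)` because summarize() was not given a name for it. A minimal sketch of naming the column explicitly follows; toy_ages and mean_age are hypothetical names chosen for this example, and the values are made up.

```r
library(dplyr)

# Toy data, invented for illustration only
toy_ages <- data.frame(
  year = c(2008, 2008, 2012, 2012),
  age  = c(20, 22, 19, 25)
)

toy_means <- toy_ages |>
  group_by(year) |>
  summarize(mean_age = mean(age))   # name = expression gives a clean column name

names(toy_means)
```

A named column can then be referred to later without backticks.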

Question 4: Using the olympic_gymnasts dataset, group by year and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarize(mean(age))

summary(oly_year)

##       year        mean(age)    
##  Min.   :1896   Min.   :19.87  
##  1st Qu.:1924   1st Qu.:21.29  
##  Median :1960   Median :23.16  
##  Mean   :1957   Mean   :23.18  
##  3rd Qu.:1988   3rd Qu.:24.76  
##  Max.   :2016   Max.   :27.83
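The summary() output above reports the minimum average age (19.87) but not the year it belongs to. One way to get both, sketched here on made-up data, is dplyr's slice_min(). The names toy_oly and mean_age are assumptions for this example; with oly_year as created above, the column would have to be referenced in backticks as `mean(age)`.

```r
library(dplyr)

# Toy data, invented for illustration only
toy_oly <- data.frame(
  year = c(1896, 1896, 1960, 1960, 2016, 2016),
  age  = c(30, 26, 24, 22, 20, 21)
)

toy_year <- toy_oly |>
  group_by(year) |>
  summarize(mean_age = mean(age))

# Row with the smallest average age, keeping the year alongside it
youngest <- slice_min(toy_year, mean_age, n = 1)
youngest
```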

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

Create a new dataset called CAN_olympic_gymnasts that keeps only the Canadian gymnasts from the olympic_gymnasts dataset, and then summarize the average age, height, and weight of the new dataset by year.

# Keep only gymnasts who competed for team Canada
CAN_olympic_gymnasts <- filter(olympic_gymnasts, team == "Canada")

CAN_olympic_gymnasts |>
  group_by(year) |>
  summarize(
    average_age = mean(age),
    average_height = mean(height),
    average_weight = mean(weight)
  )

## # A tibble: 16 × 4
##     year average_age average_height average_weight
##    <dbl>       <dbl>          <dbl>          <dbl>
##  1  1908        20.5            NA            NA  
##  2  1956        19.2            NA            NA  
##  3  1960        19.4           161.           56.9
##  4  1964        24.7           168.           65.8
##  5  1968        21.9           167.           59.2
##  6  1972        20.4            NA            NA  
##  7  1976        19.2            NA            NA  
##  8  1984        20.5           160.           56.2
##  9  1988        20.0           161.           58.1
## 10  1992        20.5           160.           56.5
## 11  1996        20.5           160.           53.8
## 12  2000        17.8           158.           51  
## 13  2004        20.6           163.           57.3
## 14  2008        23.5           162.           54.9
## 15  2012        18.2           154.           48.2
## 16  2016        19.8           155.           51.8

Discussion: I chose this question because I wanted to practice filtering and summarizing by nationality. The code does what I intended, but the average height and weight are NA for some years. These NAs trace back to missing height and weight values in the original olympic_gymnasts data frame: mean() returns NA whenever a group contains a missing value. The missing measurements themselves cannot be recovered, although passing na.rm = TRUE to mean() would average only the recorded values.
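The NAs in the Question 5 table are consistent with how mean() handles missing values: by default, one NA in a group makes the whole result NA. The sketch below shows the effect of na.rm = TRUE on made-up data (toy_height and its values are assumptions for illustration, not the Canadian results). Counting the missing values next to the mean makes it clear how much data each average rests on.

```r
library(dplyr)

# Toy data, invented for illustration only
toy_height <- data.frame(
  year   = c(1972, 1972, 1984, 1984),
  height = c(NA, 160, 158, 162)
)

height_summary <- toy_height |>
  group_by(year) |>
  summarize(
    height_default = mean(height),                # NA if any value is missing
    height_na_rm   = mean(height, na.rm = TRUE),  # averages recorded values only
    n_missing      = sum(is.na(height))           # number of missing heights
  )

height_summary
```

If a year has no recorded heights at all, the na.rm = TRUE version returns NaN rather than a number, so those years would still need to be read as "unknown".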