Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
head(df)

## # A tibble: 6 × 6
##   name                    sex     age team     year medalist
##   <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
## 1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
## 4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
## 5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
## 6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE

Question 2: From df, create df2 that only contains the years 2008, 2012, and 2016.

df2 <- df |>
  # %in% checks each year for membership in the set; `==` would recycle the
  # three values down the column and keep only part of the matching rows
  filter(year %in% c(2008, 2012, 2016))

# Note: the printed results shown for Questions 2 and 3 were generated with
# the recycled `==` comparison and should be regenerated from this filter.
head(df2)

## # A tibble: 6 × 6
##   name                        sex     age team     year medalist
##   <chr>                       <chr> <dbl> <chr>   <dbl> <lgl>   
## 1 Nstor Abad Sanjun           M        23 Spain    2016 FALSE   
## 2 Nstor Abad Sanjun           M        23 Spain    2016 FALSE   
## 3 Katja Abel                  F        25 Germany  2008 FALSE   
## 4 Denis Mikhaylovich Ablyazin M        19 Russia   2012 TRUE    
## 5 Denis Mikhaylovich Ablyazin M        19 Russia   2012 FALSE   
## 6 Denis Mikhaylovich Ablyazin M        24 Russia   2016 TRUE
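
Comparing a column against a multi-element vector with `==` recycles the shorter vector element by element, while `%in%` tests each value for membership in the set. The difference can be seen in a minimal base R sketch; the `years` and `targets` vectors are hypothetical toy data, not values taken from the olympics dataset.

```r
# Hypothetical toy data, not taken from the olympics dataset
years   <- c(2008, 2008, 2008, 2012, 2012, 2016, 2004)
targets <- c(2008, 2012, 2016)

# `==` recycles `targets` along `years`: position i is compared with
# targets[(i - 1) %% 3 + 1], so many true matches are missed.
# The length-mismatch warning is silenced here only to keep the demo quiet.
recycled <- suppressWarnings(years == targets)

# `%in%` asks, for each year, whether it appears anywhere in `targets`
membership <- years %in% targets

sum(recycled)    # rows that `==` would keep
sum(membership)  # rows that `%in%` would keep
```

In this toy case `%in%` keeps every 2008, 2012, and 2016 entry, while `==` keeps only the entries that happen to line up with the recycled pattern.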

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

df2 |>
  group_by(year) |>
  summarize(mean = mean(age))

## # A tibble: 3 × 2
##    year  mean
##   <dbl> <dbl>
## 1  2008  21.7
## 2  2012  22.0
## 3  2016  22.2
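
As a cross-check on a grouped summary, it can help to report how many rows feed each mean. The following is a minimal sketch on a hypothetical toy tibble; the values are invented, and `toy`, `n_rows`, and `mean_age` are illustrative names that do not appear in the assignment.

```r
library(dplyr)

# Hypothetical toy data standing in for df2; values are invented
toy <- tibble(
  year = c(2008, 2008, 2012, 2012, 2012, 2016),
  age  = c(20, 22, 19, 21, 23, 24)
)

toy_summary <- toy |>
  group_by(year) |>
  summarize(
    n_rows   = n(),        # rows contributing to each group
    mean_age = mean(age)   # average age within the group
  )

toy_summary
```

A group with an unexpectedly small `n_rows` is a hint that an earlier filter dropped more rows than intended.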

Question 4: Using the olympic_gymnasts dataset, group by year and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarize(mean = mean(age)) |>
  arrange(mean)
head(oly_year)

## # A tibble: 6 × 2
##    year  mean
##   <dbl> <dbl>
## 1  1988  19.9
## 2  1992  20.0
## 3  1980  20.1
## 4  1996  20.3
## 5  1984  20.4
## 6  1976  20.5

# The minimum average age is 19.866 years, reached in 1988.
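
Sorting and reading the first row is one way to find the minimum; another option is dplyr's `slice_min()`, which returns the row with the smallest value directly. A minimal sketch, assuming a hypothetical `toy_year` tibble with invented values in place of `oly_year`:

```r
library(dplyr)

# Hypothetical yearly averages; values are invented for illustration
toy_year <- tibble(
  year = c(2000, 2004, 2008, 2012),
  mean = c(21.5, 20.2, 22.0, 20.8)
)

# slice_min() keeps the row(s) with the smallest value of the `mean` column
youngest <- toy_year |>
  slice_min(mean, n = 1)

youngest
```

If several years tie for the minimum, `slice_min()` returns all of them by default.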

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

Question: Who (name variable) won the most Gold medals (medal variable)?

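# Count gold medals per gymnast and sort from most to fewest
gold <- olympic_gymnasts |>
  group_by(name) |>
  filter(!is.na(medal)) |>      # drop rows with no medal
  filter(medal == "Gold") |>    # keep gold medals only
  count(medal) |>               # gold-medal rows per gymnast
  arrange(desc(n))              # most golds first
head(gold)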

## # A tibble: 6 × 3
## # Groups:   name [6]
##   name                               medal     n
##   <chr>                              <chr> <int>
## 1 Larysa Semenivna Latynina (Diriy-) Gold      9
## 2 Sawao Kato                         Gold      8
## 3 Borys Anfiyanovych Shakhlin        Gold      7
## 4 Nikolay Yefimovich Andrianov       Gold      7
## 5 Viktor Ivanovych Chukarin          Gold      7
## 6 Vra slavsk (-Odloilov)             Gold      7

Discussion:

Larysa Semenivna Latynina (Diriy-) won the most gold medals, with 9. In my code, I grouped by gymnast name, removed the rows with a missing medal value, kept only the rows where medal is “Gold”, counted the remaining rows for each gymnast, and sorted the counts in descending order. I chose this question out of curiosity: I wanted to see which gymnast in this dataset had the most gold medals.
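
The grouping, filtering, counting, and sorting steps described above can also be written more compactly, since `filter()` already drops rows where the condition evaluates to NA and `count()` accepts `sort = TRUE`. A minimal sketch on a hypothetical `toy_medals` tibble with invented names and medals:

```r
library(dplyr)

# Hypothetical toy data; names and medals are invented
toy_medals <- tibble(
  name  = c("A", "A", "B", "B", "B", "C"),
  medal = c("Gold", "Silver", "Gold", "Gold", NA, "Gold")
)

gold_counts <- toy_medals |>
  filter(medal == "Gold") |>   # also drops rows where medal is NA
  count(name, sort = TRUE)     # one row per name, largest count first

gold_counts
```

Unlike the chunk above, this version returns an ungrouped tibble without the medal column.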