Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )
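
The two case_when() branches above are complementary, so the medalist flag should be equivalent to !is.na(medal). A minimal sketch on a made-up data frame (toy and its values are hypothetical, not taken from the dataset):

```r
library(dplyr)

# Hypothetical toy data: medal is NA when no medal was won
toy <- data.frame(medal = c("Gold", NA, "Bronze", NA))

toy <- toy |>
  mutate(
    medalist_case = case_when(
      is.na(medal)  ~ FALSE,   # no medal
      !is.na(medal) ~ TRUE     # Gold, Silver, or Bronze
    ),
    medalist_short = !is.na(medal)   # same flag computed in one step
  )

# Check that both columns agree row by row
identical(toy$medalist_case, toy$medalist_short)
```

The provided chunk is left as given; this only illustrates what the new column contains.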

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
df

## # A tibble: 25,528 × 6
##    name                    sex     age team     year medalist
##    <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  7 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  8 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  9 Paavo Johannes Aaltonen M        32 Finland  1952 FALSE   
## 10 Paavo Johannes Aaltonen M        32 Finland  1952 TRUE    
## # ℹ 25,518 more rows

Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.

df2 <- df |>
  filter(year %in% c(2008, 2012, 2016))   # keep rows from 2008, 2012, and 2016
str(df2)

## tibble [2,703 × 6] (S3: tbl_df/tbl/data.frame)
##  $ name    : chr [1:2703] "Nstor Abad Sanjun" "Nstor Abad Sanjun" "Nstor Abad Sanjun" "Nstor Abad Sanjun" ...
##  $ sex     : chr [1:2703] "M" "M" "M" "M" ...
##  $ age     : num [1:2703] 23 23 23 23 23 23 25 25 25 25 ...
##  $ team    : chr [1:2703] "Spain" "Spain" "Spain" "Spain" ...
##  $ year    : num [1:2703] 2016 2016 2016 2016 2016 ...
##  $ medalist: logi [1:2703] FALSE FALSE FALSE FALSE FALSE FALSE ...

print(df2)

## # A tibble: 2,703 × 6
##    name              sex     age team     year medalist
##    <chr>             <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  2 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  3 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  4 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  5 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  6 Nstor Abad Sanjun M        23 Spain    2016 FALSE   
##  7 Katja Abel        F        25 Germany  2008 FALSE   
##  8 Katja Abel        F        25 Germany  2008 FALSE   
##  9 Katja Abel        F        25 Germany  2008 FALSE   
## 10 Katja Abel        F        25 Germany  2008 FALSE   
## # ℹ 2,693 more rows

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

group_df2 <- df2 |>
  group_by(year) |>                   # refer to the column by name, not df2$year
  summarize(mean_age = mean(age))

head(group_df2)

## # A tibble: 3 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  2008     21.6
## 2  2012     21.9
## 3  2016     22.2
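
Each row in the data is one athlete-event entry, so a gymnast who entered several events contributes several times to the mean above. If each athlete should count once per Games, a sketch like the following could be used. The toy data and the choice to de-duplicate on name, year, and age are assumptions for illustration, not part of the assignment.

```r
library(dplyr)

# Hypothetical toy data: athlete A entered three events, athlete B one
toy <- data.frame(
  name = c("A", "A", "A", "B"),
  age  = c(30, 30, 30, 20),
  year = c(2008, 2008, 2008, 2008)
)

# Row-weighted mean: athlete A counts three times
per_row <- toy |>
  group_by(year) |>
  summarize(mean_age = mean(age))

# Athlete-weighted mean: keep one row per athlete per year first
per_athlete <- toy |>
  distinct(name, year, age) |>
  group_by(year) |>
  summarize(mean_age = mean(age))

per_row
per_athlete
```

On this toy data the two means differ, which shows how repeated rows can shift the result.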

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>                   # refer to the column by name
  summarize(mean_of_all_age = mean(age))
head(oly_year)

## # A tibble: 6 × 2
##    year mean_of_all_age
##   <dbl>           <dbl>
## 1  1896            24.3
## 2  1900            22.2
## 3  1904            25.1
## 4  1906            24.7
## 5  1908            23.2
## 6  1912            24.2
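
The optional part of Question 4 (the minimum average age) is not answered above. One possible approach is slice_min() from dplyr, sketched here on a made-up stand-in for oly_year. The data frame oly_year_toy and its values are hypothetical; with the real data, the same call would be applied to the mean_of_all_age column of oly_year.

```r
library(dplyr)

# Hypothetical stand-in for oly_year (values made up for illustration)
oly_year_toy <- data.frame(
  year = c(2000, 2004, 2008),
  mean_of_all_age = c(21.5, 20.9, 22.3)
)

# Keep the row with the smallest mean age
youngest <- oly_year_toy |>
  slice_min(mean_of_all_age, n = 1)

youngest
```

By default slice_min() keeps ties, so more than one year could be returned if several share the minimum.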

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

Create a data frame that only has the team Japan. Then find the number of gold, silver, and bronze medals they won.

japan_medals <- olympic_gymnasts |>
  filter(team == "Japan") |>
  count(medal)
str(japan_medals)

## tibble [4 × 2] (S3: tbl_df/tbl/data.frame)
##  $ medal: chr [1:4] "Bronze" "Gold" "Silver" NA
##  $ n    : int [1:4] 54 65 47 1050

print(japan_medals)

## # A tibble: 4 × 2
##   medal      n
##   <chr>  <int>
## 1 Bronze    54
## 2 Gold      65
## 3 Silver    47
## 4 <NA>    1050

Discussion:

First, I piped the olympic_gymnasts dataset into a new data frame called japan_medals, which holds the number of bronze, gold, and silver medals Japan won in gymnastics. I used the filter() function to keep only the rows for team Japan. Then, using the pipe (|>), I passed the result to the count() function to count the rows for each medal value. The fourth row of the output (NA, 1,050) counts the entries where no medal was won.
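
The count() output above includes a row for NA, which is not one of the three medal types the question asks about. A minimal sketch of one way to drop it, assuming a second filter condition is acceptable (the toy data frame and its values are hypothetical):

```r
library(dplyr)

# Hypothetical toy data: medal is NA when no medal was won
toy <- data.frame(
  team  = c("Japan", "Japan", "Japan", "Japan", "Italy"),
  medal = c("Gold", NA, "Gold", "Silver", "Bronze")
)

japan_toy <- toy |>
  filter(team == "Japan", !is.na(medal)) |>   # keep Japan's medal-winning rows only
  count(medal)                                # one row per medal type

japan_toy
```

With the real data, the same two conditions would go inside the existing filter() call before count(medal).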