Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )
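Both branches of the case_when() above depend only on whether medal is missing, so the same column could also be written as !is.na(medal). The sketch below only illustrates that equivalence on a small made-up tibble (toy is a hypothetical name); it does not replace the required chunk.

```r
library(dplyr)

# Hypothetical toy data standing in for the medal column
toy <- tibble(medal = c("Gold", NA, "Bronze", NA))

toy <- toy |>
  mutate(
    medalist_case = case_when(       # same two branches as the chunk above
      is.na(medal) ~ FALSE,
      !is.na(medal) ~ TRUE
    ),
    medalist_short = !is.na(medal)   # same logic in a single expression
  )

# Check that both columns agree row by row
identical(toy$medalist_case, toy$medalist_short)
```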

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- data.frame(
  name= (olympic_gymnasts$name),
  sex= (olympic_gymnasts$sex),
  age= (olympic_gymnasts$age),
  team= (olympic_gymnasts$team),
  year= (olympic_gymnasts$year),
  medalist= (olympic_gymnasts$medalist)
)

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)

df

## # A tibble: 25,528 × 4
##    name                    sex     age  year
##    <chr>                   <chr> <dbl> <dbl>
##  1 Paavo Johannes Aaltonen M        28  1948
##  2 Paavo Johannes Aaltonen M        28  1948
##  3 Paavo Johannes Aaltonen M        28  1948
##  4 Paavo Johannes Aaltonen M        28  1948
##  5 Paavo Johannes Aaltonen M        28  1948
##  6 Paavo Johannes Aaltonen M        28  1948
##  7 Paavo Johannes Aaltonen M        28  1948
##  8 Paavo Johannes Aaltonen M        28  1948
##  9 Paavo Johannes Aaltonen M        32  1952
## 10 Paavo Johannes Aaltonen M        32  1952
## # ℹ 25,518 more rows
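The same athlete fills many rows of the printed output, which suggests the data holds one row per athlete per event. If one row per athlete per Olympic year is wanted, distinct() can collapse the repeats. This is a minimal sketch and not part of the assignment; df_head and df_unique are hypothetical names, and the rows are copied from the printed output above.

```r
library(dplyr)

# Rows copied from the printed output: 8 entries for 1948, 2 for 1952
df_head <- tibble(
  name = rep("Paavo Johannes Aaltonen", 10),
  sex  = rep("M", 10),
  age  = c(rep(28, 8), rep(32, 2)),
  year = c(rep(1948, 8), rep(1952, 2))
)

# Keep the first row for each athlete and Olympic year
df_unique <- df_head |>
  distinct(name, year, .keep_all = TRUE)

df_unique
```

Whether to de-duplicate depends on the question being asked: a mean age computed over all rows weights athletes by the number of events they entered.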

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

df2 <- df |> 
  filter(year %in% c(2008, 2012, 2016))
           
df2

## # A tibble: 2,703 × 4
##    name              sex     age  year
##    <chr>             <chr> <dbl> <dbl>
##  1 Nstor Abad Sanjun M        23  2016
##  2 Nstor Abad Sanjun M        23  2016
##  3 Nstor Abad Sanjun M        23  2016
##  4 Nstor Abad Sanjun M        23  2016
##  5 Nstor Abad Sanjun M        23  2016
##  6 Nstor Abad Sanjun M        23  2016
##  7 Katja Abel        F        25  2008
##  8 Katja Abel        F        25  2008
##  9 Katja Abel        F        25  2008
## 10 Katja Abel        F        25  2008
## # ℹ 2,693 more rows
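The filter above relies on %in% rather than ==. The sketch below, on made-up data (years_toy is a hypothetical name), shows why: == compares element by element and recycles the shorter vector, so it can silently drop rows that %in% keeps.

```r
library(dplyr)

# Hypothetical toy data: five rows fall in the target years, one does not
years_toy <- tibble(year = c(2016, 2008, 2012, 2004, 2016, 2008))

target <- c(2008, 2012, 2016)

# Set membership: keeps every row whose year is one of the targets
with_in <- years_toy |> filter(year %in% target)

# Element-wise comparison: target is recycled along the column, so a row
# is kept only if its year happens to line up with the recycled position
with_eq <- years_toy |> filter(year == target)

nrow(with_in)
nrow(with_eq)
```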

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

df2 <- df |> 
  filter(year %in% c(2008, 2012, 2016)) |>
  group_by(year) |>
  summarize(meanyears = mean(age))
  
df2

## # A tibble: 3 × 2
##    year meanyears
##   <dbl>     <dbl>
## 1  2008      21.6
## 2  2012      21.9
## 3  2016      22.2

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarize(meanyears = mean(age))

oly_year

## # A tibble: 29 × 2
##     year meanyears
##    <dbl>     <dbl>
##  1  1896      24.3
##  2  1900      22.2
##  3  1904      25.1
##  4  1906      24.7
##  5  1908      23.2
##  6  1912      24.2
##  7  1920      26.7
##  8  1924      27.6
##  9  1928      25.6
## 10  1932      23.9
## # ℹ 19 more rows
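For the optional part of the question, one possible approach is slice_min(), which returns the row with the smallest value of a column. The sketch below uses only the ten rounded rows printed above (oly_year_head is a hypothetical name), so its result is not the answer for the full 29-year summary; the same call would need to be run on the complete per-year summary.

```r
library(dplyr)

# First ten rows of the printed output only (rounded values);
# the remaining 19 years are not included here
oly_year_head <- tibble(
  year      = c(1896, 1900, 1904, 1906, 1908, 1912, 1920, 1924, 1928, 1932),
  meanyears = c(24.3, 22.2, 25.1, 24.7, 23.2, 24.2, 26.7, 27.6, 25.6, 23.9)
)

# Row with the smallest mean age among these ten years
youngest <- oly_year_head |>
  slice_min(meanyears, n = 1)

youngest
```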

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

Question: Which male Olympic gymnasts won a gold medal, and what were their weights in pounds? Create a new column with their weights in pounds.

# Your R code here

df <- olympic_gymnasts |>
  select(name, medal, sex, weight) |>
  filter(medal == "Gold", sex == "M") |>   # male gold medalists only
  mutate(weight_in_lbs = weight * 2.2)     # kg to lb, about 2.2 lb per kg

df

## # A tibble: 551 × 5
##    name                             medal sex   weight weight_in_lbs
##    <chr>                            <chr> <chr>  <dbl>         <dbl>
##  1 Paavo Johannes Aaltonen          Gold  M         64          141.
##  2 Paavo Johannes Aaltonen          Gold  M         64          141.
##  3 Paavo Johannes Aaltonen          Gold  M         64          141.
##  4 Isak Abrahamsen                  Gold  M         NA           NA 
##  5 Fausto Alesio Acke (Padovini-)   Gold  M         NA           NA 
##  6 Nobuyuki Aihara                  Gold  M         53          117.
##  7 Nobuyuki Aihara                  Gold  M         53          117.
##  8 Georg Albert Christian Albertsen Gold  M         NA           NA 
##  9 Carl Albert Andersen             Gold  M         NA           NA 
## 10 Carl Rudolf Svend Andersen       Gold  M         NA           NA 
## # ℹ 541 more rows

Discussion: Enter your discussion of results here.

The code starts from the olympic_gymnasts data frame and stores the result in df. I first select the columns name, medal, sex, and weight, then filter for rows where sex is "M" and medal is "Gold", so that only male gold medalists remain. Finally, I create a new column, weight_in_lbs, that converts each weight to pounds by multiplying it by 2.2; this assumes the weight column is recorded in kilograms. Athletes with no recorded weight stay NA in the new column, and athletes who won more than one gold appear once per medal.
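The conversion described above can be tightened in two ways: using a more precise factor than 2.2, and listing each athlete once instead of once per gold medal. The following is a sketch under the same assumption that weight is in kilograms; gold_men, gold_men_lbs, and KG_TO_LB are hypothetical names, and the rows are copied from the printed output above.

```r
library(dplyr)

# A few rows copied from the printed output (weights assumed to be in kg)
gold_men <- tibble(
  name   = c("Paavo Johannes Aaltonen", "Paavo Johannes Aaltonen",
             "Isak Abrahamsen", "Nobuyuki Aihara"),
  weight = c(64, 64, NA, 53)
)

KG_TO_LB <- 2.20462   # pounds per kilogram, more precise than 2.2

gold_men_lbs <- gold_men |>
  distinct(name, weight) |>                    # one row per athlete
  filter(!is.na(weight)) |>                    # drop athletes with no recorded weight
  mutate(weight_in_lbs = weight * KG_TO_LB)    # convert kg to lb

gold_men_lbs
```

Dropping the NA rows is a choice, not a requirement: keeping them shows which gold medalists have no recorded weight.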