Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

# Start from df (not olympic_gymnasts), as the question asks
df2 <- df |>
  filter(year %in% c(2008, 2012, 2016))

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

# Summarise df2 without overwriting it, so df2 remains the Question 2 answer
df2 |>
  group_by(year) |>
  summarise(mean_age = mean(age))

## # A tibble: 3 × 2
##    year mean_age
##   <dbl>    <dbl>
## 1  2008     21.6
## 2  2012     21.9
## 3  2016     22.2

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarise(mean_age = mean(age))
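
For the optional part, one possible approach is sketched here. It assumes dplyr is loaded and uses a small made-up table, oly_year_demo, in place of the real oly_year, so the years and ages shown are hypothetical and are not results from the Olympic data.

```r
library(dplyr)

# Hypothetical stand-in for oly_year; these values are invented for illustration.
oly_year_demo <- tibble(
  year     = c(1996, 2000, 2004),
  mean_age = c(21.5, 20.8, 22.1)
)

# Keep the row with the smallest mean age (ties, if any, are all returned).
youngest_year <- oly_year_demo |>
  slice_min(mean_age, n = 1)

print(youngest_year)
```

On the real data, the same slice_min() call applied to oly_year should return the year with the lowest average age. If only the number is needed, min(oly_year$mean_age) is an alternative.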

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

Question 5: Using the olympic_gymnasts dataset, what is the average age of gymnasts who won a medal, grouped by sex?

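# Average age of medal-winning gymnasts, by sex
olympic_gymnasts |>
  filter(medalist == TRUE) |>        # keep only rows where a medal was won
  group_by(sex) |>                   # one group per sex
  summarise(avg_age = mean(age))     # mean age within each group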

## # A tibble: 2 × 2
##   sex   avg_age
##   <chr>   <dbl>
## 1 F        20.2
## 2 M        24.9

Discussion:

Using the olympic_gymnasts dataset, I found an average age of 20.2 years for female medalists and 24.9 years (about 25 when rounded) for male medalists. This suggests that female gymnasts tend to compete at the highest level at a much younger age than male gymnasts: on average, female medalists are about 5 years younger. The higher average for men suggests that male gymnasts tend to reach their peak performance when they are older, in contrast to female gymnasts, who reach it at younger ages.
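
A mean on its own does not show how many medalists are in each group or how spread out their ages are. The sketch below shows one way the summary could be extended to check that. It runs on a small invented table, gymnasts_demo, so the numbers are hypothetical. On the real data, my understanding is that each row is one athlete in one event, so a gymnast with several medals would be counted several times; I have not verified this here.

```r
library(dplyr)

# Hypothetical toy data standing in for olympic_gymnasts (values invented).
gymnasts_demo <- tibble(
  sex      = c("F", "F", "F", "M", "M", "M"),
  age      = c(16, 19, 22, 23, 25, 27),
  medalist = c(TRUE, TRUE, FALSE, TRUE, TRUE, TRUE)
)

# Same pipeline as Question 5, with extra summary columns added.
medal_age_summary <- gymnasts_demo |>
  filter(medalist) |>
  group_by(sex) |>
  summarise(
    n_medalists = n(),          # number of medal-winning rows per group
    avg_age     = mean(age),    # mean age, as in the original answer
    median_age  = median(age),  # less sensitive to a few very old or young athletes
    sd_age      = sd(age)       # spread of ages within the group
  )

print(medal_age_summary)
```

Replacing gymnasts_demo with olympic_gymnasts would give the same kind of table for the real data.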

For the coding procedure, I started with the exact name of the dataset, in this case olympic_gymnasts, followed by the pipe operator |>, which passes the data on to the next step so the code reads in logical order.

The question asks for the average age of gymnasts who won a medal, so the next step uses the verb filter() on medalist, the column that tells R whether a gymnast won a medal. Writing medalist == TRUE keeps only the rows for medal winners. Because medalist is already a TRUE/FALSE column, filter(medalist) would keep the same rows, but spelling out == TRUE makes the condition explicit. This leaves only the data I need, so the rest of the dataset does not get in the way. The line again ends with the pipe operator.

Third, group_by(sex), again followed by the pipe operator, splits the filtered rows into groups, one for each value of the sex column. It does not create a new column; it tells the following step to do its calculation separately for each group.

Lastly, the verb summarise() collapses each group into a single row. Inside the parentheses, avg_age is the name of the result column, the = assigns it a value, and the function mean() applied to age calculates the average age. The result is a small summary table taken from olympic_gymnasts, with one row per sex, showing the average age of female and male gymnasts who won a medal.
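
The two points about filter() and group_by() can be checked on a small example. This is only a sketch on an invented table, toy, whose names and ages are hypothetical. It assumes dplyr is loaded.

```r
library(dplyr)

# Hypothetical four-row table; the values are made up for illustration.
toy <- tibble(
  name     = c("A", "B", "C", "D"),
  sex      = c("F", "F", "M", "M"),
  age      = c(18, 22, 24, 26),
  medalist = c(TRUE, FALSE, TRUE, TRUE)
)

# For a logical column, both filters keep the same rows.
explicit  <- toy |> filter(medalist == TRUE)
implicit  <- toy |> filter(medalist)
same_rows <- identical(explicit$name, implicit$name)

# group_by() records the grouping but adds no columns and drops no rows.
grouped       <- toy |> group_by(sex)
same_shape    <- ncol(grouped) == ncol(toy) && nrow(grouped) == nrow(toy)
grouping_cols <- group_vars(grouped)

# summarise() then returns one row per group.
per_sex <- explicit |>
  group_by(sex) |>
  summarise(avg_age = mean(age))

print(same_rows)
print(same_shape)
print(grouping_cols)
print(per_sex)
```

In this toy table, the F group has one medalist aged 18 and the M group has two medalists aged 24 and 26, so per_sex has two rows.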