Homework 3

Do not change anything in the following chunk

You will be working with the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
df

Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.

df2 <- df |>
  filter(year %in% c(2008, 2012, 2016))
df2

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean of the age in each group.

df2 |>
  group_by(year) |>               # df2 already holds only 2008, 2012, and 2016
  summarise(mean_age = mean(age)) # one mean age per year
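
Grouping by a logical test such as `year %in% c(2008, 2012, 2016)` produces only two groups (FALSE and TRUE), not one group per year. A minimal sketch on made-up data, assuming dplyr is installed; the `toy` data frame and its values are hypothetical and are not the Olympic records:

```r
library(dplyr)

# Hypothetical toy data standing in for df (values are invented)
toy <- data.frame(
  year = c(2004, 2008, 2008, 2012, 2016),
  age  = c(20, 18, 22, 24, 26)
)

# Grouping by the logical test: one row for FALSE, one row for TRUE
by_flag <- toy |>
  group_by(in_range = year %in% c(2008, 2012, 2016)) |>
  summarise(mean_age = mean(age))

# Filtering first, then grouping by year: one row per year
by_year <- toy |>
  filter(year %in% c(2008, 2012, 2016)) |>
  group_by(year) |>
  summarise(mean_age = mean(age))

by_flag
by_year
```

The second form is the one that answers "mean age in each of the three years", because the years themselves are the grouping variable.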

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean of the age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarise(mean_age = mean(age))   # name the column so it is easy to refer to later
oly_year
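
For the optional part, one possible approach is `slice_min()`, which keeps the row with the smallest value of a column. This is only a sketch on made-up numbers: `toy_year` stands in for oly_year, and the column name `mean_age` is an assumption chosen for illustration.

```r
library(dplyr)

# Hypothetical yearly averages standing in for oly_year (values are invented)
toy_year <- data.frame(
  year     = c(2008, 2012, 2016),
  mean_age = c(21.5, 22.0, 20.8)
)

# Keep the row with the lowest average age
youngest <- toy_year |>
  slice_min(mean_age, n = 1)

youngest

# If only the number is needed, base R's min() also works
min(toy_year$mean_age)
```

`slice_min()` has the advantage of returning the year alongside the minimum, while `min()` returns just the value.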

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.

My Question: Using the olympic_gymnasts dataset, filter for the gymnasts who won a gold medal, count the total number of gold medals for each team (country), and then display only the top 10 countries with the most gold medals.

oly_teams_gold_medals <- olympic_gymnasts |>
  filter(medal == "Gold") |>                # keep only rows where a gold medal was won
  # count() tallies the gold-medal rows for each team across all years;
  # `name =` sets the name of the resulting count column
  count(team, name = "Gold_Medal_Total") |>
  arrange(desc(Gold_Medal_Total)) |>        # sort teams from highest to lowest total
  slice_head(n = 10)                        # keep the top 10 teams by gold medals

oly_teams_gold_medals
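
One thing to keep in mind when reading these totals: the dataset appears to have one row per athlete per event, so a gold medal in a team event would be counted once for every team member. A hedged sketch of how the two ways of counting differ, using made-up rows (`toy_rows` and its values are hypothetical; the column names `team`, `year`, `event`, and `medal` are assumed to match the dataset):

```r
library(dplyr)

# Hypothetical rows: team A wins one team event (two members) and one individual event
toy_rows <- data.frame(
  team  = c("A", "A", "A", "B"),
  year  = c(2012, 2012, 2012, 2012),
  event = c("Team All-Around", "Team All-Around", "Vault", "Floor"),
  medal = c("Gold", "Gold", "Gold", "Gold")
)

# Counting rows: every athlete's gold medal is counted
per_athlete <- toy_rows |>
  filter(medal == "Gold") |>
  count(team, name = "Gold_Medal_Total")

# Counting events won: collapse team members to one row per team, year, and event
per_event <- toy_rows |>
  filter(medal == "Gold") |>
  distinct(team, year, event) |>
  count(team, name = "Gold_Medal_Total")

per_athlete
per_event
```

Either count can be the right answer; it depends on whether the question is about medals handed to athletes or about events won by a country.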

Discussion: Running this code shows that the top ten countries are Soviet Union, Sweden, Italy, Japan, United States, China, Germany, Norway, Romania, and Switzerland. At first I displayed only the top 5, and I was shocked to see that the U.S. had more gold medals than China, so I decided to display the top 10 to see whether China even made it onto the list. Keep in mind that the data only covers the years 1896-2016, so these numbers differ from how many gold medals each country has accumulated to date, but the result is still very interesting to see.

I had some trouble while coding my question. I changed my question three times because I couldn’t figure out how to code the earlier versions. I also struggled because I had forgotten that the count() function existed: I was using group_by() and summarise(), and my code wasn’t running correctly. Once I figured out that count() can do both steps for me (grouping the rows by team and totaling the gold medals for each country), I felt successful. Overall, this was a great learning experience that made me think a lot.
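
For reference, `count()` is shorthand for the group_by() plus summarise() pattern mentioned in the discussion. A minimal sketch on made-up data showing that the two forms give the same totals (the `toy_medals` data frame is hypothetical):

```r
library(dplyr)

# Hypothetical medal rows (values are invented)
toy_medals <- data.frame(
  team  = c("A", "A", "B", "A", "B", "C"),
  medal = c("Gold", "Gold", "Gold", "Silver", NA, "Gold")
)

# Short form: count() groups by team and tallies the rows
with_count <- toy_medals |>
  filter(medal == "Gold") |>
  count(team, name = "Gold_Medal_Total")

# Long form: group_by() then summarise() with n() to count rows per group
with_summarise <- toy_medals |>
  filter(medal == "Gold") |>
  group_by(team) |>
  summarise(Gold_Medal_Total = n())

with_count
with_summarise
```

Note that `filter(medal == "Gold")` also drops rows where `medal` is NA, since the comparison evaluates to NA for those rows.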