Do not change anything in the following chunk
You will be working on olympic_gymnasts dataset. Do not change the code below:
olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
olympic_gymnasts <- olympics %>%
filter(!is.na(age)) %>% # only keep athletes with known age
filter(sport == "Gymnastics") %>% # keep only gymnasts
mutate(
medalist = case_when( # add column for success in medaling
is.na(medal) ~ FALSE, # NA values go to FALSE
!is.na(medal) ~ TRUE # non-NA values (Gold, Silver, Bronze) go to TRUE
)
)
More information about the dataset can be found at
https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.
df<- olympic_gymnasts|>
select(name, sex, age, team, year, medalist)
head(df)
## # A tibble: 6 × 6
## name sex age team year medalist
## <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 2 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 3 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 4 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 5 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 6 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
Question 2: From df create df2 that only have year of 2008 2012, and 2016
df2 <- olympic_gymnasts |>
select(name, sex, age, team, year, medalist) |>
filter(year == c("2008", "2012", "2016"))
## Warning: There was 1 warning in `filter()`.
## ℹ In argument: `year == c("2008", "2012", "2016")`.
## Caused by warning in `year == c("2008", "2012", "2016")`:
## ! longer object length is not a multiple of shorter object length
head(df2)
## # A tibble: 6 × 6
## name sex age team year medalist
## <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 2 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 3 Katja Abel F 25 Germany 2008 FALSE
## 4 Denis Mikhaylovich Ablyazin M 19 Russia 2012 TRUE
## 5 Denis Mikhaylovich Ablyazin M 19 Russia 2012 FALSE
## 6 Denis Mikhaylovich Ablyazin M 24 Russia 2016 TRUE
Question 3 Group by these three years (2008,2012, and 2016) and summarize the mean of the age in each group.
q3 <- df2 |>
group_by(year) |>
summarize(mean = mean(age))
q3
## # A tibble: 3 × 2
## year mean
## <dbl> <dbl>
## 1 2008 21.7
## 2 2012 22.0
## 3 2016 22.2
Question 4 Use olympic_gymnasts dataset, group by year, and find the mean of the age for each year, call this dataset oly_year. (optional after creating the dataset, find the minimum average age)
oly_year <- olympic_gymnasts |>
group_by(year) |>
summarize(mean = mean(age))
head(oly_year)
## # A tibble: 6 × 2
## year mean
## <dbl> <dbl>
## 1 1896 24.3
## 2 1900 22.2
## 3 1904 25.1
## 4 1906 24.7
## 5 1908 23.2
## 6 1912 24.2
Question 5 This question is open ended. Create a question that requires you to use at least two verbs. Create a code that answers your question. Then below the chunk, reflect on your question choice and coding procedure
What is the ratio between the gold medals of males versus females?
# Your R code here
gold_medals <- olympic_gymnasts |>
select(sex, medal) |>
filter(medal == "Gold") |>
group_by(sex) |>
count(medal)
gold_medals
## # A tibble: 2 × 3
## # Groups: sex [2]
## sex medal n
## <chr> <chr> <int>
## 1 F Gold 234
## 2 M Gold 551
Discussion: I chose this question as I was curious myself on what the ratio between gold medals of males versus females is. I wanted to see how different the values would be for each gender. I named this tibble gold_medals as I am primarily focusing on the amount of gold medals each gender of gymnasts obtained at the Olympics. I selected the two variables of the Olympic gymnast data set that were needed in the scenario of the question. Then I filtered the medal selection to gold as well as grouped by sex and counted the medals. Using this code, I answered my question by observing that Male gymnasts had more than double the amount of gold medals than female gymnasts in the Olympics. Males had 551 gold medals compared to the females having 234 gold medals.