Do not change anything in the following chunk
You will be working on olympic_gymnasts dataset. Do not change the code below:
olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
olympic_gymnasts <- olympics %>%
filter(!is.na(age)) %>% # only keep athletes with known age
filter(sport == "Gymnastics") %>% # keep only gymnasts
mutate(
medalist = case_when( # add column for success in medaling
is.na(medal) ~ FALSE, # NA values go to FALSE
!is.na(medal) ~ TRUE # non-NA values (Gold, Silver, Bronze) go to TRUE
)
)
More information about the dataset can be found at
https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.
df <- olympic_gymnasts |>
select(name, sex, age, team, year, medalist)
Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.
df2 <- filter(df, year %in% c(2008, 2012, 2016))
Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.
df2 |>
group_by(year) |>
summarize(mean(age))
## # A tibble: 3 × 2
## year `mean(age)`
## <dbl> <dbl>
## 1 2008 21.6
## 2 2012 21.9
## 3 2016 22.2
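The backticked column name `mean(age)` in the output above appears because the summary was left unnamed. A minimal sketch of naming it, using a small made-up data frame (`toy`, its values, and the column name `mean_age` are illustrative assumptions, not taken from the olympics data):

```r
library(dplyr)

# Toy stand-in for df2; the values are made up for illustration only
toy <- data.frame(
  year = c(2008, 2008, 2012, 2012, 2016),
  age  = c(20, 22, 19, 23, 24)
)

# Naming the summary gives a plain column name instead of `mean(age)`
toy_means <- toy |>
  group_by(year) |>
  summarize(mean_age = mean(age))

toy_means
```

A named column is easier to refer to in later steps, such as sorting or plotting.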
Question 4: Using the olympic_gymnasts dataset, group by year and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)
oly_year <- olympic_gymnasts |>
group_by(year) |>
summarize(mean(age))
summary(oly_year)
## year mean(age)
## Min. :1896 Min. :19.87
## 1st Qu.:1924 1st Qu.:21.29
## Median :1960 Median :23.16
## Mean :1957 Mean :23.18
## 3rd Qu.:1988 3rd Qu.:24.76
## Max. :2016 Max. :27.83
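The summary above reports the minimum average age (19.87) but not the year it belongs to. One possible way to recover the year is `slice_min()` from dplyr. The sketch below uses a made-up data frame (`toy_year` and its values are illustrative) and assumes the summary column is named `mean_age`, which is not the name used in the code above:

```r
library(dplyr)

# Toy stand-in for oly_year; the values are made up for illustration only
toy_year <- data.frame(
  year     = c(1996, 2000, 2004),
  mean_age = c(21.5, 19.9, 22.3)
)

# slice_min() keeps the row(s) with the smallest mean_age,
# so the matching year is shown alongside the minimum
youngest <- toy_year |>
  slice_min(mean_age, n = 1)

youngest
```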
Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.
Create a new dataset called CAN_olympic_gymnasts by keeping only the Canadian gymnasts from the olympic_gymnasts dataset, and then summarize the average age, height, and weight of the new dataset by year.
# Keep only gymnasts whose team is Canada
CAN_olympic_gymnasts <- filter(olympic_gymnasts, team == "Canada")
CAN_olympic_gymnasts |>
group_by(year) |>
summarize(
average_age = mean(age),
average_height = mean(height),
average_weight = mean(weight)
)
## # A tibble: 16 × 4
## year average_age average_height average_weight
## <dbl> <dbl> <dbl> <dbl>
## 1 1908 20.5 NA NA
## 2 1956 19.2 NA NA
## 3 1960 19.4 161. 56.9
## 4 1964 24.7 168. 65.8
## 5 1968 21.9 167. 59.2
## 6 1972 20.4 NA NA
## 7 1976 19.2 NA NA
## 8 1984 20.5 160. 56.2
## 9 1988 20.0 161. 58.1
## 10 1992 20.5 160. 56.5
## 11 1996 20.5 160. 53.8
## 12 2000 17.8 158. 51
## 13 2004 20.6 163. 57.3
## 14 2008 23.5 162. 54.9
## 15 2012 18.2 154. 48.2
## 16 2016 19.8 155. 51.8
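Discussion: I chose this question because I wanted to try filtering and summarising based on nationality. I believe that I have achieved what I wanted, except that there are some unexpected NAs in the average height and weight columns. They appear because mean() returns NA whenever a group contains a missing value, and the missing heights and weights come from the original olympic_gymnasts data frame, so the values themselves cannot be recovered. Passing na.rm = TRUE to mean() would instead average only the values that are recorded.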
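To illustrate how missing values affect a grouped mean, here is a minimal sketch on a made-up data frame (`toy_can`, its values, and the extra column names are illustrative assumptions, not rows from the olympics data). It compares `mean()` with and without `na.rm = TRUE` and counts how many values were missing in each year:

```r
library(dplyr)

# Toy stand-in for CAN_olympic_gymnasts; the values are made up for illustration only
toy_can <- data.frame(
  year   = c(1972, 1972, 1984, 1984),
  height = c(160, NA, 158, 162)
)

toy_summary <- toy_can |>
  group_by(year) |>
  summarize(
    average_height    = mean(height),                # NA if any height in the year is missing
    average_height_rm = mean(height, na.rm = TRUE),  # averages only the recorded heights
    n_missing         = sum(is.na(height))           # number of missing heights in the year
  )

toy_summary
```

Reporting the count of missing values next to the average makes it clear how much data each mean is based on. If every value in a group is missing, `mean(..., na.rm = TRUE)` returns NaN rather than a number.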