Do not change anything in the following chunk
You will be working with the olympic_gymnasts dataset. Do not change the code below:
olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
olympic_gymnasts <- olympics %>%
filter(!is.na(age)) %>% # only keep athletes with known age
filter(sport == "Gymnastics") %>% # keep only gymnasts
mutate(
medalist = case_when( # add column for success in medaling
is.na(medal) ~ FALSE, # NA values go to FALSE
!is.na(medal) ~ TRUE # non-NA values (Gold, Silver, Bronze) go to TRUE
)
)
More information about the dataset can be found at
https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.
df <- olympic_gymnasts[, c("name", "sex", "age", "team", "year", "medalist")]
head(df)
## # A tibble: 6 × 6
## name sex age team year medalist
## <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 2 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 3 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 4 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 5 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 6 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
Question 2: From df, create df2 that only contains the years 2008, 2012, and 2016.
df2 <- df[df$year %in% c(2008, 2012, 2016), ]
head(df2)
## # A tibble: 6 × 6
## name sex age team year medalist
## <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 2 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 3 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 4 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 5 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 6 Nstor Abad Sanjun M 23 Spain 2016 FALSE
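The base R subsetting used in Questions 1 and 2 can also be written with dplyr verbs. The following is a minimal sketch on a small made-up data frame; the `toy` data, its values, and the names `toy_df` and `toy_df2` are hypothetical and are not taken from the Olympics dataset.

```r
library(dplyr)

# Hypothetical toy data standing in for olympic_gymnasts
toy <- data.frame(
  name     = c("A", "B", "C", "D"),
  sex      = c("F", "M", "F", "M"),
  age      = c(19, 24, 22, 27),
  team     = c("X", "Y", "X", "Z"),
  year     = c(2004, 2008, 2012, 2016),
  medalist = c(TRUE, FALSE, TRUE, FALSE),
  height   = c(150, 165, 155, 170)  # extra column that select() drops
)

# Question 1 equivalent: keep only the six columns of interest
toy_df <- toy |>
  select(name, sex, age, team, year, medalist)

# Question 2 equivalent: keep only rows from the three chosen years
toy_df2 <- toy_df |>
  filter(year %in% c(2008, 2012, 2016))
```

Both forms should give the same rows and columns; the dplyr version simply reads as a sequence of named steps.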
Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.
library(dplyr)
df |>
filter(year %in% c(2008, 2012, 2016)) |>
group_by(year) |>
summarize(mean_age = mean(age))
## # A tibble: 3 × 2
## year mean_age
## <dbl> <dbl>
## 1 2008 21.6
## 2 2012 21.9
## 3 2016 22.2
Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)
oly_year <- olympic_gymnasts |>
group_by(year) |>
summarize(mean_age = mean(age))
oly_year
## # A tibble: 29 × 2
## year mean_age
## <dbl> <dbl>
## 1 1896 24.3
## 2 1900 22.2
## 3 1904 25.1
## 4 1906 24.7
## 5 1908 23.2
## 6 1912 24.2
## 7 1920 26.7
## 8 1924 27.6
## 9 1928 25.6
## 10 1932 23.9
## # ℹ 19 more rows
oly_year |>
filter(mean_age == min(mean_age))
## # A tibble: 1 × 2
## year mean_age
## <dbl> <dbl>
## 1 1988 19.9
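As an alternative to filtering on `mean_age == min(mean_age)`, dplyr's `slice_min()` picks the row with the smallest value directly. This is a minimal sketch on a hypothetical three-row data frame (`toy_year` and its values are made up for illustration), not a rerun of the analysis above.

```r
library(dplyr)

# Hypothetical per-year summary standing in for oly_year
toy_year <- data.frame(
  year     = c(1980, 1984, 1988),
  mean_age = c(21.5, 20.8, 19.9)
)

# Keep the row with the smallest mean_age.
# By default (with_ties = TRUE) tied rows are all kept.
youngest <- toy_year |>
  slice_min(mean_age, n = 1)
```

Both approaches return every tied row by default, so the choice is mostly a matter of readability.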
Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.
My question: Which gymnasts competed in 2016? Filter the gymnasts from the year 2016 and show their names.
gymnasts_2016 <- olympic_gymnasts |>
filter(year == 2016)
gymnasts_2016
## # A tibble: 861 × 16
## id name sex age height weight team noc games year season city
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
## 2 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
## 3 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
## 4 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
## 5 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
## 6 51 Nstor A… M 23 167 64 Spain ESP 2016… 2016 Summer Rio …
## 7 455 Denis M… M 24 161 62 Russ… RUS 2016… 2016 Summer Rio …
## 8 455 Denis M… M 24 161 62 Russ… RUS 2016… 2016 Summer Rio …
## 9 455 Denis M… M 24 161 62 Russ… RUS 2016… 2016 Summer Rio …
## 10 455 Denis M… M 24 161 62 Russ… RUS 2016… 2016 Summer Rio …
## # ℹ 851 more rows
## # ℹ 4 more variables: sport <chr>, event <chr>, medal <chr>, medalist <lgl>
gymnasts_names <- gymnasts_2016 |>
select(name)
gymnasts_names
## # A tibble: 861 × 1
## name
## <chr>
## 1 Nstor Abad Sanjun
## 2 Nstor Abad Sanjun
## 3 Nstor Abad Sanjun
## 4 Nstor Abad Sanjun
## 5 Nstor Abad Sanjun
## 6 Nstor Abad Sanjun
## 7 Denis Mikhaylovich Ablyazin
## 8 Denis Mikhaylovich Ablyazin
## 9 Denis Mikhaylovich Ablyazin
## 10 Denis Mikhaylovich Ablyazin
## # ℹ 851 more rows
Discussion:
My question concerns the athletes who competed in 2016 and their names. I used the filter() verb to keep only the rows where the year is 2016. Then I used select() to keep only the name column for those gymnasts. This way I work with a specific part of the dataset.
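The printed result repeats each name several times, which suggests the data holds one row per athlete per event. If one row per gymnast is wanted, `distinct()` can be added as a third verb. This is a minimal sketch on hypothetical data (`toy_events`, its values, and `unique_names` are made up for illustration).

```r
library(dplyr)

# Hypothetical athlete-event rows: each athlete appears once per event
toy_events <- data.frame(
  name  = c("A", "A", "A", "B", "B"),
  year  = c(2016, 2016, 2012, 2016, 2016),
  event = c("Floor", "Vault", "Floor", "Rings", "Vault")
)

# Filter to 2016, then keep each name only once
unique_names <- toy_events |>
  filter(year == 2016) |>
  distinct(name)
```

Applied to the real data, the same pipeline on `olympic_gymnasts` would be expected to return fewer than 861 rows, since repeated names collapse to one.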