Do not change anything in the following chunk
You will be working on olympic_gymnasts dataset. Do not change the code below:
olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
olympic_gymnasts <- olympics %>%
filter(!is.na(age)) %>% # only keep athletes with known age
filter(sport == "Gymnastics") %>% # keep only gymnasts
mutate(
medalist = case_when( # add column for success in medaling
is.na(medal) ~ FALSE, # NA values go to FALSE
!is.na(medal) ~ TRUE # non-NA values (Gold, Silver, Bronze) go to TRUE
)
)
More information about the dataset can be found at
https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.
df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
df
## # A tibble: 25,528 × 6
## name sex age team year medalist
## <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 2 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 3 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 4 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 5 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 6 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 7 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 8 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 9 Paavo Johannes Aaltonen M 32 Finland 1952 FALSE
## 10 Paavo Johannes Aaltonen M 32 Finland 1952 TRUE
## # ℹ 25,518 more rows
Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.
df2 <- df |>
  # %in% tests membership; `==` would recycle the vector and silently drop matching rows
  filter(year %in% c(2008, 2012, 2016))
df2
## # A tibble: 886 × 6
## name sex age team year medalist
## <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 2 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 3 Katja Abel F 25 Germany 2008 FALSE
## 4 Denis Mikhaylovich Ablyazin M 19 Russia 2012 TRUE
## 5 Denis Mikhaylovich Ablyazin M 19 Russia 2012 FALSE
## 6 Denis Mikhaylovich Ablyazin M 24 Russia 2016 TRUE
## 7 Denis Mikhaylovich Ablyazin M 24 Russia 2016 TRUE
## 8 Andreea Roxana Acatrinei F 16 Romania 2008 TRUE
## 9 Jonna Eva-Maj Adlerteg F 17 Sweden 2012 FALSE
## 10 Kseniya Dmitriyevna Afanasyeva F 16 Russia 2008 FALSE
## # ℹ 876 more rows
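The difference between `==` and `%in%` matters when filtering on several values. The following is a minimal sketch on a made-up tibble (the `toy` data is hypothetical and not part of the Olympics dataset). `==` recycles the right-hand vector position by position, so a row is kept only when its year happens to line up with the recycled value, while `%in%` tests whether each year is any of the listed values.

```r
library(dplyr)

# Hypothetical toy data: six rows, two per year
toy <- tibble(
  id   = 1:6,
  year = c(2008, 2008, 2012, 2012, 2016, 2016)
)

# `==` recycles c(2008, 2012, 2016) to 2008, 2012, 2016, 2008, 2012, 2016,
# so a row survives only if its year equals the recycled value at that position
by_equals <- toy |> filter(year == c(2008, 2012, 2016))

# `%in%` keeps every row whose year is one of the three values
by_in <- toy |> filter(year %in% c(2008, 2012, 2016))

nrow(by_equals) # fewer rows than expected
nrow(by_in)     # all six rows
```

Because the vector length divides the number of rows evenly here, R gives no recycling warning, which is why this mistake is easy to miss.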
Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.
df2 |>
group_by(year) |>
summarize(
mean_age = mean(age)
)
## # A tibble: 3 × 2
## year mean_age
## <dbl> <dbl>
## 1 2008 21.7
## 2 2012 22.0
## 3 2016 22.2
Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)
oly_year <- olympic_gymnasts |>
group_by(year) |>
summarize(
mean_age = mean(age),
min_age = min(age)
)
oly_year
## # A tibble: 29 × 3
## year mean_age min_age
## <dbl> <dbl> <dbl>
## 1 1896 24.3 10
## 2 1900 22.2 17
## 3 1904 25.1 18
## 4 1906 24.7 14
## 5 1908 23.2 16
## 6 1912 24.2 18
## 7 1920 26.7 17
## 8 1924 27.6 19
## 9 1928 25.6 11
## 10 1932 23.9 15
## # ℹ 19 more rows
mean(oly_year$min_age)
## [1] 14.58621
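The optional part of Question 4 asks for the minimum average age, which is the smallest of the per-year means. That is a different quantity from the mean of the per-year minimum ages. A minimal sketch on hypothetical data (the `toy_gymnasts` tibble is invented for illustration) shows one way to get it, along with the year in which it occurs:

```r
library(dplyr)

# Hypothetical toy data: two athletes per year
toy_gymnasts <- tibble(
  year = c(2000, 2000, 2004, 2004, 2008, 2008),
  age  = c(20, 24, 18, 20, 25, 27)
)

# Mean age per year, as in oly_year
toy_year <- toy_gymnasts |>
  group_by(year) |>
  summarize(mean_age = mean(age))

# Smallest of the yearly averages
min_avg <- min(toy_year$mean_age)

# The row (year and mean) where that smallest average occurs
youngest_year <- toy_year |>
  slice_min(mean_age, n = 1)

min_avg
youngest_year
```

On the real data, the same idea would be `min(oly_year$mean_age)`.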
Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.
Find the average physical characteristics of Olympic gold, silver, and bronze medalists.
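# Average age, height, and weight per medal type, medal-winning rows only
medalists <- olympic_gymnasts |>
  filter(medalist) |> # medalist is already logical, so no comparison is needed
  group_by(medal) |>
  summarize(
    n = n(),
    mean_age = mean(age),
    mean_height = mean(height, na.rm = TRUE), # height and weight contain NAs
    mean_weight = mean(weight, na.rm = TRUE)
  )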
medalists
## # A tibble: 3 × 5
## medal n mean_age mean_height mean_weight
## <chr> <int> <dbl> <dbl> <dbl>
## 1 Bronze 675 23.2 162. 55.7
## 2 Gold 785 23.6 161. 54.7
## 3 Silver 727 23.4 161. 54.9
Discussion:
I chose this question to see whether physical characteristics are related to which medal a medalist receives. To investigate, I filtered the olympic_gymnasts dataset to keep only rows where a medal was earned and grouped the data by medal type. I then calculated the average age, height, and weight within each medal group. In the table, the three groups are nearly identical: mean age differs by at most about 0.4 years, mean height by a centimeter or two, and mean weight by about 1 kg. I did not run a formal test, so I cannot claim statistical significance either way, but the averages give no indication that these characteristics separate gold, silver, and bronze medalists.
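Whether differences between group means are statistically significant cannot be read off a table of means alone. One common way to check is a one-way ANOVA. The following is a minimal sketch on simulated data (the `sim` data frame is hypothetical, and its numbers are not results from the Olympics dataset), using base R's `aov()`:

```r
set.seed(1)

# Hypothetical simulated heights: all three medal groups drawn from the same distribution
sim <- data.frame(
  medal  = rep(c("Bronze", "Gold", "Silver"), each = 50),
  height = rnorm(150, mean = 161, sd = 8)
)

# One-way ANOVA: does mean height differ across medal groups?
fit <- aov(height ~ medal, data = sim)

# Extract the p-value for the medal term from the ANOVA table
p_value <- summary(fit)[[1]][["Pr(>F)"]][1]
p_value
```

On the real data, the analogous call would be `aov(height ~ medal, data = filter(olympic_gymnasts, medalist))`. Note that the same athlete can appear in several rows, so the observations are not fully independent and any p-value should be read with caution.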