Do not change anything in the following chunk
You will be working on olympic_gymnasts dataset. Do not change the code below:
olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
olympic_gymnasts <- olympics %>%
filter(!is.na(age)) %>% # only keep athletes with known age
filter(sport == "Gymnastics") %>% # keep only gymnasts
mutate(
medalist = case_when( # add column for success in medaling
is.na(medal) ~ FALSE, # NA values go to FALSE
!is.na(medal) ~ TRUE # non-NA values (Gold, Silver, Bronze) go to TRUE
)
)
More information about the dataset can be found at
https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.
head(olympic_gymnasts)
## # A tibble: 6 × 16
## id name sex age height weight team noc games year season city
## <dbl> <chr> <chr> <dbl> <dbl> <dbl> <chr> <chr> <chr> <dbl> <chr> <chr>
## 1 17 Paavo Jo… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
## 2 17 Paavo Jo… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
## 3 17 Paavo Jo… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
## 4 17 Paavo Jo… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
## 5 17 Paavo Jo… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
## 6 17 Paavo Jo… M 28 175 64 Finl… FIN 1948… 1948 Summer Lond…
## # ℹ 4 more variables: sport <chr>, event <chr>, medal <chr>, medalist <lgl>
df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
df
## # A tibble: 25,528 × 6
## name sex age team year medalist
## <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 2 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 3 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 4 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 5 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 6 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 7 Paavo Johannes Aaltonen M 28 Finland 1948 FALSE
## 8 Paavo Johannes Aaltonen M 28 Finland 1948 TRUE
## 9 Paavo Johannes Aaltonen M 32 Finland 1952 FALSE
## 10 Paavo Johannes Aaltonen M 32 Finland 1952 TRUE
## # ℹ 25,518 more rows
Question 2: From df, create df2 that only has the years 2008, 2012, and 2016.
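df2 <- olympic_gymnasts |>
  # Caution: `==` recycles the three years down the rows (hence the warning below),
  # so some matching rows are missed; `year %in% c(2008, 2012, 2016)` tests membership instead.
  filter(year == c(2008, 2012, 2016)) |>
  select(name, sex, age, team, year, medalist)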
## Warning: There was 1 warning in `filter()`.
## ℹ In argument: `year == c(2008, 2012, 2016)`.
## Caused by warning in `year == c(2008, 2012, 2016)`:
## ! longer object length is not a multiple of shorter object length
df2
## # A tibble: 886 × 6
## name sex age team year medalist
## <chr> <chr> <dbl> <chr> <dbl> <lgl>
## 1 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 2 Nstor Abad Sanjun M 23 Spain 2016 FALSE
## 3 Katja Abel F 25 Germany 2008 FALSE
## 4 Denis Mikhaylovich Ablyazin M 19 Russia 2012 TRUE
## 5 Denis Mikhaylovich Ablyazin M 19 Russia 2012 FALSE
## 6 Denis Mikhaylovich Ablyazin M 24 Russia 2016 TRUE
## 7 Denis Mikhaylovich Ablyazin M 24 Russia 2016 TRUE
## 8 Andreea Roxana Acatrinei F 16 Romania 2008 TRUE
## 9 Jonna Eva-Maj Adlerteg F 17 Sweden 2012 FALSE
## 10 Kseniya Dmitriyevna Afanasyeva F 16 Russia 2008 FALSE
## # ℹ 876 more rows
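The warning above comes from comparing a column to a three-element vector with `==`. R lines the two up position by position and recycles the shorter one, so each row is checked against only one of the three years. A minimal base R sketch of the difference from `%in%`, using a made-up `year` vector (an assumption for illustration, not values from the olympics data):

```r
# Made-up years for illustration only; not taken from the olympics data
year <- c(2008, 2012, 2016, 2016, 2008, 2004)

# `==` recycles c(2008, 2012, 2016): row 1 is compared to 2008, row 2 to 2012,
# row 3 to 2016, row 4 to 2008 again, and so on.
# (When the lengths do not divide evenly, R also emits the warning seen above.)
recycled <- year == c(2008, 2012, 2016)

# `%in%` asks, for every row, whether the value is any of the three years
member <- year %in% c(2008, 2012, 2016)

sum(recycled) # rows kept by the recycled comparison
sum(member)   # rows kept by the membership test
```

In this sketch the membership test keeps every row from the three target years, while the recycled comparison keeps only the rows that happen to line up. Under that reading, `filter(year %in% c(2008, 2012, 2016))` is the form that matches the question.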
Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.
years_grouped <- df2 |>
  group_by(year) |> # refer to the column by name; the pipe already supplies df2
  summarise(mean(age))
years_grouped
## # A tibble: 3 × 2
## year `mean(age)`
## <dbl> <dbl>
## 1 2008 21.7
## 2 2012 22.0
## 3 2016 22.2
Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)
oly_year <- olympic_gymnasts |>
group_by(year) |>
summarise(mean(age))
oly_year
## # A tibble: 29 × 2
## year `mean(age)`
## <dbl> <dbl>
## 1 1896 24.3
## 2 1900 22.2
## 3 1904 25.1
## 4 1906 24.7
## 5 1908 23.2
## 6 1912 24.2
## 7 1920 26.7
## 8 1924 27.6
## 9 1928 25.6
## 10 1932 23.9
## # ℹ 19 more rows
youngest_average <- min(oly_year$`mean(age)`)
youngest_average
## [1] 19.86606
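`min()` returns the smallest average but not the year it belongs to. One possible way to keep the two together is `dplyr::slice_min()`, sketched here on a made-up table. The `toy_year` data frame and its `mean_age` column name are assumptions for illustration, not the real `oly_year` values.

```r
library(dplyr)

# Made-up yearly averages for illustration only
toy_year <- data.frame(
  year = c(1996, 2000, 2004),
  mean_age = c(21.5, 19.9, 20.4)
)

# slice_min() returns the whole row with the smallest mean_age,
# so the year stays attached to the value
youngest <- toy_year |>
  slice_min(mean_age, n = 1)

youngest
```

Applying the same idea to `oly_year` would need the summary column to be referred to by its actual name, for example by naming it inside `summarise()`.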
Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.
# Your R code here
# Count the total number of medals won by men compared with women
men_vs_women <- olympic_gymnasts |>
  group_by(sex) |>
  summarise(total_medalists = sum(medalist, na.rm = TRUE)) # `=` names the column
# Find the first year in which each sex won a medal
first_medal_by_sex <- olympic_gymnasts |>
  filter(medalist == TRUE) |>
  group_by(sex) |>
  summarise(first_year = min(year))
# Pull the first medal years; rows are ordered by sex, so element 1 is "F" and element 2 is "M"
mens_head_start <- first_medal_by_sex |>
  pull(first_year)
years_men_had_over_women <- mens_head_start[1] - mens_head_start[2]
# Wrap the result in a function that prints the final statement
final_statement <- function() {
  cat("Men have won more medals in the Olympics, but they have also had a", years_men_had_over_women, "year head start.\n")
}
final_statement()
## Men have won more medals in the Olympics, but they have also had a 32 year head start.
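Inside `summarise()`, a new column is named with `=`. Writing `<-` there still computes the value, but the column ends up named after the whole expression text, which is awkward to refer to later. A small sketch of the difference, using a made-up `toy` data frame (an assumption for illustration, not the olympics data):

```r
library(dplyr)

# Made-up rows for illustration only; not taken from the olympics data
toy <- data.frame(
  sex = c("F", "F", "M", "M"),
  year = c(2000, 2004, 1990, 1994)
)

# `<-` inside summarise(): the column is named after the expression text
unnamed <- toy |>
  group_by(sex) |>
  summarise(first_year <- min(year))

# `=` inside summarise(): the column is named first_year
named <- toy |>
  group_by(sex) |>
  summarise(first_year = min(year))

names(unnamed)
names(named)

# With a real column name, pull() can take the bare name
named |> pull(first_year)
```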
Discussion: Enter your discussion of results here.
# I thought it would be interesting to compare the women to the men in the data set. The results showcased an innate difference between women and men in terms of total medals collected. Given this, I thought it was important to check the point at which women even began earning medals.
# My process was similar to solving the earlier questions. Because I was doing addition, I thought it was important to remove any NAs, as they might disrupt the count.
# The second program took a bit more time to get through. My structure was always similar to the final result, but I couldn't get it to run initially because it kept stating that there needed to be a logical in the filter. I tried various things, but I did end up asking AI (ChatGPT) and looking back at some of the old in-class lessons to find out what it could mean by this. I figured out I needed a TRUE or FALSE attached to the filtered element. This was the only place where AI was used in this HW assignment.
# I finished off this answer by storing a final figure that I used in a function to leave a final statement. This did not use AI.
# Also just added this after the fact.
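On the `filter()` error mentioned in the discussion: `filter()` expects an expression that evaluates to TRUE or FALSE for each row. Because `medalist` is already a logical column, it can be passed on its own, and comparing it to TRUE gives the same rows. A minimal sketch with a made-up `toy` data frame (an assumption for illustration):

```r
library(dplyr)

# Made-up rows for illustration only
toy <- data.frame(
  name = c("A", "B", "C"),
  medalist = c(TRUE, FALSE, TRUE)
)

# Explicit comparison, as in the chunk above
with_comparison <- toy |> filter(medalist == TRUE)

# The logical column on its own
without_comparison <- toy |> filter(medalist)

# Check that both approaches keep the same rows
identical(with_comparison, without_comparison)
```

Either form works. The explicit comparison can be easier to read when first learning `filter()`, while the bare column is the shorter idiom.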