Homework 3

Do not change anything in the following chunk

You will be working on the olympic_gymnasts dataset. Do not change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )
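Because `medal` is either a medal name or `NA`, the two `case_when()` branches above cover every row, so `medalist` can never be `NA`. A minimal sketch on a made-up three-element vector (not the real data) showing that this `case_when()` gives the same result as the shorter `!is.na(medal)`:

```r
library(dplyr)

# Hypothetical toy values standing in for the real `medal` column
medal <- c("Gold", NA, "Bronze")

# Same logic as the mutate() call in the setup chunk
via_case_when <- case_when(
  is.na(medal) ~ FALSE,
  !is.na(medal) ~ TRUE
)

# Equivalent one-liner: TRUE wherever a medal was recorded
via_is_na <- !is.na(medal)

print(via_case_when)               # TRUE, FALSE, TRUE
identical(via_case_when, via_is_na)
```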

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

head(olympic_gymnasts)

## # A tibble: 6 × 16
##      id name      sex     age height weight team  noc   games  year season city 
##   <dbl> <chr>     <chr> <dbl>  <dbl>  <dbl> <chr> <chr> <chr> <dbl> <chr>  <chr>
## 1    17 Paavo Jo… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
## 2    17 Paavo Jo… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
## 3    17 Paavo Jo… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
## 4    17 Paavo Jo… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
## 5    17 Paavo Jo… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
## 6    17 Paavo Jo… M        28    175     64 Finl… FIN   1948…  1948 Summer Lond…
## # ℹ 4 more variables: sport <chr>, event <chr>, medal <chr>, medalist <lgl>

df <- olympic_gymnasts |>
  select(name, sex, age, team, year, medalist)
df

## # A tibble: 25,528 × 6
##    name                    sex     age team     year medalist
##    <chr>                   <chr> <dbl> <chr>   <dbl> <lgl>   
##  1 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  2 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  3 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  4 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  5 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  6 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  7 Paavo Johannes Aaltonen M        28 Finland  1948 FALSE   
##  8 Paavo Johannes Aaltonen M        28 Finland  1948 TRUE    
##  9 Paavo Johannes Aaltonen M        32 Finland  1952 FALSE   
## 10 Paavo Johannes Aaltonen M        32 Finland  1952 TRUE    
## # ℹ 25,518 more rows

Question 2: From df, create df2 that contains only the years 2008, 2012, and 2016.

df2 <- df |>
  # %in% keeps every row whose year is one of the three; == would recycle the vector
  filter(year %in% c(2008, 2012, 2016))

df2
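Writing `filter(year == c(2008, 2012, 2016))` triggers vector recycling: row 1 is compared with 2008, row 2 with 2012, row 3 with 2016, row 4 with 2008 again, and so on, so a row survives only when its year happens to line up with the recycled value. When the number of rows is not a multiple of 3, R also warns "longer object length is not a multiple of shorter object length". `%in%` instead asks, for every row, whether its year is anywhere in the set. A base R sketch on a hypothetical six-value `year` vector:

```r
# Hypothetical years for six rows (not taken from the real dataset)
year <- c(2008, 2008, 2012, 2016, 2016, 2012)
target <- c(2008, 2012, 2016)

# Element-wise comparison with recycling: target becomes 2008, 2012, 2016, 2008, 2012, 2016
# (no warning here only because 6 is a multiple of 3)
recycled <- year == target

# Set membership: is each year one of the three target years?
membership <- year %in% target

sum(recycled)    # 1: rows the == filter would keep
sum(membership)  # 6: rows the %in% filter keeps
```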

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.

years_grouped <- df2 |>
  group_by(year) |>                 # group by the column itself, not df2$year
  summarise(mean_age = mean(age))   # `=` gives the summary column a usable name

years_grouped
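Grouping by `df2$year` instead of the bare column `year` reaches outside the pipeline and yields a column literally named `df2$year`; an unnamed `mean(age)` likewise becomes a column called `mean(age)`. A minimal dplyr sketch on a hypothetical four-row tibble (made-up ages, not the real data) showing the tidier form:

```r
library(dplyr)

# Hypothetical stand-in for df2: four gymnasts across two Games
toy <- tibble(
  year = c(2008, 2008, 2012, 2012),
  age  = c(16, 20, 22, 24)
)

age_by_year <- toy |>
  group_by(year) |>                 # bare column name inside the pipeline
  summarise(mean_age = mean(age))   # named summary column

age_by_year  # 2008 -> 18, 2012 -> 23
```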

Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year; call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarise(mean_age = mean(age))   # name the summary column

oly_year

## # A tibble: 29 × 2
##     year mean_age
##    <dbl>    <dbl>
##  1  1896     24.3
##  2  1900     22.2
##  3  1904     25.1
##  4  1906     24.7
##  5  1908     23.2
##  6  1912     24.2
##  7  1920     26.7
##  8  1924     27.6
##  9  1928     25.6
## 10  1932     23.9
## # ℹ 19 more rows

youngest_average <- min(oly_year$mean_age)

youngest_average

## [1] 19.86606
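`min()` returns only the smallest average, not the year it belongs to. One way to keep the year attached is dplyr's `slice_min()`, or `which.min()` in base R. A sketch on a hypothetical three-year stand-in for `oly_year`; the years and ages are made up, and the column name `mean_age` assumes the summary column was named (an unnamed summary would be called `mean(age)` instead):

```r
library(dplyr)

# Hypothetical stand-in for oly_year (values made up for illustration)
toy_year <- tibble(
  year     = c(1960, 1964, 1968),
  mean_age = c(24, 20, 22)
)

# Row with the smallest mean age, with its year still attached
youngest <- toy_year |>
  slice_min(mean_age, n = 1)

# Base R equivalent: index of the minimum, then look up the year
youngest_year <- toy_year$year[which.min(toy_year$mean_age)]

youngest       # 1964, 20
youngest_year  # 1964
```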

Question 5: This question is open-ended. Create a question that requires at least two verbs, write code that answers it, and then, below the chunk, reflect on your choice of question and your coding procedure.

# Count the total number of medals won by men compared with women
men_vs_women <- olympic_gymnasts |>
  group_by(sex) |>
  # medalist has no NAs by construction; na.rm = TRUE is a safeguard
  summarise(total_medals = sum(medalist, na.rm = TRUE))

# Find the first year in which each sex won a medal
first_medal_by_sex <- olympic_gymnasts |>
  filter(medalist == TRUE) |>
  group_by(sex) |>
  summarise(first_year = min(year))   # `=`, not `<-`, names the column

# Pull the first-medal years into a vector named by sex ("F", "M")
first_years <- first_medal_by_sex |>
  pull(first_year, name = sex)

# Women's first medal year minus men's, matched by label rather than position
years_men_had_over_women <- unname(first_years["F"] - first_years["M"])


# Use the stored figure in a function that prints a closing statement
final_statement <- function() {
  cat(paste0("Even though men have won more Olympic gymnastics medals, men also had a ",
             years_men_had_over_women, "-year head start.\n"))
}

final_statement()

## Even though men have won more Olympic gymnastics medals, men also had a 32-year head start.
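Writing `first_year <- min(year)` inside `summarise()` does not name the column: dplyr labels it with the deparsed expression, which is why a column called `first_year <- min(year)` would then have to be pulled by that exact string. Looking the years up by sex label also avoids relying on `group_by()` sorting "F" before "M". A sketch on hypothetical medal rows (made-up years, not the real data):

```r
library(dplyr)

# Hypothetical athlete-event rows; medalist is TRUE/FALSE as in olympic_gymnasts
toy <- tibble(
  sex      = c("M", "M", "F", "M", "F"),
  year     = c(1896, 1900, 1930, 1928, 1936),
  medalist = c(TRUE, TRUE, TRUE, FALSE, TRUE)
)

# `<-` inside summarise(): the column is named after the whole expression
bad <- toy |>
  group_by(sex) |>
  summarise(first_year <- min(year))
names(bad)

# `=` gives a usable column name
first_by_sex <- toy |>
  filter(medalist) |>               # medalist is already logical
  group_by(sex) |>
  summarise(first_year = min(year))

# Subtract by label, not by row position
head_start <- first_by_sex$first_year[first_by_sex$sex == "F"] -
  first_by_sex$first_year[first_by_sex$sex == "M"]
head_start  # 1930 - 1896 = 34
```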

Discussion: Enter your discussion of results here.

I thought it would be interesting to compare women and men in the dataset. The results showed a large difference between men and women in total medals won. Given this, I thought it was important to check when women first began earning medals.

My process was similar to the one I used for the earlier questions. Because I was summing values, I removed any NAs with na.rm = TRUE, since a single NA makes sum() return NA.

The second part took more time. My structure was always close to the final result, but at first the code would not run because the error said filter() needed a logical condition. I tried several things, then asked AI (ChatGPT) and looked back at some of the earlier in-class lessons to work out what this meant. I figured out that the filter needed a condition that evaluates to TRUE or FALSE, such as medalist == TRUE. This was the only place AI was used in this assignment.

I finished this answer by storing the final figure and using it in a function that prints a closing statement. This part did not use AI.
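On the NA point: `sum()` returns `NA` if any element is `NA`, unless `na.rm = TRUE` drops the missing values first; logical values count as 1 (TRUE) and 0 (FALSE). A base R sketch with a hypothetical logical vector:

```r
# Hypothetical logical vector with one missing value
medal_flags <- c(TRUE, NA, TRUE, FALSE)

total_with_na <- sum(medal_flags)                # NA: one missing value makes the total NA
total_dropped <- sum(medal_flags, na.rm = TRUE)  # 2: the two TRUEs

total_with_na
total_dropped
```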
