You will be working with the olympic_gymnasts dataset. Please DO NOT change the code below:

olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')

olympic_gymnasts <- olympics %>% 
  filter(!is.na(age)) %>%             # only keep athletes with known age
  filter(sport == "Gymnastics") %>%   # keep only gymnasts
  mutate(
    medalist = case_when(             # add column for success in medaling
      is.na(medal) ~ FALSE,           # NA values go to FALSE
      !is.na(medal) ~ TRUE            # non-NA values (Gold, Silver, Bronze) go to TRUE
    )
  )
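The `case_when()` in the provided code turns `medal` into a TRUE/FALSE flag. Any non-missing value (Gold, Silver or Bronze) becomes TRUE, and NA becomes FALSE. The sketch below checks this on a hypothetical five-value vector and shows that `!is.na(medal)` produces the same flag. It assumes the dplyr package is installed. It is only an illustration and does not change the required code.

```r
library(dplyr)

# Hypothetical toy medal column, not rows from the real olympics data
toy <- data.frame(medal = c(NA, "Gold", "Silver", NA, "Bronze"),
                  stringsAsFactors = FALSE)

toy <- toy %>%
  mutate(
    # the case_when() used in the provided code
    medalist_cw = case_when(
      is.na(medal) ~ FALSE,
      !is.na(medal) ~ TRUE
    ),
    # the same TRUE/FALSE flag written as a single expression
    medalist_neg = !is.na(medal)
  )

print(toy)
```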

More information about the dataset can be found at

https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md

Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.

df <- olympics %>%
  select(name, sex, age, team, year, medal)   # six columns from the full olympics table
print(df)
## # A tibble: 271,116 × 6
##    name                     sex     age team            year medal
##    <chr>                    <chr> <dbl> <chr>          <dbl> <chr>
##  1 A Dijiang                M        24 China           1992 <NA> 
##  2 A Lamusi                 M        23 China           2012 <NA> 
##  3 Gunnar Nielsen Aaby      M        24 Denmark         1920 <NA> 
##  4 Edgar Lindenau Aabye     M        34 Denmark/Sweden  1900 Gold 
##  5 Christine Jacoba Aaftink F        21 Netherlands     1988 <NA> 
##  6 Christine Jacoba Aaftink F        21 Netherlands     1988 <NA> 
##  7 Christine Jacoba Aaftink F        25 Netherlands     1992 <NA> 
##  8 Christine Jacoba Aaftink F        25 Netherlands     1992 <NA> 
##  9 Christine Jacoba Aaftink F        27 Netherlands     1994 <NA> 
## 10 Christine Jacoba Aaftink F        27 Netherlands     1994 <NA> 
## # ℹ 271,106 more rows
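The question asks for a `medalist` column, which exists only in `olympic_gymnasts`. The code above instead selects `medal` from the full `olympics` table, so its 271,116 rows cover every sport. A minimal sketch of a version that follows the question's wording is below. So that it runs on its own, `gym` is a hypothetical two-row stand-in with some of the same column names. The result is stored in `df_sketch` so that it does not overwrite `df`. In the real analysis you would pipe `olympic_gymnasts` into the same `select()` call and store the result in `df`. The sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical stand-in for olympic_gymnasts: real column names, made-up rows
gym <- data.frame(
  name     = c("Athlete A", "Athlete B"),
  sex      = c("F", "M"),
  age      = c(16, 24),
  team     = c("Team X", "Team Y"),
  year     = c(2008, 2012),
  sport    = c("Gymnastics", "Gymnastics"),
  medal    = c(NA, "Gold"),
  medalist = c(FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Keep only the six columns named in Question 1, in that order
df_sketch <- gym %>%
  select(name, sex, age, team, year, medalist)

print(df_sketch)
```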

Question 2: From df, create df2 containing only the years 2008, 2012, and 2016.

df2 <- df %>%
  filter(year == 2008 | year == 2012 | year == 2016)
print(df2)
## # A tibble: 40,210 × 6
##    name                               sex     age team     year medal
##    <chr>                              <chr> <dbl> <chr>   <dbl> <chr>
##  1 A Lamusi                           M        23 China    2012 <NA> 
##  2 Ragnhild Margrethe Aamodt          F        27 Norway   2008 Gold 
##  3 Andreea Aanei                      F        22 Romania  2016 <NA> 
##  4 Jamale (Djamel-) Aarrass (Ahrass-) M        30 France   2012 <NA> 
##  5 Abdelhak Aatakni                   M        24 Morocco  2012 <NA> 
##  6 Moonika Aava                       F        28 Estonia  2008 <NA> 
##  7 Nstor Abad Sanjun                  M        23 Spain    2016 <NA> 
##  8 Nstor Abad Sanjun                  M        23 Spain    2016 <NA> 
##  9 Nstor Abad Sanjun                  M        23 Spain    2016 <NA> 
## 10 Nstor Abad Sanjun                  M        23 Spain    2016 <NA> 
## # ℹ 40,200 more rows
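Chaining `==` comparisons with `|` works, but the condition grows with every extra year. A common alternative is `%in%`, which keeps the rows whose `year` matches any element of a vector. The sketch below uses a hypothetical five-row table to check that both filters keep the same rows. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy table: one row per year, values chosen only for illustration
toy_years <- data.frame(year = c(2004, 2008, 2012, 2016, 2020))

# Chained comparisons, as in the answer above
kept_or <- toy_years %>%
  filter(year == 2008 | year == 2012 | year == 2016)

# Same filter written with %in% and a vector of target years
kept_in <- toy_years %>%
  filter(year %in% c(2008, 2012, 2016))

print(kept_in)
```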

Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age.

by_year <- group_by(df2, year)
summarise(by_year, Mean = mean(age, na.rm = TRUE))
## # A tibble: 3 × 2
##    year  Mean
##   <dbl> <dbl>
## 1  2008  25.7
## 2  2012  26.0
## 3  2016  26.2
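`df2` has one row per athlete per event, so an athlete entered in several events is counted several times. For example, Nstor Abad Sanjun fills four consecutive rows of the output printed for Question 2. The mean in Question 3 is therefore a mean over entries rather than over athletes. If a per-athlete mean is wanted instead, one option is to keep a single row per athlete and year with `distinct()` before summarising. The sketch below uses a hypothetical toy table to show how the two means can differ. Keying on `name` assumes that no two athletes share a name in the same year. It assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data: athlete "A" has three event entries in 2008, "B" has one
toy_entries <- data.frame(
  name = c("A", "A", "A", "B", "C", "D"),
  year = c(2008, 2008, 2008, 2008, 2012, 2012),
  age  = c(20, 20, 20, 30, 25, 27),
  stringsAsFactors = FALSE
)

# Mean over entries, as in Question 3: "A" counts three times in 2008
per_entry <- toy_entries %>%
  group_by(year) %>%
  summarise(mean_age = mean(age, na.rm = TRUE))

# Mean over athletes: keep one row per name and year first
per_athlete <- toy_entries %>%
  distinct(name, year, .keep_all = TRUE) %>%
  group_by(year) %>%
  summarise(mean_age = mean(age, na.rm = TRUE))

print(per_entry)    # 2008: 22.5, 2012: 26
print(per_athlete)  # 2008: 25,   2012: 26
```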

Question 4: Using the olympic_gymnasts dataset, group by year and find the mean age for each year; call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)

oly_year <- olympic_gymnasts %>%
  group_by(year) %>%
  summarise(age_mean = mean(age))
print(oly_year)
## # A tibble: 29 × 2
##     year age_mean
##    <dbl>    <dbl>
##  1  1896     24.3
##  2  1900     22.2
##  3  1904     25.1
##  4  1906     24.7
##  5  1908     23.2
##  6  1912     24.2
##  7  1920     26.7
##  8  1924     27.6
##  9  1928     25.6
## 10  1932     23.9
## # ℹ 19 more rows
min(oly_year$age_mean)   # minimum over the age_mean column only
## [1] 19.86606
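Calling `min()` on a whole data frame, as in `min(oly_year)`, returns the smallest value across every column, including `year`. For `oly_year` that still yields a mean age only because every year is larger than every mean age; `max(oly_year)` would return a year. Selecting the column, as in `min(oly_year$age_mean)`, avoids this. The base R sketch below uses the first three rows of the printed `oly_year` output to show the difference. It also picks out the year with the lowest mean age using `which.min()`.

```r
# First three rows of the oly_year output printed above (rounded to one decimal)
toy_oly <- data.frame(
  year     = c(1896, 1900, 1904),
  age_mean = c(24.3, 22.2, 25.1)
)

min(toy_oly)   # scans both columns; happens to return an age (22.2)
max(toy_oly)   # also scans both columns; returns a year (1904), not an age

# Restrict the summary to the column of interest
min_age <- min(toy_oly$age_mean)

# Year whose mean age is lowest
min_year <- toy_oly$year[which.min(toy_oly$age_mean)]

min_age    # 22.2
min_year   # 1900
```

With dplyr loaded, `slice_min(oly_year, age_mean, n = 1)` returns the whole row (year and mean age) for the youngest year in one step.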

Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your choice of question and your coding procedure.

The question is: Which sport has the oldest gold medal winners on average?

# Goal: find the sports whose gold medal winners have the highest mean age.
# Steps: keep only gold medal rows, select age and sport, group by sport,
# compute the mean age per sport, sort in descending order, and keep the top 5.
gold_winners <- olympics %>%
  filter(medal == "Gold") %>%
  select(age, sport) %>%
  group_by(sport) %>%
  summarise(mean_age = mean(age, na.rm = TRUE)) %>%
  arrange(desc(mean_age)) %>%
  head(5)
gold_winners
## # A tibble: 5 × 2
##   sport            mean_age
##   <chr>               <dbl>
## 1 Roque                64  
## 2 Art Competitions     41.2
## 3 Alpinism             38.8
## 4 Polo                 36.0
## 5 Equestrianism        35.3
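A mean based on very few athletes can push a sport to the top of this ranking. It can therefore help to report how many ages each mean is based on. The sketch below counts the non-missing ages per sport. As one possible choice, it then drops sports with fewer than a hypothetical minimum of 2 before ranking. The data and the cut-off are made up for illustration. In the real analysis, the same pipeline would start from `olympics`. The toy data also shows that `filter(medal == "Gold")` drops rows whose `medal` is NA, because `filter()` discards rows where the condition is NA. The sketch assumes dplyr is installed.

```r
library(dplyr)

# Hypothetical toy data, not real Olympic results
toy_gold <- data.frame(
  sport = c("Sport A", "Sport A", "Sport B", "Sport B", "Sport B",
            "Sport C", "Sport C", "Sport C"),
  age   = c(64, 70, 30, 34, 38, 20, 22, NA),
  medal = c("Gold", NA, "Gold", "Gold", "Gold", "Gold", "Gold", "Gold"),
  stringsAsFactors = FALSE
)

min_ages <- 2  # hypothetical cut-off, chosen only for this example

ranked <- toy_gold %>%
  filter(medal == "Gold") %>%            # NA medal rows are dropped as well
  group_by(sport) %>%
  summarise(
    mean_age   = mean(age, na.rm = TRUE),
    n_with_age = sum(!is.na(age))        # how many ages the mean is based on
  ) %>%
  filter(n_with_age >= min_ages) %>%     # Sport A has only one gold medal age
  arrange(desc(mean_age))

print(ranked)
```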

Discussion:

Based on my analysis, the five sports with the oldest gold medal winners on average are Roque, Art Competitions, Alpinism, Polo, and Equestrianism. This matched what I expected when I first thought about possible answers. The sports with the highest mean ages tend to be less physically demanding, or they depend on another's athleticism. In equestrianism, for example, the horse does much of the physical work.