Do not change anything in the following chunk
You will be working on olympic_gymnasts dataset. Do not change the code below:
olympics <- readr::read_csv('https://raw.githubusercontent.com/rfordatascience/tidytuesday/master/data/2021/2021-07-27/olympics.csv')
olympic_gymnasts <- olympics %>%
filter(!is.na(age)) %>% # only keep athletes with known age
filter(sport == "Gymnastics") %>% # keep only gymnasts
mutate(
medalist = case_when( # add column for success in medaling
is.na(medal) ~ FALSE, # NA values go to FALSE
!is.na(medal) ~ TRUE # non-NA values (Gold, Silver, Bronze) go to TRUE
)
)
More information about the dataset can be found at
https://github.com/rfordatascience/tidytuesday/blob/master/data/2021/2021-07-27/readme.md
Question 1: Create a subset dataset with the following columns only: name, sex, age, team, year and medalist. Call it df.
df <- data.frame(
  name     = olympic_gymnasts$name,
  sex      = olympic_gymnasts$sex,
  age      = olympic_gymnasts$age,
  team     = olympic_gymnasts$team,
  year     = olympic_gymnasts$year,
  medalist = olympic_gymnasts$medalist
)
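# This assignment overwrites the df built above. It keeps four of the six
# columns listed in Question 1: team and medalist are not selected here.
df <- olympic_gymnasts |>
  select(name, sex, age, year)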
df
## # A tibble: 25,528 × 4
## name sex age year
## <chr> <chr> <dbl> <dbl>
## 1 Paavo Johannes Aaltonen M 28 1948
## 2 Paavo Johannes Aaltonen M 28 1948
## 3 Paavo Johannes Aaltonen M 28 1948
## 4 Paavo Johannes Aaltonen M 28 1948
## 5 Paavo Johannes Aaltonen M 28 1948
## 6 Paavo Johannes Aaltonen M 28 1948
## 7 Paavo Johannes Aaltonen M 28 1948
## 8 Paavo Johannes Aaltonen M 28 1948
## 9 Paavo Johannes Aaltonen M 32 1952
## 10 Paavo Johannes Aaltonen M 32 1952
## # ℹ 25,518 more rows
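Question 1 lists six columns, while the printed tibble has four. A minimal sketch of a `select()` call that keeps all six is shown here on a small stand-in table; `toy_gymnasts`, `df_six`, and every value in them are hypothetical, made up only so the example runs without downloading the data.

```r
library(dplyr)

# Hypothetical stand-in for olympic_gymnasts; all values are invented.
toy_gymnasts <- tibble(
  name     = c("Athlete A", "Athlete B"),
  sex      = c("M", "F"),
  age      = c(28, 19),
  team     = c("Team X", "Team Y"),
  year     = c(1948, 2016),
  sport    = "Gymnastics",
  medalist = c(TRUE, FALSE)
)

# Keep exactly the six columns Question 1 lists, in that order.
df_six <- toy_gymnasts |>
  select(name, sex, age, team, year, medalist)

names(df_six)
```

On the real data, the same `select()` applied to `olympic_gymnasts` should give a six-column tibble with the same number of rows, since `select()` only drops columns.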
Question 2: From df, create df2 that only contains the years 2008, 2012, and 2016.
df2 <- df |>
  filter(year %in% c(2008, 2012, 2016))
df2
## # A tibble: 2,703 × 4
## name sex age year
## <chr> <chr> <dbl> <dbl>
## 1 Nstor Abad Sanjun M 23 2016
## 2 Nstor Abad Sanjun M 23 2016
## 3 Nstor Abad Sanjun M 23 2016
## 4 Nstor Abad Sanjun M 23 2016
## 5 Nstor Abad Sanjun M 23 2016
## 6 Nstor Abad Sanjun M 23 2016
## 7 Katja Abel F 25 2008
## 8 Katja Abel F 25 2008
## 9 Katja Abel F 25 2008
## 10 Katja Abel F 25 2008
## # ℹ 2,693 more rows
Question 3: Group by these three years (2008, 2012, and 2016) and summarize the mean age in each group.
# Starts again from df, so the year filter is repeated; the result replaces df2.
df2 <- df |>
  filter(year %in% c(2008, 2012, 2016)) |>
  group_by(year) |>
  summarize(meanyears = mean(age))
df2
## # A tibble: 3 × 2
## year meanyears
## <dbl> <dbl>
## 1 2008 21.6
## 2 2012 21.9
## 3 2016 22.2
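The printed rows in Question 2 repeat each gymnast once per event, so a row-level mean counts a gymnast who entered many events several times. The sketch below contrasts that with an athlete-level mean on a toy table; `toy_df`, `by_row`, `by_athlete`, and the values are hypothetical, and whether the per-athlete version is the intended reading of the question is an assumption.

```r
library(dplyr)

# Hypothetical rows: one row per athlete per event.
toy_df <- tibble(
  name = c("A", "A", "A", "B"),
  age  = c(30, 30, 30, 20),
  year = c(2016, 2016, 2016, 2016)
)

# Row-level mean: athlete A is counted three times.
by_row <- toy_df |>
  group_by(year) |>
  summarize(mean_age = mean(age))

# Athlete-level mean: collapse to one row per athlete and year first.
by_athlete <- toy_df |>
  distinct(name, year, age) |>
  group_by(year) |>
  summarize(mean_age = mean(age))

by_row$mean_age      # 27.5
by_athlete$mean_age  # 25
```

The two numbers differ only when athletes of different ages enter different numbers of events, so on the real data the gap may be small.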
Question 4: Use the olympic_gymnasts dataset, group by year, and find the mean age for each year. Call this dataset oly_year. (Optional: after creating the dataset, find the minimum average age.)
oly_year <- olympic_gymnasts |>
  group_by(year) |>
  summarize(meanyears = mean(age))
oly_year
## # A tibble: 29 × 2
## year meanyears
## <dbl> <dbl>
## 1 1896 24.3
## 2 1900 22.2
## 3 1904 25.1
## 4 1906 24.7
## 5 1908 23.2
## 6 1912 24.2
## 7 1920 26.7
## 8 1924 27.6
## 9 1928 25.6
## 10 1932 23.9
## # ℹ 19 more rows
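For the optional part of Question 4, one way to find the minimum average age is `slice_min()`. This is a sketch on a small table rather than the full result: `toy_oly_year` and `youngest` are hypothetical names, and the three rows reuse the yearly means printed under Question 3 as sample input.

```r
library(dplyr)

# Small stand-in for oly_year, using the three means printed under Question 3.
toy_oly_year <- tibble(
  year      = c(2008, 2012, 2016),
  meanyears = c(21.6, 21.9, 22.2)
)

# Keep the row with the smallest average age.
youngest <- toy_oly_year |>
  slice_min(meanyears, n = 1)

youngest
```

`slice_min()` returns the whole row, so the year comes back together with the minimum; `min(toy_oly_year$meanyears)` would return only the number.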
Question 5: This question is open-ended. Create a question that requires you to use at least two verbs. Write code that answers your question. Then, below the chunk, reflect on your question choice and coding procedure.
Question: Which male Olympic gymnasts won a gold medal, and what were their weights? Create a new column with their weights in pounds.
# Your R code here
df <- olympic_gymnasts |>
  select(name, medal, sex, weight) |>      # keep only the columns needed
  filter(medal == "Gold", sex == "M") |>   # male gold medalists only
  mutate(weight_in_lbs = weight * 2.2)     # weight is in kg; 1 kg is about 2.2 lb
df
## # A tibble: 551 × 5
## name medal sex weight weight_in_lbs
## <chr> <chr> <chr> <dbl> <dbl>
## 1 Paavo Johannes Aaltonen Gold M 64 141.
## 2 Paavo Johannes Aaltonen Gold M 64 141.
## 3 Paavo Johannes Aaltonen Gold M 64 141.
## 4 Isak Abrahamsen Gold M NA NA
## 5 Fausto Alesio Acke (Padovini-) Gold M NA NA
## 6 Nobuyuki Aihara Gold M 53 117.
## 7 Nobuyuki Aihara Gold M 53 117.
## 8 Georg Albert Christian Albertsen Gold M NA NA
## 9 Carl Albert Andersen Gold M NA NA
## 10 Carl Rudolf Svend Andersen Gold M NA NA
## # ℹ 541 more rows
Discussion: Enter your discussion of results here.
The code starts from the olympic_gymnasts dataset and stores the result in df. I first select the columns name, medal, sex, and weight, then filter the rows to keep only male gymnasts (sex == "M") who won gold (medal == "Gold"). Finally, I use mutate to create a new column, weight_in_lbs, by multiplying weight by 2.2. The dataset records weight in kilograms, so this converts each weight to approximately pounds. Gymnasts with no recorded weight show NA in both columns, and a gymnast appears once for each gold medal won.
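The dataset readme lists weight in kilograms, so multiplying by about 2.2 gives pounds. The sketch below shows one possible refinement of the same pipeline: a more precise factor, rows with missing weight dropped, and one row per gymnast. `toy_gold`, `gold_weights`, `KG_TO_LB`, and the values are hypothetical, and dropping NA rows is a design choice, not something the question requires.

```r
library(dplyr)

KG_TO_LB <- 2.20462  # pounds per kilogram

# Hypothetical rows: Athlete A won two golds, Athlete B has no recorded weight.
toy_gold <- tibble(
  name   = c("Athlete A", "Athlete A", "Athlete B"),
  medal  = c("Gold", "Gold", "Gold"),
  sex    = c("M", "M", "M"),
  weight = c(64, 64, NA)
)

gold_weights <- toy_gold |>
  filter(medal == "Gold", sex == "M", !is.na(weight)) |>  # drop missing weights
  distinct(name, weight) |>                               # one row per gymnast
  mutate(weight_in_lbs = weight * KG_TO_LB)

gold_weights
```

Using `distinct()` trades the per-medal view for a per-gymnast view, which reads more easily when the question is about who won rather than how many times.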