Welcome to the PSYC3361 coding W3 self test. The test assesses your ability to use the coding skills covered in the Week 3 online coding modules.
In particular, it assesses your ability to…
It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.
Your notes should also document the troubleshooting process you went through to arrive at the code that worked.
For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.
Good luck!!
Jenny
PS: if you get stuck, have a look in the /images folder for inspiration.
library(tidyverse)
library(here)
library(janitor)
alone <- read_csv(here("data", "alone.csv"))
We are mostly interested in gender, age, the number of days contestants lasted, and whether they were medically evacuated. Use select() to make a smaller dataframe containing just the relevant variables. Rename the variable called medically_evacuated to make it shorter and easier to type.
alone %>%
  select(gender, age, days_lasted, medically_evacuated) %>%
  rename(medi_evac = medically_evacuated)
## # A tibble: 94 × 4
## gender age days_lasted medi_evac
## <chr> <dbl> <dbl> <lgl>
## 1 Male 40 56 FALSE
## 2 Male 22 55 FALSE
## 3 Male 34 43 FALSE
## 4 Male 32 39 FALSE
## 5 Male 37 8 FALSE
## 6 Male 44 6 FALSE
## 7 Male 46 4 FALSE
## 8 Male 24 4 FALSE
## 9 Male 41 1 FALSE
## 10 Male 31 0 FALSE
## # ℹ 84 more rows
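As a possible shortcut, select() also accepts `new_name = old_name`, so the keep-and-rename step can be done in one call and stored for later use. The sketch below runs on a small made-up tibble (`toy_alone`, hypothetical values) rather than the real alone.csv, and `alone_small` is just an illustrative name.

```r
library(dplyr)

# Hypothetical stand-in for the `alone` data; the values are made up
toy_alone <- tibble(
  name = c("A", "B", "C"),
  gender = c("Male", "Female", "Male"),
  age = c(40, 35, 52),
  days_lasted = c(56, 12, 30),
  medically_evacuated = c(FALSE, TRUE, FALSE)
)

# select() keeps only the listed columns and renames medically_evacuated on the way
alone_small <- toy_alone %>%
  select(gender, age, days_lasted, medi_evac = medically_evacuated)

names(alone_small)
```

Assigning the result to an object means the smaller dataframe can be reused in later chunks instead of only being printed.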
alone %>% select(gender, age) %>% arrange(desc(age))
## # A tibble: 94 × 2
## gender age
## <chr> <dbl>
## 1 Male 61
## 2 Female 57
## 3 Male 55
## 4 Male 55
## 5 Male 53
## 6 Male 50
## 7 Male 50
## 8 Male 49
## 9 Female 49
## 10 Male 48
## # ℹ 84 more rows
alone %>% group_by(season) %>% summarise(mean_time = mean(days_lasted), sd_time = sd(days_lasted))
## # A tibble: 9 × 3
## season mean_time sd_time
## <dbl> <dbl> <dbl>
## 1 1 21.6 23.6
## 2 2 34.4 25.0
## 3 3 54.3 30.9
## 4 4 31.4 32.4
## 5 5 30.1 19.4
## 6 6 45.4 28.0
## 7 7 49.9 31.6
## 8 8 41.2 26.6
## 9 9 46.1 21.6
HINT: can you make a line graph that has error bars around the mean for each season?
alone %>%
  group_by(season) %>%
  summarise(mean_time = mean(days_lasted), sd_time = sd(days_lasted)) %>%
  ggplot(aes(x = season, y = mean_time)) +
  geom_line() +
  ggtitle(label = "Average Time Lasted Seasonally")
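The hint asks for error bars around each seasonal mean. One way this might look is sketched below, using geom_errorbar() with the standard error of the mean. The choice of standard error (rather than the standard deviation or a confidence interval) is an assumption, and `toy_alone` is a made-up tibble standing in for the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the `alone` data; the values are made up
toy_alone <- tibble(
  season = c(1, 1, 1, 2, 2, 2),
  days_lasted = c(10, 20, 30, 40, 50, 60)
)

# Mean, SD, group size and standard error of days lasted for each season
season_summary <- toy_alone %>%
  group_by(season) %>%
  summarise(
    mean_time = mean(days_lasted),
    sd_time = sd(days_lasted),
    n = n(),
    se_time = sd_time / sqrt(n)
  )

# Line graph of the means, with error bars spanning mean +/- 1 standard error
season_plot <- ggplot(season_summary, aes(x = season, y = mean_time)) +
  geom_line() +
  geom_point() +
  geom_errorbar(
    aes(ymin = mean_time - se_time, ymax = mean_time + se_time),
    width = 0.2
  ) +
  labs(title = "Mean Days Lasted By Season", x = "Season", y = "Mean days lasted")

season_plot
```

Swapping `se_time` for `sd_time` in the ymin and ymax mappings would show the spread of the data instead of the uncertainty around the mean.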
alone %>% group_by(gender) %>% summarise(mean_days = mean(days_lasted))
## # A tibble: 2 × 2
## gender mean_days
## <chr> <dbl>
## 1 Female 49.4
## 2 Male 36.2
HINT: can you make a plot that captures the median and distribution of days survived, by gender?
alone %>%
  group_by(gender) %>%
  summarise(mean_days = mean(days_lasted)) %>%
  ggplot(aes(x = gender, y = mean_days, fill = gender)) +
  geom_col() +
  ggtitle(label = "Average Days Survived By Gender")
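A column graph of means does not show the median or the distribution that the hint mentions. A boxplot is one option that shows both; the sketch below overlays the individual points with geom_jitter(). It uses a made-up tibble (`toy_alone`, hypothetical values), so the picture will differ from the real data.

```r
library(dplyr)
library(ggplot2)

# Hypothetical stand-in for the `alone` data; the values are made up
toy_alone <- tibble(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  days_lasted = c(20, 50, 75, 1, 8, 39, 56)
)

# The boxplot draws the median as the middle line and the quartiles as the box;
# the jittered points show each contestant
gender_plot <- ggplot(toy_alone, aes(x = gender, y = days_lasted, fill = gender)) +
  geom_boxplot(outlier.shape = NA) +
  geom_jitter(width = 0.1, alpha = 0.5) +
  labs(title = "Days Survived By Gender", x = "Gender", y = "Days lasted")

gender_plot
```

Setting `outlier.shape = NA` hides the boxplot's own outlier markers so that extreme values are not drawn twice once the jittered points are added.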
## 5. Do older contestants last longer?
HINT: Use case_when to create a new variable that groups participants by age in decades
HINT: what is the mean length of time in the game for each age group? How many participants fall into each group?
alone %>% group_by(age) %>% summarise(mean_day = mean(days_lasted)) %>% arrange(desc(mean_day))
## # A tibble: 33 × 2
## age mean_day
## <dbl> <dbl>
## 1 57 75
## 2 26 74
## 3 61 74
## 4 29 73
## 5 27 72
## 6 49 59.5
## 7 35 57.5
## 8 40 56.8
## 9 39 53.7
## 10 28 53.5
## # ℹ 23 more rows
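Grouping by exact age gives many tiny groups, which makes the means hard to compare. Following the hints, the sketch below uses case_when() to bin ages into decades, then reports the mean days lasted and the number of contestants per bin. The bin labels and cut-offs are assumptions, and `toy_alone` is a made-up tibble rather than the real data.

```r
library(dplyr)

# Hypothetical stand-in for the `alone` data; the values are made up
toy_alone <- tibble(
  age = c(22, 29, 34, 41, 48, 57, 61),
  days_lasted = c(10, 20, 30, 40, 60, 75, 74)
)

# case_when() checks the conditions in order, so each age lands in the first bin it fits
age_summary <- toy_alone %>%
  mutate(age_group = case_when(
    age < 30 ~ "under 30",
    age < 40 ~ "30s",
    age < 50 ~ "40s",
    age < 60 ~ "50s",
    TRUE ~ "60+"
  )) %>%
  group_by(age_group) %>%
  summarise(mean_days = mean(days_lasted), n = n())

age_summary
```

Including n() alongside the mean shows how many contestants each average is based on, which matters when a group contains only one or two people.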
HINT: filter the dataset to keep only those contestants who didn’t win, then calculate the mean age, separately for those who were medically evacuated vs. not.
alone %>%
  filter(result > 1) %>%
  group_by(medically_evacuated, gender) %>%
  summarise(mean_age = mean(age), .groups = "drop")
## # A tibble: 4 × 3
## medically_evacuated gender mean_age
## <lgl> <chr> <dbl>
## 1 FALSE Female 42.5
## 2 FALSE Male 37.8
## 3 TRUE Female 37.9
## 4 TRUE Male 35.9
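The hint asks for mean age split only by evacuation status, whereas the summary above also splits by gender. A version without the gender split might look like the sketch below. It assumes, as the code above does, that `result` is the finishing place with 1 meaning the winner; `toy_alone` is a made-up tibble.

```r
library(dplyr)

# Hypothetical stand-in for the `alone` data; the values are made up
toy_alone <- tibble(
  result = c(1, 2, 3, 4, 5),
  age = c(40, 30, 50, 36, 44),
  medically_evacuated = c(FALSE, TRUE, FALSE, TRUE, FALSE)
)

# Drop the winner (result == 1), then compare mean age by evacuation status
evac_summary <- toy_alone %>%
  filter(result > 1) %>%
  group_by(medically_evacuated) %>%
  summarise(mean_age = mean(age), n = n())

evac_summary
```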
HINT: make a column graph of the data you just summarised
alone %>%
  filter(result > 1) %>%
  group_by(medically_evacuated, gender) %>%
  summarise(mean_age = mean(age), .groups = "drop") %>%
  ggplot(aes(x = gender, y = mean_age, fill = medically_evacuated)) +
  geom_col(position = "dodge") +
  ggtitle(label = "Mean Age By Gender And Medical Evacuation Status")