Welcome to the PSYC3361 coding W3 self test. The test assesses your ability to use the coding skills covered in the Week 3 online coding modules.
In particular, it assesses your ability to…
It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.
Your notes should also document the troubleshooting process you went through to arrive at the code that worked.
For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.
Good luck!!
Jenny
PS: if you get stuck, have a look in the /images folder for inspiration.
library(tidyverse)
library(here)
here::i_am("w3-self-test.Rmd")
alone <- read_csv(here("data","alone.csv"))
We are mostly interested in gender, age, the days they lasted and whether contestants were medically evacuated. Use select() to make a smaller dataframe containing just the relevant variables. Rename the variable called medically_evacuated to make it shorter and easier to type.
alone_evac <- alone %>%
  rename(evac = medically_evacuated) %>%
  select(
    gender,
    age,
    days_lasted,
    evac
  )
print(alone_evac)
## # A tibble: 94 × 4
## gender age days_lasted evac
## <chr> <dbl> <dbl> <lgl>
## 1 Male 40 56 FALSE
## 2 Male 22 55 FALSE
## 3 Male 34 43 FALSE
## 4 Male 32 39 FALSE
## 5 Male 37 8 FALSE
## 6 Male 44 6 FALSE
## 7 Male 46 4 FALSE
## 8 Male 24 4 FALSE
## 9 Male 41 1 FALSE
## 10 Male 31 0 FALSE
## # ℹ 84 more rows
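The same result can be reached in one step, because select() accepts new_name = old_name pairs. A minimal sketch, assuming a small made-up tibble (toy_alone is hypothetical and only mimics some column names from alone.csv):

```r
library(dplyr)

# Hypothetical toy data that mimics a few columns of alone.csv
toy_alone <- tibble(
  name = c("A", "B", "C"),
  gender = c("Male", "Female", "Male"),
  age = c(40, 35, 22),
  days_lasted = c(56, 10, 3),
  medically_evacuated = c(FALSE, TRUE, FALSE)
)

# select() keeps only the listed columns and renames in the same call
toy_evac <- toy_alone %>%
  select(gender, age, days_lasted, evac = medically_evacuated)

print(toy_evac)
```

Whether to rename inside select() or in a separate rename() step is a matter of taste; the separate step can be easier to read when many columns are renamed.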
oldest <- alone %>%
  group_by(gender) %>%
  summarise(oldest = max(age)) %>%
  ungroup()
print(oldest)
## # A tibble: 2 × 2
## gender oldest
## <chr> <dbl>
## 1 Female 57
## 2 Male 61
HINT: can you make a line graph that has error bars around the mean for each season?
seasonchange <- alone %>%
  group_by(season) %>%
  summarise(
    mean_survival = mean(days_lasted),
    sd_survival = sd(days_lasted)
  ) %>%
  arrange(desc(mean_survival)) %>%
  ungroup()
print(seasonchange)
## # A tibble: 9 × 3
## season mean_survival sd_survival
## <dbl> <dbl> <dbl>
## 1 3 54.3 30.9
## 2 7 49.9 31.6
## 3 9 46.1 21.6
## 4 6 45.4 28.0
## 5 8 41.2 26.6
## 6 2 34.4 25.0
## 7 4 31.4 32.4
## 8 5 30.1 19.4
## 9 1 21.6 23.6
seasonchangeplot <- ggplot(
  data = seasonchange,
  aes(
    x = season,
    y = mean_survival
  )
) +
  # All layers inherit the same continuous x (season) and y (mean_survival)
  geom_point(size = 2) +
  geom_line() +
  # Error bars span one standard deviation either side of the mean
  geom_errorbar(mapping = aes(
    ymin = mean_survival - sd_survival,
    ymax = mean_survival + sd_survival
  )) +
  geom_smooth(method = "lm", se = FALSE, colour = "red") +
  # Label every season on the x axis
  scale_x_continuous(breaks = seasonchange$season) +
  theme_minimal() +
  labs(title = "Survival over seasons", subtitle = "Question 3")
plot(seasonchangeplot)
## `geom_smooth()` using formula = 'y ~ x'
Answer: Mean survival time has increased overall across the seasons. It rose to a peak in season 3 (54.3 days), dropped to a trough in season 5 (30.1 days), and has mostly increased since.
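The error bars above show one standard deviation either side of the mean. If the standard error of the mean were preferred instead, it could be computed in the same summarise() call. A sketch on made-up data (toy_seasons and its values are hypothetical, not taken from alone.csv):

```r
library(dplyr)
library(ggplot2)

# Hypothetical toy data: three seasons with four contestants each
toy_seasons <- tibble(
  season = rep(1:3, each = 4),
  days_lasted = c(1, 5, 20, 40, 10, 30, 50, 60, 5, 25, 45, 75)
)

# Mean and standard error (sd / sqrt(n)) per season
toy_summary <- toy_seasons %>%
  group_by(season) %>%
  summarise(
    mean_survival = mean(days_lasted),
    se_survival = sd(days_lasted) / sqrt(n())
  )

# Line graph with error bars of one standard error around each mean
toy_plot <- ggplot(toy_summary, aes(x = season, y = mean_survival)) +
  geom_line() +
  geom_point(size = 2) +
  geom_errorbar(
    aes(ymin = mean_survival - se_survival, ymax = mean_survival + se_survival),
    width = 0.2
  ) +
  scale_x_continuous(breaks = toy_summary$season) +
  theme_minimal()

print(toy_plot)
```

Standard deviation bars describe the spread of contestants within a season, while standard error bars describe uncertainty about the season mean, so the choice depends on which of the two the plot is meant to communicate.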
# Summarise
menvswomen <- alone %>%
  group_by(gender) %>%
  summarise(
    mean = mean(days_lasted),
    median = median(days_lasted),
    sd = sd(days_lasted),
    n = n(),
    # Proportion (0 to 1) of each gender who were medically evacuated:
    # number of TRUE values divided by the group size
    percent_evac = sum(medically_evacuated, na.rm = TRUE) / n()
  ) %>%
  ungroup()
print(menvswomen)
## # A tibble: 2 × 6
## gender mean median sd n percent_evac
## <chr> <dbl> <dbl> <dbl> <int> <dbl>
## 1 Female 49.4 50.5 28.5 20 0.5
## 2 Male 36.2 35 27.2 74 0.203
# Plot
menvswomenplot <- ggplot(alone) +
  geom_violin(
    mapping = aes(
      x = gender,
      y = days_lasted
    ),
    draw_quantiles = c(.25, .5, .75)
  ) +
  ggtitle(label = "Question 4")
plot(menvswomenplot)
# Junk
# geom_point(data = menvswomen, mapping = (aes(
#   x = gender,
#   y = median
# ))) +
# geom_polygon(mapping = aes(
#   x = gender,
#   y = mean
# )) +
# facet_wrap(vars(gender)) +
# geom_rug()
# theme_minimal() +
HINT: can you make a plot that captures the median and distribution of days survived, by gender?
Answer: On average, female contestants last longer on Alone (mean 49.4 days) than male contestants (36.2 days), but a larger proportion of them were medically evacuated (50% vs. 20.3%).
HINT: Use case_when to create a new variable that groups participants by age in decades
question5 <- alone %>%
  mutate(age_class = case_when(
    age >= 60 ~ "60s",
    age >= 50 & age <= 59 ~ "50s",
    age >= 40 & age <= 49 ~ "40s",
    age >= 30 & age <= 39 ~ "30s",
    age >= 20 & age <= 29 ~ "20s",
    age >= 10 & age <= 19 ~ "10s"
  )) %>%
  group_by(age_class) %>%
  summarise(
    mean = mean(days_lasted),
    n = n()
  ) %>%
  ungroup()
print(question5)
## # A tibble: 6 × 3
## age_class mean n
## <chr> <dbl> <int>
## 1 10s 1.5 2
## 2 20s 39.7 12
## 3 30s 41.4 36
## 4 40s 37.1 37
## 5 50s 42.2 6
## 6 60s 74 1
HINT: what is the mean length of time in the game for each age group? How many participants fall into each group?
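Answer: Shown in the table above. Most contestants are in their 30s (n = 36, mean 41.4 days) or 40s (n = 37, mean 37.1 days); the 10s (n = 2) and 60s (n = 1) groups are too small to interpret.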
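The decade labels can also be derived arithmetically instead of listing each range in case_when(). A sketch, assuming ages are recorded as whole numbers (toy_ages is made up for illustration):

```r
library(dplyr)

# Hypothetical ages, including values on decade boundaries
toy_ages <- tibble(age = c(19, 20, 29, 35, 47, 50, 61))

# floor(age / 10) * 10 rounds each age down to the start of its decade
toy_ages <- toy_ages %>%
  mutate(age_class = paste0(floor(age / 10) * 10, "s"))

print(toy_ages)
```

One difference from the case_when() version: that version places everyone aged 60 or over in "60s", whereas this arithmetic would label a 70-year-old as "70s".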
HINT: filter the dataset to keep only those contestants who didn’t win, then calculate the mean age, separately for those who were medically evacuated vs. not.
# Data summarised
question6 <- alone %>%
  filter(result != 1) %>%
  mutate(medically_evacuated = case_when(
    medically_evacuated == TRUE ~ "Yes",
    medically_evacuated == FALSE ~ "No"
  )) %>%
  group_by(medically_evacuated, gender) %>%
  summarise(mean_age = mean(age)) %>%
  ungroup()
## `summarise()` has grouped output by 'medically_evacuated'. You can override
## using the `.groups` argument.
print(question6)
## # A tibble: 4 × 3
## medically_evacuated gender mean_age
## <chr> <chr> <dbl>
## 1 No Female 42.5
## 2 No Male 37.8
## 3 Yes Female 37.9
## 4 Yes Male 35.9
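The message printed by summarise() above is informational: with two grouping variables, only the last one is dropped by default. Passing .groups = "drop" returns an ungrouped tibble directly, so the message does not appear and a trailing ungroup() is not needed. A sketch on made-up data (toy_exits is hypothetical):

```r
library(dplyr)

# Hypothetical toy data with two grouping variables
toy_exits <- tibble(
  gender = c("Male", "Male", "Female", "Female", "Male", "Female"),
  medically_evacuated = c(TRUE, FALSE, TRUE, FALSE, FALSE, TRUE),
  age = c(30, 40, 35, 45, 50, 25)
)

# .groups = "drop" removes all grouping from the summarised result
toy_means <- toy_exits %>%
  group_by(medically_evacuated, gender) %>%
  summarise(mean_age = mean(age), .groups = "drop")

print(toy_means)
```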
# Column graph
question6plot <- ggplot(data = question6) +
  geom_col(mapping = aes(
    x = medically_evacuated,
    y = mean_age
  )) +
  facet_wrap(vars(gender)) +
  scale_x_discrete(name = "Medical Evacuation") +
  # scale_y_continuous(breaks = pretty(question6$mean_age, n = 100))
  ggtitle(label = "Question 6")
# stat_summary is useful
# alone %>%
#   ggplot(aes(x = gender, y = days_lasted)) +
#   stat_summary(fun = "mean", geom = "bar")
plot(question6plot)
HINT: make a column graph of the data you just summarised
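Answer: By eye, age does not seem to be a strong factor in whether contestants leave the show through medical evacuation or by tapping out themselves. Mean ages differ by about 2 years for men (35.9 evacuated vs. 37.8 not) and about 5 years for women (37.9 vs. 42.5).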