Welcome to the PSYC3361 coding W3 self test. The test assesses your ability to use the coding skills covered in the Week 3 online coding modules.

In particular, it assesses your ability to…

It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.

Your notes should also document the troubleshooting process you went through to arrive at the code that worked.

For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.

Good luck!!

Jenny

PS: if you get stuck, have a look in the /images folder for inspiration.

load the packages you will need

library(tidyverse)
library(here)

read the Alone data

here::i_am("w3-self-test.Rmd")
alone <- read_csv(here("data","alone.csv"))

1. make a smaller dataset

We are mostly interested in gender, age, the days they lasted and whether contestants were medically evacuated. Use select() to make a smaller dataframe containing just the relevant variables. Rename the variable called medically_evacuated to make it shorter and easier to type.

alone_evac <- alone %>% 
  rename(evac = medically_evacuated) %>% 
  select(
    gender, 
    age, 
    days_lasted, 
    evac
  )
print(alone_evac)
## # A tibble: 94 × 4
##    gender   age days_lasted evac 
##    <chr>  <dbl>       <dbl> <lgl>
##  1 Male      40          56 FALSE
##  2 Male      22          55 FALSE
##  3 Male      34          43 FALSE
##  4 Male      32          39 FALSE
##  5 Male      37           8 FALSE
##  6 Male      44           6 FALSE
##  7 Male      46           4 FALSE
##  8 Male      24           4 FALSE
##  9 Male      41           1 FALSE
## 10 Male      31           0 FALSE
## # ℹ 84 more rows
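select() can also rename a column while it picks it, using `new_name = old_name`, so the separate rename() step could be folded into one call. A minimal sketch, assuming dplyr is installed; `toy` is a hypothetical table of made-up rows standing in for alone.csv (not the real Alone data):

```r
library(dplyr)

# Made-up toy rows (NOT the real Alone data), with the same column names as alone.csv
toy <- tibble(
  name = c("A", "B", "C"),
  gender = c("Male", "Female", "Male"),
  age = c(40, 35, 22),
  days_lasted = c(56, 20, 4),
  medically_evacuated = c(FALSE, TRUE, FALSE)
)

# select() keeps only the listed columns; evac = medically_evacuated renames on the way
toy_evac <- toy %>%
  select(gender, age, days_lasted, evac = medically_evacuated)

print(toy_evac)
```

Both versions give the same four columns; folding the rename into select() just saves a pipe step.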

2. write code to determine how old the oldest male and oldest female contestants are

oldest <- alone %>% 
  group_by(gender) %>% 
  summarise(oldest = max(age)) %>% 
  ungroup()
print(oldest)
## # A tibble: 2 × 2
##   gender oldest
##   <chr>   <dbl>
## 1 Female     57
## 2 Male       61
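summarise(max(age)) only returns the age itself. If you also want to see *which* contestant is oldest in each gender, slice_max() keeps the whole row with the largest value in each group. A minimal sketch, assuming dplyr 1.0 or later; `toy` and its `name` column are hypothetical made-up data, not the real Alone data:

```r
library(dplyr)

# Made-up toy rows (NOT the real Alone data); name is a hypothetical ID column
toy <- tibble(
  name = c("A", "B", "C", "D"),
  gender = c("Male", "Female", "Male", "Female"),
  age = c(61, 57, 40, 30)
)

# keep the full row of the oldest contestant within each gender
oldest_rows <- toy %>%
  group_by(gender) %>%
  slice_max(age, n = 1) %>%
  ungroup()

print(oldest_rows)
```

By default slice_max() keeps ties, so if two contestants share the top age in a group, both rows are returned.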

3. has the average length of time that Alone contestants lasted changed over seasons?

HINT: can you make a line graph that has error bars around the mean for each season?

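seasonchange <- alone %>% 
  group_by(season) %>% 
  # mean and standard deviation of days lasted for each season
  summarise(
    mean_survival = mean(days_lasted),
    sd_survival = sd(days_lasted)
  ) %>% 
  # sort from the longest to the shortest average survival
  arrange(desc(mean_survival))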
print(seasonchange)
## # A tibble: 9 × 3
##   season mean_survival sd_survival
##    <dbl>         <dbl>       <dbl>
## 1      3          54.3        30.9
## 2      7          49.9        31.6
## 3      9          46.1        21.6
## 4      6          45.4        28.0
## 5      8          41.2        26.6
## 6      2          34.4        25.0
## 7      4          31.4        32.4
## 8      5          30.1        19.4
## 9      1          21.6        23.6
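seasonchangeplot <- ggplot(
  data = seasonchange,
  # season stays numeric in every layer so all layers share one continuous x-axis
  aes(x = season, y = mean_survival)
) +
  geom_line() +
  geom_point(size = 2) +
  # error bars span one standard deviation either side of each season's mean
  geom_errorbar(
    aes(ymin = mean_survival - sd_survival, ymax = mean_survival + sd_survival),
    width = 0.2
  ) +
  # red straight-line (linear regression) trend through the season means, no confidence band
  geom_smooth(method = "lm", se = FALSE, colour = "red") +
  # one tick per season instead of ggplot's default breaks
  scale_x_continuous(breaks = seasonchange$season) +
  theme_minimal() +
  labs(
    title = "Survival over seasons",
    subtitle = "Question 3 (error bars show mean +/- 1 SD)",
    x = "Season",
    y = "Mean days lasted"
  )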
  
plot(seasonchangeplot)
## `geom_smooth()` using formula = 'y ~ x'

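Answer: Yes. Average survival has increased overall across seasons (the red linear trend slopes upwards). It rose to a peak in season 3 (54.3 days), dropped to a trough in season 5 (30.1 days) and has mostly increased since. The standard deviations are large (roughly 19 to 32 days), so differences between individual seasons should be interpreted with caution.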
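Error bars of ±1 SD show how spread out individual contestants are; if the question is how precisely each season's *mean* is estimated, the standard error (SD divided by the square root of the number of contestants) is the usual alternative. A minimal sketch, assuming dplyr and ggplot2 are installed; `toy` holds made-up values (NOT the real Alone data):

```r
library(dplyr)
library(ggplot2)

# Made-up toy rows (NOT the real Alone data): three contestants in each of two seasons
toy <- tibble(
  season = c(1, 1, 1, 2, 2, 2),
  days_lasted = c(10, 20, 30, 40, 50, 60)
)

season_summary <- toy %>%
  group_by(season) %>%
  summarise(
    mean_survival = mean(days_lasted),
    # standard error of the mean = SD / sqrt(number of contestants in the season)
    se_survival = sd(days_lasted) / sqrt(n())
  )

season_plot <- ggplot(season_summary, aes(x = season, y = mean_survival)) +
  geom_line() +
  geom_point(size = 2) +
  geom_errorbar(
    aes(ymin = mean_survival - se_survival, ymax = mean_survival + se_survival),
    width = 0.2
  ) +
  scale_x_continuous(breaks = season_summary$season) +
  labs(x = "Season", y = "Mean days lasted (+/- 1 SE)")

print(season_summary)
```

Because SE shrinks as group size grows, SE bars will be narrower than the SD bars in the plot above; say which one you used in the plot subtitle or axis label.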

4. do women on average last longer in the game than men? Are men more likely to leave early?

# Summarise
menvswomen <- alone %>% 
  group_by(gender) %>% 
  summarise(
    mean = mean(days_lasted),
    median = median(days_lasted),
    sd = sd(days_lasted),
    n = n(),
    # proportion of each gender who were medically evacuated (TRUE counts as 1)
    prop_evac = sum(medically_evacuated, na.rm = TRUE) / n
  )

print(menvswomen)
## # A tibble: 2 × 6
##   gender  mean median    sd     n prop_evac
##   <chr>  <dbl>  <dbl> <dbl> <int>     <dbl>
## 1 Female  49.4   50.5  28.5    20     0.5  
## 2 Male    36.2   35    27.2    74     0.203
# Plot
menvswomenplot <- ggplot(alone) +
  geom_violin(mapping = aes(
    x = gender,
    y = days_lasted
  ),
  draw_quantiles = c(.25, .5, .75)) +
  ggtitle(label = "Question 4")

plot(menvswomenplot)

# Junk
  # geom_point(data = menvswomen, mapping = (aes(
  #   x = gender,
  #   y = median
  # ))) +
  # geom_polygon(mapping = aes(
  #   x = gender,
  #   y = mean
  # )) +
  # facet_wrap(vars(gender)) +
  # geom_rug()
  # theme_minimal() +

HINT: can you make a plot that captures the median and distribution of days survived, by gender?

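Answer: Yes. Women on average last longer in Alone (mean 49.4 days, median 50.5) than men (mean 36.2 days, median 35), and the lower median shows that the typical man leaves earlier. However, women were also more likely to be medically evacuated (50% of the 20 women vs 20.3% of the 74 men).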
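For the hint about showing the median and the distribution together, one option (an alternative to the violin plot, not the only answer) is a boxplot with the raw points jittered on top and the median marked with stat_summary(). A minimal sketch, assuming dplyr and ggplot2 are installed; `toy` holds made-up values (NOT the real Alone data):

```r
library(dplyr)
library(ggplot2)

# Made-up toy rows (NOT the real Alone data)
toy <- tibble(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male", "Male"),
  days_lasted = c(60, 10, 45, 35, 5, 40, 70)
)

days_plot <- ggplot(toy, aes(x = gender, y = days_lasted)) +
  # box shows the quartiles; outliers hidden because every point is drawn below
  geom_boxplot(outlier.shape = NA) +
  # raw data, spread sideways only (height = 0 keeps the days values exact)
  geom_jitter(width = 0.1, height = 0, alpha = 0.6) +
  # large red dot at each gender's median
  stat_summary(fun = median, geom = "point", colour = "red", size = 3) +
  labs(x = "Gender", y = "Days lasted")

print(days_plot)
```

Showing the raw points matters here because the groups are very unequal in size, which a boxplot or violin alone hides.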

5. do older contestants last longer?

HINT: Use case_when to create a new variable that groups participants by age in decades

question5 <- alone %>% 
  # case_when() checks conditions in order and stops at the first match,
  # so each line only needs a lower bound
  mutate(age_class = case_when(
    age >= 60 ~ "60s",
    age >= 50 ~ "50s",
    age >= 40 ~ "40s",
    age >= 30 ~ "30s",
    age >= 20 ~ "20s",
    age >= 10 ~ "10s"
  )) %>% 
  group_by(age_class) %>% 
  summarise(
    mean = mean(days_lasted),
    n = n()
  ) %>% 
  ungroup()
print(question5)
## # A tibble: 6 × 3
##   age_class  mean     n
##   <chr>     <dbl> <int>
## 1 10s         1.5     2
## 2 20s        39.7    12
## 3 30s        41.4    36
## 4 40s        37.1    37
## 5 50s        42.2     6
## 6 60s        74       1

HINT: what is the mean length of time in the game for each age group? How many participants fall into each group?

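Answer: There is no clear trend. The 20s, 30s and 40s groups, which together contain 85 of the 94 contestants, have similar mean survival times (37.1 to 41.4 days). The 10s (n = 2, 1.5 days), 50s (n = 6, 42.2 days) and 60s (n = 1, 74 days) groups are too small to draw conclusions from.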
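Instead of writing one case_when() line per decade, the decade label can also be calculated: divide age by 10, round down with floor(), multiply back by 10 and paste an "s" on the end. This is a swapped-in technique, not the one the hint asks for. A minimal sketch, assuming dplyr is installed; `toy` holds made-up values (NOT the real Alone data):

```r
library(dplyr)

# Made-up toy rows (NOT the real Alone data)
toy <- tibble(
  age = c(19, 22, 29, 30, 45, 61),
  days_lasted = c(1, 30, 50, 40, 35, 74)
)

by_decade <- toy %>%
  # e.g. 29 -> floor(2.9) * 10 = 20 -> "20s"
  mutate(age_class = paste0(floor(age / 10) * 10, "s")) %>%
  group_by(age_class) %>%
  summarise(mean_days = mean(days_lasted), n = n())

print(by_decade)
```

This covers any age without adding lines, but the labels are plain text, so a hypothetical "100s" group would sort before "20s"; case_when() gives more control when the groups are uneven.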

6. Are contestants who are medically evacuated, on average, older than those who pull out themselves? Does that differ by gender?

HINT: filter the dataset to keep only those contestants who didn’t win, then calculate the mean age, separately for those who were medically evacuated vs. not.

# Data summarised
question6 <- alone %>%
  # drop the winners (result == 1) so we only compare contestants who left early
  filter(result != 1) %>%
  # recode the logical evacuation flag into readable labels for the plot
  mutate(medically_evacuated = case_when(
    medically_evacuated == TRUE ~ "Yes",
    medically_evacuated == FALSE ~ "No"
  )) %>%
  group_by(medically_evacuated, gender) %>%
  # .groups = "drop" returns an ungrouped tibble and silences the regrouping message
  summarise(mean_age = mean(age), .groups = "drop")
print(question6)
## # A tibble: 4 × 3
##   medically_evacuated gender mean_age
##   <chr>               <chr>     <dbl>
## 1 No                  Female     42.5
## 2 No                  Male       37.8
## 3 Yes                 Female     37.9
## 4 Yes                 Male       35.9
# Column graph
question6plot <- ggplot(data = question6) +
  geom_col(mapping = aes(
    x = medically_evacuated,
    y = mean_age
  )) +
  facet_wrap(vars(gender)) +
  scale_x_discrete(name = "Medical Evacuation") +
  #scale_y_continuous(breaks = pretty(question6$mean_age, n = 100))
  ggtitle(label = "Question 6")

# stat_summary is useful
# alone %>% 
#   ggplot(aes(x = gender, y = days_lasted)) +
#   stat_summary(fun = "mean", geom = "bar")

plot(question6plot)

HINT: make a column graph of the data you just summarised

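Answer: No. Medically evacuated contestants are, if anything, slightly younger on average than those who tapped out themselves, for both women (37.9 vs 42.5 years) and men (35.9 vs 37.8 years). The gap is a little larger for women, but judging by eye these differences are small, so age does not seem to be a major factor in whether contestants leave through medical evacuation or by tapping out.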
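To answer "does that differ by gender?" with a number rather than by eye, the age gap between evacuated and non-evacuated contestants can be calculated within each gender by subsetting age with the logical evacuation column. A minimal sketch, assuming dplyr is installed and that medically_evacuated has no missing values (an NA would make the mean NA); `toy` holds made-up values (NOT the real Alone data):

```r
library(dplyr)

# Made-up toy rows (NOT the real Alone data); result == 1 marks a winner
toy <- tibble(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male"),
  age = c(40, 44, 36, 38, 36, 34),
  result = c(2, 3, 4, 2, 3, 1),
  medically_evacuated = c(FALSE, FALSE, TRUE, FALSE, TRUE, FALSE)
)

age_gap <- toy %>%
  # keep only contestants who did not win
  filter(result != 1) %>%
  group_by(gender) %>%
  summarise(
    # age[medically_evacuated] keeps the ages where the flag is TRUE
    mean_age_evac = mean(age[medically_evacuated]),
    mean_age_not_evac = mean(age[!medically_evacuated]),
    # negative gap = evacuated contestants are younger on average
    gap = mean_age_evac - mean_age_not_evac
  )

print(age_gap)
```

One row per gender makes the comparison direct; a formal test (e.g. a t-test within each gender) would be the next step before claiming a real difference.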

7. knit your document to PDF