Welcome to the PSYC3361 coding W3 self test. The test assesses your ability to use the coding skills covered in the Week 3 online coding modules.

In particular, it assesses your ability to…

  • load packages
  • read data
  • select
  • rename
  • group_by and summarise
  • make plots
  • mutate and case_when

It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.

Your notes should also document the troubleshooting process you went through to arrive at the code that worked.

For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.

Good luck!!

Jenny

PS: if you get stuck, have a look in the /images folder for inspiration.

load the packages you will need

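I am loading two packages. tidyverse also loads dplyr (for select, group_by, summarise and mutate) and ggplot2 (for plots), so ggplot2 does not need to be loaded separately. here builds file paths relative to the project folder.

library(tidyverse)
library(here)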

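read the Alone data

I am using read.csv() to read the Alone data. here() builds the path to the file from the project's root folder, so the same code works wherever the document is knitted from within the project.

alone <- read.csv(here("data", "alone.csv"))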
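To check what read.csv() does with each type of column, the sketch below reads a tiny made-up CSV from a string using the `text =` argument, rather than from the real file. The object name `toy` and its three rows are illustrative only. In R 4.0 and later, text columns such as gender should come in as character rather than factor, and TRUE/FALSE columns such as medically_evacuated should come in as logical.

```r
# A tiny made-up CSV given as a string (hypothetical rows, not the real alone.csv)
toy <- read.csv(text = "gender,age,days_lasted,medically_evacuated
Male,61,74,FALSE
Female,57,75,FALSE
Male,55,4,TRUE")

# str() lists each column with the type read.csv() guessed for it
str(toy)
```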

1. make a smaller dataset

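We are mostly interested in gender, age, the days they lasted and whether contestants were medically evacuated. Use select() to make a smaller data frame containing just the relevant variables. Rename the variable called medically_evacuated to make it shorter and easier to type.

I am using select() to keep only gender, age, days_lasted and medically_evacuated. The result is saved as medically_evacuated, which is the same name as one of its columns. This object is used in the code further down. I then use rename(), which takes the form new_name = old_name, to make a copy called alone_small in which medically_evacuated is shortened to evac.

medically_evacuated <- alone %>% 
  select(gender, age, days_lasted, medically_evacuated)
alone_small <- medically_evacuated %>% 
  rename(evac = medically_evacuated)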
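select() can also rename a column as part of the same step, using the same new_name = old_name form. The sketch below shows this on a small hypothetical data frame (`toy`, with values chosen for illustration) that uses the same column names as the Alone data. `evac` is just an example of a shorter name.

```r
library(dplyr)

# Hypothetical toy data with the same column names as the Alone data
toy <- tibble(
  gender = c("Male", "Female", "Male"),
  age = c(61, 57, 55),
  days_lasted = c(74, 75, 4),
  medically_evacuated = c(FALSE, FALSE, TRUE)
)

# Keep four columns and rename medically_evacuated to evac in one step
toy_small <- toy %>%
  select(gender, age, days_lasted, evac = medically_evacuated)

names(toy_small)
```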

2. write code to determine how old the oldest male and female contestant are

I used arrange() with desc() to sort the smaller data frame by age, from oldest to youngest. Reading down the output, the oldest male contestant is 61 years old and the oldest female contestant is 57 years old.

medically_evacuated %>% 
  arrange(desc(age))
##    gender age days_lasted medically_evacuated
## 1    Male  61          74               FALSE
## 2  Female  57          75               FALSE
## 3    Male  55          21               FALSE
## 4    Male  55           4                TRUE
## 5    Male  53          51               FALSE
## 6    Male  50          66               FALSE
## 7    Male  50          36               FALSE
## 8    Male  49          73                TRUE
## 9  Female  49          46                TRUE
## 10   Male  48           2               FALSE
## 11   Male  48           6               FALSE
## 12 Female  47           9                TRUE
## 13   Male  47         100               FALSE
## 14   Male  47          24               FALSE
## 15   Male  46           4               FALSE
## 16   Male  46          41               FALSE
## 17 Female  46          21               FALSE
## 18   Male  46          27                TRUE
## 19   Male  45          59               FALSE
## 20 Female  45          57               FALSE
## 21 Female  45          49               FALSE
## 22 Female  45          28               FALSE
## 23   Male  45          22                TRUE
## 24   Male  44           6               FALSE
## 25   Male  44          64               FALSE
## 26 Female  44           8               FALSE
## 27   Male  44          14               FALSE
## 28   Male  44           5                TRUE
## 29 Female  44          52                TRUE
## 30   Male  43          19               FALSE
## 31   Male  43          10               FALSE
## 32 Female  43          37                TRUE
## 33   Male  43          19               FALSE
## 34 Female  42          73               FALSE
## 35   Male  42          22               FALSE
## 36   Male  41           1               FALSE
## 37 Female  41          78               FALSE
## 38   Male  41          56               FALSE
## 39   Male  40          56               FALSE
## 40   Male  40          35               FALSE
## 41   Male  40          49               FALSE
## 42   Male  40          58               FALSE
## 43   Male  40          74               FALSE
## 44 Female  40          69                TRUE
## 45   Male  39          72               FALSE
## 46   Male  39          69                TRUE
## 47   Male  39          20               FALSE
## 48   Male  38           8                TRUE
## 49   Male  37           8               FALSE
## 50   Male  37           6               FALSE
## 51   Male  37           2               FALSE
## 52 Female  36           7                TRUE
## 53   Male  36          87               FALSE
## 54   Male  36          32               FALSE
## 55   Male  36          67                TRUE
## 56   Male  36          52               FALSE
## 57   Male  35          35               FALSE
## 58   Male  35          75               FALSE
## 59   Male  35          77               FALSE
## 60   Male  35          43               FALSE
## 61   Male  34          43               FALSE
## 62   Male  34          51               FALSE
## 63   Male  34          40               FALSE
## 64   Male  33          14               FALSE
## 65 Female  33          80               FALSE
## 66   Male  33          44               FALSE
## 67   Male  32          39               FALSE
## 68   Male  32          75               FALSE
## 69   Male  32          24                TRUE
## 70   Male  31           0               FALSE
## 71   Male  31           5                TRUE
## 72   Male  31          35               FALSE
## 73 Female  31          48                TRUE
## 74 Female  31          89                TRUE
## 75   Male  31          44               FALSE
## 76   Male  31          63               FALSE
## 77 Female  30           5                TRUE
## 78   Male  30          12                TRUE
## 79   Male  30          78               FALSE
## 80   Male  30          42                TRUE
## 81   Male  29          73               FALSE
## 82   Male  28          21               FALSE
## 83 Female  28          86                TRUE
## 84 Female  27          72               FALSE
## 85   Male  26          74               FALSE
## 86   Male  24           4               FALSE
## 87   Male  24          60               FALSE
## 88   Male  24           7               FALSE
## 89   Male  23           1                TRUE
## 90   Male  23          15               FALSE
## 91   Male  22          55               FALSE
## 92   Male  22           8                TRUE
## 93   Male  19           2               FALSE
## 94   Male  19           1                TRUE
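A more direct way to answer the question is to group by gender and take the maximum age. This returns one row per gender, so there is no need to scan the sorted list. The sketch below uses a small hypothetical data frame (`toy`); on the real data, the same pipeline would be applied to the smaller data frame from challenge 1.

```r
library(dplyr)

# Hypothetical toy data (not the full Alone dataset)
toy <- tibble(
  gender = c("Male", "Female", "Male", "Female"),
  age = c(61, 57, 55, 49)
)

# One row per gender, holding the oldest age in that group
oldest <- toy %>%
  group_by(gender) %>%
  summarise(max_age = max(age))

oldest
```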

3. has the average length of time that alone contestants lasted changed over seasons?

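The goal is to find the mean number of days that contestants lasted in each season. I created a new summary data frame, days_lasted, from the alone data set. group_by() groups the rows by season, and summarise() then calculates, for each season:

- the mean of days lasted
- the standard deviation (SD)
- the number of contestants (n)
- the standard error (SE = SD/sqrt(n))

Mean days lasted rose from 21.6 in season 1 to 54.3 in season 3. From season 4 onwards, it varied between 30.1 and 49.9 days. Season 4 had 14 contestants rather than 10.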

days_lasted <- alone %>% 
  group_by(season) %>%
  summarise(
    mean_days = mean(days_lasted), 
    sd_days = sd(days_lasted),
    n = n(), 
    stderr = sd_days/sqrt(n))
print(days_lasted)
## # A tibble: 9 × 5
##   season mean_days sd_days     n stderr
##    <int>     <dbl>   <dbl> <int>  <dbl>
## 1      1      21.6    23.6    10   7.45
## 2      2      34.4    25.0    10   7.89
## 3      3      54.3    30.9    10   9.76
## 4      4      31.4    32.4    14   8.65
## 5      5      30.1    19.4    10   6.14
## 6      6      45.4    28.0    10   8.86
## 7      7      49.9    31.6    10   9.99
## 8      8      41.2    26.6    10   8.40
## 9      9      46.1    21.6    10   6.84
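The standard error column is SD divided by the square root of n. For example, season 1's printed values give 23.6 / sqrt(10) ≈ 7.46, which matches the printed 7.45 up to rounding of the SD. The sketch below repeats the same summarise() on made-up numbers (the `toy` data frame is hypothetical), so the result can be checked by hand. Each season has an SD of 10 and n = 3, so the SE should be 10 / sqrt(3) ≈ 5.77.

```r
library(dplyr)

# Hypothetical days-lasted values for two made-up seasons
toy <- tibble(
  season = c(1, 1, 1, 2, 2, 2),
  days_lasted = c(10, 20, 30, 40, 50, 60)
)

# Same summary as above: mean, SD, count and standard error per season
toy_summary <- toy %>%
  group_by(season) %>%
  summarise(
    mean_days = mean(days_lasted),
    sd_days = sd(days_lasted),
    n = n(),
    stderr = sd_days / sqrt(n)
  )

toy_summary
```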

HINT: can you make a line graph that has error bars around the mean for each season?

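I used ggplot() to plot the days_lasted summary:

- geom_line() and geom_point() draw the mean days lasted for each season.
- geom_errorbar() draws error bars one standard error above and below each mean.
- Because x and y are set once inside ggplot(), every layer shares them.
- scale_x_continuous(breaks = 1:9) labels every season on the x-axis.

# line graph of mean days lasted per season, with ±1 SE error bars
ggplot(days_lasted, aes(x = season, y = mean_days)) +
  geom_line() +
  geom_point() +
  geom_errorbar(
    aes(ymin = mean_days - stderr, ymax = mean_days + stderr),
    width = 0.2,
    colour = "blue"
  ) +
  labs(
    title = "Mean number of days Alone contestants lasted in each season", 
    y = "Mean number of days lasted", 
    x = "Season") + 
  theme_light() +
  scale_x_continuous(breaks = 1:9)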

4. do women on average last longer in the game than men? Are men more likely to leave early?

I am piping the alone data frame into group_by() to group contestants by gender. summarise() then finds the mean days lasted for each gender. On average, women lasted longer (49.4 days) than men (36.2 days).

alone %>%
  group_by(gender) %>%
  summarise(mean_days = mean(days_lasted))
## # A tibble: 2 × 2
##   gender mean_days
##   <chr>      <dbl>
## 1 Female      49.4
## 2 Male        36.2
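The mean does not directly answer whether men are more likely to leave early. One option is to compare the proportion of each gender who left within some cutoff. The task does not define "early", so `early_cutoff = 10` days below is purely an assumption, and `toy` is hypothetical data. Taking mean() of a TRUE/FALSE condition gives the proportion of TRUE values.

```r
library(dplyr)

# Hypothetical toy data (not the real Alone results)
toy <- tibble(
  gender = c("Male", "Male", "Male", "Female", "Female"),
  days_lasted = c(2, 5, 40, 8, 60)
)
early_cutoff <- 10  # assumption: leaving within 10 days counts as "early"

# Proportion of each gender whose days_lasted is at or below the cutoff
early_by_gender <- toy %>%
  group_by(gender) %>%
  summarise(
    n = n(),
    prop_early = mean(days_lasted <= early_cutoff)
  )

early_by_gender
```

On the real data, the same pipeline would run on alone, and the answer may change with the cutoff chosen.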

HINT: can you make a plot that captures the median and distribution of days survived, by gender?

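To look at the median and spread of days lasted, I selected gender and days_lasted and built the plot from three parts:

- geom_boxplot() draws a box for each gender, showing the median (middle line) and interquartile range (box).
- geom_jitter() adds each contestant as a point, spread sideways so the points do not overlap. height = 0 keeps the days values exact.
- outlier.shape = NA stops the boxplot from drawing outliers a second time, because the jittered points already show every contestant.

The y-axis shows the number of days lasted, not a mean.

gender_days_lasted <- medically_evacuated %>% 
  select(gender, days_lasted)

ggplot(gender_days_lasted, aes(x = gender, y = days_lasted)) +
  geom_boxplot(aes(fill = gender), outlier.shape = NA, show.legend = FALSE) +
  geom_jitter(width = 0.1, height = 0, alpha = 0.5, size = 2) +
  theme_light() +
  labs(
    title = "Distribution of Days Survived by Gender", 
    y = "Number of days lasted", 
    x = "Gender")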
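To put numbers on what the boxplot shows, the median and interquartile range (IQR) can be calculated per gender with summarise(). The IQR is roughly the height of the box. The sketch below uses a hypothetical `toy` data frame; median() and IQR() are base R functions.

```r
library(dplyr)

# Hypothetical toy data (not the real Alone results)
toy <- tibble(
  gender = c("Male", "Male", "Male", "Female", "Female", "Female"),
  days_lasted = c(2, 20, 40, 10, 50, 70)
)

# Median and interquartile range of days lasted for each gender
spread_by_gender <- toy %>%
  group_by(gender) %>%
  summarise(
    median_days = median(days_lasted),
    iqr_days = IQR(days_lasted)
  )

spread_by_gender
```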

5. do older contestants last longer?

HINT: Use case_when to create a new variable that groups participants by age in decades

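I used mutate() to add a new column called decade to the alone data frame. Inside mutate(), case_when() checks each age condition and assigns the matching label, so each contestant is placed in one age group. An age of 70 or over would match no condition and would be given NA.

alone <- alone %>% 
  mutate(decade = case_when(
    age < 20 ~ "teenager",
    age >= 20 & age < 30 ~ "twenties",
    age >= 30 & age < 40 ~ "thirties",
    age >= 40 & age < 50 ~ "forties",
    age >= 50 & age < 60 ~ "fifties",
    age >= 60 & age < 70 ~ "sixties"
  ))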
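case_when() checks its conditions from top to bottom and uses the first one that matches. Because of this, each line only needs an upper bound: anyone who reaches `age < 30` has already failed `age < 20`. The sketch below shows this shorter form on a hypothetical `toy` data frame. It should give the same labels as the longer version for ages under 70.

```r
library(dplyr)

# Hypothetical ages, one per age group
toy <- tibble(age = c(19, 23, 35, 47, 55, 61))

# Conditions are checked in order, so lower bounds are implied by earlier lines
toy <- toy %>%
  mutate(decade = case_when(
    age < 20 ~ "teenager",
    age < 30 ~ "twenties",
    age < 40 ~ "thirties",
    age < 50 ~ "forties",
    age < 60 ~ "fifties",
    age < 70 ~ "sixties"
  ))

toy$decade
```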

HINT: what is the mean length of time in the game for each age group? How many participants fall into each group?

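I grouped the alone data by decade. summarise() then gives the mean days lasted (m_time) and the number of contestants (n) in each age group.

- Mean time in the game is similar for contestants in their twenties (39.7 days), thirties (41.4), forties (37.1) and fifties (42.2). There is no clear sign that older contestants last longer.
- The sixties group (74 days) has only 1 contestant and the teenager group (1.5 days) has only 2, so their means say little about those age groups in general.

alone %>% 
  group_by(decade) %>% 
  summarise(m_time = mean(days_lasted), n = n())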
## # A tibble: 6 × 3
##   decade   m_time     n
##   <chr>     <dbl> <int>
## 1 fifties    42.2     6
## 2 forties    37.1    37
## 3 sixties    74       1
## 4 teenager    1.5     2
## 5 thirties   41.4    36
## 6 twenties   39.7    12
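The table above lists the age groups alphabetically because decade is a character column. Converting it to a factor with explicit levels makes group_by() follow the order of those levels instead. The sketch below uses a hypothetical `toy` data frame; `decade_levels` is an illustrative name. Levels with no rows (here thirties and fifties) are dropped from the summary by default.

```r
library(dplyr)

# Desired display order, youngest to oldest
decade_levels <- c("teenager", "twenties", "thirties", "forties", "fifties", "sixties")

# Hypothetical toy data (not the real Alone results)
toy <- tibble(
  decade = c("forties", "teenager", "sixties", "twenties", "forties"),
  days_lasted = c(30, 2, 74, 40, 50)
)

# Groups now appear in factor-level order rather than alphabetical order
by_decade <- toy %>%
  mutate(decade = factor(decade, levels = decade_levels)) %>%
  group_by(decade) %>%
  summarise(m_time = mean(days_lasted), n = n())

by_decade
```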

6. Are contestants who are medically evacuated, on average, older than those who pull out themselves? Does that differ by gender?

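HINT: filter the dataset to keep only those contestants who didn't win, then calculate the mean age, separately for those who were medically evacuated vs. not.

I used filter(result > 1) to drop the winner of each season. I then used group_by() and summarise() to calculate the mean age of contestants who were and were not medically evacuated, first overall and then separately by gender.

- Overall, contestants who were medically evacuated were about 2 years younger than those who were not (36.7 vs. 38.6 years). So evacuated contestants were not older than those who pulled out themselves.
- Within each group, women were older than men. Among those not medically evacuated, women averaged 42.5 years and men 37.8 years. Among those medically evacuated, women averaged 37.9 years and men 35.9 years.

These are descriptive means only, since group sizes and variability are not shown here.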

# Mean age of non-winners, by medical evacuation status
med_evac <- alone %>% 
  filter(result > 1) %>% 
  group_by(medically_evacuated) %>% 
  summarise(mean_age = mean(age))
print(med_evac)
## # A tibble: 2 × 2
##   medically_evacuated mean_age
##   <lgl>                  <dbl>
## 1 FALSE                   38.6
## 2 TRUE                    36.7
# Mean age of non-winners, by medical evacuation status and gender
med_gender <- alone %>% 
  filter(result > 1) %>% 
  group_by(medically_evacuated, gender) %>% 
  summarise(mean_age = mean(age))
## `summarise()` has grouped output by 'medically_evacuated'. You can override
## using the `.groups` argument.
print(med_gender)
## # A tibble: 4 × 3
## # Groups:   medically_evacuated [2]
##   medically_evacuated gender mean_age
##   <lgl>               <chr>     <dbl>
## 1 FALSE               Female     42.5
## 2 FALSE               Male       37.8
## 3 TRUE                Female     37.9
## 4 TRUE                Male       35.9
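Two small additions can make this summary easier to interpret:

- n = n() shows how many contestants each mean is based on.
- .groups = "drop" returns an ungrouped tibble, which also stops the "has grouped output" message.

The sketch below runs the same filter, group and summarise steps on a hypothetical `toy` data frame. It keeps the original code's assumption that result == 1 marks the winner.

```r
library(dplyr)

# Hypothetical toy data (not the real Alone results)
toy <- tibble(
  result = c(1, 2, 3, 4, 5),
  gender = c("Male", "Female", "Male", "Female", "Male"),
  age = c(40, 44, 30, 36, 38),
  medically_evacuated = c(FALSE, FALSE, TRUE, TRUE, FALSE)
)

# result == 1 is taken to be the winner, so result > 1 keeps everyone else
age_summary <- toy %>%
  filter(result > 1) %>%
  group_by(medically_evacuated, gender) %>%
  summarise(mean_age = mean(age), n = n(), .groups = "drop")

age_summary
```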

HINT: make a column graph of the data you just summarised

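I used geom_col() to draw the mean ages in med_gender as bars. Gender is on the x-axis, and fill colour shows whether contestants were medically evacuated. position = "dodge" places the two bars for each gender side by side instead of stacking them.

ggplot(med_gender) +
  geom_col(aes(
    x = gender, 
    y = mean_age,
    fill = medically_evacuated),
    position = "dodge"
  ) +
  theme_light() +
  labs(
    title = "Mean Age of Non-Winning Contestants by Gender and Medical Evacuation", 
    y = "Mean age (years)", 
    x = "Gender") + # adding appropriate labels to graph and axes
  scale_fill_discrete(name = "Medically Evacuated") # renaming legend title to remove _ symbol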

7. knit your document to pdf