Welcome to the PSYC3361 coding W3 self test. The test assesses your ability to use the coding skills covered in the Week 3 online coding modules.

In particular, it assesses your ability to…

It is IMPORTANT to document the code that you write so that someone who is looking at your code can understand what it is doing. Above each chunk, write a few sentences outlining which packages/functions you have chosen to use and what the function is doing to your data. Where relevant, also write a sentence that interprets the output of your code.

Your notes should also document the troubleshooting process you went through to arrive at the code that worked.

For each of the challenges below, the documentation is JUST AS IMPORTANT as the code.

Good luck!!

Jenny

PS: if you get stuck, have a look in the /images folder for inspiration.

load the packages you will need

library(tidyverse)
library(here)
library(janitor)

read the Alone data

alone <- read_csv(here("data", "alone.csv"))

1. make a smaller dataset

We are mostly interested in gender, age, the days they lasted and whether contestants were medically evacuated. Use select() to make a smaller dataframe containing just the relevant variables. Rename the variable called medically_evacuated to make it shorter and easier to type.

alone %>% select(gender, age, days_lasted, medically_evacuated) %>% rename(medi_evac = medically_evacuated)
## # A tibble: 94 × 4
##    gender   age days_lasted medi_evac
##    <chr>  <dbl>       <dbl> <lgl>    
##  1 Male      40          56 FALSE    
##  2 Male      22          55 FALSE    
##  3 Male      34          43 FALSE    
##  4 Male      32          39 FALSE    
##  5 Male      37           8 FALSE    
##  6 Male      44           6 FALSE    
##  7 Male      46           4 FALSE    
##  8 Male      24           4 FALSE    
##  9 Male      41           1 FALSE    
## 10 Male      31           0 FALSE    
## # ℹ 84 more rows
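
The pipeline above prints the smaller table but does not store it anywhere. A minimal sketch of keeping the result for later use, assuming a tiny made-up tibble called `toy` in place of the real data and a hypothetical object name `alone_small` (neither is part of the original test):

```r
library(tidyverse)

# Made-up rows standing in for the real alone data (illustration only)
toy <- tibble(
  gender = c("Male", "Female", "Male"),
  age = c(40, 35, 52),
  days_lasted = c(56, 12, 30),
  medically_evacuated = c(FALSE, TRUE, FALSE),
  season = c(1, 2, 2)
)

# select() can rename while selecting, using new_name = old_name,
# and the assignment arrow stores the smaller dataframe
alone_small <- toy %>%
  select(gender, age, days_lasted, medi_evac = medically_evacuated)

names(alone_small)
```

Renaming inside select() is just one option; a separate rename() step, as used above, gives the same columns.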

2. write code to determine how old the oldest male and female contestant are

The “select” function lets me keep only the variables I want to analyse, which makes the output less cluttered. I then used “arrange” with the “desc” function to sort the ages from highest to lowest, so that the oldest contestants appear at the top. The output shows that the oldest male contestant is 61 and the oldest female contestant is 57.

alone %>% select(gender, age ) %>% arrange(desc(age))
## # A tibble: 94 × 2
##    gender   age
##    <chr>  <dbl>
##  1 Male      61
##  2 Female    57
##  3 Male      55
##  4 Male      55
##  5 Male      53
##  6 Male      50
##  7 Male      50
##  8 Male      49
##  9 Female    49
## 10 Male      48
## # ℹ 84 more rows
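
Sorting works, but the answer still has to be read off the table by eye. One possible alternative is to let R report the maximum age for each gender directly. This is a sketch on a made-up tibble `toy` (its values are invented for illustration), not the expected solution:

```r
library(tidyverse)

# Made-up rows standing in for the real alone data (illustration only)
toy <- tibble(
  gender = c("Male", "Female", "Male", "Female"),
  age = c(40, 35, 52, 47)
)

# One row per gender, holding the largest age in that group
oldest <- toy %>%
  group_by(gender) %>%
  summarise(max_age = max(age))

oldest
```

With the real data, replacing `toy` with `alone` should give one row for each gender.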

3. has the average length of time that alone contestants lasted changed over seasons?

 alone %>% group_by(season) %>% summarise(mean_time = mean(days_lasted), sd_time = sd(days_lasted))
## # A tibble: 9 × 3
##   season mean_time sd_time
##    <dbl>     <dbl>   <dbl>
## 1      1      21.6    23.6
## 2      2      34.4    25.0
## 3      3      54.3    30.9
## 4      4      31.4    32.4
## 5      5      30.1    19.4
## 6      6      45.4    28.0
## 7      7      49.9    31.6
## 8      8      41.2    26.6
## 9      9      46.1    21.6

HINT: can you make a line graph that has error bars around the mean for each season?

I was able to make the line graph, but I could not add error bars without looking at the answers. I did not want to cheat, so I am going to rewatch some videos to learn how to add all the components to a graph.

alone %>% 
  group_by(season) %>% 
  summarise(mean_time = mean(days_lasted), sd_time = sd(days_lasted)) %>% 
  ggplot(aes(x = season, y = mean_time)) + 
  geom_line() + 
  ggtitle(label = "Average Time Lasted Seasonally")
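
The hint asks for error bars around each season's mean. A minimal sketch of one way to do this with geom_errorbar(), run on a made-up tibble `toy`. Using the standard error (SD divided by the square root of n) for the bar length is an assumption here; plotting plus or minus one SD would be equally easy:

```r
library(tidyverse)

# Made-up rows standing in for the real alone data (illustration only)
toy <- tibble(
  season = c(1, 1, 1, 2, 2, 2),
  days_lasted = c(10, 20, 30, 40, 50, 60)
)

# Mean, SD, group size and standard error for each season
season_summary <- toy %>%
  group_by(season) %>%
  summarise(
    mean_time = mean(days_lasted),
    sd_time = sd(days_lasted),
    n = n(),
    se_time = sd_time / sqrt(n)
  )

# geom_errorbar() needs ymin and ymax; here they are mean minus/plus one SE
p <- ggplot(season_summary, aes(x = season, y = mean_time)) +
  geom_line() +
  geom_point() +
  geom_errorbar(
    aes(ymin = mean_time - se_time, ymax = mean_time + se_time),
    width = 0.2
  ) +
  ggtitle(label = "Average Time Lasted Seasonally")

p
```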

4. do women on average last longer in the game than men? Are men more likely to leave early?

On average, women last longer than men. As the table below shows, women lasted a mean of 49.4 days and men 36.2 days, so men left about 13 days earlier on average.

To figure this out, I grouped the data by gender using the “group_by” function and then used “summarise” to calculate the mean number of days lasted for each group.

 alone %>% group_by(gender) %>% summarise(mean_days = mean(days_lasted))
## # A tibble: 2 × 2
##   gender mean_days
##   <chr>      <dbl>
## 1 Female      49.4
## 2 Male        36.2

HINT: can you make a plot that captures the median and distribution of days survived, by gender?

alone %>% 
  group_by(gender) %>% 
  summarise(mean_days = mean(days_lasted)) %>% 
  ggplot(aes(x = gender, y = mean_days, fill = gender)) + 
  geom_col() +
  ggtitle(label = "Gendered Average Rate Of Survival")
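
The column graph shows only the mean, whereas the hint asks for the median and the distribution. A boxplot is one plot type that shows both. The sketch below uses a made-up tibble `toy` and a hypothetical title, and overlays the individual points with geom_jitter() as an optional extra:

```r
library(tidyverse)

# Made-up rows standing in for the real alone data (illustration only)
toy <- tibble(
  gender = c("Female", "Female", "Female", "Male", "Male", "Male"),
  days_lasted = c(10, 50, 90, 5, 30, 40)
)

# The thick line in each box is the median, so compute it as a cross-check
medians <- toy %>%
  group_by(gender) %>%
  summarise(median_days = median(days_lasted))

# Boxplot uses the raw data, not a summary, so no group_by() is needed
p <- ggplot(toy, aes(x = gender, y = days_lasted, fill = gender)) +
  geom_boxplot() +
  geom_jitter(width = 0.1, alpha = 0.5) +
  ggtitle(label = "Days Lasted By Gender")

p
```

A violin plot (geom_violin()) would be another reasonable choice for showing the distribution.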

5. do older contestants last longer?

HINT: Use case_when to create a new variable that groups participants by age in decades

I will be honest, I had no idea how to use the “case_when” function, so I couldn’t group the contestants into decades like I saw in the example, but I am planning on rewatching the tutorial videos to learn how to incorporate this function into my code in future.
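
For reference, a minimal sketch of how case_when() could create a decade grouping. It runs on a made-up tibble `toy`, the column name `age_group` and its labels are hypothetical, and it assumes no contestant is younger than 20:

```r
library(tidyverse)

# Made-up rows standing in for the real alone data (illustration only)
toy <- tibble(
  age = c(24, 31, 39, 45, 57, 61),
  days_lasted = c(4, 0, 54, 6, 75, 74)
)

# case_when() checks the conditions in order and uses the first one that is TRUE,
# so "age < 40" only catches people who were not already labelled "20s"
toy_grouped <- toy %>%
  mutate(age_group = case_when(
    age < 30 ~ "20s",   # assumes nobody is under 20
    age < 40 ~ "30s",
    age < 50 ~ "40s",
    age < 60 ~ "50s",
    TRUE ~ "60+"        # everyone left over
  ))

toy_grouped
```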

HINT: what is the mean length of time in the game for each age group? How many participants fall into each group?

alone %>% group_by(age) %>% summarise(mean_day = mean(days_lasted)) %>% arrange(desc(mean_day))
## # A tibble: 33 × 2
##      age mean_day
##    <dbl>    <dbl>
##  1    57     75  
##  2    26     74  
##  3    61     74  
##  4    29     73  
##  5    27     72  
##  6    49     59.5
##  7    35     57.5
##  8    40     56.8
##  9    39     53.7
## 10    28     53.5
## # ℹ 23 more rows
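
Grouping by exact age gives 33 small groups, which makes it hard to see a pattern or to answer how many participants fall into each group. The sketch below bins ages into decades with floor(), a plainly different shortcut from the case_when() approach in the hint, and adds n() to count each group. The tibble `toy` and the column name `decade` are made up for illustration:

```r
library(tidyverse)

# Made-up rows standing in for the real alone data (illustration only)
toy <- tibble(
  age = c(24, 28, 31, 39, 45, 57),
  days_lasted = c(10, 20, 30, 50, 40, 70)
)

# floor(age / 10) * 10 turns 24 into 20, 39 into 30, and so on
age_summary <- toy %>%
  mutate(decade = floor(age / 10) * 10) %>%
  group_by(decade) %>%
  summarise(
    mean_days = mean(days_lasted),
    n = n()  # number of participants in each decade
  )

age_summary
```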

6. Are contestants who are medically evacuated, on average, older than those who pull out themselves? Does that differ by gender?

No. Among contestants who did not win, those who were medically evacuated were on average younger than those who were not, for both women (37.9 vs 42.5 years) and men (35.9 vs 37.8 years). The age gap was larger for women.

HINT: filter the dataset to keep only those contestants who didn’t win, then calculate the mean age, separately for those who were medically evacuated vs. not.

 alone %>% filter(result > 1) %>% group_by(medically_evacuated, gender ) %>% summarise(mean_age = mean(age), .groups ="keep") %>% ungroup() 
## # A tibble: 4 × 3
##   medically_evacuated gender mean_age
##   <lgl>               <chr>     <dbl>
## 1 FALSE               Female     42.5
## 2 FALSE               Male       37.8
## 3 TRUE                Female     37.9
## 4 TRUE                Male       35.9

HINT: make a column graph of the data you just summarised

I will use the aes() function to specify which variable goes on each axis: gender on the x axis and mean age on the y axis. The “fill” aesthetic colours the bars by whether contestants were medically evacuated, and setting position to “dodge” places the bars side by side so they do not overlap in the column graph.

alone %>% 
  filter(result > 1) %>% 
  group_by(medically_evacuated, gender) %>% 
  summarise(mean_age = mean(age), .groups = "keep") %>% 
  ungroup() %>% 
  ggplot(aes(x = gender, y = mean_age, fill = medically_evacuated)) + 
  geom_col(position = "dodge") + 
  ggtitle(label = "Gender And Age Differences In Medical Evacuation Rates")

7. knit your document to pdf

I did have to copy the tables from the console into the document, because every time I knitted my document all my tables would disappear. However, once I removed my labels for the environment from each of my chunks, my tables finally appeared in my document!