# Exercise 3
#
# Welcome to the pipe! In this exercise I'm going to give you some code that
# uses functions you've never seen before, and I want you to try to work out 
# what each one does. I'll use the forensic data to illustrate this, so the 
# first step is to load the tidyverse and the data...

library(tidyverse)
## ── Attaching packages ────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ───────────────────────────────────────────────────────────────────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()
forensic <- read_csv("data_forensic.csv")
## Parsed with column specification:
## cols(
##   participant = col_double(),
##   handwriting_expert = col_character(),
##   us = col_character(),
##   condition = col_character(),
##   age = col_double(),
##   forensic_scientist = col_character(),
##   forensic_specialty = col_character(),
##   handwriting_reports = col_double(),
##   confidence = col_double(),
##   familiarity = col_double(),
##   feature = col_character(),
##   est = col_double(),
##   true = col_double(),
##   band = col_character()
## )
# Summarise participant 1 -------------------------------------------------

participant1 <- ungroup(
  summarise(
    group_by(
      filter(forensic, participant == 1),
      band
    ),
    mean = mean(est), 
    sd = sd(est)  
  )
)


# Summarise participant 2 -------------------------------------------------

x <- filter(forensic, participant == 2)
y <- group_by(x, band)
z <- summarise(y, mean = mean(est), sd = sd(est))
participant2 <- ungroup(z)



# Summarise participant 3 -------------------------------------------------

participant3 <- forensic %>%
  filter(participant == 3) %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est)) %>%
  ungroup()
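

# Aside: nested vs piped on toy data --------------------------------------

# A minimal sketch of how the nested and piped styles above relate, assuming
# only that dplyr is installed. The "toy" tibble and the names "nested" and
# "piped" are hypothetical and invented for illustration; this is not the
# forensic data. The idea is that x %>% f(y) is another way of writing
# f(x, y), so both versions should produce the same summary.

library(dplyr)

# Hypothetical toy data: two participants, two bands
toy <- tibble(
  participant = c(1, 1, 1, 1, 2, 2),
  band        = c("Band 01", "Band 01", "Band 99", "Band 99", "Band 01", "Band 99"),
  est         = c(10, 20, 80, 90, 5, 95)
)

# Nested style: read from the innermost call outwards
nested <- ungroup(
  summarise(
    group_by(filter(toy, participant == 1), band),
    mean = mean(est),
    sd = sd(est)
  )
)

# Piped style: each step receives the previous result as its first argument
piped <- toy %>%
  filter(participant == 1) %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est)) %>%
  ungroup()

# Stops with an error if the two summaries differ
stopifnot(isTRUE(all.equal(nested, piped)))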


# Check the results -------------------------------------------------------

print(participant1)
## # A tibble: 6 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01   9.5  16.8
## 2 Band 25  46.7  31.2
## 3 Band 50  53.5  28.3
## 4 Band 75  50.4  24.1
## 5 Band 99  80.9  29.1
## 6 Band NA  NA    NA
print(participant2)
## # A tibble: 5 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01  6.26 14.9 
## 2 Band 25 10.9   9.62
## 3 Band 50 17.6  15.9 
## 4 Band 75 21.9  16.5 
## 5 Band 99 49.5  20.6
print(participant3)
## # A tibble: 5 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01  31.2  44.1
## 2 Band 25  46.2  49.4
## 3 Band 50  48.3  47.8
## 4 Band 75  60.7  47.8
## 5 Band 99  85    33.7
# Discussion (in pairs, as always!)
# 
# - All three "versions" of the code do the same thing, only with different
#   participants. Setting aside the fact that you haven't actually learned
#   what filter(), group_by(), etc. do... can you work out what the code is
#   doing overall?
# -- Each version computes the mean and standard deviation of the estimates
#    (est) within each band, for participant 1, 2, and 3 respectively.
#
# - Which of these three versions is easiest to understand? Or are they all
#   equally easy/difficult? What makes it easy/hard?
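# -- Version 3 (participant 3) is the easiest to read: the steps appear top
#    to bottom in the order they happen.
# -- Version 2 (participant 2) also runs in order, but the throwaway names
#    x, y, and z make it clunky.
# -- Version 1 (participant 1) nests the functions, so it has to be read from
#    the inside out, which is hard to follow.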
#
# - What do you think the "pipe operator" %>% is doing here (hint: compare
#   the code for versions 2 and 3 to each other...) 
# -- It takes the output of one function and passes it on as the first
#    argument of the next, so the analysis steps can be written in sequence
#    without nesting calls or naming intermediate results.
#
# - There is something weird about the "participant1" data. This has nothing
#   to do with my code... it's actually something that is in the "forensic"
#   data itself. What is the weird thing and what do you think it means?
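# -- The participant1 summary has a sixth row, "Band NA", whose mean and sd
#    are both NA; participants 2 and 3 only have five bands. This probably
#    means that some of participant 1's responses are missing in the forensic
#    data.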
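

# Aside: how missing values show up in a summary --------------------------

# A minimal sketch related to the sixth row of the participant1 output, where
# mean and sd are both NA. It assumes only that dplyr is installed. The
# "toy_na" tibble and the summary names are hypothetical and invented for
# illustration. It shows that a single missing est is enough to make mean()
# and sd() return NA for a whole group, unless na.rm = TRUE is set. Whether
# est really is missing in the forensic data is an assumption to check.

library(dplyr)

# Hypothetical toy data with one missing estimate in "Band 50"
toy_na <- tibble(
  band = c("Band 01", "Band 01", "Band 50", "Band 50", "Band 50"),
  est  = c(5, 15, 50, NA, 60)
)

# Default behaviour: the NA propagates into the "Band 50" summary
default_summary <- toy_na %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est)) %>%
  ungroup()

# With na.rm = TRUE the missing value is dropped before summarising
narm_summary <- toy_na %>%
  group_by(band) %>%
  summarise(mean = mean(est, na.rm = TRUE), sd = sd(est, na.rm = TRUE)) %>%
  ungroup()

print(default_summary)
print(narm_summary)

# On the real data, a filter along these lines would list the affected rows:
# forensic %>% filter(participant == 1, is.na(est))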