# Exercise 3
#
# Welcome to the pipe! In this exercise I'm going to give you some code that
# uses functions you've never seen before, and I want you to try to work out
# what each one does. I'll use the forensic data to illustrate this, so the
# first step is to load the tidyverse and the data...
library(tidyverse)
## ── Attaching packages ─────────────────────────────────────── tidyverse 1.3.0 ──
## ✓ ggplot2 3.2.1     ✓ purrr   0.3.3
## ✓ tibble  2.1.3     ✓ dplyr   0.8.4
## ✓ tidyr   1.0.2     ✓ stringr 1.4.0
## ✓ readr   1.3.1     ✓ forcats 0.4.0
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
forensic <- read_csv("data_forensic.csv")
## Parsed with column specification:
## cols(
##   participant = col_double(),
##   handwriting_expert = col_character(),
##   us = col_character(),
##   condition = col_character(),
##   age = col_double(),
##   forensic_scientist = col_character(),
##   forensic_specialty = col_character(),
##   handwriting_reports = col_double(),
##   confidence = col_double(),
##   familiarity = col_double(),
##   feature = col_character(),
##   est = col_double(),
##   true = col_double(),
##   band = col_character()
## )
# Summarise participant 1 -------------------------------------------------
participant1 <- ungroup(
  summarise(
    group_by(
      filter(forensic, participant == 1),
      band
    ),
    mean = mean(est),
    sd = sd(est)
  )
)
# Summarise participant 2 -------------------------------------------------
x <- filter(forensic, participant == 2)
y <- group_by(x, band)
z <- summarise(y, mean = mean(est), sd = sd(est))
participant2 <- ungroup(z)
# Summarise participant 3 -------------------------------------------------
participant3 <- forensic %>%
  filter(participant == 3) %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est)) %>%
  ungroup()
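# Version 3 runs the same steps as version 2; the only new ingredient is %>%.
# Below is a minimal sketch of what the pipe does, using base R's round() and
# sum() on a made-up vector v (v and the other variable names are only for
# illustration). It shows that x %>% f(y) is the same call as f(x, y).

```r
library(dplyr)  # dplyr re-exports the %>% pipe from the magrittr package

v <- c(1.234, 5.678)  # made-up numbers, just for illustration

# x %>% f(y) is the same call as f(x, y): the value on the left
# becomes the first argument of the function on the right
piped  <- v %>% round(1)
nested <- round(v, 1)

# A chain a %>% f() %>% g() is the same as g(f(a)), i.e. a piped
# chain is a nested call written in the order the steps happen
chained    <- v %>% round(1) %>% sum()
inside_out <- sum(round(v, 1))

identical(piped, nested)        # [1] TRUE
identical(chained, inside_out)  # [1] TRUE
```

# The same rule explains version 3: forensic becomes the first argument of
# filter(), that result becomes the first argument of group_by(), and so on.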
# Check the results -------------------------------------------------------
print(participant1)
## # A tibble: 6 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01   9.5  16.8
## 2 Band 25  46.7  31.2
## 3 Band 50  53.5  28.3
## 4 Band 75  50.4  24.1
## 5 Band 99  80.9  29.1
## 6 Band NA  NA    NA
print(participant2)
## # A tibble: 5 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01  6.26 14.9
## 2 Band 25 10.9   9.62
## 3 Band 50 17.6  15.9
## 4 Band 75 21.9  16.5
## 5 Band 99 49.5  20.6
print(participant3)
## # A tibble: 5 x 3
##   band     mean    sd
##   <chr>   <dbl> <dbl>
## 1 Band 01  31.2  44.1
## 2 Band 25  46.2  49.4
## 3 Band 50  48.3  47.8
## 4 Band 75  60.7  47.8
## 5 Band 99  85    33.7
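# Printing the three tibbles only shows that they look alike. A stricter check
# is identical(). The sketch below runs all three coding styles on a made-up
# toy tibble (the values are invented; only the column names participant, band
# and est come from the forensic data, and step1/step2/step3 are hypothetical
# names chosen so the real x, y and z are not overwritten).

```r
library(dplyr)

# Made-up toy data with the same column names as the forensic data
toy <- tibble(
  participant = c(1, 1, 1, 1, 2, 2),
  band        = c("Band 01", "Band 01", "Band 99", "Band 99", "Band 01", "Band 99"),
  est         = c(10, 20, 70, 90, 5, 50)
)

# Version 1 style: nested calls, read from the inside out
v1 <- ungroup(summarise(group_by(filter(toy, participant == 1), band),
                        mean = mean(est), sd = sd(est)))

# Version 2 style: one temporary variable per step
step1 <- filter(toy, participant == 1)
step2 <- group_by(step1, band)
step3 <- summarise(step2, mean = mean(est), sd = sd(est))
v2 <- ungroup(step3)

# Version 3 style: the same steps chained with %>%
v3 <- toy %>%
  filter(participant == 1) %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est)) %>%
  ungroup()

identical(v1, v2)  # [1] TRUE
identical(v2, v3)  # [1] TRUE
print(v3)
```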
# Discussion (in pairs, as always!)
#
# - All three "versions" of the code do the same thing, only with different
#   participants. Setting aside the fact that you haven't actually learned
#   what filter(), group_by(), etc. do... can you work out what the code is
#   doing overall?
# -- It keeps only one participant's rows (1, 2 and 3 respectively), groups
#    them by band, and finds the mean and standard deviation of est within
#    each band.
#
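# - Which of these three versions is easiest to understand? Or are they all
#   equally easy/difficult? What makes it easy/hard?
# -- Version 3 is the easiest to read: each step sits on its own line, in the
#    order it happens.
# -- Version 2 feels backwards and clunky: every step has to be saved to a
#    throwaway variable (x, y, z) before the next step can use it.
# -- Version 1 nests the functions, so it has to be read from the inside out
#    (filter() runs first but is buried in the middle), which makes it hard
#    to follow.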
#
# - What do you think the "pipe operator" %>% is doing here? (Hint: compare
#   the code for versions 2 and 3 to each other...)
# -- It takes the output of the function on its left and passes it in as the
#    first argument of the function on its right. That lets version 3 chain
#    the steps without the temporary variables (x, y, z) that version 2 needs,
#    so there is less repetition and the analysis reads as a sequence of steps.
#
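# - There is something weird about the "participant1" data. This has nothing
#   to do with my code... it's actually something that is in the "forensic"
#   data itself. What is the weird thing and what do you think it means?
# -- Participant 1's summary has a sixth row, "Band NA", whose mean and sd are
#    both NA (participants 2 and 3 only have five bands). This probably means
#    some of participant 1's rows have missing data: the band label looks like
#    it was built from a missing value, and mean() and sd() return NA whenever
#    any est in the group is NA.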
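# One way to investigate the "Band NA" row in participant 1's summary, sketched
# on a made-up toy tibble rather than the real forensic data (the values are
# invented; only the column names band and est come from the data). It shows
# why mean() and sd() give NA, how filter() can pull out the rows responsible,
# and what na.rm = TRUE changes.

```r
library(dplyr)

# Made-up toy data: the "Band NA" row and its NA estimate imitate what
# participant 1's summary suggests; none of these values are real
toy <- tibble(
  participant = 1,
  band = c("Band 01", "Band 01", "Band NA"),
  est  = c(10, 20, NA)
)

# mean() and sd() return NA as soon as any value in a group is NA
toy %>%
  group_by(band) %>%
  summarise(mean = mean(est), sd = sd(est))

# Pull out the rows responsible, to see what is actually missing
missing_rows <- toy %>%
  filter(is.na(est) | band == "Band NA")
print(missing_rows)

# na.rm = TRUE drops the NAs before summarising; a group with no
# non-missing values left still has no mean (mean() gives NaN)
na_summary <- toy %>%
  group_by(band) %>%
  summarise(mean = mean(est, na.rm = TRUE), sd = sd(est, na.rm = TRUE))
print(na_summary)
```

# On the real data, the same filter() applied to forensic (restricted to
# participant == 1) would list which of participant 1's rows are affected.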