---
title: "Lab14MW"
author: "Julian Wills"
date: "December 5th-7th, 2016"
output:
  html_notebook:
    theme: readable
    toc: yes
---
## The Exercise
Although we are conducting our repeated measures ANOVA in SPSS, it is worth illustrating some advantages of visualizing the data in R with ggplot2 (e.g., plotting raw data, three-way interactions, and polynomial trends).

## Loading Packages & Data

### From Personal/Lab Computer
Load **tidyverse** to get access to ggplot2, dplyr, and tidyr.   
Load **haven** to load in SPSS data. 

```{r, results='hide', message=FALSE}
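ipak <- function(pkg) {
  # Install any packages that are not yet installed, then load them all
  new.pkg <- pkg[!(pkg %in% installed.packages()[, "Package"])]
  if (length(new.pkg)) {
    install.packages(new.pkg, dependencies = TRUE)
  }
  sapply(pkg, require, character.only = TRUE)
}

# dplyr and stringr are already attached by tidyverse; listing them is harmless
packages <- c("tidyverse", "dplyr", "haven", "stringr")
ipak(packages)

# mean_cl_boot() (used below) needs Hmisc installed, but not attached
if (!requireNamespace("Hmisc", quietly = TRUE)) install.packages("Hmisc")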

# Change ggplot theme to light background
old <- theme_set(theme_light(base_size = 12))
```

### Load and Restructure Data

Load the lab data using the **read_sav()** function from the haven package.
Let's also create new variables that map labels onto our coded variables (e.g., language code 1 = Excel).
```{r, warning=FALSE}
lab14mw_data <- read_sav("/Users/Julian/GDrive/Misc/Classes/InterStats/Lab14M/Lab14MW_data.sav") %>% 
  mutate(SubjID = row_number()) %>% 
  select(SubjID, everything()) %>% 
  mutate(language_lab = ifelse(language == 1, "Excel", NA),
         language_lab = ifelse(language == 2, "SPSS", language_lab),
         language_lab = ifelse(language == 3, "R", language_lab)) %>% 
  mutate(school_lab = ifelse(school == 1, "School 1", NA),
         school_lab = ifelse(school == 2, "School 2", school_lab))
lab14mw_data
```
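
The chained `ifelse()` calls above map each numeric code to a label one step at a time. As a minimal sketch of the same idea on made-up data (the `toy` data frame and its values are hypothetical, not the lab data), `factor()` can do the mapping in a single call and also fixes the order in which the labels appear in plot legends. With real `read_sav()` output the columns may carry SPSS value labels, in which case `haven::as_factor()` is another option, assuming the .sav file defines those labels.

```r
library(dplyr)

# Hypothetical toy data standing in for the coded SPSS variables
toy <- data.frame(
  language = c(1, 2, 3, 2, NA),
  school   = c(1, 2, 1, 2, 1)
)

toy_labelled <- toy %>%
  mutate(
    # factor() maps each code to its label; codes not listed in `levels` become NA
    language_lab = factor(language, levels = 1:3, labels = c("Excel", "SPSS", "R")),
    school_lab   = factor(school, levels = 1:2, labels = c("School 1", "School 2"))
  )

toy_labelled
```
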
  

Notice, however, that our repeated measures span four columns (one for each interval, e.g., diff_time1). 
Let's use the **gather()** function from the tidyr package to convert our data from wide to long.
This way we will have four rows per subject, with a new column ("time") indicating the interval and a single 
column ("diff") holding the perceived difficulty rating. 

NOTE: We also drop subjects with missing data on any of the four measurements (listwise deletion). This is definitely 
not always recommended, but we do it here so that our plots are in sync with the SPSS analysis (which also drops incomplete cases). 
Because gather() keeps rows with missing values by default, the drop_na() step removes incomplete subjects before reshaping. 
```{r, warning=FALSE}
lab14mw_data <- read_sav("/Users/Julian/GDrive/Misc/Classes/InterStats/Lab14M/Lab14MW_data.sav") %>% 
  mutate(SubjID = row_number()) %>% 
  select(SubjID, everything()) %>% 
  mutate(language_lab = ifelse(language == 1, "Excel", NA),
         language_lab = ifelse(language == 2, "SPSS", language_lab),
         language_lab = ifelse(language == 3, "R", language_lab)) %>% 
  mutate(school_lab = ifelse(school == 1, "School 1", NA),
         school_lab = ifelse(school == 2, "School 2", school_lab)) %>% 
  # Listwise deletion: keep only subjects with all four measurements
  drop_na(diff_time1:diff_time4) %>% 
  # Wide to long: one row per subject per time point
  gather(time, diff, diff_time1:diff_time4) %>% 
  # Turn "diff_time1" ... "diff_time4" into the numbers 1 ... 4
  mutate(time = as.numeric(str_replace(time,"diff_time",""))) %>% 
  arrange(SubjID, time)
lab14mw_data
```
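
In tidyr 1.0.0 and later, `gather()` is superseded by `pivot_longer()`. The sketch below shows the same wide-to-long reshape on made-up data (the `toy_wide` data frame is hypothetical), with listwise deletion made explicit through `drop_na()`. It assumes a tidyr version that provides `pivot_longer()`.

```r
library(dplyr)
library(tidyr)

# Hypothetical wide data: one row per subject, one column per time point
toy_wide <- data.frame(
  SubjID     = 1:3,
  diff_time1 = c(5, 4, 6),
  diff_time2 = c(4, NA, 5),  # subject 2 is missing time 2
  diff_time3 = c(3, 3, 4),
  diff_time4 = c(2, 2, 4)
)

toy_long <- toy_wide %>%
  drop_na(diff_time1:diff_time4) %>%  # listwise deletion: subject 2 is removed entirely
  pivot_longer(
    cols         = diff_time1:diff_time4,
    names_to     = "time",
    names_prefix = "diff_time",       # strip the prefix so only "1" ... "4" remain
    values_to    = "diff"
  ) %>%
  mutate(time = as.numeric(time)) %>%
  arrange(SubjID, time)

toy_long
```

Each remaining subject contributes four rows, one per time point, which is the layout the plots below expect.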


## Violin Plots

We can examine the distribution of difficulty at each time interval.
```{r}

lab14mw_data %>% 
  mutate(time = factor(time)) %>% 
  ggplot(aes(time, diff)) + geom_violin() + geom_jitter(alpha=.5,width=.1,height=0) + 
  labs(x="Time Interval", y = "Perceived Difficulty", 
       title="Perceived Difficulty over Time", subtitle = "Violin Plots with (jittered) Observations")
```

## Main Effect of Time

```{r}

lab14mw_data %>% 
  ggplot(aes(time, diff))  + 
  # mean_cl_boot requires the Hmisc package to be installed
  stat_summary(fun.data = mean_cl_boot, geom="ribbon", alpha=.3) + 
  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  stat_summary(fun = mean, geom="line") + 
  labs(x="Time Interval", y = "Perceived Difficulty", 
       title="Perceived Difficulty over Time", subtitle = "Mean and Bootstrapped 95% Confidence Intervals")
```

## Language x Time Interaction

This is a bit cluttered, but unlike the default SPSS profile plot it shows the variability around each mean.
```{r}

lab14mw_data %>% 
  ggplot(aes(time, diff, color=language_lab, fill=language_lab))  + 
  stat_summary(fun.data = mean_se, geom="ribbon", alpha=.1) +
  stat_summary(fun = mean, geom="line") +  # `fun` replaces the deprecated `fun.y` (ggplot2 >= 3.3.0)
  labs(x="Time Interval", y = "Perceived Difficulty", 
       title="Perceived Difficulty over Time\nas a function of Programming Language",
       subtitle = "Mean +/- 1 Standard Error", color="Language", fill="Language")
```
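
To make the ribbon concrete: `mean_se` returns the mean plus and minus one standard error, computed separately for each group at each time point. The following minimal sketch reproduces that calculation with dplyr on made-up data (the `toy_long` values are hypothetical). Note that these are ordinary between-subject standard errors; they ignore the repeated measures structure, so they are a descriptive aid and not the error term used in the SPSS analysis.

```r
library(dplyr)

# Hypothetical long-format data: 3 subjects x 2 time points
toy_long <- data.frame(
  SubjID = rep(1:3, each = 2),
  time   = rep(1:2, times = 3),
  diff   = c(5, 3, 4, 2, 6, 4)
)

toy_summary <- toy_long %>%
  group_by(time) %>%
  summarise(
    mean_diff = mean(diff),
    se        = sd(diff) / sqrt(n()),  # standard error of the mean
    ymin      = mean_diff - se,        # lower edge of the ribbon
    ymax      = mean_diff + se         # upper edge of the ribbon
  )

toy_summary
```
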

### Linear Trends

```{r}

lab14mw_data %>% 
  ggplot(aes(time, diff, color=language_lab, fill=language_lab))  + 
  geom_smooth(method="lm", formula = y ~ x, alpha=.2) + 
  labs(x="Time Interval", y = "Perceived Difficulty", 
       title="Perceived Difficulty over Time\nas a function of Programming Language",
       subtitle = "Linear Model Fit", color="Language", fill="Language")

```

### Quadratic Trends

```{r}

lab14mw_data %>% 
  ggplot(aes(time, diff, color=language_lab, fill=language_lab))  + 
  geom_smooth(method=lm, alpha=.2, formula= y ~ x + I(x^2)) + 
  labs(x="Time Interval", y = "Perceived Difficulty", 
       title="Perceived Difficulty over Time\nas a function of Programming Language",
       subtitle = "Quadratic Model Fit", color="Language", fill="Language")

```
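
For comparison with the polynomial trend tests that SPSS typically reports for a within-subjects factor: with four equally spaced time points, linear and quadratic trends can be expressed as orthogonal polynomial contrasts. A minimal sketch using base R's `contr.poly()` follows; the `time_means` values are made up for illustration and are not taken from the lab data.

```r
# Orthogonal polynomial contrasts for 4 equally spaced time points
poly4 <- contr.poly(4)
colnames(poly4)  # ".L" (linear), ".Q" (quadratic), ".C" (cubic)

# Hypothetical mean difficulty at times 1-4 (made-up numbers)
time_means <- c(5, 3, 2, 2)

# Trend scores: weighted sums of the means using each contrast's weights
trend_scores <- drop(t(poly4) %*% time_means)
trend_scores
```

A negative linear score indicates that difficulty declines over time, and a positive quadratic score indicates that the decline levels off (a U-shaped bend).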



## Language x School x Time Interaction

```{r}

lab14mw_data %>% 
  ggplot(aes(time, diff, color=language_lab, fill=language_lab))  + 
  facet_wrap(~school_lab) + 
  stat_summary(fun.data = mean_se, alpha=1, position=position_dodge(width=.5)) +
  stat_summary(fun = mean, geom="line", position=position_dodge(width=.5)) +  # `fun` replaces the deprecated `fun.y`
  labs(x="Time Interval", y = "Perceived Difficulty", 
       title="Perceived Difficulty over Time as a function of \nProgramming Language and School",
       subtitle = "Mean +/- 1 Standard Error", color="Language", fill="Language")
```


### Compare Polynomial Trends

```{r, warning=FALSE}

lab14mw_data %>% 
  ggplot(aes(time, diff, color=language_lab, fill=language_lab))  + 
  facet_wrap(~school_lab) + 
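  # Linear fit (solid); the original I(x^1) term duplicated x and made the model rank-deficient
  geom_smooth(method=lm, formula= y ~ x, se=FALSE, linetype=1) + 
  # Quadratic fit (dotted)
  geom_smooth(method=lm, formula= y ~ x + I(x^2), se=FALSE, linetype=3) + 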
  labs(x="Time Interval", y = "Perceived Difficulty", 
       title="Perceived Difficulty over Time as a function of \nProgramming Language and School",
       subtitle = "Linear (solid) vs. Quadratic (dotted) Fit", color="Language", fill="Language")

```
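
The visual comparison above can be paired with a numeric one. As a rough sketch on made-up data (the `toy` values are hypothetical), the two models that `geom_smooth()` draws can be fit with `lm()` and compared with `anova()`. Like the smoothed lines themselves, these fits treat every observation as independent and ignore the repeated measures structure, so they describe the shape of the trend and do not replace the SPSS tests.

```r
# Hypothetical data with visible curvature: 3 subjects x 4 time points
toy <- data.frame(
  time = rep(1:4, times = 3),
  diff = c(6, 4, 3, 3,
           5, 3, 2, 2,
           7, 4, 3, 4)
)

fit_linear    <- lm(diff ~ time, data = toy)
fit_quadratic <- lm(diff ~ time + I(time^2), data = toy)

# F test for whether adding the quadratic term improves on the linear fit
anova(fit_linear, fit_quadratic)
```
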


