For this exercise, please try to reproduce the results from Study 1 of the associated paper (Maglio & Polman, 2014). The PDF of the paper is included in the same folder as this Rmd file.
Researchers recruited 202 volunteers at a subway station in Toronto, Ontario, Canada. Half of the sample was traveling East, while the other half was traveling West. In a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) design, each participant was asked to estimate how far one randomly assigned station out of the four felt to them (1 = very close, 7 = very far). The authors conducted a 2 × 4 ANOVA on distance estimates, and then tested differences in distance estimates between East- and West-bound groups for each individual station.
Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):
We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp2 = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp2 = .20. We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp2 = .28) and the station two stops to the west (Spadina, p = .001, ηp2 = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp2 = .08) and the station two stops to the east (Sherbourne, p < .001, ηp2 = .24). Figure 1 summarizes these results.
library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(lsr) # for etaSquared() (effect sizes)
library(ggthemes) # extra ggplot2 themes
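# Working directory for interactive use
setwd("~/Downloads")
# When knitting, root.dir is a package-level option, so it is set through opts_knit (opts_chunk does not use it)
knitr::opts_knit$set(root.dir = "~/Downloads")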
# Load the data
d <- read_excel("S1_Subway.xlsx")
The data are already tidy as provided by the authors.
You will notice that R doesn’t recognize some of these variables as factors, so let’s convert them to factors before we proceed with the analyses.
d <- d %>%
  mutate(
    DIRECTION = as.factor(DIRECTION),
    STN_NUMBER = as.factor(STN_NUMBER),
    STN_NAME = factor(STN_NAME, levels = c("SPAD", "STG", "B-Y", "SHER"))
  )
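To see why the conversion matters, here is a minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not the study's). A station code left as a number is treated as a continuous predictor with a single slope, whereas a factor gets one mean per station:

```r
# Toy data (hypothetical values): a 4-level station code and a rating
set.seed(1)
toy <- data.frame(
  station = rep(1:4, each = 10),
  rating  = rnorm(40)
)

# Numeric predictor: R fits a single slope, so the term has 1 degree of freedom
df_numeric <- anova(lm(rating ~ station, data = toy))$Df[1]

# Factor predictor: R estimates a separate mean per station, so the term has 3
df_factor <- anova(lm(rating ~ factor(station), data = toy))$Df[1]

c(numeric = df_numeric, factor = df_factor)
```

With STN_NUMBER left numeric, the station effect would therefore not be the 3-df test the paper reports.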
We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp2 = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp2 = .20.
Note from teaching team: ηp2 = “partial eta squared”
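As a quick sanity check on the quoted numbers, partial eta squared can be recovered from an F statistic and its degrees of freedom. The sketch below uses a helper name of my own (`partial_eta_sq_from_f` is not from any package) and plugs in the values reported in the paper:

```r
# Partial eta-squared from an F statistic and its degrees of freedom:
#   eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
# This follows from eta_p^2 = SS_effect / (SS_effect + SS_error).
partial_eta_sq_from_f <- function(f_value, df_effect, df_error) {
  (f_value * df_effect) / (f_value * df_effect + df_error)
}

# F values and degrees of freedom as reported in the paper for Study 1
eta_station     <- partial_eta_sq_from_f(24.10, df_effect = 3, df_error = 194)
eta_interaction <- partial_eta_sq_from_f(16.28, df_effect = 3, df_error = 194)

round(c(station = eta_station, interaction = eta_interaction), 2)
```

Both values round to the effect sizes in the quote (.27 and .20), so the reported F values and effect sizes are internally consistent.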
d %>%
  group_by(DIRECTION, STN_NUMBER) %>%
  summarize(
    num_obs = n(),
    .groups = "keep"
  ) %>%
  ungroup()
## # A tibble: 8 × 3
## DIRECTION STN_NUMBER num_obs
## <fct> <fct> <int>
## 1 EAST 1 26
## 2 EAST 2 26
## 3 EAST 3 23
## 4 EAST 4 26
## 5 WEST 1 25
## 6 WEST 2 25
## 7 WEST 3 26
## 8 WEST 4 25
Looks pretty even to me!
When I see “analysis of variance” in the paper, my first thought is to use the aov function in R. This is a wrapper function that fits a linear model to the data.
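A small sketch of what "wrapper" means here, using made-up data (the `toy` data frame is hypothetical): fitting the same formula with aov and with lm gives the same F statistic.

```r
# Toy data (hypothetical values): three groups of eight scores
set.seed(42)
toy <- data.frame(
  group = factor(rep(c("a", "b", "c"), each = 8)),
  score = rnorm(24)
)

# F statistic from the aov summary table
f_aov <- summary(aov(score ~ group, data = toy))[[1]][["F value"]][1]

# F statistic from the ANOVA table of the equivalent linear model
f_lm <- anova(lm(score ~ group, data = toy))[["F value"]][1]

all.equal(f_aov, f_lm)
```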
Note on statistics: the goal of this assignment is for you to get used to reproducing analyses. It is not intended to teach you about the subtleties of statistics; that will be covered in PSYC201B. We are happy to chat about these details in office hours, but do not worry if you do not understand them right now!
Now fill in the aov command with the proper variables (e.g., replace “VAR1” with the first variable of interest). This is a two-way ANOVA that tests for a main effect of direction (east vs. west), a main effect of station, and an interaction between direction and station. The interaction is requested by the * below; in an R formula, A * B is shorthand for A + B + A:B, so writing the two main effects out explicitly as well is redundant but harmless. For more information, do a little Googling about two-way ANOVAs.
mod <- aov(data = d,
           formula = DISTANCE ~ DIRECTION + STN_NUMBER + DIRECTION * STN_NUMBER)
anova(mod)
## Analysis of Variance Table
##
## Response: DISTANCE
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 0.713 0.7129 0.6644 0.416
## STN_NUMBER 3 75.158 25.0525 23.3492 6.011e-13 ***
## DIRECTION:STN_NUMBER 3 52.413 17.4710 16.2832 1.765e-09 ***
## Residuals 194 208.152 1.0729
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Note: the values in the parentheses denote the degrees of freedom for the F-test.
* Main effect of direction: F(1, 194) = 0.664
* Main effect of station: F(3, 194) = 23.35
* Interaction between direction and station: F(3, 194) = 16.28
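The degrees of freedom can be worked out by hand from the design and the cell counts printed earlier. This is only arithmetic on numbers already shown above; the variable names are mine:

```r
# Cell counts copied from the DIRECTION x STN_NUMBER table above
cell_n <- c(26, 26, 23, 26,   # EAST, stations 1-4
            25, 25, 26, 25)   # WEST, stations 1-4

n_total     <- sum(cell_n)    # total number of ratings
n_direction <- 2              # levels of DIRECTION
n_station   <- 4              # levels of STN_NUMBER

df_direction   <- n_direction - 1
df_station     <- n_station - 1
df_interaction <- df_direction * df_station
df_residual    <- n_total - n_direction * n_station  # N minus number of cells

c(direction = df_direction, station = df_station,
  interaction = df_interaction, residual = df_residual)
```

This gives 1, 3, 3, and 194, matching the Df column of the ANOVA table.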
Note from instructors: We’re unclear as to why the F-values are slightly different from those in the paper. But students in previous versions of this class came to the same answer.
Get the partial eta-squared (under the eta.sq.part column). Do your results match the original paper? Hint: they should!
etaSquared(mod, anova = TRUE)
## eta.sq eta.sq.part SS df MS
## DIRECTION 0.001195964 0.001929304 0.4023651 1 0.4023651
## STN_NUMBER 0.223393543 0.265284110 75.1575503 3 25.0525168
## DIRECTION:STN_NUMBER 0.155789423 0.201151614 52.4131149 3 17.4710383
## Residuals 0.618698140 NA 208.1521070 194 1.0729490
## F p
## DIRECTION 0.3750086 5.410039e-01
## STN_NUMBER 23.3492148 6.010747e-13
## DIRECTION:STN_NUMBER 16.2831954 1.765498e-09
## Residuals NA NA
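One detail worth noticing: the DIRECTION row here (SS = 0.402, F = 0.375) differs from the DIRECTION row in the anova(mod) table (SS = 0.713, F = 0.664), while the other rows agree. A likely explanation is that anova() reports sequential (Type I) sums of squares, whereas etaSquared() uses a different type by default (type = 2, as far as I know; check ?etaSquared to confirm). In a design with unequal cell sizes, those two can disagree. The sketch below illustrates the general point on made-up data (the `toy` data frame is hypothetical) by showing that a sequential sum of squares depends on the order in which terms enter the model:

```r
# Toy unbalanced 2 x 2 data (hypothetical values): y depends on B only,
# but A and B are correlated because the cell sizes are unequal
toy <- data.frame(
  A = factor(c("a1", "a1", "a1", "a1", "a2", "a2", "a2", "a2")),
  B = factor(c("b1", "b1", "b1", "b2", "b1", "b2", "b2", "b2")),
  y = c(0.1, -0.1, 0.0, 1.0, 0.0, 0.9, 1.1, 1.0)
)

# Sequential (Type I) sums of squares: each term is adjusted only for
# the terms that come before it in the formula
ss_a_first  <- anova(lm(y ~ A * B, data = toy))["A", "Sum Sq"]  # A unadjusted
ss_a_second <- anova(lm(y ~ B * A, data = toy))["A", "Sum Sq"]  # A adjusted for B

c(A_entered_first = ss_a_first, A_entered_second = ss_a_second)
```

Here A gets a sum of squares of 0.5 when entered first and essentially 0 once B is accounted for. Whether the same issue contributes to the small differences from the paper's F values is only a guess on my part, since the paper does not say which type of sums of squares was used.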
We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp2 = .28) and the station two stops to the west (Spadina, p = .001, ηp2 = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp2 = .08) and the station two stops to the east (Sherbourne, p < .001, ηp2 = .24). Figure 1 summarizes these results.
# Step 1: Filter data for St. George (STG)
st.george <- d %>%
filter(STN_NAME == "STG")
# Step 2: Calculate mean estimated distance by direction
st.george %>%
group_by(DIRECTION) %>%
summarize(mean_distance = mean(DISTANCE, na.rm = TRUE))
## # A tibble: 2 × 2
## DIRECTION mean_distance
## <fct> <dbl>
## 1 EAST 2.77
## 2 WEST 1.64
# Step 3: Run ANOVA for St. George
st.george.aov <- aov(data = st.george, formula = DISTANCE ~ DIRECTION)
summary(st.george.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 16.25 16.252 18.79 7.23e-05 ***
## Residuals 49 42.38 0.865
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Step 4: Calculate partial eta-squared
etaSquared(st.george.aov, anova = TRUE)
## eta.sq eta.sq.part SS df MS F p
## DIRECTION 0.2772092 0.2772092 16.25207 1 16.2520664 18.79278 7.22865e-05
## Residuals 0.7227908 NA 42.37538 49 0.8648038 NA NA
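Since each of these follow-up tests compares only two groups, the one-way ANOVA is equivalent to an equal-variance two-sample t-test: the F statistic equals the squared t statistic. A minimal sketch on made-up data (the `toy` data frame and its values are hypothetical, not the St. George ratings):

```r
# Toy data (hypothetical values): two directions with unequal group sizes
set.seed(7)
toy <- data.frame(
  direction = factor(rep(c("EAST", "WEST"), times = c(12, 10))),
  distance  = c(rnorm(12, mean = 3), rnorm(10, mean = 2))
)

# F statistic from the one-way ANOVA
f_value <- summary(aov(distance ~ direction, data = toy))[[1]][["F value"]][1]

# t statistic from Student's t-test with pooled variance (var.equal = TRUE)
t_value <- unname(t.test(distance ~ direction, data = toy, var.equal = TRUE)$statistic)

all.equal(f_value, t_value^2)
```

This also explains why eta.sq and eta.sq.part are identical in the output above: with a single predictor there are no other effects to partial out.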
## do the same for Spadina
# Step 1: Filter data for Spadina (SPAD)
spadina <- d %>%
filter(STN_NAME == "SPAD")
# Step 2: Calculate mean estimated distance by direction
spadina %>%
group_by(DIRECTION) %>%
summarize(mean_distance = mean(DISTANCE, na.rm = TRUE))
## # A tibble: 2 × 2
## DIRECTION mean_distance
## <fct> <dbl>
## 1 EAST 3.65
## 2 WEST 2.64
# Step 3: Run ANOVA for Spadina
spadina.aov <- aov(data = spadina, formula = DISTANCE ~ DIRECTION)
summary(spadina.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 13.10 13.100 11.97 0.00113 **
## Residuals 49 53.64 1.095
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Step 4: Calculate partial eta-squared
etaSquared(spadina.aov, anova = TRUE)
## eta.sq eta.sq.part SS df MS F p
## DIRECTION 0.1962763 0.1962763 13.10048 1 13.100483 11.96623 0.001131006
## Residuals 0.8037237 NA 53.64462 49 1.094788 NA NA
## do the same for Bloor-Yonge
# Step 1: Filter data for Bloor-Yonge (B-Y)
bloor.yonge <- d %>%
filter(STN_NAME == "B-Y")
# Step 2: Calculate mean estimated distance by direction
bloor.yonge %>%
group_by(DIRECTION) %>%
summarize(mean_distance = mean(DISTANCE, na.rm = TRUE))
## # A tibble: 2 × 2
## DIRECTION mean_distance
## <fct> <dbl>
## 1 EAST 1.61
## 2 WEST 2.19
# Step 3: Run ANOVA for Bloor-Yonge
bloor.yonge.aov <- aov(data = bloor.yonge, formula = DISTANCE ~ DIRECTION)
summary(bloor.yonge.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 4.16 4.157 3.945 0.0528 .
## Residuals 47 49.52 1.054
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Step 4: Calculate partial eta-squared
etaSquared(bloor.yonge.aov, anova = TRUE)
## eta.sq eta.sq.part SS df MS F p
## DIRECTION 0.0774451 0.0774451 4.156747 1 4.156747 3.945477 0.05284615
## Residuals 0.9225549 NA 49.516722 47 1.053547 NA NA
## do the same for Sherbourne
# Step 1: Filter data for Sherbourne (SHER)
sherbourne <- d %>%
filter(STN_NAME == "SHER")
# Step 2: Calculate mean estimated distance by direction
sherbourne %>%
group_by(DIRECTION) %>%
summarize(mean_distance = mean(DISTANCE, na.rm = TRUE))
## # A tibble: 2 × 2
## DIRECTION mean_distance
## <fct> <dbl>
## 1 EAST 2.77
## 2 WEST 4
# Step 3: Run ANOVA for Sherbourne
sherbourne.aov <- aov(data = sherbourne, formula = DISTANCE ~ DIRECTION)
summary(sherbourne.aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 19.31 19.306 15.11 0.000305 ***
## Residuals 49 62.62 1.278
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Step 4: Calculate partial eta-squared
etaSquared(sherbourne.aov, anova = TRUE)
## eta.sq eta.sq.part SS df MS F p
## DIRECTION 0.2356667 0.2356667 19.30618 1 19.306184 15.10816 0.0003052116
## Residuals 0.7643333 NA 62.61538 49 1.277865 NA NA
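The four blocks above repeat the same steps, so they could also be written as a loop over stations. The sketch below is one way to do that in base R. It runs on a made-up stand-in for `d` (same column names, random ratings), so the numbers it prints are not the study's results; swapping `toy` for `d` should give the per-station values shown above.

```r
# Toy stand-in for `d` (hypothetical ratings, 10 per direction per station)
set.seed(123)
toy <- expand.grid(
  rep       = 1:10,
  DIRECTION = c("EAST", "WEST"),
  STN_NAME  = c("SPAD", "STG", "B-Y", "SHER")
)
toy$DISTANCE <- sample(1:7, nrow(toy), replace = TRUE)

# Fit a one-way ANOVA within each station and collect F, p, and partial eta-squared
per_station <- lapply(split(toy, toy$STN_NAME), function(station_data) {
  tab       <- anova(lm(DISTANCE ~ DIRECTION, data = station_data))
  ss_effect <- tab["DIRECTION", "Sum Sq"]
  ss_error  <- tab["Residuals", "Sum Sq"]
  data.frame(
    F_value     = tab["DIRECTION", "F value"],
    p_value     = tab["DIRECTION", "Pr(>F)"],
    eta_sq_part = ss_effect / (ss_effect + ss_error)
  )
})

# One row per station, in the order of the STN_NAME factor levels
results <- do.call(rbind, per_station)
results
```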
Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?
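ANSWER HERE. Mostly yes! The interaction F-value, the p-values, and the partial eta-squared values all match the paper. The one exception is the main effect of station, where I got F(3, 194) = 23.35 instead of the reported 24.10.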
How difficult was it to reproduce your results?
ANSWER HERE. I found it challenging to fully grasp the analysis we were conducting, primarily because I’m not yet familiar with ANOVA tests. I hope to learn more of the underlying mathematics as I progress in my academic career, but I appreciate not having to delve into it for this class. At this stage, I think it would likely create more confusion than clarity.
What aspects made it difficult? What aspects made it easy?
ANSWER HERE. This bumper rails doc was super useful because it explained the code really well. I thought the layout made it easy to digest!