AI disclosure: We used ChatGPT and Claude to check our work.
For this exercise, please try to reproduce the results from Study 1 of the associated paper (Maglio & Polman, 2014). The PDF of the paper is included in the same folder as this Rmd file.
Researchers recruited 202 volunteers at a subway station in Toronto, Ontario, Canada. Half of the sample was traveling east and the other half was traveling west. In a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) between-subjects design, each participant was randomly asked to estimate how far one of the four stations felt to them (1 = very close, 7 = very far). The authors conducted a 2 × 4 ANOVA on the distance estimates and then, separately for each station, tested the difference in estimates between eastbound and westbound participants.
Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):
We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp² = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp² = .20. We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp² = .28) and the station two stops to the west (Spadina, p = .001, ηp² = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp² = .08) and the station two stops to the east (Sherbourne, p < .001, ηp² = .24). Figure 1 summarizes these results.
library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' files
library(readxl) # import Excel files
# optional packages:
# library(lsr)
# library(ggthemes)
# load the Study 1 data only
d <- read_excel("data/S1_Subway.xlsx")
The data are already tidy as provided by the authors.
# subset by station
d_st_george = d %>% filter(STN_NAME == "STG")
d_spadina = d %>% filter(STN_NAME == "SPAD")
d_bloor_yonge = d %>% filter(STN_NAME == "B-Y")
d_sherbourne = d %>% filter(STN_NAME == "SHER")
# fit the 2 (direction) x 4 (station) between-subjects model: both main effects plus their interaction
model = lm(formula = DISTANCE ~ 1 + DIRECTION + STN_NAME +
DIRECTION:STN_NAME, data = d)
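In R's formula syntax the intercept is included by default and `A * B` expands to `A + B + A:B`, so the model above could equivalently be written as `lm(DISTANCE ~ DIRECTION * STN_NAME, data = d)`. A quick check of that equivalence, using R's built-in `warpbreaks` data only because it is a ready-made two-factor design (it has nothing to do with the subway study):

```r
# warpbreaks: built-in 2 (wool) x 3 (tension) dataset, used here only as an example
fit_long  <- lm(breaks ~ 1 + wool + tension + wool:tension, data = warpbreaks)
fit_short <- lm(breaks ~ wool * tension, data = warpbreaks)

# Same coefficients and the same sequential sums of squares
all.equal(coef(fit_long), coef(fit_short))
all.equal(anova(fit_long)[["Sum Sq"]], anova(fit_short)[["Sum Sq"]])
```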
# residual sum of squares from the full model, used as the error term for partial eta squared
ss_residual = anova(model)["Residuals", "Sum Sq"]
We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp² = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp² = .20.
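As a consistency check that needs no data, partial eta squared can be recovered from a reported F and its degrees of freedom, because F = (SS_effect / df_effect) / (SS_error / df_error). Applied to the two reported omnibus effects:

```latex
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
         = \frac{F \cdot df_{\text{effect}}}{F \cdot df_{\text{effect}} + df_{\text{error}}}

\text{station: } \frac{24.10 \times 3}{24.10 \times 3 + 194} = \frac{72.30}{266.30} \approx .27
\qquad
\text{interaction: } \frac{16.28 \times 3}{16.28 \times 3 + 194} = \frac{48.84}{242.84} \approx .20
```

Both values agree with the reported ηp², so the reported F statistics and effect sizes are internally consistent.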
# reproduce the main effect of orientation
anova(model)["DIRECTION", ]
## Analysis of Variance Table
##
## Response: DISTANCE
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 0.71287 0.71287 0.6644 0.416
# partial eta squared for orientation: SS_effect / (SS_effect + SS_residual)
ss_direction = anova(model)["DIRECTION", "Sum Sq"]
ss_direction / (ss_direction + ss_residual)
## [1] 0.003413072
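The same SS_effect / (SS_effect + SS_residual) computation is repeated for every effect in this report. A small helper function (hypothetical, not part of the original analysis) would remove the copy-and-paste; the sketch below demonstrates it on R's built-in `warpbreaks` data rather than the study data:

```r
# Hypothetical helper: partial eta squared for one term of a fitted lm,
# computed from the sequential (Type I) ANOVA table returned by anova().
partial_eta_sq <- function(fit, term) {
  tab <- anova(fit)
  ss_effect <- tab[term, "Sum Sq"]
  ss_residual <- tab["Residuals", "Sum Sq"]
  ss_effect / (ss_effect + ss_residual)
}

# Illustration on a built-in two-factor dataset (not the subway data)
fit <- lm(breaks ~ wool * tension, data = warpbreaks)
sapply(c("wool", "tension", "wool:tension"),
       function(term) partial_eta_sq(fit, term))
```

With the objects defined above, `partial_eta_sq(model, "DIRECTION")` should return the same 0.003413072 printed here, since it evaluates the identical expression.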
# reproduce the main effect of station
anova(model)["STN_NAME", ]
## Analysis of Variance Table
##
## Response: DISTANCE
## Df Sum Sq Mean Sq F value Pr(>F)
## STN_NAME 3 75.158 25.052 23.349 6.011e-13 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# partial eta squared for station
ss_station = anova(model)["STN_NAME", "Sum Sq"]
ss_station / (ss_station + ss_residual)
## [1] 0.2652841
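One candidate explanation for the gap between our station F (23.35) and the reported 24.10 is the type of sums of squares. `anova()` on an `lm` fit reports sequential (Type I) sums of squares, and because the cells are unequal in size (Bloor-Yonge has 49 participants versus 51 at each other station, judging by the residual df in the per-station models below), Type I results depend on term order and can differ from the Type III results that some statistics packages report by default. The interaction, which is entered last, is unaffected because the last term's sequential SS equals its Type III SS, which fits with our interaction F matching exactly. We do not know which type the authors used. The sketch below, run on simulated data with hypothetical `DIRECTION` labels and ratings, shows one base-R way to obtain Type III tests (sum-to-zero contrasts plus `drop1()`):

```r
# Simulated 2 x 4 data with unequal cell sizes. Station codes mirror the
# original file; DIRECTION labels and DISTANCE values are hypothetical.
set.seed(42)
sim <- expand.grid(
  DIRECTION = c("EAST", "WEST"),
  STN_NAME = c("STG", "SPAD", "B-Y", "SHER"),
  rep = 1:13
)
sim <- sim[-(1:5), ]  # drop a few rows so the cells are unbalanced
sim$DISTANCE <- sample(1:7, nrow(sim), replace = TRUE)

# Sum-to-zero contrasts are needed for Type III tests to be meaningful
fit_sum <- lm(DISTANCE ~ DIRECTION * STN_NAME, data = sim,
              contrasts = list(DIRECTION = "contr.sum", STN_NAME = "contr.sum"))

type1 <- anova(fit_sum)                           # sequential (Type I) SS
type3 <- drop1(fit_sum, scope = ~ ., test = "F")  # each term dropped from the full model

type1
type3
```

Applied to the real data, this would mean refitting `model` with the same `contrasts` argument (assuming `DIRECTION` and `STN_NAME` are stored as character or factor columns) and calling `drop1()` as above; we have not run that, so we cannot say whether it yields 24.10.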
# reproduce the interaction between orientation and station
anova(model)["DIRECTION:STN_NAME", ]
## Analysis of Variance Table
##
## Response: DISTANCE
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION:STN_NAME 3 52.413 17.471 16.283 1.765e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# partial eta squared for the orientation x station interaction
ss_interaction = anova(model)["DIRECTION:STN_NAME", "Sum Sq"]
ss_interaction / (ss_interaction + ss_residual)
## [1] 0.2011516
We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp² = .28) and the station two stops to the west (Spadina, p = .001, ηp² = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp² = .08) and the station two stops to the east (Sherbourne, p < .001, ηp² = .24). Figure 1 summarizes these results.
# reproduce results for St. George
model_st_george = lm(DISTANCE ~ DIRECTION, data = d_st_george)
anova(model_st_george)
## Analysis of Variance Table
##
## Response: DISTANCE
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 16.252 16.2521 18.793 7.229e-05 ***
## Residuals 49 42.375 0.8648
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# partial eta squared for the direction effect at St. George (station-specific model)
ss_st_george = anova(model_st_george)["DIRECTION", "Sum Sq"]
ss_st_george_residual = anova(model_st_george)["Residuals", "Sum Sq"]
ss_st_george / (ss_st_george + ss_st_george_residual)
## [1] 0.2772092
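Since the paper reports only p and ηp² for these station-level comparisons, it may help to note that a one-way ANOVA with two groups is equivalent to Student's (pooled-variance) t test: F(1, n − 2) = t², and the two p-values are identical. For example, the St. George F(1, 49) = 18.79 corresponds to |t(49)| ≈ 4.34. A quick check on simulated data (the `EAST`/`WEST` labels and the ratings are hypothetical):

```r
# Simulated ratings for one station; EAST/WEST labels and values are hypothetical
set.seed(7)
stn <- data.frame(
  DIRECTION = rep(c("EAST", "WEST"), times = c(25, 26)),
  DISTANCE = sample(1:7, 51, replace = TRUE),
  stringsAsFactors = FALSE
)

# One-way ANOVA on the two direction groups
aov_tab <- anova(lm(DISTANCE ~ DIRECTION, data = stn))
f_value <- aov_tab["DIRECTION", "F value"]

# Student's t test with pooled variance (var.equal = TRUE)
tt <- t.test(DISTANCE ~ DIRECTION, data = stn, var.equal = TRUE)

c(F = f_value, t_squared = unname(tt$statistic)^2)
```

The Welch test that `t.test()` runs by default (`var.equal = FALSE`) would not match the ANOVA exactly, which is why the pooled-variance version is used here.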
# reproduce results for Spadina
model_spadina = lm(DISTANCE ~ DIRECTION, data = d_spadina)
anova(model_spadina)
## Analysis of Variance Table
##
## Response: DISTANCE
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 13.100 13.1005 11.966 0.001131 **
## Residuals 49 53.645 1.0948
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# partial eta squared for the direction effect at Spadina
ss_spadina = anova(model_spadina)["DIRECTION", "Sum Sq"]
ss_spadina_residual = anova(model_spadina)["Residuals", "Sum Sq"]
ss_spadina / (ss_spadina + ss_spadina_residual)
## [1] 0.1962763
# reproduce results for Bloor-Yonge
model_bloor_yonge = lm(DISTANCE ~ DIRECTION, data = d_bloor_yonge)
anova(model_bloor_yonge)
## Analysis of Variance Table
##
## Response: DISTANCE
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 4.157 4.1567 3.9455 0.05285 .
## Residuals 47 49.517 1.0535
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# partial eta squared for the direction effect at Bloor-Yonge
ss_bloor_yonge = anova(model_bloor_yonge)["DIRECTION", "Sum Sq"]
ss_bloor_yonge_residual = anova(model_bloor_yonge)["Residuals", "Sum Sq"]
ss_bloor_yonge / (ss_bloor_yonge + ss_bloor_yonge_residual)
## [1] 0.0774451
# reproduce results for Sherbourne
model_sherbourne = lm(DISTANCE ~ DIRECTION, data = d_sherbourne)
anova(model_sherbourne)
## Analysis of Variance Table
##
## Response: DISTANCE
## Df Sum Sq Mean Sq F value Pr(>F)
## DIRECTION 1 19.306 19.3062 15.108 0.0003052 ***
## Residuals 49 62.615 1.2779
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# partial eta squared for the direction effect at Sherbourne
ss_sherbourne = anova(model_sherbourne)["DIRECTION", "Sum Sq"]
ss_sherbourne_residual = anova(model_sherbourne)["Residuals", "Sum Sq"]
ss_sherbourne / (ss_sherbourne + ss_sherbourne_residual)
## [1] 0.2356667
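The four per-station blocks above differ only in the subset they use, so they could be collapsed into a single loop. A minimal sketch with base R's `split()` and `lapply()`, run on simulated data (the station codes mirror the original file, but the `EAST`/`WEST` labels and the ratings are hypothetical; we have not checked how `DIRECTION` is coded in the real file):

```r
# Simulated stand-in for the study data (hypothetical values)
set.seed(1)
d_sim <- data.frame(
  STN_NAME = rep(c("STG", "SPAD", "B-Y", "SHER"), each = 10),
  DIRECTION = rep(c("EAST", "WEST"), times = 20),
  DISTANCE = sample(1:7, 40, replace = TRUE),
  stringsAsFactors = FALSE
)

# Fit one DISTANCE ~ DIRECTION model per station instead of four copies
fits <- lapply(split(d_sim, d_sim$STN_NAME),
               function(s) lm(DISTANCE ~ DIRECTION, data = s))

# Collect F, p, and partial eta squared for each station into one table
station_table <- t(sapply(fits, function(fit) {
  tab <- anova(fit)
  ss <- tab[["Sum Sq"]]  # rows: DIRECTION, Residuals
  c(F = tab[["F value"]][1],
    p = tab[["Pr(>F)"]][1],
    eta_p2 = ss[1] / sum(ss))
}))
round(station_table, 3)
```

On the real data, passing `d` in place of `d_sim` should give one row per station with the same F, p, and ηp² values printed above, because each row is computed from the same per-station model.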
Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?
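Largely yes. The reproduced p-values and partial eta-squared values for every test, the F for the main effect of orientation (F = 0.66, i.e., F < 1), and the F for the orientation × station interaction (16.28) all match the reported values after rounding. The one discrepancy is the F for the main effect of station: we obtained F = 23.35, whereas the paper reports F(3, 194) = 24.10, although our ηp² of .265 still rounds to the reported .27.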
How difficult was it to reproduce your results?
It was relatively straightforward to reproduce these results.
What aspects made it difficult? What aspects made it easy?
The data were already tidy and the findings were clearly reported, which made most of the analysis easy. However, the authors did not report the F or t statistics for their station-specific analyses, only p-values and effect sizes. Also, the variable/column names (e.g., DISTANCE) and one of the station codes (B-Y) were not written in a conventional, machine-readable format (i.e., all lowercase with underscores).
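The naming issue is easy to work around after import. A minimal base-R sketch, using a made-up data frame that mirrors the original column names and station codes (the `DISTANCE` values are hypothetical), converts column names to lowercase and maps the station codes to snake_case labels of our own choosing:

```r
# Hypothetical data frame mirroring the original column names and station codes
raw <- data.frame(
  STN_NAME = c("STG", "SPAD", "B-Y", "SHER"),
  DISTANCE = c(3, 5, 2, 6),  # made-up values
  stringsAsFactors = FALSE
)

# Lowercase column names: STN_NAME -> stn_name, DISTANCE -> distance
names(raw) <- tolower(names(raw))

# Lookup table from the original codes to snake_case station labels
station_labels <- c(STG = "st_george", SPAD = "spadina",
                    "B-Y" = "bloor_yonge", SHER = "sherbourne")
raw$stn_name <- unname(station_labels[raw$stn_name])
raw
```

`stringsAsFactors = FALSE` matters here: if the codes were factors, indexing the lookup vector would use the factor's integer codes rather than the station names.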