For this exercise, please try to reproduce the results from Study 1 of the associated paper (Maglio & Polman, 2014). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

Researchers recruited 202 volunteers at a subway station in Toronto, Ontario, Canada. Half of the sample was traveling east, while the other half was traveling west. In a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) design, each participant was randomly assigned one of the four stations and asked to rate how far away it felt (1 = very close, 7 = very far). The authors conducted a 2 × 4 ANOVA on the distance estimates, and then tested the difference in distance estimates between the eastbound and westbound groups for each station individually.

Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):

We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp² = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp² = .20. We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp² = .28) and the station two stops to the west (Spadina, p = .001, ηp² = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp² = .08) and the station two stops to the east (Sherbourne, p < .001, ηp² = .24). Figure 1 summarizes these results.

Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files

# optional packages:
library(lsr) # for etaSquared()
# library(ggthemes)

Step 2: Load data

# Just Study 1
d <- read_excel("data/S1_Subway.xlsx")

Step 3: Tidy data

The data are already tidy as provided by the authors.

Step 4: Run analysis

Pre-processing

Inferential statistics

We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp² = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp² = .20.
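
The reported effect sizes can be cross-checked against the reported F statistics, because in a fixed-effects ANOVA partial eta squared is determined by F and its degrees of freedom. A worked check using only the values quoted above:

$$
\eta_p^2 = \frac{SS_{\text{effect}}}{SS_{\text{effect}} + SS_{\text{error}}}
         = \frac{F \cdot df_{\text{effect}}}{F \cdot df_{\text{effect}} + df_{\text{error}}}
$$

$$
\text{station: } \frac{24.10 \times 3}{24.10 \times 3 + 194} = \frac{72.30}{266.30} \approx .27,
\qquad
\text{interaction: } \frac{16.28 \times 3}{16.28 \times 3 + 194} = \frac{48.84}{242.84} \approx .20
$$

Both reported effect sizes are therefore consistent with the reported F values and degrees of freedom.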

# fit the full 2 x 4 ANOVA; both main effects and the interaction are read from this one table
closeness_anova <- aov(DISTANCE ~ DIRECTION + STN_NAME + DIRECTION:STN_NAME, data = d)
summary(closeness_anova)

##                     Df Sum Sq Mean Sq F value   Pr(>F)    
## DIRECTION            1   0.71   0.713   0.664    0.416    
## STN_NAME             3  75.16  25.053  23.349 6.01e-13 ***
## DIRECTION:STN_NAME   3  52.41  17.471  16.283 1.77e-09 ***
## Residuals          194 208.15   1.073                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

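# Main effect of orientation (first row of the summary):
# F = 0.66, consistent with the reported F < 1.

# Main effect of station (second row of the summary):
# F = 23.35, which does not match the reported F = 24.10 (could not reproduce);
# p < .001, as reported.
# Note: etaSquared() from lsr defaults to Type II sums of squares (type = 2),
# whereas summary() of an aov object reports sequential (Type I) sums of squares.
etaSquared(closeness_anova)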

##                         eta.sq eta.sq.part
## DIRECTION          0.001195964 0.001929304
## STN_NAME           0.223393543 0.265284110
## DIRECTION:STN_NAME 0.155789423 0.201151614

# Station: partial eta squared = 0.265, which rounds to the reported .27.

# Interaction between orientation and station (third row of the summary):
# F = 16.28 and p < .001, as reported;
# partial eta squared = 0.201, which rounds to the reported .20.
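
One possible explanation for the station F value (23.35 here vs. the reported 24.10) is the type of sums of squares. aov() tests terms sequentially (Type I), while some packages, SPSS among them, default to Type III, and in an unbalanced design the two can disagree on main effects while agreeing on the highest-order interaction. This has not been checked against the study data. The sketch below only illustrates the mechanism on simulated data: the cell sizes, the ratings, and the EAST/WEST labels are made up, and only the column names and station codes mirror the ones used above.

```r
# Simulated, deliberately unbalanced 2 x 4 design.
# These are NOT the study data: cell sizes and ratings are invented.
set.seed(2014)
cells <- expand.grid(
  DIRECTION = c("EAST", "WEST"),
  STN_NAME  = c("SPAD", "STG", "B-Y", "SHER"),
  stringsAsFactors = TRUE
)
n_per_cell <- c(5, 9, 7, 4, 6, 8, 10, 5)  # unequal on purpose
sim <- cells[rep(seq_len(nrow(cells)), times = n_per_cell), ]
sim$DISTANCE <- rnorm(nrow(sim), mean = 4, sd = 1)

# Sum-to-zero contrasts are needed for the Type III tests to be interpretable
fit <- lm(
  DISTANCE ~ DIRECTION * STN_NAME,
  data = sim,
  contrasts = list(DIRECTION = "contr.sum", STN_NAME = "contr.sum")
)

# Type I: each term adjusted only for the terms listed before it
type1 <- anova(fit)

# Type III: each term adjusted for all other terms, including the interaction
type3 <- drop1(fit, scope = . ~ ., test = "F")

print(type1)
print(type3)
```

If the same comparison were run on the study data and the Type III F for STN_NAME came out at 24.10, that would account for the discrepancy. The interaction F is the same under both types, which fits the fact that the interaction reproduced exactly.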

Station-specific analyses

We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp² = .28) and the station two stops to the west (Spadina, p = .001, ηp² = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp² = .08) and the station two stops to the east (Sherbourne, p < .001, ηp² = .24). Figure 1 summarizes these results.

St. George

# reproduce results for St. George
d1 <- subset(d, STN_NAME == "STG", select = c(DISTANCE, DIRECTION))
stg_anova <- aov(DISTANCE ~ DIRECTION, data = d1)
summary(stg_anova)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## DIRECTION    1  16.25  16.252   18.79 7.23e-05 ***
## Residuals   49  42.38   0.865                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

etaSquared(stg_anova)

##              eta.sq eta.sq.part
## DIRECTION 0.2772092   0.2772092

Spadina

# reproduce results for Spadina
d2 <- subset(d, STN_NAME == "SPAD", select = c(DISTANCE, DIRECTION))
spad_anova <- aov(DISTANCE ~ DIRECTION, data = d2)
summary(spad_anova)

##             Df Sum Sq Mean Sq F value  Pr(>F)   
## DIRECTION    1  13.10  13.100   11.97 0.00113 **
## Residuals   49  53.64   1.095                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

etaSquared(spad_anova)

##              eta.sq eta.sq.part
## DIRECTION 0.1962763   0.1962763

Bloor-Yonge

# reproduce results for Bloor-Yonge
d3 <- subset(d, STN_NAME == "B-Y", select = c(DISTANCE, DIRECTION))
by_anova <- aov(DISTANCE ~ DIRECTION, data = d3)
summary(by_anova)

##             Df Sum Sq Mean Sq F value Pr(>F)  
## DIRECTION    1   4.16   4.157   3.945 0.0528 .
## Residuals   47  49.52   1.054                 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

etaSquared(by_anova)

##              eta.sq eta.sq.part
## DIRECTION 0.0774451   0.0774451

Sherbourne

# reproduce results for Sherbourne
d4 <- subset(d, STN_NAME == "SHER", select = c(DISTANCE, DIRECTION))
sher_anova <- aov(DISTANCE ~ DIRECTION, data = d4)
summary(sher_anova)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## DIRECTION    1  19.31  19.306   15.11 0.000305 ***
## Residuals   49  62.62   1.278                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

etaSquared(sher_anova)

##              eta.sq eta.sq.part
## DIRECTION 0.2356667   0.2356667
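
The four station-specific analyses above repeat the same three steps, so they could also be written as one split-and-apply pass. The following is a minimal sketch of that pattern, not the code used for the results above: `station_test` is a hypothetical helper, and the data frame `sim` is a simulated stand-in with made-up ratings and direction labels so that the block runs without the Excel file.

```r
# Simulated stand-in for the study data (invented ratings, NOT the real file);
# column names and station codes mirror those used above.
set.seed(1)
sim <- data.frame(
  STN_NAME  = rep(c("SPAD", "STG", "B-Y", "SHER"), each = 20),
  DIRECTION = rep(c("EAST", "WEST"), times = 40)
)
sim$DISTANCE <- sample(1:7, nrow(sim), replace = TRUE)

# Hypothetical helper: one-way ANOVA of DISTANCE on DIRECTION for one station
station_test <- function(df) {
  tab <- summary(aov(DISTANCE ~ DIRECTION, data = df))[[1]]
  ss  <- tab[["Sum Sq"]]
  data.frame(
    df_error = tab[["Df"]][2],
    F_value  = tab[["F value"]][1],
    p_value  = tab[["Pr(>F)"]][1],
    eta_sq   = ss[1] / sum(ss)  # with a single factor this equals partial eta squared
  )
}

# Split the rows by station, test each subset, and stack the results
results <- do.call(rbind, lapply(split(sim, sim$STN_NAME), station_test))
print(round(results, 3))
```

Replacing `sim` with `d` should give one row per station holding the same F, p, and eta squared values as the separate calls above, since each subset is fitted with the same model.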

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

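I was able to reproduce all results except the F value for the main effect of station, which came out slightly lower than reported (23.35 vs. 24.10). The p value and partial eta squared for that effect matched the reported values after rounding.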

How difficult was it to reproduce your results?

It was relatively easy to reproduce the results. It took me a couple of hours once I realized that we were not expected to read and reproduce the entire paper (realizing this took some time, as I opened the PDF before I saw the report).

What aspects made it difficult? What aspects made it easy?

The only difficult part was that I had to read about ANOVA to understand what it does and how to perform it in R. The analysis was made easy by the fact that the data were tidy and did not have unusable entries. The templated report was very helpful in moving through the reproduction step-by-step. Built-in functions in R allowed me to implement ANOVA with only a high-level understanding of its mechanism (I am not sure whether this is a good thing, but it certainly eliminated the mistakes that I might have made in a first attempt at implementation even after reading up on the method).

Reproducibility Report: Group A Choice 2