For this exercise, please try to reproduce the results from Study 1 of the associated paper (Maglio & Polman, 2014). The PDF of the paper is included in the same folder as this Rmd file.
Researchers recruited 202 volunteers at a subway station in Toronto, Ontario, Canada. Half of the sample was traveling East, while the other half was traveling West. In a 2 (orientation: toward, away from) X 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) design, each participant was randomly assigned to estimate how far one of the four stations felt to them (1 = very close, 7 = very far). The authors conducted a 2 X 4 ANOVA on distance estimates, and then tested differences in distance estimates between the eastbound and westbound groups at each individual station.
Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):
We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp2 = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp2 = .20. We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp2 = .28) and the station two stops to the west (Spadina, p = .001, ηp2 = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp2 = .08) and the station two stops to the east (Sherbourne, p < .001, ηp2 = .24). Figure 1 summarizes these results.
library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
# additional packages used below:
library(lsr) # etaSquared() for (partial) eta-squared
# library(ggthemes)
library(broom) # tidy() for model summaries
# Just Study 1
d <- read_excel("data/S1_Subway.xlsx")
The data are already tidy as provided by the authors.
d_processed <- d %>%
  mutate(
    orientation = case_when(
      (DIRECTION == "EAST") & (STN_NAME %in% c("B-Y", "SHER")) ~ "toward",
      (DIRECTION == "EAST") & (STN_NAME %in% c("SPAD", "STG")) ~ "away",
      (DIRECTION == "WEST") & (STN_NAME %in% c("B-Y", "SHER")) ~ "away",
      (DIRECTION == "WEST") & (STN_NAME %in% c("SPAD", "STG")) ~ "toward"
    )
  ) %>%
  mutate(
    STN_NAME = factor(
      STN_NAME,
      levels = c("SPAD", "STG", "B-Y", "SHER")
    ),
    orientation = factor(
      orientation,
      levels = c("toward", "away")
    )
  )
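The recode above can be sanity-checked by cross-tabulating direction against the derived label. The following is a minimal sketch on a hypothetical stand-in data frame (`toy`, one row per cell, not the study data); the station codes and direction values are the ones used in the chunk above.

```r
library(dplyr)

# Hypothetical stand-in for `d`: one row per DIRECTION x STN_NAME cell
toy <- expand.grid(
  DIRECTION = c("EAST", "WEST"),
  STN_NAME = c("SPAD", "STG", "B-Y", "SHER"),
  stringsAsFactors = FALSE
)

# Same case_when() mapping as in the processing step
recoded <- toy %>%
  mutate(
    orientation = case_when(
      DIRECTION == "EAST" & STN_NAME %in% c("B-Y", "SHER") ~ "toward",
      DIRECTION == "EAST" & STN_NAME %in% c("SPAD", "STG") ~ "away",
      DIRECTION == "WEST" & STN_NAME %in% c("B-Y", "SHER") ~ "away",
      DIRECTION == "WEST" & STN_NAME %in% c("SPAD", "STG") ~ "toward"
    )
  )

# case_when() returns NA for unmatched rows, so check that none slipped through
stopifnot(!anyNA(recoded$orientation))

# Each direction should have two "toward" and two "away" stations
check <- recoded %>% count(DIRECTION, orientation)
print(check)
```

Running the same `count()` on `d_processed` would additionally reveal any unexpected spellings of `DIRECTION` or `STN_NAME` in the real file, since those rows would show up with `NA` orientation.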
We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp2 = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp2 = .20.
main_anova <- d_processed %>%
  aov(
    DISTANCE ~ STN_NAME * DIRECTION,
    data = .
  )
main_anova_summary <- main_anova %>% tidy()
eta_main_anova <- etaSquared(main_anova) %>% as.data.frame()
main_anova_summary
## # A tibble: 4 × 6
## term df sumsq meansq statistic p.value
## <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
## 1 STN_NAME 3 75.5 25.2 23.4 5.41e-13
## 2 DIRECTION 1 0.402 0.402 0.375 5.41e- 1
## 3 STN_NAME:DIRECTION 3 52.4 17.5 16.3 1.77e- 9
## 4 Residuals 194 208. 1.07 NA NA
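Partial eta-squared can be cross-checked by hand from the sums of squares in this table, since ηp2 = SS_effect / (SS_effect + SS_residual), whereas classical eta-squared divides by the total sum of squares. The sketch below uses the rounded values printed above, so the results are approximate; `ss` and `ss_resid` are names introduced here for illustration. Note also that `lsr::etaSquared()` defaults to Type II sums of squares, so its values need not coincide exactly with ones derived from this sequential table.

```r
# Sums of squares copied (rounded) from the tidy() ANOVA table above
ss <- c("STN_NAME" = 75.5, "DIRECTION" = 0.402, "STN_NAME:DIRECTION" = 52.4)
ss_resid <- 208

# Partial eta-squared: effect SS relative to effect + residual SS
eta_sq_partial <- ss / (ss + ss_resid)

# Classical eta-squared: effect SS relative to the total SS
eta_sq <- ss / (sum(ss) + ss_resid)

print(round(eta_sq_partial, 2))
print(round(eta_sq, 2))
```

With these inputs the partial values for station and for the interaction round to .27 and .20, the figures reported in the paper, while the classical values are smaller.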
# reproduce the main effect of orientation
print(
  str_c(
    "Main effect of orientation: F(",
    main_anova_summary$df[[2]], # DF for direction
    ", ", main_anova_summary$df[[4]], # DF for residuals
    ") = ", round(main_anova_summary$statistic[[2]], 2), # F-value
    ", p = ", round(main_anova_summary$p.value[[2]], 3), # p-value
    ", ηp2 = ", round(eta_main_anova$eta.sq.part[[2]], 2) # partial eta-squared
  )
)
## [1] "Main effect of orientation: F(1, 194) = 0.38, p = 0.541, ηp2 = 0"
# reproduce the main effect of station
print(
  str_c(
    "Main effect of station: F(",
    main_anova_summary$df[[1]], # DF for station
    ", ", main_anova_summary$df[[4]], # DF for residuals
    ") = ", round(main_anova_summary$statistic[[1]], 2), # F-value
    ", p < .001", # p-value
    ", ηp2 = ", round(eta_main_anova$eta.sq.part[[1]], 2) # partial eta-squared
  )
)
## [1] "Main effect of station: F(3, 194) = 23.45, p < .001, ηp2 = 0.27"
# reproduce the interaction between orientation and station
print(
  str_c(
    "Interaction between orientation and station: F(",
    main_anova_summary$df[[3]], # DF for the station x direction interaction
    ", ", main_anova_summary$df[[4]], # DF for residuals
    ") = ", round(main_anova_summary$statistic[[3]], 2), # F-value
    ", p < .001", # p-value
    ", ηp2 = ", round(eta_main_anova$eta.sq.part[[3]], 2) # partial eta-squared
  )
)
## [1] "Interaction between orientation and station: F(3, 194) = 16.28, p < .001, ηp2 = 0.2"
We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp2 = .28) and the station two stops to the west (Spadina, p = .001, ηp2 = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp2 = .08) and the station two stops to the east (Sherbourne, p < .001, ηp2 = .24). Figure 1 summarizes these results.
# reproduce results for St. George
stg_comparison <- d_processed %>%
  filter(STN_NAME == "STG") %>%
  lm(
    DISTANCE ~ DIRECTION,
    data = .
  )
summary(stg_comparison)
##
## Call:
## lm(formula = DISTANCE ~ DIRECTION, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.7692 -0.6400 0.2308 0.3600 2.2308
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.7692 0.1824 15.184 < 2e-16 ***
## DIRECTIONWEST -1.1292 0.2605 -4.335 7.23e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9299 on 49 degrees of freedom
## Multiple R-squared: 0.2772, Adjusted R-squared: 0.2625
## F-statistic: 18.79 on 1 and 49 DF, p-value: 7.229e-05
etaSquared(stg_comparison)
## eta.sq eta.sq.part
## DIRECTION 0.2772092 0.2772092
print(
  str_c(
    "St. George, p < .001, ", # p-value
    "ηp2 = ", round(etaSquared(stg_comparison)[[2]], 2) # partial eta-squared
  )
)
## [1] "St. George, p < .001, ηp2 = 0.28"
# reproduce results for Spadina
spad_comparison <- d_processed %>%
  filter(STN_NAME == "SPAD") %>%
  lm(
    DISTANCE ~ DIRECTION,
    data = .
  )
summary(spad_comparison)
##
## Call:
## lm(formula = DISTANCE ~ DIRECTION, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.6538 -0.6400 0.3462 0.3600 2.3600
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 3.6538 0.2052 17.806 < 2e-16 ***
## DIRECTIONWEST -1.0138 0.2931 -3.459 0.00113 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.046 on 49 degrees of freedom
## Multiple R-squared: 0.1963, Adjusted R-squared: 0.1799
## F-statistic: 11.97 on 1 and 49 DF, p-value: 0.001131
etaSquared(spad_comparison)
## eta.sq eta.sq.part
## DIRECTION 0.1962763 0.1962763
print(
  str_c(
    "Spadina, p = ", round(tidy(spad_comparison)$p.value[[2]], 3), # p-value
    ", ηp2 = ", round(etaSquared(spad_comparison)[[2]], 2) # partial eta-squared
  )
)
## [1] "Spadina, p = 0.001, ηp2 = 0.2"
# reproduce results for Bloor-Yonge
by_comparison <- d_processed %>%
  filter(STN_NAME == "B-Y") %>%
  lm(
    DISTANCE ~ DIRECTION,
    data = .
  )
summary(by_comparison)
##
## Call:
## lm(formula = DISTANCE ~ DIRECTION, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.1923 -0.6087 -0.1923 0.3913 2.8077
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.6087 0.2140 7.516 1.35e-09 ***
## DIRECTIONWEST 0.5836 0.2938 1.986 0.0528 .
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.026 on 47 degrees of freedom
## Multiple R-squared: 0.07745, Adjusted R-squared: 0.05782
## F-statistic: 3.945 on 1 and 47 DF, p-value: 0.05285
etaSquared(by_comparison)
## eta.sq eta.sq.part
## DIRECTION 0.0774451 0.0774451
print(
  str_c(
    "Bloor-Yonge, p = ", round(tidy(by_comparison)$p.value[[2]], 3), # p-value
    ", ηp2 = ", round(etaSquared(by_comparison)[[2]], 2) # partial eta-squared
  )
)
## [1] "Bloor-Yonge, p = 0.053, ηp2 = 0.08"
# reproduce results for Sherbourne
sher_comparison <- d_processed %>%
  filter(STN_NAME == "SHER") %>%
  lm(
    DISTANCE ~ DIRECTION,
    data = .
  )
summary(sher_comparison)
##
## Call:
## lm(formula = DISTANCE ~ DIRECTION, data = .)
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.0000 -0.7692 0.0000 1.0000 2.2308
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 2.7692 0.2217 12.491 < 2e-16 ***
## DIRECTIONWEST 1.2308 0.3166 3.887 0.000305 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.13 on 49 degrees of freedom
## Multiple R-squared: 0.2357, Adjusted R-squared: 0.2201
## F-statistic: 15.11 on 1 and 49 DF, p-value: 0.0003052
etaSquared(sher_comparison)
## eta.sq eta.sq.part
## DIRECTION 0.2356667 0.2356667
print(
  str_c(
    "Sherbourne, p < .001", # p-value
    ", ηp2 = ", round(etaSquared(sher_comparison)[[2]], 2) # partial eta-squared
  )
)
## [1] "Sherbourne, p < .001, ηp2 = 0.24"
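The four per-station chunks above repeat the same steps, so they could be collapsed into one loop. The sketch below shows one way to do that in base R on hypothetical toy data (`toy` is simulated, not the study data; `per_station` and `results` are names introduced here). It relies on the fact that, with a single two-level predictor, R-squared equals eta-squared (and partial eta-squared).

```r
# Hypothetical toy data with the same column names as the real file
set.seed(42)
toy <- data.frame(
  STN_NAME = rep(c("SPAD", "STG", "B-Y", "SHER"), each = 20),
  DIRECTION = rep(c("EAST", "WEST"), times = 40),
  DISTANCE = sample(1:7, 80, replace = TRUE)
)

# Fit DISTANCE ~ DIRECTION separately within each station
per_station <- lapply(split(toy, toy$STN_NAME), function(df) {
  fit <- lm(DISTANCE ~ DIRECTION, data = df)
  s <- summary(fit)
  data.frame(
    p_value = coef(s)["DIRECTIONWEST", "Pr(>|t|)"], # test of the direction effect
    eta_sq = s$r.squared # equals eta-squared for a one-predictor model
  )
})

# One row per station; row names carry the station codes
results <- do.call(rbind, per_station)
results$station <- rownames(results)
print(results)
```

Replacing `toy` with `d_processed` should give the same p-values and effect sizes as the individual chunks, in one table.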
Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?
I did not fully reproduce the main ANOVA. My reproduced statistics matched the original ones for the main effect of orientation and for the interaction between orientation and station, but my F-statistic for the main effect of station was 23.45, whereas the authors report 24.10. I wonder whether this might be a typo, since all the other model statistics match.
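One candidate explanation other than a typo, offered here as an untested assumption: the design is unbalanced (the residual degrees of freedom in the per-station output imply 51, 51, 49, and 51 participants), and `aov()` reports sequential (Type I) sums of squares, whereas some other packages, such as SPSS's GLM procedure, default to Type III. In an unbalanced design the two can disagree for main effects while agreeing exactly on the highest-order interaction, which would be consistent with the pattern seen here. The sketch below illustrates the phenomenon on hypothetical simulated data (not the study data), using base R's `drop1()` with sum-to-zero contrasts to obtain Type III style tests.

```r
# Hypothetical unbalanced 4 x 2 data set (NOT the study data)
set.seed(2014)
cells <- expand.grid(
  STN_NAME = c("SPAD", "STG", "B-Y", "SHER"),
  DIRECTION = c("EAST", "WEST")
)
n_per_cell <- c(30, 12, 25, 18, 10, 28, 15, 22) # deliberately unequal
toy <- cells[rep(seq_len(nrow(cells)), times = n_per_cell), ]
toy$DISTANCE <- sample(1:7, nrow(toy), replace = TRUE)

# Sum-to-zero contrasts so that dropping a main effect is meaningful
fit <- lm(
  DISTANCE ~ STN_NAME * DIRECTION,
  data = toy,
  contrasts = list(STN_NAME = "contr.sum", DIRECTION = "contr.sum")
)

type1 <- anova(fit) # sequential SS, as reported by aov()
type3 <- drop1(fit, scope = . ~ ., test = "F") # each term dropped from the full model

terms_of_interest <- c("STN_NAME", "DIRECTION", "STN_NAME:DIRECTION")
comparison <- data.frame(
  term = terms_of_interest,
  F_type1 = type1[terms_of_interest, "F value"],
  F_type3 = type3[terms_of_interest, "F value"]
)
print(comparison)
```

Applying the same two calls to `d_processed` would show whether the Type III F for station comes out at 24.10; that has not been checked here.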
How difficult was it to reproduce your results?
It was fairly straightforward once I had resolved some confusion about the variable operationalization and the statistical tests.
What aspects made it difficult? What aspects made it easy?
- Variable operationalization was confusing. At first I thought that their “orientation” variable literally coded whether the participant was traveling toward or away from a given station, derived by comparing the station with their direction of travel. However, upon inspecting the model results, I realized that they used direction of travel as the variable in all their models and referred to orientation only conceptually in the write-up.
- It was not clear to me that the authors were reporting partial eta-squared, because they did not say so explicitly. I was concerned because the eta-squared values from my ANOVA differed from theirs, but then I realized that the partial eta-squared values matched, so that was likely what they were reporting.
- It was not clear to me which statistical tests were used for the analyses of each individual station. At first I tried post-hoc tests on the main ANOVA that adjust for multiple comparisons (e.g., TukeyHSD), but it became clear that this was not what the authors had done. My next guess was that they used a simple linear regression (or, equivalently, a one-way ANOVA) within each station, and this appears to be correct, since my statistics match those reported in their results section.
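The regression and one-way ANOVA readings in the last point are interchangeable because, for a two-group comparison, a simple linear regression, a one-way ANOVA, and a pooled-variance t-test are the same test: F equals t squared, the p-values coincide, and eta-squared can be recovered as t² / (t² + df). A minimal sketch on hypothetical simulated data (`toy` is not the study data) demonstrates this:

```r
# Hypothetical two-group data (NOT the study data)
set.seed(7)
toy <- data.frame(
  DIRECTION = rep(c("EAST", "WEST"), each = 25),
  DISTANCE = sample(1:7, 50, replace = TRUE)
)

# Same comparison, expressed as a regression and as a pooled-variance t-test
fit <- lm(DISTANCE ~ DIRECTION, data = toy)
t_res <- t.test(DISTANCE ~ DIRECTION, data = toy, var.equal = TRUE)

f_stat <- summary(fit)$fstatistic[["value"]]
t_stat <- unname(t_res$statistic)
df_resid <- df.residual(fit)

# Eta-squared recovered from the t statistic and residual degrees of freedom
eta_sq <- t_stat^2 / (t_stat^2 + df_resid)

stopifnot(isTRUE(all.equal(f_stat, t_stat^2)))
stopifnot(isTRUE(all.equal(eta_sq, summary(fit)$r.squared)))
stopifnot(isTRUE(all.equal(t_res$p.value, anova(fit)[["Pr(>F)"]][1])))
```

The same identity can be checked against the St. George output above: t = -4.335 on 49 degrees of freedom gives t² ≈ 18.79, the reported F-statistic, and 18.79 / (18.79 + 49) ≈ 0.277, the reported eta-squared.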