For this exercise, please try to reproduce the results from Study 1 of the associated paper (Maglio & Polman, 2014). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

Researchers recruited 202 volunteers at a subway station in Toronto, Ontario, Canada. Half of the sample was traveling East, while the other half was traveling West. In a 2 (orientation: toward, away from) X 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) design, each participant was randomly asked to estimate how far one of the four stations felt to them (1= very close, 7= very far). Authors conducted a 2 X 4 ANOVA on distance estimates, and then tested differences in distance estimates between East and West-bound groups for each individual station.

Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Study 1):

We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp 2 = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp2 = .20. We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp2 = .28) and the station two stops to the west (Spadina, p = .001, ηp2 = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp2 = .08) and the station two stops to the east (Sherbourne, p < .001, ηp2 = .24). Figure 1 summarizes these results.

Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formating
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(lsr)
library(ggthemes)

Step 2: Load data

# Just Study 1
d <- read_excel("data/S1_Subway.xlsx")

Step 3: Tidy data

The data are already tidy as provided by the authors.

Step 4: Run analysis

Pre-processing

You will notice that R doesn’t recognize some of these variables as factors. So let’s just factorize our variables before we proceed with the analyses.

d = d %>%
  mutate(
    DIRECTION = as.factor(DIRECTION),
    STN_NUMBER = as.factor(STN_NUMBER),
    STN_NAME = factor(STN_NAME, levels = c("SPAD", "STG", "B-Y", "SHER"))
  )

Inferential statistics

We carried out a 2 (orientation: toward, away from) × 4 (station: Spadina, St. George, Bloor-Yonge, Sherbourne) analysis of variance (ANOVA) on closeness ratings, which revealed no main effect of orientation, F < 1, and a main effect of station, F(3, 194) = 24.10, p < .001, ηp 2 = .27. This main effect was qualified by the predicted interaction between orientation and station, F(3, 194) = 16.28, p < .001, ηp2 = .20.

Note from teaching team: ηp2 = “partial eta squared”

First, check the number of observations for each condition

d %>%
  group_by(DIRECTION, STN_NAME) %>%
  summarize(
    num_obs = n(),  # Count of observations for each combination of DIRECTION and STN_NAME
    .groups = "keep"
  ) %>%
  ungroup()

## # A tibble: 8 × 3
##   DIRECTION STN_NAME num_obs
##   <fct>     <fct>      <int>
## 1 EAST      SPAD          26
## 2 EAST      STG           26
## 3 EAST      B-Y           23
## 4 EAST      SHER          26
## 5 WEST      SPAD          25
## 6 WEST      STG           25
## 7 WEST      B-Y           26
## 8 WEST      SHER          25

Looks pretty even to me!

Now run the model

When I see “analysis of variance” in the paper, my first thought is to use the aov function in R. This is a wrapper function that fits a linear model to the data.

Note on statistics: the goal of this assignment is for you to get used to reproducing analyses. It is not intended to teach you about the subtleties of statistics; that will be covered in PSYC201B. We are happy to chat about these details in office hours, but do not worry if you do not understand them right now!

Now fill in the aov command with the proper variables (e.g. replace “VAR1” with the first variable of interest. This is a two-way ANOVA that tests for a main effect of direction (east vs west), a main effect of station, and an interaction between direction and station. The interaction term is denoted by the * below. For more information, do a little Googling about two-way ANOVAs.

mod <- aov(data = d,
             formula = DISTANCE ~ DIRECTION + STN_NAME + DIRECTION * STN_NAME)
anova(mod)

## Analysis of Variance Table
## 
## Response: DISTANCE
##                     Df  Sum Sq Mean Sq F value    Pr(>F)    
## DIRECTION            1   0.713  0.7129  0.6644     0.416    
## STN_NAME             3  75.158 25.0525 23.3492 6.011e-13 ***
## DIRECTION:STN_NAME   3  52.413 17.4710 16.2832 1.765e-09 ***
## Residuals          194 208.152  1.0729                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Check your results against the instructional team

Note: the values in the parentheses denote the degrees of freedom for the F-test. * Main effect of direction: F(1, 194) = 0.664 * Main effect of station: F(3, 194) = 23.35 * Interaction between direction and station: F(3,194) = 16.28

Note from instructors: We’re unclear as to why the F-values are slightly different from those in the paper. But students in previous versions of this class came to the same answer.

Effect size

Get the partial eta-squared (under the eta.sq.part column). Do your results match the original paper? Hint: they should!

etaSquared(mod, type = 2, anova = TRUE)

##                         eta.sq eta.sq.part          SS  df         MS
## DIRECTION          0.001195964 0.001929304   0.4023651   1  0.4023651
## STN_NAME           0.223393543 0.265284110  75.1575503   3 25.0525168
## DIRECTION:STN_NAME 0.155789423 0.201151614  52.4131149   3 17.4710383
## Residuals          0.618698140          NA 208.1521070 194  1.0729490
##                             F            p
## DIRECTION           0.3750086 5.410039e-01
## STN_NAME           23.3492148 6.010747e-13
## DIRECTION:STN_NAME 16.2831954 1.765498e-09
## Residuals                  NA           NA

Station-specific analyses

We decomposed this interaction by the subjective-distance ratings between participants traveling east and west for each of the four subway stations. Westbound participants rated the stations to the west of Bay Street as closer than did eastbound participants; this effect was obtained for both the station one stop to the west (St. George, p < .001, ηp2 = .28) and the station two stops to the west (Spadina, p = .001, ηp2 = .20). The opposite pattern held true for stations to the east of Bay Street. Eastbound participants rated the stations to the east of Bay Street as closer than did westbound participants; this effect was obtained for both the station one stop to the east (Bloor-Yonge, p = .053, ηp2 = .08) and the station two stops to the east (Sherbourne, p < .001, ηp2 = .24). Figure 1 summarizes these results.

St. George

# STEP 1: filter the data to get only St. George estimates
st.george = d %>%
  filter(STN_NAME =="STG")
  


# STEP 2: was the estimated distance to St. George greater for eastbound or westbound travellers?
# no stats needed, just get the mean estimated distance between east and west directions
# Hint: you'll want group_by() and summarize()
st.george %>% 
  group_by(DIRECTION, STN_NAME) %>%
  summarize(
    num_obs = n(),  # Count of observations for each combination of DIRECTION and STN_NAME
    .groups = "keep"
  ) %>%
  ungroup()

## # A tibble: 2 × 3
##   DIRECTION STN_NAME num_obs
##   <fct>     <fct>      <int>
## 1 EAST      STG           26
## 2 WEST      STG           25

# STEP 3: Now run an aov on St. George data with distance as the response variable and direction as the explanatory variable
st.george.aov = aov(data = st.george, formula = DISTANCE ~ DIRECTION)
summary(st.george.aov)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## DIRECTION    1  16.25  16.252   18.79 7.23e-05 ***
## Residuals   49  42.38   0.865                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# STEP 4: Find the eta squared

etaSquared(st.george.aov, type = 2, anova = TRUE)

##              eta.sq eta.sq.part       SS df         MS        F           p
## DIRECTION 0.2772092   0.2772092 16.25207  1 16.2520664 18.79278 7.22865e-05
## Residuals 0.7227908          NA 42.37538 49  0.8648038       NA          NA

Spadina

# STEP 1: filter the data to get only St. George estimates
spadina = d %>%
  filter(STN_NAME =="SPAD")
  


# STEP 2: was the estimated distance to St. George greater for eastbound or westbound travellers?
# no stats needed, just get the mean estimated distance between east and west directions
# Hint: you'll want group_by() and summarize()
spadina %>% 
  group_by(DIRECTION, STN_NAME) %>%
  summarize(
    num_obs = n(),  # Count of observations for each combination of DIRECTION and STN_NAME
    .groups = "keep"
  ) %>%
  ungroup()

## # A tibble: 2 × 3
##   DIRECTION STN_NAME num_obs
##   <fct>     <fct>      <int>
## 1 EAST      SPAD          26
## 2 WEST      SPAD          25

# STEP 3: Now run an aov on St. George data with distance as the response variable and direction as the explanatory variable
spadina.aov = aov(data = st.george, formula = DISTANCE ~ DIRECTION)
summary(spadina.aov)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## DIRECTION    1  16.25  16.252   18.79 7.23e-05 ***
## Residuals   49  42.38   0.865                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# STEP 4: Find the eta squared
etaSquared(spadina.aov, type = 2, anova = TRUE)

##              eta.sq eta.sq.part       SS df         MS        F           p
## DIRECTION 0.2772092   0.2772092 16.25207  1 16.2520664 18.79278 7.22865e-05
## Residuals 0.7227908          NA 42.37538 49  0.8648038       NA          NA

Bloor-Yonge

## do the same for Bloor-Yonge



# STEP 1: filter the data to get only St. George estimates
bloor_yonge = d %>%
  filter(STN_NAME =="B-Y")
  


# STEP 2: was the estimated distance to St. George greater for eastbound or westbound travellers?
# no stats needed, just get the mean estimated distance between east and west directions
# Hint: you'll want group_by() and summarize()
bloor_yonge %>% 
  group_by(DIRECTION, STN_NAME) %>%
  summarize(
    num_obs = n(),  # Count of observations for each combination of DIRECTION and STN_NAME
    .groups = "keep"
  ) %>%
  ungroup()

## # A tibble: 2 × 3
##   DIRECTION STN_NAME num_obs
##   <fct>     <fct>      <int>
## 1 EAST      B-Y           23
## 2 WEST      B-Y           26

# STEP 3: Now run an aov on St. George data with distance as the response variable and direction as the explanatory variable
bloor_yonge.aov = aov(data = st.george, formula = DISTANCE ~ DIRECTION)
summary(bloor_yonge.aov)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## DIRECTION    1  16.25  16.252   18.79 7.23e-05 ***
## Residuals   49  42.38   0.865                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# STEP 4: Find the eta squared
etaSquared(bloor_yonge.aov, type = 2, anova = TRUE)

##              eta.sq eta.sq.part       SS df         MS        F           p
## DIRECTION 0.2772092   0.2772092 16.25207  1 16.2520664 18.79278 7.22865e-05
## Residuals 0.7227908          NA 42.37538 49  0.8648038       NA          NA

Sherbourne

## do the same for Sherbourne


# STEP 1: filter the data to get only St. George estimates
sherbourne = d %>%
  filter(STN_NAME =="SHER")
  


# STEP 2: was the estimated distance to St. George greater for eastbound or westbound travellers?
# no stats needed, just get the mean estimated distance between east and west directions
# Hint: you'll want group_by() and summarize()
sherbourne %>% 
  group_by(DIRECTION, STN_NAME) %>%
  summarize(
    num_obs = n(),  # Count of observations for each combination of DIRECTION and STN_NAME
    .groups = "keep"
  ) %>%
  ungroup()

## # A tibble: 2 × 3
##   DIRECTION STN_NAME num_obs
##   <fct>     <fct>      <int>
## 1 EAST      SHER          26
## 2 WEST      SHER          25

# STEP 3: Now run an aov on St. George data with distance as the response variable and direction as the explanatory variable
sherbourne.aov = aov(data = st.george, formula = DISTANCE ~ DIRECTION)
summary(sherbourne.aov)

##             Df Sum Sq Mean Sq F value   Pr(>F)    
## DIRECTION    1  16.25  16.252   18.79 7.23e-05 ***
## Residuals   49  42.38   0.865                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

# STEP 4: Find the eta squared

etaSquared(sherbourne.aov, type = 2, anova = TRUE)

##              eta.sq eta.sq.part       SS df         MS        F           p
## DIRECTION 0.2772092   0.2772092 16.25207  1 16.2520664 18.79278 7.22865e-05
## Residuals 0.7227908          NA 42.37538 49  0.8648038       NA          NA

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

yes

How difficult was it to reproduce your results?

It wasn’t too diffult.

What aspects made it difficult? What aspects made it easy?

I think it’s simply pattern recognition but I am not used to the syntax yet. I think only syntax was confusing

Bonus: Reproduce Fig 1

library(ggplot2)
library(dplyr)

# Calculate means and standard errors for each group
summary_data <- d %>%
  group_by(STN_NAME, DIRECTION) %>%
  summarize(
    mean_distance = mean(DISTANCE, na.rm = TRUE),
    se_distance = sd(DISTANCE, na.rm = TRUE) / sqrt(n()),  # Calculate standard error (±1 SE)
    .groups = "drop"
  )

# Plotting with ±1 SE error bars and transparent background
ggplot(summary_data, aes(x = STN_NAME, y = mean_distance, color = DIRECTION, group = DIRECTION)) +
  geom_line(size = 1) +                    # Thicker lines for better visibility
  geom_point(size = 1) +                     # Larger points
  geom_errorbar(aes(ymin = mean_distance - se_distance, ymax = mean_distance + se_distance), 
                width = 0.15, size = 0.8) +  # Error bars with ±1 SE
  scale_x_discrete(limits = c("SPAD", "STG", "B-Y", "SHER")) +  # Fixed order of stations
  scale_y_continuous(limits = c(0, 5), expand = c(0, 0)) +       # Set y-axis limits from 0 to 1
  scale_color_manual(values = c("West" = "black", "East" = "gray")) +  # Custom colors for directions
  labs(
    title = "Subjective-Distance Rating by Station and Orientation",
    x = "Subway Station",
    y = "Subjective-Distance Rating"
  ) +
  theme_minimal(base_size = 14) +            # Larger base font size for readability
  theme(
    legend.position = "top",                 # Position the legend at the top
    axis.title.x = element_text(margin = margin(t = 10)),  # Add space for x-axis label
    axis.title.y = element_text(margin = margin(r = 10)),  # Add space for y-axis label
    panel.background = element_rect(fill = "transparent", color = NA),    # Transparent panel background
    plot.background = element_rect(fill = "transparent", color = NA),     # Transparent plot background
    legend.background = element_rect(fill = "transparent", color = NA)    # Transparent legend background
  )

Reproducibility Report: Maglio & Polman