For this exercise, please try to reproduce the results from Experiment 2 of the associated paper (de la Fuente, Santiago, Roman, Dumitrache, & Casasanto, 2014). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

The researchers tested whether temporal focus differs between Moroccan and Spanish cultures, hypothesizing that Moroccans are more past-focused, whereas Spaniards are more future-focused. Two groups of participants (\(N = 40\) Moroccan and \(N = 40\) Spanish) completed a temporal-focus questionnaire that contained questions about past-focused (“PAST”) and future-focused (“FUTURE”) topics. In response to each question, participants provided a rating on a 5-point Likert scale on which lower scores indicated less agreement and higher scores indicated greater agreement. The authors then performed a mixed-design ANOVA with agreement score as the dependent variable, group (Moroccan or Spanish, between-subjects) as the fixed-effects factor, and temporal focus (past or future, within-subjects) as the random-effects factor. In addition, the authors performed unpaired two-sample t-tests to determine whether there was a significant difference between the two groups in agreement scores for PAST questions, and whether there was a significant difference in scores for FUTURE questions.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 2):

According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, ηp2 = .20 (Fig. 2). Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001, and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001. (de la Fuente et al., 2014, p. 1685).


Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files

# #optional packages/functions:
# library(afex) # anova functions
# library(ez) # anova functions 2
# library(scales) # for plotting
# std.err <- function(x) sd(x)/sqrt(length(x)) # standard error

Step 2: Load data

# Just Experiment 2
data_path <- 'data/DeLaFuenteEtAl_2014_RawData.xls'
d <- read_excel(data_path, sheet=3)

Step 3: Tidy data

# Check the data
dim(d)
## [1] 1680    5
summary(d)
##     group            participant      subscale             item          
##  Length:1680        Min.   : 1.00   Length:1680        Length:1680       
##  Class :character   1st Qu.:10.75   Class :character   Class :character  
##  Mode  :character   Median :20.50   Mode  :character   Mode  :character  
##                     Mean   :20.88                                        
##                     3rd Qu.:31.25                                        
##                     Max.   :40.00                                        
##  Agreement (0=complete disagreement; 5=complete agreement)
##  Min.   :1.000                                            
##  1st Qu.:2.000                                            
##  Median :3.000                                            
##  Mean   :3.138                                            
##  3rd Qu.:4.000                                            
##  Max.   :5.000
# Check the balance of items per person (we'd want each participant to have the same number of past and future items and both groups to have the same count overall) 
d %>%
  group_by(group, subscale, participant) %>%
  summarise(n_items = n()) %>%
  group_by(group, subscale) %>%
  summarise(
    min_items = min(n_items),
    max_items = max(n_items),
    mean_items = mean(n_items),
    n_participants = n()
  )
## # A tibble: 4 × 6
## # Groups:   group [2]
##   group          subscale min_items max_items mean_items n_participants
##   <chr>          <chr>        <int>     <int>      <dbl>          <int>
## 1 Moroccan       FUTURE           4        20       10               40
## 2 Moroccan       PAST            11        22       11.3             39
## 3 young Spaniard FUTURE           4        20       10               40
## 4 young Spaniard PAST            11        22       11.3             39
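The differing minima and maxima above suggest that some participant IDs carry more or fewer items than others. A minimal sketch of how such IDs could be listed, using base R and a small hypothetical data frame (`toy`, `counts`, and `flagged` are illustrative names, not objects from this analysis). It assumes the most common item count is the expected one:

```r
# Hypothetical long-format data: participant 2 has twice the usual number of rows,
# as would happen if two people shared one ID
toy <- data.frame(
  group = "Moroccan",
  participant = c(rep(1, 11), rep(2, 22), rep(3, 11)),
  subscale = "PAST"
)

# Count rows per group x participant x subscale
toy$n_items <- 1
counts <- aggregate(n_items ~ group + participant + subscale, data = toy, FUN = sum)

# Treat the most common count as the expected one (an assumption)
expected <- as.numeric(names(which.max(table(counts$n_items))))

# IDs whose item count deviates from the expected count
flagged <- counts[counts$n_items != expected, ]
print(flagged)
```

Applied to the real data, the same idea would replace `toy` with `d`; whether a flagged ID reflects a duplicated ID or genuinely missing responses would still need to be checked against the raw spreadsheet.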
# Check for missing or uneven sampling
table(d$group)
## 
##       Moroccan young Spaniard 
##            840            840
table(d$subscale)
## 
## FUTURE   PAST 
##    800    880
table(d$group, d$subscale)
##                 
##                  FUTURE PAST
##   Moroccan          400  440
##   young Spaniard    400  440
# Check per-participant averages
d_summary <- d %>%
  group_by(group, participant, subscale) %>%
  summarise(mean_agree = mean(`Agreement (0=complete disagreement; 5=complete agreement)`, na.rm = TRUE),
            .groups = "drop")

d_summary %>%
  group_by(group, subscale) %>%
  summarise(mean = mean(mean_agree), sd = sd(mean_agree), n = n())
## # A tibble: 4 × 5
## # Groups:   group [2]
##   group          subscale  mean    sd     n
##   <chr>          <chr>    <dbl> <dbl> <int>
## 1 Moroccan       FUTURE    3.14 0.573    40
## 2 Moroccan       PAST      3.28 0.715    39
## 3 young Spaniard FUTURE    3.49 0.403    40
## 4 young Spaniard PAST      2.69 0.633    39

Step 4: Run analysis

Pre-processing

unique(d$group)
## [1] "Moroccan"       "young Spaniard"
# Rename columns
names(d) <- c("group", "participant", "subscale", "item", "agreement")

# Recode group names properly
d <- d %>%
  mutate(
    group = recode_factor(group,
                          "young Spaniard" = "Spanish",
                          "Moroccan" = "Moroccan"),
    subscale = factor(subscale, levels = c("PAST", "FUTURE"))
  ) %>%
  drop_na()

# Check that both groups exist
print(table(d$group))
## 
##  Spanish Moroccan 
##      840      840
d_summary <- d %>%
  group_by(group, participant, subscale) %>%
  summarise(mean_agree = mean(agreement, na.rm = TRUE), .groups = "drop")

# Check cell counts per group and subscale (39 PAST vs. 40 FUTURE means per group, so not fully balanced)
print(table(d_summary$group, d_summary$subscale))
##           
##            PAST FUTURE
##   Spanish    39     40
##   Moroccan   39     40

Descriptive statistics

Try to recreate Figure 2 (fig2.png, also included in the same folder as this Rmd file):

# Calculate group means and standard deviations
descriptives <- d_summary %>%
  group_by(group, subscale) %>%
  summarise(
    mean = mean(mean_agree),
    sd = sd(mean_agree),
    n = n()
  )

# Print descriptive summary
print(descriptives)
## # A tibble: 4 × 5
## # Groups:   group [2]
##   group    subscale  mean    sd     n
##   <fct>    <fct>    <dbl> <dbl> <int>
## 1 Spanish  PAST      2.69 0.633    39
## 2 Spanish  FUTURE    3.49 0.403    40
## 3 Moroccan PAST      3.28 0.715    39
## 4 Moroccan FUTURE    3.14 0.573    40
# Visualize (Figure 2 reproduction)
ggplot(descriptives, aes(x = group, y = mean, fill = subscale)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +
  geom_errorbar(aes(ymin = mean - sd / sqrt(n),
                    ymax = mean + sd / sqrt(n)),
                width = .15,
                position = position_dodge(0.8)) +
  labs(
    title = "Mean agreement ratings for past- and future-focused statements",
    x = "Group",
    y = "Rating",
    fill = "Focus"
  ) +
  theme_minimal(base_size = 12)

Inferential statistics

According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, ηp2 = .20 (Fig. 2).

# Collapse to participant-level means (one row per group x participant x subscale)
anova_data <- d %>%
  group_by(group, participant, subscale) %>%
  summarise(mean_agree = mean(agreement, na.rm = TRUE),
            .groups = "drop")

# Sanity check
nrow(anova_data)        # 160 expected (80 participants x 2 subscales); 158 obtained, i.e. 39 PAST means per group
## [1] 158
table(anova_data$group, anova_data$subscale)
##           
##            PAST FUTURE
##   Spanish    39     40
##   Moroccan   39     40
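# Run the ANOVA on the collapsed dataset.
# Caution: `participant` is numeric and its IDs (1-40) are reused in both groups,
# so Error(participant/subscale) does not define one error stratum per subject;
# note the 1-df error strata and the 152 residual df in the output below.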
anova_res <- aov(mean_agree ~ group * subscale + Error(participant/subscale),
                 data = anova_data)
summary(anova_res)
## 
## Error: participant
##          Df Sum Sq Mean Sq
## subscale  1 0.6685  0.6685
## 
## Error: participant:subscale
##          Df Sum Sq Mean Sq
## subscale  1  2.806   2.806
## 
## Error: Within
##                 Df Sum Sq Mean Sq F value   Pr(>F)    
## group            1   0.49   0.488    1.40   0.2387    
## subscale         1   1.58   1.584    4.54   0.0347 *  
## group:subscale   1   8.82   8.820   25.28 1.38e-06 ***
## Residuals      152  53.03   0.349                     
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
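The 1-df error strata and the 152 residual degrees of freedom above indicate that `participant` was not treated as a subject factor. A hedged sketch of one way to specify the model so that each person gets their own error stratum, shown on simulated data rather than the real file (`toy`, `pid`, and `fit` are illustrative names, and the simulated ratings are random noise, so the F values carry no meaning):

```r
set.seed(1)

# Hypothetical balanced data: 40 participants per group, IDs 1-40 reused in each group
toy <- expand.grid(participant = 1:40,
                   group = c("Spanish", "Moroccan"),
                   subscale = c("PAST", "FUTURE"))
toy$mean_agree <- rnorm(nrow(toy), mean = 3, sd = 0.6)

# Build a subject identifier that is unique across groups and is a factor
toy$pid <- interaction(toy$group, toy$participant, drop = TRUE)

# Mixed ANOVA: group between subjects, subscale within subjects
fit <- aov(mean_agree ~ group * subscale + Error(pid / subscale), data = toy)
print(summary(fit))

# Denominator df for the group x subscale interaction (within-subject stratum)
df_resid <- fit[["pid:subscale"]]$df.residual
print(df_resid)
```

With 80 subjects in a balanced design, the interaction is tested against N - 2 = 78 degrees of freedom, which matches the denominator reported in the paper. On the real data, the 39/40 imbalance in participant means would have to be resolved first, so this sketch is not guaranteed to reproduce the published F value.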

Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001,

# reproduce the above results here
# PAST comparison (Moroccan vs Spanish)
past_df <- subset(d_summary, subscale == "PAST")

t_past <- t.test(mean_agree ~ group, data = past_df, var.equal = TRUE)
t_past
## 
##  Two Sample t-test
## 
## data:  mean_agree by group
## t = -3.8562, df = 76, p-value = 0.0002394
## alternative hypothesis: true difference in means between group Spanish and group Moroccan is not equal to 0
## 95 percent confidence interval:
##  -0.8943343 -0.2851528
## sample estimates:
##  mean in group Spanish mean in group Moroccan 
##               2.691142               3.280886

and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001. (de la Fuente et al., 2014, p. 1685)

# reproduce the above results here
# FUTURE comparison (Moroccan vs Spanish)
future_df <- subset(d_summary, subscale == "FUTURE")

t_future <- t.test(mean_agree ~ group, data = future_df, var.equal = TRUE)
t_future
## 
##  Two Sample t-test
## 
## data:  mean_agree by group
## t = 3.2098, df = 78, p-value = 0.001929
## alternative hypothesis: true difference in means between group Spanish and group Moroccan is not equal to 0
## 95 percent confidence interval:
##  0.1349746 0.5758588
## sample estimates:
##  mean in group Spanish mean in group Moroccan 
##               3.493750               3.138333
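The signs of both t statistics above are opposite to those in the paper. With the formula interface, `t.test()` subtracts the mean of the second factor level from the mean of the first, so the sign depends only on level order. A small sketch on simulated data (the data frame `toy` and its values are hypothetical) showing that reordering the levels flips the sign and changes nothing else:

```r
set.seed(2)

# Hypothetical participant-level means, Spanish listed as the first factor level
toy <- data.frame(
  group = factor(rep(c("Spanish", "Moroccan"), each = 40),
                 levels = c("Spanish", "Moroccan")),
  mean_agree = c(rnorm(40, mean = 2.7, sd = 0.6),
                 rnorm(40, mean = 3.3, sd = 0.7))
)

# Spanish minus Moroccan
t_spanish_first <- t.test(mean_agree ~ group, data = toy, var.equal = TRUE)

# Moroccan minus Spanish: same test with Moroccan as the reference level
toy$group_m <- relevel(toy$group, ref = "Moroccan")
t_moroccan_first <- t.test(mean_agree ~ group_m, data = toy, var.equal = TRUE)

print(c(t_spanish_first$statistic, t_moroccan_first$statistic))
print(c(t_spanish_first$p.value, t_moroccan_first$p.value))
```

The two statistics are equal in magnitude with opposite signs, and the p-values and degrees of freedom are identical, so the sign difference from the paper is a reporting convention rather than a discrepancy.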

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

Partially. I reproduced the direction and pattern of the reported findings, but not the exact numerical values. For the mixed ANOVA, I obtained a significant interaction between “group” and “temporal focus”, F(1, 152) = 25.28, p < 0.001, ηp2 ≈ 0.14. This mirrors the paper’s main result (F(1, 78) = 19.12, p = 0.001, ηp2 = 0.20), indicating that the two groups differed in how they focused on past vs. future, although the denominator degrees of freedom do not match. For the past-focused comparison, Moroccans showed greater agreement than Spaniards did (t(76) = -3.86, p < 0.001), consistent in direction with the paper’s t(78) = 4.04, p = 0.001. The sign is reversed only because Spanish is the first factor level in my test, and the degrees of freedom are 76 rather than 78 because only 39 participant means per group entered the PAST comparison. For the future-focused comparison, Spaniards showed greater agreement than Moroccans did (t(78) = 3.21, p = 0.002), close in magnitude to the reported t(78) = −3.32, p = 0.001, again with the sign reversed. My figure also reproduced the qualitative pattern reported in the original paper: Moroccans rated past items higher than Spaniards did, while Spaniards rated future items higher.
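The ηp2 values quoted above can be checked by hand, since partial eta squared is SS_effect / (SS_effect + SS_error), or equivalently F × df1 / (F × df1 + df2). A small sketch using the numbers printed in the ANOVA output and in the paper; `eta_p2_from_f` is an illustrative helper defined here, not a function from any package:

```r
# Partial eta squared from sums of squares: SS_effect / (SS_effect + SS_error)
ss_interaction <- 8.820   # group:subscale row of the ANOVA output
ss_residual    <- 53.03   # Residuals row of the same stratum
eta_p2_ss <- ss_interaction / (ss_interaction + ss_residual)

# Equivalent form from an F value and its degrees of freedom
eta_p2_from_f <- function(f, df1, df2) (f * df1) / (f * df1 + df2)
eta_p2_mine  <- eta_p2_from_f(25.28, 1, 152)  # values from the output in this report
eta_p2_paper <- eta_p2_from_f(19.12, 1, 78)   # values reported in the paper

print(round(c(ss = eta_p2_ss, mine = eta_p2_mine, paper = eta_p2_paper), 2))
# ss 0.14, mine 0.14, paper 0.20
```

The paper's ηp2 = .20 is internally consistent with its F(1, 78) = 19.12, so the smaller effect size obtained here follows from the larger error degrees of freedom rather than from a rounding difference.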

How difficult was it to reproduce your results?

Quite difficult. While the overall steps were straightforward, matching the exact structure of the authors’ analysis was challenging. The data required several preprocessing steps to produce participant-level summaries, and the mixed-design ANOVA in R did not yield the degrees of freedom reported in the paper.

What aspects made it difficult? What aspects made it easy?

The main difficulty was identifying how the original authors handled the repeated-measures structure. The discrepancy in the denominator degrees of freedom (152 vs. 78) probably reflects my model specification rather than a difference between SPSS and R: the two participant error strata in the ANOVA output have only 1 degree of freedom each, which suggests that `participant` was treated as a numeric variable rather than as a subject factor, and its IDs (1 to 40) are reused across the two groups. It also took some time to check whether each participant contributed one mean per condition (past and future). The cell counts show 39 PAST and 40 FUTURE means per group, so the collapsed data are not fully balanced. What made it easier was that the dataset was well labeled and produced group means that closely matched the paper. Once the data were summarized at the participant level, reproducing the qualitative pattern of results and generating a similar figure was relatively straightforward.