For this exercise, please try to reproduce the results from Experiment 2 of the associated paper (de la Fuente, Santiago, Roman, Dumitrache, & Casasanto, 2014). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

Researchers tested whether temporal focus differs between Moroccan and Spanish cultures, hypothesizing that Moroccans are more past-focused, whereas Spaniards are more future-focused. Two groups of participants (\(N = 40\) Moroccan and \(N = 40\) Spanish) completed a temporal-focus questionnaire that contained questions about past-focused (“PAST”) and future-focused (“FUTURE”) topics. In response to each question, participants provided a rating on a 5-point Likert scale on which lower scores indicated less agreement and higher scores indicated greater agreement. The authors then performed a mixed-design ANOVA with agreement score as the dependent variable, group (Moroccan or Spanish) as the between-subjects factor, and temporal focus (past or future) as the within-subjects factor. In addition, the authors performed unpaired two-sample t-tests to determine whether the two groups differed significantly in agreement scores for PAST questions, and whether they differed significantly in scores for FUTURE questions.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 2):

According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, ηp² = .20 (Fig. 2). Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001, and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001. (de la Fuente et al., 2014, p. 1685).


Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(dplyr)

# optional packages/functions:
library(afex) # anova functions
library(emmeans)
# library(ez) # anova functions 2
# library(scales) # for plotting
std.err <- function(x) sd(x)/sqrt(length(x)) # standard error

Step 2: Load data

# Just Experiment 2
data_path <- 'data/DeLaFuenteEtAl_2014_RawData.xls'
d <- read_excel(data_path, sheet=3)

Step 3: Tidy data

#Inspect dataset
glimpse(d)
## Rows: 1,680
## Columns: 5
## $ group                                                       <chr> "Moroccan"…
## $ participant                                                 <dbl> 1, 1, 1, 1…
## $ subscale                                                    <chr> "PAST", "P…
## $ item                                                        <chr> "1. Para m…
## $ `Agreement (0=complete disagreement; 5=complete agreement)` <dbl> 4, 4, 5, 2…
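The agreement column has a long name containing spaces, so it must be backtick-quoted wherever it is used. A minimal sketch of renaming it once, assuming a shorter name such as `agreement` is acceptable (that name and the toy data frame are my own choices for illustration; the rest of this report keeps the original column name):

```r
library(dplyr)

# Toy stand-in for `d`; only the column names mirror the real data.
toy <- data.frame(
  participant = c(1, 1, 2),
  subscale = c("PAST", "FUTURE", "PAST"),
  `Agreement (0=complete disagreement; 5=complete agreement)` = c(4, 2, 5),
  check.names = FALSE
)

# Rename once so later code can refer to `agreement` without backticks.
toy_renamed <- toy %>%
  rename(agreement = `Agreement (0=complete disagreement; 5=complete agreement)`)

names(toy_renamed)
```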
# Clean variables: relabel "young Spaniard" as "Spaniard" so each group has a single label

d <- d %>%
  mutate(group = if_else(group == "young Spaniard", "Spaniard", group))
table(d$group)
## 
## Moroccan Spaniard 
##      840      840
# Check for consistency: every participant number should have the same number of rows
d %>%
  group_by(participant) %>%
  summarise(rows_per_participant = n())
## # A tibble: 40 × 2
##    participant rows_per_participant
##          <dbl>                <int>
##  1           1                   42
##  2           2                   42
##  3           3                   42
##  4           4                   42
##  5           5                   42
##  6           6                   42
##  7           7                   42
##  8           8                   42
##  9           9                   42
## 10          10                   42
## # ℹ 30 more rows
# There are errors in the data: participant numbers 24, 25, and 40 have irregular row counts

# Inspect rows for participants 24, 25, and 40
problem_participants <- d %>%
  filter(participant %in% c(24, 25, 40)) %>%
  arrange(participant, subscale, item)

problem_participants
## # A tibble: 126 × 5
##    group    participant subscale item                     Agreement (0=complet…¹
##    <chr>          <dbl> <chr>    <chr>                                     <dbl>
##  1 Moroccan          24 FUTURE   12.Entiendo que las cre…                      3
##  2 Spaniard          24 FUTURE   12.Entiendo que las cre…                      4
##  3 Moroccan          24 FUTURE   13. Los valores y creen…                      5
##  4 Spaniard          24 FUTURE   13. Los valores y creen…                      4
##  5 Moroccan          24 FUTURE   14. Veo muy positiva la…                      5
##  6 Spaniard          24 FUTURE   14. Veo muy positiva la…                      4
##  7 Moroccan          24 FUTURE   15.Los avances en tecno…                      3
##  8 Spaniard          24 FUTURE   15.Los avances en tecno…                      4
##  9 Moroccan          24 FUTURE   16. Los valores y creen…                      3
## 10 Spaniard          24 FUTURE   16. Los valores y creen…                      3
## # ℹ 116 more rows
## # ℹ abbreviated name:
## #   ¹​`Agreement (0=complete disagreement; 5=complete agreement)`
# Count rows per participant
d %>%
  filter(participant %in% c(24, 25, 40)) %>%
  group_by(participant) %>%
  summarise(rows_per_participant = n())
## # A tibble: 3 × 2
##   participant rows_per_participant
##         <dbl>                <int>
## 1          24                   34
## 2          25                    8
## 3          40                   84
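Participant numbers are reused across the two groups (number 24 above appears as both a Moroccan and a Spaniard), so counting by `participant` alone lumps two people together. A possible way to localise the problem is to count rows per group and participant and keep only the irregular cells. The sketch below uses toy data; the expected count of 21 rows per person in the real data is an assumption derived from 840 rows per group and 40 participants per group.

```r
library(dplyr)

# Toy data: participant numbers repeat across groups, as in `d`.
# Expected rows per person is 3 here; in the real data it would be 21 (assumption).
expected_rows <- 3
toy <- data.frame(
  group = c(rep("Moroccan", 7), rep("Spaniard", 6)),
  participant = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2)
)

# Count rows per person (group x participant) and keep only the irregular ones.
irregular <- toy %>%
  count(group, participant, name = "n_rows") %>%
  filter(n_rows != expected_rows)

irregular
```

In this toy example only Moroccan participant 2 is flagged, while Spaniard participant 2 is left alone.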
# Check for duplicates in these participants
d %>%
  filter(participant %in% c(24, 25, 40)) %>%
  group_by(participant, group, subscale, item, `Agreement (0=complete disagreement; 5=complete agreement)`) %>%
  filter(n() > 1) %>%
  summarise(duplicate_count = n(), .groups = 'drop')
## # A tibble: 11 × 6
##    participant group    subscale item     Agreement (0=complet…¹ duplicate_count
##          <dbl> <chr>    <chr>    <chr>                     <dbl>           <int>
##  1          40 Moroccan FUTURE   16. Los…                      3               2
##  2          40 Moroccan FUTURE   18. Los…                      5               2
##  3          40 Moroccan FUTURE   21. Los…                      3               2
##  4          40 Moroccan PAST     4. La j…                      5               2
##  5          40 Moroccan PAST     6. El m…                      3               2
##  6          40 Moroccan PAST     7. Me c…                      3               2
##  7          40 Moroccan PAST     8. La f…                      5               2
##  8          40 Moroccan PAST     9. La f…                      5               2
##  9          40 Spaniard FUTURE   17. Par…                      2               2
## 10          40 Spaniard PAST     10. Con…                      1               2
## 11          40 Spaniard PAST     6. El m…                      2               2
## # ℹ abbreviated name:
## #   ¹​`Agreement (0=complete disagreement; 5=complete agreement)`
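# I can't see a clear pattern in these rows, so I am excluding these cases.
# Note: participant numbers are shared across groups, so this filter drops
# numbers 24, 25, and 40 from both the Moroccan and the Spaniard group.
d <- d %>%
  filter(!participant %in% c(24, 25, 40))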
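Filtering on the participant number alone removes that number from both groups, which shrinks the sample more than strictly necessary. A hedged alternative, sketched on toy data, is to drop only the people whose row count deviates from the expected one. Here `expected_rows` is a hypothetical stand-in for the 21 rows per person assumed for `d`, and whether the irregular rows could be repaired instead is not something I can confirm from the data.

```r
library(dplyr)

# Toy data: Moroccan participant 2 has an extra row; Spaniard participant 2 is fine.
expected_rows <- 3  # stand-in for the 21 rows per person assumed for `d`
toy <- data.frame(
  group = c(rep("Moroccan", 7), rep("Spaniard", 6)),
  participant = c(1, 1, 1, 2, 2, 2, 2, 1, 1, 1, 2, 2, 2)
)

# Drop by participant number alone: removes participant 2 from BOTH groups.
by_number <- toy %>% filter(!participant %in% 2)

# Drop only the irregular person: group and participant are checked together.
by_person <- toy %>%
  group_by(group, participant) %>%
  filter(n() == expected_rows) %>%
  ungroup()

# Number of people kept under each rule.
n_by_number <- nrow(distinct(by_number, group, participant))
n_by_person <- nrow(distinct(by_person, group, participant))
c(by_number = n_by_number, by_person = n_by_person)
```

The second rule keeps one more person in the toy data, because the unaffected Spaniard with the same number is not excluded.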

Step 4: Run analysis

Pre-processing

d_summary <- d %>%
  group_by(group, participant, subscale) %>%
  summarise(
    mean_agreement = mean(`Agreement (0=complete disagreement; 5=complete agreement)`),
    .groups = "drop"
  )

# Build a unique ID (participant numbers repeat across groups) and set factor levels
d_summary <- d_summary %>%
  mutate(
    participant_id = paste(group, participant, sep = "_"),
    group = factor(group, levels = c("Spaniard", "Moroccan")),
    subscale = factor(subscale, levels = c("PAST", "FUTURE"))
  )

Descriptive statistics

Try to recreate Figure 2 (fig2.png, also included in the same folder as this Rmd file):

# Create a compact dataset for plotting
plot_data <- d_summary %>%
  group_by(group, subscale, participant) %>%
  summarise(
    mean_participant = mean(mean_agreement, na.rm = TRUE),
    .groups = "drop_last"
  ) %>%
  group_by(group, subscale) %>%
  summarise(
    mean_agreement = mean(mean_participant, na.rm = TRUE),
    se_agreement = std.err(mean_participant),
    .groups = "drop"
  )

# Labels for plotting
plot_data <- plot_data %>%
  mutate(
    subscale_label = factor(
      subscale,
      levels = c("PAST", "FUTURE"),
      labels = c("Past-Focused Statements", "Future-Focused Statements")
    )
  )

# bar plot
ggplot(plot_data, aes(x = group, y = mean_agreement, fill = subscale_label)) +
  geom_bar(stat = "identity", position = position_dodge(width = 0.9)) +
  geom_errorbar(
    aes(
      ymin = mean_agreement - se_agreement,
      ymax = mean_agreement + se_agreement
    ),
    width = 0.2,
    position = position_dodge(width = 0.9)
  ) +
  scale_fill_manual(
    name = NULL,
    values = c(
      "Past-Focused Statements" = "gray40",
      "Future-Focused Statements" = "gray80"
    )
  ) +
  scale_y_continuous(breaks = seq(2, 4, 0.25)) + # tick marks only
  coord_cartesian(ylim = c(2, 4)) +            # zoom without dropping rows
  labs(
    y = "Rating",
    x = NULL
  ) +
  theme_classic(base_size = 14) +
  theme(
    legend.position = "top",
    legend.text = element_text(size = 11)
  )

Inferential statistics

According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, ηp² = .20 (Fig. 2).

# reproduce the above results here
anova_result <- aov_ez(
  id = "participant_id",       # unique participant identifier
  dv = "mean_agreement",
  data = d_summary,
  within = "subscale",
  between = "group",
  type = 3
)

anova_result
## Anova Table (Type 3 tests)
## 
## Response: mean_agreement
##           Effect    df  MSE         F  ges p.value
## 1          group 1, 72 0.21      1.44 .006    .235
## 2       subscale 1, 72 0.52   7.24 ** .067    .009
## 3 group:subscale 1, 72 0.52 15.78 *** .136   <.001
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
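The table above reports generalized eta squared (`ges`), whereas the paper reports partial eta squared. For a single effect, partial eta squared can be recovered from the F statistic and its degrees of freedom as F × df_effect / (F × df_effect + df_error). A minimal sketch follows; `partial_eta_sq` is a hypothetical helper name, not a function from afex.

```r
# Hypothetical helper: partial eta squared from an F statistic and its degrees of freedom.
partial_eta_sq <- function(f_value, df_effect, df_error) {
  (f_value * df_effect) / (f_value * df_effect + df_error)
}

# Interaction reported in the paper: F(1, 78) = 19.12
pes_paper <- partial_eta_sq(19.12, 1, 78)

# Interaction obtained here: F(1, 72) = 15.78
pes_here <- partial_eta_sq(15.78, 1, 72)

round(c(paper = pes_paper, here = pes_here), 2)
```

The paper's F value gives back the reported .20, while the F value obtained here corresponds to roughly .18. As far as I know, afex can also print partial eta squared directly via `anova_table = list(es = "pes")` in `aov_ez()`.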

Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001,

# reproduce the above results here
past_data <- d_summary %>%
  filter(subscale == "PAST") %>%
  mutate(group = factor(group, levels = c("Spaniard", "Moroccan")))

t_test_result <- t.test(
  mean_agreement ~ group,
  data = past_data,
  var.equal = TRUE  # assumes equal variances like in classical reporting
)

t_test_result
## 
##  Two Sample t-test
## 
## data:  mean_agreement by group
## t = -3.5082, df = 72, p-value = 0.0007816
## alternative hypothesis: true difference in means between group Spaniard and group Moroccan is not equal to 0
## 95 percent confidence interval:
##  -0.8785148 -0.2418783
## sample estimates:
## mean in group Spaniard mean in group Moroccan 
##               2.705160               3.265356
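The sign of the statistic above is opposite to the paper's t = 4.04. With the formula interface, `t.test()` takes the first factor level minus the second, so with levels `c("Spaniard", "Moroccan")` the difference is Spaniard minus Moroccan. The sketch below, using made-up toy ratings rather than the study data, shows that reversing the level order flips only the sign.

```r
# Toy ratings (made-up values, not the study data).
toy <- data.frame(
  group = rep(c("Spaniard", "Moroccan"), each = 4),
  rating = c(2.5, 2.8, 3.0, 2.7, 3.2, 3.5, 3.1, 3.4)
)

# Level order Spaniard, Moroccan: statistic is Spaniard minus Moroccan.
toy$group <- factor(toy$group, levels = c("Spaniard", "Moroccan"))
t_sp_first <- t.test(rating ~ group, data = toy, var.equal = TRUE)

# Level order Moroccan, Spaniard: same test, opposite sign.
toy$group <- factor(toy$group, levels = c("Moroccan", "Spaniard"))
t_mo_first <- t.test(rating ~ group, data = toy, var.equal = TRUE)

unname(c(t_sp_first$statistic, t_mo_first$statistic))
```

The p-value and degrees of freedom are identical in both calls, so the level order affects only how the direction of the difference is reported.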

and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001. (de la Fuente et al., 2014, p. 1685)

# reproduce the above results here
future_data <- d_summary %>%
  filter(subscale == "FUTURE") %>%
  mutate(group = factor(group, levels = c("Spaniard", "Moroccan")))

t_test_future <- t.test(
  mean_agreement ~ group,
  data = future_data,
  var.equal = TRUE  # assumes equal variances
)

t_test_future
## 
##  Two Sample t-test
## 
## data:  mean_agreement by group
## t = 3.251, df = 72, p-value = 0.001751
## alternative hypothesis: true difference in means between group Spaniard and group Moroccan is not equal to 0
## 95 percent confidence interval:
##  0.1474087 0.6147535
## sample estimates:
## mean in group Spaniard mean in group Moroccan 
##               3.494595               3.113514

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

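I was not able to reproduce the reported values exactly, although the pattern of results matched the paper. The interaction was significant, F(1, 72) = 15.78, p < .001, compared with the reported F(1, 78) = 19.12. Moroccans agreed more with past-focused statements, t(72) = −3.51, p < .001 (reported: t(78) = 4.04), and Spaniards agreed more with future-focused statements, t(72) = 3.25, p = .002 (reported: t(78) = −3.32). The signs are reversed only because my tests subtract the Moroccan mean from the Spaniard mean. The differences in the statistics and degrees of freedom stem from errors in the dataset: participant numbers 24, 25, and 40 had missing or duplicated rows compared to other participants. I excluded those numbers from both groups, so the analysis included 74 participants rather than 80.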

How difficult was it to reproduce your results?

Reproducing the results was moderately difficult. The errors with a few participants required additional inspection and cleaning before any meaningful analysis could be performed. I also went back and forth before I realized what the error was.

What aspects made it difficult? What aspects made it easy?

The difficulty arose from irregularities in the data: participants 24 and 25 had fewer rows than expected, and participant 40 had duplicate rows. These anomalies meant that simple aggregation or analysis scripts could not directly reproduce the original results. I kept getting warnings and errors until I figured out what the problems were.