For this exercise, please try to reproduce the results from Experiment 2 of the associated paper (de la Fuente, Santiago, Roman, Dumitrache, & Casasanto, 2014). The PDF of the paper is included in the same folder as this Rmd file.

Methods summary:

Researchers tested whether temporal focus differs between Moroccan and Spanish cultures, hypothesizing that Moroccans are more past-focused, whereas Spaniards are more future-focused. Two groups of participants (\(N = 40\) Moroccan and \(N = 40\) Spanish) completed a temporal-focus questionnaire that contained questions about past-focused (“PAST”) and future-focused (“FUTURE”) topics. In response to each question, participants provided a rating on a 5-point Likert scale on which lower scores indicated less agreement and higher scores indicated greater agreement. The authors then performed a mixed-design ANOVA with agreement score as the dependent variable, group (Moroccan or Spanish) as the between-subjects factor, and temporal focus (past or future) as the within-subjects factor. In addition, the authors performed unpaired two-sample t-tests to determine whether there was a significant difference between the two groups in agreement scores for PAST questions, and whether there was a significant difference in scores for FUTURE questions.


Target outcomes:

Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 2):

According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, \(\eta_p^2 = .20\) (Fig. 2). Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001, and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001 (de la Fuente et al., 2014, p. 1685).


Step 1: Load packages

library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(lme4)

# optional packages/functions:
library(afex) # anova functions
library(ez) # anova functions 2
library(emmeans)
# library(scales) # for plotting
# std.err <- function(x) sd(x)/sqrt(length(x)) # standard error

Step 2: Load data

# Just Experiment 2
data_path <- 'data/DeLaFuenteEtAl_2014_RawData.xls'
d <- read_excel(data_path, sheet=3)

Step 3: Tidy data

# Step 1 - Shorten the name of the agreement rating column
d = d %>% 
  rename("Agreement" = "Agreement (0=complete disagreement; 5=complete agreement)") 

# Step 2 - Break into two separate datasets
## Step 2a - Create dataset for Moroccan participants
d.moroccan = d %>% 
  filter(group == "Moroccan")

## Step 2b - Create dataset for young Spaniards
d.spain = d %>% 
  filter(group == "young Spaniard")

# Step 3 - Clean each dataframe
## Step 3a - Clean Moroccan dataframe
d.moroccan.clean = d.moroccan %>% 
  group_by(participant, subscale) %>% 
  summarise(mean = mean(Agreement, na.rm = TRUE))
d.moroccan.clean$group = "Moroccans"

## Step 3b - Clean Spanish dataframe
d.spain.clean = d.spain %>% 
  group_by(participant, subscale) %>% 
  summarise(mean = mean(Agreement, na.rm = TRUE))
d.spain.clean$group = "Spainards"

# Step 4 - Offset participant IDs in the Spanish dataframe so they do not overlap with the Moroccan IDs
d.spain.clean$participant = d.spain.clean$participant + 40

# Step 5 - Combine Spanish and Moroccan dataframes
d.new = rbind(d.moroccan.clean, d.spain.clean)
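
A mixed ANOVA needs every participant to contribute a score on both subscales, so it can help to check the combined data before analysis. The sketch below uses a small made-up data frame (`toy`) as a stand-in for `d.new`; the helper name `check_design` is hypothetical and the values are for illustration only.

```r
# Toy stand-in for d.new (made-up values, for illustration only)
toy <- data.frame(
  participant = c(1, 1, 2, 2, 3, 41, 41, 42, 42),
  subscale    = c("PAST", "FUTURE", "PAST", "FUTURE", "PAST",
                  "PAST", "FUTURE", "PAST", "FUTURE"),
  group       = c(rep("Moroccans", 5), rep("Spaniards", 4)),
  mean        = c(3.5, 3.0, 3.2, 2.8, 3.9, 2.7, 3.6, 2.9, 3.4)
)

# Hypothetical helper: one row per participant with the number of
# distinct subscales that participant has data for
check_design <- function(df) {
  n_sub <- tapply(df$subscale, df$participant,
                  function(x) length(unique(x)))
  grp <- tapply(df$group, df$participant, function(x) x[1])
  data.frame(
    participant = as.numeric(names(n_sub)),
    group       = as.vector(grp),
    n_subscales = as.vector(n_sub)
  )
}

design <- check_design(toy)

# Number of participants per group
table(design$group)

# Participants missing one of the two subscales
incomplete <- design[design$n_subscales < 2, ]
incomplete
```

Running the same check on `d.new` would show how many participants each group actually contains, which bears directly on the error degrees of freedom in the ANOVA.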

Step 4: Run analysis

Pre-processing

# Change variable names to be consistent with publication
d.new$statement_label = factor(d.new$subscale, 
         levels = c("PAST","FUTURE"),
         labels = c("Past-Focused Statements", "Future-Focused Statements"))

# Add group level label for graphing (the display label corrects the spelling of the stored "Spainards" value)
d.new$group_label = factor(d.new$group, 
         levels = c("Spainards","Moroccans"),
         labels = c("Spaniards","Moroccans"))

# Add subscale as a factored numeric variable for analysis purposes
d.new$statement_num = factor(d.new$subscale, 
         levels = c("PAST","FUTURE"),
         labels = c(0, 1))

# Add group as a factored numeric variable for analysis purposes
d.new$group_num = factor(d.new$group, 
         levels = c("Spainards","Moroccans"),
         labels = c(0, 1))

Descriptive statistics

Try to recreate Figure 2 (fig2.png, also included in the same folder as this Rmd file):

# Use ggplot to set up the data
ggplot(data = d.new, 
       aes(x = group_label,
           y = mean,
           group = statement_label,
           fill = statement_label)) +
  
  # Add bar plots
  stat_summary(fun = "mean",
               geom = "bar",
               position = position_dodge(width = 0.90), # Change position
               color = "black") +
  
  # Add bootstrapped confidence intervals (mean_cl_boot requires the Hmisc package)
  stat_summary(fun.data = "mean_cl_boot",
               geom = "errorbar", # Make them errorbar format
               width = 0.2, # Change width
               position = position_dodge(width = 0.90)) +

  # Change y-axis limits
  coord_cartesian(ylim = c(2,4)) +

  # Add black and white theme
  theme(panel.background = element_blank(),
        legend.title = element_blank(),
        plot.background = element_blank(),
        panel.grid = element_blank(),
        axis.line = element_line(color = "black"),
        axis.title.x = element_blank()) +
  
  # Change colors
  scale_fill_manual(values = c("gray7", "lightgray")) +
  
  # Change name of y-axis
  ylab("Rating")

Inferential statistics

According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, \(\eta_p^2 = .20\) (Fig. 2).

# reproduce the above results here
# Use aov_ez to produce results
results = aov_ez(
  id = "participant",     # participant identifier
  dv = "mean",            # dependent variable (mean agreement per subscale)
  data = d.new,           # dataframe
  between = "group",      # between-subjects factor
  within = "subscale"     # within-subjects factor
)

# Print results
print(results)
## Anova Table (Type 3 tests)
## 
## Response: mean
##           Effect    df  MSE         F  ges p.value
## 1          group 1, 76 0.20      2.19 .008    .143
## 2       subscale 1, 76 0.50   7.98 ** .070    .006
## 3 group:subscale 1, 76 0.50 18.35 *** .147   <.001
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
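
The table above reports generalized eta squared (`ges`), whereas the paper reports partial eta squared, so the two effect-size columns are not directly comparable. As a rough check, partial eta squared can be recovered from an F statistic and its degrees of freedom. The helper name `partial_eta_sq` below is hypothetical; the inputs are the F values and degrees of freedom quoted from the paper and printed in the table above.

```r
# Partial eta squared from an F statistic and its degrees of freedom:
# eta_p^2 = (F * df_effect) / (F * df_effect + df_error)
partial_eta_sq <- function(f_value, df_effect, df_error) {
  (f_value * df_effect) / (f_value * df_effect + df_error)
}

# Interaction reported in the paper: F(1, 78) = 19.12
eta_paper <- partial_eta_sq(19.12, 1, 78)

# Interaction obtained in this reproduction: F(1, 76) = 18.35
eta_repro <- partial_eta_sq(18.35, 1, 76)

round(c(paper = eta_paper, reproduced = eta_repro), 2)
```

By this calculation the paper's value comes out at about .20 and the reproduced value at about .19, so the interaction effect size is close to the published one despite the difference in degrees of freedom.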

Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001,

# reproduce the above results here
# Use emmeans from results to obtain contrast effect (where subscale = PAST; bottom part of results)
pairs(emmeans(results, ~ group|subscale))
## subscale = FUTURE:
##  contrast              estimate    SE df t.ratio p.value
##  Moroccans - Spainards   -0.377 0.111 76  -3.390  0.0011
## 
## subscale = PAST:
##  contrast              estimate    SE df t.ratio p.value
##  Moroccans - Spainards    0.590 0.153 76   3.856  0.0002

and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001 (de la Fuente et al., 2014, p. 1685).

# reproduce the above results here (where subscale = FUTURE; top part of results)
pairs(emmeans(results, ~ group|subscale))
## subscale = FUTURE:
##  contrast              estimate    SE df t.ratio p.value
##  Moroccans - Spainards   -0.377 0.111 76  -3.390  0.0011
## 
## subscale = PAST:
##  contrast              estimate    SE df t.ratio p.value
##  Moroccans - Spainards    0.590 0.153 76   3.856  0.0002

Step 5: Reflection

Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?

I was not able to reproduce the exact values; given the limitations of the raw data, I could only approximate the published results. Specifically, some participants were missing from the raw data, which made the numbers difficult, if not impossible, to reproduce. As a result, the error degrees of freedom were 76 rather than the reported 78, and the test statistics differed slightly. However, it seems that some other researchers had the same issues, and a new file was uploaded to OSF.

How difficult was it to reproduce your results?

These results were quite difficult to reproduce. I suspect that the original authors either did not use R for their analysis or did not use this data file. It took quite some time to properly clean the file. I ended up separating it into two files, one for Moroccan and one for Spanish participants, and then rejoining them once they were cleaned.

What aspects made it difficult? What aspects made it easy?

It was quite difficult since the data were incomplete. It was also difficult because of the way the original file had been organized (i.e., there were no unique participant IDs, and the “item” variable was unorganized). However, despite the cleaning that was required, reading the original file was quite intuitive, even without a code book. Still, it took me several tries to figure out how the data were structured.