For this exercise, please try to reproduce the results from Experiment 2 of the associated paper (de la Fuente, Santiago, Roman, Dumitrache, & Casasanto, 2014). The PDF of the paper is included in the same folder as this Rmd file.
The researchers tested whether temporal focus differs between Moroccan and Spanish cultures, hypothesizing that Moroccans are more past-focused, whereas Spaniards are more future-focused. Two groups of participants (\(N = 40\) Moroccan and \(N = 40\) Spanish) completed a temporal-focus questionnaire that contained questions about past-focused (“PAST”) and future-focused (“FUTURE”) topics. In response to each question, participants provided a rating on a 5-point Likert scale on which lower scores indicated less agreement and higher scores indicated greater agreement. The authors then performed a mixed-design ANOVA with agreement score as the dependent variable, group (Moroccan or Spanish) as the between-subjects factor, and temporal focus (past or future) as the within-subjects factor. In addition, the authors performed unpaired two-sample t-tests to determine whether there was a significant difference between the two groups in agreement scores for PAST questions, and whether there was a significant difference in scores for FUTURE questions.
Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 2):
According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, ηp2 = .20 (Fig. 2). Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001, and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001. (de la Fuente et al., 2014, p. 1685).
library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(lme4)
# optional packages/functions:
library(afex) # anova functions
library(ez) # anova functions 2
library(emmeans)
# library(scales) # for plotting
# std.err <- function(x) sd(x)/sqrt(length(x)) # standard error
# Just Experiment 2
data_path <- 'data/DeLaFuenteEtAl_2014_RawData.xls'
d <- read_excel(data_path, sheet=3)
# Step 1 - Rename the long agreement column to "Agreement"
d = d %>%
  rename("Agreement" = "Agreement (0=complete disagreement; 5=complete agreement)")
# Step 2 - Break into two separate datasets
## Step 2a - Create dataset for Moroccan participants
d.moroccan = d %>%
filter(group == "Moroccan")
## Step 2b - Create dataset for young Spaniards
d.spain = d %>%
filter(group == "young Spaniard")
# Step 3 - Clean each dataframe
## Step 3a - Clean Moroccan dataframe
# Average agreement per participant and subscale (PAST / FUTURE)
d.moroccan.clean = d.moroccan %>%
  group_by(participant, subscale) %>%
  summarise(mean = mean(Agreement, na.rm = TRUE))
d.moroccan.clean$group = "Moroccans"
## Step 3b - Clean Spanish dataframe
# Average agreement per participant and subscale (PAST / FUTURE)
d.spain.clean = d.spain %>%
  group_by(participant, subscale) %>%
  summarise(mean = mean(Agreement, na.rm = TRUE))
d.spain.clean$group = "Spainards"
# Step 4 - Offset Spanish participant IDs by 40 so they do not collide with the Moroccan IDs
d.spain.clean$participant = d.spain.clean$participant + 40
# Step 5 - Combine Spanish and Moroccan dataframes
d.new = rbind(d.moroccan.clean, d.spain.clean)
# Change variable names to be consistent with publication
d.new$statement_label = factor(d.new$subscale,
levels = c("PAST","FUTURE"),
labels = c("Past-Focused Statements", "Future-Focused Statements"))
# Add group level label for graphing
d.new$group_label = factor(d.new$group,
levels = c("Spainards","Moroccans"),
labels = c("Spainards","Moroccans"))
# Add statement type as a numeric-coded factor for analysis purposes
d.new$statement_num = factor(d.new$subscale,
levels = c("PAST","FUTURE"),
labels = c(0, 1))
# Add group as a numeric-coded factor for analysis purposes
d.new$group_num = factor(d.new$group,
levels = c("Spainards","Moroccans"),
labels = c(0, 1))
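The split, clean, and recombine steps above are needed because participant IDs repeat across the two groups. As a minimal sketch of an alternative, the same per-participant means can be computed in one pipeline by grouping on `group` as well. The `toy` data frame below uses made-up values and only mimics the column names of the raw sheet; the `item` and `id` columns are illustrative assumptions, not part of the original analysis.

```r
library(dplyr)

# Toy data (made-up values): 2 groups x 2 participants x 2 subscales x 2 items.
# Participant IDs 1 and 2 appear in both groups, as in the raw file.
toy <- expand.grid(
  item        = 1:2,
  subscale    = c("PAST", "FUTURE"),
  participant = 1:2,
  group       = c("Moroccan", "young Spaniard"),
  stringsAsFactors = FALSE
)
toy$Agreement <- c(4, 5, 2, 3,  5, 4, 3, 2,  2, 3, 4, 5,  1, NA, 5, 4)

# Grouping by group AND participant keeps same-numbered participants apart
toy_means <- toy %>%
  group_by(group, participant, subscale) %>%
  summarise(mean = mean(Agreement, na.rm = TRUE), .groups = "drop") %>%
  # Hypothetical unique ID built from group and participant number
  mutate(id = paste(group, participant, sep = "_"))

toy_means
```

This avoids the manual `+ 40` offset, at the cost of a character ID instead of a numeric one.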
Try to recreate Figure 2 (fig2.png, also included in the same folder as this Rmd file):
# Use ggplot to set up the data
ggplot(data = d.new,
       aes(x = group_label,
           y = mean,
           group = statement_label,
           fill = statement_label)) +
  # Add bar plots
  stat_summary(fun = "mean",
               geom = "bar",
               position = position_dodge(width = 0.90), # Dodge bars side by side
               color = "black") +
  # Add bootstrapped confidence intervals
  stat_summary(fun.data = "mean_cl_boot",
               geom = "errorbar", # Make them errorbar format
               width = 0.2, # Change width
               position = position_dodge(width = 0.90)) +
  # Change y-axis limits
  coord_cartesian(ylim = c(2, 4)) +
  # Remove backgrounds and grid lines for a plain black-and-white look
  theme(panel.background = element_blank(),
        legend.title = element_blank(),
        plot.background = element_blank(),
        panel.grid = element_blank(),
        axis.line = element_line(color = "black"),
        axis.title.x = element_blank()) +
  # Change colors
  scale_fill_manual(values = c("gray7", "lightgray")) +
  # Change name of y-axis
  ylab("Rating")
According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, ηp2 = .20 (Fig. 2).
# reproduce the above results here
# Use aov_ez to produce results
results = aov_ez(
  id = "participant", # Group by participant id
  dv = "mean", # dependent variable
  data = d.new, # dataframe
  between = "group", # between-subjects factor
  within = "subscale" # within-subjects factor
)
# Print results
print(results)
## Anova Table (Type 3 tests)
##
## Response: mean
## Effect df MSE F ges p.value
## 1 group 1, 76 0.20 2.19 .008 .143
## 2 subscale 1, 76 0.50 7.98 ** .070 .006
## 3 group:subscale 1, 76 0.50 18.35 *** .147 <.001
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '+' 0.1 ' ' 1
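The table above reports generalized eta squared (`ges`), whereas the paper reports partial eta squared, so the two effect sizes are not directly comparable. For a single effect, partial eta squared can be recovered from the F statistic and its degrees of freedom as F·df1 / (F·df1 + df2). The sketch below applies that formula to the reported and the reproduced interaction; `partial_eta_sq` is a hypothetical helper written for this illustration, not a function from `afex`.

```r
# Hypothetical helper (not part of afex): partial eta squared from F and its df
partial_eta_sq <- function(f, df1, df2) {
  (f * df1) / (f * df1 + df2)
}

# Interaction as reported in the paper: F(1, 78) = 19.12
reported <- partial_eta_sq(19.12, 1, 78)

# Interaction as reproduced above: F(1, 76) = 18.35
reproduced <- partial_eta_sq(18.35, 1, 76)

round(c(reported = reported, reproduced = reproduced), 3)
```

On this calculation the reported value is consistent with the paper's .20, and the reproduced value is only slightly smaller. As far as I know, `afex` can also print partial eta squared directly by passing `anova_table = list(es = "pes")` to `aov_ez()`.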
Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001,
# reproduce the above results here
# Use emmeans from results to obtain contrast effect (where subscale = PAST; bottom part of results)
pairs(emmeans(results, ~ group|subscale))
## subscale = FUTURE:
## contrast estimate SE df t.ratio p.value
## Moroccans - Spainards -0.377 0.111 76 -3.390 0.0011
##
## subscale = PAST:
## contrast estimate SE df t.ratio p.value
## Moroccans - Spainards 0.590 0.153 76 3.856 0.0002
and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001. (de la Fuente et al., 2014, p. 1685)
# reproduce the above results here (where subscale = FUTURE; top part of results)
pairs(emmeans(results, ~ group|subscale))
## subscale = FUTURE:
## contrast estimate SE df t.ratio p.value
## Moroccans - Spainards -0.377 0.111 76 -3.390 0.0011
##
## subscale = PAST:
## contrast estimate SE df t.ratio p.value
## Moroccans - Spainards 0.590 0.153 76 3.856 0.0002
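The paper describes these comparisons as unpaired two-sample t-tests with 78 degrees of freedom, which corresponds to Student's t-test with pooled variance. The `emmeans` contrasts above are derived from the fitted ANOVA model instead, so a more direct check would be a separate `t.test()` per subscale with `var.equal = TRUE`. The sketch below uses simulated values only to show the call and the resulting degrees of freedom; `toy_past` and its numbers are assumptions, not the study data.

```r
set.seed(1)

# Simulated per-participant PAST means (made-up values), 40 per group
toy_past <- data.frame(
  group = rep(c("Moroccans", "Spaniards"), each = 40),
  mean  = c(rnorm(40, mean = 3.3, sd = 0.6),
            rnorm(40, mean = 2.7, sd = 0.6))
)

# Student's t-test with pooled variance: df = n1 + n2 - 2
past_test <- t.test(mean ~ group, data = toy_past, var.equal = TRUE)

past_test$parameter  # degrees of freedom: 78 with 40 participants per group
```

With the real data, the equivalent call would filter `d.new` to one subscale first; with fewer than 40 participants in a group, the degrees of freedom drop accordingly.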
Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?
I was not able to reproduce the exact values, although the overall pattern of results was the same. Specifically, some participants were missing from the raw data, which made the reported numbers difficult, if not impossible, to reproduce exactly. As a result, the error degrees of freedom were 76 rather than the reported 78, and the test statistics differed slightly (e.g., F = 18.35 vs. 19.12 for the interaction). However, it seems that other researchers ran into the same issue, and a new file was uploaded to OSF.
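One way to pin down where the missing degrees of freedom come from is to count, per group, how many participants are present and how many contributed both subscales. The sketch below does this on a small made-up table with the same columns as `d.new`; the values and the `completeness` name are illustrative assumptions.

```r
library(dplyr)

# Toy per-participant means (made-up values); participant 42 has no FUTURE score
toy_means <- tibble(
  group       = c("Moroccans", "Moroccans", "Moroccans", "Moroccans",
                  "Spaniards", "Spaniards", "Spaniards"),
  participant = c(1, 1, 2, 2, 41, 41, 42),
  subscale    = c("PAST", "FUTURE", "PAST", "FUTURE", "PAST", "FUTURE", "PAST"),
  mean        = c(3.5, 2.5, 4.0, 3.0, 2.5, 4.0, 3.0)
)

completeness <- toy_means %>%
  # Number of subscales available for each participant
  group_by(group, participant) %>%
  summarise(n_subscales = n_distinct(subscale), .groups = "drop") %>%
  # Participants present, and participants with both PAST and FUTURE
  group_by(group) %>%
  summarise(n_participants = n(),
            n_complete     = sum(n_subscales == 2),
            .groups = "drop")

completeness
```

Applied to `d.new`, the same pipeline would show whether either group has fewer than 40 participants or participants with only one subscale.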
How difficult was it to reproduce your results?
These results were quite difficult to reproduce. I suspect that the original authors either did not use R for their analysis or did not use this data file. It took quite some time to clean the file properly. I ended up splitting it into two data frames, one for Moroccan and one for Spanish participants, and then rejoining them once they were cleaned.
What aspects made it difficult? What aspects made it easy?
It was quite difficult because the data were incomplete. It was also difficult because of the way the original file had been organized (i.e., there were no unique participant IDs, and the “item” variable was unorganized). However, despite the cleaning that was required, reading the original file was quite intuitive, even without a codebook. Still, it took me several tries to figure out how the data were structured.