For this exercise, please try to reproduce the results from Experiment 2 of the associated paper (de la Fuente, Santiago, Roman, Dumitrache, & Casasanto, 2014). The PDF of the paper is included in the same folder as this Rmd file.
Researchers tested the question of whether temporal focus differs between Moroccan and Spanish cultures, hypothesizing that Moroccans are more past-focused, whereas Spaniards are more future-focused. Two groups of participants (\(N = 40\) Moroccan and \(N=40\) Spanish) completed a temporal-focus questionnaire that contained questions about past-focused (“PAST”) and future-focused (“FUTURE”) topics. In response to each question, participants provided a rating on a 5-point Likert scale on which lower scores indicated less agreement and higher scores indicated greater agreement. The authors then performed a mixed-design ANOVA with agreement score as the dependent variable, group (Moroccan or Spanish, between-subjects) as the fixed-effects factor, and temporal focus (past or future, within-subjects) as the random effects factor. In addition, the authors performed unpaired two-sample t-tests to determine whether there was a significant difference between the two groups in agreement scores for PAST questions, and whether there was a significant difference in scores for FUTURE questions.
Below is the specific result you will attempt to reproduce (quoted directly from the results section of Experiment 2):
According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, ηp2 = .20 (Fig. 2). Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001, and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001. (de la Fuente et al., 2014, p. 1685).
library(tidyverse) # for data munging
library(knitr) # for kable table formatting
library(haven) # import and export 'SPSS', 'Stata' and 'SAS' Files
library(readxl) # import excel files
library(Rmisc)
library(purrr)
library(broom)
library(car)
library(psych)
library(rstatix)
# optional packages/functions:
# library(afex) # anova functions
# library(ez) # anova functions 2
# library(scales) # for plotting
# std.err <- function(x) sd(x)/sqrt(length(x)) # standard error
# Just Experiment 2
data_path <- 'data/DeLaFuenteEtAl_2014_RawData_jbyun.xls'
d <- read_excel(data_path, sheet=3)
str(d)
## tibble [1,680 × 5] (S3: tbl_df/tbl/data.frame)
## $ group : chr [1:1680] "Moroccan" "Moroccan" "Moroccan" "Moroccan" ...
## $ participant : num [1:1680] 1 1 1 1 1 1 1 1 1 1 ...
## $ subscale : chr [1:1680] "PAST" "PAST" "PAST" "PAST" ...
## $ item : chr [1:1680] "1. Para mí son muy importantes las tradiciones y las antiguas costumbres" "2. Los jóvenes deben conservar las tradiciones" "3. Creo que las personas eran más felices hace unas décadas que en la actualidad" "4. La juventud de hoy en día necesita mantener los valores de sus padres y sus abuelos" ...
## $ Agreement (0=complete disagreement; 5=complete agreement): num [1:1680] 4 4 5 2 4 3 4 2 2 3 ...
colnames(d) <- c('group', 'id', 'subscale', 'item', 'agreement')
#d_past <- d %>%
# filter(subscale == "PAST")
#d_future <- d %>%
# filter(subscale == "FUTURE")
df <- pivot_wider(data = d, names_from = item, values_from = agreement)
#df_past <- pivot_wider(data = d_past, names_from = item, values_from = 'agreement')
#colnames(df_past) <- c('group', 'id', 'subscale', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7', 'Q8', 'Q9', 'Q10', 'Q11')
#df_future <- pivot_wider(data = d_future, names_from = item, values_from = 'agreement')
#colnames(df_future) <- c('group', 'id', 'subscale', 'Q12', 'Q13', 'Q14', 'Q15', 'Q16', 'Q17', 'Q18', 'Q19', 'Q20', 'Q21')
colnames(df) <- c('group', 'id', 'subscale', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7', 'Q8', 'Q9', 'Q10', 'Q11', 'Q12', 'Q13', 'Q14', 'Q15', 'Q16', 'Q17', 'Q18', 'Q19', 'Q20', 'Q21')
df <- df %>%
  mutate(Moroccan = ifelse(group == "Moroccan", 1, 0)) %>%
  mutate(Moroccan = as.factor(Moroccan)) %>%
  mutate(Future = ifelse(subscale == "FUTURE", 1, 0)) %>%
  mutate(Future = as.factor(Future)) %>%
  mutate(group = ifelse(group == "young Spaniard", "Spaniard", "Moroccan")) %>%
  mutate(group = as.factor(group)) %>%
  mutate(subscale = as.factor(subscale)) %>%
  mutate(id = as.factor(id))
# get average agreement score
df <- df %>%
  # row-wise mean over Q1..Q21; items from the other subscale are NA and are skipped
  mutate(avg_agreement = rowMeans(df[, paste0("Q", 1:21)], na.rm = TRUE))
col_order <- c('id', 'Moroccan', 'Future', 'avg_agreement', 'group', 'subscale', 'Q1', 'Q2', 'Q3', 'Q4', 'Q5', 'Q6', 'Q7', 'Q8', 'Q9', 'Q10', 'Q11', 'Q12', 'Q13', 'Q14', 'Q15', 'Q16', 'Q17', 'Q18', 'Q19', 'Q20', 'Q21')
df <- df[, col_order]
col_short <- c('id', 'Moroccan', 'Future', 'avg_agreement', 'group', 'subscale')
df_tidy <- df[, col_short]
Pre-processing was done as part of the data tidying above.
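As a cross-check on the wide-format approach above, the per-participant subscale means could also be computed directly from the long data. The following is only a minimal sketch on a small made-up data frame: `toy` and all of its values are hypothetical, not the study data, and it assumes the same column names that `d` has after renaming.

```r
library(dplyr)

# Hypothetical toy data in the same long layout as `d` (values are made up)
toy <- tibble(
  group     = rep(c("Moroccan", "young Spaniard"), each = 4),
  id        = rep(1, 8),
  subscale  = rep(c("PAST", "PAST", "FUTURE", "FUTURE"), times = 2),
  item      = rep(c("item 1", "item 2", "item 12", "item 13"), times = 2),
  agreement = c(4, 5, 2, 3, 2, 2, 4, NA)
)

# One mean per participant and subscale; missing ratings are ignored
toy_means <- toy %>%
  group_by(group, id, subscale) %>%
  summarise(avg_agreement = mean(agreement, na.rm = TRUE), .groups = "drop")

toy_means
```

Grouping by `group` as well as `id` matters whenever participant numbers repeat across groups, as they do in this toy example.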
df_summ <- summarySE(df_tidy, measurevar = 'avg_agreement', groupvars = c('group', 'subscale'), na.rm = TRUE)
df_summ
## group subscale N avg_agreement sd se ci
## 1 Moroccan FUTURE 40 3.120000 0.5561774 0.08793937 0.1778742
## 2 Moroccan PAST 40 3.293182 0.7311921 0.11561162 0.2338466
## 3 Spaniard FUTURE 40 3.492500 0.4257045 0.06730980 0.1361469
## 4 Spaniard PAST 40 2.675000 0.6473862 0.10236075 0.2070442
#kable(df_summ)
df_tidy %>%
  dplyr::group_by(group, subscale) %>%
  get_summary_stats(avg_agreement, type = "mean_sd")
## # A tibble: 4 × 6
## group subscale variable n mean sd
## <fct> <fct> <chr> <dbl> <dbl> <dbl>
## 1 Moroccan FUTURE avg_agreement 40 3.12 0.556
## 2 Moroccan PAST avg_agreement 40 3.29 0.731
## 3 Spaniard FUTURE avg_agreement 40 3.49 0.426
## 4 Spaniard PAST avg_agreement 40 2.68 0.647
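The `se` and `ci` columns in the `summarySE` table above can be checked by hand. The sketch below assumes the usual definitions (standard error = sd / sqrt(N), and `ci` = half-width of a t-based 95% confidence interval) and uses the values from the Moroccan / FUTURE row; the variable names are my own.

```r
# Values copied from the Moroccan / FUTURE row of the summary table
n            <- 40
sd_agreement <- 0.5561774

se <- sd_agreement / sqrt(n)       # standard error of the mean
ci <- se * qt(0.975, df = n - 1)   # half-width of the 95% confidence interval

round(c(se = se, ci = ci), 4)
```

Both values should agree with the `se` and `ci` columns reported for that row (about 0.088 and 0.178), which is why the error bars in the figure below are drawn as mean ± `se`.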
Try to recreate Figure 2 (fig2.png, also included in the same folder as this Rmd file):
ggplot(data = df_summ, aes(y = avg_agreement, x = group, fill = subscale)) +
  geom_bar(position = position_dodge(), stat = 'identity') +
  geom_errorbar(aes(ymin = avg_agreement - se, ymax = avg_agreement + se),
                width = .2, position = position_dodge(.9)) +
  coord_cartesian(ylim = c(2.0, 4.0)) +
  scale_y_continuous(breaks = seq(2.00, 4.00, 0.25)) +
  theme(legend.direction = "vertical",
        legend.background = element_rect(fill = "transparent"),
        axis.line = element_line(),
        panel.grid = element_blank(),
        panel.background = element_blank(),
        plot.title = element_text(hjust = 0.5)) +
  labs(x = "Group", y = "Rating", fill = "Temporal Focus")
According to a mixed analysis of variance (ANOVA) with group (Spanish vs. Moroccan) as a between-subjects factor and temporal focus (past vs. future) as a within-subjects factor, temporal focus differed significantly between Spaniards and Moroccans, as indicated by a significant interaction of temporal focus and group, F(1, 78) = 19.12, p = .001, ηp2 = .20 (Fig. 2).
# two-way mixed-design ANOVA using the base R aov() function
aov_mix <- aov(avg_agreement ~ Moroccan*Future + Error(id/Future), data = df_tidy)
summary(aov_mix)
##
## Error: id
## Df Sum Sq Mean Sq F value Pr(>F)
## Residuals 39 8.549 0.2192
##
## Error: id:Future
## Df Sum Sq Mean Sq F value Pr(>F)
## Future 1 4.151 4.151 6.974 0.0118 *
## Residuals 39 23.215 0.595
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Error: Within
## Df Sum Sq Mean Sq F value Pr(>F)
## Moroccan 1 0.604 0.604 1.917 0.17
## Moroccan:Future 1 9.815 9.815 31.164 3.32e-07 ***
## Residuals 78 24.565 0.315
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
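In the `aov()` output above, the `Error: id` stratum has only 39 residual degrees of freedom and the group effect (`Moroccan`) is tested in the `Within` stratum. One possible explanation is that participant numbers are reused across the two groups (1 to 40 in each), so `aov()` sees 40 participants instead of 80. The sketch below illustrates that pattern on made-up data; `toy`, `uid`, and the scores are hypothetical, and building a unique identifier with `interaction()` is just one way to address it.

```r
# Hypothetical toy data: 2 groups x 4 participants x 2 subscales.
# Participant numbers 1-4 are reused in both groups; scores are random.
set.seed(1)
toy <- expand.grid(
  id       = 1:4,
  group    = c("Moroccan", "Spaniard"),
  subscale = c("PAST", "FUTURE")
)
toy$score <- rnorm(nrow(toy), mean = 3, sd = 0.5)
toy$id    <- factor(toy$id)
# Combine group and participant number into an identifier unique per person
toy$uid   <- interaction(toy$group, toy$id, drop = TRUE)

# Reused ids: group ends up tested in the "Within" stratum
fit_reused <- aov(score ~ group * subscale + Error(id / subscale), data = toy)
# Unique ids: group is tested between participants
fit_unique <- aov(score ~ group * subscale + Error(uid / subscale), data = toy)

summary(fit_reused)
summary(fit_unique)
```

With the unique identifier, the interaction is tested against the participant-by-subscale residual, which is the error term a mixed-design ANOVA is expected to use.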
# reproduce the above results here: two-way mixed ANOVA using rstatix package
mix_anova <- anova_test(data = df_tidy, dv = avg_agreement, wid = id, between = Moroccan, within = Future)
get_anova_table(mix_anova)
## ANOVA Table (type II tests)
##
## Effect DFn DFd F p p<.05 ges
## 1 Moroccan 1 78 2.881 9.40e-02 0.011
## 2 Future 1 78 8.098 6.00e-03 * 0.069
## 3 Moroccan:Future 1 78 19.145 3.71e-05 * 0.148
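This test (mixed ANOVA using the rstatix package) gives me results much closer to the original work: the interaction is F(1, 78) = 19.145 here, compared with F(1, 78) = 19.12 in the paper, whereas the base R `aov()` call above gave F(1, 78) = 31.164.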
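The `ges` column in the rstatix table is generalized eta squared, whereas the paper reports partial eta squared (ηp2 = .20), so the two effect sizes are not directly comparable. For a single effect, partial eta squared can be recovered from the F value and its degrees of freedom; the sketch below applies that identity to the interaction row (variable names are my own). As far as I know, `anova_test()` also accepts `effect.size = "pes"`, but I have not rerun the analysis with it here.

```r
# Interaction row (Moroccan:Future) from the rstatix table above
f_value <- 19.145
df_num  <- 1
df_den  <- 78

# Partial eta squared recovered from F and its degrees of freedom
pes <- (f_value * df_num) / (f_value * df_num + df_den)
round(pes, 2)
```

This rounds to 0.2, in line with the ηp2 = .20 reported in the paper.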
Moroccans showed greater agreement with past-focused statements than Spaniards did, t(78) = 4.04, p = .001,
# reproduce the above results here
df_past <- df_tidy %>%
  filter(Future == 0)
t1 <- t.test(avg_agreement ~ group, data = df_past, alternative = "two.sided", conf.level = 0.95)
t1
##
## Welch Two Sample t-test
##
## data: avg_agreement by group
## t = 4.0034, df = 76.872, p-value = 0.0001428
## alternative hypothesis: true difference in means between group Moroccan and group Spaniard is not equal to 0
## 95 percent confidence interval:
## 0.3106955 0.9256681
## sample estimates:
## mean in group Moroccan mean in group Spaniard
## 3.293182 2.675000
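The test statistic is close to the original (t = 4.00 here vs. t = 4.04 in the paper), although the Welch test used here has df = 76.87 rather than the 78 reported.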
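The paper's 78 degrees of freedom correspond to a pooled-variance (Student) t-test, while `t.test()` defaults to Welch's test. With equal group sizes the two tests give the same t statistic and differ only in the degrees of freedom (and therefore slightly in the p-value). The sketch below shows this on made-up data; `toy` and its means and SDs are hypothetical, not the study values.

```r
set.seed(42)
# Hypothetical ratings: two groups of 40 with made-up means and SDs
toy <- data.frame(
  group = rep(c("Moroccan", "Spaniard"), each = 40),
  avg_agreement = c(rnorm(40, mean = 3.3, sd = 0.7),
                    rnorm(40, mean = 2.7, sd = 0.6))
)

# Default: Welch's test (var.equal = FALSE)
welch   <- t.test(avg_agreement ~ group, data = toy)
# Pooled-variance Student's test, df = n1 + n2 - 2
student <- t.test(avg_agreement ~ group, data = toy, var.equal = TRUE)

c(welch_t  = unname(welch$statistic), student_t  = unname(student$statistic))
c(welch_df = unname(welch$parameter), student_df = unname(student$parameter))
```

Adding `var.equal = TRUE` to the calls in this report would therefore change the degrees of freedom to 78 but not the t values, so it would not by itself explain the remaining gap to the published statistics.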
and Spaniards showed greater agreement with future-focused statements than Moroccans did, t(78) = −3.32, p = .001. (de la Fuente et al., 2014, p. 1685)
# reproduce the above results here
df_future <- df_tidy %>%
  filter(Future == 1)
t2 <- t.test(avg_agreement ~ group, data = df_future, alternative = "two.sided", conf.level = 0.95)
t2
##
## Welch Two Sample t-test
##
## data: avg_agreement by group
## t = -3.3637, df = 73.02, p-value = 0.001228
## alternative hypothesis: true difference in means between group Moroccan and group Spaniard is not equal to 0
## 95 percent confidence interval:
## -0.5932088 -0.1517912
## sample estimates:
## mean in group Moroccan mean in group Spaniard
## 3.1200 3.4925
The test statistic is similar here as well (t = -3.36 vs. t = -3.32 in the paper), again with Welch degrees of freedom (73.02 rather than 78).
tab <- map_df(list(t1, t2), tidy)
tab <- tab %>% add_column("subscale" = c("Past", "Future"))
tab <- tab %>% select(c("subscale", "estimate", "estimate1", "estimate2", "statistic", "p.value", "conf.low", "conf.high", "alternative"))
kable(tab, caption = "t-test results (Mean agreement by group) for Past-focused statements and Future-focused statements")
| subscale | estimate | estimate1 | estimate2 | statistic | p.value | conf.low | conf.high | alternative |
|---|---|---|---|---|---|---|---|---|
| Past | 0.6181818 | 3.293182 | 2.6750 | 4.003398 | 0.0001428 | 0.3106955 | 0.9256681 | two.sided |
| Future | -0.3725000 | 3.120000 | 3.4925 | -3.363653 | 0.0012279 | -0.5932088 | -0.1517912 | two.sided |
Were you able to reproduce the results you attempted to reproduce? If not, what part(s) were you unable to reproduce?
For the most part, I was able to get similar numbers, but I could not exactly reproduce the results. I ran the mixed-design ANOVA with two different functions (base R `aov()` and `anova_test()` from rstatix) and got different test statistics; only the rstatix result (F = 19.145) was close to the original (F = 19.12). The two t statistics (4.00 and -3.36) were close to, but not identical to, the published values (4.04 and -3.32).
How difficult was it to reproduce your results?
It was quite difficult. The dataset was not in a tidy format, so I had to tidy the data first. There were also errors in the dataset, which forced me to make some judgment calls.
What aspects made it difficult? What aspects made it easy?
First, there were errors in the data (for participant 24). I was not sure what happened, but it looked like a coding error. Instead of dropping the affected observations, I modified the data (the participant number, to be exact), which might have caused discrepancies between my results and the original authors'. Second, the dataset was not in a tidy format, so it took some time to clean up. What made it really difficult to reproduce the results was the lack of clarity about the statistical models used; I had to speculate about what had been done in order to proceed. Having a codebook made it easier to understand the data. Having the data in long format also helped, because it made the data structure easy to read and understand.