Replication of Study ‘What makes words special? Words as unmotivated cues’ (2015, Cognition)

Author

Replication Author[s]: Jenna Brooks (j8brooks@ucsd.edu)

Published

April 8, 2025

Resources

Introduction

This study aimed to explore whether verbal labels, such as the words “dog” or “guitar,” activate conceptual knowledge more effectively than environmental sounds associated with these objects, such as the bark of a dog or the strum of a guitar. Previous studies show that environmental sounds, like a dog bark, activate specific instances of a concept, while words, like “dog,” cue broader, more abstract mental representations (Lupyan & Thompson-Schill, 2012). This study investigates this “label advantage” and tests whether verbal labels promote faster, more abstract mental processing compared to environmental sounds in an image recognition task.

The original study found that verbal labels (words) are more effective than sounds in activating abstract category concepts because labels act as “unmotivated cues,” broadly representing a category without specific reference to particular instances. In this replication, participants hear either a word or an environmental sound for one of the following categories: bird, dog, drum, guitar, motorcycle, and phone. A picture is displayed 1 second after the auditory cue, and participants are tested on how quickly and accurately they can determine whether the picture matches the cue, with reaction time serving as the primary measure. This research seeks to deepen our understanding of how verbal labels enhance abstract categorization compared to environmental sounds by examining their influence on reaction times in an image recognition task.

Methods

Power Analysis

Based on guidance from instructional staff, the sample size was determined with an a priori power analysis with the package simr, and is adequate to achieve at least 80% power for detecting the effect reported in the original study at a significance criterion of alpha = .05 (any random effects not specified in the original paper were taken from a small pilot study).
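A simulation-based power analysis of this kind could be sketched roughly as follows. This is only an illustration, not the analysis that was actually run: the design, the simplified model (a single numeric predictor `is_label` and a random intercept per participant), and every parameter value below are hypothetical placeholders, and `nsim` is kept small so the example runs quickly.

```r
library(lme4)
library(simr)

set.seed(2025)

# Hypothetical design: 50 participants x 2 cue types x 20 trials each
design <- expand.grid(ID = factor(1:50), is_label = c(0, 1), trial = 1:20)

# Hypothetical parameters (ms): intercept, label effect, variance components
sim_model <- makeLmer(
  rt ~ is_label + (1 | ID),
  fixef   = c(690, -50),  # assumed intercept and label effect
  VarCorr = 140^2,        # assumed between-participant intercept variance
  sigma   = 210,          # assumed residual standard deviation
  data    = design
)

# Estimate power for the label effect from repeated simulations
power_est <- powerSim(sim_model, test = fixed("is_label", "z"), nsim = 50)
print(power_est)
```

In practice, the parameter values would come from the original paper and the pilot data, and a much larger `nsim` would be used for a stable estimate.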

Planned Sample

We plan to have a sample size of n = 50. For pre-screening, participants must speak English fluently and not have any hearing difficulties to ensure comprehension in the task. Participants are recruited and compensated on the Prolific online platform.

Materials

We used the materials exactly as described in the original study. All materials were provided by the original authors.

“The auditory cues comprised basic-level category labels and environmental sounds for six categories: bird, dog, drum, guitar, motorcycle, and phone. For each category, we obtained two distinct environmental sound cues, e.g., , , and two separate images for each subordinate category, e.g., two electric guitars for , two acoustic guitars for . To control for cue variability, we also used two versions of each spoken category label: one pronounced by a female speaker, one by a male speaker. All auditory cues were equated in duration (600 ms.) and normalized in volume. The images were color photographs (four images per category). The materials, obtained from online repositories, are available for download at http://sapir.psych.wisc.edu/stimuli/MotivatedCuesExp1A-1B.zip” (Edmiston & Lupyan, 2015, p. 94).

The link to our online experiment can be found here

Procedure

This procedure will be followed as closely as possible:

“On each trial participants heard a cue and saw a picture. We instructed participants to decide as quickly and accurately as possible if the picture they saw came from the same basic-level category as the word or sound they heard. Participants were tested in individual rooms sitting approximately 24″ from a monitor such that images subtended 10 × 10°. Trials began with a 250 ms. fixation cross followed immediately by the auditory cue, delivered via headphones. The target image appeared centrally 1 s after the offset of the auditory cue and remained visible until a response was made. Each participant completed 6 practice and 384 test trials. If the picture matched the auditory cue (50% of trials) participants were instructed to respond ‘Yes’ on a gaming controller (e.g., or ‘phone’ followed by a picture of any phone). Otherwise, they were to press ‘No’ (e.g., or ‘phone’ followed by a dog). All factors (cue type, congruence) varied randomly within subjects. Auditory feedback (buzz or bleep) was given after each trial” (Edmiston & Lupyan, 2015, p. 94).

We aim to follow the original procedure of Experiment 1A as precisely as possible. However, instead of running the trials in person, the experiment will be conducted online using jsPsych. For this reason, the task will differ slightly: participants will respond to trials using their keyboard instead of a gaming controller. We will also encourage participants to wear headphones and to be in a quiet area during the experiment, and we will provide an initial audio check to ensure that participants can hear all stimuli presented throughout the experiment. In addition, we did not have access to the exact instructions and prompts that the original researchers used, so these may differ slightly.

Analysis Plan

Pre-Registration

Our Pre-Registration can be found here

From the original study:

“All participants performed very accurately on all items (M = 97%). Response times (RTs) shorter than 250 ms. or longer than 1500 ms. were removed (292 trials removed, 1.77% of total).

We fit RTs for correct responses on matching trials (‘Yes’ responses) with linear mixed regression using maximum likelihood estimation (Bates, Maechler, Bolker, & Walker, 2013), including random intercepts and random slopes for within-subject factors and random intercepts for repeated items (unique trial types) following the recommendations of Barr, Levy, Scheepers, and Tily (2013). Reported below are the parameter estimates (b) and confidence intervals for each contrast of interest. Significance tests were calculated using chi-square tests that compared nested models—models with and without the factor of interest—on improvement in log-likelihood”(Edmiston & Lupyan, 2015, p.94).

For our replication, we will also remove trials where response times are shorter than 250 ms or longer than 1500 ms from the analysis. We will then fit a similar linear mixed regression model, as was done in the original study, and use chi-square tests to assess the significance of the results. We anticipate that a successful replication will yield similar effects. More specifically, we hypothesize that verbal labels will elicit the lowest overall reaction times, and that congruent sounds will elicit lower reaction times than incongruent sounds.

Differences from Original Study

This replication will differ from the original study in both setting and procedure. Instead of being conducted in a lab, the experiments will take place online using Prolific. Variations in participants’ device performance and internet speed may influence reaction times. Additionally, participants will respond to trials using a keyboard rather than the gaming controller used previously. Despite these changes, we expect these differences will have minimal impact on the results or the ability to replicate the effect reported in the original study.

Methods Addendum (Post Data Collection)

Actual Sample

We collected data from 50 participants as planned, with 42 passing the data quality check (accuracy rate above 90%). Refer to the Results section for an overview of performance and the code for data quality assurance. We ensured that all participants were adult English speakers without hearing difficulties using Prolific screening criteria.

Differences from pre-data collection methods plan

None

Design Overview

Within-subjects design with 2 factors: Auditory Cue (Environmental Sound vs. Label) and Match to Basic Category (Match vs. No Match). Congruency is further manipulated within the Matching Environmental Sound condition.

Reaction time was the only measure taken.

The study uses a within-participants design in which every participant completes every condition.

Measures were repeated for 384 experimental trials.

No measures were taken to reduce demand characteristics.

To improve the experiment, attention checks could be added. Given that incorrect responses could be labeled incongruent, incorrect responses could have also been included in the data. The sample may not be representative of all populations because it only studied undergraduates and WEIRD populations.

Results

Data preparation

We are replicating Experiment 1A from the paper. The data cleaning process begins by loading necessary libraries and importing multiple CSV files into a combined dataframe. Practice trials are excluded to focus on actual experimental data. A data exclusion step ensures that only trials where responses match the correct responses are retained, and participant accuracy is calculated. Participants with an accuracy below 90% are excluded to maintain high data quality. Non-relevant responses are filtered out, and a congruency column is created to classify trials as congruent or incongruent. Reaction times (RTs) shorter than 250 ms or longer than 1500 ms are removed, and the count of trials before and after filtering is compared to verify exclusions. Finally, the dataset is refined to include only relevant columns (rt, ID, sound_category, cue, congruency) for analysis, ensuring it is clean and ready for further processing.

### Data Preparation

#### Load Relevant Libraries and Functions
library(tidyverse)

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.4     ✔ readr     2.1.5
✔ forcats   1.0.0     ✔ stringr   1.5.1
✔ ggplot2   3.5.1     ✔ tibble    3.2.1
✔ lubridate 1.9.3     ✔ tidyr     1.3.1
✔ purrr     1.0.2     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readr)
library(lme4)

Loading required package: Matrix

Attaching package: 'Matrix'

The following objects are masked from 'package:tidyr':

    expand, pack, unpack

#### Import data
#replace this with your own path (for now)
folder_path <- "../data/complete"
csv_files <- list.files(folder_path, pattern = "\\.csv$", full.names = TRUE)
df_list <- lapply(csv_files, read_csv, show_col_types = FALSE)
df <- bind_rows(df_list)

#exclude practice trials 
combined_df <- df %>%
  filter(exp_part == "actual")

# Data exclusion / filtering
combined_df <- combined_df %>%
  mutate(correct_response = as.character(correct_response)) %>%
  mutate(response = as.character(response)) %>%
  mutate(correct = correct_response == response)

accuracy_table <- combined_df %>%
  group_by(ID) %>%
  summarise(accuracy = mean(correct, na.rm = TRUE))
accuracy_table

# A tibble: 50 × 2
   ID                accuracy
   <chr>                <dbl>
 1 pilotb_0avgjvg9fq    0.909
 2 pilotb_1ek0dtzsue    0.982
 3 pilotb_1jvj8tk7kj    0.984
 4 pilotb_2hxtegq7gk    0.982
 5 pilotb_2xj1owf03n    0.961
 6 pilotb_3517eh85oh    0.979
 7 pilotb_42vlt395x7    0.974
 8 pilotb_5sqn8xrq7f    0.961
 9 pilotb_5uav4k1slc    0.844
10 pilotb_6fr0jrn133    0.961
# ℹ 40 more rows

# QA
accuracy_thresh <- 0.9
# select people who passed QA
id_pass_qa <- accuracy_table %>%
  filter(accuracy > accuracy_thresh) %>%
  pull(ID)
  
# filter combined_df
combined_df <- combined_df %>%
  filter(ID %in% id_pass_qa) %>%
  select(-correct) # drop the correctness column

#keep only correct responses on trials whose correct answer is "f"
combined_df <- combined_df %>%
  filter(response == "f") %>%
  filter(correct_response == "f")

#rename "sound_subtype" to "cue"
combined_df <- combined_df %>%
  rename(cue = sound_subtype)

#create congruency column 
combined_df <- combined_df %>%
  mutate(congruency = case_when(
    cue == "label" ~ "label",
    img_subtype %in% c("song", "york", "bongo", "acoustic", "harley", "rotary") & sound_version == "A" ~ "incongruent",
    TRUE ~ "congruent"
  ))

#calculate count before filtering
initial_count <- nrow(combined_df)
  
  
#filter reaction time 
combined_df <- combined_df %>%
  filter(rt >250, rt <1500)

#calculate count after filtering
final_count <- nrow(combined_df)

#compare
print(initial_count)

[1] 7732

print(final_count)

[1] 7328
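As a quick check on the size of this exclusion, the share of trials removed by the RT filter follows directly from the two counts printed above. This is a small sketch that reuses the printed values rather than the data; the names `removed` and `pct_removed` are illustrative.

```r
# Counts printed above (trials before and after the RT filter)
initial_count <- 7732
final_count <- 7328

removed <- initial_count - final_count
pct_removed <- 100 * removed / initial_count

cat(sprintf("Removed %d trials (%.1f%% of the trials entering the RT filter)\n",
            removed, pct_removed))
```

This gives 404 removed trials, roughly 5.2% of the trials that reached this step. The original study reports 1.77% "of total", so the two percentages may not share the same denominator.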

# Prepare data for analysis - create columns etc.
combined_df <- combined_df %>%
  select(rt, ID, sound_category,cue, congruency)

#show the preprocessed data
tail(combined_df)

# A tibble: 6 × 5
     rt ID                sound_category cue   congruency 
  <dbl> <chr>             <chr>          <chr> <chr>      
1   585 pilotb_zbcngypoxh drum           label label      
2   451 pilotb_zbcngypoxh guitar         label label      
3   446 pilotb_zbcngypoxh motor          sound congruent  
4   440 pilotb_zbcngypoxh bird           sound congruent  
5   427 pilotb_zbcngypoxh motor          sound congruent  
6   412 pilotb_zbcngypoxh dog            sound incongruent

Summary of participant performance (i.e., the proportion of participants who pass the quality check):

pass_thresh_rate <- accuracy_table %>%
  mutate(pass=accuracy > accuracy_thresh) %>%
  summarise(mean=mean(pass))

pass_thresh_rate

# A tibble: 1 × 1
   mean
  <dbl>
1  0.84

Confirmatory analysis

This analysis aims to determine whether (1) verbal labels lead to faster response times than environmental sound cues, and (2) congruent sounds produce faster responses than incongruent sounds, in line with the findings of Experiment 1A from the original study.

As outlined in our analysis plan, we adopt the approach used by Edmiston & Lupyan (2015), modeling reaction times for “Yes” responses in matching trials across three cue conditions (verbal label, congruent sound, incongruent sound) using a linear mixed-effects regression model. The model includes by-participant random intercepts and random slopes for the within-subject factor, and random intercepts for repeated items (here, the sound category, such as “bird,” “guitar,” or “dog”). The primary fixed effect of interest in the model is the “condition” variable (named congruency in the code), which categorizes each trial as presenting a label, a congruent sound, or an incongruent sound.

To evaluate the significance of the condition effect, we conduct chi-square tests to compare nested models—with and without the condition variable—based on improvements in log-likelihood (as described on p. 94 of the original study). This analysis aims to test the central hypothesis that condition modulates reaction times.

The parameter estimates derived from the model provide insight into the magnitude and direction of the effect of each condition on reaction time. Additionally, the model facilitates direct comparisons between conditions, enabling us to assess the relative effects of ‘label,’ ‘congruent sound,’ and ‘incongruent sound’ conditions. Pairwise comparisons are also performed to specifically examine the relationship between ‘label’ and ‘incongruent’ trials, a contrast not directly represented in the baseline model where the reference level is set to one of the conditions.
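To illustrate why the label vs. incongruent comparison needs a separate contrast, the sketch below fits an ordinary linear model to a small set of hypothetical RTs (not data from this replication) with "congruent" as the reference level. Under this treatment coding, each coefficient compares one condition with the reference, so the label vs. incongruent difference has to be computed as a difference between two coefficients.

```r
# Hypothetical toy RTs (ms); not data from the replication
toy <- data.frame(
  congruency = factor(rep(c("congruent", "label", "incongruent"), each = 4),
                      levels = c("congruent", "label", "incongruent")),
  rt = c(680, 700, 690, 694,   # congruent
         630, 650, 640, 636,   # label
         700, 712, 706, 702)   # incongruent
)

# "congruent" is the first level, so it is the reference
fit <- lm(rt ~ congruency, data = toy)
b <- coef(fit)

# Contrast not given directly by the coefficients: label minus incongruent
label_vs_incongruent <- unname(b["congruencylabel"] - b["congruencyincongruent"])

# The same quantity from the raw cell means
cell_means <- tapply(toy$rt, toy$congruency, mean)
print(label_vs_incongruent)
print(unname(cell_means["label"] - cell_means["incongruent"]))
```

The mixed model used in the analysis follows the same coding logic, with random effects added.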

We anticipate that the model outputs will align closely with the results reported by Edmiston & Lupyan (2015). More specifically, incongruent environmental sounds will elicit longer response times in comparison to congruent environmental sounds. Verbal labels will elicit the shortest response time.

Our Graph

#load libraries: 
library(lmerTest)


Attaching package: 'lmerTest'

The following object is masked from 'package:lme4':

    lmer

The following object is masked from 'package:stats':

    step

library(emmeans)

Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'

library(lme4)
library(ggplot2)

####visualization####

# Reorder the 'congruency' factor levels so that "Label" comes first
combined_df$congruency <- factor(combined_df$congruency, levels = c("label", "congruent", "incongruent"))

#bar plot (cf. Fig. 2 of the original paper)
ggplot(data = combined_df,
       mapping = aes(x = congruency,
                     y = rt,
                     color = congruency, 
                     fill = congruency)) +
  geom_bar(stat = "summary", fun = "mean", 
           width = 0.4, 
           color = "black") + 
  stat_summary(fun.data = "mean_cl_boot", 
               geom = "linerange", 
               color = "black") +
   geom_errorbar(stat = "summary", 
                fun.data = "mean_cl_boot", 
                width = 0.2, 
                color = "black") +
  scale_fill_manual(values = c("darkred", "darkblue", "darkgreen")) +
  scale_color_manual(values = c("darkred", "darkblue", "darkgreen")) +
  labs(
    x = "Cue Type",
    y = "Reaction Time (ms)",
    color = "Cue Type",
    fill = "Cue Type"
  ) +
  scale_y_continuous(expand = c(0, 0)) +
  theme_classic() +
  theme(
  axis.text.x = element_blank(), 
  axis.text.y = element_text(size = 14, color = "black"),
  axis.title.x = element_text(size = 16, color = "black", face = "bold"), 
  axis.title.y = element_text(size = 16, color = "black", face = "bold"), 
  legend.title = element_text(size = 14, color = "black"), 
  legend.text = element_text(size = 14, color = "black") 
)

######statistical analysis########

#find best optimizer
#allFit(model_full)

# set reference level back to congruent
combined_df$congruency <- relevel(combined_df$congruency, ref = "congruent")

#set up model and view results
model_full <- lmer(rt ~ congruency + (1 + congruency|ID) + (1|sound_category), data = combined_df, control = lmerControl(optimizer = "bobyqa"))

summary(model_full)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: rt ~ congruency + (1 + congruency | ID) + (1 | sound_category)
   Data: combined_df
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 99407.8

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3748 -0.6827 -0.1974  0.4509  4.2050 

Random effects:
 Groups         Name                  Variance Std.Dev. Corr       
 ID             (Intercept)           18945.80 137.644             
                congruencylabel        1768.01  42.048  -0.46      
                congruencyincongruent   822.77  28.684  -0.10  0.30
 sound_category (Intercept)              16.17   4.021             
 Residual                             44351.37 210.598             
Number of obs: 7328, groups:  ID, 42; sound_category, 6

Fixed effects:
                      Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)            690.928     21.686  41.062  31.860  < 2e-16 ***
congruencylabel        -52.267      8.390  39.660  -6.229 2.33e-07 ***
congruencyincongruent   15.047      9.365  41.783   1.607    0.116    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) cngrncyl
congrncylbl -0.439         
cngrncyncng -0.126  0.317

#get 95% CI
confint.merMod(model_full,method="Wald")

                           2.5 %    97.5 %
.sig01                        NA        NA
.sig02                        NA        NA
.sig03                        NA        NA
.sig04                        NA        NA
.sig05                        NA        NA
.sig06                        NA        NA
.sig07                        NA        NA
.sigma                        NA        NA
(Intercept)           648.424554 733.43232
congruencylabel       -68.712168 -35.82266
congruencyincongruent  -3.307674  33.40179
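The Wald intervals above can be reproduced by hand as estimate ± z × standard error, which may help in reading the table. The sketch below copies the estimates and standard errors from the model summary; the vector names are illustrative.

```r
# Fixed-effect estimates and standard errors copied from summary(model_full)
est <- c(label = -52.267, incongruent = 15.047)
se  <- c(label = 8.390,  incongruent = 9.365)

z <- qnorm(0.975)  # about 1.96 for a two-sided 95% interval
wald_ci <- cbind(lower = est - z * se, upper = est + z * se)
print(round(wald_ci, 2))
```

Up to rounding of the copied values, this matches the intervals printed by confint.merMod.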

#post-hoc test: pairwise contrasts, including incongruent vs. label
contrast_results <- model_full %>%
  emmeans(pairwise ~ congruency,
          #adjusts p values so that it is more difficult to get a significance (correction method)
          adjust = "bonferroni", 
          conf.int = TRUE) %>%
  pluck("contrasts")

Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, add the argument 'pbkrtest.limit = 7328' (or larger)
[or, globally, 'set emm_options(pbkrtest.limit = 7328)' or larger];
but be warned that this may result in large computation time and memory use.

Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, add the argument 'lmerTest.limit = 7328' (or larger)
[or, globally, 'set emm_options(lmerTest.limit = 7328)' or larger];
but be warned that this may result in large computation time and memory use.

#get 95% CIs for the pairwise contrasts from the emmeans output
confint(contrast_results, calc = c(n = ~.wgt.))

Warning in summary.emmGrid(object, infer = c(TRUE, FALSE), level = level, : The
column 'n' could not be calculated, so it is omitted

 contrast                estimate    SE  df asymp.LCL asymp.UCL
 congruent - label           52.3  8.39 Inf      32.2     72.35
 congruent - incongruent    -15.0  9.36 Inf     -37.5      7.37
 label - incongruent        -67.3 10.40 Inf     -92.2    -42.41

Degrees-of-freedom method: asymptotic 
Confidence level used: 0.95 
Conf-level adjustment: bonferroni method for 3 estimates
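These intervals are wider than the unadjusted Wald intervals because of the Bonferroni adjustment for three contrasts. A rough reconstruction, assuming asymptotic (normal) critical values as stated in the output, is sketched below. The estimates are taken from the fixed effects (the label vs. incongruent value is the difference of the two coefficients) and the standard errors from the table above.

```r
# Contrast estimates and standard errors (ms)
est <- c("congruent - label"       =  52.267,
         "congruent - incongruent" = -15.047,
         "label - incongruent"     = -52.267 - 15.047)
se <- c(8.390, 9.365, 10.40)

k <- length(est)                      # number of contrasts
z_bonf <- qnorm(1 - 0.05 / (2 * k))   # Bonferroni-adjusted critical value

bonf_ci <- cbind(lower = est - z_bonf * se, upper = est + z_bonf * se)
print(round(bonf_ci, 2))
```

The adjusted critical value is about 2.39 rather than 1.96, which accounts for the wider intervals.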

#chi-squared significance test
model_reduced <- lmer(rt ~ (1 + congruency|ID) + (1|sound_category), data = combined_df)
anova(model_full, model_reduced)

refitting model(s) with ML (instead of REML)

Data: combined_df
Models:
model_reduced: rt ~ (1 + congruency | ID) + (1 | sound_category)
model_full: rt ~ congruency + (1 + congruency | ID) + (1 | sound_category)
              npar   AIC   BIC logLik deviance  Chisq Df Pr(>Chisq)    
model_reduced    9 99481 99543 -49731    99463                         
model_full      11 99450 99526 -49714    99428 34.967  2  2.553e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
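The p-value in this table can be checked from the chi-square statistic and the difference in the number of parameters between the two models. The sketch below uses only values printed in the table; the variable names are illustrative.

```r
# Values copied from the anova() table above
chisq <- 34.967
df_diff <- 11 - 9   # npar of model_full minus npar of model_reduced

p_value <- pchisq(chisq, df = df_diff, lower.tail = FALSE)
print(p_value)
```

The statistic itself is the difference in deviance between the two models (99463 vs. 99428 after rounding).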

The statistical analysis showed that labels elicited faster reaction times than congruent sounds (b = -52.27, 95% CI [-68.71, -35.82]), whereas congruent sounds were not significantly faster than incongruent sounds (b = 15.05, 95% CI [-3.31, 33.40]). Furthermore, the pairwise contrast showed that labels elicited faster reaction times than incongruent sounds (b = -67.3, Bonferroni-adjusted 95% CI [-92.2, -42.41]).

Graph Comparison

library(gridExtra)


Attaching package: 'gridExtra'

The following object is masked from 'package:dplyr':

    combine

library(jpeg)
library(grid)
library(dplyr)
library(forcats)
library(ggplot2)


# rename and reorder some columns for plotting
plot_df <- combined_df |>
  mutate(congruency=factor(
    congruency, levels = c("label", "congruent", "incongruent"))) |>
  mutate(congruency = fct_recode(congruency, 
    "Label"="label", 
    "Congruent Sound"="congruent", 
    "Incongruent Sound"="incongruent")
  )

# plot the result of our replication
our_plot <- ggplot(data = plot_df,
       mapping = aes(x = congruency,
                     y = rt,
                     color = congruency, 
                     fill = congruency)) +
  geom_bar(stat = "summary", fun = "mean", 
           width = 1,
           color = "black") + 
  stat_summary(fun.data = "mean_cl_boot", 
               geom = "linerange", 
               color = "black") +
   geom_errorbar(stat = "summary", 
                fun.data = "mean_cl_boot", 
                width = 0.5, 
                size=0.5,
                color = "black") +
  scale_fill_manual(values = c("firebrick", "darkblue", "slategray2")) +
  scale_color_manual(values = c("firebrick", "darkblue", "slategray2")) +
  labs(
    x = "Experiment 1A",
    y = "Verification Speed (ms)",
    color = "Cue Type",
    fill = "Cue Type") +
  coord_cartesian(xlim=c(0, 4), ylim = c(400, 725), expand=FALSE) +
  scale_y_continuous(breaks = seq(400, 700, by = 50), expand=c(0, 25)) +
  theme_classic() +
  theme(
    axis.text.x = element_blank(), 
    axis.text.y = element_text(size = 10, color = "black"),
    axis.title.x = element_text(size = 10, color = "black"), 
    axis.title.y = element_text(size = 10, color = "black"), 
    legend.title = element_text(size = 10, color = "black", face='bold'), 
    legend.text = element_text(size = 10, color = "black"),
    plot.margin = unit(c(0.0, 0.1, 0.0, 0.1), "npc")
  ) +
  theme(
    aspect.ratio = 1.8)

Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.

# load the plot from the original study
edmiston_img <- rasterGrob(readJPEG("edmiston_exp1a.jpg"), interpolate = TRUE)

# display side by side
plot_title <- textGrob("Our Result", gp = gpar(fontface = "bold", fontsize = 14))
image_title <- textGrob("Original Study Result", gp = gpar(fontface = "bold", fontsize = 14))

grid.arrange(
  arrangeGrob(image_title, edmiston_img, ncol = 1, heights = c(0.5, 5)),
  arrangeGrob(plot_title, ggplotGrob(our_plot), ncol = 1, heights = c(0.5, 5)),
  ncol = 2, widths=c(1, 1.7))

The graph on the left, from the original paper (Experiment 1A), shows a significant difference between the label and congruent sound conditions, as well as a significant difference between congruent and incongruent sounds. Our graph (on the right) differs: our data show a difference between label and congruent sound, but no statistically significant difference between the congruent and incongruent conditions.

Exploratory analyses

The original results were presented using a bar graph. However, we were curious about the distribution of the data, especially since our results were slightly different. To investigate, we included a box plot that shows the individual data points. We also expand on the results shown in the original paper by re-running the same analysis on ‘No’ responses instead of ‘Yes’ responses.

#box plot
combined_df$congruency <- factor(combined_df$congruency, levels = c("label", "congruent", "incongruent"))
ggplot(data = combined_df,
       mapping = aes(x = congruency,
                     y = rt,
                     color = congruency)) +
  geom_boxplot(width = 0.6, outlier.shape = NA) + 
  geom_jitter(width = 0.2, alpha = 0.1, size = 1.5) +
  scale_color_manual(values = c("darkred", "darkblue", "darkgreen")) +
  labs(
    y = "Reaction Time (ms)",
    color = "Cue Type",
    x = "Cue Type",
    fill = "Cue Type "
  ) +
  theme_classic() +
  theme(
  axis.text.x = element_blank(), 
  axis.text.y = element_text(size = 14, color = "black"),
  axis.title.x = element_text(size = 16, color = "black", face = "bold"), 
  axis.title.y = element_text(size = 16, color = "black", face = "bold"), 
  legend.title = element_text(size = 14, color = "black"), 
  legend.text = element_text(size = 14, color = "black") 
)

#re-run statistics on "No" data
df_n <- df %>%
  filter(exp_part == "actual") %>%
  filter(response == "f") %>%
  filter(correct_response == "f") %>%
  rename(cue = sound_subtype) %>%
  mutate(congruency = case_when(
    cue == "label" ~ "label",
    img_subtype %in% c("song", "york", "bongo", "acoustic", "harley", "rotary") & sound_version == "A" ~ "incongruent",
    TRUE ~ "congruent"
  )) %>%
  filter(rt >250, rt <1500)

  
model_n <- lmer(rt ~ congruency + (1 + congruency|ID) + (1|sound_category), data = df_n, control = lmerControl(optimizer = "bobyqa"))
summary(model_n)

Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: rt ~ congruency + (1 + congruency | ID) + (1 | sound_category)
   Data: df_n
Control: lmerControl(optimizer = "bobyqa")

REML criterion at convergence: 109825.9

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4104 -0.6811 -0.1950  0.4574  4.1140 

Random effects:
 Groups         Name                  Variance Std.Dev. Corr       
 ID             (Intercept)           20633.8  143.645             
                congruencyincongruent   827.9   28.772  -0.02      
                congruencylabel        1864.1   43.175  -0.34  0.27
 sound_category (Intercept)              16.6    4.075             
 Residual                             46409.2  215.428             
Number of obs: 8068, groups:  ID, 48; sound_category, 6

Fixed effects:
                      Estimate Std. Error      df t value Pr(>|t|)    
(Intercept)            708.103     21.185  46.671  33.425  < 2e-16 ***
congruencyincongruent   12.297      9.124  44.482   1.348    0.185    
congruencylabel        -51.721      8.131  45.375  -6.361 8.77e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) cngrncyn
cngrncyncng -0.092         
congrncylbl -0.350  0.308

confint.merMod(model_full,method="Wald")

                           2.5 %    97.5 %
.sig01                        NA        NA
.sig02                        NA        NA
.sig03                        NA        NA
.sig04                        NA        NA
.sig05                        NA        NA
.sig06                        NA        NA
.sig07                        NA        NA
.sigma                        NA        NA
(Intercept)           648.424554 733.43232
congruencylabel       -68.712168 -35.82266
congruencyincongruent  -3.307674  33.40179

Looking at the graph with the data distribution, it is clear that there are quite a few participants who had long response times for congruent trials. This could have increased the mean response time for this category, thus creating a less straightforward distinction in response times between congruent and incongruent trials. It suggests there could be some confusion on the distinction between a congruent and incongruent sound.

Furthermore, we get the same trends when the statistical model is run on the “No” responses: the difference between label and congruent is significant, but not the difference between congruent and incongruent sound.

Discussion

Summary of Replication Attempt

The main finding of the original Experiment 1A replicated successfully: labels elicited faster picture verification than environmental sounds. However, the results replicated only partially in our sample, because there was no statistically significant difference in response times between congruent and incongruent sounds, although the difference was in the predicted direction. This failure to fully replicate could be explained by a number of factors. The first is a difference in experimental environment: the original data were collected in a lab using gaming controllers, while ours were collected online using keyboards. In addition, the distinction between congruent and incongruent stimuli may have been confusing (for example, the difference between a hawk and a chickadee sound) and may not generalize to a wider audience, which may explain the lack of a significant difference for this comparison.

Commentary

Our study replicated the difference between the label and congruent conditions, but not the difference between the congruent and incongruent conditions. While I initially believed the findings would replicate, after creating the paradigm and doing the study myself, I was more skeptical about the difference between the congruent and incongruent sound conditions, but was confident about the primacy of the “label advantage,” which had been demonstrated previously (Lupyan & Thompson-Schill, 2012). The main contribution Edmiston & Lupyan (2015) intended to make was to show and quantify faster response times to labels, i.e., the “label advantage,” which they attributed to the extra processing time it takes to decide whether the sound/picture combination is congruent or incongruent. In their own words, “the label-advantage is obtained precisely because labels are detached from idiosyncracies of specific category members, and thereby able to selectively activat[e] the features/dimensions most diagnostic of the named category”, which distinguishes words as generalized, unmotivated cues (Edmiston & Lupyan, 2015, p. 98).

On the other hand, sounds are motivated cues that represent specific instances and require more mental processing time. This increased specificity of environmental sounds leads to longer reaction times, which the authors attribute to the fact that “people appear unable to fully detach environmental sounds from perceptual details corresponding to their causes” (Edmiston & Lupyan, 2015, p. 98). Thus, our findings provide clear evidence for a label advantage, but the congruent/incongruent manipulation does not help to explain why that advantage occurs. Because there is no significant difference in reaction times between congruent and incongruent sounds, our results cannot support the claim that sounds take longer to process because they imply more specific instances, only that words have an advantage over sounds in general.

However, the differences in our results could also be attributed to the environment of our study (online) vs. the original study (conducted in the lab). Our study was over 25 minutes long and did not include any breaks, so participants may have found it particularly difficult to concentrate on the task in an online setting. Additionally, the congruent and incongruent distinction could have been difficult for participants to interpret and may not generalize to a broader audience that is not necessarily well versed in the difference between bird sounds, barks of different-sized dogs, or motorcycle sounds. In conclusion, we can confidently say that words are special in comparison to environmental sounds, yet the reasoning for this should still be investigated further.

Statement of Contributions

Jenna Brooks: Project Administration (lead), Software (jsPsych Experiment) (supporting), Writing and Editing (equal)
Noah Khaloo: Formal Analysis (lead), Data Curation (lead), Statistical Analysis and Visualization (lead), Writing and Editing (equal)
Sihan Yang: Conceptualization (lead), Software (jsPsych Experiment) (lead), Writing and Editing (equal)
Reeka Estacio: Software (jsPsych Experiment) (supporting), Validation (lead), Presentation (lead), Writing and Editing (equal)

Citations

Edmiston, P., & Lupyan, G. (2015). What makes words special? Words as unmotivated cues. Cognition, 143, 93–100. http://dx.doi.org/10.1016/j.cognition.2015.06.008
Lupyan, G., & Thompson-Schill, S. L. (2012). The evocative power of words: Activation of concepts by verbal and nonverbal means. Journal of Experimental Psychology: General, 141(1), 170–186. https://doi.org/10.1037/a0024904