This study aimed to explore whether verbal labels, such as the words “dog” or “guitar,” activate conceptual knowledge more effectively than environmental sounds associated with these objects, such as the bark of a dog or the strum of a guitar. Previous studies show that environmental sounds, like a dog bark, activate specific instances of a concept, while words, like “dog,” cue broader, more abstract mental representations (Lupyan & Thompson-Schill, 2012). This study investigates this “label advantage” and tests whether verbal labels promote faster, more abstract mental processing compared to environmental sounds in an image recognition task.
This study finds that verbal labels (or words) are more effective than sounds in activating abstract category concepts because labels act as “unmotivated cues,” broadly representing a category without specific reference to particular instances. In this experiment, participants will be presented with either a word or environmental sound for the following categories: bird, dog, drum, guitar, motorcycle, and phone. Participants are presented with an auditory cue (either a word or a sound), followed by a picture displayed 1 second later, and are tested on how quickly and accurately they can determine whether the picture matches the auditory cue, with reaction times serving as the primary measure. This research seeks to deepen our understanding of how verbal labels enhance abstract categorization compared to environmental sounds by examining their influence on reaction time speed in an image recognition task.
Methods
Power Analysis
Based on guidance from instructional staff, the sample size was determined with an a priori power analysis with the package simr, and is adequate to achieve at least 80% power for detecting the effect reported in the original study at a significance criterion of alpha = .05 (any random effects not specified in the original paper were taken from a small pilot study).
Planned Sample
We plan to have a sample size of n = 50. For pre-screening, participants must speak English fluently and not have any hearing difficulties to ensure comprehension in the task. Participants are recruited and compensated on the Prolific online platform.
Materials
The materials were followed precisely as follows. All materials were provided by the original authors.
“The auditory cues comprised basic-level category labels and environmental sounds for six categories: bird, dog, drum, guitar, motorcycle, and phone. For each category, we obtained two distinct environmental sound cues, e.g., , , and two separate images for each subordinate cate- gory, e.g., two electric guitars for , two acoustic guitars for . To control for cue variability, we also used two versions of each spoken category label: one pronounced by a female speaker, one by a male speaker. All auditory cues were equated in duration (600 ms.) and normalized in volume. The images were color photographs (four images per category). The materials, obtained from online repositories, are available for download at http://sapir.psych.wisc.edu/stimuli/ MotivatedCuesExp1A-1B.zip”(Edmiston & Lupyan, 2015, p.94).
The link to our online experiment can be found here
Procedure
This procedure will be followed as closely as possible:
“On each trial participants heard a cue and saw a picture. We instructed participants to decide as quickly and accurately as possible if the picture they saw came from the same basic-level cate- gory as the word or sound they heard Participants were tested in individual rooms sitting approximately 2400 from a monitor such that images subtended 10 10°. Trials began with a 250 ms. fixation cross followed immediately by the auditory cue, delivered via headphones. The target image appeared centrally 1 s after the off- set of the auditory cue and remained visible until a response was made. Each participant completed 6 practice and 384 test trials. If the picture matched the auditory cue (50% of trials) participants were instructed to respond ‘Yes’ on a gaming controller (e.g., or ‘‘phone’’ followed by a picture of any phone). Otherwise, they were to press ‘No’ (e.g., or ‘‘phone’’ followed by a dog). All factors (cue type, congruence) var- ied randomly within subjects. Auditory feedback (buzz or bleep) was given after each trial”(Edmiston & Lupyan, 2015, p. 94).
We aim to follow the original procedure of Experiment 1A as precisely as possible. However, instead of running the trials in-person, the experiment will be conducted online using jsPsych. For this reason, the task will be slightly different such that participants will respond to trials using their keyboard keys instead of a gaming controller. We will also encourage participants to wear headphones, be in a quiet area for the auditory cues during the experiment, and provide an initial audio check to ensure that participants have access to all stimuli presented throughout the experiment. In addition, we did not have access to the exact instructions and prompts that the original researchers used, so these may differ slightly.
“All participants performed very accurately on all items (M = 97%). Response times (RTs) shorter than 250 ms. or longer than 1500 ms. were removed (292 trials removed, 1.77% of total).
We fit RTs for correct responses on matching trials (‘Yes’ responses) with linear mixed regression using maximum likelihood estimation (Bates, Maechler, Bolker, & Walker, 2013), including random intercepts and random slopes for within-subject factors and random intercepts for repeated items (unique trial types) following the recommendations of Barr, Levy, Scheepers, and Tily (2013). Reported below are the parameter estimates (b) and confidence intervals for each contrast of interest. Significance tests were calculated using chi-square tests that compared nested models—models with and without the factor of interest—on improvement in log-likelihood”(Edmiston & Lupyan, 2015, p.94).
For our replication, we will also remove trials where response times are shorter than 250 ms or longer than 1500ms from the analysis. We will then run a similar linear mixed regression model, as was performed the original study. We will also use chi-square tests to assess significance of the results. We anticipate that a successful replication of the original study will yield similar effects. More specifically, we hypothesize to find that verbal labels elicit the lowest overall reaction times, and that congruent sounds elicit lower reaction times than incongruent sounds.
Differences from Original Study
This replication will differ from the original study in both setting and procedure. Instead of being conducted in a lab, the experiments will take place online using Prolific. Variations in participants’ device performance and internet speed may influence reaction times. Additionally, participants will respond to trials using a keyboard rather than the gaming controller used previously. Despite these changes, we expect these differences will have minimal impact on the results or the ability to replicate the effect reported in the original study.
Methods Addendum (Post Data Collection)
Actual Sample
We collected data from 50 participants as planned, with 42 passing the data quality check (accuracy rate above 90%). Refer to the Results section for an overview of performance and the code for data quality assurance. We ensured that all participants were adult English speakers without hearing difficulties using Prolific screening criteria.
Differences from pre-data collection methods plan
None
Design Overview
Within-subjects design with 2 factors: Auditory Cue (Environmental Sound vs. Label) and Match to Basic Category (Match vs. No Match). Congruency is further manipulated within the Matching Environmental Sound condition.
Reaction time was the only measure taken.
It uses within-participants design where everyone does each condition.
Measures were repeated for 384 experimental trials.
They didn’t take any measures to reduce demand characteristics.
To improve the experiment attention checks could be added. Given that incorrect responses could be labeled incongruent, incorrect responses could have also been included in the data. This sample could not be representative of all populations because it only studied undergraduates and WEIRD populations.
Results
Data preparation
We are replicating Experiment 1A from the paper. The data cleaning process begins by loading necessary libraries and importing multiple CSV files into a combined dataframe. Practice trials are excluded to focus on actual experimental data. A data exclusion step ensures that only trials where responses match the correct responses are retained, and participant accuracy is calculated. Participants with an accuracy below 90% are excluded to maintain high data quality. Non-relevant responses are filtered out, and a congruency column is created to classify trials as congruent or incongruent. Reaction times (RTs) shorter than 250 ms or longer than 1500 ms are removed, and the count of trials before and after filtering is compared to verify exclusions. Finally, the dataset is refined to include only relevant columns (rt, ID, sound_category, cue, congruency) for analysis, ensuring it is clean and ready for further processing.
### Data Preparation#### Load Relevant Libraries and Functionslibrary(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ readr 2.1.5
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
✔ purrr 1.0.2
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(readr)library(lme4)
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
#### Import data#replace this with your own path (for now)folder_path <-"../data/complete"csv_files <-list.files(folder_path, pattern ="*.csv", full.names =TRUE)df_list <-lapply(csv_files, read_csv, show_col_types =FALSE)df <-bind_rows(df_list)#exclude practice trials combined_df <- df %>%filter(exp_part =="actual")# Data exclusion / filteringcombined_df <- combined_df %>%mutate(correct_response =as.character(correct_response)) %>%mutate(response =as.character(response)) %>%mutate(correct = correct_response == response)accuracy_table <- combined_df %>%group_by(ID) %>%summarise(accuracy =mean(correct, na.rm =TRUE))accuracy_table
This analysis aims to determine whether (1) verbal labels lead to faster response times compared to sound labels, and (2) congruent sounds produce faster responses than incongruent sounds, in line with the findings of Experiment 1A from the original study.
As outlined in our analysis plan, we adopt the approach used by Edmiston & Lupyan (2015), modeling reaction times for “Yes” responses in matching trials across three cue conditions (verbal label, congruent sound, incongruent sound) using a linear mixed-effects regression model. The model includes random intercepts and random slopes for within-subject factors (i.e., participants) and random intercepts for repeated items (e.g., the specific stimuli such as “bird,” “guitar,” or “dog”). The primary fixed effect of interest in the model is the “condition” variable, which categorizes each trial as presenting a label, a congruent sound, or an incongruent sound.
To evaluate the significance of the condition effect, we conduct chi-square tests to compare nested models—with and without the condition variable—based on improvements in log-likelihood (as described on p. 94 of the original study). This analysis aims to test the central hypothesis that condition modulates reaction times.
The parameter estimates derived from the model provide insight into the magnitude and direction of the effect of each condition on reaction time. Additionally, the model facilitates direct comparisons between conditions, enabling us to assess the relative effects of ‘label,’ ‘congruent sound,’ and ‘incongruent sound’ conditions. Pairwise comparisons are also performed to specifically examine the relationship between ‘label’ and ‘incongruent’ trials, a contrast not directly represented in the baseline model where the reference level is set to one of the conditions.
We anticipate that the model outputs will align closely with the results reported by Edmiston & Lupyan (2015). More specifically, incongruent environmental sounds will elicit longer response times in comparison to congruent environmental sounds. Verbal labels will elicit the shortest response time.
Side-by-side graph with original graph is ideal here ## Our Graph
#load libraries: library(lmerTest)
Attaching package: 'lmerTest'
The following object is masked from 'package:lme4':
lmer
The following object is masked from 'package:stats':
step
library(emmeans)
Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'
library(lme4)library(ggplot2)####visualization##### Reorder the 'congruency' factor levels so that "Label" comes firstcombined_df$congruency <-factor(combined_df$congruency, levels =c("label", "congruent", "incongruent"))#boxpplot (Fig. 2 of the original paper)ggplot(data = combined_df,mapping =aes(x = congruency,y = rt,color = congruency, fill = congruency)) +geom_bar(stat ="summary", fun ="mean", width =0.4, color ="black") +stat_summary(fun.data ="mean_cl_boot", geom ="linerange", color ="black") +geom_errorbar(stat ="summary", fun.data ="mean_cl_boot", width =0.2, color ="black") +scale_fill_manual(values =c("darkred", "darkblue", "darkgreen")) +scale_color_manual(values =c("darkred", "darkblue", "darkgreen")) +labs(x ="Cue Type",y ="Reaction Time (Ms)",color ="Cue Type",fill ="Cue Type" ) +scale_y_continuous(expand =c(0, 0)) +theme_classic() +theme(axis.text.x =element_blank(), axis.text.y =element_text(size =14, color ="black"),axis.title.x =element_text(size =16, color ="black", face ="bold"), axis.title.y =element_text(size =16, color ="black", face ="bold"), legend.title =element_text(size =14, color ="black"), legend.text =element_text(size =14, color ="black") )
######statistical analysis#########find best optimizer#allFit(model_full)# set reference level back to congruentcombined_df$congruency <-relevel(combined_df$congruency, ref ="congruent")#set up model and view resultsmodel_full <-lmer(rt ~ congruency + (1+ congruency|ID) + (1|sound_category), data = combined_df, control =lmerControl(optimizer ="bobyqa"))summary(model_full)
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: rt ~ congruency + (1 + congruency | ID) + (1 | sound_category)
Data: combined_df
Control: lmerControl(optimizer = "bobyqa")
REML criterion at convergence: 99407.8
Scaled residuals:
Min 1Q Median 3Q Max
-2.3748 -0.6827 -0.1974 0.4509 4.2050
Random effects:
Groups Name Variance Std.Dev. Corr
ID (Intercept) 18945.80 137.644
congruencylabel 1768.01 42.048 -0.46
congruencyincongruent 822.77 28.684 -0.10 0.30
sound_category (Intercept) 16.17 4.021
Residual 44351.37 210.598
Number of obs: 7328, groups: ID, 42; sound_category, 6
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 690.928 21.686 41.062 31.860 < 2e-16 ***
congruencylabel -52.267 8.390 39.660 -6.229 2.33e-07 ***
congruencyincongruent 15.047 9.365 41.783 1.607 0.116
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) cngrncyl
congrncylbl -0.439
cngrncyncng -0.126 0.317
2.5 % 97.5 %
.sig01 NA NA
.sig02 NA NA
.sig03 NA NA
.sig04 NA NA
.sig05 NA NA
.sig06 NA NA
.sig07 NA NA
.sigma NA NA
(Intercept) 648.424554 733.43232
congruencylabel -68.712168 -35.82266
congruencyincongruent -3.307674 33.40179
# #post-hoc test: allows you to examine relationship between incongruent and label#####. 95% CI for this??? ###3contrast_results <- model_full %>%emmeans(pairwise ~ congruency,#adjusts p values so that it is more difficult to get a significance (correction method)adjust ="bonferroni", conf.int =TRUE) %>%pluck("contrasts")
Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, add the argument 'pbkrtest.limit = 7328' (or larger)
[or, globally, 'set emm_options(pbkrtest.limit = 7328)' or larger];
but be warned that this may result in large computation time and memory use.
Note: D.f. calculations have been disabled because the number of observations exceeds 3000.
To enable adjustments, add the argument 'lmerTest.limit = 7328' (or larger)
[or, globally, 'set emm_options(lmerTest.limit = 7328)' or larger];
but be warned that this may result in large computation time and memory use.
#I was having trouble getting 95% CI for the emmeans output, I looked it up and found someone's code that looked like thisconfint(contrast_results, calc =c(n =~.wgt.))
Warning in summary.emmGrid(object, infer = c(TRUE, FALSE), level = level, : The
column 'n' could not be calculated, so it is omitted
The results of the statistical analysis showed us that labels elicited faster reaction times than congruent trials (b = -52.3, CI [-68.71, -35.82]) and that congruent trials were not significantly faster than incongruent trials (b = 15.04, CI [-3.31 33.4]. Furthermore, the results of a pairwise contrast shows us that labels elicited faster reaction times than incongruent trials (b = -67.3, CI [-94.2, -42.21]).
Graph Comparison
library(gridExtra)
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
library(jpeg)library(grid)library(dplyr)library(forcats)library(ggplot2)# rename and reorder some columns for plottingplot_df <- combined_df |>mutate(congruency=factor( congruency, levels =c("label", "congruent", "incongruent"))) |>mutate(congruency =fct_recode(congruency, "Label"="label", "Congruent Sound"="congruent", "Incongruent Sound"="incongruent") )# plot the result of our replicationour_plot <-ggplot(data = plot_df,mapping =aes(x = congruency,y = rt,color = congruency, fill = congruency)) +geom_bar(stat ="summary", fun ="mean", width =1,color ="black") +stat_summary(fun.data ="mean_cl_boot", geom ="linerange", color ="black") +geom_errorbar(stat ="summary", fun.data ="mean_cl_boot", width =0.5, size=0.5,color ="black") +scale_fill_manual(values =c("firebrick", "darkblue", "slategray2")) +scale_color_manual(values =c("firebrick", "darkblue", "slategray2")) +labs(x ="Experiment 1A",y ="Verification Speed (ms)",color ="Cue Type",fill ="Cue Type") +coord_cartesian(xlim=c(0, 4), ylim =c(400, 725), expand=FALSE) +scale_y_continuous(breaks =seq(400, 700, by =50), expand=c(0, 25)) +theme_classic() +theme(axis.text.x =element_blank(), axis.text.y =element_text(size =10, color ="black"),axis.title.x =element_text(size =10, color ="black"), axis.title.y =element_text(size =10, color ="black"), legend.title =element_text(size =10, color ="black", face='bold'), legend.text =element_text(size =10, color ="black"),plot.margin =unit(c(0.0, 0.1, 0.0, 0.1), "npc") ) +theme(aspect.ratio =1.8)
Warning: Using `size` aesthetic for lines was deprecated in ggplot2 3.4.0.
ℹ Please use `linewidth` instead.
# load the plot from the original studyedmiston_img <-rasterGrob(readJPEG("edmiston_exp1a.jpg"), interpolate =TRUE)# display side by sideplot_title <-textGrob("Our Result", gp =gpar(fontface ="bold", fontsize =14))image_title <-textGrob("Original Study Result", gp =gpar(fontface ="bold", fontsize =14))grid.arrange(arrangeGrob(image_title, edmiston_img, ncol =1, heights =c(0.5, 5)),arrangeGrob(plot_title, ggplotGrob(our_plot), ncol =1, heights =c(0.5, 5)),ncol =2, widths=c(1, 1.7))
The graph directly above (refer to Experiment 1A) from the original paper shows a significant difference between label and congruent conditions, as well as a significant difference between congruent sounds and incongruent sounds. This differs from our graph (on the right) because our study only shows a difference between label and congruent, but no statistically significant difference between the congruent and incongruent conditions.
Exploratory analyses
The results of the original data were presented using a bar graph, however, we were curious about the distribution of the data, especially since our results were slightly different. To investigate, we included a box plot that shows the individual data points. We expand a bit on the results shown in the original paper by re-running the same analysis, but instead of filtering out ‘No’ responses, we filter out ‘Yes’ responses.
#box plotcombined_df$congruency <-factor(combined_df$congruency, levels =c("label", "congruent", "incongruent"))ggplot(data = combined_df,mapping =aes(x = congruency,y = rt,color = congruency)) +geom_boxplot(width =0.6, outlier.shape =NA) +geom_jitter(width =0.2, alpha =0.1, size =1.5) +scale_color_manual(values =c("darkred", "darkblue", "darkgreen")) +labs(y ="Reaction Time (Ms)",color ="Cue Type",x ="Cue Type",fill ="Cue Type " ) +theme_classic() +theme(axis.text.x =element_blank(), axis.text.y =element_text(size =14, color ="black"),axis.title.x =element_text(size =16, color ="black", face ="bold"), axis.title.y =element_text(size =16, color ="black", face ="bold"), legend.title =element_text(size =14, color ="black"), legend.text =element_text(size =14, color ="black") )
Linear mixed model fit by REML. t-tests use Satterthwaite's method [
lmerModLmerTest]
Formula: rt ~ congruency + (1 + congruency | ID) + (1 | sound_category)
Data: df_n
Control: lmerControl(optimizer = "bobyqa")
REML criterion at convergence: 109825.9
Scaled residuals:
Min 1Q Median 3Q Max
-2.4104 -0.6811 -0.1950 0.4574 4.1140
Random effects:
Groups Name Variance Std.Dev. Corr
ID (Intercept) 20633.8 143.645
congruencyincongruent 827.9 28.772 -0.02
congruencylabel 1864.1 43.175 -0.34 0.27
sound_category (Intercept) 16.6 4.075
Residual 46409.2 215.428
Number of obs: 8068, groups: ID, 48; sound_category, 6
Fixed effects:
Estimate Std. Error df t value Pr(>|t|)
(Intercept) 708.103 21.185 46.671 33.425 < 2e-16 ***
congruencyincongruent 12.297 9.124 44.482 1.348 0.185
congruencylabel -51.721 8.131 45.375 -6.361 8.77e-08 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) cngrncyn
cngrncyncng -0.092
congrncylbl -0.350 0.308
confint.merMod(model_full,method="Wald")
2.5 % 97.5 %
.sig01 NA NA
.sig02 NA NA
.sig03 NA NA
.sig04 NA NA
.sig05 NA NA
.sig06 NA NA
.sig07 NA NA
.sigma NA NA
(Intercept) 648.424554 733.43232
congruencylabel -68.712168 -35.82266
congruencyincongruent -3.307674 33.40179
Looking at the graph with the data distribution, it is clear that there are quite a few participants who had long response times for congruent trials. This could have increased the mean response time for this category, thus creating a less straightforward distinction in response times between congruent and incongruent trials. It suggests there could be some confusion on the distinction between a congruent and incongruent sound.
Furthermore, we get the same trends when the statistical model is run on the “No” responses: the difference between label and congruent is significant, but not the difference between congruent and incongruent sound.
Discussion
Summary of Replication Attempt
The main finding of the original experiment (1A) replicated successfully, as was shown in the finding that labels elicit faster cue recognition than environmental sounds. However, our results only partially replicated on our collected sample, because there was no statistically significant difference in response times between congruent and incongruent sounds, though it was trending in the right direction. This failure to fully replicate results could be explained by a number of factors. The first is a difference in experimental environment: i.e., the original results were taken in a lab using video game controllers, while ours was taken online using a keyboard, in an online environment. In addition, the stimuli used for congruent vs. incongruent conditions were generally confusing (the difference between a hawk and a chickadee sound) and could not be generalizable to a wider audience , which may explain the lack of significant difference in this finding.
Commentary
The results of our study replicated partially for the difference between label and congruent condition, but did not replicate on the congruent vs. incongruent condition. While I initially believed the findings would replicate, after creating the paradigm and doing the study it myself, I was more skeptical on the difference between a congruent vs. incongruent sound condition, but was confident about the primacy of the “label advantage”, which had been proven previously (Lupyan and Schill 2012). The main contribution Edminston & Lupyan (2015) intended to make was to show and quantify faster response times to labels, i.e. the”label advantage”, which they attributed to the extra processing time it takes to decide whether the sound/picture combination is congruent or incongruent. In their own words “the label-advantage is obtained precisely because labels are detached from idiosyncracies of specific category members, and thereby able to selectively activat[e] the features/dimensions most diagnostic of the named category”, which distinguishes words as generalized, unmotivated cues (Edmiston & Lupyan, 2015, p.98).
On the other hand, sounds represent motivated cues that represent specific instances and require more mental processing time. This increased specificity of environmental sounds leads to longer reaction times, which as the authors attribute to the fact that “people appear unable to fully detach environmental sounds from perceptual details corresponding to their causes (Edmiston & Lupyan, 2015, p.98). Thus, our findings indicate clear evidence towards a label advantage, but do not help to explain why that advantage occurs through a delineation of sounds representing detailed specific instances through the (congruent/incongruent condition). Because there is no significant difference in reaction times between congruent and incongruent sounds, our results can’t support the claim that sounds take longer to process because they imply more specific instances, only that words have an advantage over sounds in general.
However, the differences in our results could also be attributed to the environment of our study (online) vs. the original study conducted in the lab. Our study was over 25 minutes long and did not include any breaks, so it was particularly difficult to concentrate on the task in an online study. Additionally, the congruent and incongruent distinction could have been difficult for participants to interpret and may not be generalizable to a broader audience that aren’t necessarily well versed in the difference between bird sounds, different sized dogs barking, or motorcycle sounds. In conclusion, we can confidently say that words are special in comparison to environmental sounds, yet the reasoning for this should still be investigated further.
Edmiston, P., & Lupyan, G. (2015). What makes words special? Words as unmotivated cues. Cognition, 143, 93–100. http://dx.doi.org/10.1016/j.cognition.2015.06.008
Lupyan, G., & Thompson-Schill, S. L. (2012). The evocative power of words: Activation of concepts by verbal and nonverbal means. Journal of Experimental Psychology: General, 141(1), 170–186. https://doi.org/10.1037/a0024904