16 December, 2025


Brief Introduction

Reaction time measurements are commonly used to explore cognitive processing in language. A large body of research has shown that concrete words (e.g., table, apple) are processed more quickly and more accurately than abstract words (e.g., justice, hope). However, limited research takes into account other potentially influential lexical factors. In the present analysis, word length and frequency are controlled for when examining reaction times to concrete and abstract words.



Data

Three data sets were used in the present analysis. Only SimLex and SUBTLEX were directly imported into R, while the third data set was used for reference only:



SimLex: Provides concreteness ratings for a wide range of English words, which are used to classify words as abstract or concrete. Concreteness categories are determined using a median split of the ratings, with words below the median classified as abstract and words above the median classified as concrete.

SUBTLEX: Provides frequency counts for 74,286 English words based on a large corpus. These frequency measures are used to limit the analysis to words within a comparable frequency range to reduce frequency-related effects on reaction time.

English Lexicon Project: Provides reaction time measures for 40,481 English words.






SIMLEX DATA


Code used to load in SimLex:
setwd("~/Desktop")
library(dplyr)
SimLex <- read.delim("SimLex-999.txt", header = TRUE)


Original Imported Data Set:

head(SimLex)

From the SimLex data set, the only data of interest for this analysis are the concreteness ratings of words. SimLex contains two word columns and two corresponding concreteness columns. To create a clean data set, the two word columns were combined into a single word column, and the two concreteness columns were combined into a single concreteness column. Any duplicate words were then removed so that each word only appears once.

A median split of the concreteness ratings was performed, with words below the median classified as Abstract and words above the median classified as Concrete. Words equal to the median were removed, and only words with concreteness ratings between 3.43 and 4.03 (close to the median) were kept.




Code Used:
To extract the information and then merge it:
SimLex1 <- data.frame(Word = SimLex$word1, Conc.Rating = SimLex$conc.w1.)
SimLex2 <- data.frame(Word = SimLex$word2, Conc.Rating = SimLex$conc.w2.)
SimLex_CombinedWords <- rbind(SimLex1, SimLex2)

To create a median split and a categorization column:
median_value <- median(SimLex_CombinedWords$Conc.Rating)
SimLex_NoMedian <- SimLex_CombinedWords %>%
filter(Conc.Rating != median_value)
SimLex_FinalDataset <- SimLex_NoMedian %>%
mutate(Conc.Type = case_when(Conc.Rating > median_value ~ "Concrete", Conc.Rating < median_value ~ "Abstract"))

To remove duplicates:
SimLex_FinalDataset <- SimLex_FinalDataset[!duplicated(SimLex_FinalDataset$Word), ]

To keep concreteness rating within a range:
median_adjacent <- SimLex_FinalDataset %>%
filter(Conc.Rating >= 3.43 & Conc.Rating <= 4.03)

Split by concreteness type and then merge:
abstract_median_adjacent <- median_adjacent %>% filter(Conc.Type == "Abstract")
concrete_median_adjacent <- median_adjacent %>% filter(Conc.Type == "Concrete")
SimLex_FinalDataset <- median_adjacent


This is the resulting data set now containing 77 words:

head(SimLex_FinalDataset)





SUBTLEX DATA:


Code used to load in SUBTLEX:
install.packages("readxl")
library(readxl)
SUBTLEX <- read_excel("SUBTLEXusExcel2007.xlsx")


Original Imported Data Set:

head(SUBTLEX)

From the SUBTLEX data set, the only data of interest for this analysis are the words and their frequency counts. A new column to indicate the number of characters in each word was added. Words shorter than 3 letters or longer than 7 letters were removed to reduce potential word-length effects.

A median split of the FREQcount column was then performed. Words with frequency counts above the median were labeled high frequency, and words below the median were labeled low frequency. Frequencies equal to the median were not considered. Any duplicate words were removed. The resulting data set contains only the necessary columns for analysis.




Code Used:
To add word length column:
SUBTLEX <- SUBTLEX %>%
mutate(Length = nchar(Word))

To remove irrelevant columns of data:
SUBTLEX_Clean <- SUBTLEX %>%
select(Word, Length, FREQcount)

To filter word length:
SUBTLEX_Clean <- SUBTLEX_Clean %>%
filter(Length >= 3 & Length <= 7)

To median split word frequency and add frequency type:
Median_Freq <- median(SUBTLEX_Clean$FREQcount)
SUBTLEX_FinalDataset <- SUBTLEX_Clean %>%
filter(FREQcount != Median_Freq) %>%
mutate(freq.type = ifelse(FREQcount > Median_Freq, "High", "Low"))

To remove duplicates:
SUBTLEX_FinalDataset <- SUBTLEX_FinalDataset[!duplicated(SUBTLEX_FinalDataset$Word), ]


This is the resulting data set:

head(SUBTLEX_FinalDataset)



Sample Selection

To analyze reaction times for concrete and abstract words while also controlling for word length and frequency, a subset of words was selected from the cleaned SimLex and SUBTLEX data sets. Twenty-eight high-frequency abstract words and 24 high-frequency concrete words were selected; only high-frequency words were considered. The relevant data from SimLex and SUBTLEX were then compiled into separate tables containing only the columns needed for analysis (word, concreteness type, concreteness rating, and reaction time). Reaction time was added in manually from the English Lexicon Project data set.




Here is the resulting data table:

head(final_sample)


Results

ggplot(final_sample, aes(x = Conc.Type, y = ReactionTime, fill = Conc.Type)) +
geom_boxplot(alpha = 0.7) +
geom_jitter(width = 0.2, alpha = 0.5, color = "black") +
labs(x = "Word Type", y = "Reaction Time (ms)", title = "Reaction Times for Abstract vs. Concrete Words") + coord_flip()

The box plot compares reaction times (in milliseconds) for abstract and concrete words. Overall, the two distributions are fairly similar, although the small sample may contribute to this similarity, and results may differ with a larger sample. The median reaction time for abstract words is slightly higher (~620 ms) than that for concrete words (~600 ms), which suggests somewhat faster processing of concrete words on average. Abstract words also show a narrower interquartile range, suggesting more consistent reaction times across the abstract sample. In contrast, concrete words show a slightly wider range, which is evident in the extreme outliers in the visualization.


ggplot(final_sample, aes(x = Conc.Rating, y = ReactionTime, color = Conc.Type)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", color = "black") +
  labs(x = "Concreteness Rating", y = "Reaction Time (ms)", color = " Word Type", title = "Concreteness and Reaction Time")

This scatter plot illustrates the relationship between concreteness rating and reaction time, with abstract words shown in pink and concrete words shown in blue. Overall, there is a slight negative trend, suggesting that higher concreteness ratings are associated with faster reaction times. However, the wide spread and overlap of the points indicate that this relationship is relatively weak, and the small sample size makes the trend difficult to confirm.


Conclusion

The present analysis explored the relationship between word concreteness and reaction time while controlling for word length and word frequency. Overall, the results suggest that concrete words tend to be processed slightly faster than abstract words, which is consistent with prior findings. This may indicate that other lexical factors have a smaller influence on reaction time than initially expected, or that prior research has already accounted for these factors. The small sample size used limits the results. Future studies using a larger sample would help to clarify the effect of concreteness on reaction times. Additionally, this analysis focused only on high frequency words. The inclusion of lower frequency words might yield different patterns.








---
title: "Lexical Influences of Reaction Time: A Comparison of Concrete and Abstract Words"
author: "Gabriella Coogan"
output: html_notebook
---
16 December, 2025 <br> 
<div style="text-align: center;">
<br> **Brief Introduction** <br>

<div style="line-height: 2;">
Reaction time measurements are commonly used to explore cognitive processing in language. A large body of research has shown that concrete words (e.g., table, apple) are processed more quickly and more accurately than abstract words (e.g., justice, hope). However, limited research takes into account other potentially influential lexical factors. In the present analysis, word length and frequency are controlled for when examining reaction times to concrete and abstract words.</div> </div> <br> <br>

<div style="text-align: center;">
**Data** <br>

<div style="line-height: 2;">
*Three data sets were used in the present analysis. Only SimLex and SUBTLEX were directly imported into R, while the third data set was used for reference only:* </div> </div> <br> <br>

<div style="line-height: 2;">
<u>**SimLex:**</u> Provides concreteness ratings for a wide range of English words, which are used to classify words as abstract or concrete. Concreteness categories are determined using a median split of the ratings, with words below the median classified as abstract and words above the median classified as concrete.

<u>**SUBTLEX:**</u> Provides frequency counts for 74,286 English words based on a large corpus. These frequency measures are used to limit the analysis to words within a comparable frequency range to reduce frequency-related effects on reaction time.

<u>**English Lexicon Project:**</u> Provides reaction time measures for 40,481 English words. </div> <br> <br> <br> <br>

<div style="text-align: center">
<br> **SIMLEX DATA**
</div> <br>
**Code used to load in SimLex:**<br>
setwd("~/Desktop")<br>
library(dplyr)<br>
SimLex <- read.delim("SimLex-999.txt", header = TRUE) <br> <br> <br>
<u>Original Imported Data Set:</u>
```{r}
head(SimLex)
```

<div style="text-align: center;line-height: 2;"><br><br>From the SimLex data set, the only data of interest for this analysis are the concreteness ratings of words. SimLex contains two word columns and two corresponding concreteness columns. To create a clean data set, the two word columns were combined into a single word column, and the two concreteness columns were combined into a single concreteness column. Any duplicate words were then removed so that each word only appears once.<br> <br>

A median split of the concreteness ratings was performed, with words below the median classified as Abstract and words above the median classified as Concrete. Words equal to the median were removed, and only words with concreteness ratings between 3.43 and 4.03 (close to the median) were kept.
</div> <br> <br> <br>

**Code Used:**<br>
<u>To extract the information and then merge it:</u> <br>
`SimLex1 <- data.frame(Word = SimLex$word1, Conc.Rating = SimLex$conc.w1.)` <br>
`SimLex2 <- data.frame(Word = SimLex$word2, Conc.Rating = SimLex$conc.w2.)` <br>
SimLex_CombinedWords <- rbind(SimLex1, SimLex2) <br> <br>

<u>To create a median split and a categorization column:</u> <br>
median_value <- median(SimLex_CombinedWords$Conc.Rating)<br> 
SimLex_NoMedian <- SimLex_CombinedWords %>% <br>
  filter(Conc.Rating != median_value) <br> 
SimLex_FinalDataset <- SimLex_NoMedian %>% <br>
  mutate(Conc.Type = case_when(Conc.Rating > median_value ~ "Concrete", Conc.Rating < median_value ~ "Abstract")) <br> <br>
  
<u>To remove duplicates:</u> <br>
SimLex_FinalDataset <- SimLex_FinalDataset[!duplicated(SimLex_FinalDataset$Word), ]<br>

<u>To keep concreteness rating within a range:</u> <br>
median_adjacent <- SimLex_FinalDataset %>% <br>
  filter(Conc.Rating >= 3.43 & Conc.Rating <= 4.03) <br>

<u>Split by concreteness type and then merge:</u> <br>
abstract_median_adjacent <- median_adjacent %>% filter(Conc.Type == "Abstract") <br>
concrete_median_adjacent <- median_adjacent %>% filter(Conc.Type == "Concrete") <br>
SimLex_FinalDataset <- median_adjacent <br> <br> <br>
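The steps above can be tried end to end on a small example. The following is only a minimal sketch: `toy_simlex` and all of its ratings are invented for illustration (they are not SimLex values), and, as an assumption that follows the written description, duplicates are removed before the median is computed so that each word counts once.

```r
library(dplyr)

# Hypothetical toy data in a SimLex-like layout; all values are invented.
toy_simlex <- data.frame(
  word1   = c("apple", "hope", "table", "idea", "mood"),
  word2   = c("table", "justice", "chair", "hope", "apple"),
  conc.w1 = c(5.0, 1.2, 4.9, 1.6, 3.1),
  conc.w2 = c(4.9, 1.5, 4.6, 1.2, 5.0),
  stringsAsFactors = FALSE
)

# Stack the two word/rating column pairs into one long table.
combined <- rbind(
  data.frame(Word = toy_simlex$word1, Conc.Rating = toy_simlex$conc.w1,
             stringsAsFactors = FALSE),
  data.frame(Word = toy_simlex$word2, Conc.Rating = toy_simlex$conc.w2,
             stringsAsFactors = FALSE)
)

# Keep the first occurrence of each word.
unique_words <- combined[!duplicated(combined$Word), ]

# Median split: drop words exactly at the median, then label the rest.
median_value <- median(unique_words$Conc.Rating)
labelled <- unique_words %>%
  filter(Conc.Rating != median_value) %>%
  mutate(Conc.Type = if_else(Conc.Rating > median_value, "Concrete", "Abstract"))

print(labelled)
```

In this toy data the word "mood" sits exactly at the median and is dropped, leaving three words in each category.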




<u>This is the resulting data set now containing 77 words:</u>
```{r}
head(SimLex_FinalDataset)
```
<div style="text-align: center">
<br> <br> <br> <br> **SUBTLEX DATA:**
</div> <br>

**Code used to load in SUBTLEX:** <br>
install.packages("readxl") <br>
library(readxl) <br>
SUBTLEX <- read_excel("SUBTLEXusExcel2007.xlsx") <br> <br> <br>

<u>Original Imported Data Set:</u>
```{r}
head(SUBTLEX)
```

<div style="text-align: center;line-height: 2;"><br><br>From the SUBTLEX data set, the only data of interest for this analysis are the words and their frequency counts. A new column to indicate the number of characters in each word was added. Words shorter than 3 letters or longer than 7 letters were removed to reduce potential word-length effects. <br> <br>

A median split of the FREQcount column was then performed. Words with frequency counts above the median were labeled high frequency, and words below the median were labeled low frequency. Frequencies equal to the median were not considered. Any duplicate words were removed. The resulting data set contains only the necessary columns for analysis. </div> <br> <br> <br>


**Code Used:** <br>
<u>To add word length column:</u> <br>
SUBTLEX <- SUBTLEX %>% <br>
  mutate(Length = nchar(Word)) <br> <br>
<u>To remove irrelevant columns of data:</u> <br>
SUBTLEX_Clean <- SUBTLEX %>% <br>
  select(Word, Length, FREQcount) <br> <br>
<u>To filter word length:</u> <br>
SUBTLEX_Clean <- SUBTLEX_Clean %>% <br>
  filter(Length >= 3 & Length <= 7) <br> <br>
  
<u>To median split word frequency and add frequency type:</u> <br>
Median_Freq <- median(SUBTLEX_Clean$FREQcount) <br>
SUBTLEX_FinalDataset <- SUBTLEX_Clean %>% <br>
  filter(FREQcount != Median_Freq) %>% <br>
  mutate(freq.type = ifelse(FREQcount > Median_Freq, "High", "Low")) <br> <br>
  
<u>To remove duplicates:</u> <br>
SUBTLEX_FinalDataset <- SUBTLEX_FinalDataset[!duplicated(SUBTLEX_FinalDataset$Word), ] <br> <br> <br>
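As a rough illustration of the same cleaning steps, the sketch below runs them on a small invented table. `toy_freq` and its counts are hypothetical (not SUBTLEX values); only the column names `Word`, `Length`, `FREQcount`, and `freq.type` follow the code above.

```r
library(dplyr)

# Hypothetical toy frequency table; the counts are invented.
toy_freq <- data.frame(
  Word      = c("of", "cat", "table", "justice", "hope", "cat",
                "extraordinary", "idea"),
  FREQcount = c(9000, 400, 250, 120, 300, 400, 15, 250),
  stringsAsFactors = FALSE
)

# Remove duplicate words, add word length, and keep 3- to 7-letter words.
toy_clean <- toy_freq %>%
  distinct(Word, .keep_all = TRUE) %>%
  mutate(Length = nchar(Word)) %>%
  filter(Length >= 3, Length <= 7)

# Median split on frequency: words exactly at the median are dropped,
# the rest are labelled "High" or "Low".
median_freq <- median(toy_clean$FREQcount)
toy_final <- toy_clean %>%
  filter(FREQcount != median_freq) %>%
  mutate(freq.type = if_else(FREQcount > median_freq, "High", "Low"))

print(toy_final)
```

Here "of" and "extraordinary" fail the length filter, and "table" and "idea" are dropped because their counts equal the median.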

<u>This is the resulting data set:</u>
```{r}
head(SUBTLEX_FinalDataset)
```
<div style="text-align: center; line-height: 2; "> <br> <br> **Sample Selection**

To analyze reaction times for concrete and abstract words while also controlling for word length and frequency, a subset of words was selected from the cleaned SimLex and SUBTLEX data sets.
Twenty-eight high-frequency abstract words and 24 high-frequency concrete words were selected; only high-frequency words were considered.
The relevant data from SimLex and SUBTLEX were then compiled into separate tables containing only the columns needed for analysis *(word, concreteness type, concreteness rating, and reaction time).* Reaction time was added in manually from the English Lexicon Project data set. </div> <br> <br> <br>
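One way such a sample table could be assembled is sketched below. This is an illustration under assumptions, not the procedure used here: reaction times in this analysis were entered manually, whereas the sketch joins a hypothetical `toy_rt` table by word. All three toy tables and their values are invented.

```r
library(dplyr)

# Hypothetical cleaned inputs; every value below is invented.
toy_conc <- data.frame(
  Word        = c("hope", "idea", "table", "chair"),
  Conc.Rating = c(3.5, 3.6, 4.0, 3.9),
  Conc.Type   = c("Abstract", "Abstract", "Concrete", "Concrete"),
  stringsAsFactors = FALSE
)
toy_freq <- data.frame(
  Word      = c("hope", "table", "chair", "zebra"),
  freq.type = c("High", "High", "Low", "High"),
  stringsAsFactors = FALSE
)
toy_rt <- data.frame(
  Word         = c("hope", "table", "idea"),
  ReactionTime = c(610, 590, 640),
  stringsAsFactors = FALSE
)

# Keep words present in all tables, restricted to high-frequency words,
# and retain only the columns needed for analysis.
toy_sample <- toy_conc %>%
  inner_join(toy_freq, by = "Word") %>%
  filter(freq.type == "High") %>%
  inner_join(toy_rt, by = "Word") %>%
  select(Word, Conc.Type, Conc.Rating, ReactionTime)

print(toy_sample)
```

An inner join silently drops any word missing from one of the tables, so comparing row counts before and after each join is a useful check.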

<u>Here is the resulting data table:</u>
```{r}
head(final_sample)
```
<div style="text-align: center; "> <br> **Results** <br>

```{r}
ggplot(final_sample, aes(x = Conc.Type, y = ReactionTime, fill = Conc.Type)) +
geom_boxplot(alpha = 0.7) +
geom_jitter(width = 0.2, alpha = 0.5, color = "black") +
labs(x = "Word Type", y = "Reaction Time (ms)", title = "Reaction Times for Abstract vs. Concrete Words") + coord_flip()
```
<div style = "line-height: 2; "> The box plot compares reaction times (in milliseconds) for abstract and concrete words. Overall, the two distributions are fairly similar, although the small sample may contribute to this similarity, and results may differ with a larger sample. The median reaction time for abstract words is slightly higher (~620 ms) than that for concrete words (~600 ms), which suggests somewhat faster processing of concrete words on average. Abstract words also show a narrower interquartile range, suggesting more consistent reaction times across the abstract sample. In contrast, concrete words show a slightly wider range, which is evident in the extreme outliers in the visualization.</div> </div> <br>
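The visual comparison could also be summarized numerically. The sketch below is not part of the original analysis: it uses an invented `toy_rt` data frame (the reaction times are made up, not English Lexicon Project values) to show how group medians and a Welch two-sample t-test might be computed with base R.

```r
# Hypothetical reaction times in milliseconds; all values are invented.
toy_rt <- data.frame(
  Conc.Type    = rep(c("Abstract", "Concrete"), each = 5),
  ReactionTime = c(625, 610, 640, 618, 630,
                   600, 585, 612, 595, 655),
  stringsAsFactors = FALSE
)

# Median reaction time per word type.
medians <- tapply(toy_rt$ReactionTime, toy_rt$Conc.Type, median)
print(medians)

# t.test() defaults to var.equal = FALSE, which gives the Welch test
# (no equal-variance assumption between the two groups).
welch <- t.test(ReactionTime ~ Conc.Type, data = toy_rt)
print(welch$p.value)
```

With samples as small as the one used here, a non-significant result would not rule out a concreteness effect; it would only mean the data are too limited to detect one.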

```{r}
ggplot(final_sample, aes(x = Conc.Rating, y = ReactionTime, color = Conc.Type)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", color = "black") +
  labs(x = "Concreteness Rating", y = "Reaction Time (ms)", color = " Word Type", title = "Concreteness and Reaction Time")
```
<div style = "line-height: 2; text-align: center; "> This scatter plot illustrates the relationship between concreteness rating and reaction time, with abstract words shown in pink and concrete words shown in blue. Overall, there is a slight negative trend, suggesting that higher concreteness ratings are associated with faster reaction times. However, the wide spread and overlap of the points indicate that this relationship is relatively weak, and the small sample size makes the trend difficult to confirm.<br> <br> <br>

**Conclusion** <br> <br>
The present analysis explored the relationship between word concreteness and reaction time while controlling for word length and word frequency. Overall, the results suggest that concrete words tend to be processed slightly faster than abstract words, which is consistent with prior findings. This may indicate that other lexical factors have a smaller influence on reaction time than initially expected, or that prior research has already accounted for these factors. 
The small sample size used limits the results. Future studies using a larger sample would help to clarify the effect of concreteness on reaction times. Additionally, this analysis focused only on high frequency words. The inclusion of lower frequency words might yield different patterns. </div> <br> <br> <br> <br> <br> <br> <br>