Abstract
Young word learners show a shape bias: they extend newly learned nouns to objects that share the same shape. This bias is thought to emerge from the statistics of early vocabularies, in which most early-learned count nouns name solid objects belonging to shape-based categories, and cross-linguistic variation in those statistics has been proposed to explain differences in the emergence of the bias. The present study asks whether the lexical statistics hypothesized to drive the shape bias actually differ across languages. Here we show that adult ratings of solidity, count/mass syntax, and shape-based categorization for the earliest-learned nouns in sixteen languages from seven linguistic families are remarkably consistent across languages. Solidity and count syntax corresponded moderately with each other but only weakly with shape-based categorization, and a permutation test revealed no significant cross-linguistic differences. These results suggest that the early noun vocabulary provides broadly similar input for the shape bias across languages, so cross-linguistic differences in its emergence are unlikely to be explained by lexical statistics alone.
In Study 1, we aimed to replicate Samuelson (1999) by collecting judgment ratings on the first 300 nouns learned from the US English MCDI along the same three dimensions: solidity, count/mass syntax, and shape-based categorization. Unlike the original study, however, we collected these ratings on the first 300 nouns learned by children in sixteen languages, including English.
A total of 377 English-speaking adults recruited online from the Prolific platform (mean age = 41 years, SD = 14) were asked to rate words along three dimensions: solidity (solid, non-solid, unclear), count/mass noun syntax (count, mass, unclear), and category-organizing feature (shape, color, material, none of these).
Three hundred nouns were sourced from MCDI data in the WordBank repository for 16 languages belonging to seven linguistic families: (LIST OF LANGUAGES). For each language, nouns were chosen by earliest age of acquisition in the dataset, up to a maximum of 300 nouns (CITE). Because we recruited only English-speaking participants, nouns were presented in English and mapped across languages via conceptual mappings ("unilemmas"; Frank et al., 2021). A unilemma is the concept behind a noun, expressed in English: "dog" and its German translation "Hund" both map to the same underlying concept of "dog", which serves as the unilemma in this case.
The final sample consisted of 23,355 nouns across all languages, which reduced to N ≈ 664 after removing duplicates (unilemmas shared between languages). (Figure to show overlap between languages.) The breakdown of semantic categories for the words in each language is provided in the supplementary materials. Most words belonged to the categories of food/drinks, clothing, and animals, with fewer words referring to vehicles, toys, body parts, household items, and outside things.
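As a sketch of this pipeline, the selection and deduplication might be written as follows in dplyr; the data frame and column names (`wordbank_nouns`, `language`, `uni_lemma`, `aoa`) are assumptions for illustration, not the actual WordBank schema.

```r
library(dplyr)

# Hypothetical input: one row per word per language, with columns
# `language`, `uni_lemma`, and `aoa` (age of acquisition).
earliest_nouns <- wordbank_nouns |>
  group_by(language) |>
  slice_min(aoa, n = 300, with_ties = FALSE) |>  # earliest-learned nouns, up to 300
  ungroup()

# Collapse translation equivalents to one row per shared concept ("unilemma")
unique_unilemmas <- earliest_nouns |>
  distinct(uni_lemma)
```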
The experiment started with familiarization trials in which participants were shown examples of count and mass nouns, solid objects, and non-solid objects. They were also familiarized with what a category-organizing (characterizing) feature means, as in Table [2]. The test trials then followed the same wording and structure. Each dimension (solidity, count/mass syntax, and category-organizing feature) constituted a block, and every participant rated 20 words per block, for 60 test trials in total. Familiarization trials, test blocks, and words within a block were all randomized across participants. Participants were given no information about the semantic category of words, nor were words grouped by semantic category. Participants could choose only one feature on every question.
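For concreteness, a single participant's randomized session might be sketched as follows; `all_words` (the pool of rated nouns) and the block labels are hypothetical stand-ins, not the actual experiment code.

```r
# One participant: randomized block order, then 20 randomly sampled words
# per block. `all_words` is a hypothetical vector of the 664 unilemma nouns.
set.seed(1)
block_order <- sample(c("solidity", "count_mass", "organizing_feature"))
trials <- do.call(rbind, lapply(block_order, function(b) {
  data.frame(block = b, word = sample(all_words, 20))  # 20 words per block
}))
nrow(trials)  # 60 test trials in total
```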
We collected a total of 20,130 ratings from 377 participants for 664 words across three blocks (solidity, count/mass syntax, and category-organizing feature). After removing responses with no answer, we retained 19,635 ratings, so each word received between 8 and 10 ratings per block on average. The demographic details of the participants (Table 1) and the distribution of ratings across words and blocks (Figure 1) are included in the supplementary materials.
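A minimal sketch of this exclusion and coverage check, assuming a hypothetical `ratings` data frame with one row per response:

```r
library(dplyr)

# Hypothetical columns: `participant`, `word`, `block`, `response`
ratings_clean <- ratings |>
  filter(!is.na(response))  # drop responses with no answer

ratings_clean |>
  count(word, block) |>  # ratings per word per block
  summarise(mean_n = mean(n), min_n = min(n), max_n = max(n))
```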
Ratings were very consistent across languages, suggesting limited variation in lexical statistics, as can be seen in Figure \@ref(fig:correspondenceeng1). Nouns were predominantly rated as solid (M = 0.72, SD = 0.32) and count (M = 0.68, SD = 0.29), but were less reliably judged as organized by shape (M = 0.31, SD = 0.24). Pearson correlation analyses revealed a moderate positive correlation between solidity and count syntax (r = 0.62, p < 0.001), a weaker positive correlation between count syntax and shape-based categorization (r = 0.45, p < 0.001), and a still weaker positive correlation between solidity and shape-based categorization (r = 0.38, p < 0.001). The Venn diagram in Figure \@ref(fig:venn_eng_1) shows words categorized as solid, count, and shape-characterized using an 80% cutoff; that is, a word rated as solid by at least 80% of participants is categorized as solid. The diagram shows that a large number of nouns were consistently rated as both solid and count, with substantial overlap between these two dimensions, whereas only a minority of words were rated as referring to objects organized by shape.
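The per-word scores underlying these correlations and the Venn cutoff could be computed along these lines; `ratings_clean` and the block/response labels are hypothetical names carried over from the sketch above:

```r
library(dplyr)
library(tidyr)

# Per-word proportion of the target response in each block
word_scores <- ratings_clean |>
  group_by(word, block) |>
  summarise(prop = mean(response %in% c("solid", "count", "shape")),
            .groups = "drop") |>
  pivot_wider(names_from = block, values_from = prop)

# Pairwise Pearson correlations between the three dimensions
cor.test(word_scores$solidity, word_scores$count_mass)

# 80% cutoff for the Venn diagram: a word counts as, e.g., solid
# if at least 80% of its raters judged it solid
venn_flags <- word_scores |>
  mutate(across(c(solidity, count_mass, organizing_feature), ~ .x >= .80))
```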
To test for distributional differences in shape responses across languages, we conducted a permutation test (see Methods). The observed density curves for each language fell within the null distribution generated by permuting language labels, indicating no significant differences in lexical statistics across languages (Figure \@ref(fig:permutated_dist_eng_1)).
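A simplified version of this permutation scheme, using per-language means in place of the full density curves and hypothetical names, might look like:

```r
# `word_lang` is a hypothetical data frame with columns `language` and
# `prop_shape` (per-word proportion of "shape" responses). The reported
# analysis compares full density curves; means are used here for brevity.
observed <- tapply(word_lang$prop_shape, word_lang$language, mean)

null_means <- replicate(1000, {
  permuted <- sample(word_lang$language)  # break the word-language pairing
  tapply(word_lang$prop_shape, permuted, mean)
})

# A language would differ reliably only if its observed mean fell outside
# the central 95% of its permuted means
null_ci <- apply(null_means, 1, quantile, probs = c(.025, .975))
```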
## Discussion

The emergence of the shape bias as a word-learning strategy in English-speaking children in the US is thought to be a product of the distribution of early-learned count nouns, which correspond to solid objects that happen to have similar shapes. Children's attention is thus tuned to shape as the relevant feature for categorizing and correctly extending nouns to new members of the referent category. Even a small set of noun referents that follows this distribution and its correspondences is sufficient to trigger the shape bias in toddlers. Following from this, we expected to find significant cross-linguistic variation in the distribution of nouns across these three dimensions, which would then predict differences in the emergence of the shape bias across languages. However, our results showed that lexical statistics were remarkably consistent across languages. Across the three dimensions, the distributions of all languages look alike and do not differ significantly from the null distribution under a permutation test. Correspondence between ontological status, syntactic frame, and shape organization was also weak. One possibility is that the question about the organizing feature was not clear or intuitive for naive participants, and that the instructions were ambiguous, which would explain why solidity and countability correspond to each other but less so with shape ratings. We test the reliability of this measure in Study 1B.
The organizing feature of an object category is not an intuitive notion, and it is not something people actively think about when recognizing category members: when searching for apples, they do not break the category down into features of apple shape, red color, and leaf material and then decide which feature is most diagnostic of apples. It is therefore possible that the instructions were not clear enough for participants to understand what was being asked. To test the reliability of this measure, we collected a second sample of ratings on the same set of words with improved instructions and examples.
A further 377 English-speaking adults recruited online from the Prolific platform (mean age = 41 years, SD = 14) were asked to rate words on the category-organizing feature (shape, color, material, none of these).
We used the same set of words as in Study 1A; however, only the category-organizing feature block was included in this study.
The experiment started with familiarization trials explaining what a category-organizing (characterizing) feature means. This time, however, the instructions were improved with more examples, picture depictions, and training questions with feedback and clarifications covering all four feature answers, so as not to bias participants toward a specific answer, as in Table [2]. The rest of the study was identical to Study 1A.
We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.
We used R (Version 4.4.2; R Core Team, 2024) and the R-packages brms (Version 2.22.0; Bürkner, 2017, 2018, 2021), dplyr (Version 1.1.4; Wickham, François, Henry, Müller, & Vaughan, 2023), forcats (Version 1.0.0; Wickham, 2023a), GGally (Version 2.3.0; Schloerke et al., 2025), ggplot2 (Version 3.5.2; Wickham, 2016), ggrepel (Version 0.9.6; Slowikowski, 2024), ggvenn (Version 0.1.10; Yan, 2023), glue (Version 1.8.0; Hester & Bryan, 2024), gridExtra (Version 2.3; Auguie, 2017), here (Version 1.0.1; Müller, 2020), irr (Version 0.84.1; Gamer, Lemon, Fellows, & Singh, 2019), janitor (Version 2.2.1; Firke, 2024), jsonlite (Version 2.0.0; Ooms, 2014), lme4 (Version 1.1.37; Bates, Mächler, Bolker, & Walker, 2015), lmerTest (Version 3.1.3; Kuznetsova, Brockhoff, & Christensen, 2017), lpSolve (Version 5.6.23; Berkelaar, 2024), lubridate (Version 1.9.4; Grolemund & Wickham, 2011), Matrix (Version 1.7.2; Bates, Maechler, & Jagan, 2025), modelr (Version 0.1.11; Wickham, 2023b), papaja (Version 0.1.3; Aust & Barth, 2024), performance (Version 0.15.0; Lüdecke, Ben-Shachar, Patil, Waggoner, & Makowski, 2021), psych (Version 2.5.6; Revelle, 2025), purrr (Version 1.1.0; Wickham & Henry, 2025), quanteda (Version 4.3.1; Benoit et al., 2018), Rcpp (Eddelbuettel & Balamuta, 2018; Version 1.1.0; Eddelbuettel & François, 2011), readr (Version 2.1.5; Wickham, Hester, & Bryan, 2024), skimr (Version 2.1.5; Waring et al., 2022), stringr (Version 1.5.1; Wickham, 2023c), tibble (Version 3.3.0; Müller & Wickham, 2025), tidyr (Version 1.3.1; Wickham, Vaughan, & Girlich, 2024), tidyverse (Version 2.0.0; Wickham et al., 2019), tinylabels (Version 0.2.5; Barth, 2025) and tmcn (Version 0.2.13; Li, 2019) for all our analyses.