Introduction

Study 1:

Methods

In Study 1, we aimed to replicate Samuelson (1999) by collecting judgment ratings on early-learned words across the same three dimensions: solidity, count-mass syntax, and shape-based categorization. Whereas the original ratings covered the first 300 words learned in US English according to the MCDI, the present study collected ratings on the first 300 words learned by children in sixteen languages, including English.

Participants

We recruited 377 English-speaking adults online through the Prolific platform (mean age = 41, SD = 14). Participants rated words on three dimensions: solidity (solid, non-solid, unclear), count-mass noun syntax (count, mass, unclear), and organizing feature (shape, color, material, none of these).

Material

Nouns were sourced from MCDI data in the WordBank repository for 16 languages belonging to seven language families: (LIST OF LANGUAGES). For each language, nouns were chosen based on the earliest age of acquisition in the dataset, up to a maximum of 300 nouns (CITE). Because we recruited only English-speaking participants, nouns were presented in English and mapped across languages via conceptual mappings (“unilemmas”; Frank et al., 2021). A unilemma is the concept behind a noun, expressed in English. For example, the English word “dog” and its German equivalent both map to the same underlying concept, so “dog” serves as the unilemma for both.

The final sample consisted of a total of 23,355 nouns across all languages, which reduced to approximately 664 unique items (N ≈ 664) after removing duplicates, that is, unilemmas shared between languages. (Figure to show how much overlap between languages). A breakdown of the semantic categories of the words in each language is provided in the supplementary materials. Most words belonged to the categories of food and drink, clothing, and animals, with a few words referring to vehicles, toys, body parts, household items, and outside things.
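The deduplication step described above can be illustrated with a minimal Python sketch. This is not the authors' pipeline (the analyses reported in this manuscript were run in R); the toy entries, the tuple layout, and the function names `unique_unilemmas` and `languages_per_unilemma` are all hypothetical and serve only to show how language-specific word forms collapse onto shared unilemmas.

```python
# Hypothetical toy data: (language, word form, unilemma) triples.
# These entries are illustrative only and are not taken from the study.
entries = [
    ("English", "dog", "dog"),
    ("German", "Hund", "dog"),
    ("Spanish", "perro", "dog"),
    ("English", "milk", "milk"),
    ("German", "Milch", "milk"),
    ("Spanish", "agua", "water"),
]


def unique_unilemmas(rows):
    """Return the sorted set of unilemmas, counting shared concepts once."""
    return sorted({unilemma for _, _, unilemma in rows})


def languages_per_unilemma(rows):
    """Map each unilemma to the set of languages whose list contains it."""
    overlap = {}
    for language, _, unilemma in rows:
        overlap.setdefault(unilemma, set()).add(language)
    return overlap


items = unique_unilemmas(entries)
overlap = languages_per_unilemma(entries)

print(len(entries), "language-specific nouns ->", len(items), "unique unilemmas")
for unilemma in items:
    print(unilemma, sorted(overlap[unilemma]))
```

The per-unilemma language sets are the kind of summary that a figure of overlap between languages could be built from.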

Procedure

The experiment started with familiarization trials in which participants were shown examples of count and mass nouns, solid objects, and non-solid objects. They were also familiarized with what a category-organizing or characterizing feature means (see Table [2]). The test trials then followed the same wording and structure. Each dimension (solidity, count-mass syntax, and category-organizing feature) constituted a block, and every participant rated 20 words per block, for 60 test trials in total. The order of familiarization trials, test blocks, and words within a block was randomized across participants. Participants received no information about the semantic category of the words, and words were not grouped by semantic category. Participants could select only one response option per question.
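As a rough illustration of the trial structure described above, the following Python sketch builds one participant's test list of three blocks with 20 words each. It rests on assumptions not stated in the text: words are sampled independently for each block, and the block labels, the `build_trials` helper, and the placeholder word pool are hypothetical.

```python
import random

# Block labels are hypothetical names for the three rated dimensions.
BLOCKS = ["solidity", "count_mass_syntax", "organizing_feature"]
WORDS_PER_BLOCK = 20


def build_trials(word_pool, rng):
    """Build one participant's test trials: 3 blocks x 20 words.

    Assumption: words are drawn independently for each block; the text
    does not say whether a participant saw the same words in every block.
    """
    block_order = BLOCKS[:]
    rng.shuffle(block_order)  # randomize block order per participant
    trials = []
    for block in block_order:
        # sample() draws without replacement and returns a random order,
        # so word order within the block is randomized as well.
        words = rng.sample(word_pool, WORDS_PER_BLOCK)
        trials.extend((block, word) for word in words)
    return trials


# Placeholder pool standing in for the set of unique unilemmas.
pool = [f"word_{i}" for i in range(664)]
trials = build_trials(pool, random.Random(1))
print(len(trials), "test trials")
```

Passing a seeded `random.Random` instance keeps each participant's list reproducible without touching the global random state.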

Data analysis

Results

We collected a total of 20,130 ratings from 377 participants for 664 words in three blocks (solidity, count-mass syntax, and category-organizing feature). After removing trials with no response, we retained 19,635 ratings. Each word received, on average, between 8 and 10 ratings per block. The supplementary materials include the demographic details of the participants (Table 1) and the distribution of ratings across words and blocks (Figure 1).

Ratings were highly consistent across languages, suggesting limited cross-linguistic variation in lexical statistics (Figure \(\ref{fig:correspondenceeng1}\)). Nouns were predominantly rated as solid (M = 0.72, SD = 0.32) and count (M = 0.68, SD = 0.29), but were less reliably judged as being organized by shape (M = 0.31, SD = 0.24). Pearson correlation analysis revealed a moderate positive correlation between solidity and count syntax (r = 0.62, p < 0.001), a weaker positive correlation between count syntax and shape-based categorization (r = 0.45, p < 0.001), and the weakest correlation between solidity and shape-based categorization (r = 0.38, p < 0.001). The Venn diagram in Figure \(\ref{fig:venn_eng_1}\) classifies words as solid, count, and shape-characterized using a cutoff of 80%; that is, a word rated as solid by at least 80% of participants is categorized as solid. The diagram shows that a large number of nouns were consistently rated as both solid and count, with more overlap between these two dimensions than between either of them and shape: only a minority of words were rated as referring to categories organized by shape.
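The 80% cutoff and the correlation between dimensions can be sketched in a few lines of Python. This is a minimal illustration rather than the analysis code behind the reported values (which was run in R): the per-word proportions below are made-up toy numbers, and the names `classify` and `pearson` are hypothetical.

```python
from math import sqrt

CUTOFF = 0.80  # share of raters needed to assign a word to a category

# Hypothetical per-word proportions of raters choosing "solid", "count",
# and "shape". Toy values only, not data from the study.
props = {
    "ball": {"solid": 1.0, "count": 0.9, "shape": 0.8},
    "cup": {"solid": 0.9, "count": 1.0, "shape": 0.5},
    "milk": {"solid": 0.0, "count": 0.1, "shape": 0.0},
    "sand": {"solid": 0.2, "count": 0.0, "shape": 0.1},
    "dog": {"solid": 1.0, "count": 1.0, "shape": 0.6},
}


def classify(word_props, cutoff=CUTOFF):
    """Return the set of dimensions on which a word meets the cutoff."""
    return {dim for dim, p in word_props.items() if p >= cutoff}


def pearson(xs, ys):
    """Plain Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


labels = {word: classify(p) for word, p in props.items()}
solid = [p["solid"] for p in props.values()]
count = [p["count"] for p in props.values()]
r_solid_count = pearson(solid, count)

for word, dims in labels.items():
    print(word, sorted(dims))
print("r(solid, count) =", round(r_solid_count, 2))
```

The sets returned by `classify` correspond to regions of the Venn diagram: a word whose set contains all three dimensions falls in the central overlap, while a word with an empty set falls outside every circle.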

To test for distributional differences in shape responses across languages, we conducted a permutation test (see Methods). The observed density curve for each language fell within the null distribution generated by permuting language labels, indicating no significant differences in lexical statistics across languages (Figure \(\ref{fig:permutated_dist_eng_1}\)).
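The logic of permuting language labels can be sketched as follows. This is a simplified Python illustration under stated assumptions, not the procedure used for the figure: instead of comparing whole density curves, it uses a single scalar statistic (the range of per-language mean shape proportions), and the function names, the toy values, and the number of permutations are hypothetical.

```python
import random
from statistics import mean


def between_language_spread(values, labels):
    """Test statistic (an assumption): range of the per-language means."""
    groups = {}
    for value, label in zip(values, labels):
        groups.setdefault(label, []).append(value)
    means = [mean(group) for group in groups.values()]
    return max(means) - min(means)


def permutation_p_value(values, labels, n_perm=2000, seed=0):
    """Share of label permutations whose spread is at least the observed one."""
    rng = random.Random(seed)
    observed = between_language_spread(values, labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break any link between word and language
        if between_language_spread(values, shuffled) >= observed:
            hits += 1
    # Add-one correction so the estimated p-value is never exactly zero.
    return (hits + 1) / (n_perm + 1)


# Toy case 1: two languages with identical shape proportions.
same_values = [0.1, 0.3, 0.5, 0.1, 0.3, 0.5]
same_labels = ["A", "A", "A", "B", "B", "B"]
p_same = permutation_p_value(same_values, same_labels)

# Toy case 2: two languages with completely separated shape proportions.
diff_values = [0.0] * 10 + [1.0] * 10
diff_labels = ["A"] * 10 + ["B"] * 10
p_diff = permutation_p_value(diff_values, diff_labels)

print("identical languages: p =", p_same)
print("separated languages: p =", round(p_diff, 4))
```

A large p-value, as in the first toy case, means the observed between-language difference is typical of what label shuffling produces, which is the pattern the text reports for the real data.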

Discussion

The emergence of the shape bias as a word-learning strategy in English-speaking children in the US is thought to be a product of the distribution of early-learned count nouns, which tend to refer to solid objects that happen to share similar shapes. Children’s attention is thus tuned to shape as the relevant feature for categorizing referents and correctly extending nouns to new members of the referent category. Even a small set of noun-referent pairs that follows this distribution and its correspondences is thought to be sufficient to trigger the shape bias in toddlers. Following from this, we expected to find significant cross-linguistic variation in the distribution of nouns across these three dimensions, which would in turn predict differences in the emergence of the shape bias across languages. However, our results showed that lexical statistics were remarkably consistent across languages. Across the three dimensions, the distributions for all languages looked alike and did not differ significantly from the null distribution in a permutation test. The correspondence of shape organization with ontological status and syntactic frame was also weak. One possible explanation is that the question about the organizing feature was unclear or unintuitive for naive participants and that the instructions were ambiguous, which would explain why solidity and countability corresponded to each other but less so to shape ratings. We test the reliability of this measure in Study 1B.

Study 1B

The idea of an organizing feature of an object category is not intuitive, and it is not something people actively think about when recognizing category members. For example, when searching for apples, they do not break the category down into features such as apple shape, red color, and leaf material and then decide which feature is most diagnostic of apples. It is therefore possible that the instructions in Study 1A were not clear enough for participants to understand what was being asked. To test the reliability of this measure, we collected a second sample of ratings on the same set of words, with improved instructions and examples.

Participants

377 English-speaking adults recruited online from the Prolific platform (mean age = 41, sd = 14) were asked to rate words in three dimensions: solidity (solid, non-solid, unclear), count and mass noun syntax (count, mass, unclear), and organizing feature (shape, color, material, none of these).

Material

We used the same set of words as in Study 1A. However, only the category-organizing feature block was included in this study.

Procedure

The experiment started with familiarization trials explaining what a category-organizing or characterizing feature means. This time, however, the instructions were improved with more examples, picture depictions, and training questions with feedback and clarifications. The training covered all four feature answers so as not to bias participants toward a specific answer (see Table [2]). The rest of the procedure was the same as in Study 1A.

Data analysis

Results

## [1] 0.5022178
## [1] 0.5828712

Study 2:

Methods

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.

Participants

Material

Procedure

Data analysis

We used R (Version 4.4.2; R Core Team, 2024) and the R-packages brms (Version 2.22.0; Bürkner, 2017, 2018, 2021), dplyr (Version 1.1.4; Wickham, François, Henry, Müller, & Vaughan, 2023), forcats (Version 1.0.0; Wickham, 2023a), GGally (Version 2.3.0; Schloerke et al., 2025), ggplot2 (Version 3.5.2; Wickham, 2016), ggrepel (Version 0.9.6; Slowikowski, 2024), ggvenn (Version 0.1.10; Yan, 2023), glue (Version 1.8.0; Hester & Bryan, 2024), gridExtra (Version 2.3; Auguie, 2017), here (Version 1.0.1; Müller, 2020), irr (Version 0.84.1; Gamer, Lemon, Fellows, & Singh, 2019), janitor (Version 2.2.1; Firke, 2024), jsonlite (Version 2.0.0; Ooms, 2014), lme4 (Version 1.1.37; Bates, Mächler, Bolker, & Walker, 2015), lmerTest (Version 3.1.3; Kuznetsova, Brockhoff, & Christensen, 2017), lpSolve (Version 5.6.23; Berkelaar, 2024), lubridate (Version 1.9.4; Grolemund & Wickham, 2011), Matrix (Version 1.7.2; Bates, Maechler, & Jagan, 2025), modelr (Version 0.1.11; Wickham, 2023b), papaja (Version 0.1.3; Aust & Barth, 2024), performance (Version 0.15.0; Lüdecke, Ben-Shachar, Patil, Waggoner, & Makowski, 2021), psych (Version 2.5.6; Revelle, 2025), purrr (Version 1.1.0; Wickham & Henry, 2025), quanteda (Version 4.3.1; Benoit et al., 2018), Rcpp (Version 1.1.0; Eddelbuettel & Balamuta, 2018; Eddelbuettel & François, 2011), readr (Version 2.1.5; Wickham, Hester, & Bryan, 2024), skimr (Version 2.1.5; Waring et al., 2022), stringr (Version 1.5.1; Wickham, 2023c), tibble (Version 3.3.0; Müller & Wickham, 2025), tidyr (Version 1.3.1; Wickham, Vaughan, & Girlich, 2024), tidyverse (Version 2.0.0; Wickham et al., 2019), tinylabels (Version 0.2.5; Barth, 2025), and tmcn (Version 0.2.13; Li, 2019) for all our analyses.

Results

Discussion

References

Auguie, B. (2017). gridExtra: Miscellaneous functions for "grid" graphics.
Aust, F., & Barth, M. (2024). papaja: Prepare reproducible APA journal articles with R Markdown. http://doi.org/10.32614/CRAN.package.papaja
Barth, M. (2025). tinylabels: Lightweight variable labels. http://doi.org/10.32614/CRAN.package.tinylabels
Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. http://doi.org/10.18637/jss.v067.i01
Bates, D., Maechler, M., & Jagan, M. (2025). Matrix: Sparse and dense matrix classes and methods. Retrieved from https://CRAN.R-project.org/package=Matrix
Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). quanteda: An R package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. http://doi.org/10.21105/joss.00774
Berkelaar, M. (2024). lpSolve: Interface to ’lp_solve’ v. 5.5 to solve linear/integer programs. Retrieved from https://github.com/gaborcsardi/lpSolve
Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. http://doi.org/10.18637/jss.v080.i01
Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. http://doi.org/10.32614/RJ-2018-017
Bürkner, P.-C. (2021). Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software, 100(5), 1–54. http://doi.org/10.18637/jss.v100.i05
Eddelbuettel, D., & Balamuta, J. J. (2018). Extending R with C++: A Brief Introduction to Rcpp. The American Statistician, 72(1), 28–36. http://doi.org/10.1080/00031305.2017.1375990
Eddelbuettel, D., & François, R. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8), 1–18. http://doi.org/10.18637/jss.v040.i08
Firke, S. (2024). janitor: Simple tools for examining and cleaning dirty data. Retrieved from https://CRAN.R-project.org/package=janitor
Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019). irr: Various coefficients of interrater reliability and agreement. Retrieved from https://www.r-project.org
Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, 40(3), 1–25. Retrieved from https://www.jstatsoft.org/v40/i03/
Hester, J., & Bryan, J. (2024). glue: Interpreted string literals. Retrieved from https://glue.tidyverse.org/
Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. http://doi.org/10.18637/jss.v082.i13
Li, J. (2019). tmcn: A text mining toolkit for Chinese.
Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021). performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software, 6(60), 3139. http://doi.org/10.21105/joss.03139
Müller, K. (2020). here: A simpler way to find your files. Retrieved from https://here.r-lib.org/
Müller, K., & Wickham, H. (2025). tibble: Simple data frames. Retrieved from https://tibble.tidyverse.org/
Ooms, J. (2014). The jsonlite package: A practical and consistent mapping between JSON data and R objects. arXiv:1403.2805 [Stat.CO]. Retrieved from https://arxiv.org/abs/1403.2805
R Core Team. (2024). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/
Schloerke, B., Cook, D., Larmarange, J., Briatte, F., Marbach, M., Thoen, E., … Crowley, J. (2025). GGally: Extension to ’ggplot2’. Retrieved from https://ggobi.github.io/ggally/
Slowikowski, K. (2024). ggrepel: Automatically position non-overlapping text labels with ’ggplot2’. Retrieved from https://ggrepel.slowkow.com/
Waring, E., Quinn, M., McNamara, A., Arino de la Rubia, E., Zhu, H., & Ellis, S. (2022). skimr: Compact and flexible summaries of data. Retrieved from https://docs.ropensci.org/skimr/
Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org
Wickham, H. (2023a). forcats: Tools for working with categorical variables (factors). Retrieved from https://forcats.tidyverse.org/
Wickham, H. (2023b). modelr: Modelling functions that work with the pipe. Retrieved from https://modelr.tidyverse.org
Wickham, H. (2023c). stringr: Simple, consistent wrappers for common string operations. Retrieved from https://stringr.tidyverse.org
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. http://doi.org/10.21105/joss.01686
Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). dplyr: A grammar of data manipulation. Retrieved from https://dplyr.tidyverse.org
Wickham, H., & Henry, L. (2025). purrr: Functional programming tools. Retrieved from https://purrr.tidyverse.org/
Wickham, H., Hester, J., & Bryan, J. (2024). readr: Read rectangular text data. Retrieved from https://readr.tidyverse.org
Wickham, H., Vaughan, D., & Girlich, M. (2024). tidyr: Tidy messy data. Retrieved from https://tidyr.tidyverse.org
Revelle, W. (2025). psych: Procedures for psychological, psychometric, and personality research. Evanston, Illinois: Northwestern University. Retrieved from https://CRAN.R-project.org/package=psych
Yan, L. (2023). ggvenn: Draw Venn diagram by ’ggplot2’.