The title

First Author

Ernst-August Doelle

1,2

Abstract

One or two sentences providing a basic introduction to the field, comprehensible to a scientist in any discipline. Two to three sentences of more detailed background, comprehensible to scientists in related disciplines. One sentence clearly stating the general problem being addressed by this particular study. One sentence summarizing the main result (with the words “here we show” or their equivalent). Two or three sentences explaining what the main result reveals in direct comparison to what was thought to be the case previously, or how the main result adds to previous knowledge. One or two sentences to put the results into a more general context. Two or three sentences to provide a broader perspective, readily comprehensible to a scientist in any discipline.

Introduction

Study 1:

Methods

In Study 1, we aimed to replicate (Samuelson 1999) by collecting judgment ratings on the first 300 words learned from the US English MCDI across the same three dimensions: solidity, count and mass syntax, and shape-based categorization. The present study, however, collected these ratings on the first 300 words learned by children in sixteen languages, including English.

Participants

A total of 377 English-speaking adults (mean age = 41, SD = 14) were recruited online through the Prolific platform. They were asked to rate words on three dimensions: solidity (solid, non-solid, unclear), count and mass noun syntax (count, mass, unclear), and organizing feature (shape, color, material, none of these).

Material

Up to 300 nouns per language were sourced from MCDI data in the WordBank repository for 16 languages belonging to seven linguistic families: (LIST OF LANGUAGES). The nouns were chosen based on the earliest age of acquisition in the dataset, up to a maximum of 300 nouns (CITE). Because we recruited only English-speaking participants, nouns were presented in English and mapped across languages via conceptual mappings (“unilemmas”, Frank et al., 2021). A unilemma is the concept behind a noun, expressed in English: the English word “dog” and its German equivalent both map to the same underlying concept, so “dog” serves as the unilemma in this case.

The final sample consisted of a total of 23,355 nouns from all languages, with N = ~664 after removing duplicates (unilemmas shared between languages). (Figure to show how much overlap between languages). A breakdown of the semantic categories of the words in each language is provided in the supplementary materials. Most words belonged to the categories of food/drinks, clothing, and animals, with a few words referring to vehicles, toys, body parts, household items, and outside things.
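The deduplication step described above can be illustrated with a minimal sketch. The rows below are invented for illustration and are not taken from the dataset; the function names are likewise hypothetical. The sketch only shows how language-specific nouns that share a unilemma collapse into a single concept to be rated.

```python
# Hypothetical (language, word, unilemma) rows; real entries would come
# from the MCDI data. None of these rows are taken from the study.
items = [
    ("English", "dog", "dog"),
    ("German", "Hund", "dog"),
    ("English", "milk", "milk"),
    ("German", "Milch", "milk"),
    ("German", "Brot", "bread"),
]


def unique_unilemmas(rows):
    """Collapse language-specific nouns into the sorted set of shared concepts."""
    return sorted({unilemma for _, _, unilemma in rows})


def languages_per_unilemma(rows):
    """Map each unilemma to the set of languages whose word list contains it."""
    coverage = {}
    for language, _, unilemma in rows:
        coverage.setdefault(unilemma, set()).add(language)
    return coverage


concepts = unique_unilemmas(items)
coverage = languages_per_unilemma(items)
print(len(items), "language-specific nouns,", len(concepts), "unique unilemmas")
```

The `coverage` mapping is the kind of summary an overlap figure between languages could be built from.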

Procedure

The experiment started with familiarization trials in which participants were shown examples of count and mass nouns, solid objects, and non-solid objects. They were also familiarized with what a category-organizing or characterizing feature means, as in Table [2]. The test trials then followed the same wording and structure. Each dimension (solidity, count-mass syntax, and category-organizing feature) constituted a block, and every participant rated 20 words per block, for a total of 60 test trials. The order of familiarization trials, test blocks, and words within a block was randomized across participants. Participants received no information about the semantic category of the words, and the words were not grouped by semantic category. Participants could select only one response option per question.
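As an illustration of the trial structure, the following sketch builds one participant's test session. It is not the experiment code. It assumes that the 20 words for each block are sampled independently from the full word pool, which the procedure does not specify, and the names `build_session`, `DIMENSIONS`, and the placeholder word pool are hypothetical.

```python
import random

DIMENSIONS = ["solidity", "count_mass_syntax", "organizing_feature"]
WORDS_PER_BLOCK = 20


def build_session(word_pool, rng):
    """Return one participant's test trials as (dimension, word) pairs.

    Assumption: each block draws its own 20 words from the pool.
    """
    block_order = DIMENSIONS[:]
    rng.shuffle(block_order)  # randomize block order per participant
    trials = []
    for dimension in block_order:
        # sample() draws without replacement and returns a random order
        words = rng.sample(word_pool, WORDS_PER_BLOCK)
        trials.extend((dimension, word) for word in words)
    return trials


pool = [f"word_{i}" for i in range(664)]  # placeholder word list
session = build_session(pool, random.Random(1))
print(len(session))  # 3 blocks x 20 words = 60 trials
```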

Data analysis

Results

This is a figure caption.

We collected a total of 20,130 ratings from 377 participants for 664 words in three blocks (solidity, count-mass syntax, and category-organizing feature). After removing trials with no response, we retained 19,635 ratings. On average, each word received between 8 and 10 ratings per block. The supplementary materials include the demographic details of the participants (Table 1) and the distribution of ratings across words and blocks (Figure 1).

Ratings were highly consistent across languages, suggesting limited variation in lexical statistics, as can be seen in Figure \(\ref{fig:correspondenceeng1}\). Nouns were predominantly rated as solid (M = 0.72, SD = 0.32) and count (M = 0.68, SD = 0.29), but were less reliably judged as being organized by shape (M = 0.31, SD = 0.24). Pearson correlation analysis revealed a moderate positive correlation between solidity and count syntax (r = 0.62, p < 0.001), a weaker positive correlation between count syntax and shape-based categorization (r = 0.45, p < 0.001), and the weakest correlation between solidity and shape-based categorization (r = 0.38, p < 0.001). The Venn diagram in Figure \(\ref{fig:venn_eng_1}\) classifies words as solid, count noun, or shape-characterized using a cutoff of 80%; that is, a word rated as solid by at least 80% of participants is categorized as solid. The diagram shows that a large number of nouns were consistently rated as both solid and count nouns, with more overlap between these two dimensions than between either of them and shape; only a minority of words were rated as referring to objects organized by shape.
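The 80% cutoff and the correlation between dimensions can be sketched as follows. The per-word proportions are invented for illustration and the function names are hypothetical; the analysis itself was run in R, so this is only a compact restatement of the logic, not the analysis code.

```python
from math import sqrt

CUTOFF = 0.80

# Hypothetical proportions of raters choosing solid / count / shape per word.
ratings = {
    "ball": {"solid": 0.95, "count": 0.90, "shape": 0.85},
    "milk": {"solid": 0.05, "count": 0.10, "shape": 0.00},
    "shirt": {"solid": 0.90, "count": 0.85, "shape": 0.40},
    "sand": {"solid": 0.30, "count": 0.05, "shape": 0.10},
}


def classify(proportions, cutoff=CUTOFF):
    """Return the dimensions on which a word reaches the agreement cutoff."""
    return {dim for dim, p in proportions.items() if p >= cutoff}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sd_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    sd_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sd_x * sd_y)


labels = {word: classify(p) for word, p in ratings.items()}
r_solid_count = pearson(
    [p["solid"] for p in ratings.values()],
    [p["count"] for p in ratings.values()],
)
print(labels["shirt"])  # solid and count, but not shape
```

Words whose label set contains all three dimensions would fall in the central region of the Venn diagram.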

To test for distributional differences in shape responses across languages, we conducted a permutation test (see Methods). The observed density curve for each language fell within the null distribution generated by permuting language labels, indicating no significant differences in lexical statistics across languages (Figure \(\ref{fig:permutated_dist_eng_1}\)).
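A label-permutation test of this kind can be sketched as below. This is a simplified stand-in under stated assumptions rather than the procedure used in the study: it replaces kernel density curves with binned histograms, uses a pointwise 95% envelope without any correction for multiple bins, and runs on invented data. All function names are hypothetical.

```python
import random

N_BINS = 5


def histogram(values, n_bins=N_BINS):
    """Proportion of values falling in each equal-width bin on [0, 1]."""
    counts = [0] * n_bins
    for v in values:
        counts[min(int(v * n_bins), n_bins - 1)] += 1
    return [c / len(values) for c in counts]


def curves_by_language(labels, values):
    """Binned distribution of shape proportions for each language label."""
    groups = {}
    for lang, v in zip(labels, values):
        groups.setdefault(lang, []).append(v)
    return {lang: histogram(vs) for lang, vs in groups.items()}


def permutation_envelope(labels, values, n_perm=1000, alpha=0.05, seed=0):
    """Pointwise null envelope per language from shuffled language labels."""
    rng = random.Random(seed)
    null = {lang: [[] for _ in range(N_BINS)] for lang in set(labels)}
    shuffled = list(labels)
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps group sizes, breaks the language link
        for lang, curve in curves_by_language(shuffled, values).items():
            for b, height in enumerate(curve):
                null[lang][b].append(height)
    lo_i = int(alpha / 2 * n_perm)
    hi_i = int((1 - alpha / 2) * n_perm) - 1
    return {
        lang: [(sorted(b)[lo_i], sorted(b)[hi_i]) for b in bins]
        for lang, bins in null.items()
    }


def bins_outside_envelope(labels, values, **kwargs):
    """Count, per language, the bins where the observed curve leaves the envelope."""
    observed = curves_by_language(labels, values)
    envelope = permutation_envelope(labels, values, **kwargs)
    return {
        lang: sum(not (lo <= h <= hi) for h, (lo, hi) in zip(observed[lang], envelope[lang]))
        for lang in observed
    }


# Invented data: two languages with deliberately different distributions.
demo_labels = ["A"] * 30 + ["B"] * 30
demo_values = [0.1] * 30 + [0.9] * 30
demo_result = bins_outside_envelope(demo_labels, demo_values, n_perm=500)
print(demo_result)
```

A language whose count is zero has an observed curve that stays inside the null envelope in every bin, which corresponds to the pattern reported above.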

Discussion

The emergence of the shape bias as a word-learning strategy in English-speaking children in the US is thought to be a product of the distribution of early-learned count nouns, which tend to refer to solid objects that happen to share similar shapes. Children's attention is thus tuned to shape as the relevant feature for categorizing objects and correctly extending nouns to new members of the referent category. Even a small set of noun-referent pairs that follow this distribution and its correspondences is sufficient to trigger the shape bias in toddlers. Following from this, we expected to find significant cross-linguistic variation in the distribution of nouns across these three dimensions, which would in turn predict differences in the emergence of the shape bias across languages. However, our results showed that lexical statistics were remarkably consistent across languages. Across the three dimensions, the distributions for all languages look alike, and a permutation test showed that they do not differ significantly from the null distribution. The correspondence between ontological status, syntactic frame, and shape organization was also very weak. One possible explanation is that the question about the organizing feature was unclear or unintuitive for naive participants and that the instructions were ambiguous, which would explain why solidity and countability correspond to each other but less so to shape ratings. We test the reliability of this measure in Study 1B.

Study 1B

The organizing feature of an object category is not an intuitive idea, and it is not something that people actively think about when recognizing category members. For example, when searching for apples, people do not break the category down into features such as apple shape, red color, and leaf material and then decide which feature is most diagnostic of apples. It is therefore possible that the instructions were not clear enough for participants to understand what was being asked. To test the reliability of this measure, we collected a second sample of ratings on the same set of words, with improved instructions and examples.

Participants

Material

We used the same set of words as in Study 1A. However, only the category-organizing feature block was included in this study.

Procedure

The experiment started with familiarization trials explaining what a category-organizing or characterizing feature means. This time, however, the instructions were improved by adding more examples, picture depictions, and training questions with feedback and clarifications. The training covered all four feature answers so as not to bias participants toward a specific answer, as in Table [2]. The rest of the procedure was the same as in Study 1A.

Data analysis

Results

## [1] 0.5022178

## [1] 0.5828712

Study 2:

Methods

We report how we determined our sample size, all data exclusions (if any), all manipulations, and all measures in the study.

Participants

Material

Procedure

Data analysis

We used R (Version 4.4.2; R Core Team, 2024) and the R-packages brms (Version 2.22.0; Bürkner, 2017, 2018, 2021), dplyr (Version 1.1.4; Wickham, François, Henry, Müller, & Vaughan, 2023), forcats (Version 1.0.0; Wickham, 2023a), GGally (Version 2.3.0; Schloerke et al., 2025), ggplot2 (Version 3.5.2; Wickham, 2016), ggrepel (Version 0.9.6; Slowikowski, 2024), ggvenn (Version 0.1.10; Yan, 2023), glue (Version 1.8.0; Hester & Bryan, 2024), gridExtra (Version 2.3; Auguie, 2017), here (Version 1.0.1; Müller, 2020), irr (Version 0.84.1; Gamer, Lemon, Fellows, & Singh, 2019), janitor (Version 2.2.1; Firke, 2024), jsonlite (Version 2.0.0; Ooms, 2014), lme4 (Version 1.1.37; Bates, Mächler, Bolker, & Walker, 2015), lmerTest (Version 3.1.3; Kuznetsova, Brockhoff, & Christensen, 2017), lpSolve (Version 5.6.23; Berkelaar, 2024), lubridate (Version 1.9.4; Grolemund & Wickham, 2011), Matrix (Version 1.7.2; Bates, Maechler, & Jagan, 2025), modelr (Version 0.1.11; Wickham, 2023b), papaja (Version 0.1.3; Aust & Barth, 2024), performance (Version 0.15.0; Lüdecke, Ben-Shachar, Patil, Waggoner, & Makowski, 2021), psych (Version 2.5.6; William Revelle, 2025), purrr (Version 1.1.0; Wickham & Henry, 2025), quanteda (Version 4.3.1; Benoit et al., 2018), Rcpp (Version 1.1.0; Eddelbuettel & Balamuta, 2018; Eddelbuettel & François, 2011), readr (Version 2.1.5; Wickham, Hester, & Bryan, 2024), skimr (Version 2.1.5; Waring et al., 2022), stringr (Version 1.5.1; Wickham, 2023c), tibble (Version 3.3.0; Müller & Wickham, 2025), tidyr (Version 1.3.1; Wickham, Vaughan, & Girlich, 2024), tidyverse (Version 2.0.0; Wickham et al., 2019), tinylabels (Version 0.2.5; Barth, 2025), and tmcn (Version 0.2.13; Li, 2019) for all our analyses.

Results

Discussion

References

Auguie, B. (2017). gridExtra: Miscellaneous functions for "grid" graphics.

Aust, F., & Barth, M. (2024). papaja: Prepare reproducible APA journal articles with R Markdown. http://doi.org/10.32614/CRAN.package.papaja

Barth, M. (2025). tinylabels: Lightweight variable labels. http://doi.org/10.32614/CRAN.package.tinylabels

Bates, D., Mächler, M., Bolker, B., & Walker, S. (2015). Fitting linear mixed-effects models using lme4. Journal of Statistical Software, 67(1), 1–48. http://doi.org/10.18637/jss.v067.i01

Bates, D., Maechler, M., & Jagan, M. (2025). Matrix: Sparse and dense matrix classes and methods. Retrieved from https://CRAN.R-project.org/package=Matrix

Benoit, K., Watanabe, K., Wang, H., Nulty, P., Obeng, A., Müller, S., & Matsuo, A. (2018). Quanteda: An r package for the quantitative analysis of textual data. Journal of Open Source Software, 3(30), 774. http://doi.org/10.21105/joss.00774

Berkelaar, M. (2024). lpSolve: Interface to ’lp_solve’ v. 5.5 to solve linear/integer programs. Retrieved from https://github.com/gaborcsardi/lpSolve

Bürkner, P.-C. (2017). brms: An R package for Bayesian multilevel models using Stan. Journal of Statistical Software, 80(1), 1–28. http://doi.org/10.18637/jss.v080.i01

Bürkner, P.-C. (2018). Advanced Bayesian multilevel modeling with the R package brms. The R Journal, 10(1), 395–411. http://doi.org/10.32614/RJ-2018-017

Bürkner, P.-C. (2021). Bayesian item response modeling in R with brms and Stan. Journal of Statistical Software, 100(5), 1–54. http://doi.org/10.18637/jss.v100.i05

Eddelbuettel, D., & Balamuta, J. J. (2018). Extending R with C++: A Brief Introduction to Rcpp. The American Statistician, 72(1), 28–36. http://doi.org/10.1080/00031305.2017.1375990

Eddelbuettel, D., & François, R. (2011). Rcpp: Seamless R and C++ integration. Journal of Statistical Software, 40(8), 1–18. http://doi.org/10.18637/jss.v040.i08

Firke, S. (2024). Janitor: Simple tools for examining and cleaning dirty data. Retrieved from https://CRAN.R-project.org/package=janitor

Gamer, M., Lemon, J., Fellows, I., & Singh, P. (2019). Irr: Various coefficients of interrater reliability and agreement. Retrieved from https://www.r-project.org

Grolemund, G., & Wickham, H. (2011). Dates and times made easy with lubridate. Journal of Statistical Software, 40(3), 1–25. Retrieved from https://www.jstatsoft.org/v40/i03/

Hester, J., & Bryan, J. (2024). Glue: Interpreted string literals. Retrieved from https://glue.tidyverse.org/

Kuznetsova, A., Brockhoff, P. B., & Christensen, R. H. B. (2017). lmerTest package: Tests in linear mixed effects models. Journal of Statistical Software, 82(13), 1–26. http://doi.org/10.18637/jss.v082.i13

Li, J. (2019). Tmcn: A text mining toolkit for chinese.

Lüdecke, D., Ben-Shachar, M. S., Patil, I., Waggoner, P., & Makowski, D. (2021). performance: An R package for assessment, comparison and testing of statistical models. Journal of Open Source Software, 6(60), 3139. http://doi.org/10.21105/joss.03139

Müller, K. (2020). Here: A simpler way to find your files. Retrieved from https://here.r-lib.org/

Müller, K., & Wickham, H. (2025). Tibble: Simple data frames. Retrieved from https://tibble.tidyverse.org/

Ooms, J. (2014). The jsonlite package: A practical and consistent mapping between JSON data and r objects. arXiv:1403.2805 [Stat.CO]. Retrieved from https://arxiv.org/abs/1403.2805

R Core Team. (2024). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from https://www.R-project.org/

Schloerke, B., Cook, D., Larmarange, J., Briatte, F., Marbach, M., Thoen, E., … Crowley, J. (2025). GGally: Extension to ’ggplot2’. Retrieved from https://ggobi.github.io/ggally/

Slowikowski, K. (2024). Ggrepel: Automatically position non-overlapping text labels with ’ggplot2’. Retrieved from https://ggrepel.slowkow.com/

Waring, E., Quinn, M., McNamara, A., Arino de la Rubia, E., Zhu, H., & Ellis, S. (2022). Skimr: Compact and flexible summaries of data. Retrieved from https://docs.ropensci.org/skimr/

Wickham, H. (2016). ggplot2: Elegant graphics for data analysis. Springer-Verlag New York. Retrieved from https://ggplot2.tidyverse.org

Wickham, H. (2023a). Forcats: Tools for working with categorical variables (factors). Retrieved from https://forcats.tidyverse.org/

Wickham, H. (2023b). Modelr: Modelling functions that work with the pipe. Retrieved from https://modelr.tidyverse.org

Wickham, H. (2023c). Stringr: Simple, consistent wrappers for common string operations. Retrieved from https://stringr.tidyverse.org

Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. http://doi.org/10.21105/joss.01686

Wickham, H., François, R., Henry, L., Müller, K., & Vaughan, D. (2023). Dplyr: A grammar of data manipulation. Retrieved from https://dplyr.tidyverse.org

Wickham, H., & Henry, L. (2025). Purrr: Functional programming tools. Retrieved from https://purrr.tidyverse.org/

Wickham, H., Hester, J., & Bryan, J. (2024). Readr: Read rectangular text data. Retrieved from https://readr.tidyverse.org

Wickham, H., Vaughan, D., & Girlich, M. (2024). Tidyr: Tidy messy data. Retrieved from https://tidyr.tidyverse.org

William Revelle. (2025). Psych: Procedures for psychological, psychometric, and personality research. Evanston, Illinois: Northwestern University. Retrieved from https://CRAN.R-project.org/package=psych

Yan, L. (2023). Ggvenn: Draw venn diagram by ’ggplot2’.