This study investigated whether 4- to 8-year-olds are sensitive to sampling information in making inferences about a social group, i.e., whether they can adjust their inferences after seeing a skewed sample of group members.
We preregistered that children would be sensitive, i.e., be more likely to report that a novel group population is taller than the sample observed, after seeing a sample selected for being short.
In a deviation from our preregistration, we included children who failed one of the warmup questions (“the same” warmup), since it proved more difficult than expected. The results are qualitatively the same if we exclude these participants.
Pending video validation, we confirmed our prediction: participants were more likely to choose the population as taller in the skewed condition than the not skewed condition.
The study was preregistered on OSF.
See 1a_children_power_analysis.html.
Data were collected from 156 children recruited via PANDA on November 6-7, 2025. Participants were required to be in the United States.
Participants were paid $10 for an estimated 10-15 minute task.
The final sample included 134 children (n = 63-71 in each of the 2 conditions).
| boarding | participants |
|---|---|
| not skewed | 71 |
| skewed | 63 |
A total of 22 participants (14.1% of all participants) were excluded for meeting at least 1 of the following exclusion criteria:
failing the sound check (n = 1 participants)
failing the “taller” of the two warmup questions (n = 2 participants)
In a deviation from our preregistration, where we intended to exclude either failure, participants were not excluded for failing “the same” warmup question, as this question was unexpectedly challenging (n = 56 participants failed).
failing the memory check (n = 18 participants)
failing the comprehension check (n = 4 participants)
not passing video validation, e.g., no parent or child in frame for entire duration of video, parental interference (n = TBD participants)
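The exclusion arithmetic above can be checked with a short sketch. The variable names are illustrative, and video-validation exclusions are left out because their count is still TBD. The per-criterion counts sum to more than the number of excluded children, which is expected when some children meet more than one criterion.

```r
# Counts reported in this section (variable names are illustrative)
n_recruited <- 156
n_excluded  <- 22
criterion_counts <- c(sound_check = 1, taller_warmup = 2,
                      memory_check = 18, comprehension_check = 4)

n_final      <- n_recruited - n_excluded
pct_excluded <- round(100 * n_excluded / n_recruited, 1)

# Criterion hits beyond the number of excluded children imply that
# some children met more than one exclusion criterion
n_extra_hits <- sum(criterion_counts) - n_excluded

cat("final n:", n_final, "| % excluded:", pct_excluded,
    "| extra criterion hits:", n_extra_hits, "\n")
```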
We asked participants questions designed to elicit a “taller” and a “the same” response to get participants comfortable with answering either option.
The order of these two questions was counterbalanced. The order of options within each question was fixed, and not counterbalanced.
Participants mostly answered the taller warmup correctly.
Participants who made incorrect responses were excluded.
Unexpectedly, performance on “the same” warmup question was poor: many children thought the chicken was taller, perhaps due to the chicken’s comb. Participants who failed this question were therefore included, in a deviation from the preregistration.
Participants mostly passed the memory check for the Quaffa boarding sequence, i.e., “no”, not all the Quaffas made it onto the boat.
Participants who made incorrect responses were excluded.
Participants overwhelmingly passed the comprehension check for the Zarpie boarding sequence. Note the correct answer to this question depends on condition:
In the skewed condition, the correct answer is “no”, not all of the Zarpies made it onto the boat.
In the not skewed condition, the correct answer is “yes”, all of the Zarpies made it onto the boat.
Note that there are some non-responses (NAs), because I forgot to require a response on this question in Qualtrics. These participants were included below, but excluding them does not change anything.
| age (mean) | age (sd) | n |
|---|---|---|
| 6.56 | 1.38 | 134 |
| gender | n | prop |
|---|---|---|
| female | 69 | 51.5% |
| male | 65 | 48.5% |
| race | n | prop |
|---|---|---|
| Caucasian | 76 | 56.7% |
| Asian, Caucasian | 18 | 13.4% |
| Asian | 17 | 12.7% |
| Caucasian, Hispanic | 7 | 5.2% |
| African American | 5 | 3.7% |
| Hispanic | 5 | 3.7% |
| African American, Caucasian | 3 | 2.2% |
| Asian, Hispanic | 2 | 1.5% |
| South American | 1 | 0.7% |
| gender | race | n | prop |
|---|---|---|---|
| female | African American | 4 | 3.0% |
| female | African American, Caucasian | 2 | 1.5% |
| female | Asian | 5 | 3.7% |
| female | Asian, Caucasian | 7 | 5.2% |
| female | Asian, Hispanic | 2 | 1.5% |
| female | Caucasian | 42 | 31.3% |
| female | Caucasian, Hispanic | 6 | 4.5% |
| female | Hispanic | 1 | 0.7% |
| male | African American | 1 | 0.7% |
| male | African American, Caucasian | 1 | 0.7% |
| male | Asian | 12 | 9.0% |
| male | Asian, Caucasian | 11 | 8.2% |
| male | Caucasian | 34 | 25.4% |
| male | Caucasian, Hispanic | 1 | 0.7% |
| male | Hispanic | 4 | 3.0% |
| male | South American | 1 | 0.7% |
| education | n | prop |
|---|---|---|
| High school/GED | 4 | 3.0% |
| Some college | 16 | 11.9% |
| Bachelor's (B.A., B.S.) | 44 | 32.8% |
| Master's (M.A., M.S.) | 49 | 36.6% |
| Doctoral (Ph.D., J.D., M.D.) | 20 | 14.9% |
| NA | 1 | 0.7% |
This study was administered as a Qualtrics survey, and approved by the NYU IRB (IRB-FY2024-9169).
After providing their consent, participants completed a captcha and sound check, and were asked to watch the videos with sound on. Participants then watched the following videos in order:
In the warmup phase, participants were familiarized with answering questions about height in terms of who is taller or whether they are the same height. Participants saw a duck and a chicken of the same height appear on screen against a grid, and were asked who is taller: the duck, the chicken, or are they the same. The same question was asked about a giraffe and a bunny, where the giraffe is in fact taller. The order of these two questions was counterbalanced.
In the prior setting and familiarization phase, participants saw a photorealistic picture of 5 human adults and then another picture of a different 5 adults appear on screen against a grid. These adults were all 10 gridline units tall.
In the boat training phase, participants were shown a parade of fictional animals attempting to board the boat, to illustrate how the boat works. In the skewed condition, the boat was 6 units tall. In the not skewed condition, the boat was 10 units tall.
The boat height was specified to be accidental (“When the boat builders were building the boat, they started building the boat from the bottom, but ran out of the special wood they needed for the boat! So the boat ended up being this tall. It might be hard for anyone who is taller than the boat to get on the boat.”), to avoid any justificatory reasoning about the height of the boat being informative about the height of Zarpies or vice versa.
To communicate how the boat functions to exclude those taller than the boat, participants then watched a parade of 20 fictional animals (Quaffas, taken from Foster-Hanson et al., 2019) attempt to board the boat, one at a time, from shortest to tallest.
The heights of the animals were scaled to the height of the boat, such that 10 animals were always shorter than the boat (these animals boarded successfully) and 10 animals were always taller than the boat (all but one were unable to board; the third Quaffa successfully boards by bending its head).
In the boat boarding phase, participants learned that Zarpies live on Zarpie island, and saw an island with many Zarpies overhead. Participants learned that all the grown-up Zarpies’ names were put into a hat, and some of their names “were drawn out of a hat to try and visit us”. Participants then saw a parade of Zarpies attempt to board the boat to visit us, one at a time. Participants were told that they were all grown-up Zarpies. The boarding phase was occluded, i.e., the heights of Zarpies were hidden behind a curtain that showed only their feet.
In the skewed condition, the boat is 6 units tall. 20 Zarpies attempt to board, 6 of whom successfully make it on (6 out of 20 successful = 30% successful). Of the 6 who make it on, 2 had to stoop to board.
In the not skewed condition, the boat is 10 units tall. 6 Zarpies attempt to board, all of whom successfully make it on (6 out of 6 successful = 100% successful). Of the 6 who board, none had to stoop to board.
After the boat boarding phase, participants were asked a comprehension check: “Did all of the Zarpies board the boat?” (yes/no), and received either an affirmation (if they said “no” in the skewed condition, or “yes” in the not skewed condition) or correction (if they said “yes” in the skewed condition, or “no” in the not skewed condition).
In the sample observation phase, all participants saw the Zarpies who successfully boarded the boat get off the boat to visit us. The Zarpies got off one at a time, and each waved/descrunched if relevant. The heights in this observed sample (4, 5, 6, 6, 7, 8) were held constant across conditions.
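As a quick consistency check on the design, the sketch below recomputes the boarding rates and the number of stooping Zarpies from the values stated above. It assumes a Zarpie has to stoop only when strictly taller than the boat; that rule is an assumption of this sketch, although it reproduces the reported counts.

```r
# Design values stated in the text; the vector names are illustrative
sample_heights <- c(4, 5, 6, 6, 7, 8)            # observed sample, in grid units
boat_height    <- c(skewed = 6, not_skewed = 10)
attempted      <- c(skewed = 20, not_skewed = 6)
boarded        <- c(skewed = 6, not_skewed = 6)

# Assumption: a Zarpie stoops only if strictly taller than the boat
n_stooping  <- sapply(boat_height, function(h) sum(sample_heights > h))
pct_boarded <- 100 * boarded / attempted

print(rbind(n_stooping, pct_boarded))
```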
Participants were asked a single DV:
Finally, participants’ parent or guardian was asked about any problems or confusion they had and for demographic information.
Participants were explicitly asked to compare the population and the sample: “Who is taller? The Zarpies on Zarpie island, the Zarpies who visited, or are they the same?” The order of the first two options was counterbalanced across participants.
We pre-registered that if children do adjust, they should be more likely to say “Zarpies on Zarpie island” in the skewed condition than the not skewed condition.
##
## Fisher's Exact Test for Count Data
##
## data: dv_comp_table
## p-value = 0.004556
## alternative hypothesis: two.sided
As predicted, overall, participants provided different responses to the question “Who’s taller?” in the skewed condition versus not skewed condition (Fisher’s test, p = 0.005).
##
## Call:
## lm(formula = dv_comp_pop ~ boarding, data = data %>% mutate(dv_comp_pop = case_when(dv_comp ==
## "Zarpies on Zarpie island" ~ 1, TRUE ~ 0)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.4921 -0.2253 -0.2253 0.5079 0.7746
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.22535 0.05480 4.113 0.0000684 ***
## boardingskewed 0.26671 0.07992 3.337 0.0011 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4617 on 132 degrees of freedom
## Multiple R-squared: 0.07782, Adjusted R-squared: 0.07083
## F-statistic: 11.14 on 1 and 132 DF, p-value: 0.001099
As predicted, specifically, participants were more likely to say that Zarpies on Zarpie island (the population) are taller in the skewed condition than in the not skewed condition (t(132) = 3.34, p = 0.001).
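To make the linear probability model easier to read, here is a minimal sketch showing that the intercept is the proportion choosing the population in the not skewed condition and the slope is the difference in proportions. The counts (16 of 71 and 31 of 63) are back-calculated from the fitted coefficients and cell sizes, not taken from the raw data, so treat the `toy` data frame as an assumption.

```r
# Counts back-calculated from the reported coefficients (an assumption, not raw data):
# 16 of 71 (not skewed) and 31 of 63 (skewed) chose "Zarpies on Zarpie island"
toy <- data.frame(
  boarding = factor(rep(c("not skewed", "skewed"), times = c(71, 63))),
  dv_comp_pop = c(rep(c(1, 0), times = c(16, 55)),
                  rep(c(1, 0), times = c(31, 32)))
)

fit   <- lm(dv_comp_pop ~ boarding, data = toy)
coefs <- coef(summary(fit))
print(round(coefs, 4))

# The same quantities as plain proportions per condition
prop_by_condition <- tapply(toy$dv_comp_pop, toy$boarding, mean)
print(round(prop_by_condition, 4))
```

With a single binary predictor, the model is equivalent to comparing two proportions, which is why the estimate can be read directly as a percentage-point difference.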
##
## Fisher's Exact Test for Count Data
##
## data: dv_comp_table
## p-value = 0.007207
## alternative hypothesis: two.sided
As predicted, overall, participants provided different responses to the question “Who’s taller?” in the skewed condition versus not skewed condition (Fisher’s test, p = 0.007).
##
## Call:
## lm(formula = dv_comp_pop ~ boarding, data = data_exclude_on_both_warmups %>%
## mutate(dv_comp_pop = case_when(dv_comp == "Zarpies on Zarpie island" ~
## 1, TRUE ~ 0)))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.5476 -0.2273 -0.2273 0.4524 0.7727
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.22727 0.07004 3.245 0.00169 **
## boardingskewed 0.32035 0.10023 3.196 0.00196 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.4646 on 84 degrees of freedom
## Multiple R-squared: 0.1084, Adjusted R-squared: 0.09782
## F-statistic: 10.22 on 1 and 84 DF, p-value: 0.001963
As predicted, specifically, participants were more likely to say that Zarpies on Zarpie island (the population) are taller in the skewed condition than in the not skewed condition (t(84) = 3.20, p = 0.002).
We suspected but did not preregister an interaction between age and condition (see 1a power analysis).
## # weights: 15 (8 variable)
## initial value 147.214047
## iter 10 value 131.258930
## final value 131.151555
## converged
## Analysis of Deviance Table (Type II tests)
##
## Response: dv_comp
## LR Chisq Df Pr(>Chisq)
## boarding 9.5389 2 0.008485 **
## age_exact 2.6965 2 0.259697
## boarding:age_exact 2.1024 2 0.349512
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
There was not a statistically significant interaction between condition and age (exact) in a multinomial model of responses with age, condition, and their interaction as predictors (LR χ²(2) = 2.10, p = 0.350).
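For readers who want to see the shape of this test, the sketch below fits the multinomial model with and without the interaction and compares them with a likelihood-ratio test. For the highest-order term, this comparison matches a Type II test. The data frame `sim` is simulated placeholder data, not the study data, and the response labels are shortened for illustration.

```r
library(nnet)  # multinom(); a recommended package shipped with R

set.seed(1)
n <- 134
# Simulated placeholder data (NOT the study data); labels are illustrative
sim <- data.frame(
  boarding  = factor(sample(c("not skewed", "skewed"), n, replace = TRUE)),
  age_exact = runif(n, min = 4, max = 9),
  dv_comp   = factor(sample(c("island", "visited", "same"), n, replace = TRUE))
)

m_full <- multinom(dv_comp ~ boarding * age_exact, data = sim,
                   trace = FALSE, maxit = 200)
m_main <- multinom(dv_comp ~ boarding + age_exact, data = sim,
                   trace = FALSE, maxit = 200)

# Likelihood-ratio test for the interaction term
lr_stat <- m_main$deviance - m_full$deviance
lr_df   <- m_full$edf - m_main$edf   # 2 extra parameters for a 3-level response
p_value <- pchisq(lr_stat, df = lr_df, lower.tail = FALSE)

cat("LR chisq =", round(lr_stat, 3), "on", lr_df, "df, p =", round(p_value, 3), "\n")
```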
The power analysis below was bootstrapped from the data from this study.
Power on target effects by total sample size (multinomial logistic regression predicting response from age, condition, and their interaction):
| total sample size | power: age x condition interaction |
|---|---|
| 200 | 0.453 |
| 300 | 0.590 |
| 400 | 0.784 |
| 500 | 0.866 |
| 600 | 0.933 |
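One plausible way to obtain a table like this is to resample participants with replacement up to each target sample size, refit the model, and record how often the interaction test is significant. The sketch below illustrates that idea only. The exact procedure used for the table is not described here, `pilot` is simulated placeholder data, and the helper names `interaction_p` and `boot_power` are hypothetical.

```r
library(nnet)  # multinom()

set.seed(2)
# Simulated placeholder standing in for the study data (NOT the real data)
pilot <- data.frame(
  boarding  = factor(sample(c("not skewed", "skewed"), 134, replace = TRUE)),
  age_exact = runif(134, min = 4, max = 9),
  dv_comp   = factor(sample(c("island", "visited", "same"), 134, replace = TRUE))
)

# p-value of the likelihood-ratio test for the age x condition interaction
interaction_p <- function(d) {
  m_full <- multinom(dv_comp ~ boarding * age_exact, data = d, trace = FALSE)
  m_main <- multinom(dv_comp ~ boarding + age_exact, data = d, trace = FALSE)
  pchisq(m_main$deviance - m_full$deviance,
         df = m_full$edf - m_main$edf, lower.tail = FALSE)
}

# Proportion of bootstrap resamples of size n_total with p < alpha
boot_power <- function(d, n_total, n_sims = 50, alpha = 0.05) {
  p_values <- replicate(n_sims, {
    resampled <- d[sample(nrow(d), n_total, replace = TRUE), ]
    interaction_p(resampled)
  })
  mean(p_values < alpha)
}

power_200 <- boot_power(pilot, n_total = 200)
cat("estimated power at n = 200:", power_200, "\n")
```

A real run would use many more resamples than the 50 used here and would loop over each target sample size.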
## R version 4.4.2 (2024-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.7.2
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRblas.0.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.4-arm64/Resources/lib/libRlapack.dylib; LAPACK version 3.12.0
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## time zone: America/New_York
## tzcode source: internal
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] effectsize_1.0.0 emmeans_1.10.4 nnet_7.3-19 lmerTest_3.1-3
## [5] lme4_1.1-35.5 Matrix_1.7-1 car_3.1-3 carData_3.0-5
## [9] ggtext_0.1.2 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
## [13] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
## [17] tibble_3.2.1 ggplot2_3.5.1 tidyverse_2.0.0 gt_0.11.1
## [21] scales_1.3.0 janitor_2.2.0 here_1.0.1
##
## loaded via a namespace (and not attached):
## [1] tidyselect_1.2.1 farver_2.1.2 fastmap_1.2.0
## [4] TH.data_1.1-2 bayestestR_0.17.0 digest_0.6.37
## [7] timechange_0.3.0 estimability_1.5.1 lifecycle_1.0.4
## [10] survival_3.7-0 magrittr_2.0.3 compiler_4.4.2
## [13] rlang_1.1.4 sass_0.4.9 tools_4.4.2
## [16] yaml_2.3.10 knitr_1.49 labeling_0.4.3
## [19] bit_4.5.0.1 xml2_1.3.6 abind_1.4-8
## [22] multcomp_1.4-26 withr_3.0.2 numDeriv_2016.8-1.1
## [25] grid_4.4.2 datawizard_1.3.0 colorspace_2.1-1
## [28] MASS_7.3-61 insight_1.4.2 cli_3.6.3
## [31] mvtnorm_1.3-1 crayon_1.5.3 rmarkdown_2.29
## [34] ragg_1.3.2 generics_0.1.3 rstudioapi_0.17.1
## [37] tzdb_0.4.0 parameters_0.28.2 minqa_1.2.8
## [40] cachem_1.1.0 splines_4.4.2 ggthemes_5.1.0
## [43] parallel_4.4.2 vctrs_0.6.5 boot_1.3-31
## [46] sandwich_3.1-1 jsonlite_1.8.9 hms_1.1.3
## [49] bit64_4.5.2 Formula_1.2-5 systemfonts_1.1.0
## [52] jquerylib_0.1.4 glue_1.8.0 nloptr_2.1.1
## [55] codetools_0.2-20 stringi_1.8.4 gtable_0.3.5
## [58] munsell_0.5.1 pillar_1.10.0 htmltools_0.5.8.1
## [61] R6_2.5.1 textshaping_0.4.0 rprojroot_2.0.4
## [64] vroom_1.6.5 evaluate_1.0.1 lattice_0.22-6
## [67] gridtext_0.1.5 snakecase_0.11.1 bslib_0.8.0
## [70] Rcpp_1.0.13 coda_0.19-4.1 nlme_3.1-166
## [73] xfun_0.49 zoo_1.8-12 pkgconfig_2.0.3