Summary

Study 11 explored how hearing generic statements about one type of category feature, versus specific statements about other types of category features, affects adults’ generalization of other features of the same versus different types.

In this study, adults (n = 300, 100/condition) heard 5 statements each about 3 different types of features of Zarpies (i.e., physical, diet, and personality features). For one feature type, the target feature type, the statements were generic statements. For the other two feature types, the statements were specific statements.

In this study, audio of each statement was generated by a Hume.AI model prompted to voice the statements as a children’s storybook narrator.

Adults then completed an inductive potential task: in each trial, they observed a Zarpie with a novel feature, and rated its prevalence among Zarpies (0-100%). Trials involved 5 novel features from each cluster (i.e., 5 novel physical, diet, and personality features), comprising 15 novel features in total.

Methods

Participants

condition	n
physical generics	94
diet generics	99
personality generics	102

Data were collected from 300 adults (n = 94-102/condition after exclusions) via Prolific on Friday, January 9, 2026 through Monday, January 11, 2026. Participants were required to be located in the United States, to be fluent in English, and to have not participated in prior studies under this protocol. Participants were paid $2.00 for an estimated 8-minute task and were asked to participate on a desktop computer.

Exclusion criteria

We recruited 300 participants, of whom 5 participants (1.7% of all participants) were excluded for meeting at least 1 of the following exclusion criteria:

failing the sound check (i.e., did not select “bird” as the sound heard during the sound check video) (n = 3 participants)
failing the attention check (i.e., did not select 100% on the slider when asked to during the induction task) (n = 0 participants)
admitting to use of AI after being explicitly informed use was prohibited (n = 0 participants)
failing the task check (n = 2 participants)
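
As a quick consistency check, the exclusion counts above can be tallied against the recruited and final sample sizes. This is only a sketch: the vector names are illustrative labels rather than variables from the analysis script, and it assumes no participant met more than one criterion (which the reported counts imply, since 3 + 2 = 5).

```r
# Exclusion counts as reported above (names are hypothetical labels)
excluded <- c(sound_check = 3, attention_check = 0, ai_use = 0, task_check = 2)
n_recruited <- 300

n_excluded   <- sum(excluded)                              # total excluded
pct_excluded <- round(100 * n_excluded / n_recruited, 1)   # percent of recruited
n_final      <- n_recruited - n_excluded                   # analyzed sample

# The analyzed sample should equal the per-condition counts in the table above
n_by_condition <- c(physical = 94, diet = 99, personality = 102)
stopifnot(n_final == sum(n_by_condition))

c(n_excluded = n_excluded, pct_excluded = pct_excluded, n_final = n_final)
```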

Demographics

We used the Prolific representative sample feature to recruit a sample representative of the US based on Census data on sex, age, and ethnicity (Simplified US Census).

variable	mean	sd	n
age	46.04	16.22	295

gender	n	prop
Male	147	49.8%
Female	144	48.8%
Non-binary	3	1.0%
Prefer not to specify	1	0.3%

race	n	prop
White, Caucasian, or European American	176	59.7%
Black or African American	37	12.5%
Hispanic or Latino/a	20	6.8%
East Asian	13	4.4%
White, Caucasian, or European American,Hispanic or Latino/a	13	4.4%
South or Southeast Asian	7	2.4%
White, Caucasian, or European American,South or Southeast Asian	6	2.0%
White, Caucasian, or European American,Black or African American	5	1.7%
Middle Eastern or North African	2	0.7%
Prefer not to specify	2	0.7%
White, Caucasian, or European American,Black or African American,Native American, American Indian, or Alaska Native	2	0.7%
White, Caucasian, or European American,East Asian	2	0.7%
White, Caucasian, or European American,Middle Eastern or North African	2	0.7%
White, Caucasian, or European American,Native American, American Indian, or Alaska Native	2	0.7%
American	1	0.3%
Biracial	1	0.3%
Hispanic or Latino/a,Black or African American	1	0.3%
Hispanic or Latino/a,Native Hawaiian or other Pacific Islander	1	0.3%
White, Caucasian, or European American,Hispanic or Latino/a,Black or African American	1	0.3%
Zarpie	1	0.3%

education	n	prop
Less than high school	2	0.7%
High school/GED	35	11.9%
Some college	94	31.9%
Bachelor's (B.A., B.S.)	113	38.3%
Master's (M.A., M.S.)	36	12.2%
Doctoral (Ph.D., J.D., M.D.)	14	4.7%
Prefer not to specify	1	0.3%

A little over half of the sample (163 of 295, 55.3%) reported a bachelor's degree or higher.
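
The college-completion share can be recomputed from the education table. The sketch below assumes that "college completion" means a bachelor's degree or higher, a definition the report does not state explicitly; the category labels are shortened for readability.

```r
# Counts copied from the education table above (labels shortened)
education <- c(
  "Less than high school" = 2,
  "High school/GED"       = 35,
  "Some college"          = 94,
  "Bachelor's"            = 113,
  "Master's"              = 36,
  "Doctoral"              = 14,
  "Prefer not to specify" = 1
)

# Assumption: "completed college" = bachelor's degree or higher
completed <- c("Bachelor's", "Master's", "Doctoral")

n_total      <- sum(education)
n_college    <- sum(education[completed])
prop_college <- n_college / n_total

round(100 * prop_college, 1)
```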

Procedure

This study was administered as a Qualtrics survey, and approved by the NYU IRB (IRB-FY2023-6812).

After providing their consent, participants completed a captcha, pledged not to use AI, and completed a sound check. Participants then completed:

Training phase: Participants were randomly assigned to one of 3 conditions:

Physical generic condition - 5 generic statements about Zarpies’ physical features, 5 specific statements about their diet features, 5 specific statements about their personality features, in random order.
Diet generic condition - 5 generic statements about Zarpies’ diet features, 5 specific statements about their physical features, 5 specific statements about their personality features, in random order.
Personality generic condition - 5 generic statements about Zarpies’ personality features, 5 specific statements about their physical features, 5 specific statements about their diet features, in random order.

Test phase (induction task): Participants completed an induction task where they imagined seeing a Zarpie with a novel feature, and estimated the prevalence of that feature among Zarpies using a slider from 0 to 100 (initialized at 0). All participants completed the same 15 trials, with order of trials randomized:

5 physical features
5 diet features
5 personality features

Test phase (group characterization): Participants then answered a free-response question: “What do you think characterizes Zarpies as a group?”

Participants then answered a few task completion questions, provided demographics, and were debriefed.

Data processing

Prevalence judgments were converted to a scale from 0 to 1, with values of 0 and 1 recoded to 0.01 and 0.99 to support a beta regression, since the beta distribution is defined on the open interval (0, 1) and does not include the endpoints 0 and 1.
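
A minimal sketch of this rescaling step is shown below. The `slider` vector is made-up example data, and the variable names are illustrative rather than taken from the analysis script.

```r
# Hypothetical raw slider responses on the 0-100 scale
slider <- c(0, 1, 37, 50, 99, 100)

# Rescale to proportions on [0, 1]
prevalence <- slider / 100

# Recode the endpoints so every value lies strictly inside (0, 1),
# as required by the beta likelihood
prevalence[prevalence == 0] <- 0.01
prevalence[prevalence == 1] <- 0.99

# Sanity check: no value sits on a boundary
stopifnot(all(prevalence > 0 & prevalence < 1))
prevalence
```

Because the slider takes whole-number values, the smallest non-zero response (1%) already maps to 0.01, so recoding the endpoints effectively merges 0% with 1% and 100% with 99%.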

Primary results

Induction task

Analyses of the induction task were beta regressions with a logit link unless otherwise specified, predicting prevalence (.01-.99) with random intercepts for participant and, where specified, test feature. Test feature (“can snap with their toes”, etc.) is technically nested within test feature type (physical, diet, personality), but since each test feature belongs to exactly one test feature type, a model with the nested term (1|test_feature_type:test_feature) defines the same groups as a model with (1|test_feature), so the nesting term was omitted for simplicity of specification.
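
The point about nesting can be illustrated with a small sketch: when every test feature label is unique to one feature type, grouping by test feature and grouping by feature type crossed with test feature partition the rows identically. The feature labels and the three-participant layout below are placeholders, not the study's items or data.

```r
# Hypothetical design: 15 uniquely labelled features, 5 per feature type
design <- data.frame(
  test_feature_type = rep(c("physical", "diet", "personality"), each = 5),
  test_feature      = paste0("feature_", 1:15)
)

# Repeat the design for 3 hypothetical participants
trials <- design[rep(seq_len(nrow(design)), times = 3), ]

# Grouping implied by (1 | test_feature)
g_plain <- as.character(trials$test_feature)

# Grouping implied by (1 | test_feature_type:test_feature)
g_nested <- as.character(
  interaction(trials$test_feature_type, trials$test_feature, drop = TRUE)
)

# Same number of groups ...
n_groups_plain  <- length(unique(g_plain))
n_groups_nested <- length(unique(g_nested))

# ... and the same assignment of rows to groups
same_partition <- identical(
  match(g_plain,  unique(g_plain)),
  match(g_nested, unique(g_nested))
)
same_partition
```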

By condition x test feature type

# condition x test feature type
glmm_condition_testfeaturetype <-
  glmmTMB(prevalence ~ condition * test_feature_type + (1|participant),
          # prevalence ~ condition * test_feature_type + (1|participant) + (1|test_feature_type:test_feature),
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_condition_testfeaturetype %>% 
  Anova()

There is no significant interaction between condition and test feature type on prevalence ($\chi^2$(4) = 4.38, p = 0.358), based on an ANOVA conducted on a beta regression (logit link) with condition, test feature type, and their interaction as fixed effects, with random intercepts per participant.

The only significant effect was a main effect of test feature type ($\chi^2$(2) = 230.59, p < .001).
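
As a rough check, the reported p-values can be recovered from the test statistics, assuming each statistic is referred to a chi-square distribution with the stated degrees of freedom. Because the inputs are the rounded values printed above, the result may differ from the reported p-value in the third decimal place.

```r
# Upper-tail chi-square probabilities for the reported statistics
p_interaction  <- pchisq(4.38,   df = 4, lower.tail = FALSE)  # condition x test feature type
p_feature_type <- pchisq(230.59, df = 2, lower.tail = FALSE)  # main effect of test feature type

round(p_interaction, 2)
p_feature_type < .001
```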

# condition x test feature type
brm_condition_testfeaturetype <-
  brm(prevalence ~ condition * test_feature_type + (1|participant),
      data = data_tidy,
      family = beta_family(link = "logit"),
      save_pars = save_pars(all = TRUE))

# vs null
brm_condition_testfeaturetype_null <-
  brm(prevalence ~ condition + test_feature_type + (1|participant),
      data = data_tidy,
      family = beta_family(link = "logit"),
      save_pars = save_pars(all = TRUE))

bf_condition_testfeaturetype <-
  bayes_factor(brm_condition_testfeaturetype, brm_condition_testfeaturetype_null)

A Bayesian analysis revealed moderate evidence against an interaction between condition and test feature type on prevalence (BF = 0.19), comparing Bayesian beta regression models (logit link) with and without the interaction, with default priors.
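
To make the "moderate evidence" reading concrete, the sketch below inverts the reported Bayes factor and labels it using commonly cited evidence bands (1-3 anecdotal, 3-10 moderate, 10-30 strong, and so on). That the report relies on these particular thresholds is an assumption.

```r
# Reported Bayes factor: interaction model relative to the additive model
bf_10 <- 0.19

# Invert to express the evidence in favor of the additive (no-interaction) model
bf_01 <- 1 / bf_10
round(bf_01, 2)

# Label the strength of evidence using commonly cited bands (an assumption here)
evidence <- cut(
  bf_01,
  breaks = c(1, 3, 10, 30, 100, Inf),
  labels = c("anecdotal", "moderate", "strong", "very strong", "extreme"),
  right  = FALSE
)
as.character(evidence)
```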

By condition x test feature

We can look at how prevalence judgments vary by condition and individual test features.

By test feature type match

Another way to look at the data is to code responses by whether the test feature type matched the feature type that received generic statements in that condition. If they match (e.g., a diet generic condition responding to a diet test question), we can code that as a match, or if they mismatch (e.g., diet generic condition responding to a personality test question), we can code that as a mismatch.

In this study, match = heard generics about that feature type, mismatch = heard specifics about that feature type.
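
A minimal sketch of this coding is below. The column name `condition_test_match` follows the model code in this section, but the example rows and the level labels for `condition` and `test_feature_type` are hypothetical.

```r
# Hypothetical long-format rows: one row per participant x test trial
trials <- data.frame(
  condition         = c("physical", "physical", "diet", "diet", "personality"),
  test_feature_type = c("physical", "diet", "diet", "personality", "physical")
)

# Match: the test feature type is the one that received generics in training;
# mismatch: the test feature type received specific statements in training
trials$condition_test_match <- ifelse(
  trials$condition == trials$test_feature_type, "match", "mismatch"
)
trials
```

With 5 test features per type, each participant contributes 5 match trials and 10 mismatch trials under this coding.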

If the chosen clusters capture some systematicity in how people generalize, matches should result in higher prevalence estimates than mismatches. The data trend in that direction, but the effect is only marginal.

# condition-test match (generics vs. specifics heard about the test feature type)
glmm_condition_test_match <-
  glmmTMB(prevalence ~ condition_test_match + (1|participant) + (1|test_feature), 
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_condition_test_match %>% 
  Anova()

There is a marginal effect of match (i.e., whether participants heard generics or specifics about features of the test feature type) on prevalence ($\chi^2$(1) = 2.8, p = 0.094), based on an ANOVA conducted on a beta regression (logit link) with match as a fixed effect and random intercepts per participant and test feature.

By cosine similarity

# average cosine similarity of the test feature, to the training features in that condition
glmm_cosine_similarity <-
  glmmTMB(prevalence ~ cosine_similarity + (1|participant) + (1|test_feature), 
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_cosine_similarity %>% 
  summary()

There is no effect of the average cosine similarity between the test feature and the genericized training features in the condition ($\chi^2$(1) = 0.82, p = 0.365), in a beta regression (logit link) with random intercepts per participant and test feature.
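
For readers unfamiliar with the predictor, the sketch below shows one way an average cosine similarity could be computed for a single test feature. The report does not say how features were embedded, so the toy 3-dimensional vectors and the helper function are purely illustrative assumptions.

```r
# Cosine similarity between two numeric vectors (hypothetical helper)
cosine_similarity <- function(a, b) {
  sum(a * b) / (sqrt(sum(a^2)) * sqrt(sum(b^2)))
}

# Toy embeddings; a real analysis would use embeddings of the feature text
test_embedding <- c(1, 0, 1)
training_embeddings <- rbind(
  c(1, 0, 1),   # identical direction to the test feature
  c(0, 1, 0),   # orthogonal to the test feature
  c(1, 1, 1)    # partially overlapping
)

# Similarity of the test feature to each training feature, then the average
sims <- apply(training_embeddings, 1, cosine_similarity, b = test_embedding)
avg_similarity <- mean(sims)
avg_similarity
```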

Group characterization

Participants were asked to describe what characterizes Zarpies as a group. TBD

Secondary results

Straight-lining

Despite the induction task plots suggesting substantial anchoring around the 50% marker, straight-lining was not a pervasive phenomenon.

2 out of 295 participants (0.68%) answered 50% to all test questions.
4 out of 295 participants (1.36%) answered 48-52% to all test questions, a looser criterion.
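
The two straight-lining criteria can be sketched as follows. The response matrix is made-up example data (three hypothetical participants by 15 test trials), not the study data.

```r
# Hypothetical responses: rows are participants, columns are the 15 test trials (0-100)
responses <- rbind(
  p1 = rep(50, 15),
  p2 = c(rep(50, 10), 48, 52, 51, 49, 50),
  p3 = c(10, 80, 50, 35, 60, 50, 90, 20, 50, 45, 70, 5, 50, 65, 30)
)

# Strict criterion: exactly 50% on every trial
strict <- apply(responses, 1, function(x) all(x == 50))

# Looser criterion: every response within 48-52%
loose <- apply(responses, 1, function(x) all(x >= 48 & x <= 52))

n_strict <- sum(strict)
n_loose  <- sum(loose)

# Percentages as reported above, recomputed from the counts
pct_reported <- round(100 * c(2, 4) / 295, 2)
```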

Session info

## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.7.3
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] brms_2.23.0     Rcpp_1.1.0      emmeans_2.0.0   car_3.1-3      
##  [5] carData_3.0-5   glmmTMB_1.1.13  lubridate_1.9.4 forcats_1.0.1  
##  [9] stringr_1.6.0   dplyr_1.1.4     purrr_1.2.0     readr_2.1.6    
## [13] tidyr_1.3.1     tibble_3.3.0    ggplot2_4.0.1   tidyverse_2.0.0
## [17] gt_1.1.0        scales_1.4.0    janitor_2.2.1   here_1.0.2     
## 
## loaded via a namespace (and not attached):
##   [1] RColorBrewer_1.1-3    tensorA_0.36.2.1      rstudioapi_0.17.1    
##   [4] jsonlite_2.0.0        magrittr_2.0.4        TH.data_1.1-5        
##   [7] estimability_1.5.1    farver_2.1.2          nloptr_2.2.1         
##  [10] rmarkdown_2.30        fs_1.6.6              ragg_1.5.0           
##  [13] vctrs_0.6.5           minqa_1.2.8           base64enc_0.1-3      
##  [16] htmltools_0.5.8.1     distributional_0.5.0  curl_7.0.0           
##  [19] Formula_1.2-5         sass_0.4.10           StanHeaders_2.32.10  
##  [22] bslib_0.9.0           htmlwidgets_1.6.4     sandwich_3.1-1       
##  [25] zoo_1.8-14            cachem_1.1.0          TMB_1.9.18           
##  [28] lifecycle_1.0.4       pkgconfig_2.0.3       Matrix_1.7-4         
##  [31] R6_2.6.1              fastmap_1.2.0         rbibutils_2.4        
##  [34] snakecase_0.11.1      digest_0.6.38         numDeriv_2016.8-1.1  
##  [37] colorspace_2.1-2      ps_1.9.1              rprojroot_2.1.1      
##  [40] textshaping_1.0.4     Hmisc_5.2-4           labeling_0.4.3       
##  [43] timechange_0.3.0      abind_1.4-8           mgcv_1.9-4           
##  [46] compiler_4.5.2        bit64_4.6.0-1         withr_3.0.2          
##  [49] htmlTable_2.4.3       S7_0.2.1              backports_1.5.0      
##  [52] inline_0.3.21         QuickJSR_1.8.1        pkgbuild_1.4.8       
##  [55] MASS_7.3-65           loo_2.8.0             tools_4.5.2          
##  [58] foreign_0.8-90        nnet_7.3-20           glue_1.8.0           
##  [61] callr_3.7.6           nlme_3.1-168          checkmate_2.3.3      
##  [64] cluster_2.1.8.1       generics_0.1.4        gtable_0.3.6         
##  [67] tzdb_0.5.0            data.table_1.17.8     hms_1.1.4            
##  [70] xml2_1.5.0            pillar_1.11.1         vroom_1.6.6          
##  [73] posterior_1.6.1       splines_4.5.2         lattice_0.22-7       
##  [76] survival_3.8-3        bit_4.6.0             tidyselect_1.2.1     
##  [79] knitr_1.50            reformulas_0.4.2      gridExtra_2.3        
##  [82] V8_8.0.1              stats4_4.5.2          xfun_0.54            
##  [85] bridgesampling_1.1-2  matrixStats_1.5.0     rstan_2.32.7         
##  [88] stringi_1.8.7         yaml_2.3.10           boot_1.3-32          
##  [91] evaluate_1.0.5        codetools_0.2-20      cli_3.6.5            
##  [94] RcppParallel_5.1.11-1 rpart_4.1.24          xtable_1.8-4         
##  [97] systemfonts_1.3.1     Rdpack_2.6.4          processx_3.8.6       
## [100] jquerylib_0.1.4       coda_0.19-4.1         parallel_4.5.2       
## [103] rstantools_2.5.0      bayesplot_1.14.0      Brobdingnag_1.2-9    
## [106] lme4_1.1-37           ggthemes_5.1.0        mvtnorm_1.3-3        
## [109] crayon_1.5.3          rlang_1.1.6           multcomp_1.4-29

Compgenerics study 11 exploratory study

Marianna Zhang

2026-01-09