Summary

Study 8b (n = 85 per condition × 5 conditions = 425 participants) was an exploratory study assessing how the proportion of generics to specifics affects inductive potential, holding the number of generics constant.

Unlike the previous study, we focus on the baseline and 4/n conditions (4/4, 4/8, 4/12, 4/16), holding the number of generics heard constant by design and manipulating the number of specifics (i.e., the proportion of generics to specifics).

Consistent with the previous study, we find an effect of condition / proportion of generics to specifics on adults’ judgments of the prevalence of novel features of a social group. This effect holds both including and excluding the baseline condition.

The effect is mostly driven by the 4/4 condition, which patterns significantly differently from both the baseline condition and the other 4/n conditions. This effect suggests hearing only generic statements may be particularly powerful, and that generic statements may be more powerful than specific statements in driving inductive inferences.

Methods

Participants

Data were collected from 421 adults via Prolific on Tues 5/20/2026. Participants were required to be in the United States, be fluent in English, and not have participated in prior studies under this protocol. Participants were paid $1.75 for an estimated 5-7 minute task and were asked to participate on a desktop computer.

num_generics	total_utt	n
0	0	84
4	4	83
4	8	83
4	12	78
4	16	82

Exclusion criteria

We recruited 425 participants, of whom 11 participants (2.6% of all participants) were excluded for meeting at least 1 of the following exclusion criteria:

failing the sound check (n = 3 participants)
failing the attention check (i.e., did not select 100% on the slider when asked to during the induction task) (n = 1 participant)
admitting to use of AI after being explicitly informed that its use was prohibited (n = 1 participant)
failing the task check (n = 7 participants)

Participants who failed the sound check were included, since a few participants mentioned technical difficulties with the Qualtrics automatically progressing past that video.

Demographics

We used the Prolific representative sample feature to recruit a sample representative of the US based on Census data on sex, age, and ethnicity (Simplified US Census).

measure	mean	sd	n
age	NA	NA	410

The sample skewed young in age.

gender	n	prop
Female	205	50.0%
Male	196	47.8%
Non-binary	7	1.7%
64	1	0.2%
genderqueer	1	0.2%

The sample reflected the diversity of the gender identities in the US.

race	n	prop
White, Caucasian, or European American	249	60.7%
Black or African American	47	11.5%
Hispanic or Latino/a	32	7.8%
White, Caucasian, or European American,Hispanic or Latino/a	14	3.4%
East Asian	12	2.9%
South or Southeast Asian	11	2.7%
White, Caucasian, or European American,Black or African American	5	1.2%
White, Caucasian, or European American,East Asian	5	1.2%
Native American, American Indian, or Alaska Native	4	1.0%
White, Caucasian, or European American,Native American, American Indian, or Alaska Native	4	1.0%
Middle Eastern or North African	3	0.7%
White, Caucasian, or European American,Middle Eastern or North African	3	0.7%
White, Caucasian, or European American,Hispanic or Latino/a,Native American, American Indian, or Alaska Native	2	0.5%
White, Caucasian, or European American,South or Southeast Asian	2	0.5%
Asian American	1	0.2%
Black or African American,East Asian,Native Hawaiian or other Pacific Islander	1	0.2%
Black or African American,South or Southeast Asian	1	0.2%
Caribbean/Multiracial	1	0.2%
Hispanic or Latino/a,Black or African American	1	0.2%
Hispanic or Latino/a,Native American, American Indian, or Alaska Native	1	0.2%
Mixed	1	0.2%
Native Hawaiian or other Pacific Islander	1	0.2%
Prefer not to specify	1	0.2%
White, Caucasian, or European American,Black or African American,Native American, American Indian, or Alaska Native	1	0.2%
White, Caucasian, or European American,Black or African American,South or Southeast Asian	1	0.2%
White, Caucasian, or European American,Middle Eastern or North African,East Asian	1	0.2%
White, Caucasian, or European American,Native American, American Indian, or Alaska Native,South or Southeast Asian,Native Hawaiian or other Pacific Islander	1	0.2%
mixed	1	0.2%
mixed 4 Races	1	0.2%
mixed black/white	1	0.2%
mixed race	1	0.2%

The sample was also racially diverse, with White Americans slightly overrepresented and Hispanic Americans underrepresented.

education	n	prop
Less than high school	2	0.5%
High school/GED	54	13.2%
Some college	118	28.8%
Bachelor's (B.A., B.S.)	164	40.0%
Master's (M.A., M.S.)	56	13.7%
Doctoral (Ph.D., J.D., M.D.)	14	3.4%
Prefer not to specify	2	0.5%

The sample was about evenly split on college completion.

Procedure

This study was administered as a Qualtrics survey, and approved by the NYU IRB (IRB-FY2023-6812).

After providing their consent, participants completed a captcha, pledge not to use AI, and sound check. Participants then completed:

Training phase: participants heard some number of generic statements and specific statements, based on condition. Which features were mentioned was randomized, as was statement order.
Test phase (induction task): participants completed an induction task where they imagined seeing a Zarpie with a novel feature, and estimated the prevalence of that feature among Zarpies using a slider from 0 to 100 (initialized at 0). All participants completed the same 16 trials, with order of trials randomized.

Participants then completed a few task completion questions and demographics, and were debriefed.

Data processing

Prevalence judgments were converted to a scale from 0 to 1, with values of exactly 0 and 1 replaced by 0.01 and 0.99 to support a beta regression, since the beta distribution is defined only on the open interval (0, 1) and cannot model responses at the endpoints.
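The rescaling step can be sketched as follows. This is a minimal illustration rather than the study's actual processing code; the function name `rescale_prevalence` and its arguments are hypothetical, and it assumes the raw slider responses are on the 0 to 100 scale described in the procedure.

```r
# Hypothetical helper: convert 0-100 slider responses to proportions and
# move exact endpoint values inside the open interval (0, 1).
rescale_prevalence <- function(slider, lower = 0.01, upper = 0.99) {
  p <- slider / 100
  p[p == 0] <- lower  # a response of 0 becomes 0.01
  p[p == 1] <- upper  # a response of 100 becomes 0.99
  p
}

rescaled <- rescale_prevalence(c(0, 37, 50, 100))
rescaled
```

Only the endpoint values change; every other response keeps its original proportion.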

Participant feedback

The most frequent participant issue was audio issues (n = 4). One participant reported “The audio was horrible. It scared my cat.”, another reported “you might want to equalize the volume between the videos and the questions” (I had noticed equalization issues but was unable to fix in HTML, will fix later in the video itself).

One person had trouble with the attention check (“I couldn’t see where to move the slider for the attention check, so I just chose 50%”).

One person complained that they were unable to copy the consent form (this issue was a result of anti-AI study-wide CSS and will be changed to question-specific CSS in the future).

When asked to guess what the study was about, many participants reported that it was about judging other people by their characteristics.

Primary results

Induction task

Plots

[Figure panels: Summary, Density, Histogram, Cumulative]

Analyses

The following beta regressions predict prevalence with random intercepts per participant and per test feature.

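# Main effect of condition (all five conditions, including baseline).
# Requires glmmTMB (model fitting), car (Anova), and dplyr (%>%, filter).
library(glmmTMB)
library(car)
library(dplyr)

glmmTMB(prevalence ~ condition + (1|participant) + (1|test_feature), 
        data = data_tidy, 
        family = beta_family(link = "logit")) %>% 
  Anova()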

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: prevalence
##            Chisq Df Pr(>Chisq)    
## condition 21.437  4  0.0002593 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Across all conditions, there is a main effect of condition ($\chi^2(4) = 21.44$, $p < .001$) on the inferred prevalence of novel features.

# condition (4/n conditions only, excluding baseline)
glmmTMB(prevalence ~ condition + (1|participant) + (1|test_feature), 
        data = data_tidy %>% 
          filter(condition != "baseline"), 
        family = beta_family(link = "logit")) %>% 
  Anova()

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: prevalence
##            Chisq Df Pr(>Chisq)    
## condition 19.573  3  0.0002081 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Across the 4/n conditions (excludes baseline), there is a main effect of condition ($\chi^2(3) = 19.57$, $p < .001$) on the inferred prevalence of novel features.
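As a quick sanity check on the two tables above, the reported p-values can be recovered from the Wald chi-square statistics and their degrees of freedom using the upper tail of the chi-square distribution. The sketch below uses only base R; the variable names are illustrative, and the statistics are copied from the printed output.

```r
# Upper-tail chi-square probabilities for the two Wald tests reported above.
p_all_conditions <- pchisq(21.437, df = 4, lower.tail = FALSE)  # all five conditions
p_4n_only        <- pchisq(19.573, df = 3, lower.tail = FALSE)  # baseline excluded

signif(c(all_conditions = p_all_conditions, four_n_only = p_4n_only), 4)
```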

# proportion of generics
glmmTMB(prevalence ~ prop_generics + (1|participant) + (1|test_feature),
        data = data_tidy, 
        family = beta_family(link = "logit")) %>% 
  summary()

##  Family: beta  ( logit )
## Formula:          
## prevalence ~ prop_generics + (1 | participant) + (1 | test_feature)
## Data: data_tidy
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   -3689.8   -3657.0    1849.9   -3699.8      5210 
## 
## Random effects:
## 
## Conditional model:
##  Groups       Name        Variance Std.Dev.
##  participant  (Intercept) 0.7395   0.8600  
##  test_feature (Intercept) 0.1829   0.4277  
## Number of obs: 5215, groups:  participant, 326; test_feature, 16
## 
## Dispersion parameter for beta family (): 2.61 
## 
## Conditional model:
##               Estimate Std. Error z value    Pr(>|z|)    
## (Intercept)    -0.7264     0.1480  -4.908 0.000000921 ***
## prop_generics   0.6956     0.1703   4.085 0.000044047 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

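Across all conditions, proportion of generics to specifics (z = 4.09, p < .001) predicted the inferred prevalence of novel features. Note, however, that this model was fit to 5,215 observations from 326 participants, identical to the baseline-excluded model reported next. Baseline participants therefore appear to have been dropped from this fit, most likely because the proportion of generics is undefined when no utterances are heard.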
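To make the logit-scale coefficients easier to read, the fixed effects can be converted into predicted prevalence for each 4/n condition. This is a rough sketch under two assumptions that the report does not state explicitly: that `prop_generics` is coded as `num_generics / total_utt`, and that predictions are taken for a typical participant and test feature (random effects set to zero). The estimates are copied from the summary above.

```r
# Fixed-effect estimates copied from the model summary (logit scale).
intercept <- -0.7264
slope     <- 0.6956

# Assumed coding of the predictor: generics heard / total utterances heard.
prop_generics <- c(`4/16` = 4 / 16, `4/12` = 4 / 12, `4/8` = 4 / 8, `4/4` = 4 / 4)

# Inverse logit of the linear predictor gives predicted prevalence on the 0-1 scale.
predicted <- plogis(intercept + slope * prop_generics)
round(predicted, 3)
```

Under these assumptions the predictions rise from roughly 0.37 in the 4/16 condition to roughly 0.49 in the 4/4 condition. Because the model is linear in `prop_generics`, it cannot by itself show whether the effect is concentrated in the 4/4 condition.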

# proportion of generics (4/n conditions only, excluding baseline)
glmmTMB(prevalence ~ prop_generics + (1|participant) + (1|test_feature),
        data = data_tidy %>% 
          filter(condition != "baseline"), 
        family = beta_family(link = "logit")) %>% 
  summary()

##  Family: beta  ( logit )
## Formula:          
## prevalence ~ prop_generics + (1 | participant) + (1 | test_feature)
## Data: data_tidy %>% filter(condition != "baseline")
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   -3689.8   -3657.0    1849.9   -3699.8      5210 
## 
## Random effects:
## 
## Conditional model:
##  Groups       Name        Variance Std.Dev.
##  participant  (Intercept) 0.7395   0.8600  
##  test_feature (Intercept) 0.1829   0.4277  
## Number of obs: 5215, groups:  participant, 326; test_feature, 16
## 
## Dispersion parameter for beta family (): 2.61 
## 
## Conditional model:
##               Estimate Std. Error z value    Pr(>|z|)    
## (Intercept)    -0.7264     0.1480  -4.908 0.000000921 ***
## prop_generics   0.6956     0.1703   4.085 0.000044047 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Across the 4/n conditions (excludes baseline), proportion of generics to specifics (z = 4.09, p < .001) predicted the inferred prevalence of novel features.
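The two proportion models print identical summaries (same AIC, same 326 participants). One plausible explanation, sketched below, is that the proportion is undefined for the baseline condition. The sketch assumes `prop_generics` is computed as `num_generics / total_utt`, which the report does not confirm, and uses a placeholder outcome column; `model.frame()` with `na.omit` stands in for the missing-data handling a model-fitting function would typically apply.

```r
# Condition design taken from the participants table; the coding of
# prop_generics below is an assumption.
conditions <- data.frame(
  condition    = c("baseline", "4/4", "4/8", "4/12", "4/16"),
  num_generics = c(0, 4, 4, 4, 4),
  total_utt    = c(0, 4, 8, 12, 16)
)
conditions$prop_generics <- conditions$num_generics / conditions$total_utt
conditions$prevalence <- 0.5  # placeholder outcome, not study data

conditions$prop_generics  # baseline is 0 / 0, which R evaluates to NaN

# na.omit (R's factory-default na.action) drops rows with NA or NaN predictors.
kept <- model.frame(prevalence ~ prop_generics, data = conditions, na.action = na.omit)
nrow(kept)
```

If this is what happened, the "all conditions" proportion model is effectively the baseline-excluded model, and only the condition-based analyses include baseline participants.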

Secondary results

Straight-lining

Induction task plots suggest substantial anchoring around the 50% marker, and debriefing suggests many participants found the task strange because the test features were odd or bizarre. Is this anchoring driven by participants who straight-lined through all the test items?

3 out of 410 included participants (0.73%) answered 50% to all test questions.
4 out of 410 included participants (0.98%) answered 48-52% to all test questions, a looser criterion.

Since only a few participants consistently straight-lined, these participants were not excluded from analyses.
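The two straight-lining criteria can be expressed as a per-participant check that every response falls inside a given window. The sketch below runs on toy data with hypothetical participant IDs; the column names `participant` and `prevalence` are assumptions, and responses are on the raw 0 to 100 slider scale.

```r
# Toy data: three hypothetical participants, four trials each (not study data).
responses <- data.frame(
  participant = rep(c("p1", "p2", "p3"), each = 4),
  prevalence  = c(50, 50, 50, 50,   49, 52, 50, 51,   10, 50, 80, 35)
)

# TRUE when every response from one participant lies within [lo, hi].
is_straightliner <- function(x, lo, hi) all(x >= lo & x <= hi)

by_participant <- split(responses$prevalence, responses$participant)
strict <- vapply(by_participant, is_straightliner, logical(1), lo = 50, hi = 50)
loose  <- vapply(by_participant, is_straightliner, logical(1), lo = 48, hi = 52)

names(strict)[strict]  # participants answering exactly 50% throughout
names(loose)[loose]    # participants answering 48-52% throughout
```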

Pairwise condition comparisons

We can make pairwise comparisons between conditions using two-sample Kolmogorov–Smirnov tests, with Bonferroni correction for number of tests run (10). The Kolmogorov–Smirnov test compares the cumulative distributions of two samples and returns a statistic D that reflects the maximum difference between the two distributions, as well as a p-value for the test.
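A minimal sketch of this procedure is shown below, using base R's `ks.test()` and `p.adjust()`. The ratings are simulated from a beta distribution purely for illustration and are not the study data; the object names are hypothetical.

```r
set.seed(1)

# Simulated prevalence ratings for the five conditions (illustration only).
conditions <- c("baseline", "4/4", "4/8", "4/12", "4/16")
ratings <- lapply(conditions, function(cond) rbeta(80, 2, 3))
names(ratings) <- conditions

# All 10 unordered pairs of the 5 conditions.
pairs <- combn(conditions, 2, simplify = FALSE)

# Two-sample Kolmogorov-Smirnov test for each pair.
results <- do.call(rbind, lapply(pairs, function(pair) {
  test <- ks.test(ratings[[pair[1]]], ratings[[pair[2]]])
  data.frame(cond_a = pair[1], cond_b = pair[2],
             D = unname(test$statistic), p = test$p.value)
}))

# Bonferroni: multiply each p-value by the number of tests, capped at 1.
results$p_bonferroni <- p.adjust(results$p, method = "bonferroni")
results$sig_corr <- results$p_bonferroni < 0.05
results
```

With real slider data, which contain many tied values, `ks.test()` cannot compute exact p-values and will warn about ties, so the resulting p-values are best treated as approximate.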

One quick and dirty way to think about these results is to look at how often pairs of conditions with the same (boxed) versus different (not in box) number of generics are significantly different from each other, and to do the same for pairs of conditions with same (boxed) versus different (not in box) proportions of generics.

Note that the groups being compared contain unequal numbers of pairwise comparisons (see total_tests in the tables below), so this is a somewhat lopsided comparison.

If number of generics matters, we would expect the distributions of prevalence ratings to rarely differ between pairs of conditions with the same number of generics, and to differ much more often between pairs with different numbers of generics.

If proportion of generics matters, we would expect the distributions of prevalence ratings to rarely differ between pairs of conditions with the same proportion of generics, and to differ much more often between pairs with different proportions of generics.

same_num_generics	sig_corr_tests	total_tests	prop_sig_corr
FALSE	1	4	25.0%
TRUE	4	6	66.7%

same_prop_generics	sig_corr_tests	total_tests	prop_sig_corr
FALSE	4	6	66.7%
NA	1	4	25.0%

Footnotes

## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.4.1
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] emmeans_2.0.1   car_3.1-3       carData_3.0-5   glmmTMB_1.1.14 
##  [5] lubridate_1.9.4 forcats_1.0.1   stringr_1.6.0   dplyr_1.1.4    
##  [9] purrr_1.2.1     readr_2.1.6     tidyr_1.3.2     tibble_3.3.1   
## [13] ggplot2_4.0.1   tidyverse_2.0.0 gt_1.3.0        scales_1.4.0   
## [17] janitor_2.2.1   here_1.0.2     
## 
## loaded via a namespace (and not attached):
##  [1] Rdpack_2.6.5        gridExtra_2.3       sandwich_3.1-1     
##  [4] rlang_1.1.7         magrittr_2.0.4      multcomp_1.4-29    
##  [7] snakecase_0.11.1    otel_0.2.0          compiler_4.5.2     
## [10] mgcv_1.9-4          systemfonts_1.3.1   vctrs_0.7.1        
## [13] pkgconfig_2.0.3     crayon_1.5.3        fastmap_1.2.0      
## [16] backports_1.5.0     labeling_0.4.3      rmarkdown_2.30     
## [19] tzdb_0.5.0          nloptr_2.2.1        ragg_1.5.0         
## [22] bit_4.6.0           xfun_0.56           cachem_1.1.0       
## [25] jsonlite_2.0.0      parallel_4.5.2      cluster_2.1.8.1    
## [28] R6_2.6.1            bslib_0.10.0        stringi_1.8.7      
## [31] RColorBrewer_1.1-3  boot_1.3-32         rpart_4.1.24       
## [34] jquerylib_0.1.4     numDeriv_2016.8-1.1 estimability_1.5.1 
## [37] Rcpp_1.1.1          knitr_1.51          zoo_1.8-15         
## [40] base64enc_0.1-3     Matrix_1.7-4        splines_4.5.2      
## [43] nnet_7.3-20         timechange_0.3.0    tidyselect_1.2.1   
## [46] rstudioapi_0.18.0   abind_1.4-8         yaml_2.3.12        
## [49] TMB_1.9.19          codetools_0.2-20    lattice_0.22-7     
## [52] withr_3.0.2         S7_0.2.1            coda_0.19-4.1      
## [55] evaluate_1.0.5      foreign_0.8-90      survival_3.8-6     
## [58] xml2_1.5.2          pillar_1.11.1       checkmate_2.3.3    
## [61] reformulas_0.4.3.1  generics_0.1.4      vroom_1.6.7        
## [64] rprojroot_2.1.1     hms_1.1.4           minqa_1.2.8        
## [67] xtable_1.8-4        glue_1.8.0          Hmisc_5.2-5        
## [70] tools_4.5.2         data.table_1.18.0   lme4_1.1-38        
## [73] fs_1.6.6            mvtnorm_1.3-3       rbibutils_2.4.1    
## [76] colorspace_2.1-2    nlme_3.1-168        htmlTable_2.4.3    
## [79] Formula_1.2-5       cli_3.6.5           textshaping_1.0.4  
## [82] viridisLite_0.4.2   ggthemes_5.2.0      gtable_0.3.6       
## [85] sass_0.4.10         digest_0.6.39       TH.data_1.1-5      
## [88] htmlwidgets_1.6.4   farver_2.1.2        htmltools_0.5.9    
## [91] lifecycle_1.0.5     bit64_4.6.0-1       MASS_7.3-65

Compgenerics study 8b (proportionality) study

Marianna Zhang

2026-05-20
