Summary

Study 9 was an exploratory study of how generic statements about certain features affect adults’ generalization to similar versus different features.

In this study, adults (n = 397, 98-100/condition) heard 15 generics about similar features of Zarpies (i.e., physical, diet and food preferences, or personality and related behaviors) or 15 generics about heterogeneous features of Zarpies (i.e., a fixed subset of 5 generics from each of the aforementioned clusters). Adults then completed an inductive potential task: in each trial, they observed a Zarpie with a novel feature, and rated its prevalence among Zarpies (0-100%). Trials involved 5 novel features from each cluster (i.e., 5 novel physical, diet, and personality features), comprising 15 novel features in total.

We were thinking:

If adults learn feature-specific coherence based on the features in the generics they heard, then adults who hear generics about certain features should generalize more within that feature type than across feature types. In other words, the physical condition may lead to higher prevalence estimates for novel physical features, compared to diet or personality conditions, etc.
Also, hearing generics about a broad range of features may lead to higher overall coherence than hearing generics about a specific set of features. In other words, the heterogeneous condition may lead to higher overall prevalence estimates than the specific feature conditions.

What we found:

We found mixed evidence for feature-specific coherence.

In general, conditions where participants heard generics about a single type of feature led to higher prevalence estimates for test features of the same type (see By test feature type match).
However, when looking at specific conditions, only the physical condition led to higher prevalence estimates for the features of the same type compared to other single-type conditions.

We did not find evidence that hearing heterogeneous generics leads to higher coherence.

The heterogeneous condition did not lead to higher overall prevalence estimates than the specific feature conditions (see Overall).

We also asked participants to describe, in a freeform response, what characterizes Zarpies as a group. Descriptions generally matched participants' conditions, although physical descriptions remained rare, even in the physical condition where they were most common.

Methods

Participants

Data were collected from 402 adults via Prolific on Thursday-Friday, 10/16-10/17/2025; after exclusions (see Exclusion criteria), the final sample was 397 adults (n = 98-100/condition). Participants were required to be in the United States, to be fluent in English, and to not have participated in prior studies under this protocol. Participants were paid $1.75 for an estimated 7-minute task and were asked to participate on a desktop computer.

condition	n
physical	100
diet	100
personality	99
heterogeneous	98

Exclusion criteria

Of the 402 participants recruited, 5 (1.2% of all participants) were excluded for meeting at least 1 of the following exclusion criteria:

failing the sound check (i.e., did not select “bird” as the sound heard during the sound check video) (n = 1 participant)
failing the attention check (i.e., did not select 100% on slider when asked to during induction task) (n = 1 participant)
admitting to use of AI after being explicitly informed use was prohibited (n = 1 participant)
failing the task check (n = 4 participants)

Demographics

We used the Prolific representative sample feature to recruit a sample representative of the US based on Census data on sex, age, and ethnicity (Simplified US Census).

variable	mean	sd	n
age	NA	NA	397

The sample skewed young in age.

gender	n	prop
Male	198	49.9%
Female	194	48.9%
Non-binary	3	0.8%
Prefer not to specify	2	0.5%

The sample reflected the diversity of the gender identities in the US.

race	n	prop
White, Caucasian, or European American	246	62.0%
Black or African American	48	12.1%
Hispanic or Latino/a	27	6.8%
East Asian	13	3.3%
South or Southeast Asian	13	3.3%
White, Caucasian, or European American,Hispanic or Latino/a	8	2.0%
White, Caucasian, or European American,Black or African American	6	1.5%
White, Caucasian, or European American,East Asian	6	1.5%
White, Caucasian, or European American,Native American, American Indian, or Alaska Native	6	1.5%
Middle Eastern or North African	3	0.8%
Prefer not to specify	3	0.8%
White, Caucasian, or European American,Hispanic or Latino/a,Black or African American	3	0.8%
White, Caucasian, or European American,Middle Eastern or North African	3	0.8%
Native American, American Indian, or Alaska Native	2	0.5%
White, Caucasian, or European American,South or Southeast Asian	2	0.5%
American	1	0.3%
Black or African American,Native Hawaiian or other Pacific Islander	1	0.3%
Hispanic or Latino/a,Middle Eastern or North African	1	0.3%
Hispanic or Latino/a,Native American, American Indian, or Alaska Native	1	0.3%
Indigenous American	1	0.3%
Latina/Caucasian	1	0.3%
Middle Eastern or North African,South or Southeast Asian	1	0.3%
White, Caucasian, or European American,Black or African American,Native American, American Indian, or Alaska Native,East Asian	1	0.3%

The sample was also racially diverse.

education	n	prop
Less than high school	3	0.8%
High school/GED	51	12.8%
Some college	103	25.9%
Bachelor's (B.A., B.S.)	179	45.1%
Master's (M.A., M.S.)	47	11.8%
Doctoral (Ph.D., J.D., M.D.)	13	3.3%
Prefer not to specify	1	0.3%

About 60% of the sample had completed a bachelor's degree or higher.

Procedure

This study was administered as a Qualtrics survey, and approved by the NYU IRB (IRB-FY2023-6812).

After providing their consent, participants completed a captcha, pledge not to use AI, and sound check. Participants then completed:

Training phase: participants heard 15 generic statements in random order. Participants were randomly assigned to one of 4 conditions:

Diet condition - 15 generic statements about Zarpies’ diet and food preferences
Physical condition - 15 generic statements about Zarpies’ physical features
Personality condition - 15 generic statements about Zarpies’ personality and related behaviors
Heterogeneous condition - a fixed subset of 5 generic statements from each of the 3 feature types above (diet, physical, personality)

Test phase (induction task): Participants completed an induction task where they imagined seeing a Zarpie with a novel feature, and estimated the prevalence of that feature among Zarpies using a slider from 0 to 100 (initialized at 0). All participants completed the same 15 trials, with order of trials randomized:

5 physical features
5 diet features
5 personality features

Test phase (group characterization): Participants were then asked to respond to a freeform question asking: “What do you think characterizes Zarpies as a group?” Responses were loosely coded blind to condition by Marianna.

Participants then completed a few task completion questions, demographics, and were debriefed.

Data processing

Prevalence judgments were converted to a scale from 0 to 1, with values of 0 and 1 trimmed to 0.01 and 0.99 to support a beta regression, since the beta distribution is defined on the open interval (0, 1) and so excludes the endpoints 0 and 1.
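As a minimal sketch of this preprocessing step (the analysis itself was run in R; the function name `to_beta_scale` and the example ratings are hypothetical), the rescaling and trimming could look like:

```python
def to_beta_scale(rating_percent, eps=0.01):
    """Map a 0-100 slider rating onto [eps, 1 - eps] for beta regression."""
    proportion = rating_percent / 100.0
    # Clamp so that exact 0s and 1s fall strictly inside (0, 1)
    return min(max(proportion, eps), 1.0 - eps)

# Hypothetical slider responses, not data from the study
ratings = [0, 37, 50, 100]
rescaled = [to_beta_scale(r) for r in ratings]
print(rescaled)
```

As written, the sketch clamps every value below 0.01 or above 0.99. With whole-percent slider values, that affects only ratings of exactly 0 or 100.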

Computational modeling

To get a sense of feature space, we embedded all the features (training and test features), as they appear in generic statements, using a sentence transformer (MiniLM). We then used PCA to reduce that multidimensional space to a two-dimensional feature space. The code for this step and for generating these plots is in Python, in the project folder under “Model”.
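The dimensionality reduction step can be sketched as follows. This is not the project's actual code. It assumes the sentence embeddings are already available as a matrix, and uses random vectors as stand-ins (the 384 dimensions are an assumption about the MiniLM variant). The helper `pca_project` is a hypothetical name.

```python
import numpy as np

def pca_project(embeddings, n_components=2):
    """Project row vectors onto their top principal components."""
    centered = embeddings - embeddings.mean(axis=0)
    # Rows of vt are the principal axes, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:n_components].T

# Stand-in for real embeddings: 45 training + 15 test features (assumed 384-d)
rng = np.random.default_rng(0)
fake_embeddings = rng.normal(size=(60, 384))

coords_2d = pca_project(fake_embeddings)
print(coords_2d.shape)
```

Each row of `coords_2d` is one feature's location in the 2D feature space used for the plots and for the Gaussian concept model.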

Stimuli

All conditions

All training features plus test features in 2D feature space based on embeddings from a sentence embedding model.

Physical condition

Physical training features (darker circles) plus test features (lighter diamonds) in 2D feature space based on embeddings from a sentence embedding model.

Diet condition

Diet training features (darker circles) plus test features (lighter diamonds) in 2D feature space based on embeddings from a sentence embedding model.

Personality condition

Personality training features (darker circles) plus test features (lighter diamonds) in 2D feature space based on embeddings from a sentence embedding model.

Predicted concepts

The training conditions all consist of generics, which indicate which features are known to be kind-linked. Based on these known kind-linked features, our feature-specific model tries to fit a multivariate Gaussian function (a 3D surface) over the 2D feature space. This multivariate Gaussian function can be thought of as a “kind concept”; it is centered on a mean and has a covariance matrix defining its spread. For example, if the Gaussian is centered over physical features, it will have stronger generalization to other physical features, and weaker generalization to more distant features in feature space.

We can then use the model to make predictions about novel test features based on their embedding location in feature space and the Gaussian concept:

The Gaussian provides a probability over “kind scores” for each test feature.
A test feature’s “kind score” in turn is entered into a beta distribution which provides a probability over the likelihood of the test feature being kind-linked.
The likelihood of the test feature being kind-linked then sets a Bernoulli distribution, which determines whether the test feature is in fact kind-linked.
Note that we have yet to link this model to prevalence judgments (or embed it in the rational speech acts framework).
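The chain from Gaussian concept to kind-linked judgment can be illustrated with a rough sketch. Everything specific here is an assumption for illustration: the kind score is taken to be the Gaussian's height rescaled to peak at 1, the beta distribution is parameterized by a mean equal to the kind score with a hypothetical `concentration` parameter, and the concept's mean and covariance are made up. The actual model in the project folder may differ on all of these points.

```python
import numpy as np

def kind_score(location, mean, cov):
    """Gaussian height at `location`, rescaled so the peak at `mean` equals 1."""
    diff = np.asarray(location, dtype=float) - np.asarray(mean, dtype=float)
    mahalanobis_sq = diff @ np.linalg.solve(np.asarray(cov, dtype=float), diff)
    return float(np.exp(-0.5 * mahalanobis_sq))

def sample_kind_linked(score, rng, concentration=10.0, eps=0.01):
    """Draw P(kind-linked) from a beta centered on the score, then a Bernoulli outcome."""
    centered = min(max(score, eps), 1.0 - eps)  # keep both beta parameters positive
    p_linked = rng.beta(centered * concentration, (1.0 - centered) * concentration)
    return p_linked, bool(rng.random() < p_linked)

rng = np.random.default_rng(0)
concept_mean = [0.0, 0.0]               # hypothetical concept center in 2D feature space
concept_cov = [[1.0, 0.0], [0.0, 1.0]]  # hypothetical spread

near = kind_score([0.2, 0.1], concept_mean, concept_cov)   # test feature close to the center
far = kind_score([2.5, -2.0], concept_mean, concept_cov)   # test feature far from the center
p_near, linked_near = sample_kind_linked(near, rng)
print(near, far, p_near, linked_near)
```

Under these assumptions, a test feature near the concept's center gets a kind score close to 1, and a distant feature gets a score close to 0.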

It’s hard to visualize a 3D Gaussian over a 2D space, so the below visualizations use ellipses to depict cross-sections from sampled Gaussians, to show approximately what the expected Gaussian might look like after each training condition.

Physical condition

Example kind concepts after the physical condition.

Diet condition

Example kind concepts after the diet condition.

Personality condition

Example kind concepts after the personality condition.

Heterogeneous condition

Example kind concepts after the heterogeneous condition.

Primary results

Induction task

Unless otherwise specified, analyses of the induction task were mixed-effects beta regressions with a logit link, predicting prevalence (.01-.99) with participant and test feature as random intercepts. Test feature (“can snap with their toes”, etc.) is technically nested within test feature type (physical, diet, personality), but since each test feature belongs to exactly one test feature type, a model with the nesting term is equivalent to one without it, so the nesting term was omitted for simplicity of specification.

By test feature

We can look at how prevalence judgments vary by condition and individual test feature.

By test feature type

We can look at how prevalence judgments vary by condition and test feature type (i.e., physical, diet, or personality).

If the chosen clusters capture some systematicity in how people generalize, the physical condition should produce the highest prevalence estimates for physical test features, the diet condition for diet test features, and the personality condition for personality test features. Descriptively, this appears to be true for the physical and personality conditions, but not for the diet condition.

# condition * test feature type
glmm_condition_testfeaturetype <-
  glmmTMB(prevalence ~ condition * test_feature_type + (1|participant) + (1|test_feature), 
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_condition_testfeaturetype %>% 
  Anova()

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: prevalence
##                               Chisq Df           Pr(>Chisq)    
## condition                    6.4863  3              0.09020 .  
## test_feature_type            6.6197  2              0.03652 *  
## condition:test_feature_type 86.1700  6 < 0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

glmm_condition_testfeaturetype %>% 
  emmeans(~ condition * test_feature_type) %>%
  contrast(method = "pairwise") %>%
  summary(adjust = "FDR")

##  contrast                                            estimate    SE  df z.ratio
##  physical physical - diet physical                    0.57177 0.134 Inf   4.258
##  physical physical - personality physical             0.44917 0.135 Inf   3.334
##  physical physical - heterogeneous physical           0.30804 0.135 Inf   2.284
##  physical physical - physical diet                    0.06678 0.203 Inf   0.329
##  physical physical - diet diet                        0.18582 0.236 Inf   0.786
##  physical physical - personality diet                 0.35124 0.236 Inf   1.485
##  physical physical - heterogeneous diet               0.32310 0.237 Inf   1.364
##  physical physical - physical personality            -0.20105 0.203 Inf  -0.989
##  physical physical - diet personality                -0.00323 0.237 Inf  -0.014
##  physical physical - personality personality         -0.34012 0.237 Inf  -1.437
##  physical physical - heterogeneous personality       -0.04994 0.237 Inf  -0.211
##  diet physical - personality physical                -0.12261 0.135 Inf  -0.910
##  diet physical - heterogeneous physical              -0.26374 0.135 Inf  -1.956
##  diet physical - physical diet                       -0.50499 0.236 Inf  -2.137
##  diet physical - diet diet                           -0.38595 0.203 Inf  -1.900
##  diet physical - personality diet                    -0.22054 0.236 Inf  -0.933
##  diet physical - heterogeneous diet                  -0.24868 0.237 Inf  -1.050
##  diet physical - physical personality                -0.77282 0.237 Inf  -3.267
##  diet physical - diet personality                    -0.57501 0.203 Inf  -2.828
##  diet physical - personality personality             -0.91189 0.237 Inf  -3.853
##  diet physical - heterogeneous personality           -0.62171 0.237 Inf  -2.625
##  personality physical - heterogeneous physical       -0.14113 0.135 Inf  -1.043
##  personality physical - physical diet                -0.38239 0.237 Inf  -1.616
##  personality physical - diet diet                    -0.26334 0.237 Inf  -1.113
##  personality physical - personality diet             -0.09793 0.203 Inf  -0.482
##  personality physical - heterogeneous diet           -0.12607 0.237 Inf  -0.532
##  personality physical - physical personality         -0.65021 0.237 Inf  -2.746
##  personality physical - diet personality             -0.45240 0.237 Inf  -1.911
##  personality physical - personality personality      -0.78929 0.203 Inf  -3.880
##  personality physical - heterogeneous personality    -0.49910 0.237 Inf  -2.105
##  heterogeneous physical - physical diet              -0.24126 0.237 Inf  -1.019
##  heterogeneous physical - diet diet                  -0.12221 0.237 Inf  -0.516
##  heterogeneous physical - personality diet            0.04320 0.237 Inf   0.182
##  heterogeneous physical - heterogeneous diet          0.01506 0.203 Inf   0.074
##  heterogeneous physical - physical personality       -0.50908 0.237 Inf  -2.149
##  heterogeneous physical - diet personality           -0.31127 0.237 Inf  -1.314
##  heterogeneous physical - personality personality    -0.64816 0.237 Inf  -2.735
##  heterogeneous physical - heterogeneous personality  -0.35798 0.203 Inf  -1.760
##  physical diet - diet diet                            0.11904 0.135 Inf   0.883
##  physical diet - personality diet                     0.28445 0.135 Inf   2.106
##  physical diet - heterogeneous diet                   0.25632 0.136 Inf   1.889
##  physical diet - physical personality                -0.26783 0.204 Inf  -1.316
##  physical diet - diet personality                    -0.07001 0.237 Inf  -0.296
##  physical diet - personality personality             -0.40690 0.237 Inf  -1.718
##  physical diet - heterogeneous personality           -0.11672 0.237 Inf  -0.493
##  diet diet - personality diet                         0.16541 0.135 Inf   1.225
##  diet diet - heterogeneous diet                       0.13727 0.136 Inf   1.012
##  diet diet - physical personality                    -0.38687 0.237 Inf  -1.635
##  diet diet - diet personality                        -0.18906 0.203 Inf  -0.929
##  diet diet - personality personality                 -0.52594 0.237 Inf  -2.221
##  diet diet - heterogeneous personality               -0.23576 0.237 Inf  -0.995
##  personality diet - heterogeneous diet               -0.02814 0.136 Inf  -0.207
##  personality diet - physical personality             -0.55228 0.237 Inf  -2.332
##  personality diet - diet personality                 -0.35447 0.237 Inf  -1.497
##  personality diet - personality personality          -0.69136 0.204 Inf  -3.397
##  personality diet - heterogeneous personality        -0.40117 0.237 Inf  -1.692
##  heterogeneous diet - physical personality           -0.52414 0.237 Inf  -2.210
##  heterogeneous diet - diet personality               -0.32633 0.237 Inf  -1.376
##  heterogeneous diet - personality personality        -0.66322 0.237 Inf  -2.795
##  heterogeneous diet - heterogeneous personality      -0.37304 0.204 Inf  -1.831
##  physical personality - diet personality              0.19781 0.136 Inf   1.460
##  physical personality - personality personality      -0.13907 0.136 Inf  -1.025
##  physical personality - heterogeneous personality     0.15111 0.136 Inf   1.111
##  diet personality - personality personality          -0.33689 0.136 Inf  -2.482
##  diet personality - heterogeneous personality        -0.04670 0.136 Inf  -0.343
##  personality personality - heterogeneous personality  0.29018 0.136 Inf   2.130
##  p.value
##   0.0014
##   0.0113
##   0.1055
##   0.8165
##   0.5275
##   0.2749
##   0.3076
##   0.4349
##   0.9891
##   0.2841
##   0.8758
##   0.4602
##   0.1514
##   0.1108
##   0.1554
##   0.4566
##   0.4349
##   0.0120
##   0.0412
##   0.0026
##   0.0520
##   0.4349
##   0.2257
##   0.4191
##   0.7167
##   0.7136
##   0.0412
##   0.1554
##   0.0026
##   0.1108
##   0.4349
##   0.7136
##   0.8820
##   0.9554
##   0.1108
##   0.3195
##   0.0412
##   0.1915
##   0.4697
##   0.1108
##   0.1554
##   0.3195
##   0.8303
##   0.2021
##   0.7167
##   0.3642
##   0.4349
##   0.2247
##   0.4566
##   0.1108
##   0.4349
##   0.8758
##   0.1001
##   0.2749
##   0.0112
##   0.2064
##   0.1108
##   0.3076
##   0.0412
##   0.1702
##   0.2802
##   0.4349
##   0.4191
##   0.0718
##   0.8165
##   0.1108
## 
## Results are given on the log odds ratio (not the response) scale. 
## P value adjustment: fdr method for 66 tests

Indeed, there is a significant interaction between condition and test feature type in an analysis of deviance (Type II Wald tests) on a beta regression with condition, test feature type, and their interaction as fixed effects, and with participant and test feature as random intercepts ($\chi^2$(6) = 86.17, p < .001). There is also a main effect of test feature type ($\chi^2$(2) = 6.62, p = .037) and a marginal effect of condition ($\chi^2$(3) = 6.49, p = .090).

When rating the prevalence of physical features, the physical condition produced significantly higher prevalence estimates than the diet condition (z = 4.26, FDR-corrected p = .0014) or personality condition (z = 3.33, p = .011), but did not differ significantly from the heterogeneous condition (z = 2.28, p = .11).

When rating the prevalence of diet features, the diet condition did not produce different prevalence estimates than the physical condition (z = 0.88, p = .47), personality condition (z = 1.23, p = .36), or heterogeneous condition (z = 1.01, p = .43).

When rating the prevalence of personality features, the personality condition produced marginally higher prevalence estimates than the diet condition (z = 2.48, p = .072), but did not differ significantly from the heterogeneous condition (z = 2.13, p = .11) or the physical condition (z = 1.03, p = .43).

# make contrast matrix for condition (rows = contrasts, columns = condition levels)
C <- matrix(
  c(
    # physical  diet   pers    hetero
      1,       -1/3,  -1/3,   -1/3,    # Contrast 1: physical vs others
      0,        1,    -1,      0,      # Contrast 2: diet vs personality
      0,        1,     0,     -1,      # Contrast 3: diet vs heterogeneous
      1,        1,     1,      1       # Overall mean (intercept)
  ),
  nrow = 4,
  byrow = TRUE
)

# label columns by condition level
# (assumes level order: physical, diet, personality, heterogeneous)
colnames(C) <- levels(data_tidy$condition)

# contrasts<- expects a levels x contrasts matrix,
# so keep the first 3 rows (the true contrasts) and transpose
contrasts(data_tidy$condition) <- t(C[1:3, ])


# condition * test feature type
glmm_condition_testfeaturetype_phys <-
  glmmTMB(prevalence ~ condition * test_feature_type + (1|participant) + (1|test_feature), 
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_condition_testfeaturetype_phys %>% 
  summary()

##  Family: beta  ( logit )
## Formula:          
## prevalence ~ condition * test_feature_type + (1 | participant) +  
##     (1 | test_feature)
## Data: data_tidy
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   -3517.8   -3417.4    1773.9   -3547.8      5940 
## 
## Random effects:
## 
## Conditional model:
##  Groups       Name        Variance Std.Dev.
##  participant  (Intercept) 0.72984  0.8543  
##  test_feature (Intercept) 0.09435  0.3072  
## Number of obs: 5955, groups:  participant, 397; test_feature, 15
## 
## Dispersion parameter for beta family ():  3.6 
## 
## Conditional model:
##                                         Estimate Std. Error z value  Pr(>|z|)
## (Intercept)                              0.31645    0.27317   1.158  0.246683
## condition1                               0.01851    0.23386   0.079  0.936905
## condition2                              -0.35364    0.16843  -2.100  0.035759
## condition3                               0.12262    0.13467   0.911  0.362553
## test_feature_typediet                   -0.22892    0.24380  -0.939  0.347752
## test_feature_typepersonality             0.88587    0.24477   3.619  0.000295
## condition1:test_feature_typediet         0.17504    0.14596   1.199  0.230423
## condition2:test_feature_typediet         0.32682    0.10515   3.108  0.001882
## condition3:test_feature_typediet        -0.28803    0.08395  -3.431  0.000602
## condition1:test_feature_typepersonality -0.64558    0.14731  -4.382 0.0000117
## condition2:test_feature_typepersonality -0.09657    0.10618  -0.910  0.363073
## condition3:test_feature_typepersonality  0.21427    0.08495   2.522  0.011658
##                                            
## (Intercept)                                
## condition1                                 
## condition2                              *  
## condition3                                 
## test_feature_typediet                      
## test_feature_typepersonality            ***
## condition1:test_feature_typediet           
## condition2:test_feature_typediet        ** 
## condition3:test_feature_typediet        ***
## condition1:test_feature_typepersonality ***
## condition2:test_feature_typepersonality    
## condition3:test_feature_typepersonality *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By test feature type match

Another way to look at the data is to code responses by whether the test feature type matched the training condition. If they match (e.g., diet condition responding to a diet test question), we can code that as a match, or if they mismatch (e.g., diet condition responding to a personality test question), we can code that as a mismatch. We can leave the heterogeneous condition as its own category, since it’s a semi-match to everything.
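A minimal sketch of this coding rule, assuming condition and test feature type share the labels "physical", "diet", and "personality" (the function name `code_match` is hypothetical; the actual recoding was presumably done in R):

```python
def code_match(condition, test_feature_type):
    """Code a trial as 'match', 'mismatch', or 'heterogeneous'."""
    if condition == "heterogeneous":
        # Heterogeneous training is a partial match to every test type,
        # so it is kept as its own category
        return "heterogeneous"
    return "match" if condition == test_feature_type else "mismatch"

print(code_match("diet", "diet"))
print(code_match("diet", "personality"))
print(code_match("heterogeneous", "physical"))
```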

If the chosen clusters capture some systematicity in how people generalize, matches should result in higher prevalence estimates than mismatches. Indeed, that’s what we find.

# condition
glmm_condition_test_match <-
  glmmTMB(prevalence ~ condition_test_match + (1|participant) + (1|test_feature), 
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_condition_test_match %>% 
  Anova()

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: prevalence
##                       Chisq Df            Pr(>Chisq)    
## condition_test_match 74.026  2 < 0.00000000000000022 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

glmm_condition_test_match %>% 
  emmeans(~ condition_test_match) %>% 
  contrast(method = "pairwise") %>% 
  summary(adjust = "FDR")

##  contrast                 estimate    SE  df z.ratio p.value
##  match - heterogeneous      0.2422 0.106 Inf   2.285  0.0335
##  match - mismatch           0.2570 0.030 Inf   8.577 <0.0001
##  heterogeneous - mismatch   0.0148 0.105 Inf   0.142  0.8874
## 
## Results are given on the log odds ratio (not the response) scale. 
## P value adjustment: fdr method for 3 tests
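The "fdr" adjustment in these tables is the Benjamini-Hochberg procedure. As a rough check, the sketch below recomputes two-sided p-values from the rounded z-ratios in the table above and applies the adjustment. The function names are hypothetical, and small discrepancies from the reported values are expected because the z-ratios are rounded.

```python
import math

def two_sided_p(z):
    """Two-sided p-value for a standard normal z statistic."""
    return math.erfc(abs(z) / math.sqrt(2))

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

# z-ratios for: match - heterogeneous, match - mismatch, heterogeneous - mismatch
z_ratios = [2.285, 8.577, 0.142]
adjusted = bh_adjust([two_sided_p(z) for z in z_ratios])
print([round(p, 4) for p in adjusted])
```

The first adjusted value is the middle-ranked p-value multiplied by 3/2. That is why the match versus heterogeneous comparison is reported at about .034 rather than at its unadjusted value of about .022.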

Indeed, there is a main effect of whether condition and test feature type match (match, heterogeneous, or mismatch) on prevalence, in an analysis of deviance on a beta regression with match as a fixed effect, and with participant and test feature as random intercepts ($\chi^2$(2) = 74.03, p < .001). Post-hoc FDR-corrected pairwise comparisons reveal that the matching condition results in higher prevalence estimates of test features than the mismatching conditions (z = 8.58, p < .001) or the heterogeneous condition (z = 2.29, p = .034); the heterogeneous and mismatching conditions did not differ (z = 0.14, p = .89).
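Because the estimates are on the log odds ratio scale, it can help to convert them. The sketch below converts the match versus mismatch estimate (0.2570) into an odds ratio and shows the implied shift in expected prevalence from a baseline. The 50% baseline is a hypothetical choice for illustration, not a model estimate, and the function names are made up.

```python
import math

def odds_ratio(log_odds_estimate):
    """Convert a log odds ratio into an odds ratio."""
    return math.exp(log_odds_estimate)

def shift_probability(baseline, log_odds_estimate):
    """Apply a log-odds shift to a baseline proportion and return the new proportion."""
    baseline_logit = math.log(baseline / (1.0 - baseline))
    return 1.0 / (1.0 + math.exp(-(baseline_logit + log_odds_estimate)))

match_vs_mismatch = 0.2570  # estimate from the pairwise table
print(round(odds_ratio(match_vs_mismatch), 2))
print(round(shift_probability(0.50, match_vs_mismatch), 3))
```

Under that assumed baseline, the match effect corresponds to an odds ratio of roughly 1.29, or a move from 50% to about 56% expected prevalence.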

By cosine similarity

Instead of grouping features into discrete types, we can also look at the cosine similarity between each individual test feature and the training features presented in each condition, in the multidimensional embedding space.

For each test feature, we can calculate its average cosine similarity to the training features in each condition, and see if that metric predicts prevalence judgments.

# average cosine similarity of the test feature, to the training features in that condition
glmm_cosine_similarity_avg <-
  glmmTMB(prevalence ~ cosine_similarity_avg + (1|participant) + (1|test_feature), 
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_cosine_similarity_avg %>% 
  summary()

Indeed, there is a significant effect of average cosine similarity of the test feature to the training features in the condition ($z$ = 7.38, p < .001), such that higher average cosine similarity predicts higher prevalence estimates, in a beta regression (logit link) with random intercepts per participant and test feature.

We can also focus on maximum cosine similarity, i.e., the similarity between the test feature and its most similar training feature in a given condition, and see if that metric predicts prevalence judgments.
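Both metrics can be sketched as follows, using tiny made-up 2D vectors in place of the real sentence embeddings (the function name `cosine_similarities` and the example vectors are hypothetical):

```python
import numpy as np

def cosine_similarities(test_vec, training_vecs):
    """Cosine similarity between one test vector and each training vector."""
    test_unit = test_vec / np.linalg.norm(test_vec)
    train_unit = training_vecs / np.linalg.norm(training_vecs, axis=1, keepdims=True)
    return train_unit @ test_unit

# Toy stand-ins for one condition's training features and one test feature
training = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
test = np.array([1.0, 0.0])

sims = cosine_similarities(test, training)
cosine_similarity_avg = sims.mean()  # integrates over all training features
cosine_similarity_max = sims.max()   # tracks only the most similar training feature
print(cosine_similarity_avg, cosine_similarity_max)
```

The average reflects how close the test feature is to the training set as a whole, whereas the maximum depends only on the single nearest training feature.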

# max cosine similarity of the test feature, to the closest training features in that condition
glmm_cosine_similarity_max <-
  glmmTMB(prevalence ~ cosine_similarity_max + (1|participant) + (1|test_feature), 
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_cosine_similarity_max %>% 
  summary()

Indeed, maximum cosine similarity is also a significant predictor of prevalence estimates ($z$ = 7.24, p < .001), such that higher maximum cosine similarity predicts higher prevalence estimates, in a beta regression (logit link) with random intercepts per participant and test feature.

glmmTMB(prevalence ~ cosine_similarity_avg + cosine_similarity_max + (1|participant) + (1|test_feature), 
          data = data_tidy, 
          family = beta_family(link = "logit"))  %>% 
  summary()

However, when including both average and max cosine similarity as predictors, only average cosine similarity remains a significant predictor of prevalence estimates, suggesting that people are integrating over all training features in the condition, rather than just attending to the most similar training feature.

Overall

We can look at prevalence estimates overall. If the heterogeneous condition leads to the highest overall coherence, we should see the highest prevalence estimates in that condition overall. However, that’s not what we find.

# condition only (overall prevalence)
glmm_condition <-
  glmmTMB(prevalence ~ condition + (1|participant) + (1|test_feature_type),
          data = data_tidy, 
          family = beta_family(link = "logit"))

glmm_condition %>% 
  Anova()

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: prevalence
##           Chisq Df Pr(>Chisq)  
## condition 6.475  3    0.09065 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

glmm_condition %>% 
  emmeans("condition") %>%
  contrast(method = "pairwise") %>%
  summary(adjust = "FDR")

##  contrast                    estimate    SE  df z.ratio p.value
##  physical - diet               0.2907 0.121 Inf   2.404  0.0973
##  physical - personality        0.1925 0.121 Inf   1.588  0.2244
##  physical - heterogeneous      0.2310 0.122 Inf   1.901  0.1719
##  diet - personality           -0.0982 0.121 Inf  -0.810  0.6267
##  diet - heterogeneous         -0.0596 0.122 Inf  -0.491  0.7483
##  personality - heterogeneous   0.0385 0.122 Inf   0.316  0.7516
## 
## Results are given on the log odds ratio (not the response) scale. 
## P value adjustment: fdr method for 6 tests

There is only a marginal effect of condition on prevalence ($\chi^2$(3) = 6.48, p = .091). Post-hoc FDR-corrected pairwise comparisons reveal no significant differences between any conditions (all ps ≥ .097).

By test feature, vs model

We can get the model’s predictions and compare those to people’s ratings of prevalence. For now, we get the model’s “kind score” for each test feature, which is a measure of the expected value of the Gaussian function at that location in feature space.

Group characterization

Participants were asked to describe what characterizes Zarpies as a group, with responses coded by Marianna blind to condition.

Eyeballing the plot below, participants in the diet and personality conditions often characterized Zarpies in terms of their diet or personality, seemingly more so than in the other conditions.

In the physical condition, participants appeared more likely to describe Zarpies in terms of physical characteristics than in the other conditions, but this effect seems less pronounced than in the diet and personality conditions: physical descriptions remained a minority of descriptions even in the physical condition (somewhat more when merged with appearance descriptions, but still below a majority).

TBD: analyses of the frequency of these codes

Secondary results

Straight-lining

Although the induction task plots suggest a lot of anchoring around the 50% mark, straight-lining was not pervasive.

2 out of 397 participants (0.50%) answered 50% to all test questions.
4 out of 397 participants (1.01%) answered 48-52% to all test questions, a looser criterion.
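The two criteria above can be sketched as a single check with a tolerance. This is an illustrative sketch on made-up ratings, not the code used for the counts above, and the function names are hypothetical.

```python
def is_straight_liner(ratings, center=50, tolerance=0):
    """True if every rating lies within `tolerance` points of `center`."""
    return all(abs(r - center) <= tolerance for r in ratings)

def straight_lining_rate(ratings_by_participant, tolerance=0):
    """Return the count and proportion of participants flagged as straight-liners."""
    flagged = sum(
        is_straight_liner(ratings, tolerance=tolerance)
        for ratings in ratings_by_participant.values()
    )
    return flagged, flagged / len(ratings_by_participant)

# Hypothetical toy data: 15 ratings (0-100) per participant
toy = {
    "p1": [50] * 15,          # strict straight-liner
    "p2": [48, 52, 50] * 5,   # caught only by the looser 48-52 criterion
    "p3": [10, 90, 50] * 5,   # not a straight-liner
}
print(straight_lining_rate(toy, tolerance=0))
print(straight_lining_rate(toy, tolerance=2))
```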

Test features order effects

All participants rated the prevalence of the same set of 15 test features, in random order. Did the order of test feature/prevalence judgment questions matter for prevalence judgments?

## Analysis of Deviance Table (Type II Wald chisquare tests)
## 
## Response: prevalence
##                                                  Chisq Df            Pr(>Chisq)    
## condition                                       6.4616  3             0.0911905 .  
## test_feature_type                               6.6336  2             0.0362684 *  
## test_feature_order                             10.8788  1             0.0009727 ***
## condition:test_feature_type                    87.7843  6 < 0.00000000000000022 ***
## condition:test_feature_order                    3.5322  3             0.3166129    
## test_feature_type:test_feature_order            0.0298  2             0.9852302    
## condition:test_feature_type:test_feature_order  5.8935  6             0.4352206    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

##  Family: beta  ( logit )
## Formula:          
## prevalence ~ test_feature_order + (1 | participant) + (1 | test_feature)
## Data: data_tidy
## 
##       AIC       BIC    logLik -2*log(L)  df.resid 
##   -3449.4   -3415.9    1729.7   -3459.4      5950 
## 
## Random effects:
## 
## Conditional model:
##  Groups       Name        Variance Std.Dev.
##  participant  (Intercept) 0.7390   0.8597  
##  test_feature (Intercept) 0.1375   0.3709  
## Number of obs: 5955, groups:  participant, 397; test_feature, 15
## 
## Dispersion parameter for beta family (): 3.55 
## 
## Conditional model:
##                     Estimate Std. Error z value Pr(>|z|)   
## (Intercept)         0.345480   0.108224   3.192  0.00141 **
## test_feature_order -0.008898   0.002856  -3.115  0.00184 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

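There is a main effect of test feature order on prevalence (χ²(1) = 10.88, p < .001), but no significant interactions between order and condition, test feature type, or both (all p > .31). In the order-only model, the coefficient for test feature order is small and negative (b = -0.009 on the logit scale, p = .002), so prevalence estimates shift slightly downward as the order value increases. This suggests that while there are some order effects in prevalence judgments, these effects do not differ by condition or test feature type.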
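
To get a sense of the size of the order effect, the logit-scale estimates from the order-only model can be back-transformed to the prevalence scale. The sketch below is illustrative only. It assumes that `test_feature_order` is coded as trial position 1 through 15 (the coding is not shown in the output above), and it gives conditional predictions with both random effects set to zero, not marginal means.

```r
# Fixed-effect estimates copied from the model summary above (logit scale).
intercept   <- 0.345480
slope_order <- -0.008898

# Assumption: test_feature_order is the trial position, coded 1 through 15.
positions <- 1:15

# Back-transform the linear predictor to the 0-1 prevalence scale.
# These are conditional predictions with both random effects set to zero.
predicted <- plogis(intercept + slope_order * positions)

# Implied change from the first to the last position, in percentage points.
drop_pp <- 100 * (predicted[1] - predicted[15])

print(round(100 * predicted[c(1, 15)], 1))
print(round(drop_pp, 1))
```

Under these assumptions, the implied decline from the first to the last position is roughly 3 percentage points, which is small relative to the 0-100% response scale.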

Footnotes

## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Sequoia 15.7.3
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] grid      stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] reticulate_1.44.1 emmeans_2.0.1     car_3.1-3         carData_3.0-5    
##  [5] glmmTMB_1.1.14    lubridate_1.9.4   forcats_1.0.1     stringr_1.6.0    
##  [9] dplyr_1.1.4       purrr_1.2.1       readr_2.1.6       tidyr_1.3.2      
## [13] tibble_3.3.1      ggplot2_4.0.1     tidyverse_2.0.0   gt_1.3.0         
## [17] scales_1.4.0      janitor_2.2.1     here_1.0.2       
## 
## loaded via a namespace (and not attached):
##  [1] Rdpack_2.6.5        gridExtra_2.3       sandwich_3.1-1     
##  [4] rlang_1.1.7         magrittr_2.0.4      multcomp_1.4-29    
##  [7] snakecase_0.11.1    otel_0.2.0          compiler_4.5.2     
## [10] mgcv_1.9-4          systemfonts_1.3.1   png_0.1-8          
## [13] vctrs_0.7.1         pkgconfig_2.0.3     crayon_1.5.3       
## [16] fastmap_1.2.0       backports_1.5.0     labeling_0.4.3     
## [19] rmarkdown_2.30      tzdb_0.5.0          nloptr_2.2.1       
## [22] ragg_1.5.0          bit_4.6.0           xfun_0.56          
## [25] cachem_1.1.0        jsonlite_2.0.0      parallel_4.5.2     
## [28] cluster_2.1.8.1     R6_2.6.1            bslib_0.10.0       
## [31] stringi_1.8.7       RColorBrewer_1.1-3  boot_1.3-32        
## [34] rpart_4.1.24        jquerylib_0.1.4     numDeriv_2016.8-1.1
## [37] estimability_1.5.1  Rcpp_1.1.1          knitr_1.51         
## [40] zoo_1.8-15          base64enc_0.1-3     Matrix_1.7-4       
## [43] splines_4.5.2       nnet_7.3-20         timechange_0.3.0   
## [46] tidyselect_1.2.1    rstudioapi_0.18.0   abind_1.4-8        
## [49] yaml_2.3.12         TMB_1.9.19          codetools_0.2-20   
## [52] lattice_0.22-7      withr_3.0.2         S7_0.2.1           
## [55] coda_0.19-4.1       evaluate_1.0.5      foreign_0.8-90     
## [58] survival_3.8-6      xml2_1.5.2          pillar_1.11.1      
## [61] checkmate_2.3.3     reformulas_0.4.3.1  generics_0.1.4     
## [64] vroom_1.6.7         rprojroot_2.1.1     hms_1.1.4          
## [67] minqa_1.2.8         xtable_1.8-4        glue_1.8.0         
## [70] Hmisc_5.2-5         tools_4.5.2         data.table_1.18.0  
## [73] lme4_1.1-38         fs_1.6.6            mvtnorm_1.3-3      
## [76] rbibutils_2.4.1     colorspace_2.1-2    nlme_3.1-168       
## [79] htmlTable_2.4.3     Formula_1.2-5       cli_3.6.5          
## [82] textshaping_1.0.4   ggthemes_5.2.0      gtable_0.3.6       
## [85] sass_0.4.10         digest_0.6.39       TH.data_1.1-5      
## [88] htmlwidgets_1.6.4   farver_2.1.2        htmltools_0.5.9    
## [91] lifecycle_1.0.5     bit64_4.6.0-1       MASS_7.3-65

Compgenerics study 9 (features) exploratory study

Marianna Zhang

2025-10-17
