Uncovering language markers of (classes of) psychological mechanisms underlying decisions under risk and uncertainty

The preliminary preliminary findings
FoKo 3

Sabou Rani Stocker

03.03.2025

Natural language as data?

???

Experimental Setup

Research Questions

Question 1

To what degree can natural language data from think-aloud protocols be used to reliably classify different classes of experimentally induced psychological mechanisms?

Question 2a

To what extent are the language markers associated with each psychological mechanism unique to their respective class (vs. co-occurring across multiple classes)?

Question 2b

If there are language markers that overlap across classes of mechanisms, to what extent can classes of decision mechanisms still be distinguished from each other?

Experimental setup

Agenda

Experimental setup

Analysis

Results

Overview

Four conditions:

  • Choice attributes (Baseline Condition)
  • Social norms (Social Influence)
  • Knowledge (Experience and Knowledge)
  • Need (Goals and Motivation)

Highlights and learnings (1)

Highlights and learnings (2)

  • Test on all devices and browser engines (Safari is my enemy)
  • If you have limitations that cannot be resolved with preselection filters, write them at the top of your description
  • Get yourself an Olivia as a supervisor

Population description

[Figure: Population description, five panels]

  • Gender distribution (count by gender): female 177, male 137, other 4, none 1
  • Age distribution (density by age): mean 44.4
  • Income range distribution (count by income range): less10 26, 10to20 52, 20to30 69, 30to50 101, 50to70 28, 70to100 19, more100 13, noresponse 11
  • Employment status distribution (count by employment status): none 20, student 8, salaried 196, self 46, homemaker 18, em_other 30, noresponse 1
  • Education distribution (count by education): mandatory 48, VET 50, bachelors 130, masters 69, doctor 16, ed_other 2, noresponse 4

Analysis

Methods

[Diagram: "sampleverse"]

  • Examples: all examples, divergent examples
  • Comparisons: all vs. all, 1 vs. 1, 1 vs. all
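The "1 vs. all" and "1 vs. 1" comparisons in the "sampleverse" can be read as different ways of turning the four condition labels into a binary classification target. The following is a minimal sketch under that assumption; the label strings and the function names `one_vs_all` and `one_vs_one` are hypothetical and not taken from the study code.

```python
# Hypothetical condition labels, named after the four conditions in the overview.
CONDITIONS = ["choice_attributes", "social_norms", "knowledge", "need"]


def one_vs_all(labels, target):
    """Binary target: 1 for the target condition, 0 for every other condition."""
    return [1 if label == target else 0 for label in labels]


def one_vs_one(texts, labels, target, other):
    """Keep only samples from two conditions; 1 marks the target condition."""
    kept = [
        (text, 1 if label == target else 0)
        for text, label in zip(texts, labels)
        if label in (target, other)
    ]
    return [text for text, _ in kept], [y for _, y in kept]


# Toy usage with made-up utterances.
texts = ["I need at least 50", "I know this game", "I must reach my goal", "others would pick A"]
labels = ["need", "knowledge", "need", "social_norms"]

y_all = one_vs_all(labels, "need")
x_pair, y_pair = one_vs_one(texts, labels, "need", "knowledge")
```

In this reading, "all vs. all" would correspond to keeping the four labels as a single multi-class target.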

[Diagram: "methodverse"]

  • Data treatment: no preprocessing, preprocessing, normalizing, removal of most common words
  • Classification by features: SBERT, Random Forest Classifier
  • Classification by dictionary: SBERT; dictionary built from Definitions, Examples, or Definitions and Examples
      • Retrieval of most similar example from dictionary of definitions
      • Retrieval of top 5 most similar examples from dictionary of definitions + probability based on ranking
  • Classification by generation: GPT-4.0 (API), Deepseek R1:8b (local); top shot classification, probabilistic classification
  • Combination of promising methods?
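The dictionary-based retrieval step can be illustrated with a small, dependency-free sketch. It assumes that utterances and dictionary entries have already been embedded (for example with SBERT) and uses toy two-dimensional vectors instead. The slides do not specify how the "probability based on ranking" is computed, so the reciprocal-rank weighting below is an assumption made purely for illustration, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def top_k(query, dictionary, k=5):
    """Return the k (similarity, label) pairs most similar to the query.

    dictionary: list of (class_label, embedding) entries.
    """
    scored = [(cosine(query, emb), label) for label, emb in dictionary]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def rank_probabilities(ranked):
    """Assumed scheme: each retrieved entry adds 1/rank to its class; normalise to 1."""
    weights = defaultdict(float)
    for rank, (_, label) in enumerate(ranked, start=1):
        weights[label] += 1.0 / rank
    total = sum(weights.values())
    return {label: w / total for label, w in weights.items()}


# Toy dictionary with made-up embeddings.
dictionary = [
    ("need", [1.0, 0.0]),
    ("knowledge", [0.0, 1.0]),
    ("need", [0.9, 0.1]),
    ("social_norms", [0.5, 0.5]),
]
query = [1.0, 0.1]

ranked = top_k(query, dictionary, k=3)
probs = rank_probabilities(ranked)
best_single = top_k(query, dictionary, k=1)[0][1]  # most similar entry only
```

Retrieving only the single most similar entry (`k=1`) yields a hard label, while the top-k variant yields a score per class that can feed into a threshold-free metric such as AUC-ROC.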

Inference

Accuracy

\[ \text{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN} \]

AUC-ROC

\[ \text{AUC} = \int_{0}^{1} \text{TPR}(t) \, d\text{FPR}(t) \]
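Both metrics can be computed directly from labels and scores. The sketch below is a minimal pure-Python illustration, not the evaluation code used in the study. It computes AUC through the rank formulation (the share of positive-negative pairs in which the positive sample receives the higher score, with ties counted as one half), which equals the area under the ROC curve given by the integral above.

```python
def accuracy(y_true, y_pred):
    """Share of predictions that match the true label: (TP + TN) / all samples."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def auc_roc(y_true, scores):
    """AUC as the probability that a random positive outscores a random negative."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    # Count pairwise wins; a tie contributes half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy example with made-up labels, hard predictions, and scores.
y_true = [1, 0, 1, 0]
y_pred = [1, 0, 0, 0]
scores = [0.9, 0.4, 0.35, 0.1]

acc = accuracy(y_true, y_pred)
auc = auc_roc(y_true, scores)
```

Reporting both is useful for "1 vs. all" comparisons: if the four conditions are roughly equal in size, always predicting "other" already reaches an accuracy of about 0.75, whereas its AUC-ROC stays at 0.5.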

Results

Manipulation effectiveness

Effectiveness of Classification

Questions?

Questions!

Methods

  • What sample is considered “divergent”?
  • Missing method- or sample-configurations?
  • Model choice?
  • Prompting: Improvement without overfitting?

Structure and reporting

  • Structure: Width vs. depth?
  • Configurations that don’t influence results?
  • Other comments?

How (horrible) was that?

Answer code:

A = Thank god it’s finally over!

B = 🦗🦗🦗 (cricket noises)

C = That was alright.

Appendix

Manipulation effectiveness for different conditions

Social norms

Knowledge

Need

Expanded graphs on effectiveness of classification

Baseline condition

[Figure: Accuracy and AUC-ROC depending on Method and Data Treatment for Choice Attributes. Panels: Accuracy in %, AUC-ROC in % (axes 0.00 to 1.00). Configurations sorted by accuracy; plotted accuracy values range from 0.88 down to 0.47. Legend: OpenAI, Deepseek, similarity_scores; partial_all, partial_CA, full; Both, Examples, Definition; MAX, MEAN, CLS; top200, normalized, preprocessed, none.]

[Figure: AUC-ROC depending on Method and Data Treatment for Choice Attributes. Axis: AUC-ROC in % (0.00 to 1.00). Configurations sorted by AUC-ROC; plotted values range from 0.80 down to 0.46. Legend as in the previous figure.]

[Figure: Accuracy and AUC-ROC depending on Method and Data Treatment for One vs. X comparisons: Choice Attributes. Panels: vs. Social Norms, vs. Knowledge, vs. Need, vs. Other. Axes: Accuracy in %, AUC-ROC in % (0.00 to 1.00). Legend: partial_all, partial_CA, full; top200, normalized, preprocessed, none; MAX, MEAN, CLS.]

Knowledge

[Figure: Accuracy and AUC-ROC depending on Method and Data Treatment for Knowledge. Panels: Accuracy in %, AUC-ROC in % (axes 0.00 to 1.00). Configurations sorted by accuracy; plotted accuracy values range from 0.91 down to 0.57. Legend: OpenAI, Deepseek, similarity_scores; partial_all, partial_CA, full; Both, Examples, Definition; MAX, MEAN, CLS; top200, normalized, preprocessed, none.]

[Figure: AUC-ROC depending on Method and Data Treatment for Knowledge. Axis: AUC-ROC in % (0.00 to 1.00). Configurations sorted by AUC-ROC; plotted values range from 0.90 down to 0.50. Legend as in the previous figure.]

[Figure: Accuracy and AUC-ROC depending on Method and Data Treatment for One vs. X comparisons: Knowledge. Panels: vs. Baseline Condition, vs. Social Norms, vs. Need, vs. Other. Axes: Accuracy in %, AUC-ROC in % (0.00 to 1.00). Legend: partial_all, partial_CA, full; top200, normalized, preprocessed, none; MAX, MEAN, CLS.]

Social norms

[Figure: Accuracy and AUC-ROC depending on Method and Data Treatment for Social Norms. Panels: Accuracy in %, AUC-ROC in % (axes 0.00 to 1.00). Configurations sorted by accuracy; plotted accuracy values range from 0.86 down to 0.58. Legend: OpenAI, Deepseek, similarity_scores; partial_all, partial_CA, full; Both, Examples, Definition; MAX, MEAN, CLS; top200, normalized, preprocessed, none.]

[Figure: AUC-ROC depending on Method and Data Treatment for Social Norms. Axis: AUC-ROC in % (0.00 to 1.00). Configurations sorted by AUC-ROC; plotted values range from 0.83 down to 0.52. Legend as in the previous figure.]

[Figure: Accuracy and AUC-ROC depending on Method and Data Treatment for One vs. X comparisons: Social Norms. Panels: vs. Baseline Condition, vs. Knowledge, vs. Need, vs. Other. Axes: Accuracy in %, AUC-ROC in % (0.00 to 1.00). Legend: partial_all, partial_CA, full; top200, normalized, preprocessed, none; MAX, MEAN, CLS.]

Need

[Figure: Accuracy and AUC-ROC depending on Method and Data Treatment for Need. Panels: Accuracy in %, AUC-ROC in % (axes 0.00 to 1.00). Configurations sorted by accuracy; plotted accuracy values range from 0.84 down to 0.28. Legend: OpenAI, Deepseek, similarity_scores; partial_all, partial_CA, full; Both, Examples, Definition; MAX, MEAN, CLS; top200, normalized, preprocessed, none.]

[Figure: AUC-ROC depending on Method and Data Treatment for Need. Axis: AUC-ROC in % (0.00 to 1.00). Configurations sorted by AUC-ROC; plotted values range from 0.75 down to 0.50. Legend as in the previous figure.]

[Figure: Accuracy and AUC-ROC depending on Method and Data Treatment for One vs. X comparisons: Need. Panels: vs. Baseline Condition, vs. Social Norms, vs. Knowledge, vs. Other. Axes: Accuracy in %, AUC-ROC in % (0.00 to 1.00). Legend: partial_all, partial_CA, full; top200, normalized, preprocessed, none; MAX, MEAN, CLS.]