Replication of ‘Latent motives guide structure learning during adaptive social choice’ by van Baar et al. (2022, Nature Human Behaviour)

Author

Nora Dee (noradee@stanford.edu)

Published

December 14, 2025

Introduction

For this project, I propose to replicate a portion of Experiment 1 from van Baar et al. (2022). Its findings demonstrate that people make predictions about the behavior of others by detecting and leveraging their latent motives. This paper is important to my undergraduate honors thesis, which explores how humans learn information that allows them to successfully generalize about others; in my thesis, I will run a behavioral experiment which may utilize these same economic games and a similar prediction task. For this class project, I will run the replication experiment and, due to the constraints of the class timeline and my experience, conduct its analyses up to but not including the computational modeling component.

In this experiment, participants first indicate how they would play each of four economic game types (the stimuli in this experiment), which is used to determine their own decision-making strategy. Then, they play four blocks of the “Social Prediction Game,” wherein on each of sixteen trials they predict what an (experimenter-generated) player would choose in these economic games and rate their confidence in their prediction. At the end of each block, they self-report what they think the player’s strategy was in a free-response format. Using t-tests, participants’ prediction accuracy is compared to what it would be under a few potential learning strategies, to investigate how plausible those strategies are. Here, I plan to test some of the more basic hypotheses that the original experimenters found evidence against. More specifically, they found that participants do not 1) simply expect players to repeat their past behavior, 2) refrain from generalizing across trials, or 3) engage in a form of “naive statistical learning.”

Methods

Power Analysis

There are two analyses which I will attempt to replicate. For the first, the authors conducted a two-tailed one-sample t-test and reported a Cohen’s d of 1.00. Power analysis indicates that to achieve 80%, 90%, and 95% power, 10, 13, and 16 participants will be needed, respectively.

pwr.t.test(d = 1.00, power = 0.8, sig.level = 0.05, type = "one.sample", alternative="two.sided")

     One-sample t test power calculation 

              n = 9.93785
              d = 1
      sig.level = 0.05
          power = 0.8
    alternative = two.sided
pwr.t.test(d = 1.00, power = 0.9, sig.level = 0.05, type = "one.sample", alternative="two.sided")

     One-sample t test power calculation 

              n = 12.58546
              d = 1
      sig.level = 0.05
          power = 0.9
    alternative = two.sided
pwr.t.test(d = 1.00, power = 0.95, sig.level = 0.05, type = "one.sample", alternative="two.sided")

     One-sample t test power calculation 

              n = 15.0631
              d = 1
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

For the second analysis, the authors conducted a two-tailed paired-samples t-test and reported a Cohen’s d of 1.80. Power analysis indicates that to achieve 80%, 90%, and 95% power, 5, 6, and 7 participants will be needed, respectively.

pwr.t.test(d = 1.80, power = 0.8, sig.level = 0.05, type = "paired", alternative="two.sided")

     Paired t test power calculation 

              n = 4.662612
              d = 1.8
      sig.level = 0.05
          power = 0.8
    alternative = two.sided

NOTE: n is number of *pairs*
pwr.t.test(d = 1.80, power = 0.9, sig.level = 0.05, type = "paired", alternative="two.sided")

     Paired t test power calculation 

              n = 5.499921
              d = 1.8
      sig.level = 0.05
          power = 0.9
    alternative = two.sided

NOTE: n is number of *pairs*
pwr.t.test(d = 1.80, power = 0.95, sig.level = 0.05, type = "paired", alternative="two.sided")

     Paired t test power calculation 

              n = 6.270878
              d = 1.8
      sig.level = 0.05
          power = 0.95
    alternative = two.sided

NOTE: n is number of *pairs*

Overall, the analyses which I will be running will require very few participants to be well-powered. The same is likely not true for the later analyses in the paper, which I will not be conducting. The authors included 150 participants in their study, indicating that “The sample size was chosen such that key effects from smaller pilot studies could be observed with high statistical power” (van Baar et al. 2022).
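
As a cross-check on the pwr results above, the same sample sizes can be reproduced in Python. The sketch below assumes the statsmodels package is available; its TTestPower class covers both designs, since a paired t-test is equivalent to a one-sample t-test on the difference scores.

from math import ceil
from statsmodels.stats.power import TTestPower

# Cross-check of the pwr.t.test results above (assumes statsmodels is installed);
# a paired t-test is a one-sample t-test on difference scores, so TTestPower covers both
analysis = TTestPower()
for d in (1.00, 1.80):
    for power in (0.80, 0.90, 0.95):
        n = analysis.solve_power(effect_size=d, alpha=0.05, power=power, alternative='two-sided')
        print(f"d = {d:.2f}, power = {power:.0%}: n = {n:.2f} -> {ceil(n)} participants")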

Planned Sample

I plan to conduct this experiment with 75 participants – half of the 150 used in the original study. Just as in the original study, all participants will be American adults. However, Prolific will be used as the sampling frame instead of MTurk.

Materials

The article provided the following set of instructions, which were given to participants. I made only a few minor modifications: I replaced “HIT” with “study,” since my experiment was conducted on Prolific as opposed to MTurk, and added an estimate of the time required to complete the study (20 minutes).

Welcome to the Social Prediction Game. This HIT consists of a task (the Social Prediction Game) and several questionnaires.

Social Prediction Game

This game is designed to study how we make predictions about the decisions of other people. You will observe Decision Games played by pairs of other people. These people took part in a previous experiment (in 2015) where they played these Decision Games. These people could earn money in these Decision Games: the more points they earned, the more money they earned. Therefore, these people were motivated to play the Decision Games well.

Goal of the task

In the Social Prediction Game, your job is to predict the choices that other people (the Players) have already made in these Decision Games. The current Player you will be asked to follow will always be indicated by their initials, for example A.B. You will see this Player play 16 Decision Games, each time with a different Opponent. Keep in mind, these scenarios were really played out between these people. Your job is to predict what action the current Player (for example A.B.) will take in each scenario. You do NOT have to predict how the Opponent will decide, just the current Player.

Earning a bonus

If you predict correctly what the Player does in each scenario, you will earn a Point. The more Points you earn, the more money you will earn for doing this HIT. You will see 4 different Players play 16 Decision Games each. This means you can earn at most 64 points. Each Point is worth $0.01 in the Social Prediction Game (your task). This will be added as a bonus to your base payment of $4.00.

Next steps

On the next screens, you will read more about the Decision Games and see some examples. Afterwards, you will be quizzed to make sure you fully understand the Social Prediction Game task. Then, you will be asked to indicate how you yourself would play the Decision Game, if you were a Player. Finally, you will start the actual task: the Social Prediction Game. After the game is over, you will be asked to complete several questionnaires.

NOTE: You will need to answer all quiz questions correctly to start the task and complete this HIT.

The original authors also provided a screenshot of what the Social Prediction Game looked like to participants. This was used as the model for designing my own game interface.

An example Social Prediction Game interface from the original study

An example Social Prediction Game interface in my replication

Procedure

The authors provided the following text explaining the procedure of their experiment.

The participants first read the instructions and were quizzed to ensure their understanding and filter out potential bots. The participants were then asked to indicate for each game type in the Social Prediction Game how they themselves would choose, from which we estimated the participants’ own decision strategies. They then completed the Social Prediction Game. . .

. . .The participants played four blocks of the Social Prediction Game, each block with a different Player, and were tasked with predicting the choices of this particular Player across 16 consecutive economic games. The Players always played single-shot against anonymous Opponents. Each game was presented as a 2 × 2 payoff matrix (Fig. 1a) where the Player and Opponent each have two choices: co-operation and defection. In the task, these choices were labelled by arbitrary colour words (such as blue or green) whose mapping to co-operation and defection changed on every task block.

The games varied on two features central to social interactions: risk of co-operating (here operationalized as S) and T (Fig. 1b). At T < 10 and S > 5, the games fall under a class of Harmony Game, where each player’s payoff-maximizing action aligns with the jointly payoff-maximizing action, and thus no conflict arises except through potential envy. At T > 10 and S > 5, the games are classified as Snowdrift Games (also known as Volunteer’s Dilemmas), which are anti-coordination games where unilateral defection is preferable to mutual co-operation, but mutual defection yields the smallest payoff for all. At T > 10 and S < 5 lie the Prisoner’s Dilemma games, which are characterized by a high value of T even if one’s opponent defects as well, and co-operation is risky as unilateral co-operation yields the lowest possible payoff. At T < 10 and S < 5, the games are Stag Hunts, in which mutual co-operation yields the highest payoff for both, but co-operation is risky as unilateral co-operation is met with the lowest payoff.

The task of the participants was to indicate, in each trial, what they believed the Player would choose to do in the current game, and to rate their confidence in this prediction on an 11-point scale from 0% to 100% (10% increments). They received feedback on every trial indicating whether their prediction was correct or not, and earned a US$0.01 bonus for every correct trial. At the end of 16 trials (one block), the participants self-reported what they believed the Player’s strategy was using a free-response answer box. After four blocks, the total earned bonus was presented to the participants and added to the base payment. The participants were then taken to a survey hosted on Qualtrics to finish the experiment.

The only change I made was that the survey to finish the experiment was conducted with the rest of the behavioral experiment, using jsPsych, instead of rerouting to Qualtrics.
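
To make the quoted game taxonomy concrete, the sketch below (my own illustrative helper, not code from the original study) classifies a game by its S and T values using the thresholds described above.

# Illustrative helper (not from the original study's code): classify a game by the
# S and T thresholds described in the quoted procedure text
def classify_game(S, T):
    if T < 10 and S > 5:
        return 'Harmony Game'         # individual and joint payoff maximization align
    if T > 10 and S > 5:
        return 'Snowdrift Game'       # anti-coordination; mutual defection is worst for all
    if T > 10 and S < 5:
        return "Prisoner's Dilemma"   # defection is tempting; unilateral co-operation yields the lowest payoff
    if T < 10 and S < 5:
        return 'Stag Hunt'            # mutual co-operation is best, but co-operating is risky
    return 'Boundary case'            # S = 5 or T = 10 is not covered by the quoted thresholds

print(classify_game(S=2, T=12))  # Prisoner's Dilemma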

Analysis Plan

For this replication project, I ran two of the authors’ analyses, detailed below in their own words:

One possibility is that they expect a Player to simply repeat their past behaviour due to stable preferences for co-operation. This could be thought of as basic reinforcement learning, where the participant learns the value of predicting ‘co-operate’ and ‘defect’ for the current Player without distinguishing between different games (an approach doomed to fail in the more complex Social Prediction Game). Another possibility is that participants refrain from generalizing across games at all, because each trial is unique. Since all Players co-operate and defect on half the trials, both these strategies would yield on average 50% accuracy in our task. However, the observed accuracy was significantly greater (59.1% ± 9.1% (s.d.); two-tailed one-sample t-test: t(149) = 12.2, P < 0.001, Cohen’s d = 1.00).

A third possible strategy is naïve statistical learning, whereby participants detect the mapping between S or T and the Player’s choices (for example, learning that Inverse Risk-Averse co-operates when S < 5). Such a strategy reflects how participants learn latent structure in non-social tasks containing abstract stimuli such as coloured shapes and fractals. If true, task performance should be equal across all Player strategies, as each strategy is a step function with a single change point on the S or T dimension (Fig. 1c). However, performance was much higher for human than artificial strategies (Greedy and Risk-Averse: average accuracy, 71.6% ± 10.5%; Inverse strategies: 46.5% ± 12.4%; two-tailed paired-samples t-test: t(149) = 22.0, P < 0.001, d = 1.80; Fig. 2a).

To summarize, the authors ran:

  1. A two-tailed one-sample t-test of
    \[ H_0: p = 0.5 \quad \text{vs.} \quad H_a: p \neq 0.5 \]

  2. A two-tailed paired-samples t-test of
    \[ H_0: p_H = p_A \quad \text{vs.} \quad H_a: p_H \neq p_A \]

where \(p\) is accuracy, \(p_H\) is accuracy against human strategies, and \(p_A\) is accuracy against artificial strategies. I do not plan to run any additional analyses.
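
In code, these two planned tests map directly onto scipy.stats calls. Below is a minimal sketch using randomly generated placeholder accuracies (the variable names acc, acc_human, and acc_artificial are hypothetical; the real analysis appears in the Results section).

import numpy as np
from scipy import stats

# Minimal sketch of the two planned tests on randomly generated placeholder data
# (for illustration only; the confirmatory analysis below uses the real data)
rng = np.random.default_rng(0)
acc = rng.uniform(0.4, 0.8, size=75)             # overall accuracy per subject
acc_human = rng.uniform(0.5, 0.9, size=75)       # accuracy against human strategies
acc_artificial = rng.uniform(0.3, 0.7, size=75)  # accuracy against artificial strategies

t1, p1 = stats.ttest_1samp(acc, 0.5)                 # H0: p = 0.5
t2, p2 = stats.ttest_rel(acc_human, acc_artificial)  # H0: p_H = p_A
print(f"One-sample: t = {t1:.2f}, p = {p1:.3g}; paired: t = {t2:.2f}, p = {p2:.3g}")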

The authors did not specify any data cleaning rules, and shared their clean data but not the raw data. They also did not indicate any data exclusion rules (although, as mentioned earlier, participants were not allowed to proceed to the task without passing comprehension checks).

Differences from Original Study

In terms of the sample, there are not likely to be meaningful differences between my sample and that of the original authors. The only difference is that they recruited participants from MTurk, whereas I will recruit participants from Prolific. Once the data are collected, the average age and gender split of participants can be compared to what was reported by the authors to further assess the similarity of the samples.

There are, however, a handful of differences to note with respect to the procedure, which I will describe in the order in which they appear. The first is that our comprehension checks likely differ. While the authors indicate that participants “were quizzed to ensure their understanding and filter out potential bots,” they do not indicate what the questions or the filtering criteria were (van Baar et al. 2022). I created my own two comprehension check questions and designed the procedure to keep sending participants back to reread the instructions until they answered both comprehension checks correctly.

Additionally, while the authors provided examples of the task (the “Social Prediction Game”) to the participants before they engaged in it, I did not. The authors did not include any details about these examples, and I was not confident that I could create and walk a participant through an example in a helpful way without biasing them to think about the potential motives of the person whose choices they are predicting, thereby creating demand characteristics. This choice may lead to some participants not understanding the task as well. A future iteration of my experiment may include an example if I can design one that would be helpful to participants without contaminating the sample.

I also added in various transitional instruction pages to the experiment because the authors did not provide the code for the experiment or all of the text presented to participants – they only provided the main instructions page.

Methods Addendum (Post Data Collection)

Actual Sample

The final sample size was 48 people. This is fewer than what was preregistered because an increase in the expected task length was, in error, not propagated to the participant cost calculation, which increased costs. As a result, instead of stopping at the preregistered 75 participants, I stopped running participants when I hit my originally decided budget. As preregistered, I did not exclude any data because the original authors did not report that they did so – instead, I had comprehension checks at the beginning of the experiment which participants had to pass before continuing to the task. Some of the code from the confirmatory analysis section (the first chunk below) has been moved here because the analyses of sample size, age, and gender depend on its outputs.

### Data Preparation
#### Load Relevant Libraries and Functions
import sys, os, glob, scipy.stats, matplotlib
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
import matplotlib.ticker as mtick
import json

#### Import data
proj_dir = os.path.abspath('../')
data_dir = os.path.join(proj_dir,'data/fulldata')

##### Get a list of all CSV files in the folder
all_files = glob.glob(os.path.join(data_dir, "*-anon.csv"))

##### Get the number of CSV files in the folder
num_participants = len(all_files)
print(f'There are {num_participants} participants found')
There are 48 participants found
##### Read each CSV into a DataFrame and store them in a list
list_of_dfs = [pd.read_csv(f) for f in all_files]

##### Concatenate all DataFrames in the list into a single DataFrame
df = pd.concat(list_of_dfs, ignore_index=True)

##### Examine df
print(f'The columns are {df.columns}')
The columns are Index(['view_history', 'rt', 'trial_type', 'trial_index', 'plugin_version',
       'time_elapsed', 'subjectID', 'studyID', 'sessionID',
       'overallBonusPoints', 'task', 'response', 'question_order', 'success',
       'Matrix', 'S', 'T', 'R', 'P', 'GameType', 'choice', 'GivenAns',
       'Player', 'PlayerType', 'CorrAns', 'confidence', 'ScoreNum',
       'time_on_trial', 'stimulus'],
      dtype='object')
df.head(3)
                                        view_history  ...  stimulus
0  [{"page_index":0,"viewing_time":5539},{"page_i...  ...       NaN
1            [{"page_index":0,"viewing_time":39304}]  ...       NaN
2             [{"page_index":0,"viewing_time":4101}]  ...       NaN

[3 rows x 29 columns]
##### Check that the number of unique participants matches the file count above
print(f"There are {len(df['subjectID'].unique())} participants found")
There are 48 participants found
#### Data exclusion / filtering
print('There is no data exclusion / filtering')
There is no data exclusion / filtering
### Demographics
demographics_df = (
    df[df['task'] == 'demographics']
    .assign(
        age=lambda x: x['response'].apply(lambda v: int(json.loads(v)['age'])),
        gender=lambda x: x['response'].apply(lambda v: json.loads(v)['gender'])
    )
    .groupby('subjectID', as_index=False)
    .first()
)

#### Age
print(f'Mean age: {round(demographics_df["age"].mean())}')
Mean age: 44
#### Gender
gender = demographics_df["gender"].value_counts().reset_index()
print('Gender of Participants:')
Gender of Participants:
print(gender)
       gender  count
0        Male     28
1      Female     19
2  Non-binary      1

By contrast, the mean age in the original study was 35.4 and gender split was 95 males, 52 females, and 3 no response. My sample is about ten years older on average, and has a different, but more representative, gender split.

Differences from pre-data collection methods plan

The only difference in methods post data collection (besides the aforementioned sample size difference and the movement of a code chunk) was the correction of a copy-and-paste error. From the original authors’ analysis code, I accidentally copied a t-test that compared performance against the artificial strategies to random chance, rather than one comparing overall performance (against all strategies) to random chance, which is what I described in my preregistration and commented in my code. I corrected the error in the code below, but left the original, incorrect t-test commented out for transparency.

Results

Data preparation

#### Prepare data for analysis - create columns etc.
##### Filter for rows which hold the responses to the social prediction game
##### (copy so later renames/column assignments do not modify a view of df)
taskDat = df[df['task'] == 'socialPredictionGame'].copy()

##### Remove unnecessary columns
cols = ['rt', 'time_elapsed', 'subjectID', 'studyID', 'sessionID', 'task', 'Matrix', 'S', 'T', 'R', 'P', 'GameType', 'choice', 'GivenAns', 'Player', 'PlayerType', 'CorrAns', 'confidence', 'ScoreNum', 'stimulus', 'time_on_trial']
taskDat = taskDat[cols]

##### Examine df
taskDat.head(3)
    rt  time_elapsed   subjectID  ... ScoreNum stimulus time_on_trial
11 NaN        146455  crl6ayhdwz  ...      1.0      NaN       11987.1
13 NaN        154908  crl6ayhdwz  ...      1.0      NaN        5662.6
15 NaN        161854  crl6ayhdwz  ...      1.0      NaN        5353.6

[3 rows x 21 columns]

##### Rename columns to correspond with those used in paper
taskDat.rename(columns = {
    'subjectID': 'subID',
    'PlayerType': 'Type_Total',
    'confidence': 'Confidence',
    'ScoreNum': 'Score'
}, inplace=True)

##### Add 'Type' and 'Variant' columns from 'Type_Total'
taskDat[['Type', 'Variant']] = taskDat['Type_Total'].str.split('_', expand=True)
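
As an added sanity check (my own addition, not part of the authors’ code), I verify that the prepared data have the expected shape: 4 blocks × 16 games = 64 prediction trials per participant, split across the two strategy variants (‘nat’ and ‘inv’).

##### Sanity-check the prepared data (my own addition): each participant should have
##### 4 blocks x 16 games = 64 prediction trials across the 'nat' and 'inv' variants
trials_per_sub = taskDat.groupby('subID').size()
assert (trials_per_sub == 64).all(), 'Unexpected number of trials for at least one participant'
assert set(taskDat['Variant'].unique()) == {'nat', 'inv'}, 'Unexpected strategy variant labels'
print('All participants have 64 trials across the nat and inv variants')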

Confirmatory analysis

Statistical tests

Here, I run the two aforementioned t-tests, which examine whether 1) participants perform better than chance and 2) participants perform better against human versus artificial strategies.

# Conduct t-tests (code provided by authors, with some small modifications)

## Calculate the mean score for each subject, for each condition
meanPerSubCondition = taskDat.groupby(['subID','Variant'], as_index=False)['Score'].mean().pivot(
    index='subID', columns='Variant', values='Score')
meanPerSubCondition.head()
Variant         inv      nat
subID                       
0enj4c49hf  0.37500  0.84375
0myc60ayy5  0.37500  0.68750
1akb5zhyzv  0.43750  0.50000
43kggh8wp2  0.46875  0.65625
59sjyd8kmf  0.68750  0.56250
## Run t-test comparing overall accuracy against random choice (50%)
meanPerSub = taskDat.groupby('subID')['Score'].mean().values
t_statistic, p_value = scipy.stats.ttest_1samp(meanPerSub, .5)
# Previous, incorrect t-test:
# t_statistic, p_value = scipy.stats.ttest_1samp(meanPerSubCondition['inv'], .5)
print(f"T-statistic: {t_statistic:.3f}")
T-statistic: 5.207
print(f"P-value: {p_value:.3f}")
P-value: 0.000
## Run t-test comparing accuracy against natural vs. artificial strategies
t_statistic, p_value = scipy.stats.ttest_rel(meanPerSubCondition['inv'],meanPerSubCondition['nat'])
print(f"T-statistic: {t_statistic:.3f}")
T-statistic: -5.655
print(f"P-value: {p_value:.3f}")
P-value: 0.000

We reject the null hypothesis for both t-tests. There is statistically significant evidence that participants performed better than random chance in the Social Prediction Game (t(47) = 5.21, p < .001) and that they performed better against the human strategies than against the artificial strategies (t(47) = -5.66, p < .001).

Original graph versus graph from replication

Here, I only show a replication of the graph on the left because the graph on the right visualizes data from the authors’ Study 2, which I chose not to replicate.

# Create Figure 2 Panel A (code provided by authors, with some small modifications)
sns.set_context('poster')
blockDat = (taskDat.groupby(['subID', 'Variant'],as_index=False)[['Confidence', 'Score']].mean())
fig, ax = plt.subplots(1,1,figsize=[6,5])
sns.barplot(data=blockDat,x='Variant',y='Score', ax=ax, errwidth = 3, capsize=.1,
            order=['nat','inv'],alpha=0, errcolor='red')
sns.swarmplot(data=blockDat,x='Variant',y='Score', ax=ax,
            order=['nat','inv'], alpha=.3, color = 'k')
ax.plot([-5,5],[.5,.5], 'k--', lw=2)
ax.set(ylim = [0,1.1], xlim = [-.5,1.5], xlabel = None, yticks = [0,.25,.5,.75,1],
       title = 'Performance by strategy type',
       xticklabels = ['Human\nStrategies', 'Artificial\nStrategies'], ylabel = 'Accuracy     ');
<string>:1: UserWarning: set_ticklabels() should only be used with a fixed number of ticks, i.e. after set_ticks() or using a FixedLocator.
dat1 = blockDat.loc[blockDat['Variant']=='nat','Score'].values
dat2 = blockDat.loc[blockDat['Variant']=='inv','Score'].values
stats = scipy.stats.ttest_rel(dat2,dat1)
sns.despine(top=True,right=True)
ax.spines['left'].set_bounds(0,1)
ax.yaxis.set_major_formatter(mtick.PercentFormatter(xmax=1))
plt.subplots_adjust(left=0.25, bottom=0.3)
plt.show()

Exploratory analysis

Effect size of statistical tests

# For one-sample t-test: overall performance vs. random chance
x_bar = meanPerSub.mean()
mu = 0.5
sample_sd = np.std(meanPerSub, ddof=1)

d_test1 = (x_bar - mu) / sample_sd

print(f"Effect size for t-test comparing overall performance to random chance: {round(d_test1,2)}")
Effect size for t-test comparing overall performance to random chance: 0.75
# For paired-samples t-test: performance against human vs. artificial strategies
differences = meanPerSubCondition['nat'] - meanPerSubCondition['inv']
d_test2 = differences.mean() / differences.std(ddof=1)

print(f"Effect size for t-test comparing performance against human vs. artificial strategies: {round(d_test2,2)}")
Effect size for t-test comparing performance against human vs. artificial strategies: 0.82

As we discussed in this class, effect sizes in the literature can sometimes be inflated relative to the true underlying effect sizes. For the t-test comparing overall performance to random chance I found an effect size of 0.75 while the original authors found an effect size of 1.00. Similarly, for the t-test comparing performance against human versus artificial strategies, I found an effect size of 0.82 while the original authors found an effect size of 1.80. While I found relatively strong effect sizes for both statistical tests, they were in fact both smaller than those found by the original authors.
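
To put this comparison on slightly firmer footing, approximate 95% confidence intervals for the replication’s Cohen’s d values can be computed with the common large-sample approximation Var(d) ≈ 1/n + d²/(2n) for one-sample and paired designs (a rough sketch of my own, not an analysis from the original paper).

# Approximate 95% CIs for the replication effect sizes (my own rough sketch),
# using the large-sample approximation Var(d) ~ 1/n + d^2/(2n)
n = len(meanPerSub)
for label, d in [('overall vs. chance', d_test1), ('human vs. artificial', d_test2)]:
    se = np.sqrt(1 / n + d**2 / (2 * n))
    print(f"{label}: d = {round(d, 2)}, approx. 95% CI [{round(d - 1.96*se, 2)}, {round(d + 1.96*se, 2)}]")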

Time spent on trials

### Average time spent on trial per participant
trial_time_df = taskDat.groupby('subID')[['time_on_trial']].mean() / 1000
plt.figure()
plt.hist(trial_time_df['time_on_trial'], bins=20)
plt.xlabel('Mean Time Per Trial (seconds)')
plt.ylabel('# Participants')
plt.title('Average Trial Time Per Participant')
plt.xticks([5,10,15,20,25])
plt.tight_layout()
plt.show()

### Overall average time spent on trials
print(f'Overall average trial time: {round(trial_time_df["time_on_trial"].mean(), 2)} seconds')
Overall average trial time: 7.54 seconds
print(f'Overall median trial time: {round(trial_time_df["time_on_trial"].median(), 2)} seconds')
Overall median trial time: 5.66 seconds
### Shortest trial times
shortest_times = trial_time_df['time_on_trial'].nsmallest(10)
print('10 Shortest Average Trial Times')
10 Shortest Average Trial Times
print(shortest_times)
subID
x921ptf2rz    2.938205
f2wl9saw2g    3.264986
vdudmljtvt    3.492744
1akb5zhyzv    3.558403
e2brtg5dde    3.800056
dvwu8xpo50    3.841653
awbqx0cg6x    3.873020
7udswjmebh    4.028562
0myc60ayy5    4.214031
ypm9ekzamj    4.224108
Name: time_on_trial, dtype: float64

Prior to running the study, I was concerned that many participants would complete the trials without putting much thought into their answers, given the financial incentive to do so. So, as an exploratory analysis, I decided to examine trial completion time. The distribution of average trial time per participant is right-skewed, with a few participants having noticeably longer average trial times. The mean and median time spent per trial are 7.54 and 5.66 seconds, respectively. These times are longer than it would take to simply click through the trials as fast as possible. Nonetheless, a small number of participants do have questionably low average trial times; if this study were run again, a higher performance-based bonus payment could be used to provide a stronger financial incentive to think carefully about one’s answers.

Discussion

Summary of Replication Attempt

The primary result of the confirmatory analysis aligned with that of the original researchers: participants perform better than random chance on the Social Prediction Game, and perform better against human strategies than artificial strategies. This is consistent with the original authors’ conclusion that, in order to predict behavior in this social task, participants learned the latent structure linking game incentives to choice – a structure about which they held strong prior expectations. Accordingly, the visual pattern of the data in Figure 2a appears highly similar to that of the graph I generated to match it.

However, the effect sizes diminished in this replication: while still relatively large, the Cohen’s d values calculated for both statistical tests were smaller than those reported by the original authors.

Overall, the experiment I chose from the original paper replicated (albeit with smaller effect sizes). However, confidence in the results presented here is limited in that, due to budget constraints, I did not meet my preregistered sample size.

Commentary

Finally, for transparency, I would like to note a mistake I made in the preregistration process. I set up my preregistration but did not realize that I had never pressed the final submit button until after I had already started collecting a few participants of my full sample (it was my first time creating a preregistration on OSF). However, no changes were made to the preregistration during this time. In the future, I will check that my registration is viewable when logged out of my OSF account before I start collecting a full sample.

References

van Baar, Jeroen M., Matthew R. Nassar, Wenning Deng, and Oriel FeldmanHall. 2022. “Latent Motives Guide Structure Learning During Adaptive Social Choice.” Nature Human Behaviour 6 (3): 404–14.