Group Coordination Goals

Hivemind

A casual Reddit poll was posted with the goal of coordinating an 80/20 split amongst anonymous internet strangers. The poll asked people to choose between two options (A or B) and said that the goal was for 80% of people to choose option A and 20% to choose option B. The poll was pretty successful: the group achieved a 79%/21% split.

Someone else then ran another set of polls with varying goal splits: 70/30, 50/50, 80/20, etc. The most successful was the 80/20 poll, though its result was closer to 75/25.

In this project, we are investigating when and how people are able to coordinate group level shared goals with very little (or no) information about others’ actions.

How are people deciding which option to choose?
Do people’s choices vary across situations, or does someone who chooses option A always choose the majority option, and vice versa?
If people’s choices vary across situations, what cues are they using about themselves and their perceived group to calibrate their choices?
How are people conceptualizing the group they are participating with, given that they have little to no information about the other participants?

Relevant References


Woolley et al. (2010) Collective intelligence

Centola (2022) Network science of collective intelligence

Freeman et al. (2020) Social and general intelligence improves collective action in a common pool resource system

As an initial proof of concept, we ran a replication of the Reddit Hivemind poll with students around Grounds. We tabled in front of Garrett Hall from 09/14/23 to 10/09/23. Participation was voluntary. The goal was to collect at least 200 responses.

Participants responded to a simple two question survey:

The goal of this survey is for 80% of respondents to select option A and 20% to select option B. Please make your selection below.
How did you decide to choose [chosen option]? Write down whatever came to mind as you were choosing.

Qualtrics title: Group Coordination Pilot

Results

We collected a total of 224 responses.


    1     2 
70.09 29.91

The group did not achieve an 80/20 split; the result was much closer to 70/30.

Some notable themes that I observed in the free response question:

Evaluate self as part of majority
- I felt most people might pick option A so I picked option B
- thats what most people should do
- i consider myself part of the 80%
“I’m not like everyone else” a la TikTok cultural theme
- I’m different.
- I’m the main character
No introspective process. Reads instructions as telling them to choose A
- You told me to
- that is the purpose of your survey
- I thought I was supposed to select it
Underestimating how many others chose B
- Because I felt like everyone was going to choose A so I choose B so that my choice would help account for the 20%.
- I figured everyone would choose option A already so I picked B because there was probably less than 20 percent that selected it so far.
“Reverse psychology” or strategic variance
- Infelt like many people would try to be “different” and pick B which would sway the percentages out of the 80/20 range so i picked A to counteract one of those votes
- I felt like people were going to be inclined to pick option B as it is stated that the goal is for 20% of people to select B so I feel like more people will end up picking B, but option A is supposed to be the majority which is why I chose A
- I figured a lot of people would try to make b have the appropriate amount thinking it would naturally be overlooked and smaller, so I did the opposite to overcorrect

Pilot 1.2

In this second pilot, participants were recruited to respond to the hivemind task in exchange for a free bagel. Participants completed this study in combination with other studies [Natalie ran this, check with her for more details].

Responses were collected on 4/21 and 4/22. Qualtrics title: Group Coordination (Bagel Copy)

Results

We collected a total of 83 responses.

Final Split:


    A     B 
64.63 35.37

estimate	statistic	p	parameter	Method	Alternative	95% CI
0.64	53.00	.001***	83.00	Exact binomial test	two.sided	[0.53, 0.74]

Do people expect success?

[1] 66.28049

[1] 57.49123

[1] 87.9

Most people do not expect the group to achieve coordination. Most people expect about a 60/40 split. The mean expectation is 66% will choose option A.

Gender

Term	estimate	std.error	statistic	p
(Intercept)	0.00	1.41	0.00	1.00
gender_codedM	-0.00	2.00	-0.00	1.00

Interestingly, this is the one case where men outperformed women. The men were much closer to 80/20 than the women were. This may be because there were only 21 men vs. 59 women.

The same question as in Pilot 1 was run on the Participant Pool. Launched on 10/31. Target N = 300. Qualtrics title: Group Coordination 2 - PPool

Participants responded to the following questions (new wording in bold):

The goal of this survey is for 80% of respondents to select option A and 20% to select option B. As you know, we are collecting about 300 responses from students on the participant pool. Please make your selection below.
How did you decide to choose [chosen option]? Write down whatever came to mind as you were choosing.
Do you think that this task will succeed at achieving an 80/20 split? i.e. do you think that roughly 80% of the 300 respondents will choose A and 20% will choose B? [yes/no/not sure]
[Participants who said no] You said “no”, you do not expect the task to succeed. What split do you think this task will yield?

Results

We collected a total of 293 responses, 11 of which were duplicates. Of the duplicates, I removed either the unfinished submission(s) or, if there were multiple finished submissions, the submission with the later timestamp.
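The duplicate-handling rule described above can be sketched as follows. This is a minimal illustration, not the script actually used: the field names (`pid`, `finished`, `timestamp`) and the example records are made up, and the fallback for a participant with only unfinished submissions (keep the earliest) is an assumption, since the rule above does not cover that case.

```python
from datetime import datetime

def deduplicate(responses):
    """Keep one submission per participant: prefer finished, then the earliest."""
    best = {}
    for r in responses:
        # Rank tuple: finished submissions sort first (False < True),
        # then earlier timestamps sort before later ones.
        rank = (not r["finished"], r["timestamp"])
        if r["pid"] not in best or rank < best[r["pid"]][0]:
            best[r["pid"]] = (rank, r)
    return [r for _, r in best.values()]

# Hypothetical records, for illustration only.
responses = [
    {"pid": "p1", "finished": False, "timestamp": datetime(2000, 1, 1, 9, 0)},
    {"pid": "p1", "finished": True, "timestamp": datetime(2000, 1, 1, 9, 5)},
    {"pid": "p2", "finished": True, "timestamp": datetime(2000, 1, 1, 10, 0)},
    {"pid": "p2", "finished": True, "timestamp": datetime(2000, 1, 2, 10, 0)},
]
kept = deduplicate(responses)
kept_by_pid = {r["pid"]: r for r in kept}
```

Here `p1` keeps its finished submission and `p2` keeps the earlier of its two finished submissions.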

Final N = 286

Final split:


    A     B 
73.78 26.22

estimate	statistic	p	parameter	Method	Alternative	95% CI
0.26	75.00	< .001***	286.00	Exact binomial test	two.sided	[0.21, 0.32]

This study’s split was a little closer to 80/20 than the first pilot.

A binomial test indicates that the split is significantly different from 80/20 (p = .01).
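One way to compare an achieved split with the 80/20 goal is an exact two-sided binomial test in which the null proportion is set explicitly to the goal share. The sketch below uses only the Python standard library. The function names are illustrative, and the two-sided rule (summing the probability of every outcome that is no more likely than the observed one) is one common convention, assumed here rather than taken from the original analysis.

```python
from math import comb

def binom_pmf(k, n, p):
    """Probability of exactly k successes in n trials with success rate p."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)

def binom_test_two_sided(k, n, p):
    """Exact two-sided p-value for observing k successes out of n under rate p."""
    observed = binom_pmf(k, n, p)
    # Small relative tolerance so that ties are counted despite float rounding.
    cutoff = observed * (1 + 1e-7)
    probs = (binom_pmf(i, n, p) for i in range(n + 1))
    return min(1.0, sum(q for q in probs if q <= cutoff))

# Counts taken from the binomial test tables in this report.
p_pilot2 = binom_test_two_sided(75, 286, 0.20)    # 75 of 286 chose B; goal share of B is 20%
p_pilot3 = binom_test_two_sided(99, 129, 0.80)    # 99 of 129 chose A; goal share of A is 80%
p_pilot5 = binom_test_two_sided(324, 395, 0.80)   # 324 of 395 chose A; goal share of A is 80%
```

Testing the share of B against 0.20 is equivalent to testing the share of A against 0.80; what matters is that the null value is the goal share rather than a default such as 0.5.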

Exploratory analyses:

Do people expect that the group task can be successful? We asked participants whether they think that the 300 respondents will be able to achieve the 80/20 split.


     yes       no not sure 
   11.89    65.73    22.38

Most people (66%) did not think that the group would be able to achieve the 80/20 split. They underestimated their group’s ability to coordinate!

We asked people who said “no” to estimate what split would be achieved instead of the 80/20 goal. Participants reported what percentage of people will choose option A (meant to be 80%).

[1] 56.82639

[1] 90.97727

It seems that the doubters fell into two categories. Some people expected more than 80% of respondents to pick A. But more people expected less than 80% of respondents to pick A. Most commonly, people expected between 50 and 65% of respondents to pick A.

Exploring the doubters

Next we tested whether those who did not expect that the group could achieve an 80/20 split were more likely to have selected A or B.

For this test, I removed the roughly 40 people who overestimated the percentage of people that would choose A.

Term	estimate	std.error	statistic	p
(Intercept)	-1.76	0.48	-3.63	< .001***
estimateSuccessno	0.97	0.52	1.88	.061
estimateSuccessnot sure	0.48	0.57	0.85	.396

Interestingly, those who didn’t think the group would succeed were marginally more likely to choose B, the minority option (p = .061).

Ingroup identification as a predictor

We collected participants’ pretest responses, on which they responded to an ingroup identification scale. The scale consisted of three items:

How important is UVa to your own personal identity?
How similar do you feel in attitudes and opinions to other UVa students?
How strongly do you identify as a UVa student?

Items were collapsed into one variable: ingroup identification with UVa students. We tested whether ingroup identification predicted participants’ likelihood to choose A (the majority choice) or B (the minority choice).

Term	estimate	std.error	statistic	p
(Intercept)	-0.70	0.55	-1.27	.205
ingroup	-0.07	0.12	-0.58	.563

Ingroup identification did not have a significant relationship with participants’ likelihood to choose A or B.

Gender

Term	estimate	std.error	statistic	p
(Intercept)	-1.23	0.17	-7.26	< .001***
gender_codedM	0.54	0.29	1.87	.061

Men were marginally more likely to choose B than women (p = .061).

We invited participants from Pilot 2 to take the same study again. Launched on 11/13 on the PPool for N = 286. Qualtrics title: Group Coordination 2.2 - PPool Part 2

Participants were invited with the following message:

Hi [First Name],

I am reaching out to let you know that you are qualified to participate in a new study - D4.1 (0.5 credits) - based on your participation in study D4.

Only those who participated in study D4 are qualified to participate in this second iteration of our study, so we encourage you to participate! You will receive another 0.5 credits upon completion of the study. You can sign up for study D4.1 on the Psychology Department’s Sona site.

Thank you!

In this iteration, participants estimated the results from pilot 2, then were told the results of pilot 2, then answered the same question again, and reported their estimate of this iteration’s split and their confidence in their estimate.

We collected a total of 286 responses in the previous study that you participated in a couple of weeks ago. Before we show you the results of that study, we want to know what you estimate the results were. What A/B split do you think was achieved in the study you participated in a couple of weeks ago?

74% of respondents chose option A and 26% chose option B. Now, we will ask you to participate in the same task again with the same group of 286 respondents.

The goal of this survey - as with the previous survey - is for 80% of respondents to select option A and 20% to select option B.
Once again, we are collecting 286 responses from the same students on the participant pool who participated in the previous iteration of this task.
Please make your selection below.

How did you choose? [free response]

What split will this study achieve?

How confident are you in that estimate?

Why do you think we invited you to a second iteration of this study? [free response]

Results

Final n = 127


    A     B 
62.99 37.01

estimate	statistic	p	parameter	Method	Alternative	95% CI
0.28	80.00	< .001***	286.00	Exact binomial test	two.sided	[0.23, 0.34]

Gender

Term	estimate	std.error	statistic	p
(Intercept)	-0.64	0.23	-2.79	.005**
gender_codedM	0.30	0.39	0.75	.451

Participant pool students completed the Reading the Mind in the Eyes Test (RMET), a measure of social sensitivity, on the psychology department pretest. They were then invited to participate in our in-lab study as part of a 3-study session. [Ask Natalie what the other two studies were, but sometimes it may have been only one other study]. Data were collected from 2/26/2025 to 04/25/2025. Participants received class credit in exchange for completing all of the studies in their 30-minute session.

Qualtrics title: Group Coordination 3 - PPool with Reading Mind Eye

Reading the Mind in the Eyes Test

About 490 people responded to the RMET on the pretest. In general, people perform pretty well on the test.

Mean score = 25.6

SD = 5.01

Pilot 3

A total of 129 participants took the study in the lab. Of those, fewer than 60 had completed the RMET in full on the psychology department pretest.

There were a number of duplicate attempts on the pretest. In most cases, one of the attempts was incomplete, and I deleted the incomplete responses. In some cases, all of the participant’s attempts were complete and yielded different total RMET scores. This resulted in 2 sets of duplicates:

PID 57 took the pretest 3 times with total scores 30, 34, 33.
PID 107 took the pretest 2 times with total scores 22, 23

Overall study results


    A     B 
76.74 23.26

binomial test of achieved ratio and its distance from 80/20
estimate	statistic	p	parameter	Method	Alternative	95% CI
0.77	99.00	.378	129.00	Exact binomial test	two.sided	[0.68, 0.84]

Overall, participants achieved a split closer to 75/25 than to 80/20. However, the binomial test indicates that the split is not significantly different from 80/20 (p = .378).
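The 95% CI column in the binomial test tables can be approximated with an exact (Clopper-Pearson) interval. The sketch below assumes that this is the interval type being reported, which the report does not state. It finds each bound by bisection on the binomial CDF using only the standard library, and the function names are illustrative.

```python
from math import comb

def binom_cdf(k, n, p):
    """P(X <= k) for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def find_root(f, iterations=60):
    """Bisection on (0, 1) for a function f that increases from negative to positive."""
    lo, hi = 0.0, 1.0
    for _ in range(iterations):
        mid = (lo + hi) / 2
        if f(mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def clopper_pearson(k, n, alpha=0.05):
    """Exact two-sided confidence interval for a binomial proportion k/n."""
    # Lower bound: the p at which P(X >= k) equals alpha / 2.
    lower = 0.0 if k == 0 else find_root(
        lambda p: (1 - binom_cdf(k - 1, n, p)) - alpha / 2)
    # Upper bound: the p at which P(X <= k) equals alpha / 2.
    upper = 1.0 if k == n else find_root(
        lambda p: alpha / 2 - binom_cdf(k, n, p))
    return lower, upper

# 99 of 129 participants chose A in this study.
ci_low, ci_high = clopper_pearson(99, 129)
```

If the interval contains 0.80, the achieved share of A is statistically compatible with the goal at the chosen alpha.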

Gender

Term	estimate	std.error	statistic	p
(Intercept)	-1.34	0.29	-4.60	< .001***
gender_codedM	0.33	0.42	0.79	.431

There is no significant effect of gender on which choice people made. But women, again, were closer to an 80/20 split than men were: they achieved a 79%/21% split.
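The gender table above can be read back into splits by converting the coefficients from log-odds to probabilities. This sketch assumes that the model is a logistic regression with "chose B" as the outcome and women as the reference level. That reading is consistent with the 79%/21% split reported for women, but the model specification is not stated in the report.

```python
from math import exp

def inv_logit(x):
    """Convert log-odds to a probability."""
    return 1 / (1 + exp(-x))

intercept = -1.34   # log-odds of choosing B for the reference group (assumed: women)
gender_m = 0.33     # change in log-odds for men

p_b_women = inv_logit(intercept)
p_b_men = inv_logit(intercept + gender_m)

print(f"women: {100 * (1 - p_b_women):.0f}/{100 * p_b_women:.0f}")
print(f"men:   {100 * (1 - p_b_men):.0f}/{100 * p_b_men:.0f}")
```

Under these assumptions the intercept alone gives the split for women, and adding the gender coefficient gives the split for men.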

Effects of social sensitivity (Reading the Mind in the Eyes Test)

How did our participants perform on the RMET?

Does gender predict performance on the RMET?

RMET score predicted by gender
Term	estimate	std.error	statistic	p
(Intercept)	27.19	0.87	31.22	< .001***
gender_codedM	-1.84	1.24	-1.48	.145

No, gender does not predict RMET performance.

The sample isn’t big enough to chunk participants into +1/-1 SD around the mean on RMET performance. The best I can do is split them at the median and compare the top half of performers to the bottom half. The median score is 27.

With a median split, and those with the median score binned into the lower performers, the “lower performers” group is n = 30, and the “higher performers” group is n = 24.
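The binning rule above, with ties at the median going to the lower group, can be sketched like this. The scores are made up for illustration and the function name is hypothetical.

```python
from statistics import median

def median_split(scores):
    """Label each score, placing scores equal to the median in the lower group."""
    m = median(scores)
    return ["Lower Performers" if s <= m else "Higher Performers" for s in scores]

# Hypothetical RMET scores, for illustration only.
scores = [22, 25, 27, 27, 29, 31, 33]
labels = median_split(scores)
```

Because ties go to the lower group, the two halves can be unequal in size, as with the n = 30 and n = 24 groups here.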

Does performance on the RMET (i.e. social sensitivity) predict group performance on the hivemind task?

median_split	choice	n
Higher Performers	A	20
Higher Performers	B	4
Lower Performers	A	22
Lower Performers	B	8

The two groups appear to differ, though the counts are small. Higher RMET performers chose A 20 times out of 24 (about 83/17), while lower RMET performers (i.e. lower social sensitivity) chose A 22 times out of 30 (about 73/27), so the lower performers chose B more often than the goal called for.

In study 4, we tested whether being in a position of power would shift the way people approach the hivemind task. It is possible that a sense of power is one variable that pushes people to be more likely to choose the minority option.

To test this, we recruited passersby on campus in front of Garrett Hall to complete a study for a free bagel. At the beginning of the study, they were ostensibly assigned to be a bagel chooser or a bagel receiver. That is, we told a random half of participants that they had been selected to choose the bagel flavor for the next 4 participants (power condition) and told the other half that they had been randomly selected to receive the bagel that another participant had chosen for them (control/no power condition). They then responded to the same hivemind task that we employed in previous studies.

Data were collected from 10/09/2024 to 10/18/2024.

Qualtrics title: Group coordination 4 - Power manip

We collected a total of 215 responses.

condition	80/20 split choice	n
bagelChooser	1	80
bagelChooser	2	30
bagelReceiver	1	76
bagelReceiver	2	29

The splits achieved were closer to 70/30 than 80/20 for both bagel choosers and bagel receivers. There didn’t seem to be a difference between the bagel chooser (power condition) group and the bagel receiver group. Accordingly, we didn’t find evidence that a power manipulation shifts the way people approach the hivemind task.
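The per-condition splits can be computed directly from the counts in the table above. This sketch assumes that choice code 1 corresponds to option A and code 2 to option B, which the table does not state explicitly.

```python
def split_percent(n_a, n_b):
    """Return the A/B split as percentages rounded to one decimal place."""
    total = n_a + n_b
    return round(100 * n_a / total, 1), round(100 * n_b / total, 1)

# (count choosing 1, count choosing 2) per condition, from the table above.
counts = {
    "bagelChooser": (80, 30),
    "bagelReceiver": (76, 29),
}
splits = {condition: split_percent(*c) for condition, c in counts.items()}

for condition, (pct_a, pct_b) in splits.items():
    print(f"{condition}: {pct_a}/{pct_b}")
```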

Prolific study with (ostensibly) varying group sizes, the RMET, and an added question measuring strategizing in the centipede game as a potential measure of ToM and backward induction. Launched on 06/20/25 with a target total N of 400. Compensation: 1.73 British pounds.

Qualtrics name: Group Coordination 5 - Prolific with varying group sizes

Method

Participants were randomly assigned to one of four conditions: group sizes of 10, 50, 100 or 500. (Note that participants were told they were in a group of 10 people, for example, but we aimed to collect an n of 100 per condition.)

Participants were told that they were being randomly assigned to a group and given the instructions that their group is tasked to achieve an 80/20 split. They were also told the number of people that would have to choose each option to achieve the split, i.e. people in the 10 group condition were told that 8 people should choose A and 2 people should choose B.
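The head counts given to participants follow directly from the group size and the 80/20 goal. A small sketch, with a hypothetical function name and integer arithmetic to avoid rounding issues:

```python
def target_counts(group_size, percent_a=80):
    """Number of people who should choose A and B to hit the goal split."""
    n_a = group_size * percent_a // 100
    return n_a, group_size - n_a

# The four group-size conditions used in this study.
targets = {size: target_counts(size) for size in (10, 50, 100, 500)}
```

For example, `targets[10]` is `(8, 2)`, matching the instruction that 8 people should choose A and 2 should choose B.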

Participants then explained in writing how they made their choice, and reported their estimate of the split the group would achieve.

Next, participants completed the 36-item RMET.

Finally, participants were given instructions to imagine that they were playing in a centipede game.

They were offered a chance to take or pass the pot for each of their turns, with the game ending whenever they chose to take the pot or at the 9th turn (end of game).

Participants were thanked for their time and paid on the Prolific platform.

Useful references for the centipede game


Brocas, I., & Carrillo, J. D. (2020). Iterative dominance in young children: Experimental evidence in simple two-person games. Journal of Economic Behavior & Organization, 179, 623-637

Gerber, A., & Wichardt, P. C. (2010). Iterated reasoning and welfare-enhancing instruments in the Centipede game. Journal of Economic Behavior & Organization, 74(1-2), 123-136

Izquierdo, S. S., & Izquierdo, L. R. centipede-test-two

Results

We collected a total of 395 complete responses.

condition	n
10	101
50	101
100	100
500	93

gender	n
man	162
man,transgender	1
non-binary	1
not listed	1
transgender	5
woman	225

Overall split results


    A     B 
82.03 17.97

binomial test of achieved ratio and its distance from 80/20
estimate	statistic	p	parameter	Method	Alternative	95% CI
0.82	324.00	.345	395.00	Exact binomial test	two.sided	[0.78, 0.86]

Split results by condition

Gender

Gender did not seem to play a role in this study. Note that this analysis includes only those who self-reported “man” or “woman”, and does not include those who self-reported as transgender or non-binary.

Reading the Mind in the Eyes

How did the participants perform on the RMET?

Overall, scores were approximately normally distributed around a mean of 21.50 (SD = 5.81). To analyze RMET score as a predictor of group-level performance on the hivemind task, I split the sample at 1 SD above and below the mean. Low performers scored <16 and high performers scored >27.

The mean RMET score is 30.10 in the high performers group and 12.48 in the low performers group.
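The ±1 SD grouping can be sketched as a simple classification rule. The mean and SD below are rounded from the values reported above; the function name is hypothetical, and treating scores exactly at a cutoff as "middle" is an assumption.

```python
def rmet_group(score, mean=21.50, sd=5.81):
    """Assign a performance group based on distance from the sample mean."""
    if score < mean - sd:      # below about 15.69, i.e. integer scores under 16
        return "low performers"
    if score > mean + sd:      # above about 27.31, i.e. integer scores over 27
        return "high performers"
    return "middle performers"

groups = {score: rmet_group(score) for score in (15, 16, 27, 28)}
```

With integer scores, these cutoffs reproduce the "<16" and ">27" boundaries described above.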

Does RMET score predict group level performance on the hivemind task?

As a group, it doesn’t look like low RMET scorers performed any differently from high RMET scorers. What if we break them into conditions?

RMET_performance	condition	n
low performers	10	14
low performers	50	17
low performers	100	14
low performers	500	18
middle performers	10	76
middle performers	50	68
middle performers	100	66
middle performers	500	59
high performers	10	11
high performers	50	16
high performers	100	20
high performers	500	16

Term	estimate	std.error	statistic	p
(Intercept)	0.16	0.16	1.05	.293
RMETscore	0.00	0.01	0.23	.817
condition50	-0.03	0.21	-0.13	.895
condition100	-0.15	0.22	-0.71	.479
condition500	-0.03	0.21	-0.14	.892
RMETscore × condition50	0.00	0.01	0.33	.742
RMETscore × condition100	0.00	0.01	0.44	.657
RMETscore × condition500	-0.00	0.01	-0.15	.884

Visually, it looks like high and low RMET scorers may be performing differently at different group sizes. However, neither RMET score, nor condition, nor their interaction significantly predicts an individual’s choice of A or B.

Predictions of success

Did participants think they would succeed at achieving an 80/20 split?

On average, people thought their group would undershoot the split (i.e. that slightly too many people would choose B). This did not vary by condition.

Centipede game

Participants also imagined playing a centipede game, described above. Did their choices on the centipede game correspond to their choice on the hivemind task?

Term	estimate	std.error	statistic	p
(Intercept)	0.15	0.03	5.18	< .001***
centipede_game_takePoint	0.01	0.01	1.24	.214

Most people (n = 200) took the pot at the first turn. However, the point at which players chose to take the pot did not significantly predict their hivemind choice.
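Taking the pot at the first turn is what backward induction predicts in a standard centipede game. The sketch below illustrates that reasoning under assumptions that are not taken from the study: two players alternate turns, the pot doubles with every pass, the taker gets 80% of the pot, and if nobody takes, the pot grows once more and the larger share goes to the player who did not move last. All payoff values and names are hypothetical.

```python
def solve_centipede(n_turns=9, start_pot=10, growth=2, taker_share=0.8):
    """Return the turn at which the game ends under backward induction, or None."""
    # Payoffs if every turn is passed (assumed end-of-game rule).
    pot_end = start_pot * growth**n_turns
    last_mover = (n_turns - 1) % 2
    continuation = [0.0, 0.0]
    continuation[last_mover] = (1 - taker_share) * pot_end
    continuation[1 - last_mover] = taker_share * pot_end

    take_at = None
    # Work backwards from the last turn to the first.
    for turn in range(n_turns, 0, -1):
        mover = (turn - 1) % 2
        pot = start_pot * growth ** (turn - 1)
        take_payoff = taker_share * pot
        # The mover takes only if that beats what passing would eventually yield.
        if take_payoff > continuation[mover]:
            continuation = [0.0, 0.0]
            continuation[mover] = take_payoff
            continuation[1 - mover] = (1 - taker_share) * pot
            take_at = turn
    return take_at

first_take = solve_centipede()
even_split_take = solve_centipede(taker_share=0.5)
```

With these payoffs, each mover does better by taking than by waiting for the other player to take on the next turn, so the reasoning unravels to the first turn. With an even split (`taker_share=0.5`), passing is never worse and nobody takes.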

Prolific study with varying goal ratios, RMET and modified centipede game.

Qualtrics name: Group Coordination 6 - Prolific with varying goal ratios

Method

Participants were randomly assigned to one of four conditions, in which their goal was to achieve a 60/40, 70/30, 80/20, or 90/10 split. Participants were recruited on Prolific and compensated 2.21 British pounds. Collection was launched on 06/23/25 with a target of N = 400 (n = 100 per condition).

The procedure was identical to Pilot 5, except for two features: the randomly assigned condition determined each participant’s goal split, and all participants were told they were in a group of 100 participants.

Results

We collected a total of 400 complete responses.

condition	n
60/40	96
70/30	101
80/20	103
90/10	100

gender	n
man	180
man,not listed	1
man,transgender	1
non-binary	3
transgender	2
transgender,non-binary	1
transgender,non-binary,gender queer	1
woman	207
woman,gender queer	1
woman,man	2
woman,not listed	1

Split results by condition

Gender

Term	estimate	std.error	statistic	p
(Intercept)	-0.86	0.36	-2.39	.017*
genderwoman	0.25	0.46	0.54	.587
condition70/30	-0.36	0.51	-0.71	.475
condition80/20	-0.55	0.50	-1.09	.274
condition90/10	-0.47	0.51	-0.94	.348
genderwoman × condition70/30	-0.53	0.68	-0.78	.436
genderwoman × condition80/20	0.22	0.66	0.33	.742
genderwoman × condition90/10	-0.86	0.73	-1.18	.236

Gender did not significantly predict a participant’s likelihood of choosing A or B in any of the four conditions. Note that this analysis includes only those who self-reported “man” or “woman”, and does not include those who self-reported as transgender or non-binary.

Reading the Mind in the Eyes

How did the participants perform on the RMET?

Overall, scores followed a slightly left-skewed distribution around a mean of 24.23 (SD = 5.75). To analyze RMET score as a predictor of group-level performance on the hivemind task, I split the sample at 1 SD above and below the mean. Low performers scored <19 and high performers scored >29.

The mean RMET score is 31.57 in the high performers group and 14.89 in the low performers group.

Does RMET score predict group level performance on the hivemind task?

RMET_performance	condition	n
low performers	60/40	9
low performers	70/30	23
low performers	80/20	24
low performers	90/10	14
middle performers	60/40	75
middle performers	70/30	65
middle performers	80/20	56
middle performers	90/10	59
high performers	60/40	12
high performers	70/30	13
high performers	80/20	23
high performers	90/10	27

Predictions of success

Did participants think they would succeed at achieving their goal split?

Super interesting that people in this study on average generally expected their group to be able to achieve their respective splits - EXCEPT for the 90/10 group. By eyeballing it, I would estimate that groups ‘80/20’ and ‘90/10’ had fairly accurate estimates of their groups’ performance.

Centipede game

Participants also imagined playing a centipede game, described above. Did their choices on the centipede game correspond to their choice on the hivemind task?

Term	estimate	std.error	statistic	p
(Intercept)	0.35	0.07	5.09	< .001***
centipede_game_takePoint	-0.01	0.01	-0.54	.591
condition70/30	-0.09	0.10	-0.92	.357
condition80/20	-0.21	0.10	-2.16	.032*
condition90/10	-0.26	0.10	-2.64	.009**
centipede_game_takePoint × condition70/30	-0.00	0.02	-0.26	.796
centipede_game_takePoint × condition80/20	0.03	0.02	1.65	.099
centipede_game_takePoint × condition90/10	0.02	0.02	1.36	.174

Again, most people took the pot on the first turn, and the point at which players chose to take the pot did not predict their behavior on the hivemind task.

pilot 6: bring teams into lab, half of groups take hivemind before getting to know each other, half take after

pilot 7: in class, people are assigned to 60/40, 80/20, 70/30 or 90/10.

break people into small groups of 10, 15, etc

have people do task multiple times: “for this trial you have been randomly assigned to X group with X number of people…” etc
have people take timed test
- it seems that when people think too much they ‘choke’ and it reduces the success of the group
examine group level mind in the eyes scores predicting group level success
extract themes of free response in groups who succeed vs groups who don’t

theory wise: interested in how individuals are thinking about the group that they are a part of and how they become in sync with each other. what is the big model? if ________ then _____ when we manipulate _______

are people picking up on the most salient dimension?

next studies would be: finding analogous situations, test those as well - is it sensitivity to norms?

sorting
- people are converging on what the relevant dimensions are
- do i need to step up in this situation?
- do i deserve the valued resource?
- do i fit in?
tell people that they are in a group of women/men or not
make choices (ostensibly) meaningful options, but with same goal. i.e. 20% of you have to choose chocolate, 80% have to choose lice

real world analogies:

self-sorting
- choosing residency specialties
risk taking?
giving ground in crowds

Note that this doesn’t include pilot 1.2, which was collected recently on a whim