This study was a replication of Jellema et al. (2024), “Social Intuition: Behavioral and Neurobiological Considerations,” a social and developmental psychology experiment on social intuition and its relationship to autistic traits. The study asked whether participants can implicitly and quickly learn another person’s disposition toward them, and whether this ability decreases as autistic traits increase. To answer these questions, Jellema et al. (2024) used social stimuli in an implicit learning paradigm.
In the acquisition phase, participants viewed dynamic face morphs across multiple trials in which an actor’s facial expression (happy or angry) and gaze direction (toward or away from the participant) changed over the course of each clip. All faces belonged to one of two identities, A or B, and each identity displayed either a positive or a negative disposition toward the participant. In the test phase, implicit learning was assessed using composites of the maximally smiling or maximally frowning identities A and B, progressively shifting from containing more of one identity to more of the other. Participants indicated whether each composite face looked more like identity A or identity B, with smiling and frowning composites tested separately. The assumption was that if participants had learned the dispositions of the two identities, they would judge the smiling composites as more like identity A (positive disposition) and the frowning composites as more like identity B (negative disposition). The original experiment also included a nonsocial task, which was not replicated because of the very short timeline.
Methods
35 adult participants were recruited for this study. The study was run on Prolific, and participants were compensated for their time. The experiment took approximately 10-15 minutes to complete. The plan was to generate the face morph stimuli with Google Gemini (see the Methods Addendum for how the stimuli were ultimately created). The experiment was coded in jsPsych.
There were two blocks of 28 trials in the acquisition phase. In the happy condition, the face started smiling with gaze directed at the participant and gradually shifted to frowning with gaze averted from the participant. In the angry condition, the face started frowning with gaze directed at the participant and gradually shifted to smiling with gaze averted. The clips were 2 s long.
There were four blocks of 5 trials in the test phase. The test phase began with a composite face vertically divided into 60% of the maximally smiling identity A and 40% of the maximally smiling identity B, and progressed in steps of 5% toward 60% identity B and 40% identity A. Participants saw this sequence twice, and the same procedure was repeated twice for the frowning composites, for four blocks in total. Participants judged the likeness of each composite to the neutral faces of identities A and B. At the end, participants answered questions probing whether they had detected the contingencies in the experiment; if they had, their data were excluded.
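For concreteness, the five composite proportions implied by this design (expressed as the percentage of identity A in each composite) can be listed with a short R snippet; this snippet is illustrative only and is not part of the experiment code, which was written in jsPsych.

# the five morph proportions used in each test block, as % of identity A
# (illustrative only; the experiment itself was implemented in jsPsych)
morph_levels <- seq(60, 40, by = -5)
morph_levels # 60 55 50 45 40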
Power Analysis
In RStudio, an ANOVA power analysis using the smallest original main effect (disposition, ηp² = 0.12) indicated that 30 participants would be needed to achieve 80% power. I therefore planned to recruit 35 participants to allow for exclusions, a feasible number given the resources available.
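One way such a calculation can be approximated in R is sketched below using the pwr package; this is an assumption about tooling, and because it treats the two-level factor as a between-subjects comparison it will not reproduce the exact n of 30, which presumably came from a repeated-measures calculation (e.g., in G*Power) that accounts for the correlation among repeated measures.

# approximate sketch of the power analysis, assuming the pwr package;
# repeated-measures tools that model the correlation among measures
# will generally require fewer participants than this approximation
library(pwr)

eta_p2 <- 0.12                   # smallest original main effect (disposition)
f <- sqrt(eta_p2 / (1 - eta_p2)) # convert partial eta squared to Cohen's f (~0.37)

# conservative between-subjects approximation for a two-level factor
pwr.anova.test(k = 2, f = f, sig.level = 0.05, power = 0.80)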
Planned Sample
I plan to collect 35 participants on Prolific in one day. No screening will occur other than requiring participants to be 18 or older.
Materials
Participants were presented with short video clips (2 s) depicting the frontal face view of an actor (agent A or agent B; Figure 1). Their facial expressions and gaze directions changed smoothly over the course of the clips, displaying a natural facial movement. (Jellema et al., 2024, p. 5)
Actor A started with a happy expression looking straight ahead (at the observer), which then gradually morphed into an angry expression, while simultaneously the eye direction gradually moved away from the observer (so that in the final frame the actor looked angry away from the observer). The clips were also played backwards an equal number of times. Thus, it can be said that agent A had a positive disposition toward the observer. Agent B started with a happy expression looking away from the observer, which gradually morphed into an angry expression, while simultaneously the eye direction gradually moved toward the observer (clips were also played backwards an equal number of times). Actor B thus had a negative disposition toward the observer. (Jellema et al., 2024, p. 5)
The videos in this replication had all the same characteristics, except that instead of computer-generated face morphs they were videos of a real person, manipulated to slightly alter the actor’s identity and create identities A and B.
In the subsequent test phase, an indirect measure was used to find out whether any implicit learning had taken place. This measure involved a morph of the facial expressions of the two identities A and B, flanked by the original neutral faces of these two identities. The eyes were covered by black boxes. The morphed identity was composed of 60% of the maximally smiling actor A and 40% the maximally smiling agent B (happy morph), and then progressed in steps of 5% toward 40% of the maximally smiling actor A and 60% the maximally smiling agent B. The same procedure was followed for the frowning actors A and B. Participants had to indicate for each morphed identity whether it resembled more closely agent A (who had a positive disposition toward the observer) or agent B (who had a negative disposition toward the observer) (Jellema et al., 2024, p. 5)
The composite and neutral photos in this replication had all the same characteristics, except that instead of computer-generated face morphs they were photos of a real person, manipulated to slightly alter the actor’s identity.
The original study used the 50-item Autism-Spectrum Quotient (AQ), whereas the replication used the 10-item AQ-10. This adjustment made the experiment shorter, which is more appropriate for an online Prolific setting.
Whether or not the participant had consciously detected the cue-identity contingencies was determined in a short debrief session, in which a series of questions were asked probing any awareness of the contingencies. (Jellema et al., 2024, p. 5)
This was followed in the replication. Specifically, there were two questions that asked “What was the disposition of Identity [A or B] towards you?”
Procedure
First, participants signed the consent form and then completed the AQ-10. The experiment began with instructions that showed identities A and B making neutral faces side by side and told participants to focus on the videos and be ready for attention checks. Next, participants were shown the video clips (described in Materials) forwards and backwards an equal number of times, in randomized order. There were 56 acquisition trials split across two blocks with a break in between. Four attention checks, spaced roughly evenly across the acquisition trials, asked participants to select the correct color word (e.g., “If you are paying attention, select ‘yellow.’”). The test trials began with a screen displaying the instructions. Participants were then shown each set of five test images two times, alternating between the smiling and frowning conditions. Next, participants answered two questions probing whether they had detected the contingencies in the experiment. Lastly, participants were debriefed and provided their completion information for Prolific.
Analysis Plan
Load packages
library(tidyverse)
library(ggplot2)    # plotting
library(ggthemes)   # good for making plots pretty
library(effectsize)
library(knitr)
library(kableExtra)
# the difference between the frown and smile at 50/50 (used in correlation)
FrownScore <- data_long$frown50_1_cor_num + data_long$frown50_2_cor_num # sum the frown score
FrownScore # print the frown score
# sum of scores on 50/50 trials
acc_sum <- data_complete$frown50_1_cor_num + data_complete$frown50_2_cor_num +
  data_complete$smile50_1_cor_num + data_complete$smile50_2_cor_num
as.data.frame(acc_sum)
data_complete <- data_complete %>%
  rename(acc_sum = `...62`)
# mean TestScore
acc_mean <- mean(TestScore)
acc_mean
[1] 0.7567568
# mean of sum of scores on 50/50 trials
acc_mean_sum <- mean(data_complete$frown50_1_cor_num + data_complete$frown50_2_cor_num +
  data_complete$smile50_1_cor_num + data_complete$smile50_2_cor_num)
acc_mean_sum
[1] 1.75
# Autism Quotient-10 mean score
# (AQ-10 totals are stored in the AQScore column, also used for the AQ-10 histogram below)
AQ_mean <- mean(data_complete$AQScore)
AQ_mean
Plot Accuracy and AQ-10 Scores
# histogram of accuracy on test trials
histogram <- ggplot(data_complete, aes(x = acc_sum)) +
  geom_histogram(
    binwidth = 1, # adjust as needed
    fill = "#d96627",
    color = "black",
    alpha = 0.7
  ) +
  scale_y_continuous(breaks = c(0, 2, 4, 6, 8, 10, 12)) +
  theme_classic() +
  theme(
    legend.position = "none",
    # Axis line color
    axis.line = element_line(color = "darkgray"),
    # Tick mark color
    axis.ticks = element_line(color = "darkgray"),
    # Axis text color
    axis.text = element_text(color = "black"),
    # Title color
    plot.title = element_text(color = "black", size = 14, face = "bold"),
    # Axis label color
    axis.title = element_text(color = "black", size = 12)
  ) +
  labs(
    title = "Distribution of Test Scores",
    x = "Test Score",
    y = "Count"
  )
histogram # show histogram
# save histogram
ggsave(
  "accdistribution.png",
  plot = histogram,
  width = 5,  # inches
  height = 4, # inches
  dpi = 300
)

# histogram of AQ-10 scores
histogramAQ <- ggplot(data_complete, aes(x = AQScore)) +
  geom_histogram(
    binwidth = 1, # adjust as needed
    fill = "#d96627",
    color = "black",
    alpha = 0.7
  ) +
  scale_y_continuous(breaks = c(0, 2, 4, 6, 8, 10)) +
  theme_classic() +
  theme(
    legend.position = "none",
    # Axis line color
    axis.line = element_line(color = "darkgray"),
    # Tick mark color
    axis.ticks = element_line(color = "darkgray"),
    # Axis text color
    axis.text = element_text(color = "black"),
    # Title color
    plot.title = element_text(color = "black", size = 14, face = "bold"),
    # Axis label color
    axis.title = element_text(color = "black", size = 12)
  ) +
  labs(
    title = "Distribution of AQ-10 Scores",
    x = "AQ-10 Score",
    y = "Count"
  )
histogramAQ # show histogram
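The stratified ANOVA summary printed below appears without its generating chunk. As a minimal sketch, an aov() call of the following form produces output organized into the same error strata; the data object data_anova, its column names, and their coding are assumptions rather than the original objects.

# sketch only: repeated-measures ANOVA on the test responses;
# `data_anova`, `response`, and the column coding are assumed, not the original objects
model <- aov(
  response ~ Disposition * Proportion + Error(ID / (Disposition * Proportion)),
  data = data_anova # ID must be a factor for the Error() strata to form correctly
)
summary(model)
eta_squared(model, partial = TRUE) # partial eta squared via the effectsize package loaded above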
Error: ID
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 27 27.59 1.022
Error: ID:Disposition
Df Sum Sq Mean Sq F value Pr(>F)
Disposition 1 1.61 1.607 1.223 0.279
Residuals 27 35.49 1.315
Error: ID:Proportion
Df Sum Sq Mean Sq F value Pr(>F)
Proportion 1 0.004 0.00357 0.016 0.9
Residuals 27 5.996 0.22209
Error: ID:Disposition:Proportion
Df Sum Sq Mean Sq F value Pr(>F)
Disposition:Proportion 1 0.175 0.1750 0.759 0.391
Residuals 27 6.225 0.2306
Error: Within
Df Sum Sq Mean Sq F value Pr(>F)
Residuals 448 55.6 0.1241
The first difference between Jellema et al. (2024) and this replication is that Jellema et al. (2024) used the AQ, whereas this replication used the AQ-10. The AQ-10, which has 10 questions, has been validated as performing comparably to the 50-question original, so this is not expected to create differences in results. The most significant difference is that the replication stimuli were videos of a real person manipulated to slightly alter the actor’s identity, creating identities A and B, whereas the original study used computer-generated face morphs. This may affect the results, as the stimuli are central to an experiment’s outcome. Lastly, there were 56 trials in the replicated acquisition phase, but the number of trials in the original is unknown. While I have done my best to estimate the number of trials from context in the original paper, a substantially different number could create differences in learning outcomes.
Methods Addendum (Post Data Collection)
Actual Sample
37 participants were collected and nine were excluded, resulting in a final sample of 28 participants, close to the goal of 30. Participants were excluded for detecting the contingencies in the experiment, for test-trial reaction times below 400 ms or above 30 s, or for attention-check reaction times above 11 s. Participants would also have been excluded for missing responses on the test trials, but no one had missing trials. The most common participant age range was 35-44 years, and 46% of participants were men.
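A minimal sketch of how these exclusion rules could be expressed as dplyr filters is shown below; every column name here is a hypothetical placeholder rather than a column known to exist in the prepared data.

# sketch of the exclusion rules; all column names are hypothetical placeholders
data_complete <- data_long %>%
  filter(
    detected_contingencies == FALSE, # did not report awareness of the contingencies
    test_rt_min >= 400,              # no test-trial RT below 400 ms
    test_rt_max <= 30000,            # no test-trial RT above 30 s
    attention_rt_max <= 11000        # no attention-check RT above 11 s
  )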
Differences from pre-data collection methods plan
Stimuli were not generated using Google Gemini; instead, videos were taken of a person and then manipulated to slightly alter the actor’s identity to create identities A and B. Additionally, the study took slightly less time than expected, averaging 7-12 minutes.
Results
Data preparation
To prepare the data, I used RStudio. Full data preparation code is shown below.
Import .csv and Transpose
Data needs to be transformed into long format
# import csv
data <- read.csv(file = "/Users/wilderhartwell/Documents/jellema2024/data/final_data_JSON.csv", header = FALSE)
# make tibble
data <- tibble(data)
# filter out empty rows that jsPsych sometimes adds
data <- data %>%
  mutate(across(everything(), ~na_if(.x, ""))) %>%
  mutate(across(everything(), ~na_if(.x, " "))) %>%
  mutate(across(everything(), ~na_if(.x, "NA"))) %>%
  filter(!if_all(everything(), is.na))
# transpose tibble
data_long <- as.data.frame(t(data))
# Rename columns
colnames(data_long) <- as.character(unlist(data_long[1, ])) # make first row the column names
data_long <- data_long[-1, ]
Parse JSONs
The questionnaires are output as JSON strings that need to be parsed before the data can be used.
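A minimal sketch of one way these JSON strings could be parsed is shown below, assuming the jsonlite package (not loaded above) and a hypothetical column name AQ_json; the actual column names in the exported data may differ.

# sketch only: parse a JSON-encoded questionnaire column into a data frame
# (`AQ_json` is a hypothetical column name; jsonlite is assumed here)
library(jsonlite)

aq_parsed <- data_long$AQ_json %>%
  lapply(fromJSON) %>% # each participant's JSON string becomes a named list
  bind_rows()          # combine into one row per participant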
# A tibble: 6 × 2
ID Reason
<chr> <chr>
1 uycua16rbs Detected Contingencies
2 rv437srgtr Detected Contingencies
3 nz1u6exavc Detected Contingencies
4 3zmwaqv271 Detected Contingencies
5 b5183qbwfj Test Trial RT Out of Range
6 dm31sblpep Attention RT > 11000
Confirmatory analysis
A two-way repeated-measures ANOVA with DISPOSITION (2 levels) and PROPORTION (5 levels) as factors was performed on participants’ responses to the task. This test compares the condition means to determine whether they differ significantly. A correlation was also tested between AQ-10 scores and a measure of implicit learning, defined as the difference between scores on the 50% identity A / 50% identity B trials in the frowning and smiling conditions. This correlation examined the potential connection between autistic traits and performance on the task.
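A minimal sketch of how this correlation could be computed with cor.test() follows; the column names are carried over from the earlier chunks, and the direction of the difference score is an assumption.

# sketch of the AQ-10 / implicit-learning correlation; the direction of the
# difference score and the AQScore column name are assumptions
implicit_learning <- (data_complete$frown50_1_cor_num + data_complete$frown50_2_cor_num) -
  (data_complete$smile50_1_cor_num + data_complete$smile50_2_cor_num)

cor.test(data_complete$AQScore, implicit_learning,
         method = "pearson", alternative = "two.sided")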
There was no significant main effect of Disposition (F(1, 27) = 1.223, p = 0.279, ηp² = 0.04), meaning participants did not judge the smiling faces to be more like identity A (positive disposition) or the frowning faces to be more like identity B (negative disposition). There was also no main effect of Proportion (F(1, 27) = 0.016, p = 0.9, ηp² = 0.00) and no significant Disposition × Proportion interaction (F(1, 27) = 0.759, p = 0.391, ηp² = 0.03). Participants with higher autistic traits were expected to perform worse on the task, but AQ-10 scores and implicit learning were not significantly correlated (r(26) = -0.037, p = 0.851, 95% CI [-0.405, 0.341], two-tailed) (Figure 1).
Figure 1. The correlation between AQ-10 scores and the measure of implicit learning.
Exploratory analyses
As depicted in Figure 2, the largest group of participants performed exactly at chance on the task. However, 10 participants scored below chance, while only six scored above chance. Mean task performance was 1.75 out of four points, which is below the chance level of two out of four.
Figure 2. The frequency of total scores on the 50% identity A and 50% identity B test trials.
The sample’s mean AQ-10 score of 5.93 is very close to the threshold for possible autism of six out of 10 points, indicating a relatively high level of autistic traits in the sample. As Figure 3 shows, the most common score was seven, which is above the threshold, and the majority of scores were at or above the threshold.
Figure 3. Distribution of AQ-10 scores.
Discussion
Summary of Replication Attempt
My analyses found no main effect of disposition or proportion, contrary to Jellema et al. (2024), who found main effects of both. Additionally, there was no correlation between AQ-10 scores and implicit learning. These results mean that Jellema et al. (2024) did not replicate here.
Commentary
One reason the results did not replicate may be that it was too challenging to tell the difference between the two identities. Jellema et al. (2024) indicated that the differences between identities A and B were subtle, but they may have made subtle changes to the nose and mouth, which I was not able to do because it would have distorted other aspects of the face. I was only able to alter the upper half of the face, mostly the eyes. While the objective was for participants to identify each face based on its disposition toward them rather than its facial features, it is possible that too little difference between the two faces made it too hard to learn who held which disposition.
The possibility that the two faces were too hard to distinguish is partially supported by 22 of 28 participants scoring at or below chance, 10 of them below chance. In particular, below-chance scores could indicate that those participants learned the inverse pattern, attributing the frowning face to the positive disposition and the smiling face to the negative disposition.
Another potential reason that most participants performed at chance is that there may not have been enough trials for learning to occur. However, more trials might have led more people to detect the contingencies in the experiment, given that four people detected them with the current number of trials.