#Load relevant libraries and functions.
suppressMessages(library(tidyverse))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))
#Get all data file names from the data_raw folder.
folder_path <- file.path(getwd(), '..', 'data_raw')
#Get all file names in the folder, ignoring sub-directories.
file_names <- list.files(path = folder_path, full.names = FALSE, recursive = FALSE)
#Filter to include only files (exclude directories).
file_names <- file_names[!file.info(file.path(folder_path, file_names))$isdir]
#Check if there are any files in the list
if(length(file_names) == 0) {
print("No files found in the specified folder.")
} else {
#Create an empty dataframe to store all of the data
full_raw_data <- data.frame()
#Loop through each of the files
for(file in file_names) {
#Construct the full path to the file
file_path <- file.path(folder_path, file)
#Check if the file exists
if(file.exists(file_path)) {
#Read the data from the file
temp_data <- read.csv(file_path)
#Append the data to the full_raw_data dataframe
full_raw_data <- rbind(full_raw_data, temp_data)
} else {
print(paste("File not found:", file_path))
}
}
}Replication of The Origins of the Shape Bias: Evidence From the Tsimane’ by Jara-Ettinger et al. (2022, Journal of Experimental Psychology: General)
Introduction
I chose to try and rescue the replication of this experiment because I am interested in the intersection of language and spatial/object reasoning. As a developmental cognitive scientist, I also found the comparison of adults and children to be an important element relevant to my research interests. Finally, the cross-cultural comparisons made in this paper are something that I’d like to explore further in my own research, outside of the context of this project.
The stimuli that will be required to collect additional data for this experiment are available in the repository of the original replication project. The original replication collected data online from 144 U.S. adults, following the paradigm of Experiment 5 in the paper. It won’t be possible to collect additional data from Tsimane’ adults, so a replication of Experiments 6 and 7 cannot be performed. The new replication attempt will feature attention checks and a demographic questionnaire about the subject’s early geographic environment (e.g., urbanicity of current and childhood environments), given the theoretical claims in the paper about the importance of early environmental factors on the strength of the shape bias in children.
Click here for this rescue project’s Github repository. The PDF of the original paper can be found here. The experiment, hosted on cognition.run, can be accessed here.
Summary of prior replication attempt
The prior replication attempt tried to replicate only Experiment 5 from the original paper (the original paper had seven total experiments), which tested U.S. adults online using Amazon’s Mechanical Turk. While the replication attempt does not specify how they collected the data, I assume that they collected the data using Prolific instead of MTurk.
The biggest difference between the original study and the first replication is that the original authors used different stimuli from the first replication author. The stimuli in the original paper used the images shown in Table 1, but the first replication used the images shown in Figure 1. The replication author contacted the first author of the study for the original image files, but they were lost, so the replication author used screenshots of the shapes taken from the figures in the paper.
The demographics of the sample were the same for both the original study (specified as U.S. adults on MTurk) and the first replication (specified as English-speaking adults from the U.S.). The original study had a sample size of N=144 U.S. adults, and the first replication had a sample size of N=142 U.S. adults, with a planned sample of N=144, but two participants did not complete the study and the replication author did not collect two additional data points.
There are two main analyses:
- Calculating the percentage of participants who chose the object based on shape congruency with the exemplar, the percentage of participants who chose the object based on material congruency with the exemplar, and the percentage of participants who chose the object based on color congruency with the exemplar.
In the original study, the authors used a binomial test with chance set to 1/3 to test whether U.S. adults were significantly more likely to choose the shape match as opposed to the color or material matches within just the U.S. adult experiment. The first replication did not run this test.
- Running a logistic mixed-effects model that predicted the participant’s preference for the shape-match object.
In the original study, the authors used a baseline probability of 33.3% with the population and the age group dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment (1 | Experiment), random intercepts for the exemplar object (Example), random slopes for population as a function of exemplar (1 + Location + AgeGroup | Example), and random slopes for age group (1 + Location + AgeGroup | Example) as a function of exemplar: glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) ). The population by age interaction effect (Location * AgeGroup) model was not included based on a nested model comparison; instead, only the main effects of Location and AgeGroup were tested without their interaction (Location + AgeGroup). The Offset variable was set to the log-odds probability of 1/3 or 33.3% (logit(1/3)).
In the first replication, the author used a random effects model as their key analysis of interest, with random intercepts for the exemplar object (Example), testing for a participant’s preference for shape as a function of the exemplar object: glmer( Choice == "shape" ~ Location + (1 + Location | Example) ). Notably, the first replication author did not include the offset(Offset) of the log-odds probability of 1/3. They also removed the random slopes for age group as a function of exemplar ((1 + Location | Example) instead of (1 + Location + AgeGroup | Example)) since they were not comparing the adult data to the child data. Using this random effects model, the first replication author obtained an unsignificant result with a p-value of 0.318 and a small estimate of 1.138.
Methods
Power Analysis
Because of uncertainty about the true effect size, I am going to use a small telescopes approach and run 3 times the original sample size (N=144) for a total of N=432 participants, which means 8 subjects per condition (54 conditions in total). While the main hypothesis for the failed first replication is the use of the wrong stimuli, running 3 times the sample size will also increase the power.
Planned Sample
The sample size for Experiment 5 will be N=432 U.S. adults, whose primary language is English, recruited on Prolific. I will stop the study when I reach 432 participants who complete the task. I will also collect the following demographic information from the adult participants:
- Age: How old the participant is (in years).
- Nationality:
- If the participant was born in the USA.
- If the participant spent more than a year in the USA before age 10.
- If the participant currently lives in the USA.
- Geographic Location:
- The first three characters or digits of the participant’s current zipcode.
- The first three characters or digits of the zipcode where the participant spent the most time in growing up.
- Urbanicity:
- How the participant would describe their current home location on a scale from 1-100, with 1 being very rural, 100 being very urban, and 50 being suburban.
- How the participant would describe their childhood home location (if multiple, they are asked to choose the one they spent the most time in before 18 years old) on a scale from 1-100, with 1 being very rural, 100 being very urban, and 50 being suburban.
- Languages:
- The first language the participant learned.
- All languages that the participant currently speaks.
Materials
“Stimuli consisted of solid objects that varied in shape, color, and material…Each experiment consisted of three example objects and three extension objects. Each participant saw only one example object (counterbalanced across participants) and all extension objects…Experiment 5 which used photographs of the objects because it was conducted online” (Jara-Ettinger et al, 2022).
Procedure
Experiment 5 was a “one-shot learning trial and each participant completed one trial only. Although each trial only required one label, we used three different possible labels, randomized across participants. In the experiments with U.S. participants, the example object was called a koba, dax, or fep…Participants saw a single screen where the top said ‘This is a(n) x’ along with a picture of the object. Below, the text read ‘One of these is also a(n) x’ along with three pictures of the three possible extension choices. The text below read ‘Which one is the other x?’ Participants were allowed to select one of the three objects” (Jara-Ettinger et al, 2022).
Controls
“To ensure that participants were attending to the task, we also asked participants what each object shared in common with the original object. These questions were only included to motivate participants to look at the images carefully” (Jara-Ettinger et al, 2022). In the original study, answers to these questions were not used as exclusion criteria, but in this study, I will exclude participants who have more than one incorrect answer.
Analysis Plan
Participants will be excluded if they did not answer the main question of interest (choosing an extension object) or if they answer more than one attention check question incorrectly, asking how each extension object relates to the exemplar object. Data will be downloaded from Prolific and scrubbed of all identifying information (e.g., IP addresses) and then cleaned. I will conduct one main analysis and three exploratory analyses:
Main analysis:
- A binomial test with chance set to 1/3 tested whether U.S. adults were significantly more likely to choose the shape match as opposed to the color or material matches within just the U.S. adult experiment. This test matches the statistical test run by the original authors.
Exploratory analyses:
- A random effects model that takes into account the stimulus-level random effects by testing against chance within just the U.S. adult sample:
glmer( Choice=="shape" ~ offset(Offset) + 1 + (1 + Example) ). This model tests, given an offset of 1/3, if the intercept is significantly different from chance while accounting for random effects. - A random effects model, with random intercepts for the exemplar object (
Example), testing for a participant’s preference for shape as a function of the exemplar object and the population group (Location):glmer( Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example) ). The main difference between the model in the first replication and this model is the use of theOffsetvariable, set to the log-odds probability of 1/3. - The original paper’s logistic mixed-effects model that predicts the participant’s preference for the shape-match object:
glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) ). I will swap out the USA adult data from the original paper with my new USA adult data but keep all of the other data the same (USA children, Tsimane’ adults, and Tsimane’ children).
Clarify key analysis of interest: I am primarily interested in replicating the result in the original paper, using the binomial test statistic described under “Main analysis” above, that U.S. adults are more likely to choose the extension object that is matched on the shape property compared to the color or material matches.
Differences from Original Study and First Replication
The only known major difference between this plan and the first replication is that the stimuli are images of the physical objects as opposed to artistic renderings of the objects. The random effects model will also include the Offset variable, set to the log-odds probability of 1/3.
The main difference between this plan and the original study is that I collected more demographic information from the participants for exploratory analyses not conducted in the original study if the main analysis of interest did not replicate. I also excluded participants who answered more than one attention check question incorrectly when indicating what each object has in common with the exemplar object.
Methods Addendum (Post Data Collection)
Actual Sample
The data were collected in eight batches of N=54 subjects each, one at a time so that subjects who participated in the previous batches could be excluded from participating. Each batch had 54 unique conditions, one for each subject who participated. The anticipated total number of subjects was N=432. Across the eight batches, N=426 unique subjects participated in the study. Subjects opted out of participating in the study by not answering or completing the study, and rows in the dataset that had NA values for the main question of interest were excluded. Despite implementing an exclusion rule for each subsequent experimental batch, N=9 subjects participated twice. However, since duplicate participation was not an exclusion criteria in the original analysis plan, these subjects were not removed from the final dataset.
Differences from pre-data collection methods plan
None.
Results
Data preparation
This section contains the code necessary to prepare the data for analysis. First, load in the raw data. The data were collected in 8 batches of 54 subjects, where each subject saw a different condition in each batch. The data were collected in batches because cognition.run limits the number of subjects that can be run at a time. We’re going to import and stitch together the eight data files here.
Check the raw data to make sure that all of the images for the subjects loaded successfully (they should, or they would not have been able to complete the study).
suppressMessages(library(stringr))
#If true, then all images loaded successfully for all subjects.
print(paste("Images loaded successfully for all subjects:", sum(str_detect(full_raw_data$success, 'false')) == 0)) [1] "Images loaded successfully for all subjects: TRUE"
Check how many unique subjects there are based on their PROLIFIC_PID.
subject_check <- full_raw_data |>
group_by(PROLIFIC_PID) |>
summarize(numRows = n())
#If a subject completed the study once, they should have 18 rows.
repeat_subjects <- subset(subject_check, numRows > 18)
cat(nrow(repeat_subjects), "subjects did the study more than once.")9 subjects did the study more than once.
Now clean the data. First remove columns that are unnecessary for the main confirmatory analyses and filter out subjects not relevant to the full sample (i.e., pilot subjects). Remove subjects who used phones to complete the study. Then parse the demographic responses into columns.
#Since the data were collected in 8 batches, there are repeat subject IDs (`batchSubjectID`). Create a new column called `subjectID` and make each one unique to the combined dataset. Each subject has one 'fullscreen' trial so use that to assign a unique subject ID.
suppressMessages(library(data.table))
#Convert the dataframe to a data.table
setDT(full_raw_data)
#Create a new column 'subjectID' based on unique subject codes
full_raw_data[, subjectID := .GRP, by = PROLIFIC_PID]
#Convert data.table back to a dataframe
full_raw_data <- as.data.frame(full_raw_data)
#Select only relevant columns and rename them.
data_main_analyses <- full_raw_data |>
select(c('rt', 'trial_type', 'run_id', 'subjectID', 'condition', 'recorded_at', 'device', 'success', 'stimulus', 'response', starts_with('exemplar'), starts_with('extension'))) |>
rename(reactionTime = rt, trialType = trial_type, batchSubjectID = run_id, recordedAt = recorded_at, imageLoadSuccess = success) |> #Rename columns to match naming conventions.
filter(device != 'iPhone') |> #Remove subjects who used phones.
select(c(-'device', -'recordedAt')) #Remove columns that aren't necessary anymore.
#Put demographic responses (currently in the 'response' column) into their own columns.
for(row_index in 1:nrow(data_main_analyses)) {
participant_response <- data_main_analyses[row_index, 'response'] #Create a variable for the response in the current row
stimulus <- data_main_analyses[row_index, 'stimulus'] #Create a variable for the stimulus in the current row
#Check that the 'response' is not an empty string.
if(nzchar(participant_response)) {
#Check if it's a survey response (has {} in the string) and update.
if(grepl("{", participant_response, fixed = TRUE)) {
#Attention check: left image.
if(grepl('extensionLeftImg_exemplar_commonalities', participant_response)) {
temp_response <- strsplit(participant_response, ':')[[1]][2] #Get only the answer
temp_response <- gsub('["}]', '', temp_response) #Trim the extra characters
data_main_analyses[row_index, 'extensionLeftImgAttnCheckResponse'] <- temp_response
}
#Attention check: center image.
else if(grepl('extensionCenterImg_exemplar_commonalities', participant_response)) {
temp_response <- strsplit(participant_response, ':')[[1]][2] #Get only the answer
temp_response <- gsub('["}]', '', temp_response) #Trim the extra characters
data_main_analyses[row_index, 'extensionCenterImgAttnCheckResponse'] <- temp_response
}
#Attention check: right image.
else if(grepl('extensionRightImg_exemplar_commonalities', participant_response)) {
temp_response <- strsplit(participant_response, ':')[[1]][2] #Get only the answer
temp_response <- gsub('["}]', '', temp_response) #Trim the extra characters
data_main_analyses[row_index, 'extensionRightImgAttnCheckResponse'] <- temp_response
}
#Participant age.
else if(grepl('Age', participant_response)) {
temp_response <- strsplit(participant_response, ':')[[1]][2] #Get only the answer
temp_response <- gsub('["}]', '', temp_response) #Trim the extra characters
data_main_analyses[row_index, 'participantAge'] <- temp_response
}
#Geographical location.
else if(grepl('CurrentUSA', participant_response)) {
temp_response <- strsplit(participant_response, ',') #List of responses for BornUSA, ChildhoodUSA, and CurrentUSA.
#Participant birth location.
if(grepl('BornUSA', temp_response[[1]][1])) {
geog_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] #Get only the answer for BornUSA.
geog_temp_response <- gsub('["}]', '', geog_temp_response) #Trim the extra characters
data_main_analyses[row_index, 'participantBornUSA'] <- geog_temp_response
}
#Participant childhood location.
if(grepl('ChildhoodUSA', temp_response[[1]][2])) {
geog_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] #Get only the answer for ChildhoodUSA.
geog_temp_response <- gsub('["}]', '', geog_temp_response) #Trim the extra characters
data_main_analyses[row_index, 'participantChildhoodUSA'] <- geog_temp_response
}
#Participant current location.
if(grepl('CurrentUSA', temp_response[[1]][3])) {
geog_temp_response <- strsplit(temp_response[[1]][3], ':')[[1]][2] #Get only the answer for CurrentUSA.
geog_temp_response <- gsub('["}]', '', geog_temp_response) #Trim the extra characters
data_main_analyses[row_index, 'participantCurrentUSA'] <- geog_temp_response
}
}
#Zipcodes.
else if(grepl('CurrentZipcode', participant_response)) {
temp_response <- strsplit(participant_response, ',') #List of responses for CurrentZipcode and ChildhoodZipcode.
#Participant current zipcode.
if(grepl('CurrentZipcode', temp_response[[1]][1])) {
zipcode_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] #Get only the answer for CurrentZipcode.
zipcode_temp_response <- gsub('["}]', '', zipcode_temp_response) #Trim the extra characters
data_main_analyses[row_index, 'participantCurrentZipcode'] <- zipcode_temp_response
}
#Participant childhood zipcode.
if(grepl('ChildhoodZipcode', temp_response[[1]][2])) {
zipcode_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] #Get only the answer for ChildhoodZipcode.
zipcode_temp_response <- gsub('["}]', '', zipcode_temp_response) #Trim the extra characters
data_main_analyses[row_index, 'participantChildhoodZipcode'] <- zipcode_temp_response
}
}
#Languages.
else if(grepl('FirstLanguage', participant_response)) {
temp_response <- strsplit(participant_response, '",') #List of responses for FirstLanguage and AllLanguages.
#Participant first language.
if(grepl('FirstLanguage', temp_response[[1]][1])) {
language_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] #Get only the answer for FirstLanguage.
language_temp_response <- gsub('["}]', '', language_temp_response) #Trim the extra characters
data_main_analyses[row_index, 'participantFirstLanguage'] <- language_temp_response
}
#Participant all languages spoken.
if(grepl('AllLanguages', temp_response[[1]][2])) {
language_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] #Get only the answer for AllLanguages.
language_temp_response <- gsub('["}]', '', language_temp_response) #Trim the extra characters
data_main_analyses[row_index, 'participantCurrentLanguages'] <- language_temp_response
}
}
}
#Check if it's a numerical response for participant's current location urbanicity rating (1-100).
else if(grepl("^[0-9]+$", participant_response) & (grepl('currently live', stimulus, fixed = TRUE))) {
data_main_analyses[row_index, 'participantCurrentUrbanicity'] <- participant_response
}
#Check if it's a numerical response for participant's childhood location urbanicity rating (1-100).
else if(grepl("^[0-9]+$", participant_response) & (grepl('grew up', stimulus, fixed = TRUE))) {
#Participant's childhood location urbanicity.
data_main_analyses[row_index, 'participantChildhoodUrbanicity'] <- participant_response
}
#Check if it's an image response (choosing the shape image).
else if(grepl("^[0-9]+$", participant_response) & (grepl('img', stimulus, fixed = TRUE))) {
data_main_analyses[row_index, 'participantExtensionChoice'] <- participant_response
}
}
#Interpret the participantExtensionChoice column to make the numerical answer choice values meaningful.
participant_image_choice <- data_main_analyses[row_index, 'participantExtensionChoice']
if(!is.na(participant_image_choice) && length(participant_image_choice) > 0) {
#Participant chose the left image.
if(participant_image_choice == "0") {
data_main_analyses[row_index, 'participantExtensionChoiceImage'] <- data_main_analyses[row_index, 'extensionLeftImg']
}
#Participant chose the center image.
else if(participant_image_choice == "1") {
data_main_analyses[row_index, 'participantExtensionChoiceImage'] <- data_main_analyses[row_index, 'extensionCenterImg']
}
#Participant chose the right image.
else if(participant_image_choice == "2") {
data_main_analyses[row_index, 'participantExtensionChoiceImage'] <- data_main_analyses[row_index, 'extensionRightImg']
}
}
}
#Combine subject rows into a single row and remove the extra columns.
data_main_analyses_tidy <- data_main_analyses |>
select(-reactionTime, -trialType, -imageLoadSuccess, -stimulus, -response, -batchSubjectID) |>
mutate(across(everything(), ~ ifelse(. == "", NA, .))) |> #Replace empty strings with NA
group_by(subjectID) |>
summarize(across(everything(), ~ if(all(is.na(.))) {NA} else {na.omit(.)[1]}), .groups = "drop")Check the pre-processed data to see if there is an equal spread of subjects per condition.
#Check that there are enough subjects per condition.
condition_check <- data_main_analyses_tidy |>
group_by(condition) |>
summarize(numSubjectsPerCondition = n()) #Count the number of times the condition comes up
print(paste("There are an equal number of subjects per condition:", all(condition_check$numSubjectsPerCondition == condition_check$numSubjectsPerCondition[1]))) #If true, then there are an equal number of subjects per condition.[1] "There are an equal number of subjects per condition: FALSE"
Remove the rows that are empty (no data collected) and that have repeat subject IDs.
#Check if there are null (NA) values in participantExtensionChoice.
null_values <- sum(is.na(data_main_analyses_tidy$participantExtensionChoice))
#Print the result
if (null_values > 0) {
cat("Out of", nrow(data_main_analyses_tidy), "subjects, there are", null_values, "null values in the column 'participantExtensionChoice.'.\n")
} else {
cat("Out of", nrow(data_main_analyses_tidy), "subjects, there are no null values in the column 'participantExtensionChoice.'.\n")
}Out of 429 subjects, there are 3 null values in the column 'participantExtensionChoice.'.
#Remove rows that don't have a response in participantExtensionChoice.
data_main_analyses_tidy <- data_main_analyses_tidy[!is.na(data_main_analyses_tidy$participantExtensionChoice), ]
#Check if the data were removed.
null_values <- sum(is.na(data_main_analyses_tidy$participantExtensionChoice))
#Print the result
if (null_values > 0) {
cat("After data cleaning, there are", null_values, "null values in the column 'participantExtensionChoice' and", nrow(data_main_analyses_tidy), "total subjects. \n")
} else {
cat("After data cleaning, there are no null values in the column 'participantExtensionChoice' and", nrow(data_main_analyses_tidy), "total subjects. \n")
}After data cleaning, there are no null values in the column 'participantExtensionChoice' and 426 total subjects.
Now that the data are cleaned and in a tidy format, figure out whether participants were choosing based on color, material, or shape. To do so, analyze exemplarImg and participantExtensionChoiceImage to figure out what the common property is between the exemplar and the extension choice. Put the overlapping information into a new column called participantOverlapProperty.
#Go through each subject.
for(row_index in 1:nrow(data_main_analyses_tidy)) {
#Get the exemplar the participant saw.
participant_exemplar <- unlist(strsplit(as.character(data_main_analyses_tidy[row_index, 'exemplarImg']), '_'))
#Get the extension the participant chose.
participant_extension_choice <- unlist(strsplit(as.character(data_main_analyses_tidy[row_index, 'participantExtensionChoiceImage']), '_'))
#Get the shape of the exemplar object and put into a new column called exemplarShape.
exemplar_shape <- participant_exemplar[3]
data_main_analyses_tidy[row_index, 'exemplarShape'] <- exemplar_shape
#Get the overlapping property.
overlap <- intersect(participant_exemplar, participant_extension_choice)
#Put the overlapping property into a new column.
data_main_analyses_tidy[row_index, 'participantOverlapAnswer'] <- overlap
#Interpret the overlapping answer
if(overlap == 'red' | overlap == 'blue' | overlap == 'yellow') {
data_main_analyses_tidy[row_index, 'participantOverlapProperty'] <- 'color'
}
else if(overlap == 'crepe' | overlap == 'foam' | overlap == 'yarn') {
data_main_analyses_tidy[row_index, 'participantOverlapProperty'] <- 'material'
}
else if(overlap == 'arch' | overlap == 'lamp' | overlap == 'snowman') {
data_main_analyses_tidy[row_index, 'participantOverlapProperty'] <- 'shape'
}
}Finally, reorder the columns into a logical order and add information about the participant group and experiment type.
data_main_analyses_tidy <- data_main_analyses_tidy |>
mutate(participantLocation = 'US', experiment = 'USA_Adults', ageGroup = 'Adults', exclude = 0) |>
select(subjectID, condition, experiment, ageGroup, participantLocation, exemplarName, exemplarImg, exemplarShape, participantOverlapAnswer, participantOverlapProperty, participantExtensionChoice, participantExtensionChoiceImage, extensionLeftImg, extensionCenterImg, extensionRightImg, extensionLeftImgAttnCheckResponse, extensionCenterImgAttnCheckResponse, extensionRightImgAttnCheckResponse, participantAge, participantBornUSA, participantChildhoodUSA, participantCurrentUSA, participantCurrentZipcode, participantChildhoodZipcode, participantFirstLanguage, participantCurrentLanguages, participantCurrentUrbanicity, participantChildhoodUrbanicity, exclude)Results of control measures
I examined the answers to the attention check question “what does each object share in common with the original object?” to verify that participants understood the task and were paying attention. I manually excluded participants who answered these questions incorrectly more than once. The default exclude value is 0.
Reproduction of original analyses
I want to make sure that I can get the same results as the original authors from their analysis code. Here, I am going to load in the original data and adapt the code from the original authors, accessible here.
#Load libraries
suppressMessages(library(tidyverse))
suppressMessages(library(boot))
suppressMessages(library(lme4))
#Load data
original_data_file <- "Original_ShapeBias_Data.csv"
original_data_path <- file.path(getwd(), '..', 'data', original_data_file)
original_data <- read.csv(original_data_path)
table(original_data$Experiment) #Contains data from all 7 experiments.
Tsi_Adults_Objects Tsi_Adults_wDistractor Tsi_Child_Objects
39 41 37
Tsi_Child_ShapeVsColor Tsi_Child_ShapeVsMaterial US_Children
30 42 30
US_Turk_Adults
144
#Number of participants
original_data |> group_by(AgeGroup,Location) %>% summarize(subjects=n(), .groups = 'drop')# A tibble: 4 × 3
AgeGroup Location subjects
<chr> <chr> <int>
1 Adults Bolivia 80
2 Adults US 144
3 Children Bolivia 109
4 Children US 30
#Stats by experiment
summary_data_original <- original_data |>
group_by(Experiment) |>
summarize(subjects=n(),
average=mean(Age,na.rm=TRUE),
minage=min(Age,na.rm=TRUE),
maxage=max(Age,na.rm=TRUE),
stdev=sd(Age,na.rm=TRUE),
ShapeNo=sum(Choice=="Shape"),
MatNo=sum(Choice=="Material"),
ColorNo=sum(Choice=="Color"),
DistNo=sum(Choice=="Distractor"),
ShapePerc=ShapeNo/subjects,
MatPerc=MatNo/subjects,
ColorPerc=ColorNo/subjects,
DistPerc=DistNo/subjects)
summary_data_original$ExpNo <- c(6,7,2,4,3,1,5)
summary_data_original <- arrange(summary_data_original,ExpNo)
#Results
summary_data_original |>
dplyr::select(ExpNo, Experiment, subjects, ShapeNo, MatNo, ColorNo, DistNo, ShapePerc, MatPerc, ColorPerc, DistPerc)# A tibble: 7 × 11
ExpNo Experiment subjects ShapeNo MatNo ColorNo DistNo ShapePerc MatPerc
<dbl> <chr> <int> <int> <int> <int> <int> <dbl> <dbl>
1 1 US_Children 30 18 10 2 0 0.6 0.333
2 2 Tsi_Child_Objec… 37 12 16 9 0 0.324 0.432
3 3 Tsi_Child_Shape… 42 12 27 0 3 0.286 0.643
4 4 Tsi_Child_Shape… 30 10 0 17 3 0.333 0
5 5 US_Turk_Adults 144 99 41 4 0 0.688 0.285
6 6 Tsi_Adults_Obje… 39 18 8 13 0 0.462 0.205
7 7 Tsi_Adults_wDis… 41 17 12 11 1 0.415 0.293
# ℹ 2 more variables: ColorPerc <dbl>, DistPerc <dbl>
#Reproduce the binomial test for Experiment 5: US_Turk_Adults.
CurrExp <- summary_data_original |>
filter(ExpNo==5)
binom.test(CurrExp$ShapeNo, CurrExp$subjects, p=1/3)
Exact binomial test
data: CurrExp$ShapeNo and CurrExp$subjects
number of successes = 99, number of trials = 144, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
0.6050180 0.7620797
sample estimates:
probability of success
0.6875
#This model predicts the participant's preference for the shape-match object.
CurrExp <- original_data |>
filter(Experiment %in% c("Tsi_Adults_Objects","Tsi_Adults_wDistractor", "US_Turk_Adults","Tsi_Child_Objects","US_Children")) |>
filter(Choice!="Distractor")
CurrExp$Offset <- logit(1/3)
m <- glmer(Choice=="Shape" ~ offset(Offset) + Location * AgeGroup + (1 + Location + AgeGroup| Example) + (1 | Experiment),CurrExp,family="binomial")
summary(m)Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Choice == "Shape" ~ offset(Offset) + Location * AgeGroup + (1 +
Location + AgeGroup | Example) + (1 | Experiment)
Data: CurrExp
AIC BIC logLik deviance df.resid
376.3 416.7 -177.2 354.3 279
Scaled residuals:
Min 1Q Median 3Q Max
-2.3934 -0.8253 0.4178 0.8957 2.2925
Random effects:
Groups Name Variance Std.Dev. Corr
Experiment (Intercept) 1.540e-08 0.0001241
Example (Intercept) 1.230e-01 0.3506452
LocationUS 2.197e-01 0.4687695 0.81
AgeGroupChildren 1.089e+00 1.0436391 -0.67 -0.11
Number of obs: 290, groups: Experiment, 5; Example, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4550 0.3073 1.480 0.13877
LocationUS 1.1373 0.4080 2.787 0.00532 **
AgeGroupChildren -0.8325 0.7718 -1.079 0.28070
LocationUS:AgeGroupChildren 0.4077 0.6807 0.599 0.54918
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) LctnUS AgGrpC
LocationUS -0.072
AgGrpChldrn -0.567 0.108
LctnUS:AgGC 0.257 -0.331 -0.455
#Actual model: main effects of `Location` and `AgeGroup` tested without their interaction.
m0 <- glmer(Choice=="Shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment),CurrExp,family="binomial")boundary (singular) fit: see help('isSingular')
anova(m,m0)Data: CurrExp
Models:
m0: Choice == "Shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment)
m: Choice == "Shape" ~ offset(Offset) + Location * AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment)
npar AIC BIC logLik deviance Chisq Df Pr(>Chisq)
m0 10 374.68 411.38 -177.34 354.68
m 11 376.32 416.69 -177.16 354.32 0.365 1 0.5457
summary(m0)Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Choice == "Shape" ~ offset(Offset) + Location + AgeGroup + (1 +
Location + AgeGroup | Example) + (1 | Experiment)
Data: CurrExp
AIC BIC logLik deviance df.resid
374.7 411.4 -177.3 354.7 280
Scaled residuals:
Min 1Q Median 3Q Max
-2.4474 -0.8063 0.4086 0.9219 2.0617
Random effects:
Groups Name Variance Std.Dev. Corr
Experiment (Intercept) 0.0000 0.0000
Example (Intercept) 0.1284 0.3583
LocationUS 0.2202 0.4693 0.82
AgeGroupChildren 0.9782 0.9890 -0.68 -0.13
Number of obs: 290, groups: Experiment, 5; Example, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4061 0.2999 1.354 0.17560
LocationUS 1.2215 0.3844 3.178 0.00148 **
AgeGroupChildren -0.6236 0.6569 -0.949 0.34239
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) LctnUS
LocationUS 0.030
AgGrpChldrn -0.540 -0.060
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
Based on the calculations above, I successfully reproduced the original results: “This model…showed that U.S. participants were significantly more likely to generalize the label by shape \((\beta = 1.22\); \(p < .01)\) and no difference across age groups \((\beta = -.62\); \(p = .34)\)”.
Confirmatory analysis
I conducted one main confirmatory analysis:
- A binomial test with chance set to 1/3 tested whether U.S. adults were significantly more likely to choose the shape match as opposed to the color or material matches within just this experiment. This test matches the statistical test run by the original authors. The results replicated with the original study \((P_{success} = 0.45, p < 0.001)\).
#Summary statistics
summary_data <- data_main_analyses_tidy |>
summarize(subjects=n(),
average=mean(as.numeric(participantAge), na.rm=TRUE),
minage=min(as.numeric(participantAge), na.rm=TRUE),
maxage=max(as.numeric(participantAge), na.rm=TRUE),
stdev=sd(as.numeric(participantAge), na.rm=TRUE),
shapeNum=sum(participantOverlapProperty=="shape"),
materialNum=sum(participantOverlapProperty=="material"),
colorNum=sum(participantOverlapProperty=="color"),
shapePercent=shapeNum/subjects,
matPercent=materialNum/subjects,
colorPercent=colorNum/subjects,
.groups = 'drop') |>
ungroup()
#Binomial test
binomial_result <- binom.test(summary_data$shapeNum, summary_data$subjects, p=1/3)
binomial_result
Exact binomial test
data: summary_data$shapeNum and summary_data$subjects
number of successes = 191, number of trials = 426, p-value = 9.3e-07
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
0.4004589 0.4969768
sample estimates:
probability of success
0.4483568
#Store values of 95% confidence intervals.
conf_int <- binomial_result$conf.int Figure 1: Rescue vs. Replication vs. Original Paper Results for Shape Preference
Figure 1 shows the proportion of choices of an extension by the type of property (shape, color, material).
#Create dataframe for figure.
figure_statistics <- data_main_analyses_tidy |>
group_by(participantOverlapProperty, participantLocation) |>
summarize(n = n(), proportion = n / nrow(data_main_analyses_tidy), .groups = 'drop') |>
ungroup()
#Plot the data for the first figure and save as a file.
ggplot(figure_statistics, aes(y = proportion, x = participantLocation, fill = participantOverlapProperty)) +
geom_bar(position = "stack", stat = 'identity', width = .7) +
geom_errorbar(aes(ymin = conf_int[1], ymax = conf_int[2]), width = .2) +
theme_minimal() +
scale_fill_manual(values = c("#B4DCB9", "#7F92B8", "#6C6969")) +
scale_x_discrete('Experiment 5: USA Adults') +
scale_y_continuous('Percentage of responses', labels = scales::percent) +
facet_wrap(~participantLocation) +
geom_hline(yintercept = (1/3), linetype = 'dotted', color = 'black') +
labs(fill = "Choice") +
theme(legend.title = element_text(size = 12, face = "bold"))img_path <- file.path(getwd(), '..', 'figures', 'choice_by_property_rescue.png')
ggsave(img_path, device = 'png', dpi=300)Saving 7 x 5 in image
We also want to calculate the proportion of choice by property for each of the shapes (arch, lamp, snowman). The results partially replicated with the original study, with participants choosing the shape match for the arch and lamp exemplars \((P_{success_{arch}} = 0.64, p < 0.001), (P_{success_{lamp}} = 0.51, p < 0.001)\) but not for the snowman exemplars \((P_{success_{snowman}} = 0.21, p < 0.01)\).
#Calculate the percentages of each choice by property and output into a dataframe.
count_df <- data_main_analyses_tidy |>
group_by(exemplarShape, participantOverlapProperty, experiment) |>
summarize(count = n(), .groups = 'drop') |>
ungroup()
#Pivot the dataframe.
figure_statistics_by_shape_property <- pivot_wider(count_df, id_cols = exemplarShape, names_from = participantOverlapProperty, values_from = count)
figure_statistics_by_shape_property[is.na(figure_statistics_by_shape_property)] <- 0 #Make NULL values 0
#Calculate the proportion of responses for each property.
figure_statistics_by_shape_property$totalCount <- rowSums(figure_statistics_by_shape_property[, -1]) #Calculate the total responses per property
columns_to_calculate <- c("shape", "material", "color") #All possible properties
existing_columns <- intersect(columns_to_calculate, names(figure_statistics_by_shape_property)) #Find columns that actually exist
figure_statistics_by_shape_property <- figure_statistics_by_shape_property |>
mutate(across(all_of(existing_columns), ~ .x / totalCount, .names = "{.col}Proportion")) #Takes into account if properties don't exist.
#Pivot the data.
figure_statistics_by_shape_property_long <- figure_statistics_by_shape_property |>
pivot_longer(
cols = any_of(c("shapeProportion", "materialProportion", "colorProportion")),
names_to = "exemplarProperty",
values_to = "proportion"
)
#Clean up the property value names.
figure_statistics_by_shape_property_long$exemplarProperty <- sub("Proportion", "", figure_statistics_by_shape_property_long$exemplarProperty)
#Binomial tests.
binomial_result_arch <- binom.test(figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplarShape == 'arch'], figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplarShape == 'arch'], p=1/3)
binomial_result_arch
Exact binomial test
data: figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplarShape == "arch"] and figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplarShape == "arch"]
number of successes = 86, number of trials = 135, p-value = 6.72e-13
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
0.5499125 0.7180067
sample estimates:
probability of success
0.637037
binomial_result_lamp <- binom.test(figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplarShape == 'lamp'], figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplarShape == 'lamp'], p=1/3)
binomial_result_lamp
Exact binomial test
data: figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplarShape == "lamp"] and figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplarShape == "lamp"]
number of successes = 75, number of trials = 147, p-value = 1.015e-05
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
0.4265102 0.5934785
sample estimates:
probability of success
0.5102041
binomial_result_snowman <- binom.test(figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplarShape == 'snowman'], figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplarShape == 'snowman'], p=1/3)
binomial_result_snowman
Exact binomial test
data: figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplarShape == "snowman"] and figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplarShape == "snowman"]
number of successes = 30, number of trials = 144, p-value = 0.001367
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
0.1451945 0.2838570
sample estimates:
probability of success
0.2083333
#Store values of 95% confidence intervals.
conf_int_arch <- binomial_result_arch$conf.int
conf_int_lamp <- binomial_result_lamp$conf.int
conf_int_snowman <- binomial_result_snowman$conf.int Figure 2: Rescue vs. Replication vs. Original Paper Results for Shape Preference by Shape Type
Figure 2 shows the proportion of choices by type of extension shape within the three property (shape, color, material) categories.
#Plot the data for the second figure and save as a file.
ggplot(figure_statistics_by_shape_property_long, aes(x = exemplarShape, y = proportion, fill = exemplarProperty)) +
geom_bar(stat = "identity") + #use the actual proportions
geom_errorbar(aes(ymin = ifelse(exemplarShape == "arch", conf_int_arch[1],
ifelse(exemplarShape == "lamp", conf_int_lamp[1],
conf_int_snowman[1])),
ymax = ifelse(exemplarShape == "arch", conf_int_arch[2],
ifelse(exemplarShape == "lamp", conf_int_lamp[2],
conf_int_snowman[2]))),
width = .2) +
theme_minimal() +
scale_fill_manual(values = c("#B4DCB9", "#7F92B8", "#6C6969")) + #set the bar colors
scale_x_discrete('Extension Shape') + #x-label
scale_y_continuous(labels = scales::percent) + #convert y-axis labels to percent
geom_hline(yintercept = (1/3), linetype = 'dotted', color = 'black') + #add dotted chance line
labs(x = "Shape", y = "Percentage of Responses", fill = "Material") +
labs(fill = "Property Choice") +
theme(legend.title = element_text(size = 12, face = "bold"))img_path <- file.path(getwd(), '..', 'figures', 'choice_by_shape_and_property_rescue.png')
ggsave(img_path, device = 'png', dpi=300)Saving 7 x 5 in image
Exploratory analyses
- Given that the first replication author observed stimulus-level random effects, the first exploratory analysis is a model that takes into account these random effects:
glmer( Choice=="shape" ~ offset(Offset) + 1 + (1 + Example) ). This model tests, given an offset of 1/3, if the intercept is significantly different from 0 while accounting for random effects. The model yielded an insignificant result \((\beta = 0.45, p = 0.33)\).
#Create a dataframe for the analysis.
rescue_data <- data_main_analyses_tidy |>
select(Age = participantAge,
Experiment = experiment,
Example = exemplarShape,
Choice = participantOverlapProperty,
AgeGroup = ageGroup,
Location = participantLocation)
#Add the offset column.
rescue_data$Offset <- logit(1/3)
#Run the random effects model against chance.
summary(glmer(Choice=="shape" ~ offset(Offset) + 1 + (1 | Example), rescue_data, family="binomial"))Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Choice == "shape" ~ offset(Offset) + 1 + (1 | Example)
Data: rescue_data
AIC BIC logLik deviance df.resid
543.9 552.0 -269.9 539.9 424
Scaled residuals:
Min 1Q Median 3Q Max
-1.2986 -1.0145 -0.5311 0.7701 1.8827
Random effects:
Groups Name Variance Std.Dev.
Example (Intercept) 0.6085 0.7801
Number of obs: 426, groups: Example, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.4531 0.4626 0.98 0.327
- I also ran a random effects model, with random intercepts for the exemplar object (
Example), testing for a participant’s preference for shape as a function of the exemplar object:glmer( Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example) ). The main difference between the model in the first replication and this model is the use of theOffsetvariable, set to the log-odds probability of 1/3. The model yielded an insignificant result \((\beta = -0.06, p = 0.86)\).
#Create a dataset to merge with the original dataset.
rescue_data <- data_main_analyses_tidy |>
select(Age = participantAge,
Experiment = experiment,
Example = exemplarShape,
Choice = participantOverlapProperty,
AgeGroup = ageGroup,
Location = participantLocation)
#Filter the original dataset for just the Tsimane adult data (Experiment 6, without distractors).
tsimane_adult_data <- original_data |>
filter(Experiment == 'Tsi_Adults_Objects') |>
select(Age, Experiment, Example, Choice, AgeGroup, Location) |>
mutate(Example = tolower(Example), Choice = tolower(Choice)) #Make all values in these columns lowercase.
#Combine the rescue USA adult data and the original data from Experiment 6 (N=39).
appended_data <- rbind(rescue_data, tsimane_adult_data)
#Add the offset column.
appended_data$Offset <- logit(1/3)
#Run the random effects model.
summary(glmer(Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example), appended_data, family="binomial"))boundary (singular) fit: see help('isSingular')
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Choice == "shape" ~ offset(Offset) + Location + (1 + Location |
Example)
Data: appended_data
AIC BIC logLik deviance df.resid
600.3 621.0 -295.1 590.3 460
Scaled residuals:
Min 1Q Median 3Q Max
-1.2852 -1.0285 -0.5274 0.8094 1.8961
Random effects:
Groups Name Variance Std.Dev. Corr
Example (Intercept) 0.40041 0.6328
LocationUS 0.02211 0.1487 1.00
Number of obs: 465, groups: Example, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.51345 0.49754 1.032 0.302
LocationUS -0.06279 0.36386 -0.173 0.863
Correlation of Fixed Effects:
(Intr)
LocationUS -0.456
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
- I also ran the original paper’s logistic mixed-effects model that predicts the participant’s preference for the shape-match object:
glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) )by swapping out the USA adult data from the original paper with my new USA adult data but keeping all of the other data the same (USA children, Tsimane’ adults, and Tsimane’ children). In the original study, the authors used a baseline probability of 33.3% with the population (U.S. participants) and the age group (adults) dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment number, random intercepts for the exemplar object, random slopes for population as a function of exemplar, and random slopes for age group as a function of exemplar. This model found that, with a new sample of U.S. adults, they were not more likely to generalize the label of a novel object by shape \((\beta = 0.31\), \(p = 0.56)\).
#Filter the original dataset to remove the USA adult data (and not include two if the Tsimane children experiments).
original_data_no_USA_adult <- original_data |>
filter(Experiment %in% c("Tsi_Adults_Objects","Tsi_Adults_wDistractor", "Tsi_Child_Objects","US_Children")) |>
filter(Choice!="Distractor") |>
select(Age, Experiment, Example, Choice, AgeGroup, Location) |>
mutate(Example = tolower(Example), Choice = tolower(Choice)) #Make all values in these columns lowercase.
#Combine the rescue USA adult data and the original data (minus the US_Turk_Adult subjects).
appended_data_full <- rbind(rescue_data, original_data_no_USA_adult)
#Add the offset column.
appended_data_full$Offset <- logit(1/3)
#Run the logistic mixed-effects model.
summary(glmer(Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment), appended_data_full, family="binomial"))boundary (singular) fit: see help('isSingular')
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 +
Location + AgeGroup | Example) + (1 | Experiment)
Data: appended_data_full
AIC BIC logLik deviance df.resid
747.6 791.1 -363.8 727.6 562
Scaled residuals:
Min 1Q Median 3Q Max
-1.3558 -0.8966 -0.5423 0.9937 2.0425
Random effects:
Groups Name Variance Std.Dev. Corr
Experiment (Intercept) 0.02198 0.1482
Example (Intercept) 0.06538 0.2557
LocationUS 0.43013 0.6558 0.41
AgeGroupChildren 0.28745 0.5361 -0.16 0.83
Number of obs: 572, groups: Experiment, 5; Example, 3
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.249138 0.279632 0.891 0.373
LocationUS 0.309283 0.532994 0.580 0.562
AgeGroupChildren -0.003805 0.460833 -0.008 0.993
Correlation of Fixed Effects:
(Intr) LctnUS
LocationUS -0.208
AgGrpChldrn -0.317 0.418
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
Figure 3: Rescue vs. Replication vs. Original Paper Results for Shape Preference
suppressMessages(library(knitr))
#Get figure file name.
image_path <- 'choice_by_property_comparison.jpeg'
#Print out the figure.
knitr::include_graphics(image_path)Figure 4: Rescue vs. Replication vs. Original Paper Results for Shape Preference by Shape Type
suppressMessages(library(knitr))
#Get figure file name.
image_path_by_shape <- 'choice_by_shape_and_property_comparison.jpeg'
#Print out the figures.
knitr::include_graphics(image_path_by_shape)Discussion
Mini meta analysis
I am not able to fully combine data across the original paper (Jara-Ettinger et al., 2022), the first replication (Abdelrahim, 2022), and 2nd replication (Chen, 2023) because each of the replications did not have the same models. For example, Jara-Ettinger et al., 2022 and Chen, 2023 ran a binomial test and a logistic mixed-effects model (Chen, 2023 used new data for the U.S. adult sample for this model but the rest of the data came from Jara-Ettinger et al., 2022), while Abdelrahim, 2022 and Chen, 2023 used a random effects model that Jara-Ettinger et al., 2022 did not use. Below are two different meta-analysis models obtaining the aggregate estimates from the random effects models (Abdelrahim, 2022 and Chen, 2023) and the logistic mixed-effects models (Jara-Ettinger et al., 2022 and Chen, 2023).
suppressMessages(library(metafor))
#Import meta-analysis data.
meta_analysis_file <- 'meta_analysis_data.csv'
meta_analysis_data <- read.csv(file.path(getwd(), '..', 'data', meta_analysis_file))
#Print out the data.
knitr::kable(meta_analysis_data,
table.attr = "class=\"striped\"",
format = "html")| Study | N | binomial_test_probability_of_success | binomial_test_confidence_interval_low | binomial_test_confidence_interval_high | random_effects_model_estimate | random_effects_model_p_value | random_effects_model_SE | logistic_mixed_effects_model_estimate | logistic_mixed_effects_model_p_value | logistic_mixed_effects_model_SE |
|---|---|---|---|---|---|---|---|---|---|---|
| Jara-Ettinger et al. (2022) | 144 | 0.6875000 | 0.6050180 | 0.7620797 | NA | NA | NA | 1.220000 | 0.00148 | 0.384400 |
| Abdelrahim (2022) | 142 | NA | NA | NA | 1.13790 | 0.318 | 1.14030 | NA | NA | NA |
| Chen (2023) | 426 | 0.4483568 | 0.4004589 | 0.4969768 | -0.06279 | 0.863 | 0.36386 | 0.309283 | 0.56200 | 0.532994 |
The aggregate effect size between the Abdelrahim (2022) replication project and the Chen (2023) rescue replication project for the random effects model is extremely small: 0.0512.
#Run meta-analysis model for Abdelrahim (2022) and Chen (2023)
meta_analysis_model_random_effects <- rma(yi=random_effects_model_estimate, sei = random_effects_model_SE, slab=Study, data=meta_analysis_data) Warning: 1 study with NAs omitted from model fitting.
summary(meta_analysis_model_random_effects)
Random-Effects Model (k = 2; tau^2 estimator: REML)
logLik deviance AIC BIC AICc
-1.2553 2.5105 6.5105 2.5105 18.5105
tau^2 (estimated amount of total heterogeneity): 0.0045 (SE = 1.0194)
tau (square root of estimated tau^2 value): 0.0670
I^2 (total heterogeneity / total variability): 0.62%
H^2 (total variability / sampling variability): 1.01
Test for Heterogeneity:
Q(df = 1) = 1.0063, p-val = 0.3158
Model Results:
estimate se zval pval ci.lb ci.ub
0.0512 0.3520 0.1455 0.8843 -0.6386 0.7411
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The aggregate effect size between the Jara-Ettinger et al. (2022) replication project and the Chen (2023) rescue replication project for the logistic mixed-effects model is also extremely small: 0.8395.
#Run meta-analysis model for Jara-Ettinger et al. (2022) and Chen (2023)
meta_analysis_model_logistic_mixed_effects <- rma(yi=logistic_mixed_effects_model_estimate, sei = logistic_mixed_effects_model_SE, slab=Study, data=meta_analysis_data) Warning: 1 study with NAs omitted from model fitting.
summary(meta_analysis_model_logistic_mixed_effects)
Random-Effects Model (k = 2; tau^2 estimator: REML)
logLik deviance AIC BIC AICc
-0.9788 1.9577 5.9577 1.9577 17.9577
tau^2 (estimated amount of total heterogeneity): 0.1988 (SE = 0.5865)
tau (square root of estimated tau^2 value): 0.4458
I^2 (total heterogeneity / total variability): 47.93%
H^2 (total variability / sampling variability): 1.92
Test for Heterogeneity:
Q(df = 1) = 1.9206, p-val = 0.1658
Model Results:
estimate se zval pval ci.lb ci.ub
0.8395 0.4492 1.8690 0.0616 -0.0409 1.7198 .
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Figure 6: Estimated Effect Sizes for the Random Effects Model (Abdelrahim, 2022 and Chen, 2023)
Visualize the effect sizes of the Abdelrahim (2022) replication project and the Chen (2023) rescue replication project for the random effects model.
#Filter out study without this model.
meta_analysis_data_random_effects <- meta_analysis_data |>
filter(Study != "Jara-Ettinger et al. (2022)")
# Plot mini meta-analysis results.
ggplot(meta_analysis_data_random_effects, aes(x = Study, y = random_effects_model_estimate, size = N)) +
geom_point(data = meta_analysis_data_random_effects) +
coord_flip() +
ylim(-2, 2) +
scale_size_area() +
geom_hline(yintercept = 0, color = "black") +
theme_linedraw() +
theme(legend.position = "none") +
labs(y = "Main effect size on original scale", x = "")Figure 7: Estimated Effect Sizes for the Logistic Mixed-Effects Model (Jara-Ettinger et al., 2022 and Chen, 2023)
Visualize the effect sizes of the Jara-Ettinger et al. (2022) replication project and the Chen (2023) rescue replication project for the logistic mixed-effects model.
#Filter out study without this model.
meta_analysis_data_logistic_mixed_effects <- meta_analysis_data |>
filter(Study != "Abdelrahim (2022)")
# Plot mini meta-analysis results.
ggplot(meta_analysis_data_logistic_mixed_effects, aes(x = Study, y = logistic_mixed_effects_model_estimate, size = N)) +
geom_point(data = meta_analysis_data_logistic_mixed_effects) +
coord_flip() +
ylim(-2, 2) +
scale_size_area() +
geom_hline(yintercept = 0, color = "black") +
theme_linedraw() +
theme(legend.position = "none") +
labs(y = "Main effect size on original scale", x = "")Figure 8: Forest Plot for the Random Effects Model (Abdelrahim, 2022 and Chen, 2023)
forest(meta_analysis_model_random_effects)Figure 9: Forest Plot for the Logistic Mixed-Effects Model (Jara-Ettinger et al., 2022 and Chen, 2023)
forest(meta_analysis_model_logistic_mixed_effects)Figure 10: Funnel Plot for the Random Effects Model (Abdelrahim, 2022 and Chen, 2023)
funnel(meta_analysis_model_random_effects)Figure 11: Funnel Plot for the Logistic Mixed-Effects Model (Jara-Ettinger et al., 2022 and Chen, 2023)
funnel(meta_analysis_model_logistic_mixed_effects)Summary of Rescue Replication Attempt
This rescue replication tested if U.S. adults were more likely to match a novel object with an exemplar based on shape, material, or color. Using a binomial test as my key analysis of interest, I found that U.S. adults were more likely to match based on shape overall \((P_{success} = 0.45, p < 0.001)\), but that this result depended on the exemplar shape, since I replicated the results in the original paper for the arch and lamp exemplars that the probability of success was greater than 1/3 \((P_{success_{arch}} = 0.64, p < 0.001), (P_{success_{lamp}} = 0.51, p < 0.001)\) but not for the snowman exemplar \((P_{success_{snowman}} = 0.21, p < 0.01)\).
Commentary
I also ran some follow-up exploratory analyses to (1) account for stimulus-level random effects, (2) test my U.S. adult data against the first replication author’s data using their key analysis of a random effects model, and (3) test how my U.S. adult data compared with the rest of the data in the original paper from Tsimane’ adults, Tsimane’ children, and U.S. children. For Aim 1, I found that the model that takes into account the stimulus-level random effects yielded an insignificant result \((\beta = 0.45, p = 0.33)\), showing that U.S. adults were not more likely to choose the shape match when accounting for the exemplar type. For Aim 2, I found that the random effects model with random intercepts for the exemplar object, testing for a participant’s preference for shape as a function of the exemplar object, yielded an insignificant result \((\beta = -0.06, p = 0.86)\), which aligned with the first replication author’s finding. Lastly, for Aim 3, with a new sample of U.S. adults, the original paper’s logistic mixed-effects model that predicts the participant’s preference for the shape-match object found that U.S. adults were not more likely to generalize the label of a novel object by shape \((\beta = 0.31\), \(p = 0.56)\).
The implementation of attention checks found that participants understood the differences between the exemplar and the three extension objects, so the inability to replicate the shape match probability for all exemplars does not seem to be from participants guessing randomly or not understanding how the extension objects relate to the exemplar. Material was the property that was guessed the most often for the snowman exemplar, which suggests that perhaps the snowman shape was a little less distinctive relative to the arch and lamp shapes, such that the material became a more salient property on which to match the exemplar and extension objects.
Overall, increasing the sample size, using the correct stimuli, and asking participants to share what properties each extension object had with the exemplar object were important changes to ensure a robust replication. Future work could ensure that the shape of the objects are very distinctive, such that other properties like the material do not hinder the participant’s ability to appreciate the object’s shape. Asking participants to answer more than one trial with a greater variety of shapes, materials, and colors may also address individual differences in trial performance or trial-level errors that occur when using a one-trial paradigm. Lastly, it is possible that subject-specific demographics (e.g., the urbanicity of their early environment) may be moderating what property match they choose.