Replication of The Origins of the Shape Bias: Evidence From the Tsimane’ by Jara-Ettinger et al. (2022, Journal of Experimental Psychology: General)

Author

Emily Chen (emchen15@stanford.edu)

Published

December 15, 2023

Introduction

I chose to try and rescue the replication of this experiment because I am interested in the intersection of language and spatial/object reasoning. As a developmental cognitive scientist, I also found the comparison of adults and children to be an important element relevant to my research interests. Finally, the cross-cultural comparisons made in this paper are something that I’d like to explore further in my own research, outside of the context of this project.

The stimuli that will be required to collect additional data for this experiment are available in the repository of the original replication project. The original replication collected data online from 144 U.S. adults, following the paradigm of Experiment 5 in the paper. It won’t be possible to collect additional data from Tsimane’ adults, so a replication of Experiments 6 and 7 cannot be performed. The new replication attempt will feature attention checks and a demographic questionnaire about the subject’s early geographic environment (e.g., urbanicity of current and childhood environments), given the theoretical claims in the paper about the importance of early environmental factors on the strength of the shape bias in children.

Click here for this rescue project’s Github repository. The PDF of the original paper can be found here. The experiment, hosted on cognition.run, can be accessed here.

Summary of prior replication attempt

The prior replication attempt tried to replicate only Experiment 5 from the original paper (the original paper had seven total experiments), which tested U.S. adults online using Amazon’s Mechanical Turk. While the replication attempt does not specify how they collected the data, I assume that they collected the data using Prolific instead of MTurk.

The biggest difference between the original study and the first replication is that the original authors used different stimuli from the first replication author. The stimuli in the original paper used the images shown in Table 1, but the first replication used the images shown in Figure 1. The replication author contacted the first author of the study for the original image files, but they were lost, so the replication author used screenshots of the shapes taken from the figures in the paper.

The demographics of the sample were the same for both the original study (specified as U.S. adults on MTurk) and the first replication (specified as English-speaking adults from the U.S.). The original study had a sample size of N=144 U.S. adults, and the first replication had a sample size of N=142 U.S. adults, with a planned sample of N=144, but two participants did not complete the study and the replication author did not collect two additional data points.

There are two main analyses:

Calculating the percentage of participants who chose the object based on shape congruency with the exemplar, the percentage of participants who chose the object based on material congruency with the exemplar, and the percentage of participants who chose the object based on color congruency with the exemplar.

In the original study, the authors used a binomial test with chance set to 1/3 to test whether U.S. adults were significantly more likely to choose the shape match as opposed to the color or material matches within just the U.S. adult experiment. The first replication did not run this test.

Running a logistic mixed-effects model that predicted the participant’s preference for the shape-match object.

In the original study, the authors used a baseline probability of 33.3% with the population and the age group dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment (1 | Experiment), random intercepts for the exemplar object (Example), random slopes for population as a function of exemplar (1 + Location + AgeGroup | Example), and random slopes for age group (1 + Location + AgeGroup | Example) as a function of exemplar: glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) ). The population by age interaction effect (Location * AgeGroup) model was not included based on a nested model comparison; instead, only the main effects of Location and AgeGroup were tested without their interaction (Location + AgeGroup). The Offset variable was set to the log-odds probability of 1/3 or 33.3% (logit(1/3)).

In the first replication, the author used a random effects model as their key analysis of interest, with random intercepts for the exemplar object (Example), testing for a participant’s preference for shape as a function of the exemplar object: glmer( Choice == "shape" ~ Location + (1 + Location | Example) ). Notably, the first replication author did not include the offset(Offset) of the log-odds probability of 1/3. They also removed the random slopes for age group as a function of exemplar ((1 + Location | Example) instead of (1 + Location + AgeGroup | Example)) since they were not comparing the adult data to the child data. Using this random effects model, the first replication author obtained an unsignificant result with a p-value of 0.318 and a small estimate of 1.138.

Methods

Power Analysis

Because of uncertainty about the true effect size, I am going to use a small telescopes approach and run 3 times the original sample size (N=144) for a total of N=432 participants, which means 8 subjects per condition (54 conditions in total). While the main hypothesis for the failed first replication is the use of the wrong stimuli, running 3 times the sample size will also increase the power.

Planned Sample

The sample size for Experiment 5 will be N=432 U.S. adults, whose primary language is English, recruited on Prolific. I will stop the study when I reach 432 participants who complete the task. I will also collect the following demographic information from the adult participants:

Age: How old the participant is (in years).
Nationality:
- If the participant was born in the USA.
- If the participant spent more than a year in the USA before age 10.
- If the participant currently lives in the USA.
Geographic Location:
- The first three characters or digits of the participant’s current zipcode.
- The first three characters or digits of the zipcode where the participant spent the most time in growing up.
Urbanicity:
- How the participant would describe their current home location on a scale from 1-100, with 1 being very rural, 100 being very urban, and 50 being suburban.
- How the participant would describe their childhood home location (if multiple, they are asked to choose the one they spent the most time in before 18 years old) on a scale from 1-100, with 1 being very rural, 100 being very urban, and 50 being suburban.
Languages:
- The first language the participant learned.
- All languages that the participant currently speaks.

Materials

“Stimuli consisted of solid objects that varied in shape, color, and material…Each experiment consisted of three example objects and three extension objects. Each participant saw only one example object (counterbalanced across participants) and all extension objects…Experiment 5 which used photographs of the objects because it was conducted online” (Jara-Ettinger et al, 2022).

Procedure

Experiment 5 was a “one-shot learning trial and each participant completed one trial only. Although each trial only required one label, we used three different possible labels, randomized across participants. In the experiments with U.S. participants, the example object was called a koba, dax, or fep…Participants saw a single screen where the top said ‘This is a(n) x’ along with a picture of the object. Below, the text read ‘One of these is also a(n) x’ along with three pictures of the three possible extension choices. The text below read ‘Which one is the other x?’ Participants were allowed to select one of the three objects” (Jara-Ettinger et al, 2022).

Controls

“To ensure that participants were attending to the task, we also asked participants what each object shared in common with the original object. These questions were only included to motivate participants to look at the images carefully” (Jara-Ettinger et al, 2022). In the original study, answers to these questions were not used as exclusion criteria, but in this study, I will exclude participants who have more than one incorrect answer.

Analysis Plan

Participants will be excluded if they did not answer the main question of interest (choosing an extension object) or if they answer more than one attention check question incorrectly, asking how each extension object relates to the exemplar object. Data will be downloaded from Prolific and scrubbed of all identifying information (e.g., IP addresses) and then cleaned. I will conduct one main analysis and three exploratory analyses:

Main analysis:

A binomial test with chance set to 1/3 tested whether U.S. adults were significantly more likely to choose the shape match as opposed to the color or material matches within just the U.S. adult experiment. This test matches the statistical test run by the original authors.

Exploratory analyses:

A random effects model that takes into account the stimulus-level random effects by testing against chance within just the U.S. adult sample: glmer( Choice=="shape" ~ offset(Offset) + 1 + (1 + Example) ). This model tests, given an offset of 1/3, if the intercept is significantly different from chance while accounting for random effects.
A random effects model, with random intercepts for the exemplar object (Example), testing for a participant’s preference for shape as a function of the exemplar object and the population group (Location): glmer( Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example) ). The main difference between the model in the first replication and this model is the use of the Offset variable, set to the log-odds probability of 1/3.
The original paper’s logistic mixed-effects model that predicts the participant’s preference for the shape-match object: glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) ). I will swap out the USA adult data from the original paper with my new USA adult data but keep all of the other data the same (USA children, Tsimane’ adults, and Tsimane’ children).

Clarify key analysis of interest: I am primarily interested in replicating the result in the original paper, using the binomial test statistic described under “Main analysis” above, that U.S. adults are more likely to choose the extension object that is matched on the shape property compared to the color or material matches.

Differences from Original Study and First Replication

The only known major difference between this plan and the first replication is that the stimuli are images of the physical objects as opposed to artistic renderings of the objects. The random effects model will also include the Offset variable, set to the log-odds probability of 1/3.

The main difference between this plan and the original study is that I collected more demographic information from the participants for exploratory analyses not conducted in the original study if the main analysis of interest did not replicate. I also excluded participants who answered more than one attention check question incorrectly when indicating what each object has in common with the exemplar object.

Methods Addendum (Post Data Collection)

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

This section contains the code necessary to prepare the data for analysis. First, load in the raw data. The data were collected in 8 batches of 54 subjects, where each subject saw a different condition in each batch. The data were collected in batches because cognition.run limits the number of subjects that can be run at a time. We’re going to import and stitch together the eight data files here.

#Load relevant libraries and functions. 
suppressMessages(library(tidyverse))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))

#Get all data file names from the data_raw folder. 
folder_path <- file.path(getwd(), '..', 'data_raw')

# Get all file names in the folder, ignoring sub-directories. 
file_names <- list.files(path = folder_path, full.names = FALSE, recursive = FALSE)

# Filter to include only files (exclude directories). 
file_names <- file_names[!file.info(file.path(folder_path, file_names))$isdir]

#Check if there are any files in the list
if (length(file_names) == 0) {
  print("No files found in the specified folder.")
} else {
  
  #Create an empty dataframe to store all of the data
  full_raw_data <- data.frame()
  
  #Loop through each of the files
  for (file in file_names) {
    
    #Construct the full path to the file
    file_path <- file.path(folder_path, file)
    
    #Check if the file exists
    if (file.exists(file_path)) {
      
      #Read the data from the file
      temp_data <- read.csv(file_path)
      
      # Append the data to the full_raw_data dataframe
      full_raw_data <- rbind(full_raw_data, temp_data)
    } else {
      print(paste("File not found:", file_path))
    }
  }
}

Check a few things about the raw data. For example, there should be an equal spread of subjects per condition. Also make sure that all of the images for the subjects loaded successfully (they should, or they would not have been able to complete the study).

#Check that there are enough subjects per condition.
data_checks <- full_raw_data |> 
  filter(trial_type == 'fullscreen') |> #Get one row per subject. 
  group_by(condition) |> #Group by subject.
  summarize(numSubjectsPerCondition = n_distinct(condition)) #Count the number of times the o

print(paste("There are an equal number of subjects per condition:", all(data_checks$numSubjectsPerCondition == data_checks$numSubjectsPerCondition[1]))) #If true, then there are an equal number of subjects per condition.

[1] "There are an equal number of subjects per condition: TRUE"

Check that all of the images for each subject loaded successfully.

suppressMessages(library(stringr))
print(paste("Images loaded successfully for all subjects:", sum(str_detect(full_raw_data$success, 'false')) == 0)) #If true, then all images loaded successfully for all subjects.

[1] "Images loaded successfully for all subjects: TRUE"

Now clean the data. First remove columns that are unnecessary for the main confirmatory analyses and filter out subjects not relevant to the full sample (i.e., pilot subjects). Remove subjects who used phones to complete the study. Then parse the demographic responses into columns.

#Select only relevant columns and rename them. 
data_main_analyses <- full_raw_data |> 
  select(c('rt', 'trial_type', 'run_id', 'condition', 'recorded_at', 'device', 'success', 'stimulus', 'response', starts_with('exemplar'), starts_with('extension'))) |> 
  rename(reactionTime = rt, trialType = trial_type, subjectID = run_id, recordedAt = recorded_at, imageLoadSuccess = success) |> #Rename columns to match naming conventions. 
  filter(subjectID > 29) |> #Remove subjects before particular dates (i.e., pilot subjects). 
  filter(device != 'iPhone') |>  #Remove subjects who used phones.
  select(c(-'device', -'recordedAt')) #Remove columns that aren't necessary anymore. 

#Put demographic responses (currently in the 'response' column) into their own columns. 
for (row_index in 1:nrow(data_main_analyses)) {
  
  participant_response <- data_main_analyses[row_index, 'response'] #Create a variable for the response in the current row
  stimulus <- data_main_analyses[row_index, 'stimulus'] #Create a variable for the stimulus in the current row
  
  #Check that the 'response' is not an empty string.
  if(nzchar(participant_response)) { 
    
    #Check if it's a survey response (has {} in the string) and update.
    if (grepl("{", participant_response, fixed = TRUE)) {
      
      #Attention check: left image. 
      if (grepl('extensionLeftImg_exemplar_commonalities', participant_response)) {
        temp_response <- strsplit(participant_response, ':')[[1]][2] #Get only the answer
        temp_response <- gsub('["}]', '', temp_response) #Trim the extra characters 
        data_main_analyses[row_index, 'extensionLeftImgAttnCheckResponse'] <- temp_response
      }
      
      #Attention check: center image. 
      else if (grepl('extensionCenterImg_exemplar_commonalities', participant_response)) {
        temp_response <- strsplit(participant_response, ':')[[1]][2] #Get only the answer
        temp_response <- gsub('["}]', '', temp_response) #Trim the extra characters 
        data_main_analyses[row_index, 'extensionCenterImgAttnCheckResponse'] <- temp_response
      }
      
      #Attention check: right image.
      else if (grepl('extensionRightImg_exemplar_commonalities', participant_response)) {
        temp_response <- strsplit(participant_response, ':')[[1]][2] #Get only the answer
        temp_response <- gsub('["}]', '', temp_response) #Trim the extra characters 
        data_main_analyses[row_index, 'extensionRightImgAttnCheckResponse'] <- temp_response
      }
      
      #Participant age.
      else if (grepl('Age', participant_response)) {
        temp_response <- strsplit(participant_response, ':')[[1]][2] #Get only the answer
        temp_response <- gsub('["}]', '', temp_response) #Trim the extra characters 
        data_main_analyses[row_index, 'participantAge'] <- temp_response
      }
      
      #Geographical location.
      else if (grepl('CurrentUSA', participant_response)) {
        temp_response <- strsplit(participant_response, ',') #List of responses for BornUSA, ChildhoodUSA, and CurrentUSA.
        
        #Participant birth location. 
        if (grepl('BornUSA', temp_response[[1]][1])) {
          geog_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] #Get only the answer for BornUSA. 
          geog_temp_response <- gsub('["}]', '', geog_temp_response) #Trim the extra characters 
          data_main_analyses[row_index, 'participantBornUSA'] <- geog_temp_response
        }
        
        #Participant childhood location. 
        if (grepl('ChildhoodUSA', temp_response[[1]][2])) {
          geog_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] #Get only the answer for ChildhoodUSA.  
          geog_temp_response <- gsub('["}]', '', geog_temp_response) #Trim the extra characters 
          data_main_analyses[row_index, 'participantChildhoodUSA'] <- geog_temp_response
        }
        
        #Participant current location. 
        if (grepl('CurrentUSA', temp_response[[1]][3])) {
          geog_temp_response <- strsplit(temp_response[[1]][3], ':')[[1]][2] #Get only the answer for CurrentUSA. 
          geog_temp_response <- gsub('["}]', '', geog_temp_response) #Trim the extra characters 
          data_main_analyses[row_index, 'participantCurrentUSA'] <- geog_temp_response
        }
      }
      
      #Zipcodes. 
      else if (grepl('CurrentZipcode', participant_response)) {
        temp_response <- strsplit(participant_response, ',') #List of responses for CurrentZipcode and ChildhoodZipcode.
        
        #Participant current zipcode. 
        if (grepl('CurrentZipcode', temp_response[[1]][1])) {
          zipcode_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] #Get only the answer for CurrentZipcode.  
          zipcode_temp_response <- gsub('["}]', '', zipcode_temp_response) #Trim the extra characters 
          data_main_analyses[row_index, 'participantCurrentZipcode'] <- zipcode_temp_response
        }
        
        #Participant childhood zipcode.  
        if (grepl('ChildhoodZipcode', temp_response[[1]][2])) {
          zipcode_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] #Get only the answer for ChildhoodZipcode.  
          zipcode_temp_response <- gsub('["}]', '', zipcode_temp_response) #Trim the extra characters 
          data_main_analyses[row_index, 'participantChildhoodZipcode'] <- zipcode_temp_response
        }
      }
      
      #Languages. 
      else if (grepl('FirstLanguage', participant_response)) {
        temp_response <- strsplit(participant_response, '",') #List of responses for FirstLanguage and AllLanguages.

        #Participant first language. 
        if (grepl('FirstLanguage', temp_response[[1]][1])) {
          language_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] #Get only the answer for FirstLanguage. 
          language_temp_response <- gsub('["}]', '', language_temp_response) #Trim the extra characters 
          data_main_analyses[row_index, 'participantFirstLanguage'] <- language_temp_response
        }
        
        #Participant all languages spoken.  
        if (grepl('AllLanguages', temp_response[[1]][2])) {
          language_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] #Get only the answer for AllLanguages.
          language_temp_response <- gsub('["}]', '', language_temp_response) #Trim the extra characters 
          data_main_analyses[row_index, 'participantCurrentLanguages'] <- language_temp_response
        }
      }
    }
    
    #Check if it's a numerical response for participant's current location urbanicity rating (1-100). 
    else if(grepl("^[0-9]+$", participant_response) & (grepl('currently live', stimulus, fixed = TRUE))) {
      data_main_analyses[row_index, 'participantCurrentUrbanicity'] <- participant_response
    }
    
    #Check if it's a numerical response for participant's childhood location urbanicity rating (1-100). 
    else if(grepl("^[0-9]+$", participant_response) & (grepl('grew up', stimulus, fixed = TRUE))) {
      #Participant's childhood location urbanicity. 
      data_main_analyses[row_index, 'participantChildhoodUrbanicity'] <- participant_response
    }
    
    #Check if it's an image response (choosing the shape image). 
    else if(grepl("^[0-9]+$", participant_response) & (grepl('img', stimulus, fixed = TRUE))) {
      data_main_analyses[row_index, 'participantExtensionChoice'] <- participant_response
    }
  }
  
  #Interpret the participantExtensionChoice column to make the numerical answer choice values meaningful.
  participant_image_choice <- data_main_analyses[row_index, 'participantExtensionChoice']
  if (!is.na(participant_image_choice) && length(participant_image_choice) > 0) {
    
    #Participant chose the left image. 
    if(participant_image_choice == "0") {
      data_main_analyses[row_index, 'participantExtensionChoiceImage'] <- data_main_analyses[row_index, 'extensionLeftImg']
    }
    
    #Participant chose the center image.
    else if(participant_image_choice == "1") {
      data_main_analyses[row_index, 'participantExtensionChoiceImage'] <- data_main_analyses[row_index, 'extensionCenterImg']
    }
    
    #Participant chose the right image. 
    else if(participant_image_choice == "2") {
      data_main_analyses[row_index, 'participantExtensionChoiceImage'] <- data_main_analyses[row_index, 'extensionRightImg']
    }
  }
}

#Combine subject rows into a single row and remove the extra columns. 
data_main_analyses_tidy <- data_main_analyses |> 
  select(-reactionTime, -trialType, -imageLoadSuccess, -stimulus, -response) |> 
  mutate(across(everything(), ~ ifelse(. == "", NA, .))) |>  #Replace empty strings with NA
  group_by(subjectID) |> 
  summarize(across(everything(), ~ if (all(is.na(.))) {NA} else {na.omit(.)[1]}), .groups = "drop")

Now that the data are cleaned and in a tidy format, figure out whether participants were choosing based on color, material, or shape. To do so, analyze exemplarImg and participantExtensionChoiceImage to figure out what the common property is between the exemplar and the extension choice. Put the overlapping information into a new column called participantOverlapProperty.

#Go through each subject. 
for (row_index in 1:nrow(data_main_analyses_tidy)) {
  #Get the exemplar the participant saw.
  participant_exemplar <- unlist(strsplit(as.character(data_main_analyses_tidy[row_index, 'exemplarImg']), '_'))
  
  #Get the extension the participant chose. 
  participant_extension_choice <- unlist(strsplit(as.character(data_main_analyses_tidy[row_index, 'participantExtensionChoiceImage']), '_'))
  
  #Get the shape of the exemplar object and put into a new column called exemplarShape. 
  exemplar_shape <- participant_exemplar[3]
  data_main_analyses_tidy[row_index, 'exemplarShape'] <- exemplar_shape
  
  #Get the overlapping property. 
  overlap <- intersect(participant_exemplar, participant_extension_choice)
  
  #Put the overlapping property into a new column. 
  data_main_analyses_tidy[row_index, 'participantOverlapAnswer'] <- overlap
  
  #Interpret the overlapping answer
  if(overlap == 'red' | overlap == 'blue' | overlap == 'yellow') {
    data_main_analyses_tidy[row_index, 'participantOverlapProperty'] <- 'color' 
  }
  
  else if(overlap == 'crepe' | overlap == 'foam' | overlap == 'yarn') {
    data_main_analyses_tidy[row_index, 'participantOverlapProperty'] <- 'material'
  }

  else if(overlap == 'arch' | overlap == 'lamp' | overlap == 'snowman') {
    data_main_analyses_tidy[row_index, 'participantOverlapProperty'] <- 'shape'
  }
}

Finally, reorder the columns into a logical order and add information about the participant group and experiment type.

data_main_analyses_tidy <- data_main_analyses_tidy |> 
  mutate(participantLocation = 'US', experiment= 'USA_Adults', ageGroup = 'Adults', exclude = 0) |> 
  select(subjectID, condition, experiment, ageGroup, participantLocation, exemplarName, exemplarImg, exemplarShape, participantOverlapAnswer, participantOverlapProperty, participantExtensionChoice, participantExtensionChoiceImage, extensionLeftImg, extensionCenterImg, extensionRightImg, extensionLeftImgAttnCheckResponse, extensionCenterImgAttnCheckResponse, extensionRightImgAttnCheckResponse, participantAge, participantBornUSA, participantChildhoodUSA, participantCurrentUSA, participantCurrentZipcode, participantChildhoodZipcode, participantFirstLanguage, participantCurrentLanguages, participantCurrentUrbanicity, participantChildhoodUrbanicity, exclude)

Results of control measures

I examined the answers to the attention check question “what does each object share in common with the original object?” to verify that participants understood the task and were paying attention. I manually excluded participants who answered these questions incorrectly more than once. The default exclude value is 0.

Reproduction of original analyses

I want to make sure that I can get the same results as the original authors from their analysis code. Here, I am going to load in the original data and adapt the code from the original authors, accessible here.

#Load libraries
suppressMessages(library(tidyverse))
suppressMessages(library(boot))
suppressMessages(library(lme4))

# Load data
original_data_file <- "Original_ShapeBias_Data.csv"
original_data_path <- file.path(getwd(), '..', 'data', original_data_file)
original_data <- read.csv(original_data_path)

table(original_data$Experiment) #Contains data from all 7 experiments.


       Tsi_Adults_Objects    Tsi_Adults_wDistractor         Tsi_Child_Objects 
                       39                        41                        37 
   Tsi_Child_ShapeVsColor Tsi_Child_ShapeVsMaterial               US_Children 
                       30                        42                        30 
           US_Turk_Adults 
                      144

#Number of participants
original_data |> group_by(AgeGroup,Location) %>% summarize(subjects=n(), .groups = 'drop')

# A tibble: 4 × 3
  AgeGroup Location subjects
  <chr>    <chr>       <int>
1 Adults   Bolivia        80
2 Adults   US            144
3 Children Bolivia       109
4 Children US             30

#Stats by experiment
summary_data_original <- original_data |> 
  group_by(Experiment) |> 
  summarize(subjects=n(), 
            average=mean(Age,na.rm=TRUE),
            minage=min(Age,na.rm=TRUE),
            maxage=max(Age,na.rm=TRUE),
            stdev=sd(Age,na.rm=TRUE),
            ShapeNo=sum(Choice=="Shape"),
            MatNo=sum(Choice=="Material"),
            ColorNo=sum(Choice=="Color"),
            DistNo=sum(Choice=="Distractor"),
            ShapePerc=ShapeNo/subjects,
            MatPerc=MatNo/subjects,
            ColorPerc=ColorNo/subjects,
            DistPerc=DistNo/subjects)
summary_data_original$ExpNo <- c(6,7,2,4,3,1,5)
summary_data_original <- arrange(summary_data_original,ExpNo)

# Results
summary_data_original |> 
  dplyr::select(ExpNo, Experiment, subjects, ShapeNo, MatNo, ColorNo, DistNo, ShapePerc, MatPerc, ColorPerc, DistPerc)

# A tibble: 7 × 11
  ExpNo Experiment       subjects ShapeNo MatNo ColorNo DistNo ShapePerc MatPerc
  <dbl> <chr>               <int>   <int> <int>   <int>  <int>     <dbl>   <dbl>
1     1 US_Children            30      18    10       2      0     0.6     0.333
2     2 Tsi_Child_Objec…       37      12    16       9      0     0.324   0.432
3     3 Tsi_Child_Shape…       42      12    27       0      3     0.286   0.643
4     4 Tsi_Child_Shape…       30      10     0      17      3     0.333   0    
5     5 US_Turk_Adults        144      99    41       4      0     0.688   0.285
6     6 Tsi_Adults_Obje…       39      18     8      13      0     0.462   0.205
7     7 Tsi_Adults_wDis…       41      17    12      11      1     0.415   0.293
# ℹ 2 more variables: ColorPerc <dbl>, DistPerc <dbl>

#Reproduce the binomial test for Experiment 5: US_Turk_Adults.
CurrExp <- summary_data_original |> 
  filter(ExpNo==5)
binom.test(CurrExp$ShapeNo, CurrExp$subjects, p=1/3)


    Exact binomial test

data:  CurrExp$ShapeNo and CurrExp$subjects
number of successes = 99, number of trials = 144, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
 0.6050180 0.7620797
sample estimates:
probability of success 
                0.6875

#This model predicts the participant's preference for the shape-match object. 
CurrExp <- original_data |> 
  filter(Experiment %in% c("Tsi_Adults_Objects","Tsi_Adults_wDistractor", "US_Turk_Adults","Tsi_Child_Objects","US_Children")) |> 
  filter(Choice!="Distractor")
CurrExp$Offset <- logit(1/3)

m <- glmer(Choice=="Shape" ~ offset(Offset) + Location * AgeGroup + (1 + Location + AgeGroup| Example) + (1 | Experiment),CurrExp,family="binomial")
summary(m)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "Shape" ~ offset(Offset) + Location * AgeGroup + (1 +  
    Location + AgeGroup | Example) + (1 | Experiment)
   Data: CurrExp

     AIC      BIC   logLik deviance df.resid 
   376.3    416.7   -177.2    354.3      279 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3934 -0.8253  0.4178  0.8957  2.2925 

Random effects:
 Groups     Name             Variance  Std.Dev.  Corr       
 Experiment (Intercept)      1.540e-08 0.0001241            
 Example    (Intercept)      1.230e-01 0.3506452            
            LocationUS       2.197e-01 0.4687695  0.81      
            AgeGroupChildren 1.089e+00 1.0436391 -0.67 -0.11
Number of obs: 290, groups:  Experiment, 5; Example, 3

Fixed effects:
                            Estimate Std. Error z value Pr(>|z|)   
(Intercept)                   0.4550     0.3073   1.480  0.13877   
LocationUS                    1.1373     0.4080   2.787  0.00532 **
AgeGroupChildren             -0.8325     0.7718  -1.079  0.28070   
LocationUS:AgeGroupChildren   0.4077     0.6807   0.599  0.54918   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) LctnUS AgGrpC
LocationUS  -0.072              
AgGrpChldrn -0.567  0.108       
LctnUS:AgGC  0.257 -0.331 -0.455

#Actual model: main effects of `Location` and `AgeGroup` tested without their interaction. 
m0 <- glmer(Choice=="Shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment),CurrExp,family="binomial")

boundary (singular) fit: see help('isSingular')

anova(m,m0)

Data: CurrExp
Models:
m0: Choice == "Shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment)
m: Choice == "Shape" ~ offset(Offset) + Location * AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment)
   npar    AIC    BIC  logLik deviance Chisq Df Pr(>Chisq)
m0   10 374.68 411.38 -177.34   354.68                    
m    11 376.32 416.69 -177.16   354.32 0.365  1     0.5457

summary(m0)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "Shape" ~ offset(Offset) + Location + AgeGroup + (1 +  
    Location + AgeGroup | Example) + (1 | Experiment)
   Data: CurrExp

     AIC      BIC   logLik deviance df.resid 
   374.7    411.4   -177.3    354.7      280 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4474 -0.8063  0.4086  0.9219  2.0617 

Random effects:
 Groups     Name             Variance Std.Dev. Corr       
 Experiment (Intercept)      0.0000   0.0000              
 Example    (Intercept)      0.1284   0.3583              
            LocationUS       0.2202   0.4693    0.82      
            AgeGroupChildren 0.9782   0.9890   -0.68 -0.13
Number of obs: 290, groups:  Experiment, 5; Example, 3

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)   
(Intercept)        0.4061     0.2999   1.354  0.17560   
LocationUS         1.2215     0.3844   3.178  0.00148 **
AgeGroupChildren  -0.6236     0.6569  -0.949  0.34239   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) LctnUS
LocationUS   0.030       
AgGrpChldrn -0.540 -0.060
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

Based on the calculations above, I successfully reproduced the original results: “This model…showed that U.S. participants were significantly more likely to generalize the label by shape \((\beta = 1.22\); \(p < .01)\) and no difference across age groups \((\beta = -.62\); \(p = .34)\)”.

Confirmatory analysis

I conducted one main confirmatory analysis:

A binomial test with chance set to 1/3 tested whether U.S. adults were significantly more likely to choose the shape match as opposed to the color or material matches within just this experiment. This test matches the statistical test run by the original authors. The results did/did not replicate with the original study.

#Summary statistics  
summary_data <- data_main_analyses_tidy |>  
  summarize(subjects=n(), 
            average=mean(as.numeric(participantAge), na.rm=TRUE),
            minage=min(as.numeric(participantAge), na.rm=TRUE),
            maxage=max(as.numeric(participantAge), na.rm=TRUE),
            stdev=sd(as.numeric(participantAge), na.rm=TRUE),
            shapeNum=sum(participantOverlapProperty=="shape"),
            materialNum=sum(participantOverlapProperty=="material"),
            colorNum=sum(participantOverlapProperty=="color"),
            shapePercent=shapeNum/subjects,
            matPercent=materialNum/subjects,
            colorPercent=colorNum/subjects,
            .groups = 'drop') |> 
    ungroup()

#Binomial test
binomial_result <- binom.test(summary_data$shapeNum, summary_data$subjects, p=1/3)

#Store values of 95% confidence intervals. 
conf_int <- binomial_result$conf.int # Confidence interval

Figure 1 shows the proportion of choices of an extension by the type of property (shape, color, material).

#Create dataframe for figure. 
figure_statistics <- data_main_analyses_tidy |>  
  group_by(participantOverlapProperty, participantLocation) |> 
  summarize(n = n(), proportion = n / nrow(data_main_analyses_tidy), .groups = 'drop') |> 
  ungroup()

#Plot the data for the first figure and save as a file. 
ggplot(figure_statistics, aes(y = proportion, x = participantLocation, fill = participantOverlapProperty)) +
  geom_bar(position = "stack", stat = 'identity', width = .7) +
  geom_errorbar(aes(ymin = binomial_result$conf.int[1], ymax = binomial_result$conf.int[2]), width = .2) +
  theme_minimal() +
  scale_fill_manual(values = c("#7F92B8", "#6C6969", "#B4DCB9")) +
  scale_x_discrete('Experiment 5: USA Adults') +
  scale_y_continuous('Percentage of responses', labels = scales::percent) +
  facet_wrap(~participantLocation) +
  geom_hline(yintercept = (1/3), linetype = 'dotted', color = 'black') +
  labs(fill = "Choice") +
  theme(legend.title = element_text(size = 12, face = "bold"))

img_path <- file.path(getwd(), '..', 'figures', 'choice_by_property_rescue.png')
ggsave(img_path, device = 'png', dpi=300)

Saving 7 x 5 in image

We also want to calculate the proportion of choice by property for each of the shapes (arch, lamp, snowman).

#Calculate the percentages of each choice by property and output into a dataframe. 
count_df <- data_main_analyses_tidy |> 
  group_by(exemplarShape, participantOverlapProperty, experiment) |> 
  summarize(count = n(), .groups = 'drop') |> 
  ungroup()

#Pivot the dataframe.
figure_statistics_by_shape_property <- pivot_wider(count_df, id_cols = exemplarShape, names_from = participantOverlapProperty, values_from = count)
figure_statistics_by_shape_property[is.na(figure_statistics_by_shape_property)] <- 0 #Make NULL values 0

#Calculate the proportion of responses for each property. 
figure_statistics_by_shape_property$totalCount <- rowSums(figure_statistics_by_shape_property[, -1]) #Calculate the total responses per property 
columns_to_calculate <- c("shape", "material", "color")  #All possible properties 
existing_columns <- intersect(columns_to_calculate, names(figure_statistics_by_shape_property)) #Find columns that actually exist
figure_statistics_by_shape_property <- figure_statistics_by_shape_property |> 
  mutate(across(all_of(existing_columns), ~ .x / totalCount, .names = "{.col}Proportion")) #Takes into account if properties don't exist. 

#Pivot the data. 
figure_statistics_by_shape_property_long <- figure_statistics_by_shape_property |> 
  pivot_longer(
    cols = any_of(c("shapeProportion", "materialProportion", "colorProportion")),
    names_to = "exemplarProperty",
    values_to = "proportion"
  )

#Clean up the property value names.
figure_statistics_by_shape_property_long$exemplarProperty <- sub("Proportion", "", figure_statistics_by_shape_property_long$exemplarProperty)

Figure 2 shows the proportion of choices by type of extension shape within the three property (shape, color, material) categories.

#Plot the data for the second figure and save as a file.
ggplot(figure_statistics_by_shape_property_long, aes(x = exemplarShape, y = proportion, fill = exemplarProperty)) +
  geom_bar(stat = "identity") + #use the actual proportions
  theme_minimal() + 
  scale_fill_manual(values = c("#7F92B8", "#6C6969", "#B4DCB9")) + #set the bar colors 
  scale_x_discrete('Extension Shape') + #x-label
  scale_y_continuous(labels = scales::percent) + # Convert y-axis labels to percent
  #facet_wrap(~participantLocation) + #group responses by population
  geom_hline(yintercept = (1/3), linetype = 'dotted', color = 'black') + #add dotted chance line 
  labs(x = "Shape", y = "Percentage of Responses", fill = "Material") +
  labs(fill = "Property Choice") + 
  theme(legend.title = element_text(size = 12, face = "bold"))

img_path <- file.path(getwd(), '..', 'figures', 'choice_by_shape_and_property_rescue.png')
ggsave(img_path, device = 'png', dpi=300)

Saving 7 x 5 in image

Exploratory analyses

Given that the first replication author observed stimulus-level random effects, the first exploratory analysis is a model that takes into account these random effects: glmer( Choice=="shape" ~ offset(Offset) + 1 + (1 + Example) ). This model tests, given an offset of 1/3, if the intercept is significantly different from 0 while accounting for random effects.

#Create a dataframe for the analysis. 
rescue_data <- data_main_analyses_tidy |> 
  select(Age = participantAge,
         Experiment = experiment, 
         Example = exemplarShape, 
         Choice = participantOverlapProperty, 
         AgeGroup = ageGroup, 
         Location = participantLocation)

#Add the offset column. 
rescue_data$Offset <- logit(1/3)

#Run the random effects model against chance. 
summary(glmer(Choice=="shape" ~ offset(Offset) + 1 + (1 | Example), rescue_data, family="binomial"))

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge with max|grad| = 0.00358313 (tol = 0.002, component 1)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "shape" ~ offset(Offset) + 1 + (1 | Example)
   Data: rescue_data

     AIC      BIC   logLik deviance df.resid 
    11.6     11.2     -3.8      7.6        4 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-0.7992 -0.5932 -0.5680  0.7965  1.2513 

Random effects:
 Groups  Name        Variance Std.Dev.
 Example (Intercept) 0.518    0.7197  
Number of obs: 6, groups:  Example, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.18555    0.01962  -9.459   <2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
optimizer (Nelder_Mead) convergence code: 0 (OK)
Model failed to converge with max|grad| = 0.00358313 (tol = 0.002, component 1)

I also ran a random effects model, with random intercepts for the exemplar object (Example), testing for a participant’s preference for shape as a function of the exemplar object: glmer( Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example) ). The main difference between the model in the first replication and this model is the use of the Offset variable, set to the log-odds probability of 1/3.

#Create a dataset to merge with the original dataset. 
rescue_data <- data_main_analyses_tidy |> 
  select(Age = participantAge,
         Experiment = experiment, 
         Example = exemplarShape, 
         Choice = participantOverlapProperty, 
         AgeGroup = ageGroup, 
         Location = participantLocation)

#Filter the original dataset for just the Tsimane adult data (Experiment 6, without distractors).   
tsimane_adult_data <- original_data |> 
  filter(Experiment == 'Tsi_Adults_Objects') |> 
  select(Age, Experiment, Example, Choice, AgeGroup, Location) |> 
  mutate(Example = tolower(Example), Choice = tolower(Choice)) #Make all values in these columns lowercase. 

#Combine the rescue USA adult data and the original data from Experiment 6 (N=39). 
appended_data <- rbind(rescue_data, tsimane_adult_data)

#Add the offset column. 
appended_data$Offset <- logit(1/3)

#Run the random effects model. 
summary(glmer(Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example), appended_data, family="binomial"))

boundary (singular) fit: see help('isSingular')

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "shape" ~ offset(Offset) + Location + (1 + Location |  
    Example)
   Data: appended_data

     AIC      BIC   logLik deviance df.resid 
    70.5     79.5    -30.2     60.5       40 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.0900 -0.9244 -0.7220  1.0092  1.3616 

Random effects:
 Groups  Name        Variance Std.Dev. Corr
 Example (Intercept) 0.2185   0.4675       
         LocationUS  0.5546   0.7447   1.00
Number of obs: 45, groups:  Example, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.5282     0.4263   1.239    0.215
LocationUS   -0.8666     1.4358  -0.604    0.546

Correlation of Fixed Effects:
           (Intr)
LocationUS 0.012 
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

I also ran the original paper’s logistic mixed-effects model that predicts the participant’s preference for the shape-match object: glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) ) by swapping out the USA adult data from the original paper with my new USA adult data but keeping all of the other data the same (USA children, Tsimane’ adults, and Tsimane’ children). In the original study, the authors used a baseline probability of 33.3% with the population (U.S. participants) and the age group (adults) dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment number, random intercepts for the exemplar object, random slopes for population as a function of exemplar, and random slopes for age group as a function of exemplar. This model found that, with a new sample of U.S. adults, they were [] to generalize the label of a novel object by shape \((\beta = X.XX\), \(p < X.XX)\).

#Filter the original dataset to remove the USA adult data (and not include two if the Tsimane children experiments).   
original_data_no_USA_adult <- original_data |> 
  filter(Experiment %in% c("Tsi_Adults_Objects","Tsi_Adults_wDistractor", "Tsi_Child_Objects","US_Children")) |> 
  filter(Choice!="Distractor") |> 
  select(Age, Experiment, Example, Choice, AgeGroup, Location) |> 
  mutate(Example = tolower(Example), Choice = tolower(Choice)) #Make all values in these columns lowercase. 

#Combine the rescue USA adult data and the original data (minus the US_Turk_Adult subjects). 
appended_data_full <- rbind(rescue_data, original_data_no_USA_adult)

#Add the offset column. 
appended_data_full$Offset <- logit(1/3)

#Run the logistic mixed-effects model. 
summary(glmer(Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment), appended_data_full, family="binomial"))

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
unable to evaluate scaled gradient

Warning in checkConv(attr(opt, "derivs"), opt$par, ctrl = control$checkConv, :
Model failed to converge: degenerate Hessian with 1 negative eigenvalues

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 +  
    Location + AgeGroup | Example) + (1 | Experiment)
   Data: appended_data_full

     AIC      BIC   logLik deviance df.resid 
   216.1    246.3    -98.0    196.1      142 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.5121 -0.8458 -0.5993  1.1791  1.6813 

Random effects:
 Groups     Name             Variance  Std.Dev.  Corr     
 Experiment (Intercept)      7.330e-07 0.0008562          
 Example    (Intercept)      4.785e-05 0.0069170          
            LocationUS       1.133e+00 1.0646353 0.51     
            AgeGroupChildren 2.188e-01 0.4677830 0.49 1.00
Number of obs: 152, groups:  Experiment, 5; Example, 3

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)
(Intercept)        0.3594     0.2233   1.609    0.108
LocationUS         0.9014     0.8003   1.126    0.260
AgeGroupChildren  -0.3196     0.4833  -0.661    0.508

Correlation of Fixed Effects:
            (Intr) LctnUS
LocationUS  -0.070       
AgGrpChldrn -0.407  0.210
optimizer (Nelder_Mead) convergence code: 0 (OK)
unable to evaluate scaled gradient
Model failed to converge: degenerate  Hessian with 1 negative eigenvalues

Three-panel graph with original, 1st replication, and your replication is ideal here

Discussion

Mini meta analysis

Combining across the original paper, 1st replication, and 2nd replication, what is the aggregate effect size?

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.