Replication of The Origins of the Shape Bias: Evidence From the Tsimane’ by Jara-Ettinger et al. (2022, Journal of Experimental Psychology: General)

Author

Emily Chen (emchen15@stanford.edu)

Published

May 2, 2024

Introduction

I chose to try to rescue the replication of this experiment because I am interested in the intersection of language and spatial/object reasoning. As a developmental cognitive scientist, I also found the comparison of adults and children to be an important element relevant to my research interests. Finally, the cross-cultural comparisons made in this paper are something that I’d like to explore further in my own research, outside of the context of this project.

The stimuli that will be required to collect additional data for this experiment are available in the repository of the original replication project. The prior replication collected data online from a planned sample of 144 U.S. adults, following the paradigm of Experiment 5 in the paper. It won’t be possible to collect additional data from Tsimane’ adults, so a replication of Experiments 6 and 7 cannot be performed. The new replication attempt will feature attention checks and a demographic questionnaire about the subject’s early geographic environment (e.g., urbanicity of current and childhood environments), given the paper’s theoretical claims about the importance of early environmental factors for the strength of the shape bias in children.

The PDF of the original paper can be found here. The project’s Github repository with all data, scripts, and the source code for this writeup can be found here. The preregistration can be found here. The experiment, hosted on cognition.run, can be found here.

Summary of prior replication attempt

The prior replication attempt targeted only Experiment 5 from the original paper (which had seven experiments in total); Experiment 5 tested U.S. adults online using Amazon’s Mechanical Turk. The replication report does not specify how its data were collected, but I assume they were collected using Prolific instead of MTurk.

The biggest difference between the original study and the first replication is that the two used different stimuli. The original paper used the images shown in Table 1, but the first replication used the images shown in Figure 1. The replication author contacted the first author of the study for the original image files, but they were lost, so the replication author used screenshots of the shapes taken from the figures in the paper.

The demographics of the sample were the same for both the original study (specified as U.S. adults on MTurk) and the first replication (specified as English-speaking adults from the U.S.). The original study had a sample size of N=144 U.S. adults. The first replication planned for N=144 but ended with N=142 U.S. adults: two participants did not complete the study, and the replication author did not collect two additional data points to replace them.

There are two main analyses:

  1. Calculating the percentages of participants who chose the extension object based on shape, material, or color congruency with the exemplar.

In the original study, the authors used a binomial test with chance set to 1/3 to test whether U.S. adults were significantly more likely to choose the shape match as opposed to the color or material matches within just the U.S. adult experiment. The first replication did not run this test.

  2. Running a logistic mixed-effects model that predicted the participant’s preference for the shape-match object.

In the original study, the authors used a baseline probability of 33.3%, with the population and the age group dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment (1 | Experiment), random intercepts for the exemplar object (Example), and random slopes for population and age group as a function of exemplar (1 + Location + AgeGroup | Example): glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) ). The population by age interaction effect (Location * AgeGroup) was dropped based on a nested model comparison; only the main effects of Location and AgeGroup were tested without their interaction (Location + AgeGroup). The Offset variable was set to the log-odds of a 1/3 (33.3%) probability, logit(1/3).

In the first replication, the author used a random effects model as their key analysis of interest, predicting a participant’s preference for shape as a function of population (Location), with random intercepts and random slopes for Location by exemplar object (Example): glmer( Choice == "shape" ~ Location + (1 + Location | Example) ). Notably, the first replication author did not include the offset(Offset) of the log-odds probability of 1/3. They also removed the random slopes for age group as a function of exemplar ((1 + Location | Example) instead of (1 + Location + AgeGroup | Example)) since they were not comparing the adult data to the child data. Using this random effects model, the first replication author obtained a nonsignificant result (estimate = 1.138, p = 0.318).

Methods

Power Analysis

Because of uncertainty about the true effect size, I am going to use a small telescopes approach and run 3 times the original sample size (N=144), for a total of N=432 participants, i.e., 8 subjects per condition (54 conditions in total). While the leading hypothesis for the failed first replication is the use of the wrong stimuli, running 3 times the sample size will also increase power.
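
As a rough, non-preregistered sanity check on this choice, the confirmatory binomial test can be simulated at N=432 under an assumed true shape-choice rate. The 0.40 rate below is a hypothetical assumption (much weaker than the original study’s 0.69), not an estimate from prior data.

# Illustrative power simulation for the exact binomial test against chance (1/3).
# The assumed true rate (0.40) is a hypothetical value, not a published estimate.
set.seed(123)
n_sims <- 5000
n_subj <- 432
p_assumed <- 0.40
p_values <- replicate(n_sims, 
                      binom.test(rbinom(1, n_subj, p_assumed), 
                                 n_subj, p = 1/3)$p.value)
mean(p_values < 0.05)  # Estimated power under the assumed effect.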

Planned Sample

The sample size for Experiment 5 will be N=432 U.S. adults, whose primary language is English, recruited on Prolific. I will stop the study when I reach 432 participants who complete the task. I will also collect the following demographic information from the adult participants:

  • Age: How old the participant is (in years).
  • Nationality:
    • If the participant was born in the USA.
    • If the participant spent more than a year in the USA before age 10.
    • If the participant currently lives in the USA.
  • Geographic Location:
    • The first three characters or digits of the participant’s current zipcode.
    • The first three characters or digits of the zipcode where the participant spent the most time in growing up.
  • Urbanicity:
    • How the participant would describe their current home location on a scale from 1-100, with 1 being very rural, 100 being very urban, and 50 being suburban.
    • How the participant would describe their childhood home location (if multiple, they are asked to choose the one they spent the most time in before 18 years old) on a scale from 1-100, with 1 being very rural, 100 being very urban, and 50 being suburban.
  • Languages:
    • The first language the participant learned.
    • All languages that the participant currently speaks.

Materials

“Stimuli consisted of solid objects that varied in shape, color, and material…Each experiment consisted of three example objects and three extension objects. Each participant saw only one example object (counterbalanced across participants) and all extension objects…Experiment 5…used photographs of the objects because it was conducted online” (Jara-Ettinger et al., 2022).

Procedure

Experiment 5 was a “one-shot learning trial and each participant completed one trial only. Although each trial only required one label, we used three different possible labels, randomized across participants. In the experiments with U.S. participants, the example object was called a koba, dax, or fep…Participants saw a single screen where the top said ‘This is a(n) x’ along with a picture of the object. Below, the text read ‘One of these is also a(n) x’ along with three pictures of the three possible extension choices. The text below read ‘Which one is the other x?’ Participants were allowed to select one of the three objects” (Jara-Ettinger et al., 2022).

Controls

“To ensure that participants were attending to the task, we also asked participants what each object shared in common with the original object. These questions were only included to motivate participants to look at the images carefully” (Jara-Ettinger et al., 2022). In the original study, answers to these questions were not used as exclusion criteria, but in this study, I will exclude participants who have more than one incorrect answer.

Analysis Plan

Participants will be excluded if they do not answer the main question of interest (choosing an extension object) or if they incorrectly answer more than one of the attention check questions asking how each extension object relates to the exemplar object. Data will be downloaded from Prolific, scrubbed of all identifying information (e.g., IP addresses), and then cleaned. I will conduct one main analysis and three exploratory analyses:

Main analysis:

  1. A binomial test with chance set to 1/3, testing whether U.S. adults are significantly more likely to choose the shape match as opposed to the color or material matches within just the U.S. adult experiment. This matches the statistical test run by the original authors.

Exploratory analyses:

  1. A random effects model that takes into account the stimulus-level random effects by testing against chance within just the U.S. adult sample. This model tests, given an offset of 1/3, if the intercept is significantly different from chance while accounting for random effects.

glmer( Choice=="shape" ~ offset(Offset) + 1 + (1 | Example) )

  2. A random effects model, with random intercepts for the exemplar object (Example), testing for a participant’s preference for shape as a function of the exemplar object and the population group (Location). The main difference between the model in the first replication and this model is the use of the Offset variable, set to the log-odds probability of 1/3.

glmer( Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example) )

  3. The original paper’s logistic mixed-effects model that predicts the participant’s preference for the shape-match object. I will swap out the USA adult data from the original paper with my new USA adult data but keep all of the other data the same (USA children, Tsimane’ adults, and Tsimane’ children).

glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) )

Key analysis of interest: I am primarily interested in replicating the result in the original paper, using the binomial test statistic described under “Main analysis” above, that U.S. adults are more likely to choose the extension object that is matched on the shape property compared to the color or material matches.

Differences from Original Study and First Replication

The only known major difference between this plan and the first replication is that the stimuli are photographs of the physical objects (as in the original study) rather than the screenshots of figure renderings used in the first replication. The random effects model will also include the Offset variable, set to the log-odds probability of 1/3. Ensuring that the stimuli match the original study may affect this rescue’s replication success.

The main difference between this plan and the original study is that I will collect more demographic information from participants, to be used in exploratory analyses (not conducted in the original study) if the main analysis of interest does not replicate. I will also exclude participants who answer more than one attention check question incorrectly when indicating what each object has in common with the exemplar object. These changes will hopefully (1) provide more background information to determine (if necessary) whether particular demographic factors are driving results that differ from the original paper and (2) ensure that subjects are paying attention to the stimuli and not guessing randomly.
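
The following is an illustrative, non-preregistered sketch of the kind of exploratory demographic analysis this enables (the column names follow the tidy dataset built in the Data preparation section below; the model form is my assumption, not part of the preregistered plan).

# Hypothetical exploratory model: does childhood urbanicity (1-100) predict 
  # choosing the shape match, with by-exemplar random intercepts? 
suppressMessages(library(lme4))
urbanicity_model <- glmer(
  (participant_overlap_property == 'shape') ~ 
    scale(as.numeric(participant_childhood_urbanicity)) + 
    (1 | exemplar_shape),
  data = data_main_analyses_tidy, 
  family = 'binomial')
summary(urbanicity_model)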

Overall, based on Figure 1 in this paper, this rescue replication is an exact replication, given that all aspects of the experiment under our control are the same relative to the original paper.

Methods Addendum (Post Data Collection)

Actual Sample

The data were collected in eight batches of N=54 subjects each, one at a time, so that subjects who participated in previous batches could be excluded from participating again. Each batch had 54 unique conditions, one for each subject who participated. The anticipated total number of subjects was N=432. The raw data contain N=429 subjects, but across the eight batches, N=426 subjects completed the study (N=3 subjects started the study but did not provide data). Subjects could opt out by not answering or not completing the study, and rows in the dataset that had NA values for the main question of interest were excluded. Despite implementing an exclusion rule for each subsequent experimental batch, N=9 subjects participated twice. However, since duplicate participation was not an exclusion criterion in the original analysis plan, these subjects were not removed from the final dataset.

Differences from pre-data collection methods plan

None.

Results

Data preparation

This section contains the code necessary to prepare the data for analysis. First, load in the raw data. The data were collected in 8 batches of 54 subjects, where each subject saw a different condition in each batch. The data were collected in batches because the free version of cognition.run limits the number of subjects that can be run at a time, so each batch contained one version of each counterbalancing option (3 objects x 3 names x 6 possible presentation orders). Prior to making the repository public, these data were stitched together into one CSV file and anonymized using /scripts/anonymize_cognition_run.R (the file name is full_sample_anonymized.csv).
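
For reference, here is a minimal sketch of what the stitching and anonymization step does (the actual logic lives in /scripts/anonymize_cognition_run.R; the per-batch file path and identifying column names below are assumptions for illustration only).

# Sketch only: read the per-batch CSVs, tag each with its batch index, 
  # drop potentially identifying fields, and write one combined file. 
suppressMessages(library(tidyverse))
batch_files <- list.files('../data/raw_batches', pattern = '\\.csv$', 
                          full.names = TRUE)  # assumed location of batch files
combined_data <- purrr::map_dfr(batch_files, read_csv, 
                                show_col_types = FALSE, .id = 'batch')
combined_data |> 
  select(-any_of(c('ip_address', 'user_agent'))) |>  # assumed identifier columns
  write_csv('../data/full_sample_anonymized.csv')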

# Load relevant libraries and functions. 
suppressMessages(library(tidyverse))
suppressMessages(library(dplyr))
suppressMessages(library(ggplot2))

# Load data. 
all_data <- read_csv('../data/full_sample_anonymized.csv',
                     show_col_types = FALSE)

Check the raw data to make sure that all of the images for the subjects loaded successfully (they should, or they would not have been able to complete the study).

suppressMessages(library(stringr))

# If true, then all images loaded successfully for all subjects.
print(paste("Images loaded successfully for all subjects:", 
            sum(str_detect(all_data$success, 'false'), na.rm = TRUE) == 0)) 
[1] "Images loaded successfully for all subjects: TRUE"

Check how many unique subjects there are based on their prolific_id. Note that subjects were not excluded if they did the study twice, as that was not part of the preregistered analysis plan.

subject_check <- all_data |> 
  group_by(prolific_id) |> 
  summarize(numRows = n())

# If a subject completed the study once, they should have 18 rows. 
repeat_subjects <- subset(subject_check, numRows > 18)

cat(nrow(repeat_subjects), "subjects did the study more than once.")
9 subjects did the study more than once.

Now clean the data. First remove columns that are unnecessary for the main confirmatory analyses and filter out subjects not relevant to the full sample (i.e., pilot subjects). Remove subjects who used phones to complete the study, since the exemplar and the extension stimuli are not displayed in one view on phones and thus participants would not be able to see all options at once. Then parse the demographic responses into columns.

# Since the data were collected in 8 batches, subject numbering restarts within 
  # each batch. Create a new column called `subject_id` that is unique across 
  # the combined dataset by assigning one ID per `prolific_id` (via data.table's .GRP).  

suppressMessages(library(data.table))

# Convert the dataframe to a data.table.
setDT(all_data)

# Create a new column 'subject_id' based on unique subject codes.
all_data[, subject_id := .GRP, by = prolific_id]

# Convert data.table back to a dataframe.
all_data <- as.data.frame(all_data)

# Remove subjects who used phones (device recorded as 'iPhone').
data_main_analyses <- all_data |> 
  filter(is.na(device) | device != 'iPhone') |> 
  select(c(-'device')) # Remove column since it isn't necessary anymore. 

# Put demographic responses (currently in the 'response' column) into their own columns. 
for(row_index in 1:nrow(data_main_analyses)) {
  
  # Create a variable for the response in the current row
  participant_response <- data_main_analyses[row_index, 'response'] 
  
  # Create a variable for the stimulus in the current row
  stimulus <- data_main_analyses[row_index, 'stimulus'] 
  
  # Check that the 'response' is not an empty string.
  if(nzchar(participant_response)) { 
    
    # Check if it's a survey response (has {} in the string) and update.
    if(grepl("{", participant_response, fixed = TRUE)) {
      
      # Attention check: left image. 
      if(grepl('extensionLeftImg_exemplar_commonalities', participant_response)) {
        
        # Get only the answer. 
        temp_response <- strsplit(participant_response, ':')[[1]][2] 
        
        # Trim the extra characters. 
        temp_response <- gsub('["}]', '', temp_response) 
        data_main_analyses[row_index, 
                           'extension_left_img_attn_check_response'] <- temp_response
      }
      
      # Attention check: center image. 
      else if(grepl('extensionCenterImg_exemplar_commonalities', participant_response)) {
        
        # Get only the answer. 
        temp_response <- strsplit(participant_response, ':')[[1]][2] 
        
        # Trim the extra characters. 
        temp_response <- gsub('["}]', '', temp_response) 
        data_main_analyses[row_index, 
                           'extension_center_img_attn_check_response'] <- temp_response
      }
      
      # Attention check: right image.
      else if(grepl('extensionRightImg_exemplar_commonalities', participant_response)) {
        
        # Get only the answer. 
        temp_response <- strsplit(participant_response, ':')[[1]][2] 
        
        # Trim the extra characters. 
        temp_response <- gsub('["}]', '', temp_response) 
        data_main_analyses[row_index, 
                           'extension_right_img_attn_check_response'] <- temp_response
      }
      
      # Participant age.
      else if(grepl('Age', participant_response)) {
        
        # Get only the answer. 
        temp_response <- strsplit(participant_response, ':')[[1]][2] 
        
        # Trim the extra characters. 
        temp_response <- gsub('["}]', '', temp_response) 
        data_main_analyses[row_index, 
                           'participant_age'] <- temp_response
      }
      
      # Geographical location.
      else if(grepl('CurrentUSA', participant_response)) {
        # List of responses for BornUSA, ChildhoodUSA, and CurrentUSA.
        temp_response <- strsplit(participant_response, ',') 
        
        # Participant birth location. 
        if(grepl('BornUSA', temp_response[[1]][1])) {
          
          # Get only the answer for BornUSA. 
          geog_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] 
          
          # Trim the extra characters. 
          geog_temp_response <- gsub('["}]', '', geog_temp_response) 
          data_main_analyses[row_index, 
                             'participant_born_USA'] <- geog_temp_response
        }
        
        # Participant childhood location. 
        if(grepl('ChildhoodUSA', temp_response[[1]][2])) {
          
          # Get only the answer for ChildhoodUSA.  
          geog_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] 
          
          # Trim the extra characters. 
          geog_temp_response <- gsub('["}]', '', geog_temp_response) 
          data_main_analyses[row_index, 
                             'participant_childhood_USA'] <- geog_temp_response
        }
        
        # Participant current location. 
        if(grepl('CurrentUSA', temp_response[[1]][3])) {
          
          # Get only the answer for CurrentUSA. 
          geog_temp_response <- strsplit(temp_response[[1]][3], ':')[[1]][2] 
          
          # Trim the extra characters. 
          geog_temp_response <- gsub('["}]', '', geog_temp_response) 
          data_main_analyses[row_index, 
                             'participant_current_USA'] <- geog_temp_response
        }
      }
      
      # Zipcodes. 
      else if(grepl('CurrentZipcode', participant_response)) {
        
        #List of responses for CurrentZipcode and ChildhoodZipcode.
        temp_response <- strsplit(participant_response, ',') 
        
        # Participant current zipcode. 
        if(grepl('CurrentZipcode', temp_response[[1]][1])) {
          
          # Get only the answer for CurrentZipcode.  
          zipcode_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] 
          
          # Trim the extra characters. 
          zipcode_temp_response <- gsub('["}]', '', zipcode_temp_response)  
          data_main_analyses[row_index, 
                             'participant_current_zipcode'] <- zipcode_temp_response
        }
        
        # Participant childhood zipcode.  
        if(grepl('ChildhoodZipcode', temp_response[[1]][2])) {
          
          # Get only the answer for ChildhoodZipcode.  
          zipcode_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] 
          
          # Trim the extra characters. 
          zipcode_temp_response <- gsub('["}]', '', zipcode_temp_response) 
          data_main_analyses[row_index, 
                             'participant_childhood_zipcode'] <- zipcode_temp_response
        }
      }
      
      # Languages. 
      else if(grepl('FirstLanguage', participant_response)) {
        
        #List of responses for FirstLanguage and AllLanguages.
        temp_response <- strsplit(participant_response, '",') 

        # Participant first language. 
        if(grepl('FirstLanguage', temp_response[[1]][1])) {
          
          # Get only the answer for FirstLanguage. 
          language_temp_response <- strsplit(temp_response[[1]][1], ':')[[1]][2] 
          
          # Trim the extra characters. 
          language_temp_response <- gsub('["}]', '', language_temp_response) 
          data_main_analyses[row_index, 
                             'participant_first_language'] <- language_temp_response
        }
        
        # Participant all languages spoken.  
        if(grepl('AllLanguages', temp_response[[1]][2])) {
          
          # Get only the answer for AllLanguages.
          language_temp_response <- strsplit(temp_response[[1]][2], ':')[[1]][2] 
          
          # Trim the extra characters. 
          language_temp_response <- gsub('["}]', '', language_temp_response) 
          data_main_analyses[row_index, 
                             'participant_current_languages'] <- language_temp_response
        }
      }
    }
    
    # Check if it's a numerical response for participant's current location urbanicity rating (1-100). 
    else if(grepl("^[0-9]+$", participant_response) & 
            (grepl('currently live', stimulus, fixed = TRUE))) {
      data_main_analyses[row_index, 
                         'participant_current_urbanicity'] <- participant_response
    }
    
    # Check if it's a numerical response for participant's childhood location urbanicity rating (1-100). 
    else if(grepl("^[0-9]+$", participant_response) & 
            (grepl('grew up', stimulus, fixed = TRUE))) {
      # Participant's childhood location urbanicity. 
      data_main_analyses[row_index, 
                         'participant_childhood_urbanicity'] <- participant_response
    }
    
    # Check if it's an image response (choosing the shape image). 
    else if(grepl("^[0-9]+$", participant_response) & (grepl('img', stimulus, fixed = TRUE))) {
      data_main_analyses[row_index, 
                         'participant_extension_choice'] <- participant_response
    }
  }

# Interpret the participant_extension_choice column to make the numerical answer choice values meaningful.
  participant_image_choice <- data_main_analyses[row_index, 
                                                 'participant_extension_choice']
  if(!is.na(participant_image_choice) && length(participant_image_choice) > 0) {
    
    # Participant chose the left image. 
    if(participant_image_choice == "0") {
      data_main_analyses[row_index, 
                         'participant_extension_choice_image'] <- data_main_analyses[row_index, 'extension_left_img']
    }
    
    # Participant chose the center image.
    else if(participant_image_choice == "1") {
      data_main_analyses[row_index, 
                         'participant_extension_choice_image'] <- data_main_analyses[row_index, 'extension_center_img']
    }
    
    # Participant chose the right image. 
    else if(participant_image_choice == "2") {
      data_main_analyses[row_index, 
                         'participant_extension_choice_image'] <- data_main_analyses[row_index, 'extension_right_img']
    }
  }
}

# Combine subject rows into a single row and remove the extra columns. 
data_main_analyses_tidy <- data_main_analyses |> 
  select(-rt, -trial_type, -success, -stimulus, -response) |> 
  # Replace empty strings with NA. 
  mutate(across(everything(), ~ ifelse(. == "", NA, .))) |> 
  group_by(subject_id) |> 
  summarize(across(everything(), ~ if(all(is.na(.))) {NA} else {na.omit(.)[1]}), 
            .groups = "drop")

Check the pre-processed data to see whether there is an equal spread of subjects per condition (some imbalance is possible given the repeat participants and incomplete responses noted above).

# Check that there are enough subjects per condition.
condition_check <- data_main_analyses_tidy |> 
  group_by(condition) |> 
  # Count the number of times the condition comes up. 
  summarize(numSubjectsPerCondition = n()) 
  

print(paste("There are an equal number of subjects per condition:", all(condition_check$numSubjectsPerCondition == 
  condition_check$numSubjectsPerCondition[1]))) 
[1] "There are an equal number of subjects per condition: FALSE"
  # If true, then there are an equal number of subjects per condition.

Remove the rows that are empty (no data collected).

# Check if there are null (NA) values in participant_extension_choice.  
null_values <- sum(is.na(data_main_analyses_tidy$participant_extension_choice))

# Print the result. 
if (null_values > 0) {
  cat("Out of", nrow(data_main_analyses_tidy), 
      "subjects, there are", null_values, 
      "null values in the column 'participant_extension_choice.'.\n")
} else {
  cat("Out of", nrow(data_main_analyses_tidy), 
      "subjects, there are no null values in the column 'participant_extension_choice.'.\n")
}
Out of 429 subjects, there are 3 null values in the column 'participant_extension_choice.'.
# Remove rows that don't have a response in participant_extension_choice.
data_main_analyses_tidy <- 
  data_main_analyses_tidy[!is.na(data_main_analyses_tidy$participant_extension_choice), ]
  
# Check if the data were removed. 
null_values <- sum(is.na(data_main_analyses_tidy$participant_extension_choice))

# Print the result. 
if (null_values > 0) {
  cat("After data cleaning, there are", 
      null_values, "null values in the column 'participant_extension_choice' and", nrow(data_main_analyses_tidy), "total subjects. \n")
} else {
  cat("After data cleaning, there are no null values in the column 'participant_extension_choice' and", nrow(data_main_analyses_tidy), "total subjects. \n")
}
After data cleaning, there are no null values in the column 'participant_extension_choice' and 426 total subjects. 

Now that the data are cleaned and in a tidy format, figure out whether participants were choosing based on color, material, or shape. To do so, compare exemplar_img and participant_extension_choice_image to find the common property between the exemplar and the extension choice. Put the overlapping information into a new column called participant_overlap_property.

# Go through each subject. 
for(row_index in 1:nrow(data_main_analyses_tidy)) {
  # Get the exemplar the participant saw.
  participant_exemplar <- 
    unlist(strsplit(as.character(data_main_analyses_tidy[row_index, 
                                                         'exemplar_img']), '_'))
  
  # Get the extension the participant chose. 
  participant_extension_choice <- 
    unlist(strsplit(as.character(data_main_analyses_tidy[row_index, 
                                                         'participant_extension_choice_image']), '_'))
  
  # Get the shape of the exemplar object and put into a new column called exemplar_shape. 
  exemplar_shape <- participant_exemplar[3]
  data_main_analyses_tidy[row_index, 'exemplar_shape'] <- exemplar_shape
  
  # Get the overlapping property. 
  overlap <- intersect(participant_exemplar, participant_extension_choice)
  
  # Put the overlapping property into a new column. 
  data_main_analyses_tidy[row_index, 'participant_overlap_answer'] <- overlap
  
  # Interpret the overlapping answer
  if(overlap == 'red' | overlap == 'blue' | overlap == 'yellow') {
    data_main_analyses_tidy[row_index, 'participant_overlap_property'] <- 'color' 
  }
  
  else if(overlap == 'crepe' | overlap == 'foam' | overlap == 'yarn') {
    data_main_analyses_tidy[row_index, 'participant_overlap_property'] <- 'material'
  }

  else if(overlap == 'arch' | overlap == 'lamp' | overlap == 'snowman') {
    data_main_analyses_tidy[row_index, 'participant_overlap_property'] <- 'shape'
  }
}

Finally, reorder the columns into a logical order and add information about the participant group and experiment type.

data_main_analyses_tidy <- data_main_analyses_tidy |> 
  mutate(participant_location = 'US', 
         experiment = 'USA_Adults', 
         age_group = 'Adults', 
         exclude = 0) |> 
  select(subject_id, condition, experiment, 
         age_group, participant_location,
         exemplar_name, exemplar_img, exemplar_shape, 
         participant_overlap_answer,
         participant_overlap_property,
         participant_extension_choice,
         participant_extension_choice_image,
         extension_left_img, extension_center_img, extension_right_img,
         extension_left_img_attn_check_response,
         extension_center_img_attn_check_response,
         extension_right_img_attn_check_response,
         participant_age, participant_born_USA,
         participant_childhood_USA, participant_current_USA,
         participant_current_zipcode, participant_childhood_zipcode,
         participant_first_language, participant_current_languages,
         participant_current_urbanicity, participant_childhood_urbanicity,
         exclude)

Results of control measures

I examined the answers to the attention check question “what does each object share in common with the original object?” to verify that participants understood the task and were paying attention. I manually excluded participants who answered these questions incorrectly more than once. The default exclude value is 0.
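
For completeness, here is a minimal sketch of how flagged participants would be dropped before the confirmatory analyses (illustrative only; the vector of flagged IDs below is a placeholder, not a record of actual exclusions).

# Sketch: mark subjects flagged during manual review of the attention-check 
  # columns and keep only unflagged subjects. 
flagged_subject_ids <- c()  # placeholder, e.g., c(12, 87) after manual review
data_main_analyses_tidy <- data_main_analyses_tidy |> 
  mutate(exclude = ifelse(subject_id %in% flagged_subject_ids, 1, exclude)) |> 
  filter(exclude == 0)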

Reproduction of original analyses

I want to make sure that I can get the same results as the original authors from their analysis code. Here, I am going to load in the original data and adapt the code from the original authors, accessible here.

# Load libraries. 
suppressMessages(library(tidyverse))
suppressMessages(library(boot))
suppressMessages(library(lme4))

# Load data. 
original_data_file <- "Original_ShapeBias_Data.csv"
original_data_path <- file.path(getwd(), '..', 'data', original_data_file)
original_data <- read.csv(original_data_path)

table(original_data$Experiment) #Contains data from all 7 experiments.

       Tsi_Adults_Objects    Tsi_Adults_wDistractor         Tsi_Child_Objects 
                       39                        41                        37 
   Tsi_Child_ShapeVsColor Tsi_Child_ShapeVsMaterial               US_Children 
                       30                        42                        30 
           US_Turk_Adults 
                      144 
# Number of participants. 
original_data |> group_by(AgeGroup,Location) |> 
  summarize(subjects=n(), .groups = 'drop')
# A tibble: 4 × 3
  AgeGroup Location subjects
  <chr>    <chr>       <int>
1 Adults   Bolivia        80
2 Adults   US            144
3 Children Bolivia       109
4 Children US             30
# Stats by experiment. 
summary_data_original <- original_data |> 
  group_by(Experiment) |> 
  summarize(subjects=n(), 
            average=mean(Age,na.rm=TRUE),
            minage=min(Age,na.rm=TRUE),
            maxage=max(Age,na.rm=TRUE),
            stdev=sd(Age,na.rm=TRUE),
            ShapeNo=sum(Choice=="Shape"),
            MatNo=sum(Choice=="Material"),
            ColorNo=sum(Choice=="Color"),
            DistNo=sum(Choice=="Distractor"),
            ShapePerc=ShapeNo/subjects,
            MatPerc=MatNo/subjects,
            ColorPerc=ColorNo/subjects,
            DistPerc=DistNo/subjects)
summary_data_original$ExpNo <- c(6,7,2,4,3,1,5)
summary_data_original <- arrange(summary_data_original,ExpNo)

# Print the summary of results. 
summary_data_original |> 
  dplyr::select(ExpNo, Experiment, subjects, ShapeNo, MatNo, ColorNo, DistNo, 
                ShapePerc, MatPerc, ColorPerc, DistPerc)
# A tibble: 7 × 11
  ExpNo Experiment       subjects ShapeNo MatNo ColorNo DistNo ShapePerc MatPerc
  <dbl> <chr>               <int>   <int> <int>   <int>  <int>     <dbl>   <dbl>
1     1 US_Children            30      18    10       2      0     0.6     0.333
2     2 Tsi_Child_Objec…       37      12    16       9      0     0.324   0.432
3     3 Tsi_Child_Shape…       42      12    27       0      3     0.286   0.643
4     4 Tsi_Child_Shape…       30      10     0      17      3     0.333   0    
5     5 US_Turk_Adults        144      99    41       4      0     0.688   0.285
6     6 Tsi_Adults_Obje…       39      18     8      13      0     0.462   0.205
7     7 Tsi_Adults_wDis…       41      17    12      11      1     0.415   0.293
# ℹ 2 more variables: ColorPerc <dbl>, DistPerc <dbl>
# Reproduce the binomial test for Experiment 5: US_Turk_Adults.
CurrExp <- summary_data_original |> 
  filter(ExpNo==5)
binom.test(CurrExp$ShapeNo, CurrExp$subjects, p=1/3)

    Exact binomial test

data:  CurrExp$ShapeNo and CurrExp$subjects
number of successes = 99, number of trials = 144, p-value < 2.2e-16
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
 0.6050180 0.7620797
sample estimates:
probability of success 
                0.6875 
# This model predicts the participant's preference for the shape-match object. 
CurrExp <- original_data |> 
  filter(Experiment %in% c("Tsi_Adults_Objects", 
                           "Tsi_Adults_wDistractor", 
                           "US_Turk_Adults", 
                           "Tsi_Child_Objects", 
                           "US_Children")) |> 
  filter(Choice!="Distractor")
CurrExp$Offset <- logit(1/3)

m <- glmer(Choice=="Shape" ~ offset(Offset) + Location * AgeGroup + 
             (1 + Location + AgeGroup| Example) + (1 | Experiment), 
           CurrExp, family="binomial")
summary(m)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "Shape" ~ offset(Offset) + Location * AgeGroup + (1 +  
    Location + AgeGroup | Example) + (1 | Experiment)
   Data: CurrExp

     AIC      BIC   logLik deviance df.resid 
   376.3    416.7   -177.2    354.3      279 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.3934 -0.8253  0.4178  0.8957  2.2925 

Random effects:
 Groups     Name             Variance  Std.Dev.  Corr       
 Experiment (Intercept)      1.540e-08 0.0001241            
 Example    (Intercept)      1.230e-01 0.3506452            
            LocationUS       2.197e-01 0.4687695  0.81      
            AgeGroupChildren 1.089e+00 1.0436391 -0.67 -0.11
Number of obs: 290, groups:  Experiment, 5; Example, 3

Fixed effects:
                            Estimate Std. Error z value Pr(>|z|)   
(Intercept)                   0.4550     0.3073   1.480  0.13877   
LocationUS                    1.1373     0.4080   2.787  0.00532 **
AgeGroupChildren             -0.8325     0.7718  -1.079  0.28070   
LocationUS:AgeGroupChildren   0.4077     0.6807   0.599  0.54918   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) LctnUS AgGrpC
LocationUS  -0.072              
AgGrpChldrn -0.567  0.108       
LctnUS:AgGC  0.257 -0.331 -0.455
# Actual model: main effects of `Location` and `AgeGroup` tested without their interaction. 
m0 <- glmer(Choice=="Shape" ~ offset(Offset) + Location + AgeGroup + 
              (1 + Location + AgeGroup | Example) + (1 | Experiment), 
            CurrExp, family="binomial")
boundary (singular) fit: see help('isSingular')
anova(m,m0)
Data: CurrExp
Models:
m0: Choice == "Shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment)
m: Choice == "Shape" ~ offset(Offset) + Location * AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment)
   npar    AIC    BIC  logLik deviance Chisq Df Pr(>Chisq)
m0   10 374.68 411.38 -177.34   354.68                    
m    11 376.32 416.69 -177.16   354.32 0.365  1     0.5457
summary(m0)
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "Shape" ~ offset(Offset) + Location + AgeGroup + (1 +  
    Location + AgeGroup | Example) + (1 | Experiment)
   Data: CurrExp

     AIC      BIC   logLik deviance df.resid 
   374.7    411.4   -177.3    354.7      280 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4474 -0.8063  0.4086  0.9219  2.0617 

Random effects:
 Groups     Name             Variance Std.Dev. Corr       
 Experiment (Intercept)      0.0000   0.0000              
 Example    (Intercept)      0.1284   0.3583              
            LocationUS       0.2202   0.4693    0.82      
            AgeGroupChildren 0.9782   0.9890   -0.68 -0.13
Number of obs: 290, groups:  Experiment, 5; Example, 3

Fixed effects:
                 Estimate Std. Error z value Pr(>|z|)   
(Intercept)        0.4061     0.2999   1.354  0.17560   
LocationUS         1.2215     0.3844   3.178  0.00148 **
AgeGroupChildren  -0.6236     0.6569  -0.949  0.34239   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) LctnUS
LocationUS   0.030       
AgGrpChldrn -0.540 -0.060
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')

Based on the calculations above, I successfully reproduced the original results: “This model…showed that U.S. participants were significantly more likely to generalize the label by shape \((\beta = 1.22\); \(p < .01)\) and no difference across age groups \((\beta = -.62\); \(p = .34)\)”.

Confirmatory analysis

I conducted one main confirmatory analysis:

  1. A binomial test with chance set to 1/3 tested whether U.S. adults were significantly more likely to choose the shape match as opposed to the color or material matches within just this experiment. This test matches the statistical test run by the original authors. The result replicated the original finding: participants chose the shape match significantly more often than chance \((P_{success} = 0.45, p < 0.001)\).

Based on only this confirmatory analysis and for the purposes of the larger rescue project effort, this rescue replicated fully and can be assigned a score of 1 on a 0-1 rating scale.

# Summary statistics  
summary_data <- data_main_analyses_tidy |>  
  summarize(subjects=n(), 
            average=mean(as.numeric(participant_age), na.rm=TRUE),
            minage=min(as.numeric(participant_age), na.rm=TRUE),
            maxage=max(as.numeric(participant_age), na.rm=TRUE),
            stdev=sd(as.numeric(participant_age), na.rm=TRUE),
            shapeNum=sum(participant_overlap_property=="shape"),
            materialNum=sum(participant_overlap_property=="material"),
            colorNum=sum(participant_overlap_property=="color"),
            shapePercent=shapeNum/subjects,
            matPercent=materialNum/subjects,
            colorPercent=colorNum/subjects,
            .groups = 'drop') |> 
    ungroup()

# Binomial test
binomial_result <- binom.test(summary_data$shapeNum, 
                              summary_data$subjects, p=1/3)
binomial_result

    Exact binomial test

data:  summary_data$shapeNum and summary_data$subjects
number of successes = 191, number of trials = 426, p-value = 9.3e-07
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
 0.4004589 0.4969768
sample estimates:
probability of success 
             0.4483568 
# Store values of 95% confidence intervals. 
conf_int <- binomial_result$conf.int 

Figure 1: Rescue Results for Shape Preference

Figure 1 shows the proportion of extension choices by property type (shape, color, material).

#Create dataframe for figure. 
figure_statistics <- data_main_analyses_tidy |>  
  group_by(participant_overlap_property, participant_location) |> 
  summarize(n = n(), proportion = n / nrow(data_main_analyses_tidy), 
            .groups = 'drop') |> 
  ungroup()

#Plot the data for the first figure and save as a file. 
ggplot(figure_statistics, aes(y = proportion, x = participant_location, fill = participant_overlap_property)) +
  geom_bar(position = "stack", stat = 'identity', width = .7) +
  geom_errorbar(aes(ymin = conf_int[1], ymax = conf_int[2]), width = .2) +
  theme_minimal() +
  scale_fill_manual(values = c("#B4DCB9", "#7F92B8", "#6C6969")) +
  scale_x_discrete('Experiment 5: USA Adults') +
  scale_y_continuous('Percentage of responses', labels = scales::percent) +
  facet_wrap(~participant_location) +
  geom_hline(yintercept = (1/3), linetype = 'dotted', color = 'black') +
  labs(fill = "Choice") +
  theme(legend.title = element_text(size = 12, face = "bold"))

img_path <- file.path(getwd(), '..', 'figures', 'choice_by_property_rescue.png')
ggsave(img_path, device = 'png', dpi=300)
Saving 7 x 5 in image

We also want to calculate the proportion of choices by property for each of the exemplar shapes (arch, lamp, snowman). The results partially replicated the original study: participants chose the shape match above chance for the arch and lamp exemplars \((P_{success_{arch}} = 0.64, p < 0.001)\) and \((P_{success_{lamp}} = 0.51, p < 0.001)\), respectively, but below chance for the snowman exemplar \((P_{success_{snowman}} = 0.21, p < 0.01)\).

# Calculate the percentages of each choice by property and output into a dataframe. 
count_df <- data_main_analyses_tidy |> 
  group_by(exemplar_shape, participant_overlap_property, experiment) |> 
  summarize(count = n(), .groups = 'drop') |> 
  ungroup()

# Pivot the dataframe.
figure_statistics_by_shape_property <- pivot_wider(count_df, 
                                                   id_cols = exemplar_shape, 
                                                   names_from = participant_overlap_property, 
                                                   values_from = count)

#Make NULL values 0. 
figure_statistics_by_shape_property[is.na(figure_statistics_by_shape_property)] <- 0 

# Calculate the proportion of responses for each property. 
figure_statistics_by_shape_property$totalCount <- rowSums(figure_statistics_by_shape_property[, -1])

# All possible properties. 
columns_to_calculate <- c("shape", "material", "color")

# Find columns that actually exist. 
existing_columns <- intersect(columns_to_calculate, names(figure_statistics_by_shape_property)) 

# Takes into account if properties don't exist. 
figure_statistics_by_shape_property <- figure_statistics_by_shape_property |> 
  mutate(across(all_of(existing_columns), ~ .x / totalCount, .names = "{.col}Proportion")) 

# Pivot the data. 
figure_statistics_by_shape_property_long <- figure_statistics_by_shape_property |> 
  pivot_longer(
    cols = any_of(c("shapeProportion", "materialProportion", "colorProportion")),
    names_to = "exemplarProperty",
    values_to = "proportion"
  )

# Clean up the property value names.
figure_statistics_by_shape_property_long$exemplarProperty <- sub("Proportion", "", figure_statistics_by_shape_property_long$exemplarProperty)

# Binomial tests. 
binomial_result_arch <- binom.test(figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplar_shape == 'arch'], figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplar_shape == 'arch'], p=1/3)
binomial_result_arch

    Exact binomial test

data:  figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplar_shape == "arch"] and figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplar_shape == "arch"]
number of successes = 86, number of trials = 135, p-value = 6.72e-13
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
 0.5499125 0.7180067
sample estimates:
probability of success 
              0.637037 
binomial_result_lamp <- binom.test(figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplar_shape == 'lamp'], 
                                   figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplar_shape == 'lamp'], p=1/3)
binomial_result_lamp

    Exact binomial test

data:  figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplar_shape == "lamp"] and figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplar_shape == "lamp"]
number of successes = 75, number of trials = 147, p-value = 1.015e-05
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
 0.4265102 0.5934785
sample estimates:
probability of success 
             0.5102041 
binomial_result_snowman <- binom.test(figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplar_shape == 'snowman'], figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplar_shape == 'snowman'], p=1/3)
binomial_result_snowman

    Exact binomial test

data:  figure_statistics_by_shape_property$shape[figure_statistics_by_shape_property$exemplar_shape == "snowman"] and figure_statistics_by_shape_property$totalCount[figure_statistics_by_shape_property$exemplar_shape == "snowman"]
number of successes = 30, number of trials = 144, p-value = 0.001367
alternative hypothesis: true probability of success is not equal to 0.3333333
95 percent confidence interval:
 0.1451945 0.2838570
sample estimates:
probability of success 
             0.2083333 
# Store values of 95% confidence intervals. 
conf_int_arch <- binomial_result_arch$conf.int 
conf_int_lamp <- binomial_result_lamp$conf.int 
conf_int_snowman <- binomial_result_snowman$conf.int 

Figure 2: Rescue Results for Shape Preference by Shape Type

Figure 2 shows the proportion of choices by property (shape, color, material) for each exemplar shape (arch, lamp, snowman).

# Plot the data for the second figure and save as a file.
ggplot(figure_statistics_by_shape_property_long, 
       aes(x = exemplar_shape, 
           y = proportion, 
           fill = exemplarProperty)) +
  geom_bar(stat = "identity") + 
  geom_errorbar(aes(ymin = ifelse(exemplar_shape == "arch", 
                                  conf_int_arch[1], 
                                  ifelse(exemplar_shape == "lamp", 
                                         conf_int_lamp[1],
                                         conf_int_snowman[1])),
                    ymax = ifelse(exemplar_shape == "arch", 
                                  conf_int_arch[2],
                                  ifelse(exemplar_shape == "lamp", 
                                         conf_int_lamp[2],
                                         conf_int_snowman[2]))),
                width = .2) +
  theme_minimal() + 
  scale_fill_manual(values = 
                      c("#B4DCB9", "#7F92B8", "#6C6969")) + 
  scale_x_discrete('Extension Shape') + 
  scale_y_continuous(labels = scales::percent) + 
  geom_hline(yintercept = (1/3), 
             linetype = 'dotted', 
             color = 'black') + 
  labs(x = "Shape", 
       y = "Percentage of Responses", 
       fill = "Material") +
  labs(fill = "Property Choice") + 
  theme(legend.title = element_text(size = 12, face = "bold"))

img_path <- file.path(getwd(), '..', 'figures', 'choice_by_shape_and_property_rescue.png')
ggsave(img_path, device = 'png', dpi=300)
Saving 7 x 5 in image

Exploratory analyses

  1. Given that the first replication author observed stimulus-level random effects, the first exploratory analysis is a model that takes these random effects into account: glmer( Choice=="shape" ~ offset(Offset) + 1 + (1 | Example) ). Given the offset of logit(1/3), this model tests whether the intercept differs significantly from chance while accounting for random effects. The model yielded a nonsignificant result \((\beta = 0.45, p = 0.33)\).
# Create a dataframe for the analysis. 
rescue_data <- data_main_analyses_tidy |> 
  select(Age = participant_age,
         Experiment = experiment, 
         Example = exemplar_shape, 
         Choice = participant_overlap_property, 
         AgeGroup = age_group, 
         Location = participant_location)

# Add the offset column. 
rescue_data$Offset <- logit(1/3)

# Run the random effects model against chance. 
summary(glmer(Choice=="shape" ~ offset(Offset) + 1 + (1 | Example), 
              rescue_data, 
              family="binomial"))
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "shape" ~ offset(Offset) + 1 + (1 | Example)
   Data: rescue_data

     AIC      BIC   logLik deviance df.resid 
   543.9    552.0   -269.9    539.9      424 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.2986 -1.0145 -0.5311  0.7701  1.8827 

Random effects:
 Groups  Name        Variance Std.Dev.
 Example (Intercept) 0.6085   0.7801  
Number of obs: 426, groups:  Example, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)   0.4531     0.4626    0.98    0.327
  2. I also ran a random effects model predicting a participant’s preference for shape as a function of the population group (Location), with random intercepts and random slopes for Location by exemplar object (Example): glmer( Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example) ). The main difference between the model in the first replication and this model is the use of the Offset variable, set to the log-odds probability of 1/3. The model yielded a nonsignificant effect of Location \((\beta = -0.063, p = 0.86)\).
# Create a dataset to merge with the original dataset. 
rescue_data <- data_main_analyses_tidy |> 
  select(Age = participant_age,
         Experiment = experiment, 
         Example = exemplar_shape, 
         Choice = participant_overlap_property, 
         AgeGroup = age_group, 
         Location = participant_location)

# Filter the original dataset for just the Tsimane adult data (Experiment 6, without distractors).   
tsimane_adult_data <- original_data |> 
  filter(Experiment == 'Tsi_Adults_Objects') |> 
  select(Age, Experiment, Example, Choice, AgeGroup, Location) |> 
  mutate(Example = tolower(Example), Choice = tolower(Choice)) 

# Combine the rescue USA adult data and the original data from Experiment 6 (N=39). 
appended_data <- rbind(rescue_data, tsimane_adult_data)

# Add the offset column. 
appended_data$Offset <- logit(1/3)

# Run the random effects model. 
summary(glmer(Choice=="shape" ~ offset(Offset) + Location + (1 + Location | Example), 
              appended_data, 
              family="binomial"))
boundary (singular) fit: see help('isSingular')
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "shape" ~ offset(Offset) + Location + (1 + Location |  
    Example)
   Data: appended_data

     AIC      BIC   logLik deviance df.resid 
   600.3    621.0   -295.1    590.3      460 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.2852 -1.0285 -0.5274  0.8094  1.8961 

Random effects:
 Groups  Name        Variance Std.Dev. Corr
 Example (Intercept) 0.40041  0.6328       
         LocationUS  0.02211  0.1487   1.00
Number of obs: 465, groups:  Example, 3

Fixed effects:
            Estimate Std. Error z value Pr(>|z|)
(Intercept)  0.51345    0.49754   1.032    0.302
LocationUS  -0.06279    0.36386  -0.173    0.863

Correlation of Fixed Effects:
           (Intr)
LocationUS -0.456
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
  3. I also ran the original paper’s logistic mixed-effects model that predicts the participant’s preference for the shape-match object (see equation below), swapping out the USA adult data from the original paper with my new USA adult data but keeping all of the other data the same (USA children, Tsimane’ adults, and Tsimane’ children). In the original study, the authors used a baseline probability of 33.3%, with the population (U.S. participants) and the age group (adults) dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment, random intercepts for the exemplar object, random slopes for population as a function of exemplar, and random slopes for age group as a function of exemplar. With the new sample of U.S. adults, this model found that U.S. participants were not significantly more likely to generalize the label of a novel object by shape \((\beta = 0.31\), \(p = 0.56)\).

glmer( Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 + Location + AgeGroup | Example) + (1 | Experiment) )

# Filter the original dataset to remove the USA adult data 
  # (and, as in the original model, exclude two of the Tsimane children experiments).   
original_data_no_USA_adult <- original_data |> 
  filter(Experiment %in% c("Tsi_Adults_Objects", 
                           "Tsi_Adults_wDistractor", 
                           "Tsi_Child_Objects", 
                           "US_Children")) |> 
  filter(Choice!="Distractor") |> 
  select(Age, Experiment, Example, Choice, AgeGroup, Location) |> 
  mutate(Example = tolower(Example), Choice = tolower(Choice))

# Combine the rescue USA adult data and the original data (minus the US_Turk_Adult subjects). 
appended_data_full <- rbind(rescue_data, original_data_no_USA_adult)

# Add the offset column. 
appended_data_full$Offset <- logit(1/3)

# Run the logistic mixed-effects model. 
summary(glmer(Choice == "shape" ~ offset(Offset) + Location + AgeGroup + 
                (1 + Location + AgeGroup | Example) + (1 | Experiment), 
              appended_data_full, 
              family="binomial"))
boundary (singular) fit: see help('isSingular')
Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: Choice == "shape" ~ offset(Offset) + Location + AgeGroup + (1 +  
    Location + AgeGroup | Example) + (1 | Experiment)
   Data: appended_data_full

     AIC      BIC   logLik deviance df.resid 
   747.6    791.1   -363.8    727.6      562 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-1.3558 -0.8966 -0.5423  0.9937  2.0425 

Random effects:
 Groups     Name             Variance Std.Dev. Corr       
 Experiment (Intercept)      0.02198  0.1482              
 Example    (Intercept)      0.06538  0.2557              
            LocationUS       0.43013  0.6558    0.41      
            AgeGroupChildren 0.28745  0.5361   -0.16  0.83
Number of obs: 572, groups:  Experiment, 5; Example, 3

Fixed effects:
                  Estimate Std. Error z value Pr(>|z|)
(Intercept)       0.249138   0.279632   0.891    0.373
LocationUS        0.309283   0.532994   0.580    0.562
AgeGroupChildren -0.003805   0.460833  -0.008    0.993

Correlation of Fixed Effects:
            (Intr) LctnUS
LocationUS  -0.208       
AgGrpChldrn -0.317  0.418
optimizer (Nelder_Mead) convergence code: 0 (OK)
boundary (singular) fit: see help('isSingular')
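
To make the role of the offset concrete, here is a minimal sketch (illustrative only; the coefficient values are copied from the model summary above, and the reference levels of Tsimane’ and adults are inferred from the dummy coding) that converts the fixed effects back into predicted probabilities of choosing the shape match:

# Illustrative back-transformation of the fixed effects above; the offset
# fixes the baseline at chance (logit of 1/3).
offset_chance  <- qlogis(1/3)   # equivalent to logit(1/3)
beta_intercept <- 0.249138      # reference group: Tsimane' adults (per dummy coding)
beta_us        <- 0.309283      # LocationUS coefficient

plogis(offset_chance + beta_intercept)            # predicted P(shape match), Tsimane' adults: ~0.39
plogis(offset_chance + beta_intercept + beta_us)  # predicted P(shape match), U.S. adults: ~0.47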

Figure 3: Rescue vs. Replication vs. Original Paper Results for Shape Preference

suppressMessages(library(knitr))

# Get figure file name.   
image_path <- 'choice_by_property_comparison.jpeg'

# Print out the figure. 
knitr::include_graphics(image_path)

Figure 4: Rescue vs. Replication vs. Original Paper Results for Shape Preference by Shape Type

suppressMessages(library(knitr))

# Get figure file name.   
image_path_by_shape <- 'choice_by_shape_and_property_comparison.jpeg'

# Print out the figures. 
knitr::include_graphics(image_path_by_shape)

Discussion

Mini meta-analysis

The two known replications of Jara-Ettinger et al. (2022) are the first replication (Abdelrahim, 2022) and this rescue replication (Chen, 2023). However, I cannot fully combine data across the original paper, the first replication, and this rescue replication in a single meta-analysis because the three studies did not report the same models. Jara-Ettinger et al. (2022) and Chen (2023) ran a binomial test and a logistic mixed-effects model (Chen, 2023 used new data for the U.S. adult sample in that model, with the remaining data coming from Jara-Ettinger et al., 2022), whereas Abdelrahim (2022) and Chen (2023) used a random effects model that Jara-Ettinger et al. (2022) did not. Below are two separate meta-analysis models: one aggregating the random effects model estimates (Abdelrahim, 2022 and Chen, 2023) and one aggregating the logistic mixed-effects model estimates (Jara-Ettinger et al., 2022 and Chen, 2023).

suppressMessages(library(metafor))

# Import meta-analysis data. 
meta_analysis_file <- 'meta_analysis_data.csv'
meta_analysis_data <- read.csv(file.path(getwd(), '..', 'data', 
                                         meta_analysis_file))

# Print out the data. 
knitr::kable(meta_analysis_data,
             col.names = c("Study", 
                           "N", 
                           "Binomial Test - P(success)",
                           "Binomial Test - CI (lower)",
                           "Binomial Test - CI (upper)", 
                           "Random Effects Model - Estimate", 
                           "Random Effects Model - p-value",
                           "Random Effects Model - SE", 
                           "Logistic Mixed Effects Model - Estimate",
                           "Logistic Mixed Effects Model - p-value",
                           "Logistic Mixed Effects Model - SE"),
             table.attr = "class=\"striped\"",
             format = "html")
| Study | N | Binomial Test - P(success) | Binomial Test - CI (lower) | Binomial Test - CI (upper) | Random Effects Model - Estimate | Random Effects Model - p-value | Random Effects Model - SE | Logistic Mixed Effects Model - Estimate | Logistic Mixed Effects Model - p-value | Logistic Mixed Effects Model - SE |
|---|---|---|---|---|---|---|---|---|---|---|
| Jara-Ettinger et al. (2022) | 144 | 0.6875000 | 0.6050180 | 0.7620797 | NA | NA | NA | 1.220000 | 0.00148 | 0.384400 |
| Abdelrahim (2022) | 142 | NA | NA | NA | 1.13790 | 0.318 | 1.14030 | NA | NA | NA |
| Chen (2023) | 426 | 0.4483568 | 0.4004589 | 0.4969768 | -0.06279 | 0.863 | 0.36386 | 0.309283 | 0.56200 | 0.532994 |

The aggregate effect size across the Abdelrahim (2022) replication and the Chen (2023) rescue replication for the random effects model is very close to zero (estimate = 0.0512) and does not differ significantly from zero (p = 0.88).

# Run meta-analysis model for Abdelrahim (2022) and Chen (2023)
meta_analysis_model_random_effects <- rma(yi = random_effects_model_estimate, 
                                          sei = random_effects_model_SE, 
                                          slab = Study, 
                                          data = meta_analysis_data) 
Warning: 1 study with NAs omitted from model fitting.
summary(meta_analysis_model_random_effects)

Random-Effects Model (k = 2; tau^2 estimator: REML)

  logLik  deviance       AIC       BIC      AICc   
 -1.2553    2.5105    6.5105    2.5105   18.5105   

tau^2 (estimated amount of total heterogeneity): 0.0045 (SE = 1.0194)
tau (square root of estimated tau^2 value):      0.0670
I^2 (total heterogeneity / total variability):   0.62%
H^2 (total variability / sampling variability):  1.01

Test for Heterogeneity:
Q(df = 1) = 1.0063, p-val = 0.3158

Model Results:

estimate      se    zval    pval    ci.lb   ci.ub    
  0.0512  0.3520  0.1455  0.8843  -0.6386  0.7411    

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The aggregate effect size across the original Jara-Ettinger et al. (2022) study and the Chen (2023) rescue replication for the logistic mixed-effects model is substantially larger (estimate = 0.8395), though it falls just short of significance (p = .062).

# Run meta-analysis model for Jara-Ettinger et al. (2022) and Chen (2023)
meta_analysis_model_logistic_mixed_effects <- rma(yi = logistic_mixed_effects_model_estimate, 
                                                  sei = logistic_mixed_effects_model_SE, 
                                                  slab = Study, 
                                                  data = meta_analysis_data) 
Warning: 1 study with NAs omitted from model fitting.
summary(meta_analysis_model_logistic_mixed_effects)

Random-Effects Model (k = 2; tau^2 estimator: REML)

  logLik  deviance       AIC       BIC      AICc   
 -0.9788    1.9577    5.9577    1.9577   17.9577   

tau^2 (estimated amount of total heterogeneity): 0.1988 (SE = 0.5865)
tau (square root of estimated tau^2 value):      0.4458
I^2 (total heterogeneity / total variability):   47.93%
H^2 (total variability / sampling variability):  1.92

Test for Heterogeneity:
Q(df = 1) = 1.9206, p-val = 0.1658

Model Results:

estimate      se    zval    pval    ci.lb   ci.ub    
  0.8395  0.4492  1.8690  0.0616  -0.0409  1.7198  . 

---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
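
As a sanity check (not part of the preregistered analysis), both pooled estimates reported above can be reproduced by hand as inverse-variance weighted averages of the study-level estimates, with each study’s squared standard error inflated by the corresponding REML estimate of tau^2; the helper function below is purely illustrative:

# Illustrative check of the two pooled estimates: an inverse-variance weighted
# average of the study-level estimates, with the REML tau^2 estimate added to
# each study's sampling variance.
pool_estimate <- function(estimates, ses, tau2) {
  weights <- 1 / (ses^2 + tau2)
  sum(weights * estimates) / sum(weights)
}

# Random effects model: Abdelrahim (2022) and Chen (2023).
pool_estimate(c(1.13790, -0.06279), c(1.14030, 0.36386), tau2 = 0.0045)    # ~0.0512

# Logistic mixed-effects model: Jara-Ettinger et al. (2022) and Chen (2023).
pool_estimate(c(1.220000, 0.309283), c(0.384400, 0.532994), tau2 = 0.1988) # ~0.8395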

Figure 6: Estimated Effect Sizes for the Random Effects Model (Abdelrahim, 2022 and Chen, 2023)

Visualize the effect sizes of the Abdelrahim (2022) replication project and the Chen (2023) rescue replication project for the random effects model.

# Filter out study without this model. 
meta_analysis_data_random_effects <- meta_analysis_data |> 
  filter(Study != "Jara-Ettinger et al. (2022)")

# Plot mini meta-analysis results. 
ggplot(meta_analysis_data_random_effects, 
       aes(x = Study, 
           y = random_effects_model_estimate, 
           size = N)) +
  geom_point(data = meta_analysis_data_random_effects) +
  coord_flip() +
  ylim(-2, 2) +
  scale_size_area() +
  geom_hline(yintercept = 0, 
             color = "black") +
  theme_linedraw() +
  theme(legend.position = "none") +
  labs(y = "Main effect size on original scale", 
       x = "")

Figure 7: Estimated Effect Sizes for the Logistic Mixed-Effects Model (Jara-Ettinger et al., 2022 and Chen, 2023)

Visualize the effect sizes of the Jara-Ettinger et al. (2022) replication project and the Chen (2023) rescue replication project for the logistic mixed-effects model.

# Filter out study without this model. 
meta_analysis_data_logistic_mixed_effects <- meta_analysis_data |> 
  filter(Study != "Abdelrahim (2022)")

# Plot mini meta-analysis results. 
ggplot(meta_analysis_data_logistic_mixed_effects, 
       aes(x = Study, 
           y = logistic_mixed_effects_model_estimate, 
           size = N)) +
  geom_point(data = meta_analysis_data_logistic_mixed_effects) +
  coord_flip() +
  ylim(-2, 2) +
  scale_size_area() +
  geom_hline(yintercept = 0, 
             color = "black") +
  theme_linedraw() +
  theme(legend.position = "none") +
  labs(y = "Main effect size on original scale", 
       x = "")

Figure 8: Forest Plot for the Random Effects Model (Abdelrahim, 2022 and Chen, 2023)

forest(meta_analysis_model_random_effects)

Figure 9: Forest Plot for the Logistic Mixed-Effects Model (Jara-Ettinger et al., 2022 and Chen, 2023)

forest(meta_analysis_model_logistic_mixed_effects)

Figure 10: Funnel Plot for the Random Effects Model (Abdelrahim, 2022 and Chen, 2023)

funnel(meta_analysis_model_random_effects)

Figure 11: Funnel Plot for the Logistic Mixed-Effects Model (Jara-Ettinger et al., 2022 and Chen, 2023)

funnel(meta_analysis_model_logistic_mixed_effects)

Summary of rescue replication attempt

This rescue replication tested whether U.S. adults were more likely to match a novel object to an exemplar based on shape, material, or color. Using a binomial test as my key analysis of interest, I found that U.S. adults matched based on shape more often than chance (1/3) overall \((P_{success} = 0.45\), \(p < 0.001)\). However, this result depended on the exemplar shape: I replicated the original paper’s finding that the probability of a shape match exceeded 1/3 for the arch \((P_{success_{arch}} = 0.64\), \(p < 0.001)\) and lamp \((P_{success_{lamp}} = 0.51\), \(p < 0.001)\) exemplars, but not for the snowman exemplar, for which the proportion of shape matches fell below chance \((P_{success_{snowman}} = 0.21\), \(p < 0.01)\).
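
For reference, the overall binomial test can be reconstructed from the reported proportion: with N = 426 participants, a P(success) of 0.448 corresponds to 191 shape matches (a count inferred from the proportion rather than pulled from the raw data), tested against the chance probability of 1/3:

# Illustrative reconstruction of the overall binomial test; the count of 191
# shape matches is inferred from the reported proportion (0.4484 * 426).
binom.test(x = 191, n = 426, p = 1/3)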

Commentary

I also ran some follow-up exploratory analyses to (1) account for stimulus-level random effects, (2) compare my U.S. adult data with the first replication author’s results using their key analysis, a random effects model, and (3) test how my U.S. adult data compared with the rest of the data in the original paper from Tsimane’ adults, Tsimane’ children, and U.S. children. For Aim 1, the model accounting for stimulus-level random effects yielded a non-significant result \((\beta = 0.45\), \(p = 0.33)\), indicating that U.S. adults were not more likely to choose the shape match once the exemplar type was taken into account. For Aim 2, the random effects model with random intercepts for the exemplar object, testing a participant’s preference for shape as a function of the exemplar object, also yielded a non-significant result \((\beta = -0.063\), \(p = 0.86)\), in line with the first replication author’s finding. Lastly, for Aim 3, with a new sample of U.S. adults, the original paper’s logistic mixed-effects model predicting preference for the shape-match object found that U.S. adults were not more likely to generalize the label of a novel object by shape \((\beta = 0.31\), \(p = 0.56)\).

The attention checks showed that participants understood the differences between the exemplar and the three extension objects, so the failure to replicate the shape-match preference for all exemplars does not appear to result from participants guessing randomly or misunderstanding how the extension objects relate to the exemplar. Material was the property chosen most often for the snowman exemplar, which suggests that the snowman shape may have been less distinctive than the arch and lamp shapes, making material a more salient property on which to match the exemplar and extension objects.

Overall, increasing the sample size, using the correct stimuli, and asking participants to report which properties each extension object shared with the exemplar object were important changes for ensuring a robust replication. Future work could ensure that the objects’ shapes are highly distinctive, so that other properties such as material do not overshadow the shape. Asking participants to complete more than one trial, with a greater variety of shapes, materials, and colors, may also address individual differences in trial performance and trial-level errors that can occur in a one-trial paradigm. Lastly, it is possible that subject-specific demographics (e.g., the urbanicity of a participant’s early environment) moderate which property match participants choose.