Replication of The Origins of the Shape Bias: Evidence From the Tsimane’ by Jara-Ettinger et al. (2022, Journal of Experimental Psychology: General)

Author

Emily Chen (emchen15@stanford.edu)

Published

October 22, 2023

Introduction

I chose to attempt a rescue of this replication because I am interested in the intersection of language and spatial/object reasoning. As a developmental cognitive scientist, I also found the comparison of adults and children relevant to my research interests. Finally, the cross-cultural comparisons made in this paper are something that I’d like to explore further in my own research, outside of the context of this project.

The stimuli required to collect additional data for this experiment are available in the repository of the original replication project. The original replication collected data online from 144 U.S. adults, following the paradigm of Experiment 5 in the paper. It won’t be possible to collect additional data from Tsimane’ adults, so a replication of Experiments 6 and 7 cannot be performed. The new replication attempt will feature attention checks and a more detailed questionnaire about each participant’s early geographic environment (e.g., urban versus rural, highly industrialized versus less industrialized), given the paper’s theoretical claims that early environmental factors influence the strength of the shape bias in children.

In the original paper, Experiment 5 was the only experiment (out of 7) for which data were collected online. To test whether replication success depends in part on the setting of data collection, I also plan to collect data to replicate Experiment 1 in the original paper, where data were originally collected in person with U.S. children. One hypothesis for why the setting might matter is that participants may interpret the physical properties of the exemplar differently depending on whether it is presented as a physical object or as an image online. Thus, I plan to collect data asynchronously online from 30 children aged 3 to 9 years using Lookit, following the same procedures as Experiment 1 in the original paper. As with the adult studies, it won’t be possible to collect additional data from Tsimane’ children, so a replication of Experiments 2, 3, and 4 cannot be performed.

Click here for this rescue project’s GitHub repository. The PDF of the original paper can be found here.

Summary of prior replication attempt

The prior replication attempted only Experiment 5 of the seven experiments in the original paper. In the original paper, Experiment 5 tested U.S. adults online using Amazon’s Mechanical Turk. The replication report does not specify how the data were collected; I assume that the data were collected using Prolific instead of MTurk.

The biggest difference between the original study and the first replication is the stimuli. The original paper used the images shown in Table 1, whereas the first replication used the images shown in Figure 1. The replication author contacted the first author of the original study for the original image files, but they had been lost, so the replication author used screenshots of the shapes taken from the figures in the paper.

The sample demographics were the same in the original study (specified as U.S. adults on MTurk) and the first replication (specified as English-speaking adults from the U.S.). The original study had a sample size of N=144 U.S. adults. The first replication planned a sample of N=144 but ended with N=142, because two participants did not complete the study and the replication author did not collect two additional data points.

There are two main analyses:

1. Calculating the percentage of participants who chose the object based on shape congruency with the exemplar, the percentage of participants who chose the object based on material congruency with the exemplar, and the percentage of participants who chose the object based on color congruency with the exemplar. Both the original study and the first replication did this analysis the same way.
2. Running a logistic mixed-effects model that predicted the participant’s preference for the shape-match object. In the original study, the authors used a baseline probability of 33.3% with the population (U.S. participants) and the age group (adults) dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment number, random intercepts for the exemplar object, random slopes for population as a function of exemplar, and random slopes for age group as a function of exemplar. In the first replication, the author used a random effects model, which I believe is the same type of model as the original paper, with exemplar type as random intercepts, which tested for a participant’s preference for shape as a function of the exemplar object.

Methods

Power Analysis

TO BE DETERMINED.

Original effect size, power analysis for samples to achieve 80%, 90%, 95% power to detect that effect size. Considerations of feasibility for selecting planned sample size.

How much power does your planned sample have for original effect? For an attenuated effect that is half the size of the original?

(If power analysis is not possible or precise, discuss more fully how you determined a sample size that would be sufficient for rescue.)
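Until the original effect size is filled in, the calculation requested above could be set up along the following lines. This is only a sketch under assumptions: it treats the key test as a one-sided exact binomial test of the shape-match proportion against the 33.3% chance level, and `p_alt_hypothetical` is a placeholder value, not the effect reported in the original paper. The function names are mine.

```r
# Exact power of a one-sided binomial test of shape choices against chance (1/3).
binom_power <- function(n, p_alt, p_null = 1/3, alpha = 0.05) {
  # Smallest count k_crit such that P(X >= k_crit | p_null) <= alpha
  k_crit <- qbinom(1 - alpha, n, p_null) + 1
  # Power is the probability of reaching k_crit under the alternative
  pbinom(k_crit - 1, n, p_alt, lower.tail = FALSE)
}

# Smallest n (searching upward) whose exact power reaches the target.
# Exact binomial power is not strictly monotone in n, so this is a lower bound.
smallest_n <- function(p_alt, target, n_max = 2000) {
  for (n in 5:n_max) {
    if (binom_power(n, p_alt) >= target) return(n)
  }
  NA_integer_
}

# ASSUMPTION: placeholder proportion of shape-match choices, to be replaced
# with the proportion reported for Experiment 5 in the original paper.
p_alt_hypothetical <- 0.50
# Attenuated effect: half the distance between chance and the placeholder
p_half <- 1/3 + (p_alt_hypothetical - 1/3) / 2

for (target in c(0.80, 0.90, 0.95)) {
  cat(sprintf("target power %.2f: n = %d\n",
              target, smallest_n(p_alt_hypothetical, target)))
}
cat(sprintf("power at N = 144, full effect: %.3f\n",
            binom_power(144, p_alt_hypothetical)))
cat(sprintf("power at N = 144, half effect: %.3f\n",
            binom_power(144, p_half)))
```

The mixed-effects model in the analysis plan would need a simulation-based power analysis instead; the binomial version is only a quick approximation for planning.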

Planned Sample

The sample size for Experiment 5 will be N=144 U.S. adults, recruited on Prolific. I will stop the study when I reach 144 participants who complete the task. I will also collect the following demographic information from the adult participants:

All relevant zip codes of where they lived between ages 3 and 9 (to follow the age range of the children tested in the original paper). Alternatively, if collecting this information is not permitted for privacy reasons, I will instead ask them to indicate the level of industrialization of their home location during those ages (on a 7-point Likert scale).
The extent to which their home location was urban or rural between ages 3 and 9, as assessed using a 7-point Likert scale.
The first language learned (and subsequent languages learned).

Materials

“Stimuli consisted of solid objects that varied in shape, color, and material…Each experiment consisted of three example objects and three extension objects. Each participant saw only one example object (counterbalanced across participants) and all extension objects…Experiment 5 which used photographs of the objects because it was conducted online.”

Procedure

Experiment 5 was a “one-shot learning trial and each participant completed one trial only. Although each trial only required one label, we used three different possible labels, randomized across participants. In the experiments with U.S. participants, the example object was called a koba, dax, or fep…Participants saw a single screen where the top said “This is a(n) x” along with a picture of the object. Below, the text read “One of these is also a(n) x” along with three pictures of the three possible extension choices. The text below read “Which one is the other x?” Participants were allowed to select one of the three objects.”
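To make the assignment scheme concrete, here is a minimal sketch of how exemplars could be counterbalanced and labels randomized for the planned N=144. In practice the survey platform would handle this; the exemplar names and the seed below are hypothetical placeholders.

```r
set.seed(2023)  # arbitrary seed so the example assignment is reproducible

n_participants <- 144
exemplars <- c("exemplar_1", "exemplar_2", "exemplar_3")  # hypothetical names
labels <- c("koba", "dax", "fep")                         # labels from the original paper

assignment <- data.frame(
  participant = seq_len(n_participants),
  # Counterbalanced: each exemplar is used equally often, in shuffled order
  exemplar = sample(rep(exemplars, length.out = n_participants)),
  # Randomized: each participant gets an independently drawn label
  label = sample(labels, n_participants, replace = TRUE)
)

# Check the balance of exemplars and the spread of labels
print(table(assignment$exemplar))
print(table(assignment$label))
```

With 144 participants and three exemplars, counterbalancing gives 48 participants per exemplar, while the label counts vary by chance.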

Controls

“To ensure that participants were attending to the task, we also asked participants what each object shared in common with the original object. These questions were only included to motivate participants to look at the images carefully.” In the original study, answers to these questions were not used as exclusion criteria, but in this study, I will exclude participants who have more than one incorrect answer.

Analysis Plan

Participants will be excluded if they do not answer the main question of interest (the shape bias question) or if they give more than one incorrect answer about how the extension objects relate to the exemplar object. Data will be downloaded from Prolific and scrubbed of all identifying information (e.g., IP addresses). I will conduct two analyses:

1. Calculate the percentage of participants who chose the object based on shape congruency with the exemplar, the percentage of participants who chose the object based on material congruency with the exemplar, and the percentage of participants who chose the object based on color congruency with the exemplar.
2. Run a logistic mixed-effects model that predicts the participant’s preference for the shape-match object. In the original study, the authors used a baseline probability of 33.3% with the population (U.S. participants) and the age group (adults) dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment number, random intercepts for the exemplar object, random slopes for population as a function of exemplar, and random slopes for age group as a function of exemplar.
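The first analysis amounts to a simple tabulation, sketched below. The choices are simulated purely so the code runs; they are not real or expected results. The exact binomial test against the 33.3% chance level is an optional extra check of my own, not part of the original analysis.

```r
# PLACEHOLDER DATA: simulated choices, used only to make the sketch runnable
set.seed(1)
choice <- factor(
  sample(c("shape", "material", "color"), 144, replace = TRUE,
         prob = c(0.5, 0.3, 0.2)),
  levels = c("shape", "material", "color")
)

# Percentage of participants choosing each kind of match
counts <- table(choice)
percentages <- round(100 * prop.table(counts), 1)
print(percentages)

# Optional check (an assumption, not in the original plan): is the
# shape-match proportion above the chance level of 1/3?
n_shape <- sum(choice == "shape")
shape_test <- binom.test(n_shape, length(choice), p = 1/3,
                         alternative = "greater")
print(shape_test$p.value)
```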

Clarify key analysis of interest:
I am primarily interested in replicating the two analyses listed above, which are identical to the original paper. However, I also plan to do a secondary analysis, which looks at the effect of early environment and first language on the strength of the shape bias. This secondary analysis may require me to collect more data than the originally planned N=144 U.S. adult sample in order to have a balanced number of participants in the following groups: industrialized/non-industrialized, urban/rural, and English as a first language/non-English as a first language.

Differences from Original Study and 1st replication

The only known major difference between this plan and the first replication is that the stimuli will be images of the physical objects as opposed to artistic renderings of the objects. The main difference between this plan and the original study is that I plan to collect more demographic information from the participants for additional analyses not conducted in the original study. I also plan to exclude participants who give more than one incorrect answer to the questions asking what each object has in common with the exemplar object.

Methods Addendum (Post Data Collection)

You can comment this section out prior to final report with data collection.

Actual Sample

Sample size, demographics, data exclusions based on rules spelled out in analysis plan

Differences from pre-data collection methods plan

Any differences from what was described as the original plan, or “none”.

Results

Data preparation

Data preparation following the analysis plan.

### Data Preparation

#### Load Relevant Libraries and Functions
library(tidyr)
library(ggplot2)

#### Import data
# Read in the CSV file, which has been scrubbed of the columns containing identifiable participant information.

#### Data exclusion / filtering
# Remove columns that are unnecessary for the main findings. Remove rows that have NA in the shape choice column (i.e., the participant did not respond to the main question of interest). Exclude rows with more than one incorrect answer to the attention check.

#### Prepare data for analysis - create columns etc.
# Convert the numerically coded answer columns to factors with meaningful labels.

Results of control measures

I examined the answers to the attention check question “what does each object share in common with the original object?” to verify that participants understood the task and were paying attention. I excluded participants who answered these questions incorrectly more than once.
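The exclusion rule can be sketched as follows. The data frame, its column names (`shape_choice`, `check_1` to `check_3`), and the answer key are hypothetical stand-ins for the real export, assuming one check question per extension object.

```r
# HYPOTHETICAL toy data: five participants, three attention-check questions
responses <- data.frame(
  participant  = 1:5,
  shape_choice = c("shape", NA, "material", "shape", "color"),
  check_1 = c("shape", "shape", "shape", "color", "color"),
  check_2 = c("material", "material", "color", "shape", "shape"),
  check_3 = c("color", "color", "color", "color", "material")
)

# ASSUMPTION: the correct answer is the property each extension object
# shares with the exemplar
answer_key <- c(check_1 = "shape", check_2 = "material", check_3 = "color")

# Count incorrect attention-check answers per participant
wrong <- sapply(names(answer_key),
                function(col) responses[[col]] != answer_key[[col]])
responses$n_check_errors <- rowSums(wrong)

# Keep participants who answered the main question and made at most one error
kept <- subset(responses, !is.na(shape_choice) & n_check_errors <= 1)
cat("excluded:", nrow(responses) - nrow(kept), "of", nrow(responses), "\n")
```

In this toy example, participant 2 is dropped for not answering the main question and participants 4 and 5 for making more than one attention-check error.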

Confirmatory analysis

I conducted two main analyses:

1. I calculated the percentage of participants who chose the object based on shape congruency with the exemplar, the percentage of participants who chose the object based on material congruency with the exemplar, and the percentage of participants who chose the object based on color congruency with the exemplar. XX% of participants chose the object based on shape congruency with the exemplar, XX% of participants chose the object based on material congruency with the exemplar, and XX% of participants chose the object based on color congruency with the exemplar. The results did/did not replicate those of the original study.
2. I ran a logistic mixed-effects model that predicts the participant’s preference for the shape-match object. In the original study, the authors used a baseline probability of 33.3% with the population (U.S. participants) and the age group (adults) dummy coded as independent variables. To control for the role of exemplar, the regression included random intercepts for the experiment number, random intercepts for the exemplar object, random slopes for population as a function of exemplar, and random slopes for age group as a function of exemplar. This model found that U.S. adults were [] to generalize the label of a novel object by shape (\(\beta = X.XX\), \(p < X.XX\)).
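One way to express the 33.3% baseline in a logistic model is as an offset equal to the log-odds of chance, so that the intercept measures the departure from chance. The sketch below uses that approach with a plain fixed-effects `glm` rather than the mixed model described above. The offset is my assumption about how the baseline could be implemented, not a confirmed detail of the original analysis, and the data are simulated placeholders.

```r
# PLACEHOLDER DATA: simulated shape choices for a single U.S. adult sample
set.seed(5)
n <- 144
dat <- data.frame(
  exemplar = factor(rep(c("A", "B", "C"), length.out = n)),  # hypothetical names
  chose_shape = rbinom(n, 1, 0.5)
)

# Log-odds of picking the shape match by chance among three options
dat$chance_logit <- qlogis(1/3)

# Intercept-only logistic regression with the chance level as an offset:
# an intercept above 0 means shape choices exceed the 33.3% baseline
fit <- glm(chose_shape ~ 1 + offset(chance_logit),
           family = binomial, data = dat)
print(summary(fit)$coefficients)

# A mixed-effects analogue (requires the lme4 package; not run here):
# lme4::glmer(chose_shape ~ 1 + offset(chance_logit) + (1 | exemplar),
#             family = binomial, data = dat)
```

With a single population and age group in the sample, the dummy-coded predictors are constant, which is why this sketch keeps only the intercept.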

Three-panel graph with original, 1st replication, and your replication is ideal here

Exploratory analyses

I conducted an exploratory analysis to see if the nature of adult participants’ early childhood environment affected the strength of their shape bias. I also conducted an exploratory analysis to check if their first language affected the strength of their shape bias.
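A minimal sketch of how these exploratory analyses could be run is a logistic regression of shape choice on the early-environment rating and first language. The variable names and the simulated values below are hypothetical placeholders, not collected data.

```r
# PLACEHOLDER DATA: simulated demographics and shape choices
set.seed(9)
n <- 144
explore <- data.frame(
  urbanicity = sample(1:7, n, replace = TRUE),  # 7-point urban/rural rating
  english_first = factor(sample(c("yes", "no"), n, replace = TRUE),
                         levels = c("no", "yes")),
  chose_shape = rbinom(n, 1, 0.5)
)

# Does early environment or first language predict choosing the shape match?
fit_explore <- glm(chose_shape ~ urbanicity + english_first,
                   family = binomial, data = explore)
print(round(summary(fit_explore)$coefficients, 3))

# Group sizes, to judge whether the comparison is reasonably balanced
print(table(explore$english_first))
```

Treating the 7-point rating as a numeric predictor is a simplifying choice; it could instead be dichotomized into the groups named in the analysis plan.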

Discussion

Mini meta analysis

Combining across the original paper, 1st replication, and 2nd replication, what is the aggregate effect size?
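One simple way to compute an aggregate effect is fixed-effect, inverse-variance pooling of the shape-match proportions on the log-odds scale, sketched below. The sample sizes follow the text above (with the planned N for the second replication), but the shape-match counts are hypothetical placeholders that must be replaced with the reported values.

```r
# Sample sizes from the text; SHAPE COUNTS ARE HYPOTHETICAL PLACEHOLDERS
studies <- data.frame(
  study = c("original", "replication_1", "replication_2"),
  n     = c(144, 142, 144),
  shape = c(80, 60, 70)
)

p <- studies$shape / studies$n
logit_p <- qlogis(p)
# Approximate sampling variance of a logit-transformed proportion
var_logit <- 1 / studies$shape + 1 / (studies$n - studies$shape)

# Fixed-effect pooling: weight each study by its inverse variance
w <- 1 / var_logit
pooled_logit <- sum(w * logit_p) / sum(w)
pooled_se <- sqrt(1 / sum(w))

pooled_p <- plogis(pooled_logit)
ci <- plogis(pooled_logit + c(-1, 1) * qnorm(0.975) * pooled_se)

# Effect relative to the 1/3 chance level, as a difference in log-odds
effect_vs_chance <- pooled_logit - qlogis(1/3)

cat(sprintf("pooled proportion = %.3f, 95%% CI [%.3f, %.3f]\n",
            pooled_p, ci[1], ci[2]))
cat(sprintf("log-odds above chance = %.3f\n", effect_vs_chance))
```

A random-effects model may be more appropriate if the three estimates differ substantially, for example because of the stimulus differences between studies.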

Summary of Replication Attempt

Open the discussion section with a paragraph summarizing the primary result from the confirmatory analysis and the assessment of whether it replicated, partially replicated, or failed to replicate the original result.

Commentary

Add open-ended commentary (if any) reflecting (a) insights from follow-up exploratory analysis, (b) assessment of the meaning of the replication (or not) - e.g., for a failure to replicate, are the differences between original and present study ones that definitely, plausibly, or are unlikely to have been moderators of the result, and (c) discussion of any objections or challenges raised by the current and original authors about the replication attempt. None of these need to be long.