Reproduction of The influence of linguistic form and causal explanations on the development of social essentialism by Benitez, Leshin, and Rhodes (2022, Cognition)

Author

Rose Reagan (reagan@ucsd.edu)

Published

December 11, 2024

Introduction

The paper I have chosen is “The influence of linguistic form and causal explanations on the development of social essentialism” by Josie Benitez, Rachel Leshin, and Marjorie Rhodes. As my first-year project is investigating how the way we talk to kids about social differences may influence the prevalence of essentialist thinking, this developmental psychology paper is incredibly relevant to my interests and one I quite enjoy.

This study includes a child sample and an adult sample. Participants were presented with a storybook in which they learned about a novel group called the Zarpies. There were four between-subjects conditions, each following one of four combinations of language form and causal origin information. Participants then completed four tasks assessing different aspects of essentialism. Three of the four were binary-choice tasks in which a score of 1 indicated an essentialist response; these were then combined into an “Essentialism Composite” score out of 4. The fourth task used a 5-point Likert scale in which higher numbers indicated greater endorsement of essentialism.

In the original paper, the adult and child samples were analyzed independently given the vast age-related differences in results. Means were generated for the composite and for each binary task to indicate the probability of endorsing an essentialist response in a given task, each with a 95% confidence interval. The data for the composite and binary measures were analyzed using generalized linear mixed models (GLMMs); the authors note that they used the glmer function in R’s lme4 package, specified for a binomial distribution. They then conducted Wald chi-squared tests on these models to assess significance. Effect sizes were reported as coefficients and translated into odds ratios (i.e., the odds of endorsing essentialism are x times higher for adults than for children). For the Likert scale measure, means were also calculated, and the authors used mixed ordinal logistic regression models for further analysis. Significance was calculated using likelihood ratio tests. The mixed effects models tested for effects of language form and causal origin (remember, the four conditions each used a unique combination of language form and causal origin). Participant mean-centered age was used as a predictor in the child analyses, and simple slopes follow-up tests examined its interactions with the manipulated factors.

A main challenge here will be my complete inexperience with R and analysis more generally. I currently know next to nothing about what the above paragraph really means, and writing it felt like adapting French, a language I do not speak. Thus, there will be quite the learning curve as I undertake this project. In terms of key reproducibility criteria, I will attempt to reproduce all of the above measures while doing additional visualization; I would like to chart out results from both the child and adult samples in ways not included in the original paper. I will also collaborate with the course instructors to brainstorm and complete additional exploratory analyses, looking at any potential patterns based on factors such as gender.

I would now like to link you to the repository for this project and the original paper.

Methods

Power Analysis

A sample size of 200 child participants and 200 adult participants was determined based on a power analysis of effects obtained by the authors in a preceding paper (Leshin et al., 2021). For children, the power analysis sample of 200 was then inflated by the researchers to a recruitment target of 220 in order to account for an anticipated 10% participant drop rate, the standard for studies using the lab’s recruitment venue.
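The inflation step is simple arithmetic. Below is a minimal sketch in R, assuming the 10% figure was applied as a flat increase on the target of 200 (which is what yields 220). The variable names are mine, not the authors', and the "alternative" rule is shown only for comparison.

```r
# Target sample from the power analysis and the anticipated drop rate (percent).
n_target <- 200
drop_pct <- 10

# Flat 10% increase on the target: 200 + 20.
n_recruit <- n_target + n_target * drop_pct / 100

# Expected number retained if exactly 10% of the recruited sample drops out.
n_retained <- n_recruit * (100 - drop_pct) / 100

# Alternative rule (not what the paper reports): recruit enough that the
# retained 90% still reaches the target.
n_recruit_alt <- ceiling(n_target * 100 / (100 - drop_pct))

print(c(recruit = n_recruit, retained = n_retained, alternative = n_recruit_alt))
```

Under these assumptions, a flat 10% increase leaves slightly fewer than 200 participants after a full 10% drop, which may be worth keeping in mind when comparing the planned and actual samples.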

Project Note: After conversation with course staff, it was determined that conducting a post-hoc power analysis is quite difficult with complex generalized effects models and out of scope for this project.

Planned Sample

A target sample of 220 child participants was recruited from a remote developmental research platform. Participants were not excluded based on responses to comprehension questions or manipulation checks. A random subset of 20% of the study videos was coded for parental interference during the study; if the rate of interference for this subset was not within the bounds of interference identified in prior work using the platform (~1%), trial-by-trial interference coding would be conducted.

200 adult participants were recruited, half in person from undergraduate students at the host institution and half online via Prolific due to the emergence of the COVID-19 pandemic. Adult participants were excluded if they failed audio verification checks, did not complete the full study, did not meet eligibility requirements (i.e., identified as a non-English speaker or as being under 18 years of age), or failed more than 3 of the 5 Winograd Schema questions included in the paradigm. The sample was 59.4% female, 50.86% white, 26.5% Asian, 9.4% Multiracial, 6.84% Latinx, and 7.26% Black/African American, with 5.56% declining to provide racial/ethnic demographic information.

Materials

This study was conducted digitally with stimuli consisting of a video file walking participants through a narrated storybook followed by four digital question-based tasks. Subjects participated using their or their family’s own computer. No other materials were used.

Procedure

This study used a fully crossed, between-participant 2 x 2 design in which each participant was randomly assigned to one of four experimental conditions.

“Prior to beginning the test trials, participants underwent a warm-up phase to introduce them to the biological and cultural origin explanations. After hearing each causal origin explanation, participants were asked two comprehension questions: one that probed the causal origin of being able to smell things (e.g., “What about being able to smell things with your nose? Is that something that you were born with, or something you learned from other people?”), and one that probed the causal origin of knowing the ABCs (e.g., “What about knowing the ABCs? Is that something you were born with, or something you learned from other people?”).”

“Participants were guided through a narrated storybook about a novel category of people referred to as “Zarpies.” The storybook contained 16 pages, each depicting an individual Zarpie displaying a novel property (e.g., having stripes in their hair) or engaging in a novel behavior (e.g., drawing stars on their knees). Each page of the storybook included a one-line description of the depicted property, which followed one of four combinations of language form and causal origin information: (a) generic form/biological origin, (b) generic form/cultural origin, (c) specific form/biological origin, or (d) specific form/cultural origin.” In terms of the linguistic manipulation, descriptions of behavior used either generic form language (“Zarpies sleep in tall trees”) or specific form language (“This Zarpie sleeps in tall trees”).

After participants viewed the storybook, they completed four measures of essentialist beliefs about Zarpies:

Essentialism Measure #1: Category-based explanations of properties. Participants will hear three Zarpie properties from the storybook and will be asked to determine whether the property reflects features of the category (e.g., “A lot of Zarpies like to do X”) or of the individual (e.g., “This Zarpie likes to do X”).

Essentialism Measure #2: Flexibility of category-linked properties. Participants will be told about traits or behaviors exhibited by Zarpies and will be asked whether or not they believe the Zarpie exclusively demonstrates these traits (and not others).

Essentialism Measure #3: Heritability of category-linked properties. Participants will hear a story about a fictitious child who was born to a Zarpie mom but was raised by a non-Zarpie mom, and will be asked two manipulation check questions to ensure that they understood the story. To assess beliefs about the heritability of category-linked properties, participants will then be asked to make predictions about what the child will be like in the future: specifically, whether the child will possess properties of the Zarpie parent or the non-Zarpie parent.

Essentialism Measure #4: Within-category homogeneity. Participants will be presented with information about two different properties exhibited by a Zarpie and will be asked to predict how many other members of the category “Zarpie” exhibit that same trait: (a) only one, (b) a few, (c) some, (d) most, or (e) all. Participants will receive training on how to use the visual scale to indicate their response before being asked the two target questions.

The first three tasks involved forced-choice binary questions, and participants were given a score of 1 for essentialist responses and 0 for non-essentialist responses. An “essentialism composite” was generated from the first three measures. As the last item was a Likert scale, it was analyzed separately and not included in the composite.
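The scoring rule above can be sketched in R. This is one possible way to summarize the binary items per participant, not necessarily how the authors built their composite; the toy data are hypothetical, and only the column names (explain_*, flex_*, sab_*) come from the dataset used later in this report. The models below work on item-level responses rather than on a per-participant score.

```r
# Toy data for three hypothetical participants (1 = essentialist response).
toy <- data.frame(
  ID        = c("p1", "p2", "p3"),
  explain_1 = c(1, 0, NA), explain_2 = c(1, 0, 1), explain_3 = c(1, 1, 0),
  flex_1    = c(0, 0, 1),  flex_2    = c(1, 0, 1), flex_3    = c(1, 0, 0),
  sab_1     = c(1, 0, 1),  sab_2     = c(0, 0, 1)
)

# Every column except the identifier is a binary essentialism item.
binary_items <- setdiff(names(toy), "ID")

# Proportion of essentialist responses across the items each participant
# answered; unanswered items (NA) are skipped rather than counted as 0.
toy$composite <- rowMeans(toy[, binary_items], na.rm = TRUE)

print(toy[, c("ID", "composite")])
```

Skipping missing items instead of scoring them as 0 is a design choice made for this sketch; it keeps a participant with one unanswered question from looking less essentialist than they were.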

Authors also included two exploratory measures about participants’ attitudes and intended behavior towards Zarpies (resource allocation task and feelings thermometer task). As these tasks were not central to the research questions, they were not used heavily in this paper.

Analysis Plan

“We intend to run generalized linear models from the package lme4 to examine the effects of language form and causal explanations on children’s essentialist beliefs about novel social categories, including their perceptions of (a) category-based explanations, (b) flexibility of category membership, (c) the heritability of category-linked properties. We also intend to use mixed ordinal logistic regression models from the ordinal package to assess the effects of language form and causal explanations on children’s perceptions of within-category homogeneity.”

“Follow-Up Comparisons: If we find significant three-way interactions between language form, origin, and age-group on any of our dependent measures (see models 1f – 5f), we will conduct pairwise follow-up tests on the adult and child samples to determine the nature of the language*origin interaction across age-groups. If we find significant three-way interactions between language form, origin, and mean-centered age within our child sample (see models 1c – 1f), we will dichotomize age into “old” and “young” via a median-split, and use pairwise follow-up tests to analyze the language*origin interaction for “old” and “young” children. Based on the two sets of follow-up tests described above, we will further investigate either (a) the slow emergence of an adult-like pattern across age, or (b) the qualitatively distinct patterns across and within different age groups. All follow-up comparisons will be conducted using functions from the emmeans package (e.g., emmeans, emtrends).

For the binary DVs (explanation items, flexibility items, and switched-at-birth items) we will report beta coefficients from the GLMER results, along with means that will be reported as the probability of providing an essentialist response with 95% confidence intervals, and report odds ratios as indicators of effect sizes. For our mixed-effects ordinal logistic regression analyses (homogeneity items), we will measure goodness of fit and report generalized R², Pearson’s χ² likelihood ratios, and odds ratios with 95% confidence intervals, with higher numbers indicating broader generalization.”
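The two age transformations named in the plan (mean-centering and a median split) can be sketched as follows. The ages are made up for illustration, and assigning a child exactly at the median to the "young" group is my assumption; the preregistration text quoted above does not say how ties are handled.

```r
# Hypothetical exact ages in years for five children.
age <- c(4.5, 5.2, 5.9, 6.8, 7.9)

# Mean-centered age: 0 now corresponds to a child of average age,
# which makes the model intercept and lower-order terms easier to read.
center_age <- age - mean(age)

# Median split: children above the sample median are "old", the rest "young".
age_split <- ifelse(age > median(age), "old", "young")

print(data.frame(age, center_age, age_split))
```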

My modest goals involve generating the essentialism composite and identifying means and standard deviations, exploring correlations between the essentialism measures as an investigation into convergent validity, and visualizing the data. I would like to explore the data not included in the paper from the resource allocation and feelings thermometer tasks. My ambitious goals are to run the generalized linear models described above.

Differences from Original Study

As this is a reproduction project, all facets up to data analysis will be identical. I will aim to create data visualizations beyond those included in the original paper; this is unlikely to drive any differences from the original study's results.

Design Overview

This study implemented a between-subjects 2 x 2 design in which the manipulated features were linguistic form (generic or specific) and causal origin of the characters’ features (biological vs. cultural). There were four main measures, none of which were repeated within participants: three binary essentialism measures that were then combined into a composite, and a Likert scale within-category homogeneity measure.

The authors would not have been able to conduct the study in this same manner using a within-participant design; use of one type of language would likely “contaminate” the other language trials. Though making the first half of the task use specific language and the latter half generic may save the study from contamination, repeating the measures may lead to a learning effect. Combining biological and cultural causal origins may be possible (i.e. half and half), but any potential differences may be difficult to interpret when the causes are not clearly separated. I have further thoughts on how the authors may have been able to adapt the task to within-subjects, but I do not think it would be worth it with the amount of changes that would have to be made, and I really like the current design!

The authors do not mention demand characteristics, and while I do not think this is really an issue with the child sample, I do wonder if it was clear to some of the adults that the experimenters expected them to essentialize specifically in the generic language conditions.

Thinking about confounds, I did notice that the four tasks were always conducted in the same sequential order. I wonder if randomizing the task order may make a difference, and if there is a reason the authors decided to pursue a set order. I can’t find whether the order of the answer choices (essentialist choice vs. specific choice) was randomized, and I definitely think it should be. I also do not think that the presentation order of the storybook pages was randomized; I don’t know that this matters, but it might!

Actual Sample

A final sample of 199 children was analyzed (54.27% female, Mage = 6.07 years, range = 4.50–7.95). The racial breakdown was as follows: 70.35% White, 14.07% Multiracial, 9.05% Asian, and 12.56% of Hispanic, Latinx, or Spanish descent. Two children were excluded after failing to respond to most test questions.

“Twenty additional children participated but were excluded for unsuccessful video uploads (n = 14), not speaking English throughout the duration of the study (n = 2), or for completing an insufficient number of trials (n = 4), as specified in our pre-registered exclusion criteria.”

234 adult participants (59.4% female) were included in the sample. The racial/ethnic composition of the sample was 50.86% White, 26.5% Asian, 9.4% Multiracial, 7.26% Black or African American, and 0.4% Native American; 6.84% of the adult sample identified as being of Hispanic, Latinx, or Spanish descent. 5.56% declined to provide this information.

“An additional 109 adults attempted the session but were excluded (as per our pre-registered exclusion criteria) for not completing the entire study (n = 55), failing audio verification checks (n = 30), self-reporting as a non-English speaker (n = 15), being under 18 years of age (n = 3), or failing >3 of the 5 Winograd Schema questions included in our paradigm (n = 6).”

Core Analyses

Data preparation

# Load in all libraries.
library(readr)
library(tidyverse)
library(ggplot2)
library(ggthemes)
library(visreg)
library(modelr)
library(geepack)
library(doBy)
library(effects)
library(emmeans)
# Rmisc attaches plyr after dplyr, so plyr's arrange/count/mutate/rename/summarise
# mask the dplyr versions from here on
library(Rmisc)
library(effsize)
library(interactions)
library(lme4)
library(ordinal)
library(lmerTest)
library(formattable)
library(here)
library(dplyr)
library(aod)
library(car)
library(RVAideMemoire)
library(cowplot)
library(gridExtra)
library(egg)

#### Import data
#paths are built relative to the project root that here() reports (/Users/rose/Downloads/UCSD)
zarpex_kids <- read.csv(here::here("R", "Data", "zarpex_clean_kid.csv"))
zarpex_adult <- read.csv(here::here("R", "Data", "zarpex_clean_adult.csv"))

#### Data exclusion / filtering
#most of the data is clean, but they didn't remove one kid that didn't answer most of the Qs

zarpex_kids <- zarpex_kids[-42, ]

#whew, ok! now, removing irrelevant columns
zarpex_kids <- subset(zarpex_kids, select = -c(Progress))
zarpex_kids <- subset(zarpex_kids, select = -c( privacy))
zarpex_kids <- subset(zarpex_kids, select = -c(trainingnose_1, trainingabc_1,soundcheck_1))
zarpex_kids <- subset(zarpex_kids, select = -c(Date.of.Study, Q_TotalDuration, language))

#loading in adult data and giving it a more uniform name
zarpex_adults <- zarpex_adult
rm(zarpex_adult)

#now, creating a column that shows age group
zarpex_kids$age_group <- "child"
zarpex_adults$age_group <- "adult"

#the join errors when ID has a different type in the two data frames, so make both character
zarpex_adults$ID <- as.character(zarpex_adults$ID)
zarpex_kids$ID <- as.character(zarpex_kids$ID)


full_data = full_join(zarpex_adults, zarpex_kids, by= NULL)

Joining with `by = join_by(X, ID, lang_form, origin, explain_1, explain_2,
explain_3, flex_1, flex_2, flex_3, sabcheck_1, sabcheck_2, sab_1, sab_2,
homo_1, homo_2, ra_1, ra_2, ft_1, ft_2, gender, race, hispanic, wnw,
age_group)`

#yay! now we want to make the language form and causal origin factor values not character...
full_data$lang_form <- as.factor(full_data$lang_form)
full_data$origin <- as.factor(full_data$origin)

#now we'll report some demographics
summary(zarpex_kids$age_exact)

   Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
  4.495   5.221   5.871   6.065   6.830   7.954

zarpex_kids$gender <- as.factor(zarpex_kids$gender)
summary(zarpex_kids$gender)

female   male 
   109     91

zarpex_kids$race <- as.factor(zarpex_kids$race)
summary(zarpex_kids$race)

(Missing)     Asian     Mixed     White 
       13        18        28       141

#and for adults...
zarpex_adults$gender <- as.factor(zarpex_adults$gender)
zarpex_adults$race <- as.factor(zarpex_adults$race)
zarpex_adults$wnw <- as.factor(zarpex_adults$wnw)

summary(zarpex_adults$gender)

                 female                    male Non-binary/third gender 
                    139                      93                       2

summary(zarpex_adults$race)

          Asian           Black           Mixed Native American           White 
             62              17              22               1             119 
           NA's 
             13

summary(zarpex_adults$wnw)

  nw    w NA's 
 114  119    1

#No exclusions based on comp checks, but let's look anyways just for funsies
zarpex_kids$sabcheck_1 <- as.factor(zarpex_kids$sabcheck_1)
summary(zarpex_kids$sabcheck_1)

   0    1 NA's 
  16  177    7

zarpex_kids$sabcheck_2 <- as.factor(zarpex_kids$sabcheck_2)
summary(zarpex_kids$sabcheck_2)

   0    1 NA's 
  12  186    2

#the kids r alright

#adult comp checks
zarpex_adults$trainingnose_1 <- as.factor(zarpex_adults$trainingnose_1)
summary(zarpex_adults$trainingnose_1)

   0    1 NA's 
  12  218    4

zarpex_adults$trainingabc_1 <- as.factor(zarpex_adults$trainingabc_1)
summary(zarpex_adults$trainingabc_1)

   0    1 NA's 
   1  231    2

#### Prepare data for analysis - create columns etc.
#Now we make the composite dataframe...
full_compo <- select(full_data, -c("soundcheck_1", "soundcheck_2", "trainingabc_1", "trainingnose_1", "sabcheck_1", "sabcheck_2", "homo_1", "homo_2", "ra_1", "ra_2", "ft_1", "ft_2", "wave", "wino_1", "wino_2", "wino_3", "wino_4", "wino_5"))

#and we gotta make it long: one row per participant per binary item
compo_long <- pivot_longer(
  full_compo,
  cols = c(5:12),
  names_to = c("task", "item"),
  names_sep = ("_"),
  values_to = "response")

#and let's group by ID and break items into sequence
compo_long <- compo_long %>%
  dplyr::group_by(ID) %>%
      dplyr::mutate(item = seq.int(1:8))

Confirmatory Analysis

#now I want to see the sums and averages for fun
sum(zarpex_kids$explain_1, na.rm = TRUE)

[1] 22

sum_explanation <- colSums(zarpex_kids[, c("explain_1", "explain_2", "explain_3")], na.rm = TRUE)
print(sum_explanation)

explain_1 explain_2 explain_3 
       22       141       102

avg_explanation <- colMeans(zarpex_kids[, c("explain_1", "explain_2", "explain_3")], na.rm = TRUE)
print(avg_explanation)

#for a binary (0/1) measure, the average is just the proportion of essentialist responses

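#note: plyr (loaded by Rmisc) is attached after dplyr, so summarise() here is plyr::summarise, which ignores group_by() and returns a single pooled row; dplyr::summarise() would return one row per age group
summary_explanation <- full_data %>%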
  group_by(age_group) %>%
  summarise(
    Sum_exp1 = sum(explain_1, na.rm = TRUE),
    Mean_exp1 = mean(explain_1, na.rm = TRUE),
    Sum_exp2 = sum(explain_2, na.rm = TRUE),
    Mean_exp2 = mean(explain_2, na.rm = TRUE),
    Sum_exp3 = sum(explain_3, na.rm = TRUE),
    Mean_exp3 = mean(explain_3, na.rm = TRUE)
  )
print(summary_explanation)

  Sum_exp1 Mean_exp1 Sum_exp2 Mean_exp2 Sum_exp3 Mean_exp3
1       59 0.1421687      320  0.760095      225 0.5357143

#now just peeking around at what people did on this measure, just for fun
zarpex_kids$explain_1 <- as.factor(zarpex_kids$explain_1)
summary(zarpex_kids$explain_1)

   0    1 NA's 
 170   22    8

zarpex_adults$explain_1 <- as.factor(zarpex_adults$explain_1)
summary(zarpex_adults$explain_1)

   0    1 NA's 
 186   37   11
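Because the items are scored 0/1, the mean of an item is the proportion of essentialist responses among the people who answered it. A small sketch using the explain_1 counts printed above (the helper function name is mine):

```r
# Counts of 0 and 1 responses on explain_1, copied from the summaries above
# (missing responses are left out, as with na.rm = TRUE).
kid_counts   <- c(zero = 170, one = 22)
adult_counts <- c(zero = 186, one = 37)

# Proportion of essentialist responses = number of 1s / number of answers.
prop_essentialist <- function(counts) unname(counts["one"] / sum(counts))

prop_kid    <- prop_essentialist(kid_counts)
prop_adult  <- prop_essentialist(adult_counts)
prop_pooled <- prop_essentialist(kid_counts + adult_counts)

print(round(c(kids = prop_kid, adults = prop_adult, pooled = prop_pooled), 4))
```

The pooled value should line up with the Mean_exp1 column of summary_explanation, which is a quick way to confirm that table is collapsed across age groups.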

#now let's look at how kids responded

comp_kid_means <- summarySE(subset(compo_long, age_group == "child"), measurevar = "response", groupvars = c("age_exact", "lang_form", "origin"), na.rm = TRUE) %>%
  mutate(ci_lower = response - ci,
         ci_upper = response + ci)

Warning in qt(conf.interval/2 + 0.5, datac$N - 1): NaNs produced

compo_kid_means_lang <- summarySE(subset(compo_long, age_group == "child"), measurevar = "response", groupvars = c("lang_form"), na.rm = TRUE) %>%
  mutate(ci_lower = response - ci,
         ci_upper = response + ci)

compo_kid_means_orig <- summarySE(subset(compo_long, age_group == "child"), measurevar = "response", groupvars = c("origin"), na.rm = TRUE) %>%
  mutate(ci_lower = response - ci,
         ci_upper = response + ci)

#and adults
comp_adult_means <- summarySE(subset(compo_long, age_group == "adult"), measurevar = "response", groupvars = c("lang_form", "origin"), na.rm = TRUE) %>%
  mutate(ci_lower = response - ci,
         ci_upper = response + ci)
         
compo_adult_means_lang <- summarySE(subset(compo_long, age_group == "adult"), measurevar = "response", groupvars = c("lang_form"), na.rm = TRUE) %>%
  mutate(ci_lower = response - ci,
         ci_upper = response + ci)

compo_adult_means_orig <- summarySE(subset(compo_long, age_group == "adult"), measurevar = "response", groupvars = c("origin"), na.rm = TRUE) %>%
  mutate(ci_lower = response - ci,
         ci_upper = response + ci)

#model time.. i am frightened but khuyen helped!
#let me try to work through the code here: glmer fits the model and response is the outcome. lang_form*origin*age_group enters language form, origin, and age group with all of their interactions (throwing the kitchen sink at it). (1|item) and (1|ID) are random intercepts that account for variance between items and between individuals. then we tell it what data to use and what family: binomial tells it that the responses are 0 and 1. the control argument sets the optimizer and raises its iteration limit, which the authors recommended.
comp_model <- glmer(response ~ lang_form*origin*age_group + (1|item) + (1|ID), data = compo_long, family = binomial, control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))

Anova(comp_model)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: response
                             Chisq Df Pr(>Chisq)    
lang_form                  43.2497  1  4.818e-11 ***
origin                      5.5411  1    0.01857 *  
age_group                  23.4437  1  1.286e-06 ***
lang_form:origin            1.4152  1    0.23419    
lang_form:age_group         1.2978  1    0.25461    
origin:age_group            0.6118  1    0.43413    
lang_form:origin:age_group  0.0061  1    0.93779    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#ok awesome! That was for the full sample, now let's run it for kiddos
comp_model_kid <- glmer(response ~ lang_form*origin*center_age + (1|item) +(1|ID), data = subset(compo_long, age_group == "child"), family = binomial, glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=2e5)))
summary(comp_model_kid)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: response ~ lang_form * origin * center_age + (1 | item) + (1 |  
    ID)
   Data: subset(compo_long, age_group == "child")
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  1917.0   1970.5   -948.5   1897.0     1539 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.4909 -0.9196  0.4468  0.7873  3.4711 

Random effects:
 Groups Name        Variance Std.Dev.
 ID     (Intercept) 0.09646  0.3106  
 item   (Intercept) 0.84603  0.9198  
Number of obs: 1549, groups:  ID, 199; item, 8

Fixed effects:
                                       Estimate Std. Error z value Pr(>|z|)    
(Intercept)                              0.6684     0.3517   1.901 0.057354 .  
lang_formspecific                       -0.8905     0.1742  -5.113 3.17e-07 ***
originsoc                               -0.3191     0.1853  -1.722 0.085106 .  
center_age                               0.5685     0.1538   3.695 0.000220 ***
lang_formspecific:originsoc              0.3025     0.2453   1.233 0.217564    
lang_formspecific:center_age            -0.6459     0.1919  -3.365 0.000765 ***
originsoc:center_age                    -0.4059     0.2010  -2.020 0.043422 *  
lang_formspecific:originsoc:center_age   0.4128     0.2549   1.620 0.105280    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) lng_fr orgnsc cntr_g lng_f: lng_:_ orgn:_
lng_frmspcf -0.292                                          
originsoc   -0.273  0.555                                   
center_age   0.075 -0.159 -0.143                            
lng_frmspc:  0.206 -0.703 -0.755  0.108                     
lng_frmsp:_ -0.060  0.148  0.115 -0.802 -0.100              
orgnsc:cnt_ -0.057  0.120  0.140 -0.764 -0.105  0.613       
lng_frms::_  0.045 -0.109 -0.110  0.602  0.076 -0.751 -0.788

Anova(comp_model_kid)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: response
                              Chisq Df Pr(>Chisq)    
lang_form                   31.7772  1  1.729e-08 ***
origin                       1.0697  1   0.301005    
center_age                   1.7667  1   0.183789    
lang_form:origin             1.2407  1   0.265337    
lang_form:center_age        10.5885  1   0.001138 ** 
origin:center_age            1.4572  1   0.227377    
lang_form:origin:center_age  2.6237  1   0.105280    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
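The fixed-effect estimates in the child model summary are on the log-odds scale. A hedged sketch of how one of them could be turned into an odds ratio with a Wald 95% interval, using the lang_formspecific row (estimate and standard error copied from the summary above); this is my own back-of-the-envelope conversion, not output from the authors' scripts. Because the model includes interactions, this is the effect of specific vs. generic language at the reference origin level and at mean age.

```r
# Estimate and standard error for lang_formspecific, from the child model summary.
est <- -0.8905
se  <- 0.1742

# Odds ratio: exponentiate the log-odds coefficient.
odds_ratio <- exp(est)

# Wald 95% confidence interval, computed on the log-odds scale and then exponentiated.
ci <- exp(est + c(-1, 1) * qnorm(0.975) * se)

# The intercept converts to a probability with the inverse logit (plogis):
# the predicted chance of an essentialist response in the reference cell,
# for a typical item and participant.
p_reference <- plogis(0.6684)

print(round(c(OR = odds_ratio, lower = ci[1], upper = ci[2], p_ref = p_reference), 3))
```

An odds ratio below 1 with an interval that excludes 1 would be read as lower odds of an essentialist response under specific language than under generic language.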

#aaaaand for adults, but let's remove the center age stuff since they're all adults
comp_model_adult <- glmer(response ~ lang_form*origin + (1|item) +(1|ID), data = subset(compo_long, age_group == "adult"), family = binomial, glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=2e5)))

summary(comp_model_adult)

Generalized linear mixed model fit by maximum likelihood (Laplace
  Approximation) [glmerMod]
 Family: binomial  ( logit )
Formula: response ~ lang_form * origin + (1 | item) + (1 | ID)
   Data: subset(compo_long, age_group == "adult")
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))

     AIC      BIC   logLik deviance df.resid 
  2264.5   2297.6  -1126.2   2252.5     1839 

Scaled residuals: 
    Min      1Q  Median      3Q     Max 
-2.8613 -0.7492 -0.3939  0.8231  2.7938 

Random effects:
 Groups Name        Variance Std.Dev.
 ID     (Intercept) 0.4109   0.641   
 item   (Intercept) 0.8190   0.905   
Number of obs: 1845, groups:  ID, 234; item, 8

Fixed effects:
                            Estimate Std. Error z value Pr(>|z|)    
(Intercept)                   0.1948     0.3467   0.562 0.574201    
lang_formspecific            -0.6366     0.1869  -3.406 0.000659 ***
originsoc                    -0.4081     0.1929  -2.115 0.034445 *  
lang_formspecific:originsoc   0.2185     0.2692   0.812 0.416917    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Correlation of Fixed Effects:
            (Intr) lng_fr orgnsc
lng_frmspcf -0.275              
originsoc   -0.266  0.495       
lng_frmspc:  0.190 -0.692 -0.716

Anova(comp_model_adult)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: response
                   Chisq Df Pr(>Chisq)    
lang_form        15.5174  1  8.175e-05 ***
origin            4.8247  1    0.02806 *  
lang_form:origin  0.6590  1    0.41692    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
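As a sanity check on how the two tables relate: for the highest-order term, which has 1 degree of freedom, the Wald chi-square in the Anova table should be approximately the square of the z value in the model summary. A short sketch using the lang_formspecific:originsoc row of the adult model (the z value is copied from the summary above, so small rounding differences are expected). The same shortcut does not apply to the main effects here, since the Type II tests for those are computed differently from the coefficient z tests.

```r
# z value for lang_formspecific:originsoc from the adult model summary.
z <- 0.812

# With 1 degree of freedom, the Wald chi-square statistic is z squared.
chisq <- z^2

# Upper-tail p value from the chi-square distribution with 1 df.
p <- pchisq(chisq, df = 1, lower.tail = FALSE)

print(round(c(chisq = chisq, p = p), 4))
```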

#ok now for some task-specific analysis without the composite, so we gotta prep the full dataset
full_longo <- pivot_longer(
  full_data,
  cols = c(6:23),
  names_to = c("task", "item"),
  names_sep = ("_"),
  values_to = "response")

full_longo$item <- as.numeric(full_longo$item)
full_longo$ID <- as.factor(full_longo$ID)
full_longo$age_group <- as.factor(full_longo$age_group)

#let's see how participants did on average and by task

explanation_means <- summarySE(subset(full_longo, task == "explain"), measurevar = "response", groupvars = c("lang_form", "age_group"), na.rm = TRUE)

sab_means <- summarySE(subset(full_longo, task == "sab"), measurevar = "response", groupvars = c("lang_form", "age_group"), na.rm = TRUE)

sab_means$age_group <- factor(sab_means$age_group, levels = c("child", "adult"))

#now trying to make a model looking at effect of language form on heritability task in kids
sab_model_kiddo <- glmer(response ~ lang_form*origin*center_age + (1|ID), data = subset(full_longo, task == "sab" & age_group == "child"), family = binomial(logit), glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=2e5)))

boundary (singular) fit: see help('isSingular')

Anova(sab_model_kiddo)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: response
                             Chisq Df Pr(>Chisq)  
lang_form                   3.6772  1    0.05516 .
origin                      0.7358  1    0.39101  
center_age                  0.1680  1    0.68194  
lang_form:origin            0.4477  1    0.50341  
lang_form:center_age        0.1593  1    0.68976  
origin:center_age           0.3596  1    0.54872  
lang_form:origin:center_age 0.2947  1    0.58720  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#now, for the flexibility task
flex_model_kiddo <- glmer(response ~ lang_form*origin*center_age + (1|ID), data = subset(full_longo, task == "flex" & age_group == "child"), family = binomial(logit), glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=2e5)))

Anova(flex_model_kiddo)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: response
                             Chisq Df Pr(>Chisq)   
lang_form                   3.6637  1    0.05561 . 
origin                      2.4983  1    0.11397   
center_age                  0.4498  1    0.50242   
lang_form:origin            0.0991  1    0.75286   
lang_form:center_age        8.5009  1    0.00355 **
origin:center_age           0.3828  1    0.53611   
lang_form:origin:center_age 0.6469  1    0.42123   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#and explanation task
explain_model_kiddo <- glmer(response ~ lang_form*origin*center_age + (1|ID), data = subset(full_longo, task == "explain" & age_group == "child"), family = binomial(logit), glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=2e5)))

boundary (singular) fit: see help('isSingular')

Anova(explain_model_kiddo)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: response
                              Chisq Df Pr(>Chisq)    
lang_form                   29.8423  1  4.687e-08 ***
origin                       0.7320  1    0.39224    
center_age                  10.4749  1    0.00121 ** 
lang_form:origin             1.1255  1    0.28874    
lang_form:center_age         2.1355  1    0.14392    
origin:center_age            0.7765  1    0.37822    
lang_form:origin:center_age  1.4332  1    0.23124    
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Results

Author’s Note: In the interest of time, I will report the results based on the analyses using the composite essentialism scores. I ran models showing that children’s performance was similar across tasks, but there was variability in adults’ task-specific performance as illustrated in the original paper.

Children were significantly more likely to choose essentialist responses in the generic language condition (M = .61, 95% CI [0.57, 0.65]) than in the specific language condition (M = .47, 95% CI [0.43, 0.50]) (main effect of language form, Wald χ²(1) = 31.78, p < .001). There was a significant interaction between language form and mean-centered age (Wald χ²(1) = 10.59, p = .001), such that the heightening effect of generic language on essentialism increased with age. In contrast, age had no meaningful effect on the responses of children who heard specific language. There were no significant effects of causal origin.

Adults were also more likely to choose essentialist responses after hearing characteristics described using generic language (M = .50, 95% CI [0.47, 0.53]) than after hearing specific language (M = .39, 95% CI [0.36, 0.43]) (main effect of language form, Wald χ²(1) = 15.52, p < .001). Unlike in the child sample, there was also a main effect of causal origin such that adults were significantly more likely to choose essentialist responses when the causal origin was biological (M = .47, 95% CI [0.44, 0.51]) versus cultural (M = .42, 95% CI [0.38, 0.45]) (Wald χ²(1) = 4.82, p = .028). There was no significant interaction between language form and causal origin.
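As a quick check on the p-values reported with each Wald test, the upper-tail probability can be recomputed from the chi-square statistic and its degrees of freedom. This is a small sketch using the adult composite statistics copied from the `Anova(comp_model_adult)` table; nothing here depends on the fitted model itself.

```r
# Wald chi-square statistics copied from Anova(comp_model_adult), each with 1 df
wald <- c(lang_form = 15.5174,
          origin = 4.8247,
          `lang_form:origin` = 0.6590)

# Upper-tail probability of a chi-square distribution with 1 degree of freedom
p_values <- pchisq(wald, df = 1, lower.tail = FALSE)
signif(p_values, 4)
```

The same call works for any single statistic, for example `pchisq(10.59, df = 1, lower.tail = FALSE)` for the language form by age interaction in the child model.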

Figures

#let's visualize how kids did with the whole composite, not getting into task-specific, just looking at the main effects and age trends
ggplot(data = comp_kid_means,
       mapping = aes(x = age_exact,
                     y = response,
                     color = lang_form)) +
  geom_point() +
    stat_smooth(aes(group = lang_form), method = "glm", level = 0.95) +
   ggtitle("Language Form and Essentialism") +
  labs(x = "Child Age (exact)", 
      y = "% Providing Essentialist Response",
      color = "Language Form") +
  scale_color_manual(
    values = c("purple", "sky blue"),
    labels = c("Generic", "Specific")) +
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = "bottom", legend.direction = "horizontal",
        panel.grid.major.x = element_line(color = "black",
                                          linewidth = 0.5,
                                          linetype = 2))

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 1 row containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).

  #omg this is beautiful

ggplot(data = comp_kid_means,
       mapping = aes(x = age_exact,
                     y = response,
                     color = origin)) +
  geom_point() +
    stat_smooth(aes(group = origin), method = "glm", level = 0.95) +
   ggtitle("Causal Origin and Essentialism Across Development") +
  labs(x = "Child Age (exact)", 
      y = "% Providing Essentialist Response",
      color = "Causal Origin") +
  scale_color_manual(
    values = c("orange", "darkgreen"),
    labels = c("Biological", "Cultural")) +
  theme(plot.title = element_text(hjust = 0.5),
        legend.position = "bottom", legend.direction = "horizontal",
        panel.grid.major.x = element_line(color = "black",
                                          linewidth = 0.5,
                                          linetype = 2))

`geom_smooth()` using formula = 'y ~ x'

Warning: Removed 1 row containing non-finite outside the scale range
(`stat_smooth()`).

Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).

  #also beautiful but the colors are giving peas and carrots

ggplot(data = compo_long %>% 
         filter(age_group == "adult"),
       mapping = aes(y = response,
                     x = lang_form,
                     fill = origin)) +
  ylim(0, 1) +
  labs(x = "Language Form", 
      y = "% Providing Essentialist Response",
      fill = "Causal Origin") +
  stat_summary(fun = "mean",
               geom = "bar",
               position = position_dodge(width = .9),
               color = "black") +
  stat_summary(fun.data = "mean_cl_boot",
               geom = "pointrange",
               position = position_dodge(width = .9))

Warning: Removed 27 rows containing non-finite outside the scale range
(`stat_summary()`).
Removed 27 rows containing non-finite outside the scale range
(`stat_summary()`).

comp_adult_points <- subset(compo_long, age_group == "adult") %>%
  dplyr::group_by(ID, lang_form, origin,.drop = TRUE) %>%
 dplyr::summarise(response_mean = mean(response, na.rm = TRUE))

`summarise()` has grouped output by 'ID', 'lang_form'. You can override using
the `.groups` argument.

comp_adult_points %>%
  ggplot(aes(x = lang_form, y = response_mean, color = origin, shape = origin)) + 
  geom_jitter(width = 0.20, 
              height = 0) + 
   labs(x = "Language Form", 
      y = "% Providing Essentialist Response") +
  ggtitle("Causal Origin and Essentialism in Adults") +
  stat_summary(fun.data = "mean_cl_boot",
               geom = "errorbar",
               width = 0.3)

#khuyen and janna helped so much with this, literal lifesavers

ggplot(data = compo_long,
       mapping = aes(y = response,
                     x = lang_form,
                     fill = age_group)) +
  stat_summary(fun = "mean",
               geom = "bar",
               position = position_dodge(width = .9),
               color = "black") +
    ylim(0, 1) +
  labs(x = "Language Form", 
      y = "% Providing Essentialist Response",
      fill = "Age") +
  ggtitle("Children's and Adults' Essentialist Responses by Language Form") +
  stat_summary(fun.data = "mean_cl_boot",
               geom = "errorbar",
               position = position_dodge(width = .9),
               width = 0.3)

Warning: Removed 78 rows containing non-finite outside the scale range
(`stat_summary()`).
Removed 78 rows containing non-finite outside the scale range
(`stat_summary()`).

ggplot(data = compo_long,
       mapping = aes(y = response,
                     x = origin,
                     fill = age_group)) +
  stat_summary(fun = "mean",
               geom = "bar",
               position = position_dodge(width = .9),
               color = "black") +
    ylim(0, 1) +
  labs(x = "Causal Origin", 
      y = "% Providing Essentialist Response",
      fill = "Age") +
  ggtitle("Children's and Adults' Essentialist Responses by Causal Origin") +
  stat_summary(fun.data = "mean_cl_boot",
               geom = "errorbar",
               position = position_dodge(width = .9),
               width = 0.3)

Warning: Removed 78 rows containing non-finite outside the scale range
(`stat_summary()`).
Removed 78 rows containing non-finite outside the scale range
(`stat_summary()`).

Exploratory Analyses

The dataset included two exploratory measures that were not reported in the paper or preregistration: resource allocation (RA) and a feelings thermometer (FT), which measure intended behavior towards and feelings about group members. As RA was a binary measure like the tasks used in our composite and FT was not, I only analyzed the RA measure. In the child sample, there were no significant main effects or interactions.

#let's look at their exploratory measures. first, let's see what all measures we have:
table(full_longo$task)


     explain         flex         homo           ra          sab     sabcheck 
        1302         1302          868          868          868          868 
  soundcheck  trainingabc trainingnose 
         868          434          434

#ah yes - I remember, I dropped one exploratory measure because it was not binary, and this project makes it difficult to look at likert-scale data. so let's just look at resource allocation (RA)
ra_means <- summarySE(subset(full_longo, task == "ra"), measurevar = "response", groupvars = c("lang_form", "age_group"), na.rm = TRUE)

ra_model_kiddo <- glmer(response ~ lang_form*origin*center_age + (1|ID), data = subset(full_longo, task == "ra" & age_group == "child"), family = binomial(logit), glmerControl(optimizer = "bobyqa", optCtrl=list(maxfun=2e5)))

boundary (singular) fit: see help('isSingular')

Anova(ra_model_kiddo)

Analysis of Deviance Table (Type II Wald chisquare tests)

Response: response
                             Chisq Df Pr(>Chisq)  
lang_form                   3.0536  1    0.08056 .
origin                      0.7069  1    0.40048  
center_age                  0.0505  1    0.82220  
lang_form:origin            0.0512  1    0.82101  
lang_form:center_age        0.0349  1    0.85182  
origin:center_age           0.2993  1    0.58429  
lang_form:origin:center_age 0.0204  1    0.88636  
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

#ok - I don't know what to do about singularity, or what that means. But it looks like there's nothing of significance.
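
On the singular-fit message: as far as I understand it, lme4 prints it when a random-effect variance is estimated at (or very near) zero, which can easily happen when each participant contributes only a couple of binary trials to a task. Below is a rough sketch of how one could inspect this. The data are simulated and entirely hypothetical (the names `toy`, `fit`, and `fit_glm` are mine), so whether this particular fit turns out singular is not guaranteed.

```r
library(lme4)

set.seed(1)
# Hypothetical toy data: 60 participants, 2 binary trials each,
# simulated with no true participant-level variation
toy <- data.frame(
  ID = factor(rep(1:60, each = 2)),
  lang_form = rep(c("generic", "specific"), each = 60),
  response = rbinom(120, size = 1, prob = 0.5)
)

fit <- glmer(response ~ lang_form + (1 | ID), data = toy, family = binomial)

# Check whether any variance component sits on the boundary
isSingular(fit)
# Inspect the estimated random-intercept standard deviation
VarCorr(fit)

# For comparison: a plain logistic regression without the random intercept
fit_glm <- glm(response ~ lang_form, data = toy, family = binomial)
cbind(glmer = fixef(fit), glm = coef(fit_glm))
```

If the `ID` variance is effectively zero, the mixed model and the plain logistic regression should give very similar fixed-effect estimates, which is one informal way to judge whether the singular fit changes the conclusions.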



ggplot(data = full_longo %>%
       filter(task == "ra"),
       mapping = aes(y = response,
                     x = lang_form,
                     fill = age_group,
                     shape = origin)) +
  ylim(0, 1) +
  labs(x = "Language Form", 
      y = "% Providing Essentialist Response",
      fill = "Age Group",
      shape = "Causal Origin") +
   ggtitle("Children's and Adults' Essentialist Responses in Resource Allocation Task") +
  stat_summary(fun = "mean",
               geom = "bar",
               position = position_dodge(width = .9),
               color = "black") +
  stat_summary(fun.data = "mean_cl_boot",
               geom = "pointrange",
               position = position_dodge(width = .9))

Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_summary()`).
Removed 15 rows containing non-finite outside the scale range
(`stat_summary()`).

Discussion

Summary of Reproduction Attempt

In summary, this study investigated whether varying the linguistic form (generic vs. specific language) and causal origin (biological vs. cultural) used to describe the properties of a novel social group leads children and adults to essentialize that group. Social essentialism was measured using a series of tasks, three of which were combined to generate a composite score that was then transformed to represent the proportion of essentialist responses given.
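To make the composite concrete, here is a minimal sketch of turning binary item responses into a per-participant proportion of essentialist responses. The data frame and its values are hypothetical toy data, and the column name `prop_essentialist` is my own label rather than anything from the authors' files.

```r
# Hypothetical long-format data: 2 participants x 4 binary items
# (1 = essentialist response, 0 = non-essentialist, NA = no answer)
toy_long <- data.frame(
  ID = factor(rep(c("p1", "p2"), each = 4)),
  item = rep(1:4, times = 2),
  response = c(1, 1, 0, 1, 0, NA, 0, 1)
)

# Mean of the binary responses per participant = proportion essentialist;
# the formula interface of aggregate() drops rows with NA responses by default
composite <- aggregate(response ~ ID, data = toy_long, FUN = mean)
names(composite)[2] <- "prop_essentialist"
composite
```

Participant `p1` gets 0.75 (three of four items), and `p2` gets one of three answered items, since the missing response is left out of the denominator.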

My analyses using this composite found that children were significantly more likely to endorse essentialist responses after hearing generic language used to describe group-specific features, and that this effect strengthened with age in a sample of children ages 4-8. Among children who heard specific language, essentialist responding did not change meaningfully with age. I observed no main effect of the causal origin manipulation and no interaction between linguistic form and causal origin in children. Generic language also increased essentialist responding in the adult sample; however, while causal origin did not affect children’s responses, adults were significantly more likely to give essentialist responses when the causal origin was biological rather than cultural. I was pleased to find that all of these results reproduced the original results completely, and I was able to successfully recreate the authors’ visualizations of this data.

Given how tidy and well-annotated the open-source data was (and the fact that the authors made their code available as well), I had high hopes for this reproduction attempt. As this study served as the inspiration for my first-year PhD project and represents the type of developmental dataset that I will realistically have to analyse for my own work, I was highly motivated to understand the mechanisms of the reproduction and did not just blankly regurgitate the original code. However, the authors’ provided code did serve as a great foundation for which packages to use, a reference point for when I got stuck, and guidance when I simply did not know how to start. I took advantage of ChatGPT, Blackbox AI, and the TAs to help me interpret the authors’ original code, including figuring out what it was doing, what it meant, and how it contributed to the overall interpretation of the data. Throughout the project (especially in visualization), I referenced many of our in-class R exercises, which provided a helpful foundation and code snippets that were proven to work and that I could adapt to fit my needs. I had to really hone my troubleshooting skills throughout this process, and many of my most frustrating errors were syntactical in nature. Author’s note: Thank you to Khuyen and Janna, whose clever eyes caught many a typo and missed dash, and who gracefully handled many of my “IT’S NOT WORKING!” exclamations in class (and death-glares at my laptop screen).

Commentary

Commentary: Progress Report 1

I began by loading up all of the libraries needed according to the OSF files. As it stands, data is fairly clean and well-documented, broken into separate files for adult and child data. I have removed the two child subjects who did not answer the majority of the questions, as they were not included in the primary analyses. No exclusions are necessary from the clean adult data. I then made a new data frame excluding some columns that were not relevant, such as time taken to complete the study and level of media release for study recordings, just to tidy things up.

Then I made a new column in each data frame for age group (adult vs. child), and then combined my two data frames using “full join.” I then changed various columns from character to factor variables for analysis: language and causal origin condition, gender, race, response to comprehension checks. I ran a summary for each demographic factor to report below, and ran summaries of the comprehension checks to get a sense of how they were doing, even though the authors did not exclude participants based on this.
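For anyone following along, the combining step described above looked roughly like the sketch below. The two small data frames are hypothetical stand-ins for the real adult and child files, the ID and origin values are made up for illustration, and I am assuming dplyr is loaded.

```r
library(dplyr)

# Hypothetical stand-ins for the cleaned child and adult files
child <- data.frame(ID = c("c01", "c02"),
                    lang_form = c("generic", "specific"),
                    origin = c("bio", "soc"),
                    age_exact = c(5.2, 7.8))
adult <- data.frame(ID = c("a01", "a02"),
                    lang_form = c("specific", "generic"),
                    origin = c("soc", "bio"))

# Tag each data frame with its age group before combining
child$age_group <- "child"
adult$age_group <- "adult"

# Full join on the shared columns keeps every row from both samples;
# columns that exist in only one file (here age_exact) are filled with NA
full_data <- full_join(child, adult,
                       by = c("ID", "lang_form", "origin", "age_group")) %>%
  mutate(across(c(lang_form, origin, age_group), as.factor))

str(full_data)
```

Since no participant appears in both files, `bind_rows(child, adult)` would stack the rows in the same way; the full join is shown here only because it is what I actually used.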

My next step is to generate the composite; I’ve worked to make a new data frame with the relevant columns for the composite, but I need to think about how best to proceed.

Commentary: Progress Report 2

First, I did some peeking at how participants did on the measures just to get a sense of the data and to get more practice generating sums and averages. I then constructed the data frame for the essentialism composite (and made it long!). Next, with the help of Khuyen during office hours, I ran a Generalized Linear Mixed-Effects Model using the glmer function specified by the authors. I then ran an Anova on the model. I used ChatGPT to help me interpret the results and figure out how to make sense of the numbers and… they reproduced! The numbers match the authors’ completely and the interpretation of the results is identical. Now that I knew I was on the right track, I pulled the child sample from the full composite and ran the same GLMM and Anova. Yet another reproduction! Then, I did the same with the adult subset of the composite. Now, all of the models have been run, and the numbers match the authors’ – woohoo! Next I’ll need to do visualizations, and I hope that tomorrow’s reading and this week’s classes will help with that.

Commentary: Progress Report 3

I ran a bunch more models and got to work on my visualizations, something I was really scared of. I attended Pria’s talk on LMMs, and it was so helpful in understanding what I’m doing and what it means! In response to prior progress checks, I finally included my code in this progress report (sorry Khuyen!) and added an explanation of the linguistic form manipulation. Some of my code isn’t working in the rendered report and I don’t know what’s up with that honestly - it’s working in my QMD, so I am about to go postal. Update: it was a missing asterisk.

Final Commentary

Going into this project, I picked a paper that I have been obsessed with for a while now and think is beautifully done. I expected to find major problems and bones to pick with the authors as I worked to reproduce the project, but that was not the case - I still highly respect the work and have no major objections. The authors set me up nicely for my reproduction and I think this is a great example of how open science should be done - clean and generally well-documented data, easily accessible reference materials for the tasks, and well-annotated code (with one funny chunk of documentation about how R was randomly not working and refusing to process something, and they had to do this convoluted work-around).

That said, I found a couple of things in the data that are not super well documented, but they seem to be minor and have no obvious implications for the paper’s findings. One of the exclusion criteria relates to Winograd Schema questions, but the authors never explain what those are or why they’re in the paradigm, and they aren’t used in any of the analyses. After googling it, I think it’s to collect a baseline of how adults interpret generic language ambiguity? I’m not sure, and I am curious. There is also a column in the data reporting political affiliation, with no mention of how that data was collected or how liberal/conservative was determined, and it was not used in any of the analyses.

Besides these minor data confusions, I do have one point of uncertainty in the paradigm: the essentialism measures. The use of the “composite essentialism score” makes sense to me, but having one task that does not fall into that (the generalizability task, where participants are shown a quality of the social group and asked how many other group-members also have that quality using a Likert scale) feels a bit disjointed, and I didn’t know how best to handle that one measure. Maybe adapting the Likert scale into a binary (something like, less than half of the other group members = 0, more than half = 1) would allow this measure to be included in the composite.
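A minimal sketch of what that binarization could look like is below. The cutoff is purely my assumption: I treat ratings above the scale midpoint as essentialist (1), ratings below it as non-essentialist (0), and leave the midpoint itself as missing, since "exactly half" does not fall on either side.

```r
# Hypothetical 5-point Likert ratings for the generalizability task
likert <- c(1, 2, 3, 4, 5, NA)

# Assumed rule: above midpoint = 1, below midpoint = 0, midpoint or missing = NA
binary <- ifelse(likert > 3, 1, ifelse(likert < 3, 0, NA))
binary
```

Dropping midpoint responses would shrink the usable data for this task, so an alternative rule (for example, counting 3 as 0) might be worth comparing before folding the measure into the composite.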

Finally, while I categorize this attempt as a success, I recognize that I did not do much of the additional analysis that I had initially planned in my key reproducibility criteria. I underestimated how much learning I had to do and how much work actually needed to be done to cover the main analyses, and how carefully I had to work to make the figures. I did do some exploratory analysis and visualization in the resource allocation measure, but that brought me to my limit.