Reproduction of The influence of linguistic form and causal explanations on the development of social essentialism by Benitez, Leshin, and Rhodes (2022, Cognition)
Author
Rose Reagan (reagan@ucsd.edu)
Published
December 11, 2024
Introduction
The paper I have chosen is “The influence of linguistic form and causal explanations on the development of social essentialism” by Josie Benitez, Rachel Leshin, and Marjorie Rhodes. As my first-year project is investigating how the way we talk to kids about social differences may influence the prevalence of essentialist thinking, this developmental psychology paper is incredibly relevant to my interests and one I quite enjoy.
This study includes a child sample and an adult sample in which participants were presented with a storybook where they learned about a novel group called the Zarpies. There were four between-subjects conditions, each following one of four combinations of language form and causal origin information. Participants then completed four tasks assessing different aspects of essentialism. Three of the four were binary-choice tasks in which a score of 1 indicated an essentialist response; these were then combined into an “Essentialism Composite” score out of 4. The fourth task used a 5-point Likert scale in which higher numbers indicated greater endorsement of essentialism.
In the original paper, the adult and child samples were analyzed independently given the vast age-related differences in results. Means were generated for the composite and for each binary task to indicate the probability of endorsing an essentialist response in a given task, with 95% confidence intervals. The data for the composite and binary measures were analyzed using generalized linear mixed models (GLMMs); the authors noted that they used the glmer function in R’s lme4 package, specified for a binomial distribution. They then conducted Wald chi-squared tests on these models to assess significance. Effect sizes were reported as coefficients and translated into odds ratios (i.e., adults are x times more likely to endorse essentialism than children). For the Likert scale measure, means were also calculated, and the authors used mixed ordinal logistic regression models for further analysis. Significance was calculated using likelihood ratio tests. Mixed effects models tested for effects of language form and causal origin (recall that the four conditions each used a unique combination of the two). Participant mean-centered age was used as a predictor in analysis, and simple slopes follow-up tests probed the resulting interactions.
A main challenge here will be my complete inexperience with R and analysis more generally. I currently know next to nothing about what the above paragraph really means and writing it felt like adapting French, a language I do not speak. Thus, there will be quite the learning curve as I undertake this project. In terms of key reproducibility criteria, I will attempt to reproduce all of the above measures while doing additional visualization; I would like to chart out results from both the child and adult samples in ways not included in the original paper. I will also collaborate with the course instructors to brainstorm and complete additional exploratory analysis, looking at any potential patterns based on factors such as gender.
A sample size of 200 child participants and 200 adult participants was determined based on a power analysis of effects obtained by the authors in a preceding paper, Leshin et al. 2021. For children, the power analysis sample of 200 was then inflated by the researchers to a general sample of 220 in order to account for an anticipated 10% participant drop rate, the standard for studies using the lab’s recruitment venue.
Project Note: After conversation with course staff, it was determined that conducting a post-hoc power analysis is quite difficult with complex generalized effects models and out of scope for this project.
Planned Sample
A target sample of 220 child participants was recruited from a remote developmental research platform. Participants were not excluded based on responses to comprehension questions or manipulation checks. A random subset of 20% of the study videos was coded for parental interference during the study; if the rate of interference for this subset was not within the bounds of interference identified in prior work using the platform (~1%), trial-by-trial interference coding would be conducted.
200 adult participants were recruited, half in-person with undergraduate students at the host institution and half online via Prolific due to the emergence of the COVID-19 pandemic. Adult participants were excluded if they failed audio verification comprehension checks, did not complete the full study, did not meet eligibility requirements (i.e., identified as a non-English speaker or as being under 18 years of age), or failed >3/5 of the Winograd Schema questions included in the paradigm. The sample was 59.4% female, 50.86% White, 26.5% Asian, 9.4% Multiracial, 6.84% Latinx, and 7.26% Black/African American, with 5.56% declining to provide racial/ethnic demographic information.
Materials
This study was conducted digitally with stimuli consisting of a video file walking participants through a narrated storybook followed by four digital question-based tasks. Subjects participated using their or their family’s own computer. No other materials were used.
Procedure
This study used a fully crossed, between-participant 2 x 2 design in which each participant was randomly assigned to one of four experimental conditions.
“Prior to beginning the test trials, participants underwent a warm-up phase to introduce them to the biological and cultural origin explanations. After hearing each causal origin explanation, participants were asked two comprehension questions: one that probed the causal origin of being able to smell things (e.g., “What about being able to smell things with your nose? Is that something that you were born with, or something you learned from other people?”), and one that probed the causal origin of knowing the ABCs (e.g., “What about knowing the ABCs? Is that something you were born with, or something you learned from other people?”).”
“Participants were guided through a narrated storybook about a novel category of people referred to as “Zarpies.” The storybook contained 16 pages, each depicting an individual Zarpie displaying a novel property (e.g., having stripes in their hair) or engaging in a novel behavior (e.g., drawing stars on their knees). Each page of the storybook included a one-line description of the depicted property, which followed one of four combinations of language form and causal origin information: (a) generic form/biological origin, (b) generic form/cultural origin, (c) specific form/biological origin, or (d) specific form/cultural origin.” In terms of the linguistic manipulation, descriptors of behavior used either generic form language (“Zarpies sleep in tall trees”) or specific form language (“This Zarpie sleeps in tall trees”).
After participants viewed the storybook, they completed four measures of essentialist beliefs about Zarpies:
Essentialism Measure #1: Category-based explanations of properties Participants will hear three Zarpie properties from the storybook and will be asked to determine whether the property reflects features of the category (e.g., “A lot of Zarpies like to do X”) or of the individual (e.g., “This Zarpie likes to do X”).
Essentialism Measure #2: Flexibility of category-linked properties Participants will be told about traits or behaviors exhibited by Zarpies and will be asked whether or not they believe the Zarpie exclusively demonstrates these traits (and not others).
Essentialism Measure #3: Heritability of category-linked properties Participants will hear a story about a fictitious child who was born to a Zarpie mom but was raised by a non-Zarpie mom, and will be asked two manipulation check questions to ensure that they understood the story. To assess beliefs about the heritability of category-linked properties, participants will then be asked to make predictions about what the child will be like in the future, specifically whether the child will possess properties of the Zarpie parent or the non-Zarpie parent.
Essentialism Measure #4: Within-category homogeneity Participants will be presented with information about two different properties exhibited by a Zarpie and will be asked to predict how many other members of the category “Zarpie” exhibit that same trait: (a) only one, (b) a few, (c) some, (d) most, or (e) all. Participants will receive training on how to use the visual scale to indicate their response before being asked the two target questions.
The first three tasks involved forced-choice binary questions, and participants were given a score of 1 for essentialist responses and 0 for non-essentialist responses. An “essentialism composite” was generated from the first three measures. As the last item was a Likert scale, it was analyzed separately and not included in the composite.
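The scoring scheme described above can be sketched in a few lines of base R. This is only an illustration with made-up responses, not the study data: the column names (`ID`, `task`, `response`) mirror the long-format data used later in this report, while the toy values and the helper name `composite_score` are hypothetical.

```r
# Hypothetical long-format responses for two participants.
# Binary tasks: 1 = essentialist response, 0 = non-essentialist response.
toy <- data.frame(
  ID       = rep(c("p1", "p2"), each = 4),
  task     = rep(c("explain", "flex", "sab", "homo"), times = 2),
  response = c(1, 0, 1, 4,   # p1 ("homo" is on the 1-5 Likert scale)
               0, 0, 1, 2)   # p2
)

# Composite = proportion of essentialist responses across the binary tasks only;
# the Likert-scale homogeneity task is left out and analyzed separately.
composite_score <- function(d) {
  binary <- d[d$task %in% c("explain", "flex", "sab"), ]
  aggregate(response ~ ID, data = binary, FUN = mean)
}

scores <- composite_score(toy)
print(scores)
```

Expressing the composite as a proportion keeps it comparable across participants who skipped an item, since each participant's mean is taken over the items they answered.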
Authors also included two exploratory measures about participants’ attitudes and intended behavior towards Zarpies (resource allocation task and feelings thermometer task). As these tasks were not central to the research questions, they were not used heavily in this paper.
Analysis Plan
“We intend to run generalized linear models from the package lme4 to examine the effects of language form and causal explanations on children’s essentialist beliefs about novel social categories, including their perceptions of (a) category-based explanations, (b) flexibility of category membership, (c) the heritability of category-linked properties. We also intend to use mixed ordinal logistic regression models from the ordinal package to assess the effects of language form and causal explanations on children’s perceptions of within-category homogeneity.”
“Follow-Up Comparisons: If we find significant three-way interactions between language form, origin, and age-group on any of our dependent measures (see models 1f – 5f), we will conduct pairwise follow-up tests on the adult and child samples to determine the nature of the language*origin interaction across age-groups. If we find significant three-way interactions between language form, origin, and mean-centered age within our child sample (see models 1c – 1f), we will dichotomize age into “old” and “young” via a median-split, and use pairwise follow-up tests to analyze the language*origin interaction for “old” and “young” children. Based on the two sets of follow-up tests described above, we will further investigate either (a) the slow emergence of an adult-like pattern across age, or (b) the qualitatively distinct patterns across and within different age groups. All follow-up comparisons will be conducted using functions from the emmeans package (e.g., emmeans, emtrends).
For the binary DVs (explanation items, flexibility items, and switched-at-birth items) we will report beta coefficients from the GLMER results, along with means that will be reported as the probability of providing an essentialist response with 95% confidence intervals, and report odds ratios as indicators of effect sizes. For our mixed-effects ordinal logistic regression analyses (homogeneity items), we will measure goodness of fit and report generalized R², Pearson’s χ² likelihood ratios, and odds ratios with 95% confidence intervals, with higher numbers indicating broader generalization.”
My modest goals involve generating the essentialism composite and identifying means and standard deviations, exploring correlations between the essentialism measures as an investigation into convergent validity, and visualizing the data. I would like to explore the data not included in the paper from the resource allocation and feelings thermometer tasks. My ambitious goals are to run the generalized linear models described above.
Differences from Original Study
As this is a reproduction project, all facets up to data analysis will be identical. I will aim to create data visualizations beyond those in the original paper; this likely will not drive any differences from the original study.
Design Overview
This study implemented a between-subjects 2 x 2 design in which the manipulated features were linguistic form (generic or specific) and causal origin of the characters’ features (biological vs. cultural). There were four main measures, none of which were repeated within participants: three binary essentialism measures that were then combined into a composite, and a Likert scale within-category homogeneity measure.
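As a rough sketch of what a fully crossed 2 x 2 between-subjects design looks like in code, the four cells can be enumerated and participants spread across them. The participant count, IDs, and the balanced assignment below are assumptions for illustration only; the original study's actual randomization procedure is not described in this report.

```r
# The four cells of the fully crossed 2 x 2 design.
conditions <- expand.grid(
  lang_form = c("generic", "specific"),
  origin    = c("biological", "cultural"),
  stringsAsFactors = FALSE
)

# Hypothetical assignment of 8 participants: every cell is used equally often,
# and the order of assignment is shuffled at random.
set.seed(1)
n <- 8
cell_index <- sample(rep(seq_len(nrow(conditions)), length.out = n))
assignment <- conditions[cell_index, ]
assignment$ID <- paste0("p", seq_len(n))

# Each participant sees exactly one language form and one causal origin.
print(table(assignment$lang_form, assignment$origin))
```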
The authors would not have been able to conduct the study in this same manner using a within-participant design; use of one type of language would likely “contaminate” the other language trials. Though making the first half of the task use specific language and the latter half generic may save the study from contamination, repeating the measures may lead to a learning effect. Combining biological and cultural causal origins may be possible (i.e. half and half), but any potential differences may be difficult to interpret when the causes are not clearly separated. I have further thoughts on how the authors may have been able to adapt the task to within-subjects, but I do not think it would be worth it with the amount of changes that would have to be made, and I really like the current design!
The authors do not mention demand characteristics, and while I do not think this is really an issue with the child sample, I do wonder if it was clear to some of the adults that the experimenters expected them to essentialize specifically in the generic language conditions.
Thinking about confounds, I did notice that the four tasks were always conducted in the same sequential order. I wonder if randomizing the task order may make a difference, and if there is a reason the authors decided to pursue a set order. I can’t find if the order of the answer choices (essentialist choice vs. specific choice) was randomized, and I definitely think they should be. I also do not think that the presentation order of the storybook pages was randomized; I don’t know that this matters, but it might!
Actual Sample
A final sample of 199 children was analyzed (54.27% female, Mage = 6.07 years, range = 4.50–7.95). The racial breakdown was as follows: 70.35% White, 14.07% Multiracial, 9.05% Asian, and 12.56% of Hispanic, Latinx, or Spanish descent. Two children were excluded after failing to respond to most test questions.
“Twenty additional children participated but were excluded for unsuccessful video uploads (n = 14), not speaking English throughout the duration of the study (n = 2), or for completing an insufficient number of trials (n = 4), as specified in our pre-registered exclusion criteria.”
234 (59.4% female) adult participants were included in the sample. The racial/ethnic composition of our sample was 50.86% White, 26.5% Asian, 9.4% Multiracial, 7.26% Black or African American, 0.4% Native American, and 6.84% of the adult sample identified as being of Hispanic, Latinx, or Spanish descent. 5.56% declined to provide this information.
“An additional 109 adults attempted the session but were excluded (as per our pre-registered exclusion criteria) for not completing the entire study (n = 55), failing audio verification checks (n = 30), self-reporting as a non-English speaker (n = 15), being under 18 years of age (n = 3), or failing >3 of the 5 Winograd Schema questions included in our paradigm (n = 6).”
Core Analyses
Data preparation
# Load in all libraries.
library(readr)
library(tidyverse)
── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr 1.1.4 ✔ purrr 1.0.2
✔ forcats 1.0.0 ✔ stringr 1.5.1
✔ ggplot2 3.5.1 ✔ tibble 3.2.1
✔ lubridate 1.9.3 ✔ tidyr 1.3.1
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag() masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
Attaching package: 'doBy'
The following object is masked from 'package:dplyr':
order_by
library(effects)
Loading required package: carData
lattice theme set by effectsTheme()
See ?effectsTheme for details.
library(emmeans)
Welcome to emmeans.
Caution: You lose important information if you filter this package's results.
See '? untidy'
library(Rmisc)
Loading required package: lattice
Loading required package: plyr
------------------------------------------------------------------------------
You have loaded plyr after dplyr - this is likely to cause problems.
If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
library(plyr); library(dplyr)
------------------------------------------------------------------------------
Attaching package: 'plyr'
The following objects are masked from 'package:dplyr':
arrange, count, desc, failwith, id, mutate, rename, summarise,
summarize
The following object is masked from 'package:purrr':
compact
Loading required package: Matrix
Attaching package: 'Matrix'
The following objects are masked from 'package:tidyr':
expand, pack, unpack
library(ordinal)
Attaching package: 'ordinal'
The following object is masked from 'package:doBy':
income
The following object is masked from 'package:dplyr':
slice
library(lmerTest)
Attaching package: 'lmerTest'
The following object is masked from 'package:lme4':
lmer
The following object is masked from 'package:stats':
step
library(formattable)
library(here)
here() starts at /Users/rose/Downloads/UCSD
Attaching package: 'here'
The following object is masked from 'package:plyr':
here
library(dplyr)
library(aod)
library(car)
Attaching package: 'car'
The following object is masked from 'package:dplyr':
recode
The following object is masked from 'package:purrr':
some
library(RVAideMemoire)
*** Package RVAideMemoire v 0.9-83-7 ***
Attaching package: 'RVAideMemoire'
The following object is masked from 'package:lme4':
dummy
The following object is masked from 'package:modelr':
bootstrap
library(cowplot)
Attaching package: 'cowplot'
The following object is masked from 'package:ggthemes':
theme_map
The following object is masked from 'package:lubridate':
stamp
library(gridExtra)
Attaching package: 'gridExtra'
The following object is masked from 'package:dplyr':
combine
library(egg)
#### Import data
zarpex_kids <- read.csv("/Users/rose/Downloads/UCSD/R/Data/zarpex_clean_kid.csv")
zarpex_adult <- read.csv("/Users/rose/Downloads/UCSD/R/Data/zarpex_clean_adult.csv")
#### Data exclusion / filtering
#most of the data is clean, but they didn't remove one kid that didn't answer most of the Qs
zarpex_kids <- zarpex_kids[-42, ]
#whew, ok! now, removing irrelevant columns
zarpex_kids <- subset(zarpex_kids, select = -c(Progress))
zarpex_kids <- subset(zarpex_kids, select = -c(privacy))
zarpex_kids <- subset(zarpex_kids, select = -c(trainingnose_1, trainingabc_1, soundcheck_1))
zarpex_kids <- subset(zarpex_kids, select = -c(Date.of.Study, Q_TotalDuration, language))
#loading in adult data and giving it a more uniform name
zarpex_adults <- zarpex_adult
rm(zarpex_adult)
#now, creating a column that shows age group
zarpex_kids$age_group <- "child"
zarpex_adults$age_group <- "adult"
#error because of character names.. hmm
zarpex_adults$ID <- as.character(zarpex_adults$ID)
zarpex_kids$ID <- as.character(zarpex_kids$ID)
full_data <- full_join(zarpex_adults, zarpex_kids, by = NULL)
#yay! now we want to make the language form and causal origin factor values not character...
full_data$lang_form <- as.factor(full_data$lang_form)
full_data$origin <- as.factor(full_data$origin)
#now we'll report some demographics
summary(zarpex_kids$age_exact)
Min. 1st Qu. Median Mean 3rd Qu. Max.
4.495 5.221 5.871 6.065 6.830 7.954
#and for adults...
zarpex_adults$gender <- as.factor(zarpex_adults$gender)
zarpex_adults$race <- as.factor(zarpex_adults$race)
zarpex_adults$wnw <- as.factor(zarpex_adults$wnw)
summary(zarpex_adults$gender)
female male Non-binary/third gender
139 93 2
summary(zarpex_adults$race)
Asian Black Mixed Native American White
62 17 22 1 119
NA's
13
summary(zarpex_adults$wnw)
nw w NA's
114 119 1
#No exclusions based on comp checks, but let's look anyways just for funsies
zarpex_kids$sabcheck_1 <- as.factor(zarpex_kids$sabcheck_1)
summary(zarpex_kids$sabcheck_1)
#### Prepare data for analysis - create columns etc.
#Now we make the composite dataframe...
full_compo <- select(full_data, -c("soundcheck_1", "soundcheck_2", "trainingabc_1", "trainingnose_1", "sabcheck_1", "sabcheck_2", "homo_1", "homo_2", "ra_1", "ra_2", "ft_1", "ft_2", "wave", "wino_1", "wino_2", "wino_3", "wino_4", "wino_5"))
#and we gotta make that mf long
compo_long <- pivot_longer(
  full_compo,
  cols = c(5:12),
  names_to = c("task", "item"),
  names_sep = "_",
  values_to = "response"
)
#and let's group by ID and break items into sequence
compo_long <- compo_long %>%
  dplyr::group_by(ID) %>%
  dplyr::mutate(item = seq.int(1:8))
Confirmatory Analysis
#now I want to see the sums and averages for fun
sum(zarpex_kids$explain_1, na.rm = TRUE)
#now just peeking around at what people did on this measure, just for fun
zarpex_kids$explain_1 <- as.factor(zarpex_kids$explain_1)
summary(zarpex_kids$explain_1)
#model time.. i am frightened but khuyen helped!
#let me try to work through the code here ... glmer is the model, response is the variable we're looking at.
#we're taking lang form, origin, and age group, looking at everything - throwing the kitchen sink at it.
#the item and ID are telling it to account for variances between items and individuals.
#then we tell it what data to use, and what family - binomial tells it that the values are 0 and 1.
#then is some extra stuff that the authors recommended.
comp_model <- glmer(response ~ lang_form*origin*age_group + (1|item) + (1|ID), data = compo_long, family = binomial, control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
Anova(comp_model)
#ok awesome! That was for the full sample, now let's run it for kiddos
comp_model_kid <- glmer(response ~ lang_form*origin*center_age + (1|item) + (1|ID), data = subset(compo_long, age_group == "child"), family = binomial, control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
summary(comp_model_kid)
#aaaaand for adults, but let's remove the center age stuff since they're all adults
comp_model_adult <- glmer(response ~ lang_form*origin + (1|item) + (1|ID), data = subset(compo_long, age_group == "adult"), family = binomial, control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
summary(comp_model_adult)
Generalized linear mixed model fit by maximum likelihood (Laplace
Approximation) [glmerMod]
Family: binomial ( logit )
Formula: response ~ lang_form * origin + (1 | item) + (1 | ID)
Data: subset(compo_long, age_group == "adult")
Control: glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e+05))
AIC BIC logLik deviance df.resid
2264.5 2297.6 -1126.2 2252.5 1839
Scaled residuals:
Min 1Q Median 3Q Max
-2.8613 -0.7492 -0.3939 0.8231 2.7938
Random effects:
Groups Name Variance Std.Dev.
ID (Intercept) 0.4109 0.641
item (Intercept) 0.8190 0.905
Number of obs: 1845, groups: ID, 234; item, 8
Fixed effects:
Estimate Std. Error z value Pr(>|z|)
(Intercept) 0.1948 0.3467 0.562 0.574201
lang_formspecific -0.6366 0.1869 -3.406 0.000659 ***
originsoc -0.4081 0.1929 -2.115 0.034445 *
lang_formspecific:originsoc 0.2185 0.2692 0.812 0.416917
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Correlation of Fixed Effects:
(Intr) lng_fr orgnsc
lng_frmspcf -0.275
originsoc -0.266 0.495
lng_frmspc: 0.190 -0.692 -0.716
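The fixed-effect estimates above are on the log-odds scale, which is hard to read directly. As a rough sketch of how they translate into odds ratios and predicted probabilities, the snippet below uses the printed estimates typed in by hand rather than the fitted model object. It assumes the random effects are set to zero (an "average" participant and item) and that the reference levels are generic language and the origin level not shown in the output; the variable names are my own.

```r
# Estimates copied by hand from the adult model summary above.
b <- c(intercept = 0.1948, specific = -0.6366, soc = -0.4081, specific_soc = 0.2185)

# Odds ratios: exponentiate the log-odds coefficients.
odds_ratios <- exp(b[c("specific", "soc", "specific_soc")])

# Predicted probability of an essentialist response at the reference origin,
# with random effects at zero, for generic vs. specific language.
p_generic_ref  <- plogis(b[["intercept"]])
p_specific_ref <- plogis(b[["intercept"]] + b[["specific"]])

print(round(odds_ratios, 2))
print(round(c(generic = p_generic_ref, specific = p_specific_ref), 2))
```

An odds ratio below 1 for `specific` means lower odds of an essentialist response under specific language than under generic language, which matches the direction of the negative coefficient.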
#ok now for some task-specific analysis without the composite, so we gotta prep the full dataset
full_longo <- pivot_longer(
  full_data,
  cols = c(6:23),
  names_to = c("task", "item"),
  names_sep = "_",
  values_to = "response"
)
full_longo$item <- as.numeric(full_longo$item)
full_longo$ID <- as.factor(full_longo$ID)
full_longo$age_group <- as.factor(full_longo$age_group)
#let's see how participants did on average and by task
explanation_means <- summarySE(subset(full_longo, task == "explain"), measurevar = "response", groupvars = c("lang_form", "age_group"), na.rm = TRUE)
sab_means <- summarySE(subset(full_longo, task == "sab"), measurevar = "response", groupvars = c("lang_form", "age_group"), na.rm = TRUE)
sab_means$age_group <- factor(sab_means$age_group, levels = c("child", "adult"))
#now trying to make a model looking at effect of language form on heritability task in kids
sab_model_kiddo <- glmer(response ~ lang_form*origin*center_age + (1|ID), data = subset(full_longo, task == "sab" & age_group == "child"), family = binomial(logit), control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
Author’s Note: In the interest of time, I will report the results based on the analyses using the composite essentialism scores. I ran models showing that children’s performance was similar across tasks, but there was variability in adults’ task-specific performance as illustrated in the original paper.
Children were significantly more likely to choose essentialist responses in the generic language condition (M = .61, 95% CI [0.57, 0.65]) than in the specific language condition (M = .47, 95% CI [0.43, 0.50]) (main effect of language form, Wald χ²(1) = 31.78, p < .001). There was a significant interaction between language form and mean-centered age (Wald χ²(1) = 10.59, p = .001), such that the heightening effect of generic language on essentialism increased with age. In contrast, age did not have any meaningful effect on children’s responses after hearing specific language. There were no significant effects of causal origin.
Adults were also more likely to choose essentialist responses after hearing characteristics described using generic language (M = .50, 95% CI [0.47, 0.53]) than after hearing specific language (M = .39, 95% CI [0.36, 0.43]) (main effect of language form, Wald χ²(1) = 15.52, p < .001). Unlike in the child sample, there was also a main effect of causal origin such that adults were significantly more likely to choose essentialist responses when the causal origin was biological (M = .47, 95% CI [0.44, 0.51]) versus cultural (M = .42, 95% CI [0.38, 0.45]) (Wald χ²(1) = 4.82, p < .05). There was no significant interaction between language form and causal origin.
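For readers who want to check how a Wald chi-squared statistic maps onto a p-value, the upper tail of the chi-squared distribution can be queried directly in base R. This is a small sketch using the test statistics quoted in the two paragraphs above, typed in by hand; the vector names are my own labels.

```r
# Wald chi-squared statistics (1 df each) as reported in the text above.
wald <- c(
  child_language       = 31.78,
  child_language_x_age = 10.59,
  adult_language       = 15.52,
  adult_origin         = 4.82
)

# p-value = probability of a statistic at least this large under the null.
p_values <- pchisq(wald, df = 1, lower.tail = FALSE)
print(signif(p_values, 2))
```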
Figures
#let's visualize how kids did with the whole composite, not getting into task-specific, just looking at the main effects and age trends
ggplot(data = comp_kid_means,
       mapping = aes(x = age_exact, y = response, color = lang_form)) +
  geom_point() +
  stat_smooth(aes(group = lang_form), method = "glm", level = 0.95) +
  ggtitle("Language Form and Essentialism") +
  labs(x = "Child Age (exact)", y = "% Providing Essentialist Response", color = "Language Form") +
  scale_color_manual(values = c("purple", "sky blue"), labels = c("Generic", "Specific")) +
  theme(plot.title = element_text(hjust = 0.5), legend.position = c("bottom"), legend.direction = "horizontal")
`geom_smooth()` using formula = 'y ~ x'
Warning: Removed 1 row containing non-finite outside the scale range
(`stat_smooth()`).
Warning: Removed 1 row containing missing values or values outside the scale range
(`geom_point()`).
Warning: Removed 78 rows containing non-finite outside the scale range
(`stat_summary()`).
Removed 78 rows containing non-finite outside the scale range
(`stat_summary()`).
Removed 78 rows containing non-finite outside the scale range
(`stat_summary()`).
Exploratory Analyses
The original study collected two exploratory measures that were not included in the paper or preregistration: resource allocation (RA) and feelings thermometer (FT), which measure intended behavior towards and feelings about group members. As RA was a binary measure like the tasks used in our composite and FT was not, I only analyzed the RA measure. There were no significant main effects or interactions.
#let's look at their exploratory measures. first, let's see what all measures we have:
table(full_longo$task)
explain flex homo ra sab sabcheck
1302 1302 868 868 868 868
soundcheck trainingabc trainingnose
868 434 434
#ah yes - I remember, I dropped one exploratory measure because it was not binary, and this project makes it difficult to look at likert-scale data. so let's just look at resource allocation (RA)
ra_means <- summarySE(subset(full_longo, task == "ra"), measurevar = "response", groupvars = c("lang_form", "age_group"), na.rm = TRUE)
ra_model_kiddo <- glmer(response ~ lang_form*origin*center_age + (1|ID), data = subset(full_longo, task == "ra" & age_group == "child"), family = binomial(logit), control = glmerControl(optimizer = "bobyqa", optCtrl = list(maxfun = 2e5)))
#ok - I don't know what to do about singularity, or what that means. But it looks like there's nothing of significance.
ggplot(data = full_longo %>% filter(task == "ra"),
       mapping = aes(y = response, x = lang_form, fill = age_group, shape = origin)) +
  ylim(0, 1) +
  labs(x = "Language Form", y = "% Providing Essentialist Response", fill = "Age Group", shape = "Causal Origin") +
  ggtitle("Children's and Adults' Essentialist Responses in Resource Allocation Task") +
  stat_summary(fun = "mean", geom = "bar", position = position_dodge(width = .9), color = "black") +
  stat_summary(fun.data = "mean_cl_boot", geom = "pointrange", position = position_dodge(width = .9))
Warning: Removed 15 rows containing non-finite outside the scale range
(`stat_summary()`).
Removed 15 rows containing non-finite outside the scale range
(`stat_summary()`).
Discussion
Summary of Reproduction Attempt
In summary, this study investigated whether varying linguistic form (generic vs. specific language) and causal origin (biological vs. cultural) to describe the properties of a novel social group will lead children and adults to essentialize said group. Social essentialism was measured using a series of tasks, three of which were combined to generate a composite score and mutated to represent a proportion of essentialist responses given.
My analyses using this composite found that children were significantly more likely to endorse essentialist responses after hearing generic language to describe group-specific features, and that this effect strengthens with age in a sample of children ages 4-8. I observed no significant effect of specific language on children’s essentialist responses, no main effect of the causal origin manipulation, and no interaction between linguistic form and causal origin. The strengthening effect of generic language on essentialism was significant in the adult sample as well; however, while causal origin did not affect children’s responses, the effect of biological causal origin was significant in the adult sample. I was pleased to find that all results reproduced the original results completely, and I was able to successfully recreate the authors’ visualizations of this data.
Given how tidy and well-annotated the open-source data was (and the fact that the authors made their code available as well), I had high hopes for this reproduction attempt. As this study served as the inspiration for my first-year PhD project and represents the type of developmental dataset that I will realistically have to analyse for my own work, I was highly motivated to understand the mechanisms of the reproduction and did not just blankly regurgitate the original code. However, the authors’ provided code did serve as a great foundation for which packages to use, a reference point for when I got stuck, and guidance when I simply did not know how to start. I took advantage of ChatGPT, Blackbox AI, and the TAs to help me interpret the authors’ original code, including figuring out what it was doing, what it meant, and how it built to the holistic interpretation of the data. Throughout the project (especially in visualization), I referenced many of our in-class R exercises, which provided a helpful foundation and code snippets that were proven to work and that I could adapt to fit my needs. I had to really hone my troubleshooting skills throughout this process, and many of my most frustrating errors were syntactical in nature. Author’s note: Thank you to Khuyen and Janna, whose clever eyes caught many a typo and missed dash, and who gracefully handled many of my “IT’S NOT WORKING!” exclamations in class (and death-glares at my laptop screen).
Commentary
Commentary: Progress Report 1
I began by loading up all of the libraries needed according to the OSF files. As it stands, data is fairly clean and well-documented, broken into separate files for adult and child data. I have removed the two child subjects who did not answer the majority of the questions, as they were not included in the primary analyses. No exclusions are necessary from the clean adult data. I then made a new data frame excluding some columns that were not relevant, such as time taken to complete the study and level of media release for study recordings, just to tidy things up.
Then I made a new column in each data frame for age group (adult vs. child), and combined my two data frames using “full join.” I then changed several columns from character to factor variables for analysis: language and causal origin condition, gender, race, and response to comprehension checks. I ran a summary for each demographic factor to report below, and ran summaries of the comprehension checks to get a sense of how participants were doing, even though the authors did not exclude participants based on them.
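For readers who want to see the shape of these preparation steps, here is a minimal sketch using dplyr. It is not my actual code or the authors’: the toy data frames, every column name (`n_answered`, `duration_s`, and so on), the condition labels, and the exclusion threshold are hypothetical stand-ins for the OSF files.

```r
library(dplyr)

# Toy stand-ins for the child and adult files; all column names are hypothetical.
child <- data.frame(
  id = c("c1", "c2", "c3"),
  language = c("generic", "specific", "generic"),
  causal = c("biological", "cultural", "cultural"),
  n_answered = c(10, 9, 2),       # hypothetical count of questions answered
  duration_s = c(610, 545, 120),  # hypothetical column irrelevant to analysis
  stringsAsFactors = FALSE
)
adult <- data.frame(
  id = c("a1", "a2"),
  language = c("specific", "generic"),
  causal = c("biological", "cultural"),
  n_answered = c(10, 10),
  duration_s = c(300, 280),
  stringsAsFactors = FALSE
)

# Drop children who skipped most questions (the threshold is an assumption),
# remove the irrelevant column, and tag each sample with its age group.
child_clean <- child %>%
  filter(n_answered >= 5) %>%
  select(-duration_s) %>%
  mutate(age_group = "child")
adult_clean <- adult %>%
  select(-duration_s) %>%
  mutate(age_group = "adult")

# With identical columns and no matching rows, full_join() stacks the samples.
combined <- full_join(
  child_clean, adult_clean,
  by = c("id", "language", "causal", "n_answered", "age_group")
) %>%
  mutate(across(c(language, causal, age_group), as.factor))

str(combined)
```

Because the two samples share no rows, `bind_rows()` would likely give the same stacked result here; the sketch keeps `full_join()` only because that is the function named above.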
My next step is to generate the composite; I’ve worked to make a new data frame with the relevant columns for the composite, but I need to think about how best to proceed.
Commentary: Progress Report 2
First, I did some peeking at how participants did on the measures just to get a sense of the data and to get more practice generating sums and averages. I then constructed the data frame for the essentialism composite (and made it long!). Next, with the help of Khuyen during office hours, I ran a Generalized Linear Mixed-Effects Model using the glmer function specified by the authors. I then used an Anova on the model. I used ChatGPT to help me interpret the results and figure out how to make sense of the numbers and... they reproduced! The numbers match the authors’ completely and the interpretation of the results is identical. Now that I knew I was on the right track, I pulled the child sample from the full composite and ran the same GLMM and Anova. Yet another reproduction! Then, I did the same with the adult subset of the composite. Now, all of the models have been run, and the numbers match the authors’. Woohoo! Next I’ll need to do visualizations, and I hope that tomorrow’s reading and this week’s classes will help with that.
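The workflow described here (reshape to long, fit a binomial GLMM, then run Wald chi-squared tests) can be sketched roughly as follows. This runs on simulated data, not the Zarpies data. The item names, the condition labels, the made-up effect size, the by-participant random intercept, and the default test type in `Anova()` are all assumptions for illustration, so the output says nothing about the real results.

```r
library(lme4)  # glmer()
library(car)   # Anova()

set.seed(2024)
n <- 120

# Simulated wide data: one row per participant (all names are hypothetical).
wide <- data.frame(
  id = factor(seq_len(n)),
  language = factor(rep(c("generic", "specific"), each = n / 2)),
  causal = factor(rep(c("biological", "cultural"), times = n / 2))
)

# Participant-level tendency plus an arbitrary, made-up boost for generic language.
eta <- rnorm(n) + 0.8 * (wide$language == "generic")
for (item in c("item1", "item2", "item3")) {
  wide[[item]] <- rbinom(n, 1, plogis(eta))  # 1 = essentialist response
}

# Reshape to long format: one row per participant per binary item.
long <- reshape(
  wide,
  direction = "long",
  varying = c("item1", "item2", "item3"),
  v.names = "response",
  timevar = "item",
  times = c("item1", "item2", "item3"),
  idvar = "id"
)

# Binomial GLMM with a random intercept per participant (an assumed structure).
fit <- glmer(response ~ language * causal + (1 | id),
             data = long, family = binomial)

# Wald chi-squared tests for each fixed-effect term.
wald <- Anova(fit)
print(wald)
```

Running the same two calls on a subset (for example, only the child rows) is all that the child-only and adult-only models require.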
Commentary: Progress Report 3
I ran a bunch more models and got to work on my visualizations, something I was really scared of. I attended Pria’s talk on LMMs, and it was so helpful in understanding what I’m doing and what it means! In response to prior progress checks, I finally included my code in this progress report (sorry Khuyen!) and added an explanation of the linguistic form manipulation. Some of my code isn’t working and I don’t know what’s up with that, honestly. It’s working in my QMD, so I am about to go postal. Update: it was a missing asterisk.
Final Commentary
Going into this project, I picked a paper that I have been obsessed with for a while now and think is beautifully done. I expected to find major problems and bones to pick with the authors as I worked to reproduce the project, but that was not the case - I still highly respect the work and have no major objections. The authors set me up nicely for my reproduction and I think this is a great example of how open science should be done - clean and generally well-documented data, easily accessible reference materials for the tasks, and well-annotated code (with one funny chunk of documentation about how R was randomly not working and refusing to process something, and they had to do this convoluted work-around).
That said, I found a couple of things in the data that are not well documented, but they seem minor and have no obvious implications for the paper’s findings. One of the exclusion criteria relates to Winograd Schema questions, but the authors never explain what those are or why they’re in the paradigm, and they aren’t used in any of the analyses. After googling it, I think they are meant to collect a baseline of how adults interpret ambiguity in generic language? I’m not sure, and I am curious. There is also a column in the data reporting political affiliation, with no mention of how those data were collected or how liberal/conservative was determined, and it is not used in any of the analyses.
Besides these minor data confusions, I do have one point of uncertainty in the paradigm: the essentialism measures. The use of the “composite essentialism score” makes sense to me, but having one task that does not fall into that (the generalizability task, where participants are shown a quality of the social group and asked how many other group-members also have that quality using a Likert scale) feels a bit disjointed, and I didn’t know how best to handle that one measure. Maybe adapting the Likert scale into a binary (something like, less than half of the other group members = 0, more than half = 1) would allow this measure to be included in the composite.
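To make the dichotomization idea concrete, here is a small sketch in base R. The function name `dichotomize_likert` is hypothetical, and treating the scale midpoint (3) as missing is my own assumption, since “less than half” versus “more than half” does not say what to do with a response sitting exactly in the middle.

```r
# Hypothetical helper: recode a 5-point Likert response into a binary score.
# Responses above the midpoint count as essentialist (1), responses below it
# as non-essentialist (0). The midpoint is set to NA (an assumption).
dichotomize_likert <- function(x, midpoint = 3) {
  ifelse(x > midpoint, 1, ifelse(x < midpoint, 0, NA_real_))
}

# Example responses covering every point on the scale.
responses <- c(1, 2, 3, 4, 5)
binary <- dichotomize_likert(responses)
print(binary)
```

A recoding like this discards the graded information in the scale and drops midpoint responses, so it would need to be checked against the original ordinal analysis rather than replace it.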
Finally, while I categorize this attempt as a success, I recognize that I did not do much of the additional analysis that I had initially planned in my key reproducibility criteria. I underestimated how much I had to learn, how much work was needed to cover the main analyses, and how carefully I had to work to make the figures. I did do some exploratory analysis and visualization of the resource allocation measure, but that brought me to my limit.