start_total <- Sys.time()

knitr::opts_chunk$set(
  echo = TRUE,        # code goes in collapsible tab
  eval = FALSE,
  results = "hide",   # hide printed output
  message = FALSE,
  warning = FALSE
  )

Introduction

I ran a lot of analyses for the gender stereotypes pilot study. This documents brings them all together. My goal is to get feedback on what should be and not be included in our manuscript. Some analyses may be irrelevant or uninteresting, and their opposite may be missing.

The analysis plan is described in the next subsection. The remaining sections implement the full data preparation and analysis pipeline.

The analyses section is the bulk of this document. It is divided into three sets of analyses, including Measures Analyses, Depicted Gender and Strategies Analyses, and Interest and Efficacy Analyses. Each of these sets of analyses includes subsets of analyses and, within those, individual analyses.

The results differ slightly though nonsubstantively from those presented previously. This is because (i) the current analyses incorporate our full sample of 418 participants, rather than the first 400 as was done previously; and (ii) one participant with “other” gender, who was previously included in the analyses, has been excluded.

I have attached a sample “analysis plan” statement for a manuscript.

Analysis Plan

The data was analyzed with Bayesian generalized linear mixed models via brms (Bürkner, 2017) in R. Bayesian inference quantifies the credibility of hypotheses by allocating probability mass based on prior beliefs and data (Gelman et al., 2013). We used Bayesian inference for two reasons. First, Bayesian regression provides an extremely flexible framework for inference via queries of joint posterior quantities, like expected values (as we used, here, via epred_draws from tidybayes; Kay, 2022). This simplifies analysis because each model can answer a significantly broader array of questions than an analogous frequentist model (e.g., tests against chance, any contrast of interest including with recoded parameters, any conditional and marginal prediction, probable effect direction, practical equivalence, etc.). This obviates the procedural patchwork characteristic of frequentist workflow (e.g., fitting and refitting many models, sometimes with multiple corrections, bootstrapping, etc.). Second, our use of weakly to moderately informative prior distributions aided our models’ convergence while discouraging large estimates unless warranted by the data.

Our analysis reports the shape of the posterior distributions of key quantities in two ways. First, we investigated our directional hypotheses by computing the posterior probability that an effect takes one direction, rather than the other, relative to a comparison value. We averaged these posterior probabilities of direction across the posterior distributions of the remaining model variables. Thus, because they are averaged across other variables, our posterior probabilities account for more uncertainty than would the commonly reported p-values that are conditional on a single value (i.e., the maximum likelihood estimate). Our posterior probabilities represent the probability that an effect takes the hypothesized direction, given the model, data, and averaging performed. Second, we report the triplet of posterior quantities [lower bound of the 95% highest density interval (HDI), posterior mean, upper bound of the 95% HDI]. When describing posterior quantities, the triplet values are on the probability scale, range from 0 to 1, and chance is .5. When describing differences between posterior quantities (i.e., contrasts), triplet values may be negative and chance is 0.

For details about the analysis plan, inference criteria, sampler characteristics, convergence diagnostics and posterior predictive checks (all adequate per Gelman et al., 2013), and model marginal posterior parameters, see the Supplementary Material. Code and data to reproduce all analyses and all data and results Figures are available at the following OSF link (anonymized: XXX).

Data Preparation

Globals

# globals ####
library(bayesplot) # plots
library(bayestestR) # hdi, posterior analysis
library(brms) # models
library(flextable) # tables
library(ggdist) # plots
library(glue) # plots
library(ggtext) # plots
library(janitor) # clean_names
library(officer) # tables
library(parallel) # detectCores
library(patchwork) # patchworks
library(posterior) # posterior analysis
library(scales) # plots
library(tidybayes) # posterior analysis
library(tidyverse) # data wrangling

n_draws_posterior <- 2000 # draws from epred_draws and posterior_predict
length_out <- 25 # bins for continuous variable prediction
set.seed(10003)
options(contrasts = c("contr.treatment", "contr.poly"))
prior_mod <- c(set_prior("normal(0, 1)", class = "Intercept"),
               set_prior("normal(0, 0.80)", class = "b"),
               set_prior("normal(0, 0.30)", class = "sd"),
               set_prior("normal(0, 0.15)", class = "sd", group = "stim_race"),
               set_prior("lkj(2)", class = "cor"))

func_mod <- function(formula,
                     data, 
                     family,
                     priors = NULL,
                     inits = NULL,
                     adapt_delta = 0.9,
                     iter = 3000) {
  brm(formula = formula,
      data = data,
      family = family,
      prior = priors,
      chains = 7, # n hmc chains (7 = n - 1 cores on JV's machine),
      control = list(adapt_delta = adapt_delta),
      cores = detectCores() - 1,
      backend = "cmdstanr",
      iter = iter,
      init = inits,
      seed = 10003 # reproducible sampler
  )
}

Data Import

domain      <- read_csv("data/domain_data.csv") # import domain data
gender      <- read_csv("data/gender_data.csv") # import gender data
processing1 <- read_csv("data/report_processing_session1.csv") # import processing report data (session 1; n = 472 participants in first session)
processing2 <- read_csv("data/report_processing_session2.csv") # import processing report data (session 2; n = 335 participants in second session)
study       <- read_csv("data/report_study.csv") %>%  # import study report data
  clean_names()

Data Cleaning

func_processing <- function(df){
  df %>% 
    set_names(.[1, ]) %>% # first row to column name
    slice(-1) %>% # remove first row
    clean_names() %>% # clean names
    mutate(date_of_study = mdy(date_of_study)) # convert date to mdy
}

processing1_interim1_valid_and_invalid <- func_processing(processing1)
processing1_interim1_valid_and_invalid %>% group_by(valid) %>% summarize(n = n()) # n = 423

processing1_interim1_valid_and_invalid %>% # exclusions
  filter(valid == "Invalid") %>%
  select(complete_qualtrics, notes) %>% 
  mutate(complete_qualtrics = case_when(
    str_detect(complete_qualtrics, "%") ~ str_remove(complete_qualtrics, "%"),
    T ~ NA),
    complete_qualtrics = as.numeric(complete_qualtrics),
    exclusion_reason = case_when(
      complete_qualtrics < 50 | is.na(complete_qualtrics) ~ "incomplete data", 
      str_detect(notes, "video") | str_detect(notes, "off") ~ "no_video",
      T ~ "other")) %>% 
  group_by(exclusion_reason) %>% summarize(n = n(), .groups = "drop") %>% 
  mutate(total_exclusions = sum(n))

processing1_interim1 <- processing1_interim1_valid_and_invalid %>% filter(valid == "Valid") # check validity from processing report

processing2_interim1_valid_and_invalid <- func_processing(processing2) %>% rename(id = sid)
processing2_interim1_valid_and_invalid %>% group_by(valid) %>% summarize(n = n())
processing2_interim1 <- processing2_interim1_valid_and_invalid %>% filter(valid == "Valid") # check validity from processing report

processing_interim2 <- processing1_interim1 %>%
  bind_rows(processing2_interim1)

length(unique(processing_interim2$id)) # unique ID's with valid session 1 and session 2 data
processing_interim2 %>% 
  group_by(id) %>%
  arrange(date_of_study) %>%
  mutate(session = row_number()) %>%
  ungroup() 
processing_interim2 %>%
  group_by(id) %>%
  arrange(date_of_study) %>%
  mutate(session = row_number()) %>%
  ungroup() %>%
  group_by(session) %>%
  summarize(n = n())

func_data <- function(df, label){
  original_n <- nrow(df)
  
  df <- df %>% 
    set_names(.[1, ]) %>% # first row to column names
    slice(-c(1,2)) %>% # remove first two rows
    clean_names() # formatted names
  
  parsed_df <- df %>%
    mutate(start_time = str_sub(start_date, nchar(start_date) - 7, nchar(start_date)) %>% hms(),
           start_date = as.Date(str_sub(start_date, end = -10)),
           progress = as.numeric(progress))
  
  min_date <- min(processing_interim2$date_of_study, na.rm = TRUE)
  prelaunch <- parsed_df %>% filter(start_date < min_date)
  ip_filtered <- parsed_df %>% filter(response_type != "IP Address")
  incomplete <- parsed_df %>% filter(progress <= 50)
  
  excluded <- bind_rows(
    prelaunch %>% mutate(reason = "Pre-launch"),
    ip_filtered %>% mutate(reason = "Non-IP address"),
    incomplete %>% mutate(reason = "Progress ≤ 50%")) %>%
    distinct(id, .keep_all = TRUE)
  
  final <- parsed_df %>%
    filter(start_date >= min_date,
           response_type == "IP Address",
           progress > 50) %>%
    arrange(desc(progress)) %>%
    distinct(id, .keep_all = TRUE)
  
  return(final)
}

domain_interim1 <- func_data(domain, "domain_interim1")
gender_interim1 <- func_data(gender, "gender_interim1")

study_interim <- study %>% filter(is_valid_case == "Yes") # exclude invalids from study report
study %>% filter(is_valid_case == "No") # check invalids (n = 42)

func_pivot <- function(df, type){
  if(type == "outcome"){
    df %>% 
      pivot_longer(cols = starts_with("q_"), names_to = "question", values_to = "response") %>% # long data
      filter(!question %in% c("q_consent10_yes", "q_consent10_no")) # remove consent rows
  }
  else if(type == "timing"){
    df %>% 
      select(id, ends_with(c("_onset", "offset"))) %>%  # remove unneeded columns
      pivot_longer(cols = ends_with(c("_onset", "offset")), names_to = "measure", values_to = "time") %>% # long data
      filter(!measure %in% c("consent_onset", "consent_offset")) # remove consent rows
  }
}

domain_interim2 <- domain_interim1 %>% # apply func_filter
  filter(id %in% study_interim$id) %>% # remove id's not in study_interim
  select(id, start_date, start_time, progress, domain_condition, starts_with("q_")) %>% # remove unneeded columns
  func_pivot("outcome") %>% # long response data
  rename(condition = domain_condition) # rename for parity with gender df (enables bind_rows)

gender_interim2 <- gender_interim1 %>% # apply func_filter
  filter(id %in% study_interim$id) %>% # remove id's not in study_interim
  select(id, start_date, start_time, progress, starts_with("q_")) %>% # remove unneeded columns
  func_pivot("outcome") %>% # long response data
  mutate(condition = "gender") # create column for parity with domain (enables bind_rows)

domain_interim2_timing <- domain_interim1 %>% # apply func_filter
  filter(id %in% study_interim$id) %>% # remove id's not in study_interim
  func_pivot("timing") # long timing data

gender_interim2_timing <- gender_interim1 %>% # apply func_filter
  filter(id %in% study_interim$id) %>% # remove id's not in study_interim
  func_pivot("timing") # long timing data

# create cleaned df (split into several sub-df's for interpretability)
domain_gender1 <- bind_rows(domain_interim2, gender_interim2) %>% 
  mutate(question = str_remove(question, "q_")) %>% # remove leading q_ from question var
  separate(question, c("q_type", "question"), sep = "_", extra = "merge") %>% # create q_type from question prefix
  separate(question, c("q_subtype", "question"), sep = "_", extra = "merge") %>%  # create q_subtype from question prefix
  filter(id %in% processing1_interim1$id) # include only participants included in the final sample of session 1 (or else we get a dataset with n = 436 due to the bind_rows, above, and not the desired 423)

domain_gender2 <- domain_gender1 %>% 
  mutate(across(c(id, condition, response, q_type, q_subtype), ~as.factor(.)), # factor coercion
         q_subtype = fct_recode(q_subtype, "asian_boy" = "ab", "asian_girl" = "ag", "black_boy" = "bb", "black_girl" = "bg", "hispanic_boy" = "hb", "hispanic_girl" = "hg", "white_boy" = "wb", "white_girl" = "wg"), # expand factor levels (readability)
         condition = case_when(
           !q_type %in% c("gen", "dom") ~ NA, # relabel factor condition var (first line is equivalent to q_type %in% c("efficacy", "interest"), NA, ...)
           str_detect(question, "other") ~ "unknown", # label unknown vs known condition
           condition %in% c("one choice and something else", "two choices") ~ "known", # label known condition
           q_type == "gen" ~ "gender", 
           T ~ NA), # label gender condition
         response = case_when(
           response == "Off" ~ NA, # relabel non-selections with NA (subsequently dropped)
           response == "On" & q_type %in% c("dom", "gen") ~ paste0(str_extract(question, "(?<=_)[^_]+$")), # relabel responses by pasting suffix from question var
           response == "On" & q_type %in% c("efficacy", "interest") ~ paste0(question), 
           T ~ NA), # relabel responses for question %in% c("efficacy" and "interest)
         question = case_when(
           q_type %in% c("efficacy", "interest") ~ NA, # relabel question interest/efficacy with NA
           T ~ paste0(question)), 
         comparison = case_when(
           str_detect(question, "math_read") ~ "read_vs_math", # create comparison var, label read vs math comparison
           str_detect(question, "read_math") ~ "read_vs_math", # create comparison var, label read vs math comparison
           str_detect(question, "art_sci") ~ "science_vs_art", # label science vs art comparison
           str_detect(question, "sci_art") ~ "science_vs_art", # label science vs art comparison
           str_detect(question, "other") ~ paste0(str_extract(question, "^[^_]+"), "_vs_other"), # label domain_X vs other comparisons (dynamically via regex)
           T ~ NA), 
         stim_gender = case_when(
           str_detect(q_subtype, "girl") ~ paste0(str_extract(q_subtype, "(?<=_)[^_]+$")), # create and label stim gender for known and unknown trials (paste q_subtype suffix)
           str_detect(q_subtype, "boy") ~ paste0(str_extract(q_subtype, "(?<=_)[^_]+$")),
           T ~ NA), 
         stim_race = case_when(
           str_detect(q_subtype, "girl") ~ paste0(str_extract(q_subtype, "^[^_]+")),
           str_detect(q_subtype, "boy") ~ paste0(str_extract(q_subtype, "^[^_]+")),
           T ~ NA), # create and label stim race for known and unknown trials (past q_subtype prefix)
         q_subtype = case_when(
           str_detect(q_subtype, "girl") ~ NA, # replace q_subtype == race_gender values with NA
           str_detect(q_subtype, "boy") ~ NA, 
           T ~ paste0(q_subtype)),
         question = case_when(
           q_type == "gen" ~ str_remove(question, "_(?<=_)[^_]+$"), # remove suffix from question var for q_type == "gen" (this was pasted into the response var above, via syntax for handling "On")
           T ~ question), 
         stim_race = case_when(
           q_type == "gen" ~ paste0(str_extract(question, "(?<=_)[^_]+$")),
           T ~ stim_race))  # create and label stim race for q_type == "gen" (paste suffix)

domain_gender3 <- domain_gender2 %>% 
  group_by(id) %>% # group operations by participant
  mutate(interval = max(start_date) - min(start_date), # compute interval between test sessions (in days)
         session = as.factor(ifelse(max(start_date) > min(start_date) & start_date > min(start_date), 2, 1))) %>%  # label session number
  ungroup() %>% # remove grouping factor
  drop_na(response) %>%  # drop NA's from response (drastically reduces row count)
  group_by(id) %>% 
  mutate(missing_data = ifelse(!(any(interval > 0 & !any(session == 2))), "no", "yes"),
         interval = ifelse(missing_data == "no", interval, 0)) %>%
  ungroup()

domain_gender4 <- domain_gender3 %>% 
  left_join(study_interim %>% select(id, gender, birthday), by = "id") %>% # add participant gender and birthday data
  mutate(age = interval(birthday, start_date) / years(1), # create participant age var
         gender = as.factor(tolower(gender)), # lowercase gender
         response_interest = ifelse(q_type == "interest", paste0(response), NA), # create new response var for interest questions (for ordinal regression)
         response_interest = factor(response_interest,
                                    levels = c("not_at_all", "a_little", "some", "a_lot"),
                                    ordered = T), # create ordered response var for interest questions
         response_efficacy = ifelse(q_type == "efficacy", paste0(response), NA), # create new response var for efficacy questions (for ordinal regression)
         response_efficacy = factor(response_efficacy,
                                    levels = c("not_good_at_all", "a_little_good", "good", "very_good"),
                                    ordered = T), # create ordered response var for efficacy questions
         response =          ifelse(q_type %in% c("interest", "efficacy"), NA, paste0(response)), # replace response labels with NA for interest and efficacy questions
         response_stereotype = case_when(stim_gender == "girl" & response %in% c("math", "science") ~ "inconsistent", # gender stereotype response labels
                                         stim_gender == "boy" & response %in% c("read", "art") ~ "inconsistent", 
                                         q_type %in% c("interest", "efficacy") ~ NA,
                                         comparison %in% c("read_vs_other", "art_vs_other") & stim_gender == "girl" & response == "other" ~ "inconsistent", # pay attention -- accidentally called this "reading_vs_other" (instead of "read_vs_other") and it *totally* messed everything up
                                         comparison %in% c("sci_vs_other", "math_vs_other") & stim_gender == "boy" & response == "other" ~ "inconsistent", # pay attention -- accidentally called this "science_vs_other" and it *totally* messed up
                                         q_subtype %in% c("math", "sci") & response == "girl" ~ "inconsistent",
                                         q_subtype %in% c("art", "read") & response == "boy" ~ "inconsistent",
                                         T ~ "consistent"),
         response_stereotype = fct_relevel(response_stereotype, "inconsistent"), # base outcome = inconsistent, so estimate probability of consistent
         question = str_remove(question, "_(?<=_)[^_]+$"), # remove suffix from question var
         question = as.factor(ifelse(question %in% c("bb_bg", "wb_wg", "ab_ag", "hb_hg", "bh_hg"), "boy_girl", # relabel question var with "boy_girl" for any races X,Y in Xb_Yg (including typo "bh_hg)
                                     ifelse(question %in%    c("bg_bb", "wg_wb", "ag_ab", "hg_hb"), "girl_boy", question))), # relabel question var with "girl_boy" for for any races X,Y in Xg_Yb
         comparison = ifelse(question %in% c("boy_girl", "girl_boy"), paste0(question), comparison),
         across(where(is.character), as.factor), # coerce all character vars to factors
         across(where(is.factor), ~ fct_drop(.))) %>%  # drop unused levels (all factors)
  select(-c(progress, birthday)) %>%  # remove progress and birthday vars
  rename(age_raw = age)

# domain_gender4 %>% distinct(gender)
# domain_gender4 %>%
#   filter(gender != "o", !is.na(comparison)) %>% count(gender, stim_gender, q_subtype, comparison, response, response_stereotype) %>% view

sequencing <- domain_gender4 %>% # label participants in domain and gender conditions of session 1 by participation date/time
  filter(!q_type %in% c("efficacy", "interest")) %>% # only care about q_type %in% c("dom", "gen")
  group_by(session) %>% # group operations by session number
  distinct(id, .keep_all = T) %>% # distinct id's, per session
  relocate(session, .before = everything()) %>% relocate(start_date, .after = session) %>% relocate(start_time, .after = start_date) %>% relocate(q_type, .after = session) %>%  # readability
  group_by(session, q_type) %>% # group ops by session and q_type
  arrange(session, q_type, start_date, start_time) %>% # sequence rows by session --> date --> time (q_type a factor, so only aids readability)
  mutate(id2 = row_number()) %>% # label id2's by temporal sequence
  relocate(id2, .before = id) %>% # readability
  ungroup() # drop grouping

sequencing %>% group_by(session, q_type) %>% summarize(n = n())

domain_gender5 <- domain_gender4 %>% 
  left_join(sequencing %>% select(id, session, id2), by = c("id", "session")) %>% # incorporate sequenced id var
  arrange(start_date, start_time, session, q_type) %>% # sequence rows by session --> date --> time (q_type a factor, so only aids readability)
  mutate(gender = factor(ifelse(gender == "o", NA_character_, paste0(gender)))) # NOT filtering bc we can include this participant in any analyses that do not involve gender (filter out otherwise)

domain_gender5 %>% group_by(session, q_type) %>% distinct(id, .keep_all = T) %>% summarize(n = n()) # distinct participants per session per q_type

# first200 <- domain_gender5 %>%
#   filter(session == 1, gender != "o") %>% # include only session 1
#   group_by(q_type) %>% # perform operations by group
#   filter(id2 <= 201) %>% # get 400 participants in total (there were only 199 participants in "gen", so we get 201 in dom)
#   ungroup() %>% # ungroup
#   mutate(id = fct_drop(id)) # drop ghost levels

# first200$id # check id's (= 400 total)
# first200 %>% group_by(session, q_type) %>% distinct(id, .keep_all = T) %>% summarize(n = n())

domain_gender0 <- domain_gender5 %>% 
  filter(gender != "o") %>%  
         # id %in% first200$id) %>% # remove id's not in first200
  mutate(id = fct_drop(id)) %>% # remove ghost id's
  arrange(id) %>% # arrange id's numerically
  mutate(stim_gender = as.factor(ifelse(is.na(stim_gender), NA, paste0(stim_gender, "_depicted"))), # add "depicted" to stim_gender var
         gender = as.factor(paste0(gender, "_participant")),
         response_gender = case_when(condition == "gender" & gender == "m_participant" & response == "boy" ~ "same", # add "participant" to gender var
                                     condition == "gender" & gender == "m_participant" & response == "girl" ~ "other",
                                     condition == "gender" & gender == "f_participant" & response == "girl" ~ "same",
                                     condition == "gender" & gender == "f_participant" & response == "boy" ~ "other", 
                                     T ~ NA),
         interval) %>% 
  select(-c(start_date, start_time)) %>%  # remove start_date and start_time vars
  left_join(study_interim %>% select(id, race_s), by = "id") %>% rename(race = race_s) # get participant race(s)

domain_gender_id <- domain_gender0 %>% 
  distinct(id) %>% 
  mutate(id_anon = row_number())

domain_gender <- domain_gender0 %>% 
  left_join(domain_gender_id) %>% 
  mutate(stim_gender = case_when(stim_gender == "boy_depicted" ~ "Dep. Boys",
                                 stim_gender == "girl_depicted" ~ "Dep. Girls",
                                 is.na(stim_gender) & condition == "gender" ~ "Dep. Boys & Girls",
                                 T ~ NA),
         gender = case_when(gender == "f_participant" ~ "girls",
                            gender == "m_participant" ~ "boys",
                            T ~ NA),
         domain = case_when(
           condition == "gender" ~ paste0(q_subtype), # create grouping variable based on *type of comparison* (i.e., domain qua controlled information)
           condition %in% c("unknown", "known") ~ paste0(comparison),
           T ~ NA
         ),
         across(c(condition, gender, domain), ~str_to_title(.)),
         across(where(is.character), ~factor(.)))

domain_gender %>% count(condition, q_subtype, comparison, domain)
domain_gender %>% count(condition, domain)

Demographics

Sample Size

domain_gender %>% 
  filter(q_type %in% c("dom", "gen")) %>% 
  distinct(session, q_type, id) %>% 
  count(session, q_type)

## # A tibble: 4 × 3
##   session q_type     n
##   <fct>   <fct>  <int>
## 1 1       dom      219
## 2 1       gen      199
## 3 2       dom      154
## 4 2       gen      180

domain_gender %>% 
  filter(q_type %in% c("dom", "gen")) %>% 
  distinct(session, gender, id) %>% 
  count(session, gender)

## # A tibble: 4 × 3
##   session gender     n
##   <fct>   <fct>  <int>
## 1 1       Boys     203
## 2 1       Girls    215
## 3 2       Boys     157
## 4 2       Girls    177

Age (Years)

# domain_gender %>% # participants 
#   distinct(id, q_type, session, .keep_all = T) %>% 
#   group_by(session, q_type, gender) %>% 
#   summarize(n = n(), .groups = "drop") %>% 
#   mutate(n_for_summation = ifelse(q_type %in% c("gen", "dom"), n, 0)) %>% 
#   group_by(session) %>% 
#   mutate(n_per_session = sum(n_for_summation)) %>% 
#   group_by(session, gender) %>% 
#   mutate(n_per_gender_per_session = sum(n_for_summation)) %>% 
#   ungroup() 

domain_gender %>% # participant age
  distinct(id, session, .keep_all = T) %>%
  group_by(session, gender) %>%
  summarize(lower = min(age_raw),
            mean = mean(age_raw), 
            upper = max(age_raw), .groups = "drop")

## # A tibble: 4 × 5
##   session gender lower  mean upper
##   <fct>   <fct>  <dbl> <dbl> <dbl>
## 1 1       Boys    4.00  5.57  6.99
## 2 1       Girls   4.01  5.58  6.96
## 3 2       Boys    4.02  5.56  7.00
## 4 2       Girls   4.04  5.67  6.99

Race

domain_gender %>% # participant race
  distinct(id, .keep_all = T) %>% 
  group_by(race) %>%
  summarize(n = n(), .groups = "drop") %>%
  mutate(ratio = round(n / sum(n), 2),
         multiracial = str_detect(race, ",") | str_detect(race, "/") | str_detect(race, "&"),
         race_recode = ifelse(multiracial == T, "Multiracial", paste0(race))) %>% 
  group_by(race_recode) %>%
  summarize(n = sum(n), .groups = "drop") %>%
  mutate(ratio2 = round(n / sum(n), 2)) %>%
  arrange(-n) %>% 
  rename(race = race_recode, proportion = ratio2)

## # A tibble: 8 × 3
##   race                 n proportion
##   <chr>            <int>      <dbl>
## 1 Caucasian          247       0.59
## 2 Multiracial         78       0.18
## 3 Asian               59       0.14
## 4 African American    14       0.03
## 5 Hispanic            14       0.03
## 6 <NA>                 8       0.02
## 7 Middle Eastern       1       0   
## 8 South American       1       0

Intersession Intervals

In the figure, the solid vertical line depicts the mean, the hatched vertical line the median, and the dots vertical lines the 25th and 75th percentiles of the distribution.

# domain_gender %>% # intersession interval distribution
#   distinct(id, .keep_all = T) %>%
#   group_by(interval) %>%
#   summarize(n = n()) %>%
#   arrange(n = n()) 

domain_gender %>% # intersession interval stats
  distinct(id, .keep_all = T) %>%
  filter(interval != 0) %>%
  summarize(mean_days = mean(interval), 
            median = median(interval),
            sd = sd(interval),
            min = min(interval),
            max = max(interval))

## # A tibble: 1 × 5
##   mean_days median    sd   min   max
##       <dbl>  <dbl> <dbl> <dbl> <dbl>
## 1      15.9     13  9.39     7    60

# Figure
domain_gender_interval <- domain_gender %>% # intersession interval df
  distinct(id, .keep_all = T) %>%
  filter(interval != 0) 

ggplot(domain_gender_interval, aes(x = interval)) +
  geom_histogram(fill = "#69b3a2", color = "white", bins = 30, alpha = 0.8) +
  geom_vline(xintercept = mean(domain_gender_interval$interval), linetype = 1) +
  geom_vline(xintercept = median(domain_gender_interval$interval), linetype = 2) +
  geom_vline(xintercept = quantile(domain_gender_interval$interval, 0.25), linetype = "dotted", linewidth = 0.8, colour = "darkgrey") +
  geom_vline(xintercept = quantile(domain_gender_interval$interval, 0.75), linetype = "dotted", linewidth = 0.8, colour = "darkgrey") +
  labs(x = "Days Between Sessions", y = "Number of Participants", title = "Distribution of Intersession Intervals",     subtitle = paste("N =", nrow(domain_gender_interval))) +
  theme_classic(base_family = "Times New Roman", base_size = 15) +
  theme(plot.title = element_text(hjust = 0.5, face = "bold", size = 16),
        plot.subtitle = element_text(hjust = 0.5, size = 13, face = "italic"))

ggsave("figs/sup_fig1.png", height = 4, width = 6)

Analyses

Measures (aka Conditions)

These analyses investigate children’s stereotyping within and across measures.

Data Preparation

# pr(stereotyping) ####
df_mod_condition0 <- domain_gender %>%
  filter(condition %in% c("Gender", "Unknown", "Known")) %>% 
  select(response_stereotype, q_subtype, condition, comparison, age_raw, gender, session, stim_gender, stim_race, id_anon, domain)

df_mod_condition0_age <- df_mod_condition0 %>% 
  group_by(id_anon) %>% 
  summarize(age_mean = mean(age_raw), .groups = "drop") %>% 
  mutate(age_mean_standardized = as.numeric(scale(age_mean))) # mean center & standardize mean age

df_mod_condition <- df_mod_condition0 %>% 
  left_join(df_mod_condition0_age)

contrasts(df_mod_condition$response_stereotype) # pr(stereotype)

# prep
df <- df_mod_condition %>%
  mutate(cell = interaction(condition, domain, drop = TRUE))

# how many distinct cells (levels)
df %>% summarize(n_cells = n_distinct(cell))

# obs per cell (shows sparsity across cells)
df %>% count(cell) %>% arrange(n)

# per-subject observations per cell
per_subject_cell <- df %>%
  count(id_anon, cell) %>%
  group_by(cell) %>%
  summarize(min_n = min(n), median_n = median(n), mean_n = mean(n), n_subjs = n(), .groups = "drop") %>%
  arrange(min_n)
per_subject_cell

# per-subject total obs and how many subjects saw all 3 conditions
df %>% group_by(id_anon) %>%
  summarize(n_obs = n(), n_cells = n_distinct(cell), n_conditions = n_distinct(condition), .groups = "drop") %>%
  summarize(min_n_obs = min(n_obs), median_n_obs = median(n_obs), mean_n_obs = mean(n_obs))

Logistic Regression

Model

model_path <- "rmd_cache/models/mod_condition.rds"

if (!file.exists(model_path)) {

  mod_condition <- func_mod(
    formula =
      response_stereotype ~
        session +
        domain * age_mean_standardized * gender +
        (domain | id_anon) +
        (1 | stim_race),
    data   = df_mod_condition,
    family = bernoulli(),
    prior  = prior_mod
  )

  saveRDS(mod_condition, model_path)

} else {

  mod_condition <- readRDS(model_path)

}

# saveRDS(mod_condition, "rmd_cache/models/mod_condition.rds")

Checks

# plot(mod_condition)

# pp_check(mod_condition, ndraws = 1000)
# pp_check(mod_condition, "bars_grouped", group = "condition", ndraws = 1000)
# pp_check(mod_condition, "bars_grouped", group = "domain", ndraws = 1000)
# pp_check(mod_condition, "bars_grouped", group = "gender", ndraws = 1000)
# pp_check(mod_condition, "ecdf_overlay_grouped", group = "domain", ndraws = 100)

Results

# prediction grid
newdata_mod_condition <- crossing(session               = levels(mod_condition$data$session), 
                                  domain                = levels(mod_condition$data$domain),
                                  gender                = levels(mod_condition$data$gender),
                                  age_mean_standardized = seq(from = min(mod_condition$data$age_mean_standardized),
                                                              to = max(mod_condition$data$age_mean_standardized),
                                                              length.out = length_out)) 

# newdata_mod_condition %>% count(condition, domain) # ensure only observed condition*domain combinations will be predicted
# newdata_mod_condition %>% count(`condition:domain`) # ensure only observed condition:domain levels will be predicted

# posterior predicted expectation draws
epreds_mod_condition <- local({
  set.seed(10003) # reproducible subsampling of posterior draws
  epred_draws(mod_condition,
              newdata_mod_condition,
              re_formula = ~ (1 | condition:domain),
              ndraws = n_draws_posterior)
}) %>% ungroup() %>% 
  mutate(condition = case_when(domain %in% c("Art", "Math", "Read", "Sci") ~ "Gender",
                               str_detect(domain, "_other") ~ "Unknown",
                               str_detect(domain, "_vs_") ~ "Known",
                               T ~ NA),
         condition = fct_relevel(condition, "Known", "Gender", "Unknown"),
         age = age_mean_standardized * sd(domain_gender$age_raw) + mean(domain_gender$age_raw))

Pr(stereotyping | trial type, measure, age, gender)

What is the posterior probability that child of gender G and age A will respond stereotypically in trial type T of measure M? For example, what is the probability that a 4.76-year-old boy will respond stereotypically in the art domain of the gender measure (i.e., choose a girl)? (Answer: ~.45.)

### new fig ####
epreds_newfig_grouped <- epreds_mod_condition %>%
  group_by(.draw, domain, gender, age) %>% # marginalize across 
  summarize(mean_epred = mean(.epred), .groups = "drop") 

epreds_newfig_overall <- epreds_mod_condition %>% # marginalize across session, age, gender
  group_by(.draw, domain, age) %>%
  summarize(mean_epred = mean(.epred), .groups = "drop") %>%
  mutate(gender = "Overall")

epreds_newfig <- bind_rows(epreds_newfig_overall, epreds_newfig_grouped) %>% 
  mutate(gender = fct_rev(gender),
         condition = case_when(domain %in% c("Art", "Math", "Read", "Sci") ~ "Gender",
                               str_detect(domain, "_other") ~ "Unknown",
                               str_detect(domain, "_vs_") ~ "Known",
                               T ~ NA),
         domain = str_to_title(case_when(str_detect(domain, "_") ~ str_replace_all(domain, "_vs_", " vs. "),
                                         str_detect(domain, "Art") ~ "Art",
                                         str_detect(domain, "Sci") ~ "Science",
                                         str_detect(domain, "Read") ~ "Reading",
                                         str_detect(domain, "Math") ~ "Math",
                                         T ~ NA)),
         domain = str_replace(domain, "Vs.", "vs."))

epreds_newfig_averaged <- epreds_newfig %>% 
  group_by(condition, gender, domain, age) %>% 
  summarize(ci_lo = quantile(mean_epred, .025),
            mean_mean_epred = mean(mean_epred),
            ci_hi = quantile(mean_epred, .975), .groups = "drop")

func_plot <- function(condition_level){
  ggplot(epreds_newfig_averaged %>% filter(condition == condition_level),
       aes(x = age, y = mean_mean_epred,
           color = domain, fill = domain)) +
  geom_ribbon(aes(ymin = ci_lo, ymax = ci_hi),
              alpha = 0.2, color = NA) +
  geom_line(linewidth = 1) +
  geom_hline(yintercept = .5, linewidth = 0.25, linetype = 2, color = "black") +
  facet_grid(condition ~ gender, scales = "free") +
  labs(x = "Age (Years)",
       y = "Predicted Rate of Stereotyping",
       title = "Predicted Stereotyping",
       fill = "Trial Type",
       color = "Trial Type") +
  theme_classic(base_family = "Times New Roman", base_size = 16) +
  theme(plot.title = element_text(hjust = .5, face = "bold"),
        legend.position = "right") +
    scale_y_continuous(labels = function(x) sub("^0", "", sprintf("%.1f", x)), breaks = pretty_breaks(4))
}

p1_line <- func_plot("Gender") + labs(x = NULL, y = NULL) + theme(axis.line.x = element_blank(),
                                                             axis.ticks.x = element_blank(),
                                                             axis.text.x = element_blank(),
                                                             legend.justification = "left",
                                                             legend.box.margin = margin(0, 0, 20, 0))
p2_line <- func_plot("Known") + labs(x = NULL) + theme(axis.line.x = element_blank(),
                                                  axis.ticks.x = element_blank(),
                                                  axis.text.x = element_blank(),
                                                  plot.title = element_blank(),
                                                  strip.text.x.top = element_blank(),
                                                  plot.margin = unit(c(0, 0, 0, 0), "cm"),
                                                  legend.title = element_blank())
p3_line <- func_plot("Unknown") + labs(y = NULL) + theme(plot.title = element_blank(),
                                                    strip.text.x.top = element_blank(),
                                                    legend.title = element_blank())

p1_line / p2_line / p3_line

ggsave("figs/newfig.png", height = 5, width = 9)

epreds_newfig_slope <- epreds_newfig %>% 
  group_by(.draw, condition, domain, gender) %>% 
  summarize(age_slope = cov(age, mean_epred)/var(age),
            .groups = "drop")

epreds_newfig_slope_summary <- epreds_newfig_slope %>% 
  group_by(condition, domain, gender) %>% 
  summarize(prop_gt = mean(age_slope > 0),
            hdi_lo = hdi(age_slope)[1],
            mean = mean(age_slope),
            hdi_hi = hdi(age_slope)[2]) %>% 
  ungroup()

epreds_newfig_slope_contrasts <- epreds_newfig_slope %>% 
  group_by(condition, gender) %>% 
  compare_levels(age_slope, by = domain) %>% 
  ungroup() %>% 
  rename(diff_age_slope = age_slope)

# epreds_newfig_slope_contrasts %>% count(gender, domain)

epreds_newfig_slope_contrasts_summary <- epreds_newfig_slope_contrasts %>% 
  group_by(gender, condition, domain) %>% 
  summarize(prop_gt = mean(diff_age_slope > 0),
            hdi_lo = hdi(diff_age_slope)[1],
            mean = mean(diff_age_slope),
            hdi_hi = hdi(diff_age_slope)[2]) %>% 
  ungroup()

Is age influence children’s stereotyping at different rates in different domains? That is, do the slopes of the lines within each facet of the figure, above, differ? To answer this, this table displays contrasts of the average predicted effect of age on children’s rates of stereotyping between domains (e.g., art vs. reading) or domain comparisons (e.g., art-vs-other vs. reading-vs-other). There are 39 contrasts in the table, which compare the average predicted effect of age for every combination of trial types within a measure. The gender, measure, and domain column are listed first.

The last four columns provide the two, key inferential criteria outlined in the analysis plan. Specifically, the column “prop_gt” represents the posterior probability that, e.g., girls’ predicted mean rate of stereotyping in science exceeds that of reading. This specific comparison is depicted in row 2, in which the posterior probability is .844 that average effect of age on girls’ rate of stereotyping is greater in the science than in the reading domain of the gender measure. This suggests that girls’ science stereotypes may increase more with age compared to their reading stereotypes.

The last three columns depicted the lower bound of the 95% highest density interval of the distribution over the contrast, the mean of the contrast (i.e., the mean predicted difference in age effects), and the upper bound of the 95% highest density interval.

This or a similar structure repeats itself in all the tables, below.

print(epreds_newfig_slope_contrasts_summary %>% arrange(condition, -prop_gt) %>% rename(measure = "condition"), n = Inf)

## # A tibble: 39 × 7
##    gender  measure domain                       prop_gt   hdi_lo     mean hdi_hi
##    <fct>   <chr>   <chr>                          <dbl>    <dbl>    <dbl>  <dbl>
##  1 Overall Gender  Science - Reading              0.908 -1.42e-2  3.79e-2 0.0955
##  2 Girls   Gender  Science - Reading              0.844 -3.59e-2  3.36e-2 0.0978
##  3 Overall Gender  Science - Math                 0.841 -1.99e-2  2.17e-2 0.0640
##  4 Boys    Gender  Science - Reading              0.826 -4.26e-2  4.22e-2 0.123 
##  5 Boys    Gender  Science - Math                 0.790 -2.99e-2  2.76e-2 0.0947
##  6 Boys    Gender  Science - Art                  0.744 -4.56e-2  2.19e-2 0.0817
##  7 Girls   Gender  Science - Math                 0.722 -3.86e-2  1.58e-2 0.0703
##  8 Overall Gender  Science - Art                  0.688 -3.80e-2  1.04e-2 0.0498
##  9 Girls   Gender  Science - Art                  0.480 -5.42e-2 -1.10e-3 0.0542
## 10 Boys    Gender  Math - Art                     0.433 -7.08e-2 -5.73e-3 0.0596
## 11 Boys    Gender  Reading - Math                 0.371 -1.06e-1 -1.46e-2 0.0704
## 12 Overall Gender  Math - Art                     0.318 -5.71e-2 -1.13e-2 0.0347
## 13 Girls   Gender  Reading - Math                 0.317 -8.41e-2 -1.78e-2 0.0584
## 14 Girls   Gender  Math - Art                     0.301 -7.98e-2 -1.69e-2 0.0431
## 15 Overall Gender  Reading - Math                 0.295 -7.22e-2 -1.62e-2 0.0419
## 16 Boys    Gender  Reading - Art                  0.27  -8.34e-2 -2.03e-2 0.0491
## 17 Overall Gender  Reading - Art                  0.114 -7.22e-2 -2.75e-2 0.0150
## 18 Girls   Gender  Reading - Art                  0.108 -8.73e-2 -3.47e-2 0.0165
## 19 Overall Known   Science vs. Art - Read vs. …   0.976 -2.84e-4  4.05e-2 0.0811
## 20 Boys    Known   Science vs. Art - Read vs. …   0.941 -8.86e-3  4.53e-2 0.108 
## 21 Girls   Known   Science vs. Art - Read vs. …   0.886 -2.13e-2  3.57e-2 0.0949
## 22 Boys    Unknown Read vs. Other - Math vs. O…   0.585 -6.72e-2  9.45e-3 0.0967
## 23 Overall Unknown Read vs. Other - Math vs. O…   0.546 -5.68e-2  2.52e-3 0.0572
## 24 Boys    Unknown Sci vs. Other - Math vs. Ot…   0.481 -8.03e-2  1.72e-4 0.0778
## 25 Girls   Unknown Read vs. Other - Math vs. O…   0.46  -9.07e-2 -4.41e-3 0.0678
## 26 Boys    Unknown Sci vs. Other - Read vs. Ot…   0.408 -9.32e-2 -9.28e-3 0.0690
## 27 Girls   Unknown Sci vs. Other - Read vs. Ot…   0.406 -9.96e-2 -9.67e-3 0.0652
## 28 Overall Unknown Sci vs. Other - Math vs. Ot…   0.404 -6.47e-2 -6.96e-3 0.0509
## 29 Overall Unknown Sci vs. Other - Read vs. Ot…   0.384 -6.71e-2 -9.48e-3 0.0513
## 30 Girls   Unknown Sci vs. Other - Math vs. Ot…   0.364 -8.81e-2 -1.41e-2 0.0731
## 31 Girls   Unknown Math vs. Other - Art vs. Ot…   0.312 -9.82e-2 -2.08e-2 0.0609
## 32 Girls   Unknown Read vs. Other - Art vs. Ot…   0.276 -1.07e-1 -2.52e-2 0.0563
## 33 Boys    Unknown Read vs. Other - Art vs. Ot…   0.232 -1.21e-1 -3.16e-2 0.0501
## 34 Girls   Unknown Sci vs. Other - Art vs. Oth…   0.193 -1.19e-1 -3.49e-2 0.0473
## 35 Overall Unknown Read vs. Other - Art vs. Ot…   0.176 -9.11e-2 -2.84e-2 0.0258
## 36 Boys    Unknown Math vs. Other - Art vs. Ot…   0.159 -1.26e-1 -4.10e-2 0.0391
## 37 Boys    Unknown Sci vs. Other - Art vs. Oth…   0.155 -1.19e-1 -4.08e-2 0.0452
## 38 Overall Unknown Math vs. Other - Art vs. Ot…   0.138 -8.67e-2 -3.09e-2 0.0307
## 39 Overall Unknown Sci vs. Other - Art vs. Oth…   0.106 -9.73e-2 -3.79e-2 0.0231

Pr(stereotyping | measure, gender)

Did children respond stereotypically at rates above or below chance, when averaged across age and trial type?

### fig3 ####
# stereotype probability estimates
epreds_condition_bygender <- epreds_mod_condition %>%
  group_by(.draw, condition, gender) %>% # marginalize across session, age
  summarize(mean_epred = mean(.epred), .groups = "drop") 

epreds_condition_overall <- epreds_mod_condition %>% # marginalize across session, age, gender
  group_by(.draw, condition) %>%
  summarize(mean_epred = mean(.epred), .groups = "drop") %>%
  mutate(gender = "Overall")

draws_probs_mod_condition <- bind_rows(epreds_condition_overall, epreds_condition_bygender) %>% 
  mutate(gender = fct_rev(gender))

probs_mod_condition <- draws_probs_mod_condition %>% 
  group_by(gender, condition) %>% 
  summarize(prop_gt = mean(mean_epred > 0.5),
            min    = min(mean_epred),
            hdi_lo = hdi(mean_epred)[1],
            mean   = mean(mean_epred),
            hdi_hi = hdi(mean_epred)[2],
            max    = max(mean_epred),
            diff_lo = hdi_lo - min,
            diff_hi = hdi_hi - max,
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(hjust_label = ifelse(hdi_hi > 0.490 & hdi_hi < 0.510, 0.1, NA),
         across(where(is.numeric), ~round(., 3)))

ggplot(draws_probs_mod_condition,
       aes(y = condition, x = mean_epred)) +
  stat_halfeye(aes(fill = after_stat(x < 0.5)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") + 
  facet_grid(~gender, scales = "free") +
  geom_text(data = probs_mod_condition,
            aes(x = hdi_hi + .014, label = prop_gt_char, hjust = hjust_label),
            size = 4,
            vjust = -3,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_family = "Times New Roman", base_size = 16) +
  geom_vline(xintercept = 0.5, alpha = 0.2, linetype = 2) +
  labs(x = "Predicted Probability of Stereotypical Response",
       title = "Stereotypical Responding by Condition",
       subtitle = "Did participants tend to choose the stereotypical option?",
       y = "Condition") +
  scale_x_continuous(labels = function(x) gsub("^-0\\.", "-.", gsub("^0\\.", ".", format(x, trim = TRUE))),
                     breaks = pretty_breaks(2)) +
  scale_y_discrete(expand = c(0.05, 0)) +
  theme(axis.text.y = element_markdown(family = "Times New Roman", size = 12, vjust = -1.5),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = .5))

ggsave("figs/fig3.png", width = 7, height = 4)

Pr(girls’ stereotyping > boys’ stereotyping | measure)

Did girls respond stereotypically more often than boys, on average? I have not yet prepared a figure for this.

# contrasts by gender
contrasts_gender_condition <- epreds_condition_bygender %>%
  group_by(condition) %>%
  compare_levels(mean_epred, by = gender) %>%
  ungroup() %>%
  rename(comparison = gender)
contrasts_gender_overall <- epreds_condition_bygender %>%
  group_by(gender, .draw) %>%
  summarize(mean_epred = mean(mean_epred),
            .groups = "drop") %>%
  compare_levels(mean_epred, by = gender) %>%
  rename(comparison = gender) %>%
  mutate(condition = "Overall") %>% 
  ungroup()
contrasts_gender <- bind_rows(contrasts_gender_condition, contrasts_gender_overall)

contrasts_gender %>% 
  group_by(condition, comparison) %>%
  summarize(prop_gt = mean(mean_epred > 0),
            hdi_lo  = hdi(mean_epred, .width = .95)[1],
            mean    = mean(mean_epred),
            hdi_hi  = hdi(mean_epred, .width = .95)[2],
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

## # A tibble: 4 × 6
##   condition comparison   prop_gt hdi_lo   mean hdi_hi
##   <chr>     <chr>          <dbl>  <dbl>  <dbl>  <dbl>
## 1 Gender    Girls - Boys   0.07  -0.054 -0.023  0.005
## 2 Known     Girls - Boys   0     -0.097 -0.06  -0.02 
## 3 Overall   Girls - Boys   0.004 -0.049 -0.029 -0.007
## 4 Unknown   Girls - Boys   0.42  -0.041 -0.004  0.032

Pr(stereotyping in measure X > measure Y | gender)

Did children respond stereotypically more often in one measure than in another?

### sup_fig2 ####
# contrasts by condition
contrasts_condition_gender <- epreds_condition_bygender %>% 
  group_by(gender) %>%
  compare_levels(mean_epred, by = condition) %>%
  ungroup() %>% 
  rename(comparison = condition)

contrasts_condition_overall <- epreds_condition_bygender %>%
  group_by(condition, .draw) %>%
  summarize(mean_epred = mean(mean_epred),
            .groups = "drop") %>%
  compare_levels(mean_epred, by = condition) %>% 
  rename(comparison = condition) %>%
  mutate(gender = "Overall")

contrasts_condition <- bind_rows(contrasts_condition_gender, contrasts_condition_overall) %>% 
  mutate(gender = fct_rev(gender), # for plotting
         comparison2 = comparison) %>% 
  separate(comparison2, c("domain1", "domain2"), sep = "-") %>% 
  mutate(across(c(domain1, domain2), ~ trimws(.)),
         y_axis_label = paste0(domain2, " v. ", domain1),
         y_axis_label_colored = glue(
           "<span style='color:grey70;font-weight:bold'>{domain2}</span>",
           "<span style='color:black;font-weight:bold'> v. </span>",
           "<span style='color:grey30;font-weight:bold'>{domain1}</span>"))

summary_contrasts <- contrasts_condition %>% 
  group_by(gender, y_axis_label, y_axis_label_colored) %>%
  summarize(prop_gt = mean(mean_epred > 0),
            hdi_lo  = hdi(mean_epred, .width = .95)[1],
            mean    = mean(mean_epred),
            hdi_hi  = hdi(mean_epred, .width = .95)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(hjust_label = ifelse(hdi_hi > -0.010 & hdi_hi < 0.010, 0.2, NA),
         across(where(is.numeric), ~round(., 3)))

ggplot(contrasts_condition,
       aes(x = mean_epred, y = y_axis_label)) +
  stat_halfeye(aes(fill = after_stat(x < 0)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") + 
  facet_grid(~gender, scales = "free") +
  geom_text(data = summary_contrasts,
            aes(x = hdi_hi + 0.009, label = prop_gt_char, hjust = hjust_label),
            size = 4,
            vjust = -3,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 15, base_family = "Times New Roman") +
  geom_vline(xintercept = 0, alpha = 0.2, linetype = 2) +
  labs(x = "Difference in Predicted Probability of Stereotypical Response",
       title = "Contrasts: Stereotypical Responding by Condition",
       y = "Comparison",
       subtitle = "Which conditions showed stronger stereotyping?") +
  scale_y_discrete(labels = setNames(summary_contrasts$y_axis_label_colored,
                                     summary_contrasts$y_axis_label),
                   expand = expansion(mult = c(0.05, 0))) +
  scale_x_continuous(labels = function(x) gsub("^-0\\.", "-.", gsub("^0\\.", ".", format(x, trim = TRUE))),
                     breaks = pretty_breaks(3),
                     expand = c(0, .006)) +
  theme(axis.text.y = element_markdown(family = "Times New Roman", size = 12, vjust = -1.5),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = .5))

ggsave("figs/sup_fig2.png", width = 7, height = 4)

Pr(effect size of measure in boys > in girls | measure comparison)

Is the estimated effect size of measure M larger in boys or girls? I have no yet prepared a figure for this.

# gender contrasts of condition contrasts (i.e., were predicted condition effect sizes larger in boys or girls?)
contrasts_condition_gender <- contrasts_condition %>% 
  filter(gender != "Overall") %>% 
  group_by(comparison) %>% 
  compare_levels(mean_epred, "gender") %>% 
  ungroup()

contrasts_condition_gender %>% 
  group_by(gender, comparison) %>%
  summarize(prop_gt = mean(mean_epred > 0),
            hdi_lo  = hdi(mean_epred, .width = .95)[1],
            mean    = mean(mean_epred),
            hdi_hi  = hdi(mean_epred, .width = .95)[2],
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

## # A tibble: 3 × 6
##   gender       comparison       prop_gt hdi_lo   mean hdi_hi
##   <chr>        <chr>              <dbl>  <dbl>  <dbl>  <dbl>
## 1 Boys - Girls Gender - Known     0.077 -0.086 -0.037  0.012
## 2 Boys - Girls Unknown - Gender   0.216 -0.063 -0.019  0.029
## 3 Boys - Girls Unknown - Known    0.019 -0.107 -0.055  0

Pr(stereotyping | age, measure, gender)

### age ####
#### fig4 ####
# age trends
epreds_age_nomarg_draws <- epreds_mod_condition %>% 
  group_by(.draw, condition, gender, age_mean_standardized) %>% # marginalize across session
  summarize(mean_epred = mean(.epred),
            .groups = "drop")
epreds_age_nomarg <- epreds_age_nomarg_draws %>% 
  group_by(condition, gender, age_mean_standardized) %>% 
  summarize(ci_diff_lo = quantile(mean_epred, .025),
            mean_mean_epred = mean(mean_epred),
            ci_diff_hi = quantile(mean_epred, .975),
            .groups = "drop")

epreds_age_genmarg_draws <- epreds_mod_condition %>% 
  group_by(.draw, condition, age_mean_standardized) %>% # marginalize across gender
  summarize(mean_epred = mean(.epred),
            .groups = "drop")
epreds_age_genmarg <- epreds_age_genmarg_draws %>% 
  group_by(condition, age_mean_standardized) %>% 
  summarize(ci_diff_lo = quantile(mean_epred, .025),
            mean_mean_epred = mean(mean_epred),
            ci_diff_hi = quantile(mean_epred, .975),
            .groups = "drop")
epreds_age_genmarg_overall <- epreds_age_genmarg_draws %>% # creating 'overall' for 'overall' facet
  group_by(age_mean_standardized) %>% 
  summarize(ci_diff_lo = quantile(mean_epred, .025),
            mean_mean_epred = mean(mean_epred),
            ci_diff_hi = quantile(mean_epred, .975),
            .groups = "drop")

epreds_age_condmarg_draws <- epreds_mod_condition %>% 
  group_by(.draw, gender, age_mean_standardized) %>% # marginalize across condition
  summarize(mean_epred = mean(.epred),
            .groups = "drop")
epreds_age_condmarg <- epreds_age_condmarg_draws %>%   
  group_by(gender, age_mean_standardized) %>% 
  summarize(ci_diff_lo = quantile(mean_epred, .025),
            mean_mean_epred = mean(mean_epred),
            ci_diff_hi = quantile(mean_epred, .975),
            .groups = "drop")

epreds_age <- bind_rows(epreds_age_nomarg, epreds_age_condmarg, epreds_age_genmarg,
                        epreds_age_genmarg_overall)  %>% 
mutate(age_mean_standardized = age_mean_standardized * sd(domain_gender$age_raw) + mean(domain_gender$age_raw),
         gender = fct_na_value_to_level(gender, level = "Overall"),
         condition = fct_na_value_to_level(condition, level = "Overall"),
         gender = fct_relevel(gender, "Overall", "Girls", "Boys"))

ggplot(epreds_age, aes(x = age_mean_standardized, y = mean_mean_epred, color = condition)) +
  facet_wrap( ~ gender) +
  geom_ribbon(data = epreds_age %>% filter(condition != "Overall"),
              aes(ymin = ci_diff_lo, ymax = ci_diff_hi, fill = condition),
              alpha = 0.2,
              color = NA) +
  geom_line(data = epreds_age %>% filter(condition != "Overall"), linewidth = 1) +
  geom_line(data = epreds_age %>% filter(condition == "Overall"), linewidth = 1, linetype = 2, color= "darkgrey") +
  theme_classic(base_size = 18, base_family = "Times New Roman") +
  labs(x = "Age (Years)",
       y = "Predicted Probability of Stereotyping",
       title = "Age Trends",
       fill = "Condition",
       color = "Condition") +
  geom_hline(yintercept = 0.50, linetype = 3, linewidth = 0.6, alpha = .5) +
  scale_y_continuous(labels = function(x) sub("^0", "", sprintf("%.1f", x)), breaks = pretty_breaks(3)) +
  scale_color_viridis_d(option = "D", end = .9) +
  scale_fill_viridis_d(option = "D", end = .9) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        axis.title.y = element_text(size = 16))

ggsave("figs/fig4.png", width = 6.5, height = 4)

Pr(age effect size in measure X > measure Y | gender, measure comparison)

Was the effect of age on children’s tendency to respond stereotypically in measure X greater than in measure Y? In short, did age matter more in measure X than in measure Y?

#### sup_fig3 & fig5 ####
# epred draws
epreds_age_effect_draws <- epreds_mod_condition %>%
  group_by(.draw, condition, gender, session) %>%
  summarize(age_effect = cov(age_mean_standardized, .epred) / var(age_mean_standardized),
            .groups = "drop")

avg_epreds_age_effects_draws <- epreds_age_effect_draws %>% 
  group_by(.draw, condition) %>% 
  summarize(mean_age_effect = mean(age_effect),
            .groups = "drop") 

avg_age_overall <- avg_epreds_age_effects_draws %>% # average age effect per condition
  group_by(condition) %>% 
  summarize(prop_gt = mean(mean_age_effect > 0),
            hdi_lo = hdi(mean_age_effect, .width = .95)[1],
            mean = mean(mean_age_effect),
            hdi_hi = hdi(mean_age_effect, .width = .95)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>%
  mutate(gender = "Overall")

contrasts_age_overall <- avg_epreds_age_effects_draws %>% # contrast of average age effects per condition per gender
  compare_levels(mean_age_effect, by = condition) %>% 
  ungroup() %>% 
  rename(diff_mean_age_effect = mean_age_effect) %>% 
  mutate(gender = "Overall")

avg_epreds_age_effect_draws_gender <- epreds_age_effect_draws %>% 
  group_by(.draw, gender, condition) %>% 
  summarize(mean_age_effect = mean(age_effect),
            .groups = "drop")

avg_age_gender <- avg_epreds_age_effect_draws_gender %>% # average age effect per condition per gender
  group_by(gender, condition) %>%
  summarize(prop_gt = mean(mean_age_effect > 0),
            hdi_lo = hdi(mean_age_effect, .width = .95)[1],
            mean = mean(mean_age_effect),
            hdi_hi = hdi(mean_age_effect, .width = .95)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop")

contrasts_age_gender <- avg_epreds_age_effect_draws_gender %>% # contrast of average age effects per condition per gender
  mutate(condition = condition) %>% 
  group_by(gender) %>% 
  compare_levels(mean_age_effect, by = condition) %>% 
  ungroup() %>% 
  rename(diff_mean_age_effect = mean_age_effect)

##### sup_fig3 ####
# age effect contrasts
epreds_age_contrasts <- bind_rows(contrasts_age_overall, contrasts_age_gender) %>% 
  ungroup() %>% 
  mutate(gender = fct_rev(gender))

summary_contrasts_age <- bind_rows(contrasts_age_overall, contrasts_age_gender) %>% # age effects contrasts (overall and by gender)
  group_by(gender, condition) %>%
  summarize(prop_gt = mean(diff_mean_age_effect > 0),
            hdi_lo = hdi(diff_mean_age_effect, .width = .95)[1],
            expected_diff_age_effect = mean(diff_mean_age_effect),
            hdi_hi = hdi(diff_mean_age_effect, .width = .95)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(hjust_label = ifelse(hdi_hi > -0.010 & hdi_hi < 0.010, 0.2, NA),
         gender = fct_rev(gender),
         across(where(is.numeric), ~round(., 3)))

ggplot(epreds_age_contrasts, aes(x = diff_mean_age_effect, y = condition)) +
  stat_halfeye(aes(fill = after_stat(x < 0)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(~ gender, scales = "free") +
  geom_text(data = summary_contrasts_age,
            aes(x = hdi_hi + 0.01, label = prop_gt_char, hjust = hjust_label),
            size = 4,
            vjust = -3,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 15, base_family = "Times New Roman") +
  geom_vline(xintercept = 0, alpha = 0.2, linetype = 2) +
  labs(x = "Difference in Predicted Age Effect (Probability Scale)",
       y = "Comparison",
       title = "Contrasts: Age Effects", 
       subtitle = "Which conditions showed a stronger positive age effect on stereotyping?") +
  scale_y_discrete(labels = c("<span style='color:grey70'>Known</span> v. <span style='color:grey30'>Gender</span>", # bottom row
                              "<span style='color:grey70'>Unknown</span> v. <span style='color:grey30'>Gender</span>", # middle row
                              "<span style='color:grey70'>Known</span> v. <span style='color:grey30'>Unknown</span>"), # top row
                   expand = expansion(mult = c(0.05, 0))) +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.2f", x)), breaks = pretty_breaks(3)) +
  theme(axis.text.y = element_markdown(family = "Times New Roman", size = 12, vjust = -1.5),
        axis.title.y = element_blank(),
        axis.ticks.y = element_blank(),
        plot.title = element_text(face = "bold", hjust = 0.4),
        plot.subtitle = element_text(hjust = .9))

ggsave("figs/sup_fig3.png", width = 7, height = 4)

Pr(age effect > chance | gender, measure comparison)

Was the effect of age on children’s tendency to respond stereotypically in measure X greater than chance?

##### fig5 ####
# marginal age effects
age_draws <- bind_rows(avg_epreds_age_effects_draws %>% mutate(gender = "Overall"),
                       avg_epreds_age_effect_draws_gender) %>% 
  mutate(condition = fct_relevel(condition, "Known", "Unknown", "Gender"),
         gender = fct_relevel(gender, "Overall", "Girls", "Boys"))

summary_age_marginal <- bind_rows(avg_age_overall, avg_age_gender) %>% # average effect of age per condition (overall and per gender)
  mutate(hjust_label = ifelse(hdi_hi > -0.010 & hdi_hi < 0.010, 0.2, NA),
         condition = fct_relevel(condition, "Known", "Unknown", "Gender"),
         gender = fct_rev(gender),
         across(where(is.numeric), ~round(., 3)))

ggplot(age_draws, aes(x = mean_age_effect, y = condition)) +
  stat_halfeye(aes(fill = after_stat(x < 0)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(~ gender, scales = "free") +
  geom_text(data = summary_age_marginal,
            aes(x = hdi_hi + 0.005, label = prop_gt_char, hjust = hjust_label),
            size = 4,
            vjust = -3,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 15, base_family = "Times New Roman") +
  geom_vline(xintercept = 0, alpha = 0.2, linetype = 2) +
  labs(x = "Predicted Effect Size of Age (Probability Scale)",
       title = "Age Effects",
       y = "Condition",
       subtitle = "Was the rate of stereotypical responding associated with age?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.2f", x)), breaks = pretty_breaks(2)) +
  theme(axis.text.y = element_text(vjust = -2),
        plot.title = element_text(face = "bold", hjust = 0.5),
        axis.ticks.y = element_blank(),
        plot.subtitle = element_text(hjust = 0.5)) +
  scale_y_discrete(expand = c(0.05, 0))

ggsave("figs/fig5.png", width = 7, height = 4)

Multivariate Regression (Correlated Residuals)

Data Preparation

Data structure is ouputted.

# residual correlation ####
## data ####
func_corr_ratios <- function(condition_level){
  df_mod_condition %>% filter(condition == condition_level) %>% 
    group_by(id_anon, response_stereotype) %>%
    summarize(n = n(), .groups = "drop") %>%
    group_by(id_anon) %>%
    mutate(ratio = n / sum(n)) %>%
    ungroup() %>% 
    mutate(response_stereotype = ifelse(response_stereotype == "inconsistent" & ratio == 1, "consistent2", paste0(response_stereotype)), # recode 100% inconsistent responses as 0% consistent
           ratio = ifelse(response_stereotype == "consistent2", 0, ratio),
           response_stereotype = ifelse(response_stereotype == "consistent2", "consistent", paste0(response_stereotype))) %>% 
    filter(response_stereotype != "inconsistent")
}

corr_ratios_gender  <- func_corr_ratios("Gender")  %>% select(id_anon, ratio) %>% rename(gender_ratio = ratio)
corr_ratios_unknown <- func_corr_ratios("Unknown") %>% select(id_anon, ratio) %>% rename(unknown_ratio = ratio)
corr_ratios_known   <- func_corr_ratios("Known")   %>% select(id_anon, ratio) %>% rename(known_ratio = ratio)

corr_data <- reduce(list(corr_ratios_gender, corr_ratios_unknown, corr_ratios_known), left_join, by = "id_anon") %>%
  drop_na() # drop NAs i.e., participants who did not contribute at least one observation to all three trials (these would be removed anyway when running the model, below)

Q-Q Plots

Are the correlations normally distributed? We need to answer this so that we know which distributional form to assume for our response. Functionally akin to a Shapiro-Wilk test.

## sup_fig4 ####
# qq plots
label_formatter <- function(x) {
  sub("^0", "", sprintf("%.2f", x))
}

make_qqplot <- function(data_vector, title) {
  df <- data.frame(value = data_vector)
  ggplot(df, aes(sample = value)) +
    stat_qq(size = 1.2, color = "#0072B2") +
    stat_qq_line(color = "gray40", linetype = "dashed", linewidth = 0.6) +
    labs(title = title, x = "Theoretical Quantiles", y = "Sample Quantiles") +
    scale_y_continuous(labels = label_formatter) +
    theme_minimal(base_size = 18, base_family = "Times New Roman") +
    theme(
      plot.title = element_text(hjust = 0.5, size = 20),
      axis.title = element_text(size = 18),
      axis.text = element_text(size = 16),
      panel.grid.minor = element_blank()
    )
}

p1qq <- make_qqplot(corr_data$gender_ratio, "Gender Measure")
p2qq <- make_qqplot(corr_data$unknown_ratio, "Unknown Measure") + labs(y = NULL)
p3qq <- make_qqplot(corr_data$known_ratio, "Known Measure") + labs(y= NULL)

(p1qq | p2qq | p3qq) + 
  plot_annotation(
    title = "Normal Q-Q Plots of Stereotype Proportions",
    theme = theme(
      plot.title = element_text(
        family = "Times New Roman",
        face = "bold",
        size = 24,
        hjust = 0.5)))

ggsave("figs/sup_fig4.png", width = 12, height = 4)

Model

We assume student t-distributed data to account for non-normality.

model_path <- "rmd_cache/models/mod_corr.rds"

if (!file.exists(model_path)) {

mod_corr <- func_mod(formula = bf(mvbind(unknown_ratio, known_ratio, gender_ratio) ~ 1) + set_rescor(TRUE),
                     data = corr_data,
                     family = student(),
                     prior = set_prior("lkj(2)", class = "rescor"))

  saveRDS(mod_corr, model_path)

} else {

  mod_corr <- readRDS(model_path)

}

# saveRDS(mod_corr, "rmd_cache/models/mod_corr.rds")

Results

Pr(residual correlation of stereotyping in measure X > chance | measures correlated)

Did children who responded stereotypically in measure X tend to respond stereotypically in measure Y? Denoted X_Y, below.

## analysis ####
df_mod_cor_draws <- tibble(as_draws_df(mod_corr)) %>% 
  select(starts_with("rescor")) %>% 
  pivot_longer(everything(), names_to = "cor") %>% 
  mutate(cor = str_remove(cor, "rescor__"),
         cor = str_remove_all(cor, "ratio"),
         cor = str_remove(cor, "_"))

cor_stats <- df_mod_cor_draws %>%
  group_by(cor) %>% 
  summarize(prop_gt = mean(value > 0),
            hdi_lo = hdi(value, .width = .95)[1],
            mean = mean(value),
            hdi_hi = hdi(value, .width = .95)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(hjust_label = ifelse(hdi_hi > -0.010 & hdi_hi < 0.010, 0.2, NA),
         across(where(is.numeric), ~round(., 3)))

cor_stats

## # A tibble: 3 × 7
##   cor            prop_gt hdi_lo   mean hdi_hi prop_gt_char hjust_label
##   <chr>            <dbl>  <dbl>  <dbl>  <dbl> <chr>        <lgl>      
## 1 known_gender     0.516 -0.114  0.002  0.107 52%          NA         
## 2 unknown_gender   0.986  0.018  0.123  0.23  99%          NA         
## 3 unknown_known    0.359 -0.128 -0.02   0.092 36%          NA

Figure

This Figure summarizes the data and estimated residual correlations of stereotype proportions. Axes represent the stereotype proportion in that labelled condition. Each dot is a participant’s stereotype proportion value.

### fig6 ####
func_mean_rvals <- function(df){
  mean(df) %>%
    round(3) %>%
    format(nsmall = 3) %>%          # ensures fixed decimal places
    str_replace("^-0\\.", "-.") %>% # remove 0 after negative sign
    str_replace("^0\\.", ".")       # remove 0 from positive numbers
}

make_corr_plot <- function(data, xvar, yvar, xlab, ylab, r_label) {
  r_label_clean <- sub("^(-?)0\\.", "\\1.", r_label)
  ggplot(data, aes(x = .data[[xvar]], y = .data[[yvar]])) +
    geom_point(colour = "purple", alpha = .2) +
    geom_smooth(method = "lm", se = TRUE, color = "red") +
    annotate("text", x = 0.03, y = 1.00, 
             label = paste0("italic(rho) == '", r_label_clean, "'"), 
             parse = TRUE,
             hjust = 0, vjust = 1, size = 5, family = "Times New Roman", color = "red") +
    labs(x = xlab, y = ylab) +
    scale_x_continuous(limits = range(corr_data %>% pivot_longer(ends_with("_ratio")) %>% pull(value)), breaks = pretty_breaks(3), labels = function(x) gsub("^-0\\.", "-.", gsub("^0\\.", ".", format(x, trim = TRUE)))) +
    scale_y_continuous(breaks = pretty_breaks(3), labels = function(x) sub("^(-?)0", "\\1", sprintf("%.1f", x))) +
    theme_classic(base_size = 18, base_family = "Times New Roman")
}

p1 <- make_corr_plot(corr_data, "unknown_ratio", "gender_ratio", "Unknown", "Gender",
                     func_mean_rvals(df_mod_cor_draws %>% filter(cor == "unknown_gender") %>% pull(value)))
p2 <- make_corr_plot(corr_data, "known_ratio", "gender_ratio", "Known", "Gender",
                     func_mean_rvals(df_mod_cor_draws %>% filter(cor == "known_gender") %>% pull(value))) 
p3 <- make_corr_plot(corr_data, "known_ratio", "unknown_ratio", "Known", "Unknown",
                     func_mean_rvals(df_mod_cor_draws %>% filter(cor == "unknown_known") %>% pull(value)))

make_corr_density <- function(var) {
  draws <- df_mod_cor_draws %>% filter(cor == {{var}}) %>% pull(value)
  hdi_vals <- ggdist::hdi(draws, .width = 0.95)
  hdi_lo <- hdi_vals[1]
  hdi_hi <- hdi_vals[2]
  median_val <- median(draws)
  dens <- density(draws)
  dens_df <- tibble(x = dens$x, y = dens$y) %>%
    filter(x >= min(draws), x <= max(draws))
  baseline <- 0.02
  
  ggplot(dens_df, aes(x = x, y = y)) +
    geom_ribbon(aes(ymin = 0, ymax = y, fill = x < 0), alpha = 0.6) +
    geom_line(color = "red", linewidth = 0.5) +
    annotate("segment", x = hdi_lo, xend = hdi_hi, y = baseline, yend = baseline, color = "black", size = 1.2) +
    annotate("point", x = median_val, y = baseline, color = "red", size = 3) +
    geom_vline(xintercept = 0, alpha = 0.2, linetype = 2) +
    geom_text(data = cor_stats %>% filter(cor == {{var}}),
              aes(x = hdi_hi + 0.030, y = 0, label = prop_gt_char, hjust = hjust_label),
              size = 6, vjust = -4, family = "Times New Roman", fontface = "bold", color = "grey40") +
    scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
    labs(x = expression(rho), y = NULL) +
    theme_classic(base_size = 18, base_family = "Times New Roman") +
    theme(plot.title = element_text(hjust = 0.5, size = 18),
          axis.text.y = element_blank(),
          axis.ticks.y = element_blank(),
          axis.line.y = element_blank(),
          axis.title.x = element_text(hjust = .45)) +
    scale_x_continuous(limits = range(df_mod_cor_draws %>% pull(value)), breaks = pretty_breaks(3),
                       labels = function(x) gsub("^-0\\.", "-.", gsub("^0\\.", ".", format(x, trim = TRUE))))
}

d1 <- make_corr_density("unknown_gender") 
d2 <- make_corr_density("known_gender") 
d3 <- make_corr_density("unknown_known") 

row1 <- p1 + d1
row2 <- p2 + d2
row3 <- p3 + d3

wrap_elements(row1 / row2 / row3) + plot_annotation(title = "Residual Correlations of Stereotypical Response Proportions",
                                                    theme = theme(plot.title = element_text(hjust = 0.5, size = 20, face = "bold", family = "Times New Roman")))

ggsave("figs/fig6.png", height = 6, width = 7.2)

Depicted Gender and Strategies

These analyses investigated effects of stimulus (depicted) gender on children’s behavior. Also, these analyses investigated the predictive utility of alternative behavioral coding schemes, besides stereotyping (i.e., alternative “response strategies”).

Depicted Gender

Data Preparation

The data is outputted.

# stim_gender ####
# known & unknown conditions
## data ####
df_mod_stimgender0 <- df_mod_condition %>%
  filter(condition != "Gender") %>% # exclude gender condition (include domain choice trials, only)
  droplevels()

df_mod_stimgender0_age <- df_mod_stimgender0 %>% 
  group_by(id_anon) %>% 
  summarize(age_mean = mean(age_raw), .groups = "drop") %>% 
  mutate(age_mean_standardized = as.numeric(scale(age_mean))) # mean center & standardize mean age

df_mod_stimgender <- df_mod_stimgender0 %>% 
  left_join(df_mod_stimgender0_age) %>% 
  mutate(comparison = factor(case_when(comparison == "math_vs_other" ~ "Math v. Other",
                                comparison == "sci_vs_other" ~ "Sci. v. Other",
                                comparison == "read_vs_other" ~ "Read v. Other",
                                comparison == "art_vs_other" ~ "Art v. Other",
                                comparison == "read_vs_math" ~ "Read v. Math",
                                comparison == "science_vs_art" ~ "Sci. v. Art",
                                T ~ NA)))

## sup_fig5 ####
df_mod_stimgender %>% 
  group_by(id_anon, comparison) %>% 
  count(response_stereotype) %>% 
  mutate(sum_n = sum(n),
         prop_stereotypical = n / sum_n,
         response_stereotype = ifelse(response_stereotype == "inconsistent" &
                                        prop_stereotypical == 1, "fully_inconsistent",
                                      paste0(response_stereotype)),
         n = ifelse(response_stereotype == "fully_inconsistent", 0, n),
         prop_stereotypical = sub("^0\\.", ".", sprintf("%.2f", ifelse(response_stereotype == "fully_inconsistent", 0, prop_stereotypical))),
         response_stereotype = ifelse(response_stereotype == "fully_inconsistent", "consistent", paste0(response_stereotype))) %>%
  ungroup() %>% 
  filter(response_stereotype != "inconsistent") %>% 
  group_by(comparison) %>% 
  count(prop_stereotypical) %>% 
  mutate(sum_n = sum(n),
         prop_prop_stereotypical = n / sum_n) %>% 
  ungroup() %>% 
  mutate(comparison = fct_relevel(comparison, c("Read v. Other", "Art v. Other", "Math v. Other", "Sci. v. Other", "Sci. v. Art", "Read v. Math"))) %>%
  ggplot(aes(x = comparison, 
             y = prop_prop_stereotypical, 
             fill = as.factor(prop_stereotypical))) +
  geom_col(position = position_dodge(width = 0.9), color = "black", width = 0.8) +
  scale_y_continuous(labels = function(x) gsub("^-0\\.", "-.", gsub("^0\\.", ".", format(x, trim = TRUE)))) +
  scale_fill_viridis_d() +
  labs(
    x = "Comparison",
    y = "Proportion of Participants",
    title = "Stereotypical Responses",
    fill = "Proportion\nStereotypical") +
  theme_classic(base_size = 20, base_family = "Times New Roman") +
  theme(axis.text.x = element_text(angle = 30, vjust = 1, hjust = 1),
        plot.title = element_text(face = "bold", hjust = .5))

ggsave("figs/sup_fig5.png", height = 5, width = 10)

# df_mod_stimgender %>% 
#   count(gender, stim_gender, comparison, response_stereotype) %>% 
#   ungroup() %>% 
#   group_by(gender, stim_gender, comparison) %>% 
#   mutate(prop = n / sum(n))
# 
# contrasts(df_mod_stimgender$response_stereotype)

Model

model_path <- "rmd_cache/models/mod_stimgender.rds"

if (!file.exists(model_path)) {

mod_stimgender <- func_mod(formula =
                             response_stereotype ~
                             age_mean_standardized + stim_gender * comparison * gender + 
                             (stim_gender * comparison | id_anon) +
                             (1 | stim_race),
                           data = df_mod_stimgender,
                           family = bernoulli(),
                           prior = prior_mod[-5,])

  saveRDS(mod_stimgender, model_path)

} else {

  mod_stimgender <- readRDS(model_path)

}

# saveRDS(mod_stimgender, "rmd_cache/models/mod_stimgender.rds")

Checks

# plot(mod_stimgender)
# pp_check(mod_stimgender, "dens_overlay_grouped", group = "comparison", ndraws = 100)
# pp_check(mod_stimgender, "ecdf_overlay_grouped", group = "comparison", ndraws = 100)

Results

## analysis ####
newdata_mod_stimgender <- crossing(gender           = levels(mod_stimgender$data$gender),
                                   comparison       = levels(mod_stimgender$data$comparison), # later in the pipeline, average the epreds across the levels of "comparison" corresponding to either condition level ("known", "unknown") to get that condition level's marginal conditional expectation
                                   age_mean_standardized = seq(from = min(mod_stimgender$data$age_mean_standardized),
                                                               to = max(mod_stimgender$data$age_mean_standardized),
                                                               length.out = length_out),
                                   stim_gender      = levels(mod_stimgender$data$stim_gender))

epreds_mod_stimgender <- local({
  set.seed(10003)
  epred_draws(mod_stimgender, newdata_mod_stimgender, re_formula = NA, ndraws = n_draws_posterior)
  }
  ) %>%
  ungroup() %>% 
  mutate(condition = case_when(str_ends(comparison, "_other") ~ "Unknown",
                               T ~ "Known"))

Pr(stereotyping | gender, depicted gender, domain comparison)

### fig7 ####
# probabilities
epreds_mod_stimgender_draws <- epreds_mod_stimgender %>% 
  group_by(.draw, comparison, gender, stim_gender) %>%
  summarize(mean_epred = mean(.epred),
            .groups = "drop") %>% 
  mutate(comparison = fct_relevel(comparison, c("Read v. Other", "Art v. Other", "Math v. Other", "Sci. v. Other", "Sci. v. Art", "Read v. Math")))

epreds_mod_stimgender_draws_summary <- epreds_mod_stimgender_draws %>% 
  group_by(comparison, gender, stim_gender) %>%
  summarize(prop_gt   = mean(mean_epred > 0.5),
            min    = min(mean_epred),
            hdi_lo = hdi(mean_epred)[1],
            mean   = mean(mean_epred),
            hdi_hi = hdi(mean_epred)[2],
            max    = max(mean_epred),
            diff_lo = hdi_lo - min,
            diff_hi = hdi_hi - max,
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

# in the x_vs_other comparisons (i.e., in the unknown condition), it seems like kids are just choosing the item that they can see (but not in the known-known condition)
# important point: the above strategy in the unknown condition (i.e., choose known domain) is explained by kids being more familiar with "known" than "unknown" domains -- this also accounts for why there is an increase in stereotypicality across age in the unknown condition (i.e., kids become increasingly familiar with "known" than "unknown" domains with age due to experience in school etc.)
# to get probability of choosing own gender, use 1-gt_0 conditionally (ie depending on whether the stereotype consistent choice == own_gender)
# likewise^ for choosing known vs unknown domain, but the conditioning statement might differ from that above

ggplot(epreds_mod_stimgender_draws, 
         aes(x = mean_epred, y = stim_gender)) +
  stat_halfeye(aes(fill = after_stat(x < 0.5)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(gender ~ comparison, scales = "free") +
  geom_text(data = epreds_mod_stimgender_draws_summary,# %>% filter(gender == gender_level),
            aes(x = hdi_hi + .16, label = prop_gt_char),
            size = 4,
            vjust = -3,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 14, base_family = "Times New Roman") +
  geom_vline(xintercept = 0.5, alpha = 0.2, linetype = 2) +
  labs(x = "Predicted Probability of Stereotypical Response", 
       y = "Simulus Gender",
       title = "Stereotyping by Stimulus Gender",
       subtitle = "Did participants tend to choose the stereotypical domain?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1",
                                              sprintf("%.2f", x)),
                     breaks = pretty_breaks(2),
                     expand = expansion(mult = c(0, 0.25))) +
  scale_y_discrete(expand = expansion(c(0.1, NA))) +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_text(vjust = -2.5),
        plot.title = element_text(face = "bold", hjust = 0.5, size = 16),
        plot.subtitle = element_text(hjust = .5))

ggsave("figs/fig7.png", width = 8, height = 5)

Pr(stereotyping for depicted boys > depicted girls | gender, domain comparison)

Were boys or girls more likely to stereotype for depicted boys than for depicted girls, or vice versa, in each of the various domain comparisons?

### sup_fig6 ####
# contrasting levels of stim_gender
contrasts_epreds_mod_stimgender_draws <- epreds_mod_stimgender_draws %>% 
  group_by(gender, comparison) %>%
  compare_levels(mean_epred, by = stim_gender) %>%
  ungroup() 
contrasts_epreds_mod_stimgender_draws_summary <- contrasts_epreds_mod_stimgender_draws %>%
  group_by(gender, comparison, stim_gender) %>%
  summarize(prop_gt = mean(mean_epred > 0),
            hdi_lo  = hdi(mean_epred)[1],
            mean    = mean(mean_epred),
            hdi_hi  = hdi(mean_epred)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * mean(mean_epred > 0)),
            .groups = "drop")

ggplot() + # first plot: left y-axis label
  geom_richtext(aes(x = 0, y = 0, label = "<span style='color:grey70'>Depicted Boys</span><br>
                 <span style='color:black'>vs.</span><br>
                 <span style='color:grey50'>Depicted Girls</span><br>"),
                fill = NA, label.color = NA,
                size = 4.25,
                hjust = 0.5, vjust = 0.5, family = "Times New Roman", fontface = "bold") + 
  theme_void() +
  ggplot(contrasts_epreds_mod_stimgender_draws %>% 
           mutate(stim_gender = case_when(str_detect(stim_gender, "f_participant") ~ "Girls",
                                          str_detect(stim_gender, "m_participant") ~ "Boys")), 
         aes(x = mean_epred, y = stim_gender)) +
  stat_halfeye(aes(fill = after_stat(x < 0)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(gender ~ comparison, scales = "free") +
  geom_text(data = contrasts_epreds_mod_stimgender_draws_summary %>% 
              mutate(stim_gender = case_when(str_detect(stim_gender, "f_participant") ~ "Girls",
                                             str_detect(stim_gender, "m_participant") ~ "Boys")),# %>% filter(gender == gender_level),
            aes(x = hdi_hi + .10, label = prop_gt_char),
            size = 3.5,
            vjust = -4,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 14, base_family = "Times New Roman") +
  geom_vline(xintercept = 0, alpha = 0.2, linetype = 2) +
  labs(x = "Difference in Predicted Probability of Stereotypical Response",
       y = NULL,
       title = "Contrasts: Stereotyping by Stimulus Gender",
       subtitle = "Did participants choose the stereotypical domain more often for depicted boys or girls?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.1f", x)),
                     breaks = pretty_breaks(4),
                     expand = c(0, 0.15)) +
  scale_y_discrete(expand = expansion(c(0.1, NA))) +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = .5)) +
  plot_layout(widths = c(1, 6))

ggsave("figs/sup_fig6.png", width = 8, height = 5)

Strategies

We are interested in investigating children’s probability of responding stereotypically. But we of course do not know how children are actually thinking. This opens up the possibility that children may be using response strategies that resemble stereotyping, but – under a different coding scheme – are also characterizable as an alternative, non-stereotypical response strategy.

One such alternative response strategy is the own-gender bias. This bias has been documented in the literature, and we replicate it, here, in the gender measure (see below). Notably, however, the own-gender bias can only emerge in the gender measure because only in that measure can children choose their own or the other gender. In contrast, the structure and logic of the other two measures - the unknown and known measures - enables two, corresponding alternative response strategies other than an own-gender bias. I call these hypothetical strategies “choose-known” (unknown measure) and “domain bias” (known measure).

It is crucial to keep in mind that - because these response strategies involve a different coding scheme than our coding scheme for stereotypical responding - they are, in effect, operationalizations of complementary, though distinct, questions from our target question about children’s stereotypical responding. Thus, because these two coding schemes are altogether different ways of categorizing children’s behavior that speak to different research questions, it is logically possible for children to both respond stereotypically and to, e.g., exhibit an own-gender bias.

Unknown Measure (Choose Known Strategy)

Do children favor the unknown over the known domain, or vice versa?

Data Preparation

# strategies ####
## unknown condition ####
# choose-known strategy (analogue to the own-gender bias in the gender condition)
### data ####
df_mod_strat_unknown0 <- df_mod_stimgender %>% # do kids use choose-known strategy?
  filter(condition == "Unknown") %>%
  mutate(choose_known = factor(case_when(comparison == "Art v. Other" & stim_gender == "Dep. Girls" & response_stereotype == "consistent" ~ "choose_known", # choose art
                                         comparison == "Art v. Other" & stim_gender == "Dep. Girls" & response_stereotype == "inconsistent" ~ "choose_unknown", # choose other
                                         comparison == "Art v. Other" & stim_gender == "Dep. Boys" & response_stereotype == "consistent" ~ "choose_unknown", # choose other
                                         comparison == "Art v. Other" & stim_gender == "Dep. Boys" & response_stereotype == "inconsistent" ~ "choose_known", # choose art

                                         comparison == "Read v. Other" & stim_gender == "Dep. Girls" & response_stereotype == "consistent" ~ "choose_known", # choose read
                                         comparison == "Read v. Other" & stim_gender == "Dep. Girls" & response_stereotype == "inconsistent" ~ "choose_unknown", # choose other
                                         comparison == "Read v. Other" & stim_gender == "Dep. Boys" & response_stereotype == "consistent" ~ "choose_unknown", # choose other
                                         comparison == "Read v. Other" & stim_gender == "Dep. Boys" & response_stereotype == "inconsistent" ~ "choose_known", # choose read

                                         comparison == "Math v. Other" & stim_gender == "Dep. Girls" & response_stereotype == "consistent" ~ "choose_unknown", # choose other
                                         comparison == "Math v. Other" & stim_gender == "Dep. Girls" & response_stereotype == "inconsistent" ~ "choose_known", # choose math
                                         comparison == "Math v. Other" & stim_gender == "Dep. Boys" & response_stereotype == "consistent" ~ "choose_known", # choose math
                                         comparison == "Math v. Other" & stim_gender == "Dep. Boys" & response_stereotype == "inconsistent" ~ "choose_unknown", # choose other

                                         comparison == "Sci. v. Other" & stim_gender == "Dep. Girls" & response_stereotype == "consistent" ~ "choose_unknown", # choose other
                                         comparison == "Sci. v. Other" & stim_gender == "Dep. Girls" & response_stereotype == "inconsistent" ~ "choose_known", # choose sci
                                         comparison == "Sci. v. Other" & stim_gender == "Dep. Boys" & response_stereotype == "consistent" ~ "choose_known", # choose sci
                                         comparison == "Sci. v. Other" & stim_gender == "Dep. Boys" & response_stereotype == "inconsistent" ~ "choose_unknown", # choose other

                                         T ~ NA)),
         choose_known_num = factor(ifelse(choose_known == "choose_known", 1, 0)))

sum(is.na(df_mod_strat_unknown0$choose_known)) # ensure recoding accuracy
df_mod_strat_unknown0 %>% count(comparison, stim_gender, response_stereotype, choose_known) # ensure recoding accuracy
df_mod_strat_unknown0 %>% count(comparison, stim_gender, response_stereotype) %>% # ensure recoding accuracy
  ungroup() %>% 
  group_by(comparison, stim_gender) %>% 
  mutate(prop = n/ sum(n))
df_mod_strat_unknown0 %>% count(comparison, stim_gender, choose_known) %>%  # ensure recoding accuracy
  ungroup() %>% 
  group_by(comparison, stim_gender) %>% 
  mutate(prop = n/ sum(n))
df_mod_strat_unknown0 %>% count(choose_known_num, choose_known) # ensure recoding accuracy

df_mod_strat_unknown0_age <- df_mod_strat_unknown0 %>% 
  group_by(id_anon) %>% 
  summarize(mean_age = mean(age_raw), .groups = "drop") %>% 
  mutate(age_mean_standardized = as.numeric(scale(mean_age)))

df_mod_strat_unknown <- df_mod_strat_unknown0 %>% 
  left_join(df_mod_strat_unknown0_age)

n_distinct(df_mod_strat_unknown$id_anon) # n participants
nrow(df_mod_strat_unknown) # n obs

Model

# contrasts(df_mod_strat_unknown$choose_known_num)

model_path <- "rmd_cache/models/mod_strat_unknown.rds"

if (!file.exists(model_path)) {

mod_strat_unknown <- func_mod(formula =
                                choose_known_num ~
                                age_mean_standardized + stim_gender * comparison * gender +
                                (stim_gender * comparison | id_anon) +
                                (1 | stim_race),
                              data = df_mod_strat_unknown,
                              family = bernoulli(),
                              prior = prior_mod)

  saveRDS(mod_strat_unknown, model_path)

} else {

  mod_strat_unknown <- readRDS(model_path)

}

# saveRDS(mod_strat_unknown, "rmd_cache/models/mod_strat_unknown.rds")

Checks

conditional_effects(mod_strat_unknown,
                    effects = "stim_gender:comparison",
                    conditions = data.frame(gender = unique(mod_strat_unknown$data$gender)))[[1]]

# plot(mod_strat_unknown)
# pp_check(mod_strat_unknown, ndraws = 100)

df_mod_stimgender %>%
  filter(str_ends(comparison, "Other")) %>%
  count(gender, comparison, stim_gender, response_stereotype)
df_mod_strat_unknown %>% count(gender, comparison, stim_gender, choose_known)

Results

newdata_mod_strat_unknown <- crossing(gender      = levels(mod_strat_unknown$data$gender), # already dropped gender == "O" participant
                                      comparison  = levels(mod_strat_unknown$data$comparison),
                                      stim_gender = levels(mod_strat_unknown$data$stim_gender),
                                      age_mean_standardized = seq(from = min(mod_strat_unknown$data$age_mean_standardized),
                                                             to = max(mod_strat_unknown$data$age_mean_standardized),
                                                             length.out = length_out))

epreds_mod_strat_unknown <- local({
  set.seed(10003)
  epred_draws(mod_strat_unknown, newdata_mod_strat_unknown, re_formula = NA, ndraws = n_draws_posterior)
}
) %>% 
  ungroup()

epreds_mod_strat_unknown_draws <- epreds_mod_strat_unknown %>%
  group_by(.draw, gender, comparison, stim_gender) %>%
  summarize(mean_epred = mean(.epred), .groups = "drop") %>% 
  mutate(comparison = fct_relevel(comparison, c("Read v. Other", "Art v. Other", "Math v. Other", "Sci. v. Other")))

Pr(choose known domain | gender, depicted gender, domain comparison)

#### fig8 ####
# probabilities
epreds_mod_strat_unknown_draws_summary <- epreds_mod_strat_unknown_draws %>% 
  group_by(gender, comparison, stim_gender) %>% 
  summarize(prop_gt_chance = mean(mean_epred > 0.5), # probability that participants chose the known rather than unknown domain on average across depicted gender (NOTE: these may not be the best hypotheses to compare, but nonetheless the prop_gt_chance should be = .50, ideally; a better comparison would be choose_known vs. choose_stereotypical - but how to code this?)
            hdi_lo = hdi(mean_epred)[1],
            mean   = mean(mean_epred),
            hdi_hi = hdi(mean_epred)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt_chance),
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

ggplot(epreds_mod_strat_unknown_draws,
       aes(x = mean_epred, y = stim_gender)) +
  stat_halfeye(aes(fill = after_stat(x < 0.5)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(gender ~ comparison, scales = "free") +
  geom_text(data = epreds_mod_strat_unknown_draws_summary,
            aes(x = hdi_hi + .08, label = prop_gt_char),
            size = 3.5,
            vjust = -3,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 15, base_family = "Times New Roman") +
  geom_vline(xintercept = 0.5, alpha = 0.2, linetype = 2) +
  labs(x = "Predicted Probability of Choose-Known Choice",
       y = "Stimulus Gender",
       title = "Choose-Known Strategy",
       subtitle = "Did participants tend to choose the known domain?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.2f", x)),
                     breaks = pretty_breaks(3),
                     expand = c(0, .1)) +
  scale_y_discrete(expand = expansion(c(0.1, NA))) +
  theme(plot.title = element_text(face = "bold", hjust = 0.5),
        axis.text.y = element_text(vjust = -2.5),
        axis.ticks.y = element_blank(),
        plot.subtitle = element_text(hjust = .5))

ggsave("figs/fig8.png", width = 7, height = 5)

Pr(choose known domain with depicted boy > depicted girl | gender, domain comparison)

#### sup_fig7 ####
# contrasting levels of stim_gender
epreds_mod_strat_unknown_draws_contrasts_stim <- epreds_mod_strat_unknown_draws %>% 
  group_by(comparison, gender) %>% 
  compare_levels(mean_epred, by = stim_gender) %>% 
  ungroup()

epreds_mod_strat_unknown_draws_contrasts_summary_stim <- epreds_mod_strat_unknown_draws_contrasts_stim %>% 
  group_by(stim_gender, comparison, gender) %>% 
  summarize(prop_gt = mean(mean_epred > 0), # probability that participants used a choose_known, rather than choose_unknown, strategy (NOTE: this may not be the best hypotheses to compare, but nonetheless the prop_gt_chance should be = .50, ideally; a better comparison would be choose_known vs. choose_stereotypical - but how to code this?)
            hdi_lo = hdi(mean_epred)[1],
            mean   = mean(mean_epred),
            hdi_hi = hdi(mean_epred)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

ggplot() + # first plot: left y-axis label
  geom_richtext(aes(x = 0, y = 0, label = "<span style='color:grey70'>Depicted Boys</span><br>
                 <span style='color:black'>vs.</span><br>
                 <span style='color:grey50'>Depicted Girls</span><br>"),
                fill = NA, label.color = NA,
                size = 4.5,
                hjust = 0.5, vjust = 0.5, family = "Times New Roman", fontface = "bold") +
  theme_void() +
  ggplot(epreds_mod_strat_unknown_draws_contrasts_stim,
         aes(x = mean_epred, y = gender)) +
  stat_halfeye(aes(fill = after_stat(x < 0)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(gender ~ comparison, scales = "free") +
  geom_text(data = epreds_mod_strat_unknown_draws_contrasts_summary_stim, # %>% filter(gender == gender_level),
            aes(x = hdi_hi + .12, label = prop_gt_char),
            size = 3.5,
            vjust = -3,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 15, base_family = "Times New Roman") +
  geom_vline(xintercept = 0, alpha = 0.2, linetype = 2) +
  labs(x = "Difference in Predicted Probability of Choose-Known Strategy", 
       y = NULL, 
       title = "Contrasts: Choose-Known Strategy",
       subtitle = "Did participants choose the known domain more often for depicted boys or girls?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.1f", x)),
                     breaks = c(-.25, 0, .25),
                     expand = c(0, .15)) +
  scale_y_discrete(expand = expansion(c(0.1, NA))) +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = .5)) +
  plot_layout(widths = c(1.1, 6))

ggsave("figs/sup_fig7.png", width = 8, height = 4)

Known Measure (Choose Domain Strategy)

Do children favor any one domain?

Data Preparation

## known condition ####
# domain bias strategy (analogue to the own-gender bias in the gender condition)
### data ####
df_mod_strat_known0 <- df_mod_stimgender %>% # do kids have a domain bias?
  filter(condition == "Known") %>% 
  mutate(domain_choice = factor(case_when(comparison == "Read v. Math" & stim_gender == "Dep. Girls" & response_stereotype == "consistent" ~ "Reading", 
                                          comparison == "Read v. Math" & stim_gender == "Dep. Girls" & response_stereotype == "inconsistent" ~ "Math", 
                                          comparison == "Read v. Math" & stim_gender == "Dep. Boys" & response_stereotype == "consistent" ~ "Math", 
                                          comparison == "Read v. Math" & stim_gender == "Dep. Boys" & response_stereotype == "inconsistent" ~ "Reading", 
                                          
                                          comparison == "Sci. v. Art" & stim_gender == "Dep. Girls" & response_stereotype == "consistent" ~ "Art", 
                                          comparison == "Sci. v. Art" & stim_gender == "Dep. Girls" & response_stereotype == "inconsistent" ~ "Science", 
                                          comparison == "Sci. v. Art" & stim_gender == "Dep. Boys" & response_stereotype == "consistent" ~ "Science", 
                                          comparison == "Sci. v. Art" & stim_gender == "Dep. Boys" & response_stereotype == "inconsistent" ~ "Art", 
                                          
                                          T ~ NA)))

sum(is.na(df_mod_strat_known0))
df_mod_strat_known0 %>%
  count(comparison, stim_gender, response_stereotype, domain_choice) %>% 
  ungroup() %>% 
  group_by(comparison, stim_gender) %>% 
  mutate(prop = n / sum(n))

df_mod_strat_known0_age <- df_mod_stimgender %>% 
  group_by(id_anon) %>% 
  summarize(age_mean = mean(age_raw), .groups = "drop") %>% 
  mutate(age_mean_standardized = as.numeric(scale(age_mean))) # mean center & standardize mean age

df_mod_strat_known <- df_mod_strat_known0 %>% left_join(df_mod_strat_known0_age)

# df_mod_strat_known %>% # is there a systematic preference for choosing a particular domain?
#   mutate(domain_choice = fct_relevel(domain_choice, "Art", "Reading", "Math", "Science")) %>%
#   ggplot(aes(x = stim_gender, fill = domain_choice)) +
#   geom_bar(position = "fill") + 
#   facet_grid(gender ~ comparison) +
#   scale_fill_viridis_d(begin = .4) +
#   theme_classic(base_family = "Times New Roman", base_size = 15) +
#   theme(plot.title = element_text(hjust = .5, face = "bold"),
#         panel.grid.major.x = element_blank(),
#         panel.grid.minor.x = element_blank()) + hline_at(.5)

df_mod_strat_known_science_art <- df_mod_strat_known %>%
  filter(comparison == "Sci. v. Art") %>%
  droplevels() %>%
  mutate(choose_science = factor(ifelse(domain_choice == "Science", 1, 0)))

df_mod_strat_known_read_math <- df_mod_strat_known %>%
  filter(comparison == "Read v. Math") %>%
  droplevels() %>%
  mutate(choose_math = factor(ifelse(domain_choice == "Math", 1, 0)))

Models

# custom modeling function
func_strat_known_mod <- function(df, response_var) {
  func_mod(formula = as.formula(
    paste(response_var, 
          "~ age_mean_standardized * gender * stim_gender +
          (stim_gender | id_anon) +
          (1 | stim_race)")),
    data = df,
    family = bernoulli(),
    prior = prior_mod)
}

# science-art model
model_path <- "rmd_cache/models/mod_strat_known_scienceart.rds"

if (!file.exists(model_path)) {

contrasts(df_mod_strat_known_science_art$choose_science) # model probability of response == science
mod_strat_known_scienceart <- func_strat_known_mod(df_mod_strat_known_science_art, "choose_science")
# conditional_effects(mod_strat_known_scienceart)

  saveRDS(mod_strat_known_scienceart, model_path)

} else {

  mod_strat_known_scienceart <- readRDS(model_path)

}

# saveRDS(mod_strat_known_scienceart, "rmd_cache/models/mod_strat_known_scienceart.rds")

# reading-math model
model_path <- "rmd_cache/models/mod_strat_known_readmath.rds"

if (!file.exists(model_path)) {

contrasts(df_mod_strat_known_read_math$choose_math) # model probability of response == science
mod_strat_known_readmath <- func_strat_known_mod(df_mod_strat_known_read_math, "choose_math")
# conditional_effects(mod_strat_known_readmath)

  saveRDS(mod_strat_known_readmath, model_path)

} else {

  mod_strat_known_readmath <- readRDS(model_path)

}

# saveRDS(mod_strat_known_readmath, "rmd_cache/models/mod_strat_known_readmath.rds")

Results

newdata_mod_strat_known_scienceart <- crossing(gender      = levels(mod_strat_known_scienceart$data$gender), # already dropped gender == "O" participant
                                               stim_gender = levels(mod_strat_known_scienceart$data$stim_gender),
                                               age_mean_standardized = seq(from = min(mod_strat_known_scienceart$data$age_mean_standardized),
                                                                           to = max(mod_strat_known_scienceart$data$age_mean_standardized),
                                                                           length.out = length_out))
newdata_mod_strat_known_readmath <- crossing(gender      = levels(mod_strat_known_readmath$data$gender), # already dropped gender == "O" participant
                                             stim_gender = levels(mod_strat_known_readmath$data$stim_gender),
                                             age_mean_standardized = seq(from = min(mod_strat_known_readmath$data$age_mean_standardized),
                                                                         to = max(mod_strat_known_readmath$data$age_mean_standardized),
                                                                         length.out = length_out))

epreds_mod_strat_known_scienceart <- local({
  set.seed(10003)
  epred_draws(mod_strat_known_scienceart, newdata_mod_strat_known_scienceart, re_formula = NA, ndraws = n_draws_posterior)
  }
  ) %>% ungroup() %>% 
  mutate(comparison = "Science v. Art")
epreds_mod_strat_known_readmath <- local({
  set.seed(10003)
  epred_draws(mod_strat_known_readmath, newdata_mod_strat_known_readmath, re_formula = NA, ndraws = n_draws_posterior)
  }
  ) %>% ungroup() %>% 
  mutate(comparison = "Math v. Reading")

epreds_mod_strat_known <- bind_rows(epreds_mod_strat_known_scienceart, epreds_mod_strat_known_readmath) # combine above two draws dfs

Pr(choose math (read v math) or science (sci v art) | gender, stim gender)

#### fig9 ####
epreds_mod_strat_known_draws <- epreds_mod_strat_known %>% # organize by draw (marginalize over other variables)
  group_by(.draw, comparison, gender, stim_gender) %>% 
  summarize(mean_epred = mean(.epred), .groups = "drop") 

epreds_mod_strat_known_draws_summary <- epreds_mod_strat_known_draws %>% # summarize
  group_by(comparison, gender, stim_gender) %>% 
  summarize(prop_gt = mean(mean_epred > 0.5),
            hdi_lo = hdi(mean_epred)[1],
            mean   = mean(mean_epred),
            hdi_hi = hdi(mean_epred)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

ggplot(epreds_mod_strat_known_draws,
       aes(x = mean_epred, y = stim_gender)) +
  stat_halfeye(aes(fill = after_stat(x < 0.5)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(gender ~ comparison, scales = "free_y") +
  geom_text(data = epreds_mod_strat_known_draws_summary,# %>% filter(gender == gender_level),
            aes(x = hdi_hi + .05, label = prop_gt_char),
            size = 3,
            vjust = -2,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 12, base_family = "Times New Roman") +
  geom_vline(xintercept = 0.5, alpha = 0.2, linetype = 2) +
  labs(x = "Predicted Probability of Choosing Science or Math",
       y = "Depicted Gender",
       title = "Math or Science Choices",
       subtitle = "Did participants tend to choose math or science?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.2f", x)), breaks = seq(from = .25, to = .75, by = .25)) +
  scale_y_discrete(expand = expansion(c(0.1, NA))) +
  theme(axis.text.y = element_text(vjust = -1),
        axis.ticks.y = element_blank(),
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = .5))

ggsave("figs/fig9.png", width = 5, height = 3)

#### sup_fig8 ####
# contrasting levels of stim_gender
epreds_mod_strat_known_draws_contrasts <- epreds_mod_strat_known_draws %>% # contrast predicted probabilities for depicted girls - vs depicted boys
  group_by(gender, comparison) %>% 
  compare_levels(mean_epred, by = stim_gender) %>% 
  ungroup()

epreds_mod_strat_known_draws_contrasts_summary <- epreds_mod_strat_known_draws_contrasts %>% # summarize the contrasts
  group_by(gender, comparison, stim_gender) %>% 
  summarize(prop_gt = mean(mean_epred > 0),
            hdi_lo = hdi(mean_epred)[1],
            mean   = mean(mean_epred),
            hdi_hi = hdi(mean_epred)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

ggplot() + # first plot: left y-axis label
  geom_richtext(aes(x = 0, y = 0, label = "<span style='color:grey70'>Depicted Boys</span><br>
                 <span style='color:black'>vs.</span><br>
                 <span style='color:grey50'>Depicted Girls</span><br>"),
                fill = NA, label.color = NA,
                size = 3.5,
                hjust = 0.5, vjust = 0.5, family = "Times New Roman", fontface = "bold") +
  theme_void() +
  ggplot(epreds_mod_strat_known_draws_contrasts,
         aes(x = mean_epred, y = gender)) +
  stat_halfeye(aes(fill = after_stat(x < 0)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(gender ~ comparison, scales = "free") +
  geom_text(data = epreds_mod_strat_known_draws_contrasts_summary,
            aes(x = hdi_hi + .12, label = prop_gt_char),
            size = 3.5,
            vjust = -2,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 12, base_family = "Times New Roman") +
  geom_vline(xintercept = 0, alpha = 0.2, linetype = 2) +
  labs(x = "Difference in Predicted Probability of Choosing Math or Science",
       y = NULL,
       title = "Contrasts: Math or Science Choices",
       subtitle = "Did participants choose math or science more often for depicted boys or girls?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.1f", x)),
                     breaks = pretty_breaks(3),
                     expand = c(0, .15)) +
  scale_y_discrete(expand = expansion(c(0.1, NA))) +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_blank(),
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = .5)) +
  plot_layout(widths = c(1.1, 6))

ggsave("figs/sup_fig8.png", width = 6, height = 3)

Gender Measure (Choose Own-Gender Strategy)

Data Preparation

df_mod_gender0 <- df_mod_condition %>% 
  filter(condition == "Gender") %>% 
  mutate(gender_choice = factor(case_when(q_subtype %in% c("sci", "math") & response_stereotype == "consistent" & gender == "Boys"  ~ "own_gender",
                                          q_subtype %in% c("art", "read") & response_stereotype == "consistent" & gender == "Girls"  ~ "own_gender",
                                          q_subtype %in% c("art", "read") & response_stereotype == "inconsistent" & gender == "Boys"  ~ "own_gender",
                                          q_subtype %in% c("sci", "math") & response_stereotype == "inconsistent" & gender == "Girls"  ~ "own_gender",
                                          T ~ "other_gender")),
         q_subtype = case_when(q_subtype == "sci" ~ "Science",
                               q_subtype == "art" ~ "Art",
                               q_subtype == "read" ~ "Reading",
                               q_subtype == "math" ~ "Math",
                               T ~ NA),
         q_subtype = fct_relevel(q_subtype, "Science", "Math", "Reading", "Art")) %>% 
  select(gender_choice, age_raw, comparison, gender, q_subtype, id_anon, stim_race, response_stereotype)
  
df_mod_gender0 %>% count(q_subtype, response_stereotype, gender, gender_choice) # ensure accuracy of recoding
sum(is.na(df_mod_gender0$gender_choice)) # ensure accuracy of recoding

df_mod_gender0_age <- df_mod_gender0 %>% 
  group_by(id_anon) %>% 
  summarize(age_mean = mean(age_raw), .groups = "drop") %>% 
  mutate(age_mean_standardized = as.numeric(scale(age_mean))) # mean center & standardize mean age

df_mod_gender <- df_mod_gender0 %>% left_join(df_mod_gender0_age)

Model

# contrasts(df_mod_gender$gender_choice) # pr(own_gender)

model_path <- "rmd_cache/models/mod_strat_gendercond.rds"

if (!file.exists(model_path)) {

mod_strat_gendercond <- func_mod(formula = 
                                   gender_choice ~ 
                                   comparison + gender * q_subtype * age_mean_standardized +
                                   (q_subtype + comparison | id_anon) +
                                   (1 | stim_race),
                                 data = df_mod_gender,
                                 family = bernoulli(),
                                 prior = prior_mod)

  saveRDS(mod_strat_gendercond, model_path)

} else {

  mod_strat_gendercond <- readRDS(model_path)

}

# saveRDS(mod_strat_gendercond, "rmd_cache/models/mod_strat_gendercond.rds")

Checks

# plot(mod_strat_gendercond)
# pp_check(mod_strat_gendercond)

Results

newdata_mod_strat_gendercond <- crossing(gender = levels(mod_strat_gendercond$data$gender), # already dropped gender == "O" participant
                                        q_subtype = levels(mod_strat_gendercond$data$q_subtype),
                                        comparison = levels(mod_strat_gendercond$data$comparison),
                                        age_mean_standardized = seq(from = min(mod_strat_gendercond$data$age_mean_standardized),
                                                                    to = max(mod_strat_gendercond$data$age_mean_standardized),
                                                                    length.out = length_out))

epreds_mod_strat_gendercond <- local({
  set.seed(10003)
  epred_draws(mod_strat_gendercond, newdata_mod_strat_gendercond, re_formula = NA, ndraws = n_draws_posterior)
}
) %>% ungroup()

Pr(Choose Own Gender | gender, domain)

#### sup_fig9 ####
# probabilities
epreds_mod_strat_gendercond_draws <- epreds_mod_strat_gendercond %>% 
  group_by(.draw, gender, q_subtype) %>% # marginalize across comparison and age_mean_standardized
  summarize(mean_epred = mean(.epred), .groups = "drop") 

epreds_mod_strat_gendercond_draws_summary <- epreds_mod_strat_gendercond_draws %>%
  group_by(gender, q_subtype) %>% 
  summarize(prop_gt = mean(mean_epred > 0.5), # probability of an own_gender bias strategy
            hdi_lo = hdi(mean_epred)[1],
            mean   = mean(mean_epred),
            hdi_hi = hdi(mean_epred)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

ggplot(epreds_mod_strat_gendercond_draws,
       aes(x = mean_epred, y = q_subtype)) +
  stat_halfeye(aes(fill = after_stat(x < 0.5)), .width = 0.95, slab_alpha = 0.6, slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(q_subtype ~ gender, scales = "free") +
  geom_text(data = epreds_mod_strat_gendercond_draws_summary,
            aes(x = hdi_hi + .05, label = prop_gt_char),
            size = 3,
            vjust = -2,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 11, base_family = "Times New Roman") +
  geom_vline(xintercept = 0.5, alpha = 0.2, linetype = 2) +
  labs(x = "Predicted Probability of Own-Gender Choice",
       y = NULL,
       title = "Own-Gender Choices",
       subtitle = "Did participants tend to choose their own gender?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.2f", x)),
                     breaks = seq(from = .25, to = .75, by = .25),
                     expand = c(0, .05)) +
  scale_y_discrete(expand = expansion(c(0.1, NA))) +
  theme(axis.text.y = element_blank(),
        axis.ticks.y = element_blank(),
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = .5),
        axis.text.x = element_text(size = 10))

ggsave("figs/sup_fig9.png", width = 4, height = 3.5)

#### sup_fig10 ####
# contrasts by domain
epreds_mod_strat_gendercond_draws_contrasts_bydomain <- epreds_mod_strat_gendercond_draws %>% 
  group_by(gender) %>% # average across age
  compare_levels(mean_epred, by = q_subtype) %>% 
  ungroup() %>% 
  separate(q_subtype, c("domain1", "domain2"), sep = "-") %>% 
  mutate(domain1 = trimws(domain1),
         domain2 = trimws(domain2),
         y_axis_label = paste0(domain2, " v. ", domain1),
         y_axis_label_colored = glue(
           "<span style='color:grey70;font-weight:bold'>{domain2}</span>",
           "<span style='color:black;font-weight:bold'> v. </span>",
           "<span style='color:grey30;font-weight:bold'>{domain1}</span>"))

epreds_mod_strat_known_draws_contrasts_bydomain_summary <- epreds_mod_strat_gendercond_draws_contrasts_bydomain %>% 
  group_by(gender, y_axis_label, y_axis_label_colored) %>% 
  summarize(prop_gt = mean(mean_epred > 0), # proportion above zero
            hdi_lo = hdi(mean_epred)[1],
            mean   = mean(mean_epred),
            hdi_hi = hdi(mean_epred)[2],
            prop_gt_char = sprintf("%.0f%%", 100 * prop_gt),
            .groups = "drop") %>% 
  mutate(across(where(is.numeric), ~round(., 3)))

ggplot(epreds_mod_strat_gendercond_draws_contrasts_bydomain,
       aes(x = mean_epred, y = y_axis_label)) +
  stat_halfeye(aes(fill = after_stat(x < 0)),
               .width = 0.95, slab_alpha = 0.6,
               slab_linewidth = 0.5, slab_color = "black") +
  scale_fill_manual(values = c("TRUE" = "grey70", "FALSE" = "grey30"), guide = "none") +
  facet_grid(~ gender, scales = "free") +
  geom_text(data = epreds_mod_strat_known_draws_contrasts_bydomain_summary,
            aes(x = hdi_hi + .02, label = prop_gt_char),
            size = 3.5, vjust = -1.5,
            family = "Times New Roman",
            fontface = "bold",
            color = "grey20") +
  theme_classic(base_size = 12, base_family = "Times New Roman") +
  geom_vline(xintercept = 0, alpha = 0.2, linetype = 2) +
  labs(x = "Difference in Predicted Probability of Own-Gender Response",
       y = NULL,
       title = "Contrasts: Own-Gender Choices",
       subtitle = "In which domains were own-gender responses more frequent?") +
  scale_x_continuous(labels = function(x) sub("^(-?)0", "\\1", sprintf("%.1f", x)),
                     breaks = pretty_breaks(3)) +
  scale_y_discrete(labels = setNames(epreds_mod_strat_known_draws_contrasts_bydomain_summary$y_axis_label_colored,
                                     epreds_mod_strat_known_draws_contrasts_bydomain_summary$y_axis_label),
                   expand = expansion(c(0.05, NA))) +
  theme(axis.ticks.y = element_blank(),
        axis.text.y = element_markdown(vjust = -.5, family = "Times New Roman"),
        plot.title = element_text(face = "bold", hjust = 0.5),
        plot.subtitle = element_text(hjust = .5))

ggsave("figs/sup_fig10.png", width = 5.5, height = 4)

Interest and Efficacy

These analyses investigated how children’s stereotyping and age predicted their interest and efficacy ratings across academic domains.

“Stereotyping” was operationalized as the proportion of stereotypical responses that participants gave in the earlier trials that investigated measure. Stereotype proportions were computed in two, partially overlapping ways depending on whether academic domains were considered “controlled” or “uncontrolled” information. Analyses using either definition are presented side-by-side. The results were similar under either operationalization.

Stereotype proportions that were computed from “controlled information” only included participants’ responses from the gender and unknown measures. The academic domains were depicted individually in this pair of measures (e.g., art, science, etc.; or art-vs-other, science-vs-other, etc.) and, thus, the domain could be labeled qua its being experimentally controlled information. In contrast, the known measure had two domains depicted onscreen, and so could not be labeled with a single domain. Consequently, children’s responses in the known measure are excluded from this operationalization of stereotype proportions. Children’s stereotype proportions could thus be based on 1 to 16 trials’ data.

In contrast, stereotype proportions that were computed from domains as “uncontrolled information” included children’s responses from all three measures (i.e., including the known measure, which was absent from controlled information). This is an important distinction because we are now mixing independent and dependent data from our experiment. Specifically, whereas domain is an independent variable in the gender measure, it is a dependent variable in the known measure (and is either independent or dependent, depends on one’s perspective, in the unknown measure). Children’s stereotype proportions could thus be based on 1 to 32 trials’ data.

Neither operationalization seems perfect, either because it leaves out important information (controlled) or is conceptually complicated (uncontrolled). The method used by Michelle in one of her unpublished exploratory analyses was to compute a difference score between the number of times in which participants said that a boy was good at science compared to the number of times that they said that a girl was good at science. I refrained from doing this here because, strictly imitated, this would only work for the gender measure. While this method could be extended to the other measures, it would require a conceptual (and computational) departure from Michelle’s method and would produce an another distinction analogous to the one between “controlled” and “uncontrolled” information, above.

Data Preparation

# general df ####
df_interest_efficacy <- domain_gender %>%
  mutate(domain_as_controlledinfo = case_when(
    q_type == "gen" ~ paste0(q_subtype), # create grouping variable based on *type of comparison* (i.e., domain qua controlled information)
    q_type == "dom"  ~ paste0(str_extract(comparison, "^[^_]+")),
    q_type %in% c("interest", "efficacy") ~ paste0(q_subtype),
    T ~ NA),
    q_type = case_when(
      q_type %in% c("interest", "efficacy") ~ q_type,
      T ~ paste0(condition)),
    domain_as_uncontrolledinfo = case_when(
      q_type %in% c("Unknown", "Known") ~ paste0(response), # create grouping variable for unknown condition based on *response* (i.e., domain qua UNcontrolled information)
      T ~ paste0(domain_as_controlledinfo)),
    domain_as_controlledinfo = case_when(
      condition == "Known" ~ NA_character_, # convert values of known condition to NA so that they cannot be modeled (necessary when taking domain in the unknown condition as the IV, because there is no single domain IV in the known condition)
      T ~ paste0(domain_as_controlledinfo)),
    across(c(domain_as_controlledinfo, domain_as_uncontrolledinfo), 
           ~case_when(
             str_detect(., "sci") ~ "science",
             str_detect(., "read") ~ "reading",
             T ~ .)),
    response_recoded = case_when(
      !is.na(response_stereotype) ~ paste0(response_stereotype),
      !is.na(response_efficacy) ~ paste0(response_efficacy),
      !is.na(response_interest) ~ paste0(response_interest),
      T ~ NA),
    across(c(q_type, starts_with("domain_with")), ~ str_to_title(.))) %>% 
  select(id_anon, age_raw, gender, session, q_type, starts_with("domain_as"), stim_gender, response_recoded) %>% 
  rename(response = response_recoded)

# Domain as Controlled
df_interest_efficacy_controlled1 <- df_interest_efficacy %>% # controlled
  group_by(id_anon, q_type, domain_as_controlledinfo, response) %>% 
  mutate(count_controlled = ifelse(q_type %in% c("Interest", "Efficacy", "Known"), NA, n())) %>% 
  ungroup() %>% 
  filter(q_type != "Known") %>% 
  distinct(id_anon, gender, age_raw, session, q_type, domain_as_controlledinfo, response, count_controlled) %>% 
  group_by(id_anon, q_type, domain_as_controlledinfo) %>% 
  mutate(sum = sum(count_controlled),
         prop = round(count_controlled / sum, 2),
         response = ifelse(response == "inconsistent" & prop == 1, "fully_inconsistent", response),
         prop = ifelse(response == "fully_inconsistent", 0, prop),
         response = ifelse(response == "fully_inconsistent", "consistent", response)) %>% 
  ungroup() %>% 
  filter(response != "inconsistent")

df_interest_efficacy_controlled2 <- df_interest_efficacy_controlled1 %>%
  pivot_wider(id_cols = c(id_anon, gender, age_raw, session, domain_as_controlledinfo),
              names_from = q_type, 
              values_from =  c(response, prop)) %>% 
  mutate(prop = case_when(!is.na(prop_Unknown) ~ prop_Unknown,
                          !is.na(prop_Gender) ~ prop_Gender,
                          TRUE ~ NA_real_),
         condition = case_when(!is.na(prop_Unknown) ~ "Unknown",
                               !is.na(prop_Gender) ~ "Gender",
                               TRUE ~ NA_character_)) %>% 
  mutate(prop_standardized = as.numeric(scale(prop))) %>% 
  drop_na(prop)

df_interest_efficacy_controlled2_age <- df_interest_efficacy_controlled2 %>% # standardize age
  group_by(id_anon) %>% 
  summarize(age_mean = mean(age_raw), .groups = "drop") %>% 
  mutate(age_mean_standardized = as.numeric(scale(age_mean))) 

df_interest_efficacy_controlled <- df_interest_efficacy_controlled2 %>% # order the outcome variables; standardize proportion stereotypical predictor
  left_join(df_interest_efficacy_controlled2_age) %>% 
  mutate(across(where(is.character), ~as.factor(.)),
         response_Efficacy = factor(response_Efficacy, ordered = T, levels = c("not_good_at_all", "a_little_good", "good", "very_good")),
         response_Interest = factor(response_Interest, ordered = T, levels = c("not_at_all", "a_little", "some", "a_lot"))) %>% 
  rename(domain = domain_as_controlledinfo) %>% 
  select(id_anon, age_mean_standardized, gender, session, domain, response_Efficacy, response_Interest, condition, prop_standardized)

df_interest_efficacy_controlled %>% 
  count(condition, response_Interest)

# Domain as Uncontrolled
df_interest_efficacy_uncontrolled1 <- df_interest_efficacy %>% # uncontrolled
  mutate(q_type = case_when(q_type %in% c("Known", "Unknown") ~ "Known_Unknown", TRUE ~ paste0(q_type))) %>%   group_by(id_anon, q_type, domain_as_uncontrolledinfo, response) %>% 
  mutate(count_uncontrolled = ifelse(q_type %in% c("Interest", "Efficacy"), NA, n())) %>% 
  ungroup() %>% 
  distinct(id_anon, gender, age_raw, session, q_type, domain_as_uncontrolledinfo, response, count_uncontrolled) %>% 
  group_by(id_anon, q_type, domain_as_uncontrolledinfo) %>% 
  mutate(sum = sum(count_uncontrolled),
         prop = round(count_uncontrolled / sum, 2),
         response = ifelse(response == "inconsistent" & prop == 1, "fully_inconsistent", response),
         prop = ifelse(response == "fully_inconsistent", 0, prop),
         response = ifelse(response == "fully_inconsistent", "consistent", response)) %>% 
  ungroup() %>% 
  filter(response != "inconsistent", domain_as_uncontrolledinfo != "other")

df_interest_efficacy_uncontrolled2 <- df_interest_efficacy_uncontrolled1 %>%
  pivot_wider(id_cols = c(id_anon, gender, age_raw, session, domain_as_uncontrolledinfo),
              names_from = q_type, 
              values_from =  c(response, prop)) %>% 
  mutate(prop = case_when(!is.na(prop_Known_Unknown) ~ prop_Known_Unknown,
                          !is.na(prop_Gender) ~ prop_Gender,
                          TRUE ~ NA_real_),
         condition = case_when(!is.na(prop_Known_Unknown) ~ "Known_Unknown",
                               !is.na(prop_Gender) ~ "Gender",
                               TRUE ~ NA_character_)) %>% 
  mutate(prop_standardized = as.numeric(scale(prop))) %>% 
  drop_na(prop)

df_interest_efficacy_uncontrolled2_age <- df_interest_efficacy_uncontrolled2 %>% # standardize age
  group_by(id_anon) %>% 
  summarize(age_mean = mean(age_raw), .groups = "drop") %>% 
  mutate(age_mean_standardized = as.numeric(scale(age_mean))) 

df_interest_efficacy_uncontrolled <- df_interest_efficacy_uncontrolled2 %>% # order the outcome variables; standardize proportion stereotypical predictor
  left_join(df_interest_efficacy_uncontrolled2_age) %>% 
  mutate(across(where(is.character), ~as.factor(.)),
         response_Efficacy = factor(response_Efficacy, ordered = T, levels = c("not_good_at_all", "a_little_good", "good", "very_good")),
         response_Interest = factor(response_Interest, ordered = T, levels = c("not_at_all", "a_little", "some", "a_lot"))) %>% 
  rename(domain = domain_as_uncontrolledinfo) %>% 
  select(id_anon, age_mean_standardized, gender, session, domain, response_Efficacy, response_Interest, condition, prop_standardized)

df_interest_efficacy_uncontrolled %>% 
  count(condition, response_Interest)

Models

We model interest and efficacy using a series of hierarchical ordinal models (cumulative, logit link). However, while we retain the true, ordinal scale for modeling (Kruschke & Liddell, 2018), our predictions assume a continuous outcome for interpretability.

# cor(df_interest_efficacy_uncontrolled$prop_standardized, df_interest_efficacy_uncontrolled$age_mean_standardized)

# uncontrolled - interest
model_path <- "rmd_cache/models/mod_interest_uncontrolled.rds"

if (!file.exists(model_path)) {

mod_interest_uncontrolled <- func_mod(response_Interest ~ 
                                        cs(domain) +
                                        domain * gender * prop_standardized * age_mean_standardized +
                                        (domain * prop_standardized | id_anon),
                                      data = df_interest_efficacy_uncontrolled,
                                      family = cumulative(),
                                      inits = "0",
                                      prior = prior_mod[-4,])

  saveRDS(mod_interest_uncontrolled, model_path)

} else {

  mod_interest_uncontrolled <- readRDS(model_path)

}

# saveRDS(mod_interest_uncontrolled, "rmd_cache/models/mod_interest_uncontrolled.rds")

# uncontrolled - efficacy
model_path <- "rmd_cache/models/mod_efficacy_uncontrolled.rds"

if (!file.exists(model_path)) {

mod_efficacy_uncontrolled <- func_mod(response_Efficacy ~ 
                                        cs(domain) +
                                        domain * gender * prop_standardized * age_mean_standardized +
                                        (domain * prop_standardized | id_anon),
                                      data = df_interest_efficacy_uncontrolled,
                                      family = cumulative(),
                                      inits = "0",
                                      prior = prior_mod[-4,])

  saveRDS(mod_efficacy_uncontrolled, model_path)

} else {

  mod_efficacy_uncontrolled <- readRDS(model_path)

}

# saveRDS(mod_efficacy_uncontrolled, "rmd_cache/models/mod_efficacy_uncontrolled.rds")

# controlled - interest
model_path <- "rmd_cache/models/mod_interest_controlled.rds"

if (!file.exists(model_path)) {

mod_interest_controlled <- func_mod(response_Interest ~
                                        cs(domain) +
                                        domain * gender * prop_standardized * age_mean_standardized +
                                        (domain * prop_standardized | id_anon),
                                      data = df_interest_efficacy_controlled,
                                      family = cumulative(),
                                      inits = "0",
                                      prior = prior_mod[-4,])

  saveRDS(mod_interest_controlled, model_path)

} else {

  mod_interest_controlled <- readRDS(model_path)

}

# saveRDS(mod_interest_controlled, "rmd_cache/models/mod_interest_controlled.rds")

# controlled - efficacy
model_path <- "rmd_cache/models/mod_efficacy_controlled.rds"

if (!file.exists(model_path)) {

mod_efficacy_controlled <- func_mod(response_Efficacy ~
                                      cs(domain) +
                                      domain * gender * prop_standardized * age_mean_standardized +
                                      (domain * prop_standardized | id_anon),
                                    data = df_interest_efficacy_controlled,
                                    family = cumulative(),
                                    inits = "0",
                                    prior = prior_mod[-4,])

  saveRDS(mod_efficacy_controlled, model_path)

} else {

  mod_efficacy_controlled <- readRDS(model_path)

}

# saveRDS(mod_efficacy_controlled, "rmd_cache/models/mod_efficacy_controlled.rds")

Checks

# conditional_effects(mod_interest_uncontrolled, effects = "prop_standardized:domain",
#                     conditions = data.frame(gender = unique(mod_interest_uncontrolled$data$gender)))[[1]]
# conditional_effects(mod_interest_uncontrolled, effects = "age_mean_standardized", categorical = T)
# pp_check(mod_interest_uncontrolled, "ecdf_overlay_", ndraws = 1000)
# pp_check(mod_interest_uncontrolled, "bars_grouped", group = "domain", ndraws = 1000)

# conditional_effects(mod_interest_uncontrolled,
#                     effects = "gender",
#                     conditions = data.frame(domain = unique(mod_interest_uncontrolled$data$domain)),
#                     categorical = T)

### checks ####
# plot(mod_interest)
# pp_check(mod_interest, "bars_grouped", group = "condition", ndraws = 1000)
# pp_check(mod_interest, ndraws = 1000, type = "bars_grouped", group = "domain")
# pp_check(mod_interest, ndraws = 1000, type = "ecdf_overlay_grouped", group = "domain")
# conditional_effects(mod_interest, categorical = T)

# plot(mod_efficacy)
# pp_check(mod_efficacy, "bars_grouped", group = "condition", ndraws = 1000)
# pp_check(mod_efficacy, ndraws = 1000, type = "bars_grouped", group = "domain")
# pp_check(mod_efficacy, ndraws = 1000, type = "ecdf_overlay_grouped", group = "domain")
# conditional_effects(mod_efficacy, categorical = T)

Results

# analysis ####
# func_newdata_interest_efficacy <- function(mod){
#   crossing(domain    = levels(mod$data$domain),
#            gender    = levels(mod$data$gender),
#            prop_standardized = seq(
#              from = min(mod$data$prop_standardized),
#              to = max(mod$data$prop_standardized),
#              length.out = 20),
#            age_mean_standardized = seq(
#              from = min(mod$data$age_mean_standardized),
#              to = max(mod$data$age_mean_standardized),
#              length.out = 20))
#   }
# 
# newdata_interest_uncontrolled <- func_newdata_interest_efficacy(mod_interest_uncontrolled)
# newdata_efficacy_uncontrolled <- func_newdata_interest_efficacy(mod_efficacy_uncontrolled)
# newdata_interest_controlled   <- func_newdata_interest_efficacy(mod_interest_controlled)
# newdata_efficacy_controlled   <- func_newdata_interest_efficacy(mod_efficacy_controlled)
# 
# func_epred_interest_efficacy <- function(mod, df){
#   epred_draws(mod, newdata = df, re_formula = NA, ndraws = n_draws_posterior) %>% 
#     ungroup() %>%
#     mutate(across(where(is.character), ~str_to_title(.)),
#            mutate(across(where(is.character), ~ as.factor(.)))) 
# }
# 
# epred_interest_uncontrolled <- local({
#   set.seed(10003) # reproducible subsampling of posterior draws
#   func_epred_interest_efficacy(mod_interest_uncontrolled, newdata_interest_uncontrolled)
# }) %>% ungroup() 
# epred_efficacy_uncontrolled <- local({
#   set.seed(10003) # reproducible subsampling of posterior draws
#   func_epred_interest_efficacy(mod_efficacy_uncontrolled, newdata_efficacy_uncontrolled)
# }) %>% ungroup() 
# epred_interest_controlled <- local({
#   set.seed(10003) # reproducible subsampling of posterior draws
#   func_epred_interest_efficacy(mod_interest_controlled, newdata_interest_controlled)
# }) %>% ungroup()
# epred_efficacy_controlled <- local({
#   set.seed(10003) # reproducible subsampling of posterior draws
#   func_epred_interest_efficacy(mod_efficacy_controlled, newdata_efficacy_controlled)
# }) %>% ungroup()
# 
# func_epred_expected_value <- function(df){ # compute expected value
#   df %>% 
#     mutate(.category_numeric = as.numeric(.category)) %>% 
#     group_by(.draw, domain, gender, prop_standardized, age_mean_standardized) %>%
#     summarize(expected_value = sum(.category_numeric * .epred), .groups = "drop") 
# }
# 
# epred_expected_value_interest <- func_epred_expected_value(epred_interest_uncontrolled)
# epred_expected_value_efficacy <- func_epred_expected_value(epred_efficacy_uncontrolled)
# 
# epred_expected_value <- bind_rows(epred_expected_value_interest %>% mutate(dv = "Interest"),
#                                   epred_expected_value_efficacy %>% mutate(dv = "Efficacy"))
# 
# saveRDS(epred_expected_value, "rmd_cache/epreds/epred_expected_value.rds")
# 
# epred_expected_value_interest_controlled <- func_epred_expected_value(epred_interest_controlled)
# epred_expected_value_efficacy_controlled <- func_epred_expected_value(epred_efficacy_controlled)
# 
# epred_expected_value_controlled <- bind_rows(epred_expected_value_interest_controlled %>% mutate(dv = "Interest"),
#                                              epred_expected_value_efficacy_controlled %>% mutate(dv = "Efficacy"))
# 
# saveRDS(epred_expected_value_controlled, "rmd_cache/epreds/epred_expected_value_controlled.rds")
# 
# rm(epred_expected_value_interest,
#    epred_expected_value_efficacy,
#    epred_expected_value_interest_controlled,
#    epred_expected_value_efficacy_controlled,
#    epred_interest_uncontrolled,
#    epred_efficacy_uncontrolled,
#    epred_interest_controlled,
#    epred_efficacy_controlled)

epreds_path <- "rmd_cache/epreds/epred_expected_value.rds"
epred_expected_value <- readRDS(epreds_path)

epreds_path <- "rmd_cache/epreds/epred_expected_value_controlled.rds"
epred_expected_value_controlled <- readRDS(epreds_path)

Interest and Efficacy Surfaces

Analyses of the posterior distribution of the expected value of interest/efficacy, given specified predictor levels. The expected value could range from 1 to 4, with 4 indicating the strongest interest or efficacy. Darker values denote lower predicted interest or efficacy (see legend) and vice versa. For example, the first graph, below (of predicted interest computed from uncontrolled stereotype proportions), suggests that boys (top row) reported less interest in reading (third column) as they grew older and more stereotypical. This is suggested by the increasingly dark upper-right hand quadrant of the surface.

(Surface of E[interest/efficacy | domain, gender, stereotype proportion, age].)

Uncontrolled Stereotype Proportions

### prob * age ####
epred_expected_value_draws <- epred_expected_value %>%
  group_by(dv, domain, gender, prop_standardized, age_mean_standardized) %>% # organize draws by desired grouping vars
  summarize(expected_value = mean(expected_value), .groups = "drop")

ggplot(epred_expected_value_draws %>% # visualize interest (age * prop)
                          filter(dv == "Interest") %>% 
                          mutate(age_mean_standardized = age_mean_standardized * sd(df_mod_condition$age_raw) + mean(df_mod_condition$age_raw)),
       aes(x = age_mean_standardized,
           y = prop_standardized, 
           z = expected_value)) +
  facet_grid(gender ~ domain) +
  stat_summary_2d(fun = mean, bins = 30) +
  scale_fill_viridis_c(begin = 0, end = 1) +
  theme_minimal(base_family = "Times New Roman", base_size = 17) +
  labs(x = "Age (Years)", y = "Stereotype Proportion (Stand.)", title = "Predicted Interest Ratings", fill = "Predicted\nRating") + 
  theme(plot.title = element_text(face = "bold", hjust = .5))

ggplot(epred_expected_value_draws %>% # visualize efficacy (age * prop)
         filter(dv == "Efficacy") %>% 
                          mutate(age_mean_standardized = age_mean_standardized * sd(df_mod_condition$age_raw) + mean(df_mod_condition$age_raw)),
       aes(x = age_mean_standardized,
           y = prop_standardized, 
           z = expected_value)) +
  facet_grid(gender ~ domain) +
  stat_summary_2d(fun = mean, bins = 30) +
  scale_fill_viridis_c(begin = 0, end = 1) +
  theme_minimal(base_family = "Times New Roman", base_size = 17) +
  labs(x = "Age (Years)", y = "Stereotype Proportion (Stand.)", title = "Predicted Efficacy Ratings", fill = "Predicted\nRating") +
  theme(plot.title = element_text(face = "bold", hjust = .5))

Controlled Stereotype Proportions

### prob * age ####
epred_expected_value_controlled_draws <- epred_expected_value_controlled %>% 
  group_by(dv, domain, gender, prop_standardized, age_mean_standardized) %>% # organize draws by desired grouping vars
  summarize(expected_value = mean(expected_value), .groups = "drop")

ggplot(epred_expected_value_controlled_draws %>% # visualize interest (age * prop)
                          filter(dv == "Interest") %>% 
                          mutate(age_mean_standardized = age_mean_standardized * sd(df_mod_condition$age_raw) + mean(df_mod_condition$age_raw)),
       aes(x = age_mean_standardized,
           y = prop_standardized, 
           z = expected_value)) +
  facet_grid(gender ~ domain) +
  stat_summary_2d(fun = mean, bins = 30) +
  scale_fill_viridis_c(begin = 0, end = 1) +
  theme_minimal(base_family = "Times New Roman", base_size = 17) +
  labs(x = "Age (Years)", y = "Stereotype Proportion (Stand.)", title = "Predicted Interest Ratings", fill = "Predicted\nRating") + 
  theme(plot.title = element_text(face = "bold", hjust = .5))

ggplot(epred_expected_value_controlled_draws %>% # visualize efficacy (age * prop)
         filter(dv == "Efficacy") %>% 
                          mutate(age_mean_standardized = age_mean_standardized * sd(df_mod_condition$age_raw) + mean(df_mod_condition$age_raw)),
       aes(x = age_mean_standardized,
           y = prop_standardized, 
           z = expected_value)) +
  facet_grid(gender ~ domain) +
  stat_summary_2d(fun = mean, bins = 30) +
  scale_fill_viridis_c(begin = 0, end = 1) +
  theme_minimal(base_family = "Times New Roman", base_size = 17) +
  labs(x = "Age (Years)", y = "Stereotype Proportion (Stand.)", title = "Predicted Efficacy Ratings", fill = "Predicted\nRating") +
  theme(plot.title = element_text(face = "bold", hjust = .5))

Effects of Age and Stereotype Proportion

func_analyze_effect <- function(df, effect_var, label) {
  effect_var <- rlang::ensym(effect_var)
  effect_var_name <- rlang::as_name(effect_var)
  
  # 1. Marginalize across the *other* continuous variable(s)
  # (Here we assume the grid includes both age_mean_standardized and prop_standardized)
  marginalized_draws <- df %>%
    group_by(.draw, dv, domain, gender, !!effect_var) %>%
    summarize(expected_value = mean(expected_value), .groups = "drop")
  
  # 2. Summarize marginal posterior predictions
  marginalized_summary <- marginalized_draws %>%
    group_by(dv, domain, gender, !!effect_var) %>% 
    summarize(
      ci_lo = quantile(expected_value, .025),
      mean_expected_value = mean(expected_value),
      ci_hi = quantile(expected_value, .975),
      .groups = "drop"
    )
  
  # 3. Plot (optional)
  plot <- ggplot(marginalized_summary,
                 aes(x = !!effect_var, y = mean_expected_value,
                     color = gender, fill = gender)) +
    geom_ribbon(aes(ymin = ci_lo, ymax = ci_hi),
                alpha = 0.2, color = NA) +
    geom_line(linewidth = 1) +
    geom_hline(yintercept = 2.25, linewidth = 0.25, color = "lightgrey") +
    facet_grid(dv ~ domain) +
    labs(x = paste("Standardized", label),
         y = "Predicted Expected Value",
         title = "Predicted Efficacy and Interest",
         fill = "Gender",
         color = "Gender") +
    theme_classic(base_family = "Times New Roman", base_size = 16) +
    theme(plot.title = element_text(hjust = .5),
          legend.position = "bottom")
  
  # 4. Compute draw-wise slope (effect)
  effect_draws <- df %>%
    group_by(dv, domain, gender, .draw) %>%
    summarize(effect = cov(!!effect_var, expected_value) / var(!!effect_var),
              .groups = "drop")
  
  # 5. Summarize effects
  effect_summary <- effect_draws %>%
    group_by(dv, domain, gender) %>%
    summarize(
      hdi_lo = hdi(effect)[1],
      mean_effect = mean(effect),
      hdi_hi = hdi(effect)[2],
      .groups = "drop"
    )
  
  # 6. Compute gender contrasts
  contrast_draws <- effect_draws %>%
    group_by(dv, domain) %>%
    compare_levels(effect, by = "gender") %>%
    ungroup() %>%
    rename(diff_effect = effect)
  
  # 7. Summarize contrasts
  contrast_summary <- contrast_draws %>%
    group_by(dv, domain, gender) %>%
    summarize(
      prop_gt = mean(diff_effect > 0),
      hdi_lo = hdi(diff_effect)[1],
      mean_diff_effect = mean(diff_effect),
      hdi_hi = hdi(diff_effect)[2],
      .groups = "drop"
    )
  
  list(
    marginalized_draws = marginalized_draws,
    effect_draws = effect_draws,
    contrast_draws = contrast_draws,
    marginalized_summary = marginalized_summary,
    effect_summary = effect_summary,
    contrast_summary = contrast_summary,
    plot = plot
  )
}

Age Effects

Uncontrolled Stereotype Proportions

Plot of predicted effect of age on expected efficacy and interest, conditional on domain and gender.

# domain as uncontrolled
age <- func_analyze_effect(epred_expected_value, age_mean_standardized, "Age")
age$plot # plot of marginal effect of age on expected efficacy and interest, conditional on domain and gender

Predicted effects of age on expected interest and efficacy, conditional on domain and gender.

age$effect_summary # main effects of age on expected interest and efficacy, conditional on domain and gender

## # A tibble: 16 × 6
##    dv       domain  gender   hdi_lo mean_effect  hdi_hi
##    <chr>    <fct>   <fct>     <dbl>       <dbl>   <dbl>
##  1 Efficacy Art     Boys   -0.217      -0.118   -0.0165
##  2 Efficacy Art     Girls  -0.0872      0.00225  0.0935
##  3 Efficacy Math    Boys    0.185       0.340    0.487 
##  4 Efficacy Math    Girls   0.265       0.404    0.541 
##  5 Efficacy Reading Boys    0.0257      0.176    0.328 
##  6 Efficacy Reading Girls  -0.0625      0.102    0.274 
##  7 Efficacy Science Boys   -0.199      -0.0680   0.0781
##  8 Efficacy Science Girls  -0.227      -0.0519   0.111 
##  9 Interest Art     Boys   -0.183      -0.0670   0.0349
## 10 Interest Art     Girls  -0.0282      0.0621   0.160 
## 11 Interest Math    Boys   -0.141       0.0226   0.171 
## 12 Interest Math    Girls  -0.00443     0.147    0.309 
## 13 Interest Reading Boys   -0.281      -0.101    0.0675
## 14 Interest Reading Girls  -0.155       0.00833  0.180 
## 15 Interest Science Boys   -0.126      -0.0166   0.0808
## 16 Interest Science Girls  -0.190      -0.0536   0.0841

Contrasts of predicted effect size of age by gender (i.e., gender * age intxn).

age$contrast_summary # contrast main effects' size by gender (i.e., gender * age intxn)

## # A tibble: 8 × 7
##   dv       domain  gender       prop_gt   hdi_lo mean_diff_effect hdi_hi
##   <chr>    <fct>   <chr>          <dbl>    <dbl>            <dbl>  <dbl>
## 1 Efficacy Art     Girls - Boys   0.960 -0.00835           0.120   0.254
## 2 Efficacy Math    Girls - Boys   0.734 -0.139             0.0641  0.262
## 3 Efficacy Reading Girls - Boys   0.257 -0.310            -0.0738  0.154
## 4 Efficacy Science Girls - Boys   0.565 -0.213             0.0162  0.215
## 5 Interest Art     Girls - Boys   0.971 -0.00707           0.129   0.264
## 6 Interest Math    Girls - Boys   0.862 -0.102             0.124   0.336
## 7 Interest Reading Girls - Boys   0.814 -0.132             0.110   0.344
## 8 Interest Science Girls - Boys   0.324 -0.199            -0.0371  0.139

Controlled Stereotype Proportions

Plot of predicted effect of age on expected efficacy and interest, conditional on domain and gender.

# domain as controlled
age_controlled <- func_analyze_effect(epred_expected_value_controlled, age_mean_standardized, "Age")
age_controlled$plot # plot of marginal effect of age on expected efficacy and interest, conditional on domain and gender

Predicted effects of age on expected interest and efficacy, conditional on domain and gender.

age_controlled$effect_summary # average predicted effects of age on expected interest and efficacy, conditional on domain and gender.

## # A tibble: 16 × 6
##    dv       domain  gender  hdi_lo mean_effect  hdi_hi
##    <chr>    <fct>   <fct>    <dbl>       <dbl>   <dbl>
##  1 Efficacy Art     Boys   -0.216     -0.124   -0.0324
##  2 Efficacy Art     Girls  -0.0930    -0.00518  0.0853
##  3 Efficacy Math    Boys    0.154      0.292    0.442 
##  4 Efficacy Math    Girls   0.252      0.388    0.525 
##  5 Efficacy Reading Boys    0.0225     0.171    0.328 
##  6 Efficacy Reading Girls  -0.0418     0.123    0.302 
##  7 Efficacy Science Boys   -0.216     -0.0818   0.0513
##  8 Efficacy Science Girls  -0.178     -0.0199   0.148 
##  9 Interest Art     Boys   -0.166     -0.0578   0.0533
## 10 Interest Art     Girls  -0.0151     0.0779   0.173 
## 11 Interest Math    Boys   -0.111      0.0356   0.215 
## 12 Interest Math    Girls  -0.0127     0.143    0.306 
## 13 Interest Reading Boys   -0.273     -0.0979   0.0735
## 14 Interest Reading Girls  -0.122      0.0248   0.200 
## 15 Interest Science Boys   -0.132     -0.0297   0.0700
## 16 Interest Science Girls  -0.177     -0.0452   0.0884

Contrasts of predicted effect size of age by gender (i.e., gender * age intxn).

age_controlled$contrast_summary # contrast average predicted effect sizes by gender (i.e., gender * age intxn)

## # A tibble: 8 × 7
##   dv       domain  gender       prop_gt   hdi_lo mean_diff_effect hdi_hi
##   <chr>    <fct>   <chr>          <dbl>    <dbl>            <dbl>  <dbl>
## 1 Efficacy Art     Girls - Boys   0.964 -0.00825           0.118   0.252
## 2 Efficacy Math    Girls - Boys   0.832 -0.0967            0.0958  0.275
## 3 Efficacy Reading Girls - Boys   0.342 -0.263            -0.0486  0.184
## 4 Efficacy Science Girls - Boys   0.727 -0.148             0.0619  0.261
## 5 Interest Art     Girls - Boys   0.970 -0.00877           0.136   0.270
## 6 Interest Math    Girls - Boys   0.829 -0.118             0.107   0.344
## 7 Interest Reading Girls - Boys   0.842 -0.101             0.123   0.367
## 8 Interest Science Girls - Boys   0.424 -0.181            -0.0155  0.155

Stereotype Proportion Effects

Uncontrolled

Predicted effect of stereotype proportion on expected efficacy and interest, conditional on domain and gender.

# domain as uncontrolled
prop <- func_analyze_effect(epred_expected_value, prop_standardized, "Stereotype Proportion")
prop$plot # plot of marginal predicted effect of stereotype proportion on expected efficacy and interest, conditional on domain and gender

Predicted effects of stereotype proportion on expected interest and efficacy, conditional on domain and gender.

prop$effect_summary # average predicted effects of stereotype proportion on expected interest and efficacy, conditional on domain and gender

## # A tibble: 16 × 6
##    dv       domain  gender  hdi_lo mean_effect  hdi_hi
##    <chr>    <fct>   <fct>    <dbl>       <dbl>   <dbl>
##  1 Efficacy Art     Boys   -0.0982    -0.00328  0.0862
##  2 Efficacy Art     Girls  -0.0388     0.0358   0.121 
##  3 Efficacy Math    Boys   -0.0318     0.109    0.260 
##  4 Efficacy Math    Girls  -0.288     -0.163   -0.0356
##  5 Efficacy Reading Boys   -0.133      0.00952  0.160 
##  6 Efficacy Reading Girls  -0.159     -0.0241   0.101 
##  7 Efficacy Science Boys   -0.0315     0.0872   0.205 
##  8 Efficacy Science Girls  -0.103      0.0409   0.166 
##  9 Interest Art     Boys   -0.138     -0.0296   0.0799
## 10 Interest Art     Girls  -0.0723     0.0154   0.100 
## 11 Interest Math    Boys   -0.165     -0.00424  0.156 
## 12 Interest Math    Girls  -0.134      0.0174   0.158 
## 13 Interest Reading Boys   -0.329     -0.179   -0.0193
## 14 Interest Reading Girls  -0.0874     0.0365   0.164 
## 15 Interest Science Boys   -0.143     -0.0434   0.0498
## 16 Interest Science Girls  -0.168     -0.0355   0.0791

Contrasts of predicted effect size of stereotyping by gender (i.e., gender * age intxn).

prop$contrast_summary # contrast predicted average effect sizes by gender (i.e., gender * stereotype proportion intxn)

## # A tibble: 8 × 7
##   dv       domain  gender       prop_gt  hdi_lo mean_diff_effect  hdi_hi
##   <chr>    <fct>   <chr>          <dbl>   <dbl>            <dbl>   <dbl>
## 1 Efficacy Art     Girls - Boys  0.753  -0.0730          0.0390   0.161 
## 2 Efficacy Math    Girls - Boys  0.0035 -0.476          -0.272   -0.0922
## 3 Efficacy Reading Girls - Boys  0.388  -0.239          -0.0336   0.154 
## 4 Efficacy Science Girls - Boys  0.304  -0.216          -0.0463   0.143 
## 5 Interest Art     Girls - Boys  0.75   -0.0862          0.0450   0.180 
## 6 Interest Math    Girls - Boys  0.584  -0.182           0.0217   0.230 
## 7 Interest Reading Girls - Boys  0.988   0.0229          0.215    0.408 
## 8 Interest Science Girls - Boys  0.547  -0.148           0.00789  0.152

Controlled

Plot of predicted effect of stereotype proportion on expected efficacy and interest, conditional on domain and gender.

# domain as controlled
prop_controlled <- func_analyze_effect(epred_expected_value_controlled, prop_standardized, "Stereotype Proportion")
prop_controlled$plot # plot of marginal predicted effect of stereotype proportion on expected efficacy and interest, conditional on domain and gender

Predicted effects of stereotype proportion on expected interest and efficacy, conditional on domain and gender.

prop_controlled$effect_summary # average predicted effects of stereotype proportion on expected interest and efficacy, conditional on domain and gender

## # A tibble: 16 × 6
##    dv       domain  gender  hdi_lo mean_effect  hdi_hi
##    <chr>    <fct>   <fct>    <dbl>       <dbl>   <dbl>
##  1 Efficacy Art     Boys   -0.100     -0.0155  0.0717 
##  2 Efficacy Art     Girls  -0.0168     0.0665  0.150  
##  3 Efficacy Math    Boys   -0.182     -0.0554  0.0830 
##  4 Efficacy Math    Girls  -0.251     -0.126   0.0125 
##  5 Efficacy Reading Boys   -0.104      0.0296  0.180  
##  6 Efficacy Reading Girls  -0.176     -0.0472  0.0962 
##  7 Efficacy Science Boys   -0.0578     0.0591  0.182  
##  8 Efficacy Science Girls  -0.106      0.0198  0.152  
##  9 Interest Art     Boys   -0.142     -0.0405  0.0663 
## 10 Interest Art     Girls  -0.0447     0.0430  0.134  
## 11 Interest Math    Boys   -0.107      0.0400  0.201  
## 12 Interest Math    Girls  -0.156     -0.0183  0.131  
## 13 Interest Reading Boys   -0.307     -0.149   0.00124
## 14 Interest Reading Girls  -0.128     -0.00748 0.125  
## 15 Interest Science Boys   -0.126     -0.0285  0.0765 
## 16 Interest Science Girls  -0.128     -0.00673 0.105

Contrasts of predicted effect size of stereotyping by gender (i.e., gender * age intxn).

prop_controlled$contrast_summary # contrast predicted average effect sizes by gender (i.e., gender * stereotype proportion intxn)

## # A tibble: 8 × 7
##   dv       domain  gender       prop_gt  hdi_lo mean_diff_effect hdi_hi
##   <chr>    <fct>   <chr>          <dbl>   <dbl>            <dbl>  <dbl>
## 1 Efficacy Art     Girls - Boys   0.920 -0.0276           0.0820  0.203
## 2 Efficacy Math    Girls - Boys   0.224 -0.254           -0.0711  0.115
## 3 Efficacy Reading Girls - Boys   0.224 -0.267           -0.0768  0.126
## 4 Efficacy Science Girls - Boys   0.334 -0.222           -0.0393  0.126
## 5 Interest Art     Girls - Boys   0.890 -0.0532           0.0835  0.215
## 6 Interest Math    Girls - Boys   0.294 -0.249           -0.0583  0.166
## 7 Interest Reading Girls - Boys   0.914 -0.0564           0.142   0.340
## 8 Interest Science Girls - Boys   0.614 -0.125            0.0218  0.171

Session Information

sessionInfo()

## R version 4.5.2 (2025-10-31)
## Platform: aarch64-apple-darwin20
## Running under: macOS Tahoe 26.2
## 
## Matrix products: default
## BLAS:   /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib 
## LAPACK: /Library/Frameworks/R.framework/Versions/4.5-arm64/Resources/lib/libRlapack.dylib;  LAPACK version 3.12.1
## 
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
## 
## time zone: America/New_York
## tzcode source: internal
## 
## attached base packages:
## [1] parallel  stats     graphics  grDevices utils     datasets  methods  
## [8] base     
## 
## other attached packages:
##  [1] lubridate_1.9.4   forcats_1.0.1     stringr_1.6.0     dplyr_1.1.4      
##  [5] purrr_1.2.1       readr_2.1.6       tidyr_1.3.2       tibble_3.3.1     
##  [9] ggplot2_4.0.1     tidyverse_2.0.0   tidybayes_3.0.7   scales_1.4.0     
## [13] posterior_1.6.1   patchwork_1.3.2   officer_0.7.3     janitor_2.2.1    
## [17] ggtext_0.1.2      glue_1.8.0        ggdist_3.3.3      flextable_0.9.10 
## [21] brms_2.23.0       Rcpp_1.1.1        bayestestR_0.17.0 bayesplot_1.15.0 
## 
## loaded via a namespace (and not attached):
##  [1] gridExtra_2.3           inline_0.3.21           rlang_1.1.7            
##  [4] magrittr_2.0.4          snakecase_0.11.1        matrixStats_1.5.0      
##  [7] compiler_4.5.2          mgcv_1.9-3              loo_2.9.0              
## [10] systemfonts_1.3.1       vctrs_0.7.0             pkgconfig_2.0.3        
## [13] arrayhelpers_1.1-0      crayon_1.5.3            fastmap_1.2.0          
## [16] backports_1.5.0         labeling_0.4.3          utf8_1.2.6             
## [19] cmdstanr_0.9.0          rmarkdown_2.30          markdown_2.0           
## [22] tzdb_0.5.0              ps_1.9.1                ragg_1.5.0             
## [25] bit_4.6.0               xfun_0.56               cachem_1.1.0           
## [28] litedown_0.9            jsonlite_2.0.0          uuid_1.2-1             
## [31] R6_2.6.1                bslib_0.9.0             stringi_1.8.7          
## [34] RColorBrewer_1.1-3      StanHeaders_2.32.10     jquerylib_0.1.4        
## [37] rstan_2.32.7            knitr_1.51              splines_4.5.2          
## [40] Matrix_1.7-4            timechange_0.3.0        tidyselect_1.2.1       
## [43] rstudioapi_0.18.0       abind_1.4-8             yaml_2.3.12            
## [46] codetools_0.2-20        processx_3.8.6          pkgbuild_1.4.8         
## [49] lattice_0.22-7          withr_3.0.2             bridgesampling_1.2-1   
## [52] S7_0.2.1                askpass_1.2.1           coda_0.19-4.1          
## [55] evaluate_1.0.5          RcppParallel_5.1.11-1   zip_2.3.3              
## [58] xml2_1.5.2              pillar_1.11.1           tensorA_0.36.2.1       
## [61] checkmate_2.3.3         stats4_4.5.2            insight_1.4.4          
## [64] distributional_0.6.0    generics_0.1.4          vroom_1.6.7            
## [67] hms_1.1.4               commonmark_2.0.0        rstantools_2.6.0       
## [70] gdtools_0.4.4           tools_4.5.2             data.table_1.18.0      
## [73] mvtnorm_1.3-3           grid_4.5.2              QuickJSR_1.8.1         
## [76] nlme_3.1-168            cli_3.6.5               textshaping_1.0.4      
## [79] fontBitstreamVera_0.1.1 svUnit_1.0.8            viridisLite_0.4.2      
## [82] Brobdingnag_1.2-9       gtable_0.3.6            sass_0.4.10            
## [85] digest_0.6.39           fontquiver_0.2.1        farver_2.1.2           
## [88] htmltools_0.5.9         lifecycle_1.0.5         fontLiberation_0.1.0   
## [91] gridtext_0.1.5          openssl_2.3.4           bit64_4.6.0-1

end_total <- Sys.time()
end_total - start_total

## Time difference of 1.599539 mins

Academic Gender Stereotypes: Pilot Analyses

Jared Vasil

2026-02-03

Introduction

Analysis Plan

Data Preparation

Globals

Data Import

Data Cleaning

Demographics

Sample Size

Age (Years)

Race

Intersession Intervals

Analyses

Measures (aka Conditions)

Data Preparation

Logistic Regression

Model

Checks

Results

Pr(stereotyping | trial type, measure, age, gender)

Pr(stereotyping | measure, gender)

Pr(girls’ stereotyping > boys’ stereotyping | measure)

Pr(stereotyping in measure X > measure Y | gender)

Pr(effect size of measure in boys > in girls | measure comparison)

Pr(stereotyping | age, measure, gender)

Pr(age effect size in measure X > measure Y | gender, measure comparison)

Pr(age effect > chance | gender, measure comparison)

Multivariate Regression (Correlated Residuals)

Data Preparation

Q-Q Plots

Model

Results

Pr(residual correlation of stereotyping in measure X > chance | measures correlated)

Figure

Depicted Gender and Strategies

Depicted Gender

Data Preparation

Model

Checks

Results

Pr(stereotyping | gender, depicted gender, domain comparison)

Pr(stereotyping for depicted boys > depicted girls | gender, domain comparison)

Strategies

Unknown Measure (Choose Known Strategy)

Data Preparation

Model

Checks

Results

Pr(choose known domain | gender, depicted gender, domain comparison)

Pr(choose known domain with depicted boy > depicted girl | gender, domain comparison)

Known Measure (Choose Domain Strategy)

Data Preparation

Models

Results

Pr(choose math (read v math) or science (sci v art) | gender, stim gender)

Gender Measure (Choose Own-Gender Strategy)

Data Preparation

Model

Checks

Results

Pr(Choose Own Gender | gender, domain)

Interest and Efficacy

Data Preparation

Models

Checks

Results

Interest and Efficacy Surfaces

Uncontrolled Stereotype Proportions

Controlled Stereotype Proportions

Effects of Age and Stereotype Proportion

Age Effects

Uncontrolled Stereotype Proportions

Controlled Stereotype Proportions

Stereotype Proportion Effects

Uncontrolled

Controlled

Session Information