Let’s set up our libraries and import our data. Nothing unusual here.
knitr::opts_chunk$set(fig.width = 9, fig.height = 6)
library(rethinking)
library(tidyverse)
library(tidybayes)
library(tidybayes.rethinking)
data(UCBadmit)
But first, let’s look at the data we have. You may have heard me talk about this dataset before as an example of Simpson’s paradox; it’s a famous demonstration of it.
Simpson’s paradox describes a phenomenon where certain statistical descriptions reverse depending on the level of aggregation we look at. Here, for example, women have a lower admission rate at the university level but an equal or higher admission rate within most individual departments.
We call this a paradox because many non-statisticians have the expectation that these kinds of reversals shouldn’t be possible. But, with our statistical skills, we can analyze the matter to understand the underlying dynamic more clearly.
| dept | applicant.gender | admit | reject | applications |
|---|---|---|---|---|
| A | male | 512 | 313 | 825 |
| A | female | 89 | 19 | 108 |
| B | male | 353 | 207 | 560 |
| B | female | 17 | 8 | 25 |
| C | male | 120 | 205 | 325 |
| C | female | 202 | 391 | 593 |
| D | male | 138 | 279 | 417 |
| D | female | 131 | 244 | 375 |
| E | male | 53 | 138 | 191 |
| E | female | 94 | 299 | 393 |
| F | male | 22 | 351 | 373 |
| F | female | 24 | 317 | 341 |
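You can verify the aggregate side of the paradox straight from the table: overall, about 44% of men are admitted (1198 of 2691 applications) versus about 30% of women (557 of 1835), even though women do as well or better in most individual departments. A quick check in R (ours, not part of the original analysis):

# university-level view: aggregate admission rate by gender
UCBadmit |>
  group_by(applicant.gender) |>
  summarize(rate = sum(admit) / sum(applications))

# department-level view: admission rate within each department
UCBadmit |>
  mutate(rate = admit / applications) |>
  select(dept, applicant.gender, rate)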
The causal structure at issue can be visualized with a simple DAG.
library(dagitty) # version 0.3-4
library(ggdag) # version 0.2.13.9000 (from source)

dagify(
  admission ~ gender,
  admission ~ department,
  department ~ gender,
  exposure = c("gender", "department"),
  outcome = "admission"
) |>
  tidy_dagitty(layout = "circle") |>
  node_status() |>
  ggplot(aes_dag(color = status)) +
  geom_dag(size = 2.1) +
  theme_dag_gray(legend.position = "none", aspect.ratio = .66) +
  geom_dag_node(size = 40) +
  geom_dag_label(aes(label = name), fill = "white", color = "black", size = 4) +
  coord_cartesian(xlim = c(-0.7, 1.1), ylim = c(-1.1, 1.1)) +
  scale_color_manual(values = c("exposure" = "#F8766D", "outcome" = "#619CFF", "latent" = "LightGrey"))
It is possible that gender directly predicts admission, perhaps because admissions committees are discriminating.
It is also possible that gender predicts admission only indirectly, via department choice. It might be, for example, that women apply to different departments than men do, and that this difference in preference, rather than differing admission standards, causes the disparity.
So how can we figure that out?
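One answer comes from dagitty itself, which can compute what we need to condition on for each causal question. Here is a quick sketch under the DAG above (the dag name and this check are ours, not part of the original analysis):

dag <- dagify(
  admission ~ gender + department,
  department ~ gender
)

# Total effect of gender on admission: no backdoor paths, so the empty set suffices.
adjustmentSets(dag, exposure = "gender", outcome = "admission", effect = "total")

# Direct effect of gender: we must condition on department.
adjustmentSets(dag, exposure = "gender", outcome = "admission", effect = "direct")

In other words, the raw gender gap estimates the total effect, and estimating the direct effect means stratifying by department, which is exactly what we’ll do.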
Before we move forward, let’s simulate some data to see how this dynamic might manifest under very controlled circumstances. This is an important step in building any model: we want to make sure our model behaves the way we intend it to, which sometimes isn’t the case.
Here, let’s create two simulations of men and women applying to two departments with different levels of difficulty. In the first simulation, admission rates are the same for each gender (no committee bias); in the second, women are admitted at a lower rate (committee bias).
Critically, there is also a different preference between men and women for the department of application. (We’re using a Bernoulli distribution here, via rbern: a simple two-outcome “coin flip” with a fixed probability. A Bernoulli distribution is functionally the same as a binomial distribution with size = 1.)
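For example, these two calls draw from the same distribution (a throwaway illustration, not part of the simulation):

rbern(5, prob = 0.5)             # five 0/1 draws via rethinking's Bernoulli helper
rbinom(5, size = 1, prob = 0.5)  # the equivalent base-R binomial call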
N <- 100000  # applicants per simulation

# Admission rates: rows are departments (B is harder), columns are genders.
# In the biased version, women are admitted at lower rates in both departments.
rate_unbiased <- matrix(
  data = c(0.30, 0.10,
           0.30, 0.10),
  nrow = 2,
  dimnames = list( c("A", "B"), c("male", "female") )
)

rate_biased <- matrix(
  data = c(0.30, 0.10,
           0.20, 0.05),
  nrow = 2,
  dimnames = list( c("A", "B"), c("male", "female") )
)
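One R idiom worth flagging: indexing a matrix with a two-column matrix of (row, column) pairs returns one cell per pair, and factor indices are converted to their integer level codes. That is what rate_unbiased[cbind(dept, applicant.gender)] does below. A standalone illustration (ours):

rate_biased[cbind(c("A", "B"), c("female", "male"))]  # 0.20 0.10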
sim1 <- tibble(
  # fair coin flip for gender: 0 = male, 1 = female
  applicant.gender = rbern( N ) |>
    as_factor() |> fct_recode("male" = "0", "female" = "1"),
  # women apply to the harder department (B) more often than men do
  dept = rbern( N , ifelse(applicant.gender == "male", 0.4, 0.6) ) |>
    as_factor() |> fct_recode("A" = "0", "B" = "1"),
  # look up each applicant's admission rate by (department, gender)
  rate = rate_unbiased[cbind(dept, applicant.gender)],
  admitted = rbern( N , prob=rate )
)
# Same setup, but with the biased admission rates.
sim2 <- tibble(
applicant.gender = rbern( N ) |>
as_factor() |> fct_recode("male" = "0", "female" = "1"),
dept = rbern( N , ifelse(applicant.gender == "male", 0.4, 0.6) ) |>
as_factor() |> fct_recode("A" = "0", "B" = "1"),
rate = rate_biased[cbind(dept, applicant.gender)],
admitted = rbern( N , prob=rate )
)
And the results…
sim1 |> group_by(applicant.gender) |> summarize(mean(admitted))
## # A tibble: 2 × 2
## applicant.gender `mean(admitted)`
## <fct> <dbl>
## 1 male 0.220
## 2 female 0.181
sim2 |> group_by(applicant.gender) |> summarize(mean(admitted))
## # A tibble: 2 × 2
## applicant.gender `mean(admitted)`
## <fct> <dbl>
## 1 male 0.220
## 2 female 0.108
sim1 |> group_by(applicant.gender, dept) |> summarize(mean(admitted))
## `summarise()` has grouped output by 'applicant.gender'. You can override using
## the `.groups` argument.
## # A tibble: 4 × 3
## # Groups: applicant.gender [2]
## applicant.gender dept `mean(admitted)`
## <fct> <fct> <dbl>
## 1 male A 0.300
## 2 male B 0.100
## 3 female A 0.300
## 4 female B 0.101
sim2 |> group_by(applicant.gender, dept) |> summarize(mean(admitted))
## `summarise()` has grouped output by 'applicant.gender'. You can override using
## the `.groups` argument.
## # A tibble: 4 × 3
## # Groups: applicant.gender [2]
## applicant.gender dept `mean(admitted)`
## <fct> <fct> <dbl>
## 1 male A 0.300
## 2 male B 0.0978
## 3 female A 0.199
## 4 female B 0.0479
Sure enough, women have a lower total admission rate in both simulations, even though we built committee bias into only the second one.
So let’s make a model from this. We’ll need a few helper columns to turn our data into something we can use. Then we’ll define two models: one that stratifies by gender alone (lr_simple) and one that stratifies by both gender and department (lr_multi).
consolidate_data <- function(df) {
df |>
group_by(dept, applicant.gender) |>
summarize(
admit = sum(admitted),
reject = sum(!admitted),
applications = admit + reject,
.groups="drop"
)
}
rename_cols <- function(df) {
df |>
select(
A = admit,
N = applications,
G = applicant.gender,
D = dept
)
}
df_sim1 <- sim1 |> consolidate_data() |> rename_cols()
df_sim2 <- sim2 |> consolidate_data() |> rename_cols()
Then, our models. These are both “logistic regressions,” which means the underlying outcome isn’t a continuous number but a binary event (admitted or not admitted). Because each row aggregates many applicants, we model the count of admissions out of the number of applications.
These look pretty similar to our previous models, but two things are different. First, instead of a likelihood function based on a normal distribution, this one is based on a binomial distribution (A ~ binomial(N, p)). Second, our linear model needs a “link function” to convert its possible values into something that matches the requirements of our model structure (logit(p) <- a[G]). In the second model, matrix[G,D]:a simply declares a matrix of intercepts, one for each gender-department combination. Other than that, it all works exactly the way the rest of our models do.
If you don’t understand this yet, that’s okay, but come ready to discuss it together.
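To get a feel for what the link function does, here is a quick check using rethinking’s logit and inv_logit helpers (ours, not part of the models):

inv_logit(c(-2, 0, 2))  # ~0.12, 0.50, ~0.88: any real value maps into (0, 1)
logit(0.5)              # 0: from probability back to the linear scale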
## simple
lr_simple <- alist(
A ~ binomial(N, p),
logit(p) <- a[G],
a[G] ~ normal(0, 1)
)
simple_model1 <- ulam(
lr_simple,
data=df_sim1,
chains=4,
cores=4,
messages=FALSE,
refresh=0
)
simple_model2 <- ulam(
lr_simple,
data=df_sim2,
chains=4,
cores=4,
messages=FALSE,
refresh=0
)
## multi
lr_multi <- alist(
A ~ binomial(N, p),
logit(p) <- a[G,D],
matrix[G,D]:a ~ normal(0, 1)
)
multi_model1 <- ulam(
lr_multi,
data=df_sim1,
chains=4,
cores=4,
messages=FALSE,
refresh=0
)
multi_model2 <- ulam(
lr_multi,
data=df_sim2,
chains=4,
cores=4,
messages=FALSE,
refresh=0
)
precis(simple_model1, depth=2)
## mean sd 5.5% 94.5% rhat ess_bulk
## a[1] -1.263913 0.01092542 -1.281091 -1.246642 1.002056 1404.406
## a[2] -1.511251 0.01174238 -1.530113 -1.492515 1.003152 1406.586
precis(simple_model2, depth=2)
## mean sd 5.5% 94.5% rhat ess_bulk
## a[1] -1.267974 0.01087992 -1.285266 -1.250272 1.004783 1515.818
## a[2] -2.108358 0.01399520 -2.130149 -2.086042 1.000579 1240.267
precis(multi_model1, depth=3)
## mean sd 5.5% 94.5% rhat ess_bulk
## a[1,1] -0.8456614 0.01271071 -0.8664640 -0.8249427 1.0032956 2077.787
## a[2,1] -0.8457551 0.01580715 -0.8713376 -0.8209453 1.0049709 2340.804
## a[1,2] -2.1955477 0.02352516 -2.2327639 -2.1587013 1.0023756 2151.318
## a[2,2] -2.1844820 0.01933002 -2.2140400 -2.1539559 0.9997393 2148.263
precis(multi_model2, depth=3)
## mean sd 5.5% 94.5% rhat ess_bulk
## a[1,1] -0.8483619 0.01268254 -0.8681213 -0.8275126 1.002790 2275.167
## a[2,1] -1.3956724 0.01784664 -1.4242054 -1.3674374 1.000689 2295.435
## a[1,2] -2.2207892 0.02535860 -2.2614709 -2.1813527 1.002315 1979.838
## a[2,2] -2.9883028 0.02741480 -3.0322119 -2.9440843 1.003158 2265.350
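Notice the pattern: the simple model reports a gender gap in both simulations, while the stratified model recovers the per-department rates and shows a gap only in the biased simulation. As a quick sanity check (ours, not part of the original output), we can push a coefficient back through the inverse link:

inv_logit(-0.846)  # ~0.30: the admission rate we simulated for department A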
Now let’s apply our models to the real data.
df <- UCBadmit |> rename_cols()
ucb_model_simple <- ulam(
lr_simple,
data=df,
chains=4,
cores=4,
messages=FALSE,
refresh=0
)
ucb_model_multi <- ulam(
lr_multi,
data=df,
chains=4,
cores=4,
messages=FALSE,
refresh=0
)
precis(ucb_model_simple, depth=2)
## mean sd 5.5% 94.5% rhat ess_bulk
## a[1] -0.8295011 0.05199436 -0.9133484 -0.7484886 1.004321 1244.102
## a[2] -0.2217212 0.04026675 -0.2846887 -0.1579184 1.000804 1187.509
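The same back-transformation is useful here. The recovered rates match the raw aggregate rates we computed from the table, which tells us index 1 corresponds to female applicants in this data (again, a check of ours):

inv_logit(c(-0.83, -0.22))  # ~0.30 and ~0.44: the female and male aggregate rates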
precis(ucb_model_multi, depth=3)
## mean sd 5.5% 94.5% rhat ess_bulk
## a[1,1] 1.4638967 0.24561291 1.08081621 1.8705604 1.0033459 3034.647
## a[2,1] 0.4903475 0.07269761 0.37822650 0.6065609 1.0067840 3096.356
## a[1,2] 0.6654304 0.39366023 0.04160832 1.2993269 0.9996281 3336.659
## a[2,2] 0.5328740 0.08879126 0.38930981 0.6739907 1.0027931 3382.761
## a[1,3] -0.6565170 0.08588772 -0.78983841 -0.5223414 0.9993391 2515.933
## a[2,3] -0.5247377 0.11249573 -0.70867413 -0.3454547 1.0009317 2232.840
## a[1,4] -0.6199629 0.10534940 -0.78450524 -0.4509935 1.0063708 2851.781
## a[2,4] -0.6974509 0.10621764 -0.86493819 -0.5278910 1.0081584 2979.186
## a[1,5] -1.1427886 0.11403406 -1.32830890 -0.9654053 1.0030967 2707.792
## a[2,5] -0.9412246 0.15606820 -1.18643311 -0.6976691 1.0004997 3244.497
## a[1,6] -2.4914726 0.19428600 -2.81331243 -2.1894480 0.9999052 2659.320
## a[2,6] -2.6673009 0.20341311 -3.00018081 -2.3467955 1.0053495 2147.416
Let’s look at the gender difference in admission probability, both overall and for specific departments.
ucb_model_simple |>
recover_types(df) |>
spread_draws(a[G]) |>
pivot_wider(names_from="G", values_from="a") |>
mutate(
diff_prob = inv_logit(female) - inv_logit(male)
) |>
ggplot( aes(x=diff_prob)) +
geom_density(bw=0.003) +
labs(x = "Gender-based admissions probability difference", color="Dept")
ucb_model_multi |>
recover_types(df) |>
spread_draws(a[G,D]) |>
pivot_wider(names_from="G", values_from="a") |>
mutate(
diff_prob = plogis(female) - plogis(male)
) |>
ggplot(aes(x=diff_prob) ) +
geom_density(aes(color=D), bw=0.005) +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.4) +
labs(x = "Gender-based admissions probability difference", color="Dept") +
scale_x_continuous(limits=c(-.32, .32))
Finally, we use post-stratification to simulate an intervention.
# pool the per-department posterior draws into one overall contrast
ucb_model_multi |>
recover_types(df) |>
spread_draws(a[G,D]) |>
pivot_wider(names_from="G", values_from="a") |>
mutate(
diff_prob = plogis(female) - plogis(male)
) |>
ggplot(aes(x=diff_prob)) +
geom_density(bw=0.002, color="red") +
geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.4) +
labs(x = "Gender-based admissions probability difference") +
scale_x_continuous(limits=c(-.32, .32))
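A refinement worth knowing about: a textbook post-stratification would weight each department’s contrast by that department’s share of applications rather than pooling the draws equally. Here is a sketch of that, assuming the df and model objects above (dept_weights is our own helper name):

# each department's share of all applications
dept_weights <- df |>
  group_by(D) |>
  summarize(w = sum(N)) |>
  mutate(w = w / sum(w))

ucb_model_multi |>
  recover_types(df) |>
  spread_draws(a[G, D]) |>
  left_join(dept_weights, by = "D") |>
  group_by(.draw, G) |>
  # weighted average admission probability per gender, per posterior draw
  summarize(p = sum(w * plogis(a)), .groups = "drop") |>
  pivot_wider(names_from = "G", values_from = "p") |>
  mutate(diff_prob = female - male) |>
  ggplot(aes(x = diff_prob)) +
  geom_density(bw = 0.002) +
  geom_vline(xintercept = 0, linetype = "dashed", color = "black", size = 0.4) +
  labs(x = "Gender-based admissions probability difference")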
Did we find causation? Well… maybe. Consider all the unknowns that could be lurking in the background:
set.seed(34)

dagify(
  unknown3 ~ gender,
  admission ~ unknown1,
  admission ~ department,
  admission ~ unknown2,
  department ~ gender,
  department ~ unknown3,
  department ~ unknown4,
  admission ~ unknown4,
  unknown1 ~ gender,
  exposure = c("gender", "department"),
  outcome = "admission",
  latent = c("unknown1", "unknown2", "unknown3", "unknown4")
) |>
  tidy_dagitty(layout = "gem") |>
  node_status() |>
  ggplot(aes_dag(color = status)) +
  geom_dag(size = 2.1) +
  theme_dag_gray(legend.position = "none", aspect.ratio = .66) +
  geom_dag_node(size = 40) +
  geom_dag_label(aes(label = name), fill = "white", color = "black", size = 4) +
  coord_cartesian(xlim = c(-225, 275), ylim = c(-195, 395)) +
  scale_color_manual(values = c("exposure" = "#F8766D", "outcome" = "#619CFF", "latent" = "LightGrey"))
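The department-stratified comparison we ran above corresponds to the direct effect of gender. If this bigger DAG is anywhere near right, dagitty should report that no adjustment set identifies it: conditioning on department opens a collider path through unknown4, and the path through unknown1 cannot be blocked, because the unknowns are latent. A sketch, restating the DAG (dag_unknowns is our own name):

dag_unknowns <- dagify(
  admission ~ department + unknown1 + unknown2 + unknown4,
  department ~ gender + unknown3 + unknown4,
  unknown1 ~ gender,
  unknown3 ~ gender,
  latent = c("unknown1", "unknown2", "unknown3", "unknown4")
)

# should print nothing: no valid adjustment set exists for the direct effect
adjustmentSets(dag_unknowns, exposure = "gender", outcome = "admission",
               effect = "direct")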