Welcome! You have reached my website exploring different pooling strategies for Bayesian regression models. Many datasets have a clustered structure. Take professors' salaries as an example: at the highest levels, professors work at different universities in different countries; at a lower level, they come from different disciplines, which can be an important factor in predicting salary. Within each university and discipline, salary can further depend on years of teaching and research, the number of awards won, and so on. In a mixed effects model, we estimate group-specific effects in addition to the standard coefficients of our regression model. The former are called random effects, while the latter are called fixed effects or population-average effects.
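As a concrete sketch, here is how such a structure is typically written in R's mixed-model formula syntax; the variable names are hypothetical, purely for illustration:

## hypothetical salary example in lme4/brms formula syntax;
## years_teaching and n_awards are fixed (population-average) effects,
## while the intercept varies by university and by discipline nested
## within university (the random effects)
f <- salary ~ years_teaching + n_awards + (1 | university/discipline)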

In some cases, we might have very little data for some groups but plenty for others. A mixed effects model lets information flow from the data-rich groups to the data-poor ones, under the assumption that groups resemble each other somewhat, while observations within a group resemble each other more strongly. In the following analysis, we will explore the reedfrogs dataset from the rethinking R package, in which researchers recorded the mortality of Reed frog tadpoles. It has five original variables: density, pred, size, surv, and propsurv. The variable propsurv is surv divided by density, the proportion of tadpoles that survived. We will use this dataset to analyze different degrees of information borrowing and to see, through statistical results and illustrations, how the estimates change under the various pooling schemes.

Obtain reedfrogs Data from rethinking Package

library(rethinking)   ## provides the reedfrogs data
data(reedfrogs)
data <- reedfrogs
rm(reedfrogs)
## detach rethinking so it does not mask functions used by brms
detach(package:rethinking, unload = TRUE)
head(data)
##   density pred  size surv propsurv
## 1      10   no   big    9      0.9
## 2      10   no   big   10      1.0
## 3      10   no   big    7      0.7
## 4      10   no   big   10      1.0
## 5      10   no small    9      0.9
## 6      10   no small    9      0.9
## transform the categorical variables to numeric values
data$pred <- ifelse(data$pred == 'no', 0, 1)
data$size <- ifelse(data$size == 'big', 1, 0)
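As a quick sanity check of the definition above, propsurv should be exactly surv divided by density:

## confirm that propsurv equals surv / density
all.equal(data$propsurv, data$surv / data$density)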

Here, we show a glimpse of the final dataset that will later be used to fit the Bayesian models.

## add group variable to data set
data <- data %>%
  mutate(group = 1:nrow(data))
head(data)
##   density pred size surv propsurv group
## 1      10    0    1    9      0.9     1
## 2      10    0    1   10      1.0     2
## 3      10    0    1    7      0.7     3
## 4      10    0    1   10      1.0     4
## 5      10    0    0    9      0.9     5
## 6      10    0    0    9      0.9     6

Quick Exploratory Data Analysis

It seems that the presence of predators strongly affects tadpole mortality. However, it is not obvious that tadpole size has much effect at all.

data %>%
  group_by(pred, size) %>%
  summarise(mean(propsurv))
## `summarise()` has grouped output by 'pred'. You can override using the
## `.groups` argument.
## # A tibble: 4 × 3
## # Groups:   pred [2]
##    pred  size `mean(propsurv)`
##   <dbl> <dbl>            <dbl>
## 1     0     0            0.920
## 2     0     1            0.925
## 3     1     0            0.615
## 4     1     1            0.426
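It also helps to know the per-tank sample sizes, since the initial density is the number of tadpoles per tank, i.e. the per-group sample size that later determines how much pooling matters. A quick count with dplyr:

## how many tanks were run at each initial density
data %>%
  count(density)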

Unpooled Bayesian Regression Model

\[ surv_i \sim Binomial(n_i, p_i) \\ logit(p_i) = \alpha_{group_{i}} + \beta_{pred} \, pred_i + \beta_{size} \, size_i \\ \alpha_{group} \sim Normal(0, 10) \\ \beta_{size} \sim Normal(0, 1) \\ \beta_{pred} \sim Normal(0, 1) \]

Here \(i\) indexes the observations (tanks), and \(group_i\) is the group that observation \(i\) belongs to. \(\alpha_{group}\) is the group-specific intercept, while \(\beta_{pred}\) and \(\beta_{size}\) are the coefficients for the predator and size variables; both are fixed effects.

Fitting Unpooled Bayesian Regression Model

brm_unpooled <- brm(data = data, family = binomial,
                    surv | trials(density) ~ 0 + factor(group) + pred + size,
                    prior = c(prior(normal(0, 10), class = b),
                              prior(normal(0, 1), class = b, coef = 'pred'),
                              prior(normal(0, 1), class = b, coef = 'size')),
                    iter = 5000, warmup = 2000, chains = 2, cores = 2,
                    seed =  13, refresh = 0, silent = 2)
summary(brm_unpooled)
##  Family: binomial 
##   Links: mu = logit 
## Formula: surv | trials(density) ~ 0 + factor(group) + pred + size 
##    Data: data (Number of observations: 48) 
##   Draws: 2 chains, each with iter = 5000; warmup = 2000; thin = 1;
##          total post-warmup draws = 6000
## 
## Population-Level Effects: 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## factorgroup1      2.26      1.59    -0.51     5.71 1.00     1557     2499
## factorgroup2      9.48      5.64     1.66    22.70 1.00     3617     3056
## factorgroup3      0.52      1.14    -1.66     2.82 1.00      989     2181
## factorgroup4      9.64      5.72     1.73    23.10 1.00     3395     2924
## factorgroup5      2.67      1.26     0.71     5.61 1.00     3662     2269
## factorgroup6      2.64      1.25     0.72     5.62 1.00     4961     2839
## factorgroup7      9.70      5.61     2.24    23.28 1.00     3264     2867
## factorgroup8      2.69      1.32     0.72     5.91 1.00     3794     2104
## factorgroup9     -0.79      1.35    -3.39     1.87 1.01      656     1411
## factorgroup10     2.33      1.74    -0.84     5.96 1.00     1021     2201
## factorgroup11     0.59      1.37    -2.06     3.31 1.01      694     1377
## factorgroup12     0.09      1.35    -2.55     2.77 1.01      636     1300
## factorgroup13     1.02      1.14    -1.14     3.34 1.00      950     2120
## factorgroup14     0.08      1.10    -2.03     2.21 1.01      931     1879
## factorgroup15     2.75      1.54     0.06     6.12 1.00     1613     2629
## factorgroup16     2.73      1.54     0.05     6.08 1.00     1267     2384
## factorgroup17     3.24      1.51     0.57     6.50 1.00     1569     2823
## factorgroup18     2.25      1.23    -0.00     4.76 1.00     1155     2349
## factorgroup19     1.71      1.11    -0.41     3.96 1.00      925     1626
## factorgroup20    10.10      5.54     2.52    23.64 1.00     2731     2319
## factorgroup21     2.67      0.82     1.30     4.52 1.00     4104     3083
## factorgroup22     2.66      0.81     1.31     4.49 1.00     5410     3138
## factorgroup23     2.67      0.82     1.29     4.48 1.00     5191     3148
## factorgroup24     1.75      0.56     0.77     3.00 1.00     4506     3483
## factorgroup25    -1.55      1.27    -4.00     0.98 1.01      550     1124
## factorgroup26    -0.27      1.25    -2.70     2.20 1.01      566     1029
## factorgroup27    -2.10      1.30    -4.69     0.42 1.01      586     1111
## factorgroup28    -0.95      1.24    -3.29     1.51 1.01      540     1086
## factorgroup29     0.16      0.96    -1.69     2.04 1.00      703     1301
## factorgroup30     1.54      1.01    -0.40     3.55 1.00      846     1808
## factorgroup31    -0.72      0.97    -2.57     1.21 1.00      752     1243
## factorgroup32    -0.34      0.97    -2.27     1.55 1.00      723     1402
## factorgroup33     3.61      1.55     1.02     7.05 1.00     1341     2677
## factorgroup34     2.62      1.20     0.39     5.01 1.00     1090     2266
## factorgroup35     2.61      1.19     0.44     5.06 1.00     1087     2355
## factorgroup36     1.73      1.05    -0.24     3.83 1.00      866     1766
## factorgroup37     2.15      0.55     1.17     3.34 1.00     4576     3473
## factorgroup38    10.71      5.45     3.56    23.88 1.00     3644     2856
## factorgroup39     3.04      0.79     1.74     4.86 1.00     4359     2962
## factorgroup40     2.52      0.64     1.42     3.96 1.00     4686     2844
## factorgroup41    -2.50      1.31    -5.07     0.07 1.01      596     1226
## factorgroup42    -1.02      1.23    -3.42     1.43 1.01      552     1083
## factorgroup43    -0.90      1.23    -3.33     1.57 1.01      539     1047
## factorgroup44    -0.77      1.23    -3.18     1.67 1.01      522      970
## factorgroup45     0.61      0.93    -1.24     2.45 1.00      692     1271
## factorgroup46    -0.60      0.94    -2.46     1.21 1.00      706     1290
## factorgroup47     2.21      1.02     0.28     4.21 1.00      776     1638
## factorgroup48     0.02      0.93    -1.82     1.86 1.01      668     1196
## pred             -0.07      0.87    -1.83     1.65 1.01      609     1054
## size              0.43      0.89    -1.31     2.12 1.00      681     1286
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).

We can roughly see that the estimates are quite uncertain. For example, the point estimate for pred is -0.07, but its 95% credible interval stretches from -1.83 to 1.65. The summary above shows a clear lack of certainty in the estimates.
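Since the model works on the logit scale, the group intercepts can be mapped back to survival probabilities with the inverse-logit function. For a tank with pred = 0 and size = 0, the linear predictor reduces to \(\alpha_{group}\):

## inverse-logit of group 5's point estimate (a pred = 0, size = 0 tank);
## plogis(2.67) is roughly 0.93, i.e. about a 93% survival probability
plogis(2.67)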

## prior distributions
sd_prior <- tibble(value = rhcauchy(1e5, 1))    ## half-Cauchy draws (extraDistr)
intercept_prior <- tibble(value = rnorm(1e5))   ## the Normal(0, 1) prior

## posterior distributions
## (posterior_samples() is deprecated in recent brms releases;
## as_draws_df() is the modern equivalent)
post_unpooled_params <- posterior_samples(brm_unpooled, add_chain = T)
post_unpooled_params %>% 
  ggplot(aes(x = b_pred)) + 
  geom_density(data = intercept_prior, aes(x = value),
               color = "transparent", fill = "#EEDA9D", alpha = 3/4) + 
  geom_density(color = "transparent", fill = "#A65141", alpha = 9/10) +
  annotate("text", label = "posterior", 
           x = -0.35, y = 2.2, 
           color = "#A65141", family = "Courier") +
  annotate("text", label = "prior", 
           x = 0, y = 0.9, 
           color = "#EEDA9D", alpha = 2/3, family = "Courier") +
  ggtitle('Unpooled Model - Distributions for `pred` Variable') + 
  scale_y_continuous(NULL, breaks = NULL) +
  theme_pearl_earring

post_unpooled_params %>% 
  ggplot(aes(x = b_size)) + 
  geom_density(data = intercept_prior, aes(x = value),
               color = "transparent", fill = "#EEDA9D", alpha = 3/4) + 
  geom_density(color = "transparent", fill = "#A65141", alpha = 9/10) +
  annotate("text", label = "posterior", 
           x = -0.35, y = 2.2, 
           color = "#A65141", family = "Courier") +
  annotate("text", label = "prior", 
           x = 0, y = 0.9, 
           color = "#EEDA9D", alpha = 2/3, family = "Courier") +
  ggtitle('Unpooled BRM - Distributions for `size` Variable') + 
  scale_y_continuous(NULL, breaks = NULL) +
  theme_pearl_earring

The posterior and prior distributions overlap each other to a very large extent, which means the sample data provided limited information under the current formulation of the model. Clearly this model fails to deliver informative results and needs to be modified.
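A crude way to quantify this overlap is to compare the posterior standard deviation of a coefficient with its prior standard deviation; a value close to the prior's sd of 1 indicates the data did little updating:

## posterior sd of b_pred; the prior sd was 1, so a value near 1
## means the data barely narrowed the prior
sd(post_unpooled_params$b_pred)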

## save the unpooled alpha values for comparison in later parts
unpooled_alpha <- fixef(brm_unpooled)[1:48, 1]
unpooled_alpha <- unname(unpooled_alpha)
unpooled_alpha
##  [1]  2.25889906  9.48147750  0.52381258  9.63601346  2.67090827  2.64086203
##  [7]  9.69889394  2.69378756 -0.79452615  2.33291161  0.59146229  0.09224426
## [13]  1.01827642  0.07543107  2.74528934  2.73296709  3.23761197  2.25199737
## [19]  1.70906143 10.09606000  2.66999260  2.66424068  2.66985537  1.75105587
## [25] -1.54862350 -0.27078418 -2.09828837 -0.95164492  0.15627459  1.53674213
## [31] -0.71690768 -0.34171910  3.61021770  2.62091230  2.61173651  1.72874899
## [37]  2.14948665 10.70986815  3.04157733  2.51606835 -2.50353150 -1.01950511
## [43] -0.90093404 -0.76743478  0.61206712 -0.59652142  2.21477539  0.01606217

Partial-pooling Bayesian Regression Model

\[ surv_i \sim Binomial(n_i, p_i) \\ logit(p_i) = \alpha_{group_{i}} + \beta_{pred} \, pred_i + \beta_{size} \, size_i \\ \alpha_{group} \sim Normal(\alpha, \sigma) \\ \alpha \sim Normal(0, 1) \\ \sigma \sim HalfCauchy(0, 1) \\ \beta_{size} \sim Normal(0, 1) \\ \beta_{pred} \sim Normal(0, 1) \]

In this second model, we explore partial pooling, which allows information to be borrowed across groups to a certain extent. Groups with smaller sample sizes can borrow information from groups with larger sample sizes, since the larger groups contribute more information to the population-level parameters.
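For intuition about this borrowing, the analogous Gaussian model has a closed form: the partially pooled estimate for group \(j\) is a precision-weighted compromise between that group's own mean \(\bar{y}_j\) and the population mean \(\alpha\), so groups with small \(n_j\) are pulled more strongly toward \(\alpha\):

\[ \hat{\alpha}_j \approx \frac{\dfrac{n_j}{\sigma_y^2}\,\bar{y}_j + \dfrac{1}{\sigma^2}\,\alpha}{\dfrac{n_j}{\sigma_y^2} + \dfrac{1}{\sigma^2}} \]

where \(\sigma_y^2\) is the within-group variance and \(\sigma^2\) the between-group variance.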

Fitting Partial-pooling Bayesian Regression Model

brm_partially_pooled <- brm(data = data, family = binomial,
                            surv | trials(density) ~ 1 + (1 | group) + pred + size,
                            prior = c(prior(normal(0, 1), class = Intercept),
                                      prior(cauchy(0, 1), class = sd),
                                      prior(normal(0, 1), class = b, coef = 'pred'),
                                      prior(normal(0, 1), class = b, coef = 'size')),
                            iter = 5000, warmup = 2000, chains = 2 , cores = 2,
                            seed = 13, refresh = 0, silent = 2)
summary(brm_partially_pooled)
##  Family: binomial 
##   Links: mu = logit 
## Formula: surv | trials(density) ~ 1 + (1 | group) + pred + size 
##    Data: data (Number of observations: 48) 
##   Draws: 2 chains, each with iter = 5000; warmup = 2000; thin = 1;
##          total post-warmup draws = 6000
## 
## Group-Level Effects: 
## ~group (Number of levels: 48) 
##               Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## sd(Intercept)     0.78      0.15     0.53     1.13 1.00     2434     2790
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     2.76      0.27     2.22     3.29 1.00     4888     4398
## pred         -2.44      0.29    -3.01    -1.84 1.00     4836     3814
## size         -0.47      0.29    -1.02     0.12 1.00     3882     3635
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## posterior distribution
post_partially_pooled <- posterior_samples(brm_partially_pooled, add_chain = T)
post_partially_pooled %>% 
  ggplot(aes(x = b_size)) + 
  geom_density(data = intercept_prior, aes(x = value),
               color = "transparent", fill = "#EEDA9D", alpha = 3/4) + 
  geom_density(color = "transparent", fill = "#A65141", alpha = 9/10) +
  annotate("text", label = "posterior", 
           x = -0.35, y = 2.2, 
           color = "#A65141", family = "Courier") +
  annotate("text", label = "prior", 
           x = 0, y = 0.9, 
           color = "#EEDA9D", alpha = 2/3, family = "Courier") +
  ggtitle('Partially Pooled BRM - Distributions for `size` Variable') + 
  scale_y_continuous(NULL, breaks = NULL) +
  theme_pearl_earring

post_partially_pooled %>% 
  ggplot(aes(x = b_pred)) + 
  geom_density(data = intercept_prior, aes(x = value),
               color = "transparent", fill = "#EEDA9D", alpha = 3/4) + 
  geom_density(color = "transparent", fill = "#A65141", alpha = 9/10) +
  annotate("text", label = "posterior", 
           x = -0.35, y = 2.2, 
           color = "#A65141", family = "Courier") +
  annotate("text", label = "prior", 
           x = 0, y = 0.9, 
           color = "#EEDA9D", alpha = 2/3, family = "Courier") +
  ggtitle('Partially Pooled BRM - Distributions for `pred` Variable') + 
  scale_y_continuous(NULL, breaks = NULL) +
  theme_pearl_earring

post_partially_pooled %>% 
  ggplot(aes(x = b_Intercept)) + 
  geom_density(data = intercept_prior, aes(x = value),
               color = "transparent", fill = "#EEDA9D", alpha = 3/4) + 
  geom_density(color = "transparent", fill = "#A65141", alpha = 9/10) +
  annotate("text", label = "posterior", 
           x = -0.35, y = 2.2, 
           color = "#A65141", family = "Courier") +
  annotate("text", label = "prior", 
           x = 0, y = 0.9, 
           color = "#EEDA9D", alpha = 2/3, family = "Courier") +
  ggtitle('Partially Pooled BRM - Distributions for Group Intercept') + 
  scale_y_continuous(NULL, breaks = NULL) +
  theme_pearl_earring

In this model, we see large differences between the posterior and prior distributions: the sample data is informative and dominates the results. Because we chose weak priors, such as the heavy-tailed half-Cauchy prior on the group standard deviation, the overlap between posterior and prior is small. The peaks of all three posterior distributions are relatively narrow, meaning we have high confidence in our posterior estimates.
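With the posterior draws in hand, this confidence can also be quantified directly, for example as the posterior probability that each coefficient is negative:

## share of posterior draws below zero for each coefficient
mean(post_partially_pooled$b_pred < 0)
mean(post_partially_pooled$b_size < 0)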

## save the pooled alpha values for later comparison with the unpooled alphas
pooled_alpha <- coef(brm_partially_pooled)$group[ , 1, 1]
pooled_alpha <- unname(pooled_alpha)
pooled_alpha
##  [1] 2.783688 3.198492 2.068377 3.218434 2.642083 2.635799 3.100836 2.641129
##  [9] 2.600128 3.875302 3.336665 3.101689 3.078186 2.573891 3.632599 3.638374
## [17] 3.204886 2.904258 2.644997 3.554841 2.662321 2.656777 2.664749 2.125167
## [25] 1.993078 2.940856 1.669357 2.416651 2.572972 3.546227 1.925229 2.192061
## [33] 3.386930 3.100358 3.114529 2.655652 2.351212 3.499396 2.854366 2.585849
## [41] 1.357405 2.339933 2.436578 2.536398 2.938215 1.953266 4.022701 2.446128

Complete-pooling Bayesian Regression Model

Next, we explore our final model, the complete pooling Bayesian regression model. All tadpoles are treated as coming from one single group, and the model estimates the coefficients as population parameters.

\[ surv_i \sim Binomial(n_i, p_i) \\ logit(p_i) = \alpha + \beta_{pred} \, pred_i + \beta_{size} \, size_i \\ \beta_{size} \sim Normal(0, 9) \\ \beta_{pred} \sim Normal(0, 9) \]

Fitting Complete-pooling Bayesian Regression Model

brm_complete_pool <- brm(data = data, family = binomial,
                            surv | trials(density) ~ 1 + pred + size,
                            prior = c(prior(normal(0, 9), class = b, coef = 'pred'),
                                      prior(normal(0, 9), class = b, coef = 'size')),
                            iter = 5000, warmup = 2000, chains = 2 , cores = 2,
                            seed = 13, refresh = 0, silent = 2)
summary(brm_complete_pool)
##  Family: binomial 
##   Links: mu = logit 
## Formula: surv | trials(density) ~ 1 + pred + size 
##    Data: data (Number of observations: 48) 
##   Draws: 2 chains, each with iter = 5000; warmup = 2000; thin = 1;
##          total post-warmup draws = 6000
## 
## Population-Level Effects: 
##           Estimate Est.Error l-95% CI u-95% CI Rhat Bulk_ESS Tail_ESS
## Intercept     2.93      0.19     2.57     3.30 1.00     3303     3701
## pred         -2.71      0.18    -3.07    -2.36 1.00     3732     3978
## size         -0.67      0.15    -0.98    -0.38 1.00     3900     4048
## 
## Draws were sampled using sampling(NUTS). For each parameter, Bulk_ESS
## and Tail_ESS are effective sample size measures, and Rhat is the potential
## scale reduction factor on split chains (at convergence, Rhat = 1).
## prior distribution (the model's normal(0, 9) prior has sd = 9 on the
## Stan scale, so we sample with sd = 9 to match)
normal_prior <- tibble(value = rnorm(1e5, mean = 0, sd = 9))
## post distribution
complete_pooled_params <- posterior_samples(brm_complete_pool, add_chain = T)
complete_pooled_params %>% 
  ggplot(aes(x = b_pred)) + 
  geom_density(data = normal_prior, aes(x = value),
               color = "transparent", fill = "#EEDA9D", alpha = 3/4) + 
  geom_density(color = "transparent", fill = "#A65141", alpha = 9/10) +
  annotate("text", label = "posterior", 
           x = -0.35, y = 2.2, 
           color = "#A65141", family = "Courier") +
  annotate("text", label = "prior", 
           x = 0, y = 0.9, 
           color = "#EEDA9D", alpha = 2/3, family = "Courier") +
  ggtitle('Complete Pooling Model - Distributions for `pred` Variable') + 
  scale_y_continuous(NULL, breaks = NULL) +
  theme_pearl_earring

complete_pooled_params %>% 
  ggplot(aes(x = b_size)) + 
  geom_density(data = normal_prior, aes(x = value),
               color = "transparent", fill = "#EEDA9D", alpha = 3/4) + 
  geom_density(color = "transparent", fill = "#A65141", alpha = 9/10) +
  annotate("text", label = "posterior", 
           x = -0.35, y = 2.2, 
           color = "#A65141", family = "Courier") +
  annotate("text", label = "prior", 
           x = 0, y = 0.9, 
           color = "#EEDA9D", alpha = 2/3, family = "Courier") +
  ggtitle('Complete Pooling Model - Distributions for `size` Variable') + 
  scale_y_continuous(NULL, breaks = NULL) +
  theme_pearl_earring

Here both posterior distributions overlap somewhat with the prior distributions, mainly because the prior standard deviations were increased to 9, instead of 1 as in the previous models. The peaks are even narrower than those of the partial pooling model: combining all groups into one drastically increases the sample size, which in turn shrinks the posterior variance. However, complete pooling allows no variation between groups; ignoring the grouping structure in this way may lead to underfitting.

fixef(brm_complete_pool)
##             Estimate Est.Error       Q2.5      Q97.5
## Intercept  2.9278332 0.1879673  2.5684456  3.3024316
## pred      -2.7099582 0.1847720 -3.0725898 -2.3582195
## size      -0.6733146 0.1538593 -0.9767612 -0.3752417

The population intercept sits around 2.93. The results show that, under this fixed effects Bayesian model, both pred and size reduce the log-odds of survival, i.e. they increase mortality.
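Because these coefficients live on the log-odds scale, exponentiating them turns them into odds ratios, which are often easier to communicate; for example, exp(-2.71) is roughly 0.07, so the presence of a predator multiplies the odds of survival by about 0.07:

## convert logit-scale estimates and interval bounds to odds ratios
exp(fixef(brm_complete_pool)[, c('Estimate', 'Q2.5', 'Q97.5')])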

Compare Intercept Values from Pooled and Unpooled BRM

x <- 1:48
alphas <- data.frame(x = x, unpooled = unpooled_alpha, pooled = pooled_alpha)
alphas <- melt(alphas, id.var = 1)   ## reshape to long format (requires reshape2)
ggplot(data = alphas, aes(x, value, colour = variable)) +
  geom_point(alpha = 0.8) + 
  labs(x = 'group number', y = 'alpha') + 
  scale_colour_manual(values = c("#EEDA9D", "#A65141")) +
  ggtitle('Unpooled Intercepts vs Pooled Intercepts') +
  theme_pearl_earring

Here we show both the pooled and unpooled intercepts for the 48 groups. The pooled alphas are pulled up or down toward the population mean, becoming much more uniform, whereas the unpooled intercepts are scattered across the graph with large variation.
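The shrinkage also shows up numerically: the spread of the partially pooled intercepts is far smaller than that of the unpooled ones:

## compare the spread of the two sets of intercepts
sd(unpooled_alpha)
sd(pooled_alpha)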

Further Thoughts

To pool or not to pool is a question that arises in a great deal of research. Averaging the data, as in complete pooling, is likely to lose information: it disallows variation among groups even when the sample has a clear grouping structure. Analyzing each group separately leads to different problems. Groups with small sample sizes lack enough data to support their own models; the issue is less severe for groups with large sample sizes, but overall this approach produces huge variation in the parameters and in between-group comparisons, leaving a false impression of the whole population. When the sample sizes vary, it may be more productive to either drop the smallest groups or adopt complete or partial pooling. To decide which pooling method to use, we can examine in detail the similarities among groups; one exemplar metric is the Intraclass Correlation Coefficient (see the short sketch after this paragraph). In the end, it really depends on what the sample looks like and on the goal of the experiment. Trial and error is also a perfectly good strategy when the right direction is not clear.
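As a sketch of the ICC idea, for a logistic mixed model the latent-scale ICC is commonly computed by treating the residual variance on the logit scale as \(\pi^2/3\); plugging in the group-level sd estimated by our partial pooling model:

## latent-scale ICC: between-group variance over total latent variance
sd_group <- 0.78                       ## sd(Intercept) from brm_partially_pooled
sd_group^2 / (sd_group^2 + pi^2 / 3)   ## roughly 0.16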

References

  1. Bayesian Linear Mixed Effects Models, https://david-dunson.github.io/decks/lmm_03_deck.html#/
  2. Kurz, A. Solomon. "12 Multilevel Models." Statistical Rethinking with brms, ggplot2, and the tidyverse, 5 May 2019, https://bookdown.org/ajkurz/Statistical_Rethinking_recoded/multilevel-models.html
  3. Bates, Douglas, et al. "Fitting Linear Mixed-Effects Models Using lme4." lme4 package vignette, CRAN, 1 Nov. 2022, https://cran.r-project.org/web/packages/lme4/vignettes/lmer.pdf