“Block what you can, and randomize what you cannot”
The ideas and examples in this article primarily come from Chapters 3 and 4 of Gerber and Green’s book Field Experiments: Design, Analysis, and Interpretation.
Blocking
Blocking is a special type of randomization technique where participants are first divided into subgroups, called blocks, and then completely randomized within each block.
As an easy example to grasp the concept, imagine an experiment with 20 participants, 10 men and 10 women, where we need a treatment group and a control group of 10 people each.
Using complete random assignment, there are 20 choose 10 possible assignments. It is likely that the groups will have an unequal number of men and women. With bad enough luck, it is even possible to draw a group made up of all 10 women. Random is random, right? Block randomization ensures that an equal number of men and women will be assigned to the treatment and control conditions.
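To put numbers on that intuition, here is a small base R sketch (not taken from the book) that counts the possible assignments in the 20-person example and how many of them are balanced on gender.

```r
# Base R only; numbers refer to the 10 men / 10 women example above.
n_assignments <- choose(20, 10)  # ways to pick a treatment group of 10 from 20

# Treatment groups with exactly 5 men and 5 women
n_balanced <- choose(10, 5) * choose(10, 5)

# Assignments where one arm is all women (treatment all women, or control all women)
n_single_gender <- 2

p_balanced      <- n_balanced / n_assignments
p_single_gender <- n_single_gender / n_assignments

n_assignments               # 184756
round(p_balanced, 3)        # 0.344
signif(p_single_gender, 3)  # 1.08e-05
```

So only about a third of complete random assignments are perfectly balanced on gender, whereas blocking on gender guarantees balance every time.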
How is blocking different than just controlling for a variable?
Actually, the two ideas are pretty similar. Blocking is, in essence, “manually” controlling for a variable, because it eliminates the correlation between the assigned treatment and the variables used to form the blocks. In other words, blocking achieves by design what regression adjustment achieves with additional control(s). This matters most in small samples, where chance imbalances are more likely. When there are over 100 participants in each condition, the advantages of blocking are reduced.
Blocking is most useful when there are fewer than 100 participants in each condition
Gerber & Green lay out two main advantages of blocking:
* Blocking helps reduce sampling variability
* Blocking ensures that certain subgroups are available for analysis
What are the best variables to block on?
Blocking produces the greatest gains in precision when the variables used to form blocks strongly predict outcomes. This is because blocking on (“controlling for”) variables that predict the outcome removes the variability they would otherwise cause.
Is it ever bad to block?
Basically no. In practice, blocking is almost never going to hurt your experiment unless you analyze your data wrong. Here’s an example from the DeclareDesign blog, where they intentionally construct a perverse example to showcase when blocking can reduce precision.
How do I block in DeclareDesign?
It is easy to block in DeclareDesign using the declare_assignment function. Using the “men and women” example from above:
library(tidyverse)
library(DeclareDesign)
library(DT)
library(knitr)
library(ggrepel)
library(ggpubr)
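The assignment step for the men-and-women example can be sketched with randomizr, the package that provides the random assignment functions DeclareDesign relies on. This is a minimal sketch, not the original code: calling block_ra() directly instead of going through declare_assignment, and the names gender and Z, are assumptions made for illustration.

```r
library(randomizr)

set.seed(1)  # arbitrary seed so the sketch is reproducible

# 10 men and 10 women, as in the example above
gender <- rep(c("Male", "Female"), each = 10)

# Block on gender: within each block, half go to treatment (default prob = 0.5)
Z <- block_ra(blocks = gender)

# Each gender contributes exactly 5 treated and 5 control participants
table(gender, Z)
```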
You’ll notice that the blocks argument doesn’t show up when using ?declare_assignment. This is because declare_assignment uses the conduct_ra() function as its handler, so to see all the blocking arguments we need to use ?conduct_ra to get the correct documentation.
Other important arguments for blocking in two-arm designs are:
block_m: The number of subjects assigned to the treatment within each block.
block_prob: The probability that participants in each block get assigned to the treatment (for when the probability of assignment differs across blocks, unlike in the above example).
If we had more than two arms and wanted to vary the assignment probabilities in each block, we could use the block_m_each or block_prob_each arguments instead.
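As a rough illustration of block_m and block_prob, the sketch below calls randomizr's block_ra() directly. The gender vector and the specific counts and probabilities are hypothetical values chosen for the example. Values are matched to blocks in sorted order (here Female, then Male).

```r
library(randomizr)

set.seed(2)  # arbitrary seed

# Hypothetical blocking variable: 10 women and 10 men
gender <- rep(c("Female", "Male"), each = 10)

# block_m: exact number treated per block, in sorted block order (Female, Male)
Z_m <- block_ra(blocks = gender, block_m = c(3, 7))
table(gender, Z_m)  # 3 women and 7 men are treated

# block_prob: probability of treatment per block, same ordering
Z_p <- block_ra(blocks = gender, block_prob = c(0.2, 0.8))
table(gender, Z_p)  # 2 women and 8 men are treated, since 10 * prob is a whole number
```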
How do I block on multiple variables in DeclareDesign?
Neal Fultz offers an easy solution using the base R interaction function. Here I wish to block on gender and eye_color. The interaction function basically “pastes” the two columns together.
block_gender_and_eyes <-
  declare_population(N = 100,
                     gender = draw_categorical(N = N, prob = c(.5, .5), category_labels = c("Male", "Female")),
                     eye_color = sample(c("blue", "green", "brown"),
                                        N,
                                        replace = TRUE,
                                        prob = c(.2, .2, .6)),
                     blocks = interaction(gender, eye_color))() %>% # uses interaction to combine gender and eye_color
  declare_assignment(blocks = blocks,
                     prob = .5)()
block_gender_and_eyes %>%
datatable()
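To see what interaction() does on its own, here is a tiny base R sketch with made-up values:

```r
# Made-up values for illustration only
gender    <- c("Male", "Female", "Female")
eye_color <- c("blue", "brown", "blue")

blocks <- interaction(gender, eye_color)

as.character(blocks)  # "Male.blue" "Female.brown" "Female.blue"
nlevels(blocks)       # 4: every gender-by-eye-color combination, including the unused Male.brown
```

Each distinct label then acts as one block, so randomization happens separately within every gender-by-eye-color cell.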
From this example we can see that our groups are split almost perfectly on the basis of gender and eye color.
block_gender_and_eyes %>%
  group_by(Z) %>%
  count(blocks) %>%
  pivot_wider(id_cols = blocks, names_from = Z, values_from = n) %>%
  rename("Treatment" = `1`,
         "Control" = `0`) %>%
  kable()
| blocks | Control | Treatment |
|---|---|---|
| Female.blue | 3 | 2 |
| Male.blue | 5 | 5 |
| Female.brown | 16 | 16 |
| Male.brown | 15 | 15 |
| Female.green | 5 | 6 |
| Male.green | 6 | 6 |
How do you evaluate blocked designs? (blocked estimators)
Lastly, the most important question: how do we evaluate blocked designs? This matters most when blocks have different probabilities of being assigned to the treatment. The DeclareDesign team has a really cool blog post about this where they try out different methods. Here I recreate a simplified version of their post, focusing instead on equal versus unequal probabilities of assignment.
# Model ------------------------------------------------------------------------
U <- declare_population(block = add_level(N = 3,
                                          tau = c(4, 2, 0)),
                        indiv = add_level(N = 100, e = rnorm(N))) # each of the 3 blocks has 100 people
Y <- declare_potential_outcomes(Y_Z_0 = e,
                                Y_Z_1 = e + tau)
# Inquiry ----------------------------------------------------------------------
Q <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))
# Data Strategy ----------------------------------------------------------------
Z <- declare_assignment(blocks = block, block_prob = c(.5, .7, .9)) # probability of assignment
R <- declare_reveal(Y, Z)
# Answer Strategy --------------------------------------------------------------
A0 <- declare_estimator(Y ~ Z, estimand = Q,
                        model = lm_robust, label = "A0: (No Controls)")
A1 <- declare_estimator(Y ~ Z + block, estimand = Q,
                        model = lm_robust, label = "A1: (Fixed Effects)")
A2 <- declare_estimator(Y ~ Z, blocks = block, estimand = Q,
                        model = difference_in_means, label = "A2: (Blocked DIM)")
# Design -----------------------------------------------------------------------
design <- U + Y + Z + Q + R + A0 + A1 + A2
design2 <- replace_step(design = design,
                        step = Z, declare_assignment(blocks = block, block_prob = c(.5, .5, .5))) # new design, with the same chance of assignment in every block
simulations <- simulate_design(design, design2, sims = 500)
theme_set(theme_pubr())
simulations %>%
  mutate(design_label = case_when(design_label == "design" ~ "Unequal assignment probability blocks",
                                  design_label == "design2" ~ "Equal assignment probability blocks")) %>%
  group_by(design_label, estimator_label) %>%
  summarize(SE_bias = mean(std.error - sd(estimate)),
            ATE_bias = mean(estimate - estimand)) %>%
  ggplot(aes(x = ATE_bias, y = SE_bias, color = design_label, label = estimator_label)) +
  geom_point() +
  geom_hline(yintercept = 0, size = .1, linetype = "dashed") +
  geom_vline(xintercept = 0, size = .1, linetype = "dashed") +
  geom_text_repel(show.legend = FALSE,
                  box.padding = .4,
                  point.padding = .65,
                  segment.alpha = .5) +
  scale_x_continuous(limits = c(-.6, .6)) +
  scale_y_continuous(limits = c(-.1, .1)) +
  theme(legend.title = element_blank()) +
  labs(x = "Average Treatment Effect (ATE) Bias",
       y = "Standard Error Bias",
       title = "Bias in Even and Uneven Probability of Assignment")

The key takeaway here is that the old-fashioned weighted-average difference-in-means is the best for both equal and unequal assignment probabilities. When the blocks have equal assignment probabilities there is a slight overestimation of the standard error, but it is very small. When assignment probabilities are unequal, the fixed-effects approach overestimates the ATE, although this kind of method has the advantage that it allows additional covariates.
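The direction of the fixed-effects bias can be checked with a little arithmetic. The sketch below relies on a standard result that is not derived in this post: with block fixed effects, OLS weights each block's effect by N_j * p_j * (1 - p_j), whereas the blocked difference-in-means weights each block by its share of the sample, N_j / N. The noise term e is ignored, so these are expected values under the design above, not simulation output.

```r
tau <- c(4, 2, 0)        # block-level treatment effects from the design
p   <- c(0.5, 0.7, 0.9)  # block-level assignment probabilities
n   <- c(100, 100, 100)  # block sizes

true_ate <- mean(tau)    # blocks are equal-sized, so the ATE is the simple mean

# Blocked difference-in-means: weight each block by its share of the sample
w_dim   <- n / sum(n)
dim_est <- sum(w_dim * tau)

# Fixed effects: weight each block by n * p * (1 - p)
w_fe   <- n * p * (1 - p) / sum(n * p * (1 - p))
fe_est <- sum(w_fe * tau)

# No controls: pooled treated mean minus pooled control mean
# (control outcomes average to 0 in expectation because Y_Z_0 = e)
naive_est <- sum(n * p * tau) / sum(n * p)

round(c(true = true_ate, blocked_dim = dim_est,
        fixed_effects = fe_est, no_controls = naive_est), 2)
```

Under these assumptions the fixed-effects estimator lands near 2.58 against a true ATE of 2, because it up-weights the block with p = 0.5, which also has the largest effect. With equal probabilities in every block the two sets of weights coincide and both estimators target 2.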
It is also worth noting that there appear to be better methods of evaluating the ATEs as shown in the blog post, but I do not delve into them here.
