Imagine you had to do an experiment with a really hard to get population, for example let’s say the CEOs of Fortune 500 companies. These are people with super tight-schedules and it’s unlikely you’ll ever get to work with them again if this experiment goes wrong. Stakes are high. How do you know your experiment will work? You don’t want to do something too mundane, but at the same time you want to do something safe enough that you’re likely to get significant results with. How do you prepare? One way to approach this problem is to think about all the possible outcomes and ways of analyzing this experiment, and consider the probability of best-case and worst-case outcomes, and work off of that.
DeclareDesign is the tool to do just that (Blair, Coppock, Cooper And Humphreys, 2019). DeclareDesign, winner of the prestigious Rosenthal Prize for Open Social Science & Society for Political Methodology Statistical Software Award, is a tool for creating a fake dataset, looping the dataset a few thousand times with some random noise, and then changing the initial assumptions and comparing the results. The purpose of DeclareDesign is to determine how subtle methodological changes affect the results of an experiment. This is hugely important if you are about to invest a huge amount of time and money into an experiment that may not pan-out.
Of course, learning to simulate data is tricky and there is a learning-curve to doing it. I am going to do my best in this guide to walk new users through difficulties and make this as easy as possible. DeclareDesign closely follows the framework laid out by Gerber & Green’s Field Experiment handbook. I think this is the single best resource to look at if you find any of the notation confusing. For a more hands-on approach for DeclareDesign, I really like this gentle introduction by Dorothy Bishop. Another helpful resource is Kevin Munger’s repository from his workshop, and lastly the DeclareDesign book looks fantastic, but as I write this, it’s still a work in progress.
One thing I found difficult as a new DeclareDesign user is that because the tools are so powerful, the documentation often can’t define every single option available. In this guide, I will do my best to address these ambiguities. I also try to be clear about what order the steps come in, but with that said, it is important to emphasize that I am only teaching one way to use DeclareDesign and most steps can be done in different orders (sometimes with differing results!). Most of the DeclareDesign functions below can have their internals replaced with other “handlers”, and allow more flexibility than I am letting on.
In my perspective, DeclareDesign can be thought of in 2 parts. 1) The data fabrication step (this is the bulk of the code), and 2) the evaluation step, where users can change initial assumptions and see how well their design actually worked (this is the computationally intensive part of the code).
The cool thing about the DeclareDesign data fabrication step is that it allows for (as far as I know) any design imaginable, which is why there are so many steps. However, because most researchers often need to work with simple/common designs, there are some ready-made designs which are available in the DesignLibrary, which users can also contribute to. As far as I can tell, most simple psychology lab-experiments will be able to get the entirety of their needs met by DesignLibrary. Although I do think you will develop more intuition for the process by using the individual parts of DeclareDesign. I will expand on DesignLibrary in a future article.
Brief Overview of Power Analysis / DeclareDesign Terms
Before expanding on DeclareDesign, it makes sense to have a few terms defined. I won’t be using all these terms in the design I share, but it is still useful to have them defined in one place:
attrition_rate - the proportion of cases that will be lost to follow-up.
\(\alpha\) (alpha) - The chance of making a type I error; the chance of claiming there is an effect when there actually is none – a false alarm. Traditionally 0.05 (= p-value significance threshold). Also known as the chance of making a fool of yourself.
\(\beta\) (beta) - The chance of making a type II error where you falsely claim there is not an effect when there actually is – missed detection. Known as power or sensitivity. Traditionally 0.8.
Blocking - Participants put into blocks/strata and then randomly assigned. Helps reduce sampling variability, and assures that subgroups are available for separate analysis. In practice, blocking rarely has any downsides.
Cluster random assignment - Random assignment not done on the individual level, but instead assigned on the group level. Consider cities or classrooms randomly getting a treatment, instead of each person individually being assigned. Generally bad for power although necessary.
Diagnosand - A summary of the distribution of a diagnostic statistic. (i.e., the expected value from looping the design)
ES - Effect size. Measures the strength of the phenomenon of interest; solely magnitude based - does not depend on sample size.
N - The sample size. Larger N can lead to smaller ES being deemed significant, although the smaller ES is not always interesting in practical terms (imagine something that makes 1 million people 0.0001% happier – a large N can cause small effects to be deemed significant, regardless of it’s actual interest).
\(\rho\) (rho) - The correlation between time 1 and time 2. When using a within subjects design, intuitively you would expect some stability in peoples’ preferences overtime.
t1 - Time 1. Can be modified for any time period e.g., tx
Z - Condition. In this example I use one this 1 to represent the treatment group, and 0 to represent the control. Cf. arms
Data and Design Fabrication
DeclareDesign operates using a MIDA framework. The functions for each of the MIDA letters are shown below. Simply put MIDA stands for:
- M - model of the world
declare_population
declare_potentional_outcomes
- I - inquiry
- D - data strategy
declare_sampling
declare_assignment
declare_reveal
declare_step
- A - answer strategy
It is important to note that the order of the functions is for the most part necessary and that they all follow the declare_* pattern.
First Example: The effect of coffee on productivity
Let’s begin with a simple design. Let’s say we have a factory with 100 workers who make widgets. I split my workers into two groups, and give one group coffee. I want to see how coffee impacts the productivity of the second group.
We start by loading the following packages:
rm(list = ls())
set.seed(13)
library(DeclareDesign)
library(DesignLibrary)
library(knitr)
library(DT)
library(tidyverse)
n_sims = 100
M - Model of the World
Terms
To begin, let’s define our terms. Intuitively we know that we’ll need N, the number of people, but we’ll also need to define the effect size we think coffee will have. Notice I am making these assumptions up-front, and defining them as objects. In my opinion this is the most transparent way to use DeclareDesign and is also the most intuitive. Please note that more complex designs will have a lot more terms to define.
(Note: I keep adding to a df1 at the bottom of my chunks, but this is only for visualization purposes – it is not part of the DeclareDesign framework, and you won’t need to do this)
N <- 100 # sample size, 100 factory workers
ate <- 4 # average treatment effect I think coffee will have on productivity
sd_1 <- 5 # SD of worker productivity before intervention
df1 <- ""
Population
I start by defining the population. For me this is literally everyone in the factory, all 100 people, but thinking ahead, imagine I was studying Facebook users, I would theoretically use 360 million to represent everyone on Facebook. In reality my sample wouldn’t involve every user on Facebook, but that would still be my population. Next, notice I manually assert the distributions for widgets because not everyone makes the same number of widgets each day.
The purpose of this step is to define the that groups will be compared before being given the treatment, and for me this is one group. This step is really creating similar to making a regular dataframe with the exception of the special N argument which defines the number of rows, and creates a column to label the participants.
# M: Model
population <-
declare_population(N = N,
widgets_t1 = rnorm(n = N, # 100 draws from a normal distribution
mean = 50, # before intervention, my workers average of 50 widgets a day
sd = sd_1) %>% round(., 2)) # sd of worker productivity
df1 <- population()
datatable(df1)
Here we see the each of the 100 workers makes approximately 50 widgets a day.
The key points from this declare_population are:
The number of rows in your population will be equal to N.
Your final sample will be equal to or less than N
One column for each measure that you expect that affects or will be affected by treatment
Outcomes
The purpose of this step is to define the two outcomes using each of the groups (i.e. treatment/control). The treatment group will have the ATE added to their distribution. Again this step is just adding new columns to our data frames.
potential_outcomes <- declare_potential_outcomes(Y_Z_0 = widgets_t1, # the control group, here we expect business as usual
Y_Z_1 = widgets_t1 + ate) # the treatment group, business as usual + the productivity boost from coffee
df1 <- potential_outcomes(df1)
datatable(df1)
Notice above how everyone has a “potential outcome” for that would happen if they were treated or not. Since we expect to split our design into two groups, in real life we wouldn’t actually get to observe both of these potential outcomes.
The key points from this declare_potential_outcomes are:
I - Inquiry
Estimand
The purpose of this step is to define, how you define, your ATE – this is your TRUE TREATMENT EFFECT). Because we are simulating data, we have somewhat of a “perfect world”, where all participants get the treatment perfectly and no one is missing (yet), we will get exactly the same ATE as we defined originally.
The estimand below is the simplest estimand, but as Jasper points out that there are conditional ATEs, or special ATEs for when you have categorical outcomes which are necessary for more complex designs.
estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))
kable(estimand(df1))
The key points from this declare_estimand are:
D - Data Strategy
Assignment
There are two important arguments in the declare_assignment function which are prob (the chance of getting picked) and assignment_variable (the name of the variable being assigned).
I think it is really important to point out that this function defaults the name to assignment_variable to Z which is essential. What this means is that the default use of declare_assignment is to declare conditions, although this function has other purposes it used for also (e.g., attrition).
We are declaring what % of people get assigned to the treatment group. Here we create the Z and Z_cond_prob columns
assignment <- declare_assignment(prob = .5,
assignment_variable = Z) # by the relevant part of columns to assign from, default assignment_variable = Z
df1 <- assignment(df1)
datatable(df1)
Above Z represents the condition the participant is assigned to, Z_cond_prob the chance that the participant gets assigned to Z. This is of course could be a lot more complicated with more conditions, and unequal assignment levels.
The key points from this declare_assignment are:
Actually put people in groups (Reveal)
Next we actually give people the treatment. Here we create the Y column, which is our observed outcome.
reveal <- declare_reveal(outcome_variables = Y) # here we use the prefix before the Z part begins, in our case this is simply Y
df1 <- reveal(df1)
datatable(df1)
Note that the way the chunk above is written, you can conceptually you can think of declare_reveal like:
df1 %>%
mutate(Y = if_else(Z == 1, Y_Z_1, Y_Z_0))
A - Answer Strategy
This is the actual modelling we’re using
The important points here are that:
- Our estimands argument correspond with the names of the estimands we made before,
- The model can be anything that works with the
broom package, and can be modified to work with any model that outputs a dataframe. Here we use estimatr::lm_robust as our model.
The model being shown here is simply using treatment assignment (Z) to predict productivity outcomes (Y)
estimated_effect <- declare_estimator(
Y ~ Z,
model = lm_robust,
estimand = c("ATE"), # also just `estimand` (the object we defined above) would work
#subset = R == 1, # only keep people not lost to attrition
label = "Treatment Effect"
)
The key points from declare_estimator is:
This is where you define your model. You know, the same one you are going to use in the your paper.
You can also use this step to try out multiple models to see which one works best for your design.
Note: If you are planning on testing multiple models, it makes sense to create a different delcare_estimator function for each of the models you are planning on testing. For example for our current design, I use linear regression, but I could have opted for a correlation or a t-test if I had wanted to.
Put it all together
Putting all the pieces we just wrote together we get a design that looks like this. We defined all these pieces individually, but of course we could have done this all together ggplot style, although I personally prefer to do one step at a time. From now on the combination of all of those parts will be called our our_design
our_design <- population +
potential_outcomes +
estimand +
assignment +
reveal +
estimated_effect
Here I paste all the code from above together just incase you would like to test it as a whole:
our_design <-
declare_population(N = N,
widgets_t1 = rnorm(n = N,
mean = 50,
sd = sd_1) %>% round(., 2)) +
declare_potential_outcomes(Y_Z_0 = widgets_t1,
Y_Z_1 = widgets_t1 + ate) +
declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) +
declare_assignment(prob = .5,
assignment_variable = Z) +
declare_reveal(outcome_variables = Y) +
declare_estimator(Y ~ Z,
model = lm_robust,
estimand = c("ATE"),
label = "Treatment Effect"
)
Model Evaluation
The following chunks each showcase one model evaluating function. Note that sometimes I use multiple functions together because that is best way to use that function. I am piping some functions into head and kable for visualization purposes.
To get take a peek at your design
Note: This is only a single simulation of your data with draw_data
our_design %>%
draw_data() %>% # draw single simulation
head(5) %>%
kable()
| 001 |
53.47 |
53.47 |
57.47 |
0 |
0.5 |
53.47 |
| 002 |
50.83 |
50.83 |
54.83 |
0 |
0.5 |
50.83 |
| 003 |
45.60 |
45.60 |
49.60 |
1 |
0.5 |
49.60 |
| 004 |
40.44 |
40.44 |
44.44 |
0 |
0.5 |
40.44 |
| 005 |
49.93 |
49.93 |
53.93 |
1 |
0.5 |
53.93 |
Take a peek at your unbiased estimands
draw_estimands
our_design %>%
draw_estimands() %>%
kable()
See estimates of model
draw_estimates
our_design %>%
draw_estimates() %>% #
kable()
| Treatment Effect |
Z |
5.1136 |
0.9793757 |
5.221285 |
1e-06 |
3.170061 |
7.057139 |
98 |
Y |
ATE |
To see the aggregated summary of your design
diagnose_design will loop the creation of your design, making it the most resource intensive step described. This shows the details of your design. This is the single most important function for the evaluation of models
Here I use sims = 100 only for demonstration purposes, but in real life you would likely want to use 500 or higher.
diagnosis1 <- our_design %>%
diagnose_design(sims = n_sims)
OPTIONAL ASIDE: Another way to to simulate data in DeclareDesign is with the simulate_design function. I don’t showcase it here but it is essentially equivalent to our_design %>% diagnose_design %>% get_simulations()
What are all the stats in the diagnosand anyway?
Descriptions from pg 7, Formulas from here.
| design_label |
. |
|
|
| estimand_label |
ATE |
|
|
| estimator_label |
Treatment Effect |
|
|
| term |
Z |
|
|
| bias |
0.0455959999999999 |
Expected difference between estimate and estimand |
mean(estimate - estimand) |
| se(bias) |
0.115653978121041 |
|
|
| rmse |
1.10361088903653 |
Root mean squared error |
sqrt(mean((estimate - estimand) ^ 2)) |
| se(rmse) |
0.0843282077977243 |
|
|
| power |
0.98 |
Probability of rejecting null hypothesis of no effect |
mean(p.value < alpha) |
| se(power) |
0.0140072131490179 |
|
|
| coverage |
0.92 |
Probability that conf. interval contains estimand |
mean(estimand <= conf.high & estimand >= conf.low) |
| se(coverage) |
0.027762339144018 |
|
|
| mean_estimate |
4.045596 |
Mean of the estimates |
mean(estimate) |
| se(mean_estimate) |
0.115653978121041 |
|
|
| sd_estimate |
1.10822362144802 |
|
|
| se(sd_estimate) |
0.0835280827049709 |
|
|
| mean_se |
0.993976024671408 |
|
|
| se(mean_se) |
0.0078903273521745 |
|
|
| type_s_rate |
0 |
Probability that a significant estimate has incorrect sign |
mean((sign(estimate) != sign(estimand))[p.value < alpha]) |
| se(type_s_rate) |
0 |
|
|
| mean_estimand |
4 |
|
|
| se(mean_estimand) |
0 |
|
|
| n_sims |
100 |
|
|
OPTIONAL ASIDE - What if we don’t want all these stats or want to create a new stat? declare_diagnosands
- We can drop stats with the
subtract argument
- Create stats by specifying out own argument
- Or keep stats with the
select argument
# keep only bias, rmse, & type_s_rate
my_diagnosands <- declare_diagnosands(select = c(bias, rmse, type_s_rate))
# drop type_s_rate
my_diagnosands <- declare_diagnosands(subtract = type_s_rate)
# You can add your own diagnosands in addition to or instead of the defaults e.g.,
my_diagnosands <-
declare_diagnosands(median_bias = median(estimate - estimand))
# Keep only the stat we defined
my_diagnosands <-
declare_diagnosands(median_bias = median(estimate - estimand),
keep_defaults = FALSE)
To apply:
our_design %>%
diagnose_design(sims = n_sims, diagnosands = my_diagnosands) %>%
get_diagnosands() %>%
kable()
| . |
ATE |
Treatment Effect |
Z |
0.0849 |
0.1166837 |
100 |
To see the 100 individual simulations that made the aggregated summary
get_simulations
diagnosis1 %>%
get_simulations() %>%
datatable()
get_simulations also has nice properties for plotting e.g.,
get_simulations(diagnosis1) %>%
ggplot(aes(estimate)) +
geom_density() +
geom_vline(xintercept = ate, col = "red") +
labs(title = "Density of Effect Size Estimates",
subtitle = "Real ATE shown in Red")

Change your assumptions
You can test different assumptions from your your DeclareDesign design using redesign the function. redesign takes the global values you are using within the function and gives them new values The first argument is your design, the second are the new assumptions. Also note that it makes sense in most cases to pipe this function into diagnose_design or get_simulations. Here I experiment with changing the sample size:
change_the_parameters <-
our_design %>%
redesign(N = list(30, 50, 75, 100), # note that I define N as a variable in the first chunk in this document
ate = list(1, 2, 3, 4)) %>% # what if the ATE is lower than I originally expected?
diagnose_design(sims = n_sims) # note this does sims for each of combination of N's and ate's
change_the_parameters %>%
get_diagnosands() %>%
kable()
| design_1 |
30 |
1 |
ATE |
Treatment Effect |
Z |
-0.0034533 |
0.1969732 |
1.8343939 |
0.1190232 |
0.08 |
0.0269972 |
0.96 |
0.0186742 |
0.9965467 |
0.1969732 |
1.843632 |
0.1204478 |
1.840460 |
0.0225472 |
0.0000000 |
0.0000000 |
1 |
0 |
100 |
| design_2 |
50 |
1 |
ATE |
Treatment Effect |
Z |
-0.0907840 |
0.1471181 |
1.4435108 |
0.1057918 |
0.08 |
0.0278414 |
0.95 |
0.0226816 |
0.9092160 |
0.1471181 |
1.447911 |
0.1066926 |
1.411619 |
0.0148110 |
0.1250000 |
0.1458455 |
1 |
0 |
100 |
| design_3 |
75 |
1 |
ATE |
Treatment Effect |
Z |
0.0292023 |
0.1240253 |
1.2038179 |
0.0985437 |
0.13 |
0.0366268 |
0.90 |
0.0292110 |
1.0292023 |
0.1240253 |
1.209526 |
0.0997754 |
1.141634 |
0.0120812 |
0.0769231 |
0.0833045 |
1 |
0 |
100 |
| design_4 |
100 |
1 |
ATE |
Treatment Effect |
Z |
0.0946080 |
0.1112108 |
1.0419003 |
0.0758989 |
0.21 |
0.0409996 |
0.93 |
0.0233158 |
1.0946080 |
0.1112108 |
1.042823 |
0.0778319 |
0.981938 |
0.0061373 |
0.0476190 |
0.0470524 |
1 |
0 |
100 |
| design_5 |
30 |
2 |
ATE |
Treatment Effect |
Z |
-0.0960000 |
0.1855930 |
1.8885777 |
0.1454002 |
0.21 |
0.0394061 |
0.92 |
0.0282004 |
1.9040000 |
0.1855930 |
1.895638 |
0.1464280 |
1.797472 |
0.0241419 |
0.0476190 |
0.0426354 |
2 |
0 |
100 |
| design_6 |
50 |
2 |
ATE |
Treatment Effect |
Z |
-0.1966960 |
0.1504968 |
1.4405168 |
0.0788561 |
0.23 |
0.0455133 |
0.96 |
0.0162692 |
1.8033040 |
0.1504968 |
1.434214 |
0.0827779 |
1.405005 |
0.0147104 |
0.0000000 |
0.0000000 |
2 |
0 |
100 |
| design_7 |
75 |
2 |
ATE |
Treatment Effect |
Z |
0.0700125 |
0.1233924 |
1.2256098 |
0.0857311 |
0.47 |
0.0533049 |
0.93 |
0.0246337 |
2.0700125 |
0.1233924 |
1.229773 |
0.0881893 |
1.152958 |
0.0084375 |
0.0000000 |
0.0000000 |
2 |
0 |
100 |
| design_8 |
100 |
2 |
ATE |
Treatment Effect |
Z |
-0.0518520 |
0.1070171 |
1.0119394 |
0.0632917 |
0.49 |
0.0449847 |
0.97 |
0.0176143 |
1.9481480 |
0.1070171 |
1.015701 |
0.0635915 |
1.003862 |
0.0066878 |
0.0000000 |
0.0000000 |
2 |
0 |
100 |
| design_9 |
30 |
3 |
ATE |
Treatment Effect |
Z |
0.0121733 |
0.1507504 |
1.6300766 |
0.1327646 |
0.30 |
0.0454485 |
0.94 |
0.0230228 |
3.0121733 |
0.1507504 |
1.638243 |
0.1334250 |
1.838747 |
0.0237285 |
0.0000000 |
0.0000000 |
3 |
0 |
100 |
| design_10 |
50 |
3 |
ATE |
Treatment Effect |
Z |
-0.0329440 |
0.1192950 |
1.2328144 |
0.0649279 |
0.53 |
0.0496715 |
1.00 |
0.0000000 |
2.9670560 |
0.1192950 |
1.238583 |
0.0652218 |
1.414221 |
0.0125305 |
0.0000000 |
0.0000000 |
3 |
0 |
100 |
| design_11 |
75 |
3 |
ATE |
Treatment Effect |
Z |
-0.0647430 |
0.1175204 |
1.0791808 |
0.0768960 |
0.73 |
0.0457909 |
0.97 |
0.0174553 |
2.9352570 |
0.1175204 |
1.082664 |
0.0794593 |
1.131993 |
0.0106370 |
0.0000000 |
0.0000000 |
3 |
0 |
100 |
| design_12 |
100 |
3 |
ATE |
Treatment Effect |
Z |
-0.2615480 |
0.0984310 |
0.9822463 |
0.0753796 |
0.78 |
0.0372915 |
0.94 |
0.0225798 |
2.7384520 |
0.0984310 |
0.951554 |
0.0629973 |
1.002502 |
0.0061493 |
0.0000000 |
0.0000000 |
3 |
0 |
100 |
| design_13 |
30 |
4 |
ATE |
Treatment Effect |
Z |
-0.1232600 |
0.2104620 |
1.8947399 |
0.1465375 |
0.57 |
0.0536397 |
0.97 |
0.0167124 |
3.8767400 |
0.2104620 |
1.900251 |
0.1374858 |
1.821036 |
0.0246287 |
0.0000000 |
0.0000000 |
4 |
0 |
100 |
| design_14 |
50 |
4 |
ATE |
Treatment Effect |
Z |
-0.1740480 |
0.1392170 |
1.3003914 |
0.0928401 |
0.76 |
0.0469010 |
0.96 |
0.0218895 |
3.8259520 |
0.1392170 |
1.295183 |
0.0890643 |
1.415701 |
0.0163220 |
0.0000000 |
0.0000000 |
4 |
0 |
100 |
| design_15 |
75 |
4 |
ATE |
Treatment Effect |
Z |
0.0961406 |
0.1155925 |
1.1755980 |
0.0837818 |
0.94 |
0.0218005 |
0.95 |
0.0199990 |
4.0961406 |
0.1155925 |
1.177563 |
0.0856972 |
1.154123 |
0.0092089 |
0.0000000 |
0.0000000 |
4 |
0 |
100 |
| design_16 |
100 |
4 |
ATE |
Treatment Effect |
Z |
0.1921140 |
0.1000403 |
0.9878416 |
0.0610962 |
1.00 |
0.0000000 |
0.97 |
0.0163287 |
4.1921140 |
0.1000403 |
0.973862 |
0.0601903 |
1.000048 |
0.0070673 |
0.0000000 |
0.0000000 |
4 |
0 |
100 |
OPTIONAL ASIDE - Why use list() in redesign instead of c()?
Good practice in DeclareDesign is to use list() in redesign and it’s sister function expand_design instead of c(). The distinction is subtle, but the reason why is there are some cases where we would want a to use a vector within our design [think list(c("A", "B", "C")) vs c(c("A", "B". "C"))] and a normal vector won’t cut it. Using c() won’t cause problems in most cases, but this is an edge case to be aware of.
get_diagnosands(change_the_parameters) %>%
mutate(N = factor(N),
ate = factor(ate)) %>%
ggplot(aes(N, power, fill = ate)) +
geom_bar(position = "dodge", stat = "identity") +
geom_errorbar(aes(ymin = power - 1.96*`se(power)`,
ymax = power + 1.96*`se(power)`,
width = .3),
position = position_dodge(width = .9)) +
geom_hline(yintercept = 0.8) +
labs(title = "Power at each N and ate",
subtitle = "95% Confidence Intervals")

Notice how if my original estimate of coffee on number of widgets was a little too high, or there were some workers in my factory who didn’t show up my experiment likely wouldn’t work! This could be huge!
---
title: "Simulating Research Designs for Social Science"
subtitle: "DeclareDesign - An Overview & Beginner Example"
author: "John-Henry Pezzuto"
output: 
  rmarkdown::html_notebook:
    smart: false
    theme: spacelab
    number_sections: no
    toc: true
    toc_float:
      collapsed: false
    toc_depth: 4
---


Imagine you had to do an experiment with a really hard to get population, for example let's say the CEOs of Fortune 500 companies. These are people with super tight-schedules and it's unlikely you'll ever get to work with them again if this experiment goes wrong. Stakes are high. How do you know your experiment will work? You don't want to do something too mundane, but at the same time you want to do something safe enough that you're likely to get significant results with. How do you prepare? One way to approach this problem is to think about all the possible outcomes and ways of analyzing this experiment, and consider the probability of best-case and worst-case outcomes, and work off of that.

[DeclareDesign](https://cran.r-project.org/web/packages/DeclareDesign/DeclareDesign.pdf) is the tool to do just that [(Blair, Coppock, Cooper And Humphreys, 2019)](https://declaredesign.org/declare.pdf). DeclareDesign, winner of the [prestigious Rosenthal Prize for Open Social Science](http://newsroom.ucla.edu/dept/faculty/political-scientist-wins-2016-leamer-rosenthal-prize-for-open-social-science) & [Society for Political Methodology Statistical Software Award](https://www.cambridge.org/core/membership/spm/about-us/awards/statistical-software-award/statistical-software-award-2019), is a tool for creating a fake dataset, looping the dataset a few thousand times with some random noise, and then changing the initial assumptions and comparing the results. The purpose of DeclareDesign is to determine how subtle methodological changes affect the results of an experiment. This is hugely important if you are about to invest a huge amount of time and money into an experiment that may not pan-out.

Of course, learning to simulate data is tricky and there is a learning-curve to doing it. I am going to do my best in this guide to walk new users through difficulties and make this as easy as possible. DeclareDesign closely follows the framework laid out by [Gerber & Green's Field Experiment handbook](https://www.amazon.com/Field-Experiments-Design-Analysis-Interpretation/dp/0393979954). I think this is the single best resource to look at if you find any of the notation confusing. For a more hands-on approach for DeclareDesign, I really like [this gentle introduction by Dorothy Bishop](http://discuss.declaredesign.org/t/gentle-introduction-to-declaredesign-by-dorothy-bishop/51). Another helpful resource is [Kevin Munger's repository from his workshop](https://github.com/kmunger/PRESS_power_analysis), and lastly the [DeclareDesign book](https://book.declaredesign.org/) looks fantastic, but as I write this, it's still a work in progress.

One thing I found difficult as a new DeclareDesign user is that because the tools are so powerful, the documentation often can't define every single option available. In this guide, I will do my best to address these ambiguities. I also try to be clear about what order the steps come in, but with that said, it is important to emphasize that I am only teaching one way to use DeclareDesign and most steps can be done in different orders (sometimes with differing results!). Most of the DeclareDesign functions below can have their internals replaced with other "handlers", and allow more flexibility than I am letting on. 

In my perspective, DeclareDesign can be thought of in 2 parts. __1) The data fabrication step__ (this is the bulk of the code),  and __2) the evaluation step__, where users can change initial assumptions and see how well their design actually worked (this is the computationally intensive part of the code).

The cool thing about the DeclareDesign data fabrication step is that it allows for (as far as I know) _any design imaginable_, which is why there are so many steps. However, because most researchers often need to work with simple/common designs, there are some ready-made designs which are available in the [DesignLibrary](https://cran.r-project.org/web/packages/DesignLibrary/DesignLibrary.pdf), which users can also contribute to. __As far as I can tell, most simple psychology lab-experiments will be able to get the entirety of their needs met by DesignLibrary.__ Although I do think you will develop more intuition for the process by using the individual parts of DeclareDesign. I will expand on DesignLibrary in a future article.


## Brief Overview of Power Analysis / DeclareDesign Terms

Before expanding on DeclareDesign, it makes sense to have a few terms defined. I won't be using all these terms in the design I share, but it is still useful to have them defined in one place:

__attrition_rate__ - the proportion of cases that will be lost to follow-up.

__$\alpha$ (alpha)__ - The chance of making a type I error; the chance of claiming there is an effect when there actually is none -- a false alarm. Traditionally 0.05 (= p-value significance threshold). Also known as the [chance of making a fool of yourself](https://royalsocietypublishing.org/doi/full/10.1098/rsos.140216).

__$\beta$ (beta)__ - The chance of making a type II error where you falsely claim there is not an effect when there actually is -- missed detection. Known as power or sensitivity. Traditionally 0.8. 

__Blocking__ - Participants put into blocks/strata and then randomly assigned. Helps reduce sampling variability, and assures that subgroups are available for separate analysis. In practice, blocking rarely has any downsides.

__Cluster random assignment__ - Random assignment not done on the individual level, but instead assigned on the group level. Consider cities or classrooms randomly getting a treatment, instead of each person individually being assigned. Generally bad for power although necessary.

__Diagnosand__ - A summary of the distribution of a diagnostic statistic. (i.e., the expected value from looping the design)

__ES__ - Effect size. Measures the strength of the phenomenon of interest; solely magnitude based -  does not depend on sample size.

__N__ - The sample size. Larger N can lead to smaller ES being deemed significant, although the smaller ES is not always interesting in practical terms (imagine something that makes 1 million people 0.0001% happier -- a large N can cause small effects to be deemed significant, regardless of it's actual interest).

__$\rho$ (rho)__ - The correlation between time 1 and time 2. When using a within subjects design, intuitively you would expect some stability in peoples' preferences overtime.

__t1__ - Time 1. Can be modified for any time period e.g., tx 

__Z__ - Condition. In this example I use one this 1 to represent the treatment group, and 0 to represent the control. Cf. arms


## Data and Design Fabrication
DeclareDesign operates using a [MIDA framework](https://declaredesign.org/declare.pdf). The functions for each of the MIDA letters are shown below. Simply put MIDA stands for:

* __M__ - model of the world
  + `declare_population`
  + `declare_potentional_outcomes`
* __I__ - inquiry
  + `declare_estimand`
* __D__ - data strategy
  + `declare_sampling`
  + `declare_assignment`
  + `declare_reveal`
  + `declare_step`
* __A__ - answer strategy 
  + `declare_estimator`

It is important to note that the order of the functions is for the most part necessary and that they all follow the `declare_*` pattern. 

![A simplified example of a DeclareDesign ecosystem. In a real design, you may need to change the order of the Declare your design steps (shown in blue), or use a function multiple times to properly represent your design (design functions that may be used multiple times are represented with bold borders).](DeclareDesign.png)

# First Example: The effect of coffee on productivity

Let's begin with a simple design. Let's say we have a factory with 100 workers who make widgets. I split my workers into two groups, and give one group coffee. I want to see how coffee impacts the productivity of the second group. 

```{r, echo=FALSE}
knitr::opts_chunk$set(message = FALSE, warning = FALSE)
```


We start by loading the following packages:
```{r  setup}
rm(list = ls())
set.seed(13)
library(DeclareDesign)
library(DesignLibrary)
library(knitr)
library(DT)
library(tidyverse)

n_sims = 100
```

### __M__ - Model of the World
#### Terms
To begin, let's define our terms. Intuitively we know that we'll need `N`, the number of people, but we'll also need to define the effect size we think coffee will have. Notice I am making these assumptions up-front, and defining them as objects. In my opinion this is the most transparent way to use DeclareDesign and is also the most intuitive. Please note that more complex designs will have a lot more terms to define.

(Note: I keep adding to a df1 at the bottom of my chunks, but this is only for visualization purposes -- it is not part of the DeclareDesign framework, and you won't need to do this)
```{r terms}
N <- 100 # sample size, 100 factory workers
ate <- 4 # average treatment effect I think coffee will have on productivity
sd_1 <- 5 # SD of worker productivity before intervention

df1 <- ""
```

#### Population
I start by defining the population. For me this is literally everyone in the factory, all 100 people, but thinking ahead, imagine I was studying Facebook users, I would theoretically use 360 million to represent everyone on Facebook. In reality my sample wouldn't involve every user on Facebook, but that would still be my population. Next, notice I __manually assert the distributions__ for widgets because not everyone makes the same number of widgets each day. 

__The purpose of this step is to define the that groups will be compared before being given the treatment, and for me this is one group__. This step is really creating similar to making a regular dataframe with the exception of the special __`N`__ argument which defines the number of rows, and creates a column to label the participants.
```{r population}
# M: Model

population <- 
  declare_population(N = N,
                     widgets_t1 = rnorm(n = N, # 100 draws from a normal distribution
                     mean = 50, # before intervention, my workers average of 50 widgets a day
                     sd = sd_1) %>% round(., 2)) # sd of worker productivity


df1 <- population()

datatable(df1)
```

Here we see the each of the 100 workers makes approximately 50 widgets a day.

The key points from this `declare_population` are:

* The number of rows in your population will be equal to N.

* Your final sample will be equal to or less than N

* One column for _each measure that you expect that affects or will be affected by treatment_



#### Outcomes
__The purpose of this step is to define the two outcomes using each of the groups (i.e. treatment/control)__. The treatment group will have the ATE added to their distribution. Again this step is just adding new columns to our data frames.
```{r outcome, include=TRUE}
potential_outcomes <- declare_potential_outcomes(Y_Z_0 = widgets_t1, # the control group, here we expect business as usual 
                                                 Y_Z_1 = widgets_t1 + ate) # the treatment group, business as usual + the productivity boost from coffee
                                                 

df1 <- potential_outcomes(df1)

datatable(df1)
```
Notice above how everyone has a "potential outcome" for that would happen _if_ they were treated or not. Since we expect to split our design into two groups, in real life we wouldn't actually get to observe both of these potential outcomes.

The key points from this `declare_potential_outcomes` are:  

* This is the step where our treatment effect gets included in the model

* Everyone has a outcome that _would_ happen depending on which group they were placed in


### __I__ - Inquiry

#### Estimand

__The purpose of this step is to define, how you define, your ATE -- this is your TRUE TREATMENT EFFECT)__. Because we are simulating data, we have somewhat of a "perfect world", where all participants get the treatment perfectly and no one is missing (yet), we will get exactly the same ATE as we defined originally. 

The estimand below is the simplest estimand, but as Jasper points out that there are [conditional ATEs](https://pdfs.semanticscholar.org/8714/260129e51abb09fd89d6ff79065af17bb106.pdf), or special ATEs for when you have [categorical outcomes](https://arxiv.org/pdf/1501.01234.pdf) which are necessary for more complex designs.

```{r estimand, include=TRUE}
estimand <- declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0))

kable(estimand(df1))
```
The key points from this `declare_estimand` are:

* This is our "dream world" treatment effect(s)

* This will be important later on for evaluating our model


### __D__ - Data Strategy

#### Assignment
There are two important arguments in the `declare_assignment` function which are `prob` (the chance of getting picked) and `assignment_variable` (the name of the variable being assigned).

I think it is really important to point out that this function defaults the name to `assignment_variable` to Z which is essential. __What this means is that the default use of `declare_assignment` is to declare conditions__, although this function has other purposes it used for also (e.g., attrition). 


__We are declaring what % of people get assigned to the treatment group. Here we create the `Z` and `Z_cond_prob` columns__
```{r assignment, include=TRUE}
assignment <- declare_assignment(prob = .5, 
                                 assignment_variable = Z) # by the relevant part of columns to assign from, default assignment_variable = Z
df1 <- assignment(df1)
datatable(df1)
```
Above `Z` represents the condition the participant is assigned to, `Z_cond_prob` the chance that the participant gets assigned to `Z`. This is of course could be a lot more complicated with more conditions, and unequal assignment levels.

The key points from this `declare_assignment` are:

* This is where we state the probablity of being assigned to a condition

* Subjects are not "literally" placed into groups yet


#### _Actually_ put people in groups (Reveal)

Next we _actually_ give people the treatment. Here we create the `Y` column, which is our observed outcome.

```{r actual_assignment, include=TRUE}
reveal <- declare_reveal(outcome_variables = Y) # here we use the prefix before the Z part begins, in our case this is simply Y


df1 <- reveal(df1)


datatable(df1)
```

Note that the way the chunk above is written, you can conceptually you can think of `declare_reveal` like:
```
df1 %>% 
  mutate(Y = if_else(Z == 1, Y_Z_1, Y_Z_0))
```

### __A__ - Answer Strategy

__This is the actual modelling we're using__

The important points here are that:  

* Our estimands argument correspond with the names of the estimands we made before,
* The model can be anything that works with the `broom` package, and can be modified to work with any model that outputs a dataframe. Here we use `estimatr::lm_robust` as our model.


The model being shown here is simply using treatment assignment (Z) to predict productivity outcomes (Y) 
```{r estimated_effect}
estimated_effect <- declare_estimator(
  Y ~ Z,
  model = lm_robust,
  estimand = c("ATE"), # also just `estimand` (the object we defined above) would work
  #subset = R == 1, # only keep people not lost to attrition
  label = "Treatment Effect"
)
```

The key points from `declare_estimator` is:

* This is where you define your model. You know, the same one you are going to use in the your paper.

* You can also use this step to try out multiple models to see which one works best for your design.


Note: __If you are planning on testing multiple models, it makes sense to create a different `delcare_estimator` function for each of the models you are planning on testing__. For example for our current design, I use linear regression, but I could have opted for a correlation or a t-test if I had wanted to.



#### Put it all together
Putting all the pieces we just wrote together we get a design that looks like this. We defined all these pieces individually, but of course we could have done this all together ggplot style, although I personally prefer to do one step at a time. From now on the combination of all of those parts will be called our ___our\_design___

```{r our_design}
our_design <- population + 
              potential_outcomes + 
              estimand + 
              assignment + 
              reveal + 
              estimated_effect 
```


Here I paste all the code from above together just incase you would like to test it as a whole:
```{r, our_design_2, eval=FALSE, include=T}
our_design <- 
  declare_population(N = N, 
                     widgets_t1 = rnorm(n = N, 
                                        mean = 50,
                                        sd = sd_1) %>% round(., 2)) +
  declare_potential_outcomes(Y_Z_0 = widgets_t1, 
                              Y_Z_1 = widgets_t1 + ate) +
  declare_estimand(ATE = mean(Y_Z_1 - Y_Z_0)) +
  declare_assignment(prob = .5, 
                     assignment_variable = Z) +
  declare_reveal(outcome_variables = Y) + 
  declare_estimator(Y ~ Z,
                    model = lm_robust,
                    estimand = c("ATE"), 
                    label = "Treatment Effect"
)
```

## Model Evaluation

The following chunks each showcase __one__ model evaluating function. Note that sometimes I use multiple functions together because that is best way to use that function. I am piping some functions into `head` and `kable` for visualization purposes.

#### To get take a peek at your design
Note: This is only a single simulation of your data with `draw_data`
```{r, first_view, include=TRUE, }
our_design %>% 
  draw_data() %>% # draw single simulation
  head(5) %>% 
  kable()
```

#### Take a peek at your unbiased estimands
`draw_estimands`
```{r draw_estimands}
our_design %>% 
  draw_estimands() %>%
  kable()
```

#### See estimates of model
`draw_estimates`
```{r, draw_estimates}
our_design %>% 
  draw_estimates() %>% # 
  kable()
```

### To see the aggregated summary of your design
`diagnose_design` will loop the creation of your design, making it the most resource intensive step described. This shows the details of your design. __This is the single most important function for the evaluation of models__

Here I use `sims = `r n_sims`` only for demonstration purposes, but in real life you would likely want to use 500 or higher.
```{r, diagnose_design}
diagnosis1 <- our_design %>% 
  diagnose_design(sims = n_sims)
```

__OPTIONAL ASIDE:__ Another way to to simulate data in DeclareDesign is with the `simulate_design` function. I don't showcase it here but it is essentially equivalent to `our_design %>% diagnose_design %>% get_simulations()`


##### To see the aggregated summary in data.frame form (diagnosands)
`get_diagnosands`. This is really useful for making plots.
```{r, get_diagnosands}
diagnosis1 %>% 
  get_diagnosands() %>%  
  kable()
```

##### What are all the stats in the diagnosand anyway?
Descriptions from [pg 7](https://declaredesign.org/declare.pdf), Formulas from [here](https://www.rdocumentation.org/packages/DeclareDesign/versions/0.14.0/topics/diagnosand_handler).
```{r diagnosand_table, echo=FALSE}
options(knitr.kable.NA = '')
get_diagnosands(diagnosis1) %>%
  slice(1) %>% 
  gather(value = "Estimate") %>% 
  mutate(Description = case_when(key == "bias" ~ "Expected difference between estimate and estimand",
                                key == "rmse" ~ "Root mean squared error",
                                key == "power" ~ "Probability of rejecting null hypothesis of no effect",
                                key == "coverage" ~ "Probability that conf. interval contains estimand",
                                key == "type_s_rate" ~ "Probability that a significant estimate has incorrect sign",
                                key == "mean_estimate" ~ "Mean of the estimates"),
         Formula = case_when(key == "bias" ~ "mean(estimate - estimand)",
                                key == "rmse" ~ "sqrt(mean((estimate - estimand) ^ 2))",
                                key == "power" ~ "mean(p.value < alpha)",
                                key == "coverage" ~ "mean(estimand <= conf.high & estimand >= conf.low)",
                                key == "type_s_rate" ~ "mean((sign(estimate) != sign(estimand))[p.value < alpha])",
                                key == "mean_estimate" ~ "mean(estimate)")) %>% 
  kable()
  
```

__OPTIONAL ASIDE - What if we don't want all these stats or want to create a new stat?__
`declare_diagnosands`  

* We can drop stats with the `subtract` argument
* Create stats by specifying out own argument
* Or keep stats with the `select` argument
```{r}
# keep only bias, rmse, & type_s_rate
my_diagnosands <- declare_diagnosands(select = c(bias, rmse, type_s_rate))

# drop type_s_rate
my_diagnosands <- declare_diagnosands(subtract = type_s_rate)

# You can add your own diagnosands in addition to or instead of the defaults e.g.,
my_diagnosands <-
  declare_diagnosands(median_bias = median(estimate - estimand))

# Keep only the stat we defined
my_diagnosands <-
   declare_diagnosands(median_bias = median(estimate - estimand),
                       keep_defaults = FALSE)
```

To apply:
```{r eval=TRUE}
our_design %>% 
  diagnose_design(sims = n_sims, diagnosands = my_diagnosands) %>% 
  get_diagnosands() %>% 
  kable()
```

##### To see the `r n_sims` individual simulations that made the aggregated summary
`get_simulations`
```{r individual_simulations}
diagnosis1 %>% 
  get_simulations() %>%
  datatable()
```

`get_simulations` also has nice properties for plotting e.g., 

```{r graph}
get_simulations(diagnosis1) %>% 
  ggplot(aes(estimate)) + 
  geom_density() +
  geom_vline(xintercept = ate, col = "red") +
  labs(title = "Density of Effect Size Estimates",
       subtitle = "Real ATE shown in Red")
```

### Change your assumptions

You can test different assumptions from your your `DeclareDesign` design using `redesign` the function. __`redesign` takes the *global values* you are using within the function and gives them new values__ The first argument is your design, the second are the new assumptions. Also note that __it makes sense in most cases to pipe this function into `diagnose_design` or `get_simulations`__. Here I experiment with changing the sample size:
```{r}
change_the_parameters <- 
  our_design %>% 
   redesign(N = list(30, 50, 75, 100), # note that I define N as a variable in the first chunk in this document
            ate = list(1, 2, 3, 4)) %>% # what if the ATE is lower than I originally expected?
   diagnose_design(sims = n_sims) # note this does sims for each of combination of N's and ate's


change_the_parameters %>% 
  get_diagnosands() %>% 
  kable()
```

__OPTIONAL ASIDE -__ Why use `list()` in redesign instead of `c()`?

Good practice in DeclareDesign is to use `list()` in `redesign` and it's sister function `expand_design` instead of `c()`. The distinction is subtle, but the reason why is there are some cases where we would want a to use a vector within our design [think `list(c("A", "B", "C"))` vs `c(c("A", "B". "C"))`] and a normal vector won't cut it. Using `c()` won't cause problems in most cases, but this is an edge case to be aware of.  



```{r}
get_diagnosands(change_the_parameters) %>%
  mutate(N = factor(N),
         ate = factor(ate)) %>% 
  ggplot(aes(N, power, fill = ate)) +
  geom_bar(position = "dodge", stat = "identity") +
  geom_errorbar(aes(ymin = power - 1.96*`se(power)`, 
                    ymax = power + 1.96*`se(power)`,
                    width = .3),
                position = position_dodge(width = .9)) +
  geom_hline(yintercept = 0.8) +
  labs(title = "Power at each N and ate",
       subtitle = "95% Confidence Intervals")
```
Notice how if my original estimate of coffee on number of widgets was a little too high, or there were some workers in my factory who didn't show up my experiment likely wouldn't work! This could be huge!
