This exercise is designed to give you a practical application of randomization and power calculations in Stata. In this exercise, you’ll learn how to use built-in and user-created Stata commands to conduct power calculations for different randomization designs. The exercise is based on data from a randomized evaluation of two education programs in India: Continuous and Comprehensive Evaluation (CCE) and Teaching at the Right Level (TaRL).
📚 Berry, James, Priya Mukherji, Shobhini Mukherji, and Marc Shotland. (2018). Failure of Frequent Assessment: An Evaluation of India’s Continuous and Comprehensive Evaluation Program. J-PAL South Asia.
To get started, make a copy of this Dropbox folder, which contains the original data from the study as well as the exercise coding files, on your local machine. We’ll walk through Parts A, B, C, and D together, and then ask you to complete Parts E and F.
The data you’ll be using in this exercise comes from a study conducted across 500 lower and upper primary government schools in two districts of Haryana, India, during the 2012–13 academic year. The intervention focused on improving foundational learning by testing two programs—CCE, a policy emphasizing regular assessments and student tracking, and TaRL, an instructional approach that tailors teaching to children’s learning levels. The randomized evaluation of the program was based on a factorial design, where schools were randomly assigned in equal proportions to one of four groups.
| Treatment Group | Description |
|---|---|
| T1 | Comparison group: Schools within a school-campus assigned to this group follow the standard government curriculum. |
| T2 | CCE only: Schools within a school-campus assigned to this group receive CCE only. |
| T3 | TARL only: Schools within a school-campus assigned to this group receive TARL only. |
| T4 | CCE + TARL: Schools within a school-campus assigned to this group receive both TARL and CCE. |
The authors of the study collected data on student learning outcomes at baseline and endline, as well as teacher and school characteristics. While the real study conducted two experiments in lower and upper primary schools, this exercise will only use the lower primary school data, labeled as primary_cleaned.dta. The RCT evaluated the impact of the programs on four learning outcomes, but we’ll only be using one of the outcomes for this exercise: standardized oral Hindi test scores.
The codebook below lists some of the main variables you’ll need.
| Data | Variable in primary_cleaned.dta |
|---|---|
| School campus | super_school_id |
| School | school_id |
| Original stratum | stratum |
| Female | female |
| Age (years) | age_years |
| Grade in 2011–12 school year | base_standard |
| Village (fictional) | village_id |
| Oral Hindi score at baseline (standardized) | base_aser_read_norm |
Start by opening the do-file RST_rand_power_Stata.do. The following commands can be used at the beginning of your file to set up your environment. Each command does the following operations:
- `clear all` removes all data, saved results, and stored programs from memory
- `set more off` prevents Stata from pausing output
- `version 16` ensures your code runs under Stata 16’s syntax and behavior (note that in this case, your version of Stata must be version 16 or greater).

Install the two packages below, which we’ll leverage later in the exercise:

- `distinct` is used to count the number of unique values in a variable
- `randtreat` is used to generate random treatment assignment

For additional documentation on the randtreat command, run help randtreat in your Stata console.

Finally, set your main working directory to the location of the folder you downloaded, using Stata’s global macro.
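A minimal sketch of this setup block is shown below. The folder path, the macro name `main`, and the location of the data file inside the folder are placeholders rather than details taken from the exercise files, so adjust them to match your own copy.

```stata
* Reset the session and fix the Stata version
clear all
set more off
version 16

* Install the user-written packages from SSC (only needed once)
ssc install distinct, replace
ssc install randtreat, replace

* Placeholder path: point this at your copy of the exercise folder
global main "C:/Users/yourname/Documents/RST_rand_power"
cd "$main"

* Load the lower primary school data (assumed to sit at the top of the folder)
use "$main/primary_cleaned.dta", clear
```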
global and local macros in Stata
Macros in Stata are shortcuts used to store and
reuse information, such as numbers, text, or variable lists. Stata has
two main types of macros: global and
local.
Global macros are available everywhere in your Stata session
(across do-files, programs, and commands) until you manually clear or
overwrite them, and can be referenced using the $ sign.
They are most useful for values or variables you need to repeatedly
reference throughout your code.
Local macros are available only within the block of code where they are defined (e.g., within a loop, a program, or a single do-file section). They are referenced by wrapping the macro name in a backtick and an apostrophe. They are most useful for temporary or context-specific tasks, such as within loops or programs, and you won’t be able to reference them after you’ve executed the block of code where they were created.
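As a small illustration of the difference, the sketch below stores a variable list in a global macro and a variable name in a local macro. The macro names `covariates` and `outcome` are made up for this example, and the code assumes primary_cleaned.dta is already loaded.

```stata
* Global macro: persists for the whole session and is referenced with $
global covariates "female age_years"
summarize $covariates

* Local macro: referenced with `name' and only visible where it is defined
local outcome base_aser_read_norm
summarize `outcome'
```

If you run code line by line from the do-file editor, run the two `local` lines together; otherwise the local macro will be empty by the time `summarize` executes.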
This section goes over how to use the built-in power command in Stata. The command handles both continuous (power twomeans) and binary (power twoproportions) outcome variables.
The power command in Stata estimates statistical power, minimum detectable effect (MDE), or sample size for a study based on user-specified inputs. It can take on the following parameters:
- `alpha()`: Significance level (the default is set to 0.05)
- `power()`: Power level (the default is set to 0.80)
- `n1()`: Sample size in control group
- `n2()`: Sample size in treatment group
- `sd()`: Standard deviation of outcome of interest

The table below shows how to include different parameters in the power command.
| Goal | Code | Inputs |
|---|---|---|
| Calculate power | `power twomeans 100 120, n1(500) n2(500) sd(100)` | Control mean and estimated treatment mean, sample size, and control standard deviation |
| Calculate MDE | `power twomeans 100, n1(500) n2(500) sd(100) power(.8)` | Control mean, sample size, control standard deviation, power |
| Calculate sample size | `power twomeans 100 120, sd(100) power(.8)` | Control mean and estimated treatment mean, control standard deviation, power |

power command
You can also use intervals to calculate power, MDE, or sample
size under a range of assumptions. Stata will present the output as a
table, or you can specify the graph option to see your
results as a graph.
Example: Calculate power across a range of effect sizes. The code below calculates power for a range of endline treatment means, from .05 to .3 in steps of .05.
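A sketch of what such a command could look like is given here. The control mean of 0, the standard deviation of 1, and the sample sizes of 500 per group are illustrative assumptions, not values from the exercise files.

```stata
* Power for treatment means of .05, .10, ..., .30 against a control mean of 0
* (sample sizes and standard deviation are illustrative assumptions)
power twomeans 0 (0.05(0.05)0.3), n1(500) n2(500) sd(1)
```

The parentheses around the number list are what tell `power` to treat it as a range of values for the treatment mean.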
Example: Add a graph to visualize the results:
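Under the same illustrative assumptions (control mean of 0, standard deviation of 1, and 500 observations per group), adding the `graph` option plots power against the treatment mean instead of only tabulating it.

```stata
* Same calculation as a table would show, displayed as a graph
power twomeans 0 (0.05(0.05)0.3), n1(500) n2(500) sd(1) graph
```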
The power command can also calculate power, MDE, and
sample size for cluster-level designs by including any of the parameters
below:
- `k1()`: Number of clusters in the control group
- `k2()`: Number of clusters in the treatment group
- `m1()`: Average cluster size in the control group
- `m2()`: Average cluster size in the treatment group
- `rho()`: The intracluster correlation (ICC)

See the table below for an overview.
| Goal | Code | Inputs |
|---|---|---|
| Calculate power | `power twomeans 100 120, m1(20) m2(20) k1(60) k2(60) rho(.1) sd(100)` | Control mean and estimated treatment mean, control and treatment average cluster size and number of clusters, and control ICC and standard deviation |
| Calculate MDE | `power twomeans 100, m1(20) m2(20) k1(60) k2(60) rho(.1) sd(100) power(.8)` | Control mean, average cluster size and number of clusters, control ICC and standard deviation, and power |
| Calculate number of clusters | `power twomeans 100 120, m1(20) m2(20) rho(.1) sd(100) power(.8)` | Control mean and estimated treatment mean, average cluster size, control ICC and standard deviation, and power |
| Calculate average cluster size | `power twomeans 100 120, k1(60) k2(60) rho(.1) sd(100) power(.8)` | Control mean and estimated treatment mean, number of clusters, control ICC and standard deviation, and power |

power command for cluster-level designs
Instead of specifying k2(), you also have the option to specify kratio(), i.e., the ratio of the number of clusters in the treatment group to the comparison group. For example, adding kratio(1) to the power command would indicate that the number of clusters in the treatment group and the control group is the same. Similarly, instead of specifying m2(), you can specify mratio(), i.e., the ratio of the average cluster size in the treatment group to the comparison group.
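For instance, the MDE calculation for a clustered design could be written with ratios instead of separate treatment-group inputs. The numbers below are illustrative assumptions only.

```stata
* MDE with 60 clusters of 20 units in the control group and the same
* number and size of clusters in the treatment group (both ratios = 1)
power twomeans 0, k1(60) kratio(1) m1(20) mratio(1) rho(.1) sd(1) power(.8)
```

Changing `kratio()` to 2, for example, would give the treatment group twice as many clusters as the control group.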
The randtreat command is a user-created package that can
conduct random assignment for complex experimental designs.
To randomize treatment assignment using the randtreat
command, you must first set your data so that it is at the level at
which you want to randomize. In the sample code below, we’ll randomize
at the school level and stratify using the fictional
village_id variable, and assign any misfits globally. In
order to do that, we’ll use Stata’s preserve and
restore functions to keep our original dataset intact
before collapsing to the school level.
randtreat command
Remember that whenever you randomize, you should set a seed so that anyone who goes back to your coding file can replicate your exact randomization. The randtreat command has a built-in option to set the seed, setseed(); in this exercise, it is set to 12345.
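A sketch of this school-level randomization is given below. It is one possible implementation rather than the exercise's own code: it assumes each school belongs to exactly one village, that two groups are wanted (the `randtreat` default), and it uses a temporary file to bring the assignment back to the student-level data.

```stata
* Remove any earlier assignment so the merge below brings in the new one
capture drop treatment

preserve
    * Reduce the data to one row per school
    keep school_id village_id
    duplicates drop
    isid school_id   // errors out if a school appears in more than one village

    * Randomize schools within villages; misfits are allocated globally
    randtreat, generate(treatment) strata(village_id) misfits(global) setseed(12345)

    * Save the school-level assignment
    tempfile assignment
    save `assignment'
restore

* Attach the assignment to every student in the original data
merge m:1 school_id using `assignment', nogenerate
```

Because `tempfile` creates a local macro, run the whole block in one go rather than line by line.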
restore and preserve in Stata
The preserve and restore commands in
Stata allow you to temporarily modify your dataset without
losing the original version:
- `preserve`: Saves the current state of your dataset in memory (including variables, observations, and sort order).
- `restore`: Restores the dataset to the exact state it was in at the time of the most recent preserve.

The randtreat command also allows you to specify unequal allocation ratios. Suppose you have three groups (one comparison and two treatment groups), and you want to assign half of your sample to the comparison group, and split the remaining half of your sample between the two treatment groups. You can do so with the unequal() option, for example unequal(1/2 1/4 1/4).
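The sketch below illustrates this allocation at the school level. The variable name `treatment_unequal` is invented for the example so that it does not overwrite any existing assignment, and the data are restored afterwards, so the result is only tabulated, not kept.

```stata
preserve
    * One row per school, keeping the stratification variable
    keep school_id village_id
    duplicates drop

    * Half of schools to the comparison group, a quarter to each treatment group
    randtreat, generate(treatment_unequal) strata(village_id) ///
        unequal(1/2 1/4 1/4) misfits(global) setseed(12345)

    * Inspect the resulting allocation
    tab treatment_unequal, m
restore
```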
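If you also specify the number of groups with multiple(), it must match the number of fractions listed in unequal(). Note that the unequal() option only allows fractions, not decimals.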
Misfits happen when, during the randomization process, the
number of units in a group (like a stratum) doesn’t divide evenly across
the treatment groups. For example, if you have 5 units and want to
assign them evenly to two groups (treatment and comparison), one unit
will be left over—this is a misfit.
In simple cases, such as random assignment to two groups without stratification, you can just randomly assign that leftover unit after dividing up the rest. But when working with multiple strata, you need to decide whether to balance treatment assignments within each stratum or across the whole sample.
The randtreat command has several options for dealing with misfits, set through its misfits() option. The strata option allocates misfits independently within each stratum to maintain treatment balance within strata. The global option allocates all misfits across the full sample to maintain overall treatment balance. Run help randtreat in your Stata console for more information.
Your randomization created the treatment variable. Check
it out by running tab treatment, m in your Stata
console.
In this section, we’ll walk through how to estimate parameters for power calculations from baseline data. Storing these estimates is helpful because it allows you to easily reference them later.
In Stata, estimates from commands like sum and
regression commands (e.g., reg, areg) are
automatically stored in return lists (e.g., r() and
e()), and you can save specific values (like means,
standard deviations, or coefficients) into local or
global macros using commands as in the sample code
below.
To get a good idea of our sample, we’ll use the treatment variable you created earlier containing treatment assignments. To estimate the number of clusters (schools) and average cluster size (students within a school) in each group for our power calculations, we first want to calculate the total number of students. We can easily do this with Stata’s count command by storing the returned results in global macros.
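A sketch of this step, assuming the student-level data carry a two-group `treatment` variable coded 0 for the comparison group and 1 for the treatment group:

```stata
* Total number of students in the comparison group
count if treatment == 0
global students_c = r(N)

* Total number of students in the treatment group
count if treatment == 1
global students_t = r(N)
```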
We’ll then get the number of clusters in each group
by using the distinct function we installed earlier. This
function counts the number of unique observations of a variable, and
since the data is at the student-level, we’ll use it to avoid double
counting schools.
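Under the same coding assumption (0 for comparison, 1 for treatment), the number of schools per group could be stored as follows. `distinct` returns the number of unique values in `r(ndistinct)`.

```stata
* Number of distinct schools in the comparison group
distinct school_id if treatment == 0
global clusters_c = r(ndistinct)

* Number of distinct schools in the treatment group
distinct school_id if treatment == 1
global clusters_t = r(ndistinct)
```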
To get the average cluster size (i.e., number of students within each school), we’ll divide the total number of students by the total number of schools in each group (rounding down to the nearest integer value).
```stata
* Average cluster size
global cluster_size_c = floor($students_c / $clusters_c)
global cluster_size_t = floor($students_t / $clusters_t)
```

If we’d like to check out the results, we can use Stata’s display command and reference the global macros we stored above.
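For example, the stored values could be printed like this (the wording of the messages is arbitrary):

```stata
* Print the stored sample parameters for each group
display "Comparison: $clusters_c schools, $students_c students, about $cluster_size_c per school"
display "Treatment:  $clusters_t schools, $students_t students, about $cluster_size_t per school"
```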
To produce reliable power estimates, we need to estimate two key parameters from our baseline data: an outcome variance and the intra-cluster correlation (in the case of cluster-level designs). For the outcome variance, rather than using the raw standard deviation of our outcome of interest, we use the residual standard deviation—which captures how much variation remains after accounting for covariates (and strata, if included in our randomization design).
Assuming the same randomization design as in Section C, the sample
code below estimates the residual standard deviation of the outcome
variable oral Hindi test scores, base_aser_read_norm,
controlling for the covariate female and stratification at
the village_id level. The predicted residuals from the
regression are summarized, and their standard deviation is stored in a
global macro res_sd for later use.
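One way to sketch this estimation is shown below. It uses `areg` to absorb the village fixed effects, and the residual variable name `res_base` is made up for the example.

```stata
* Regress the outcome on the covariate, absorbing village (stratum) fixed effects
areg base_aser_read_norm female, absorb(village_id)

* Predict residuals net of covariates and fixed effects
predict double res_base, residuals

* Store the residual standard deviation for later use
summarize res_base
global res_sd = r(sd)
display "Residual SD: $res_sd"
```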
The loneway command in Stata uses a one-way ANOVA to calculate the proportion of total variation in the outcome that is attributable to differences between clusters (the ICC), and we store that value for later use.
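A sketch of the ICC estimate at the school level follows. The macro name `icc` is an assumption; `loneway` returns the intraclass correlation in `r(rho)`.

```stata
* One-way ANOVA of the outcome by school; r(rho) holds the estimated ICC
loneway base_aser_read_norm school_id
global icc = r(rho)
display "ICC: $icc"
```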
In practice, preliminary power calculations are typically conducted before finalizing a randomization design to ensure your design is sufficiently powered to detect the expected effect size. In this section, you’ll conduct power calculations for a few scenarios based on the study data before settling on a final randomization design. Throughout the section, we’ll revisit the case study from Part A.
For each calculation, keep the power level set to .80 and the significance level set to .05.
Remember that for randomization designs with multiple treatment
arms, you’ll want to conduct power calculations for each pairwise
comparison you plan to analyze. For example, if your study includes a
control group, a CCE group, and a TARL group, you should calculate power
separately for the control vs. CCE comparison, control vs. TARL
comparison, and (if relevant) CCE vs. TARL comparison (although in this
case study, the authors did not power their study to detect a difference
between the CCE and the TARL group).
The researchers have onboarded you to their study and have asked you to help them conduct preliminary power calculations!
Suppose they believe there is a high risk of spillovers between schools within the same village—for example, through regular interactions among teachers, students, or parents at community events and meetings. To reduce the chance of contamination between treatment and comparison groups, they decide to randomize treatment assignment at the village level, assigning all schools within a village to the same group.
Instructions: Estimate the MDE for standardized oral Hindi test scores (i.e., baseline mean of 0, standard deviation of 1) for a pairwise comparison of the control group to any one of the treatment groups. Assume that the treatment and comparison groups have 15 villages each, and the average number of students within a cluster is 34. The ICC of oral Hindi test scores is .10.
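One possible way to set up this calculation, using only the inputs given in the instructions:

```stata
* MDE with 15 villages per group, 34 students per village, and an ICC of .10
power twomeans 0, k1(15) k2(15) m1(34) m2(34) rho(.10) sd(1) power(.8) alpha(.05)
```

Because the control mean is 0 and the standard deviation is 1, the reported difference in means can be read directly in standard deviation units.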
The researchers revisit their assumptions and find that while some interactions occur within villages, schools within the same school campuses operate more independently, with limited collaboration among teachers and little contact between parents of different school campuses. To increase statistical power while still limiting the risk of spillovers, they decide to randomize at the school campus level in order to increase the number of clusters.
Instructions: Estimate the MDE for standardized oral Hindi test scores (i.e., baseline mean of 0, standard deviation of 1) for a pairwise comparison of the control group to any one of the treatment groups. Assume that the number of school campuses in the comparison and treatment groups is 90 each, and the average number of students within a school campus is 34. Since you are randomizing at the school campus level and believe that students within a school campus are very similar to each other, you also increase the ICC to .11.
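A sketch for the campus-level design, again using only the inputs from the instructions:

```stata
* MDE with 90 school campuses per group, 34 students per campus, and an ICC of .11
power twomeans 0, k1(90) k2(90) m1(34) m2(34) rho(.11) sd(1) power(.8) alpha(.05)
```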
The baseline data that the researchers are using were collected in 2011. Suppose they wanted to run their study several years later and were not confident about how the average number of students within schools would change, so they ask you to check what power would look like under a few scenarios.
Instructions: Estimate the MDE for standardized oral Hindi test scores (i.e., baseline mean of 0, standard deviation of 1) for a pairwise comparison of the control group to any one of the treatment groups. Assume that the treatment and comparison groups have 90 school campuses each, and vary the average number of students within a school from 15 to 40, with an interval of 5. Keep the ICC at .11.
Remember that you can include intervals in the
power command. For example, the code chunk below estimates
the MDE for several different scenarios that differ by the cluster
number, starting with a cluster number of 60, and increasing by 5 until
reaching 90.
The kratio(1) option specifies that the number of clusters is equal across the treatment and comparison group.
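The interval example described in this hint could look like the sketch below. The cluster size of 34 and the ICC of .11 are carried over from the earlier instructions as assumptions.

```stata
* MDE for 60, 65, ..., 90 clusters per group, equal numbers in both groups
power twomeans 0, k1(60(5)90) kratio(1) m1(34) m2(34) rho(.11) sd(1) power(.8)
```

The same number-list syntax can be placed inside `m1()` and `m2()` to vary the average cluster size instead.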
Suppose the researchers are worried they might face budget constraints that limit how many schools they can support with the interventions. To maximize statistical precision for comparing each treatment arm against the comparison group, they want to check whether assigning half of the sample of school campuses to the comparison group would significantly reduce their power. The remaining half would then be equally divided among the three treatment arms, reducing the overall cost of the interventions.
Instructions: Estimate the MDE for standardized oral Hindi test scores (i.e., baseline mean of 0, standard deviation of 1) for a pairwise comparison of the control group to any one of the treatment groups. Use the following parameters. Assume that half of school campuses (180) are in the comparison group, and each treatment group has 60 school campuses. Keep the average number of students at 34 and the ICC at .11. Compare this MDE to what your MDE would be under equal treatment allocation.
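The two allocations can be compared with a pair of commands along these lines, using the inputs from the instructions:

```stata
* Unequal allocation: 180 comparison campuses vs. 60 campuses in one treatment arm
power twomeans 0, k1(180) k2(60) m1(34) m2(34) rho(.11) sd(1) power(.8)

* Equal allocation: 90 campuses in each group
power twomeans 0, k1(90) k2(90) m1(34) m2(34) rho(.11) sd(1) power(.8)
```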
Luckily, there is no difference! The MDE under both scenarios is
equal to .15 standard deviations. However, this is not
always the case. For example, shifting just a bit more of your sample to
the comparison group from the treatment group as below results in almost
.3 standard deviation increase in the MDE!
As you saw earlier, covariates can play a large role in improving the precision of your estimates and increasing statistical power by explaining some of the variation in the outcome variable. To improve the precision of their power estimates, the researchers ask you to estimate the residual standard deviation of their main outcome variable, standardized oral Hindi test scores.
Instructions: Regress standardized oral Hindi test scores on students’ gender, grade at baseline, and age in years. Cluster standard errors at the school campus level. Use the predicted residuals to calculate the residual standard deviation.
Instructions: Using the residual standard deviation you calculated above, estimate the MDE of standardized oral Hindi test scores for a pairwise comparison of the control group to any one of the treatment groups. Assume equal allocation of school campuses to the comparison and treatment groups (90 each), 34 students within a school campus on average, and an ICC of .11.
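A sketch of both steps is shown below. The names `res_cov` and `res_sd_cov` are invented for this example, and grade is entered as a single linear term, which is one of several reasonable specifications.

```stata
* Regress the outcome on baseline covariates, clustering at the school campus level
regress base_aser_read_norm female base_standard age_years, cluster(super_school_id)

* Residual standard deviation after accounting for covariates
predict double res_cov, residuals
summarize res_cov
global res_sd_cov = r(sd)

* MDE using the residual standard deviation in place of 1
power twomeans 0, k1(90) k2(90) m1(34) m2(34) rho(.11) sd($res_sd_cov) power(.8)
```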
Thus far, you’ve assumed perfect compliance in your power calculations—that is, everyone assigned to a treatment group receives the treatment and everyone in the control group does not receive the treatment. In reality, this assumption often doesn’t hold.
Suppose the researchers want you to conduct sensitivity analyses for their power calculations in case some students in the treatment groups do not receive their assigned treatment due to implementation delays or attendance.
Instructions: Using the residual standard deviation you calculated earlier, estimate the MDE of standardized oral Hindi test scores for a pairwise comparison of the control group to any one of the treatment groups. Calculate the MDE under two different assumptions about treatment take-up: 70% and 90%. Assume equal allocation of school campuses to the treatment and control group (90 each), 34 students within a school campus on average, and an ICC of .11.
If take-up is imperfect, we can account for it in our power
calculations by adjusting the minimum detectable effect (MDE).
Specifically, when estimating the treatment-on-the-treated
(TOT) effect, the MDE increases as take-up
decreases. To approximate this, divide the MDE for the
intention-to-treat (ITT) effect by the take-up
rate.
For example, if your ITT MDE is 0.10 and you expect a 50% take-up rate, the implied MDE for the TOT effect is \[ \frac{0.10}{0.5} = 0.20 \]
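The adjustment can be sketched in Stata as follows. The value stored in the local macro `res_sd` is a placeholder to be replaced with your own residual standard deviation, and the sketch assumes that `power twomeans` returns the detectable difference in `r(delta)`.

```stata
* Placeholder: replace 1 with the residual standard deviation you estimated
local res_sd 1

* ITT MDE under perfect compliance
power twomeans 0, k1(90) k2(90) m1(34) m2(34) rho(.11) sd(`res_sd') power(.8)
local mde_itt = r(delta)

* Implied TOT MDE under 70% and 90% take-up
foreach takeup in 0.7 0.9 {
    display "Take-up `takeup': implied TOT MDE = " %5.3f `mde_itt' / `takeup'
}
```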
This means that lower take-up reduces your study’s ability to detect a given treatment effect.

Based on the preliminary calculations you did for the researchers, they decide to randomize treatment assignment to 4 groups at the school campus level. Given that your estimated MDE is not as low as they’d like it to be, they also decide to stratify treatment by average baseline test scores at the school level, and have created the stratum variable for you to do so.
Instructions: Randomize treatment assignment of
school campuses, super_school_id, to four groups of equal
proportions. Stratify using the stratum variable and set
your seed to 12345. Handle misfits locally.
The stratum variable was constructed by first grouping schools by block and school type (lower, upper, or both). Within each group, school campuses were sorted by average baseline test scores and grouped into strata of four school campuses each to enable stratified random assignment across four treatment arms.
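One possible implementation is sketched below. It reads "locally" as the `misfits(strata)` option of `randtreat`, which is an interpretation rather than something stated in the exercise, and it assumes each school campus belongs to exactly one stratum.

```stata
* Remove any earlier assignment so the merge below brings in the new one
capture drop treatment

preserve
    * One row per school campus
    keep super_school_id stratum
    duplicates drop
    isid super_school_id

    * Four equally sized groups (coded 0 to 3), stratified, misfits handled within strata
    randtreat, generate(treatment) strata(stratum) multiple(4) ///
        misfits(strata) setseed(12345)

    tempfile campus_assignment
    save `campus_assignment'
restore

* Attach the campus-level assignment to the student-level data
merge m:1 super_school_id using `campus_assignment', nogenerate
tab treatment, m
```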
Now that you’ve conducted the randomization process, the researchers want you to do one last power analysis based on the actualized randomization design. But first, you’ll need to get estimates to input into your power calculation.
Instructions: Using global macros,
estimate and store the number of school campuses assigned to the
comparison group and the first treatment group (the CCE group). Then,
estimate and store the total number of students in the comparison group
and the first treatment group. Round to the lowest integer.
Instructions: Using global macros,
estimate and store the average number of students in each school campus
in the comparison group and the first treatment group (the CCE group),
rounding to the lowest integer.
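A sketch of these two steps follows. It assumes `treatment` is coded 0 for the comparison group and 1 for the CCE group, and the macro names are made up so that they do not overwrite the ones stored in the earlier walkthrough.

```stata
* Number of school campuses in each group
distinct super_school_id if treatment == 0
global campuses_c = r(ndistinct)
distinct super_school_id if treatment == 1
global campuses_cce = r(ndistinct)

* Number of students in each group
count if treatment == 0
global campus_students_c = r(N)
count if treatment == 1
global campus_students_cce = r(N)

* Average number of students per campus, rounded down
global campus_size_c = floor($campus_students_c / $campuses_c)
global campus_size_cce = floor($campus_students_cce / $campuses_cce)
```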
In addition to accounting for covariates that can help explain the
variance in oral Hindi test scores, the researchers also stratified
random assignment by the stratum variable and want to
reflect that in their final power calculation.
Instructions: Estimate and store the residual standard deviation of standardized oral Hindi test scores by running a regression that accounts for students’ gender, grade at baseline, and age in years. Be sure to include fixed effects for the original randomization strata, stratum, and cluster standard errors at the school campus level.
Use i.stratum to include stratum fixed effects in your regression model using reg, or if you prefer not to see the output of the fixed effects, you can use areg instead, and add the option absorb(stratum).
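A sketch using the `areg` variant is given below; the names `res_final` and `res_sd_final` are assumptions made for this example.

```stata
* Covariates plus stratum fixed effects, clustering at the school campus level
areg base_aser_read_norm female base_standard age_years, ///
    absorb(stratum) cluster(super_school_id)

* Residuals net of covariates and stratum fixed effects
predict double res_final, residuals
summarize res_final
global res_sd_final = r(sd)
```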
Instructions: Estimate and store the ICC of standardized oral Hindi test scores (base_aser_read_norm). Remember that you have randomized at the school campus level!
Now that you have all of your estimates based on the baseline data, the researchers ask you to conduct one last power calculation to confirm the MDE that the study would be able to detect.
Instructions: Using your stored estimates, calculate the MDE of the study for oral Hindi test scores for one pair-wise comparison between the comparison group and first treatment group (the CCE group).
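The last two steps could be sketched as follows. The global macro names are hypothetical placeholders for whatever names you used when storing your estimates in the previous steps, so the `power` command will only run once those macros exist.

```stata
* ICC at the level of randomization (school campus)
loneway base_aser_read_norm super_school_id
global icc_campus = r(rho)

* Final MDE for comparison vs. CCE, using the stored estimates
* (macro names are placeholders for your own stored values)
power twomeans 0, k1($campuses_c) k2($campuses_cce) ///
    m1($campus_size_c) m2($campus_size_cce) ///
    rho($icc_campus) sd($res_sd_final) power(.8) alpha(.05)
```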
Even with proper randomization, small differences between groups can occur due to chance. This is expected. However, large or systematic differences may suggest problems with the randomization process or the implementation of the assignment. Therefore, before implementing the randomization assignment in the field, the researchers want you to conduct balance checks to see whether the randomization resulted in any imbalance across key variables.
So, how do you check for baseline balance? Balance is typically checked by comparing the means of key baseline variables across treatment groups. This is often done using regression analysis or t-tests to see whether any differences are statistically significant.
Instructions: Store the variables female, age_years, base_standard, and base_aser_read_norm in a global macro and generate dummies for each treatment group using the treatment variable you created earlier. You should have three dummy variables, one for each treatment group: CCE, TARL, and CCE_TARL.
To add multiple variables in a global, you can just
list them one after the other as below within the quotation marks.
To create dummy variables for each treatment group, you can use
Stata’s gen and replace functions to generate
new variables. The dummy for the CCE group is created for you below.
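A sketch of both hints is shown here. It assumes that `treatment` takes the value 1 for the CCE group, which depends on how you map the randomized group numbers to the programs.

```stata
* Store the baseline variables in a global macro
global outcomelist_base "female age_years base_standard base_aser_read_norm"

* Dummy for the CCE group (assumes treatment == 1 corresponds to CCE)
gen CCE = 0
replace CCE = 1 if treatment == 1
```

The TARL and CCE_TARL dummies follow the same pattern with the corresponding values of `treatment`.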
In order to check baseline balance, you’ll need to run a linear regression of each baseline variable on indicators for each treatment group. You’ll also need to estimate and store the comparison group averages of the baseline variables so you can compare them to the treatment group!
Instructions: Use a foreach loop to go
through every variable listed in the global macro you
created. For each variable:
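- Run a regression of the variable on the three treatment dummies, including stratum fixed effects with the absorb(stratum) option if using areg, or by adding i.stratum to your reg command
- Cluster standard errors at the super_school_id level using the cluster() option
- Store the comparison group mean (i.e., the mean for observations where CCE == 0, TARL == 0, and CCE_TARL == 0).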
The code below runs through each variable in the
$outcomelist_base global, and runs a regression of the
outcome on the dummy variables for each treatment group. It stores the
comparison group mean and standard deviation for each variable, and then
manually calculates the p-values from the regression. Finally, it stores
the results in a matrix called balance_table.
```stata
* Start from an empty matrix so that reruns do not append duplicate rows
capture matrix drop balance_table

* Loop through each outcome
foreach var in $outcomelist_base {
    * Run regression with stratum fixed effects and clustered standard errors
    quietly areg `var' CCE TARL CCE_TARL, cluster(super_school_id) absorb(stratum)

    * Get control group mean and standard deviation
    quietly summarize `var' if e(sample)==1 & CCE == 0 & TARL == 0 & CCE_TARL == 0
    local control_mean = r(mean)
    local control_sd = r(sd)

    * Get p-values
    local p1 = 2 * ttail(e(df_r), abs(_b[CCE] / _se[CCE]))
    local p2 = 2 * ttail(e(df_r), abs(_b[TARL] / _se[TARL]))
    local p3 = 2 * ttail(e(df_r), abs(_b[CCE_TARL] / _se[CCE_TARL]))

    * Append results to matrix: one row of estimates, one row of SD and p-values
    mat balance_table = nullmat(balance_table) \ ///
        `control_mean', _b[CCE], _b[TARL], _b[CCE_TARL] \ ///
        `control_sd', `p1', `p2', `p3'
}
```

Instructions: Add column names and row labels to your matrix, and print it to your Stata console to see the results.
You can add column names to the matrix using the line below.
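For example, assuming the matrix built in the loop is called balance_table and has four columns, the column names could be set as follows (the names themselves are a suggestion):

```stata
* Label the four columns: comparison group statistics, then one column per treatment dummy
matrix colnames balance_table = Control CCE TARL CCE_TARL
```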
You can base the row labels off of the variable labels for simplicity as below.
```stata
* Build two row labels per variable: one for the mean row, one for the SD and p-value row
local rowlabels
foreach var in $outcomelist_base {
    local rowlabels `rowlabels' `var'_mean `var'_sd
}
matrix rownames balance_table = `rowlabels'
```

Finally, to print your matrix, which contains the results from the balance checks, you can run the line below.
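A sketch of the print step; the display format is an optional choice made here for readability.

```stata
* Print the balance table with three decimal places
matrix list balance_table, format(%9.3f)
```

In each pair of rows, the first row shows the comparison group mean followed by the regression coefficients, and the second row shows the comparison group standard deviation followed by the p-values.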