A. Introduction

This exercise is designed to give you a practical application of randomization and power calculations in Stata. In this exercise, you’ll learn how to use built-in and user-created Stata commands to conduct power calculations for different randomization designs. The exercise is based on data from a randomized evaluation of two education programs in India: Continuous and Comprehensive Evaluation (CCE) and Teaching at the Right Level (TaRL).

📚 Berry, James, Priya Mukherji, Shobhini Mukherji, and Marc Shotland. (2018). Failure of Frequent Assessment: An Evaluation of India’s Continuous and Comprehensive Evaluation Program. J-PAL South Asia.

To get started, copy this Dropbox folder, which contains the original data from the study and the exercise coding files, to your local machine. We’ll walk through Parts A, B, C, and D together, and then ask you to complete Parts E and F.

A.1 Study context

The data you’ll be using in this exercise comes from a study conducted across 500 lower and upper primary government schools in two districts of Haryana, India, during the 2012–13 academic year. The intervention focused on improving foundational learning by testing two programs—CCE, a policy emphasizing regular assessments and student tracking, and TaRL, an instructional approach that tailors teaching to children’s learning levels. The randomized evaluation of the program was based on a factorial design, where schools were randomly assigned in equal proportions to one of four groups.

Treatment Group	Description
T1	Comparison group: Schools within a school-campus assigned to this group follow the standard government curriculum.
T2	CCE only: Schools within a school-campus assigned to this group receive CCE only.
T3	TaRL only: Schools within a school-campus assigned to this group receive TaRL only.
T4	CCE + TaRL: Schools within a school-campus assigned to this group receive both TaRL and CCE.

A.2 Data description

The authors of the study collected data on student learning outcomes at baseline and endline, as well as teacher and school characteristics. While the real study conducted two experiments in lower and upper primary schools, this exercise will only use the lower primary school data, labeled as primary_cleaned.dta. The RCT evaluated the impact of the programs on four learning outcomes, but we’ll only be using one of the outcomes for this exercise: standardized oral Hindi test scores.

The codebook below lists some of the main variables you’ll need.

🔍 Codebook

Data	Variable in `primary_cleaned.dta`
School campus	super_school_id
School	school_id
Original stratum	stratum
Female	female
Age (years)	age_years
Grade in 2011–12 school year	base_standard
Village (fictional)	village_id
Oral Hindi score at baseline (standardized)	base_aser_read_norm

A.3 Setting up your environment

Start by opening the do-file RST_rand_power_Stata.do. The following commands can be used at the beginning of your file to set up your environment. Each command does the following operations:

clear all removes all data, saved results, and stored programs from memory
set more off prevents Stata from pausing output
version 16 ensures your code runs under Stata 16’s syntax and behavior (note that in this case, your version of Stata must be version 16 or greater).
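
Put together, the top of the do-file might look like the sketch below. The comments restate the descriptions above; adjust the version number if you are working in a different release.

```stata
clear all        // remove data, saved results, and programs from memory
set more off     // do not pause when output fills the screen
version 16       // interpret the code under Stata 16 rules
```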

Install the two packages below, which we’ll use later in the exercise:

distinct is used to count the number of unique values in a variable
randtreat is used to generate random treatment assignment

You can find additional documentation for the randtreat command here, or run help randtreat in your Stata console.
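
A minimal sketch of the installation step, assuming both packages are hosted on the SSC archive and that you have an internet connection. The replace option simply updates a package that is already installed.

```stata
* Install the user-written packages (only needed once per machine)
ssc install distinct, replace
ssc install randtreat, replace
```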

Finally, use a global macro to set your main working directory to the location of the folder you downloaded.
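
One way to do this is sketched below. The path and the macro name main are hypothetical placeholders, so replace them with your own; the temp global matches the $temp folder referenced by the sample code later in the exercise.

```stata
* Hypothetical path: replace with the location of your downloaded folder
global main "C:/Users/yourname/Documents/RST_rand_power"
global temp "$main/temp"

capture mkdir "$temp"   // create the temp folder if it does not exist yet
cd "$main"
```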

💡 Tips for working with global and local macros in Stata

Macros in Stata are shortcuts used to store and reuse information, such as numbers, text, or variable lists. Stata has two main types of macros: global and local.

Global macros are available everywhere in your Stata session (across do-files, programs, and commands) until you manually clear or overwrite them, and can be referenced using the $ sign. They are most useful for values or variables you need to repeatedly reference throughout your code.

global students = 100
display $students  // Use with a dollar sign

Local macros are available only within the program or do-file in which they are defined; when you run a selection of lines from the do-file editor, they last only for that run. They are referenced with a backtick and an apostrophe. They are most useful for temporary or context-specific tasks, such as within loops or programs, and you won’t be able to reference them after the code that created them has finished executing.

local students = 100
display `students'  // Use with backticks and apostrophes

B. Introducing the power command

This section goes over how to use the built-in power command in Stata. The command can handle both continuous (power twomeans) and binary (power twoproportions) outcome variables.

B.1 Power calculations for individual-level designs

The power command in Stata estimates statistical power, minimum detectable effect (MDE), or sample size for a study based on user-specified inputs. It accepts the following options:

alpha() : Significance level (the default is set to 0.05)
power() : Power level (the default is set to 0.80)
n1() : Sample size in control group
n2() : Sample size in treatment group
sd() : Standard deviation of outcome of interest

The table below shows how to include different parameters in the power command.

Sample code for calculating power, MDE, and sample size for individual-level designs

Goal	Code	Inputs
Calculate power	`power twomeans 100 120, n1(500) n2(500) sd(100)`	Control mean and estimated treatment mean, sample size, and control standard deviation
Calculate MDE	`power twomeans 100, n1(500) n2(500) sd(100) power(.8)`	Control mean, sample size, control standard deviation, power
Calculate sample size	`power twomeans 100 120, sd(100) power(.8)`	Control mean and estimated treatment mean, control standard deviation, power

The main idea is that you include every parameter in the power command, apart from the one you are solving for.
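
Because power is an r-class command, the value it solves for can be retrieved afterwards. The sketch below reruns the MDE example from the table and reads the difference in means from r(delta); run return list to confirm which results are stored in your Stata version.

```stata
* Solve for the MDE: supply everything except the treatment mean
power twomeans 100, n1(500) n2(500) sd(100) power(.8)

* Inspect the stored results; r(delta) is the difference in means
return list
display "MDE: " %6.2f r(delta)
```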

💡 Tips: Additional options for the power command

You can also use intervals to calculate power, MDE, or sample size under a range of assumptions. Stata will present the output as a table, or you can specify the graph option to see your results as a graph.

Example: Calculate power across a range of effect sizes. The code below calculates power for a range of endline treatment means, from .05 to .4 in steps of .05.

power twomeans 0 (0.05(0.05)0.4), n(300)

Example: Add a graph to visualize the results:

power twomeans 0 (0.05(0.05)0.4), n(300) graph

B.2 Power calculations for cluster-level designs

The power command can also calculate power, MDE, and sample size for cluster-level designs by including any of the parameters below:

k1() : Number of clusters in the control group
k2() : Number of clusters in the treatment group
m1() : Average cluster size in the control group
m2() : Average cluster size in the treatment group
rho() : The intracluster correlation (ICC)

The table below gives an overview.

Sample code for calculating power, MDE, and sample size for cluster-level designs

Goal	Code	Inputs
Calculate power	`power twomeans 100 120, m1(20) m2(20) k1(60) k2(60) rho(.1) sd(100)`	Control mean and estimated treatment mean, control and treatment average cluster size and number of clusters, and control ICC and standard deviation
Calculate MDE	`power twomeans 100, m1(20) m2(20) k1(60) k2(60) rho(.1) sd(100) power(.8)`	Control mean, average cluster size and number of clusters, control ICC and standard deviation, and power
Calculate number of clusters	`power twomeans 100 120, m1(20) m2(20) rho(.1) sd(100) power(.8)`	Control mean and estimated treatment mean, average cluster size, control ICC and standard deviation, and power
Calculate average cluster size	`power twomeans 100 120, k1(60) k2(60) rho(.1) sd(100) power(.8)`	Control mean and estimated treatment mean, number of clusters, control ICC and standard deviation, and power

💡 Tips on using the power command for cluster-level designs

Instead of specifying k2, you also have the option to specify the kratio(), i.e., the ratio of the number of clusters in the treatment group to the comparison group. For example, adding kratio(1) to the power command would indicate that the number of clusters in the treatment group and the control group is the same.

Similarly, you can specify an mratio(), i.e., the ratio of the average cluster size in the treatment group to the comparison group.
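
For intuition about how these inputs combine, the MDE of a cluster-level design can be approximated with the standard textbook formula below. This is a sketch under stated assumptions (normal approximation, the same average cluster size m in both groups, a common standard deviation σ and ICC ρ), not a description of Stata's exact internal computation. The worked line plugs in the values from the table above.

```latex
\[
\text{MDE} \approx \left(z_{1-\alpha/2} + z_{1-\beta}\right)\,\sigma\,
\sqrt{\bigl(1 + (m-1)\rho\bigr)\left(\frac{1}{k_1 m} + \frac{1}{k_2 m}\right)}
\]
% Worked example: m = 20, k_1 = k_2 = 60, \rho = 0.1, \sigma = 100,
% \alpha = 0.05 and power = 0.80, so z_{0.975} + z_{0.80} \approx 1.96 + 0.84
\[
\text{MDE} \approx (1.96 + 0.84) \times 100 \times
\sqrt{2.9 \times \frac{2}{1200}} \approx 19.5
\]
```

The term 1 + (m − 1)ρ is often called the design effect: it is the factor by which clustering inflates the variance relative to an individual-level design with the same number of observations.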

C. Introducing the randtreat command

The randtreat command is a user-created package that can conduct random assignment for complex experimental designs.

To randomize treatment assignment using the randtreat command, you must first reshape your data so that each observation is a unit at the level at which you want to randomize. In the sample code below, we’ll randomize at the school level, stratify using the fictional village_id variable, and assign any misfits globally. To do that, we’ll use Stata’s preserve and restore commands to keep our original dataset intact before reducing the data to the school level.

Sample code for randomizing treatment using the randtreat command

Remember that whenever you randomize, you should set a seed so that anyone who goes back to your coding file can replicate your exact randomization. The randtreat command has a built-in option to set the seed; in the code below, it is set to 12345.

* Randomize at the school-level
preserve
keep school_id village_id
duplicates drop
sort school_id

randtreat, generate(treatment) strata(village_id) ///
    misfits(global) multiple(2) setseed(12345) 
    
save "$temp/school_campus_treatment.dta", replace
restore

💡 Tips on using restore and preserve in Stata

The preserve and restore commands in Stata allow you to temporarily modify your dataset without losing the original version:

preserve: Saves the current state of your dataset in memory (including variables, observations, and sort order).
restore: Restores the dataset to the exact state it was in at the time of the most recent preserve.

This is useful when you want to make temporary changes (e.g., dropping variables, restricting the sample) for a specific calculation, but you don’t want those changes to affect the rest of your analysis.

The randtreat command also allows you to specify unequal allocation ratios. Suppose you have three groups (one comparison and two treatment groups), and you want to assign half of your sample to the comparison group and split the remaining half between the two treatment groups. You can use the unequal() option to do so, as in the sample code below.

Sample code for unequal treatment allocation

randtreat, generate(treatment) unequal(1/2 1/4 1/4) setseed(12345)

Since we’ve added three fractions, we don’t need to specify the number of treatment groups using multiple(). Note that the unequal() option only allows fractions, not decimals.

💡 What are misfits and how do I deal with them?

Misfits happen when, during the randomization process, the number of units in a group (like a stratum) doesn’t divide evenly across the treatment groups. For example, if you have 5 units and want to assign them evenly to two groups (treatment and comparison), one unit will be left over—this is a misfit.

In simple cases, such as random assignment to two groups without stratification, you can just randomly assign that leftover unit after dividing up the rest. But when working with multiple strata, you need to decide whether to balance treatment assignments within each stratum or across the whole sample.

The randtreat command has several options for dealing with misfits, set through misfits(). The misfits(strata) option allocates misfits independently within each stratum to maintain treatment balance across strata. The misfits(global) option allocates all misfits across the full sample to maintain overall treatment balance. Run help randtreat in your Stata console for more information.

Check out this paper by Carril (2017) for more details on dealing with misfits.
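
Before choosing a method, it can help to see how many misfits a design produces. The sketch below assumes the same school-level setup as the sample code above and uses misfits(missing), which leaves misfits unassigned; the variable name treat_check is made up for this illustration.

```stata
* Count misfits before deciding how to allocate them (sketch)
preserve
keep school_id village_id
duplicates drop
sort school_id

* misfits(missing) leaves misfit units with a missing treatment value
randtreat, generate(treat_check) strata(village_id) ///
    misfits(missing) multiple(2) setseed(12345)

count if missing(treat_check)   // number of misfit schools
restore
```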

Your randomization created the treatment variable. Check it out by running tab treatment, m in your Stata console.
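
Note that restore brings back the student-level data as it was before the randomization, so the assignment has to be merged back in before you can tabulate it. A minimal sketch, assuming the school-level file was saved to $temp as in the sample code above:

```stata
* Merge the school-level assignment onto the student-level data
merge m:1 school_id using "$temp/school_campus_treatment.dta", ///
    keepusing(treatment) nogenerate

tab treatment, m
```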

D. Estimating parameters from baseline data

In this section, we’ll walk through how to estimate parameters for power calculations from baseline data. Storing these estimates is helpful because it allows you to easily reference them later.

In Stata, estimates from commands like sum and regression commands (e.g., reg, areg) are automatically stored in return lists (e.g., r() and e()), and you can save specific values (like means, standard deviations, or coefficients) into local or global macros using commands as in the sample code below.

Sample code for storing returned results in local macros

sum variable
local sd = r(sd) // stores standard deviation

reg variable treatment 
local coef = _b[treatment] // stores coefficient
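
To see which results a command has stored, you can list them right after running it. The sketch below uses the baseline outcome from the codebook as an example variable.

```stata
sum base_aser_read_norm
return list                 // shows r(N), r(mean), r(sd), and so on
local mean = r(mean)
display "The baseline mean is `mean'."

reg base_aser_read_norm female
ereturn list                // shows e(N), e(r2), e(rmse), and so on
```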

D.1 Sample size

To get a good idea of our sample, we’ll use the treatment variable you created earlier containing treatment assignments. To estimate the number of clusters (schools) and average cluster size (students within a school) in each group for our power calculations, we first want to calculate the total number of students. We can easily do this with Stata’s count command, storing the returned results in global macros.

Sample code for storing the number of students in each group

* Total number of students
count if treatment == 0     // Comparison group
global students_c = r(N)
count if treatment == 1     // Treatment group
global students_t = r(N)

We’ll then get the number of clusters in each group by using the distinct command we installed earlier. This command counts the number of unique values of a variable, and since the data is at the student level, we’ll use it to avoid double counting schools.

Sample code for storing the number of schools in each group

* Number of clusters
distinct school_id if treatment == 0        // Comparison group
global clusters_c = r(ndistinct)    
distinct school_id if treatment == 1        // Treatment group
global clusters_t = r(ndistinct)

To get the average cluster size (i.e., number of students within each school), we’ll divide the total number of students by the total number of schools in each group (rounding down to the nearest integer value).

Sample code for estimating and storing the average number of students within a school

* Average cluster size
global cluster_size_c = floor($students_c / $clusters_c)
global cluster_size_t = floor($students_t / $clusters_t)

If we’d like to check out the results, we can use Stata’s display command and reference the global macros we stored above.

display "The number of clusters in the comparison and treatment groups is $clusters_c and $clusters_t, respectively."
display "The average cluster size in the comparison and treatment groups is $cluster_size_c and $cluster_size_t, respectively."

D.2 Outcome variance and ICC

To produce reliable power estimates, we need to estimate two key parameters from our baseline data: an outcome variance and the intra-cluster correlation (in the case of cluster-level designs). For the outcome variance, rather than using the raw standard deviation of our outcome of interest, we use the residual standard deviation—which captures how much variation remains after accounting for covariates (and strata, if included in our randomization design).

Assuming the same randomization design as in Section C, the sample code below estimates the residual standard deviation of the outcome variable oral Hindi test scores, base_aser_read_norm, controlling for the covariate female and stratification at the village_id level. The predicted residuals from the regression are summarized, and their standard deviation is stored in a global macro res_sd for later use.

Sample code for estimating and storing the residual standard deviation

* Outcome variance, baseline covariates, and strata
areg base_aser_read_norm female, absorb(village_id) vce(cluster school_id)
predict res, res
sum res  
global res_sd = r(sd)
display "The residual standard deviation is $res_sd."

💡 What actually is the residual standard deviation?

The residual standard deviation is the standard deviation of the residuals—the differences between the observed values of the outcome and the values predicted by the regression model. It reflects the amount of variation in the outcome that remains unexplained after accounting for the included covariates.
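
One way to see the link with model fit: for a linear regression with an intercept, the in-sample residual standard deviation equals the raw standard deviation of the outcome scaled by the share of variance left unexplained. The R² value in the example line is illustrative only, not taken from the study data.

```latex
\[
\sigma_{\text{res}} = \sigma_Y \sqrt{1 - R^2}
\]
% Illustrative example: \sigma_Y = 1 and R^2 = 0.40 give
% \sigma_{res} = \sqrt{0.60} \approx 0.775
```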

Finally, based on the same randomization design, we’ll want to estimate the intra-cluster correlation coefficient (ICC), which measures how similar individuals are within clusters (in this case, schools). The loneway command in Stata uses a one-way ANOVA to calculate the proportion of total variation in the outcome that is attributable to differences between clusters, and we store that value for later use.

Sample code for estimating the ICC

* Intracluster correlation (ICC)
loneway base_aser_read_norm school_id
global icc = r(rho)
display "The intra-cluster correlation is $icc."
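
As a quick check on how much clustering will cost in precision, the stored values can be combined into the design effect, 1 + (m − 1) × ICC. This sketch assumes the globals cluster_size_c and icc were created as in the sample code above; the name deff is made up for this illustration.

```stata
* Design effect implied by the ICC and the comparison-group cluster size
global deff = 1 + ($cluster_size_c - 1) * $icc
display "The design effect is $deff."
```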

E. Power calculations in practice

In practice, preliminary power calculations are typically conducted before finalizing a randomization design to ensure your design is sufficiently powered to detect the expected effect size. In this section, you’ll conduct power calculations for a few scenarios based on the study data before settling on a final randomization design. Throughout the section, we’ll revisit the case study from Part A.

For each calculation, keep the power level set to .80 and the significance level set to .05.

💡 How do I conduct power calculations for multiple treatment arms?

Remember that for randomization designs with multiple treatment arms, you’ll want to conduct power calculations for each pairwise comparison you plan to analyze. For example, if your study includes a control group, a CCE group, and a TaRL group, you should calculate power separately for the control vs. CCE comparison, control vs. TaRL comparison, and (if relevant) CCE vs. TaRL comparison (although in this case study, the authors did not power their study to detect a difference between the CCE and the TaRL group).

If your treatment arms are of equal size, and you are using the same outcome and assumptions (e.g., same variance and ICC), the power calculations for each comparison will be identical, so you’ll only need to do one power calculation. However, if your allocation ratios, expected effects, or assumptions vary across comparisons, the power calculations may differ.

E.1 Varying the level of randomization

The researchers have onboarded you to their study and have asked you to help them conduct preliminary power calculations!

Suppose they believe there is a high risk of spillovers between schools within the same village—for example, through regular interactions among teachers, students, or parents at community events and meetings. To reduce the chance of contamination between treatment and comparison groups, they decide to randomize treatment assignment at the village level, assigning all schools within a village to the same group.

Instructions: Estimate the MDE for standardized oral Hindi test scores (i.e., baseline mean of 0, standard deviation of 1) for a pairwise comparison of the control group to any one of the treatment groups. Assume that the treatment and comparison groups have 15 villages each, and the average number of students within a cluster is 34. The ICC of oral Hindi test scores is .10.

📝 What is the smallest effect on test scores your study would be able to find with this design?

The minimum detectable effect is an increase in test scores of 0.33 standard deviations, which is quite a large effect size!

The researchers revisit their assumptions and find that while some interactions occur within villages, schools within the same school campuses operate more independently, with limited collaboration among teachers and little contact between parents of different school campuses. To increase statistical power while still limiting the risk of spillovers, they decide to randomize at the school campus level in order to increase the number of clusters.

Instructions: Estimate the MDE for standardized oral Hindi test scores (i.e., baseline mean of 0, standard deviation of 1) for a pairwise comparison of the control group to any one of the treatment groups. Assume that the number of school campuses in the comparison and treatment groups is 90 each, and the average number of students within a school campus is 34. Since you are randomizing at the school campus level and believe that students within a school campus are very similar to each other, you also increase the ICC to .11.

📝 What is the smallest effect on test scores your study would be able to find with this design?

The minimum detectable effect is an increase in test scores of 0.154 standard deviations.
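
One possible way to set up this calculation is sketched below, following the cluster-level MDE syntax from Section B.2 with the parameters given in the instructions.

```stata
* MDE when randomizing at the school campus level
power twomeans 0, k1(90) k2(90) m1(34) m2(34) ///
    rho(.11) sd(1) power(.8) alpha(.05)
display "MDE: " %5.3f r(delta)
```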

E.2 Varying cluster size

The baseline data that the researchers are using was collected in 2011. Suppose they wanted to run their study several years later and were not sure how the average number of students within schools would change, so they ask you to check what power would look like under a few scenarios.

Instructions: Estimate the MDE for standardized oral Hindi test scores (i.e., baseline mean of 0, standard deviation of 1) for a pairwise comparison of the control group to any one of the treatment groups. Assume that the treatment and comparison groups have 90 school campuses each, and vary the average number of students within a school from 15 to 40, with an interval of 5. Keep the ICC at .11.

Coding Hint

Remember that you can include intervals in the power command. For example, the code chunk below estimates the MDE for several scenarios that differ in the number of clusters per group, starting at 60 and increasing by 5 until reaching 90.

power twomeans 100, m1(20) mratio(1) k1(60(5)90) kratio(1) rho(.1) sd(100) power(.8)

The kratio(1) option specifies that the number of clusters is equal across the treatment and comparison group.

📝 How does the MDE vary based on the different average cluster sizes?

At the lowest cluster size of 15, the MDE of the study would be .17 standard deviations, while at the highest cluster size of 40, the MDE of the study would be .15 standard deviations.
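
A sketch of one way to run these scenarios in a single command, adapting the coding hint so that the interval is placed in m1() rather than k1():

```stata
* MDE for average cluster sizes of 15, 20, ..., 40 with 90 campuses per group
power twomeans 0, k1(90) kratio(1) m1(15(5)40) mratio(1) ///
    rho(.11) sd(1) power(.8) alpha(.05)
```

The MDE changes little across scenarios because, with an ICC of .11, students in the same campus carry overlapping information, so adding students per cluster improves precision far less than adding clusters would.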

E.3 Unequal treatment allocation

Suppose the researchers are worried they might face budget constraints that limit how many schools they can support with the interventions. To maximize statistical precision for comparing each treatment arm against the comparison group, they want to check whether assigning half of the sample of school campuses to the comparison group would significantly reduce their power. The remaining half would then be equally divided among the three treatment arms, reducing the overall cost of the interventions.

Instructions: Estimate the MDE for standardized oral Hindi test scores (i.e., baseline mean of 0, standard deviation of 1) for a pairwise comparison of the control group to any one of the treatment groups. Assume that half of the school campuses (180) are in the comparison group, and each treatment group has 60 school campuses. Keep the average number of students at 34 and the ICC at .11. Compare this MDE to what your MDE would be under equal treatment allocation.

📝 What is the difference between the MDE when half of the sample is allocated to the control group compared to when treatment allocation is equal across the control and treatment groups?

Luckily, there is no difference! The MDE under both scenarios is equal to .15 standard deviations. However, this is not always the case. For example, shifting a bit more of your sample from the treatment group to the comparison group, as in the code below, increases the MDE by almost .03 standard deviations (from about .15 to about .18)!

power twomeans 0, k1(200) k2(40) m1(34) mratio(1) rho(.11) sd(1) power(.8) alpha(.05)

E.4 Accounting for covariates in power calculations

As you saw earlier, covariates can play a large role in improving the precision of your estimates and increasing statistical power by explaining some of the variation in the outcome variable. To improve the precision of their power estimates, the researchers ask you to estimate the residual standard deviation of their main outcome variable, standardized oral Hindi test scores.

Instructions: Regress standardized oral Hindi test scores on students’ gender, grade at baseline, and age in years. Cluster standard errors at the school campus level. Use the predicted residuals to calculate the residual standard deviation.

📝 What is the residual standard deviation when accounting for the above covariates?

The residual standard deviation is .774. Remember that a standardized variable has a standard deviation of 1, so accounting for covariates has reduced the residual standard deviation by more than 20% (and the unexplained variance, which is the square of the standard deviation, by about 40%)!

Instructions: Using the residual standard deviation you calculated above, estimate the MDE of standardized oral Hindi test scores for a pairwise comparison of the control group to any one of the treatment groups. Assume equal allocation of school campuses to the comparison and treatment groups (90 each), 34 students within a school campus on average, and an ICC of .11.

📝 What is the MDE you’d be able to detect when accounting for these covariates?

The MDE you’d be able to detect when using the residual standard deviation of .774 is .12 standard deviations.
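
The two steps in this section could be sketched as follows. The code assumes baseline grade enters the regression linearly (i.base_standard is an alternative), and the names res_cov and res_sd_cov are made up for this illustration, so your own estimate may differ slightly depending on the specification.

```stata
* Residual SD after controlling for gender, baseline grade, and age
reg base_aser_read_norm female base_standard age_years, ///
    vce(cluster super_school_id)
predict res_cov, residuals
sum res_cov
global res_sd_cov = r(sd)

* MDE using the residual SD in place of sd(1)
power twomeans 0, k1(90) k2(90) m1(34) m2(34) ///
    rho(.11) sd($res_sd_cov) power(.8) alpha(.05)
```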

E.5 Accounting for imperfect compliance

Thus far, you’ve assumed perfect compliance in your power calculations—that is, everyone assigned to a treatment group receives the treatment and everyone in the control group does not receive the treatment. In reality, this assumption often doesn’t hold.

Suppose the researchers want you to conduct sensitivity analyses for their power calculations in case some students in the treatment groups do not receive their assigned treatment due to implementation delays or attendance.

Instructions: Using the residual standard deviation you calculated earlier, estimate the MDE of standardized oral Hindi test scores for a pairwise comparison of the control group to any one of the treatment groups. Calculate the MDE under two different assumptions about treatment take-up: 70% and 90%. Assume equal allocation of school campuses to the treatment and control group (90 each), 34 students within a school campus on average, and an ICC of .11.

Hint

If take-up is imperfect, we can account for it in our power calculations by adjusting the minimum detectable effect (MDE). Specifically, when estimating the treatment-on-the-treated (TOT) effect, the MDE increases as take-up decreases. To approximate this, divide the MDE for the intention-to-treat (ITT) effect by the take-up rate.

For example, if your ITT MDE is 0.10 and you expect a 50% take-up rate, the implied MDE for the TOT effect is \[ \frac{0.10}{0.5} = 0.20 \]

This means that lower take-up reduces your study’s ability to detect a given treatment effect.

📝 What is the MDE when take-up in the treatment group is 70%?

Under take-up of 70%, the minimum detectable effect would be .17 standard deviations.

📝 What is the MDE when take-up in the treatment group is 90%?

Under take-up of 90%, the minimum detectable effect would be .13 standard deviations.
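
A minimal sketch of this adjustment, following the hint above: compute the ITT MDE once, then divide it by each take-up rate. The residual standard deviation of .774 is taken from Section E.4, and the local names are made up for this illustration.

```stata
* ITT MDE under perfect compliance
power twomeans 0, k1(90) k2(90) m1(34) m2(34) ///
    rho(.11) sd(.774) power(.8) alpha(.05)
local mde_itt = r(delta)

* Scale the ITT MDE by each assumed take-up rate
foreach takeup in 0.7 0.9 {
    display "Take-up `takeup': MDE = " %5.3f `mde_itt' / `takeup'
}
```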

F. Randomization in practice

F.1 Randomize treatment assignment at the school campus level

Based on the preliminary calculations you did for the researchers, they decide to randomize treatment assignment to 4 groups at the school campus level. Given that your estimated MDE is not as low as they’d like it to be, they also decide to stratify treatment by average baseline test scores at the school-level, and have created the stratum variable for you to do so.

Instructions: Randomize treatment assignment of school campuses, super_school_id, to four groups of equal proportions. Stratify using the stratum variable and set your seed to 12345. Handle misfits locally.

💡 How were the strata created?

The stratum variable was constructed by first grouping schools by block and school type (lower, upper, or both). Within each group, school campuses were sorted by average baseline test scores and grouped into strata of four school campuses each to enable stratified random assignment across four treatment arms.

📝 How many students are assigned to each treatment group?

The comparison group has 3,137 students assigned to it, while the first, second, and third treatment groups have 3,035, 3,061, and 3,343 students assigned to them, respectively.

F.2 Storing parameters

Now that you’ve conducted the randomization process, the researchers want you to do one last power analysis based on the actualized randomization design. But first, you’ll need to get estimates to input into your power calculation.

Instructions: Using global macros, estimate and store the number of school campuses assigned to the comparison group and the first treatment group (the CCE group). Then, estimate and store the total number of students in the comparison group and the first treatment group. Round to the lowest integer.

📝 What is the number of school campuses in the comparison group and first treatment group?

The number of school campuses in the comparison group is 91, and the number of school campuses in the first treatment group is 90.

Instructions: Using global macros, estimate and store the average number of students in each school campus in the comparison group and the first treatment group (the CCE group), rounding to the lowest integer.

📝 What is the average number of students in each school campus in the comparison group and first treatment group?

The average cluster size in the comparison group is 34, and the average cluster size in the first treatment group is 33.

In addition to accounting for covariates that can help explain the variance in oral Hindi test scores, the researchers also stratified random assignment by the stratum variable and want to reflect that in their final power calculation.

Instructions: Estimate and store the residual standard deviation of standardized oral Hindi test scores by running a regression that accounts for students’ gender, grade at baseline, and age in years. Be sure to include fixed effects for the original randomization strata (stratum), and cluster standard errors at the school campus level.

Coding Hint

You can use i.stratum to include stratum fixed-effects in your regression model using reg, or if you prefer not to see the output of the fixed-effects, you can use areg instead, and add the option absorb(stratum).

📝 What is the residual standard deviation after accounting for the above covariates and stratification by the stratum variable?

The residual standard deviation is .774.

Instructions: Estimate and store the ICC of standardized oral Hindi test scores (base_aser_read_norm). Remember that you have randomized at the school campus level!

📝 What is the ICC of oral Hindi test scores?

The ICC of oral Hindi test scores is .097.

F.3 Final power calculations

Now that you have all of your estimates based on the baseline data, the researchers ask you to conduct one last power calculation to confirm the MDE that the study would be able to detect.

Instructions: Using your stored estimates, calculate the MDE of the study for oral Hindi test scores for one pairwise comparison between the comparison group and first treatment group (the CCE group).

📝 What is the MDE of the study based on the selected randomization design?

The MDE of the study is .114 standard deviations.
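
The final calculation could be sketched as below. It assumes you stored your estimates in globals named as in Section D (clusters_c, clusters_t, cluster_size_c, cluster_size_t, icc, and res_sd); substitute your own macro names if they differ.

```stata
* Final MDE: comparison group vs. first treatment group (CCE)
power twomeans 0, k1($clusters_c) k2($clusters_t) ///
    m1($cluster_size_c) m2($cluster_size_t) ///
    rho($icc) sd($res_sd) power(.8) alpha(.05)
display "MDE: " %5.3f r(delta)
```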

G. Bonus: Baseline balance

Even with proper randomization, small differences between groups can occur due to chance. This is expected. However, large or systematic differences may suggest problems with the randomization process or the implementation of the assignment. Therefore, before implementing the randomization assignment in the field, the researchers want you to conduct balance checks to see whether the randomization resulted in any imbalance across key variables.

So, how do you check for baseline balance? Balance is typically checked by comparing the means of key baseline variables across treatment groups. This is often done using regression analysis or t-tests to see whether any differences are statistically significant.

💡 Do we always need to conduct balance tests?

Checking baseline balance is not always necessary, and in some cases may not be possible if baseline data was not collected. Check out this blog by David McKenzie on when testing for baseline balance might make sense.

Instructions: Store the variables female, age_years, base_standard, and base_aser_read_norm in a global macro and generate dummies for each treatment group using the treatment variable you created earlier. You should end up with three dummy variables, one for each treatment group: CCE, TARL, and CCE_TARL.

Coding Hint

To store multiple variables in a global macro, list them one after the other within the quotation marks, as below.

global outcomelist_base "female age_years base_standard base_aser_read_norm"

To create dummy variables for each treatment group, you can use Stata’s gen and replace commands. The dummy for the CCE group is created for you below.

gen CCE = . 
replace CCE = 1 if treatment == 1 
replace CCE = 0 if inlist(treatment, 0, 2, 3)
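
The remaining two dummies can follow the same pattern. The sketch below assumes treatment == 2 is the TaRL-only group and treatment == 3 is the combined group, which the exercise text does not state explicitly; check the coding of your own treatment variable first.

```stata
* Sketch: dummies for the other two arms (the coding 2 = TaRL only and
* 3 = CCE + TaRL is an assumption). The if condition leaves the dummy
* missing wherever treatment itself is missing.
gen TARL = (treatment == 2) if !missing(treatment)
gen CCE_TARL = (treatment == 3) if !missing(treatment)

* Sanity check: no observation belongs to more than one treatment arm
assert CCE + TARL + CCE_TARL <= 1 if !missing(treatment)
```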

📝 In which treatment group is the school campus with the ID number 10?

The school campus with ID #10 is in the third treatment group.

To check baseline balance, you’ll need to run a linear regression of each baseline variable on indicators for each treatment group. You’ll also need to estimate and store the comparison group averages of the baseline variables so you can compare them to the treatment groups!

Instructions: Use a foreach loop to go through every variable listed in the global macro you created. For each variable:

Run a regression of the outcome on the indicators CCE, TARL, CCE_TARL
Control for stratification by using the absorb(stratum) option if using areg, or by adding i.stratum to your reg command
Cluster standard errors at the super_school_id level using the cluster() option
Compute and store the mean and standard deviation of the outcome variable for the control group (i.e., observations where CCE == 0, TARL == 0, and CCE_TARL == 0).
Manually (with code) calculate the p-values for treatment coefficients
Append the results to a matrix

Coding Hint

The code below runs through each variable in the $outcomelist_base global, and runs a regression of the outcome on the dummy variables for each treatment group. It stores the comparison group mean and standard deviation for each variable, and then manually calculates the p-values from the regression. Finally, it stores the results in a matrix called balance_table.

* Start from an empty matrix so that rerunning the loop does not append duplicate rows
capture matrix drop balance_table

* Loop through each baseline variable
foreach var in $outcomelist_base {

    * Regress the variable on the treatment dummies with stratum fixed effects
    quietly areg `var' CCE TARL CCE_TARL, cluster(super_school_id) absorb(stratum)

    * Get comparison group mean and standard deviation
    quietly summarize `var' if e(sample)==1 & CCE == 0 & TARL == 0 & CCE_TARL == 0
    local control_mean = r(mean)
    local control_sd = r(sd)

    * Get two-sided p-values from the t-statistics
    local p1 = 2 * ttail(e(df_r), abs(_b[CCE] / _se[CCE]))
    local p2 = 2 * ttail(e(df_r), abs(_b[TARL] / _se[TARL]))
    local p3 = 2 * ttail(e(df_r), abs(_b[CCE_TARL] / _se[CCE_TARL]))

    * Append two rows: mean and coefficients, then SD and p-values
    mat balance_table = nullmat(balance_table) \ ///
        (`control_mean', _b[CCE], _b[TARL], _b[CCE_TARL]) \ ///
        (`control_sd', `p1', `p2', `p3')
}
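
To check that a manually calculated p-value matches what Stata would report, one option is to compare it with the result of the test command, which stores its p-value in r(p). The sketch below does this for the CCE coefficient in the oral Hindi regression; p_manual is a hypothetical local name.

```stata
* Sketch: verify the manual p-value for CCE against Stata's Wald test
quietly areg base_aser_read_norm CCE TARL CCE_TARL, ///
    cluster(super_school_id) absorb(stratum)
local p_manual = 2 * ttail(e(df_r), abs(_b[CCE] / _se[CCE]))

* test reports an F-test with 1 numerator df, equivalent to the t-test
quietly test CCE
display "Manual p-value: " %6.4f `p_manual' "   test p-value: " %6.4f r(p)

* The two p-values should agree up to rounding error
assert reldif(`p_manual', r(p)) < 1e-6
```

Running test CCE TARL CCE_TARL after the same regression would give a joint test of whether all three treatment coefficients are zero for that variable.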

Instructions: Add column names and row labels to your matrix, and print it to your Stata console to see the results.

Coding Hint

You can add column names to the matrix using the line below.

mat colnames balance_table=control CCE_only TARL_only CCE_TARL

For simplicity, you can base the row labels on the variable names, as below.

local rowlabels
foreach var in $outcomelist_base {
    local rowlabels `rowlabels' `var'_mean `var'_sd
}

matrix rownames balance_table = `rowlabels'

Finally, to print your matrix, which contains the results from the balance checks, you can run the line below.

* Print balance table
matrix list balance_table

📝 Are any of the variables imbalanced across the comparison and treatment groups? How do you know?

Students in the CCE group scored, on average, 0.071 standard deviations lower at baseline on oral Hindi test scores compared to the pure control group (no CCE or TARL), and this difference is statistically significant at the 5% level. This suggests imperfect balance on this variable.
