The Logic of the Gold Standard

In Week 1, we established that the simple comparison of Treated vs. Untreated groups is biased because: \[\text{Bias} = E[Y_{0i} | D_i = 1] - E[Y_{0i} | D_i = 0] \neq 0\] (The treated group is different from the untreated group even before the intervention.)

The RCT Solution:

By assigning treatment (\(D\)) randomly (e.g., a lottery), we sever the link between a participant’s characteristics and their treatment status.

  • Rich or poor? Random chance decides treatment.

  • Motivated or lazy? Random chance decides treatment.

The Logic of the Gold Standard

The Mathematical Result:

Because treatment is random, the potential outcomes are statistically independent of the treatment assignment, so the two groups are balanced in expectation. \[E[Y_{0i} | D_i = 1] = E[Y_{0i} | D_i = 0]\] Therefore: \[\text{Selection Bias} = 0\] and the Simple Difference in Outcomes (SDO) becomes the true Causal Effect.
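
To make this concrete, here is a minimal simulation sketch (not part of the case studies below; all numbers are made up) showing that once treatment is a coin flip, the simple difference in outcomes recovers the true effect:

# Illustrative simulation: random assignment makes SDO = ATE
set.seed(1)
n  <- 100000
y0 <- rnorm(n, mean = 50)                  # potential outcome without treatment
y1 <- y0 + 5                               # true causal effect = +5 for everyone
d  <- sample(c(0, 1), n, replace = TRUE)   # random assignment (coin flip)
y  <- ifelse(d == 1, y1, y0)               # observed outcome
mean(y[d == 1]) - mean(y[d == 0])          # ~ 5: no selection bias remains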

Randomization Strategies

Not all RCTs look the same. The design depends on the unit of analysis, logistical constraints, and sample size.

A. Simple Randomization

  • Method: Flip a coin for every person. Everyone has an equal probability (e.g., 50/50) of treatment.

  • Pros: Easy to explain; statistically robust in large samples.

  • Cons: In small samples (\(N < 300\)), you might get “unlucky” imbalances (e.g., the treatment group happens to be 70% female).

Randomization Strategies

B. Stratified (Block) Randomization

  • Method: Divide the population into “strata” or blocks based on key characteristics (e.g., Gender, Region) before randomizing. Then, randomize within each block.

    • Example: Separate list into Men and Women. Randomize 50% of Men to Treatment, 50% of Women to Treatment.
  • Pros: Guarantees balance on key variables; increases statistical power by reducing variance.

  • Cons: You need baseline data before you can assign treatment.
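
A minimal sketch of how block randomization might be coded with dplyr (hypothetical data; gender is the only stratifying variable here):

# Stratified (block) randomization: randomize within each gender block
library(dplyr)
set.seed(42)
pop <- data.frame(id = 1:200,
                  gender = rep(c("Male", "Female"), each = 100))
pop <- pop %>%
  group_by(gender) %>%                                             # one block per gender
  mutate(treatment = sample(rep(c(0, 1), length.out = n()))) %>%   # exact 50/50 within block
  ungroup()
table(pop$gender, pop$treatment)   # balance is guaranteed within each block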

Randomization Strategies

C. Cluster Randomization

  • Method: Randomize groups (clusters) rather than individuals.

    • Example: Randomize schools to receive a new curriculum, not individual students.
  • Why use it?

    • Spillovers: If you treat one student in a class, they might share the materials with their neighbor (Control), contaminating the experiment.

    • Logistics: It is impossible to paint half a classroom blue.

  • The Cost: Drastically reduces statistical power. (See Intracluster Correlation below).
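
A minimal sketch of cluster assignment (hypothetical school and student identifiers): randomize the 20 schools, then let every student inherit their school's status.

# Cluster randomization: the unit of randomization is the school, not the student
set.seed(7)
schools <- data.frame(school_id = 1:20)
schools$treatment <- sample(rep(c(0, 1), each = 10))    # 10 treated, 10 control schools
students <- data.frame(student_id = 1:2000,
                       school_id  = sample(1:20, 2000, replace = TRUE))
students <- merge(students, schools, by = "school_id")  # each student inherits the school's arm
table(students$treatment)                               # student counts follow the school lottery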

Randomization Strategies

D. Phase-In (Pipeline) Design

  • Method: Everyone eventually gets the treatment, but the timing is randomized.

    • Group A: Treated in Year 1.

    • Group B: Treated in Year 2 (acts as Control for Year 1).

  • Pros: Ethically easier (everyone benefits eventually).

  • Cons: You lose your control group once Group B gets treated. Long-term effects are hard to measure.

Power Analysis: Designing the Sample

Before starting an RCT, you must answer: “How many people do I need?” This is Power Analysis.

If your sample is too small, you might fail to detect a positive impact even if it exists (Type II Error). This is called being “Underpowered.”

A. The Four Moving Parts

To calculate the sample size (\(N\)), the fourth moving part, you first fix the other three parameters:

  • Significance Level (\(\alpha\)): usually set to 0.05 (5%). The risk of a False Positive (saying there is an impact when there isn’t).
  • Power (\(1 - \beta\)): usually set to 0.80 (80%). The probability of correctly finding an effect if it truly exists.
  • Minimum Detectable Effect (MDE): the smallest impact you care about. Crucial Intuition: If you want to find a very small effect (e.g., a 1% increase in income), you need a massive sample size. If you are looking for a huge effect (e.g., curing a deadly disease), you need fewer people.
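
With the pwr package used in the demo later, you can also run the calculation in reverse: fix the sample size and solve for the MDE. A small sketch, with an illustrative 500 people per group:

# What is the smallest effect a fixed sample can detect?
library(pwr)
mde <- pwr.t.test(n = 500, sig.level = 0.05, power = 0.80,
                  type = "two.sample", alternative = "two.sided")
mde$d   # minimum detectable effect in standard-deviation units (Cohen's d)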

Power Analysis: Designing the Sample

B. The Inverse Square Rule

Sample size requirements grow with the square of the effect size.

  • To detect an effect that is half as big, you need four times as much data.\[N\propto \frac{1}{MDE^2}\]
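
A quick check of this rule with pwr (illustrative effect sizes): halving Cohen's d roughly quadruples the required n per group.

library(pwr)
pwr.t.test(d = 0.40, sig.level = 0.05, power = 0.80)$n   # ~99 per group
pwr.t.test(d = 0.20, sig.level = 0.05, power = 0.80)$n   # ~394 per group (about 4x)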

C. The “Design Effect” (Cluster Penalty)

If you use Cluster Randomization (Strategy C), your effective sample size is not the number of people, but closer to the number of clusters.

  • The effective sample size decreases as the Intracluster Correlation (ICC) increases.

  • ICC: How similar people are within a cluster. (e.g., Students in the same class tend to have similar test scores). High similarity = Low information per new student.
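
A small sketch of the standard design-effect adjustment, \(DEFF = 1 + (m - 1)\times ICC\), with illustrative numbers (40 clusters of 25 people, ICC of 0.10):

# Cluster "design effect": how much a clustered sample is worth
n_clusters   <- 40
cluster_size <- 25
icc          <- 0.10
n_total      <- n_clusters * cluster_size      # 1,000 individuals surveyed
deff         <- 1 + (cluster_size - 1) * icc   # 3.4
n_effective  <- n_total / deff                 # ~294 "effective" independent observations
c(total = n_total, design_effect = deff, effective = round(n_effective))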

Analysis Principles

A. The Balance Test

The first table in any RCT paper is the “Balance Table.” It compares the baseline characteristics of Treatment vs. Control.

  • Goal: To show no statistically significant differences.
  • Check: If \(p > 0.05\) for all variables (Age, Income, Education), the randomization worked.

Analysis Principles

B. Intent-to-Treat (ITT) vs. Treatment-on-Treated (TOT)

What happens if you assign someone to the “Gym Program” (\(D=1\)), but they never show up?

Intent-to-Treat (ITT):

  • Compare everyone assigned to treatment vs. everyone assigned to control.
  • Rule: “Once randomized, always analyzed.”
  • Pros: This is the only unbiased estimate because it preserves the randomization. It measures the impact of the policy offering.

Treatment-on-Treated (TOT):

  • Tries to measure the impact only on those who actually participated.
  • Warning: This re-introduces selection bias (compliers are different from non-compliers). We usually estimate this using Instrumental Variables (IV), where the assignment is the instrument for participation.
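
A minimal sketch of the ITT-to-TOT logic on simulated data (the simple Wald ratio, which is the IV estimate when assignment is the instrument and there are no covariates; variable names are illustrative):

# ITT vs TOT with one-sided non-compliance (simulated data)
set.seed(10)
n <- 5000
z <- sample(c(0, 1), n, replace = TRUE)     # random assignment (the instrument)
d <- ifelse(z == 1, rbinom(n, 1, 0.6), 0)   # only 60% of the assigned actually participate
y <- 10 + 5 * d + rnorm(n)                  # true effect of participation = +5
itt  <- mean(y[z == 1]) - mean(y[z == 0])   # effect of the *offer*
take <- mean(d[z == 1]) - mean(d[z == 0])   # compliance rate (first stage)
tot  <- itt / take                          # Wald / IV estimate, ~5
c(ITT = itt, Compliance = take, TOT = tot)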

R Demo

Here is a complete R script designed for Checking Balance (proving randomization worked) and Power Analysis.

Setup: You will need the pwr, ggplot2, and dplyr packages.

# Install necessary packages (run once)
if(!require(pwr)) install.packages("pwr")
if(!require(ggplot2)) install.packages("ggplot2")
if(!require(dplyr)) install.packages("dplyr")

library(pwr)
library(ggplot2)
library(dplyr)

R Demo

Part 1: Simulating Randomization and The “Balance Test”

# 1. Create a Synthetic Population of 1,000 people
set.seed(123) # Set seed for reproducibility

N <- 1000
data <- data.frame(
  id = 1:N,
  # Background characteristics (Pre-treatment)
  age = rnorm(N, mean = 35, sd = 10),       # Average age 35
  income = rnorm(N, mean = 50000, sd = 15000), # Average income 50k
  motivation = runif(N, min=1, max=10)      # Motivation score 1-10
)

# 2. Perform Simple Randomization (Coin Flip)
# Assign 50% to Treatment (1) and 50% to Control (0)
data$treatment <- sample(c(0,1), N, replace = TRUE, prob = c(0.5, 0.5))

R Demo

Part 1: Simulating Randomization and The “Balance Test”

# 3. Check the Balance (The "Balance Table")
# Calculate means for both groups
balance_table <- data %>%
  group_by(treatment) %>%
  summarise(
    Avg_Age = mean(age),
    Avg_Income = mean(income),
    Avg_Motivation = mean(motivation),
    Count = n()
  )
print("--- Balance Table (Means) ---")
## [1] "--- Balance Table (Means) ---"
print(balance_table)
## # A tibble: 2 × 5
##   treatment Avg_Age Avg_Income Avg_Motivation Count
##       <dbl>   <dbl>      <dbl>          <dbl> <int>
## 1         0    35.2     50117.           5.32   493
## 2         1    35.1     51143.           5.48   507

R Demo

Part 1: Simulating Randomization and The “Balance Test”

# 4. Formal Statistical Test (t-test)
# If p > 0.05, we cannot reject the null hypothesis that groups are the same.
# (i.e., Randomization worked!)

t_test_income <- t.test(income ~ treatment, data = data)

print(paste("P-value for Income difference:", round(t_test_income$p.value, 3)))
## [1] "P-value for Income difference: 0.284"
if(t_test_income$p.value > 0.05) {
  print("Result: BALANCED. No statistically significant difference found.")
} else {
  print("Result: IMBALANCED. Significant difference found (Bad luck!).")
}
## [1] "Result: BALANCED. No statistically significant difference found."

Notice how close the means are in the balance_table. They won’t be identical, but they will be very close.

Key Insights

  • If the means of the Treatment and Control groups are “close,” it confirms that Randomization succeeded.

  • This is the practical proof that you have solved the Selection Bias problem.

  • The “Apples to Apples” Confirmation. Because the two groups look the same before the experiment starts, you can be confident that the only meaningful difference between them is the Treatment.

  • How “Close” is Close Enough? We don’t just “eyeball” the numbers; we run a test.

    • If p > 0.05: The difference between the means is small enough that it could easily be just random noise. We consider the groups Balanced.

    • If p < 0.05: The difference is large enough that it is unlikely to be random. This suggests Randomization Failed (or you got incredibly unlucky), and your experiment might be biased.

R Demo

Part 2: Power Analysis Calculation

This section answers the question: “How many people do I need?”

Scenario:

  • We expect our program to increase income by $2,000.
  • The standard deviation of income is $15,000.
  • This is a small effect size.
# 1. Calculate Cohen's d (Standardized Effect Size)
effect_size_raw <- 2000
std_dev <- 15000
cohens_d <- effect_size_raw / std_dev

print(paste("Effect Size (Cohen's d):", round(cohens_d, 3)))
## [1] "Effect Size (Cohen's d): 0.133"

R Demo

Part 2: Power Analysis Calculation

# 2. Calculate Sample Size needed
# We want 80% Power (0.8) and 5% Significance Level (0.05)
power_calc <- pwr.t.test(
  d = cohens_d,
  sig.level = 0.05,
  power = 0.80,
  type = "two.sample",
  alternative = "two.sided"
)

R Demo

Part 2: Power Analysis Calculation

## 
##      Two-sample t test power calculation 
## 
##               n = 883.9582
##               d = 0.1333333
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
## [1] "You need 884 people in Treatment and 884 in Control."
## [1] "Total participants required: 1768"

R Demo

If you change effect_size_raw from 2000 to 5000, the required sample size drops sharply (larger effects are easier to detect):

## [1] "Effect Size (Cohen's d): 0.333"
## [1] "--- Sample Size Calculation ---"
## 
##      Two-sample t test power calculation 
## 
##               n = 142.2462
##               d = 0.3333333
##       sig.level = 0.05
##           power = 0.8
##     alternative = two.sided
## 
## NOTE: n is number in *each* group
## [1] "You need 143 people in Treatment and 143 in Control."
## [1] "Total participants required: 286"

R Demo

Part 3: Visualizing the “Power Curve”

This helps you visualize the trade-off: Smaller effects require massively larger samples.

# Generate a sequence of effect sizes (from tiny to huge)
effect_sizes <- seq(0.1, 0.8, by = 0.05)
sample_sizes <- c()

# Loop through to calculate N for each effect size
for (d in effect_sizes) {
  res <- pwr.t.test(d = d, sig.level = 0.05, power = 0.80)
  sample_sizes <- c(sample_sizes, res$n)
}

R Demo

Part 3: Visualizing the “Power Curve”

This helps you visualize the trade-off: Smaller effects require massively larger samples.

# Plot
plot_data <- data.frame(EffectSize = effect_sizes, SampleSize = sample_sizes)

a<-ggplot(plot_data, aes(x = EffectSize, y = SampleSize)) +
  geom_line(color = "blue", size = 1.2) +
  geom_point(color = "red", size = 3) +
  theme_minimal() +
  labs(
    title = "The Inverse Square Law of Power",
    subtitle = "As the effect you want to find gets smaller, sample size gets bigger.",
    x = "Effect Size (Cohen's d)",
    y = "Sample Size Required (Per Group)"
  )

R Demo

Part 3: Visualizing the “Power Curve”

(Plot: “The Inverse Square Law of Power” — required sample size per group rises steeply as the effect size shrinks.)

Case Study 1: 4Ps

Case Study: Evaluation of the 4Ps Program (CCT)

To understand how an RCT works in practice, let’s look at the Philippines’ flagship social protection program: the Pantawid Pamilyang Pilipino Program (4Ps).

A. The Policy Question

The government spends billions of pesos on 4Ps. Policymakers need to answer a simple causal question:

“Does giving cash to poor households cause their children to stay in school?”

Or would those children have gone to school anyway?

B. The Design (RCT)

We cannot simply compare 4Ps beneficiaries to non-beneficiaries (Observational Study).

Why? Because 4Ps targets the poorest of the poor. Their attendance rates are naturally lower than the general population’s due to poverty (child labor, lack of transport fare).

The Bias: Comparing 4Ps families to regular families would suffer from Negative Selection Bias, making the program look like a failure.

Case Study 1: 4Ps

The RCT Solution:

  • Population: Identify 2,000 eligible poor households in rural municipalities (e.g., Samar or Bukidnon) who qualify for the program.

  • Randomization: Use a lottery to divide them into two groups.

    • Treatment Group (\(N=1,000\)): Receives the monthly cash grant. Condition: Children must attend 85% of school days.

    • Control Group (\(N=1,000\)): Does not receive the cash during the study period (perhaps waitlisted for next year).

Case Study 1: 4Ps

C. Checking the “Counterfactual” (The Balance Test)

Before the money is distributed, we must prove the lottery worked. We look at the Baseline Data.

Variable              Treatment Group (A)   Control Group (B)   Difference
Avg. Daily Income     ₱150.00               ₱148.50             Not Significant
Mother’s Education    6.2 Years             6.1 Years           Not Significant
Baseline Attendance   78%                   79%                 Not Significant
Distance to School    2.5 km                2.4 km              Not Significant

Crucial Point:

Because the groups are statistically identical at the start (\(Y_{0}^{Treat} = Y_{0}^{Control}\)), any difference we see next year is purely due to the 4Ps cash grant. We have eliminated the confounding variables (like distance to school or mother’s education).

Case Study 1: 4Ps

D. The Result

One year later, we measure the School Attendance Rate (\(Y\)).

  • Treatment Group Avg: 92%
  • Control Group Avg: 80%

\[\text{Impact} = 92\% - 80\% = +12 \text{ percentage points}\]

Conclusion:

The 4Ps program caused a 12 percentage point increase in attendance. We can be confident this wasn’t due to luck or selection bias.

Discussion Prompt

1. The Ethics of the Control Group

Isn’t it unethical to have a Control Group? We have identified poor households who need help, but we are deliberately withholding money from half of them just to prove a point. Isn’t that cruel?

Discussion Prompt

1. The Ethics of the Control Group

Isn’t it unethical to have a Control Group? We have identified poor households who need help, but we are deliberately withholding money from half of them just to prove a point. Isn’t that cruel?

The Argument of Scarcity (The “Budget Constraint” Reality)

  • Context: In the Philippines (and most developing nations), the government rarely has enough budget to cover 100% of the eligible population immediately.

  • The Logic: If DSWD has funds for only 1,000 families but 2,000 families are eligible, someone has to be left out regardless of the study.

  • The Ethical Pivot: Without an RCT, who gets the money? Often, it goes to those with political connections (“Palakasan”) or those living nearest to the road.

  • Conclusion: A Random Lottery is actually the fairest way to distribute scarce resources. Everyone has an equal chance. The RCT simply takes advantage of this necessary rationing.

Discussion Prompt

1. The Ethics of the Control Group

Isn’t it unethical to have a Control Group? We have identified poor households who need help, but we are deliberately withholding money from half of them just to prove a point. Isn’t that cruel?

The Argument of Uncertainty (Clinical Equipoise)

  • Context: We honestly don’t know if the program works.

  • The Logic: What if the program actually harms people? (e.g., What if the 4Ps requirements take time away from parents working, making the family poorer?).

  • Conclusion: It is unethical to spend millions of taxpayers’ money on a program that might not work. We have a moral obligation to test it rigorously before scaling it up nationwide.

Discussion Prompt

1. The Ethics of the Control Group

Isn’t it unethical to have a Control Group? We have identified poor households who need help, but we are deliberately withholding money from half of them just to prove a point. Isn’t that cruel?

The Solution: Phase-In (Stepped Wedge) Design

  • The Compromise: We don’t say “No” to the Control Group; we say “Not Yet.”

  • Implementation:

    • Year 1: Treatment Group gets cash. Control Group is on the “Waitlist.”

    • Year 2: Control Group starts receiving cash.

  • Result: Everyone eventually benefits, but we get one year of clean data to measure the impact.

Discussion Prompt

2. Unpacking the Mechanism (Income vs. Conditionality)

The Provocation:

“Okay, we saw attendance go up. But was it because of the Conditionality (the rule that they must attend school) or simply because the family had more Income (money for jeepney fare and lunch)?”

The Income Effect vs. The Substitution Effect

  • Income Effect: The family is richer. They can now afford shoes, uniforms, and food. Even without the rule, the child might have gone to school simply because the barriers were removed.

  • Substitution (Price) Effect: The conditionality changes the “price” of skipping school. If the child skips school, the family loses ₱500. Skipping school becomes expensive, so they substitute towards attending.

Discussion Prompt

2. Unpacking the Mechanism (Income vs. Conditionality)

The Provocation:

“Okay, we saw attendance go up. But was it because of the Conditionality (the rule that they must attend school) or simply because the family had more Income (money for jeepney fare and lunch)?”

Why does this distinction matter for Policy?

  • Administrative Cost: Monitoring attendance (checking teacher logbooks, verifying data) is expensive and bureaucratic.

  • The UCT Alternative: If the Income Effect is the main driver, we should switch to Unconditional Cash Transfers (UCT). We could just give the money with no strings attached. It would be cheaper to run and more dignified for the poor.

  • The CCT Argument: If the Substitution Effect is the main driver, then “Conditions” are necessary. If we remove the rules, parents might take the money but still send the child to work on the farm.

Discussion Prompt

2. Unpacking the Mechanism (Income vs. Conditionality)

The Provocation:

“Okay, we saw attendance go up. But was it because of the Conditionality (the rule that they must attend school) or simply because the family had more Income (money for jeepney fare and lunch)?”

The Design Solution: The 3-Arm RCT

To answer this, we need a more complex design than just Treatment vs. Control. We need Three Arms:

  • Group A (CCT): Cash + Conditions.
  • Group B (UCT): Cash + No Conditions.
  • Group C (Control): No Cash.

Analysis:

  • Compare A vs. C: Total impact of the current 4Ps.
  • Compare A vs. B: The specific impact of the conditionality.
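
A sketch of how a 3-arm assignment and its analysis might look in R (entirely simulated numbers, chosen only to illustrate the comparison of arms):

# 3-arm RCT sketch: CCT vs UCT vs Control (simulated attendance rates)
set.seed(3)
n   <- 3000
arm <- sample(c("Control", "UCT", "CCT"), n, replace = TRUE)
# hypothetical truth: cash alone adds 5 points, conditionality adds 7 more
attend <- 80 + 5 * (arm != "Control") + 7 * (arm == "CCT") + rnorm(n, sd = 10)
arm <- factor(arm, levels = c("Control", "UCT", "CCT"))   # Control is the base group
summary(lm(attend ~ arm))   # armUCT ~ +5, armCCT ~ +12; their gap (~ +7) is the conditionality effect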

CS 2: The “Green Thumb” Organic Fertilizer

Policy Context: Rice farmers in Central Luzon often rely heavily on expensive chemical fertilizers.

A development agency believes that switching to Organic Fertilizer (using compost and animal manure) will reduce costs and improve soil health, ultimately increasing Rice Yields (\(Y\)).

The Research Question: “Does attending a 3-day technical training on organic fertilizer use cause an increase in rice harvest yields?”

A. The Design (RCT)

The agency identifies 1,000 smallholder rice farmers. Randomization: A lottery is conducted.

  • Treatment Group (\(N=500\)): Invited to attend a free 3-day workshop on organic composting techniques.

  • Control Group (\(N=500\)): Does not receive the invitation (business as usual).

CS 2: The “Green Thumb” Organic Fertilizer

B. The Balance Check (Baseline)

Before the training starts, we check if the lottery was fair.

Baseline Characteristics of Treatment and Control Groups

Variable                 Treatment Group   Control Group   Difference
Farm Size (Hectares)     1.5 ha            1.6 ha          Not Significant
Years of Farming         12.3 Years        12.1 Years      Not Significant
Previous Yield (kg/ha)   3,500 kg          3,550 kg        Not Significant
Access to Irrigation     60%               58%             Not Significant

Status: The groups are Balanced. We can proceed.

CS 2: The “Green Thumb” Organic Fertilizer

C. The Complication: “Non-Compliance”

Here is the reality of training programs: You can invite people, but you can’t force them to come.

  • Treatment Group: Out of 500 invited farmers, only 300 actually attended the training. The other 200 were too busy or not interested (“Non-Compliers”).

  • Control Group: 500 farmers. None attended (because they weren’t invited).

D. The Results (Harvest Time)

After the harvest, we measure the yields (\(Y\)) for everyone.

  • Avg. Yield of ALL Invited Farmers (Treatment Group): 4,000 kg/ha

  • Avg. Yield of Control Group: 3,800 kg/ha

CS 2: The “Green Thumb” Organic Fertilizer

Analysis: ITT vs. TOT

The Intent-to-Treat (ITT) Estimate

We compare the original groups assigned by the lottery, regardless of whether they showed up. \[ITT = Y_{Treatment \ (Assigned)} - Y_{Control}\]\[ITT = 4,000 - 3,800 = \mathbf{+200 \ kg/ha}\] Interpretation: “Offering this training program to a population increases average yields by 200kg.”

Why use this? It is the most robust number because it preserves the randomization. It tells the government the impact of the Policy (the invitation).

CS 2: The “Green Thumb” Organic Fertilizer

Analysis: ITT vs. TOT

The Treatment-on-Treated (TOT) Estimate

You will naturally ask: “But what about the farmers who actually went? Surely their benefit was higher than 200kg?”

To find the impact on the participants (the “Compliers”), we adjust for the participation rate (60% or 0.6). \[TOT = \frac{ITT}{\% \ Compliance}\] \[TOT = \frac{200}{0.60} = \mathbf{+333 \ kg/ha}\] Interpretation: “For those who actually attended, the training increased yields by 333 kg/ha.”

Why use this? It tells us the technical (agronomic) efficacy of the curriculum itself.
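
The same arithmetic in R, using the numbers from this case:

# ITT and TOT for the "Green Thumb" case (numbers from the text)
itt        <- 4000 - 3800       # +200 kg/ha: effect of the invitation
compliance <- 300 / 500         # 60% of invited farmers actually attended
tot        <- itt / compliance  # +333 kg/ha: effect on those who attended
c(ITT = itt, TOT = round(tot))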

Discussion Prompt

The “Lazy Farmer” Bias (Selection Bias returns!)

“Why can’t we just compare the 300 farmers who went to training against the 500 control farmers? Why do we have to include the 200 who stayed home in the Treatment average?”

Discussion Prompt

The “Lazy Farmer” Bias (Selection Bias returns!)

“Why can’t we just compare the 300 farmers who went to training against the 500 control farmers? Why do we have to include the 200 who stayed home in the Treatment average?”

Answer:

  • The 300 who showed up are likely the most motivated, hardworking, and modern farmers. The 200 who stayed home might be older or less motivated.

  • If you drop the 200 “lazy” ones from the Treatment group but keep their equivalents in the Control group, you re-introduce Positive Selection Bias. The Treatment group becomes “super-selected,” and your estimate is inflated. The ITT keeps the “lazy” people in both groups to maintain the “Apples to Apples” comparison.

Discussion Prompt

Spillovers (The “Marites” Effect)

“What if a farmer in the Treatment group learns the technique and tells his neighbor who is in the Control group?”

Discussion Prompt

Spillovers (The “Marites” Effect)

“What if a farmer in the Treatment group learns the technique and tells his neighbor who is in the Control group?”

Context: In farming communities, neighbors talk.

Consequence: The Control group starts using organic fertilizer too! Their yields go up.

Result: The difference between Treatment and Control (\(T - C\)) shrinks, so you underestimate the true impact of the technology.

Solution: This is why we should have used Cluster Randomization (randomize by Village/Barangay) instead of by individual farmer.

Using t-test and Regression

In an impact assessment using a Randomized Controlled Trial (RCT), the mean difference is the estimate of impact, while the t-test (or regression) is used to assess statistical significance.

We use the Average Treatment Effect (ATE) to estimate the impact.

Why a t-test or Regression?

We cannot rely on means alone without quantifying uncertainty. If we want to answer the question: Is the observed difference in means likely due to the intervention rather than random chance?, we must perform statistical inference.

In practice, economists and data scientists rarely use the t-test command. Instead, they run a Linear Regression (OLS). This is because OLS is a much more flexible tool that allows other socio-demographic variables (covariates) to be included in the analysis.

T-test

The t-test is the formal statistical way to compare two groups. It calculates a “Signal-to-Noise” ratio.

Signal (Numerator): The size of the difference (\(ATE\)).

Noise (Denominator): The variation (Standard Error) within the data.

\[t = \frac{\text{Signal}}{\text{Noise}} = \frac{\bar{Y}_T - \bar{Y}_C}{\text{Standard Error}}\]

If the Signal is much stronger than the Noise (\(|t| > 1.96\)), we can say the result is Statistically Significant (p < 0.05).

The OLS Regression (The Professional Standard)

In practice, economists and data scientists rarely use the t-test command. Instead, they run a Linear Regression (OLS).

Surprisingly, regressing the outcome on a binary treatment dummy is mathematically identical to a (pooled-variance) t-test. It gives you the exact same ATE and p-value, but it is more flexible.

The Equation: To estimate the impact, we run this regression equation: \[Y_i = \alpha + \beta D_i + \epsilon_i\]

Where:

  • \(Y_i\): The outcome (e.g., Rice Yield).
  • \(D_i\): The Treatment Dummy (1 if Treated, 0 if Control).
  • \(\epsilon_i\): The error term.
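
A quick sketch on simulated data (illustrative variable names) showing that the regression coefficient on \(D\) reproduces the difference in means, and that its p-value matches the pooled-variance t-test:

# OLS on a treatment dummy = difference in means (simulated rice-yield style data)
set.seed(5)
n <- 1000
D <- sample(c(0, 1), n, replace = TRUE)
Y <- 3800 + 200 * D + rnorm(n, sd = 500)   # true ATE = 200
mean(Y[D == 1]) - mean(Y[D == 0])          # simple difference in means
coef(lm(Y ~ D))["D"]                       # same number: beta = ATE
t.test(Y ~ D, var.equal = TRUE)$p.value    # pooled t-test: same p-value as summary(lm(Y ~ D))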

The OLS Regression

How to Interpret the Coefficients:

This is a favorite exam question. The regression output tells you everything you need to know:

  • \(\alpha\) (Intercept): This is the average outcome of the Control Group.

    • Why? When \(D=0\), the equation becomes \(Y = \alpha\).
  • \(\beta\) (Slope Coefficient): This is the ATE.

    • Why? When \(D\) goes from 0 to 1, \(Y\) increases by \(\beta\).
    • \(\beta = \bar{Y}_{Treated} - \bar{Y}_{Control}\)

The OLS Regression

Why use Regression instead of a simple t-test?

Reason: “Regression Adjustment” (Precision)

In an RCT, you can add “Control Variables” (\(X\)) to your regression to make your estimate more precise. \[Y_i = \alpha + \beta D_i + \gamma X_i + \epsilon_i\]

Example: In the Rice Yield experiment, we know that “Farm Size” affects yield regardless of the fertilizer.

The Logic: By adding Farm Size as a control variable (covariate), the regression “soaks up” the variance explained by farm size. This shrinks the Standard Error (Noise).

The Result: A smaller Standard Error means a higher t-statistic. You are more likely to find a statistically significant result if you use regression with controls.

Adding Covariates

You might ask: Wait, didn’t you say Randomization already solves the bias? Why do we need to control for Farm Size, or Age, or Sex?

The Answer: We don’t do it to fix Bias; we do it to improve Precision.

The Logic: Reducing the Noise

Imagine you are trying to hear a Causal Signal (the treatment effect) in a noisy room.

  • Randomization ensures the signal is not distorted (unbiased).

  • Adding Controls turns down the background noise (variance).

In our Rice Yield example, we know that Farm Size has a huge effect on total harvest, regardless of whether the farmer attended training.

  • If we don’t include Farm Size, the regression sees huge swings in yield and thinks, “Wow, this data is messy! I can’t be sure if the training worked.” (High Standard Error).

  • If we do include Farm Size, the regression says, “Oh, I see! That variance is just because of the land area. I will ignore that part and focus only on the training.” (Low Standard Error).

Adding Covariates

The Consequence: Adding relevant controls (like Farm Size, Age, Sex, Education) lowers your Standard Error, which increases your t-statistic. It makes it easier to get a statistically significant result (p < 0.05).
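
A small sketch on simulated data (farm_size and the other names are illustrative) showing the standard error on the treatment dummy shrinking once the covariate is added:

# Regression adjustment: same estimate, smaller standard error
set.seed(6)
n <- 1000
farm_size <- runif(n, 0.5, 3)                         # hectares
D <- sample(c(0, 1), n, replace = TRUE)
Y <- 1000 + 200 * D + 2000 * farm_size + rnorm(n, sd = 400)
summary(lm(Y ~ D))$coefficients["D", ]                # larger std. error
summary(lm(Y ~ D + farm_size))$coefficients["D", ]    # smaller std. error, similar estimate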

Comparison of Methods for Estimating Impact in RCTs

Method               What it gives you                                    When to use it
Simple Subtraction   The magnitude of impact (ATE).                       For quick intuition about the magnitude of impact.
T-Test               The ATE + the p-value (statistical significance).    When comparing two groups with no other covariates.
OLS Regression       The ATE + significance + ability to add controls.    The gold standard; use this for the final analysis.