Weekly Discussion: Omitted Variable Bias

What is the bias of an estimator?

Bias is the difference between the expected value of an estimator (its average over repeated samples) and the true parameter value:

\[ Bias(\hat{\beta}) = E[\hat{\beta}] - \beta \]

If bias = 0 → estimator is unbiased
If bias ≠ 0 → estimator is biased

Will bias go away if we increase sample size or add more variables?

No. OVB does not go away with more data; you just get a more precise but still biased estimate. Adding variables helps only if the omitted variable itself (or a good proxy for it) is among them.
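The claim about sample size can be illustrated with a small simulation. This is a minimal sketch under an assumed data-generating process: the function name `simulate_short_slope`, the coefficients, and the noise levels are all hypothetical choices made for illustration, not values taken from `mtcars`.

```r
# Hypothetical data-generating process; every number here is an assumption.
set.seed(123)

simulate_short_slope <- function(n, beta1 = -4, beta2 = -0.03) {
  x <- rnorm(n, mean = 3, sd = 1)          # key regressor
  z <- 50 + 45 * x + rnorm(n, sd = 50)     # omitted variable, correlated with x
  y <- 37 + beta1 * x + beta2 * z + rnorm(n, sd = 2.5)
  unname(coef(lm(y ~ x))["x"])             # slope from the short regression
}

# Repeat the short regression many times at two sample sizes
reps <- 200
slopes_small <- replicate(reps, simulate_short_slope(n = 50))
slopes_large <- replicate(reps, simulate_short_slope(n = 5000))

# Bias = average estimate minus the true beta1 (-4).
# In this setup Cov(x, z) / Var(x) = 45, so theory predicts -0.03 * 45 = -1.35.
bias_small <- mean(slopes_small) - (-4)
bias_large <- mean(slopes_large) - (-4)

# The spread of the estimates shrinks with n, but the bias does not
c(bias = bias_small, sd = sd(slopes_small))
c(bias = bias_large, sd = sd(slopes_large))
```

In this sketch the standard deviation of the estimates falls sharply as n grows, while the average gap from the true slope stays close to the value the OVB formula predicts.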

Omitted Variable Bias Example Using `mtcars`

I am using the built-in mtcars dataset.

Variables

mpg: miles per gallon (dependent variable \(Y_i\))
wt: car weight in 1,000 lbs (key independent variable \(X_i\))
hp: gross horsepower (omitted variable \(Z_i\))

Research Question

What is the effect of car weight on fuel efficiency?

Key independent variable: \[ X_i = wt_i \]

Omitted variable: \[ Z_i = hp_i \]

Full Model

\[ mpg_i = \beta_0 + \beta_1 wt_i + \beta_2 hp_i + u_i \]

Short Model

\[ mpg_i = \alpha_0 + \alpha_1 wt_i + v_i \]

OVB Formula

\[ Bias(\hat{\alpha}_1) = \beta_2 \cdot \frac{Cov(wt_i, hp_i)}{Var(wt_i)} \]
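The formula follows from substituting an auxiliary regression of horsepower on weight into the full model. The symbols \(\delta_0\), \(\delta_1\), and \(e_i\) below are notation introduced here for that auxiliary regression; they do not appear elsewhere in this write-up. The sketch assumes the full model is correctly specified, so that \(E[\hat{\alpha}_1] = \alpha_1\) as defined in the last step.

```latex
% Auxiliary regression of the omitted variable on the included one
\[ hp_i = \delta_0 + \delta_1 wt_i + e_i, \qquad \delta_1 = \frac{Cov(wt_i, hp_i)}{Var(wt_i)} \]

% Substitute into the full model and collect terms
\[ mpg_i = (\beta_0 + \beta_2 \delta_0) + (\beta_1 + \beta_2 \delta_1)\, wt_i + (u_i + \beta_2 e_i) \]

% Match with the short model: the slope on wt is alpha_1
\[ \alpha_1 = \beta_1 + \beta_2 \delta_1 \]

% Bias of the short-model slope relative to beta_1
\[ Bias(\hat{\alpha}_1) = E[\hat{\alpha}_1] - \beta_1 = \beta_2 \delta_1 = \beta_2 \cdot \frac{Cov(wt_i, hp_i)}{Var(wt_i)} \]
```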

Two Conditions for OVB

Condition 1

\[ \beta_2 \neq 0 \]

Horsepower must affect mpg, holding weight constant.

Condition 2

\[ Cov(wt_i, hp_i) \neq 0 \]

Horsepower must be correlated with weight.

Expected Direction of Bias

Horsepower reduces mpg → \(\beta_2 < 0\)
Weight and horsepower are positively correlated → \(Cov(wt, hp) > 0\)

Therefore: \[ Bias(\hat{\alpha}_1) < 0 \]

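Bias is negative: the short-model coefficient on weight is expected to be more negative than the true effect \(\beta_1\).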

R Code

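# Install stargazer if it is missing, then load it
if (!requireNamespace("stargazer", quietly = TRUE)) install.packages("stargazer")
library(stargazer)

# Load the data and keep only the three variables used below
data(mtcars)
df <- mtcars[, c("mpg", "wt", "hp")]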

Summary Statistics

summary(df)

##       mpg              wt              hp       
##  Min.   :10.40   Min.   :1.513   Min.   : 52.0  
##  1st Qu.:15.43   1st Qu.:2.581   1st Qu.: 96.5  
##  Median :19.20   Median :3.325   Median :123.0  
##  Mean   :20.09   Mean   :3.217   Mean   :146.7  
##  3rd Qu.:22.80   3rd Qu.:3.610   3rd Qu.:180.0  
##  Max.   :33.90   Max.   :5.424   Max.   :335.0

Check OVB Conditions

Condition 1: Correlation between hp and mpg

cor.test(df$hp, df$mpg)

## 
##  Pearson's product-moment correlation
## 
## data:  df$hp and df$mpg
## t = -6.7424, df = 30, p-value = 1.788e-07
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  -0.8852686 -0.5860994
## sample estimates:
##        cor 
## -0.7761684

Condition 2: Correlation between hp and wt

cor.test(df$hp, df$wt)

## 
##  Pearson's product-moment correlation
## 
## data:  df$hp and df$wt
## t = 4.7957, df = 30, p-value = 4.146e-05
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
##  0.4025113 0.8192573
## sample estimates:
##       cor 
## 0.6587479
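The correlation above is unit-free, while the OVB formula uses the ratio Cov(wt, hp) / Var(wt), which is the slope from regressing hp on wt. The following sketch shows that the two are linked and share the same sign; the object names (`delta1_formula`, `delta1_lm`, `delta1_cor`) are illustrative and not part of the original code.

```r
data(mtcars)

# Ratio that appears in the OVB formula
delta1_formula <- cov(mtcars$wt, mtcars$hp) / var(mtcars$wt)

# Same quantity as the slope of an auxiliary regression of hp on wt
delta1_lm <- unname(coef(lm(hp ~ wt, data = mtcars))["wt"])

# Same quantity again, rescaled from the correlation reported by cor.test()
delta1_cor <- cor(mtcars$wt, mtcars$hp) * sd(mtcars$hp) / sd(mtcars$wt)

all.equal(delta1_formula, delta1_lm)
all.equal(delta1_formula, delta1_cor)
```

Because standard deviations are positive, a positive correlation between wt and hp implies a positive ratio, which is all Condition 2 and the sign argument need.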

Run Regressions

full_model <- lm(mpg ~ wt + hp, data = df)
short_model <- lm(mpg ~ wt, data = df)

summary(full_model)

## 
## Call:
## lm(formula = mpg ~ wt + hp, data = df)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.941 -1.600 -0.182  1.050  5.854 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.22727    1.59879  23.285  < 2e-16 ***
## wt          -3.87783    0.63273  -6.129 1.12e-06 ***
## hp          -0.03177    0.00903  -3.519  0.00145 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.593 on 29 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8148 
## F-statistic: 69.21 on 2 and 29 DF,  p-value: 9.109e-12

summary(short_model)

## 
## Call:
## lm(formula = mpg ~ wt, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -4.5432 -2.3647 -0.1252  1.4096  6.8727 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  37.2851     1.8776  19.858  < 2e-16 ***
## wt           -5.3445     0.5591  -9.559 1.29e-10 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 3.046 on 30 degrees of freedom
## Multiple R-squared:  0.7528, Adjusted R-squared:  0.7446 
## F-statistic: 91.38 on 1 and 30 DF,  p-value: 1.294e-10

Side-by-Side Table

stargazer(short_model, full_model,
          type = "html",
          title = "Short vs Full Model",
          column.labels = c("Short Model", "Full Model"),
          dep.var.labels = "MPG",
          covariate.labels = c("Weight", "Horsepower"),
          digits = 3)

**Short vs Full Model**

Dependent variable: MPG

                      Short Model               Full Model
                      (1)                       (2)
Weight                -5.344***                 -3.878***
                      (0.559)                   (0.633)
Horsepower                                      -0.032***
                                                (0.009)
Constant              37.285***                 37.227***
                      (1.878)                   (1.599)
Observations          32                        32
R²                    0.753                     0.827
Adjusted R²           0.745                     0.815
Residual Std. Error   3.046 (df = 30)           2.593 (df = 29)
F Statistic           91.375*** (df = 1; 30)    69.211*** (df = 2; 29)

Note: *p<0.1; **p<0.05; ***p<0.01

Compare Coefficients

coef(short_model)["wt"]

##        wt 
## -5.344472

coef(full_model)["wt"]

##        wt 
## -3.877831
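In the sample, the gap between the two weight coefficients equals the estimated horsepower coefficient times the slope from regressing hp on wt. This is an algebraic property of OLS, so it can be checked directly. The sketch below refits the models under its own names (`short_fit`, `full_fit`, `aux_fit`, all chosen here for illustration) so that it runs on its own.

```r
data(mtcars)

short_fit <- lm(mpg ~ wt, data = mtcars)        # omits hp
full_fit  <- lm(mpg ~ wt + hp, data = mtcars)   # includes hp
aux_fit   <- lm(hp ~ wt, data = mtcars)         # auxiliary regression

# Gap between the short and full weight coefficients
gap <- unname(coef(short_fit)["wt"] - coef(full_fit)["wt"])

# Gap implied by the OVB formula: beta2-hat times delta1-hat
implied <- unname(coef(full_fit)["hp"] * coef(aux_fit)["wt"])

c(gap = gap, implied = implied)
all.equal(gap, implied)
```

Both quantities are negative, which matches the expected direction of bias derived earlier.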

Interpretation

Comparing the short and full models shows evidence of omitted variable bias: the estimated effect of weight changes from -5.344 to -3.878 once horsepower is included. The short-model coefficient is more negative, suggesting that weight reduces miles per gallon by more than it actually does. The full model gives a more credible estimate because it controls for horsepower. Since horsepower reduces fuel efficiency and is positively correlated with weight, leaving it out makes the weight coefficient absorb part of horsepower’s negative effect. Intuitively, heavier cars tend to have more powerful engines, and those engines use more fuel, so when horsepower is ignored, weight takes the blame for both effects and its estimated impact is too large in magnitude.

Conclusion

Full model: \[ mpg_i = \beta_0 + \beta_1 wt_i + \beta_2 hp_i + u_i \]

Short model: \[ mpg_i = \alpha_0 + \alpha_1 wt_i + v_i \]

Both OVB conditions are satisfied.

Direction of bias: \[ (-) \times (+) = (-) \]

Omitting horsepower causes negative bias in the estimated effect of weight.

Pawemi Kumwenda

2026-03-30
