This lecture covers the Gauss-Markov assumptions that underlie the Ordinary Least Squares (OLS) estimator.
We’ll explore these from both conceptual and mathematical perspectives
Some assumptions are critical for estimation
Others are critical for inference
Population vs. Sample Regression
Population Regression Function (PRF)
\[Y_i = \alpha + \beta X_i + \epsilon_i\]
Sample Regression Function (SRF)
\[Y_i = a + b X_i + e_i\]
We rarely observe the PRF (the exception being a census). Instead, we observe the SRF and use \(a\) and \(b\) as point estimates of \(\alpha\) and \(\beta\).
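As a minimal sketch (the parameter values \(\alpha = 2\) and \(\beta = 0.5\) are made up for illustration), we can simulate data from a known PRF and estimate the SRF by OLS; `lm()` returns \(a\) and \(b\):

```r
set.seed(1)
n <- 100
X <- rnorm(n, mean = 10, sd = 2)
epsilon <- rnorm(n)                 # error term
Y <- 2 + 0.5 * X + epsilon          # PRF: alpha = 2, beta = 0.5 (illustrative values)

srf <- lm(Y ~ X)                    # SRF estimated by OLS
coef(srf)                           # a and b: point estimates of alpha and beta
```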
The Need for Assumptions
Because our estimates are subject to sampling error, point estimates should always accompany an indicator of uncertainty.
We can estimate \(Var(b)\) and \(Var(a)\)
This allows us to calculate standard errors, t-statistics, and confidence intervals
We can then use these to draw inferences
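For example (again with made-up parameter values), `summary()` reports the estimated standard errors and t-statistics, and `confint()` the confidence intervals:

```r
set.seed(2)
n <- 100
X <- rnorm(n, mean = 10, sd = 2)
Y <- 2 + 0.5 * X + rnorm(n)        # illustrative PRF with alpha = 2, beta = 0.5

fit <- lm(Y ~ X)
summary(fit)$coefficients          # estimates, standard errors, t-statistics, p-values
confint(fit, level = 0.95)         # 95% confidence intervals for alpha and beta
```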
If we make some assumptions about the PRF, the OLS estimator has several desirable properties.
Assumption 1: Linearity
The PRF is linear in parameters.
\[Y_i = \alpha + \beta X_i + \epsilon_i\]
This assumption is necessary for estimation and is used to demonstrate unbiasedness.
Assumption 2: Exogeneity
The Xs are exogenous and fixed.
The Xs are “given” and not determined within the model
The Xs are uncorrelated with the error term
\[cov(X_i, \epsilon_i) = 0\]
This assumption is necessary for estimation and is used to demonstrate unbiasedness.
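One way to see why exogeneity matters is to simulate a violation (all numbers here are illustrative): when \(X\) is correlated with the error term, the OLS slope is biased.

```r
set.seed(3)
n <- 10000
u <- rnorm(n)                       # error term
X_exog  <- rnorm(n)                 # exogenous: cov(X, u) = 0
X_endog <- rnorm(n) + 0.8 * u       # endogenous: correlated with the error
Y_exog  <- 1 + 2 * X_exog  + u      # true beta = 2 in both cases
Y_endog <- 1 + 2 * X_endog + u

coef(lm(Y_exog  ~ X_exog))[2]       # close to the true value of 2
coef(lm(Y_endog ~ X_endog))[2]      # biased away from 2
```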
Assumption 3: Zero Mean Error
Regardless of \(X_i\), the error has a zero mean: \(E(\epsilon_i) = 0\)
This assumption is necessary for inference.
Assumption 4: Homoskedasticity
\[var(\epsilon_i) = \sigma^2\]
The variance around \(\hat{Y_i}\) is the same across all levels of \(X_i\).
This assumption is necessary for inference.
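A small sketch of the distinction (the numbers are arbitrary): homoskedastic errors have the same spread at every \(X\), while heteroskedastic errors do not.

```r
set.seed(7)
n <- 500
X <- runif(n, 1, 10)
eps_homo   <- rnorm(n, sd = 2)        # constant variance, satisfies assumption 4
eps_hetero <- rnorm(n, sd = 0.5 * X)  # variance grows with X, violates assumption 4

tapply(eps_homo,   cut(X, 3), var)    # roughly equal across the range of X
tapply(eps_hetero, cut(X, 3), var)    # increases with X
```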
Assumption 5: Normality of Error
\[\epsilon_i \sim N(0, \sigma)\]
Then, \(Y_i \sim N(\alpha + \beta X_i, \sigma)\)
This is an extension of assumptions 3 and 4. If we assume normality, the OLS estimator has minimum variance among all unbiased estimators.
This assumption is necessary for inference.
Assumption 6: Independent Errors
\[cov(\epsilon_i, \epsilon_j) = 0, \quad \forall i \neq j\]
The ith error term is uncorrelated with the jth error term, for all \(i \neq j\).
This is the no-autocorrelation assumption. Apart from the relationship among the \(Y\)s that is determined by \(X\), there is no residual relationship between the \(Y\) values.
This assumption is necessary for inference.
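As a rough sketch of a violation, errors generated by an AR(1) process are correlated across observations, and the residual autocorrelation makes this visible (the AR coefficient 0.7 is arbitrary):

```r
set.seed(4)
n <- 200
X <- rnorm(n)
eps <- as.numeric(arima.sim(model = list(ar = 0.7), n = n))  # autocorrelated errors
Y <- 1 + 2 * X + eps

fit <- lm(Y ~ X)
acf(resid(fit), plot = FALSE)$acf[2]   # lag-1 autocorrelation of residuals, large and positive
```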
Assumptions 7-9
Assumption 7: Variation in \(X\)
Necessary for estimation
Assumption 8: Correct Specification
Correct functional form
Correctly included IVs
Necessary for estimation
Assumption 9: No Perfect Multicollinearity
Recall that the denominator of the OLS slope estimator is 0 if \(r_{x_1, x_2} = 1\)
Necessary for estimation
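A concrete illustration (hypothetical data): when one regressor is an exact linear function of another, `lm()` cannot estimate both coefficients and reports `NA` for the redundant one.

```r
set.seed(5)
n <- 100
x1 <- rnorm(n)
x2 <- 2 * x1 + 3                 # exact linear function of x1, so r = 1
y  <- 1 + 0.5 * x1 + rnorm(n)

coef(lm(y ~ x1 + x2))            # coefficient on x2 is NA due to perfect collinearity
```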
Summary of Assumptions
| Assumption | Description | Required For |
|---|---|---|
| 1. Linearity | PRF linear in parameters | Estimation |
| 2. Exogeneity | \(cov(X_i, \epsilon_i) = 0\) | Estimation |
| 3. Zero Mean | \(E(\epsilon_i) = 0\) | Inference |
| 4. Homoskedasticity | \(var(\epsilon_i) = \sigma^2\) | Inference |
| 5. Normality | \(\epsilon_i \sim N(0, \sigma)\) | Inference |
| 6. Independence | \(cov(\epsilon_i, \epsilon_j) = 0\) | Inference |
| 7. Variation in X | \(var(X) > 0\) | Estimation |
| 8. Correct Specification | Functional form, IVs | Estimation |
| 9. No Multicollinearity | \(r_{x_1,x_2} \neq 1\) | Estimation |
The Gauss-Markov Theorem
The Gauss-Markov Theorem states:
Under the Gauss-Markov assumptions, the Ordinary Least Squares (OLS) estimator is the Best Linear Unbiased Estimator (OLS is BLUE): among all linear unbiased estimators, it has the minimum variance.
Important: “Best” has a specific statistical meaning (minimum variance among linear unbiased estimators); it does not mean OLS is ideal or advisable in all circumstances, even when the GM assumptions hold.
What BLUE Means
When the GM assumptions hold, we can demonstrate:
Linearity: The estimators are a linear function of \(Y_i\)
Unbiasedness: \(E(a) = \alpha\) and \(E(b_k) = \beta_k\)
Minimum Variance: Of all linear unbiased estimators, the OLS estimator will have minimum \(var(a)\) and \(var(b)\)
This is important for efficiency: the estimators have the minimum sampling variance and hence the smallest standard errors.
The linearity property, which lets us write \(b = \sum_i k_i Y_i\) for weights \(k_i\) that depend only on the \(X\)s, is crucial for deriving the variance of \(b\).
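A brief repeated-sampling sketch (parameter values chosen only for illustration) shows unbiasedness in action: the average of \(b\) over many samples is close to \(\beta\).

```r
set.seed(6)
alpha <- 2
beta  <- 0.5
b_draws <- replicate(5000, {
  X <- rnorm(50, mean = 10, sd = 2)
  Y <- alpha + beta * X + rnorm(50)
  coef(lm(Y ~ X))[2]              # slope b from this sample
})
mean(b_draws)                     # approximately 0.5, consistent with E(b) = beta
```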
Simulation: Verifying \(k_i\) Properties
set.seed(42)
library(MASS)

n <- 100
mu <- c(5, 10)           # means of X and Y
Sigma <- matrix(c(4, 3,  # variance of X = 4, covariance = 3
                  3, 9), # variance of Y = 9
                nrow = 2)

data <- mvrnorm(n, mu, Sigma)
X <- data[, 1]
Y <- data[, 2]
head(data)
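The code above only generates the data; presumably the verification step then computes the OLS weights \(k_i = (X_i - \bar{X}) / \sum_j (X_j - \bar{X})^2\) and checks their textbook properties. A sketch of that continuation, using `X` and `Y` from the simulation above:

```r
k <- (X - mean(X)) / sum((X - mean(X))^2)   # OLS weights

sum(k)                       # property: sum of k_i equals 0
sum(k * X)                   # property: sum of k_i * X_i equals 1
sum(k^2)                     # property: sum of k_i^2 equals 1 / sum((X_i - Xbar)^2)
1 / sum((X - mean(X))^2)

sum(k * Y)                   # the OLS slope b is a linear function of the Y_i
coef(lm(Y ~ X))[2]           # matches the lm() slope
```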
The inside of the expectation is a double summation over all \(i\) and \(j\). It forms a matrix of terms, each weighted by \(k_i k_j\) and involving the product \(\epsilon_i \epsilon_j\).
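Written out, assuming the standard decomposition \(b = \beta + \sum_i k_i \epsilon_i\) (the exact notation in the full derivation may differ), the step is:
\[
var(b) = E\left[\left(\sum_i k_i \epsilon_i\right)^2\right]
       = E\left[\sum_i \sum_j k_i k_j \epsilon_i \epsilon_j\right]
       = \sum_i k_i^2\, E(\epsilon_i^2) + \sum_{i \neq j} k_i k_j\, E(\epsilon_i \epsilon_j)
\]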
Applying Assumptions About Errors
We now use two assumptions about the error terms:
Assumption 4 (Homoskedasticity): \(var(\epsilon_i) = \sigma^2\) for all \(i\)