1 Statistical Properties of OLS Estimators

  • What are the properties of the distributions of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) over different random samples from the population?

  • What are the expected values and variances of OLS estimators?

  • We will first examine finite sample properties: unbiasedness and efficiency. These are valid for any sample size n.

  • Recall that unbiasedness means that the mean of the sampling distribution of an estimator is equal to the unknown parameter value.

  • Efficiency is related to the variance of the estimators.

  • An estimator is said to be efficient if its variance is the smallest among a set of unbiased estimators.

2 Unbiasedness of OLS Estimators

We need the following assumptions for unbiasedness:

  1. (SLR.1) Model is linear in parameters: \(y = \beta_0 + \beta_1x + u\)

  2. (SLR.2) Random sampling: we have a random sample from the target population.

  3. (SLR.3) Sample variation in \(x\): the sample variance of \(x\) must not be zero, i.e., \(\sum_{i=1}^n (x_i - \overline{x})^2 > 0\)

  4. (SLR.4) Zero conditional mean: \(E(u|x) = 0\). Since we have a random sample we can write:

\[ E(u_i|x_i) = 0, \; \forall i = 1,2,\cdots,n \]

THEOREM:

If all SLR.1-SLR.4 assumptions hold then OLS estimators are unbiased:

\[\begin{align} E(\hat{\beta}_0) &= \beta_0 \notag \\ E(\hat{\beta}_1) &= \beta_1 \notag \end{align}\]

Proof: (see Wooldridge, pp 43-44)
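
The key step of the argument (a sketch; the full derivation is in the reference above) is to write the slope estimator as the true parameter plus a weighted sum of the errors:

\[ \hat{\beta}_1 = \frac{\sum_{i=1}^n (x_i - \overline{x})y_i}{\sum_{i=1}^n (x_i - \overline{x})^2} = \beta_1 + \frac{\sum_{i=1}^n (x_i - \overline{x})u_i}{\sum_{i=1}^n (x_i - \overline{x})^2} \]

Conditioning on the sample values of \(x\) and applying SLR.4, the second term has expectation zero, so \(E(\hat{\beta}_1) = \beta_1\); unbiasedness of \(\hat{\beta}_0\) then follows from \(\hat{\beta}_0 = \overline{y} - \hat{\beta}_1\overline{x}\).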

2.1 Notes on Unbiasedness

  • Unbiasedness is a feature of the sampling distributions of \(\hat{\beta}_0\) and \(\hat{\beta}_1\) that are obtained via repeated random sampling.

  • As such, it does not say anything about the estimate that we obtain for a given sample. It is possible that we could obtain an estimate which is far from the true value.

  • Unbiasedness generally fails if any of assumptions SLR.1-SLR.4 fails.

  • SLR.2 needs to be relaxed for time series data. There are also ways in which it can fail with cross-sectional data.

  • If SLR.4 fails then the OLS estimators will generally be biased. This is the most important issue with nonexperimental data: if \(x\) and \(u\) are correlated, the estimators are biased (a simulation sketch of this case follows this list).

  • Spurious correlation: we find a relationship between \(y\) and \(x\) that is really due to other unobserved factors that affect \(y\).
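
A minimal Monte Carlo sketch of this point, reusing the setup of the experiment in the next subsection but generating \(u\) so that it is correlated with \(x\) (the particular construction of \(u\) below is only an illustrative assumption, not part of the original experiment):

# Sketch: bias of OLS when SLR.4 fails (x and u correlated)
set.seed(12345)
n <- 50
MCreps <- 2000
b1hat_biased <- numeric(MCreps)
x <- 10*runif(n,0,1)
for(i in 1:MCreps) {
  # u contains a piece of x, so E(u|x) is not zero
  u <- 0.5*(x - mean(x)) + rnorm(n,0,2)
  y <- 1 + 0.5*x + u
  b1hat_biased[i] <- coefficients( lm(y~x) )["x"]
}
mean(b1hat_biased)  # centered near 1, not the true slope 0.5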

2.2 Unbiasedness of OLS: A Simple Monte Carlo Experiment

  • Population model (DGP - Data Generating Process): \[ y = 1 + 0.5x + 2 \times N(0,1) \]

  • True parameter values are known: \(\beta_0 = 1, \beta_1 = 0.5, u = 2 \times N(0,1)\) (what is the variance of \(u\)?). \(N(0,1)\) represents a random draw from the standard normal distribution.

  • The values of \(x\) are drawn from the uniform distribution: \(x \sim 10 \times Unif(0,1)\)

  • Using random numbers we can generate artificial data sets. Then, for each data set we can apply the OLS method to find estimates.

  • After repeating these steps many times, say 1000, we would obtain 1000 slope and intercept estimates. Then we can analyze the sampling distribution of these estimates.

  • This is a simple example of a Monte Carlo simulation experiment. Such experiments are useful for analyzing the properties of estimators.

# Set the random seed
# So that we will obtain the same results 
# Otherwise, simulation results will change 
set.seed(1234567)

# set sample size 
n <- 50 
# the number of simulations
MCreps <- 10000

# set true parameters: betas and standard deviation of u
beta0 <- 1 
beta1 <- 0.5 
su <- 2

# initialize b0hat and b1hat to store results later:
b0hat <- numeric(MCreps)
b1hat <- numeric(MCreps)

# Draw a sample of x 
# this is going to be fixed in repeated samples 
x <- 10*runif(n,0,1)

# repeat MCreps times:
for(i in 1:MCreps) {
  print(i)
  # Draw a sample of y:
  u <- rnorm(n,0,su)
  y <- beta0 + beta1*x + u
  # estimate parameters by OLS and store them in the vectors
  bhat <- coefficients( lm(y~x) )
  b0hat[i] <- bhat["(Intercept)"]
  b1hat[i] <- bhat["x"]
}
# draw histogram and summary statistics
hist(b0hat)

summary(b0hat)
mean(b0hat)
sd(b0hat)

hist(b1hat)

summary(b1hat)
mean(b1hat)
sd(b1hat)

# smoothed histogram 
hist(b1hat, 
     freq = FALSE, 
     breaks=seq(0,1,0.025), 
     axes = FALSE, 
     main=expression("Sampling Distribution of b1hat"))

axis(1,at = seq(0,1,0.1),labels = TRUE,pos = 0)
axis(2,pos = 0)
lines(density(b1hat), lwd=2, col="blue")

hist(b0hat, 
     freq = FALSE, 
     breaks=seq(-2,4,0.1), 
     axes = FALSE, 
     main="Sampling Distribution of b0hat")

axis(1,at = seq(-1,3,1),labels = TRUE,pos = 0)
axis(2,pos = -2)
lines(density(b0hat), lwd=2, col="blue")

3 Variances of the OLS Estimators

  • Unbiasedness of the OLS estimators \(\hat{\beta}_0\) and \(\hat{\beta}_1\) is a statement about the center of their sampling distributions.

  • We should also know how far we can expect \(\hat{\beta}_1\) to be away from \(\beta_1\) on average.

  • In other words, we should know the sampling variation in OLS estimators in order to establish efficiency and to calculate standard errors.

  • SLR.5: Homoscedasticity (constant variance assumption): This says that the variance of \(u\) conditional on \(x\) is constant, \(var(u|x) = var(u) = \sigma^2\)

  • Assumptions SLR.4 and SLR.5 can be rewritten in terms of the conditional mean and variance of \(y\):

\[\begin{align} E(y|x) &= \beta_0 + \beta_1 x \notag \\ var(y|x) &= \sigma^2 \notag \end{align}\]

[Figure: Simple Regression Model under Homoscedasticity]

[Figure: Simple Regression Model under Heteroscedasticity]
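
These two figures can be reproduced with a simple simulation sketch; the data-generating choices below are illustrative assumptions, not the original figures' code:

# Sketch: simulated data under homoscedastic and heteroscedastic errors
set.seed(123)
x <- 10*runif(200,0,1)
y_hom <- 1 + 0.5*x + rnorm(200, 0, 2)      # var(u|x) is constant
y_het <- 1 + 0.5*x + rnorm(200, 0, 0.5*x)  # sd(u|x) grows with x
par(mfrow = c(1,2))
plot(x, y_hom, main = "Homoscedasticity", xlab = "x", ylab = "y")
abline(lm(y_hom ~ x), col = "blue")
plot(x, y_het, main = "Heteroscedasticity", xlab = "x", ylab = "y")
abline(lm(y_het ~ x), col = "blue")
par(mfrow = c(1,1))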

3.1 Sampling Variances of the OLS Estimators

  • Under assumptions SLR.1 through SLR.5:

\[\begin{align} Var(\hat{\beta}_0) &= \frac{\sigma^2 \sum_{i=1}^n x_i^2}{n \sum_{i=1}^n (x_i - \overline{x})^2} \notag \\ \text{and} \notag \\ Var(\hat{\beta}_1) &= \frac{\sigma^2}{\sum_{i=1}^n (x_i - \overline{x})^2} \notag \end{align}\]

  • These formulas are not valid under heteroscedasticity (if SLR.5 does not hold).

  • Sampling variances of the OLS estimators increase with the error variance and decrease with the sampling variation in \(x\) (see the numerical check below).
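
As a numerical check, the theoretical standard deviation of \(\hat{\beta}_1\) implied by this formula can be compared with the spread of the simulated slopes; this assumes the objects x, su and b1hat from the Monte Carlo code in Section 2.2 are still in memory:

# Theoretical sd of b1hat under SLR.1-SLR.5 (error sd is known: su = 2)
sd_b1_theory <- su / sqrt( sum( (x - mean(x))^2 ) )
sd_b1_theory
# Monte Carlo counterpart: spread of the simulated slope estimates
sd(b1hat)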

3.2 Estimating the Error Variance

  • We would like to find an unbiased estimator for \(\sigma^2\).

  • Since, by assumption, \(E(u^2) = \sigma^2\), an unbiased estimator of \(\sigma^2\) is:

\[ \frac{1}{n}\sum_{i=1}^n u_i^2 \]

  • But we cannot use this because we do not observe \(u\). Replacing the errors with the residuals:

\[ \frac{1}{n}\sum_{i=1}^n \hat{u}_i^2 = \frac{SSE}{n} \]

  • However, this estimator is biased: we need to make a degrees-of-freedom adjustment (illustrated in the sketch after this list). The unbiased estimator is:

\[ \hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^n \hat{u}_i^2 = \frac{SSE}{n-2} \]

  • degrees of freedom (df) = number of observations - number of parameters = \(n-2\)
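
A minimal sketch of the adjustment in R; the simulated data set here is only illustrative and is separate from the Monte Carlo experiment above:

# Sketch: biased vs. unbiased estimators of the error variance
set.seed(42)
n <- 50
x <- 10*runif(n,0,1)
y <- 1 + 0.5*x + rnorm(n,0,2)
fit  <- lm(y ~ x)
uhat <- resid(fit)
sum(uhat^2)/n         # biased: divides SSE by n
sum(uhat^2)/(n - 2)   # unbiased: divides SSE by n - 2
summary(fit)$sigma^2  # lm's estimate, equal to the n - 2 version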

3.3 Standard Errors of OLS estimators

  • The square root of the estimated error variance, \(\hat{\sigma}\), is called the standard error of the regression (SER), also commonly referred to as the Root Mean Square Error (RMSE):

\[ \hat{\sigma} = \sqrt{\frac{SSE}{n-2}} \]

  • The standard error of the OLS slope estimate can then be written as:

\[ se(\hat{\beta}_1) = \frac{\hat{\sigma}}{\sqrt{{\sum_{i=1}^n} (x_i - \overline{x})^2}} = \frac{\hat{\sigma}}{s_x} \]

  • Standard errors summarize the uncertainty surrounding the coefficient estimates; the sketch below computes \(se(\hat{\beta}_1)\) by hand.
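
Continuing the error-variance sketch above (this reuses fit, uhat, x and n from that code), the slope's standard error can be computed from the formula and compared with the value reported by summary():

# Standard error of the slope computed from the formula
sigma_hat <- sqrt( sum(uhat^2)/(n - 2) )
se_b1 <- sigma_hat / sqrt( sum( (x - mean(x))^2 ) )
se_b1
# Should match the "Std. Error" entry for x in the regression output
summary(fit)$coefficients["x", "Std. Error"]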

4 Regression Through the Origin

  • In some rare cases we want to impose that \(y = 0\) when \(x = 0\). For example, tax revenue is zero whenever income is zero.

  • We can redefine the simple regression model without the constant term as follows: \(\tilde{y} = \tilde{\beta}_1x\)

  • Using the OLS principle, we minimize the sum of squared residuals:

\[ \min_{\tilde{\beta}_1} \sum_{i=1}^n (y_i - \tilde{\beta}_1x_i)^2 \]

  • First Order Condition:

\[ \sum_{i=1}^n x_i (y_i - \tilde{\beta}_1x_i) = 0 \]

  • Solving this we obtain the OLS estimator of the slope parameter: \[ \tilde{\beta}_1 = \frac{\sum_{i=1}^nx_iy_i}{\sum_{i=1}^n x_i^2} \] (a numerical check of this formula follows the regression output below)

  • For example,

# Regression through the origin:
res1 <- lm(salary ~ 0 + roe, data = ceosal1)
summary(res1)
## 
## Call:
## lm(formula = salary ~ 0 + roe, data = ceosal1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1697.4  -309.1   -34.3   459.2 13589.4 
## 
## Coefficients:
##     Estimate Std. Error t value Pr(>|t|)    
## roe   63.538      5.156   12.32   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1429 on 208 degrees of freedom
## Multiple R-squared:  0.422,  Adjusted R-squared:  0.4193 
## F-statistic: 151.9 on 1 and 208 DF,  p-value: < 2.2e-16
# Regression on a constant
res2 <- lm(salary ~ 1, data = ceosal1)
summary(res2)
## 
## Call:
## lm(formula = salary ~ 1, data = ceosal1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1058.1  -545.1  -242.1   125.9 13540.9 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  1281.12      94.93    13.5   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1372 on 208 degrees of freedom
#Full SLR
res3 <- lm(salary ~ roe, data = ceosal1)
summary(res3)
## 
## Call:
## lm(formula = salary ~ roe, data = ceosal1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1160.2  -526.0  -254.0   138.8 13499.9 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)   963.19     213.24   4.517 1.05e-05 ***
## roe            18.50      11.12   1.663   0.0978 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1367 on 207 degrees of freedom
## Multiple R-squared:  0.01319,    Adjusted R-squared:  0.008421 
## F-statistic: 2.767 on 1 and 207 DF,  p-value: 0.09777
plot(x= ceosal1$roe, 
     y = ceosal1$salary, 
     ylim = c(0,4000),
     xlab = "Return on equity",
     ylab = "CEO salary")
abline(res1,col="blue")
abline(res2,col="red")
abline(res3,col="black")
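
As a quick check of the closed-form expression above (assuming ceosal1 is loaded as in the regressions), the slope computed from the formula should match the coefficient stored in res1:

# Check the regression-through-the-origin formula by hand
with(ceosal1, sum(roe*salary)/sum(roe^2))
coefficients(res1)["roe"]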

  • Obtaining an estimate of \(\beta_1\) using regression through the origin is not done very often in applied work, and for good reason: if the intercept \(\beta_0 \neq 0\), then \(\tilde{\beta}_1\) is a biased estimator of \(\beta_1\).