OLS Estimator: Detailed Derivation
POL 682: Linear Regression Analysis
1 Overview
Goal: The broad goal is to derive the OLS estimator. This means deriving the formulae for the slope and intercept in the equation, \[Y_i = a + b X_i + e_i\]
How do we estimate \(a\) (the intercept) and \(b\) (the slope) with real data?
Problem. There are an infinite number of lines that could fit a set of data points. How do we choose the “best” line?
Method: Minimize the sum of squared residuals, i.e., minimize the (squared) distances between the observed values and the values predicted by the equation (the regression line); those distances are the residuals
Starting Point: Sample Regression Function. Later we’ll draw inferences about the population. Recall we start with the equation
\[Y_i = a + b X_i + e_i\]
where \(e_i = Y_i - a - b X_i\) is the residual
2 The Objective Function
An objective function is a mathematical expression that we want to minimize (or, in POL 683, maximize). In OLS, our objective function is the sum of squared residuals (SSR)
In particular, we want to minimize
\[SSR = \sum_{i=1}^n e_i^2\]
Substituting \(e_i = Y_i - a - b X_i\):
\[SSR(a,b) = \sum_{i=1}^n (Y_i - a - b X_i)^2\]
The Puzzle and Solution: There are two unknowns, \(a\) and \(b\), and an infinite number of values we could assign to them. But we have a goal: find the values of \(a\) and \(b\) that minimize this function. If we can find those values, we have derived the OLS estimators
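To make the objective function concrete, here is a minimal Python sketch (the toy data are hypothetical, purely for illustration) that evaluates \(SSR(a,b)\) for a few candidate lines. Among the candidates, the pair with the smallest SSR fits best; OLS finds the best pair overall.

```python
import numpy as np

# Hypothetical toy data, used only to illustrate the objective function
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

def ssr(a, b):
    """Sum of squared residuals for a candidate intercept a and slope b."""
    residuals = Y - a - b * X
    return np.sum(residuals ** 2)

# Evaluate a few candidate lines; smaller SSR means a better fit
for a, b in [(0.0, 1.0), (0.0, 2.0), (0.5, 1.9)]:
    print(f"a = {a}, b = {b}, SSR = {ssr(a, b):.3f}")
```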
3 First-Order Conditions
Recall from POL 681 that if a function is convex or concave, we can always find the minimum or maximum by taking the first derivative, setting it equal to zero, and solving for the unknown(s). \(SSR(a,b)\) is a sum of squared linear functions of \(a\) and \(b\), so it is convex and this approach yields a minimum. Thus, to minimize \(SSR(a,b)\), we take the partial derivatives and set them equal to zero:
\[\frac{\partial SSR}{\partial a} = 0 \quad \text{and} \quad \frac{\partial SSR}{\partial b} = 0\]
These give us two equations (the normal equations) in two unknowns
Let’s move through the derivation for each parameter step-by-step
3.1 Deriving \(\frac{\partial SSR}{\partial a}\)
3.1.1 Step 1: Calculate the Partial Derivative
Start with:
\[\frac{\partial SSR}{\partial a} = \frac{\partial}{\partial a} \sum_{i=1}^n (Y_i - a - b X_i)^2\]
This is a case where we need to use the chain rule, since we have an inner function and an outer function. \[= \sum_{i=1}^n \frac{\partial}{\partial a}\left[(Y_i - a - b X_i)^2\right]\]
\[= \sum_{i=1}^n 2(Y_i - a - b X_i) \cdot \frac{\partial}{\partial a}(Y_i - a - b X_i)\]
Since \(\frac{\partial}{\partial a}(Y_i - a - b X_i) = -1\), this becomes:
\[= \sum_{i=1}^n 2(Y_i - a - b X_i) \cdot (-1)\]
\[= -2 \sum_{i=1}^n (Y_i - a - b X_i)\]
3.1.2 Step 2: Set Equal to Zero
We’ve calculated the partial derivative. Now we set it equal to zero to solve for \(a\).
\[-2 \sum_{i=1}^n (Y_i - a - b X_i) = 0\]
Divide by \(-2\):
\[\sum_{i=1}^n (Y_i - a - b X_i) = 0\]
3.1.3 Step 3: Expand the Summation
Now, let’s expand the summation.
\[\sum_{i=1}^n Y_i - \sum_{i=1}^n a - \sum_{i=1}^n b X_i = 0\]
Since \(a\) and \(b\) are constants, they don’t vary across observations (recall, we’re estimating a single line, with one slope and one intercept), so \(\sum_{i=1}^n a = na\) and \(b\) can be pulled outside the summation. Thus,
\[\sum_{i=1}^n Y_i - na - b \sum_{i=1}^n X_i = 0\]
This equation has a special name, the first normal equation. It corresponds to the intercept. From here, it’s a straightforward set of algebraic steps to solve for the intercept
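Before solving, note one immediate implication. Because \(e_i = Y_i - a - b X_i\), the first normal equation says that the OLS residuals sum to zero:
\[\sum_{i=1}^n e_i = 0\]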
3.2 Solving for \(a\)
3.2.1 Step 1: Isolate the Intercept
From the first normal equation:
\[\sum_{i=1}^n Y_i - na - b \sum_{i=1}^n X_i = 0\]
Rearrange to isolate \(a\):
\[na = \sum_{i=1}^n Y_i - b \sum_{i=1}^n X_i\]
Divide both sides by \(n\):
\[a = \frac{1}{n}\sum_{i=1}^n Y_i - b \cdot \frac{1}{n}\sum_{i=1}^n X_i\]
It’s important to note that \(\bar{Y} = \frac{1}{n}\sum_{i=1}^n Y_i\) and \(\bar{X} = \frac{1}{n}\sum_{i=1}^n X_i\):
\[\boxed{a = \bar{Y} - b\bar{X}}\]
Key Characteristic of the equation: The regression line passes through the point \((\bar{X}, \bar{Y})\)
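To see why, evaluate the estimated line \(a + bX\) at \(X = \bar{X}\) and substitute the intercept formula:
\[a + b\bar{X} = (\bar{Y} - b\bar{X}) + b\bar{X} = \bar{Y}\]
so the point \((\bar{X}, \bar{Y})\) always lies on the fitted line.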
Moving On: We’re halfway there. We still need to find \(b\). Fortunately, the process is the same.
3.3 Deriving \(\frac{\partial SSR}{\partial b}\)
3.3.1 Step 1: Take the Partial Derivative
Start again from the objective function, this time differentiating with respect to \(b\):
\[\frac{\partial}{\partial b} \sum_{i=1}^n (Y_i - a - b X_i)^2\]
Because we have an inner and outer function, apply the chain rule, but with respect to \(b\) this time:
\[= \sum_{i=1}^n 2(Y_i - a - b X_i) \cdot \frac{\partial}{\partial b}(Y_i - a - b X_i)\]
\[= \sum_{i=1}^n 2(Y_i - a - b X_i) \cdot (-X_i)\]
\[= -2 \sum_{i=1}^n X_i(Y_i - a - b X_i)\]
3.3.2 Step 2: Set Equal to Zero
Setting the derivative equal to zero:
\[-2 \sum_{i=1}^n X_i(Y_i - a - b X_i) = 0\]
Divide by \(-2\):
\[\sum_{i=1}^n X_i(Y_i - a - b X_i) = 0\]
Expand:
\[\sum_{i=1}^n X_i Y_i - a\sum_{i=1}^n X_i - b\sum_{i=1}^n X_i^2 = 0\]
This is the second normal equation, again formed simply by calculating the partial derivative and setting it equal to zero
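It has a parallel implication. Because \(e_i = Y_i - a - b X_i\), the second normal equation says that the residuals are orthogonal to the predictor:
\[\sum_{i=1}^n X_i e_i = 0\]
This is the orthogonality property noted again in the concluding remarks.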
3.4 Solving for \(b\)
3.4.1 Step 1: Substitute the Intercept Formula
From the second normal equation:
\[\sum_{i=1}^n X_i Y_i - a\sum_{i=1}^n X_i - b\sum_{i=1}^n X_i^2 = 0\]
Let’s substitute \(a = \bar{Y} - b\bar{X}\):
\[\sum_{i=1}^n X_i Y_i - (\bar{Y} - b\bar{X})\sum_{i=1}^n X_i - b\sum_{i=1}^n X_i^2 = 0\]
3.4.2 Step 2: Distribute and Simplify
Distribute:
\[\sum_{i=1}^n X_i Y_i - \bar{Y}\sum_{i=1}^n X_i + b\bar{X}\sum_{i=1}^n X_i - b\sum_{i=1}^n X_i^2 = 0\]
Note that \(\sum_{i=1}^n X_i = n\bar{X}\):
\[\sum_{i=1}^n X_i Y_i - \bar{Y} \cdot n\bar{X} + b\bar{X} \cdot n\bar{X} - b\sum_{i=1}^n X_i^2 = 0\]
\[\sum_{i=1}^n X_i Y_i - n\bar{X}\bar{Y} + bn\bar{X}^2 - b\sum_{i=1}^n X_i^2 = 0\]
3.4.3 Step 3: Solve for \(b\)
Collect terms with \(b\):
\[\sum_{i=1}^n X_i Y_i - n\bar{X}\bar{Y} = b\left(\sum_{i=1}^n X_i^2 - n\bar{X}^2\right)\]
Solve for \(b\):
\[\boxed{b = \frac{\sum_{i=1}^n X_i Y_i - n\bar{X}\bar{Y}}{\sum_{i=1}^n X_i^2 - n\bar{X}^2}}\]
This is the OLS estimator for the slope
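As a quick numerical check, here is a short Python sketch (again using hypothetical toy data) that computes \(a\) and \(b\) from the formulas above and compares them to NumPy’s built-in least-squares line fit; the two should agree to floating-point precision.

```python
import numpy as np

# Hypothetical toy data, used only to check the formulas numerically
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(X)

# Slope and intercept from the derived formulas
b = (np.sum(X * Y) - n * X.mean() * Y.mean()) / (np.sum(X**2) - n * X.mean()**2)
a = Y.mean() - b * X.mean()

# Cross-check against NumPy's degree-1 polynomial fit (returns slope, then intercept)
b_np, a_np = np.polyfit(X, Y, 1)
print(a, b)
print(a_np, b_np)  # should match the line above
```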
4 Alternative Form Using Deviations
Define deviations from means:
\[x_i = X_i - \bar{X} \quad \text{and} \quad y_i = Y_i - \bar{Y}\]
Numerator:
\[\sum_{i=1}^n X_i Y_i - n\bar{X}\bar{Y} = \sum_{i=1}^n x_i y_i\]
Denominator:
\[\sum_{i=1}^n X_i^2 - n\bar{X}^2 = \sum_{i=1}^n x_i^2\]
Therefore:
\[\boxed{b = \frac{\sum_{i=1}^n x_i y_i}{\sum_{i=1}^n x_i^2}}\]
This is the deviation form of the OLS slope estimator
An Important Insight: The slope is the covariance between \(X\) and \(Y\) divided by the variance of \(X\)
\[b = \frac{Cov(X,Y)}{Var(X)}\]
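A minimal Python sketch of this insight (same hypothetical toy data): the deviation-form slope equals the sample covariance of \(X\) and \(Y\) divided by the sample variance of \(X\). The normalizing constant (whether \(n\) or \(n-1\)) cancels in the ratio, so it only needs to be the same in the numerator and denominator.

```python
import numpy as np

# Hypothetical toy data for illustration
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])

# Deviation-form slope
x = X - X.mean()
y = Y - Y.mean()
b_dev = np.sum(x * y) / np.sum(x**2)

# Covariance over variance; the 1/(n-1) factors cancel, so use the same ddof in both
b_cov = np.cov(X, Y, ddof=1)[0, 1] / np.var(X, ddof=1)
print(b_dev, b_cov)  # identical up to floating-point error
```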
4.1 Why?
Let’s see why these identities hold. First, \(\sum X_i Y_i - n\bar{X}\bar{Y} = \sum x_i y_i\):
\[\sum_{i=1}^n x_i y_i = \sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})\]
Again, expand:
\[= \sum_{i=1}^n (X_i Y_i - X_i\bar{Y} - \bar{X}Y_i + \bar{X}\bar{Y})\]
\[= \sum_{i=1}^n X_i Y_i - \bar{Y}\sum_{i=1}^n X_i - \bar{X}\sum_{i=1}^n Y_i + n\bar{X}\bar{Y}\]
\[= \sum_{i=1}^n X_i Y_i - \bar{Y} \cdot n\bar{X} - \bar{X} \cdot n\bar{Y} + n\bar{X}\bar{Y}\]
\[= \sum_{i=1}^n X_i Y_i - n\bar{X}\bar{Y} - n\bar{X}\bar{Y} + n\bar{X}\bar{Y}\]
\[= \sum_{i=1}^n X_i Y_i - n\bar{X}\bar{Y}\]
And now, let’s see why \(\sum X_i^2 - n\bar{X}^2 = \sum x_i^2\):
\[\sum_{i=1}^n x_i^2 = \sum_{i=1}^n (X_i - \bar{X})^2\]
Again expand:
\[= \sum_{i=1}^n (X_i^2 - 2X_i\bar{X} + \bar{X}^2)\]
\[= \sum_{i=1}^n X_i^2 - 2\bar{X}\sum_{i=1}^n X_i + n\bar{X}^2\]
\[= \sum_{i=1}^n X_i^2 - 2\bar{X} \cdot n\bar{X} + n\bar{X}^2\]
\[= \sum_{i=1}^n X_i^2 - 2n\bar{X}^2 + n\bar{X}^2\]
\[= \sum_{i=1}^n X_i^2 - n\bar{X}^2\]
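Both identities are also easy to verify numerically. A minimal check, again assuming the same hypothetical toy data:

```python
import numpy as np

# Hypothetical toy data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(X)
x, y = X - X.mean(), Y - Y.mean()

# Numerator identity: sum(X_i Y_i) - n*Xbar*Ybar == sum(x_i y_i)
print(np.isclose(np.sum(X * Y) - n * X.mean() * Y.mean(), np.sum(x * y)))

# Denominator identity: sum(X_i^2) - n*Xbar^2 == sum(x_i^2)
print(np.isclose(np.sum(X**2) - n * X.mean()**2, np.sum(x**2)))
```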
5 The Two Normal Equations
First Normal Equation (from \(\frac{\partial SSR}{\partial a} = 0\)):
\[\sum_{i=1}^n Y_i = na + b\sum_{i=1}^n X_i\]
Second Normal Equation (from \(\frac{\partial SSR}{\partial b} = 0\)):
\[\sum_{i=1}^n X_i Y_i = a\sum_{i=1}^n X_i + b\sum_{i=1}^n X_i^2\]
These are two linear equations in two unknowns (\(a\) and \(b\))
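Because the normal equations are just two linear equations in \(a\) and \(b\), they can also be solved directly as a \(2 \times 2\) linear system. A short sketch (hypothetical toy data again) that recovers the same estimates as the closed-form formulas:

```python
import numpy as np

# Hypothetical toy data
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
n = len(X)

# Normal equations written as M @ [a, b] = v
M = np.array([[n,       X.sum()],
              [X.sum(), np.sum(X**2)]])
v = np.array([Y.sum(), np.sum(X * Y)])

a, b = np.linalg.solve(M, v)
print(a, b)  # same values as the closed-form OLS formulas
```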
6 Summary of OLS Formulas
Slope:
\[b = \frac{\sum_{i=1}^n (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^n (X_i - \bar{X})^2} = \frac{\sum x_i y_i}{\sum x_i^2}\]
Intercept:
\[a = \bar{Y} - b\bar{X}\]
Fitted Values:
\[\hat{Y}_i = a + b X_i\]
Residuals:
\[e_i = Y_i - \hat{Y}_i\]
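Pulling the formulas together, here is a minimal end-to-end sketch in Python (the function name ols_fit and the toy data are illustrative assumptions, not part of the derivation). It computes the slope, intercept, fitted values, and residuals, then confirms the two normal-equation properties.

```python
import numpy as np

def ols_fit(X, Y):
    """Simple one-predictor OLS using the formulas summarized above."""
    x = X - X.mean()
    y = Y - Y.mean()
    b = np.sum(x * y) / np.sum(x**2)  # slope
    a = Y.mean() - b * X.mean()       # intercept
    fitted = a + b * X                # fitted values Y-hat
    residuals = Y - fitted            # residuals e_i
    return a, b, fitted, residuals

# Hypothetical toy data to exercise the function
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 8.1, 9.8])
a, b, fitted, e = ols_fit(X, Y)

print(a, b)
print(np.sum(e), np.sum(X * e))  # both approximately zero, per the normal equations
```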
7 Concluding Remarks
- OLS minimizes the sum of squared residuals
- OLS is one of many ways to “optimize” a regression line, i.e., to find the best-fitting line.
- Derivation uses calculus (partial derivatives)
- Results in two normal equations
- Slope formula has intuitive interpretation: \(\frac{Cov(X,Y)}{Var(X)}\)
- Intercept ensures line passes through \((\bar{X}, \bar{Y})\)
- Residuals are orthogonal (uncorrelated) to the predictor
- This derivation assumes simple linear regression with one predictor, using scalar algebra. With multiple variables, the scalar math becomes much more complex, bordering on intractable. In that case, we use matrix algebra (covered in another chapter)