EC 524 Notes

Lecture 1 - Linear Regression Review

Regression: \(f(X)=E[Y|X]\)
- conditional expectation of Y given X
Classification: \(f(X)=Pr[Y=\text {label}|X]\)
- conditional probability that Y takes on a given label, given X
why conditional expectations?
- \(E[Y|X]\) minimizes the mean squared error among all functions of X
- the error \(\epsilon = Y - E[Y|X]\) satisfies \(E[\epsilon |X]=0\), so it is uncorrelated with any function of X
- we have broken Y into a component explained by X, and another component that is orthogonal to X
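
A minimal simulation sketch of the two properties above, using only the Python standard library. The data-generating process (X in {0, 1, 2}, Y = X^2 + noise) and all variable names are illustrative assumptions, not from the lecture. The within-group mean of Y stands in for \(E[Y|X]\).

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

# Hypothetical data: X takes three values, Y = X^2 + standard normal noise
xs = [random.choice([0, 1, 2]) for _ in range(3000)]
ys = [x**2 + random.gauss(0, 1) for x in xs]

def mse(predict):
    """Mean squared error of a prediction rule predict(x) on the sample."""
    return sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)

# Sample analogue of E[Y|X=x]: the mean of Y within each value of X
cond_mean = {}
for v in set(xs):
    group = [y for x, y in zip(xs, ys) if x == v]
    cond_mean[v] = sum(group) / len(group)

mse_cond = mse(lambda x: cond_mean[x])
mse_shifted = mse(lambda x: cond_mean[x] + 0.3)  # some other function of X
mse_linear = mse(lambda x: 2.0 * x - 0.5)        # an arbitrary line

# Residuals from the conditional mean average to zero within each X value ...
resid = [y - cond_mean[x] for x, y in zip(xs, ys)]
resid_mean_by_x = {
    v: sum(e for x, e in zip(xs, resid) if x == v) / xs.count(v)
    for v in cond_mean
}
# ... so they are uncorrelated (in sample) with any function of X, e.g. X^2
cov_resid_x2 = sum(e * x**2 for x, e in zip(xs, resid)) / len(xs)

print(mse_cond, mse_shifted, mse_linear)
print(resid_mean_by_x, cov_resid_x2)
```

In this sample the conditional-mean predictor has the smallest MSE of the three rules, and the residual moments are zero up to floating-point error.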
linear regression goal: find the best linear approximation of \(E[Y|X]\), i.e. the line \(\alpha + \beta X\) that minimizes the mean squared error between the predicted and actual values of Y across the observed points; the approximation is \(E[Y|X] \approx \alpha + \beta X\)

estimate linear regression using OLS, which finds the values of parameters to minimize prediction errors
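- choose \(\alpha, \beta\) to minimize the Residual Sum of Squares (RSS) \[ RSS = \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \quad \hat{y}_i = \hat{\alpha} + \hat{\beta} x_i \] which gives \[ \hat{\beta}=\frac {cov(x,y)}{var(x)}, \quad \hat{\alpha} = \bar{y} - \hat{\beta}\bar{x} \]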
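
The closed-form OLS solution can be sketched in a few lines of plain Python. The function names `ols_fit` and `rss` and the five data points are made up for illustration. The intercept uses the standard result \(\hat{\alpha} = \bar{y} - \hat{\beta}\bar{x}\).

```python
def ols_fit(x, y):
    """Simple-regression OLS: beta = cov(x, y) / var(x), alpha = y_bar - beta * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    cov_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / (n - 1)
    var_x = sum((xi - x_bar) ** 2 for xi in x) / (n - 1)
    beta = cov_xy / var_x
    alpha = y_bar - beta * x_bar
    return alpha, beta

def rss(x, y, alpha, beta):
    """Residual sum of squares for the line alpha + beta * x."""
    return sum((yi - (alpha + beta * xi)) ** 2 for xi, yi in zip(x, y))

# Hypothetical data, roughly y = 2x
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

alpha_hat, beta_hat = ols_fit(x, y)
rss_hat = rss(x, y, alpha_hat, beta_hat)

# Any other line should do no better on RSS
rss_other = rss(x, y, alpha_hat + 0.1, beta_hat - 0.05)
print(alpha_hat, beta_hat, rss_hat, rss_other)
```

Moving the coefficients away from the OLS values raises the RSS, which is what "minimize prediction errors" means here.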
key assumption behind OLS: \(E(\epsilon|X)=0\)
- i.e. the part of Y not explained by X (the error \(\epsilon\)) is effectively random: everything else in the world that explains Y (aside from X) is uncorrelated with X (aka no omitted variable bias!)
key assumption behind hypothesis testing in OLS: one individual’s error cannot tell me anything about another individual’s error
- i.e. no correlation of \(\epsilon\) across individuals in our sample; if this fails, standard errors are typically too small -> overestimation of the degree to which including X in your model explains the variation of Y
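
A small sketch of why correlated errors matter for inference, under a deliberately extreme assumption: every observation is copied four times, so errors within each set of copies are perfectly correlated. The data and the helper `slope_and_se` are hypothetical. The conventional standard-error formula treats the copies as independent information.

```python
import math

def slope_and_se(x, y):
    """OLS slope and its conventional (independent-errors) standard error."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    alpha = y_bar - beta * x_bar
    rss = sum((yi - alpha - beta * xi) ** 2 for xi, yi in zip(x, y))
    se = math.sqrt(rss / (n - 2) / sxx)  # assumes uncorrelated errors
    return beta, se

# Hypothetical sample of six individuals
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [1.2, 1.9, 3.4, 3.6, 5.3, 5.8]

# Duplicate each individual: no new information, perfectly correlated errors
copies = 4
x_dup = [xi for xi in x for _ in range(copies)]
y_dup = [yi for yi in y for _ in range(copies)]

beta, se = slope_and_se(x, y)
beta_dup, se_dup = slope_and_se(x_dup, y_dup)
print(beta, se)
print(beta_dup, se_dup)
```

The slope is unchanged, but the reported standard error shrinks even though nothing new was learned, so a test based on it overstates the evidence.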

use “binscattering” in R to produce more readable figures when there is a lot of data
- the underlying relationship stays the same: the fitted OLS line is essentially the same for the original and the binned data
- easier to visualize whether the data should be modelled linearly, quadratically, etc.
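
The idea behind a binned scatter can be sketched without any plotting library: sort by x, cut into equal-count bins, and keep only each bin's mean x and mean y. This is a rough Python illustration on simulated data, not the R tooling used in class. The helper names and the choice of 20 bins are assumptions.

```python
import random

random.seed(1)  # reproducible simulated data

def ols_fit(x, y):
    """Simple-regression OLS: returns (alpha, beta)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    beta = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) / sxx
    return y_bar - beta * x_bar, beta

def bin_means(x, y, n_bins):
    """Sort by x, split into n_bins equal-count bins, return per-bin means."""
    pairs = sorted(zip(x, y))
    size = len(pairs) // n_bins
    bx, by = [], []
    for b in range(n_bins):
        # the last bin absorbs any leftover observations
        chunk = pairs[b * size:] if b == n_bins - 1 else pairs[b * size:(b + 1) * size]
        bx.append(sum(p[0] for p in chunk) / len(chunk))
        by.append(sum(p[1] for p in chunk) / len(chunk))
    return bx, by

# Hypothetical data: y = 1 + 2x + noise, 1000 points
x = [random.uniform(0, 10) for _ in range(1000)]
y = [1.0 + 2.0 * xi + random.gauss(0, 1) for xi in x]

bx, by = bin_means(x, y, 20)           # the 20 points one would plot
alpha_raw, beta_raw = ols_fit(x, y)    # fit on all 1000 points
alpha_bin, beta_bin = ols_fit(bx, by)  # fit on the 20 bin means
print(beta_raw, beta_bin)
```

The 20 bin means trace out the same relationship as the 1000 raw points, and the two fitted slopes are close but not exactly identical.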

linear regression is also a useful tool to compute group means
computing group means
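- regress Y on a full set of group indicators (dummies) sans intercept: each coefficient is that group’s mean, so the hypothesis tests are meaningful for each bin (each tests whether that group’s mean is zero)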
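
A minimal check that a no-intercept regression on group dummies reproduces the group means. It solves the normal equations \(X'X b = X'y\) directly in plain Python. The tiny dataset, the group labels, and the helpers `solve` and `ols` are all made up for illustration.

```python
def solve(a, b):
    """Solve the linear system a * coef = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    xtx = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(xtx, xty)

# Hypothetical data: three groups, six observations
groups = ["a", "a", "b", "b", "b", "c"]
y = [1.0, 3.0, 4.0, 5.0, 6.0, 10.0]

# One dummy column per group and NO intercept column
labels = sorted(set(groups))
X = [[1.0 if g == lab else 0.0 for lab in labels] for g in groups]

coefs = dict(zip(labels, ols(X, y)))
group_means = {
    lab: sum(yi for g, yi in zip(groups, y) if g == lab) / groups.count(lab)
    for lab in labels
}
print(coefs)
print(group_means)
```

With an intercept included, one dummy would have to be dropped and the remaining coefficients would become differences from the omitted group rather than the means themselves.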