- A really useful resource for learning R (and working with data more generally) is R for Data Science by Hadley Wickham.
- To print multiple slides per page, use the print properties/preferences/advanced menu:
September 22, 2016
We are trying to draw a line of best fit through two-dimensional data (\(x,y\)).
We want a process that lets us draw a line through the sample data that will be a good estimate of the true relationship.
How do we get \(\hat{\beta}\)'s that reflect the true \(\beta\)'s?
Under ideal conditions we can get such a measure by drawing a line that minimizes the sum of squared residuals.
Also known as the "Gauss-Markov" assumptions:
This just means that our model splits up the effects of each parameter that we're estimating (e.g. \(\hat{u},\hat{\beta_0},\hat{\beta_1}\)).
We can still capture non-linear effects by modifying \(x\) and \(y\) appropriately.
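For instance, here's a minimal R sketch (with made-up data and a made-up \(2 + 0.5\log(x)\) relationship) showing that transforming \(x\) keeps the model linear in the parameters:

```r
# Hypothetical data: assume the true relationship is y = 2 + 0.5*log(x) + u
set.seed(42)
x <- runif(200, min = 1, max = 10)
y <- 2 + 0.5 * log(x) + rnorm(200, sd = 0.1)

# Still an ordinary linear model: we just regress y on log(x)
fit <- lm(y ~ log(x))
coef(fit)  # estimates should be near 2 and 0.5
```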
Example: estimating the relationship between the size of government budget and outcomes like life expectancy, literacy rate, child mortality, etc.
Given our theoretical assumptions (the "Gauss-Markov assumptions" for the "Classical Linear Regression Model"), and restricting ourselves to linear models (which are much easier to deal with in general), there is a type of estimator that is unbiased and efficient (i.e. has lower variance than other candidate estimators).
The Best Linear Unbiased Estimator (BLUE) is the Ordinary Least Squares (OLS) estimator.
We will estimate \(\beta_0\) and \(\beta_1\) (the linear model) by drawing a line that minimizes the Sum of Squared Residuals (SSR).
\[SSR \equiv \sum_{i=1}^{n}(\hat{u_i})^2\]
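As a quick sanity check (with simulated data, not anything from this lecture), directly minimizing the SSR numerically gives the same line that R's built-in lm() computes:

```r
# Simulated sample data (hypothetical values)
set.seed(1)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

# SSR as a function of a candidate (beta_0, beta_1) pair
ssr <- function(b) sum((y - b[1] - b[2] * x)^2)

# Numerically minimizing the SSR recovers (approximately) the OLS estimates
optim(c(0, 0), ssr)$par
coef(lm(y ~ x))
```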
The counterpart to the SSR (Sum of Squared Residuals, or Residual Sum of Squares) is the Explained Sum of Squares (SSE).
\[SSE \equiv \sum_{i=1}^{n}(\hat{y_i} - \bar{y})^2\]
Here we talk about the Residual Sum of Squares, but some other people call the same quantity the Sum of Squared Errors.
Those same people say Regression Sum of Squares where we say Explained Sum of Squares.
So sometimes SSR really means SSE and vice versa.
And sometimes SSE is written ESS.
Economists should take linguistics classes.
The SSR and SSE together give us the Total Sum of Squares (SST).
\[SST \equiv \sum_{i=1}^{n}(y_i - \bar{y})^2\]
\[SSE \equiv \sum_{i=1}^{n}(\hat{y_i} - \bar{y})^2\]
\[SSR \equiv \sum_{i=1}^{n}(\hat{u_i})^2\]
\[SST \equiv \sum_{i=1}^{n}(y_i - \bar{y})^2\]
Here we're looking at the difference between what we observe (\(y_i\)) and the average (\(\bar{y}\)).
In other words, how far off is \(y\) from its average value?
\[SST \equiv \sum_{i=1}^{n}(y_i - \bar{y})^2\]
Squaring each term means that we aren't making the mistake of adding up negative and positive deviations and feeling good when they cancel out to 0.
Squaring also means that outliers are treated as more important than points that are close to average.
\(SST/(n-1)\) gives us the sample variance of \(y\).
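A quick check in R (using a simulated \(y\), just for illustration):

```r
# Any numeric vector will do; here y is simulated for illustration
set.seed(2)
y <- rnorm(50, mean = 10, sd = 3)

SST <- sum((y - mean(y))^2)
SST / (length(y) - 1)  # the sample variance of y ...
var(y)                 # ... exactly what var() reports
```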
\[SSE \equiv \sum_{i=1}^{n}(\hat{y_i} - \bar{y})^2\]
Now we're comparing how far our estimated values of \(y\) (\(\hat{y_i}\)) are from average. Our estimates are explaining that \(y\) is sometimes above or below average because of the effects of \(x\).
\[SSR \equiv \sum_{i=1}^{n}(\hat{u_i})^2\]
Finally, we're asking how far off our estimates are from the actual observed values of \(y\). Another way of writing this is:
\[SSR \equiv \sum_{i=1}^{n}(y_i - \hat{y_i})^2\]
Our standard "goodness-of-fit" measure is \(R^2\) which is defined as
\[R^2 \equiv \frac{SSE}{SST} = 1 - \frac{SSR}{SST}\]
The closer \(R^2\) is to 1, the better our model fits the data. But we don't want to overfit.
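Here's a minimal sketch in R (simulated data again) that computes the three sums of squares from a fitted model and confirms the two ways of writing \(R^2\):

```r
set.seed(3)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)
fit <- lm(y ~ x)

SST <- sum((y - mean(y))^2)           # total sum of squares
SSE <- sum((fitted(fit) - mean(y))^2) # explained sum of squares
SSR <- sum(residuals(fit)^2)          # residual sum of squares

SSE + SSR              # equals SST (up to floating-point rounding)
SSE / SST              # R-squared, first form
1 - SSR / SST          # R-squared, second form
summary(fit)$r.squared # matches what lm() reports
```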
\[\hat{\beta_1} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\]
Which is the same as:
\[ \hat{\beta_1} = \frac{(n-1)cov(x,y)}{(n-1)var(x)}= \frac{cov(x,y)}{var(x)} \]
Our line will go through the point \((\bar{x},\bar{y})\), so we can rearrange our model to find:
\[\hat{\beta_0} = \bar{y} - \hat{\beta_1} \bar{x}\]
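As a sketch (simulated data once more), the formulas above reproduce the coefficients that lm() computes:

```r
set.seed(4)
x <- rnorm(100)
y <- 1 + 2 * x + rnorm(100)

b1 <- cov(x, y) / var(x)      # slope: cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)  # intercept: y-bar minus b1 * x-bar

c(b0, b1)
coef(lm(y ~ x))  # the same estimates
```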