September 22, 2016

Sidenotes

Useful things for R and these slides

  • A really useful resource for learning R (and working with data more generally) is R for Data Science by Hadley Wickham.
  • To print multiple slides per page, use the print properties/preferences/advanced menu.

Recap: The Simple Regression

  • We are trying to draw a line of best fit through two-dimensional data (\(x,y\))

  • We are assuming that there is a true relationship that we could uncover if we had the entire population,
    • but we only have a sample.

Recap: The Simple Regression

  • We want a process that lets us draw a line through the sample data that will be a good estimate of the true relationship
    • How do we get \(\hat{\beta}\)'s that reflect the true \(\beta\)'s?

  • Under ideal conditions we can get such a measure by drawing a line that minimizes the sum of squared residuals.

Assumptions of the Classical Linear Regression Model

Also known as the "Gauss-Markov" assumptions:

  • The model is linear in parameters
    • We can add together the effects of each variable (i.e. there aren't complex interdependencies that are impossible to satisfactorily disentangle)
  • Random Sampling
    • Observations are independent of one another
    • \(cov(x_i,x_j) = 0\) for \(i \neq j\)

Assumptions of the Classical Linear Regression Model

  • Sample variation in explanatory variable
    • Variation in \(y\) can't be explained by \(x\) if it doesn't change.
  • Zero conditional mean
    • What we saw last time. Errors cancel out and are unpredictable.

Linear in the parameters

  • This just means that our model adds together the separate effects of each term we're estimating (e.g. \(\hat{\beta_0}\), \(\hat{\beta_1}\), and the residual \(\hat{u}\)).

  • We can still capture non-linear effects by modifying \(x\) and \(y\) appropriately.
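A minimal sketch in R (using simulated data, not anything from the course) of how transforming \(x\) and \(y\) keeps the model linear in the parameters while still capturing a non-linear relationship:

```r
set.seed(1)
x <- runif(200, 1, 10)
y <- exp(0.5 + 0.3 * log(x) + rnorm(200, sd = 0.1))

# Still "linear in parameters": beta_0 and beta_1 enter additively,
# even though x and y have been transformed with logs.
fit <- lm(log(y) ~ log(x))
coef(fit)  # roughly recovers 0.5 and 0.3
```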

Random Sampling

  • If our sample isn't (effectively) random, it can introduce bias into our estimates.

Random Sampling

Example: estimating the relationship between the size of government budget and outcomes like life expectancy, literacy rate, child mortality, etc.

  • If we only sample countries with good statistics we'll pick up a lot of European countries, but might miss a lot of Central Asian and African countries.
    • In this case the problem is obvious so we just adjust our interpretation:
      • "\(X\) and \(Y\) are related in such-and-such a way for wealthy Northern/Western countries."
    • It won't always be so obvious. Beware of statistical monsters lurking in the shadows.

Variation in the explanatory variable

  • If we want to know how height affects wages, we can't answer our question by polling people at the "We're All 5'10" Club".
  • We can't learn about one person without seeing how they're different than other people.

Zero Conditional Mean

  • We've already seen this. tl;dr: If \(E(u|x) \neq 0\) then our estimated model is biased: it tells us to look for \(y\) in the wrong place.

Least Squares

Given our theoretical assumptions (the "Gauss-Markov assumptions" for the "Classical Linear Regression Model"), and restricting ourselves to linear models (which are much easier to deal with in general), there is a type of estimator that is unbiased and efficient (i.e. has lower variance than other candidate estimators).

The Best Linear Unbiased Estimator (BLUE) is the Ordinary Least Squares (OLS) estimator.

Ordinary Least Squares

We will estimate \(\beta_0\) and \(\beta_1\) (the linear model) by drawing a line that minimizes the Sum of Squared Residuals (SSR).

\[SSR \equiv \sum_{i=1}^{n}(\hat{u_i})^2\]
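As a rough illustration in R (using the built-in mtcars data, which is just a convenient example), the line that lm() draws is the one that makes this sum as small as possible:

```r
fit <- lm(mpg ~ wt, data = mtcars)
SSR <- sum(resid(fit)^2)  # sum of squared residuals at the fitted line
SSR

# Any other line gives a larger SSR, e.g. nudging the slope away
# from the OLS estimate while keeping the intercept fixed:
b <- coef(fit)
sum((mtcars$mpg - (b[1] + (b[2] + 0.1) * mtcars$wt))^2) > SSR  # TRUE
```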

Most Squares

The counterpart to the SSR (Sum of Squared Residuals or Residual Sum of Squares) is:

the Explained Sum of Squares (SSE).

\[SSE \equiv \sum_{i=1}^{n}(\hat{y_i} - \bar{y})^2\]

Confusing side note:

  • Sometimes we talk about Residual Sum of Squares, but some other people talk about Sum of Squared Errors.

  • Those people also talk about the Regression Sum of Squares when we're talking about the Explained Sum of Squares.

  • So sometimes SSR really means SSE and vice versa.

  • Also sometimes SSE is ESS.

  • Economists should take linguistics classes.

Unconfuse now:

  • SST = Total Sum of Squares
  • SSE = Explained Sum of Squares
  • SSR = Sum of Squared Residuals

All the Squares: Measures of Variation

The SSR and SSE together give us the Total Sum of Squares (SST).

\[SST \equiv \sum_{i=1}^{n}(y_i - \bar{y})^2\]

\[SSE \equiv \sum_{i=1}^{n}(\hat{y_i} - \bar{y})^2\]

\[SSR \equiv \sum_{i=1}^{n}(\hat{u_i})^2\]
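A quick sketch of all three quantities for a simple fit (again using mtcars purely as an illustration), confirming that they add up:

```r
fit  <- lm(mpg ~ wt, data = mtcars)
y    <- mtcars$mpg
yhat <- fitted(fit)

SST <- sum((y - mean(y))^2)     # total variation in y
SSE <- sum((yhat - mean(y))^2)  # variation explained by the fit
SSR <- sum(resid(fit)^2)        # leftover (residual) variation

all.equal(SST, SSE + SSR)  # TRUE: the decomposition holds
```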

Total Sum of Squares

\[SST \equiv \sum_{i=1}^{n}(y_i - \bar{y})^2\]

  • Here we're looking at the difference between what we observe (\(y_i\)) and the average (\(\bar{y}\)).

  • In other words, how far off is \(y\) from its average value?

Total Sum of Squares

\[SST \equiv \sum_{i=1}^{n}(y_i - \bar{y})^2\]

  • Squaring the term means that we aren't making the mistake of adding negative and positive numbers and feeling good when they add up to 0.

  • Squaring also means that outliers are treated as more important than points that are close to average.

  • \(SST/(n-1)\) gives us the sample variance of \(y\).
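A one-line check of that variance claim (same illustrative mtcars data as before):

```r
y <- mtcars$mpg
all.equal(sum((y - mean(y))^2) / (length(y) - 1), var(y))  # TRUE
```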

Explained Sum of Squares

\[SSE \equiv \sum_{i=1}^{n}(\hat{y_i} - \bar{y})^2\]

Now we're comparing how far our estimated values of \(y\) (\(\hat{y_i}\)) are from average. Our estimates are explaining that \(y\) is sometimes above or below average because of the effects of \(x\).

Residual Sum of Squares

\[SSR \equiv \sum_{i=1}^{n}(\hat{u_i})^2\]

Finally, we're asking how far off our estimates are from the actual observed values of \(y\). Another way of writing this is:

\[SSR \equiv \sum_{i=1}^{n}(y_i - \hat{y_i})^2\]

All together

How good is our line?

Our standard "goodness-of-fit" measure is \(R^2\) which is defined as

\[R^2 \equiv \frac{SSE}{SST} = 1 - \frac{SSR}{SST}\]

The closer \(R^2\) is to 1, the better our model fits the data. But we don't want to overfit.
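A short sketch computing \(R^2\) both ways and comparing with what lm() reports (mtcars is just an illustrative dataset):

```r
fit <- lm(mpg ~ wt, data = mtcars)
y   <- mtcars$mpg

SST <- sum((y - mean(y))^2)
SSE <- sum((fitted(fit) - mean(y))^2)
SSR <- sum(resid(fit)^2)

SSE / SST               # R^2 as the explained share of variation
1 - SSR / SST           # same number
summary(fit)$r.squared  # what lm() reports
```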

What are the OLS estimators?

  • We might just draw a lot of lines until we find the one that minimizes SSR. But it turns out there's an easier solution (if our assumptions hold).
  • Lots of fancy calculus results in this estimator:

\[\hat{\beta_1} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})(x_i - \bar{x})} = \frac{\sum_{i=1}^{n}(x_i - \bar{x})(y_i-\bar{y})}{\sum_{i=1}^{n}(x_i - \bar{x})^2}\]

Which is the same as:

\[ \hat{\beta_1} = \frac{(n-1)cov(x,y)}{(n-1)var(x)}= \frac{cov(x,y)}{var(x)} \]
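A sketch checking that the slope formula really is just the sample covariance over the sample variance (mtcars again, purely for illustration):

```r
x <- mtcars$wt
y <- mtcars$mpg

b1_formula <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
b1_covvar  <- cov(x, y) / var(x)

c(b1_formula, b1_covvar, coef(lm(y ~ x))[2])  # all the same value
```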

What are the OLS estimators?

Our line will go through the point \((\bar{x},\bar{y})\), so we can rearrange our model to find:

\[\hat{\beta_0} = \bar{y} - \hat{\beta_1} \bar{x}\]
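And a sketch of the intercept falling out of the sample means (same illustrative data):

```r
x <- mtcars$wt
y <- mtcars$mpg

b1 <- cov(x, y) / var(x)
b0 <- mean(y) - b1 * mean(x)  # line passes through (x-bar, y-bar)

c(b0, coef(lm(y ~ x))[1])  # matches lm()'s intercept
```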

What are the OLS estimators?

  • A more thorough derivation is available in Section 2-2 of the text, or online.