October 11, 2016

Misc.

  • Don't forget to do your MindTap work!
  • There are still a couple of 0's.
  • There is a deadline coming up for Ch. 3.
  • Practicing is a much better way to learn than listening!

Other misc: Anscombe's Quartet

[Plots: the four Anscombe datasets, which share nearly identical summary statistics but look completely different when graphed.]

The F-test

  • We care about the hypothesis \(\beta_i = 0\)
  • We also care about the joint hypothesis:

\[ \beta_1 = \beta_2 = ... = \beta_k = 0 \]

  • The first evaluates the impact of \(x_i\) on \(y\).
  • The second evaluates the impact of all of the \(x\) variables, taken together, on \(y\).

The F-distribution

If \(V_1\) and \(V_2\) are independent random variables with \(V_i \sim \chi^2_{m_i}\), then:

\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]

First, let's explore the Chi Squared distribution

The \(\chi^2\) distribution

If \(x_1, x_2, ... x_m\) are independent random variables, each standard normally distributed, then the sum of their squared values follows a Chi-squared distribution with \(m\) degrees of freedom.

\[ V = x_1^2 + x_2^2 + ... + x_m^2 \sim \chi^2_{df = m}\]

The probability that \(V\) is a very large number is low because each \(x_i\) has a mean of 0.

Since each number is squared, \(V > 0\).

\(E(x_i^2) = 1\) so \(E(V) = m\).
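
A quick simulation sketch can confirm these moments (not from the original slides; 10,000 draws and \(m = 8\) are arbitrary choices):

m <- 8
V <- replicate(10000, sum(rnorm(m)^2)) # 10,000 draws of V with m = 8
mean(V) # should be close to m = 8
var(V)  # should be close to 2m = 16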

The \(\chi^2\) distribution

library(magrittr) # provides the %>% pipe used below
X1 <- rnorm(800) # 800 random standard normal numbers
X2 <- rnorm(8)   # 8 random standard normal numbers
SS1 <- X1^2 %>% sum() # sum of 800 squared standard normals
SS2 <- X2^2 %>% sum() # sum of 8 squared standard normals
print(SS1); print(SS2)
## [1] 805.7233
## [1] 3.754872

What are the probabilities of finding values at least this big?

pchisq(SS1, df = 800, lower.tail = FALSE) # upper-tail probability for SS1
pchisq(SS2, df = 8,   lower.tail = FALSE) # upper-tail probability for SS2
## [1] 0.4366764
## [1] 0.8785349

The \(\chi^2\) distribution

[Plots: \(\chi^2\) density curves.]
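
A minimal sketch along the same lines (the choice of 8 degrees of freedom, matching X2 above, is only illustrative):

library(ggplot2) # assuming ggplot2, as in the F-distribution plot below
x <- seq(from = 0, to = 20, by = 0.1)
y <- dchisq(x, df = 8) # chi-squared density with 8 degrees of freedom
ggplot(data.frame(x = x, y = y), aes(x, y)) + geom_line() +
  ggtitle("Chi-squared distribution with 8 degrees of freedom")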

The F-distribution

If \(V_1\) and \(V_2\) are independent random variables with \(V_i \sim \chi^2_{m_i}\), then:

\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]

Now that we understand how the \(V_i\)'s operate…

The expected value of a \(\chi^2_m\) distributed variable is \(m\) (the variance is \(2m\)), so \(V_1/m_1\) shows the size of \(V_1\) relative to what we would expect.
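
To make this concrete, here is a sketch (degrees of freedom chosen arbitrarily) that builds a single \(F\) draw by hand from two independent \(\chi^2\) variables:

m1 <- 3 ; m2 <- 28               # arbitrary degrees of freedom
V1 <- sum(rnorm(m1)^2)           # chi-squared with m1 df
V2 <- sum(rnorm(m2)^2)           # chi-squared with m2 df
F_draw <- (V1/m1) / (V2/m2)      # one draw from F with (m1, m2) df
pf(F_draw, df1 = m1, df2 = m2, lower.tail = FALSE) # its upper-tail probability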

The F-distribution

\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]

So how should \(F\) behave? Let's find an example:

The F-distribution

fit1 <- lm(mpg ~ hp + wt + disp, data = mtcars) # regress mpg on three predictors
print(summary(fit1)$fstatistic) # overall F statistic and its degrees of freedom
##    value    numdf    dendf 
## 44.56552  3.00000 28.00000

F-distribution

library(ggplot2) # for ggplot()
x <- seq(from = 0, to = 7, by = 0.1)
y <- df(x, df1 = 3, df2 = 28) # F density with 3 and 28 degrees of freedom
ggplot(data.frame(x = x, y = y), aes(x, y)) + geom_line() +
  ggtitle("F distribution with 3 and 28 degrees of freedom")

The F-distribution

\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]

If \(V_i\) is significantly different from what we would expect (recall \(E(V_i) = m_i\)), so that it is too big or too small, we will end up with an \(F\) value that is too big or too small.

How are we using the F-test?

Recall: we're not just worried about whether \(x_i\) appears to have an effect (holding constant the other \(x\)'s), but also whether all the \(x\) variables together have an impact.

In general, we can impose restrictions on our model, and test those restrictions.

Testing the restriction \(\beta_1 = \beta_2 = ... = 0\) constitutes a joint hypothesis test.

How are we using the F-test?

Most econometric programs automatically test the hypothesis that all of our slope coefficients are equal to 0. The F statistic is calculated as:

\[F \equiv \frac{(SSR_{r}-SSR_{ur})/q}{SSR_{ur}/(n-k-1)}\]

Where:

  • \(SSR_{ur}\) (\(SSR_{r}\)) is the sum of squared residuals for the unrestricted (restricted) model.
  • \(q\) is the number of restrictions (\(q = k\) for the hypothesis that none of our variables matter).

Notice that in both the numerator and the denominator we're looking at a sum of squares divided by a normalizing number (its degrees of freedom), just as in the definition of the F-distribution above.
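
As a check, here is a sketch that computes this by hand for the mtcars model from earlier; anova(fit_r, fit_ur) would report the same test:

fit_ur <- lm(mpg ~ hp + wt + disp, data = mtcars)  # unrestricted model
fit_r  <- lm(mpg ~ 1, data = mtcars)               # restricted: intercept only
SSR_ur <- sum(resid(fit_ur)^2)
SSR_r  <- sum(resid(fit_r)^2)
q <- 3 ; n <- nrow(mtcars) ; k <- 3
((SSR_r - SSR_ur)/q) / (SSR_ur/(n - k - 1))        # matches the F statistic above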

Recap of OLS assumptions for Multiple Regression

  1. Linear in parameters
  2. Random sampling
  3. No perfect multicollinearity
  4. Zero conditional mean (\(E(u|x) = 0\)) (i.e. correctly specified model)
  5. Homoskedasticity

1, 2, and 4 should look familiar already.

Multicollinearity

Two variables are collinear when they are highly correlated. If \(x_1\) and \(x_2\) are perfectly correlated (say \(x_2 = x_1\)), we are essentially estimating:

\[\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_1 + ... + \beta_k x_k + \hat{u}\]

Only the sum \(\beta_1 + \beta_2\) is identified, not the individual coefficients, so this is impossible to estimate, and R will automatically drop one of the collinear variables.

Multicollinearity

One way to (accidentally) get perfect multicollinearity is the dummy variable trap.

\[\hat{wage} = \beta_0 + \beta_1 male + \beta_2 female + \beta_3 age + \hat{u}\]

How would we interpret \(\beta_1\) and \(\beta_2\)?
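
A sketch of the trap using made-up data (every number below is purely illustrative): because male + female always equals 1, which is collinear with the intercept, lm() drops one of the dummies.

set.seed(1)                                   # made-up illustrative data
dat <- data.frame(male = rep(c(1, 0), each = 10))
dat$female <- 1 - dat$male
dat$age    <- sample(25:60, 20, replace = TRUE)
dat$wage   <- 15 + 5*dat$male + 0.3*dat$age + rnorm(20, sd = 2)
coef(lm(wage ~ male + female + age, data = dat)) # female is reported as NA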

Multicollinearity

Another example could be the same variable measured in different units.

  • How much you complain about flying is certainly a function of how many inches tall you are, but
  • you can't figure out that effect if you try to hold constant your height in centimeters.

Multicollinearity in practice

In practice you'll often have two variables that are imperfectly collinear. In this situation your regression will probably be fine overall, but the standard errors on the collinear variables get inflated, so their individual coefficient estimates may be undependable.
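
A small simulation sketch (all made-up data) of what that looks like:

set.seed(42)                          # simulated data, not a real example
n  <- 200
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.05)        # highly, but not perfectly, correlated with x1
y  <- 1 + 2*x1 + 0.5*x2 + rnorm(n)
summary(lm(y ~ x1 + x2))$coefficients # large standard errors on x1 and x2
summary(lm(y ~ x1))$coefficients      # a much smaller standard error on x1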

Homoskedasticity

We face heteroskedasticity when the variance of the error term isn't constant across different combinations of our \(x\) variables.

For example, the variance of the errors for this regression increases with the size of the bill.
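
A simulation sketch of that pattern (the tip/bill setup and all numbers here are invented for illustration):

set.seed(7)                                         # simulated data
bill <- runif(200, min = 10, max = 100)             # simulated bill sizes
tip  <- 2 + 0.15*bill + rnorm(200, sd = 0.05*bill)  # error sd grows with the bill
plot(bill, resid(lm(tip ~ bill)))                   # residuals fan out as the bill grows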

Homoskedasticity

  • When we have homoskedasticity (same residual variance throughout) our estimates of standard error will be accurate.

  • When we have heteroskedasticity (residual variance that changes depending on the level of at least one of the \(x\) variables) our estimates of standard error will be inaccurate.
    • but our \(\beta\) estimates are still unbiased!

Later in the semester we will see ways to ameliorate heteroskedasticity.

Omitted variable bias

We're estimating the relationship between \(y\) and \(x\) while assuming that the Gauss-Markov conditions hold.

An important way they might not hold is if our error term isn't well behaved because we've omitted an important variable.

Omitted variable bias

If you estimate that a school's average SAT score increases by 20 points when per-student spending increases by $100, can you conclude that richer schools are better schools?

Omitted variable bias

Richer schools probably also have richer parents. Richer parents are more likely to pay for test prep.

If you don't control for this, your estimate of the effects of school spending will be biased upward.
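
A simulation sketch of this story (the data-generating process is invented for illustration): parental income drives both spending and SAT scores, so leaving it out biases the spending coefficient upward.

set.seed(123)                                  # all numbers here are made up
n <- 500
parent_income <- rnorm(n, mean = 50, sd = 10)
spending <- 5 + 0.8*parent_income + rnorm(n, sd = 3)        # richer parents, richer schools
sat <- 900 + 1.0*spending + 4*parent_income + rnorm(n, sd = 30)
coef(lm(sat ~ spending))                   # overstates the spending effect
coef(lm(sat ~ spending + parent_income))   # spending coefficient close to the true 1.0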

Overview of Chapter 3

Quick Review

  • Multiple linear regression lets us see the effect of \(x_i\) on \(y\) holding constant \(x_j\) (where \(x_j\) is any other variable you included in your model).

  • The model is linear in the parameters but we can approximate non-linear effects with the right (in)dependent variables (e.g. \(ln(y)\) or \(x^2\)).

Quick Review

  • Ordinary Least Squares (OLS) provides an estimate for slope parameters that represent the partial effect of a change in a variable (i.e. holding constant all the other independent variables)

  • We can measure how well our model fits the data by calculating the \(R^2\) which is a ratio showing how much of the total variation our model explains. A higher \(R^2\) is not always a good thing.

Quick Review

Gauss-Markov Assumptions (1-4):

  1. Linear in parameters
  2. Random sampling
  3. No perfect multicollinearity
  4. Zero conditional mean (\(E(u|x) = 0\))

If these assumptions hold, our OLS estimators are unbiased: they will tend to reflect the true relationship between our \(x\) variables and \(y\).

Number 4 implies that we've correctly specified the model so that (for example) \(u\) isn't capturing the effect of a variable that matters but we left out of the model.

  • Including irrelevant variables won't bias our estimates, but the model is likely to be less useful.

Quick Review

The fifth Gauss-Markov Assumption: homoskedasticity.

Without homoskedasticity, our estimates of the standard error become unreliable and it becomes harder to do hypothesis testing.

  • When all five assumptions hold, OLS estimators are BLUE
    • Best Linear Unbiased Estimator