- Don't forget to do your MindTap work!
- There are still a couple 0's
- There is a deadline coming up for Ch. 3
- Practicing is a much better way to learn than listening!
October 11, 2016
\[ \beta_1 = \beta_2 = ... = \beta_k = 0 \]
If \(V_1\) and \(V_2\) are independent, \(\chi^2_{m_i}\) distributed random variables, then:
\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]
First, let's explore the Chi Squared distribution
If \(x_1, x_2, ... x_m\) are independent, standard normally distributed random variables, then the sum of their squared values follows a Chi Squared distribution with \(m\) degrees of freedom.
\[ V = x_1^2 + x_2^2 + ... + x_m^2 \sim \chi^2_{df = m}\]
The probability that \(V\) is a very large number is low because each \(x_i\) has a mean of 0.
Since each number is squared, \(V > 0\).
\(E(x_i^2) = 1\) so \(E(V) = m\).
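A quick simulation check (my own sketch, not on the slides) that \(E(V) = m\), here with \(m = 8\):

V <- replicate(10000, sum(rnorm(8)^2))  # 10,000 draws of V with m = 8
mean(V)                                 # should be close to 8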
library(magrittr) # for the %>% pipe

X1 <- rnorm(800) # 800 random standard normal numbers
X2 <- rnorm(8)   # 8 random standard normal numbers
SS1 <- X1^2 %>% sum()
SS2 <- X2^2 %>% sum()
print(SS1); print(SS2)
## [1] 805.7233
## [1] 3.754872
What are the probabilities of finding values this big?
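The numbers below are upper-tail Chi Squared probabilities; presumably they come from something like this sketch (my reconstruction of the call):

pchisq(SS1, df = 800, lower.tail = FALSE)  # P(chi-squared with 800 df > SS1)
pchisq(SS2, df = 8,   lower.tail = FALSE)  # P(chi-squared with 8 df > SS2)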
## [1] 0.4366764
## [1] 0.8785349
If \(V_1\) and \(V_2\) are independent, \(\chi^2_{m_i}\) distributed random variables, then:
\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]
Now that we understand how the \(V_i\)'s operate…
The expected value of a \(\chi^2_m\) distributed variable is \(m\) (the variance is \(2m\)), so \(V_1/m_1\) shows the size of \(V_1\) relative to what we would expect.
\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]
So how should \(F\) behave? Let's find an example:
fit1 <- lm(mpg ~ hp + wt + disp, mtcars)
print(summary(fit1)$fstatistic)
##    value    numdf    dendf
## 44.56552  3.00000 28.00000
library(ggplot2)

x <- seq(from = 0, to = 7, by = 0.1)
y <- df(x, df1 = 3, df2 = 28)  # F density with 3 and 28 degrees of freedom
ggplot(data.frame(x = x, y = y), aes(x, y)) +
  geom_line() +
  ggtitle("F distribution with 3 and 28 degrees of freedom")
\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]
If \(V_i\) is significantly different from what we would expect (\(E(V_i) = m_i\), so it could be too big or too small), we will end up with an \(F\) value that is too big or too small.
Recall: we're not just worried about whether \(x_i\) appears to have an effect (holding constant the other \(x\)'s), but also whether all the \(x\) variables together have an impact.
In general, we can impose restrictions on our model, and test those restrictions.
Testing the set of restrictions \(\beta_1 = \beta_2 = ... = \beta_k = 0\) constitutes a joint hypothesis test.
Most econometric programs automatically test the hypothesis that all of our slope coefficients are equal to 0. The F statistic is calculated as:
\[F \equiv \frac{(SSR_{r}-SSR_{ur})/q}{SSR_{ur}/(n-k-1)}\]
Where:

- \(SSR_r\) is the sum of squared residuals from the restricted model (with the restrictions imposed)
- \(SSR_{ur}\) is the sum of squared residuals from the unrestricted model
- \(q\) is the number of restrictions being tested
- \(n\) is the number of observations and \(k\) is the number of independent variables in the unrestricted model
Notice that in both cases we're looking at the sum of squared variables divided by a normalizing number.
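As a quick check (my own sketch, reusing the mtcars regression from above; the slides may do this differently), the formula reproduces the F statistic R reported for fit1:

fit_r  <- lm(mpg ~ 1, mtcars)       # restricted model: all slope coefficients set to 0
SSR_ur <- sum(resid(fit1)^2)        # unrestricted sum of squared residuals
SSR_r  <- sum(resid(fit_r)^2)       # restricted sum of squared residuals
q <- 3                              # number of restrictions (hp, wt, disp)
n <- nrow(mtcars); k <- 3
F_stat <- ((SSR_r - SSR_ur) / q) / (SSR_ur / (n - k - 1))
F_stat                                         # matches summary(fit1)$fstatistic (about 44.57)
pf(F_stat, q, n - k - 1, lower.tail = FALSE)   # the associated p-value
# anova(fit_r, fit1) runs the same test automatically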
1, 2, and 4 should look familiar already.
Two variables are collinear when they are highly correlated. If \(x_1\) and \(x_2\) are perfectly correlated we are essentially estimating:
\[\hat{y} = \beta_0 + \beta_1 x_1 + \beta_2 x_1 + ... + \beta_k x_k + \hat{u}\]
This is impossible, and R will automatically drop one of the collinear variables.
One way to (accidentally) get perfect multicollinearity is the dummy variable trap.
\[\hat{wage} = \beta_0 + \beta_1 male + \beta_2 female + \beta_3 age + \hat{u}\]
How would we interpret \(\beta_1\) and \(\beta_2\)?
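A hypothetical simulation (no real wage data is loaded here; the names and numbers are my own) of what the trap looks like in R:

set.seed(2)
n <- 500
male   <- rbinom(n, 1, 0.5)
female <- 1 - male                       # male + female = 1: perfectly collinear with the intercept
age    <- sample(18:65, n, replace = TRUE)
wage   <- 10 + 2 * male + 0.3 * age + rnorm(n, sd = 3)

coef(lm(wage ~ male + female + age))     # R returns NA for one of the dummies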
Another example could be the same variable measured in different units.
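A minimal sketch of that case (my own example, using mtcars where wt is measured in 1000s of lbs):

mtcars2 <- mtcars
mtcars2$wt_lbs <- mtcars2$wt * 1000      # same weight variable, now in lbs: perfectly collinear

coef(lm(mpg ~ wt + wt_lbs, mtcars2))     # R drops one: its coefficient comes back as NA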
In practice you'll often have two variables that are imperfectly collinear. In this situation your regression will probably be fine overall, but the coefficients on the collinear variables may be undependable.
We face heteroskedasticity when the variance of the error term isn't constant for different combinations of our \(x\) variables.
For example, the variance of the errors for this regression increases with the size of the bill.
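A hypothetical simulation in the same spirit (the variable names and numbers are my own, not the slide's data), where the error spread grows with the size of the bill:

library(ggplot2)

set.seed(42)
bill <- runif(200, min = 5, max = 100)
tip  <- 1 + 0.15 * bill + rnorm(200, mean = 0, sd = 0.05 * bill)  # error sd rises with bill

ggplot(data.frame(bill, tip), aes(bill, tip)) +
  geom_point() +
  geom_smooth(method = "lm", se = FALSE) +
  ggtitle("Simulated heteroskedasticity: residual spread grows with bill size")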
When we have homoskedasticity (same residual variance throughout) our estimates of standard error will be accurate.
Later in the semester we will see ways to ameliorate heteroskedasticity.
We're estimating the relationship between \(y\) and \(x\) while assuming that the Gauss-Markov conditions hold.
An important way they might not hold is if our error term isn't well behaved because we've omitted an important variable.
If you estimate that a school's average SAT score increases by 20 points as per-student spending increases by $100, can you conclude that richer schools are better schools?
Richer schools probably also have richer parents. Richer parents are more likely to pay for test prep.
If you don't control for this, your estimate of the effects of school spending will be biased upward.
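A hypothetical simulation (my own numbers, not the slides') showing the upward bias from omitting the income variable:

set.seed(1)
n <- 1000
spending      <- rnorm(n, mean = 100, sd = 15)            # per-student spending (hundreds of $)
parent_income <- 2 * spending + rnorm(n, sd = 10)         # correlated with spending
sat           <- 800 + 10 * spending + 5 * parent_income + rnorm(n, sd = 50)

coef(lm(sat ~ spending))                  # omits parent_income: slope biased upward (roughly 20, not 10)
coef(lm(sat ~ spending + parent_income))  # controls for it: slope close to the true 10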
Multiple linear regression lets us see the effect of \(x_i\) on \(y\) holding constant \(x_j\) (where \(x_j\) is any other variable you included in your model).
The model is linear in the parameters but we can approximate non-linear effects with the right (in)dependent variables (e.g. \(ln(y)\) or \(x^2\)).
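A small sketch (my own choice of mtcars variables, not the slides') of a model that stays linear in the parameters while capturing non-linear effects:

fit_nl <- lm(log(mpg) ~ hp + I(hp^2) + wt, mtcars)  # log outcome and a quadratic term
summary(fit_nl)$coefficients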
Ordinary Least Squares (OLS) provides an estimate for slope parameters that represent the partial effect of a change in a variable (i.e. holding constant all the other independent variables)
We can measure how well our model fits the data by calculating the \(R^2\), which is a ratio showing how much of the total variation our model explains. A higher \(R^2\) is not always a good thing.
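For reference, a snippet (mine, not from the slides) pulling both versions of the fit measure from fit1 above:

summary(fit1)$r.squared       # R-squared
summary(fit1)$adj.r.squared   # adjusted R-squared penalizes adding variables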
Gauss-Markov Assumptions (1-4):

1. The model is linear in the parameters
2. The data come from a random sample
3. There is no perfect collinearity among the independent variables
4. The error term has a zero conditional mean: \(E(u|x_1, ..., x_k) = 0\)
If these assumptions hold, our OLS estimators are unbiased–they will tend to reflect the true relationship between our \(x\) variables and \(y\).
Number 4 implies that we've correctly specified the model so that (for example) \(u\) isn't capturing the effect of a variable that matters but we left out of the model.
The fifth Gauss-Markov Assumption: homoskedasticity.
Without homoskedasticity, our estimates of the standard error become unreliable and it becomes harder to do hypothesis testing.