October 5, 2016
Example: mtcars
# Multiple regression
fit1 <- lm(mpg ~ hp + wt + disp, mtcars)
##
## Call: lm(formula = mpg ~ hp + wt + disp, data = mtcars)
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.105505   2.110815  17.579  < 1e-04 ***
## hp           -0.031157   0.011436  -2.724  0.01097 *
## wt           -3.800891   1.066191  -3.565  0.00133 **
## disp         -0.000937   0.010350  -0.091  0.92851
##
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8083
## F-statistic: 44.57 on 3 and 28 DF, p-value: < 1e-04
Recall z-tests from MTH 110. If you're told:
- men's height is normally distributed with mean 70 inches and standard deviation 3 inches
- find the probability of meeting a 7 foot (84 inch) or taller man
print(100 - pnorm(84, mean = 70, sd = 3) * 100) # in percentage terms
## [1] 0.0001530627
In hypothesis terms: "We've found a 7 foot tall man… is he a regular person? Regular people are, on average, 70 inches tall with s.d. of 3 inches. The probability of finding someone so tall is less than 1/1000% so maybe he isn't a regular person."
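Equivalently, one can standardize first and read the same tail probability off a standard normal; a small sketch using the numbers above:

z <- (84 - 70) / 3                   # the 7-footer is about 4.67 s.d. above the mean
print(pnorm(z, lower.tail = FALSE))  # same tail probability, roughly 1.5e-06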
Under \(H_0: \beta_i = 0\) and the assumption that our random variable (e.g. \(\beta_{wt}\)) is normally distributed, we have to use the t-test because we don't know the true (population) variance of this particular parameter.
Define:
\[t = \frac{\beta_i - \beta_{NULL}}{se(\beta_i)}\]
Where the standard error is our estimate of the standard deviation of \(\beta_i\).
Usually \(\beta_{NULL}\) is 0. We start with the assumption of randomness… sometimes \(y\) is above average and \(x\) has nothing to do with it.
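As a sketch (assuming fit1 from the mtcars regression above is still in memory), the reported t value for wt can be reproduced by hand:

# t statistic for wt under H0: beta_wt = 0
b_wt  <- summary(fit1)$coefficients["wt", "Estimate"]
se_wt <- summary(fit1)$coefficients["wt", "Std. Error"]
t_wt  <- (b_wt - 0) / se_wt
print(t_wt)                          # about -3.565, matching the summary
print(2 * pt(-abs(t_wt), df = 28))   # two-sided p-value, about 0.00133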
Sometimes we might want to test another null hypothesis… For example, what hypothesis are we testing here?
library(UsingR)
fit.galton <- lm(child ~ parent, galton)
# t statistic for testing H0: beta_parent = 1
t <- (summary(fit.galton)$coefficients[2, 1] - 1) / summary(fit.galton)$coefficients[2, 2]
print(pt(t, 926))
## [1] 1.703241e-17
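Note that pt(t, 926) returns the lower-tail probability (t is negative here because the estimated slope is below 1). For a two-sided test of \(H_0: \beta_{parent} = 1\) one would double the tail area; a one-line sketch reusing t from above:

print(2 * pt(-abs(t), 926))  # two-sided p-value for H0: beta_parent = 1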
Part of our ideal theoretical assumptions is that the observations (the \(x_i\)'s) are independent of one another, and therefore our \(\hat{u}_i\)'s are independent of one another.
The Central Limit Theorem says that the sampling distribution of a sum (or average) converges to a normal distribution as the sample size grows.
Thought experiment:
x0 <- rbinom(1000, 1, 0.6) %>% as.data.frame()
x1 <- rbinom(1000, 2, 0.6) %>% as.data.frame()
x2 <- rbinom(1000, 4, 0.6) %>% as.data.frame()
x3 <- rbinom(1000, 8, 0.6) %>% as.data.frame()
x4 <- rbinom(1000, 16, 0.6) %>% as.data.frame()
x5 <- rbinom(1000, 32, 0.6) %>% as.data.frame()
x6 <- rbinom(1000, 64, 0.6) %>% as.data.frame()
x7 <- rbinom(1000, 128, 0.6) %>% as.data.frame()
x8 <- rbinom(1000, 256, 0.6) %>% as.data.frame()
x9 <- rbinom(1000, 512, 0.6) %>% as.data.frame()
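The original plotting code isn't shown here; as a sketch of how one might visualize the thought experiment (using a small hypothetical helper, and assuming ggplot2 is loaded as elsewhere in these notes), histograms of the draws become bell-shaped as the number of trials grows:

# Hypothetical helper: rename the auto-generated column before plotting
plot_draws <- function(d, title) {
  names(d) <- "draws"
  ggplot(d, aes(x = draws)) + geom_histogram(binwidth = 1) + ggtitle(title)
}
plot_draws(x1, "rbinom(1000, 2, 0.6): skewed and clearly discrete")
plot_draws(x9, "rbinom(1000, 512, 0.6): approximately normal")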
print(summary(fit1), concise = TRUE)
##
## Call: lm(formula = mpg ~ hp + wt + disp, data = mtcars)
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.105505   2.110815  17.579  < 1e-04 ***
## hp           -0.031157   0.011436  -2.724  0.01097 *
## wt           -3.800891   1.066191  -3.565  0.00133 **
## disp         -0.000937   0.010350  -0.091  0.92851
##
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8083
## F-statistic: 44.57 on 3 and 28 DF, p-value: < 1e-04
g <- ggplot(data = data.frame(yhat = fit1$fitted.values,
                              residuals = fit1$residuals),
            mapping = aes(x = yhat, y = residuals))
g + geom_point() + geom_smooth() + geom_hline(yintercept = 0, color = "red")
What graph are we drawing here?
The overall F-test asks whether all of the slope coefficients are jointly zero:
\[ H_0: \beta_1 = \beta_2 = ... = \beta_k = 0 \]
If \(V_1\) and \(V_2\) are independent, \(\chi^2_{m_i}\) distributed random variables, then:
\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]
First, let's explore the Chi Squared distribution
If \(x_1, x_2, ... x_m\) is a vector of independent, standard normally distributed random variables, then the sum of its squared values follows a Chi Squared distribution with \(m\) degrees of freedom.
\[ V = x_1^2 + x_2^2 + ... + x_m^2 \sim \chi^2_{df = m}\]
The probability that \(V\) is a very large number is low.
Why?
What is the probability of getting a slightly negative number?
\[ V = x_1^2 + x_2^2 + ... + x_m^2 \sim \chi^2_{df = m}\]
The probability that \(V\) lands near any specific number depends on \(m\).
Why?
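To see how the distribution of \(V\) depends on \(m\), here is a quick sketch (not from the original notes, and assuming ggplot2 is loaded) comparing Chi Squared densities for a few degrees of freedom:

v <- seq(from = 0, to = 40, by = 0.1)
dens <- data.frame(v = rep(v, times = 3),
                   density = c(dchisq(v, df = 2), dchisq(v, df = 8), dchisq(v, df = 16)),
                   m = factor(rep(c(2, 8, 16), each = length(v))))
ggplot(dens, aes(x = v, y = density, color = m)) +
  geom_line() +
  ggtitle("Chi Squared densities for m = 2, 8, 16")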
X1 <- rnorm(800)  # 800 random standard normal numbers
X2 <- rnorm(8)    # 8 random standard normal numbers
SS1 <- X1^2 %>% sum()  # sum of squared values of X1
SS2 <- X2^2 %>% sum()  # sum of squared values of X2
print(SS1); print(SS2)
## [1] 822.8233
## [1] 6.224667
What are the probabilities of finding values this big?
## [1] 0.280326
## [1] 0.6220818
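The chunk that produced these numbers isn't echoed; presumably they are upper-tail Chi Squared probabilities, which would be computed roughly like this (a sketch reusing SS1 and SS2 from above):

print(pchisq(SS1, df = 800, lower.tail = FALSE))  # P(chi^2_800 > SS1)
print(pchisq(SS2, df = 8, lower.tail = FALSE))    # P(chi^2_8 > SS2)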
If \(V_1\) and \(V_2\) are independent, \(\chi^2_{m_i}\) distributed random variables, then:
\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]
Now that we understand how the \(V_i\)'s operate…
The expected value of a \(\chi^2_m\) distributed variable is \(m\) (and its variance is \(2m\)), so \(V_1/m_1\) shows the size of \(V_1\) relative to what we would expect.
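The claim that each \(V_i/m_i\) hovers around 1, and hence that \(F\) does too under the null, can be checked with a quick simulation (a sketch, not part of the original notes):

set.seed(1)                      # hypothetical seed, just for reproducibility
V1 <- rchisq(10000, df = 3)
V2 <- rchisq(10000, df = 28)
c(mean(V1), var(V1))             # close to 3 and 2 * 3 = 6
c(mean(V2), var(V2))             # close to 28 and 2 * 28 = 56
summary((V1 / 3) / (V2 / 28))    # simulated F ratios: most of the mass is near 1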
\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]
So how should \(F\) behave? Let's find an example:
##
## Call:
## lm(formula = mpg ~ hp + wt + disp, data = mtcars)
##
## Residuals:
##    Min     1Q Median     3Q    Max
## -3.891 -1.640 -0.172  1.061  5.861
##
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)
## (Intercept)  37.105505   2.110815  17.579  < 2e-16 ***
## hp           -0.031157   0.011436  -2.724  0.01097 *
## wt           -3.800891   1.066191  -3.565  0.00133 **
## disp         -0.000937   0.010350  -0.091  0.92851
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared: 0.8268, Adjusted R-squared: 0.8083
## F-statistic: 44.57 on 3 and 28 DF, p-value: 0
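The reported F statistic (44.57 on 3 and 28 DF) can also be recovered from \(R^2\); a sketch, assuming fit1 is still in memory:

r2 <- summary(fit1)$r.squared
k  <- 3                          # number of slope coefficients
n  <- nrow(mtcars)               # 32 observations
F_stat <- (r2 / k) / ((1 - r2) / (n - k - 1))
print(F_stat)                                                    # about 44.57
print(pf(F_stat, df1 = k, df2 = n - k - 1, lower.tail = FALSE))  # overall p-value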
x <- seq(from = 0, to = 7, by = 0.1)
y <- df(x, df1 = 3, df2 = 28)
ggplot(data.frame(x = x, y = y), aes(x, y)) +
  geom_line() +
  ggtitle("F distribution with 3 and 28 degrees of freedom")
\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]
How is it behaving?
Not a lot of change with
Recall: we're not just worried about whether \(x_i\) appears to have an effect (holding constant the other \(x\)'s), but also whether all the \(x\) variables together have an impact.
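One way to see the joint test in action is to compare the full model against an intercept-only model; a sketch, assuming fit1 is still available:

# F test of H0: all slope coefficients are zero
fit0 <- lm(mpg ~ 1, data = mtcars)
anova(fit0, fit1)   # reproduces the overall F statistic, about 44.57 on 3 and 28 DF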