October 5, 2016

Multiple Regression

Example: mtcars

# Multiple regression
fit1 <- lm(mpg ~ hp + wt + disp, mtcars)

Multiple Regression:

## 
## Call: lm(formula = mpg ~ hp + wt + disp, data = mtcars)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.105505   2.110815  17.579  < 1e-04 ***
## hp          -0.031157   0.011436  -2.724  0.01097 *  
## wt          -3.800891   1.066191  -3.565  0.00133 ** 
## disp        -0.000937   0.010350  -0.091  0.92851    
## 
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8083 
## F-statistic: 44.57 on 3 and 28 DF,  p-value: < 1e-04

Hypothesis testing review (worth repeating):

  1. Start with a null hypothesis (e.g. \(\beta_1 = 0\) or \(\bar{x}_1 - \bar{x}_2 = 0\)),
  2. make assumptions about the distribution of the test statistic, then
  3. compare the observed statistic to what our assumed distribution predicts.
  4. If, given that distribution, the observed statistic is very unlikely, we reject the null hypothesis and (tentatively) accept the alternative.

Review: the Z-test

Recall z-tests from MTH 110. If you're told:

men's height is normally distributed with mean 70 inches and standard deviation 3 inches; find the probability of meeting a 7 foot (84 inch) or taller man

print(100 - pnorm(84,mean = 70, sd = 3)*100) # in percentage terms
## [1] 0.0001530627

In hypothesis terms: "We've found a 7 foot tall man… is he a regular person? Regular people are, on average, 70 inches tall with s.d. of 3 inches. The probability of finding someone so tall is less than 1/1000% so maybe he isn't a regular person."

Student's t

  • Fewer degrees of freedom (red) means a wider spread… more randomness
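
The figure from the slide isn't reproduced here; a minimal sketch of the comparison it shows (the specific degrees of freedom, 2 and 10, are my choice, not the slide's):

library(ggplot2)

# Student's t densities for a few degrees of freedom, with the standard
# normal for comparison: fewer df means heavier tails / wider spread
x_grid <- seq(-4, 4, by = 0.01)
dens <- data.frame(
  x     = rep(x_grid, 3),
  y     = c(dt(x_grid, df = 2), dt(x_grid, df = 10), dnorm(x_grid)),
  curve = rep(c("t, df = 2", "t, df = 10", "normal"), each = length(x_grid))
)
ggplot(dens, aes(x, y, color = curve)) + geom_line()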

Testing the slope coefficients:

Under \(H_0: \beta_i = 0\) and the assumption that our random variable (e.g. \(\beta_{wt}\)) is normally distributed, we have to use the t-test because we don't know the true (population) variance of this particular parameter.

Define:

\[t = \frac{\beta_i - \beta_{NULL}}{se(\beta_i)}\]

Where the standard error \(se(\beta_i)\) is our estimate of the standard deviation of \(\beta_i\).
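
For example, the t value that the fit1 table reports for wt can be rebuilt by hand from the coefficient table (a sketch; the names b_wt, se_wt, and t_wt are mine):

b_wt  <- summary(fit1)$coefficients["wt", "Estimate"]
se_wt <- summary(fit1)$coefficients["wt", "Std. Error"]
t_wt  <- (b_wt - 0) / se_wt                        # beta_NULL = 0
t_wt                                               # about -3.565, as in the table above
2 * pt(abs(t_wt), df = 28, lower.tail = FALSE)     # two-sided p-value, 28 df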

The null hypothesis:

Usually \(\beta_{NULL}\) is 0. We start with the assumption of randomness… sometimes \(y\) is above average and \(x\) has nothing to do with it.

Sometimes we might want to test another null hypothesis… For example, what hypothesis are we testing here?

library(UsingR)                       # for the galton data
fit.galton <- lm(child ~ parent, data = galton)
# t statistic for testing that the slope on parent equals 1
t <- (summary(fit.galton)$coefficients[2, 1] - 1) /
  summary(fit.galton)$coefficients[2, 2]
print(pt(t, 926))                     # one-sided p-value; 926 = 928 - 2 df
## [1] 1.703241e-17

Normality

Why should everything end up being normal?

Part of our ideal theoretical assumption is that our \(x_i\)'s are independent of one another, and therefore our \(\hat{u_i}\)'s are independent of one another.

The Central Limit Theorem says that sampling distributions converge to a normal distribution as the sample size grows.

Central Limit Theorem

Thought experiment:

library(magrittr)   # provides the pipe %>%

# 1000 draws each from a binomial with p = 0.6, doubling the number of
# trials every time: 1, 2, 4, ..., 512
x0 <- rbinom(1000,1,0.6) %>% as.data.frame()
x1 <- rbinom(1000,2,0.6) %>% as.data.frame()
x2 <- rbinom(1000,4,0.6) %>% as.data.frame()
x3 <- rbinom(1000,8,0.6) %>% as.data.frame()
x4 <- rbinom(1000,16,0.6) %>% as.data.frame()
x5 <- rbinom(1000,32,0.6) %>% as.data.frame()
x6 <- rbinom(1000,64,0.6) %>% as.data.frame()
x7 <- rbinom(1000,128,0.6) %>% as.data.frame()
x8 <- rbinom(1000,256,0.6) %>% as.data.frame()
x9 <- rbinom(1000,512,0.6) %>% as.data.frame()

What do the histograms look like? (one histogram per slide, for x0 through x9; figures omitted)
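
A sketch of one way to draw them all at once (this re-simulates rather than reusing x0 through x9, and the faceted layout is my choice, not the slides'):

library(ggplot2)
library(dplyr)

# Re-simulate the ten samples, stack them, and facet the histograms by
# the number of trials; the shapes look increasingly normal
sims <- bind_rows(lapply(0:9, function(i) {
  data.frame(trials = 2^i, draws = rbinom(1000, 2^i, 0.6))
}))
ggplot(sims, aes(x = draws)) +
  geom_histogram(bins = 30) +
  facet_wrap(~ trials, scales = "free_x")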

Other examples:

Testing slope coefficient significance:

print(summary(fit1),concise = TRUE)
## 
## Call: lm(formula = mpg ~ hp + wt + disp, data = mtcars)
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.105505   2.110815  17.579  < 1e-04 ***
## hp          -0.031157   0.011436  -2.724  0.01097 *  
## wt          -3.800891   1.066191  -3.565  0.00133 ** 
## disp        -0.000937   0.010350  -0.091  0.92851    
## 
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8083 
## F-statistic: 44.57 on 3 and 28 DF,  p-value: 1e-04

Evaluating the fit:

library(ggplot2)

# Residuals vs. fitted values for fit1
g <- ggplot(data = data.frame(
  yhat = fit1$fitted.values,
  residuals = fit1$residuals),
  mapping = aes(x = yhat, y = residuals))
g + geom_point() + geom_smooth() +
  geom_hline(yintercept = 0, color = "red")

What graph are we drawing here?

Residual plot:

The F-test

  • We care about the hypothesis \(\beta_i = 0\)
  • We also care about the joint hypothesis:

\[ \beta_1 = \beta_2 = ... = \beta_k = 0 \]

  • The first evaluates the impact of \(x_i\) on \(y\).
  • The second evaluates the impact of all of the \(x\) variables taken together on \(y\) (a sketch of this joint test in R follows below).
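
One way to see the joint test in R is to compare the full model against an intercept-only model; a sketch using fit1 from above (the F statistic from anova() should match the one summary() reports):

fit0 <- lm(mpg ~ 1, data = mtcars)   # restricted model: intercept only
anova(fit0, fit1)                    # F test that all slopes are jointly zero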

The F-distribution

If \(V_1\) and \(V_2\) are independent, \(\chi^2_{m_i}\) distributed random variables, then:

\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]

First, let's explore the Chi Squared distribution

The \(\chi^2\) distribution

If \(x_1, x_2, ..., x_m\) are independent, standard normally distributed random variables, then the sum of their squared values follows a Chi Squared distribution with \(m\) degrees of freedom.

\[ V = x_1^2 + x_2^2 + ... + x_m^2 \sim \chi^2_{df = m}\]

The probability that \(V\) is a very large number is low.

Why?

What is the probability of getting a slightly negative number?

The \(\chi^2\) distribution

\[ V = x_1^2 + x_2^2 + ... + x_m^2 \sim \chi^2_{df = m}\]

The probability that \(V\) is a specific number depends on \(m\).

Why?

The \(\chi^2\) distribution

X1 <- rnorm(800)        # 800 random standard normal numbers
X2 <- rnorm(8)          # 8 random standard normal numbers
SS1 <- X1^2 %>% sum()   # sum of squares of X1
SS2 <- X2^2 %>% sum()   # sum of squares of X2
print(SS1); print(SS2)
## [1] 822.8233
## [1] 6.224667

What are the probabilities of finding values this big?

## [1] 0.280326
## [1] 0.6220818
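
The chunk that computed these probabilities isn't echoed in the slides; assuming it used the upper tail of the \(\chi^2\) distribution, a sketch:

pchisq(SS1, df = 800, lower.tail = FALSE)   # P(chi^2_800 > SS1)
pchisq(SS2, df = 8,   lower.tail = FALSE)   # P(chi^2_8   > SS2)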

The \(\chi^2\) distribution (density plots shown on the slides; figures omitted)

The F-distribution

If \(V_1\) and \(V_2\) are independent, \(\chi^2_{m_i}\) distributed random variables, then:

\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]

Now that we understand how the \(V_i\)'s operate…

The expected value of a \(\chi^2_m\) distributed variable is \(m\) (and its variance is \(2m\)), so \(V_1/m_1\) shows the size of \(V_1\) relative to what we would expect.
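
A quick simulation check of those two moments (the 10,000 draws and m = 8 are arbitrary choices of mine):

v <- rchisq(10000, df = 8)   # 10,000 draws from a chi-squared with m = 8
mean(v)                      # should be close to m = 8
var(v)                       # should be close to 2m = 16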

The F-distribution

\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]

So how should \(F\) behave? Let's find an example:

The F-distribution

## 
## Call:
## lm(formula = mpg ~ hp + wt + disp, data = mtcars)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -3.891 -1.640 -0.172  1.061  5.861 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 37.105505   2.110815  17.579  < 2e-16 ***
## hp          -0.031157   0.011436  -2.724  0.01097 *  
## wt          -3.800891   1.066191  -3.565  0.00133 ** 
## disp        -0.000937   0.010350  -0.091  0.92851    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.639 on 28 degrees of freedom
## Multiple R-squared:  0.8268, Adjusted R-squared:  0.8083 
## F-statistic: 44.57 on 3 and 28 DF,  p-value: 0

F-distribution

# Density of the F distribution with 3 and 28 degrees of freedom
x <- seq(from = 0, to = 7, by = 0.1)
y <- df(x, df1 = 3, df2 = 28)
ggplot(data.frame(x = x, y = y), aes(x, y)) + geom_line() +
  ggtitle("F distribution with 3 and 28 degrees of freedom")

F-distribution (the resulting density plots, shown on successive slides; figures omitted)

The F-distribution

\[ F = \frac{V_1/m_1}{V_2/m_2} \sim F_{m_1,m_2}\]

How is it behaving?

Not a lot of change with…

How are we using the F-test?

Recall: we're not just worried about whether \(x_i\) appears to have an effect (holding constant the other \(x\)'s), but also whether all the \(x\) variables together have an impact.
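
The overall F statistic and its p-value can also be pulled straight out of the fitted model; a sketch using fit1 (the pf() call just recomputes the p-value that summary() reports):

fstat <- summary(fit1)$fstatistic   # named vector: value, numdf, dendf
fstat
pf(fstat["value"], fstat["numdf"], fstat["dendf"], lower.tail = FALSE)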