Before we start, we need to read in the data and fit our model.
Galton <- read.csv("http://cknudson.com/data/Galton.csv")
# Model Height on father's height and gender; data = Galton makes attach() unnecessary
mod1 <- lm(Height ~ FatherHeight + Gender, data = Galton)
summary(mod1)
##
## Call:
## lm(formula = Height ~ FatherHeight + Gender, data = Galton)
##
## Residuals:
## Min 1Q Median 3Q Max
## -9.3708 -1.4808 0.0192 1.5616 9.4153
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 34.46113 2.13628 16.13 <2e-16 ***
## FatherHeight 0.42782 0.03079 13.90 <2e-16 ***
## GenderM 5.17604 0.15211 34.03 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.277 on 895 degrees of freedom
## Multiple R-squared: 0.5971, Adjusted R-squared: 0.5962
## F-statistic: 663.2 on 2 and 895 DF, p-value: < 2.2e-16
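Today we learned how to test the significance of a single predictor variable to see whether it is related to our response variable. We do this with a hypothesis test. For FatherHeight, the null hypothesis is that its coefficient equals 0, \(H_0: \beta_{\text{FatherHeight}} = 0\), and the alternative is that it does not, \(H_a: \beta_{\text{FatherHeight}} \neq 0\). To find the p-value we use a t-distribution with \(n-(k+1)\) degrees of freedom, where \(n\) is the number of observations and \(k\) is the number of predictors. Here \(k = 2\) and the model reports 895 residual degrees of freedom, so \(n = 898\). Our test statistic is
\[t = \frac{b_j}{s_{b_j}},\]
where \(b_j\) is the estimated coefficient and \(s_{b_j}\) is its standard error.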
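To check that this formula is exactly what `summary()` reports, here is a minimal sketch on simulated data. The data frame `sim` and its columns `x`, `g`, and `y` are hypothetical stand-ins with the same structure as `mod1` (one numeric predictor plus a two-level factor). They are not the Galton data.

```r
# Hypothetical simulated data (NOT the Galton data), shaped like mod1:
# a numeric predictor x and a two-level factor g.
set.seed(1)
n <- 200
sim <- data.frame(
  x = rnorm(n, mean = 69, sd = 2.5),
  g = factor(sample(c("F", "M"), n, replace = TRUE))
)
sim$y <- 30 + 0.4 * sim$x + 5 * (sim$g == "M") + rnorm(n, sd = 2)

fit <- lm(y ~ x + g, data = sim)
coefs <- coef(summary(fit))

k <- 2                            # number of predictors (x and gM)
rdf <- nrow(sim) - (k + 1)        # n - (k + 1) residual degrees of freedom
t_manual <- coefs["x", "Estimate"] / coefs["x", "Std. Error"]   # b_j / s_bj
p_manual <- 2 * pt(abs(t_manual), rdf, lower.tail = FALSE)      # two-sided p-value

# Compare with the "t value" and "Pr(>|t|)" columns computed by summary()
c(t_manual = t_manual, t_lm = coefs["x", "t value"])
c(p_manual = p_manual, p_lm = coefs["x", "Pr(>|t|)"])
rdf == df.residual(fit)
```

Both pairs should agree, and `rdf` should match `df.residual(fit)`. This confirms that the `t value` column is just the estimate divided by its standard error.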
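coefs <- coef(summary(mod1))   # columns: Estimate, Std. Error, t value, Pr(>|t|)
tstat <- coefs["FatherHeight", "Estimate"] / coefs["FatherHeight", "Std. Error"]
tstat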
## [1] 13.89704
Now we can use our test statistic and the model's 895 residual degrees of freedom to compute a two-sided p-value.
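# Two-sided p-value: twice the upper-tail area beyond |t|, using df.residual(mod1) = 895
2 * pt(abs(tstat), df = df.residual(mod1), lower.tail = FALSE)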
## [1] 6.693554e-40
The p-value is far smaller than our 0.05 significance level, so we reject the null hypothesis and conclude that FatherHeight has a significant relationship with Height. The estimated coefficient of 0.42782 means that for each additional inch of father's height, a person's predicted height increases by about 0.428 inches, holding gender constant.
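To see what "holding gender constant" means, we can plug the rounded estimates from `summary(mod1)` into the fitted equation. The helper `predict_height()` is hypothetical and only illustrates the arithmetic. Its `male` argument is a 0/1 indicator playing the role of `GenderM`.

```r
# Coefficients copied (rounded) from summary(mod1) above
b0       <- 34.46113  # (Intercept)
b_father <- 0.42782   # FatherHeight
b_male   <- 5.17604   # GenderM: added when Gender == "M"

# Hypothetical helper: fitted equation
# Height-hat = b0 + b_father * FatherHeight + b_male * male
predict_height <- function(father_height, male) {
  b0 + b_father * father_height + b_male * male
}

# With gender fixed, one more inch of father's height adds exactly b_father
predict_height(74, 1) - predict_height(73, 1)   # males
predict_height(74, 0) - predict_height(73, 0)   # females, same difference

predict_height(73, 1)
```

Both differences equal 0.42782. The last line gives about 70.868, which matches the fit of 70.86816 that `predict()` reports later, up to rounding of the printed coefficients.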
We can also look at confidence intervals for the coefficients.
confint(mod1)
## 2.5 % 97.5 %
## (Intercept) 30.2684263 38.6538353
## FatherHeight 0.3674023 0.4882411
## GenderM 4.8775165 5.4745684
We can interpret the FatherHeight row as follows: we are 95% confident that each additional inch of father's height is associated with an average increase of between 0.3674023 and 0.4882411 inches in a person's height, holding gender constant. Because this interval does not contain 0, it agrees with the t-test above.
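The coefficient interval from `confint()` is \(b_j \pm t_{0.975,\,895}\, s_{b_j}\). Here is a minimal sketch that rebuilds it from the rounded estimate and standard error printed by `summary(mod1)`. The only assumption is those copied values.

```r
# Estimate and standard error copied (rounded) from summary(mod1)
b_father  <- 0.42782
se_father <- 0.03079
rdf <- 895                        # residual degrees of freedom, n - (k + 1)

# 95% interval: b_j +/- t_{0.975, df} * s_bj
t_crit <- qt(0.975, df = rdf)
ci <- b_father + c(-1, 1) * t_crit * se_father
ci
```

The result is approximately 0.3674 and 0.4882, matching the FatherHeight row of `confint(mod1)` up to rounding of the standard error.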
Finally, we computed confidence and prediction intervals from the multiple regression model. I chose a father's height of 73 inches and a gender of male.
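# New observation: a male whose father is 73 inches tall
ndata <- data.frame(FatherHeight = 73, Gender = "M")
predI <- predict(mod1, ndata, interval = "prediction")  # interval for one person's height
confI <- predict(mod1, ndata, interval = "confidence")  # interval for the mean height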
confI
## fit lwr upr
## 1 70.86816 70.55745 71.17886
predI
## fit lwr upr
## 1 70.86816 66.38894 75.34737
confI %*% c(0, -1, 1)  # interval width: weights (0, -1, 1) on (fit, lwr, upr) give upr - lwr
## [,1]
## 1 0.6214155
predI %*% c(0, -1, 1)  # width of the prediction interval, upr - lwr
## [,1]
## 1 8.958424
The confidence interval (CI) gives a point estimate of 70.86816 inches for the mean height of males whose father is 73 inches tall. We are 95% confident that this mean lies between 70.55745 and 71.17886 inches, an interval only about 0.62 inches wide.
The prediction interval (PI) gives the same point estimate, but now it is a prediction of an individual male's height when his father is 73 inches tall. We are 95% confident that such an individual's height will fall between 66.38894 and 75.34737 inches. This interval is much wider (about 8.96 inches) because it accounts for both the uncertainty in the estimated mean and the person-to-person variation around that mean, which the residual standard error of 2.277 inches measures.
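The two intervals are linked. For a linear model, the CI is \(\hat{y} \pm t\,\mathrm{SE}_{\text{fit}}\) and the PI is \(\hat{y} \pm t\sqrt{\mathrm{SE}_{\text{fit}}^2 + \hat\sigma^2}\). Here is a minimal sketch that recovers the standard error of the fit from the CI above and uses the residual standard error from `summary(mod1)` to rebuild the PI. It assumes only the rounded values copied from the output.

```r
# Values copied (rounded) from the output above
fit    <- 70.86816   # point estimate at FatherHeight = 73, Gender = "M"
ci_lwr <- 70.55745   # lower end of the 95% confidence interval
sigma  <- 2.277      # residual standard error from summary(mod1)
rdf    <- 895        # residual degrees of freedom
t_crit <- qt(0.975, df = rdf)

# CI for the mean is fit +/- t * se_fit, so se_fit can be backed out of the CI
se_fit <- (fit - ci_lwr) / t_crit

# PI for an individual adds the residual variance: fit +/- t * sqrt(se_fit^2 + sigma^2)
pi_half <- t_crit * sqrt(se_fit^2 + sigma^2)
c(lwr = fit - pi_half, upr = fit + pi_half)
```

This gives roughly 66.39 and 75.35, close to `predI` up to rounding of `sigma`. Because \(\hat\sigma^2\) is much larger than \(\mathrm{SE}_{\text{fit}}^2\) here, the residual variation accounts for almost all of the PI's width.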