We started class by conducting F-tests and t-tests on multiple linear regression models. Doing this helped clarify the purpose of each test and how the two differ. For simple linear regression, there isn't really a difference: with only one predictor, the two tests give the same p-value. For multiple linear regression, the F-test (and its p-value) tells us whether at least one predictor in the model is significant. Then we can look at the t-tests to see which of our predictor variables have significant p-values.
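As a rough sketch of the idea above, the code below pulls the overall F-test and the individual t-tests out of `summary()`, then checks that in simple linear regression the F statistic equals the squared t statistic. It uses R's built-in `mtcars` data purely as a stand-in (an assumption on my part, not the data from class), and the object names are made up for illustration.

```r
# Built-in mtcars data is a stand-in here (assumption, not the class data)
fit_multi <- lm(mpg ~ wt + hp + qsec, data = mtcars)
s <- summary(fit_multi)

# Overall F-test: is at least one predictor significant?
f <- s$fstatistic  # named vector: value, numdf, dendf
overall_p <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)
print(overall_p)

# Individual t-tests: which predictors are significant?
t_table <- coef(s)  # columns: Estimate, Std. Error, t value, Pr(>|t|)
print(t_table[, c("t value", "Pr(>|t|)")])

# Simple linear regression: the F statistic is the t statistic squared,
# so both tests lead to the same p-value
fit_simple <- lm(mpg ~ wt, data = mtcars)
s1 <- summary(fit_simple)
f_simple <- s1$fstatistic[["value"]]
t_simple <- coef(s1)["wt", "t value"]
print(c(F = f_simple, t_squared = t_simple^2))
```

The last line should print two matching numbers, which is why the two tests don't really differ when there is only one predictor.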
After that we moved on to finding which model was best. We were able to use the anova() function for this. The complete model has all of the predictor variables in it, and the reduced model has only some of the complete model's predictors and can't add any new ones. When you run the function, you get a p-value for the ANOVA. If it is significant, at least one of the dropped variables matters, so you should use the complete model. If the p-value is large, the dropped variables aren't contributing much, so you should use the reduced model.
library(Lock5withR)
data("GPAGender")
mod.complete <- lm(GPA ~ Exercise + SAT + Pulse + Piercings, data = GPAGender)
mod.reduced <- lm(GPA ~ SAT + Piercings, data = GPAGender)
anova(mod.complete, mod.reduced)
## Analysis of Variance Table
##
## Model 1: GPA ~ Exercise + SAT + Pulse + Piercings
## Model 2: GPA ~ SAT + Piercings
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)
## 1    338 44.638
## 2    340 45.886 -2   -1.2477 4.7237 0.009477 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Using the GPAGender data, you can see the complete and reduced models in the R code above. After running the anova() function, we can see that the p-value is .009477, which is < .05. This means that the reduced model is getting rid of at least one essential predictor variable (Exercise or Pulse), so we need to use the complete model.
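To see where the F statistic in the table comes from, here is a small sketch that recomputes it by hand from the residual sums of squares and degrees of freedom printed in the output. The variable names are my own, and because the inputs are the rounded values from the table, the result only matches the printed F to about two decimal places.

```r
# Values copied from the anova() table above (rounded as printed)
rss_complete <- 44.638  # residual sum of squares, complete model
rss_reduced  <- 45.886  # residual sum of squares, reduced model
df_complete  <- 338     # residual degrees of freedom, complete model
df_reduced   <- 340     # residual degrees of freedom, reduced model

# Number of predictors dropped (Exercise and Pulse)
df_dropped <- df_reduced - df_complete

# F = (extra RSS per dropped predictor) / (error variance of complete model)
f_stat <- ((rss_reduced - rss_complete) / df_dropped) /
  (rss_complete / df_complete)

# Upper-tail p-value from the F distribution
p_value <- pf(f_stat, df_dropped, df_complete, lower.tail = FALSE)

print(f_stat)
print(p_value)
```

The p-value lands below .05, which is why we keep the complete model.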
This will be helpful in the future when we want to see whether a simpler model can predict our response variable just as well.
It also takes our knowledge of ANOVA (from Math Stat) and applies it to the linear regression models we are learning in this class.