Concepts

Today in class we reviewed material from chapter 4 and practiced using and interpreting t-tests and F-tests. We also learned what a partial F-test is, why we use it, and how to run one in R.

In general, the overall F-test is used to determine whether any of the predictor variables in your model has a significant relationship with the response variable. Individual t-tests are used to determine whether each predictor has a significant relationship with the response, given the other predictors in the model. Lastly, partial F-tests are used to determine whether a particular subset of the predictors has a significant relationship with the response, given the remaining predictors.
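For reference, the three test statistics can be written in their standard textbook form. The notation is my own and may differ from chapter 4: n is the number of observations, p is the number of slope coefficients in the full model (a factor such as Species contributes one coefficient per non-baseline level), SSE and SSR are the error and regression sums of squares, the subscripts F and R mark the full and reduced models, and q is the number of coefficients dropped from the full model. Each statistic follows the stated distribution under its null hypothesis.

```latex
F = \frac{\mathrm{SSR}/p}{\mathrm{SSE}/(n-p-1)} \sim F_{p,\,n-p-1}

t_j = \frac{\hat\beta_j}{\mathrm{SE}(\hat\beta_j)} \sim t_{n-p-1}

F_{\text{partial}} = \frac{(\mathrm{SSE}_R - \mathrm{SSE}_F)/q}{\mathrm{SSE}_F/(n-p-1)} \sim F_{q,\,n-p-1}
```

In the iris example below, n = 150 and p = 5, so the residual degrees of freedom are 150 - 5 - 1 = 144, and the partial F-test drops q = 2 coefficients.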

To understand the key equations and R code needed for these concepts, I will demonstrate them using an example.

Example

First, I’ll create a model using the built-in iris dataset. The model predicts sepal length from sepal width, petal length, petal width, and species.

# Full model: "." uses every other column of iris as a predictor
mod1 <- lm(Sepal.Length ~ ., data = iris)
summary(mod1)
## 
## Call:
## lm(formula = Sepal.Length ~ ., data = iris)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.79424 -0.21874  0.00899  0.20255  0.73103 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        2.17127    0.27979   7.760 1.43e-12 ***
## Sepal.Width        0.49589    0.08607   5.761 4.87e-08 ***
## Petal.Length       0.82924    0.06853  12.101  < 2e-16 ***
## Petal.Width       -0.31516    0.15120  -2.084  0.03889 *  
## Speciesversicolor -0.72356    0.24017  -3.013  0.00306 ** 
## Speciesvirginica  -1.02350    0.33373  -3.067  0.00258 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3068 on 144 degrees of freedom
## Multiple R-squared:  0.8673, Adjusted R-squared:  0.8627 
## F-statistic: 188.3 on 5 and 144 DF,  p-value: < 2.2e-16

In the summary, we can see the individual t-statistic and p-value for each coefficient, as well as the overall F-statistic for the model and its p-value. Looking first at the F-test, we conclude that at least one of the predictors has a significant relationship with sepal length, since the p-value (< 2.2e-16) is below even a very small alpha of 0.001. Then, looking at the t-tests, we can see that every predictor has a significant relationship with sepal length, given the others in the model, since all of the p-values are below an alpha of 0.05. (Species enters the model through two indicator coefficients, Speciesversicolor and Speciesvirginica, and both are significant.)
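As a check on the reading above, the overall F p-value and the individual t-statistics can be recomputed from the pieces that `summary()` stores. This is a sketch in base R only (`pf()`, `pt()`, `coef()`, `df.residual()`); the object names `s`, `f`, `coefs` and `alpha` are my own choices, not from the class material.

```r
# Refit the full model from the example (built-in iris data)
mod1 <- lm(Sepal.Length ~ ., data = iris)
s <- summary(mod1)

# Overall F-test: summary() stores the statistic and its two degrees of freedom
f <- s$fstatistic                      # named vector: value, numdf, dendf
p_overall <- pf(f[["value"]], f[["numdf"]], f[["dendf"]], lower.tail = FALSE)

# Individual t-tests: t = estimate / standard error, on n - p - 1 residual df
coefs <- coef(s)                       # columns: Estimate, Std. Error, t value, Pr(>|t|)
t_by_hand <- coefs[, "Estimate"] / coefs[, "Std. Error"]
p_by_hand <- 2 * pt(abs(t_by_hand), df = df.residual(mod1), lower.tail = FALSE)

# Which predictor coefficients (intercept excluded) are significant at alpha = 0.05?
alpha <- 0.05
significant <- p_by_hand[-1] < alpha
print(significant)
```

The hand-computed t-statistics and two-sided p-values should reproduce the `t value` and `Pr(>|t|)` columns, and `significant` should be `TRUE` for all five predictor coefficients.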

Now, suppose we want to test whether petal length and petal width, taken together, have a significant relationship with sepal length. To do that, we conduct a partial F-test. First, we create a reduced model that is the same as our first model but omits those two predictors. Then, we compare the two models with anova().

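# Reduced model: drop Petal.Length and Petal.Width
mod2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)
# Partial F-test comparing the full model (mod1) with the reduced model (mod2)
anova(mod1, mod2)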
## Analysis of Variance Table
## 
## Model 1: Sepal.Length ~ Sepal.Width + Petal.Length + Petal.Width + Species
## Model 2: Sepal.Length ~ Sepal.Width + Species
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1    144 13.556                                  
## 2    146 28.004 -2   -14.447 76.731 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

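Looking at the results, the partial F-statistic is 76.731 on 2 and 144 degrees of freedom, and the p-value is less than 2.2 × 10^-16. We therefore reject the null hypothesis that the coefficients for petal length and petal width are both zero: together, these two variables have a significant relationship with sepal length, even after accounting for sepal width and species. This suggests they are an important part of our model. (The Df and Sum of Sq entries are negative only because the full model was listed first in anova(); the F-statistic and p-value are the same in either order.)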
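The partial F-statistic can also be computed directly from the residual sums of squares of the two models, which makes the formula behind `anova()` explicit. A minimal sketch in base R: `deviance()` returns the residual sum of squares for an `lm` fit, and the variable names below are illustrative.

```r
mod1 <- lm(Sepal.Length ~ ., data = iris)                       # full model
mod2 <- lm(Sepal.Length ~ Sepal.Width + Species, data = iris)   # reduced model

rss_full    <- deviance(mod1)      # residual sum of squares, full model
rss_reduced <- deviance(mod2)      # residual sum of squares, reduced model
df_full     <- df.residual(mod1)   # 150 - 6 = 144
df_reduced  <- df.residual(mod2)   # 150 - 4 = 146

q <- df_reduced - df_full          # number of coefficients being tested (2)
f_partial <- ((rss_reduced - rss_full) / q) / (rss_full / df_full)
p_partial <- pf(f_partial, q, df_full, lower.tail = FALSE)

# Same test via anova(); listing the reduced model first gives positive Df and Sum of Sq
tab <- anova(mod2, mod1)
print(tab)
```

`f_partial` should come out at about 76.73, matching the F column of the `anova()` table.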

Comparison to Other Topics

The new topic, the partial F-test, is very similar to the overall F-test: both tell us whether a group of variables has a significant relationship with the response variable. In fact, the overall F-test is the special case of the partial F-test in which the subset being tested is all of the predictors, so the reduced model contains only the intercept. In the broader theme of our course, this matters because building a good regression model means avoiding unnecessary complexity, and the partial F-test helps us decide which groups of variables are worth keeping so that we do not include anything unnecessary.
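To see the connection between the two tests concretely, the overall F-test can be run as a partial F-test against an intercept-only model. A sketch assuming the same full model as above; `mod0` is a name I introduced for the intercept-only fit.

```r
mod1 <- lm(Sepal.Length ~ ., data = iris)    # full model
mod0 <- lm(Sepal.Length ~ 1, data = iris)    # intercept-only (null) model

# Partial F-test that drops every predictor at once
tab <- anova(mod0, mod1)
print(tab)

# Overall F-statistic as reported by summary()
overall <- summary(mod1)$fstatistic
print(overall)
```

The F value in the second row of `tab` should equal the F-statistic of 188.3 on 5 and 144 DF reported by `summary(mod1)`.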