4.10 Partial F-Test

On Thursday, we learned how to compare models built from subsets of the predictor variables to find out whether a certain group of variables adds anything to our prediction of the response. A good example of this would be a response variable "price of a house" with predictor variables "location, sqft, bedrooms, color of roof tile." The color of the roof tile would not be as good a predictor of the price of a house as the others, so we would test whether to drop it by running a partial F-test.
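Here is a minimal sketch of that house-price idea with simulated data. The houses data frame, its columns, and all the numbers below are made up purely for illustration (location is omitted for brevity); the point is just the shape of the workflow we use on real data next.

# Simulated house data: price depends on sqft and bedrooms but NOT roof color
set.seed(1)
n <- 100
houses <- data.frame(
  sqft       = runif(n, 800, 3500),
  bedrooms   = sample(1:5, n, replace = TRUE),
  roof_color = factor(sample(c("red", "grey", "brown"), n, replace = TRUE))
)
houses$price <- 50000 + 120 * houses$sqft + 8000 * houses$bedrooms +
  rnorm(n, sd = 20000)

full    <- lm(price ~ sqft + bedrooms + roof_color, data = houses)  # all predictors
reduced <- lm(price ~ sqft + bedrooms, data = houses)               # roof_color dropped
anova(reduced, full)  # partial F-test: does roof_color improve the fit?

With data generated this way we would expect a large p-value, telling us roof_color can safely be dropped. (The order of the two models in anova() does not change the p-value.)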

library(MASS)   # contains the hills dataset
data(hills)
attach(hills)   # makes time, dist, and climb available directly
Mod1 <- lm(time ~ dist + climb)  # full model
Mod2 <- lm(time ~ climb)         # reduced model: dist dropped
summary(Mod1)
## 
## Call:
## lm(formula = time ~ dist + climb)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -16.215  -7.129  -1.186   2.371  65.121 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -8.992039   4.302734  -2.090   0.0447 *  
## dist         6.217956   0.601148  10.343 9.86e-12 ***
## climb        0.011048   0.002051   5.387 6.45e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 14.68 on 32 degrees of freedom
## Multiple R-squared:  0.9191, Adjusted R-squared:  0.914 
## F-statistic: 181.7 on 2 and 32 DF,  p-value: < 2.2e-16
summary(Mod2)
## 
## Call:
## lm(formula = time ~ climb)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -36.616 -18.293  -4.215   5.103 127.706 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 12.69917    7.71050   1.647    0.109    
## climb        0.02489    0.00319   7.801 5.45e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 30.12 on 33 degrees of freedom
## Multiple R-squared:  0.6484, Adjusted R-squared:  0.6378 
## F-statistic: 60.86 on 1 and 33 DF,  p-value: 5.452e-09
anova(Mod1, Mod2)
## Analysis of Variance Table
## 
## Model 1: time ~ dist + climb
## Model 2: time ~ climb
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     32  6891.9                                  
## 2     33 29933.8 -1    -23042 106.99 9.859e-12 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

H0: the dropped variable's coefficient is zero (the reduced model is adequate, so drop the variable). HA: the coefficient is nonzero (keep all the variables).

As you can see above, I created two linear regression models from the hills data found in the MASS package. In the first one I included both predictor variables, and in the second one I removed dist. As you can see from the summaries, both predictors are useful for the response variable, since each has a p-value well below 0.05. Also, the R^2 for the first model shows that it explains 91.91% of the variance in the response, while the model with dist removed explains only 64.84%. So we already suspect that the model with all the variables is better in this case, but let's test this formally with the partial F-test using anova().
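If you just want the R^2 values rather than reading the full summaries, they can be pulled directly from the summary objects:

summary(Mod1)$r.squared  # 0.9191 for the full model
summary(Mod2)$r.squared  # 0.6484 for the reduced model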

The anova() call then compares the two models and tells us whether or not we should drop the variable. From the output we can see that the p-value in the row for Model 2 is very small (9.859e-12), which means that instead of dropping dist we should keep it: we reject the null hypothesis in favor of the alternative.
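To see where the F statistic in the anova table comes from, here is a short sketch computing it by hand from the residual sums of squares reported above:

rss_full    <- 6891.9   # RSS of Model 1 (dist + climb), 32 residual df
rss_reduced <- 29933.8  # RSS of Model 2 (climb only), 33 residual df
# partial F = ((RSS_reduced - RSS_full) / number of dropped variables)
#             / (RSS_full / residual df of the full model)
F_stat <- ((rss_reduced - rss_full) / 1) / (rss_full / 32)
F_stat                                             # about 106.99, matching the table
pf(F_stat, df1 = 1, df2 = 32, lower.tail = FALSE)  # p-value about 9.86e-12

Note that because only one variable was dropped, this p-value matches the t-test p-value for dist in the first summary (the F statistic is the square of that t value).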