Let’s use the iris data!

data("iris")
attach(iris)

Looking at an F-test (review)

The overall F-test checks whether at least one of the predictor variables has a significant linear relationship with the response.

modiris<-lm(Petal.Length~Sepal.Length+Sepal.Width)
summary(modiris)
## 
## Call:
## lm(formula = Petal.Length ~ Sepal.Length + Sepal.Width)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -1.25582 -0.46922 -0.05741  0.45530  1.75599 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.52476    0.56344  -4.481 1.48e-05 ***
## Sepal.Length  1.77559    0.06441  27.569  < 2e-16 ***
## Sepal.Width  -1.33862    0.12236 -10.940  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.6465 on 147 degrees of freedom
## Multiple R-squared:  0.8677, Adjusted R-squared:  0.8659 
## F-statistic:   482 on 2 and 147 DF,  p-value: < 2.2e-16

The p-value for our F-statistic is less than .05, which means at least one of the predictor variables should be kept in our model.
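As a rough check on where that F-statistic comes from, the sketch below rebuilds it from R-squared and the degrees of freedom. The names fit, fs, r2, f_manual and p_value are illustrative and not part of the original notes, and the model is refit with data = iris so the block runs without attach().

```r
# Refit the two-predictor model with an explicit data argument
fit <- lm(Petal.Length ~ Sepal.Length + Sepal.Width, data = iris)

# summary() stores the overall F-statistic as a named vector:
# value, numdf (numerator df), dendf (denominator df)
fs <- summary(fit)$fstatistic
r2 <- summary(fit)$r.squared

# F = (R^2 / numdf) / ((1 - R^2) / dendf)
f_manual <- (r2 / fs[["numdf"]]) / ((1 - r2) / fs[["dendf"]])

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(fs[["value"]], fs[["numdf"]], fs[["dendf"]], lower.tail = FALSE)

print(c(F_from_summary = fs[["value"]], F_by_hand = f_manual))
print(p_value)
```

The two F values should agree with the 482 reported in the summary output, up to rounding.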

Looking at a t-test (review)

The t-test checks whether an individual predictor variable has a significant linear relationship with Petal Length after accounting for all the other predictors.

modiris1<-lm(Petal.Length~Sepal.Length+Sepal.Width+ Petal.Width+ Species)
summary(modiris1)
## 
## Call:
## lm(formula = Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + 
##     Species)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.78396 -0.15708  0.00193  0.14730  0.65418 
## 
## Coefficients:
##                   Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -1.11099    0.26987  -4.117 6.45e-05 ***
## Sepal.Length       0.60801    0.05024  12.101  < 2e-16 ***
## Sepal.Width       -0.18052    0.08036  -2.246   0.0262 *  
## Petal.Width        0.60222    0.12144   4.959 1.97e-06 ***
## Speciesversicolor  1.46337    0.17345   8.437 3.14e-14 ***
## Speciesvirginica   1.97422    0.24480   8.065 2.60e-13 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2627 on 144 degrees of freedom
## Multiple R-squared:  0.9786, Adjusted R-squared:  0.9778 
## F-statistic:  1317 on 5 and 144 DF,  p-value: < 2.2e-16

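All t-test p-values are less than .05, which means that, after accounting for all the other variables, each term still has a significant linear relationship with Petal Length. Note that the two Species rows compare versicolor and virginica with the baseline species, setosa.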
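To see how each row of the coefficient table is built, here is a small sketch that recomputes the t values and two-sided p-values from the estimates and standard errors. The names fit_full, coef_table, t_manual and p_manual are illustrative, and the model is refit with data = iris so the block is self-contained.

```r
# Refit the complete model with an explicit data argument
fit_full <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species,
               data = iris)

# Coefficient table: Estimate, Std. Error, t value, Pr(>|t|)
coef_table <- coef(summary(fit_full))

# Each t value is the estimate divided by its standard error
t_manual <- coef_table[, "Estimate"] / coef_table[, "Std. Error"]

# Two-sided p-value from the t distribution with the residual degrees of freedom
p_manual <- 2 * pt(abs(t_manual), df = df.residual(fit_full), lower.tail = FALSE)

print(cbind(t_by_hand = t_manual, p_by_hand = p_manual))
```

These columns should match the t value and Pr(>|t|) columns printed by summary(modiris1).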

Partial F-tests

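A partial F-test checks whether a subset of variables from the complete model is significant in our model.

Try excluding the sepal predictors and compare the two models with the anova command. The order of the models in the anova call does not change the F-statistic or p-value; it only flips the sign of Df and Sum of Sq. Here modiris1 is the complete model and modiris2 is the reduced model.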

modiris1<-lm(Petal.Length~Sepal.Length+Sepal.Width+ Petal.Width+ Species)
modiris2<-lm(Petal.Length~Petal.Width+Species)
anova(modiris1,modiris2)
## Analysis of Variance Table
## 
## Model 1: Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species
## Model 2: Petal.Length ~ Petal.Width + Species
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1    144  9.9397                                  
## 2    146 20.8334 -2   -10.894 78.911 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Because the p-value is less than .05, we reject the null hypothesis that the coefficients of both sepal predictors are zero. This means we should not drop all of the sepal predictors: at least one of them has a significant linear relationship with Petal Length, even after accounting for the petal and species variables.
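The partial F-statistic can also be worked out by hand from the residual sums of squares of the two models. The sketch below does that. The names full, reduced, rss_full, rss_reduced, f_partial and p_partial are illustrative, and both models are refit with data = iris so the block stands alone.

```r
# Complete and reduced models, fit with an explicit data argument
full    <- lm(Petal.Length ~ Sepal.Length + Sepal.Width + Petal.Width + Species,
              data = iris)
reduced <- lm(Petal.Length ~ Petal.Width + Species, data = iris)

# Residual sums of squares and residual degrees of freedom
rss_full    <- deviance(full)
rss_reduced <- deviance(reduced)
df_full     <- df.residual(full)
df_reduced  <- df.residual(reduced)

# F = ((RSS_reduced - RSS_full) / number of dropped terms) / (RSS_full / df_full)
f_partial <- ((rss_reduced - rss_full) / (df_reduced - df_full)) /
  (rss_full / df_full)
p_partial <- pf(f_partial, df_reduced - df_full, df_full, lower.tail = FALSE)

print(c(F = f_partial, p = p_partial))

# Swapping the model order in anova() gives the same F-statistic
print(anova(reduced, full)$F[2])
```

The hand-computed value should line up with the F of 78.911 in the anova table above, whichever order the models are passed in.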

Conclusion

These techniques can all be helpful in determining which predictor variables should be included in the linear models we create.