In class today we reviewed multiple linear regression and talked about F-tests and partial F-tests. To recap our review, we will look at the wblake data set.

library(alr3)
## Warning: package 'alr3' was built under R version 3.4.3
## Loading required package: car
## Warning: package 'car' was built under R version 3.4.3
data("wblake")
attach(wblake)

We use a t-test to decide whether we can drop a single predictor. If the p-value of the t-test is large, we fail to reject the null hypothesis that the predictor's coefficient is zero given the other predictors in the model, so we can drop it. We use an F-test to see whether any of the predictors have a linear relationship with our response. If the p-value of the F-test is large, we have no evidence that any of the predictors has a linear relationship with the response.

FishMod <- lm(Age ~ Length+Scale)
summary(FishMod)
## 
## Call:
## lm(formula = Age ~ Length + Scale)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.68036 -0.52766  0.03982  0.54636  2.81994 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.008884   0.139800  -7.217 2.38e-12 ***
## Length       0.027344   0.001773  15.427  < 2e-16 ***
## Scale       -0.011078   0.044012  -0.252    0.801    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8545 on 436 degrees of freedom
## Multiple R-squared:  0.8165, Adjusted R-squared:  0.8157 
## F-statistic:   970 on 2 and 436 DF,  p-value: < 2.2e-16

So we can use the Pr(>|t|) column to decide whether or not to drop a predictor. In this scenario, we can drop Scale (scale radius): its p-value is 0.801, so it doesn't give us new information once Length is in the model. We can use the very bottom row (F-statistic) to decide whether there is any linear relationship. Our p-value is very small (< 2.2e-16), and therefore there is a linear relationship between Age and at least one of Length and Scale.
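As a check on the bottom row, the overall F-statistic can be recomputed from the R-squared and the degrees of freedom printed in the summary. This is only a sketch using the rounded values from the output above, so the result agrees with R's value up to rounding; the variable names are made up for illustration.

```r
# Values copied (rounded) from the summary(FishMod) output
r2   <- 0.8165   # Multiple R-squared
p    <- 2        # number of predictors (Length, Scale)
df_e <- 436      # residual degrees of freedom

# Overall F-statistic: explained variation per predictor divided by
# unexplained variation per residual degree of freedom
f_stat <- (r2 / p) / ((1 - r2) / df_e)
f_stat   # close to the 970 reported by summary()

# Upper-tail p-value from the F distribution with (p, df_e) degrees of freedom
p_val <- pf(f_stat, p, df_e, lower.tail = FALSE)
p_val
```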

Next, we can do a partial F-test to decide whether or not to drop a set of predictors. We fit the full model and then a reduced model that uses only a subset of the predictors.

data(water)
attach(water)
watermod<-lm(BSAAM~APMAM+APSAB+APSLAKE+OPBPC+OPRC+OPSLAKE)
watermodreduced<-lm(BSAAM~APMAM+APSAB+APSLAKE)

anova(watermod,watermodreduced)
## Analysis of Variance Table
## 
## Model 1: BSAAM ~ APMAM + APSAB + APSLAKE + OPBPC + OPRC + OPSLAKE
## Model 2: BSAAM ~ APMAM + APSAB + APSLAKE
##   Res.Df        RSS Df  Sum of Sq     F    Pr(>F)    
## 1     36 2.0558e+09                                  
## 2     39 2.5116e+10 -3 -2.306e+10 134.6 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we are using the water data set, which measures snowfall and runoff. Our full model uses all of the weather stations as predictors of runoff (BSAAM). Our reduced model got rid of the weather stations that start with O.

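Looking in the Pr(>F) column, you can see that our p-value is extremely small. This means that we reject the null hypothesis that the coefficients of the three O stations are all zero, so we cannot get rid of the O weather stations. In other words, the O stations give us significant information about runoff beyond what the A stations provide.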
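To see where the F value in the table comes from, the partial F-statistic can be rebuilt by hand from the two residual sums of squares and their degrees of freedom. This is a sketch using the rounded numbers printed in the anova table, with illustrative variable names, so it matches R's value only up to rounding.

```r
# Values copied (rounded) from the anova() table
rss_full <- 2.0558e9    # RSS of the full model (all six stations)
df_full  <- 36          # residual df of the full model
rss_red  <- 2.5116e10   # RSS of the reduced model (A stations only)
df_red   <- 39          # residual df of the reduced model

# Partial F: increase in RSS per dropped predictor, divided by
# the full model's residual mean square
f_partial <- ((rss_red - rss_full) / (df_red - df_full)) / (rss_full / df_full)
f_partial   # close to the 134.6 in the table

# Upper-tail p-value with (3, 36) degrees of freedom
p_partial <- pf(f_partial, df_red - df_full, df_full, lower.tail = FALSE)
p_partial
```

The negative Df and Sum of Sq in the table appear only because the full model was listed first in anova(); the F value and p-value are the same in either order.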