Exercise 3.4

Using the sat data:

  1. Fit a model with total SAT score as the response and expend, ratio, and salary as predictors. Test the hypothesis that β_salary = 0. Test the hypothesis that β_salary = β_ratio = β_expend = 0. Do any of these predictors have an effect on the response?

We obtain the test of all predictors by fitting the null (intercept-only) model and comparing it to the full model with the anova function. This tests whether any of the predictors are significant in the model, that is, H0: β_salary = β_ratio = β_expend = 0. Since the p-value of 0.01209 is small, the null hypothesis is rejected: at least one of salary, ratio, or expend has an effect on total SAT score.

Testing one predictor

To test whether β_salary = 0, we fit a reduced model with salary dropped and compare it to the full model with an F-test. The p-value of 0.06667 in the ANOVA table below is too large to reject the null hypothesis at the 5% level, i.e. we cannot conclude that salary is significant in explaining total SAT score once expend and ratio are in the model. This is the same p-value as the t-test for salary in the summary of the full model, an equivalence checked in the sketch after the ANOVA output.

require(faraway)
## Loading required package: faraway
lmod <- lm(total ~ expend + ratio + salary, sat)
summary(lmod)
## 
## Call:
## lm(formula = total ~ expend + ratio + salary, data = sat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -140.911  -46.740   -7.535   47.966  123.329 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1069.234    110.925   9.639 1.29e-12 ***
## expend        16.469     22.050   0.747   0.4589    
## ratio          6.330      6.542   0.968   0.3383    
## salary        -8.823      4.697  -1.878   0.0667 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 68.65 on 46 degrees of freedom
## Multiple R-squared:  0.2096, Adjusted R-squared:  0.1581 
## F-statistic: 4.066 on 3 and 46 DF,  p-value: 0.01209
nullmod <- lm(total ~ 1, sat)
anova(nullmod, lmod)
## Analysis of Variance Table
## 
## Model 1: total ~ 1
## Model 2: total ~ expend + ratio + salary
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     49 274308                              
## 2     46 216812  3     57496 4.0662 0.01209 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Using the F-test formula directly, we arrive at the same F-statistic and p-value.

(rss0 <- deviance(nullmod))
## [1] 274307.7
(rss <- deviance(lmod))
## [1] 216811.9
(df0 <- df.residual(nullmod))
## [1] 49
(df <- df.residual(lmod))
## [1] 46
(fstat <- ((rss0 - rss)/ (df0 - df)) / (rss/df))
## [1] 4.066203
1 - pf(fstat, df0 - df, df)
## [1] 0.01208607
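The same overall F-statistic and p-value can also be read from the fitted model's summary object; a minimal check, reusing the lmod fitted above:

fs <- summary(lmod)$fstatistic                # F value plus numerator and denominator df
fs                                            # value ~ 4.066, numdf = 3, dendf = 46
pf(fs[1], fs[2], fs[3], lower.tail = FALSE)   # ~ 0.01209, matching the ANOVA p-value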
#Testing whether salary is significant

lmods <- lm(total ~ expend + ratio, sat)
summary(lmods)
## 
## Call:
## lm(formula = total ~ expend + ratio, data = sat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -147.694  -51.816    6.258   37.756  127.742 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1136.336    107.803  10.541 5.69e-14 ***
## expend       -22.308      7.956  -2.804  0.00731 ** 
## ratio         -2.295      4.784  -0.480  0.63370    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 70.48 on 47 degrees of freedom
## Multiple R-squared:  0.149,  Adjusted R-squared:  0.1128 
## F-statistic: 4.114 on 2 and 47 DF,  p-value: 0.02258
anova(lmods, lmod)
## Analysis of Variance Table
## 
## Model 1: total ~ expend + ratio
## Model 2: total ~ expend + ratio + salary
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     47 233443                              
## 2     46 216812  1     16631 3.5285 0.06667 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
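As a side check, the F-test for dropping a single predictor is equivalent to the t-test for that coefficient in the full model: the square of the t-value for salary reproduces the F-statistic above. A minimal sketch, reusing lmod and lmods fitted above:

tval <- coef(summary(lmod))["salary", "t value"]
tval^2                                       # ~ 3.53, the F-statistic in the ANOVA table above
anova(lmods, lmod)$F[2]                      # same value as computed by anova()
coef(summary(lmod))["salary", "Pr(>|t|)"]    # ~ 0.0667, the same p-value as the F-test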
  2. Now add takers to the model. Test the hypothesis that β_takers = 0. Compare this model to the previous one using an F-test. Demonstrate that the F-test and t-test here are equivalent.

We fit another model, lmod1, adding takers as a predictor. Comparing it to the previous model with an F-test, the p-value is 2.607e-16, so we reject the null hypothesis that β_takers = 0: takers is significant after adjusting for expend, ratio, and salary. (Separately, the overall F-statistic of lmod1 has p < 2.2e-16, so we can also reject the hypothesis that β_expend = β_ratio = β_salary = β_takers = 0.)

The t-statistic for testing this hypothesis is t_takers = β̂_takers / se(β̂_takers) = -2.9045/0.2313 ≈ -12.56, and t² ≈ 157.7, which equals (up to rounding) the F-statistic of 157.74 in the ANOVA table below; the two tests also give the same p-value (2.61e-16). This equivalence is verified numerically in the sketch after the ANOVA output.

lmod1 <- lm(total ~ expend + ratio + salary + takers, sat)
summary(lmod1)
## 
## Call:
## lm(formula = total ~ expend + ratio + salary + takers, data = sat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -90.531 -20.855  -1.746  15.979  66.571 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1045.9715    52.8698  19.784  < 2e-16 ***
## expend         4.4626    10.5465   0.423    0.674    
## ratio         -3.6242     3.2154  -1.127    0.266    
## salary         1.6379     2.3872   0.686    0.496    
## takers        -2.9045     0.2313 -12.559 2.61e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.7 on 45 degrees of freedom
## Multiple R-squared:  0.8246, Adjusted R-squared:  0.809 
## F-statistic: 52.88 on 4 and 45 DF,  p-value: < 2.2e-16
anova(lmod1, lmod)
## Analysis of Variance Table
## 
## Model 1: total ~ expend + ratio + salary + takers
## Model 2: total ~ expend + ratio + salary
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     45  48124                                  
## 2     46 216812 -1   -168688 157.74 2.607e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
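A quick numerical check of this equivalence, reusing lmod and lmod1 from above (listing the smaller model first in anova() gives the same F with positive degrees of freedom):

tval <- coef(summary(lmod1))["takers", "t value"]
c(t = tval, t_squared = tval^2)               # t ~ -12.56, t^2 ~ 157.7
anova(lmod, lmod1)$F[2]                       # F ~ 157.74
coef(summary(lmod1))["takers", "Pr(>|t|)"]    # 2.61e-16, identical to Pr(>F)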