Exercise 3.4

Using the sat data:

  1. Fit a model with total SAT score as the response and expend, ratio, and salary as predictors. Test the hypothesis that β_salary = 0. Test the hypothesis that β_salary = β_ratio = β_expend = 0. Do any of these predictors have an effect on the response?

We obtain the test of all predictors by fitting the null (intercept-only) model and comparing it to the full model with the anova function. This tests whether any of the predictors are significant in the model, that is, H0: β_salary = β_ratio = β_expend = 0. Since the p-value of 0.01209 is small, the null hypothesis is rejected: at least one of salary, ratio, or expend has an effect on total SAT score.

Testing one predictor

To test whether β_salary = 0, we fit a reduced model with salary dropped and compare it to the full model with an F-test. The p-value of 0.06667 in the ANOVA table below is too large to reject the null hypothesis at the 5% level, i.e. we cannot conclude that salary is significant in explaining total SAT score once expend and ratio are in the model. This is the same p-value as the t-test for salary in the summary of the full model, an equivalence checked in the sketch after the ANOVA output.

require(faraway)
## Loading required package: faraway
lmod <- lm(total ~ expend + ratio + salary, sat)
summary(lmod)
## 
## Call:
## lm(formula = total ~ expend + ratio + salary, data = sat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -140.911  -46.740   -7.535   47.966  123.329 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1069.234    110.925   9.639 1.29e-12 ***
## expend        16.469     22.050   0.747   0.4589    
## ratio          6.330      6.542   0.968   0.3383    
## salary        -8.823      4.697  -1.878   0.0667 .  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 68.65 on 46 degrees of freedom
## Multiple R-squared:  0.2096, Adjusted R-squared:  0.1581 
## F-statistic: 4.066 on 3 and 46 DF,  p-value: 0.01209
nullmod <- lm(total ~ 1, sat)
anova(nullmod, lmod)
## Analysis of Variance Table
## 
## Model 1: total ~ 1
## Model 2: total ~ expend + ratio + salary
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     49 274308                              
## 2     46 216812  3     57496 4.0662 0.01209 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# Using the F-test formula directly, we arrive at the same F-statistic and p-value.

(rss0 <- deviance(nullmod))
## [1] 274307.7
(rss <- deviance(lmod))
## [1] 216811.9
(df0 <- df.residual(nullmod))
## [1] 49
(df <- df.residual(lmod))
## [1] 46
(fstat <- ((rss0 - rss)/ (df0 - df)) / (rss/df))
## [1] 4.066203
1 - pf(fstat, df0 - df, df)
## [1] 0.01208607
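The same overall F-statistic and p-value can also be read from the fitted model's summary object; a minimal check, reusing the lmod fitted above:

fs <- summary(lmod)$fstatistic                # F value plus numerator and denominator df
fs                                            # value ~ 4.066, numdf = 3, dendf = 46
pf(fs[1], fs[2], fs[3], lower.tail = FALSE)   # ~ 0.01209, matching the ANOVA p-value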
#Testing whether salary is significant

lmods <- lm(total ~ expend + ratio, sat)
summary(lmods)
## 
## Call:
## lm(formula = total ~ expend + ratio, data = sat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -147.694  -51.816    6.258   37.756  127.742 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1136.336    107.803  10.541 5.69e-14 ***
## expend       -22.308      7.956  -2.804  0.00731 ** 
## ratio         -2.295      4.784  -0.480  0.63370    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 70.48 on 47 degrees of freedom
## Multiple R-squared:  0.149,  Adjusted R-squared:  0.1128 
## F-statistic: 4.114 on 2 and 47 DF,  p-value: 0.02258
anova(lmods, lmod)
## Analysis of Variance Table
## 
## Model 1: total ~ expend + ratio
## Model 2: total ~ expend + ratio + salary
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1     47 233443                              
## 2     46 216812  1     16631 3.5285 0.06667 .
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
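As a side check, the F-test for dropping a single predictor is equivalent to the t-test for that coefficient in the full model: the square of the t-value for salary reproduces the F-statistic above. A minimal sketch, reusing lmod and lmods fitted above:

tval <- coef(summary(lmod))["salary", "t value"]
tval^2                                       # ~ 3.53, the F-statistic in the ANOVA table above
anova(lmods, lmod)$F[2]                      # same value as computed by anova()
coef(summary(lmod))["salary", "Pr(>|t|)"]    # ~ 0.0667, the same p-value as the F-test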
  2. Now add takers to the model. Test the hypothesis that β_takers = 0. Compare this model to the previous one using an F-test. Demonstrate that the F-test and t-test here are equivalent.

We fit another model, lmod1, adding takers as a predictor. Comparing it to the previous model with an F-test, the p-value is 2.607e-16, so we reject the null hypothesis that β_takers = 0: takers is significant after adjusting for expend, ratio, and salary. (Separately, the overall F-statistic of lmod1 has p < 2.2e-16, so we can also reject the hypothesis that β_expend = β_ratio = β_salary = β_takers = 0.)

The t-statistic for testing this hypothesis is t_takers = β̂_takers / se(β̂_takers) = -2.9045/0.2313 ≈ -12.56, and t² ≈ 157.7, which equals (up to rounding) the F-statistic of 157.74 in the ANOVA table below; the two tests also give the same p-value (2.61e-16). This equivalence is verified numerically in the sketch after the ANOVA output.

lmod1 <- lm(total ~ expend + ratio + salary + takers, sat)
summary(lmod1)
## 
## Call:
## lm(formula = total ~ expend + ratio + salary + takers, data = sat)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -90.531 -20.855  -1.746  15.979  66.571 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 1045.9715    52.8698  19.784  < 2e-16 ***
## expend         4.4626    10.5465   0.423    0.674    
## ratio         -3.6242     3.2154  -1.127    0.266    
## salary         1.6379     2.3872   0.686    0.496    
## takers        -2.9045     0.2313 -12.559 2.61e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 32.7 on 45 degrees of freedom
## Multiple R-squared:  0.8246, Adjusted R-squared:  0.809 
## F-statistic: 52.88 on 4 and 45 DF,  p-value: < 2.2e-16
anova(lmod1, lmod)
## Analysis of Variance Table
## 
## Model 1: total ~ expend + ratio + salary + takers
## Model 2: total ~ expend + ratio + salary
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1     45  48124                                  
## 2     46 216812 -1   -168688 157.74 2.607e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
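A quick numerical check of this equivalence, reusing lmod and lmod1 from above (listing the smaller model first in anova() gives the same F with positive degrees of freedom):

tval <- coef(summary(lmod1))["takers", "t value"]
c(t = tval, t_squared = tval^2)               # t ~ -12.56, t^2 ~ 157.7
anova(lmod, lmod1)$F[2]                       # F ~ 157.74
coef(summary(lmod1))["takers", "Pr(>|t|)"]    # 2.61e-16, identical to Pr(>F)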