Variability in Linear Regression Models
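For the null model, we have \(\hat{\mu}_i = \bar{y}\) for every observation, so the residual sum of squares is \(\sum_i (y_i - \bar{y})^2\) and the residual mean square is the sample variance of \(y\).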
We will later see that under normal linear models, the residual sum of squares divided by the error variance \(\sigma^2\) has a \(\chi^2\) distribution, with degrees of freedom equal to the residual degrees of freedom.
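As a minimal sketch of the residual mean square, assuming simulated data in place of a real data file (the variables `x` and `y` and all numeric settings are illustrative, not from the text), the null model can be compared with a model that uses a predictor:

```r
# Simulated stand-in data (an assumption, not one of the book's data files)
set.seed(1)
n <- 50
x <- runif(n, 0, 10)
y <- 2 + 0.5 * x + rnorm(n, sd = 1.5)

fit0 <- lm(y ~ 1)   # null model: intercept only, fitted values all equal mean(y)
fit1 <- lm(y ~ x)   # model with one explanatory variable

# Residual mean square = residual sum of squares / residual degrees of freedom
rms0 <- sum(residuals(fit0)^2) / df.residual(fit0)
rms1 <- sum(residuals(fit1)^2) / df.residual(fit1)

all.equal(rms0, var(y))          # null model: residual mean square is the sample variance
all.equal(rms1, sigma(fit1)^2)   # matches the squared "Residual standard error"
```

Both comparisons return TRUE: for the null model the residual degrees of freedom are n - 1, which is the divisor used by `var()`.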
Decomposing Variability
R-Squared and Multiple Correlation
Races <- read.table("http://stat4ds.rwth-aachen.de/data/ScotsRaces.dat", header = TRUE)
head(Races, 3)  # timeM for men, timeW for women
              race distance climb  timeM  timeW
1       AnTeallach     10.6 1.062  74.68  89.72
2     ArrocharAlps     25.0 2.400 187.32 222.03
3 BaddinsgillRound     16.4 0.650  87.18 102.48
fit.dc2 <- lm(timeW ~ distance + climb, data = Races[-41, ])
summary(fit.dc2)  # for Races data file without Highland Fling
Call:
lm(formula = timeW ~ distance + climb, data = Races[-41, ])
Residuals:
     Min       1Q   Median       3Q      Max
-28.4353  -6.1442  -0.1459   7.0130  29.8179
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept)   -8.931      3.281  -2.723  0.00834 **
distance       4.172      0.240  17.383  < 2e-16 ***
climb         43.852      3.715  11.806  < 2e-16 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 12.23 on 64 degrees of freedom
Multiple R-squared:  0.952,  Adjusted R-squared:  0.9505
F-statistic: 634.3 on 2 and 64 DF,  p-value: < 2.2e-16
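cor(Races[-41, ]$timeW, fitted(fit.dc2))  # multiple correlation; its square is the Multiple R-squared
(sd(Races[-41, ]$timeW))^2  # marginal variance of timeW, to compare with the residual variance 12.23^2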
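The decomposition of variability and its link to R-squared and the multiple correlation can be checked numerically. The following is a sketch on simulated data (an assumption; the variable names and coefficients are illustrative and only loosely mimic the races example), not a reproduction of the analysis above:

```r
# Simulated stand-in data (an assumption, not the ScotsRaces file)
set.seed(42)
n  <- 60
x1 <- runif(n, 5, 30)
x2 <- runif(n, 0, 2.5)
y  <- -9 + 4 * x1 + 44 * x2 + rnorm(n, sd = 12)
fit_sim <- lm(y ~ x1 + x2)

tss <- sum((y - mean(y))^2)                  # total sum of squares
sse <- sum(residuals(fit_sim)^2)             # residual (error) sum of squares
ssr <- sum((fitted(fit_sim) - mean(y))^2)    # regression sum of squares

p      <- length(coef(fit_sim)) - 1          # number of explanatory variables
r2     <- 1 - sse / tss                      # proportion of variability explained
adj_r2 <- 1 - (sse / (n - p - 1)) / (tss / (n - 1))

all.equal(tss, ssr + sse)                          # TSS = SSR + SSE
all.equal(r2, cor(y, fitted(fit_sim))^2)           # R-squared = squared multiple correlation
all.equal(r2, summary(fit_sim)$r.squared)
all.equal(adj_r2, summary(fit_sim)$adj.r.squared)  # ratio of residual to marginal variance
```

The adjusted version replaces the two sums of squares by the corresponding variance estimates, which is why the printed output compares naturally with the marginal variance of the response.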
Statistical Inference for Normal Linear Models
Example: Normal Linear Model for Mental Impairment
Mental <- read.table("http://stat4ds.rwth-aachen.de/data/Mental.dat", header = TRUE)
head(Mental)
  impair life ses
1     17   46  84
2     19   39  97
3     20   27  24
4     20    3  85
5     20   10  15
6     21   44  55
attach(Mental)
pairs(~ impair + life + ses)  # scatter plot matrix for variable pairs (not shown)
cor(cbind(impair, life, ses))  # correlation matrix
           impair      life        ses
impair  1.0000000 0.3722206 -0.3985676
life    0.3722206 1.0000000  0.1233370
ses    -0.3985676 0.1233370  1.0000000
summary(lm(impair ~ life + ses, data = Mental))
Call:
lm(formula = impair ~ life + ses, data = Mental)
Residuals:
   Min     1Q Median     3Q    Max
-8.678 -2.494 -0.336  2.886 10.891
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 28.22981    2.17422  12.984 2.38e-15 ***
life         0.10326    0.03250   3.177  0.00300 **
ses         -0.09748    0.02908  -3.351  0.00186 **
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 4.556 on 37 degrees of freedom
Multiple R-squared:  0.3392,  Adjusted R-squared:  0.3034
F-statistic: 9.495 on 2 and 37 DF,  p-value: 0.0004697
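The global F statistic in the last line of the output can be recovered from R-squared and the degrees of freedom. This sketch only re-uses the rounded values printed above, so it agrees with the output up to rounding; the variable names are illustrative:

```r
# Values copied from the printed summary (rounded, so results are approximate)
r2       <- 0.3392   # Multiple R-squared
p        <- 2        # number of explanatory variables (life, ses)
df_resid <- 37       # residual degrees of freedom

# F = (R^2 / p) / ((1 - R^2) / residual df)
f_stat  <- (r2 / p) / ((1 - r2) / df_resid)
p_value <- pf(f_stat, p, df_resid, lower.tail = FALSE)

round(f_stat, 2)
signif(p_value, 3)
```

The test compares the fitted model with the null model, so a small P-value indicates that at least one of the two explanatory variables has an effect.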
fit <- lm(impair ~ life + ses, data = Mental)
layout(matrix(1:4, 2, 2))  # multiple plots in 2x2 matrix layout
plot(fit, which = c(1, 2, 4, 5))  # diagnostic plots shown in Figure A6.1
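In `plot()` for an `lm` object, `which = c(1, 2, 4, 5)` selects the residuals-versus-fitted plot, the normal Q-Q plot, the Cook's distance plot, and the residuals-versus-leverage plot. The quantities behind those panels can also be extracted directly. The sketch below uses simulated data (an assumption, with illustrative variable names), so it runs without downloading the Mental file:

```r
# Simulated stand-in data (an assumption, not the Mental data file)
set.seed(7)
n  <- 40
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 - 0.5 * x2 + rnorm(n)
fit_sim <- lm(y ~ x1 + x2)

h <- hatvalues(fit_sim)        # leverages (diagonal of the hat matrix)
r <- rstandard(fit_sim)        # standardized residuals
d <- cooks.distance(fit_sim)   # Cook's distances
k <- length(coef(fit_sim))     # number of model parameters, including the intercept

# Cook's distance combines residual size and leverage
all.equal(unname(d), unname(r^2 * h / (k * (1 - h))))
# Leverages sum to the number of model parameters
all.equal(sum(h), as.numeric(k))

which.max(d)   # index of the most influential observation
```

An observation is influential only when it has both a sizeable residual and high leverage, which is what the last two diagnostic panels display.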
t Tests and Confidence Intervals for Individual Effects
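As an illustration, the t statistic, two-sided P-value, and a 95% confidence interval for the effect of `life` can be rebuilt from the estimate and standard error printed in the mental impairment output. This is a sketch using those rounded values (variable names are illustrative), so results match the output only up to rounding:

```r
# Values copied from the printed coefficient table for `life`
est      <- 0.10326   # estimated effect
se       <- 0.03250   # standard error
df_resid <- 37        # residual degrees of freedom

t_stat  <- est / se                          # test statistic for H0: effect = 0
p_value <- 2 * pt(-abs(t_stat), df_resid)    # two-sided P-value
ci      <- est + c(-1, 1) * qt(0.975, df_resid) * se   # 95% confidence interval

round(t_stat, 3)
round(p_value, 4)
round(ci, 4)
```

With a fitted model object available, `confint(fit)` gives the same kind of interval for every coefficient at once.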
Multicollinearity: Nearly Redundant Explanatory Variables
summary(lm(climb ~ timeW, data = Races))
Call:
lm(formula = climb ~ timeW, data = Races)
Residuals:
     Min       1Q   Median       3Q      Max
-1.66351 -0.21704 -0.04151  0.23325  0.89699
Coefficients:
            Estimate Std. Error t value Pr(>|t|)
(Intercept) 0.375956   0.082231   4.572 2.18e-05 ***
timeW       0.005076   0.000664   7.645 1.15e-10 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3945 on 66 degrees of freedom
Multiple R-squared:  0.4696,  Adjusted R-squared:  0.4616
F-statistic: 58.44 on 1 and 66 DF,  p-value: 1.145e-10
summary(lm(climb ~ timeW + timeM, data = Races))
Call:
lm(formula = climb ~ timeW + timeM, data = Races)
Residuals:
     Min       1Q   Median       3Q      Max
-1.50002 -0.24360 -0.05403  0.23647  0.86249
Coefficients:
             Estimate Std. Error t value Pr(>|t|)
(Intercept)  0.345481   0.085152   4.057 0.000136 ***
timeW        0.014440   0.007280   1.984 0.051528 .
timeM       -0.010752   0.008324  -1.292 0.201060
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Residual standard error: 0.3925 on 65 degrees of freedom
Multiple R-squared:  0.4829,  Adjusted R-squared:  0.467
F-statistic: 30.35 on 2 and 65 DF,  p-value: 4.912e-10
cor(Races$timeM, Races$timeW)  # correlation between the men's and women's times
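In the two fits above, the standard error of the `timeW` coefficient grows from 0.000664 to 0.007280 once `timeM` enters the model. One common way to quantify this kind of inflation is the variance inflation factor, 1 / (1 - R_j^2), where R_j^2 comes from regressing one explanatory variable on the others. The sketch below uses simulated, nearly redundant predictors (an assumption; names and settings are illustrative and not taken from the races data):

```r
# Simulated stand-in data with two nearly redundant predictors (an assumption)
set.seed(3)
n  <- 68
x1 <- rnorm(n, mean = 100, sd = 40)
x2 <- 0.85 * x1 + rnorm(n, sd = 3)       # x2 is almost a linear function of x1
y  <- 0.4 + 0.005 * x1 + rnorm(n, sd = 0.4)

# Standard error of the x1 effect, without and with the redundant predictor
se_alone <- coef(summary(lm(y ~ x1)))["x1", "Std. Error"]
se_both  <- coef(summary(lm(y ~ x1 + x2)))["x1", "Std. Error"]

# Variance inflation factor for x1: regress x1 on the other predictor
r2_x1  <- summary(lm(x1 ~ x2))$r.squared
vif_x1 <- 1 / (1 - r2_x1)

c(se_alone = se_alone, se_both = se_both, vif = vif_x1)
```

With only two predictors, R_j^2 is the squared correlation between them, so a correlation near 1 translates directly into a large inflation factor.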
Confidence Interval for \(E(Y)\) and Prediction Interval for \(Y\)
newdata <- data.frame(life = 44.42, ses = 56.60)  # mean values of life events and SES
fit1 <- lm(impair ~ life, data = Mental)
predict(fit1, newdata, interval = "confidence")
       fit      lwr      upr
1 27.29955 25.65644 28.94266
predict(fit1, newdata, interval = "prediction")
       fit      lwr      upr
1 27.29955 16.77851 37.82059
fit2 <- lm(impair ~ life + ses, data = Mental)
predict(fit2, newdata, interval = "confidence")
       fit      lwr      upr
1 27.29948 25.83974 28.75923
predict(fit2, newdata, interval = "prediction")
       fit      lwr      upr
1 27.29948 17.95257 36.64639
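The prediction interval is wider than the confidence interval because it adds the variability of a single observation to the uncertainty in the estimated mean. As a sketch, the prediction limits for the two-predictor model can be approximately rebuilt from the printed confidence interval and the residual standard error 4.556 on 37 degrees of freedom reported for that model. The variable names are illustrative, and the inputs are rounded printed values:

```r
# Values copied from the printed output for the model impair ~ life + ses
fit_hat  <- 27.29948   # estimated mean at the chosen predictor values
ci_upper <- 28.75923   # upper 95% confidence limit for E(Y)
s        <- 4.556      # residual standard error
df_resid <- 37         # residual degrees of freedom

t_crit  <- qt(0.975, df_resid)
se_fit  <- (ci_upper - fit_hat) / t_crit   # standard error of the estimated mean
se_pred <- sqrt(se_fit^2 + s^2)            # standard error for predicting a new Y

pi_limits <- fit_hat + c(-1, 1) * t_crit * se_pred
round(pi_limits, 2)   # close to the printed prediction limits, up to rounding
```

Since `s` does not shrink as the sample grows, the prediction interval stays wide even when the confidence interval for the mean becomes narrow.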
Exercises