4.7 Practice

Women’s Data

data("women")
attach(women)
mymod1<-lm(weight~height)
summary(mymod1)

## 
## Call:
## lm(formula = weight ~ height)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

plot(mymod1$residuals~mymod1$fitted.values)
abline(0,0)

Now we look at the cubic polynomial model, which adds height^2 and height^3 to the linear fit

mymod2<-lm(weight~height+I(height^2)+I(height^3))
summary(mymod2)

## 
## Call:
## lm(formula = weight ~ height + I(height^2) + I(height^3))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40677 -0.17391  0.03091  0.12051  0.42191 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -8.967e+02  2.946e+02  -3.044  0.01116 * 
## height       4.641e+01  1.366e+01   3.399  0.00594 **
## I(height^2) -7.462e-01  2.105e-01  -3.544  0.00460 **
## I(height^3)  4.253e-03  1.079e-03   3.940  0.00231 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2583 on 11 degrees of freedom
## Multiple R-squared:  0.9998, Adjusted R-squared:  0.9997 
## F-statistic: 1.679e+04 on 3 and 11 DF,  p-value: < 2.2e-16

plot(mymod2$residuals~mymod2$fitted.values)
abline(0,0)
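
The linear and cubic fits above are nested, so one way to compare them directly is a partial F-test, which `anova()` runs when given two nested `lm` fits. The sketch below is only illustrative: the names `fit_linear`, `fit_cubic`, `cmp`, and `adj_r2` are not from the original, and the models are refit with `data = women` instead of `attach()` so the block runs on its own.

```r
# Sketch: partial F-test for the nested linear and cubic fits.
# fit_linear, fit_cubic, cmp and adj_r2 are illustrative names.
data("women")

fit_linear <- lm(weight ~ height, data = women)
fit_cubic  <- lm(weight ~ height + I(height^2) + I(height^3), data = women)

# anova() on two nested lm fits tests the null hypothesis that the
# coefficients of the added terms (height^2 and height^3) are both zero.
cmp <- anova(fit_linear, fit_cubic)
print(cmp)

# Adjusted R-squared of each fit, side by side.
adj_r2 <- c(linear = summary(fit_linear)$adj.r.squared,
            cubic  = summary(fit_cubic)$adj.r.squared)
print(round(adj_r2, 4))
```

A small p-value in the second row of the `anova()` table is evidence that the higher-order terms explain variation that the straight line leaves behind.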

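Comparing the overall F-tests of the two models, we see that the p-value of the cubic model (< 2.2e-16) is smaller than that of the linear model (1.091e-14). More directly, the squared and cubed terms are each significant (p = 0.0046 and p = 0.0023), the residual standard error falls from 1.525 to 0.2583, and adjusted R-squared rises from 0.9903 to 0.9997. Therefore we can conclude that it is a good idea to keep the higher-order terms in the model.

Fish Data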

library(alr3)

## Warning: package 'alr3' was built under R version 3.4.3

## Loading required package: car

## Warning: package 'car' was built under R version 3.4.3

data(wblake)
names(wblake)

## [1] "Age"    "Length" "Scale"

attach(wblake)
mymod3<-lm(Age~Scale)
summary(mymod3)

## 
## Call:
## lm(formula = Age ~ Scale)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.2579 -0.5909 -0.0696  0.5927  3.1222 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  0.53054    0.12159   4.363  1.6e-05 ***
## Scale        0.62622    0.01885  33.221  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.061 on 437 degrees of freedom
## Multiple R-squared:  0.7163, Adjusted R-squared:  0.7157 
## F-statistic:  1104 on 1 and 437 DF,  p-value: < 2.2e-16

plot(mymod3$residuals~mymod3$fitted.values)
abline(0,0)

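From the scatter plot of the residuals, the points are not scattered randomly about zero: at lower fitted values the model overestimates the response, and at higher fitted values there again appears to be consistent overestimation. A systematic pattern like this suggests that a straight line does not fully capture the relationship.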
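
A smoothed trend line can make a pattern in a residual plot easier to judge than the raw points alone. The sketch below overlays a `lowess()` curve on the residuals of the linear fit. It assumes the `alr3` package is installed, as in this section; `fit_linear` and `smooth_fit` are illustrative names that do not appear in the original.

```r
# Sketch: residual plot for Age ~ Scale with a lowess smoother added.
# fit_linear and smooth_fit are illustrative names.
library(alr3)   # provides the wblake data used in this section
data(wblake)

fit_linear <- lm(Age ~ Scale, data = wblake)

# Residual = observed Age minus fitted Age.
plot(fitted(fit_linear), resid(fit_linear),
     xlab = "Fitted Age", ylab = "Residual")
abline(h = 0, lty = 2)

# lowess() returns a list with components x and y, which lines() can draw.
smooth_fit <- lowess(fitted(fit_linear), resid(fit_linear))
lines(smooth_fit, col = "red")
```

Where the red curve sits below zero, the line is overestimating Age on average; where it sits above zero, the line is underestimating.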

Looking at Quartic

mymod4<-lm(Age~Scale+I(Scale^2)+I(Scale^3)+I(Scale^4))
summary(mymod4)

## 
## Call:
## lm(formula = Age ~ Scale + I(Scale^2) + I(Scale^3) + I(Scale^4))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -3.09859 -0.64794  0.00865  0.60453  2.68573 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  2.0883874  0.6436203   3.245  0.00127 ** 
## Scale       -1.4225077  0.4665388  -3.049  0.00244 ** 
## I(Scale^2)   0.6557084  0.1140038   5.752 1.67e-08 ***
## I(Scale^3)  -0.0709122  0.0112536  -6.301 7.26e-10 ***
## I(Scale^4)   0.0023578  0.0003823   6.168 1.58e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.8965 on 434 degrees of freedom
## Multiple R-squared:  0.7989, Adjusted R-squared:  0.7971 
## F-statistic: 431.1 on 4 and 434 DF,  p-value: < 2.2e-16

plot(mymod4$residuals~mymod4$fitted.values)
abline(0,0)

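Looking at the p-values of the overall F-tests of both models, we see that they are reported identically, as < 2.2e-16, which is the reporting floor of R's summary output. These p-values therefore cannot separate the two models. The overall F-test only compares each model with an intercept-only model; it does not test whether the added polynomial terms help. The output above does speak to that question: each higher-order term is significant (p < 0.001), the residual standard error falls from 1.061 to 0.8965, and adjusted R-squared rises from 0.7157 to 0.7971. This suggests that the polynomial terms do improve the fit, even though the overall p-value cannot show it.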
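
Because the linear model is nested inside the polynomial model, the added terms can be tested directly with a partial F-test instead of comparing overall p-values. The sketch below assumes the `alr3` package is installed, as in this section; `fit_linear`, `fit_quartic`, and `cmp` are illustrative names that do not appear in the original.

```r
# Sketch: partial F-test and AIC for the nested fish-age models.
# fit_linear, fit_quartic and cmp are illustrative names.
library(alr3)   # provides the wblake data used in this section
data(wblake)

fit_linear  <- lm(Age ~ Scale, data = wblake)
fit_quartic <- lm(Age ~ Scale + I(Scale^2) + I(Scale^3) + I(Scale^4),
                  data = wblake)

# Tests the null hypothesis that the Scale^2, Scale^3 and Scale^4
# coefficients are all zero.
cmp <- anova(fit_linear, fit_quartic)
print(cmp)

# AIC gives a second view of the same comparison; lower is better.
print(AIC(fit_linear, fit_quartic))
```

The second row of the `anova()` table reports the F statistic and p-value for the three added terms taken together.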

plot(wblake)

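Calling plot() on the whole data frame draws a scatterplot matrix with one panel for each pair of variables (Age, Length, and Scale). These plots are a good visual summary of the relationships between the variables in the data set.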

Jeff Courneya

March 8, 2018
