Polynomial Regression

First, we viewed our residual plots just as we did in previous chapters, in an ideal situation there would be no trend in this scatter plot. If there is a trend, then we are systematically over/underpredicting results. We don’t want to systematically over/under predict, we want this to be as random as possible. When there is a trend we use polynomial regression. In class we used the women data set to demonstrate this. Below you can find the scatterplot of the women data set residuals.

data("women")
names(women)

## [1] "height" "weight"

attach(women)
wt={weight}
ht={height}
mymod=lm(wt~ht)
myresids=mymod$residuals
plot(myresids~mymod$fitted.values)
abline(0,0)

As you can see there is a definite trend when we plot the residuals. We are systematically overpredicting on the outside and underpredicting in the center.

Add Quadratic term

To try to fix our plot we can add a quadratic term to our model.

mymod2=lm(wt~ht+I(ht^2))
myresids2=mymod2$residuals
plot(myresids2~mymod2$fitted.values)
abline(0,0)

As you can see the trend in the residual plot is practically gone.

Check

We can check that we need this term by using the summary function below.

summary(mymod2)

## 
## Call:
## lm(formula = wt ~ ht + I(ht^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50941 -0.29611 -0.00941  0.28615  0.59706 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 261.87818   25.19677  10.393 2.36e-07 ***
## ht           -7.34832    0.77769  -9.449 6.58e-07 ***
## I(ht^2)       0.08306    0.00598  13.891 9.32e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3841 on 12 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9994 
## F-statistic: 1.139e+04 on 2 and 12 DF,  p-value: < 2.2e-16

Our p-val for the quadratic term is 9.32e-09 which is very small and will be significant for almost all alphas, so we should add it! With the p-val being that small we could probably add another term.

Additional term

mymod3=lm(wt~ht+I(ht^2)+I(ht^3))
myresids3=mymod3$residuals
plot(myresids3~mymod3$fitted.values)
abline(0,0)

Check

Now lets check to see if we should include this term.

summary(mymod3)

## 
## Call:
## lm(formula = wt ~ ht + I(ht^2) + I(ht^3))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40677 -0.17391  0.03091  0.12051  0.42191 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -8.967e+02  2.946e+02  -3.044  0.01116 * 
## ht           4.641e+01  1.366e+01   3.399  0.00594 **
## I(ht^2)     -7.462e-01  2.105e-01  -3.544  0.00460 **
## I(ht^3)      4.253e-03  1.079e-03   3.940  0.00231 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2583 on 11 degrees of freedom
## Multiple R-squared:  0.9998, Adjusted R-squared:  0.9997 
## F-statistic: 1.679e+04 on 3 and 11 DF,  p-value: < 2.2e-16

This pval of .00231 is significant with our alpha so we should include it as well! Next, I will touch on the basics of Multicollinearity.

log 12

Kaitlyn Massman

March 10, 2018

Class Concepts

Polynomial Regression

Add Quadratic term

Check

Additional term

Check

Multicollinearity