In class we covered multicollinearity and polynomial regression models.
We first fit a linear model and got a high R^2 value, but we saw a pattern in the residual plot, so we knew this was not a good model.
data(women)
attach(women)
lmod <- lm(weight ~ height)
summary(lmod)
##
## Call:
## lm(formula = weight ~ height)
##
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared: 0.991, Adjusted R-squared: 0.9903
## F-statistic: 1433 on 1 and 13 DF, p-value: 1.091e-14
lreside <- residuals(lmod)
plot(lreside ~ height)
abline(0, 0)
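As a quick aside, base R can draw essentially the same diagnostic straight from the fitted model (plotted against fitted values instead of height). This is just an alternative, not what we did in class:

# built-in residuals-vs-fitted diagnostic for lmod
plot(lmod, which = 1)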
So the next step I took was to make an x^2 variable, use it to fit a polynomial model, and then check that residual plot as well.
xsq <- height^2
qmod <- lm(weight ~ height + xsq)
summary(qmod)
##
## Call:
## lm(formula = weight ~ height + xsq)
##
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50941 -0.29611 -0.00941  0.28615  0.59706 
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 261.87818   25.19677  10.393 2.36e-07 ***
## height       -7.34832    0.77769  -9.449 6.58e-07 ***
## xsq           0.08306    0.00598  13.891 9.32e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3841 on 12 degrees of freedom
## Multiple R-squared: 0.9995, Adjusted R-squared: 0.9994
## F-statistic: 1.139e+04 on 2 and 12 DF, p-value: < 2.2e-16
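Before checking the residuals, one aside: the same quadratic model can be fit without creating a separate xsq variable by using I() inside the formula, which gives identical coefficients. A sketch, not what we ran in class:

# identical fit, no separate xsq variable needed
qmod2 <- lm(weight ~ height + I(height^2))
# poly() gives an equivalent fit using orthogonal terms, which removes
# the collinearity between height and height^2 mentioned at the end
qmod3 <- lm(weight ~ poly(height, 2))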
qresid <- residuals(qmod)
plot(qresid ~ height)
abline(0, 0)
As we can see, this residual plot has no pattern, so the polynomial model is the better fit. The polynomial model also has a higher coefficient of determination (R^2 of 0.9995 versus 0.991 for the linear model).
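Because the linear model is nested inside the quadratic one, we could also compare them with a formal F-test. A sketch we did not run in class:

# nested-model F-test: does adding xsq significantly improve the fit?
anova(lmod, qmod)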
The last thing to do is compare the two prediction lines and see how the polynomial curve actually fits better.
plot(weight ~ height)
x <- seq(from = 58, to = 72, by = 0.1)
coef(qmod)
##  (Intercept)       height          xsq 
## 261.87818358  -7.34831933   0.08306399
y <- coef(qmod)[1] + coef(qmod)[2]*x + coef(qmod)[3]*x^2
lines(x, y, lty = 1, col = 2)
abline(lmod)
As we can see, the red curve follows the data a little better than the straight black line.
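Instead of computing y by hand from the coefficients, predict() would draw the same curve. A sketch (newdata needs both columns because qmod was fit with the separate xsq variable):

# same red curve via predict() instead of manual arithmetic
y2 <- predict(qmod, newdata = data.frame(height = x, xsq = x^2))
lines(x, y2, lty = 2, col = 4)  # dashed blue, sits on top of the red curve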
The other thing we talked about was multicollinearity, which is when two (or more) of your predictor variables are correlated with each other. When we regress one predictor on the other, we don't want to see an R^2 value of more than .9; above that, the two variables carry nearly the same information about the response, so the model can't separate their effects and the individual coefficient estimates become unstable.
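Our own quadratic model is actually a handy example: height and xsq are almost perfectly correlated, since squaring barely bends the relationship over such a narrow height range. A quick check in base R (if the car package is installed, its vif() function computes this for every predictor at once):

# correlation between the two predictors (very close to 1 here)
cor(height, xsq)
# R^2 from regressing one predictor on the other; above 0.9 signals
# trouble, i.e. a variance inflation factor 1/(1 - R^2) above 10
r2 <- summary(lm(xsq ~ height))$r.squared
1/(1 - r2)  # the variance inflation factor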