How to create a quadratic regression mod
create new variable xsq <- x^2 lm(y ~ x + xsq) OR lm(y ~ x + I(x^2))
Can do this for any polynomial lm(y ~ x + I(x^2) + I(x^3)) ?poly poly(x,4)
data(women)
names(women)
## [1] "height" "weight"
attach(women)
women.mod <- lm(weight ~ height)
plot(women.mod$residuals ~ women.mod$fitted.values)
abline(0,0)
This is not what we want to see. We are systematically overpredicting and underpredicting.
Let’s add a quadratic term.
women.mod.quad <- lm(weight ~ height + I(height^2))
plot(women.mod.quad$residuals ~ women.mod.quad$fitted.values)
abline(0,0)
This resid plot is a little bit better, but I wouldn’t necessarily say it’s good yet.
Test to see if we need quad term
summary(women.mod.quad)
##
## Call:
## lm(formula = weight ~ height + I(height^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.50941 -0.29611 -0.00941 0.28615 0.59706
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 261.87818 25.19677 10.393 2.36e-07 ***
## height -7.34832 0.77769 -9.449 6.58e-07 ***
## I(height^2) 0.08306 0.00598 13.891 9.32e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3841 on 12 degrees of freedom
## Multiple R-squared: 0.9995, Adjusted R-squared: 0.9994
## F-statistic: 1.139e+04 on 2 and 12 DF, p-value: < 2.2e-16
I(height^2) has a p-val of near 0, so it is significant. We do need the quadratic term!
Okay, let’s try with x^3.
women.mod.x3 <- lm(weight ~ height + I(height^2) + I(height^3))
plot(women.mod.x3$residuals ~ women.mod.x3$fitted.values)
abline(0,0)
summary(women.mod.x3)
##
## Call:
## lm(formula = weight ~ height + I(height^2) + I(height^3))
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.40677 -0.17391 0.03091 0.12051 0.42191
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -8.967e+02 2.946e+02 -3.044 0.01116 *
## height 4.641e+01 1.366e+01 3.399 0.00594 **
## I(height^2) -7.462e-01 2.105e-01 -3.544 0.00460 **
## I(height^3) 4.253e-03 1.079e-03 3.940 0.00231 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.2583 on 11 degrees of freedom
## Multiple R-squared: 0.9998, Adjusted R-squared: 0.9997
## F-statistic: 1.679e+04 on 3 and 11 DF, p-value: < 2.2e-16
How to plot the new quad mod
coef(women.mod.quad)
## (Intercept) height I(height^2)
## 261.87818358 -7.34831933 0.08306399
plot(weight~height)
x <- seq(from=58, to=72, by = .1)
y <- coef(women.mod.quad)[1] + coef(women.mod.quad)[2]*x + coef(women.mod.quad)[3]*x^2
length(x)
## [1] 141
length(y)
## [1] 141
lines(x,y,lty=3,col=2)
abline(women.mod)
If we were to look at x^3, we would find it’s significant as well.
What’s crazy is that we only have 15 data points and we’re still able to conclude that all of the terms are significant!
library(alr3)
## Warning: package 'alr3' was built under R version 3.4.3
## Loading required package: car
## Warning: package 'car' was built under R version 3.4.3
data(wblake)
names(wblake)
## [1] "Age" "Length" "Scale"
attach(wblake)
# use age & scale radius
plot(Age ~ Scale)
Looks smooth, but not necessarily like a line.
fish.mod <- lm(Age ~ Scale + I(Scale^2))
summary(fish.mod)
##
## Call:
## lm(formula = Age ~ Scale + I(Scale^2))
##
## Residuals:
## Min 1Q Median 3Q Max
## -2.95237 -0.69641 0.02624 0.65374 2.70003
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -1.56253 0.21438 -7.289 1.48e-12 ***
## Scale 1.44382 0.07441 19.403 < 2e-16 ***
## I(Scale^2) -0.06493 0.00576 -11.272 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.9349 on 436 degrees of freedom
## Multiple R-squared: 0.7804, Adjusted R-squared: 0.7793
## F-statistic: 774.5 on 2 and 436 DF, p-value: < 2.2e-16
Yes, we need the quadratic term!