March 8, 2018

How to create a quadratic regression mod

create new variable xsq <- x^2 lm(y ~ x + xsq) OR lm(y ~ x + I(x^2))

Can do this for any polynomial lm(y ~ x + I(x^2) + I(x^3)) ?poly poly(x,4)

Example A

data(women)
names(women)
## [1] "height" "weight"
attach(women)
women.mod <- lm(weight ~ height)
plot(women.mod$residuals ~ women.mod$fitted.values)
  abline(0,0)

This is not what we want to see. We are systematically overpredicting and underpredicting.

Let’s add a quadratic term.

women.mod.quad <- lm(weight ~ height + I(height^2))
plot(women.mod.quad$residuals ~ women.mod.quad$fitted.values)
  abline(0,0)

This resid plot is a little bit better, but I wouldn’t necessarily say it’s good yet.

Test to see if we need quad term

summary(women.mod.quad)
## 
## Call:
## lm(formula = weight ~ height + I(height^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50941 -0.29611 -0.00941  0.28615  0.59706 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 261.87818   25.19677  10.393 2.36e-07 ***
## height       -7.34832    0.77769  -9.449 6.58e-07 ***
## I(height^2)   0.08306    0.00598  13.891 9.32e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3841 on 12 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9994 
## F-statistic: 1.139e+04 on 2 and 12 DF,  p-value: < 2.2e-16

I(height^2) has a p-val of near 0, so it is significant. We do need the quadratic term!

Okay, let’s try with x^3.

women.mod.x3 <- lm(weight ~ height + I(height^2) + I(height^3))
plot(women.mod.x3$residuals ~ women.mod.x3$fitted.values)
  abline(0,0)

summary(women.mod.x3)
## 
## Call:
## lm(formula = weight ~ height + I(height^2) + I(height^3))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.40677 -0.17391  0.03091  0.12051  0.42191 
## 
## Coefficients:
##               Estimate Std. Error t value Pr(>|t|)   
## (Intercept) -8.967e+02  2.946e+02  -3.044  0.01116 * 
## height       4.641e+01  1.366e+01   3.399  0.00594 **
## I(height^2) -7.462e-01  2.105e-01  -3.544  0.00460 **
## I(height^3)  4.253e-03  1.079e-03   3.940  0.00231 **
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.2583 on 11 degrees of freedom
## Multiple R-squared:  0.9998, Adjusted R-squared:  0.9997 
## F-statistic: 1.679e+04 on 3 and 11 DF,  p-value: < 2.2e-16

How to plot the new quad mod

coef(women.mod.quad)
##  (Intercept)       height  I(height^2) 
## 261.87818358  -7.34831933   0.08306399
plot(weight~height)
x <- seq(from=58, to=72, by = .1)
y <- coef(women.mod.quad)[1] + coef(women.mod.quad)[2]*x + coef(women.mod.quad)[3]*x^2
length(x)
## [1] 141
length(y)
## [1] 141
lines(x,y,lty=3,col=2)
abline(women.mod)

If we were to look at x^3, we would find it’s significant as well.

What’s crazy is that we only have 15 data points and we’re still able to conclude that all of the terms are significant!

B

library(alr3)
## Warning: package 'alr3' was built under R version 3.4.3
## Loading required package: car
## Warning: package 'car' was built under R version 3.4.3
data(wblake)
names(wblake)
## [1] "Age"    "Length" "Scale"
attach(wblake)
# use age & scale radius
plot(Age ~ Scale)

Looks smooth, but not necessarily like a line.

fish.mod <- lm(Age ~ Scale + I(Scale^2))
summary(fish.mod)
## 
## Call:
## lm(formula = Age ~ Scale + I(Scale^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.95237 -0.69641  0.02624  0.65374  2.70003 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.56253    0.21438  -7.289 1.48e-12 ***
## Scale        1.44382    0.07441  19.403  < 2e-16 ***
## I(Scale^2)  -0.06493    0.00576 -11.272  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9349 on 436 degrees of freedom
## Multiple R-squared:  0.7804, Adjusted R-squared:  0.7793 
## F-statistic: 774.5 on 2 and 436 DF,  p-value: < 2.2e-16

Yes, we need the quadratic term!