Quadratic of Polynomial regression

Quadratic regression is technically considered a type of regression. However, it is used to better fit data that has a definite curve in the data. The equation is very similar to the linear regression equation: \[\hat{y_i}= \hat{\beta_0}+\hat{\beta_1} x + \hat{\beta_2} x^2\] \({\beta_0}\) can be interpreted as the y intercept, again, only if it makes sense for x to be able to be 0.
\({\beta_1}\) shifts the parabola left and right.
\({\beta_2}\) affects the shape of the curve of the parabola.
we can look at a model to compare using just a regular linear model, then add the quadratic term in to see if it’s a better fit for the model.
otherwise, you can just use a t test on beta 2.

data(women)
attach(women)
wmod <- lm(weight~height,data=women)
summary(wmod)
## 
## Call:
## lm(formula = weight ~ height, data = women)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -1.7333 -1.1333 -0.3833  0.7417  3.1167 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -87.51667    5.93694  -14.74 1.71e-09 ***
## height        3.45000    0.09114   37.85 1.09e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.525 on 13 degrees of freedom
## Multiple R-squared:  0.991,  Adjusted R-squared:  0.9903 
## F-statistic:  1433 on 1 and 13 DF,  p-value: 1.091e-14

When to use a quadratic model? Looking at the residuals..

wresids <- resid(wmod)
plot(wresids~wmod$fitted.values)
abline(0,0)

..this keys us into trying a quadratic model because its residuals are concave up / ‘banana shaped’

wmod2 <- lm(weight~height+I(height^2))
summary(wmod2)
## 
## Call:
## lm(formula = weight ~ height + I(height^2))
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.50941 -0.29611 -0.00941  0.28615  0.59706 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) 261.87818   25.19677  10.393 2.36e-07 ***
## height       -7.34832    0.77769  -9.449 6.58e-07 ***
## I(height^2)   0.08306    0.00598  13.891 9.32e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.3841 on 12 degrees of freedom
## Multiple R-squared:  0.9995, Adjusted R-squared:  0.9994 
## F-statistic: 1.139e+04 on 2 and 12 DF,  p-value: < 2.2e-16
wresids2 <-resid(wmod2)
plot(wresids2~wmod2$fitted.values)
abline(0,0)

Now the residuals look much less like they have a trend which is exactly what we want. If there is a trend that means we are systematically over or under predicting so our data is obviously correlated in some way which does not yield a good model.

anova(wmod,wmod2)
## Analysis of Variance Table
## 
## Model 1: weight ~ height
## Model 2: weight ~ height + I(height^2)
##   Res.Df     RSS Df Sum of Sq      F    Pr(>F)    
## 1     13 30.2333                                  
## 2     12  1.7701  1    28.463 192.96 9.322e-09 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

From our anova test, we can see it is a very small p value. Remember our hypothesis is H_0 : exclude the squared term against the alternative of don’t exclude. If we have a small p value, the concludsion we can draw is that if you exclude the squared term, then it makes the model worse.
The quadratic is super necessary which is why the pvalue of the anova test is super small( examining the sameness of variance ).

Multicollinearity

Next we looked at multicollinearity, which is when predictors are correlated, and the bottom line is that this is HORRIBLE.

pairs(~weight+height+I(height^2))

you can also just calculate cor(weight, height),etc, for each of the variables but obviously that gets tough if you’ve got 5+ predictors.
so why is it bad?
it inflates standard errors. higher correlation will decrease the test stat and in turn, you will have less power to reject the null hypothesis H_0 of any beta = 0 and you’re more likely not to include it in the model. it also makes estimators usntable and you will calculate very different betas for each different sample.
we will measure this by checking the variance inflation factor
1) write linear model with lm(x_j ~ all predictors except x_j)
2) find the r squared for that lm
3) calculate the VIF = 1 / (1 - R^2)
the hope is that you will get a VIF = 1 which means no correlation and this is great news!
if you get a R^2 close to 1, then your VIF will be huge, which is SUPER BAD and you should probably rethink your life goals. If VIF is >10, this is worst case scenario.