Quadratic Regression

In our most recent class we talked about quadratic regression, which is similar to linear regression but models a smooth relationship between x and y that is not a straight line. Our equation for this is: \[\hat{y}_i = \hat{\beta}_0 + \hat{\beta}_1 x_i + \hat{\beta}_2 x_i^2\]

In this equation, beta(zero) is our intercept, beta(one) shifts the parabola left and right (the vertex sits at x = -beta(one) / (2 beta(two))), and beta(two) controls the curvature of the relationship.
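To see how these coefficients change the shape of the curve, here is a minimal sketch in R with made-up coefficient values (b0, b1, b2 are just illustrative numbers, not estimates from any model we fit in class):

b0 <- 1; b1 <- -2; b2 <- 0.5          # hypothetical values, for illustration only
curve(b0 + b1 * x + b2 * x^2, from = -2, to = 6,
      xlab = "x", ylab = "predicted y", main = "Quadratic mean function")
abline(v = -b1 / (2 * b2), lty = 2)   # vertex: changing b1 shifts this left/right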

When we want to see if using quadratic regression is necessary, we look at the plot of our residuals. In class we did this using the wblake data. First we fit the model and then we plot the residuals:

library(alr3)
## Loading required package: car
data(wblake)
attach(wblake)
head(wblake)
##   Age Length   Scale
## 1   1     71 1.90606
## 2   1     64 1.87707
## 3   1     57 1.09736
## 4   1     68 1.33108
## 5   1     72 1.59283
## 6   1     80 1.91602
mymod3 <- lm(Age ~ Scale)                # linear model: Age on Scale
myresids3 <- mymod3$residuals
plot(myresids3 ~ mymod3$fitted.values)   # residuals vs. fitted values
abline(0,0)                              # horizontal reference line at zero

Because we can see a trend in this residual plot, we have an issue: a trend means that we are systematically over- and under-predicting. We will fix our linear model by using quadratic regression, adding an x-squared term (and, optionally, an x-cubed term) to the model. The code should look something like this:

xsq2 <- Scale^2                        # squared term
xc <- Scale^3                          # cubed term (optional)
mymod4 <- lm(Age ~ Scale + xsq2 + xc)
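
An equivalent, arguably more idiomatic way to write this in R is to use I() inside the formula, so we do not need to create the extra variables first (mymod4b is just a name introduced here for illustration; it fits the same model as mymod4):

mymod4b <- lm(Age ~ Scale + I(Scale^2) + I(Scale^3))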

We will now plot the residuals of our quadratic equation and check for a trend:

myresids4 <- mymod4$residuals
plot(myresids4 ~ mymod4$fitted.values)
abline(0,0)

This plot looks better. If we compare the previous graph and this graph around a fitted value of about 1, we can see that the residuals here are spread more evenly above and below the zero line (roughly equal numbers of points on each side) than before, meaning we have fewer systematic errors in our predictions.

We also talked about the fact that you can run a t-test for beta(two), or use an F-test comparing the linear and quadratic models, to see if the quadratic term is necessary. If we run a t-test, we use the summary function and look at the p-values:

summary(mymod4)
## 
## Call:
## lm(formula = Age ~ Scale + xsq2 + xc)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -2.99281 -0.65734  0.02646  0.64421  2.75180 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -1.094809   0.400642  -2.733  0.00654 ** 
## Scale        1.172388   0.210092   5.580 4.23e-08 ***
## xsq2        -0.021289   0.032111  -0.663  0.50769    
## xc          -0.002065   0.001495  -1.381  0.16788    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 0.9339 on 435 degrees of freedom
## Multiple R-squared:  0.7813, Adjusted R-squared:  0.7798 
## F-statistic:   518 on 3 and 435 DF,  p-value: < 2.2e-16

Our p-value for the squared term is > .05, so we fail to reject the null hypothesis that beta(two) = 0; the t-test does not give us evidence that the quadratic term is needed in this model.
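
For the F-test version of this comparison, one option (a sketch, not output we generated in class) is to hand both fitted models to anova(), which runs the nested-model F-test for the terms added to the linear model:

anova(mymod3, mymod4)   # does adding the squared and cubed terms improve the fit?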

Ch 5

We also began chapter 5 and discussed multicollinearity. Multicollinearity is when our predictors are correlated with each other, which is not a good thing because it inflates our standard errors. This gives us less power to reject the null hypothesis and makes us more likely to leave predictors out of our model, even ones we may want to include.
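
A common way to check for multicollinearity is the variance inflation factor (VIF); the car package (already loaded above by alr3) provides vif(). As a sketch (not something we ran in class), applying it to our cubic model would show that the polynomial terms are highly correlated with each other, which is typical when powers of the same variable are entered without centering:

vif(mymod4)   # large values (e.g. > 10) suggest standard errors inflated by correlated predictors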