Problem 8

a)

set.seed(1)
y = rnorm(100)
x = rnorm(100)
y = x-2*x^2+rnorm(100)

n = 100 and p = 2 (the predictors X and X^2), with true coefficients 1 and -2.

The model is Y = X - 2X^2 + ε, where ε ~ N(0, 1).

b)

plot(x, y)

The scatterplot shows a roughly symmetric, downward-opening parabola with a maximum near (0.25, 0). This suggests a non-linear (quadratic) relationship between x and y.
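
As a rough check on that visual impression, the noiseless mean curve can be drawn over the scatterplot and its peak located numerically. This is a minimal sketch that assumes the data from part a); the helper name f and the search interval c(-3, 3) are choices made here for illustration, not part of the original solution.

```r
# Regenerate the data from part a) so the block is self-contained
set.seed(1)
y = rnorm(100)
x = rnorm(100)
y = x - 2*x^2 + rnorm(100)

# f is a hypothetical helper: the noiseless mean function x - 2x^2
f = function(x) x - 2*x^2

plot(x, y)
curve(f, from = min(x), to = max(x), add = TRUE, col = "red")

# Locate the vertex of the true curve numerically
vertex = optimize(f, interval = c(-3, 3), maximum = TRUE)
vertex$maximum    # location of the peak, close to 0.25
vertex$objective  # height of the peak, close to 0.125
```

Setting the derivative 1 - 4x to zero gives the same vertex, x = 0.25 and y = 0.125, so the peak read off the noisy scatterplot is in the right place.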

c)

library(boot)
set.seed(1)
n = 100
x = rnorm(n)
y = x - 2*x^2 + rnorm(n) 
MSE = numeric(4)
data = data.frame(x,y)

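# Fit polynomials of degree 1 to 4 and record the LOOCV error of each
for (i in 1:4) 
{
  glm.fit = glm(y~poly(x, i))
  # delta[1] is the raw cross-validation estimate of prediction error
  MSE[i] = cv.glm(data, glm.fit)$delta[1]
}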

MSE
## [1] 7.2881616 0.9374236 0.9566218 0.9539049

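The LOOCV errors are displayed above and behave as expected. The degree 1 fit has a much higher error because a straight line cannot capture the quadratic term (high bias, not high variance). The errors for degrees 2 through 4 are nearly identical and close to 1, the variance of the noise term.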
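
For least-squares fits, the LOOCV error can also be computed from a single fit using the leverage values, without refitting the model n times. The sketch below compares that shortcut with cv.glm on the part c) data. The variable names shortcut and viaboot are introduced here for illustration, and data = data is passed explicitly to glm, which differs slightly from the call used above.

```r
library(boot)

# Same data as part c)
set.seed(1)
n = 100
x = rnorm(n)
y = x - 2*x^2 + rnorm(n)
data = data.frame(x, y)

shortcut = numeric(4)  # LOOCV error from the leverage formula
viaboot = numeric(4)   # LOOCV error from cv.glm

for (i in 1:4)
{
  fit = glm(y ~ poly(x, i), data = data)
  # Leave-one-out residual = ordinary residual / (1 - leverage)
  h = hatvalues(fit)
  shortcut[i] = mean(((data$y - fitted(fit)) / (1 - h))^2)
  viaboot[i] = cv.glm(data, fit)$delta[1]
}

# The two approaches should agree up to floating-point error
all.equal(shortcut, viaboot)
```

The shortcut applies only to linear models fitted by least squares; for other glm families cv.glm has to refit the model for each held-out observation.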

d)

set.seed(2)
n = 100
x = rnorm(n)
y = x - 2*x^2 + rnorm(n)
MSE = numeric(4)
data = data.frame(x,y)

for (i in 1:4) 
{
  glm.fit = glm(y~poly(x, i))
  MSE[i] = cv.glm(data, glm.fit)$delta[1]  
}

MSE
## [1] 9.858301 1.004410 1.018030 1.035601

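The pattern matches part c): the linear fit has a much larger error, and the errors for degrees 2 through 4 are all close to 1. The exact values differ only because the data were regenerated under the new seed. LOOCV itself involves no random splitting, so on a fixed data set it gives the same result regardless of the seed.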
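
The role of the seed can be checked by holding the data fixed and changing only the seed before calling cv.glm. This is a small sketch under that assumption; the names a and b are placeholders introduced here.

```r
library(boot)

# Generate the data once and keep it fixed
set.seed(1)
n = 100
x = rnorm(n)
y = x - 2*x^2 + rnorm(n)
data = data.frame(x, y)
fit = glm(y ~ poly(x, 2), data = data)

# Run LOOCV under two different seeds on the same data
set.seed(1)
a = cv.glm(data, fit)$delta[1]
set.seed(2)
b = cv.glm(data, fit)$delta[1]

# Every observation is held out exactly once either way, so the two
# estimates agree (all.equal allows for floating-point rounding)
all.equal(a, b)
```

With K-fold cross-validation for K < n the fold assignment is random, so there the seed would change the estimate even on fixed data.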

e)

The quadratic model had the smallest LOOCV error in both runs (0.937 in part c and 1.004 in part d). This agrees with the earlier observation that the data follow a quadratic pattern. The errors for degrees 2, 3, and 4 differ only slightly. This is also expected: the true relationship is quadratic, so the cubic and quartic terms add flexibility without reducing bias.

f)

summary(glm.fit)
## 
## Call:
## glm(formula = y ~ poly(x, i))
## 
## Deviance Residuals: 
##      Min        1Q    Median        3Q       Max  
## -2.08635  -0.78633   0.06263   0.76755   2.11807  
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  -2.6685     0.1001 -26.665  < 2e-16 ***
## poly(x, i)1   9.4434     1.0008   9.436 2.65e-15 ***
## poly(x, i)2 -28.7879     1.0008 -28.766  < 2e-16 ***
## poly(x, i)3  -0.2163     1.0008  -0.216    0.829    
## poly(x, i)4   0.0866     1.0008   0.087    0.931    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for gaussian family taken to be 1.001533)
## 
##     Null deviance: 1013.122  on 99  degrees of freedom
## Residual deviance:   95.146  on 95  degrees of freedom
## AIC: 290.81
## 
## Number of Fisher Scoring iterations: 2

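Here glm.fit is the degree 4 fit left over from the last loop iteration in part d). The summary shows that the intercept and the degree 1 and degree 2 terms are highly significant, while the degree 3 and degree 4 terms are not (p = 0.829 and p = 0.931). This agrees with the cross-validation results, which favor the quadratic model.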
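
To look at the significance of the coefficients in each of the four models rather than only the degree 4 fit, the coefficient tables can be printed in a loop. This is a minimal sketch that assumes the part c) data (seed 1), so the numbers will not match the summary above, which comes from the seed 2 data. The list pvals is a name introduced here for illustration.

```r
# Data from part c)
set.seed(1)
n = 100
x = rnorm(n)
y = x - 2*x^2 + rnorm(n)
data = data.frame(x, y)

pvals = list()  # p-values of the coefficients, one entry per degree

for (i in 1:4)
{
  fit = lm(y ~ poly(x, i), data = data)
  coefs = summary(fit)$coefficients
  pvals[[i]] = coefs[, "Pr(>|t|)"]
  # Estimate, standard error, t value and p-value for each term
  print(round(coefs, 4))
}
```

Because poly() builds orthogonal polynomials by default, the estimate for a given term stays the same as higher degrees are added; only the standard errors change.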