set.seed(1)
y = rnorm(100)
x = rnorm(100)
y = x-2*x^2+rnorm(100)
n = 100 and p = 2.
The model is Y = X - 2X^2 + ε, i.e. Y = β1 X + β2 X^2 + ε with β1 = 1 and β2 = -2.
plot(x, y)
The plot shows what appears to be a roughly symmetric, downward-opening parabola with a maximum at approximately (0.25, 0). This suggests a non-linear (quadratic) relationship between x and y.
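The location of the peak can be checked against the mean function used to simulate the data. The following is a minimal sketch, assuming the data are generated as in the code above; the helper `f` and the object `peak` are illustrative names that do not appear in the original solution.

```r
# Regenerate the simulated data (same recipe as above)
set.seed(1)
y <- rnorm(100)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)

# True mean function behind the simulation: f(x) = x - 2x^2 (illustrative helper)
f <- function(x) x - 2 * x^2

# Scatterplot with the true curve overlaid
plot(x, y)
curve(f(x), add = TRUE, col = "red", lwd = 2)

# Locate the maximum of the true curve numerically
peak <- optimize(f, interval = c(-3, 3), maximum = TRUE)
peak$maximum    # x-coordinate of the vertex
peak$objective  # height of the curve at the vertex
```

Analytically the vertex of x - 2x^2 is at x = 1/4 with height 1/8, which matches the maximum read off the scatterplot up to noise.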
library(boot)
set.seed(1)
n = 100
x = rnorm(n)
y = x - 2*x^2 + rnorm(n)
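MSE = numeric(4)   # LOOCV error for polynomial degrees 1 to 4
data = data.frame(x, y)
for (i in 1:4)
{
  # fit a degree-i polynomial, then estimate its test MSE by LOOCV
  glm.fit = glm(y ~ poly(x, i))
  MSE[i] = cv.glm(data, glm.fit)$delta[1]
}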
MSE
## [1] 7.2881616 0.9374236 0.9566218 0.9539049
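The LOOCV errors are displayed above. They behave as expected: the degree-1 polynomial has a much higher error (7.29) than degrees 2 through 4 (about 0.94 to 0.96), because a straight line cannot capture the curvature in the data. The errors for degrees 2 through 4 are nearly identical.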
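For least-squares fits, the LOOCV error can also be obtained from a single fit through the leverage shortcut, CV = mean((e_i / (1 - h_i))^2), where e_i are the residuals and h_i the hat values. The sketch below is an illustration under the assumption that the data are simulated as above; `d`, `fit`, `loocv_shortcut` and `loocv_boot` are names introduced here, not part of the original solution.

```r
library(boot)

# Same simulated data as in the cross-validation code above
set.seed(1)
n <- 100
x <- rnorm(n)
y <- x - 2 * x^2 + rnorm(n)
d <- data.frame(x, y)

# One quadratic least-squares fit on all observations
fit <- lm(y ~ x + I(x^2), data = d)

# Leverage shortcut: inflate each residual by 1 / (1 - h_i), then average the squares
h <- hatvalues(fit)
loocv_shortcut <- mean((residuals(fit) / (1 - h))^2)

# Brute-force LOOCV via cv.glm (refits the model n times)
loocv_boot <- cv.glm(d, glm(y ~ x + I(x^2), data = d))$delta[1]

# The two estimates should agree up to floating-point rounding
all.equal(loocv_shortcut, loocv_boot)
```

The shortcut only applies to linear models fitted by least squares; for other model families `cv.glm` still has to refit the model for every held-out observation.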
set.seed(2)
n = 100
x = rnorm(n)
y = x - 2*x^2 + rnorm(n)
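MSE = numeric(4)   # LOOCV error for polynomial degrees 1 to 4
data = data.frame(x, y)
for (i in 1:4)
{
  # same procedure as before, applied to the data generated under seed 2
  glm.fit = glm(y ~ poly(x, i))
  MSE[i] = cv.glm(data, glm.fit)$delta[1]
}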
MSE
## [1] 9.858301 1.004410 1.018030 1.035601
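The results are as expected and similar to part (c): the degree-1 fit has by far the largest error (9.86), while degrees 2 through 4 are all close to 1. The numbers differ from part (c) because the data were regenerated under a new seed.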
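LOOCV holds out every observation exactly once, so for a fixed data set the estimate does not depend on the random seed. A minimal sketch of that check, assuming the data from part (c); `d`, `fit`, `cv_a` and `cv_b` are illustrative names.

```r
library(boot)

# Fix one data set
set.seed(1)
n <- 100
x <- rnorm(n)
y <- x - 2 * x^2 + rnorm(n)
d <- data.frame(x, y)
fit <- glm(y ~ poly(x, 2), data = d)

# Run LOOCV twice on the same data under two different seeds
set.seed(10)
cv_a <- cv.glm(d, fit)$delta[1]
set.seed(20)
cv_b <- cv.glm(d, fit)$delta[1]

# The estimates agree up to floating-point rounding
# (the seed can only change the order in which the folds are summed)
all.equal(cv_a, cv_b)
```

With K-fold cross-validation for K < n the fold assignment is random, so the same comparison would generally give slightly different estimates.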
In both runs the smallest LOOCV error came from the degree-2 polynomial. This fits our earlier observation that the data follow a quadratic pattern. The differences among degrees 2, 3, and 4 are small, which is also expected: the true relationship is quadratic, so the cubic and quartic terms have nothing systematic left to explain.
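The comparison can be made explicit by selecting the degree with the smallest reported error. A minimal sketch using the LOOCV values printed above; `mse_seed1` and `mse_seed2` are illustrative names.

```r
# LOOCV errors copied from the two runs above (degrees 1 to 4)
mse_seed1 <- c(7.2881616, 0.9374236, 0.9566218, 0.9539049)
mse_seed2 <- c(9.858301, 1.004410, 1.018030, 1.035601)

# Degree with the smallest LOOCV error in each run
which.min(mse_seed1)  # 2
which.min(mse_seed2)  # 2

# Error of each degree relative to the best one; values near 1 mean "about as good"
round(mse_seed1 / min(mse_seed1), 3)
round(mse_seed2 / min(mse_seed2), 3)
```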
summary(glm.fit)
##
## Call:
## glm(formula = y ~ poly(x, i))
##
## Deviance Residuals:
##      Min        1Q    Median        3Q       Max
## -2.08635  -0.78633   0.06263   0.76755   2.11807
##
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)
## (Intercept)   -2.6685     0.1001 -26.665  < 2e-16 ***
## poly(x, i)1    9.4434     1.0008   9.436 2.65e-15 ***
## poly(x, i)2  -28.7879     1.0008 -28.766  < 2e-16 ***
## poly(x, i)3   -0.2163     1.0008  -0.216    0.829
## poly(x, i)4    0.0866     1.0008   0.087    0.931
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## (Dispersion parameter for gaussian family taken to be 1.001533)
##
## Null deviance: 1013.122 on 99 degrees of freedom
## Residual deviance: 95.146 on 95 degrees of freedom
## AIC: 290.81
##
## Number of Fisher Scoring iterations: 2
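The summary is for the last model fitted in the loop, the degree-4 polynomial from part (d). It shows that the intercept and the degree-1 and degree-2 terms are highly significant (p < 1e-14), while the degree-3 and degree-4 terms are not (p = 0.829 and 0.931). This agrees with the cross-validation results: a quadratic model is sufficient.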
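The estimates 9.44 and -28.79 are not on the scale of the simulation coefficients 1 and -2 because `poly(x, i)` uses an orthogonal polynomial basis by default. The sketch below, assuming the seed-2 data from part (d), refits the degree-4 model with raw powers of x; `fit_orth` and `fit_raw` are illustrative names.

```r
# Data from part (d)
set.seed(2)
n <- 100
x <- rnorm(n)
y <- x - 2 * x^2 + rnorm(n)

# Degree-4 fit in the default orthogonal basis and in raw powers of x
fit_orth <- lm(y ~ poly(x, 4))
fit_raw  <- lm(y ~ poly(x, 4, raw = TRUE))

# Both bases span the same space, so the fitted values coincide
all.equal(fitted(fit_orth), fitted(fit_raw))

# Raw-basis coefficients are directly comparable to the true values 0, 1, -2, 0, 0
round(coef(fit_raw), 3)
```

The orthogonal basis has the advantage that each term's t-test is unaffected by the other terms, which is why all four polynomial terms in the summary share the same standard error.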