Problem 2. a.
set.seed(1)
x=rnorm(100)
y=x-2*x^2+rnorm(100)
library(ggplot2)
df <- data.frame(x, y)
mod <- lm(x ~ y, data = df)  # kept from the original; not needed for the plot
ggplot(data = df, aes(x = x, y = y)) +
  geom_point()
## The data show strong curvature (the scatter looks a lot like a boomerang). x is roughly normally distributed, so most of the points fall between -1 and 1.
#c.
set.seed(2)
X = rnorm(100)
# Note: the quadratic term uses x from part a., not X, so Z has no quadratic dependence on X.
Z = X-2*x^2+rnorm(100)
#i
Data <- data.frame(X,Z)
# Regresses X on Z (the polynomial fits below regress Z on X).
glm_mod <- glm(X~Z, data = Data)
coef(glm_mod)
## (Intercept) Z
## 0.3368199 0.2265015
library(boot)
cv.err <- cv.glm(Data, glm_mod)
cv.err$delta
## [1] 0.9529207 0.9527366
cv.error<-rep(0, 10)
for(i in 1:10){
glm.fit<-glm(Z~poly(X, i), data=Data)
cv.error[i]<-cv.glm(Data, glm.fit)$delta[1]
}
cvDF<-data.frame(degree=1:10, cv.error)
ggplot(data=cvDF, aes(x=degree, y=cv.error))+
geom_point()+
geom_line()
head(cvDF)
## degree cv.error
## 1 1 5.871398
## 2 2 5.994912
## 3 3 5.822776
## 4 4 5.604157
## 5 5 5.669690
## 6 6 5.914482
# LOOCV errors: i. 5.871398, ii. 5.994912, iii. 5.822776, iv. 5.604157
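# A minimal sketch, not part of the original solution: for least-squares fits,
# LOOCV does not require refitting the model n times. It can be computed as
#   CV = mean(((y_i - yhat_i) / (1 - h_i))^2),
# where h_i are the leverages. The names sim_demo, fit_demo, loocv_shortcut and
# loocv_refit are hypothetical, and the data are simulated only for illustration.
library(boot)
set.seed(2)
sim_demo <- data.frame(X = rnorm(100))
sim_demo$Z <- sim_demo$X - 2 * sim_demo$X^2 + rnorm(100)
fit_demo <- glm(Z ~ poly(X, 2), data = sim_demo)
# Shortcut: divide each residual by (1 - leverage), then average the squares.
loocv_shortcut <- mean((residuals(fit_demo, type = "response") /
                          (1 - hatvalues(fit_demo)))^2)
# Brute force: cv.glm refits the model once per left-out observation.
loocv_refit <- cv.glm(sim_demo, fit_demo)$delta[1]
# The two values should agree up to floating-point tolerance.
all.equal(loocv_shortcut, loocv_refit)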
#d.
set.seed(3)
X = rnorm(100)
# Note: as in c., the quadratic term uses x from part a., not X.
Z = X-2*x^2+rnorm(100)
#i
Data <- data.frame(X,Z)
# Regresses X on Z (the polynomial fits below regress Z on X).
glm_mod <- glm(X~Z, data = Data)
coef(glm_mod)
## (Intercept) Z
## 0.1942194 0.1151280
library(boot)
cv.err <- cv.glm(Data, glm_mod)
cv.err$delta
## [1] 0.6718393 0.6716574
cv.error<-rep(0, 10)
for(i in 1:10){
glm.fit<-glm(Z~poly(X, i), data=Data)
cv.error[i]<-cv.glm(Data, glm.fit)$delta[1]
}
cvDF<-data.frame(degree=1:10, cv.error)
ggplot(data=cvDF, aes(x=degree, y=cv.error))+
geom_point()+
geom_line()
head(cvDF)
## degree cv.error
## 1 1 6.106246
## 2 2 6.233267
## 3 3 6.644171
## 4 4 7.974751
## 5 5 8.388879
## 6 6 9.633325
# LOOCV errors: i. 6.106246, ii. 6.233267, iii. 6.644171, iv. 7.974751
# With this seed, the LOOCV errors for the first four polynomial degrees (6.11 to 7.97) are all larger than those in c. (5.60 to 5.99), and the error curve is more regular: it rises steadily with degree instead of dipping at degree 4.
# e. In c. the quartic model (iv) had the smallest LOOCV error (5.604157); in d. the linear model (i) did (6.106246). Z is generated from the part a. vector x rather than from X, so its relationship to X is essentially linear. Fitting higher-order polynomials to such data is largely redundant: the extra terms mostly fit noise and add error.
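# A hedged sketch, not the original analysis: Z above is built from the part a.
# vector x rather than from X. Assuming the quadratic term was meant to use X,
# the comparison in e. could be rerun as below. X_alt, Z_alt, Data_alt, fit_alt
# and cv_alt are hypothetical names, chosen so that the results above are not
# overwritten. With a genuinely quadratic relationship, one would expect the
# LOOCV error to drop sharply at degree 2.
library(boot)
set.seed(2)
X_alt <- rnorm(100)
Z_alt <- X_alt - 2 * X_alt^2 + rnorm(100)
Data_alt <- data.frame(X = X_alt, Z = Z_alt)
cv_alt <- rep(0, 4)
for (i in 1:4) {
  fit_alt <- glm(Z ~ poly(X, i), data = Data_alt)
  cv_alt[i] <- cv.glm(Data_alt, fit_alt)$delta[1]
}
cv_alt             # LOOCV error for degrees 1 to 4
which.min(cv_alt)  # degree with the smallest LOOCV error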
# f. The results from fitting the models in c. by least squares show how higher-order polynomial terms can help a model: the smallest error in that cvDF came from the 4th-order polynomial. A model like the one in d. does not benefit from such terms, since its error increases with each added degree.
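# A minimal sketch of how the least-squares coefficient table for c. could be
# inspected to support f. It regenerates the data the same way as in a. and c.
# (including the use of the part a. vector in the quadratic term). The names
# x_a, X_c, Z_c, Data_f and fit_f are hypothetical.
set.seed(1)
x_a <- rnorm(100)
set.seed(2)
X_c <- rnorm(100)
Z_c <- X_c - 2 * x_a^2 + rnorm(100)
Data_f <- data.frame(X = X_c, Z = Z_c)
fit_f <- lm(Z ~ poly(X, 4), data = Data_f)
# One row per term; columns are Estimate, Std. Error, t value and Pr(>|t|).
coef_table <- summary(fit_f)$coefficients
coef_table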