Problem Set 6

Problem 2. a.

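library(ggplot2)

set.seed(1)
x <- rnorm(100)
y <- x - 2 * x^2 + rnorm(100)

# Collect the simulated data so it can be passed to lm() and ggplot()
sim <- data.frame(x, y)
# y is the response generated from x above
mod <- lm(y ~ x, data = sim)

# Scatterplot of y against x
ggplot(data = sim, aes(x = x, y = y)) +
  geom_point()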

## The data show strong curvature (the scatterplot looks a lot like a boomerang). x is roughly normally distributed, with most of the points concentrated between -1 and 1.

#c.

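set.seed(2)
X <- rnorm(100)
# Note: the quadratic term uses lowercase x (the vector from part a.), not X
Z <- X - 2 * x^2 + rnorm(100)
Data <- data.frame(X, Z)
#i
# Linear fit of X on Z
glm_mod <- glm(X ~ Z, data = Data)
coef(glm_mod)

## (Intercept)           Z 
##   0.3368199   0.2265015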

library(boot)

cv.err <- cv.glm(Data, glm_mod)
cv.err$delta

## [1] 0.9529207 0.9527366

cv.error<-rep(0, 10)
for(i in 1:10){
  glm.fit<-glm(Z~poly(X, i), data=Data)
  cv.error[i]<-cv.glm(Data, glm.fit)$delta[1]
}

cvDF<-data.frame(degree=1:10, cv.error)

ggplot(data=cvDF, aes(x=degree, y=cv.error))+
  geom_point()+
  geom_line()

head(cvDF)

##   degree cv.error
## 1      1 5.871398
## 2      2 5.994912
## 3      3 5.822776
## 4      4 5.604157
## 5      5 5.669690
## 6      6 5.914482

# LOOCV errors: i. 5.871398, ii. 5.994912, iii. 5.822776, iv. 5.604157
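
# A minimal sketch of what delta[1] from cv.glm() measures, assuming a fresh simulated data set (sim_x, sim_y and sim_dat are illustrative names, not objects from the code above). For a least-squares fit, the leave-one-out error can also be obtained from a single fit by scaling each residual by one minus its leverage, so the two numbers below should agree.

```r
library(boot)

# Illustrative data only; not the Data object used in this problem
set.seed(1)
sim_x <- rnorm(100)
sim_y <- sim_x - 2 * sim_x^2 + rnorm(100)
sim_dat <- data.frame(sim_x, sim_y)

# Quadratic fit by least squares (gaussian glm)
sim_fit <- glm(sim_y ~ poly(sim_x, 2), data = sim_dat)

# cv.glm() with the default K = n performs LOOCV; delta[1] is the raw estimate
loocv_boot <- cv.glm(sim_dat, sim_fit)$delta[1]

# Shortcut for least squares: mean of (residual / (1 - leverage))^2
lev <- hatvalues(sim_fit)
loocv_manual <- mean((residuals(sim_fit, type = "response") / (1 - lev))^2)

print(c(boot = loocv_boot, manual = loocv_manual))
```

# The shortcut applies to least-squares fits only; for other glm families the refitting loop in cv.glm() is still needed.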

#d.

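set.seed(3)
X <- rnorm(100)
# Note: the quadratic term uses lowercase x (the vector from part a.), not X
Z <- X - 2 * x^2 + rnorm(100)
Data <- data.frame(X, Z)
#i
# Linear fit of X on Z
glm_mod <- glm(X ~ Z, data = Data)
coef(glm_mod)

## (Intercept)           Z 
##   0.1942194   0.1151280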

library(boot)

cv.err <- cv.glm(Data, glm_mod)
cv.err$delta

## [1] 0.6718393 0.6716574

cv.error<-rep(0, 10)
for(i in 1:10){
  glm.fit<-glm(Z~poly(X, i), data=Data)
  cv.error[i]<-cv.glm(Data, glm.fit)$delta[1]
}

cvDF<-data.frame(degree=1:10, cv.error)

ggplot(data=cvDF, aes(x=degree, y=cv.error))+
  geom_point()+
  geom_line()

head(cvDF)

##   degree cv.error
## 1      1 6.106246
## 2      2 6.233267
## 3      3 6.644171
## 4      4 7.974751
## 5      5 8.388879
## 6      6 9.633325

# LOOCV errors: i. 6.106246, ii. 6.233267, iii. 6.644171, iv. 7.974751
# The LOOCV errors for the first four polynomial degrees are larger under this seed than in c. (6.106246 to 7.974751 here, versus 5.604157 to 5.994912 there), and the error now rises steadily with degree instead of fluctuating.
# e. Among the four models, the smallest LOOCV error in c. came from the 4th-order fit (5.604157), while in d. it came from the first, linear model (6.106246). In d. the error grows with every added degree, so fitting higher-order polynomials to that data only adds error and is redundant.
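
# A short sketch of the comparison in e., using only the four LOOCV errors reported above for c. and d. The vector names cv_c and cv_d are illustrative and do not appear in the original code.

```r
# LOOCV errors for models i. to iv., copied from the cvDF output above
cv_c <- c(5.871398, 5.994912, 5.822776, 5.604157)  # seed 2
cv_d <- c(6.106246, 6.233267, 6.644171, 7.974751)  # seed 3

# Index (polynomial degree) of the smallest error in each run
best_c <- which.min(cv_c)
best_d <- which.min(cv_d)

print(c(c = best_c, d = best_d))
```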
# f. The results from fitting the models in c. by least squares show how higher-order polynomial terms can help a model: the smallest error in that cvDF came from the 4th-order polynomial, whereas the data in d. do not benefit from being fit with such a polynomial.
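
# One way to look at the coefficient estimates mentioned in f. is the summary table of a least-squares fit. The sketch below assumes a fresh simulated data set (sim_x, sim_y, sim_dat and fit4 are illustrative names); it is not the output for the Data objects above.

```r
# Illustrative data only; not the Data object used in this problem
set.seed(1)
sim_x <- rnorm(100)
sim_y <- sim_x - 2 * sim_x^2 + rnorm(100)
sim_dat <- data.frame(sim_x, sim_y)

# 4th-order polynomial fit by least squares
fit4 <- lm(sim_y ~ poly(sim_x, 4), data = sim_dat)

# Estimates, standard errors, t values and p-values, one row per term
coef_table <- summary(fit4)$coefficients
print(coef_table)
```

# Terms with small p-values in the last column are the ones the fit treats as statistically significant; comparing them with the LOOCV ranking shows whether the two criteria agree.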

Noah Snizik

10/21/2019