R Markdown
- In this exercise, you will further analyze the Wage data set
considered throughout this chapter.
- Perform polynomial regression to predict wage using age. Use
cross-validation to select the optimal degree d for the polynomial. What
degree was chosen, and how does this compare to the results of
hypothesis testing using ANOVA? Make a plot of the resulting polynomial
fit to the data.
library(ISLR)
library(boot)
set.seed(1)
degree <- 10
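cv.errs <- rep(NA, degree)
# 10-fold CV estimate of test MSE for each degree (K = 10 is an assumption)
for (i in 1:degree) {
  fit <- glm(wage ~ poly(age, i), data = Wage)
  cv.errs[i] <- cv.glm(Wage, fit, K = 10)$delta[1]
}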
plot(1:degree, cv.errs, xlab = 'Degree', ylab = 'Test MSE', type = 'l')
deg.min <- which.min(cv.errs)
points(deg.min, cv.errs[deg.min], col = 'red', cex = 2, pch = 19)
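# Scatter the data, then overlay the cubic fit across the observed age range
plot(wage ~ age, data = Wage, col = 'darkgrey')
age.range <- range(Wage$age)
age.grid <- seq(from = age.range[1], to = age.range[2])
fit <- lm(wage ~ poly(age, 3), data = Wage)
preds <- predict(fit, newdata = list(age = age.grid))
lines(age.grid, preds, col = 'red', lwd = 2)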
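The exercise also asks how the cross-validated degree compares with hypothesis testing using ANOVA. A minimal sketch of that comparison follows. The range of degrees (1 through 5) and the names `fits` and `anova.tab` are assumptions made for illustration, not part of the original solution.

```r
library(ISLR)

# Nested polynomial fits of increasing degree (1 through 5 is an assumed range)
fits <- lapply(1:5, function(d) lm(wage ~ poly(age, d), data = Wage))

# Sequential F-tests: each row tests whether one more degree improves the fit
anova.tab <- do.call(anova, fits)
print(anova.tab)
```

A small p-value in a given row suggests that the extra polynomial term is worth keeping; the first degree whose row is no longer significant is a natural point of comparison with the degree picked by cross-validation.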
- Fit a step function to predict wage using age, and perform
cross-validation to choose the optimal number of cuts. Make a plot of the
fit obtained.
cv.errs <- rep(NA, degree)
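for (i in 2:degree) {
  # Precompute the binned factor so every CV fold shares the same levels
  Wage$age.cut <- cut(Wage$age, i)
  fit <- glm(wage ~ age.cut, data = Wage)
  cv.errs[i] <- cv.glm(Wage, fit, K = 10)$delta[1]
}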
plot(2:degree, cv.errs[-1], xlab = 'Cuts', ylab = 'Test MSE', type = 'l')
deg.min <- which.min(cv.errs)
points(deg.min, cv.errs[deg.min], col = 'red', cex = 2, pch = 19)
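# Refit the step function with the selected number of cuts and overlay it on the data
plot(wage ~ age, data = Wage, col = 'darkgrey')
fit <- glm(wage ~ cut(age, deg.min), data = Wage)
preds <- predict(fit, newdata = list(age = age.grid))
lines(age.grid, preds, col = "red", lwd = 2)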
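The step function relies on `cut()`, which, given a number of intervals, splits the observed range into equal-width bins. A small sketch of that behaviour on made-up ages (the values and the names `ages` and `bins` are purely illustrative):

```r
# Toy ages (made-up values, for illustration only)
ages <- c(18, 25, 33, 41, 52, 60, 67, 80)

# Four equal-width bins spanning the observed range of ages
bins <- cut(ages, 4)

levels(bins)   # interval labels chosen by cut()
table(bins)    # number of observations falling in each bin
```

Because the breakpoints depend on the range of the vector passed in, the prediction grid should cover the same range as the training data, so that the bins line up.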
- This question relates to the College data set.
- Split the data into a training set and a test set. Using
out-of-state tuition as the response and the other variables as the
predictors, perform forward stepwise selection on the training set in
order to identify a satisfactory model that uses just a subset of the
predictors.
library(ISLR)
library(leaps)
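set.seed(1)
# Assumed split: the split size is not shown, so half the rows are used for training
train <- sample(nrow(College), nrow(College) %/% 2)
test <- -train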
fit <- regsubsets(Outstate ~ ., data = College, subset = train, method = 'forward')
fit.summary <- summary(fit)
## Subset selection object
## Call: regsubsets.formula(Outstate ~ ., data = College, subset = train,
## method = "forward")
## 17 Variables (and intercept)
## Forced in Forced out
## PrivateYes FALSE FALSE
## Apps FALSE FALSE
## Accept FALSE FALSE
## Enroll FALSE FALSE
## Top10perc FALSE FALSE
## Top25perc FALSE FALSE
## 1 ( 1 ) " " "*" " " " " " " " " " "
## 2 ( 1 ) " " "*" " " " " " " " " " "
## 3 ( 1 ) " " "*" " " " " " " " " " "
## 4 ( 1 ) " " "*" " " " " " " " " " "
## 5 ( 1 ) " " "*" " " " " "*" " " " "
coef(fit, id = 6)
## (Intercept) PrivateYes Room.Board PhD perc.alumni
## -3815.6574509 2880.3858979 0.9861841 43.6735045 40.4602197
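The solution inspects a six-variable model without showing how that size was chosen. One common approach, sketched here under assumptions, is to compare Cp, BIC, and adjusted R² across model sizes. The half-and-half split, `nvmax = 17`, and the names `fwd`, `fwd.summary`, and `best.size` are illustrative choices and not taken from the original.

```r
library(ISLR)
library(leaps)

set.seed(1)
# Assumed split: half of the rows for training
train <- sample(nrow(College), nrow(College) %/% 2)

# nvmax = 17 (an assumption) lets the search run through all 17 predictors
fwd <- regsubsets(Outstate ~ ., data = College, subset = train,
                  method = "forward", nvmax = 17)
fwd.summary <- summary(fwd)

# Model size favoured by each criterion
best.size <- c(cp    = which.min(fwd.summary$cp),
               bic   = which.min(fwd.summary$bic),
               adjr2 = which.max(fwd.summary$adjr2))
print(best.size)
```

The criteria often disagree on the exact minimum, so a smaller model whose score is close to the best one can be a reasonable choice.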
- Fit a GAM on the training data, using out-of-state tuition as the
response and the features selected in the previous step as the
predictors. Plot the results, and explain your findings.
library(gam)
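gam.mod <- gam(Outstate ~ Private + s(Room.Board, 5) + s(Terminal, 5) + s(perc.alumni, 5) +
                 s(Expend, 5) + s(Grad.Rate, 5), data = College, subset = train)
# One panel per term, with standard-error bands
par(mfrow = c(2, 3))
plot(gam.mod, se = TRUE, col = 'blue')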
- Evaluate the model obtained on the test set, and explain the results
obtained.
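preds <- predict(gam.mod, College[test, ])
# Test-set R^2: share of held-out variance in Outstate explained by the GAM
RSS <- sum((College[test, ]$Outstate - preds)^2)
TSS <- sum((College[test, ]$Outstate - mean(College[test, ]$Outstate))^2)
1 - (RSS / TSS)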
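The quantity `1 - (RSS / TSS)` is the test-set R², where RSS is the sum of squared prediction errors and TSS is the sum of squared deviations of the held-out responses from their mean. A small sketch of the calculation on made-up numbers (the helper name `test.r2` and the toy values are hypothetical):

```r
# Hypothetical helper: R^2 of predictions against held-out responses
test.r2 <- function(actual, predicted) {
  rss <- sum((actual - predicted)^2)      # residual sum of squares
  tss <- sum((actual - mean(actual))^2)   # total sum of squares
  1 - rss / tss
}

# Toy values (made up for illustration)
actual <- c(10, 12, 14, 16)

r2.perfect <- test.r2(actual, actual)                  # exact predictions
r2.mean    <- test.r2(actual, rep(mean(actual), 4))    # always predict the mean
print(c(perfect = r2.perfect, mean.only = r2.mean))
```

Values near 1 mean the model explains most of the held-out variation; a model that does no better than predicting the mean scores 0, and on a test set the value can even be negative.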
- For which variables, if any, is there evidence of a non-linear
relationship with the response?
summary(gam.mod)
##
## Call: gam(formula = Outstate ~ Private + s(Room.Board, 5) + s(Terminal,
## 5) + s(perc.alumni, 5) + s(Expend, 5) + s(Grad.Rate, 5),
## data = College, subset = train)
## Deviance Residuals:
## Min 1Q Median 3Q Max
## -7289.5 -1004.3 18.3 1123.6 4218.8
## (Intercept)
## Private
## s(Room.Board, 5) 4 3.6201 0.006576 **
## s(Terminal, 5) 4 2.3018 0.058243 .
## s(perc.alumni, 5) 4 0.8690 0.482600
## s(Expend, 5) 4 28.0768 < 2.2e-16 ***