Problem 1

Part 1

[a] What is the answer to the nonlinear equation in the code above?

Sketch the output of the algorithm.

library("animation")
ani.options(interval=0.5)
newton.method(FUN = function(x) exp(-x)*x, init = 2, rg = c(0, 10))

The answer to the nonlinear equation in the code above is \[r \approx 10.019 \]

[b] Plot the function and provide a new initial value that will provide the

correct solution. Sketch the output of the algorithm.

newton.method(FUN = function(x) exp(-x)*x, init = .75, rg = c(0, 10))

The answer to the nonlinear equation in the code above is \[r \approx 0 \]

Part 2: Use Newton’s method to solve for the following equations: (Could not solve)

\[f_1(x_1,x_2) = 4x_1^{2}-20x_1+\frac{1}{4}x_2^2 + 8 = 0 \\ f_2(x_1,x_2) = \frac{1}{2}x_1x_2^2+2x_1-5x_2+8=0\]

#newton.method(FUN = function(x,y) 4*x*x-20*x + (1/4)*y*y+8, init = 2, rg = c(0, 10))

Problem 2 (Could not solve)

Problem 3

(a) Remove the observations for whom the salary information is unknown, and then log-transform the salaries.

library("ISLR")

First we ensure that only Salary column has NA entries.

sum(complete.cases(Hitters[, -19])) - dim(Hitters[,-19])[1]
## [1] 0

Now we can now omit na rows and log transform salary

newHitters = na.omit(Hitters)
newHitters$Salary = log(newHitters$Salary)

(b) Create a training set consisting of the first 200 observations, and a test set consisting of the remaining observations.

train = sample(1 : nrow(newHitters), 200)
trainingData = newHitters[train,]
testingData = newHitters[-train,]

(c) Perform boosting on the training set with 1,000 trees for a range of values of the shrinkage parameter ??. Produce a plot with different shrinkage values on the x-axis and the corresponding training set MSE on the y-axis.

(d) Produce a plot with different shrinkage values on the x-axis and the corresponding test set MSE on the y-axis.

lambda = seq(0, 0.05, 0.0005)
library("gbm")
## Loaded gbm 2.1.5
 train.MSE = array(0, length(lambda))
 test.MSE = array(0, length(lambda))


for (i in 1:length(lambda))
{
  boost.Hitters = gbm(Salary ~., data = newHitters[train, ], distribution = "gaussian", n.trees = 1000, 
    shrinkage = lambda[i], interaction.depth = 4)
  yhat.trainBoost = predict(boost.Hitters, trainingData, n.trees = 1000)
  yhat.testBoost = predict(boost.Hitters, testingData, n.trees = 1000)

  train.MSE[i] = mean((yhat.trainBoost - trainingData$Salary)^2)
  test.MSE[i] = mean((yhat.testBoost - testingData$Salary)^2)
}
library("ggplot2")
boost.plot = data.frame(
  lambda,
  Type = rep(c("Train", "Test")),
  Error = c(train.MSE, test.MSE)
)

#ggplot(boost.plot, aes(x=lambda, y=Error))  + xlab("Shrinkage") + ylab("MSE") + geom_line(aes(color = Type)) + #ggtitle("Shrinkage to MSE")

ggplot(data.frame(x = lambda, y = train.MSE), aes(x=x, y=y)) +xlab("Shrinkage") + ylab("MSE") + geom_line() + ggtitle("Shrinkage to Training MSE")

ggplot(data.frame(x = lambda, y = test.MSE), aes(x=x, y=y)) +xlab("Shrinkage") + ylab("MSE") + geom_line() + ggtitle("Shrinkage to Testing MSE")

(e) Compare the test MSE of boosting to the test MSE that results from applying two of the regression approaches seen in Chapters 3 and 6.

We must compare our Test MSE to the MSE of linear, lasso, and ridge regression.

library("glmnet")
## Loading required package: Matrix
## Loading required package: foreach
## Loaded glmnet 2.0-16
hit.lin = lm(Salary ~., trainingData)
lin.pred = predict(hit.lin, testingData)
lin.MSE = mean((lin.pred - testingData$Salary)^2)


x.train = model.matrix(Salary~., trainingData)
x.test = model.matrix(Salary~., testingData)
y = trainingData$Salary

hit.lasso = glmnet(x.train, y, alpha = 1)
lasso.pred = predict(hit.lasso, s = 0.01, newx = x.test)
lasso.MSE = mean((lasso.pred - testingData$Salary)^2)

hit.ridge = glmnet(x.train, y, alpha = 0)
ridge.pred = predict(hit.ridge, s = 0.01, newx = x.test)
ridge.MSE = mean((ridge.pred - testingData$Salary)^2)

Ratios of Boosting MSE to linear, lasso, and ridge regression, respectively

min(test.MSE)/lin.MSE
## [1] 0.5723911
min(test.MSE)/lasso.MSE
## [1] 0.5783447
min(test.MSE)/ridge.MSE
## [1] 0.580974

The MSE of boosting is smaller than all three forms of regression.

(f) Which variables appear to be the most important predictors in the boosted model?

boost.Hitters.opt = gbm(Salary ~., data = newHitters[train, ], distribution = "gaussian", n.trees = 1000, 
    shrinkage = lambda[which.min(test.MSE)], interaction.depth = 4)
summary(boost.Hitters.opt)

##                 var    rel.inf
## CRuns         CRuns 18.1873965
## CHits         CHits 15.6671915
## CAtBat       CAtBat 11.1131478
## CWalks       CWalks  7.1956297
## CRBI           CRBI  6.9726930
## CHmRun       CHmRun  5.6893421
## Years         Years  5.1900089
## PutOuts     PutOuts  4.8303277
## AtBat         AtBat  4.3656742
## Hits           Hits  3.6102481
## RBI             RBI  3.2517714
## Walks         Walks  3.1058116
## Assists     Assists  2.5022831
## Errors       Errors  2.4717164
## HmRun         HmRun  2.3206024
## Runs           Runs  2.0462817
## League       League  0.6669063
## Division   Division  0.4744327
## NewLeague NewLeague  0.3385349

By far the most important predictor is CAtBat, which is twice as important as the next highest predictor. The next 6 most influential predictors, by order of importance, are CHits, CRBI, CWalks, CRuns, and years. Note that all of these predictors are related to a player’s career record, and 6 of the 7 predictors are only related to offense. It appears that 1986 baseball salaries reflect a player’s skills on offensive and previous career accomplishments more than other factors.

(g) Now apply bagging to the training set. What is the test set MSE for this approach?

Test set MSE of bagging approach is

library("randomForest")
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
## 
## Attaching package: 'randomForest'
## The following object is masked from 'package:ggplot2':
## 
##     margin
p = (dim(newHitters)[2]-1)
bag.Hitters = randomForest(Salary ~., trainingData, mtry = p, importance = TRUE)
bag.predict = predict(bag.Hitters, testingData)
bag.MSE = mean((bag.predict - testingData$Salary)^2)
bag.MSE
## [1] 0.2082645

The ratio of boosting to baggin MSE’s is

min(test.MSE)/bag.MSE
## [1] 1.25437

Boosting has a lower MSE

Problem 4

(a) Create a binary variable that takes on a 1 for cars with gas mileage above the median, and a 0 for cars with gas mileage below the median.

library("e1071")
mpg.med = median(Auto$mpg)
mpg01 = Auto$mpg > mpg.med
mpg01 = mpg01 + 0

(b) Fit a support vector classifier to the data with various values of cost, in order to predict whether a car gets high or low gas mileage. Report the cross-validation errors associated with different values of this parameter. Comment on your results.

new.Auto = Auto[,2:8]
new.Auto = cbind(new.Auto, mpg01)
train2 = sample(1 : nrow(Auto), round(.75*nrow(Auto)))
train.Auto = new.Auto[train2,]
test.Auto = new.Auto[-train2,]
svm.linear = tune(svm, mpg01 ~., data =  train.Auto, kernal = "linear",
                  ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)), 
                  scale = FALSE)
best.linear = svm.linear$best.model
summary(svm.linear)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost
##     1
## 
## - best performance: 0.2445232 
## 
## - Detailed performance results:
##    cost     error  dispersion
## 1 1e-03 0.4252199 0.080749288
## 2 1e-02 0.4177696 0.079204803
## 3 1e-01 0.3533744 0.063480623
## 4 1e+00 0.2445232 0.007831161
## 5 5e+00 0.2445232 0.007831161
## 6 1e+01 0.2445232 0.007831161
## 7 1e+02 0.2445232 0.007831161
summary(best.linear)
## 
## Call:
## best.tune(method = svm, train.x = mpg01 ~ ., data = train.Auto, 
##     ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100)), 
##     kernal = "linear", scale = FALSE)
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.1428571 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  294

Our best parameter is cost = 1

(c) Now repeat (b), this time using SVMs with radial and polynomial basis kernels, with different values of gamma and degree and cost. Comment on your results.

Radial

svm.radial = tune(svm, mpg01 ~., data =  train.Auto, kernal = "radial", 
                  ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100),
                                gamma = c(0,0001, 0.001, 0.01, 0.1, 1, 5, 10, 100)),
                  scale = FALSE)
best.radial = svm.radial$best.model
summary(svm.radial)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost gamma
##     1 0.001
## 
## - best performance: 0.1220807 
## 
## - Detailed performance results:
##     cost gamma     error  dispersion
## 1  1e-03 0e+00 0.3967586 0.071644453
## 2  1e-02 0e+00 0.3967586 0.071644453
## 3  1e-01 0e+00 0.3967586 0.071644453
## 4  1e+00 0e+00 0.3967586 0.071644453
## 5  5e+00 0e+00 0.3967586 0.071644453
## 6  1e+01 0e+00 0.3967586 0.071644453
## 7  1e+02 0e+00 0.3967586 0.071644453
## 8  1e-03 1e+00 0.3963660 0.071550509
## 9  1e-02 1e+00 0.3897190 0.070108426
## 10 1e-01 1e+00 0.3333770 0.056751163
## 11 1e+00 1e+00 0.2483648 0.004822997
## 12 5e+00 1e+00 0.2483648 0.004822997
## 13 1e+01 1e+00 0.2483648 0.004822997
## 14 1e+02 1e+00 0.2483648 0.004822997
## 15 1e-03 1e-03 0.3921360 0.071166008
## 16 1e-02 1e-03 0.3518972 0.066979573
## 17 1e-01 1e-03 0.1507929 0.037289759
## 18 1e+00 1e-03 0.1220807 0.036534853
## 19 5e+00 1e-03 0.1283168 0.039156983
## 20 1e+01 1e-03 0.1287278 0.039185033
## 21 1e+02 1e-03 0.1314902 0.038770947
## 22 1e-03 1e-02 0.3958758 0.071684205
## 23 1e-02 1e-02 0.3860060 0.070620308
## 24 1e-01 1e-02 0.3031015 0.058739764
## 25 1e+00 1e-02 0.2055700 0.017882452
## 26 5e+00 1e-02 0.2055651 0.017897787
## 27 1e+01 1e-02 0.2055651 0.017897787
## 28 1e+02 1e-02 0.2055651 0.017897787
## 29 1e-03 1e-01 0.3963357 0.071560378
## 30 1e-02 1e-01 0.3894888 0.070030269
## 31 1e-01 1e-01 0.3314138 0.056054427
## 32 1e+00 1e-01 0.2442898 0.008302937
## 33 5e+00 1e-01 0.2442898 0.008302937
## 34 1e+01 1e-01 0.2442898 0.008302937
## 35 1e+02 1e-01 0.2442898 0.008302937
## 36 1e-03 1e+00 0.3963660 0.071550509
## 37 1e-02 1e+00 0.3897190 0.070108426
## 38 1e-01 1e+00 0.3333770 0.056751163
## 39 1e+00 1e+00 0.2483648 0.004822997
## 40 5e+00 1e+00 0.2483648 0.004822997
## 41 1e+01 1e+00 0.2483648 0.004822997
## 42 1e+02 1e+00 0.2483648 0.004822997
## 43 1e-03 5e+00 0.3963753 0.071554829
## 44 1e-02 5e+00 0.3898204 0.070170175
## 45 1e-01 5e+00 0.3343971 0.057165240
## 46 1e+00 5e+00 0.2510208 0.002816000
## 47 5e+00 5e+00 0.2510208 0.002816000
## 48 1e+01 5e+00 0.2510208 0.002816000
## 49 1e+02 5e+00 0.2510208 0.002816000
## 50 1e-03 1e+01 0.3963754 0.071554897
## 51 1e-02 1e+01 0.3898227 0.070170684
## 52 1e-01 1e+01 0.3344163 0.057172622
## 53 1e+00 1e+01 0.2510799 0.002776242
## 54 5e+00 1e+01 0.2510799 0.002776242
## 55 1e+01 1e+01 0.2510799 0.002776242
## 56 1e+02 1e+01 0.2510799 0.002776242
## 57 1e-03 1e+02 0.3963754 0.071554897
## 58 1e-02 1e+02 0.3898227 0.070170686
## 59 1e-01 1e+02 0.3344164 0.057172665
## 60 1e+00 1e+02 0.2510808 0.002772573
## 61 5e+00 1e+02 0.2510808 0.002772573
## 62 1e+01 1e+02 0.2510808 0.002772573
## 63 1e+02 1e+02 0.2510808 0.002772573
summary(best.radial)
## 
## Call:
## best.tune(method = svm, train.x = mpg01 ~ ., data = train.Auto, 
##     ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100), 
##         gamma = c(0, 1, 0.001, 0.01, 0.1, 1, 5, 10, 100)), kernal = "radial", 
##     scale = FALSE)
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.001 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  237

Best Parameters for radial svm are cost = 1 and gamma = 0.001

Polynomial

svm.polynomial = tune(svm, mpg01 ~., data =  train.Auto, kernal = "polynomial", 
                  ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100),
                                degree = c(2, 3, 4, 5, 6, 7)),
                  scale = FALSE)
best.polynomial = svm.polynomial$best.model
summary(svm.polynomial)
## 
## Parameter tuning of 'svm':
## 
## - sampling method: 10-fold cross validation 
## 
## - best parameters:
##  cost degree
##     1      2
## 
## - best performance: 0.2438683 
## 
## - Detailed performance results:
##     cost degree     error  dispersion
## 1  1e-03      2 0.3937784 0.081976021
## 2  1e-02      2 0.3871832 0.079823465
## 3  1e-01      2 0.3310817 0.060934772
## 4  1e+00      2 0.2438683 0.005482859
## 5  5e+00      2 0.2438683 0.005482859
## 6  1e+01      2 0.2438683 0.005482859
## 7  1e+02      2 0.2438683 0.005482859
## 8  1e-03      3 0.3937784 0.081976021
## 9  1e-02      3 0.3871832 0.079823465
## 10 1e-01      3 0.3310817 0.060934772
## 11 1e+00      3 0.2438683 0.005482859
## 12 5e+00      3 0.2438683 0.005482859
## 13 1e+01      3 0.2438683 0.005482859
## 14 1e+02      3 0.2438683 0.005482859
## 15 1e-03      4 0.3937784 0.081976021
## 16 1e-02      4 0.3871832 0.079823465
## 17 1e-01      4 0.3310817 0.060934772
## 18 1e+00      4 0.2438683 0.005482859
## 19 5e+00      4 0.2438683 0.005482859
## 20 1e+01      4 0.2438683 0.005482859
## 21 1e+02      4 0.2438683 0.005482859
## 22 1e-03      5 0.3937784 0.081976021
## 23 1e-02      5 0.3871832 0.079823465
## 24 1e-01      5 0.3310817 0.060934772
## 25 1e+00      5 0.2438683 0.005482859
## 26 5e+00      5 0.2438683 0.005482859
## 27 1e+01      5 0.2438683 0.005482859
## 28 1e+02      5 0.2438683 0.005482859
## 29 1e-03      6 0.3937784 0.081976021
## 30 1e-02      6 0.3871832 0.079823465
## 31 1e-01      6 0.3310817 0.060934772
## 32 1e+00      6 0.2438683 0.005482859
## 33 5e+00      6 0.2438683 0.005482859
## 34 1e+01      6 0.2438683 0.005482859
## 35 1e+02      6 0.2438683 0.005482859
## 36 1e-03      7 0.3937784 0.081976021
## 37 1e-02      7 0.3871832 0.079823465
## 38 1e-01      7 0.3310817 0.060934772
## 39 1e+00      7 0.2438683 0.005482859
## 40 5e+00      7 0.2438683 0.005482859
## 41 1e+01      7 0.2438683 0.005482859
## 42 1e+02      7 0.2438683 0.005482859
summary(best.polynomial)
## 
## Call:
## best.tune(method = svm, train.x = mpg01 ~ ., data = train.Auto, 
##     ranges = list(cost = c(0.001, 0.01, 0.1, 1, 5, 10, 100), 
##         degree = c(2, 3, 4, 5, 6, 7)), kernal = "polynomial", 
##     scale = FALSE)
## 
## 
## Parameters:
##    SVM-Type:  eps-regression 
##  SVM-Kernel:  radial 
##        cost:  1 
##       gamma:  0.1428571 
##     epsilon:  0.1 
## 
## 
## Number of Support Vectors:  294

Best parameters are cost = 1 and degree = 2

(d) Could not manage to work the plot function