3. Consider the Gini index, classification error, and cross-entropy in a simple classification setting with two classes. Create a single plot that displays each of these quantities as a function of p̂m1. The x-axis should display p̂m1, ranging from 0 to 1, and the y-axis should display the value of the Gini index, classification error, and entropy.

p = seq(0, 1, 0.01)

gini = 2 * p * (1 - p)
classerror = 1 - pmax(p, 1 - p)
crossentropy = -(p * log(p) + (1 - p) * log(1 - p))  # NaN at p = 0 and 1; those points are simply dropped

matplot(p, cbind(gini, classerror, crossentropy), type = "l", lty = 1,
        col = c("red", "green", "blue"), xlab = expression(hat(p)[m1]), ylab = "Value")
legend("top", legend = c("Gini index", "Classification error", "Cross-entropy"),
       col = c("red", "green", "blue"), lty = 1)

8. In the lab, a classification tree was applied to the Carseats data set after converting Sales into a qualitative response variable. Now we will seek to predict Sales using regression trees and related approaches, treating the response as a quantitative variable.

(a)

Split the data set into a training set and a test set.

library(ISLR)
## Warning: package 'ISLR' was built under R version 4.0.3
library(tree)
## Warning: package 'tree' was built under R version 4.0.4
library(randomForest)
## Warning: package 'randomForest' was built under R version 4.0.5
## randomForest 4.6-14
## Type rfNews() to see new features/changes/bug fixes.
library(caret)
## Warning: package 'caret' was built under R version 4.0.3
## Loading required package: lattice
## Loading required package: ggplot2
## 
## Attaching package: 'ggplot2'
## The following object is masked from 'package:randomForest':
## 
##     margin
#View(Carseats)
set.seed(1)
index = sample(1:nrow(Carseats), 0.75*nrow(Carseats))
train.carseats = Carseats[index,]
test.carseats = Carseats[-index,]

(b)

Fit a regression tree to the training set. Plot the tree, and interpret the results. What test MSE do you obtain?

Looking at the plot, the most important variable is ShelveLoc, with the first split separating the Bad and Medium shelf locations from Good. The next most important variable appears to be Price. Computing the test MSE, we obtain an error of about 4.91.

tree.carseats = tree(Sales~ ., data = train.carseats)
plot(tree.carseats)
text(tree.carseats, pretty = 0, cex = .7)

tree.predict <- predict(tree.carseats, newdata = test.carseats)
mean((tree.predict - test.carseats$Sales)^2)
## [1] 4.910268
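
As a rough point of reference (not part of the original answer), this test MSE could be compared with the error from simply predicting the training-set mean of Sales for every test observation:

# Baseline MSE of predicting the training mean for all test observations (a sketch)
mean((mean(train.carseats$Sales) - test.carseats$Sales)^2)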

(c)

Use cross-validation in order to determine the optimal level of tree complexity. Does pruning the tree improve the test MSE?

Note that which.min(cv.carseats$size) returns the position of the smallest tree in the pruning sequence (17 here), not the optimal size, so we instead plot the cross-validated deviance against tree size and read the optimal level off the plot. A tree with 7 terminal nodes looks like a reasonable choice, as it is roughly the lowest point before the curve begins to increase.
After pruning to 7 terminal nodes, the test MSE is 5.10, so pruning does not improve on the unpruned tree's 4.91.

cv.carseats = cv.tree(tree.carseats)
#cv.carseats
# This gives the position of the smallest tree in the sequence, not the best size;
# the optimal size is chosen from the deviance plot below
which.min(cv.carseats$size)
## [1] 17
plot(cv.carseats$size, cv.carseats$dev, type = 'b')

prune.carseats = prune.tree(tree.carseats, best = 7)
plot(prune.carseats)
text(prune.carseats, pretty = 0)

prune.predict <- predict(prune.carseats, newdata = test.carseats)
mean((prune.predict - test.carseats$Sales)^2)
## [1] 5.104051
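
As an exploratory sketch (not in the original output), one could compute the test MSE for every size in the pruning sequence to check whether any pruned tree beats the full tree:

# Test MSE across the pruning sequence (excluding the single-node tree)
sizes = sort(unique(cv.carseats$size[cv.carseats$size > 1]))
prune.mse = sapply(sizes, function(s) {
  pr = prune.tree(tree.carseats, best = s)
  mean((predict(pr, newdata = test.carseats) - test.carseats$Sales)^2)
})
plot(sizes, prune.mse, type = "b", xlab = "Tree size", ylab = "Test MSE")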

(d)

Use the bagging approach in order to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important.

Using randomForest() with mtry = 10 (all predictors, i.e. bagging), we get the lowest test MSE so far, 2.82. The importance() output shows that the most important variable is ShelveLoc, followed by Price.

set.seed(1)
rf.carseats = randomForest(Sales ~ ., data = train.carseats, mtry = 10, ntree = 500, importance = TRUE)
predict.bag = predict(rf.carseats, newdata = test.carseats)
mean((predict.bag - test.carseats$Sales)^2)
## [1] 2.820284
importance(rf.carseats)
##                %IncMSE IncNodePurity
## CompPrice   36.1922541     243.46553
## Income       8.1825801     129.36706
## Advertising 21.7200717     170.81770
## Population  -2.9629193      70.42358
## Price       69.0884762     701.82750
## ShelveLoc   78.6820621     691.74585
## Age         26.6644544     248.85675
## Education    4.0907048      56.33708
## Urban        0.1344484      10.43790
## US           1.8235528      11.35347
varImpPlot(rf.carseats)
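
To make that ordering explicit, the importance matrix can be sorted by the %IncMSE column (imp.bag is just a helper name introduced here, not from the original output):

imp.bag = importance(rf.carseats)
imp.bag[order(imp.bag[, "%IncMSE"], decreasing = TRUE), ]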

(e)

Use random forests to analyze this data. What test MSE do you obtain? Use the importance() function to determine which variables are most important. Describe the effect of m, the number of variables considered at each split, on the error rate obtained.

We first used caret to tune mtry; bestTune selected 11, which is possible because caret expands the factor predictors into dummy variables and therefore sees 11 columns. Passing mtry = 11 directly to randomForest() on the raw data (which has only 10 predictors) produced an error, so we instead used mtry = 6, which had nearly identical cross-validated performance. With mtry = 6 the test MSE is 2.79, slightly lower than the bagging result in part (d), and importance() again ranks ShelveLoc and Price as the top two variables. In general, reducing m decorrelates the trees: the caret results show the error dropping sharply from m = 2 to m = 6 and then levelling off, and on the test set m = 6 edges out full bagging (m = 10). (An exploratory sketch that sweeps over m follows the importance output below.)

carseat.rf = train(Sales ~ ., data = train.carseats, method = 'rf',
                   trControl = trainControl('cv', number = 10), importance = TRUE)
carseat.rf$results
##   mtry     RMSE  Rsquared      MAE    RMSESD RsquaredSD     MAESD
## 1    2 1.873057 0.6763707 1.525785 0.2512151 0.09254889 0.2105758
## 2    6 1.599387 0.7104705 1.279808 0.2013524 0.08760461 0.1696911
## 3   11 1.556582 0.7100990 1.246339 0.2136924 0.08545993 0.1749030
carseat.rf$bestTune
##   mtry
## 3   11
carseat.rf$finalModel
## 
## Call:
##  randomForest(x = x, y = y, mtry = param$mtry, importance = ..1) 
##                Type of random forest: regression
##                      Number of trees: 500
## No. of variables tried at each split: 11
## 
##           Mean of squared residuals: 2.395651
##                     % Var explained: 69.86
set.seed(1)
rf.carseats = randomForest(Sales ~ ., data = train.carseats, mtry = 6, ntree = 500, importance = TRUE)
predict.bag = predict(rf.carseats, newdata = test.carseats)
mean((predict.bag - test.carseats$Sales)^2)
## [1] 2.793601
importance(rf.carseats)
##                %IncMSE IncNodePurity
## CompPrice   27.2576997     228.07002
## Income       7.9502187     141.21172
## Advertising 20.0426577     179.95941
## Population  -2.5580791      92.72272
## Price       64.7466610     660.69802
## ShelveLoc   70.4708127     651.12878
## Age         23.1915453     262.61350
## Education    1.5418740      69.42682
## Urban       -0.1863153      13.29967
## US           4.6346726      19.22371
varImpPlot(rf.carseats)
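
To illustrate the effect of m more directly, here is an exploratory sketch (not part of the original output) that refits the random forest for each value of mtry from 1 to 10 and records the test MSE:

set.seed(1)
mtry.values = 1:10   # Carseats has 10 predictors
mtry.mse = sapply(mtry.values, function(m) {
  fit = randomForest(Sales ~ ., data = train.carseats, mtry = m, ntree = 500)
  mean((predict(fit, newdata = test.carseats) - test.carseats$Sales)^2)
})
plot(mtry.values, mtry.mse, type = "b", xlab = "m (mtry)", ylab = "Test MSE")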

9. This problem involves the OJ data set which is part of the ISLR package.

(a)

Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.

set.seed(1)
index = 1:800  # note: this takes the first 800 rows; a truly random sample would use sample(nrow(OJ), 800)
oj.train = OJ[index,]
oj.test = OJ[-index,]

(b)

Fit a tree to the training data, with Purchase as the response and the other variables as predictors. Use the summary() function to produce summary statistics about the tree, and describe the results obtained. What is the training error rate? How many terminal nodes does the tree have?

The summary of the tree fit to the OJ training data shows that only LoyalCH and PriceDiff are used in the splits, the tree has 7 terminal nodes, and the training error rate is 16.5% (132/800).

oj.tree = tree(Purchase ~., data = oj.train)
summary(oj.tree)
## 
## Classification tree:
## tree(formula = Purchase ~ ., data = oj.train)
## Variables actually used in tree construction:
## [1] "LoyalCH"   "PriceDiff"
## Number of terminal nodes:  7 
## Residual mean deviance:  0.7342 = 582.3 / 793 
## Misclassification error rate: 0.165 = 132 / 800
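
As a quick check (not in the original output), the same training error rate can be computed directly from predictions on the training set:

mean(predict(oj.tree, oj.train, type = "class") != oj.train$Purchase)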

(c)

Type in the name of the tree object in order to get a detailed text output. Pick one of the terminal nodes, and interpret the information displayed.

Take terminal node 8, printed as "8) LoyalCH < 0.035047 62 0.00 MM ( 0.00000 1.00000 ) *": it contains the 62 training observations with LoyalCH below 0.035047, its deviance is 0, the predicted class is MM, and 100% of the observations in the node are MM (the asterisk marks it as a terminal node).

oj.tree
## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
##  1) root 800 1062.00 CH ( 0.62125 0.37875 )  
##    2) LoyalCH < 0.5036 348  413.70 MM ( 0.28161 0.71839 )  
##      4) LoyalCH < 0.276142 164  113.50 MM ( 0.10976 0.89024 )  
##        8) LoyalCH < 0.035047 62    0.00 MM ( 0.00000 1.00000 ) *
##        9) LoyalCH > 0.035047 102   95.06 MM ( 0.17647 0.82353 ) *
##      5) LoyalCH > 0.276142 184  251.90 MM ( 0.43478 0.56522 )  
##       10) PriceDiff < 0.05 75   77.75 MM ( 0.21333 0.78667 ) *
##       11) PriceDiff > 0.05 109  147.80 CH ( 0.58716 0.41284 ) *
##    3) LoyalCH > 0.5036 452  326.70 CH ( 0.88274 0.11726 )  
##      6) LoyalCH < 0.753545 172  188.90 CH ( 0.76163 0.23837 )  
##       12) PriceDiff < 0.145 67   92.15 CH ( 0.55224 0.44776 ) *
##       13) PriceDiff > 0.145 105   70.44 CH ( 0.89524 0.10476 ) *
##      7) LoyalCH > 0.753545 280   99.08 CH ( 0.95714 0.04286 ) *

(d)

Create a plot of the tree, and interpret the results.

From the plot we see that the most important variable is LoyalCH: the root split is on LoyalCH, and both branches split again on LoyalCH before PriceDiff is used further down the tree.

plot(oj.tree)
text(oj.tree, pretty = 0)

(e)

Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?

We obtain a test error rate of 21.85% ((44 + 15)/270).

oj.pred = predict(oj.tree, oj.test, type = "class")
table(oj.pred, oj.test$Purchase)
##        
## oj.pred  CH  MM
##      CH 141  44
##      MM  15  70
1-(141+70)/270
## [1] 0.2185185
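
Equivalently (a small sketch, not in the original output), the test error rate can be computed directly:

mean(oj.pred != oj.test$Purchase)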

(f)

Apply the cv.tree() function to the training set in order to determine the optimal tree size.

which.min() selects a tree size of 7, although size 4 ties it with the same cross-validated deviance of 152.

oj.cv = cv.tree(oj.tree, FUN = prune.misclass)
oj.cv
## $size
## [1] 7 4 2 1
## 
## $dev
## [1] 152 152 157 303
## 
## $k
## [1]  -Inf   0.0   9.5 152.0
## 
## $method
## [1] "misclass"
## 
## attr(,"class")
## [1] "prune"         "tree.sequence"
bestsize = oj.cv$size[which.min(oj.cv$dev)]
bestsize
## [1] 7

(g)

Produce a plot with tree size on the x-axis and cross-validated classification error rate on the y-axis.

plot(oj.cv$size, oj.cv$dev, type = "b", xlab = "Tree size", ylab = "Deviance")
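
Since cv.tree() with prune.misclass reports dev as the number of cross-validated misclassifications, dividing by the training-set size gives the error rate the question asks for (a sketch, not in the original output):

plot(oj.cv$size, oj.cv$dev / nrow(oj.train), type = "b",
     xlab = "Tree size", ylab = "CV classification error rate")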

(h)

Which tree size corresponds to the lowest cross-validated classification error rate?

Sizes 7 and 4 are tied for the lowest cross-validated error (152 misclassifications); looking at the plot, 4 is the smallest tree size that achieves it.

(i)

Produce a pruned tree corresponding to the optimal tree size obtained using cross-validation. If cross-validation does not lead to selection of a pruned tree, then create a pruned tree with five terminal nodes.

# Cross-validation does not favour a smaller tree, so best = 7 returns the full tree
# (the exercise's fallback of five terminal nodes would correspond to best = 5)
oj.prune = prune.misclass(oj.tree, best = 7)
plot(oj.prune)
text(oj.prune, pretty = 0)

(j)

Compare the training error rates between the pruned and unpruned trees. Which is higher?

Both trees have a training error rate of 16.5%; since pruning with best = 7 returns the full tree, the pruned and unpruned trees are identical.

summary(oj.prune)
## 
## Classification tree:
## tree(formula = Purchase ~ ., data = oj.train)
## Variables actually used in tree construction:
## [1] "LoyalCH"   "PriceDiff"
## Number of terminal nodes:  7 
## Residual mean deviance:  0.7342 = 582.3 / 793 
## Misclassification error rate: 0.165 = 132 / 800
summary(oj.tree)
## 
## Classification tree:
## tree(formula = Purchase ~ ., data = oj.train)
## Variables actually used in tree construction:
## [1] "LoyalCH"   "PriceDiff"
## Number of terminal nodes:  7 
## Residual mean deviance:  0.7342 = 582.3 / 793 
## Misclassification error rate: 0.165 = 132 / 800

(k)

Compare the test error rates between the pruned and unpruned trees. Which is higher?

Again, the pruned and unpruned trees give the same test error rate of 21.85%, because the pruned tree is identical to the original.

pred.prune <- predict(oj.prune, oj.test, type = "class")
table(pred.prune, oj.test$Purchase)
##           
## pred.prune  CH  MM
##         CH 141  44
##         MM  15  70
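
For completeness (a sketch, not in the original output), the pruned tree's test error rate can be computed directly for comparison with part (e):

mean(pred.prune != oj.test$Purchase)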