8) In the lab, a classification tree was applied to the Carseats
data set after converting Sales into a qualitative response variable.
Now we will seek to predict Sales using regression trees and related
approaches, treating the response as a quantitative variable.
a) Split the data set into a training set and a test set.
library(ISLR)
set.seed(1)
library(dplyr)
Default = as.data.frame(Carseats)
Default$id <- 1:nrow(Default)
#use 70% of dataset as training set and 30% as test set
train <- Default %>% dplyr::sample_frac(0.70)
test <- dplyr::anti_join(Default, train, by = 'id')
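One side effect of this split is that the helper id column remains in train and test, so later fits that use the formula Sales ~ . treat it as an extra predictor. A minimal alternative sketch (assumed, not part of the original workflow) that samples row indices directly and leaves no helper column behind:
# Assumed alternative split: sample 70% of the row indices directly, so no
# helper id column is carried into the model data.
set.seed(1)
train_idx <- sample(nrow(Carseats), size = round(0.7 * nrow(Carseats)))
train_alt <- Carseats[train_idx, ]
test_alt <- Carseats[-train_idx, ]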
b) Fit a regression tree to the training set. Plot the tree, and
interpret the results. What test MSE do you obtain?
library(tree)
tree.carseats = tree(Sales ~., data = train)
summary(tree.carseats)
##
## Regression tree:
## tree(formula = Sales ~ ., data = train)
## Variables actually used in tree construction:
## [1] "ShelveLoc" "Price" "Age" "Income" "CompPrice"
## [6] "Advertising"
## Number of terminal nodes: 18
## Residual mean deviance: 2.409 = 631.1 / 262
## Distribution of residuals:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -4.77800 -0.96100 -0.08865 0.00000 1.01800 4.14100
plot(tree.carseats)
text(tree.carseats, pretty = 0)

pred.carseats = predict(tree.carseats, test)
mean((test$Sales - pred.carseats)^2)
## [1] 4.208383
The fitted tree has 18 terminal nodes and uses six predictors: ShelveLoc, Price, Age, Income, CompPrice, and Advertising. The test MSE is 4.208383.
c) Use CV to determine the optimal level of tree complexity. Does
pruning improve the test MSE?
cv.carseats = cv.tree(tree.carseats, FUN = prune.tree)
par(mfrow = c(1,2))
plot(cv.carseats$size, cv.carseats$dev, type = "b")
plot(cv.carseats$k, cv.carseats$dev, type = "b")
The optimal level of tree complexity appears to be 9 terminal nodes, as shown in the left-hand plot of deviance against tree size.
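As a quick numerical check (not part of the original output), the size that minimizes the cross-validated deviance can be read off the cv.tree object directly:
# Size with the smallest cross-validated deviance.
cv.carseats$size[which.min(cv.carseats$dev)]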
pruned_carseat = prune.tree(tree.carseats, best = 9)
par(mfrow = c(1,1))
plot(pruned_carseat)
text(pruned_carseat, pretty = 0)

pred.pruned = predict(pruned_carseat, test)
mean((test$Sales - pred.pruned)^2)
## [1] 4.469155
Pruning the tree to 9 terminal nodes increases the test MSE from 4.208383 to 4.469155, so pruning does not improve test performance here.
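To confirm that no pruned size does better on the test set, a short sketch (assumed, not run in the original) that computes the test MSE for every pruned tree size:
# Assumed check: test MSE for pruned trees with 2 through 18 terminal nodes.
sizes <- 2:18
prune_mse <- sapply(sizes, function(b) {
  pruned <- prune.tree(tree.carseats, best = b)
  mean((test$Sales - predict(pruned, test))^2)
})
plot(sizes, prune_mse, type = "b", xlab = "terminal nodes", ylab = "test MSE")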
d) Use the bagging approach in order to analyze this data. What test
MSE do you obtain? Use the importance() function to determine which
variables are most important.
library(randomForest)
set.seed(1)
bag_carseats = randomForest(Sales ~., data = train, mtry = 10, ntree = 500,
importance = T)
bag_pred = predict(bag_carseats, test)
mean((test$Sales - bag_pred)^2)
## [1] 2.6556
importance(bag_carseats)
## %IncMSE IncNodePurity
## CompPrice 31.55532152 207.837522
## Income 7.45762172 111.079571
## Advertising 18.35305002 146.681392
## Population -1.89970692 56.677429
## Price 67.16063167 673.680733
## ShelveLoc 70.20125832 626.848378
## Age 22.23144324 218.387615
## Education 3.89034598 57.952595
## Urban 0.53554064 8.712999
## US 1.12101065 11.748103
## id -0.09695973 84.170500
varImpPlot(bag_carseats)

Bagging improves the test MSE to 2.6556. Price, ShelveLoc, and Age are the three most important variables.
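For a quicker read of the importance table, the %IncMSE column can be sorted directly (a convenience step, not in the original code):
# Rank predictors in the bagged model by %IncMSE, largest first.
sort(importance(bag_carseats)[, "%IncMSE"], decreasing = TRUE)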
e) Use random forests to analyze this data. What test MSE do you
obtain? Use the importance() function to determine which variables are
the most important. Describe the effect of m, the number of variables
considered at each split, on the error rate obtained.
set.seed(1)
rf_carseats_5 = randomForest(Sales ~., data = train, mtry = 5, ntree = 500, importance = TRUE)
rf_carseats_3 = randomForest(Sales ~., data = train, mtry = 3, ntree = 500, importance = TRUE)
rf_pred = predict(rf_carseats_5, test)
mean((test$Sales - rf_pred)^2)
## [1] 2.690526
rf_pred = predict(rf_carseats_3, test)
mean((test$Sales - rf_pred)^2)
## [1] 3.010083
importance(rf_carseats_5)
## %IncMSE IncNodePurity
## CompPrice 21.70565002 197.57862
## Income 4.57550329 135.20068
## Advertising 13.34407435 146.70369
## Population -2.71698695 89.51997
## Price 52.65151194 595.66263
## ShelveLoc 52.69896388 548.96867
## Age 17.52465467 243.60878
## Education 0.01985629 70.03573
## Urban 0.29586280 14.02564
## US 4.86655108 20.41531
## id 2.87123791 121.33156
importance(rf_carseats_3)
## %IncMSE IncNodePurity
## CompPrice 14.9221640 193.26453
## Income 5.9442624 152.28503
## Advertising 12.0133913 147.95287
## Population -0.3075682 117.27630
## Price 42.6311294 522.10468
## ShelveLoc 42.9990603 456.65491
## Age 15.4422646 250.03706
## Education 2.9161029 86.98659
## Urban -0.8761880 18.63114
## US 4.6393399 28.73869
## id 2.5245843 162.87162
par(mfrow = c(1,4))
varImpPlot(rf_carseats_5)

varImpPlot(rf_carseats_3)

The test MSE with mtry = 5 is 2.690526, slightly higher than the bagging MSE from part (d) (2.6556), and the MSE with mtry = 3 is 3.010083.
Changing m changes the test MSE, which ranges from roughly 2.5 to 3.1 here; mtry = 5 tends to give a slightly lower test MSE than smaller values (mtry = 1 to 4).
Price, ShelveLoc, and Age are once again the three most important variables.
Overall, among the methods in this question so far, bagging gives the lowest test MSE, followed closely by the random forest; the single pruned tree performs worst.
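To make the effect of m concrete, a sketch (assumed, not run in the original) that fits a random forest for each value of mtry and records the resulting test MSE:
# Assumed sketch: test MSE as a function of mtry (m), with ntree held fixed.
set.seed(1)
mtry_vals <- 1:10
mtry_mse <- sapply(mtry_vals, function(m) {
  rf <- randomForest(Sales ~ ., data = train, mtry = m, ntree = 500)
  mean((test$Sales - predict(rf, test))^2)
})
plot(mtry_vals, mtry_mse, type = "b", xlab = "mtry", ylab = "test MSE")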
f) Analyze the data using BART, and report your results.
library(BART)
xtrain <- train[, 1:10]
ytrain <- train$Sales
xtest <- test[, 1:10]
bart_fit <- gbart(xtrain, ytrain, x.test = xtest)
## *****Calling gbart: type=1
## *****Data:
## data:n,p,np: 280, 13, 120
## y1,yn: 2.763393, 0.113393
## x1,x[n*p]: 10.360000, 1.000000
## xp1,xp[np*p]: 10.060000, 1.000000
## *****Number of Trees: 200
## *****Number of Cut Points: 100 ... 1
## *****burn,nd,thin: 100,1000,1
## *****Prior:beta,alpha,tau,nu,lambda,offset: 2,0.95,0.276302,3,3.82641e-30,7.59661
## *****sigma: 0.000000
## *****w (weights): 1.000000 ... 1.000000
## *****Dirichlet:sparse,theta,omega,a,b,rho,augment: 0,0,1,0.5,1,13,0
## *****printevery: 100
##
## MCMC
## time: 3s
## trcnt,tecnt: 1000,1000
summary(bart_fit)
## Length Class Mode
## sigma 1100 -none- numeric
## yhat.train 280000 -none- numeric
## yhat.test 120000 -none- numeric
## varcount 13000 -none- numeric
## varprob 13000 -none- numeric
## treedraws 2 -none- list
## proc.time 5 proc_time numeric
## hostname 1 -none- logical
## yhat.train.mean 280 -none- numeric
## sigma.mean 1 -none- numeric
## LPML 1 -none- numeric
## yhat.test.mean 120 -none- numeric
## ndpost 1 -none- numeric
## offset 1 -none- numeric
## varcount.mean 13 -none- numeric
## varprob.mean 13 -none- numeric
## rm.const 13 -none- numeric
bart_pred = bart_fit$yhat.test.mean
mse = mean((bart_pred - test$Sales)^2)
mse
## [1] 0.1433705
Here we use BART. The model produces a test MSE of 0.1433705, which is much lower than that of any of the other methods in this question.
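One caveat about the code above: Sales is the first column of Carseats, so xtrain <- train[, 1:10] keeps the response itself in the predictor matrix (and drops US), which likely explains why the BART test MSE is so much lower than the others. A corrected sketch (assumed fix, not run here) that excludes Sales and the helper id column:
# Assumed fix: build predictor matrices without the response (Sales) or the
# helper id column, then refit BART and recompute the test MSE.
xtrain2 <- train[, !(names(train) %in% c("Sales", "id"))]
xtest2 <- test[, !(names(test) %in% c("Sales", "id"))]
bart_fit2 <- gbart(xtrain2, train$Sales, x.test = xtest2)
mean((bart_fit2$yhat.test.mean - test$Sales)^2)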
9) This problem involves the OJ data set, which is part of the ISLR2 package.
a) Create a training set containing a random sample of 800 observations, and a test set containing the remaining observations.
library(ISLR)
set.seed(1)
library(dplyr)
Default = as.data.frame(OJ)
Default$id <- 1:nrow(Default)
#use 800/1070 = 74.76635514% of the dataset as the training set and the rest as the test set
train <- Default %>% dplyr::sample_frac(0.7476635514)
test <- dplyr::anti_join(Default, train, by = 'id')
dim(train)
## [1] 800 19
dim(test)
## [1] 270 19
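An equivalent and more direct way to obtain exactly 800 training observations (an assumed alternative, not the original code) is to sample row indices:
# Assumed alternative: sample exactly 800 rows of OJ for training.
set.seed(1)
train_idx <- sample(nrow(OJ), 800)
oj_train <- OJ[train_idx, ]
oj_test <- OJ[-train_idx, ]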
b) Fit a tree to the training data, with Purchase as the response
and the other variables as predictors. Use the summary() function to
produce summary statistics about the tree, and describe the results. What is
the training error rate? How many terminal nodes does the tree have?
library(tree)
oj_tree = tree(Purchase ~., data = train)
summary(oj_tree)
##
## Classification tree:
## tree(formula = Purchase ~ ., data = train)
## Variables actually used in tree construction:
## [1] "LoyalCH" "PriceDiff" "SpecialCH" "ListPriceDiff"
## [5] "PctDiscMM" "id"
## Number of terminal nodes: 10
## Residual mean deviance: 0.7286 = 575.6 / 790
## Misclassification error rate: 0.1588 = 127 / 800
The tree uses six variables: LoyalCH, PriceDiff, SpecialCH, ListPriceDiff, PctDiscMM, and id (the row-index column added for the train/test split, which ideally would not be offered to the tree as a predictor). The tree has 10 terminal nodes, and the training error rate (misclassification error) is 127/800 = 0.1588.
c) Type in the name of the tree object in order to print detailed output. Pick one of the terminal nodes, and interpret the information displayed.
oj_tree
## node), split, n, deviance, yval, (yprob)
## * denotes terminal node
##
## 1) root 800 1073.00 CH ( 0.60625 0.39375 )
## 2) LoyalCH < 0.5036 365 441.60 MM ( 0.29315 0.70685 )
## 4) LoyalCH < 0.280875 177 140.50 MM ( 0.13559 0.86441 )
## 8) LoyalCH < 0.0356415 59 10.14 MM ( 0.01695 0.98305 ) *
## 9) LoyalCH > 0.0356415 118 116.40 MM ( 0.19492 0.80508 ) *
## 5) LoyalCH > 0.280875 188 258.00 MM ( 0.44149 0.55851 )
## 10) PriceDiff < 0.05 79 84.79 MM ( 0.22785 0.77215 )
## 20) SpecialCH < 0.5 64 51.98 MM ( 0.14062 0.85938 ) *
## 21) SpecialCH > 0.5 15 20.19 CH ( 0.60000 0.40000 ) *
## 11) PriceDiff > 0.05 109 147.00 CH ( 0.59633 0.40367 ) *
## 3) LoyalCH > 0.5036 435 337.90 CH ( 0.86897 0.13103 )
## 6) LoyalCH < 0.764572 174 201.00 CH ( 0.73563 0.26437 )
## 12) ListPriceDiff < 0.235 72 99.81 MM ( 0.50000 0.50000 )
## 24) PctDiscMM < 0.196196 55 73.14 CH ( 0.61818 0.38182 )
## 48) id < 267 11 0.00 CH ( 1.00000 0.00000 ) *
## 49) id > 267 44 60.91 CH ( 0.52273 0.47727 ) *
## 25) PctDiscMM > 0.196196 17 12.32 MM ( 0.11765 0.88235 ) *
## 13) ListPriceDiff > 0.235 102 65.43 CH ( 0.90196 0.09804 ) *
## 7) LoyalCH > 0.764572 261 91.20 CH ( 0.95785 0.04215 ) *
The printed output is essentially a nested set of if/else rules. Picking node 9: the splitting variable is LoyalCH and the splitting value is 0.0356415, so this node contains the observations with LoyalCH between 0.0356415 and 0.280875. There are 118 observations in this node, and their total deviance is 116.40. The * denotes a terminal node. The predicted value of Purchase at this node is MM: about 19.5% of the observations in the node are CH, and the remaining 80.5% are MM.
d) Create a plot of the tree, and interpret the results.
plot(oj_tree)
text(oj_tree, pretty = 0)

LoyalCH is the most important variable in the tree: the first split and the next two splits all involve LoyalCH. Starting from the left side of the tree (LoyalCH < 0.5036): if LoyalCH < 0.28, both terminal nodes predict MM, regardless of whether LoyalCH is below or above 0.036. If LoyalCH is between 0.28 and 0.5, the prediction depends on price and promotion: with PriceDiff < 0.05 and SpecialCH < 0.5 we predict MM, otherwise CH. On the right side (LoyalCH > 0.5036) the tree mostly predicts CH, with further splits on ListPriceDiff and PctDiscMM for customers with intermediate loyalty. In short, the tree is a nested set of if/else rules dominated by brand loyalty.
e) Predict the response on the test data, and produce a confusion matrix comparing the test labels to the predicted test labels. What is the test error rate?
oj_pred = predict(oj_tree, test, type = "class")
table(test$Purchase, oj_pred)
## oj_pred
## CH MM
## CH 160 8
## MM 38 64
total_preds <- sum(table(test$Purchase, oj_pred))
incorrect_preds <- total_preds - sum(diag(table(test$Purchase, oj_pred)))
test_error_rate <- 100 * incorrect_preds / total_preds
test_error_rate
## [1] 17.03704
The test error rate is 17.03704%, i.e. about 0.170.
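The same quantity can be computed in one line (a convenience check, equivalent to the calculation above):
# Proportion of test observations whose predicted Purchase differs from the truth.
mean(test$Purchase != oj_pred)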
f) Apply the cv.tree()…
cv_oj = cv.tree(oj_tree, FUN = prune.tree)
cv_oj
## $size
## [1] 10 9 8 7 6 5 4 3 2 1
##
## $dev
## [1] 704.9056 723.2379 723.4760 715.7605 715.7605 714.1093 725.4734
## [8] 780.2099 790.0301 1074.2062
##
## $k
## [1] -Inf 12.23818 12.62207 13.94616 14.35384 26.21539 35.74964
## [8] 43.07317 45.67120 293.15784
##
## $method
## [1] "deviance"
##
## attr(,"class")
## [1] "prune" "tree.sequence"
g) Produce a plot with tree size on the x-axis…
plot(cv_oj$size, cv_oj$dev, type = "b", xlab = "tree size", ylab = "deviance")
h) Which tree size corresponds to the lowest cross-validated classification error rate?
The size with the lowest cross-validated deviance is 10, but a tree of size 5 has nearly the same deviance (714.1 versus 704.9), so choosing 5 keeps the model simpler and reduces the risk of overfitting.
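Note that cv.tree() was run with FUN = prune.tree, so the plot shows cross-validated deviance rather than classification error. A sketch (assumed, not run in the original) that cross-validates on the misclassification rate instead, which is what part (h) literally asks for:
# Assumed variant: cross-validate on misclassification error instead of deviance.
set.seed(1)
cv_oj_miss <- cv.tree(oj_tree, FUN = prune.misclass)
cv_oj_miss$size[which.min(cv_oj_miss$dev)] # size with the fewest CV misclassifications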
i) Produce a pruned tree…
oj_pruned = prune.tree(oj_tree, best = 5)
plot(oj_pruned)
text(oj_pruned, pretty = 0)
j) Compare the training error rates between the pruned and unpruned trees.
summary(oj_pruned)
##
## Classification tree:
## snip.tree(tree = oj_tree, nodes = c(4L, 12L, 5L))
## Variables actually used in tree construction:
## [1] "LoyalCH" "ListPriceDiff"
## Number of terminal nodes: 5
## Residual mean deviance: 0.8239 = 655 / 795
## Misclassification error rate: 0.205 = 164 / 800
The pruned tree's misclassification error rate is 0.205 = 164/800, versus 0.1588 = 127/800 for the unpruned tree. The training misclassification error rate therefore increases after pruning, as expected when terminal nodes are removed, but the increase is modest (about 4.6 percentage points).
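The same training error rates can be checked directly from predictions on the training set (a sketch that should reproduce the summary() figures above):
# Training misclassification rates for the unpruned and pruned trees.
mean(train$Purchase != predict(oj_tree, train, type = "class")) # unpruned
mean(train$Purchase != predict(oj_pruned, train, type = "class")) # pruned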
k) Compare the test error rates between the pruned and unpruned trees.
pred_unpruned = predict(oj_tree, test, type = "class")
misclass_unpr = sum(test$Purchase != pred_unpruned)
misclass_unpr/length(pred_unpruned)
## [1] 0.1703704
pred_pruned = predict(oj_pruned, test, type = "class")
misclass_pr = sum(test$Purchase != pred_pruned)
misclass_pr/length(pred_pruned)
## [1] 0.1925926
The pruned test misclassification rate (0.1925926) is higher than the unpruned rate (0.1703704), but the two values are close. Because the pruned tree is smaller and easier to interpret, it may still be preferable in practice, depending on the application.