Case

This case concerns a company that produces child car seats. The ISLR::Carseats dataset contains data from 400 outlets that sell the product. To support sales decisions, data on eleven variables were collected. The variables are:
a. Sales: Unit sales (in thousands) at each location;
b. CompPrice: Price charged by competitor at each location;
c. Income: Community income level (in thousands of dollars);
d. Advertising: Local advertising budget for company at each location (in thousands of dollars);
e. Population: Population size in region (in thousands);
f. Price: Price company charges for car seats at each site;
g. ShelveLoc: A factor with levels Bad, Good and Medium indicating the quality of the shelving location for the car seats at each site;
h. Age: Average age of the local population;
i. Education: Education level at each location;
j. Urban: A factor with levels No and Yes to indicate whether the store is in an urban or rural location;
k. US: A factor with levels No and Yes to indicate whether the store is in the US or not.

Data Case

Load the Carseats data from the ISLR package and randomize it using your NPM (student ID number) as the random seed. Enter your NPM on line 35!

library("tree")
library("ISLR")

# Attach the Carseats dataset so its columns can be referenced by name
attach(Carseats)

# Draw 400 rows with replacement (a bootstrap sample), seeded by the NPM, into data frame df
NPM = 1401184039
set.seed(NPM)
train = sample(1:nrow(Carseats), 400, replace = TRUE)  # Sampled row indices
df <- Carseats[train,]  # Sampled rows from Carseats
df

Question 1 =========================================================== (CLO1)

As the decision maker, you are asked to determine batas_1, the threshold above which Sales = High, and batas_2, the threshold below which Sales = Low. Sales between batas_1 and batas_2 are Sales = Medium. You decide that batas_1 is mean(Sales) + sd(Sales) and batas_2 is mean(Sales) - sd(Sales). What are batas_1 and batas_2?

Answer: batas_1 is 10.2573476017475 and batas_2 is 4.54915239825251.

batas_1 <- mean(df$Sales) + sd(df$Sales)
paste0("batas_1 = ", batas_1)

## [1] "batas_1 = 10.2573476017475"

batas_2 <- mean(df$Sales) - sd(df$Sales)
paste0("batas_2 = ", batas_2)

## [1] "batas_2 = 4.54915239825251"

Question 2 =========================================================== (CLO1)

How many stores/outlets have Sales that are High, Medium, and Low, respectively? Answer: 69 outlets have High sales, 269 outlets have Medium sales, and 62 outlets have Low sales.

Use the ifelse() function to create a variable, called Sales_cat, which takes on a value of Low if the Sales variable is below batas_2, Medium if it is below batas_1, and High otherwise.

# Discretize Sales into Low, Medium, High and add it as a new variable
library(dplyr)

## 
## Attaching package: 'dplyr'

## The following objects are masked from 'package:stats':
## 
##     filter, lag

## The following objects are masked from 'package:base':
## 
##     intersect, setdiff, setequal, union

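# Use `=` inside mutate() so the new column is named Sales_cat directly;
# the levels are listed explicitly to keep the order Medium, High, Low
df = df %>%
  mutate(Sales_cat = factor(ifelse(Sales < batas_2, "Low",
                                   ifelse(Sales < batas_1, "Medium", "High")),
                            levels = c("Medium", "High", "Low")))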

print("Sales")

## [1] "Sales"

table(df$Sales_cat)

## 
##  Medium    High     Low 
##     269      69      62
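The nested ifelse() rule can also be written with base R's cut(), which may be easier to read once there are more than two thresholds. The following is only a sketch: the `sales` vector is made up for illustration, and `lower`/`upper` are hypothetical stand-ins for batas_2 and batas_1.

```r
# Hypothetical sales figures (in thousands), invented for illustration only
sales <- c(2.1, 4.9, 7.4, 8.0, 10.6, 12.3, 6.2, 9.1)

# Same rule as in the exam: mean +/- one standard deviation
upper <- mean(sales) + sd(sales)   # plays the role of batas_1
lower <- mean(sales) - sd(sales)   # plays the role of batas_2

# right = FALSE makes the intervals [a, b), matching the strict "<" tests in ifelse()
sales_cat <- cut(sales,
                 breaks = c(-Inf, lower, upper, Inf),
                 labels = c("Low", "Medium", "High"),
                 right = FALSE)

# Reference result from the nested ifelse() form, for comparison
sales_cat_ifelse <- ifelse(sales < lower, "Low",
                           ifelse(sales < upper, "Medium", "High"))

table(sales_cat)
```

Both forms assign every value to the same category; cut() additionally returns a factor whose levels are already ordered Low, Medium, High.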

Question 3 =========================================================== (CLO2)

You plan to open a new outlet. To do so, you want to study the patterns in your case data so that you can decide on a sales tactic. You then build a classification model that predicts whether Sales is High, Medium, or Low, using the CART algorithm from the tree package. Which factors act as predictors in your classification model?

Answer: The factors that act as predictors are ShelveLoc, Price, CompPrice, Age, Advertising, Income, and Education.

Use the tree() function to fit a classification tree in order to predict Sales_cat using all variables but Sales. The syntax of the tree() function is quite similar to that of the lm() function.

# Model induction using data df and split criteria deviance
tree.carseats <- tree::tree(Sales_cat ~ .-Sales, 
                      data = df, 
                      split = "deviance",
                      model = TRUE)

# The summary() function lists the variables that are used as internal nodes in the tree, the number of terminal nodes, and the model error rate.
summary(tree.carseats)

## 
## Classification tree:
## tree::tree(formula = Sales_cat ~ . - Sales, data = df, split = "deviance", 
##     model = TRUE)
## Variables actually used in tree construction:
## [1] "ShelveLoc"   "Price"       "CompPrice"   "Age"         "Advertising"
## [6] "Income"      "Education"  
## Number of terminal nodes:  27 
## Residual mean deviance:  0.5045 = 188.2 / 373 
## Misclassification error rate: 0.0925 = 37 / 400

Question 4 =========================================================== (CLO2)

What is the overall misclassification error of your model? Are you satisfied enough with your model to use it for deciding whether to open a new outlet? Explain your answer!

Answer: The misclassification rate of this model is 0.0925, obtained from 37/400. This is satisfactory for deciding whether to open a new store, because the error is small: only 37 of the 400 observations are classified incorrectly (about 9%), well below the 32.75% error of always predicting Medium. Note that this figure is measured on the same data used to fit the tree, so the error on new outlets is likely to be higher.
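Because 0.0925 is a training error, one way to get a more cautious estimate is to hold out part of the data. The sketch below is an illustration under assumptions, not part of the exam: the 70/30 split, the seed, and the names `cars`, `test_idx`, `fit`, and `test_error` are all hypothetical. It works from the original Carseats rows rather than df, since df was sampled with replacement and duplicated rows could otherwise appear in both the training and the test part.

```r
library(tree)
library(ISLR)

set.seed(1)  # arbitrary seed, chosen only to make this sketch reproducible

# Rebuild the three-level response on the original (non-resampled) data
cars <- Carseats
hi <- mean(cars$Sales) + sd(cars$Sales)
lo <- mean(cars$Sales) - sd(cars$Sales)
cars$Sales_cat <- factor(ifelse(cars$Sales < lo, "Low",
                                ifelse(cars$Sales < hi, "Medium", "High")))

# Hold out 30% of the outlets as a test set
test_idx <- sample(nrow(cars), size = round(0.3 * nrow(cars)))

# Fit on the remaining 70% and predict the held-out outlets
fit  <- tree(Sales_cat ~ . - Sales, data = cars[-test_idx, ])
pred <- predict(fit, cars[test_idx, ], type = "class")

# Share of held-out outlets whose category is predicted incorrectly
test_error <- mean(pred != cars$Sales_cat[test_idx])
test_error
```

If the held-out error turns out much larger than the training error, that would suggest the 27-leaf tree is fitting noise in the training data.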

Question 5 =========================================================== (CLO2)

Explain your model below! Start by looking at which factor plays the most important role! (The most important indicator of Sales appears to be …, since …. Etc.)

Answer: The most important indicator of Sales in this model is ShelveLoc, since it sits at the top of the decision tree (the root node). It is followed by Price and Income one level down; CompPrice and Age sit below Price, and Advertising sits on the branch below Income.

# Visualize classification model using textual tree
print(tree.carseats)

## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 400 687.100  Medium ( 0.67250 0.17250 0.15500 )  
##     2) ShelveLoc: Bad,Medium 324 506.900  Medium ( 0.71296 0.09568 0.19136 )  
##       4) Price < 127 224 312.900  Medium ( 0.76786 0.13839 0.09375 )  
##         8) CompPrice < 123.5 130 139.200  Medium ( 0.82308 0.02308 0.15385 )  
##          16) Price < 74.5 11  12.890  Medium ( 0.72727 0.27273 0.00000 ) *
##          17) Price > 74.5 119 107.800  Medium ( 0.83193 0.00000 0.16807 )  
##            34) ShelveLoc: Bad 38  50.020  Medium ( 0.63158 0.00000 0.36842 )  
##              68) Age < 42.5 11   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##              69) Age > 42.5 27  37.390 Low ( 0.48148 0.00000 0.51852 )  
##               138) Advertising < 12 18  21.270 Low ( 0.27778 0.00000 0.72222 )  
##                 276) CompPrice < 116.5 9   0.000 Low ( 0.00000 0.00000 1.00000 ) *
##                 277) CompPrice > 116.5 9  12.370  Medium ( 0.55556 0.00000 0.44444 ) *
##               139) Advertising > 12 9   6.279  Medium ( 0.88889 0.00000 0.11111 ) *
##            35) ShelveLoc: Medium 81  42.780  Medium ( 0.92593 0.00000 0.07407 )  
##              70) Age < 68.5 60  10.170  Medium ( 0.98333 0.00000 0.01667 ) *
##              71) Age > 68.5 21  23.050  Medium ( 0.76190 0.00000 0.23810 )  
##               142) CompPrice < 107 6   7.638 Low ( 0.33333 0.00000 0.66667 ) *
##               143) CompPrice > 107 15   7.348  Medium ( 0.93333 0.00000 0.06667 ) *
##         9) CompPrice > 123.5 94 124.900  Medium ( 0.69149 0.29787 0.01064 )  
##          18) Price < 95 16  12.060 High ( 0.12500 0.87500 0.00000 ) *
##          19) Price > 95 78  83.720  Medium ( 0.80769 0.17949 0.01282 )  
##            38) Advertising < 10.5 48  26.260  Medium ( 0.93750 0.04167 0.02083 )  
##              76) CompPrice < 145.5 43   9.499  Medium ( 0.97674 0.00000 0.02326 ) *
##              77) CompPrice > 145.5 5   6.730  Medium ( 0.60000 0.40000 0.00000 ) *
##            39) Advertising > 10.5 30  40.380  Medium ( 0.60000 0.40000 0.00000 )  
##              78) Income < 101.5 22  20.860  Medium ( 0.81818 0.18182 0.00000 )  
##               156) Income < 66 12   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##               157) Income > 66 10  13.460  Medium ( 0.60000 0.40000 0.00000 )  
##                 314) CompPrice < 132.5 5   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##                 315) CompPrice > 132.5 5   5.004 High ( 0.20000 0.80000 0.00000 ) *
##              79) Income > 101.5 8   0.000 High ( 0.00000 1.00000 0.00000 ) *
##       5) Price > 127 100 135.400  Medium ( 0.59000 0.00000 0.41000 )  
##        10) Age < 65 70  75.260  Medium ( 0.77143 0.00000 0.22857 )  
##          20) ShelveLoc: Bad 22  30.500  Medium ( 0.50000 0.00000 0.50000 )  
##            40) CompPrice < 122.5 5   0.000 Low ( 0.00000 0.00000 1.00000 ) *
##            41) CompPrice > 122.5 17  22.070  Medium ( 0.64706 0.00000 0.35294 )  
##              82) Education < 12.5 6   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##              83) Education > 12.5 11  15.160 Low ( 0.45455 0.00000 0.54545 )  
##               166) Education < 15.5 6   0.000 Low ( 0.00000 0.00000 1.00000 ) *
##               167) Education > 15.5 5   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##          21) ShelveLoc: Medium 48  32.080  Medium ( 0.89583 0.00000 0.10417 ) *
##        11) Age > 65 30  27.030 Low ( 0.16667 0.00000 0.83333 )  
##          22) Income < 66 22   0.000 Low ( 0.00000 0.00000 1.00000 ) *
##          23) Income > 66 8  10.590  Medium ( 0.62500 0.00000 0.37500 ) *
##     3) ShelveLoc: Good 76 105.400  Medium ( 0.50000 0.50000 0.00000 )  
##       6) Income < 39.5 15   7.348  Medium ( 0.93333 0.06667 0.00000 ) *
##       7) Income > 39.5 61  81.770 High ( 0.39344 0.60656 0.00000 )  
##        14) Advertising < 1 20  24.430  Medium ( 0.70000 0.30000 0.00000 )  
##          28) Price < 94.5 6   5.407 High ( 0.16667 0.83333 0.00000 ) *
##          29) Price > 94.5 14   7.205  Medium ( 0.92857 0.07143 0.00000 ) *
##        15) Advertising > 1 41  45.550 High ( 0.24390 0.75610 0.00000 )  
##          30) Income < 85 29  19.290 High ( 0.10345 0.89655 0.00000 ) *
##          31) Income > 85 12  16.300  Medium ( 0.58333 0.41667 0.00000 ) *

# Visualize classification model using plot tree diagram
plot(tree.carseats, type = "proportional" )
text(tree.carseats, 
     col=1, 
     cex=1/2,
     pretty=0)

Question 6 =========================================================== (CLO2)

You try to reduce decision risk by looking for terminal nodes (leaves) labelled High whose purity is higher than the overall accuracy of the model. Is there such a leaf node? If so, which ones?

Answer: A leaf node is a node with no branches, usually found at the bottom of the tree diagram. The overall accuracy of the model is 1 - 0.0925 = 0.9075. Among the High leaves in the tree, only node 79 (Income > 101.5, 8 outlets, purity 1.00) has a purity above that value. The other High leaves fall below it: node 30 (Income < 85, purity 0.897), node 18 (Price < 95, purity 0.875), node 28 (Price < 94.5, purity 0.833), and node 315 (CompPrice > 132.5, purity 0.80).
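The comparison between leaf purity and overall accuracy can be checked mechanically. The sketch below simply retypes the five High leaves from the printed tree into a data frame (the name `high_leaves` and its column names are chosen here for illustration) and filters them against the overall accuracy implied by the 37/400 misclassification rate.

```r
# High-labelled terminal nodes, copied from the printed tree output
high_leaves <- data.frame(
  node   = c(18, 315, 79, 28, 30),
  rule   = c("Price < 95", "CompPrice > 132.5", "Income > 101.5",
             "Price < 94.5", "Income < 85"),
  n      = c(16, 5, 8, 6, 29),
  purity = c(0.87500, 0.80000, 1.00000, 0.83333, 0.89655)  # P(High) within the leaf
)

# Overall training accuracy implied by the misclassification rate 37 / 400
overall_accuracy <- 1 - 37 / 400

# Keep only the leaves that are purer than the model is accurate overall
purer <- high_leaves[high_leaves$purity > overall_accuracy, ]
purer
```

With the fitted model at hand, the same information should be available from tree.carseats$frame, where terminal nodes have var equal to "<leaf>".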

Question 7 =========================================================== (CLO2)

Explain the model evaluation plot below!

Answer: In the plot comparing actual and predicted sales, the largest counts lie on the diagonal: 260 outlets are predicted Medium and are actually Medium, 57 are predicted High and are actually High, and 46 are predicted Low and are actually Low. This shows that the predictions agree closely with the actual categories on the training data: when the model predicts Medium the outlet's sales are usually Medium, when it predicts High the sales are usually High, and so on.

Evaluate the model.

# Prediction using training data
set.seed(1401184039)
tree.pred <- predict(tree.carseats,
                     df, 
                     type ="class")
# Confusion matrix
table(tree.pred, df$Sales_cat)

##          
## tree.pred  Medium High Low
##    Medium     260   12  16
##   High          7   57   0
##   Low           2    0  46

# Plot confusion matrix
plot(table(tree.pred, 
           df$Sales_cat),
     main = 'Map plot of actual vs prediction',
     ylab = 'Actual',
     xlab = 'Prediction')
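The mosaic plot shows the pattern visually; the same confusion matrix also yields accuracy and per-class precision and recall. The sketch below retypes the counts from the output above into a matrix (the names `cm`, `accuracy`, `precision`, and `recall` are chosen here for illustration) rather than relying on the fitted model.

```r
# Confusion matrix copied from the output above (rows = predicted, columns = actual)
cm <- matrix(c(260, 12, 16,
                 7, 57,  0,
                 2,  0, 46),
             nrow = 3, byrow = TRUE,
             dimnames = list(predicted = c("Medium", "High", "Low"),
                             actual    = c("Medium", "High", "Low")))

# Overall accuracy: correctly classified outlets over all outlets
accuracy <- sum(diag(cm)) / sum(cm)

# Precision: of the outlets predicted as a class, the share that truly belong to it
precision <- diag(cm) / rowSums(cm)

# Recall: of the outlets that truly belong to a class, the share the model finds
recall <- diag(cm) / colSums(cm)

accuracy
round(cbind(precision, recall), 3)
```

Recall is noticeably lower for Low (46 of 62) and High (57 of 69) than for Medium (260 of 269), which overall accuracy alone does not reveal.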

Question 8 =========================================================== (CLO3)

Explain the precision-recall evaluation of the model in the plot below!

Answer: Precision-recall is a measure of prediction success that is useful when the classes are highly imbalanced. The precision/recall plot indicates that the model succeeds in predicting the class being measured: the highest point on the curve lies at recall 0.88 (x-axis) with precision 1.00 (y-axis), and the lowest point lies at recall 1.00 (x-axis) with precision 0.88 (y-axis). The two points mirror each other, with the same pair of values swapped.

Evaluate model performance on the training dataset. Precision/Recall Plot: requires the ROCR package

library(ROCR)

# Binary response for the precision/recall curve: "High" if Sales >= batas_1, otherwise "Low"
dfROC = Carseats %>%
  mutate(Sales_cat1 = as.factor(ifelse(Sales < batas_1, "Low", "High")))

tree.carseatsROC <- tree::tree(Sales_cat1 ~ .-Sales, 
                      data = dfROC, 
                      split = "deviance",
                      model = TRUE)


# Prediction
predROC = predict(tree.carseatsROC, dfROC, type ="vector")[,2]
predictionROC <- prediction(as.numeric(predROC), as.numeric(dfROC$Sales_cat1))

ROCR::plot(performance(predictionROC, "prec", "rec"), col="#CC0000FF", lty=1, add=FALSE)

# Add decorations to the plot.
title(main="Precision/Recall Plot",
     sub=paste("rocr", format(Sys.time(), "%Y-%b-%d %H:%M:%S"), Sys.info()["user"]))
grid()
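A precision/recall curve is traced by sweeping a cutoff over the predicted scores and recomputing precision and recall at each cutoff. The sketch below shows that calculation in base R on a tiny example; the `scores` and `labels` vectors are invented for illustration and have nothing to do with the Carseats model, and `pr_at` is a hypothetical helper.

```r
# Hypothetical predicted scores and true labels, invented for illustration only
scores <- c(0.95, 0.90, 0.80, 0.70, 0.55, 0.40, 0.30, 0.10)
labels <- c(1, 1, 0, 1, 1, 0, 0, 0)   # 1 = positive class

# Precision and recall when everything scoring at or above `cutoff` is called positive
pr_at <- function(cutoff) {
  predicted_pos <- scores >= cutoff
  tp <- sum(predicted_pos & labels == 1)       # true positives
  c(cutoff    = cutoff,
    precision = tp / sum(predicted_pos),       # correct among predicted positives
    recall    = tp / sum(labels == 1))         # found among actual positives
}

# One row per cutoff, from the strictest to the most lenient
pr <- t(sapply(sort(unique(scores), decreasing = TRUE), pr_at))
pr
```

Lowering the cutoff can only keep or raise recall, while precision may move in either direction, which is why the curve is usually read as a trade-off between the two.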

Question 9 =========================================================== (CLO3)

You want to evaluate whether pruning is possible, on the hypothesis that pruning will improve overall accuracy and simplify decision-making because the number of factors that play an important role in the prediction can be reduced. Prune the model you built above, then decide whether pruning is shown to improve decision-making or not!

Answer:
a. The best size for pruning is 25.
b. Does pruning improve overall accuracy or not? Explain! Answer: Pruning does not improve overall accuracy, because the misclassification error rate stays the same.
c. Does pruning make the decision about where to place a new outlet easier, harder, or no different, given the terminal nodes (leaves) whose purity is above the overall accuracy of the model? Answer: Pruning can make the decision about placing a new outlet easier, because it narrows the predictions down to those considered best compared with the alternatives available before pruning.

Pruning the tree

We consider whether pruning the tree might lead to improved results. The function cv.tree() performs cross-validation in order to determine the optimal level of tree complexity; cost complexity pruning is used in order to select a sequence of trees for consideration. We use the argument FUN=prune.misclass in order to indicate that we want the classification error rate to guide the cross-validation and pruning process, rather than the default for the cv.tree() function, which is deviance. The cv.tree() function reports the number of terminal nodes of each tree considered (size) as well as the corresponding error rate and the value of the cost-complexity parameter used (k).

set.seed (1401184039)
cv.carseats = tree::cv.tree(tree(Sales_cat ~ . -Sales, 
                                 data = df, 
                                 split = "deviance",
                                 model = TRUE),
                            FUN=prune.misclass, 
                            K = 15)
cv.carseats

## $size
##  [1] 27 26 23 21 15 11  8  6  5  1
## 
## $dev
##  [1]  83  83  91  90  91  89  96 104 134 133
## 
## $k
##  [1] -Inf 0.00 1.00 1.50 2.00 2.75 4.00 6.00 8.00 8.25
## 
## $method
## [1] "misclass"
## 
## attr(,"class")
## [1] "prune"         "tree.sequence"

We plot the error rate as a function of both size and k.

par(mfrow =c(1,2))
plot(cv.carseats$size ,
     cv.carseats$dev ,
     type="b")
plot(cv.carseats$k ,
     cv.carseats$dev ,
     type="b")
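Instead of reading the best size off the plot, it can be picked programmatically. The sketch below retypes the $size and $dev vectors from the cv.tree() output above and chooses the smallest tree among those with the lowest cross-validated error; that tie-breaking rule, and the names `size`, `dev`, `best_size`, and `cv_error_rate`, are choices made here for illustration.

```r
# Values copied from the cv.carseats output above
size <- c(27, 26, 23, 21, 15, 11, 8, 6, 5, 1)          # number of terminal nodes
dev  <- c(83, 83, 91, 90, 91, 89, 96, 104, 134, 133)   # cross-validated misclassifications

# Smallest tree among those that reach the minimum cross-validated error
best_size <- min(size[dev == min(dev)])

# Cross-validated error rate of that tree, out of the 400 observations
cv_error_rate <- min(dev) / 400

best_size
cv_error_rate
```

With cv.carseats in memory the same expression would read min(cv.carseats$size[cv.carseats$dev == min(cv.carseats$dev)]). The cross-validated error of 83/400 is more than twice the training error of 37/400, which is worth keeping in mind when judging the model.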

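Apply the prune.misclass() function in order to prune the tree to the requested size (best = 25). The cost-complexity sequence contains no 25-node tree, so the next largest tree in the sequence, with 26 terminal nodes, is returned.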

set.seed(123)
prune.carseats = prune.misclass(tree.carseats , 
                                best = 25)
plot(prune.carseats )
text(prune.carseats ,
     col=1, 
     cex=1/2)

print(prune.carseats)

## node), split, n, deviance, yval, (yprob)
##       * denotes terminal node
## 
##   1) root 400 687.100  Medium ( 0.67250 0.17250 0.15500 )  
##     2) ShelveLoc: Bad,Medium 324 506.900  Medium ( 0.71296 0.09568 0.19136 )  
##       4) Price < 127 224 312.900  Medium ( 0.76786 0.13839 0.09375 )  
##         8) CompPrice < 123.5 130 139.200  Medium ( 0.82308 0.02308 0.15385 )  
##          16) Price < 74.5 11  12.890  Medium ( 0.72727 0.27273 0.00000 ) *
##          17) Price > 74.5 119 107.800  Medium ( 0.83193 0.00000 0.16807 )  
##            34) ShelveLoc: Bad 38  50.020  Medium ( 0.63158 0.00000 0.36842 )  
##              68) Age < 42.5 11   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##              69) Age > 42.5 27  37.390 Low ( 0.48148 0.00000 0.51852 )  
##               138) Advertising < 12 18  21.270 Low ( 0.27778 0.00000 0.72222 )  
##                 276) CompPrice < 116.5 9   0.000 Low ( 0.00000 0.00000 1.00000 ) *
##                 277) CompPrice > 116.5 9  12.370  Medium ( 0.55556 0.00000 0.44444 ) *
##               139) Advertising > 12 9   6.279  Medium ( 0.88889 0.00000 0.11111 ) *
##            35) ShelveLoc: Medium 81  42.780  Medium ( 0.92593 0.00000 0.07407 )  
##              70) Age < 68.5 60  10.170  Medium ( 0.98333 0.00000 0.01667 ) *
##              71) Age > 68.5 21  23.050  Medium ( 0.76190 0.00000 0.23810 )  
##               142) CompPrice < 107 6   7.638 Low ( 0.33333 0.00000 0.66667 ) *
##               143) CompPrice > 107 15   7.348  Medium ( 0.93333 0.00000 0.06667 ) *
##         9) CompPrice > 123.5 94 124.900  Medium ( 0.69149 0.29787 0.01064 )  
##          18) Price < 95 16  12.060 High ( 0.12500 0.87500 0.00000 ) *
##          19) Price > 95 78  83.720  Medium ( 0.80769 0.17949 0.01282 )  
##            38) Advertising < 10.5 48  26.260  Medium ( 0.93750 0.04167 0.02083 ) *
##            39) Advertising > 10.5 30  40.380  Medium ( 0.60000 0.40000 0.00000 )  
##              78) Income < 101.5 22  20.860  Medium ( 0.81818 0.18182 0.00000 )  
##               156) Income < 66 12   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##               157) Income > 66 10  13.460  Medium ( 0.60000 0.40000 0.00000 )  
##                 314) CompPrice < 132.5 5   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##                 315) CompPrice > 132.5 5   5.004 High ( 0.20000 0.80000 0.00000 ) *
##              79) Income > 101.5 8   0.000 High ( 0.00000 1.00000 0.00000 ) *
##       5) Price > 127 100 135.400  Medium ( 0.59000 0.00000 0.41000 )  
##        10) Age < 65 70  75.260  Medium ( 0.77143 0.00000 0.22857 )  
##          20) ShelveLoc: Bad 22  30.500  Medium ( 0.50000 0.00000 0.50000 )  
##            40) CompPrice < 122.5 5   0.000 Low ( 0.00000 0.00000 1.00000 ) *
##            41) CompPrice > 122.5 17  22.070  Medium ( 0.64706 0.00000 0.35294 )  
##              82) Education < 12.5 6   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##              83) Education > 12.5 11  15.160 Low ( 0.45455 0.00000 0.54545 )  
##               166) Education < 15.5 6   0.000 Low ( 0.00000 0.00000 1.00000 ) *
##               167) Education > 15.5 5   0.000  Medium ( 1.00000 0.00000 0.00000 ) *
##          21) ShelveLoc: Medium 48  32.080  Medium ( 0.89583 0.00000 0.10417 ) *
##        11) Age > 65 30  27.030 Low ( 0.16667 0.00000 0.83333 )  
##          22) Income < 66 22   0.000 Low ( 0.00000 0.00000 1.00000 ) *
##          23) Income > 66 8  10.590  Medium ( 0.62500 0.00000 0.37500 ) *
##     3) ShelveLoc: Good 76 105.400  Medium ( 0.50000 0.50000 0.00000 )  
##       6) Income < 39.5 15   7.348  Medium ( 0.93333 0.06667 0.00000 ) *
##       7) Income > 39.5 61  81.770 High ( 0.39344 0.60656 0.00000 )  
##        14) Advertising < 1 20  24.430  Medium ( 0.70000 0.30000 0.00000 )  
##          28) Price < 94.5 6   5.407 High ( 0.16667 0.83333 0.00000 ) *
##          29) Price > 94.5 14   7.205  Medium ( 0.92857 0.07143 0.00000 ) *
##        15) Advertising > 1 41  45.550 High ( 0.24390 0.75610 0.00000 )  
##          30) Income < 85 29  19.290 High ( 0.10345 0.89655 0.00000 ) *
##          31) Income > 85 12  16.300  Medium ( 0.58333 0.41667 0.00000 ) *

summary(prune.carseats)

## 
## Classification tree:
## snip.tree(tree = tree.carseats, nodes = 38L)
## Variables actually used in tree construction:
## [1] "ShelveLoc"   "Price"       "CompPrice"   "Age"         "Advertising"
## [6] "Income"      "Education"  
## Number of terminal nodes:  26 
## Residual mean deviance:  0.53 = 198.2 / 374 
## Misclassification error rate: 0.0925 = 37 / 400

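The training misclassification error after pruning is 0.0925, the same as before pruning (0.0925).

Generate the confusion matrix for the pruned tree. With count=FALSE the cells are percentages of all observations rather than counts; classes 1, 2, and 3 are the factor levels Medium, High, and Low.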

set.seed(NPM)
tree.pred <- predict(prune.carseats,
                     df, 
                     type ="class")

print("confusion matrix showing test counts")

## [1] "confusion matrix showing test counts"

per <- rattle::errorMatrix(as.numeric(df$Sales_cat), 
                           as.numeric(tree.pred), 
                           count=FALSE)
per

##       Predicted
## Actual  1    2    3 Error
##      1 65  1.8  0.5   3.3
##      2  3 14.2  0.0  17.4
##      3  4  0.0 11.5  25.8

Calculate the overall error percentage for pruned tree.

print("overall error percentage")

## [1] "overall error percentage"

cat(100-sum(diag(per), na.rm=TRUE))

## 9.3

Question 10 =========================================================== (CLO3)

What is your advice for outlets whose Sales performance is still Low?

Answer: Stores with Low sales should revisit the factors that influence sales. ShelveLoc is the most influential factor, so an outlet can first consider improving its shelving location. The next most influential factors after ShelveLoc are Price and Income. These two interact, because a customer's income affects purchasing power. Outlets with low sales should pay attention to these factors; if they can be brought in line with what customers expect, sales are likely to increase.

UTS (MIDTERM EXAM) - BIG DATA AND DATA ANALYTICS / EMI1E3

gdr

12 April 2022