---
title: "Homework6 Tree models"
author: "Tianhai Zu"
date: "10/22/2023"
output: html_document
---
Refer to http://archive.ics.uci.edu/ml/datasets/Statlog+(German+Credit+Data)
for variable descriptions. The response variable is Class
and all others are predictors.
Run the following code only once to install the package
caret. The German credit scoring data is
provided in that package.
install.packages('caret')
library(caret) # this package contains the German credit data in numeric format
## Loading required package: ggplot2
## Loading required package: lattice
data(GermanCredit)
GermanCredit$Class <- as.numeric(GermanCredit$Class == "Good") # convert `Class` to 1 (Good) or 0 (Bad)
GermanCredit$Class <- as.factor(GermanCredit$Class) # make sure `Class` is a factor, as SVM requires a factor response; now 1 is good and 0 is bad
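The two-step recoding above can be checked on a small vector before trusting it on the full data. This is only a sketch: `cls` and `y` are made-up names and the three values are illustrative, not taken from the data set.

```r
# Hypothetical three-value stand-in for the original Good/Bad factor.
cls <- factor(c("Good", "Bad", "Good"), levels = c("Bad", "Good"))

# Same two steps as above: logical -> 0/1 -> factor.
y <- as.factor(as.numeric(cls == "Good"))

levels(y)      # "0" "1"
table(cls, y)  # every "Good" maps to 1, every "Bad" to 0
```

A cross-tabulation like this is a cheap way to confirm that the coding did not flip the meaning of the classes.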
str(GermanCredit)
## 'data.frame': 1000 obs. of 62 variables:
## $ Duration : int 6 48 12 42 24 36 24 36 12 30 ...
## $ Amount : int 1169 5951 2096 7882 4870 9055 2835 6948 3059 5234 ...
## $ InstallmentRatePercentage : int 4 2 2 2 3 2 3 2 2 4 ...
## $ ResidenceDuration : int 4 2 3 4 4 4 4 2 4 2 ...
## $ Age : int 67 22 49 45 53 35 53 35 61 28 ...
## $ NumberExistingCredits : int 2 1 1 1 2 1 1 1 1 2 ...
## $ NumberPeopleMaintenance : int 1 1 2 2 2 2 1 1 1 1 ...
## $ Telephone : num 0 1 1 1 1 0 1 0 1 1 ...
## $ ForeignWorker : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Class : Factor w/ 2 levels "0","1": 2 1 2 2 1 2 2 2 2 1 ...
## $ CheckingAccountStatus.lt.0 : num 1 0 0 1 1 0 0 0 0 0 ...
## $ CheckingAccountStatus.0.to.200 : num 0 1 0 0 0 0 0 1 0 1 ...
## $ CheckingAccountStatus.gt.200 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CheckingAccountStatus.none : num 0 0 1 0 0 1 1 0 1 0 ...
## $ CreditHistory.NoCredit.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.ThisBank.AllPaid : num 0 0 0 0 0 0 0 0 0 0 ...
## $ CreditHistory.PaidDuly : num 0 1 0 1 0 1 1 1 1 0 ...
## $ CreditHistory.Delay : num 0 0 0 0 1 0 0 0 0 0 ...
## $ CreditHistory.Critical : num 1 0 1 0 0 0 0 0 0 1 ...
## $ Purpose.NewCar : num 0 0 0 0 1 0 0 0 0 1 ...
## $ Purpose.UsedCar : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Purpose.Furniture.Equipment : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Purpose.Radio.Television : num 1 1 0 0 0 0 0 0 1 0 ...
## $ Purpose.DomesticAppliance : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Repairs : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Education : num 0 0 1 0 0 1 0 0 0 0 ...
## $ Purpose.Vacation : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Retraining : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Business : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Purpose.Other : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.lt.100 : num 0 1 1 1 1 0 0 1 0 1 ...
## $ SavingsAccountBonds.100.to.500 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ SavingsAccountBonds.500.to.1000 : num 0 0 0 0 0 0 1 0 0 0 ...
## $ SavingsAccountBonds.gt.1000 : num 0 0 0 0 0 0 0 0 1 0 ...
## $ SavingsAccountBonds.Unknown : num 1 0 0 0 0 1 0 0 0 0 ...
## $ EmploymentDuration.lt.1 : num 0 0 0 0 0 0 0 0 0 0 ...
## $ EmploymentDuration.1.to.4 : num 0 1 0 0 1 1 0 1 0 0 ...
## $ EmploymentDuration.4.to.7 : num 0 0 1 1 0 0 0 0 1 0 ...
## $ EmploymentDuration.gt.7 : num 1 0 0 0 0 0 1 0 0 0 ...
## $ EmploymentDuration.Unemployed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Male.Divorced.Seperated : num 0 0 0 0 0 0 0 0 1 0 ...
## $ Personal.Female.NotSingle : num 0 1 0 0 0 0 0 0 0 0 ...
## $ Personal.Male.Single : num 1 0 1 1 1 1 1 1 0 0 ...
## $ Personal.Male.Married.Widowed : num 0 0 0 0 0 0 0 0 0 1 ...
## $ Personal.Female.Single : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.None : num 1 1 1 0 1 1 1 1 1 1 ...
## $ OtherDebtorsGuarantors.CoApplicant : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherDebtorsGuarantors.Guarantor : num 0 0 0 1 0 0 0 0 0 0 ...
## $ Property.RealEstate : num 1 1 1 0 0 0 0 0 1 0 ...
## $ Property.Insurance : num 0 0 0 1 0 0 1 0 0 0 ...
## $ Property.CarOther : num 0 0 0 0 0 0 0 1 0 1 ...
## $ Property.Unknown : num 0 0 0 0 1 1 0 0 0 0 ...
## $ OtherInstallmentPlans.Bank : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.Stores : num 0 0 0 0 0 0 0 0 0 0 ...
## $ OtherInstallmentPlans.None : num 1 1 1 1 1 1 1 1 1 1 ...
## $ Housing.Rent : num 0 0 0 0 0 0 0 1 0 0 ...
## $ Housing.Own : num 1 1 1 0 0 0 1 0 1 1 ...
## $ Housing.ForFree : num 0 0 0 1 1 1 0 0 0 0 ...
## $ Job.UnemployedUnskilled : num 0 0 0 0 0 0 0 0 0 0 ...
## $ Job.UnskilledResident : num 0 0 1 0 0 1 0 0 1 0 ...
## $ Job.SkilledEmployee : num 1 1 0 1 1 0 1 0 0 0 ...
## $ Job.Management.SelfEmp.HighlyQualified: num 0 0 0 0 0 0 0 1 0 1 ...
#load tree model packages
library(rpart)
library(rpart.plot)
# Drop variables that provide no information in the data
GermanCredit = GermanCredit[,-c(14,19,27,30,35,40,44,45,48,52,55,58,62)]
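Numeric column indices are hard to audit, so it may help to see which variables they refer to. The sketch below assumes caret is installed and loads a fresh copy of the data into a separate environment so the working `GermanCredit` object is not overwritten; `raw_env`, `raw`, `drop_idx`, and `drop_names` are names introduced here for illustration.

```r
# Load a pristine copy into its own environment so the working GermanCredit
# object in the session is left untouched.
raw_env <- new.env()
data(GermanCredit, package = "caret", envir = raw_env)
raw <- raw_env$GermanCredit

# Same indices as in the drop statement above.
drop_idx <- c(14, 19, 27, 30, 35, 40, 44, 45, 48, 52, 55, 58, 62)
drop_names <- names(raw)[drop_idx]
print(drop_names)

# Dropping by name selects the same columns as dropping by position, and
# keeps working if the column order ever changes.
by_position <- raw[, -drop_idx]
by_name     <- raw[, !(names(raw) %in% drop_names)]
stopifnot(identical(names(by_position), names(by_name)))
```

Judging from the `str()` listing above, index 14 is `CheckingAccountStatus.none` and index 62 is `Job.Management.SelfEmp.HighlyQualified`, so most of the dropped columns appear to be the last dummy of each categorical variable.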
2024 for reproducibility. (5pts)
set.seed(2024)
index <- sample(1:nrow(GermanCredit),nrow(GermanCredit)*0.80)
GermanCredit_train = GermanCredit[index,]
GermanCredit_test = GermanCredit[-index,]
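A quick sanity check on an 80/20 split like the one above is to confirm the row counts, make sure no row lands in both parts, and compare the class proportions. The sketch below uses a simulated data frame (`toy`, a hypothetical stand-in for `GermanCredit`) so it runs on its own; the same checks can be pointed at `GermanCredit_train` and `GermanCredit_test`.

```r
# Simulated stand-in for GermanCredit (hypothetical data, 1000 rows).
set.seed(2024)
toy <- data.frame(x = rnorm(1000),
                  Class = factor(rbinom(1000, 1, 0.7)))

# Same splitting pattern as above: sample 80% of the row numbers.
index <- sample(seq_len(nrow(toy)), size = floor(0.80 * nrow(toy)))
toy_train <- toy[index, ]
toy_test  <- toy[-index, ]

# The two parts should have the expected sizes and share no rows.
stopifnot(nrow(toy_train) == 800, nrow(toy_test) == 200)
stopifnot(length(intersect(rownames(toy_train), rownames(toy_test))) == 0)

# Class proportions in each part; a large gap would suggest an unlucky split.
prop.table(table(toy_train$Class))
prop.table(table(toy_test$Class))
```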
Class) are right. (10pts)
library(rpart)
library(rpart.plot)
# fit the tree
GermanCredit_tree <- rpart(Class ~ ., data = GermanCredit_train)
summary(GermanCredit_tree)
## Call:
## rpart(formula = Class ~ ., data = GermanCredit_train)
## n= 800
##
## CP nsplit rel error xerror xstd
## 1 0.03056769 0 1.0000000 1.000000 0.05582842
## 2 0.02401747 4 0.8777293 1.013100 0.05604510
## 3 0.01965066 8 0.7729258 1.004367 0.05590117
## 4 0.01310044 10 0.7336245 1.017467 0.05611630
## 5 0.01091703 19 0.6069869 1.013100 0.05604510
## 6 0.01000000 21 0.5851528 1.008734 0.05597339
##
## Variable importance
## Amount Duration
## 15 14
## CheckingAccountStatus.lt.0 CheckingAccountStatus.0.to.200
## 14 14
## Age SavingsAccountBonds.lt.100
## 7 4
## Purpose.UsedCar Job.SkilledEmployee
## 4 3
## InstallmentRatePercentage Property.RealEstate
## 3 3
## SavingsAccountBonds.100.to.500 OtherInstallmentPlans.Stores
## 3 2
## NumberExistingCredits Purpose.Business
## 2 2
## Job.UnskilledResident EmploymentDuration.lt.1
## 1 1
## OtherDebtorsGuarantors.None Telephone
## 1 1
## SavingsAccountBonds.500.to.1000 SavingsAccountBonds.gt.1000
## 1 1
## Personal.Male.Divorced.Seperated Property.Insurance
## 1 1
## Purpose.Furniture.Equipment
## 1
##
## Node number 1: 800 observations, complexity param=0.03056769
## predicted class=1 expected loss=0.28625 P(node) =1
## class counts: 229 571
## probabilities: 0.286 0.714
## left son=2 (211 obs) right son=3 (589 obs)
## Primary splits:
## CheckingAccountStatus.lt.0 < 0.5 to the right, improve=21.222720, (0 missing)
## Duration < 25.5 to the right, improve=13.584620, (0 missing)
## Amount < 10918 to the right, improve=12.537530, (0 missing)
## SavingsAccountBonds.lt.100 < 0.5 to the right, improve= 8.092071, (0 missing)
## CreditHistory.ThisBank.AllPaid < 0.5 to the right, improve= 7.040837, (0 missing)
## Surrogate splits:
## Amount < 355.5 to the left, agree=0.738, adj=0.005, (0 split)
##
## Node number 2: 211 observations, complexity param=0.03056769
## predicted class=1 expected loss=0.478673 P(node) =0.26375
## class counts: 101 110
## probabilities: 0.479 0.521
## left son=4 (178 obs) right son=5 (33 obs)
## Primary splits:
## Duration < 11.5 to the right, improve=8.373770, (0 missing)
## Amount < 4802.5 to the right, improve=4.982836, (0 missing)
## CreditHistory.Delay < 0.5 to the right, improve=3.726962, (0 missing)
## Job.SkilledEmployee < 0.5 to the right, improve=3.414315, (0 missing)
## ForeignWorker < 0.5 to the right, improve=3.382024, (0 missing)
## Surrogate splits:
## Age < 66.5 to the left, agree=0.858, adj=0.091, (0 split)
## Amount < 617.5 to the right, agree=0.853, adj=0.061, (0 split)
##
## Node number 3: 589 observations, complexity param=0.03056769
## predicted class=1 expected loss=0.2173175 P(node) =0.73625
## class counts: 128 461
## probabilities: 0.217 0.783
## left son=6 (210 obs) right son=7 (379 obs)
## Primary splits:
## CheckingAccountStatus.0.to.200 < 0.5 to the right, improve=20.662260, (0 missing)
## Amount < 10918 to the right, improve=15.274340, (0 missing)
## Duration < 25.5 to the right, improve= 8.276487, (0 missing)
## OtherInstallmentPlans.Bank < 0.5 to the right, improve= 5.258972, (0 missing)
## Age < 25.5 to the left, improve= 4.661922, (0 missing)
## Surrogate splits:
## Duration < 43.5 to the right, agree=0.660, adj=0.048, (0 split)
## Amount < 11191 to the right, agree=0.660, adj=0.048, (0 split)
## CreditHistory.NoCredit.AllPaid < 0.5 to the right, agree=0.654, adj=0.029, (0 split)
## CreditHistory.ThisBank.AllPaid < 0.5 to the right, agree=0.652, adj=0.024, (0 split)
## SavingsAccountBonds.100.to.500 < 0.5 to the right, agree=0.650, adj=0.019, (0 split)
##
## Node number 4: 178 observations, complexity param=0.02401747
## predicted class=0 expected loss=0.4606742 P(node) =0.2225
## class counts: 96 82
## probabilities: 0.539 0.461
## left son=8 (38 obs) right son=9 (140 obs)
## Primary splits:
## Duration < 31.5 to the right, improve=3.769739, (0 missing)
## Job.SkilledEmployee < 0.5 to the right, improve=3.558204, (0 missing)
## CreditHistory.Delay < 0.5 to the right, improve=2.756581, (0 missing)
## Amount < 4802.5 to the right, improve=2.493525, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=2.196990, (0 missing)
## Surrogate splits:
## Amount < 6668.5 to the right, agree=0.843, adj=0.263, (0 split)
##
## Node number 5: 33 observations
## predicted class=1 expected loss=0.1515152 P(node) =0.04125
## class counts: 5 28
## probabilities: 0.152 0.848
##
## Node number 6: 210 observations, complexity param=0.03056769
## predicted class=1 expected loss=0.3952381 P(node) =0.2625
## class counts: 83 127
## probabilities: 0.395 0.605
## left son=12 (16 obs) right son=13 (194 obs)
## Primary splits:
## Amount < 9908.5 to the right, improve=10.185580, (0 missing)
## Duration < 22.5 to the right, improve= 6.836080, (0 missing)
## Property.RealEstate < 0.5 to the left, improve= 6.773416, (0 missing)
## Housing.Own < 0.5 to the left, improve= 4.050114, (0 missing)
## Age < 25.5 to the left, improve= 2.835462, (0 missing)
##
## Node number 7: 379 observations
## predicted class=1 expected loss=0.1187335 P(node) =0.47375
## class counts: 45 334
## probabilities: 0.119 0.881
##
## Node number 8: 38 observations
## predicted class=0 expected loss=0.2631579 P(node) =0.0475
## class counts: 28 10
## probabilities: 0.737 0.263
##
## Node number 9: 140 observations, complexity param=0.02401747
## predicted class=1 expected loss=0.4857143 P(node) =0.175
## class counts: 68 72
## probabilities: 0.486 0.514
## left son=18 (129 obs) right son=19 (11 obs)
## Primary splits:
## Purpose.UsedCar < 0.5 to the left, improve=5.632780, (0 missing)
## Amount < 1377 to the left, improve=3.929252, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=3.629554, (0 missing)
## Purpose.Business < 0.5 to the left, improve=2.208009, (0 missing)
## InstallmentRatePercentage < 2.5 to the right, improve=1.545196, (0 missing)
## Surrogate splits:
## Age < 61.5 to the left, agree=0.929, adj=0.091, (0 split)
##
## Node number 12: 16 observations
## predicted class=0 expected loss=0.0625 P(node) =0.02
## class counts: 15 1
## probabilities: 0.938 0.062
##
## Node number 13: 194 observations, complexity param=0.01965066
## predicted class=1 expected loss=0.3505155 P(node) =0.2425
## class counts: 68 126
## probabilities: 0.351 0.649
## left son=26 (136 obs) right son=27 (58 obs)
## Primary splits:
## Property.RealEstate < 0.5 to the left, improve=4.281722, (0 missing)
## Duration < 22.5 to the right, improve=3.588005, (0 missing)
## Age < 25.5 to the left, improve=3.343549, (0 missing)
## CreditHistory.ThisBank.AllPaid < 0.5 to the right, improve=2.575549, (0 missing)
## OtherDebtorsGuarantors.None < 0.5 to the right, improve=2.533807, (0 missing)
## Surrogate splits:
## OtherDebtorsGuarantors.None < 0.5 to the right, agree=0.768, adj=0.224, (0 split)
## Amount < 632 to the right, agree=0.716, adj=0.052, (0 split)
## Age < 20.5 to the right, agree=0.706, adj=0.017, (0 split)
## OtherDebtorsGuarantors.CoApplicant < 0.5 to the left, agree=0.706, adj=0.017, (0 split)
## Job.UnskilledResident < 0.5 to the left, agree=0.706, adj=0.017, (0 split)
##
## Node number 18: 129 observations, complexity param=0.02401747
## predicted class=0 expected loss=0.4728682 P(node) =0.16125
## class counts: 68 61
## probabilities: 0.527 0.473
## left son=36 (121 obs) right son=37 (8 obs)
## Primary splits:
## Purpose.Business < 0.5 to the left, improve=2.758425, (0 missing)
## Amount < 1377 to the left, improve=2.425020, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=2.289513, (0 missing)
## InstallmentRatePercentage < 2.5 to the right, improve=1.865086, (0 missing)
## Age < 30.5 to the right, improve=1.542534, (0 missing)
##
## Node number 19: 11 observations
## predicted class=1 expected loss=0 P(node) =0.01375
## class counts: 0 11
## probabilities: 0.000 1.000
##
## Node number 26: 136 observations, complexity param=0.01965066
## predicted class=1 expected loss=0.4191176 P(node) =0.17
## class counts: 57 79
## probabilities: 0.419 0.581
## left son=52 (31 obs) right son=53 (105 obs)
## Primary splits:
## Age < 25.5 to the left, improve=4.103230, (0 missing)
## Personal.Male.Single < 0.5 to the left, improve=3.308824, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=3.045537, (0 missing)
## Housing.Rent < 0.5 to the right, improve=2.499899, (0 missing)
## Amount < 931.5 to the left, improve=2.272952, (0 missing)
## Surrogate splits:
## OtherDebtorsGuarantors.None < 0.5 to the left, agree=0.794, adj=0.097, (0 split)
## Duration < 54 to the right, agree=0.779, adj=0.032, (0 split)
## Amount < 546.5 to the left, agree=0.779, adj=0.032, (0 split)
##
## Node number 27: 58 observations, complexity param=0.01310044
## predicted class=1 expected loss=0.1896552 P(node) =0.0725
## class counts: 11 47
## probabilities: 0.190 0.810
## left son=54 (7 obs) right son=55 (51 obs)
## Primary splits:
## Duration < 22 to the right, improve=4.3822080, (0 missing)
## Age < 31.5 to the left, improve=2.1545090, (0 missing)
## OtherDebtorsGuarantors.None < 0.5 to the right, improve=1.5894910, (0 missing)
## Amount < 1221.5 to the right, improve=1.1907440, (0 missing)
## Purpose.Furniture.Equipment < 0.5 to the right, improve=0.9088187, (0 missing)
## Surrogate splits:
## OtherInstallmentPlans.Stores < 0.5 to the right, agree=0.914, adj=0.286, (0 split)
## Personal.Male.Divorced.Seperated < 0.5 to the right, agree=0.897, adj=0.143, (0 split)
##
## Node number 36: 121 observations, complexity param=0.02401747
## predicted class=0 expected loss=0.446281 P(node) =0.15125
## class counts: 67 54
## probabilities: 0.554 0.446
## left son=72 (82 obs) right son=73 (39 obs)
## Primary splits:
## InstallmentRatePercentage < 2.5 to the right, improve=2.368882, (0 missing)
## Purpose.Furniture.Equipment < 0.5 to the left, improve=2.368882, (0 missing)
## Amount < 1577.5 to the left, improve=2.144262, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=1.585437, (0 missing)
## OtherDebtorsGuarantors.None < 0.5 to the right, improve=1.149010, (0 missing)
## Surrogate splits:
## Amount < 3571 to the left, agree=0.744, adj=0.205, (0 split)
## Personal.Male.Divorced.Seperated < 0.5 to the left, agree=0.744, adj=0.205, (0 split)
## Duration < 29 to the left, agree=0.686, adj=0.026, (0 split)
## NumberExistingCredits < 2.5 to the left, agree=0.686, adj=0.026, (0 split)
## Purpose.Furniture.Equipment < 0.5 to the left, agree=0.686, adj=0.026, (0 split)
##
## Node number 37: 8 observations
## predicted class=1 expected loss=0.125 P(node) =0.01
## class counts: 1 7
## probabilities: 0.125 0.875
##
## Node number 52: 31 observations
## predicted class=0 expected loss=0.3548387 P(node) =0.03875
## class counts: 20 11
## probabilities: 0.645 0.355
##
## Node number 53: 105 observations, complexity param=0.01310044
## predicted class=1 expected loss=0.352381 P(node) =0.13125
## class counts: 37 68
## probabilities: 0.352 0.648
## left son=106 (52 obs) right son=107 (53 obs)
## Primary splits:
## SavingsAccountBonds.lt.100 < 0.5 to the right, improve=2.455014, (0 missing)
## Age < 48.5 to the right, improve=1.981837, (0 missing)
## Amount < 931.5 to the left, improve=1.964626, (0 missing)
## Housing.Own < 0.5 to the left, improve=1.689451, (0 missing)
## Personal.Male.Single < 0.5 to the left, improve=1.658566, (0 missing)
## Surrogate splits:
## SavingsAccountBonds.100.to.500 < 0.5 to the left, agree=0.724, adj=0.442, (0 split)
## Job.SkilledEmployee < 0.5 to the left, agree=0.610, adj=0.212, (0 split)
## Age < 31.5 to the right, agree=0.600, adj=0.192, (0 split)
## Telephone < 0.5 to the left, agree=0.600, adj=0.192, (0 split)
## Purpose.Furniture.Equipment < 0.5 to the right, agree=0.571, adj=0.135, (0 split)
##
## Node number 54: 7 observations
## predicted class=0 expected loss=0.2857143 P(node) =0.00875
## class counts: 5 2
## probabilities: 0.714 0.286
##
## Node number 55: 51 observations
## predicted class=1 expected loss=0.1176471 P(node) =0.06375
## class counts: 6 45
## probabilities: 0.118 0.882
##
## Node number 72: 82 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.3780488 P(node) =0.1025
## class counts: 51 31
## probabilities: 0.622 0.378
## left son=144 (40 obs) right son=145 (42 obs)
## Primary splits:
## Amount < 1577.5 to the left, improve=1.658595, (0 missing)
## Telephone < 0.5 to the right, improve=1.449397, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=1.370800, (0 missing)
## Purpose.Radio.Television < 0.5 to the left, improve=1.132404, (0 missing)
## Age < 55 to the left, improve=1.081246, (0 missing)
## Surrogate splits:
## Purpose.Furniture.Equipment < 0.5 to the left, agree=0.646, adj=0.275, (0 split)
## Duration < 16.5 to the left, agree=0.634, adj=0.250, (0 split)
## InstallmentRatePercentage < 3.5 to the right, agree=0.622, adj=0.225, (0 split)
## Telephone < 0.5 to the right, agree=0.622, adj=0.225, (0 split)
## Personal.Male.Single < 0.5 to the left, agree=0.622, adj=0.225, (0 split)
##
## Node number 73: 39 observations, complexity param=0.01310044
## predicted class=1 expected loss=0.4102564 P(node) =0.04875
## class counts: 16 23
## probabilities: 0.410 0.590
## left son=146 (26 obs) right son=147 (13 obs)
## Primary splits:
## Duration < 15.5 to the right, improve=2.564103, (0 missing)
## Telephone < 0.5 to the left, improve=1.538462, (0 missing)
## EmploymentDuration.lt.1 < 0.5 to the right, improve=1.538462, (0 missing)
## Age < 30.5 to the right, improve=1.257664, (0 missing)
## Amount < 1961.5 to the right, improve=1.189036, (0 missing)
## Surrogate splits:
## Amount < 1828.5 to the right, agree=0.846, adj=0.538, (0 split)
## Age < 35 to the left, agree=0.692, adj=0.077, (0 split)
##
## Node number 106: 52 observations, complexity param=0.01310044
## predicted class=1 expected loss=0.4615385 P(node) =0.065
## class counts: 24 28
## probabilities: 0.462 0.538
## left son=212 (32 obs) right son=213 (20 obs)
## Primary splits:
## NumberExistingCredits < 1.5 to the left, improve=2.908654, (0 missing)
## Duration < 28.5 to the right, improve=2.447658, (0 missing)
## Age < 35.5 to the right, improve=1.846154, (0 missing)
## ResidenceDuration < 1.5 to the right, improve=1.246671, (0 missing)
## EmploymentDuration.gt.7 < 0.5 to the right, improve=1.231775, (0 missing)
## Surrogate splits:
## Age < 27.5 to the right, agree=0.673, adj=0.15, (0 split)
## InstallmentRatePercentage < 1.5 to the right, agree=0.654, adj=0.10, (0 split)
## CreditHistory.PaidDuly < 0.5 to the right, agree=0.654, adj=0.10, (0 split)
## OtherInstallmentPlans.Stores < 0.5 to the left, agree=0.654, adj=0.10, (0 split)
## Job.UnemployedUnskilled < 0.5 to the left, agree=0.654, adj=0.10, (0 split)
##
## Node number 107: 53 observations, complexity param=0.01091703
## predicted class=1 expected loss=0.245283 P(node) =0.06625
## class counts: 13 40
## probabilities: 0.245 0.755
## left son=214 (24 obs) right son=215 (29 obs)
## Primary splits:
## SavingsAccountBonds.100.to.500 < 0.5 to the right, improve=2.576664, (0 missing)
## Amount < 1930 to the left, improve=1.693587, (0 missing)
## EmploymentDuration.lt.1 < 0.5 to the right, improve=1.611103, (0 missing)
## EmploymentDuration.1.to.4 < 0.5 to the left, improve=1.334922, (0 missing)
## NumberExistingCredits < 1.5 to the right, improve=1.280380, (0 missing)
## Surrogate splits:
## Purpose.NewCar < 0.5 to the right, agree=0.679, adj=0.292, (0 split)
## EmploymentDuration.1.to.4 < 0.5 to the left, agree=0.660, adj=0.250, (0 split)
## EmploymentDuration.lt.1 < 0.5 to the right, agree=0.642, adj=0.208, (0 split)
## Age < 31.5 to the left, agree=0.604, adj=0.125, (0 split)
## InstallmentRatePercentage < 1.5 to the left, agree=0.585, adj=0.083, (0 split)
##
## Node number 144: 40 observations
## predicted class=0 expected loss=0.275 P(node) =0.05
## class counts: 29 11
## probabilities: 0.725 0.275
##
## Node number 145: 42 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.4761905 P(node) =0.0525
## class counts: 22 20
## probabilities: 0.524 0.476
## left son=290 (30 obs) right son=291 (12 obs)
## Primary splits:
## Amount < 2135.5 to the right, improve=2.5190480, (0 missing)
## SavingsAccountBonds.lt.100 < 0.5 to the right, improve=2.0836940, (0 missing)
## Housing.Own < 0.5 to the left, improve=1.2857140, (0 missing)
## Age < 24.5 to the right, improve=0.9523810, (0 missing)
## NumberExistingCredits < 1.5 to the right, improve=0.6857143, (0 missing)
## Surrogate splits:
## ForeignWorker < 0.5 to the right, agree=0.738, adj=0.083, (0 split)
##
## Node number 146: 26 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.4615385 P(node) =0.0325
## class counts: 14 12
## probabilities: 0.538 0.462
## left son=292 (12 obs) right son=293 (14 obs)
## Primary splits:
## Amount < 3506.5 to the left, improve=1.9945050, (0 missing)
## Duration < 19 to the left, improve=1.3594410, (0 missing)
## EmploymentDuration.lt.1 < 0.5 to the right, improve=1.0341880, (0 missing)
## Age < 30.5 to the right, improve=0.8480769, (0 missing)
## Property.CarOther < 0.5 to the left, improve=0.6175214, (0 missing)
## Surrogate splits:
## Age < 26.5 to the left, agree=0.731, adj=0.417, (0 split)
## Duration < 19 to the left, agree=0.654, adj=0.250, (0 split)
## Telephone < 0.5 to the right, agree=0.654, adj=0.250, (0 split)
## Purpose.Radio.Television < 0.5 to the right, agree=0.654, adj=0.250, (0 split)
## Housing.Rent < 0.5 to the right, agree=0.654, adj=0.250, (0 split)
##
## Node number 147: 13 observations
## predicted class=1 expected loss=0.1538462 P(node) =0.01625
## class counts: 2 11
## probabilities: 0.154 0.846
##
## Node number 212: 32 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.40625 P(node) =0.04
## class counts: 19 13
## probabilities: 0.594 0.406
## left son=424 (22 obs) right son=425 (10 obs)
## Primary splits:
## Age < 32.5 to the right, improve=2.5102270, (0 missing)
## Duration < 11.5 to the right, improve=1.7003570, (0 missing)
## CreditHistory.PaidDuly < 0.5 to the left, improve=1.1002450, (0 missing)
## Amount < 1383 to the left, improve=0.8481280, (0 missing)
## EmploymentDuration.gt.7 < 0.5 to the right, improve=0.5976732, (0 missing)
## Surrogate splits:
## EmploymentDuration.lt.1 < 0.5 to the left, agree=0.812, adj=0.4, (0 split)
## Duration < 7.5 to the right, agree=0.719, adj=0.1, (0 split)
## CreditHistory.PaidDuly < 0.5 to the left, agree=0.719, adj=0.1, (0 split)
##
## Node number 213: 20 observations
## predicted class=1 expected loss=0.25 P(node) =0.025
## class counts: 5 15
## probabilities: 0.250 0.750
##
## Node number 214: 24 observations, complexity param=0.01091703
## predicted class=1 expected loss=0.4166667 P(node) =0.03
## class counts: 10 14
## probabilities: 0.417 0.583
## left son=428 (7 obs) right son=429 (17 obs)
## Primary splits:
## Job.SkilledEmployee < 0.5 to the left, improve=3.8347340, (0 missing)
## Amount < 2814.5 to the left, improve=1.9603730, (0 missing)
## Age < 31.5 to the right, improve=1.1523810, (0 missing)
## Property.CarOther < 0.5 to the left, improve=1.1523810, (0 missing)
## NumberExistingCredits < 1.5 to the right, improve=0.6736597, (0 missing)
## Surrogate splits:
## Job.UnskilledResident < 0.5 to the right, agree=0.875, adj=0.571, (0 split)
## Amount < 6499 to the right, agree=0.833, adj=0.429, (0 split)
## InstallmentRatePercentage < 1.5 to the left, agree=0.792, adj=0.286, (0 split)
## Property.Insurance < 0.5 to the right, agree=0.792, adj=0.286, (0 split)
## OtherInstallmentPlans.Stores < 0.5 to the right, agree=0.792, adj=0.286, (0 split)
##
## Node number 215: 29 observations
## predicted class=1 expected loss=0.1034483 P(node) =0.03625
## class counts: 3 26
## probabilities: 0.103 0.897
##
## Node number 290: 30 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.3666667 P(node) =0.0375
## class counts: 19 11
## probabilities: 0.633 0.367
## left son=580 (22 obs) right son=581 (8 obs)
## Primary splits:
## SavingsAccountBonds.lt.100 < 0.5 to the right, improve=3.206061, (0 missing)
## Purpose.Radio.Television < 0.5 to the left, improve=1.456061, (0 missing)
## Age < 28.5 to the right, improve=1.354148, (0 missing)
## Duration < 15 to the left, improve=1.274242, (0 missing)
## NumberExistingCredits < 1.5 to the right, improve=1.274242, (0 missing)
## Surrogate splits:
## SavingsAccountBonds.500.to.1000 < 0.5 to the left, agree=0.833, adj=0.375, (0 split)
## SavingsAccountBonds.gt.1000 < 0.5 to the left, agree=0.833, adj=0.375, (0 split)
## Age < 22.5 to the right, agree=0.800, adj=0.250, (0 split)
## OtherInstallmentPlans.Stores < 0.5 to the left, agree=0.800, adj=0.250, (0 split)
## Duration < 29 to the left, agree=0.767, adj=0.125, (0 split)
##
## Node number 291: 12 observations
## predicted class=1 expected loss=0.25 P(node) =0.015
## class counts: 3 9
## probabilities: 0.250 0.750
##
## Node number 292: 12 observations
## predicted class=0 expected loss=0.25 P(node) =0.015
## class counts: 9 3
## probabilities: 0.750 0.250
##
## Node number 293: 14 observations
## predicted class=1 expected loss=0.3571429 P(node) =0.0175
## class counts: 5 9
## probabilities: 0.357 0.643
##
## Node number 424: 22 observations
## predicted class=0 expected loss=0.2727273 P(node) =0.0275
## class counts: 16 6
## probabilities: 0.727 0.273
##
## Node number 425: 10 observations
## predicted class=1 expected loss=0.3 P(node) =0.0125
## class counts: 3 7
## probabilities: 0.300 0.700
##
## Node number 428: 7 observations
## predicted class=0 expected loss=0.1428571 P(node) =0.00875
## class counts: 6 1
## probabilities: 0.857 0.143
##
## Node number 429: 17 observations
## predicted class=1 expected loss=0.2352941 P(node) =0.02125
## class counts: 4 13
## probabilities: 0.235 0.765
##
## Node number 580: 22 observations
## predicted class=0 expected loss=0.2272727 P(node) =0.0275
## class counts: 17 5
## probabilities: 0.773 0.227
##
## Node number 581: 8 observations
## predicted class=1 expected loss=0.25 P(node) =0.01
## class counts: 2 6
## probabilities: 0.250 0.750
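The CP table at the top of the summary output reports the cross-validated error (`xerror`) for each tree size. In the table above, `xerror` stays close to 1.0 at every size, which may indicate that the deeper splits do not generalize well. One common way to act on that table is to prune at the `CP` value with the lowest `xerror`. The sketch below illustrates the idea on the `kyphosis` data that ships with rpart, used here only as a stand-in for `GermanCredit_train`; `fit`, `cp_table`, `best_cp`, and `pruned` are names introduced for illustration.

```r
library(rpart)

# kyphosis ships with rpart; it stands in for GermanCredit_train here.
set.seed(2024)
fit <- rpart(Kyphosis ~ Age + Number + Start, data = kyphosis,
             method = "class", control = rpart.control(cp = 0.001))

# fit$cptable holds the same columns as the CP table printed by summary().
cp_table <- fit$cptable
best_row <- which.min(cp_table[, "xerror"])  # row with the lowest cross-validated error
best_cp  <- cp_table[best_row, "CP"]

# Cut the tree back to the size selected by cross-validation.
pruned <- prune(fit, cp = best_cp)
print(pruned)
```

Because the cross-validation folds are random, the selected `CP` can change from run to run unless a seed is set before fitting.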
Your observation:
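# extra = 1 shows the number of observations per class in each node;
# yesno = 2 labels every split with yes/no
rpart.plot(GermanCredit_tree, extra = 1, yesno = 2)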
summary(GermanCredit_tree)
## Call:
## rpart(formula = Class ~ ., data = GermanCredit_train)
## n= 800
##
## CP nsplit rel error xerror xstd
## 1 0.03056769 0 1.0000000 1.000000 0.05582842
## 2 0.02401747 4 0.8777293 1.013100 0.05604510
## 3 0.01965066 8 0.7729258 1.004367 0.05590117
## 4 0.01310044 10 0.7336245 1.017467 0.05611630
## 5 0.01091703 19 0.6069869 1.013100 0.05604510
## 6 0.01000000 21 0.5851528 1.008734 0.05597339
##
## Variable importance
## Amount Duration
## 15 14
## CheckingAccountStatus.lt.0 CheckingAccountStatus.0.to.200
## 14 14
## Age SavingsAccountBonds.lt.100
## 7 4
## Purpose.UsedCar Job.SkilledEmployee
## 4 3
## InstallmentRatePercentage Property.RealEstate
## 3 3
## SavingsAccountBonds.100.to.500 OtherInstallmentPlans.Stores
## 3 2
## NumberExistingCredits Purpose.Business
## 2 2
## Job.UnskilledResident EmploymentDuration.lt.1
## 1 1
## OtherDebtorsGuarantors.None Telephone
## 1 1
## SavingsAccountBonds.500.to.1000 SavingsAccountBonds.gt.1000
## 1 1
## Personal.Male.Divorced.Seperated Property.Insurance
## 1 1
## Purpose.Furniture.Equipment
## 1
##
## Node number 1: 800 observations, complexity param=0.03056769
## predicted class=1 expected loss=0.28625 P(node) =1
## class counts: 229 571
## probabilities: 0.286 0.714
## left son=2 (211 obs) right son=3 (589 obs)
## Primary splits:
## CheckingAccountStatus.lt.0 < 0.5 to the right, improve=21.222720, (0 missing)
## Duration < 25.5 to the right, improve=13.584620, (0 missing)
## Amount < 10918 to the right, improve=12.537530, (0 missing)
## SavingsAccountBonds.lt.100 < 0.5 to the right, improve= 8.092071, (0 missing)
## CreditHistory.ThisBank.AllPaid < 0.5 to the right, improve= 7.040837, (0 missing)
## Surrogate splits:
## Amount < 355.5 to the left, agree=0.738, adj=0.005, (0 split)
##
## Node number 2: 211 observations, complexity param=0.03056769
## predicted class=1 expected loss=0.478673 P(node) =0.26375
## class counts: 101 110
## probabilities: 0.479 0.521
## left son=4 (178 obs) right son=5 (33 obs)
## Primary splits:
## Duration < 11.5 to the right, improve=8.373770, (0 missing)
## Amount < 4802.5 to the right, improve=4.982836, (0 missing)
## CreditHistory.Delay < 0.5 to the right, improve=3.726962, (0 missing)
## Job.SkilledEmployee < 0.5 to the right, improve=3.414315, (0 missing)
## ForeignWorker < 0.5 to the right, improve=3.382024, (0 missing)
## Surrogate splits:
## Age < 66.5 to the left, agree=0.858, adj=0.091, (0 split)
## Amount < 617.5 to the right, agree=0.853, adj=0.061, (0 split)
##
## Node number 3: 589 observations, complexity param=0.03056769
## predicted class=1 expected loss=0.2173175 P(node) =0.73625
## class counts: 128 461
## probabilities: 0.217 0.783
## left son=6 (210 obs) right son=7 (379 obs)
## Primary splits:
## CheckingAccountStatus.0.to.200 < 0.5 to the right, improve=20.662260, (0 missing)
## Amount < 10918 to the right, improve=15.274340, (0 missing)
## Duration < 25.5 to the right, improve= 8.276487, (0 missing)
## OtherInstallmentPlans.Bank < 0.5 to the right, improve= 5.258972, (0 missing)
## Age < 25.5 to the left, improve= 4.661922, (0 missing)
## Surrogate splits:
## Duration < 43.5 to the right, agree=0.660, adj=0.048, (0 split)
## Amount < 11191 to the right, agree=0.660, adj=0.048, (0 split)
## CreditHistory.NoCredit.AllPaid < 0.5 to the right, agree=0.654, adj=0.029, (0 split)
## CreditHistory.ThisBank.AllPaid < 0.5 to the right, agree=0.652, adj=0.024, (0 split)
## SavingsAccountBonds.100.to.500 < 0.5 to the right, agree=0.650, adj=0.019, (0 split)
##
## Node number 4: 178 observations, complexity param=0.02401747
## predicted class=0 expected loss=0.4606742 P(node) =0.2225
## class counts: 96 82
## probabilities: 0.539 0.461
## left son=8 (38 obs) right son=9 (140 obs)
## Primary splits:
## Duration < 31.5 to the right, improve=3.769739, (0 missing)
## Job.SkilledEmployee < 0.5 to the right, improve=3.558204, (0 missing)
## CreditHistory.Delay < 0.5 to the right, improve=2.756581, (0 missing)
## Amount < 4802.5 to the right, improve=2.493525, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=2.196990, (0 missing)
## Surrogate splits:
## Amount < 6668.5 to the right, agree=0.843, adj=0.263, (0 split)
##
## Node number 5: 33 observations
## predicted class=1 expected loss=0.1515152 P(node) =0.04125
## class counts: 5 28
## probabilities: 0.152 0.848
##
## Node number 6: 210 observations, complexity param=0.03056769
## predicted class=1 expected loss=0.3952381 P(node) =0.2625
## class counts: 83 127
## probabilities: 0.395 0.605
## left son=12 (16 obs) right son=13 (194 obs)
## Primary splits:
## Amount < 9908.5 to the right, improve=10.185580, (0 missing)
## Duration < 22.5 to the right, improve= 6.836080, (0 missing)
## Property.RealEstate < 0.5 to the left, improve= 6.773416, (0 missing)
## Housing.Own < 0.5 to the left, improve= 4.050114, (0 missing)
## Age < 25.5 to the left, improve= 2.835462, (0 missing)
##
## Node number 7: 379 observations
## predicted class=1 expected loss=0.1187335 P(node) =0.47375
## class counts: 45 334
## probabilities: 0.119 0.881
##
## Node number 8: 38 observations
## predicted class=0 expected loss=0.2631579 P(node) =0.0475
## class counts: 28 10
## probabilities: 0.737 0.263
##
## Node number 9: 140 observations, complexity param=0.02401747
## predicted class=1 expected loss=0.4857143 P(node) =0.175
## class counts: 68 72
## probabilities: 0.486 0.514
## left son=18 (129 obs) right son=19 (11 obs)
## Primary splits:
## Purpose.UsedCar < 0.5 to the left, improve=5.632780, (0 missing)
## Amount < 1377 to the left, improve=3.929252, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=3.629554, (0 missing)
## Purpose.Business < 0.5 to the left, improve=2.208009, (0 missing)
## InstallmentRatePercentage < 2.5 to the right, improve=1.545196, (0 missing)
## Surrogate splits:
## Age < 61.5 to the left, agree=0.929, adj=0.091, (0 split)
##
## Node number 12: 16 observations
## predicted class=0 expected loss=0.0625 P(node) =0.02
## class counts: 15 1
## probabilities: 0.938 0.062
##
## Node number 13: 194 observations, complexity param=0.01965066
## predicted class=1 expected loss=0.3505155 P(node) =0.2425
## class counts: 68 126
## probabilities: 0.351 0.649
## left son=26 (136 obs) right son=27 (58 obs)
## Primary splits:
## Property.RealEstate < 0.5 to the left, improve=4.281722, (0 missing)
## Duration < 22.5 to the right, improve=3.588005, (0 missing)
## Age < 25.5 to the left, improve=3.343549, (0 missing)
## CreditHistory.ThisBank.AllPaid < 0.5 to the right, improve=2.575549, (0 missing)
## OtherDebtorsGuarantors.None < 0.5 to the right, improve=2.533807, (0 missing)
## Surrogate splits:
## OtherDebtorsGuarantors.None < 0.5 to the right, agree=0.768, adj=0.224, (0 split)
## Amount < 632 to the right, agree=0.716, adj=0.052, (0 split)
## Age < 20.5 to the right, agree=0.706, adj=0.017, (0 split)
## OtherDebtorsGuarantors.CoApplicant < 0.5 to the left, agree=0.706, adj=0.017, (0 split)
## Job.UnskilledResident < 0.5 to the left, agree=0.706, adj=0.017, (0 split)
##
## Node number 18: 129 observations, complexity param=0.02401747
## predicted class=0 expected loss=0.4728682 P(node) =0.16125
## class counts: 68 61
## probabilities: 0.527 0.473
## left son=36 (121 obs) right son=37 (8 obs)
## Primary splits:
## Purpose.Business < 0.5 to the left, improve=2.758425, (0 missing)
## Amount < 1377 to the left, improve=2.425020, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=2.289513, (0 missing)
## InstallmentRatePercentage < 2.5 to the right, improve=1.865086, (0 missing)
## Age < 30.5 to the right, improve=1.542534, (0 missing)
##
## Node number 19: 11 observations
## predicted class=1 expected loss=0 P(node) =0.01375
## class counts: 0 11
## probabilities: 0.000 1.000
##
## Node number 26: 136 observations, complexity param=0.01965066
## predicted class=1 expected loss=0.4191176 P(node) =0.17
## class counts: 57 79
## probabilities: 0.419 0.581
## left son=52 (31 obs) right son=53 (105 obs)
## Primary splits:
## Age < 25.5 to the left, improve=4.103230, (0 missing)
## Personal.Male.Single < 0.5 to the left, improve=3.308824, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=3.045537, (0 missing)
## Housing.Rent < 0.5 to the right, improve=2.499899, (0 missing)
## Amount < 931.5 to the left, improve=2.272952, (0 missing)
## Surrogate splits:
## OtherDebtorsGuarantors.None < 0.5 to the left, agree=0.794, adj=0.097, (0 split)
## Duration < 54 to the right, agree=0.779, adj=0.032, (0 split)
## Amount < 546.5 to the left, agree=0.779, adj=0.032, (0 split)
##
## Node number 27: 58 observations, complexity param=0.01310044
## predicted class=1 expected loss=0.1896552 P(node) =0.0725
## class counts: 11 47
## probabilities: 0.190 0.810
## left son=54 (7 obs) right son=55 (51 obs)
## Primary splits:
## Duration < 22 to the right, improve=4.3822080, (0 missing)
## Age < 31.5 to the left, improve=2.1545090, (0 missing)
## OtherDebtorsGuarantors.None < 0.5 to the right, improve=1.5894910, (0 missing)
## Amount < 1221.5 to the right, improve=1.1907440, (0 missing)
## Purpose.Furniture.Equipment < 0.5 to the right, improve=0.9088187, (0 missing)
## Surrogate splits:
## OtherInstallmentPlans.Stores < 0.5 to the right, agree=0.914, adj=0.286, (0 split)
## Personal.Male.Divorced.Seperated < 0.5 to the right, agree=0.897, adj=0.143, (0 split)
##
## Node number 36: 121 observations, complexity param=0.02401747
## predicted class=0 expected loss=0.446281 P(node) =0.15125
## class counts: 67 54
## probabilities: 0.554 0.446
## left son=72 (82 obs) right son=73 (39 obs)
## Primary splits:
## InstallmentRatePercentage < 2.5 to the right, improve=2.368882, (0 missing)
## Purpose.Furniture.Equipment < 0.5 to the left, improve=2.368882, (0 missing)
## Amount < 1577.5 to the left, improve=2.144262, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=1.585437, (0 missing)
## OtherDebtorsGuarantors.None < 0.5 to the right, improve=1.149010, (0 missing)
## Surrogate splits:
## Amount < 3571 to the left, agree=0.744, adj=0.205, (0 split)
## Personal.Male.Divorced.Seperated < 0.5 to the left, agree=0.744, adj=0.205, (0 split)
## Duration < 29 to the left, agree=0.686, adj=0.026, (0 split)
## NumberExistingCredits < 2.5 to the left, agree=0.686, adj=0.026, (0 split)
## Purpose.Furniture.Equipment < 0.5 to the left, agree=0.686, adj=0.026, (0 split)
##
## Node number 37: 8 observations
## predicted class=1 expected loss=0.125 P(node) =0.01
## class counts: 1 7
## probabilities: 0.125 0.875
##
## Node number 52: 31 observations
## predicted class=0 expected loss=0.3548387 P(node) =0.03875
## class counts: 20 11
## probabilities: 0.645 0.355
##
## Node number 53: 105 observations, complexity param=0.01310044
## predicted class=1 expected loss=0.352381 P(node) =0.13125
## class counts: 37 68
## probabilities: 0.352 0.648
## left son=106 (52 obs) right son=107 (53 obs)
## Primary splits:
## SavingsAccountBonds.lt.100 < 0.5 to the right, improve=2.455014, (0 missing)
## Age < 48.5 to the right, improve=1.981837, (0 missing)
## Amount < 931.5 to the left, improve=1.964626, (0 missing)
## Housing.Own < 0.5 to the left, improve=1.689451, (0 missing)
## Personal.Male.Single < 0.5 to the left, improve=1.658566, (0 missing)
## Surrogate splits:
## SavingsAccountBonds.100.to.500 < 0.5 to the left, agree=0.724, adj=0.442, (0 split)
## Job.SkilledEmployee < 0.5 to the left, agree=0.610, adj=0.212, (0 split)
## Age < 31.5 to the right, agree=0.600, adj=0.192, (0 split)
## Telephone < 0.5 to the left, agree=0.600, adj=0.192, (0 split)
## Purpose.Furniture.Equipment < 0.5 to the right, agree=0.571, adj=0.135, (0 split)
##
## Node number 54: 7 observations
## predicted class=0 expected loss=0.2857143 P(node) =0.00875
## class counts: 5 2
## probabilities: 0.714 0.286
##
## Node number 55: 51 observations
## predicted class=1 expected loss=0.1176471 P(node) =0.06375
## class counts: 6 45
## probabilities: 0.118 0.882
##
## Node number 72: 82 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.3780488 P(node) =0.1025
## class counts: 51 31
## probabilities: 0.622 0.378
## left son=144 (40 obs) right son=145 (42 obs)
## Primary splits:
## Amount < 1577.5 to the left, improve=1.658595, (0 missing)
## Telephone < 0.5 to the right, improve=1.449397, (0 missing)
## Purpose.NewCar < 0.5 to the right, improve=1.370800, (0 missing)
## Purpose.Radio.Television < 0.5 to the left, improve=1.132404, (0 missing)
## Age < 55 to the left, improve=1.081246, (0 missing)
## Surrogate splits:
## Purpose.Furniture.Equipment < 0.5 to the left, agree=0.646, adj=0.275, (0 split)
## Duration < 16.5 to the left, agree=0.634, adj=0.250, (0 split)
## InstallmentRatePercentage < 3.5 to the right, agree=0.622, adj=0.225, (0 split)
## Telephone < 0.5 to the right, agree=0.622, adj=0.225, (0 split)
## Personal.Male.Single < 0.5 to the left, agree=0.622, adj=0.225, (0 split)
##
## Node number 73: 39 observations, complexity param=0.01310044
## predicted class=1 expected loss=0.4102564 P(node) =0.04875
## class counts: 16 23
## probabilities: 0.410 0.590
## left son=146 (26 obs) right son=147 (13 obs)
## Primary splits:
## Duration < 15.5 to the right, improve=2.564103, (0 missing)
## Telephone < 0.5 to the left, improve=1.538462, (0 missing)
## EmploymentDuration.lt.1 < 0.5 to the right, improve=1.538462, (0 missing)
## Age < 30.5 to the right, improve=1.257664, (0 missing)
## Amount < 1961.5 to the right, improve=1.189036, (0 missing)
## Surrogate splits:
## Amount < 1828.5 to the right, agree=0.846, adj=0.538, (0 split)
## Age < 35 to the left, agree=0.692, adj=0.077, (0 split)
##
## Node number 106: 52 observations, complexity param=0.01310044
## predicted class=1 expected loss=0.4615385 P(node) =0.065
## class counts: 24 28
## probabilities: 0.462 0.538
## left son=212 (32 obs) right son=213 (20 obs)
## Primary splits:
## NumberExistingCredits < 1.5 to the left, improve=2.908654, (0 missing)
## Duration < 28.5 to the right, improve=2.447658, (0 missing)
## Age < 35.5 to the right, improve=1.846154, (0 missing)
## ResidenceDuration < 1.5 to the right, improve=1.246671, (0 missing)
## EmploymentDuration.gt.7 < 0.5 to the right, improve=1.231775, (0 missing)
## Surrogate splits:
## Age < 27.5 to the right, agree=0.673, adj=0.15, (0 split)
## InstallmentRatePercentage < 1.5 to the right, agree=0.654, adj=0.10, (0 split)
## CreditHistory.PaidDuly < 0.5 to the right, agree=0.654, adj=0.10, (0 split)
## OtherInstallmentPlans.Stores < 0.5 to the left, agree=0.654, adj=0.10, (0 split)
## Job.UnemployedUnskilled < 0.5 to the left, agree=0.654, adj=0.10, (0 split)
##
## Node number 107: 53 observations, complexity param=0.01091703
## predicted class=1 expected loss=0.245283 P(node) =0.06625
## class counts: 13 40
## probabilities: 0.245 0.755
## left son=214 (24 obs) right son=215 (29 obs)
## Primary splits:
## SavingsAccountBonds.100.to.500 < 0.5 to the right, improve=2.576664, (0 missing)
## Amount < 1930 to the left, improve=1.693587, (0 missing)
## EmploymentDuration.lt.1 < 0.5 to the right, improve=1.611103, (0 missing)
## EmploymentDuration.1.to.4 < 0.5 to the left, improve=1.334922, (0 missing)
## NumberExistingCredits < 1.5 to the right, improve=1.280380, (0 missing)
## Surrogate splits:
## Purpose.NewCar < 0.5 to the right, agree=0.679, adj=0.292, (0 split)
## EmploymentDuration.1.to.4 < 0.5 to the left, agree=0.660, adj=0.250, (0 split)
## EmploymentDuration.lt.1 < 0.5 to the right, agree=0.642, adj=0.208, (0 split)
## Age < 31.5 to the left, agree=0.604, adj=0.125, (0 split)
## InstallmentRatePercentage < 1.5 to the left, agree=0.585, adj=0.083, (0 split)
##
## Node number 144: 40 observations
## predicted class=0 expected loss=0.275 P(node) =0.05
## class counts: 29 11
## probabilities: 0.725 0.275
##
## Node number 145: 42 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.4761905 P(node) =0.0525
## class counts: 22 20
## probabilities: 0.524 0.476
## left son=290 (30 obs) right son=291 (12 obs)
## Primary splits:
## Amount < 2135.5 to the right, improve=2.5190480, (0 missing)
## SavingsAccountBonds.lt.100 < 0.5 to the right, improve=2.0836940, (0 missing)
## Housing.Own < 0.5 to the left, improve=1.2857140, (0 missing)
## Age < 24.5 to the right, improve=0.9523810, (0 missing)
## NumberExistingCredits < 1.5 to the right, improve=0.6857143, (0 missing)
## Surrogate splits:
## ForeignWorker < 0.5 to the right, agree=0.738, adj=0.083, (0 split)
##
## Node number 146: 26 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.4615385 P(node) =0.0325
## class counts: 14 12
## probabilities: 0.538 0.462
## left son=292 (12 obs) right son=293 (14 obs)
## Primary splits:
## Amount < 3506.5 to the left, improve=1.9945050, (0 missing)
## Duration < 19 to the left, improve=1.3594410, (0 missing)
## EmploymentDuration.lt.1 < 0.5 to the right, improve=1.0341880, (0 missing)
## Age < 30.5 to the right, improve=0.8480769, (0 missing)
## Property.CarOther < 0.5 to the left, improve=0.6175214, (0 missing)
## Surrogate splits:
## Age < 26.5 to the left, agree=0.731, adj=0.417, (0 split)
## Duration < 19 to the left, agree=0.654, adj=0.250, (0 split)
## Telephone < 0.5 to the right, agree=0.654, adj=0.250, (0 split)
## Purpose.Radio.Television < 0.5 to the right, agree=0.654, adj=0.250, (0 split)
## Housing.Rent < 0.5 to the right, agree=0.654, adj=0.250, (0 split)
##
## Node number 147: 13 observations
## predicted class=1 expected loss=0.1538462 P(node) =0.01625
## class counts: 2 11
## probabilities: 0.154 0.846
##
## Node number 212: 32 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.40625 P(node) =0.04
## class counts: 19 13
## probabilities: 0.594 0.406
## left son=424 (22 obs) right son=425 (10 obs)
## Primary splits:
## Age < 32.5 to the right, improve=2.5102270, (0 missing)
## Duration < 11.5 to the right, improve=1.7003570, (0 missing)
## CreditHistory.PaidDuly < 0.5 to the left, improve=1.1002450, (0 missing)
## Amount < 1383 to the left, improve=0.8481280, (0 missing)
## EmploymentDuration.gt.7 < 0.5 to the right, improve=0.5976732, (0 missing)
## Surrogate splits:
## EmploymentDuration.lt.1 < 0.5 to the left, agree=0.812, adj=0.4, (0 split)
## Duration < 7.5 to the right, agree=0.719, adj=0.1, (0 split)
## CreditHistory.PaidDuly < 0.5 to the left, agree=0.719, adj=0.1, (0 split)
##
## Node number 213: 20 observations
## predicted class=1 expected loss=0.25 P(node) =0.025
## class counts: 5 15
## probabilities: 0.250 0.750
##
## Node number 214: 24 observations, complexity param=0.01091703
## predicted class=1 expected loss=0.4166667 P(node) =0.03
## class counts: 10 14
## probabilities: 0.417 0.583
## left son=428 (7 obs) right son=429 (17 obs)
## Primary splits:
## Job.SkilledEmployee < 0.5 to the left, improve=3.8347340, (0 missing)
## Amount < 2814.5 to the left, improve=1.9603730, (0 missing)
## Age < 31.5 to the right, improve=1.1523810, (0 missing)
## Property.CarOther < 0.5 to the left, improve=1.1523810, (0 missing)
## NumberExistingCredits < 1.5 to the right, improve=0.6736597, (0 missing)
## Surrogate splits:
## Job.UnskilledResident < 0.5 to the right, agree=0.875, adj=0.571, (0 split)
## Amount < 6499 to the right, agree=0.833, adj=0.429, (0 split)
## InstallmentRatePercentage < 1.5 to the left, agree=0.792, adj=0.286, (0 split)
## Property.Insurance < 0.5 to the right, agree=0.792, adj=0.286, (0 split)
## OtherInstallmentPlans.Stores < 0.5 to the right, agree=0.792, adj=0.286, (0 split)
##
## Node number 215: 29 observations
## predicted class=1 expected loss=0.1034483 P(node) =0.03625
## class counts: 3 26
## probabilities: 0.103 0.897
##
## Node number 290: 30 observations, complexity param=0.01310044
## predicted class=0 expected loss=0.3666667 P(node) =0.0375
## class counts: 19 11
## probabilities: 0.633 0.367
## left son=580 (22 obs) right son=581 (8 obs)
## Primary splits:
## SavingsAccountBonds.lt.100 < 0.5 to the right, improve=3.206061, (0 missing)
## Purpose.Radio.Television < 0.5 to the left, improve=1.456061, (0 missing)
## Age < 28.5 to the right, improve=1.354148, (0 missing)
## Duration < 15 to the left, improve=1.274242, (0 missing)
## NumberExistingCredits < 1.5 to the right, improve=1.274242, (0 missing)
## Surrogate splits:
## SavingsAccountBonds.500.to.1000 < 0.5 to the left, agree=0.833, adj=0.375, (0 split)
## SavingsAccountBonds.gt.1000 < 0.5 to the left, agree=0.833, adj=0.375, (0 split)
## Age < 22.5 to the right, agree=0.800, adj=0.250, (0 split)
## OtherInstallmentPlans.Stores < 0.5 to the left, agree=0.800, adj=0.250, (0 split)
## Duration < 29 to the left, agree=0.767, adj=0.125, (0 split)
##
## Node number 291: 12 observations
## predicted class=1 expected loss=0.25 P(node) =0.015
## class counts: 3 9
## probabilities: 0.250 0.750
##
## Node number 292: 12 observations
## predicted class=0 expected loss=0.25 P(node) =0.015
## class counts: 9 3
## probabilities: 0.750 0.250
##
## Node number 293: 14 observations
## predicted class=1 expected loss=0.3571429 P(node) =0.0175
## class counts: 5 9
## probabilities: 0.357 0.643
##
## Node number 424: 22 observations
## predicted class=0 expected loss=0.2727273 P(node) =0.0275
## class counts: 16 6
## probabilities: 0.727 0.273
##
## Node number 425: 10 observations
## predicted class=1 expected loss=0.3 P(node) =0.0125
## class counts: 3 7
## probabilities: 0.300 0.700
##
## Node number 428: 7 observations
## predicted class=0 expected loss=0.1428571 P(node) =0.00875
## class counts: 6 1
## probabilities: 0.857 0.143
##
## Node number 429: 17 observations
## predicted class=1 expected loss=0.2352941 P(node) =0.02125
## class counts: 4 13
## probabilities: 0.235 0.765
##
## Node number 580: 22 observations
## predicted class=0 expected loss=0.2272727 P(node) =0.0275
## class counts: 17 5
## probabilities: 0.773 0.227
##
## Node number 581: 8 observations
## predicted class=1 expected loss=0.25 P(node) =0.01
## class counts: 2 6
## probabilities: 0.250 0.750
# Predicted class probabilities on the training set (the default output for a classification tree)
GermanCredit_pred.train <- predict(GermanCredit_tree, GermanCredit_train)
summary(GermanCredit_pred.train)
## 0 1
## Min. :0.0000 Min. :0.0625
## 1st Qu.:0.1187 1st Qu.:0.6429
## Median :0.1187 Median :0.8813
## Mean :0.2863 Mean :0.7137
## 3rd Qu.:0.3571 3rd Qu.:0.8813
## Max. :0.9375 Max. :1.0000
# predictions on the training set with predicted classes
GermanCredit_pred.train <- predict(GermanCredit_tree, GermanCredit_train, type = "class")
GermanCredit_pred.train
## 578 549 557 700 255 913 621 416 105 634 738 29 11 784 925 62
## 1 0 1 1 0 0 1 1 1 1 1 1 0 0 1 1
## 252 398 930 26 172 562 410 32 725 385 203 35 361 238 593 284
## 1 0 0 1 1 0 1 1 1 1 1 1 1 0 1 1
## 304 216 596 476 852 427 442 884 276 951 87 505 997 618 892 900
## 1 1 0 0 1 1 1 1 1 1 1 0 1 1 1 1
## 647 948 441 336 212 835 281 290 217 825 817 310 858 643 153 705
## 1 1 1 1 1 1 1 0 1 1 1 1 1 1 1 1
## 6 788 393 719 717 464 963 354 186 305 627 108 261 720 902 131
## 1 1 0 1 1 1 1 0 1 1 1 1 1 1 1 0
## 938 459 723 414 329 189 259 541 954 747 960 445 334 528 548 209
## 1 1 0 1 1 0 1 1 1 1 1 1 1 1 1 1
## 585 935 752 118 891 402 875 674 147 652 834 873 987 173 702 454
## 1 0 0 1 1 1 1 1 1 0 1 1 1 0 0 1
## 68 543 795 113 463 827 932 736 483 635 943 504 888 94 446 765
## 1 1 1 1 1 1 0 1 1 0 1 1 0 1 1 1
## 982 270 715 457 661 706 266 896 346 34 625 187 1000 411 976 901
## 1 1 0 1 1 1 0 1 1 1 0 0 1 0 1 0
## 737 770 611 109 999 826 805 469 897 369 119 568 789 676 576 766
## 0 1 0 1 0 1 1 1 0 0 0 1 0 1 1 1
## 80 31 425 278 868 899 642 269 586 321 51 249 856 818 185 641
## 0 1 0 1 1 1 1 1 0 1 1 1 1 1 0 0
## 808 247 776 16 955 133 679 513 387 206 24 600 649 348 846 995
## 1 1 0 0 1 1 0 1 1 1 1 1 1 0 1 1
## 60 388 666 980 292 275 664 675 477 927 871 421 25 712 154 520
## 0 1 1 0 1 1 1 1 1 0 1 1 1 1 1 1
## 861 316 589 326 65 350 314 553 778 103 159 920 673 265 754 115
## 1 0 0 1 1 1 0 0 0 1 1 0 1 1 1 1
## 59 564 508 225 830 709 224 638 409 175 521 946 461 95 244 204
## 1 0 0 1 1 1 1 1 1 0 1 1 0 1 1 0
## 364 669 792 619 467 245 917 991 139 640 929 768 144 613 468 135
## 1 0 1 0 1 1 1 1 1 0 1 1 0 1 1 1
## 362 122 535 531 798 620 90 176 975 478 178 489 179 610 104 487
## 1 1 1 1 1 1 0 1 1 1 1 1 1 1 0 1
## 263 599 831 242 887 366 71 384 340 591 291 220 594 527 228 970
## 1 1 1 1 1 1 1 1 1 0 1 1 0 1 0 1
## 500 219 419 730 726 854 672 306 268 449 761 77 150 615 222 289
## 1 0 1 1 1 0 1 1 1 1 1 0 1 1 0 1
## 860 435 437 962 933 996 202 78 655 70 785 947 658 93 941 998
## 1 1 1 1 1 1 0 0 1 1 1 1 1 1 1 1
## 481 685 495 880 967 96 235 412 968 491 315 277 240 58 308 569
## 1 0 1 1 0 0 1 1 1 1 1 1 0 1 0 1
## 213 237 196 735 84 479 694 499 574 550 584 756 341 125 200 691
## 1 0 1 1 1 1 1 1 1 1 0 0 0 1 0 1
## 355 839 554 501 42 894 563 952 471 684 432 359 128 763 631 54
## 1 1 1 0 1 1 1 0 0 1 0 1 1 0 1 1
## 916 551 254 949 786 182 28 874 49 188 984 232 210 807 799 405
## 0 1 1 1 1 0 1 1 1 1 0 1 1 1 1 1
## 50 974 510 161 841 30 815 886 624 130 708 524 745 390 710 327
## 1 0 1 1 0 0 0 0 1 0 0 1 0 1 1 1
## 389 760 829 403 466 429 299 170 408 668 297 395 363 287 86 677
## 1 0 0 1 1 1 1 0 0 1 1 1 1 0 1 1
## 865 570 253 136 703 956 804 248 571 332 124 796 191 66 688 488
## 1 0 0 1 1 0 1 1 0 1 1 1 1 1 1 1
## 958 211 511 582 813 285 264 626 522 651 3 680 881 803 988 904
## 1 1 0 1 0 1 1 1 0 0 1 1 1 1 1 1
## 678 729 117 19 689 298 580 507 101 914 250 465 7 877 957 37
## 0 0 0 0 1 1 1 1 1 1 1 1 1 1 1 1
## 451 309 184 323 836 39 490 503 692 134 722 8 15 971 99 663
## 1 1 1 1 0 1 1 1 1 1 0 0 1 1 0 1
## 426 138 417 573 221 201 246 629 622 73 157 538 43 882 516 79
## 0 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1
## 227 903 462 812 950 231 75 140 711 989 749 698 1 607 923 819
## 0 1 1 1 1 1 0 1 1 1 1 1 1 1 1 0
## 994 283 205 842 274 849 351 145 386 783 226 360 373 575 52 48
## 0 1 1 1 0 1 1 1 1 1 1 0 1 1 1 1
## 707 683 840 714 879 324 660 151 837 937 727 375 605 18 482 33
## 0 1 1 1 1 1 0 1 1 1 1 0 1 1 1 1
## 695 169 98 572 744 966 567 512 823 759 579 912 517 530 866 422
## 1 1 1 1 1 1 1 1 0 1 1 0 1 1 1 1
## 905 152 258 302 197 177 450 713 751 936 368 654 293 907 160 322
## 1 1 0 0 1 1 1 1 1 0 1 0 1 1 1 1
## 486 732 547 833 271 539 940 780 637 337 27 870 61 944 746 764
## 0 0 1 0 1 0 1 0 1 0 1 0 1 1 1 1
## 595 379 36 116 383 88 519 431 127 223 146 965 614 241 972 22
## 0 0 0 1 1 0 1 1 0 1 1 0 1 0 1 1
## 47 632 657 129 328 21 38 928 990 267 869 229 53 338 993 92
## 1 0 0 1 1 1 1 0 1 1 1 1 1 0 0 1
## 514 617 779 319 55 604 606 979 162 142 301 367 243 194 311 494
## 1 1 1 1 1 1 1 1 1 1 1 1 0 1 1 1
## 828 256 910 370 644 400 609 294 452 413 851 750 601 908 774 757
## 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1
## 645 895 392 347 401 820 493 876 166 682 515 498 755 148 455 646
## 1 1 1 1 1 0 1 1 1 1 1 1 1 1 1 1
## 855 506 475 372 2 85 959 342 537 824 656 848 650 295 898 5
## 1 1 0 1 0 1 0 1 1 1 1 1 0 1 1 0
## 406 616 438 257 378 404 953 667 801 806 439 565 782 460 436 76
## 0 0 1 1 1 1 1 1 1 0 0 1 1 1 1 1
## 791 890 889 83 811 365 509 546 282 357 448 909 121 345 9 536
## 0 1 1 1 1 0 1 0 1 1 1 1 1 1 1 1
## 12 356 193 325 192 317 344 163 181 485 198 864 40 353 718 214
## 0 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1
## 969 561 158 333 560 648 636 787 981 132 559 190 853 918 773 234
## 1 1 0 0 1 1 1 1 0 0 0 1 1 1 1 1
## 693 123 985 734 724 300 566 623 82 46 590 800 296 444 81 423
## 1 1 1 1 1 1 0 1 1 1 0 1 0 1 1 1
## 330 977 961 97 23 4 838 931 922 382 72 687 681 14 696 358
## 1 1 1 1 1 0 1 1 1 0 1 1 1 0 1 1
## 168 456 313 832 542 111 407 484 612 628 639 518 307 339 492 767
## 1 1 1 0 1 1 1 1 1 1 1 1 1 0 1 0
## Levels: 0 1
# Create confusion matrix
confusion_train <- table(true = GermanCredit_train$Class, pred = GermanCredit_pred.train)
confusion_train
## pred
## true 0 1
## 0 145 84
## 1 50 521
# Calculate the Misclassification Rate (MR)
MR_train <- 1 - sum(diag(confusion_train)) / sum(confusion_train)
MR_train
## [1] 0.1675
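Your observation:
The training MR of 0.1675 means the tree misclassifies 16.75% of the training observations (134 of 800), for an accuracy of 83.25%. The errors are not symmetric: 84 of the 229 bad credits (class 0) are predicted as good, while only 50 of the 571 good credits (class 1) are predicted as bad. Because this rate is measured on the same data used to grow the tree, it is an optimistic estimate of performance on new data.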
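The overall MR hides how the errors are distributed across the two classes. As a minimal sketch, the training confusion matrix printed above can be re-entered by hand and broken down per class. The names `confusion`, `mr`, and `class_error` are illustrative and not part of the original analysis.

```r
# Training confusion matrix, re-entered from the output above
# (rows = true class, columns = predicted class; 0 = bad, 1 = good).
confusion <- matrix(c(145,  84,
                       50, 521),
                    nrow = 2, byrow = TRUE,
                    dimnames = list(true = c("0", "1"), pred = c("0", "1")))

# Misclassification rate: share of off-diagonal counts.
mr <- 1 - sum(diag(confusion)) / sum(confusion)

# Per-class error: share of each true class that is misclassified.
class_error <- 1 - diag(confusion) / rowSums(confusion)

mr
class_error
```

The per-class view matters for credit scoring because predicting a bad credit as good is usually the more costly mistake.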
# Use the testing set to predict classes
GermanCredit_pred_test <- predict(GermanCredit_tree, GermanCredit_test, type = "class")
GermanCredit_pred_test
## 10 13 17 20 41 44 45 56 57 63 64 67 69 74 89 91 100 102 106 107
## 1 0 1 1 1 1 0 1 1 0 0 1 1 0 1 1 1 0 0 1
## 110 112 114 120 126 137 141 143 149 155 156 164 165 167 171 174 180 183 195 199
## 1 1 1 1 1 1 1 0 0 0 1 0 1 0 0 1 0 1 0 1
## 207 208 215 218 230 233 236 239 251 260 262 272 273 279 280 286 288 303 312 318
## 1 1 1 1 0 1 1 1 1 1 1 1 0 1 1 0 0 1 1 1
## 320 331 335 343 349 352 371 374 376 377 380 381 391 394 396 397 399 415 418 420
## 1 1 0 1 1 1 1 1 0 1 1 1 1 1 0 0 1 0 1 1
## 424 428 430 433 434 440 443 447 453 458 470 472 473 474 480 496 497 502 523 525
## 1 1 0 1 1 1 1 0 1 1 1 1 1 1 0 0 0 0 0 1
## 526 529 532 533 534 540 544 545 552 555 556 558 577 581 583 587 588 592 597 598
## 1 0 1 1 1 1 1 1 1 1 0 1 1 1 1 1 0 1 0 0
## 602 603 608 630 633 653 659 662 665 670 671 686 690 697 699 701 704 716 721 728
## 1 0 1 1 0 0 1 0 1 1 1 1 1 1 1 1 1 1 1 1
## 731 733 739 740 741 742 743 748 753 758 762 769 771 772 775 777 781 790 793 794
## 0 1 1 0 0 1 1 0 1 1 1 1 1 0 1 1 0 0 1 1
## 797 802 809 810 814 816 821 822 843 844 845 847 850 857 859 862 863 867 872 878
## 1 1 0 1 0 0 1 1 1 1 1 1 0 1 0 1 0 0 1 1
## 883 885 893 906 911 915 919 921 924 926 934 939 942 945 964 973 978 983 986 992
## 0 0 0 1 1 1 0 1 1 0 1 0 1 1 1 0 1 1 0 1
## Levels: 0 1
# Confusion matrix for the testing set
confusion_test <- table(true = GermanCredit_test$Class, pred = GermanCredit_pred_test)
confusion_test
## pred
## true 0 1
## 0 36 35
## 1 26 103
# Calculate the MR for the testing set
MR_test <- 1 - sum(diag(confusion_test)) / sum(confusion_test)
MR_test
## [1] 0.305
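Your observation: The testing MR of 0.305 means that 30.5% of the testing observations (61 of 200) were misclassified, so the tree classifies 69.5% of them correctly. This is only modestly better than always predicting class 1, which would be correct for 129 of the 200 testing observations (64.5%). The gap to the training MR of 0.1675 suggests that the tree overfits the training data. Pruning the tree, or using alternative models such as random forests or SVM, may improve performance.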
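One way to put the testing MR in context is to compare it with a trivial rule that always predicts the most frequent class. The sketch below re-enters the testing confusion matrix from the output above. `test_counts`, `mr_tree`, and `mr_baseline` are illustrative names, and the baseline is an added point of comparison rather than part of the original assignment.

```r
# Testing confusion matrix, re-entered from the output above
# (rows = true class, columns = predicted class; 0 = bad, 1 = good).
test_counts <- matrix(c(36,  35,
                        26, 103),
                      nrow = 2, byrow = TRUE,
                      dimnames = list(true = c("0", "1"), pred = c("0", "1")))

# Misclassification rate of the tree on the testing set.
mr_tree <- 1 - sum(diag(test_counts)) / sum(test_counts)

# Baseline: always predict the most frequent true class in the testing set.
class_totals <- rowSums(test_counts)
mr_baseline <- 1 - max(class_totals) / sum(class_totals)

c(tree = mr_tree, baseline = mr_baseline)
```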
library(ROCR)
# Obtain predicted probabilities
GermanCredit_pred_prob_test <- predict(GermanCredit_tree, GermanCredit_test, type = "prob")[, 2]
# Generate prediction
pred_test <- prediction(GermanCredit_pred_prob_test, GermanCredit_test$Class)
roc_test <- performance(pred_test, "tpr", "fpr")
# ROC curve
plot(roc_test, colorize = TRUE, main = "ROC Curve Testing Set")
# Calculate and display AUC
auc_test <- performance(pred_test, "auc")
auc_test_value <- unlist(slot(auc_test, "y.values"))
auc_test_value
## [1] 0.6742548
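The AUC can be read as the probability that a randomly chosen good credit receives a higher predicted probability than a randomly chosen bad credit, with ties counted as one half. The sketch below illustrates that interpretation in base R using the rank-sum (Mann-Whitney) form. The scores and labels are made up for illustration and are not the German credit data, and `auc_rank` is a hypothetical helper, not a ROCR function.

```r
# AUC via the rank-sum (Mann-Whitney) identity.
# scores: predicted probability of class 1; labels: true class (0 or 1).
auc_rank <- function(scores, labels) {
  n_pos <- sum(labels == 1)
  n_neg <- sum(labels == 0)
  r <- rank(scores)  # average ranks, so tied scores count as one half
  (sum(r[labels == 1]) - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)
}

# Toy example (illustrative values only).
scores <- c(0.9, 0.8, 0.7, 0.6, 0.55, 0.4)
labels <- c(1, 1, 0, 1, 0, 0)

# 8 of the 9 positive-negative pairs are ordered correctly.
auc_rank(scores, labels)
```

Ties matter for a tree because every observation in the same leaf receives the same predicted probability.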
We will use the built-in mtcars dataset to predict miles per gallon (mpg) using other car characteristics. The dataset includes information about 32 cars from Motor Trend magazine (1973-74).
# Load the mtcars dataset
data(mtcars)
# Display the structure of the dataset
str(mtcars)
## 'data.frame': 32 obs. of 11 variables:
## $ mpg : num 21 21 22.8 21.4 18.7 18.1 14.3 24.4 22.8 19.2 ...
## $ cyl : num 6 6 4 6 8 6 8 4 4 6 ...
## $ disp: num 160 160 108 258 360 ...
## $ hp : num 110 110 93 110 175 105 245 62 95 123 ...
## $ drat: num 3.9 3.9 3.85 3.08 3.15 2.76 3.21 3.69 3.92 3.92 ...
## $ wt : num 2.62 2.88 2.32 3.21 3.44 ...
## $ qsec: num 16.5 17 18.6 19.4 17 ...
## $ vs : num 0 0 1 1 0 1 0 1 1 1 ...
## $ am : num 1 1 1 0 0 0 0 0 0 0 ...
## $ gear: num 4 4 4 3 3 3 3 4 4 4 ...
## $ carb: num 4 4 1 1 2 1 4 2 2 4 ...
set.seed(2024)
# Splitting
index <- sample(1:nrow(mtcars), size = floor(0.85 * nrow(mtcars)))
mtcars_train <- mtcars[index, ]
mtcars_test <- mtcars[-index, ]
# Display the dimensions
dim(mtcars_train)
## [1] 27 11
dim(mtcars_test)
## [1] 5 11
library(rpart)
# regression tree
mtcars_tree <- rpart(mpg ~ ., data = mtcars_train, method = "anova")
summary(mtcars_tree)
## Call:
## rpart(formula = mpg ~ ., data = mtcars_train, method = "anova")
## n= 27
##
## CP nsplit rel error xerror xstd
## 1 0.6121479 0 1.0000000 1.1010927 0.2728919
## 2 0.0100000 1 0.3878521 0.7898291 0.1866011
##
## Variable importance
## cyl disp hp qsec vs wt
## 20 18 18 14 14 14
##
## Node number 1: 27 observations, complexity param=0.6121479
## mean=20.50741, MSE=37.20809
## left son=2 (17 obs) right son=3 (10 obs)
## Primary splits:
## cyl < 5 to the right, improve=0.6121479, (0 missing)
## hp < 118 to the right, improve=0.6068166, (0 missing)
## wt < 3.325 to the right, improve=0.5916267, (0 missing)
## disp < 120.55 to the right, improve=0.5838435, (0 missing)
## vs < 0.5 to the left, improve=0.5158466, (0 missing)
## Surrogate splits:
## disp < 142.9 to the right, agree=0.963, adj=0.9, (0 split)
## hp < 109.5 to the right, agree=0.963, adj=0.9, (0 split)
## wt < 2.5425 to the right, agree=0.889, adj=0.7, (0 split)
## qsec < 18.41 to the left, agree=0.889, adj=0.7, (0 split)
## vs < 0.5 to the left, agree=0.889, adj=0.7, (0 split)
##
## Node number 2: 17 observations
## mean=16.84706, MSE=10.98484
##
## Node number 3: 10 observations
## mean=26.73, MSE=20.2901
library(rpart.plot)
# Plot the tree
rpart.plot(mtcars_tree, type = 3, digits = 2, fallen.leaves = TRUE, main = "Regression Tree for mpg")
mtcars_tree
## n= 27
##
## node), split, n, deviance, yval
## * denotes terminal node
##
## 1) root 27 1004.6190 20.50741
## 2) cyl>=5 17 186.7424 16.84706 *
## 3) cyl< 5 10 202.9010 26.73000 *
Your observation:
- Root node: contains all 27 training observations, with a mean mpg of 20.51 and a total deviance of 1004.62.
- Split: the tree makes a single split on cyl, dividing the cars into those with cyl >= 5 and cyl < 5.
- Node 2 (cyl >= 5): contains 17 cars, with a mean mpg of 16.85 and a reduced deviance of 186.74. This is a terminal node.
- Node 3 (cyl < 5): contains 10 cars, with a mean mpg of 26.73 and a reduced deviance of 202.90. This is also a terminal node.
- Overall: cyl is the most important predictor, effectively splitting the data into two distinct groups based on mpg, with cars having fewer cylinders achieving higher fuel efficiency.
# predictions on the training set
mtcars_train_predictions <- predict(mtcars_tree, mtcars_train)
# Calculate MSE
MSE_train <- mean((mtcars_train$mpg - mtcars_train_predictions)^2)
MSE_train
## [1] 14.43124
# Calculate R-squared
SS_total <- sum((mtcars_train$mpg - mean(mtcars_train$mpg))^2)
SS_residual <- sum((mtcars_train$mpg - mtcars_train_predictions)^2)
R_squared_train <- 1 - (SS_residual / SS_total)
R_squared_train
## [1] 0.6121479
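Both training metrics can be cross-checked against the deviances in the printed tree: for an anova tree, the deviance of a node is its sum of squared errors around the node mean. The sketch below copies the three deviances from the printed tree. The variable names are illustrative.

```r
# Deviances copied from the printed tree above.
dev_root   <- 1004.6190            # total sum of squares at the root
dev_leaves <- c(186.7424, 202.9010) # residual sum of squares in each leaf
n_train    <- 27

# Training MSE: residual sum of squares divided by the number of cars.
mse_from_dev <- sum(dev_leaves) / n_train

# Training R-squared: share of the root deviance removed by the split.
r2_from_dev <- 1 - sum(dev_leaves) / dev_root

c(MSE = mse_from_dev, R_squared = r2_from_dev)
```

The R-squared matches the `improve` value of the root split and equals one minus the final `rel error` in the CP table, since the tree has only one split.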
Your observation:
- MSE: 14.431, which is the average squared difference between the predicted and actual mpg values. While not perfect, it suggests the tree provides a reasonable fit to the training data.
- R-squared: 0.612, meaning approximately 61.2% of the variance in mpg is explained by the regression tree. This indicates a moderate level of explanatory power, with room for improvement.
# Make predictions
pred_test <- predict(mtcars_tree, newdata = mtcars_test)
# Calculate MSE
mse_test <- mean((mtcars_test$mpg - pred_test)^2)
# Calculate R-squared for the test set
sst <- sum((mtcars_test$mpg - mean(mtcars_test$mpg))^2)
ssr <- sum((mtcars_test$mpg - pred_test)^2)
r2_test <- 1 - (ssr / sst)
# Results
mse_test
## [1] 2.619646
r2_test
## [1] 0.8567122
Your observation:
The MSE for the training set is 14.431, indicating a moderate average squared difference between the actual and predicted mpg values. The R-squared value of 0.612 shows that the model explains approximately 61.2% of the variance in mpg for the training data. While the model demonstrates a decent fit, the results suggest there is room for improvement in prediction accuracy, potentially by refining the model or exploring more complex methods.
The MSE for the testing set is 2.62, a much lower average squared difference between the actual and predicted mpg values than on the training set. The R-squared value of 0.857 indicates that the model explains approximately 85.7% of the variance in mpg for the testing data. Although the model appears to perform better on the testing set, that set contains only 5 cars, so both estimates are highly variable and depend on which cars happened to be sampled. The results are consistent with the tree generalizing reasonably, but they do not by themselves demonstrate strong predictive accuracy.
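The testing set here has only 5 cars (see `dim(mtcars_test)` above), so a single split may give a testing MSE that is far from typical. One way to gauge this is to repeat the 85/15 split many times and look at the spread of the testing MSE. This is a hedged sketch, not part of the original assignment: the seed, the number of repeats, and the names `n_repeats` and `test_mse` are arbitrary choices.

```r
library(rpart)
data(mtcars)

set.seed(1)      # arbitrary seed, used only to make this sketch reproducible
n_repeats <- 50  # arbitrary number of random splits
test_mse <- numeric(n_repeats)

for (i in seq_len(n_repeats)) {
  # Same 85/15 split rule as above, with a fresh random sample each time.
  idx <- sample(nrow(mtcars), size = floor(0.85 * nrow(mtcars)))
  fit <- rpart(mpg ~ ., data = mtcars[idx, ], method = "anova")
  pred <- predict(fit, newdata = mtcars[-idx, ])
  test_mse[i] <- mean((mtcars$mpg[-idx] - pred)^2)
}

# Distribution of the testing MSE across the random splits.
summary(test_mse)
```

A wide range in this summary would indicate that the single testing MSE reported above depends heavily on the particular split.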