Decision Trees

Reading the Data:

datos_marketing <- read.csv("marketing_data.csv",header=TRUE)
# drop the columns that are not used in the models (selected by position)
datos_marketing <- datos_marketing[,c(-1,-3,-4,-5,-8,-28)]
str(datos_marketing)
## 'data.frame':    2240 obs. of  22 variables:
##  $ Year_Birth         : int  1970 1961 1958 1967 1989 1958 1954 1967 1954 1954 ...
##  $ Kidhome            : int  0 0 0 1 1 0 0 0 0 0 ...
##  $ Teenhome           : int  0 0 1 1 0 0 0 1 1 1 ...
##  $ Recency            : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ MntWines           : int  189 464 134 10 6 336 769 78 384 384 ...
##  $ MntFruits          : int  104 5 11 0 16 130 80 0 0 0 ...
##  $ MntMeatProducts    : int  379 64 59 1 24 411 252 11 102 102 ...
##  $ MntFishProducts    : int  111 7 15 0 11 240 15 0 21 21 ...
##  $ MntSweetProducts   : int  189 0 2 0 0 32 34 0 32 32 ...
##  $ MntGoldProds       : int  218 37 30 0 34 43 65 7 5 5 ...
##  $ NumDealsPurchases  : int  1 1 1 1 2 1 1 1 3 3 ...
##  $ NumWebPurchases    : int  4 7 3 1 3 4 10 2 6 6 ...
##  $ NumCatalogPurchases: int  4 3 2 0 1 7 10 1 2 2 ...
##  $ NumStorePurchases  : int  6 7 5 2 2 5 7 3 9 9 ...
##  $ NumWebVisitsMonth  : int  1 5 2 7 7 2 6 5 4 4 ...
##  $ AcceptedCmp3       : int  0 0 0 0 1 0 1 0 0 0 ...
##  $ AcceptedCmp4       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp5       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp1       : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ AcceptedCmp2       : int  0 1 0 0 0 0 0 0 0 0 ...
##  $ Response           : int  1 1 0 0 1 1 1 0 0 0 ...
##  $ Complain           : int  0 0 0 0 0 0 0 0 0 0 ...
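
As a quick check, the target variable Response is quite imbalanced (the confusion matrix later reports a No Information Rate of 0.8423), which is what motivates the balancing step further below:

# proportion of customers who did not accept (0) vs. accepted (1) the offer
prop.table(table(datos_marketing$Response))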

We use the "createDataPartition" function from the "caret" library to partition the data; this draws a stratified random split, and 70% of the sample is taken for training:

library(caret)
set.seed(111)
particion = createDataPartition(y=datos_marketing$Response,p=0.7,list = F, times = 1)
head(particion,3)
##      Resample1
## [1,]         1
## [2,]         4
## [3,]         5
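
A quick sanity check confirms that the partition holds about 70% of the rows and preserves the class mix of Response:

# fraction of rows selected for training, and the class mix inside the partition
length(particion)/nrow(datos_marketing)
prop.table(table(datos_marketing$Response[particion]))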

We then extract the training and test data:

train = datos_marketing[particion,]
dim(train)
## [1] 1568   22
test= datos_marketing[-particion,]
dim(test)
## [1] 672  22

Data Balancing

The training set is imbalanced, so we rebalance it with ovun.sample from the ROSE library; method = "both" combines over- and under-sampling to reach a roughly 50/50 class mix (p = 0.5) over N = 903 rows:

library(ROSE)
## Loaded ROSE 0.0-3
train_2 = ovun.sample(Response ~ ., data = train, method = "both", p=0.5,N=903,seed=1)$data
barplot(table(train_2$Response))
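
The barplot should now show two bars of similar height; as a quick numeric check, the class proportions can be printed directly:

# class proportions after balancing; both should be close to 0.5
prop.table(table(train_2$Response))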

We build the tree model

For this we make use of the rpart library:

library(rpart)
set.seed(222)
# minsplit = 0 removes the minimum-node-size requirement for attempting a split
modelo = rpart(Response~.,data = train_2,method = "class",minsplit=0)
summary(modelo)
## Call:
## rpart(formula = Response ~ ., data = train_2, method = "class", 
##     minsplit = 0)
##   n= 903 
## 
##           CP nsplit rel error    xerror       xstd
## 1 0.27522936      0 1.0000000 1.0000000 0.03444065
## 2 0.06422018      1 0.7247706 0.7247706 0.03287246
## 3 0.05963303      4 0.5298165 0.6559633 0.03206234
## 4 0.04128440      5 0.4701835 0.5481651 0.03040553
## 5 0.01605505      6 0.4288991 0.4885321 0.02926065
## 6 0.01261468      7 0.4128440 0.4610092 0.02867056
## 7 0.01146789      9 0.3876147 0.4426606 0.02825384
## 8 0.01000000     14 0.3211009 0.4403670 0.02820039
## 
## Variable importance
##     MntMeatProducts        MntGoldProds            MntWines NumCatalogPurchases 
##                  14                  12                  12                   8 
##        AcceptedCmp3           MntFruits        AcceptedCmp5     NumWebPurchases 
##                   7                   7                   7                   6 
##   NumStorePurchases   NumWebVisitsMonth             Recency    MntSweetProducts 
##                   5                   5                   4                   4 
##     MntFishProducts   NumDealsPurchases        AcceptedCmp1          Year_Birth 
##                   3                   2                   2                   1 
##        AcceptedCmp4 
##                   1 
## 
## Node number 1: 903 observations,    complexity param=0.2752294
##   predicted class=0  expected loss=0.482835  P(node) =1
##     class counts:   467   436
##    probabilities: 0.517 0.483 
##   left son=2 (265 obs) right son=3 (638 obs)
##   Primary splits:
##       MntGoldProds        < 14.5  to the left,  improve=53.77391, (0 missing)
##       NumCatalogPurchases < 0.5   to the left,  improve=51.26854, (0 missing)
##       AcceptedCmp5        < 0.5   to the left,  improve=49.38106, (0 missing)
##       MntMeatProducts     < 11.5  to the left,  improve=41.56641, (0 missing)
##       AcceptedCmp3        < 0.5   to the left,  improve=41.40645, (0 missing)
##   Surrogate splits:
##       NumCatalogPurchases < 0.5   to the left,  agree=0.837, adj=0.445, (0 split)
##       MntMeatProducts     < 19.5  to the left,  agree=0.812, adj=0.358, (0 split)
##       NumWebPurchases     < 2.5   to the left,  agree=0.803, adj=0.328, (0 split)
##       MntWines            < 41.5  to the left,  agree=0.785, adj=0.268, (0 split)
##       MntFruits           < 3.5   to the left,  agree=0.761, adj=0.185, (0 split)
## 
## Node number 2: 265 observations,    complexity param=0.01146789
##   predicted class=0  expected loss=0.2150943  P(node) =0.2934662
##     class counts:   208    57
##    probabilities: 0.785 0.215 
##   left son=4 (165 obs) right son=5 (100 obs)
##   Primary splits:
##       MntMeatProducts     < 23.5  to the left,  improve=13.486520, (0 missing)
##       NumDealsPurchases   < 4.5   to the left,  improve=12.587360, (0 missing)
##       NumCatalogPurchases < 0.5   to the left,  improve=11.059190, (0 missing)
##       NumWebPurchases     < 3.5   to the left,  improve= 8.849679, (0 missing)
##       NumStorePurchases   < 4.5   to the left,  improve= 8.033649, (0 missing)
##   Surrogate splits:
##       NumStorePurchases   < 4.5   to the left,  agree=0.887, adj=0.70, (0 split)
##       MntWines            < 72    to the left,  agree=0.872, adj=0.66, (0 split)
##       NumWebPurchases     < 2.5   to the left,  agree=0.864, adj=0.64, (0 split)
##       NumCatalogPurchases < 1.5   to the left,  agree=0.853, adj=0.61, (0 split)
##       MntSweetProducts    < 6.5   to the left,  agree=0.815, adj=0.51, (0 split)
## 
## Node number 3: 638 observations,    complexity param=0.06422018
##   predicted class=1  expected loss=0.4059561  P(node) =0.7065338
##     class counts:   259   379
##    probabilities: 0.406 0.594 
##   left son=6 (507 obs) right son=7 (131 obs)
##   Primary splits:
##       AcceptedCmp5 < 0.5   to the left,  improve=31.01678, (0 missing)
##       Recency      < 38.5  to the right, improve=27.75609, (0 missing)
##       AcceptedCmp3 < 0.5   to the left,  improve=26.63365, (0 missing)
##       MntWines     < 816.5 to the left,  improve=18.51877, (0 missing)
##       AcceptedCmp1 < 0.5   to the left,  improve=15.60943, (0 missing)
##   Surrogate splits:
##       AcceptedCmp1      < 0.5   to the left,  agree=0.848, adj=0.260, (0 split)
##       MntWines          < 816.5 to the left,  agree=0.831, adj=0.176, (0 split)
##       NumDealsPurchases < 0.5   to the right, agree=0.831, adj=0.176, (0 split)
##       AcceptedCmp4      < 0.5   to the left,  agree=0.815, adj=0.099, (0 split)
##       MntFruits         < 142.5 to the left,  agree=0.810, adj=0.076, (0 split)
## 
## Node number 4: 165 observations
##   predicted class=0  expected loss=0.09090909  P(node) =0.1827243
##     class counts:   150    15
##    probabilities: 0.909 0.091 
## 
## Node number 5: 100 observations,    complexity param=0.01146789
##   predicted class=0  expected loss=0.42  P(node) =0.110742
##     class counts:    58    42
##    probabilities: 0.580 0.420 
##   left son=10 (19 obs) right son=11 (81 obs)
##   Primary splits:
##       NumStorePurchases < 9.5   to the right, improve=6.331436, (0 missing)
##       NumWebVisitsMonth < 5.5   to the left,  improve=6.133087, (0 missing)
##       Recency           < 20    to the right, improve=5.940010, (0 missing)
##       NumDealsPurchases < 2.5   to the left,  improve=5.335385, (0 missing)
##       MntMeatProducts   < 30.5  to the right, improve=4.539311, (0 missing)
##   Surrogate splits:
##       MntMeatProducts     < 529.5 to the right, agree=0.86, adj=0.263, (0 split)
##       MntSweetProducts    < 43    to the right, agree=0.85, adj=0.211, (0 split)
##       NumCatalogPurchases < 6.5   to the right, agree=0.85, adj=0.211, (0 split)
##       AcceptedCmp5        < 0.5   to the right, agree=0.85, adj=0.211, (0 split)
##       MntFruits           < 88    to the right, agree=0.84, adj=0.158, (0 split)
## 
## Node number 6: 507 observations,    complexity param=0.06422018
##   predicted class=1  expected loss=0.4852071  P(node) =0.5614618
##     class counts:   246   261
##    probabilities: 0.485 0.515 
##   left son=12 (422 obs) right son=13 (85 obs)
##   Primary splits:
##       AcceptedCmp3      < 0.5   to the left,  improve=33.14669, (0 missing)
##       Recency           < 37.5  to the right, improve=23.04342, (0 missing)
##       MntSweetProducts  < 5.5   to the right, improve=16.31478, (0 missing)
##       NumStorePurchases < 2.5   to the right, improve=14.97126, (0 missing)
##       NumWebVisitsMonth < 7.5   to the left,  improve=13.79282, (0 missing)
##   Surrogate splits:
##       MntMeatProducts < 2     to the right, agree=0.836, adj=0.024, (0 split)
##       Recency         < 0.5   to the right, agree=0.834, adj=0.012, (0 split)
##       NumWebPurchases < 0.5   to the right, agree=0.834, adj=0.012, (0 split)
## 
## Node number 7: 131 observations
##   predicted class=1  expected loss=0.09923664  P(node) =0.145072
##     class counts:    13   118
##    probabilities: 0.099 0.901 
## 
## Node number 10: 19 observations
##   predicted class=0  expected loss=0.05263158  P(node) =0.02104097
##     class counts:    18     1
##    probabilities: 0.947 0.053 
## 
## Node number 11: 81 observations,    complexity param=0.01146789
##   predicted class=1  expected loss=0.4938272  P(node) =0.089701
##     class counts:    40    41
##    probabilities: 0.494 0.506 
##   left son=22 (48 obs) right son=23 (33 obs)
##   Primary splits:
##       MntMeatProducts     < 96    to the left,  improve=5.444585, (0 missing)
##       NumStorePurchases   < 7.5   to the left,  improve=4.784736, (0 missing)
##       MntFruits           < 51.5  to the left,  improve=4.747450, (0 missing)
##       NumCatalogPurchases < 4.5   to the left,  improve=4.333901, (0 missing)
##       MntGoldProds        < 0.5   to the right, improve=4.329444, (0 missing)
##   Surrogate splits:
##       MntFruits           < 6.5   to the left,  agree=0.790, adj=0.485, (0 split)
##       NumWebPurchases     < 4.5   to the left,  agree=0.790, adj=0.485, (0 split)
##       MntWines            < 519   to the left,  agree=0.778, adj=0.455, (0 split)
##       NumStorePurchases   < 7.5   to the left,  agree=0.778, adj=0.455, (0 split)
##       NumCatalogPurchases < 5.5   to the left,  agree=0.765, adj=0.424, (0 split)
## 
## Node number 12: 422 observations,    complexity param=0.06422018
##   predicted class=0  expected loss=0.4336493  P(node) =0.4673311
##     class counts:   239   183
##    probabilities: 0.566 0.434 
##   left son=24 (387 obs) right son=25 (35 obs)
##   Primary splits:
##       NumWebVisitsMonth < 8.5   to the left,  improve=17.633270, (0 missing)
##       Recency           < 18.5  to the right, improve=15.958330, (0 missing)
##       MntWines          < 898   to the left,  improve=10.048270, (0 missing)
##       NumStorePurchases < 2.5   to the right, improve= 9.443249, (0 missing)
##       MntSweetProducts  < 5.5   to the right, improve= 9.116484, (0 missing)
##   Surrogate splits:
##       MntWines < 1275  to the left,  agree=0.922, adj=0.057, (0 split)
## 
## Node number 13: 85 observations
##   predicted class=1  expected loss=0.08235294  P(node) =0.09413068
##     class counts:     7    78
##    probabilities: 0.082 0.918 
## 
## Node number 22: 48 observations,    complexity param=0.01146789
##   predicted class=0  expected loss=0.3541667  P(node) =0.05315615
##     class counts:    31    17
##    probabilities: 0.646 0.354 
##   left son=44 (25 obs) right son=45 (23 obs)
##   Primary splits:
##       MntMeatProducts < 30.5  to the right, improve=10.299200, (0 missing)
##       MntFruits       < 3.5   to the right, improve= 8.601190, (0 missing)
##       MntFishProducts < 10.5  to the right, improve= 6.020833, (0 missing)
##       MntWines        < 56.5  to the right, improve= 4.840366, (0 missing)
##       MntGoldProds    < 4.5   to the right, improve= 4.261364, (0 missing)
##   Surrogate splits:
##       MntFruits         < 5.5   to the right, agree=0.812, adj=0.609, (0 split)
##       MntWines          < 56.5  to the right, agree=0.792, adj=0.565, (0 split)
##       NumStorePurchases < 3.5   to the right, agree=0.792, adj=0.565, (0 split)
##       Recency           < 33.5  to the right, agree=0.708, adj=0.391, (0 split)
##       MntFishProducts   < 6.5   to the right, agree=0.708, adj=0.391, (0 split)
## 
## Node number 23: 33 observations
##   predicted class=1  expected loss=0.2727273  P(node) =0.03654485
##     class counts:     9    24
##    probabilities: 0.273 0.727 
## 
## Node number 24: 387 observations,    complexity param=0.05963303
##   predicted class=0  expected loss=0.3901809  P(node) =0.4285714
##     class counts:   236   151
##    probabilities: 0.610 0.390 
##   left son=48 (299 obs) right son=49 (88 obs)
##   Primary splits:
##       Recency           < 18.5  to the right, improve=15.109960, (0 missing)
##       MntWines          < 898   to the left,  improve=11.178890, (0 missing)
##       MntMeatProducts   < 672   to the left,  improve= 9.433637, (0 missing)
##       NumDealsPurchases < 4.5   to the left,  improve= 8.375311, (0 missing)
##       MntSweetProducts  < 3.5   to the right, improve= 7.646249, (0 missing)
##   Surrogate splits:
##       MntFishProducts   < 238.5 to the left,  agree=0.775, adj=0.011, (0 split)
##       NumDealsPurchases < 9.5   to the left,  agree=0.775, adj=0.011, (0 split)
## 
## Node number 25: 35 observations
##   predicted class=1  expected loss=0.08571429  P(node) =0.03875969
##     class counts:     3    32
##    probabilities: 0.086 0.914 
## 
## Node number 44: 25 observations
##   predicted class=0  expected loss=0.04  P(node) =0.02768549
##     class counts:    24     1
##    probabilities: 0.960 0.040 
## 
## Node number 45: 23 observations,    complexity param=0.01146789
##   predicted class=1  expected loss=0.3043478  P(node) =0.02547065
##     class counts:     7    16
##    probabilities: 0.304 0.696 
##   left son=90 (5 obs) right son=91 (18 obs)
##   Primary splits:
##       Year_Birth        < 1975  to the right, improve=6.183575, (0 missing)
##       MntFruits         < 3.5   to the right, improve=4.686499, (0 missing)
##       NumDealsPurchases < 2.5   to the left,  improve=3.381988, (0 missing)
##       MntFishProducts   < 10.5  to the right, improve=3.339130, (0 missing)
##       MntSweetProducts  < 2.5   to the left,  improve=3.092977, (0 missing)
##   Surrogate splits:
##       MntFruits         < 3.5   to the right, agree=0.957, adj=0.8, (0 split)
##       MntFishProducts   < 10.5  to the right, agree=0.826, adj=0.2, (0 split)
##       NumDealsPurchases < 2.5   to the left,  agree=0.826, adj=0.2, (0 split)
## 
## Node number 48: 299 observations,    complexity param=0.0412844
##   predicted class=0  expected loss=0.3143813  P(node) =0.3311185
##     class counts:   205    94
##    probabilities: 0.686 0.314 
##   left son=96 (259 obs) right son=97 (40 obs)
##   Primary splits:
##       MntWines          < 898   to the left,  improve=15.571800, (0 missing)
##       MntMeatProducts   < 832.5 to the left,  improve=11.629320, (0 missing)
##       NumDealsPurchases < 5.5   to the left,  improve= 7.056365, (0 missing)
##       MntSweetProducts  < 3.5   to the right, improve= 4.674977, (0 missing)
##       MntFishProducts   < 6.5   to the right, improve= 4.212728, (0 missing)
##   Surrogate splits:
##       Recency           < 95.5  to the left,  agree=0.87, adj=0.025, (0 split)
##       NumDealsPurchases < 10.5  to the left,  agree=0.87, adj=0.025, (0 split)
## 
## Node number 49: 88 observations,    complexity param=0.01261468
##   predicted class=1  expected loss=0.3522727  P(node) =0.09745293
##     class counts:    31    57
##    probabilities: 0.352 0.648 
##   left son=98 (47 obs) right son=99 (41 obs)
##   Primary splits:
##       MntGoldProds        < 41.5  to the left,  improve=3.791680, (0 missing)
##       NumStorePurchases   < 7.5   to the right, improve=3.441382, (0 missing)
##       NumCatalogPurchases < 0.5   to the left,  improve=2.606150, (0 missing)
##       AcceptedCmp1        < 0.5   to the left,  improve=2.488205, (0 missing)
##       MntSweetProducts    < 27    to the right, improve=2.354298, (0 missing)
##   Surrogate splits:
##       MntSweetProducts  < 21.5  to the left,  agree=0.773, adj=0.512, (0 split)
##       MntMeatProducts   < 152   to the left,  agree=0.761, adj=0.488, (0 split)
##       MntFruits         < 22    to the left,  agree=0.750, adj=0.463, (0 split)
##       MntFishProducts   < 21.5  to the left,  agree=0.727, adj=0.415, (0 split)
##       NumDealsPurchases < 1.5   to the right, agree=0.727, adj=0.415, (0 split)
## 
## Node number 90: 5 observations
##   predicted class=0  expected loss=0  P(node) =0.005537099
##     class counts:     5     0
##    probabilities: 1.000 0.000 
## 
## Node number 91: 18 observations
##   predicted class=1  expected loss=0.1111111  P(node) =0.01993355
##     class counts:     2    16
##    probabilities: 0.111 0.889 
## 
## Node number 96: 259 observations,    complexity param=0.01605505
##   predicted class=0  expected loss=0.2509653  P(node) =0.2868217
##     class counts:   194    65
##    probabilities: 0.749 0.251 
##   left son=192 (244 obs) right son=193 (15 obs)
##   Primary splits:
##       MntMeatProducts   < 793   to the left,  improve=7.409490, (0 missing)
##       Recency           < 75.5  to the right, improve=5.388065, (0 missing)
##       NumWebVisitsMonth < 7.5   to the left,  improve=5.067975, (0 missing)
##       MntFishProducts   < 6.5   to the right, improve=4.691885, (0 missing)
##       Kidhome           < 0.5   to the left,  improve=3.692861, (0 missing)
##   Surrogate splits:
##       MntSweetProducts < 180   to the left,  agree=0.946, adj=0.067, (0 split)
## 
## Node number 97: 40 observations
##   predicted class=1  expected loss=0.275  P(node) =0.04429679
##     class counts:    11    29
##    probabilities: 0.275 0.725 
## 
## Node number 98: 47 observations,    complexity param=0.01261468
##   predicted class=1  expected loss=0.4893617  P(node) =0.05204873
##     class counts:    23    24
##    probabilities: 0.489 0.511 
##   left son=196 (11 obs) right son=197 (36 obs)
##   Primary splits:
##       MntSweetProducts  < 15    to the right, improve=7.489362, (0 missing)
##       MntFishProducts   < 47    to the right, improve=5.805151, (0 missing)
##       NumStorePurchases < 2.5   to the right, improve=5.331467, (0 missing)
##       MntMeatProducts   < 21.5  to the right, improve=4.788063, (0 missing)
##       NumWebVisitsMonth < 4.5   to the left,  improve=4.283956, (0 missing)
##   Surrogate splits:
##       MntFishProducts     < 47    to the right, agree=0.957, adj=0.818, (0 split)
##       MntMeatProducts     < 180   to the right, agree=0.915, adj=0.636, (0 split)
##       NumWebVisitsMonth   < 3.5   to the left,  agree=0.915, adj=0.636, (0 split)
##       MntFruits           < 16.5  to the right, agree=0.894, adj=0.545, (0 split)
##       NumCatalogPurchases < 4.5   to the right, agree=0.851, adj=0.364, (0 split)
## 
## Node number 99: 41 observations
##   predicted class=1  expected loss=0.195122  P(node) =0.04540421
##     class counts:     8    33
##    probabilities: 0.195 0.805 
## 
## Node number 192: 244 observations
##   predicted class=0  expected loss=0.2213115  P(node) =0.2702104
##     class counts:   190    54
##    probabilities: 0.779 0.221 
## 
## Node number 193: 15 observations
##   predicted class=1  expected loss=0.2666667  P(node) =0.0166113
##     class counts:     4    11
##    probabilities: 0.267 0.733 
## 
## Node number 196: 11 observations
##   predicted class=0  expected loss=0  P(node) =0.01218162
##     class counts:    11     0
##    probabilities: 1.000 0.000 
## 
## Node number 197: 36 observations
##   predicted class=1  expected loss=0.3333333  P(node) =0.03986711
##     class counts:    12    24
##    probabilities: 0.333 0.667
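
The CP table at the top of the summary can also be used for pruning; a minimal sketch of cutting the tree back at the CP with the lowest cross-validated error (variable names are illustrative):

# pick the complexity parameter that minimizes the cross-validated error (xerror)
mejor_cp = modelo$cptable[which.min(modelo$cptable[,"xerror"]),"CP"]
modelo_podado = prune(modelo, cp = mejor_cp)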
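
To visualize the fitted tree, one option, a sketch assuming the partykit package is installed, is to convert the rpart object and plot it:

# convert the rpart fit to a party object and draw the tree
library(partykit)
plot(as.party(modelo))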

We predict the classes for the test set:

predichos = predict(modelo,test,type = "class")
summary(predichos)
##   0   1 
## 450 222

We obtain the performance indicators of our model:

library(caret)
# express the observed classes as a factor so they can be compared with the predictions
reales = as.factor(test$Response)
indicadores = confusionMatrix(predichos, reales)
indicadores
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 421  29
##          1 145  77
##                                           
##                Accuracy : 0.7411          
##                  95% CI : (0.7062, 0.7738)
##     No Information Rate : 0.8423          
##     P-Value [Acc > NIR] : 1               
##                                           
##                   Kappa : 0.3255          
##                                           
##  Mcnemar's Test P-Value : <2e-16          
##                                           
##             Sensitivity : 0.7438          
##             Specificity : 0.7264          
##          Pos Pred Value : 0.9356          
##          Neg Pred Value : 0.3468          
##              Prevalence : 0.8423          
##          Detection Rate : 0.6265          
##    Detection Prevalence : 0.6696          
##       Balanced Accuracy : 0.7351          
##                                           
##        'Positive' Class : 0               
## 

From these indicators we can conclude:

Accuracy = 74.11% -> the overall exactness of the model: it classifies 74.11% of the test cases correctly and misclassifies the remaining 25.89%.

Sensitivity = 74.38% -> the accuracy in predicting the customers who did not accept the offer in the last campaign (the positive class, 0).

Specificity = 72.64% -> the accuracy in predicting the customers who did accept the offer in the last campaign.
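
These figures follow directly from the confusion-matrix counts, with class 0 treated as positive (as caret reports); a quick manual check:

# manual verification of the reported indicators from the matrix counts
TP = 421; FN = 145; FP = 29; TN = 77
(TP + TN)/(TP + TN + FP + FN)  # accuracy    = 0.7411
TP/(TP + FN)                   # sensitivity = 0.7438
TN/(TN + FP)                   # specificity = 0.7264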

We predict the class probabilities needed for the ROC curve:

predichos_prob = predict(modelo, test, type = "prob")[,2]

We plot the ROC curve for the tree model:

library(ROCR)

pred11  <- prediction (predichos_prob,test$Response)
pred22  <- performance(pred11,"tpr","fpr")
auc <- as.numeric(performance(pred11 ,"auc")@y.values)
plot(pred22,type='o', main = paste('Area Bajo la Curva =',round(auc,2)),colorize=T)  
abline(a=0, b= 1)

This gives a classifier performance (area under the curve) of about 0.76, which qualifies it as a "good test".
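
As an optional cross-check, assuming the pROC package is installed, the same AUC can be computed independently:

# independent AUC computation; it should agree with the ROCR value above
library(pROC)
auc(roc(test$Response, predichos_prob))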

Neural Networks

We fit a single-hidden-layer network with the nnet library:

library(nnet)
set.seed(333)
# size = 18 hidden units; with a numeric 0/1 target, nnet's logistic output
# unit yields fitted values in (0, 1) that can be read as probabilities
modelo_dia = nnet(Response~.,data = train_2, size=18)
## # weights:  415
## initial  value 294.409269 
## iter  10 value 199.709393
## iter  20 value 186.156337
## iter  30 value 180.574634
## iter  40 value 173.389681
## iter  50 value 167.771344
## iter  60 value 157.500115
## iter  70 value 155.227617
## iter  80 value 154.560200
## iter  90 value 153.508128
## iter 100 value 152.957832
## final  value 152.957832 
## stopped after 100 iterations
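
The "stopped after 100 iterations" message means the optimizer hit nnet's default maxit; a sketch of refitting with a larger budget, in case convergence was premature (the variable name is illustrative):

# raise the iteration limit from the default maxit = 100
modelo_dia_b = nnet(Response~., data = train_2, size = 18, maxit = 500, trace = FALSE)

Continuing with the original model, we predict on the test set:
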
predichos_dia = predict(modelo_dia,test)
summary(predichos_dia)
##        V1          
##  Min.   :0.001844  
##  1st Qu.:0.184607  
##  Median :0.184613  
##  Mean   :0.421903  
##  3rd Qu.:0.653524  
##  Max.   :0.986695

We convert predichos_dia into class labels so it can be used in caret's confusionMatrix, and obtain the indicators for the Neural Network model:

library(caret)
# turn the predicted probabilities into class labels with a 0.5 cutoff
predichos_dia_clase = as.factor(ifelse(predichos_dia > 0.5, 1, 0))
factordos = as.factor(test$Response)
indicadores_dia = confusionMatrix(predichos_dia_clase, factordos)
indicadores_dia

We predict the values for the ROC curve:

predichos_proba = predict(modelo_dia, test, type = "raw")

We draw the ROC curve for the Neural Network model:

library(ROCR)
pred1  <- prediction (predichos_proba,test$Response)
pred2 <- performance(pred1,"tpr","fpr")
auc <- as.numeric(performance(pred1 ,"auc")@y.values)
plot(pred2,type='o', main = paste('Area Bajo la Curva =',round(auc,2)),colorize=T)  
abline(a=0, b= 1)

We compare the two models:

par(mfrow=c(1,2))
plot(pred22, colorize=T)   # decision tree
lines(x=c(0,1), y=c(0,1))
plot(pred2, colorize=T)    # neural network
lines(x=c(0,1), y=c(0,1))

From the two plots, and based on the area under the curve, we can conclude that the tree model is the better one, since its area under the curve is larger than that of the neural network model.
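
Note that both AUC values were stored in the same variable (auc), so the second overwrote the first; a small sketch that keeps both for an explicit numeric comparison (names are illustrative):

# recompute and keep both AUCs side by side
auc_arbol = as.numeric(performance(pred11,"auc")@y.values)
auc_red   = as.numeric(performance(pred1,"auc")@y.values)
c(arbol = auc_arbol, red = auc_red)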