Arboles de Decision
Lectura de Datos:
library(readxl)
datos_marketing <- read.csv("marketing_data.csv",header=TRUE)
datos_marketing <- datos_marketing[,c(-1,-3,-4,-5,-8,-28)]
str(datos_marketing)
## 'data.frame': 2240 obs. of 22 variables:
## $ Year_Birth : int 1970 1961 1958 1967 1989 1958 1954 1967 1954 1954 ...
## $ Kidhome : int 0 0 0 1 1 0 0 0 0 0 ...
## $ Teenhome : int 0 0 1 1 0 0 0 1 1 1 ...
## $ Recency : int 0 0 0 0 0 0 0 0 0 0 ...
## $ MntWines : int 189 464 134 10 6 336 769 78 384 384 ...
## $ MntFruits : int 104 5 11 0 16 130 80 0 0 0 ...
## $ MntMeatProducts : int 379 64 59 1 24 411 252 11 102 102 ...
## $ MntFishProducts : int 111 7 15 0 11 240 15 0 21 21 ...
## $ MntSweetProducts : int 189 0 2 0 0 32 34 0 32 32 ...
## $ MntGoldProds : int 218 37 30 0 34 43 65 7 5 5 ...
## $ NumDealsPurchases : int 1 1 1 1 2 1 1 1 3 3 ...
## $ NumWebPurchases : int 4 7 3 1 3 4 10 2 6 6 ...
## $ NumCatalogPurchases: int 4 3 2 0 1 7 10 1 2 2 ...
## $ NumStorePurchases : int 6 7 5 2 2 5 7 3 9 9 ...
## $ NumWebVisitsMonth : int 1 5 2 7 7 2 6 5 4 4 ...
## $ AcceptedCmp3 : int 0 0 0 0 1 0 1 0 0 0 ...
## $ AcceptedCmp4 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AcceptedCmp5 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AcceptedCmp1 : int 0 0 0 0 0 0 0 0 0 0 ...
## $ AcceptedCmp2 : int 0 1 0 0 0 0 0 0 0 0 ...
## $ Response : int 1 1 0 0 1 1 1 0 0 0 ...
## $ Complain : int 0 0 0 0 0 0 0 0 0 0 ...
Uso de la librería caret para particionar los datos en muestras bootstrap usando la función "createDataPartition" de la librería "caret", se tomo el 70% de la muestra para el entrenamiento:
library(caret)
set.seed(111)
particion = createDataPartition(y=datos_marketing$Response,p=0.7,list = F, times = 1)
head(particion,3)
## Resample1
## [1,] 1
## [2,] 4
## [3,] 5
De manera que procedo a obtener los datos para el entrenamiento y para el test:
train = datos_marketing[particion,]
dim(train)
## [1] 1568 22
test= datos_marketing[-particion,]
dim(test)
## [1] 672 22
Balance de Datos
library(ROSE)
## Loaded ROSE 0.0-3
train_2 = ovun.sample(Response ~ ., data = train, method = "both", p=0.5,N=903,seed=1)$data
barplot(table(train_2$Response))
Creamos el modelo de árbol
Para ello haermos uso de la libreria rpart
library(rpart)
set.seed(222)
modelo = rpart(Response~.,data = train_2,method = "class",minsplit=0)
summary(modelo)
## Call:
## rpart(formula = Response ~ ., data = train_2, method = "class",
## minsplit = 0)
## n= 903
##
## CP nsplit rel error xerror xstd
## 1 0.27522936 0 1.0000000 1.0000000 0.03444065
## 2 0.06422018 1 0.7247706 0.7247706 0.03287246
## 3 0.05963303 4 0.5298165 0.6559633 0.03206234
## 4 0.04128440 5 0.4701835 0.5481651 0.03040553
## 5 0.01605505 6 0.4288991 0.4885321 0.02926065
## 6 0.01261468 7 0.4128440 0.4610092 0.02867056
## 7 0.01146789 9 0.3876147 0.4426606 0.02825384
## 8 0.01000000 14 0.3211009 0.4403670 0.02820039
##
## Variable importance
## MntMeatProducts MntGoldProds MntWines NumCatalogPurchases
## 14 12 12 8
## AcceptedCmp3 MntFruits AcceptedCmp5 NumWebPurchases
## 7 7 7 6
## NumStorePurchases NumWebVisitsMonth Recency MntSweetProducts
## 5 5 4 4
## MntFishProducts NumDealsPurchases AcceptedCmp1 Year_Birth
## 3 2 2 1
## AcceptedCmp4
## 1
##
## Node number 1: 903 observations, complexity param=0.2752294
## predicted class=0 expected loss=0.482835 P(node) =1
## class counts: 467 436
## probabilities: 0.517 0.483
## left son=2 (265 obs) right son=3 (638 obs)
## Primary splits:
## MntGoldProds < 14.5 to the left, improve=53.77391, (0 missing)
## NumCatalogPurchases < 0.5 to the left, improve=51.26854, (0 missing)
## AcceptedCmp5 < 0.5 to the left, improve=49.38106, (0 missing)
## MntMeatProducts < 11.5 to the left, improve=41.56641, (0 missing)
## AcceptedCmp3 < 0.5 to the left, improve=41.40645, (0 missing)
## Surrogate splits:
## NumCatalogPurchases < 0.5 to the left, agree=0.837, adj=0.445, (0 split)
## MntMeatProducts < 19.5 to the left, agree=0.812, adj=0.358, (0 split)
## NumWebPurchases < 2.5 to the left, agree=0.803, adj=0.328, (0 split)
## MntWines < 41.5 to the left, agree=0.785, adj=0.268, (0 split)
## MntFruits < 3.5 to the left, agree=0.761, adj=0.185, (0 split)
##
## Node number 2: 265 observations, complexity param=0.01146789
## predicted class=0 expected loss=0.2150943 P(node) =0.2934662
## class counts: 208 57
## probabilities: 0.785 0.215
## left son=4 (165 obs) right son=5 (100 obs)
## Primary splits:
## MntMeatProducts < 23.5 to the left, improve=13.486520, (0 missing)
## NumDealsPurchases < 4.5 to the left, improve=12.587360, (0 missing)
## NumCatalogPurchases < 0.5 to the left, improve=11.059190, (0 missing)
## NumWebPurchases < 3.5 to the left, improve= 8.849679, (0 missing)
## NumStorePurchases < 4.5 to the left, improve= 8.033649, (0 missing)
## Surrogate splits:
## NumStorePurchases < 4.5 to the left, agree=0.887, adj=0.70, (0 split)
## MntWines < 72 to the left, agree=0.872, adj=0.66, (0 split)
## NumWebPurchases < 2.5 to the left, agree=0.864, adj=0.64, (0 split)
## NumCatalogPurchases < 1.5 to the left, agree=0.853, adj=0.61, (0 split)
## MntSweetProducts < 6.5 to the left, agree=0.815, adj=0.51, (0 split)
##
## Node number 3: 638 observations, complexity param=0.06422018
## predicted class=1 expected loss=0.4059561 P(node) =0.7065338
## class counts: 259 379
## probabilities: 0.406 0.594
## left son=6 (507 obs) right son=7 (131 obs)
## Primary splits:
## AcceptedCmp5 < 0.5 to the left, improve=31.01678, (0 missing)
## Recency < 38.5 to the right, improve=27.75609, (0 missing)
## AcceptedCmp3 < 0.5 to the left, improve=26.63365, (0 missing)
## MntWines < 816.5 to the left, improve=18.51877, (0 missing)
## AcceptedCmp1 < 0.5 to the left, improve=15.60943, (0 missing)
## Surrogate splits:
## AcceptedCmp1 < 0.5 to the left, agree=0.848, adj=0.260, (0 split)
## MntWines < 816.5 to the left, agree=0.831, adj=0.176, (0 split)
## NumDealsPurchases < 0.5 to the right, agree=0.831, adj=0.176, (0 split)
## AcceptedCmp4 < 0.5 to the left, agree=0.815, adj=0.099, (0 split)
## MntFruits < 142.5 to the left, agree=0.810, adj=0.076, (0 split)
##
## Node number 4: 165 observations
## predicted class=0 expected loss=0.09090909 P(node) =0.1827243
## class counts: 150 15
## probabilities: 0.909 0.091
##
## Node number 5: 100 observations, complexity param=0.01146789
## predicted class=0 expected loss=0.42 P(node) =0.110742
## class counts: 58 42
## probabilities: 0.580 0.420
## left son=10 (19 obs) right son=11 (81 obs)
## Primary splits:
## NumStorePurchases < 9.5 to the right, improve=6.331436, (0 missing)
## NumWebVisitsMonth < 5.5 to the left, improve=6.133087, (0 missing)
## Recency < 20 to the right, improve=5.940010, (0 missing)
## NumDealsPurchases < 2.5 to the left, improve=5.335385, (0 missing)
## MntMeatProducts < 30.5 to the right, improve=4.539311, (0 missing)
## Surrogate splits:
## MntMeatProducts < 529.5 to the right, agree=0.86, adj=0.263, (0 split)
## MntSweetProducts < 43 to the right, agree=0.85, adj=0.211, (0 split)
## NumCatalogPurchases < 6.5 to the right, agree=0.85, adj=0.211, (0 split)
## AcceptedCmp5 < 0.5 to the right, agree=0.85, adj=0.211, (0 split)
## MntFruits < 88 to the right, agree=0.84, adj=0.158, (0 split)
##
## Node number 6: 507 observations, complexity param=0.06422018
## predicted class=1 expected loss=0.4852071 P(node) =0.5614618
## class counts: 246 261
## probabilities: 0.485 0.515
## left son=12 (422 obs) right son=13 (85 obs)
## Primary splits:
## AcceptedCmp3 < 0.5 to the left, improve=33.14669, (0 missing)
## Recency < 37.5 to the right, improve=23.04342, (0 missing)
## MntSweetProducts < 5.5 to the right, improve=16.31478, (0 missing)
## NumStorePurchases < 2.5 to the right, improve=14.97126, (0 missing)
## NumWebVisitsMonth < 7.5 to the left, improve=13.79282, (0 missing)
## Surrogate splits:
## MntMeatProducts < 2 to the right, agree=0.836, adj=0.024, (0 split)
## Recency < 0.5 to the right, agree=0.834, adj=0.012, (0 split)
## NumWebPurchases < 0.5 to the right, agree=0.834, adj=0.012, (0 split)
##
## Node number 7: 131 observations
## predicted class=1 expected loss=0.09923664 P(node) =0.145072
## class counts: 13 118
## probabilities: 0.099 0.901
##
## Node number 10: 19 observations
## predicted class=0 expected loss=0.05263158 P(node) =0.02104097
## class counts: 18 1
## probabilities: 0.947 0.053
##
## Node number 11: 81 observations, complexity param=0.01146789
## predicted class=1 expected loss=0.4938272 P(node) =0.089701
## class counts: 40 41
## probabilities: 0.494 0.506
## left son=22 (48 obs) right son=23 (33 obs)
## Primary splits:
## MntMeatProducts < 96 to the left, improve=5.444585, (0 missing)
## NumStorePurchases < 7.5 to the left, improve=4.784736, (0 missing)
## MntFruits < 51.5 to the left, improve=4.747450, (0 missing)
## NumCatalogPurchases < 4.5 to the left, improve=4.333901, (0 missing)
## MntGoldProds < 0.5 to the right, improve=4.329444, (0 missing)
## Surrogate splits:
## MntFruits < 6.5 to the left, agree=0.790, adj=0.485, (0 split)
## NumWebPurchases < 4.5 to the left, agree=0.790, adj=0.485, (0 split)
## MntWines < 519 to the left, agree=0.778, adj=0.455, (0 split)
## NumStorePurchases < 7.5 to the left, agree=0.778, adj=0.455, (0 split)
## NumCatalogPurchases < 5.5 to the left, agree=0.765, adj=0.424, (0 split)
##
## Node number 12: 422 observations, complexity param=0.06422018
## predicted class=0 expected loss=0.4336493 P(node) =0.4673311
## class counts: 239 183
## probabilities: 0.566 0.434
## left son=24 (387 obs) right son=25 (35 obs)
## Primary splits:
## NumWebVisitsMonth < 8.5 to the left, improve=17.633270, (0 missing)
## Recency < 18.5 to the right, improve=15.958330, (0 missing)
## MntWines < 898 to the left, improve=10.048270, (0 missing)
## NumStorePurchases < 2.5 to the right, improve= 9.443249, (0 missing)
## MntSweetProducts < 5.5 to the right, improve= 9.116484, (0 missing)
## Surrogate splits:
## MntWines < 1275 to the left, agree=0.922, adj=0.057, (0 split)
##
## Node number 13: 85 observations
## predicted class=1 expected loss=0.08235294 P(node) =0.09413068
## class counts: 7 78
## probabilities: 0.082 0.918
##
## Node number 22: 48 observations, complexity param=0.01146789
## predicted class=0 expected loss=0.3541667 P(node) =0.05315615
## class counts: 31 17
## probabilities: 0.646 0.354
## left son=44 (25 obs) right son=45 (23 obs)
## Primary splits:
## MntMeatProducts < 30.5 to the right, improve=10.299200, (0 missing)
## MntFruits < 3.5 to the right, improve= 8.601190, (0 missing)
## MntFishProducts < 10.5 to the right, improve= 6.020833, (0 missing)
## MntWines < 56.5 to the right, improve= 4.840366, (0 missing)
## MntGoldProds < 4.5 to the right, improve= 4.261364, (0 missing)
## Surrogate splits:
## MntFruits < 5.5 to the right, agree=0.812, adj=0.609, (0 split)
## MntWines < 56.5 to the right, agree=0.792, adj=0.565, (0 split)
## NumStorePurchases < 3.5 to the right, agree=0.792, adj=0.565, (0 split)
## Recency < 33.5 to the right, agree=0.708, adj=0.391, (0 split)
## MntFishProducts < 6.5 to the right, agree=0.708, adj=0.391, (0 split)
##
## Node number 23: 33 observations
## predicted class=1 expected loss=0.2727273 P(node) =0.03654485
## class counts: 9 24
## probabilities: 0.273 0.727
##
## Node number 24: 387 observations, complexity param=0.05963303
## predicted class=0 expected loss=0.3901809 P(node) =0.4285714
## class counts: 236 151
## probabilities: 0.610 0.390
## left son=48 (299 obs) right son=49 (88 obs)
## Primary splits:
## Recency < 18.5 to the right, improve=15.109960, (0 missing)
## MntWines < 898 to the left, improve=11.178890, (0 missing)
## MntMeatProducts < 672 to the left, improve= 9.433637, (0 missing)
## NumDealsPurchases < 4.5 to the left, improve= 8.375311, (0 missing)
## MntSweetProducts < 3.5 to the right, improve= 7.646249, (0 missing)
## Surrogate splits:
## MntFishProducts < 238.5 to the left, agree=0.775, adj=0.011, (0 split)
## NumDealsPurchases < 9.5 to the left, agree=0.775, adj=0.011, (0 split)
##
## Node number 25: 35 observations
## predicted class=1 expected loss=0.08571429 P(node) =0.03875969
## class counts: 3 32
## probabilities: 0.086 0.914
##
## Node number 44: 25 observations
## predicted class=0 expected loss=0.04 P(node) =0.02768549
## class counts: 24 1
## probabilities: 0.960 0.040
##
## Node number 45: 23 observations, complexity param=0.01146789
## predicted class=1 expected loss=0.3043478 P(node) =0.02547065
## class counts: 7 16
## probabilities: 0.304 0.696
## left son=90 (5 obs) right son=91 (18 obs)
## Primary splits:
## Year_Birth < 1975 to the right, improve=6.183575, (0 missing)
## MntFruits < 3.5 to the right, improve=4.686499, (0 missing)
## NumDealsPurchases < 2.5 to the left, improve=3.381988, (0 missing)
## MntFishProducts < 10.5 to the right, improve=3.339130, (0 missing)
## MntSweetProducts < 2.5 to the left, improve=3.092977, (0 missing)
## Surrogate splits:
## MntFruits < 3.5 to the right, agree=0.957, adj=0.8, (0 split)
## MntFishProducts < 10.5 to the right, agree=0.826, adj=0.2, (0 split)
## NumDealsPurchases < 2.5 to the left, agree=0.826, adj=0.2, (0 split)
##
## Node number 48: 299 observations, complexity param=0.0412844
## predicted class=0 expected loss=0.3143813 P(node) =0.3311185
## class counts: 205 94
## probabilities: 0.686 0.314
## left son=96 (259 obs) right son=97 (40 obs)
## Primary splits:
## MntWines < 898 to the left, improve=15.571800, (0 missing)
## MntMeatProducts < 832.5 to the left, improve=11.629320, (0 missing)
## NumDealsPurchases < 5.5 to the left, improve= 7.056365, (0 missing)
## MntSweetProducts < 3.5 to the right, improve= 4.674977, (0 missing)
## MntFishProducts < 6.5 to the right, improve= 4.212728, (0 missing)
## Surrogate splits:
## Recency < 95.5 to the left, agree=0.87, adj=0.025, (0 split)
## NumDealsPurchases < 10.5 to the left, agree=0.87, adj=0.025, (0 split)
##
## Node number 49: 88 observations, complexity param=0.01261468
## predicted class=1 expected loss=0.3522727 P(node) =0.09745293
## class counts: 31 57
## probabilities: 0.352 0.648
## left son=98 (47 obs) right son=99 (41 obs)
## Primary splits:
## MntGoldProds < 41.5 to the left, improve=3.791680, (0 missing)
## NumStorePurchases < 7.5 to the right, improve=3.441382, (0 missing)
## NumCatalogPurchases < 0.5 to the left, improve=2.606150, (0 missing)
## AcceptedCmp1 < 0.5 to the left, improve=2.488205, (0 missing)
## MntSweetProducts < 27 to the right, improve=2.354298, (0 missing)
## Surrogate splits:
## MntSweetProducts < 21.5 to the left, agree=0.773, adj=0.512, (0 split)
## MntMeatProducts < 152 to the left, agree=0.761, adj=0.488, (0 split)
## MntFruits < 22 to the left, agree=0.750, adj=0.463, (0 split)
## MntFishProducts < 21.5 to the left, agree=0.727, adj=0.415, (0 split)
## NumDealsPurchases < 1.5 to the right, agree=0.727, adj=0.415, (0 split)
##
## Node number 90: 5 observations
## predicted class=0 expected loss=0 P(node) =0.005537099
## class counts: 5 0
## probabilities: 1.000 0.000
##
## Node number 91: 18 observations
## predicted class=1 expected loss=0.1111111 P(node) =0.01993355
## class counts: 2 16
## probabilities: 0.111 0.889
##
## Node number 96: 259 observations, complexity param=0.01605505
## predicted class=0 expected loss=0.2509653 P(node) =0.2868217
## class counts: 194 65
## probabilities: 0.749 0.251
## left son=192 (244 obs) right son=193 (15 obs)
## Primary splits:
## MntMeatProducts < 793 to the left, improve=7.409490, (0 missing)
## Recency < 75.5 to the right, improve=5.388065, (0 missing)
## NumWebVisitsMonth < 7.5 to the left, improve=5.067975, (0 missing)
## MntFishProducts < 6.5 to the right, improve=4.691885, (0 missing)
## Kidhome < 0.5 to the left, improve=3.692861, (0 missing)
## Surrogate splits:
## MntSweetProducts < 180 to the left, agree=0.946, adj=0.067, (0 split)
##
## Node number 97: 40 observations
## predicted class=1 expected loss=0.275 P(node) =0.04429679
## class counts: 11 29
## probabilities: 0.275 0.725
##
## Node number 98: 47 observations, complexity param=0.01261468
## predicted class=1 expected loss=0.4893617 P(node) =0.05204873
## class counts: 23 24
## probabilities: 0.489 0.511
## left son=196 (11 obs) right son=197 (36 obs)
## Primary splits:
## MntSweetProducts < 15 to the right, improve=7.489362, (0 missing)
## MntFishProducts < 47 to the right, improve=5.805151, (0 missing)
## NumStorePurchases < 2.5 to the right, improve=5.331467, (0 missing)
## MntMeatProducts < 21.5 to the right, improve=4.788063, (0 missing)
## NumWebVisitsMonth < 4.5 to the left, improve=4.283956, (0 missing)
## Surrogate splits:
## MntFishProducts < 47 to the right, agree=0.957, adj=0.818, (0 split)
## MntMeatProducts < 180 to the right, agree=0.915, adj=0.636, (0 split)
## NumWebVisitsMonth < 3.5 to the left, agree=0.915, adj=0.636, (0 split)
## MntFruits < 16.5 to the right, agree=0.894, adj=0.545, (0 split)
## NumCatalogPurchases < 4.5 to the right, agree=0.851, adj=0.364, (0 split)
##
## Node number 99: 41 observations
## predicted class=1 expected loss=0.195122 P(node) =0.04540421
## class counts: 8 33
## probabilities: 0.195 0.805
##
## Node number 192: 244 observations
## predicted class=0 expected loss=0.2213115 P(node) =0.2702104
## class counts: 190 54
## probabilities: 0.779 0.221
##
## Node number 193: 15 observations
## predicted class=1 expected loss=0.2666667 P(node) =0.0166113
## class counts: 4 11
## probabilities: 0.267 0.733
##
## Node number 196: 11 observations
## predicted class=0 expected loss=0 P(node) =0.01218162
## class counts: 11 0
## probabilities: 1.000 0.000
##
## Node number 197: 36 observations
## predicted class=1 expected loss=0.3333333 P(node) =0.03986711
## class counts: 12 24
## probabilities: 0.333 0.667
## Loading required package: grid
## Loading required package: libcoin
## Loading required package: mvtnorm
predichos = predict(modelo,test,type = "class")
summary(predichos)
## 0 1
## 450 222
Obtenemos los indicadores de nuestro modelo:
library(caret)
predichos2 = predichos
for(i in 1:672)
predichos2[i]=test$Response[i]
indicadores = confusionMatrix((unname(unlist(predichos))),predichos2)
indicadores
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 421 29
## 1 145 77
##
## Accuracy : 0.7411
## 95% CI : (0.7062, 0.7738)
## No Information Rate : 0.8423
## P-Value [Acc > NIR] : 1
##
## Kappa : 0.3255
##
## Mcnemar's Test P-Value : <2e-16
##
## Sensitivity : 0.7438
## Specificity : 0.7264
## Pos Pred Value : 0.9356
## Neg Pred Value : 0.3468
## Prevalence : 0.8423
## Detection Rate : 0.6265
## Detection Prevalence : 0.6696
## Balanced Accuracy : 0.7351
##
## 'Positive' Class : 0
##
De los indicadores se puede concluir:
Accuracy = 74.11% -> Indica el nivel de exactitud del modelo Se puede decir que el modelo puede predecir correctamente un 74.11% y se puede equivocar un 25.89%,
Sensibilidad = 74.38% -> Indica el nivel de exactitud para predecir los pacientes que no aceptaron la oferta en la ultima campaña.
Espeficidad = 72.64% -> Indica el nivel de exactitud para predecir los pacientes que aceptaron la oferta en la ultima campaña.
Predecimos los valores para realizar la curva ROC
predichos_proba=predict(modelo, test, type = "prob")[,2]
Realizamos la grafica de la curva ROC para el Arboles
library(ROCR)
predichos_prob <- predict(modelo,test,type = "prob")[,2]
pred11 <- prediction (predichos_prob,test$Response)
pred22 <- performance(pred11,"tpr","fpr")
auc <- as.numeric(performance(pred11 ,"auc")@y.values)
plot(pred22,type='o', main = paste('Area Bajo la Curva =',round(auc,2)),colorize=T)
abline(a=0, b= 1)
Con lo cual tenemos que el rendimiento del clasificador al 76% y se lo establece como un "Test bueno".
Redes Neuronales
library(nnet)
set.seed(333)
modelo_dia = nnet(Response~.,data = train_2, size=18)
## # weights: 415
## initial value 294.409269
## iter 10 value 199.709393
## iter 20 value 186.156337
## iter 30 value 180.574634
## iter 40 value 173.389681
## iter 50 value 167.771344
## iter 60 value 157.500115
## iter 70 value 155.227617
## iter 80 value 154.560200
## iter 90 value 153.508128
## iter 100 value 152.957832
## final value 152.957832
## stopped after 100 iterations
predichos_dia = predict(modelo_dia,test)
summary(predichos_dia)
## V1
## Min. :0.001844
## 1st Qu.:0.184607
## Median :0.184613
## Mean :0.421903
## 3rd Qu.:0.653524
## Max. :0.986695
Convertimos la data predichos_dia para poder hacer uso de ella en la ConfusionMatrix de la libreria caret y obtenemos los indicadores del modelo Redes Neuronales
library(caret)
predichos_dia2 = predichos_dia
for(i in 1:672)
predichos_dia2[i]=test$Response[i]
predichos_diaa=as.factor(predichos_dia2)
factordos=as.factor(test$Response)
indicadores_diaa2 = confusionMatrix((unname(unlist(predichos_diaa))),factordos)
indicadores_diaa2
## Confusion Matrix and Statistics
##
## Reference
## Prediction 0 1
## 0 566 0
## 1 0 106
##
## Accuracy : 1
## 95% CI : (0.9945, 1)
## No Information Rate : 0.8423
## P-Value [Acc > NIR] : < 2.2e-16
##
## Kappa : 1
##
## Mcnemar's Test P-Value : NA
##
## Sensitivity : 1.0000
## Specificity : 1.0000
## Pos Pred Value : 1.0000
## Neg Pred Value : 1.0000
## Prevalence : 0.8423
## Detection Rate : 0.8423
## Detection Prevalence : 0.8423
## Balanced Accuracy : 1.0000
##
## 'Positive' Class : 0
##
Predecimos los valores para la curva ROC
predichos_prob=predict(modelo_dia, test, type = "raw")
Realizamos la curva ROC para el modelo de Redes Neuronales.
library(ROCR)
predichos_proba <- predict(modelo_dia,test,type = "raw")
pred1 <- prediction (predichos_proba,test$Response)
pred2 <- performance(pred1,"tpr","fpr")
auc <- as.numeric(performance(pred1 ,"auc")@y.values)
plot(pred2,type='o', main = paste('Area Bajo la Curva =',round(auc,2)),colorize=T)
abline(a=0, b= 1)
Realizamos la comparación entre los modelos
par(mfrow=c(1,2))
plot(pred22, colorize=T)
lines(x=c(0,1), y=c(0,1))
plot(pred2, colorize=T)
De los dos graficos mostrados y en base a el área bajo la curva, se puede concluir que el mejor modelo es el modelo de arboles, puesto que su área bajo la curva es mayor a la área bajo la curva del modelo de redes neuronales.