Theory

The caret package (Classification And REgression Training) is a comprehensive package that gives access to a wide variety of machine-learning algorithms. In this activity we apply it to a dataset about the Apple M1 chip, with the goal of predicting whether a user will buy the chip (m1_purchase) or not.

Install packages and load libraries

#install.packages("ggplot2")
library(ggplot2)
#install.packages("lattice")
library(lattice)
#install.packages("caret")
library(caret)
#install.packages("DataExplorer")
library(DataExplorer)
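caret provides a common interface (train(), predict(), confusionMatrix()) but delegates the actual model fitting to other packages. The sketch below checks whether the backends for the methods used in this activity are installed. The method-to-package mapping is our assumption, to the best of our knowledge: kernlab for the svm* methods, plus rpart, nnet, and randomForest for rf.

```r
# Assumed mapping from caret method name to the backend package it loads
backends <- c(svmLinear = "kernlab", svmRadial = "kernlab", svmPoly = "kernlab",
              rpart = "rpart", nnet = "nnet", rf = "randomForest")

# TRUE if the backend package can be loaded, FALSE otherwise
installed <- vapply(unique(backends), requireNamespace, logical(1), quietly = TRUE)
installed

# Install whatever is missing (uncomment to run)
# install.packages(names(installed)[!installed])
```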

Load the dataset

df <- read.csv("M1_data.csv", stringsAsFactors = TRUE)
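stringsAsFactors = TRUE turns every text column into a factor at read time. This is why summary() and str() in the next section report counts per level for columns such as trust_apple and m1_purchase. A small illustration on a made-up inline CSV (the rows are hypothetical and are not taken from M1_data.csv):

```r
# Tiny inline CSV standing in for M1_data.csv (made-up rows, illustration only)
csv_text <- "trust_apple,appleproducts_count,m1_purchase
Yes,4,Yes
No,0,No
Yes,7,Yes"

with_factors <- read.csv(text = csv_text, stringsAsFactors = TRUE)
without      <- read.csv(text = csv_text, stringsAsFactors = FALSE)

class(with_factors$m1_purchase)          # "factor"
class(without$m1_purchase)               # "character"
class(with_factors$appleproducts_count)  # numeric columns stay "integer"
```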

Explore the dataset

summary(df)
##  trust_apple interest_computers  age_computer   user_pcmac appleproducts_count
##  No : 19     Min.   :2.000      Min.   :0.000   Apple:86   Min.   :0.000      
##  Yes:114     1st Qu.:3.000      1st Qu.:1.000   Hp   : 1   1st Qu.:1.000      
##              Median :4.000      Median :3.000   Other: 1   Median :3.000      
##              Mean   :3.812      Mean   :2.827   PC   :45   Mean   :2.609      
##              3rd Qu.:5.000      3rd Qu.:5.000              3rd Qu.:4.000      
##              Max.   :5.000      Max.   :9.000              Max.   :8.000      
##                                                                               
##  familiarity_m1 f_batterylife      f_price          f_size      f_multitasking
##  No :75         Min.   :1.000   Min.   :1.000   Min.   :1.000   Min.   :2.00  
##  Yes:58         1st Qu.:4.000   1st Qu.:3.000   1st Qu.:2.000   1st Qu.:4.00  
##                 Median :5.000   Median :4.000   Median :3.000   Median :4.00  
##                 Mean   :4.526   Mean   :3.872   Mean   :3.158   Mean   :4.12  
##                 3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:5.00  
##                 Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.00  
##                                                                               
##     f_noise      f_performance      f_neural       f_synergy    
##  Min.   :1.000   Min.   :2.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.:3.000   1st Qu.:4.000   1st Qu.:2.000   1st Qu.:3.000  
##  Median :4.000   Median :5.000   Median :3.000   Median :4.000  
##  Mean   :3.729   Mean   :4.398   Mean   :3.165   Mean   :3.466  
##  3rd Qu.:5.000   3rd Qu.:5.000   3rd Qu.:4.000   3rd Qu.:4.000  
##  Max.   :5.000   Max.   :5.000   Max.   :5.000   Max.   :5.000  
##                                                                 
##  f_performanceloss m1_consideration m1_purchase    gender     age_group    
##  Min.   :1.000     Min.   :1.000    No :45      Female:61   Min.   : 1.00  
##  1st Qu.:3.000     1st Qu.:3.000    Yes:88      Male  :72   1st Qu.: 2.00  
##  Median :4.000     Median :4.000                            Median : 2.00  
##  Mean   :3.376     Mean   :3.609                            Mean   : 2.97  
##  3rd Qu.:4.000     3rd Qu.:5.000                            3rd Qu.: 3.00  
##  Max.   :5.000     Max.   :5.000                            Max.   :10.00  
##                                                                            
##   income_group                   status               domain  
##  Min.   :1.00   Employed            :41   IT & Technology:33  
##  1st Qu.:1.00   Retired             : 1   Marketing      :21  
##  Median :2.00   Self-Employed       : 5   Business       :14  
##  Mean   :2.97   Student             :84   Engineering    : 7  
##  3rd Qu.:4.00   Student ant employed: 1   Finance        : 7  
##  Max.   :7.00   Unemployed          : 1   Science        : 7  
##                                           (Other)        :44
str(df)
## 'data.frame':    133 obs. of  22 variables:
##  $ trust_apple        : Factor w/ 2 levels "No","Yes": 1 2 2 2 2 2 2 1 2 2 ...
##  $ interest_computers : int  4 2 5 2 4 3 3 3 4 5 ...
##  $ age_computer       : int  8 4 6 6 4 1 2 0 2 0 ...
##  $ user_pcmac         : Factor w/ 4 levels "Apple","Hp","Other",..: 4 4 4 1 1 1 1 4 1 1 ...
##  $ appleproducts_count: int  0 1 0 4 7 2 7 0 6 7 ...
##  $ familiarity_m1     : Factor w/ 2 levels "No","Yes": 1 1 1 1 2 1 1 1 2 2 ...
##  $ f_batterylife      : int  5 5 3 4 5 5 4 5 4 5 ...
##  $ f_price            : int  4 5 4 3 3 5 3 5 4 3 ...
##  $ f_size             : int  3 5 2 3 3 4 4 4 3 5 ...
##  $ f_multitasking     : int  4 3 4 4 4 4 5 4 4 5 ...
##  $ f_noise            : int  4 4 1 4 4 5 5 3 4 5 ...
##  $ f_performance      : int  2 5 4 4 5 5 5 3 4 5 ...
##  $ f_neural           : int  2 2 2 4 3 5 3 2 3 3 ...
##  $ f_synergy          : int  1 2 2 4 4 4 3 2 3 5 ...
##  $ f_performanceloss  : int  1 4 2 3 4 2 2 3 4 5 ...
##  $ m1_consideration   : int  1 2 4 2 4 2 3 1 5 5 ...
##  $ m1_purchase        : Factor w/ 2 levels "No","Yes": 2 1 2 1 2 1 2 1 2 2 ...
##  $ gender             : Factor w/ 2 levels "Female","Male": 2 2 2 1 2 1 2 2 2 2 ...
##  $ age_group          : int  2 2 2 2 5 2 6 2 8 4 ...
##  $ income_group       : int  2 3 2 2 7 2 7 2 7 6 ...
##  $ status             : Factor w/ 6 levels "Employed","Retired",..: 4 1 4 4 1 4 1 4 1 1 ...
##  $ domain             : Factor w/ 22 levels "Administration & Public Services",..: 21 10 13 3 12 17 13 22 13 12 ...
plot_missing(df)
## Warning: `aes_string()` was deprecated in ggplot2 3.0.0.
## ℹ Please use tidy evaluation idioms with `aes()`.
## ℹ See also `vignette("ggplot2-in-packages")` for more information.
## ℹ The deprecated feature was likely used in the DataExplorer package.
##   Please report the issue at
##   <https://github.com/boxuancui/DataExplorer/issues>.
## This warning is displayed once per session.
## Call `lifecycle::last_lifecycle_warnings()` to see where this warning was
## generated.

plot_histogram(df)

plot_correlation(df, type = "continuous")

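NOTE: The variable we want to predict must be stored as a FACTOR. caret treats a factor outcome as a classification problem; a numeric outcome would be treated as regression.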

# Check that m1_purchase is a factor
class(df$m1_purchase)
## [1] "factor"
# If it is not, convert it (as.factor() leaves an existing factor unchanged):
df$m1_purchase <- as.factor(df$m1_purchase)
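Factor levels are sorted alphabetically, so "No" is the first level of m1_purchase. When there are two levels, caret's confusionMatrix() treats the first level as the positive class unless the positive argument is given, which is why the matrices below report 'Positive' Class : No. A base-R sketch of inspecting and reordering levels, using a toy vector rather than the real data:

```r
# Toy outcome vector (illustrative values, not from M1_data.csv)
y <- factor(c("Yes", "No", "Yes", "Yes", "No"))

levels(y)                 # "No" "Yes": the first level is "No"

# Make "Yes" the first level without changing any observation
y_yes_first <- relevel(y, ref = "Yes")
levels(y_yes_first)       # "Yes" "No"

# The labels of the observations are unchanged
identical(as.character(y), as.character(y_yes_first))   # TRUE
```

Alternatively, keep the levels as they are and pass positive = "Yes" to confusionMatrix() so that sensitivity and specificity refer to buyers.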

Split the dataset

# Typical split: 80% training, 20% test, stratified by m1_purchase
set.seed(123)
renglones_entrenamiento <- createDataPartition(df$m1_purchase, p = 0.8, list = FALSE)
entrenamiento <- df[renglones_entrenamiento, ]
prueba        <- df[-renglones_entrenamiento, ]
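createDataPartition() samples within each class of the outcome, so the training and test sets keep roughly the same No/Yes proportions as the full data (45 No, 88 Yes). The hypothetical helper stratified_split() below sketches that idea in base R and is not caret's implementation. Rounding each class up with ceiling() is our assumption, although it does reproduce the 36 No / 71 Yes training counts and 9 No / 17 Yes test counts visible in the confusion matrices below.

```r
# Hypothetical helper: stratified train/test split in base R.
# For each class, draw ceiling(p * n_class) row indices for training.
stratified_split <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

# Toy outcome with the same class counts as m1_purchase (45 No, 88 Yes)
y <- factor(rep(c("No", "Yes"), times = c(45, 88)))

set.seed(123)
train_idx <- stratified_split(y, p = 0.8)

table(y[train_idx])    # No: 36, Yes: 71
table(y[-train_idx])   # No: 9,  Yes: 17
```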

Different modeling methods

Some commonly used machine-learning methods, with the name caret uses for each, are:

  • SVM: Support Vector Machine. There are several variants: linear (svmLinear), radial (svmRadial), polynomial (svmPoly), among others.
  • Decision tree: rpart
  • Neural network: nnet
  • Random forest: rf
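Each caret method exposes its own tuning parameters, and those are the column names that tuneGrid has to use in the models below (for example C for svmLinear, or sigma and C for svmRadial). caret's modelLookup() lists them; a short sketch for the methods above:

```r
library(caret)

# Tuning parameters of the methods listed above
methods_used <- c("svmLinear", "svmRadial", "svmPoly", "rpart", "nnet", "rf")
params <- do.call(rbind, lapply(methods_used, modelLookup))
params[, c("model", "parameter", "label")]
```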

Model 1. Linear SVM

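modelo1 <- train(m1_purchase ~ ., data = entrenamiento,   # all other columns as predictors
                 method    = "svmLinear",                  # linear-kernel SVM
                 preProcess = c("scale", "center"),        # standardize predictors within each resample
                 trControl  = trainControl(method = "cv", number = 10),  # 10-fold cross-validation
                 tuneGrid   = data.frame(C = 1)            # single fixed cost value, no tuning
)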
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainAgriculture, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacHp, user_pcmacOther,
## statusRetired, statusStudent ant employed, statusUnemployed,
## domainCommunication , domainRealestate, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainLaw, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetail, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainConsulting , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainLogistics, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
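The warnings above come from the 10-fold cross-validation. Several factor levels occur only once in the data (user_pcmac = Other, status = Retired and status = Unemployed each have a single respondent), so within a training fold, or even in the whole training set, their dummy column can be constant. A constant column has zero variance and cannot be scaled. One option, sketched below on a toy data frame, is caret's "zv" pre-processing method, which drops zero-variance columns before centering and scaling; in train() this would read preProcess = c("zv", "center", "scale"), which is not what the original models use.

```r
library(caret)

# Toy predictors: "rare_level" mimics a dummy column that is all zeros
# inside a training fold, so its variance is zero
toy <- data.frame(
  x1         = c(1.2, 3.4, 2.2, 5.1, 4.0),
  x2         = c(10, 12, 9, 15, 11),
  rare_level = c(0, 0, 0, 0, 0)
)

# Flag zero- and near-zero-variance columns
nearZeroVar(toy, saveMetrics = TRUE)

# "zv" drops zero-variance columns, then the rest are centered and scaled
pp <- preProcess(toy, method = c("zv", "center", "scale"))
toy_ready <- predict(pp, toy)
names(toy_ready)   # rare_level has been dropped
```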
resultado_entrenamiento1 <- predict(modelo1, entrenamiento)
resultado_prueba1        <- predict(modelo1, prueba)

# Confusion matrix for the training-set predictions
mcre1 <- confusionMatrix(resultado_entrenamiento1, entrenamiento$m1_purchase)
mcre1
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No  30   4
##        Yes  6  67
##                                           
##                Accuracy : 0.9065          
##                  95% CI : (0.8348, 0.9543)
##     No Information Rate : 0.6636          
##     P-Value [Acc > NIR] : 4.281e-09       
##                                           
##                   Kappa : 0.7878          
##                                           
##  Mcnemar's Test P-Value : 0.7518          
##                                           
##             Sensitivity : 0.8333          
##             Specificity : 0.9437          
##          Pos Pred Value : 0.8824          
##          Neg Pred Value : 0.9178          
##              Prevalence : 0.3364          
##          Detection Rate : 0.2804          
##    Detection Prevalence : 0.3178          
##       Balanced Accuracy : 0.8885          
##                                           
##        'Positive' Class : No              
## 
# Confusion matrix for the test-set predictions
mcrp1 <- confusionMatrix(resultado_prueba1, prueba$m1_purchase)
mcrp1
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No   3   6
##        Yes  6  11
##                                           
##                Accuracy : 0.5385          
##                  95% CI : (0.3337, 0.7341)
##     No Information Rate : 0.6538          
##     P-Value [Acc > NIR] : 0.9231          
##                                           
##                   Kappa : -0.0196         
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##             Sensitivity : 0.3333          
##             Specificity : 0.6471          
##          Pos Pred Value : 0.3333          
##          Neg Pred Value : 0.6471          
##              Prevalence : 0.3462          
##          Detection Rate : 0.1154          
##    Detection Prevalence : 0.3462          
##       Balanced Accuracy : 0.4902          
##                                           
##        'Positive' Class : No              
## 
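To make these statistics concrete, the base-R sketch below recomputes the main ones from the four counts of Model 1's test matrix (rows are predictions, columns are the reference, and "No" is the positive class). It reproduces the printed accuracy (0.5385), no-information rate (0.6538), kappa (-0.0196), sensitivity (0.3333), specificity (0.6471) and balanced accuracy (0.4902). Note that the test accuracy is below the no-information rate, i.e. below what always predicting "Yes" would achieve, while the training accuracy is 0.9065: a sign that the model overfits.

```r
# Counts from Model 1's test-set confusion matrix
# (rows = prediction, columns = reference; "No" is the positive class)
cm <- matrix(c(3, 6,
               6, 11),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

n        <- sum(cm)
accuracy <- sum(diag(cm)) / n             # share of correct predictions
nir      <- max(colSums(cm)) / n          # accuracy of always predicting the majority class

# Cohen's kappa: observed agreement corrected for chance agreement
p_expected <- sum(rowSums(cm) * colSums(cm)) / n^2
kappa_stat <- (accuracy - p_expected) / (1 - p_expected)

sensitivity <- cm["No", "No"] / sum(cm[, "No"])     # actual "No" predicted as "No"
specificity <- cm["Yes", "Yes"] / sum(cm[, "Yes"])  # actual "Yes" predicted as "Yes"
balanced    <- (sensitivity + specificity) / 2

round(c(accuracy = accuracy, nir = nir, kappa = kappa_stat,
        sensitivity = sensitivity, specificity = specificity,
        balanced = balanced), 4)
```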

Model 2. Radial SVM

modelo2 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "svmRadial",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneGrid   = data.frame(sigma = 1, C = 1)
)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacHp, user_pcmacOther,
## statusRetired, statusUnemployed, domainAgriculture, domainCommunication ,
## domainLogistics, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainLaw, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetail, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusStudent ant employed, statusUnemployed, domainCommunication ,
## domainRealestate, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainConsulting , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainEconomics, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
resultado_entrenamiento2 <- predict(modelo2, entrenamiento)
resultado_prueba2        <- predict(modelo2, prueba)

# Confusion matrix for the training-set predictions
mcre2 <- confusionMatrix(resultado_entrenamiento2, entrenamiento$m1_purchase)
mcre2
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No  34   0
##        Yes  2  71
##                                           
##                Accuracy : 0.9813          
##                  95% CI : (0.9341, 0.9977)
##     No Information Rate : 0.6636          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9576          
##                                           
##  Mcnemar's Test P-Value : 0.4795          
##                                           
##             Sensitivity : 0.9444          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9726          
##              Prevalence : 0.3364          
##          Detection Rate : 0.3178          
##    Detection Prevalence : 0.3178          
##       Balanced Accuracy : 0.9722          
##                                           
##        'Positive' Class : No              
## 
# Confusion matrix for the test-set predictions
mcrp2 <- confusionMatrix(resultado_prueba2, prueba$m1_purchase)
mcrp2
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No   0   0
##        Yes  9  17
##                                           
##                Accuracy : 0.6538          
##                  95% CI : (0.4433, 0.8279)
##     No Information Rate : 0.6538          
##     P-Value [Acc > NIR] : 0.589398        
##                                           
##                   Kappa : 0               
##                                           
##  Mcnemar's Test P-Value : 0.007661        
##                                           
##             Sensitivity : 0.0000          
##             Specificity : 1.0000          
##          Pos Pred Value :    NaN          
##          Neg Pred Value : 0.6538          
##              Prevalence : 0.3462          
##          Detection Rate : 0.0000          
##    Detection Prevalence : 0.0000          
##       Balanced Accuracy : 0.5000          
##                                           
##        'Positive' Class : No              
## 
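Model 2 fits the training set almost perfectly (accuracy 0.9813) but predicts "Yes" for every test case (accuracy equal to the no-information rate, kappa 0). A likely cause is the fixed sigma = 1. kernlab's radial kernel is k(x, y) = exp(-sigma * ||x - y||^2), and after dummy encoding, centering and scaling there are dozens of predictor columns, so squared distances between different respondents are large and the kernel between them is essentially zero. The toy simulation below (random standardized data, not the M1 data; 40 columns is an assumption) illustrates this: each point is similar only to itself, so the model can memorize the training set but carries no information to new points.

```r
set.seed(1)

# 20 toy observations with 40 standardized predictors
X <- matrix(rnorm(20 * 40), nrow = 20)

# Radial basis kernel k(x, y) = exp(-sigma * ||x - y||^2)
rbf_kernel <- function(X, sigma) {
  sq_dist <- as.matrix(dist(X))^2   # pairwise squared Euclidean distances
  exp(-sigma * sq_dist)
}

K_big   <- rbf_kernel(X, sigma = 1)      # the value used in Model 2
K_small <- rbf_kernel(X, sigma = 0.01)   # a much smaller, illustrative value

# Largest similarity between two different observations
max(K_big[upper.tri(K_big)])       # practically 0
max(K_small[upper.tri(K_small)])   # clearly above 0
```

Instead of fixing sigma, a common alternative is to let caret search a grid, for example with tuneLength = 5 in place of tuneGrid. For svmRadial, caret then, as far as we know, estimates a reasonable sigma from the data with kernlab's sigest() and varies C.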

Model 3. Polynomial SVM

modelo3 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "svmPoly",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneGrid   = data.frame(degree = 1, scale = 1, C = 1)
)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRealestate, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusSelf-Employed, statusStudent ant employed, statusUnemployed,
## domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacHp, user_pcmacOther,
## statusRetired, statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainConsulting , domainRetail,
## domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainLogistics, domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainAgriculture, domainCommunication , domainLaw,
## domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in .local(x, ...): Variable(s) `' constant. Cannot scale data.
resultado_entrenamiento3 <- predict(modelo3, entrenamiento)
resultado_prueba3        <- predict(modelo3, prueba)

# Confusion matrix for the training-set predictions
mcre3 <- confusionMatrix(resultado_entrenamiento3, entrenamiento$m1_purchase)
mcre3
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No  30   4
##        Yes  6  67
##                                           
##                Accuracy : 0.9065          
##                  95% CI : (0.8348, 0.9543)
##     No Information Rate : 0.6636          
##     P-Value [Acc > NIR] : 4.281e-09       
##                                           
##                   Kappa : 0.7878          
##                                           
##  Mcnemar's Test P-Value : 0.7518          
##                                           
##             Sensitivity : 0.8333          
##             Specificity : 0.9437          
##          Pos Pred Value : 0.8824          
##          Neg Pred Value : 0.9178          
##              Prevalence : 0.3364          
##          Detection Rate : 0.2804          
##    Detection Prevalence : 0.3178          
##       Balanced Accuracy : 0.8885          
##                                           
##        'Positive' Class : No              
## 
# Confusion matrix for the test-set predictions
mcrp3 <- confusionMatrix(resultado_prueba3, prueba$m1_purchase)
mcrp3
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No   3   6
##        Yes  6  11
##                                           
##                Accuracy : 0.5385          
##                  95% CI : (0.3337, 0.7341)
##     No Information Rate : 0.6538          
##     P-Value [Acc > NIR] : 0.9231          
##                                           
##                   Kappa : -0.0196         
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##             Sensitivity : 0.3333          
##             Specificity : 0.6471          
##          Pos Pred Value : 0.3333          
##          Neg Pred Value : 0.6471          
##              Prevalence : 0.3462          
##          Detection Rate : 0.1154          
##    Detection Prevalence : 0.3462          
##       Balanced Accuracy : 0.4902          
##                                           
##        'Positive' Class : No              
## 
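Model 3 reproduces Model 1's confusion matrices exactly, which is expected with degree = 1. kernlab's polynomial kernel is k(x, y) = (scale * <x, y> + offset)^degree; assuming caret uses an offset of 1, degree = 1 and scale = 1 give <x, y> + 1, i.e. the linear kernel plus a constant. An SVM's decision function is f(x) = sum_i alpha_i y_i k(x_i, x) + b under the constraint sum_i alpha_i y_i = 0, so the added constant contributes sum_i alpha_i y_i = 0 and cancels (the same holds for the dual objective). The base-R check below, on random toy data with arbitrary valid multipliers, verifies both facts numerically; a genuinely polynomial model would need degree = 2 or higher.

```r
set.seed(42)

# Toy data: 6 points with 3 features and labels in {-1, +1}
X <- matrix(rnorm(6 * 3), nrow = 6)
y <- c(1, 1, 1, -1, -1, -1)

K_linear <- X %*% t(X)               # <x_i, x_j>
K_poly1  <- (1 * X %*% t(X) + 1)^1   # (scale * <x_i, x_j> + offset)^degree

# The degree-1 polynomial kernel is the linear kernel shifted by 1
all.equal(K_poly1 - K_linear, matrix(1, 6, 6))   # TRUE

# Arbitrary multipliers satisfying sum(alpha * y) = 0, as the SVM dual requires
alpha <- c(0.2, 0.5, 0.3, 0.4, 0.4, 0.2)
stopifnot(abs(sum(alpha * y)) < 1e-12)

b <- 0.1   # arbitrary bias term
decision_linear <- drop(t(alpha * y) %*% K_linear) + b
decision_poly1  <- drop(t(alpha * y) %*% K_poly1)  + b
all.equal(decision_linear, decision_poly1)       # TRUE
```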

Model 4. Decision Tree

modelo4 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "rpart",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneLength = 10
)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainLogistics, domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetail, domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacHp, user_pcmacOther,
## statusRetired, statusStudent ant employed, statusUnemployed,
## domainCommunication , domainRealestate, domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainAgriculture, domainCommunication , domainLaw,
## domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainConsulting , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
resultado_entrenamiento4 <- predict(modelo4, entrenamiento)
resultado_prueba4        <- predict(modelo4, prueba)

# Confusion matrix for the training-set predictions
mcre4 <- confusionMatrix(resultado_entrenamiento4, entrenamiento$m1_purchase)
mcre4
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No  17   2
##        Yes 19  69
##                                           
##                Accuracy : 0.8037          
##                  95% CI : (0.7158, 0.8742)
##     No Information Rate : 0.6636          
##     P-Value [Acc > NIR] : 0.0010139       
##                                           
##                   Kappa : 0.5025          
##                                           
##  Mcnemar's Test P-Value : 0.0004803       
##                                           
##             Sensitivity : 0.4722          
##             Specificity : 0.9718          
##          Pos Pred Value : 0.8947          
##          Neg Pred Value : 0.7841          
##              Prevalence : 0.3364          
##          Detection Rate : 0.1589          
##    Detection Prevalence : 0.1776          
##       Balanced Accuracy : 0.7220          
##                                           
##        'Positive' Class : No              
## 
# Confusion matrix for the test-set predictions
mcrp4 <- confusionMatrix(resultado_prueba4, prueba$m1_purchase)
mcrp4
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No   4   6
##        Yes  5  11
##                                           
##                Accuracy : 0.5769          
##                  95% CI : (0.3692, 0.7665)
##     No Information Rate : 0.6538          
##     P-Value [Acc > NIR] : 0.8485          
##                                           
##                   Kappa : 0.0892          
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##             Sensitivity : 0.4444          
##             Specificity : 0.6471          
##          Pos Pred Value : 0.4000          
##          Neg Pred Value : 0.6875          
##              Prevalence : 0.3462          
##          Detection Rate : 0.1538          
##    Detection Prevalence : 0.3846          
##       Balanced Accuracy : 0.5458          
##                                           
##        'Positive' Class : No              
## 
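Model 4 applies preProcess = c("scale", "center") to a decision tree. This is harmless but unnecessary: centering and scaling are increasing transformations, so a tree finds the same splits (only the thresholds change) and makes the same predictions. A quick check with the rpart package, using R's built-in iris data purely as a stand-in dataset:

```r
library(rpart)

# Original predictors and a centered/scaled copy
raw    <- iris
scaled <- iris
scaled[, 1:4] <- scale(iris[, 1:4])

fit_raw    <- rpart(Species ~ ., data = raw,    method = "class")
fit_scaled <- rpart(Species ~ ., data = scaled, method = "class")

# Same predicted classes; only the split thresholds differ
pred_raw    <- predict(fit_raw,    type = "class")
pred_scaled <- predict(fit_scaled, type = "class")
identical(pred_raw, pred_scaled)   # TRUE
```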

Model 5. Neural Network

modelo5 <- train(m1_purchase ~ ., data = entrenamiento,
                 method     = "nnet",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 trace      = FALSE  # passed on to nnet(); hides the per-iteration log of every fit
)
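Each `# weights: N` line that nnet prints while tracing is the size of one network fitted during tuning. The base-R sketch below shows where those sizes come from. It assumes a single hidden layer, one output unit for the two-class outcome, and no skip-layer connections (nnet's defaults for a two-level factor response). It also assumes caret's default `nnet` grid of `size = c(1, 3, 5)` and `decay = c(0, 0.1, 1e-4)`. The helper `nnet_weights()` is hypothetical, and `p = 47` is not stated in the source: it is the number of predictor columns implied by the 50-, 148- and 246-weight networks this call fits.

```r
# Hypothetical helper: weight count of a single-hidden-layer network with one
# output unit and no skip-layer connections (nnet's default, skip = FALSE).
# Each of the h hidden units has p input weights plus a bias; the single
# output unit has h weights plus a bias.
nnet_weights <- function(p, h) {
  (p + 1) * h + (h + 1)
}

p     <- 47                   # assumption: predictor columns after factor expansion
sizes <- c(1, 3, 5)           # assumed caret default sizes for method = "nnet"
nnet_weights(p, sizes)        # 50 148 246

# Assumed default tuning grid: 3 sizes x 3 decay values = 9 candidates
grid   <- expand.grid(size = sizes, decay = c(0, 1e-1, 1e-4))
n_cv   <- 10                  # trainControl(method = "cv", number = 10)
n_fits <- nrow(grid) * n_cv + 1  # one network per candidate and fold, plus the final refit
n_fits                        # 91
```

Under those assumptions, 10-fold cross-validation over the 9 candidates fits 90 networks, plus one final refit on all of `entrenamiento`. This is why the trace of this call, when not suppressed, runs to over a thousand lines.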
resultado_entrenamiento5 <- predict(modelo5, entrenamiento)
resultado_prueba5        <- predict(modelo5, prueba)

# Confusion matrix for the training-set predictions
mcre5 <- confusionMatrix(resultado_entrenamiento5, entrenamiento$m1_purchase)
mcre5
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No  32   2
##        Yes  4  69
##                                           
##                Accuracy : 0.9439          
##                  95% CI : (0.8819, 0.9791)
##     No Information Rate : 0.6636          
##     P-Value [Acc > NIR] : 3.021e-12       
##                                           
##                   Kappa : 0.8727          
##                                           
##  Mcnemar's Test P-Value : 0.6831          
##                                           
##             Sensitivity : 0.8889          
##             Specificity : 0.9718          
##          Pos Pred Value : 0.9412          
##          Neg Pred Value : 0.9452          
##              Prevalence : 0.3364          
##          Detection Rate : 0.2991          
##    Detection Prevalence : 0.3178          
##       Balanced Accuracy : 0.9304          
##                                           
##        'Positive' Class : No              
## 
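The statistics above can be recomputed by hand from the four counts, with "No" as the positive class, as caret reports. The minimal base-R sketch below reproduces Accuracy, Kappa, Sensitivity, Specificity, the predictive values, Prevalence, the No Information Rate and Balanced Accuracy. The function name `cm_stats()` is hypothetical and not part of caret.

```r
# Hypothetical helper: recompute confusionMatrix()-style statistics from a 2x2
# table whose rows are predictions and whose columns are the reference labels.
cm_stats <- function(tab, positive = "No") {
  neg <- setdiff(colnames(tab), positive)
  tp <- tab[positive, positive]; fp <- tab[positive, neg]
  fn <- tab[neg, positive];      tn <- tab[neg, neg]
  n  <- sum(tab)
  acc  <- (tp + tn) / n
  sens <- tp / (tp + fn)
  spec <- tn / (tn + fp)
  # Cohen's kappa: observed agreement vs. agreement expected from the margins
  pe    <- sum(rowSums(tab) * colSums(tab)) / n^2
  kappa <- (acc - pe) / (1 - pe)
  c(Accuracy = acc, Kappa = kappa, Sensitivity = sens, Specificity = spec,
    PosPredValue = tp / (tp + fp), NegPredValue = tn / (tn + fn),
    Prevalence = (tp + fn) / n,
    NIR = max(colSums(tab)) / n,          # share of the majority reference class
    BalancedAccuracy = (sens + spec) / 2)
}

# Training-set counts copied from mcre5 (filled column by column)
lv <- c("No", "Yes")
train_tab <- matrix(c(32, 4, 2, 69), nrow = 2,
                    dimnames = list(Prediction = lv, Reference = lv))
round(cm_stats(train_tab), 4)
```

Applied to the training matrix it returns 0.9439 accuracy and 0.8727 kappa, matching `mcre5`. The same function works unchanged on the test-set counts.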
# Confusion matrix for the test-set predictions
mcrp5 <- confusionMatrix(resultado_prueba5, prueba$m1_purchase)
mcrp5
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No   4   6
##        Yes  5  11
##                                           
##                Accuracy : 0.5769          
##                  95% CI : (0.3692, 0.7665)
##     No Information Rate : 0.6538          
##     P-Value [Acc > NIR] : 0.8485          
##                                           
##                   Kappa : 0.0892          
##                                           
##  Mcnemar's Test P-Value : 1.0000          
##                                           
##             Sensitivity : 0.4444          
##             Specificity : 0.6471          
##          Pos Pred Value : 0.4000          
##          Neg Pred Value : 0.6875          
##              Prevalence : 0.3462          
##          Detection Rate : 0.1538          
##    Detection Prevalence : 0.3846          
##       Balanced Accuracy : 0.5458          
##                                           
##        'Positive' Class : No              
## 
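
The Kappa and McNemar values in the output above can be reproduced by hand from the four cells of the test table. This is a minimal base-R sketch that assumes the counts are copied from the nnet test matrix (rows = Prediction, columns = Reference). R's `mcnemar.test()` applies a continuity correction by default, and with 6 vs. 5 disagreements that gives the printed p-value of 1.

```r
# nnet test-set confusion matrix (rows = Prediction, cols = Reference),
# counts copied from the output above; matrix() fills column by column
tab <- matrix(c(4, 5, 6, 11), nrow = 2,
              dimnames = list(Prediction = c("No", "Yes"),
                              Reference  = c("No", "Yes")))

n  <- sum(tab)
po <- sum(diag(tab)) / n                        # observed agreement (= Accuracy)
pe <- sum(rowSums(tab) * colSums(tab)) / n^2    # agreement expected by chance
cohen_kappa <- (po - pe) / (1 - pe)             # Cohen's kappa

# McNemar's test compares the two kinds of disagreement (6 vs 5 here);
# mcnemar.test() uses the continuity correction by default
mc <- mcnemar.test(tab)

round(c(accuracy = po, kappa = cohen_kappa, mcnemar_p = mc$p.value), 4)
```

A kappa close to 0 means the nnet predictions on the test set agree with the truth only slightly more often than chance would.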

Model 6. Random Forest

modelo6 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "rf",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneGrid   = expand.grid(mtry = c(2, 4, 6))
)
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusStudent ant employed, statusUnemployed, domainCommunication ,
## domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainAgriculture, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRealestate, domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacHp, user_pcmacOther,
## statusRetired, statusUnemployed, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainConsulting , domainRetail,
## domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainLaw, domainLogistics,
## domainRetired
## Warning in preProcess.default(thresh = 0.95, k = 5, freqCut = 19, uniqueCut =
## 10, : These variables have zero variances: user_pcmacOther, statusRetired,
## statusUnemployed, domainCommunication , domainRetired
resultado_entrenamiento6 <- predict(modelo6, entrenamiento)
resultado_prueba6        <- predict(modelo6, prueba)

# Confusion matrix of the training-set predictions
mcre6 <- confusionMatrix(resultado_entrenamiento6, entrenamiento$m1_purchase)
mcre6
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No  34   0
##        Yes  2  71
##                                           
##                Accuracy : 0.9813          
##                  95% CI : (0.9341, 0.9977)
##     No Information Rate : 0.6636          
##     P-Value [Acc > NIR] : <2e-16          
##                                           
##                   Kappa : 0.9576          
##                                           
##  Mcnemar's Test P-Value : 0.4795          
##                                           
##             Sensitivity : 0.9444          
##             Specificity : 1.0000          
##          Pos Pred Value : 1.0000          
##          Neg Pred Value : 0.9726          
##              Prevalence : 0.3364          
##          Detection Rate : 0.3178          
##    Detection Prevalence : 0.3178          
##       Balanced Accuracy : 0.9722          
##                                           
##        'Positive' Class : No              
## 
# Confusion matrix of the test-set predictions
mcrp6 <- confusionMatrix(resultado_prueba6, prueba$m1_purchase)
mcrp6
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction No Yes
##        No   4   3
##        Yes  5  14
##                                           
##                Accuracy : 0.6923          
##                  95% CI : (0.4821, 0.8567)
##     No Information Rate : 0.6538          
##     P-Value [Acc > NIR] : 0.4267          
##                                           
##                   Kappa : 0.2828          
##                                           
##  Mcnemar's Test P-Value : 0.7237          
##                                           
##             Sensitivity : 0.4444          
##             Specificity : 0.8235          
##          Pos Pred Value : 0.5714          
##          Neg Pred Value : 0.7368          
##              Prevalence : 0.3462          
##          Detection Rate : 0.1538          
##    Detection Prevalence : 0.2692          
##       Balanced Accuracy : 0.6340          
##                                           
##        'Positive' Class : No              
## 
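
You can recompute most of the statistics that caret prints from the four cells of the table. Doing so makes it clear that `'Positive' Class : No` means sensitivity refers to the non-buyers. This base-R sketch uses the rf test counts above. The confidence interval and `P-Value [Acc > NIR]` are reproduced with the exact binomial test `binom.test()`, which, as far as we know, is what `confusionMatrix()` uses internally.

```r
# rf test-set confusion matrix (rows = Prediction, cols = Reference), from the output above
tab <- matrix(c(4, 5, 3, 14), nrow = 2,
              dimnames = list(Prediction = c("No", "Yes"),
                              Reference  = c("No", "Yes")))

pos <- "No"    # caret's positive class (first factor level)
neg <- "Yes"
n   <- sum(tab)

tp <- tab[pos, pos]; fn <- tab[neg, pos]   # reference = "No"
fp <- tab[pos, neg]; tn <- tab[neg, neg]   # reference = "Yes"

cm_stats <- c(
  Accuracy            = (tp + tn) / n,
  Sensitivity         = tp / (tp + fn),
  Specificity         = tn / (tn + fp),
  PosPredValue        = tp / (tp + fp),
  NegPredValue        = tn / (tn + fn),
  Prevalence          = (tp + fn) / n,
  DetectionRate       = tp / n,
  DetectionPrevalence = (tp + fp) / n
)
cm_stats["BalancedAccuracy"] <- mean(cm_stats[c("Sensitivity", "Specificity")])

# No Information Rate: accuracy of always predicting the most frequent reference class
nir <- max(colSums(tab)) / n

# Exact binomial test: 95% CI for accuracy and one-sided test of Accuracy > NIR
bt_ci <- binom.test(tp + tn, n)$conf.int
bt_p  <- binom.test(tp + tn, n, p = nir, alternative = "greater")$p.value

round(cm_stats, 4)
round(c(NIR = nir, CI_low = bt_ci[1], CI_high = bt_ci[2], P_Acc_gt_NIR = bt_p), 4)
```

The last line shows why the test results need cautious reading. With 26 cases, the interval is wide and the p-value is far from significant.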

Results Table

resultados <- data.frame(
  "svmLineal"  = c(mcre1$overall["Accuracy"], mcrp1$overall["Accuracy"]),
  "svmRadial"  = c(mcre2$overall["Accuracy"], mcrp2$overall["Accuracy"]),
  "svmPoly"    = c(mcre3$overall["Accuracy"], mcrp3$overall["Accuracy"]),
  "rpart"      = c(mcre4$overall["Accuracy"], mcrp4$overall["Accuracy"]),
  "nnet"       = c(mcre5$overall["Accuracy"], mcrp5$overall["Accuracy"]),
  "rf"         = c(mcre6$overall["Accuracy"], mcrp6$overall["Accuracy"])
)
rownames(resultados) <- c("Precisión de entrenamiento", "Precisión de prueba")
resultados
##                            svmLineal svmRadial   svmPoly     rpart      nnet
## Precisión de entrenamiento 0.9065421 0.9813084 0.9065421 0.8037383 0.9439252
## Precisión de prueba        0.5384615 0.6538462 0.5384615 0.5769231 0.5769231
##                                   rf
## Precisión de entrenamiento 0.9813084
## Precisión de prueba        0.6923077
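
The table can be read more directly by computing the train-test gap and comparing each test accuracy with the no-information rate. This base-R sketch recovers the counts of correct predictions from the table itself: training accuracy × 107 rows and test accuracy × 26 rows, the set sizes shown in the confusion matrices. Working with exact fractions avoids rounding artifacts. For example, svmRadial's 0.6538462 is exactly the NIR of 17/26, not above it.

```r
# Correct predictions per model, recovered from the table above
models        <- c("svmLineal", "svmRadial", "svmPoly", "rpart", "nnet", "rf")
train_correct <- c(97, 105, 97, 86, 101, 105)   # out of 107 training rows
test_correct  <- c(14,  17, 14, 15,  15,  18)   # out of 26 test rows
n_train <- 107
n_test  <- 26

train_acc <- setNames(train_correct / n_train, models)
test_acc  <- setNames(test_correct / n_test, models)
gap       <- train_acc - test_acc   # larger gap = stronger sign of overfitting

# No Information Rate on the test set: 17 of the 26 reference labels are "Yes"
nir <- 17 / n_test
beats_nir <- test_acc > nir         # strict comparison on exact fractions

round(cbind(train = train_acc, test = test_acc, gap = gap), 4)
names(which(beats_nir))   # only "rf"
names(which.min(gap))     # "rpart" has the smallest train-test gap
```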

Conclusions

Random Forest (rf) achieved the highest test accuracy, 69.23%, followed by svmRadial with 65.38%. However, every model showed a considerable drop from training accuracy to test accuracy. For example, rf went from 98.13% to 69.23%, which indicates widespread overfitting. Two likely causes are the small size of the dataset (133 records, of which only 26 form the test set) and the many levels of categorical variables such as domain and status. Some of those levels are so rare that their dummy columns have zero variance within cross-validation folds.

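svmLineal and svmPoly had the worst test performance (53.85%). For a binary (Yes/No) problem this can look slightly better than chance, but it is actually below the no-information rate of 65.38%, which is the accuracy obtained by always predicting the majority class "Yes". Their results are identical, which is expected because svmPoly with degree = 1 reduces to a linear kernel. rpart and nnet both reached 57.69%, also below the no-information rate. rpart showed the smallest gap between training and test accuracy (80.37% vs. 57.69%), whereas nnet dropped sharply from 94.39%.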

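Random Forest is the recommended model for predicting whether a user will buy the M1 chip, since it generalized best to unseen data. This recommendation should be read with caution. With only 26 test cases, its 95% confidence interval for accuracy is wide (48.21% to 85.67%), and its accuracy is not significantly higher than the no-information rate (P-Value [Acc > NIR] = 0.4267). To improve results in the future, it would be advisable to collect more data or to apply class-balancing techniques, since only about one third of the responses are "No".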

---
title: "SVM Apple M1"
author: "A00839467 Marco Escobar"
date: "1/3/2026"
output:
  html_document:
    toc: TRUE
    toc_float: TRUE
    code_download: TRUE
    theme: yeti
---

<center>
![](https://www.apple.com/newsroom/images/product/mac/standard/Apple_M1-Pro-M1-Max_Chips_10182021_big.jpg.large.jpg)
</center>

# <span style="color: blue"> Theory </span>

The **CARET (Classification And REgression Training)** package provides a unified interface for training and evaluating a wide variety of machine-learning algorithms. In this activity we apply it to a dataset about the **Apple M1** chip, with the goal of predicting whether a user **will buy** the chip (`m1_purchase`).

# <span style="color: blue"> Install packages and load libraries </span>

```{r message=FALSE, warning=FALSE}
#install.packages("ggplot2")
library(ggplot2)
#install.packages("lattice")
library(lattice)
#install.packages("caret")
library(caret)
#install.packages("DataExplorer")
library(DataExplorer)
```

# <span style="color: blue"> Load the dataset </span>

```{r}
df <- read.csv("M1_data.csv", stringsAsFactors = TRUE)
```

# <span style="color: blue"> Explore the dataset </span>

```{r}
summary(df)
str(df)
```

```{r}
plot_missing(df)
```

```{r}
plot_histogram(df)
```

```{r}
plot_correlation(df, type = "continuous")
```

**NOTE: The variable we want to predict must be a FACTOR; otherwise caret treats the problem as regression instead of classification.**

```{r}
# Check whether m1_purchase is a factor
class(df$m1_purchase)

# Convert it to a factor (this has no effect if it already is one)
df$m1_purchase <- as.factor(df$m1_purchase)
```
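
Besides making caret treat the task as classification, the order of the factor levels also matters. By default, `confusionMatrix()` takes the first level as the positive class. Levels are sorted alphabetically, so "No" comes before "Yes", which is why every output reports `'Positive' Class : No`. This small base-R sketch uses a made-up vector in place of `df$m1_purchase`. In the real analysis you could instead pass `positive = "Yes"` to `confusionMatrix()` or relevel the factor.

```r
# Hypothetical responses, standing in for df$m1_purchase
y <- factor(c("Yes", "No", "Yes", "Yes", "No"))

levels(y)       # "No" "Yes": alphabetical order, so "No" is the first level
levels(y)[1]    # the class confusionMatrix() treats as positive by default

# Make "Yes" (buyers) the first level, i.e. the positive class
y_yes_first <- relevel(y, ref = "Yes")
levels(y_yes_first)   # "Yes" "No"

# The values themselves are unchanged; only the level order differs
stopifnot(identical(as.character(y), as.character(y_yes_first)))
```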

# <span style="color: blue"> Split the dataset </span>

```{r}
# A typical 80/20 split; createDataPartition() samples within each class of the outcome
set.seed(123)
renglones_entrenamiento <- createDataPartition(df$m1_purchase, p = 0.8, list = FALSE)
entrenamiento <- df[renglones_entrenamiento, ]
prueba        <- df[-renglones_entrenamiento, ]
```
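
`createDataPartition()` samples within each class of the outcome, so both sets keep roughly the same No/Yes proportions. The base-R function below is a simplified sketch of that idea, not caret's actual implementation, and `stratified_split()` is a hypothetical name. The class counts of 45 "No" and 88 "Yes" are implied by the confusion matrices (36 + 9 and 71 + 17). Rounding each class up reproduces the 107 training rows and the 9/17 test split seen in this analysis.

```r
# Simplified stratified split (hypothetical helper, not caret's code):
# sample a fraction p of the row indices within each class of y
stratified_split <- function(y, p = 0.8) {
  idx_by_class <- split(seq_along(y), y)   # row indices grouped by class
  train_idx <- unlist(lapply(idx_by_class, function(idx) {
    idx[sample.int(length(idx), ceiling(p * length(idx)))]
  }), use.names = FALSE)
  sort(train_idx)
}

# Outcome with the class counts implied by the confusion matrices
y <- factor(rep(c("No", "Yes"), times = c(45, 88)))

set.seed(123)
train_idx <- stratified_split(y, p = 0.8)

table(y[train_idx])    # No: 36, Yes: 71
table(y[-train_idx])   # No: 9,  Yes: 17
```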

# <span style="color: blue"> Different Modeling Methods </span>

Commonly used methods for machine-learning modeling include:

* **SVM**: *Support Vector Machine*. There are several variants, such as linear (svmLinear), radial (svmRadial), and polynomial (svmPoly).
* **Decision tree**: rpart
* **Neural networks**: nnet
* **Random Forest**: rf
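
Each caret `method` has its own tuning parameters, and a `tuneGrid` must use exactly those column names. The list below records them for the six methods used here; you can check them with `caret::modelLookup()`. The helper `grid_matches()` is a hypothetical base-R check, not part of caret.

```r
# Tuning parameters expected by caret for each method used in this activity
tuning_params <- list(
  svmLinear = "C",
  svmRadial = c("sigma", "C"),
  svmPoly   = c("degree", "scale", "C"),
  rpart     = "cp",
  nnet      = c("size", "decay"),
  rf        = "mtry"
)

# Hypothetical helper: does a tuning grid have exactly the right columns?
grid_matches <- function(method, grid) {
  setequal(names(grid), tuning_params[[method]])
}

# Grids passed to train() in the models below
grid_matches("svmLinear", data.frame(C = 1))                          # TRUE
grid_matches("svmRadial", data.frame(sigma = 1, C = 1))               # TRUE
grid_matches("svmPoly",   data.frame(degree = 1, scale = 1, C = 1))   # TRUE
grid_matches("rf",        expand.grid(mtry = c(2, 4, 6)))             # TRUE
grid_matches("svmRadial", data.frame(C = 1))                          # FALSE: sigma missing
```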

# <span style="color: blue"> Model 1. Linear SVM </span>

```{r}
modelo1 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "svmLinear",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneGrid   = data.frame(C = 1)
)
resultado_entrenamiento1 <- predict(modelo1, entrenamiento)
resultado_prueba1        <- predict(modelo1, prueba)

# Confusion matrix of the training-set predictions
mcre1 <- confusionMatrix(resultado_entrenamiento1, entrenamiento$m1_purchase)
mcre1

# Confusion matrix of the test-set predictions
mcrp1 <- confusionMatrix(resultado_prueba1, prueba$m1_purchase)
mcrp1
```

# <span style="color: blue"> Model 2. Radial SVM </span>

```{r}
modelo2 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "svmRadial",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneGrid   = data.frame(sigma = 1, C = 1)
)
resultado_entrenamiento2 <- predict(modelo2, entrenamiento)
resultado_prueba2        <- predict(modelo2, prueba)

# Confusion matrix of the training-set predictions
mcre2 <- confusionMatrix(resultado_entrenamiento2, entrenamiento$m1_purchase)
mcre2

# Confusion matrix of the test-set predictions
mcrp2 <- confusionMatrix(resultado_prueba2, prueba$m1_purchase)
mcrp2
```
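
With `tuneGrid = data.frame(sigma = 1, C = 1)`, the 10-fold cross-validation evaluates only one configuration, so nothing is actually tuned. The sketch below builds a wider grid. The candidate values are illustrative assumptions, not values validated for this dataset. caret fits every grid row in every fold and then refits the selected configuration on all the training data, so the number of fits grows quickly.

```r
# Illustrative grid of candidate values (assumed, not tuned for this dataset)
grid_radial <- expand.grid(
  sigma = c(0.01, 0.1, 1),
  C     = c(0.25, 1, 4)
)

n_folds <- 10
n_fits  <- nrow(grid_radial) * n_folds + 1   # every row in every fold + final refit

nrow(grid_radial)   # 9 candidate configurations
n_fits              # 91 model fits

# Possible use in the chunk above:
# train(m1_purchase ~ ., data = entrenamiento, method = "svmRadial",
#       preProcess = c("scale", "center"),
#       trControl  = trainControl(method = "cv", number = 10),
#       tuneGrid   = grid_radial)
```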

# <span style="color: blue"> Model 3. Polynomial SVM </span>

```{r}
modelo3 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "svmPoly",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneGrid   = data.frame(degree = 1, scale = 1, C = 1)
)
resultado_entrenamiento3 <- predict(modelo3, entrenamiento)
resultado_prueba3        <- predict(modelo3, prueba)

# Confusion matrix of the training-set predictions
mcre3 <- confusionMatrix(resultado_entrenamiento3, entrenamiento$m1_purchase)
mcre3

# Confusion matrix of the test-set predictions
mcrp3 <- confusionMatrix(resultado_prueba3, prueba$m1_purchase)
mcrp3
```
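
With `degree = 1`, svmPoly is not really a polynomial model. kernlab's polynomial kernel is `k(x, z) = (scale * <x, z> + offset)^degree`. With degree = 1 and scale = 1, this equals the linear kernel plus a constant. The SVM dual constraint forces `sum(alpha * y) = 0`, so that constant cancels out of the decision function. This is consistent with svmLinear and svmPoly reaching identical accuracies. The base-R sketch below uses random data, assumes kernlab's default `offset = 1`, and uses made-up dual coefficients chosen to satisfy the constraint.

```r
set.seed(1)
X <- matrix(rnorm(20), nrow = 5)   # 5 made-up observations, 4 features

linear_kernel <- function(X) X %*% t(X)
poly_kernel   <- function(X, degree = 1, scale = 1, offset = 1) {
  (scale * X %*% t(X) + offset)^degree
}

K_lin  <- linear_kernel(X)
K_poly <- poly_kernel(X, degree = 1, scale = 1)

# With degree = 1 the two kernel matrices differ only by the constant offset
all.equal(K_poly - K_lin, matrix(1, 5, 5))   # TRUE

# Hypothetical dual coefficients and labels with sum(alpha * y) == 0
alpha <- c(0.5, 0.25, 0.75, 1, 0.5)
y     <- c(1,   1,    1,   -1, -1)

# Decision values at the training points (before adding the bias b)
f_lin  <- K_lin  %*% (alpha * y)
f_poly <- K_poly %*% (alpha * y)
all.equal(f_lin, f_poly)   # TRUE: the constant contributes offset * sum(alpha * y) = 0
```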

# <span style="color: blue"> Model 4. Decision Tree </span>

```{r}
modelo4 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "rpart",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneLength = 10
)
resultado_entrenamiento4 <- predict(modelo4, entrenamiento)
resultado_prueba4        <- predict(modelo4, prueba)

# Confusion matrix of the training-set predictions
mcre4 <- confusionMatrix(resultado_entrenamiento4, entrenamiento$m1_purchase)
mcre4

# Confusion matrix of the test-set predictions
mcrp4 <- confusionMatrix(resultado_prueba4, prueba$m1_purchase)
mcrp4
```
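
For rpart (and rf), `preProcess = c("scale", "center")` is harmless but unnecessary. Tree splits depend only on the ordering of each predictor, and centering and scaling by a positive factor preserve that order. The base-R sketch below finds a single Gini split, a one-level "stump" written for illustration rather than rpart's implementation. It shows that the same rows end up on each side before and after scaling.

```r
# Gini impurity of a set of class labels
gini <- function(y) {
  p <- table(y) / length(y)
  1 - sum(p^2)
}

# Best single split x <= t by minimum weighted Gini impurity.
# Returns the logical "goes left" vector so partitions can be compared.
best_split <- function(x, y) {
  xs <- sort(unique(x))
  thresholds <- (head(xs, -1) + tail(xs, -1)) / 2   # midpoints between values
  impurity <- sapply(thresholds, function(t) {
    left <- x <= t
    (sum(left) * gini(y[left]) + sum(!left) * gini(y[!left])) / length(y)
  })
  x <= thresholds[which.min(impurity)]
}

set.seed(42)
x <- c(rnorm(20, mean = 2), rnorm(20, mean = 4))   # made-up predictor
y <- factor(rep(c("No", "Yes"), each = 20))        # made-up labels

split_raw    <- best_split(x, y)
split_scaled <- best_split(as.vector(scale(x)), y) # centered and scaled copy

identical(split_raw, split_scaled)   # TRUE: same partition of the rows
```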

# <span style="color: blue"> Model 5. Neural Network </span>

```{r message=FALSE, warning=FALSE}
modelo5 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "nnet",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10)
)
resultado_entrenamiento5 <- predict(modelo5, entrenamiento)
resultado_prueba5        <- predict(modelo5, prueba)

# Confusion matrix of the training-set predictions
mcre5 <- confusionMatrix(resultado_entrenamiento5, entrenamiento$m1_purchase)
mcre5

# Confusion matrix of the test-set predictions
mcrp5 <- confusionMatrix(resultado_prueba5, prueba$m1_purchase)
mcrp5
```
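
Without `tuneGrid` or `tuneLength`, caret tunes nnet over a default grid. Our reading of caret's source is that, with the default tuneLength = 3, the grid uses `size` = 1, 3, 5 hidden units and `decay` = 0, 0.1, 1e-4. The base-R sketch below reproduces that construction (as a hypothetical helper) so the nine candidate models are visible. The iteration log in the knitted output (`iter 60 value ...`) comes from nnet itself. You can silence it by passing `trace = FALSE` to `train()`, which forwards the argument to `nnet()`. Both the CV folds and nnet's starting weights are random, so calling `set.seed()` right before `train()` helps make the result reproducible.

```r
# Hypothetical helper mirroring (our reading of) caret's default nnet grid
default_nnet_grid <- function(len = 3) {
  expand.grid(
    size  = (1:len) * 2 - 1,                            # 1, 3, 5 hidden units
    decay = c(0, 10^seq(-1, -4, length.out = len - 1))  # 0, 0.1, 1e-04
  )
}

grid_nnet <- default_nnet_grid(3)
nrow(grid_nnet)   # 9 candidate models

# Possible call with an explicit grid, a fixed seed and no iteration log:
# set.seed(123)
# modelo5 <- train(m1_purchase ~ ., data = entrenamiento, method = "nnet",
#                  preProcess = c("scale", "center"),
#                  trControl  = trainControl(method = "cv", number = 10),
#                  tuneGrid   = grid_nnet, trace = FALSE)
```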

# <span style="color: blue"> Model 6. Random Forest </span>

```{r}
modelo6 <- train(m1_purchase ~ ., data = entrenamiento,
                 method    = "rf",
                 preProcess = c("scale", "center"),
                 trControl  = trainControl(method = "cv", number = 10),
                 tuneGrid   = expand.grid(mtry = c(2, 4, 6))
)
resultado_entrenamiento6 <- predict(modelo6, entrenamiento)
resultado_prueba6        <- predict(modelo6, prueba)

# Confusion matrix of the training-set predictions
mcre6 <- confusionMatrix(resultado_entrenamiento6, entrenamiento$m1_purchase)
mcre6

# Confusion matrix of the test-set predictions
mcrp6 <- confusionMatrix(resultado_prueba6, prueba$m1_purchase)
mcrp6
```
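
When knitted, this chunk prints many `preProcess()` warnings saying that columns such as `user_pcmacOther` or `domainRetired` have zero variance. The formula interface turns each factor level into a 0/1 dummy column. Rare levels may be absent from a CV fold's training rows; for example, user_pcmac = "Other" appears only once in the whole dataset. Such a column is then constant and cannot be scaled. The base-R sketch below uses made-up data to show how those columns can be detected. In caret, adding `"zv"` to `preProcess` (for example `preProcess = c("zv", "center", "scale")`) removes zero-variance columns before centering and scaling.

```r
# Made-up data: "Hp" never appears and "Other" appears in a single row
d <- data.frame(
  user_pcmac   = factor(c("Apple", "PC", "Apple", "Other", "PC", "Apple"),
                        levels = c("Apple", "Hp", "Other", "PC")),
  age_computer = c(1, 3, 2, 5, 4, 0)
)

# Simulate a CV fold whose training rows happen to exclude the "Other" row
fold <- d[-4, ]

# Dummy-encode as the formula interface does (intercept column dropped);
# unused factor levels still get a column, filled with zeros
mm <- model.matrix(~ ., data = fold)[, -1]

# Zero-variance columns: constant within this fold, so they cannot be scaled
zero_var <- colnames(mm)[apply(mm, 2, var) == 0]
zero_var   # "user_pcmacHp" "user_pcmacOther"
```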

# <span style="color: blue"> Results Table </span>

```{r}
resultados <- data.frame(
  "svmLineal"  = c(mcre1$overall["Accuracy"], mcrp1$overall["Accuracy"]),
  "svmRadial"  = c(mcre2$overall["Accuracy"], mcrp2$overall["Accuracy"]),
  "svmPoly"    = c(mcre3$overall["Accuracy"], mcrp3$overall["Accuracy"]),
  "rpart"      = c(mcre4$overall["Accuracy"], mcrp4$overall["Accuracy"]),
  "nnet"       = c(mcre5$overall["Accuracy"], mcrp5$overall["Accuracy"]),
  "rf"         = c(mcre6$overall["Accuracy"], mcrp6$overall["Accuracy"])
)
rownames(resultados) <- c("Precisión de entrenamiento", "Precisión de prueba")
resultados
```
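
The six nearly identical lines above can also be generated from named lists of confusion matrices, which avoids copy-paste errors when models are added. caret stores accuracy in `$overall["Accuracy"]`, so the sketch only needs that element. To keep it runnable without fitting any model, the objects are stand-ins that mimic that structure and carry the (rounded) accuracies from the results table. The row labels here are in English.

```r
# Stand-ins mimicking caret confusionMatrix objects (only $overall is used).
# In the real analysis these lists would hold mcre1..mcre6 and mcrp1..mcrp6.
fake_cm <- function(acc) list(overall = c(Accuracy = acc))

train_cms <- list(svmLineal = fake_cm(0.9065421), svmRadial = fake_cm(0.9813084),
                  svmPoly   = fake_cm(0.9065421), rpart     = fake_cm(0.8037383),
                  nnet      = fake_cm(0.9439252), rf        = fake_cm(0.9813084))
test_cms  <- list(svmLineal = fake_cm(0.5384615), svmRadial = fake_cm(0.6538462),
                  svmPoly   = fake_cm(0.5384615), rpart     = fake_cm(0.5769231),
                  nnet      = fake_cm(0.5769231), rf        = fake_cm(0.6923077))

get_acc <- function(cm) cm$overall[["Accuracy"]]

# One row per data split, one column per model (names come from the lists)
results_tbl <- as.data.frame(rbind(
  "Training accuracy" = sapply(train_cms, get_acc),
  "Test accuracy"     = sapply(test_cms, get_acc)
))
results_tbl
```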

# <span style="color: blue"> Conclusions </span>

Random Forest (rf) achieved the highest test accuracy, 69.23%, followed by svmRadial
with 65.38%. However, every model showed a considerable drop from training accuracy
to test accuracy. For example, rf went from 98.13% to 69.23%, which indicates
widespread overfitting. Two likely causes are the small size of the dataset
(133 records, of which only 26 form the test set) and the many levels of categorical
variables such as domain and status. Some of those levels are so rare that their
dummy columns have zero variance within cross-validation folds.

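svmLineal and svmPoly had the worst test performance (53.85%). For a binary (Yes/No)
problem this can look slightly better than chance, but it is actually below the
no-information rate of 65.38%, which is the accuracy obtained by always predicting the
majority class "Yes". Their results are identical, which is expected because svmPoly
with degree = 1 reduces to a linear kernel. rpart and nnet both reached 57.69%, also
below the no-information rate. rpart showed the smallest gap between training and
test accuracy (80.37% vs. 57.69%), whereas nnet dropped sharply from 94.39%.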

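Random Forest is the recommended model for predicting whether a user will buy the
M1 chip, since it generalized best to unseen data. This recommendation should be read
with caution. With only 26 test cases, its 95% confidence interval for accuracy is
wide (48.21% to 85.67%), and its accuracy is not significantly higher than the
no-information rate (P-Value [Acc > NIR] = 0.4267). To improve results in the future,
it would be advisable to collect more data or to apply class-balancing techniques,
since only about one third of the responses are "No".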
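
Random oversampling is one simple class-balancing technique. Rows of the minority class ("No", about one third of the training data) are resampled with replacement until both classes have the same count. The base-R sketch below shows the idea with the training class counts from the confusion matrices (36 "No", 71 "Yes"); `upsample_indices()` is a hypothetical helper. caret offers the same idea through `upSample()` or `trainControl(sampling = "up")`. The latter applies it inside each resampling fold, so duplicated rows do not leak into the held-out fold.

```r
# Hypothetical helper: random oversampling of every class up to the majority count
upsample_indices <- function(y) {
  idx_by_class <- split(seq_along(y), y)
  target <- max(lengths(idx_by_class))
  unlist(lapply(idx_by_class, function(idx) {
    extra <- idx[sample.int(length(idx), target - length(idx), replace = TRUE)]
    c(idx, extra)   # keep all original rows, add duplicates of the minority class
  }), use.names = FALSE)
}

# Training outcome with the class counts seen in this analysis
y_train <- factor(rep(c("No", "Yes"), times = c(36, 71)))

set.seed(123)
balanced_idx <- upsample_indices(y_train)
table(y_train[balanced_idx])   # No: 71, Yes: 71
```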