Dataset: https://www.kaggle.com/janiobachmann/bank-marketing-dataset

Loading libraries

Packages attached, per the startup messages: ggplot2, plotly, lattice (loaded as a dependency), randomForest 4.6-14, data.table and dplyr. The messages report the usual masking between them, for example dplyr's filter and lag mask the stats versions, and plotly's layout masks graphics::layout.

Loading the data

## 'data.frame':    11162 obs. of  17 variables:
##  $ age      : int  59 56 41 55 54 42 56 60 37 28 ...
##  $ job      : Factor w/ 12 levels "admin.","blue-collar",..: 1 1 10 8 1 5 5 6 10 8 ...
##  $ marital  : Factor w/ 3 levels "divorced","married",..: 2 2 2 2 2 3 2 1 2 3 ...
##  $ education: Factor w/ 4 levels "primary","secondary",..: 2 2 2 2 3 3 3 2 2 2 ...
##  $ default  : Factor w/ 2 levels "no","yes": 1 1 1 1 1 1 1 1 1 1 ...
##  $ balance  : int  2343 45 1270 2476 184 0 830 545 1 5090 ...
##  $ housing  : Factor w/ 2 levels "no","yes": 2 1 2 2 1 2 2 2 2 2 ...
##  $ loan     : Factor w/ 2 levels "no","yes": 1 1 1 1 1 2 2 1 1 1 ...
##  $ contact  : Factor w/ 3 levels "cellular","telephone",..: 3 3 3 3 3 3 3 3 3 3 ...
##  $ day      : int  5 5 5 5 5 5 6 6 6 6 ...
##  $ month    : Factor w/ 12 levels "apr","aug","dec",..: 9 9 9 9 9 9 9 9 9 9 ...
##  $ duration : int  1042 1467 1389 579 673 562 1201 1030 608 1297 ...
##  $ campaign : int  1 1 1 1 2 2 1 1 1 3 ...
##  $ pdays    : int  -1 -1 -1 -1 -1 -1 -1 -1 -1 -1 ...
##  $ previous : int  0 0 0 0 0 0 0 0 0 0 ...
##  $ poutcome : Factor w/ 4 levels "failure","other",..: 4 4 4 4 4 4 4 4 4 4 ...
##  $ deposit  : Factor w/ 2 levels "no","yes": 2 2 2 2 2 2 2 2 2 2 ...
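
The structure above is what `str()` prints for a data frame whose text columns were read as factors. A minimal sketch of such a load, assuming the Kaggle CSV is read with base R; the helper name `load_bank` and the three-column stand-in file are hypothetical and only there so the example runs without the real download:

```r
# Hypothetical helper: read the CSV with text columns converted to factors,
# which reproduces the "Factor w/ ... levels" columns shown by str().
load_bank <- function(path) {
  read.csv(path, stringsAsFactors = TRUE)
}

# Tiny stand-in file (illustrative rows, 3 of the 17 columns)
tmp <- tempfile(fileext = ".csv")
writeLines(c("age,job,deposit",
             "59,admin.,yes",
             "56,admin.,yes",
             "41,technician,yes"), tmp)

bank_demo <- load_bank(tmp)
str(bank_demo)
```

With the real file, `load_bank()` would be pointed at wherever the downloaded CSV was saved.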

Splitting the data

The data are split into training and test sets, with 80% of the rows used for training.
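
One way to do an 80/20 split is a simple random sample of row indices. This is a sketch in base R, not necessarily the method used in the original; the seed, the variable names and the one-column stand-in data frame are assumptions:

```r
set.seed(123)  # arbitrary seed, only so the sketch is reproducible

# Stand-in data frame with the same number of rows as the dataset (11162)
bank_demo <- data.frame(id = seq_len(11162))

# Draw 80% of the row indices for training; the remaining rows form the test set
n <- nrow(bank_demo)
train_idx <- sample(n, size = floor(0.8 * n))
train <- bank_demo[train_idx, , drop = FALSE]
test  <- bank_demo[-train_idx, , drop = FALSE]

c(train = nrow(train), test = nrow(test))
```

On 11162 rows this yields 8929 training rows and 2233 test rows, which is consistent with the 2233 cases counted in the confusion matrix at the end of the report.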

Exploration

##      age        job marital education default balance housing loan
## 1225  61    retired married  tertiary      no    1257      no   no
## 5952  51 technician married secondary      no    1327      no   no
## 4609  35   services  single   primary      no     167      no  yes
## 9673  34  housemaid  single   primary      no     443      no   no
## 9046  53 management married  tertiary      no       0     yes   no
## 3545  30 technician  single secondary      no    3144      no   no
##       contact day month duration campaign pdays previous poutcome deposit
## 1225 cellular  10   feb      503        1    -1        0  unknown       1
## 5952 cellular   7   jul       21        2    -1        0  unknown       0
## 4609 cellular  11   jul      614        2    -1        0  unknown       1
## 9673 cellular  30   jan       10        1     2        1    other       0
## 9046 cellular  14   jul       85        3    -1        0  unknown       0
## 3545 cellular  19   may      212        2    -1        0  unknown       1
##  [1] "age"       "job"       "marital"   "education" "default"  
##  [6] "balance"   "housing"   "loan"      "contact"   "day"      
## [11] "month"     "duration"  "campaign"  "pdays"     "previous" 
## [16] "poutcome"  "deposit"

Glossary

  • Client Information
    • age - age of client
    • job - type of job held by client
    • marital - marital status of client
    • education - highest level of education completed by client
    • default - has the client ever defaulted on previous debts?
    • balance - client’s average yearly balance, in euros
    • housing - does client possess a housing loan?
    • loan - does client possess a personal loan?
  • Information related to the last contact of the client during the current campaign
    • contact - communication type
    • month - month of year
    • day - day (of the month) that the client was contacted
    • duration - contact duration in seconds.
  • Miscellaneous Attributes
    • campaign - number of contacts performed during this campaign and for this client
    • pdays - number of days since the client was last contacted in a previous campaign (-1 means the client was not previously contacted)
    • previous - number of contacts performed before this campaign and for this client
    • poutcome - outcome of the previous marketing campaign for this client
    • deposit - has the client subscribed to a term deposit? (dependent var.)
## [1] 0

Actions

  • Data structure
  • Plots of:
    • Age
    • Job
    • Marital status
    • Education level
    • Credit in default (default)
    • Balance
    • Housing loan (housing)
    • Personal loan (loan)
    • Contact channel
    • Day of the month of contact
    • Month of contact
    • Call duration
    • Number of contacts made to the client
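
The plots listed above fall into two kinds: distributions of numeric columns and counts of categorical ones. A minimal sketch using base graphics to stay dependency-free (the report itself loads ggplot2 and plotly); the helper `plot_column` and the six-row stand-in data are hypothetical:

```r
# Hypothetical helper: histogram for numeric columns, bar chart of counts otherwise.
# Returns the kind of plot drawn (invisibly).
plot_column <- function(df, col) {
  x <- df[[col]]
  if (is.numeric(x)) {
    hist(x, main = col, xlab = col)
    invisible("histogram")
  } else {
    barplot(table(x), main = col, las = 2)
    invisible("barplot")
  }
}

# Stand-in data (illustrative values only)
demo <- data.frame(
  age = c(59L, 56L, 41L, 55L, 54L, 42L),
  marital = factor(c("married", "married", "married",
                     "married", "married", "single"))
)

# Draw one plot per column
for (col in names(demo)) plot_column(demo, col)
```

Applied to the full data frame, the same loop would cover every item in the list, since each listed variable is either an integer or a factor column.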

CART

Attaching the tidyverse adds tibble 2.1.3, purrr 0.3.2, tidyr 1.0.0, stringr 1.4.0, readr 1.3.1 and forcats 0.4.0. The reported conflicts are the expected ones: dplyr masks between, first and last from data.table, combine from randomForest, filter from plotly and stats, and lag from stats; purrr masks caret::lift() and data.table::transpose(); randomForest::margin() masks ggplot2::margin().
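
The report does not show how the CART model was fitted. As a sketch only, a classification tree can be grown with the rpart package; the choice of rpart, the synthetic training data and the toy rule linking `duration` to `deposit` are all assumptions made for illustration:

```r
library(rpart)  # assumption: the original does not say which CART implementation it used

set.seed(1)  # arbitrary seed for the synthetic data

# Synthetic stand-in for the training set: two predictors and a 0/1 target
n <- 400
train_demo <- data.frame(
  duration = round(runif(n, 0, 1500)),
  balance  = round(rnorm(n, 1500, 1000))
)
# Toy rule, purely illustrative: long calls are labelled as deposit = 1
train_demo$deposit <- factor(as.integer(train_demo$duration > 500))

# Grow a classification tree and predict class labels
fit  <- rpart(deposit ~ duration + balance, data = train_demo, method = "class")
pred <- predict(fit, newdata = train_demo, type = "class")

# Cross-tabulate predictions against the true labels
table(Prediction = pred, Reference = train_demo$deposit)
```

In the real analysis the formula would be `deposit ~ .` or a chosen subset of the 16 predictors, and predictions would be made on the held-out test set rather than on the training rows.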

## Prediction

## Confusion Matrix and Statistics
## 
##           Reference
## Prediction   0   1
##          0 849 355
##          1  94 935
##                                           
##                Accuracy : 0.7989          
##                  95% CI : (0.7817, 0.8154)
##     No Information Rate : 0.5777          
##     P-Value [Acc > NIR] : < 2.2e-16       
##                                           
##                   Kappa : 0.6027          
##                                           
##  Mcnemar's Test P-Value : < 2.2e-16       
##                                           
##             Sensitivity : 0.9003          
##             Specificity : 0.7248          
##          Pos Pred Value : 0.7051          
##          Neg Pred Value : 0.9086          
##              Prevalence : 0.4223          
##          Detection Rate : 0.3802          
##    Detection Prevalence : 0.5392          
##       Balanced Accuracy : 0.8126          
##                                           
##        'Positive' Class : 0               
##
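
The statistics above can be recomputed by hand from the four cell counts, which makes it clear that the positive class is 0 (no deposit). A short check in base R; the variable names are illustrative:

```r
# Counts from the confusion matrix above (rows = prediction, columns = reference)
cm <- matrix(c(849, 94, 355, 935), nrow = 2,
             dimnames = list(Prediction = c("0", "1"), Reference = c("0", "1")))

# The positive class is 0, so "true positive" means predicted 0 and actually 0
tp <- cm["0", "0"]
fn <- cm["1", "0"]
fp <- cm["0", "1"]
tn <- cm["1", "1"]
total <- sum(cm)

accuracy    <- (tp + tn) / total
sensitivity <- tp / (tp + fn)
specificity <- tn / (tn + fp)
ppv         <- tp / (tp + fp)
npv         <- tn / (tn + fn)
balanced    <- (sensitivity + specificity) / 2

# Cohen's kappa: observed agreement corrected for chance agreement
expected <- (sum(cm["0", ]) * sum(cm[, "0"]) +
             sum(cm["1", ]) * sum(cm[, "1"])) / total^2
kappa <- (accuracy - expected) / (1 - expected)

round(c(accuracy = accuracy, kappa = kappa, sensitivity = sensitivity,
        specificity = specificity, ppv = ppv, npv = npv,
        balanced = balanced), 4)
```

Because the positive class is 0, the 0.9003 sensitivity describes how well the tree identifies clients who did not subscribe; the rate for clients who did subscribe is the 0.7248 reported as specificity.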