Customer Purchase Prediction

DOEUN

2020-04-07

##   InvoiceNo   StockCode Description    Quantity InvoiceDate   UnitPrice 
##           0           0           0           0           0           0 
##  CustomerID     Country 
##      135080           0

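R : recency (how recently the customer last purchased)

F : frequency (how often the customer purchases)

M : monetary value (how much the customer spends). After building these features, create a Yes/No target variable y indicating whether the customer makes a purchase, then feed the data into modelling.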
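The feature construction described above can be sketched in base R as follows. This is a minimal illustration, not the report's actual code: the toy values, the `cutoff` date, counting transaction rows as `freq`, and measuring `recency` in days before the cutoff are all assumptions. Only the column names come from the dataset and the result table.

```r
# Toy transactions using the dataset's column names (all values are made up).
tx <- data.frame(
  CustomerID  = c(1, 1, 2, 2, 3, NA),
  InvoiceDate = as.Date(c("2011-10-05", "2011-11-20", "2011-09-01",
                          "2011-12-03", "2011-11-28", "2011-11-30")),
  Quantity    = c(2, 1, 5, 3, 4, 1),
  UnitPrice   = c(10, 20, 1, 2, 5, 9)
)

# Rows without a CustomerID cannot be assigned to a customer, so drop them.
tx <- tx[!is.na(tx$CustomerID), ]

# Assumed split: history before the cutoff, "next month" from the cutoff on.
cutoff <- as.Date("2011-12-01")
hist   <- tx[tx$InvoiceDate <  cutoff, ]
future <- tx[tx$InvoiceDate >= cutoff, ]

rfm <- data.frame(CustomerID = sort(unique(hist$CustomerID)))
key <- as.character(rfm$CustomerID)

# F: number of transaction rows per customer (assumed definition).
rfm$freq <- as.integer(table(hist$CustomerID)[key])

# M: total amount spent per customer.
rfm$money <- as.numeric(
  tapply(hist$Quantity * hist$UnitPrice, hist$CustomerID, sum)[key]
)

# R: days between the customer's last purchase and the cutoff.
last_day    <- tapply(as.numeric(hist$InvoiceDate), hist$CustomerID, max)[key]
rfm$recency <- as.numeric(cutoff) - as.numeric(last_day)

# y: did the customer buy again on or after the cutoff?
rfm$BuyNextMonth <- factor(
  ifelse(rfm$CustomerID %in% future$CustomerID, "Yes", "No"),
  levels = c("No", "Yes")
)

print(rfm)
```

Computing the features strictly from transactions before the cutoff, and the label strictly from transactions after it, keeps the target from leaking into the predictors.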

## # A tibble: 6 x 5
##   CustomerID  freq money recency BuyNextMonth
##        <int> <int> <dbl>   <dbl> <fct>       
## 1      12350    17  334.      57 No          
## 2      12352    38 1562.       9 Yes         
## 3      12359    80 1839.      52 No          
## 4      12361    10  190.      34 No          
## 5      12362    27  479.      42 No          
## 6      12365    22  641.      38 No
## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.0703             nan     0.1000    0.0945
##      2        0.9229             nan     0.1000    0.0750
##      3        0.8035             nan     0.1000    0.0594
##      4        0.7043             nan     0.1000    0.0502
##      5        0.6206             nan     0.1000    0.0433
##      6        0.5489             nan     0.1000    0.0347
##      7        0.4871             nan     0.1000    0.0312
##      8        0.4334             nan     0.1000    0.0272
##      9        0.3864             nan     0.1000    0.0232
##     10        0.3451             nan     0.1000    0.0206
##     20        0.1183             nan     0.1000    0.0064
##     40        0.0155             nan     0.1000    0.0008
##     50        0.0057             nan     0.1000    0.0003
## 
## note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
## 
## note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
## 
## # weights:  16
## initial  value 910.213601 
## iter  10 value 816.415266
## iter  20 value 557.731964
## iter  30 value 62.921474
## iter  40 value 3.066346
## iter  50 value 0.048891
## iter  60 value 0.001777
## iter  70 value 0.001008
## final  value 0.000093 
## converged
## note: only 2 unique complexity parameters in default grid. Truncating the grid to 2 .
##    user  system elapsed 
##   37.76    2.15  645.47
##    ROC Sens Spec   Resample    Model
## 1    1    1    1 Resample01 adaboost
## 2    1    1    1 Resample02 adaboost
## 3    1    1    1 Resample03 adaboost
## 4    1    1    1 Resample14 adaboost
## 5    1    1    1 Resample12 adaboost
## 6    1    1    1 Resample10 adaboost
## 7    1    1    1 Resample08 adaboost
## 8    1    1    1 Resample06 adaboost
## 9    1    1    1 Resample04 adaboost
## 10   1    1    1 Resample15 adaboost

Modelling Visualization

## # A tibble: 15 x 4
##    Model     avg_roc avg_sens avg_Spec
##    <chr>       <dbl>    <dbl>    <dbl>
##  1 adaboost     1        1       1    
##  2 bagFDA       1        0.89    1    
##  3 C5.0         1        1       1    
##  4 cforest      1        1       1    
##  5 gbm          1        1       1    
##  6 glm          1        1       1    
##  7 glmboost     1        1       0.966
##  8 knn          0.96     0.78    0.958
##  9 lda          1        1       0.985
## 10 nnet         1        1       1.00 
## 11 ranger       1        1       1    
## 12 rf           1        1       1    
## 13 svmRadial    1        0.99    0.998
## 14 treebag      1        1       1    
## 15 xgbTree      1        1       1
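A summary like the one above can presumably be obtained by averaging the per-resample `ROC`, `Sens`, and `Spec` values for each model. The sketch below uses base R's `aggregate()` on a made-up data frame with the same columns as the resample output; the numbers are illustrative only and the report may have used a different tool for the same step.

```r
# Toy per-resample results with the same columns as the resample output
# (all values are made up for illustration).
res <- data.frame(
  ROC      = c(1.00, 0.98, 0.95, 0.97),
  Sens     = c(1.00, 0.90, 0.80, 0.76),
  Spec     = c(1.00, 1.00, 0.96, 0.95),
  Resample = c("Resample01", "Resample02", "Resample01", "Resample02"),
  Model    = c("gbm", "gbm", "knn", "knn")
)

# Average each metric over the resamples, one row per model.
avg <- aggregate(cbind(ROC, Sens, Spec) ~ Model, data = res, FUN = mean)
names(avg) <- c("Model", "avg_roc", "avg_sens", "avg_Spec")

# Sort alphabetically by model name, as in the summary table.
avg <- avg[order(avg$Model), ]
print(avg)
```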

Applying GBM

## Iter   TrainDeviance   ValidDeviance   StepSize   Improve
##      1        1.0709             nan     0.1000    0.0963
##      2        0.9234             nan     0.1000    0.0734
##      3        0.8039             nan     0.1000    0.0594
##      4        0.7047             nan     0.1000    0.0483
##      5        0.6209             nan     0.1000    0.0402
##      6        0.5492             nan     0.1000    0.0353
##      7        0.4874             nan     0.1000    0.0316
##      8        0.4336             nan     0.1000    0.0273
##      9        0.3866             nan     0.1000    0.0236
##     10        0.3453             nan     0.1000    0.0202
##     20        0.1184             nan     0.1000    0.0065
##     40        0.0155             nan     0.1000    0.0008
##     50        0.0057             nan     0.1000    0.0003
## Confusion Matrix and Statistics
## 
##           Reference
## Prediction  No Yes
##        No   93   0
##        Yes   0 195
##                                      
##                Accuracy : 1          
##                  95% CI : (0.9873, 1)
##     No Information Rate : 0.6771     
##     P-Value [Acc > NIR] : < 2.2e-16  
##                                      
##                   Kappa : 1          
##                                      
##  Mcnemar's Test P-Value : NA         
##                                      
##             Sensitivity : 1.0000     
##             Specificity : 1.0000     
##          Pos Pred Value : 1.0000     
##          Neg Pred Value : 1.0000     
##              Prevalence : 0.6771     
##          Detection Rate : 0.6771     
##    Detection Prevalence : 0.6771     
##       Balanced Accuracy : 1.0000     
##                                      
##        'Positive' Class : Yes        
##
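As a check on the statistics above, the headline figures can be recomputed by hand from the four cells of the confusion matrix. The sketch below does this in base R with `Yes` as the positive class; the variable names are illustrative and not taken from the report.

```r
# Confusion matrix from the output above: rows = prediction, cols = reference.
cm <- matrix(c(93, 0,
               0, 195),
             nrow = 2, byrow = TRUE,
             dimnames = list(Prediction = c("No", "Yes"),
                             Reference  = c("No", "Yes")))

positive <- "Yes"
negative <- "No"

TP <- cm[positive, positive]  # predicted Yes, actually Yes
TN <- cm[negative, negative]  # predicted No,  actually No
FP <- cm[positive, negative]  # predicted Yes, actually No
FN <- cm[negative, positive]  # predicted No,  actually Yes

accuracy    <- (TP + TN) / sum(cm)
sensitivity <- TP / (TP + FN)         # recall on the positive class
specificity <- TN / (TN + FP)
prevalence  <- (TP + FN) / sum(cm)    # share of actual positives

# No-information rate: accuracy of always predicting the majority class.
nir <- max(colSums(cm)) / sum(cm)

round(c(accuracy = accuracy, sensitivity = sensitivity,
        specificity = specificity, prevalence = prevalence, nir = nir), 4)
```

Because `Yes` is also the majority class here (195 of 288 cases), the prevalence and the no-information rate coincide at 0.6771.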