Retail Performance Dataset

Comprehensive Sales Data for Retail Analytics and Forecasting

This dataset provides detailed sales transaction information, including order numbers, quantities ordered, unit prices, and sales amounts, alongside related fields such as order status, product line, and customer details. Each transaction is also tagged with its quarter, month, and year (QTR_ID, MONTH_ID, YEAR_ID). The working data frame holds 2,823 observations of 15 variables.

This dataset is ideal for sales forecasting, customer segmentation, clustering analysis, and trend identification, offering a comprehensive view of sales performance over time.

##  Connection successful!
## 
## R is connected to the H2O cluster: 
##     H2O cluster uptime:         5 hours 15 minutes 
##     H2O cluster timezone:       America/New_York 
##     H2O data parsing timezone:  UTC 
##     H2O cluster version:        3.44.0.3 
##     H2O cluster version age:    8 months and 17 days 
##     H2O cluster name:           H2O_started_from_R_deviancedev01_sga668 
##     H2O cluster total nodes:    1 
##     H2O cluster total memory:   15.02 GB 
##     H2O cluster total cores:    24 
##     H2O cluster allowed cores:  24 
##     H2O cluster healthy:        TRUE 
##     H2O Connection ip:          localhost 
##     H2O Connection port:        54321 
##     H2O Connection proxy:       NA 
##     H2O Internal Security:      FALSE 
##     R Version:                  R version 4.4.1 (2024-06-14)

## Warning in h2o.clusterInfo(): 
## Your H2O cluster version is (8 months and 17 days) old. There may be a newer version available.
## Please download and install the latest version from: https://h2o-release.s3.amazonaws.com/h2o/latest_stable.html

## 'data.frame':    2823 obs. of  15 variables:
##  $ CUSTOMERNAME   : Factor w/ 92 levels "Alpha Cognac",..: 47 68 48 87 24 81 27 42 58 9 ...
##  $ ORDERNUMBER    : int  10107 10121 10134 10145 10159 10168 10180 10188 10201 10211 ...
##  $ QUANTITYORDERED: int  30 34 41 45 49 36 29 48 22 41 ...
##  $ PRICEEACH      : int  96 81 95 83 100 97 86 100 99 100 ...
##  $ ORDERLINENUMBER: int  2 5 2 6 14 1 9 1 2 14 ...
##  $ SALES          : int  2871 2766 3884 3747 5205 3480 2498 5512 2169 4708 ...
##  $ STATUS         : Factor w/ 6 levels "Cancelled","Disputed",..: 6 6 6 6 6 6 6 6 6 6 ...
##  $ QTR_ID         : int  1 2 3 3 4 4 4 4 4 1 ...
##  $ MONTH_ID       : int  2 5 7 8 10 10 11 11 12 1 ...
##  $ YEAR_ID        : int  2003 2003 2003 2003 2003 2003 2003 2003 2003 2004 ...
##  $ PRODUCTLINE    : Factor w/ 7 levels "Classic Cars",..: 2 2 2 2 2 2 2 2 2 2 ...
##  $ MSRP           : int  95 95 95 95 95 95 95 95 95 95 ...
##  $ CITY           : Factor w/ 73 levels "Aaarhus","Allentown",..: 49 57 53 54 60 13 29 5 60 53 ...
##  $ COUNTRY        : Factor w/ 19 levels "Australia","Austria",..: 19 7 7 19 19 19 7 12 19 7 ...
##  $ DEALSIZE       : Factor w/ 3 levels "Large","Medium",..: 3 3 2 2 2 2 3 2 3 2 ...

Multinomial logistic regression (GLM) for predicting customer deal size (DEALSIZE):

## Model Details:
## ==============
## 
## H2OMultinomialModel: glm
## Model ID:  GLM_model_R_1725646653168_9007 
## GLM Model: summary
##        family        link                               regularization
## 1 multinomial multinomial Elastic Net (alpha = 0.5, lambda = 0.07536 )
##   number_of_predictors_total number_of_active_predictors number_of_iterations
## 1                        621                           6                   10
##         training_frame
## 1 train_obs_sid_8aae_1
## 
## Coefficients: glm multinomial coefficients
##                                  names coefs_class_0 coefs_class_1
## 1                            Intercept     -4.697777     -1.931287
## 2          CUSTOMERNAME.AV Stores, Co.      0.000000      0.000000
## 3            CUSTOMERNAME.Alpha Cognac      0.000000      0.000000
## 4      CUSTOMERNAME.Amica Models & Co.      0.000000      0.000000
## 5 CUSTOMERNAME.Anna's Decorations, Ltd      0.000000      0.000000
##   coefs_class_2 std_coefs_class_0 std_coefs_class_1 std_coefs_class_2
## 1      6.105348         -3.361595         -0.536347         -0.693485
## 2      0.000000          0.000000          0.000000          0.000000
## 3      0.000000          0.000000          0.000000          0.000000
## 4      0.000000          0.000000          0.000000          0.000000
## 5      0.000000          0.000000          0.000000          0.000000
## 
## ---
##               names coefs_class_0 coefs_class_1 coefs_class_2 std_coefs_class_0
## 202 ORDERLINENUMBER      0.000000      0.000000      0.000000          0.000000
## 203           SALES      0.000390      0.000000     -0.000626          0.676534
## 204          QTR_ID      0.000000      0.000000      0.000000          0.000000
## 205        MONTH_ID      0.000000      0.000000      0.000000          0.000000
## 206         YEAR_ID      0.000000      0.000000      0.000000          0.000000
## 207            MSRP      0.000000      0.000000     -0.005117          0.000000
##     std_coefs_class_1 std_coefs_class_2
## 202          0.000000          0.000000
## 203          0.000000         -1.085172
## 204          0.000000          0.000000
## 205          0.000000          0.000000
## 206          0.000000          0.000000
## 207          0.000000         -0.185485
## 
## H2OMultinomialMetrics: glm
## ** Reported on training data. **
## 
## Training Set Metrics: 
## =====================
## 
## Extract training frame with `h2o.getFrame("train_obs_sid_8aae_1")`
## MSE: (Extract with `h2o.mse`) 0.1025144
## RMSE: (Extract with `h2o.rmse`) 0.3201787
## Logloss: (Extract with `h2o.logloss`) 0.3524951
## Mean Per-Class Error: 0.346966
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## Null Deviance: (Extract with `h2o.nulldeviance`) 4386.959
## Residual Deviance: (Extract with `h2o.residual_deviance`) 1849.189
## R^2: (Extract with `h2o.r2`) 0.6876205
## AIC: (Extract with `h2o.aic`) NaN
## Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
## =========================================================================
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##        Large Medium Small  Error          Rate
## Large      4    104     0 0.9630 =   104 / 108
## Medium     0   1216    49 0.0387 =  49 / 1,265
## Small      0     49  1201 0.0392 =  49 / 1,250
## Totals     4   1369  1250 0.0770 = 202 / 2,623
## 
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
## =======================================================================
## Top-3 Hit Ratios: 
##   k hit_ratio
## 1 1  0.922989
## 2 2  1.000000
## 3 3  1.000000
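
The summary figures in the training metrics above can be reproduced from the confusion matrix alone. The base R sketch below is only illustrative: the counts are copied from the output, and the variable names are chosen here rather than taken from the original analysis.

```r
# Training confusion matrix copied from the GLM output above
# (rows = actual class, columns = predicted class).
classes <- c("Large", "Medium", "Small")
cm <- matrix(
  c(4,  104,    0,
    0, 1216,   49,
    0,   49, 1201),
  nrow = 3, byrow = TRUE,
  dimnames = list(actual = classes, predicted = classes)
)

# Per-class error: share of each actual class that was misclassified.
per_class_error <- 1 - diag(cm) / rowSums(cm)

# Mean per-class error weights every class equally, so the rare
# "Large" class counts as much as the two common classes.
mean_per_class_error <- mean(per_class_error)

# Overall error weights every row equally; the top-1 hit ratio is its complement.
overall_error <- 1 - sum(diag(cm)) / sum(cm)
hit_ratio_top1 <- 1 - overall_error

print(round(per_class_error, 4))
print(round(mean_per_class_error, 6))
print(round(hit_ratio_top1, 6))
```

The gap between the overall error (about 7.7%) and the mean per-class error (about 34.7%) comes almost entirely from the Large class, where 104 of 108 training rows are predicted as Medium.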

## [[1]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.027041        0.051108                0.000998
## 2        10.789474      0.029364        0.051948                0.001014
## 3        15.578947      0.031679        0.052579                0.001027
## 4        20.368421      0.033969        0.053011                0.001035
## 5        25.157895      0.036216        0.053255                0.001040
## 6        29.947368      0.038408        0.053328                0.001041
## 7        34.736842      0.040537        0.053250                0.001040
## 8        39.526316      0.042593        0.053041                0.001036
## 9        44.315789      0.044571        0.052722                0.001029
## 10       49.105263      0.046467        0.052313                0.001021
## 11       53.894737      0.048276        0.051835                0.001012
## 12       58.684211      0.049992        0.051309                0.001002
## 13       63.473684      0.051611        0.050754                0.000991
## 14       68.263158      0.053125        0.050190                0.000980
## 15       73.052632      0.054529        0.049634                0.000969
## 16       77.842105      0.055817        0.049101                0.000959
## 17       82.631579      0.056986        0.048605                0.000949
## 18       87.421053      0.058032        0.048155                0.000940
## 19       92.210526      0.058955        0.047757                0.000932
## 20       97.000000      0.059759        0.047415                0.000926
## 
## [[2]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.263493        0.234985                0.004588
## 2        10.789474      0.298821        0.249011                0.004862
## 3        15.578947      0.335779        0.261226                0.005101
## 4        20.368421      0.373937        0.271279                0.005297
## 5        25.157895      0.412845        0.278865                0.005445
## 6        29.947368      0.452058        0.283735                0.005540
## 7        34.736842      0.491157        0.285701                0.005578
## 8        39.526316      0.529766        0.284650                0.005558
## 9        44.315789      0.567552        0.280556                0.005478
## 10       49.105263      0.604229        0.273488                0.005340
## 11       53.894737      0.639544        0.263614                0.005147
## 12       58.684211      0.673269        0.251198                0.004905
## 13       63.473684      0.705191        0.236587                0.004619
## 14       68.263158      0.735118        0.220192                0.004299
## 15       73.052632      0.762880        0.202468                0.003953
## 16       77.842105      0.788337        0.183899                0.003591
## 17       82.631579      0.811391        0.164987                0.003221
## 18       87.421053      0.831992        0.146243                0.002855
## 19       92.210526      0.850147        0.128174                0.002503
## 20       97.000000      0.865913        0.111270                0.002173
## 
## [[3]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.709466        0.272608                0.005323
## 2        10.789474      0.671815        0.286502                0.005594
## 3        15.578947      0.632542        0.298325                0.005825
## 4        20.368421      0.592095        0.307738                0.006009
## 5        25.157895      0.550939        0.314454                0.006140
## 6        29.947368      0.509534        0.318245                0.006214
## 7        34.736842      0.468306        0.318945                0.006228
## 8        39.526316      0.427642        0.316464                0.006179
## 9        44.315789      0.387877        0.310793                0.006068
## 10       49.105263      0.349303        0.302017                0.005897
## 11       53.894737      0.312180        0.290316                0.005669
## 12       58.684211      0.276739        0.275964                0.005388
## 13       63.473684      0.243198        0.259316                0.005063
## 14       68.263158      0.211757        0.240787                0.004701
## 15       73.052632      0.182591        0.220837                0.004312
## 16       77.842105      0.155846        0.199951                0.003904
## 17       82.631579      0.131624        0.178625                0.003488
## 18       87.421053      0.109976        0.157355                0.003072
## 19       92.210526      0.090898        0.136623                0.002668
## 20       97.000000      0.074328        0.116880                0.002282
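
The three tables above give the partial dependence of each class probability on QUANTITYORDERED. As a rough sketch of how such a table can be computed, the base R example below fixes one feature at each grid value for every row, predicts, and averages. It uses R's built-in `mtcars` data and a binomial `glm` purely for illustration, not the retail data or the H2O model. The helper name `partial_dependence` is hypothetical, and the sample standard deviation used here is an assumption about how the spread columns are defined.

```r
# Illustration on the built-in mtcars data, not the retail data:
# a logistic regression predicting manual transmission (am) from wt and hp.
fit <- glm(am ~ wt + hp, data = mtcars, family = binomial)

# Hypothetical helper: partial dependence of the predicted probability
# on a single numeric feature, evaluated on an evenly spaced grid.
partial_dependence <- function(model, data, feature, n_grid = 20) {
  grid <- seq(min(data[[feature]]), max(data[[feature]]), length.out = n_grid)
  rows <- lapply(grid, function(value) {
    modified <- data
    modified[[feature]] <- value  # force every row to this grid value
    p <- predict(model, newdata = modified, type = "response")
    data.frame(
      value = value,
      mean_response = mean(p),
      stddev_response = sd(p),
      std_error_mean_response = sd(p) / sqrt(length(p))
    )
  })
  out <- do.call(rbind, rows)
  names(out)[1] <- feature
  out
}

pd <- partial_dependence(fit, mtcars, "wt")
print(pd)

# The QUANTITYORDERED grid in the output follows the same rule:
# 20 evenly spaced points from 6 to 97.
retail_grid <- seq(6, 97, length.out = 20)
print(round(retail_grid[1:3], 6))
```

Two checks on the printed tables are consistent with this reading. At each grid point the three class means sum to one (for example 0.027041 + 0.263493 + 0.709466 in the first row), and the standard error column equals the standard deviation divided by the square root of 2,623, the number of training rows.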

Confusion matrix and feature explanations for the multinomial GLM:

## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##        Large Medium Small  Error       Rate
## Large      3     46     0 0.9388 =  46 / 49
## Medium     0    118     1 0.0084 =  1 / 119
## Small      0      8    24 0.2500 =   8 / 32
## Totals     3    172    25 0.2750 = 55 / 200

Random forest for predicting customer deal size (DEALSIZE), using 5-fold cross-validation:

## Model Details:
## ==============
## 
## H2OMultinomialModel: drf
## Model ID:  DRF_model_R_1725646653168_9008 
## Model Summary: 
##   number_of_trees number_of_internal_trees model_size_in_bytes min_depth
## 1              50                      150              121224         1
##   max_depth mean_depth min_leaves max_leaves mean_leaves
## 1        18    9.68000          2        211    50.16667
## 
## 
## H2OMultinomialMetrics: drf
## ** Reported on training data. **
## ** Metrics reported on Out-Of-Bag training samples **
## 
## Training Set Metrics: 
## =====================
## 
## Extract training frame with `h2o.getFrame("train_obs_sid_8aae_1")`
## MSE: (Extract with `h2o.mse`) 0.009820549
## RMSE: (Extract with `h2o.rmse`) 0.09909868
## Logloss: (Extract with `h2o.logloss`) 0.05494521
## Mean Per-Class Error: 0.01004977
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## R^2: (Extract with `h2o.r2`) 0.9700751
## Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
## =========================================================================
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##        Large Medium Small  Error        Rate
## Large    105      3     0 0.0278 =   3 / 108
## Medium     0   1262     3 0.0024 = 3 / 1,265
## Small      0      0  1250 0.0000 = 0 / 1,250
## Totals   105   1265  1253 0.0023 = 6 / 2,623
## 
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
## =======================================================================
## Top-3 Hit Ratios: 
##   k hit_ratio
## 1 1  0.997713
## 2 2  1.000000
## 3 3  1.000000
## 
## 
## 
## 
## 
## H2OMultinomialMetrics: drf
## ** Reported on cross-validation data. **
## ** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **
## 
## Cross-Validation Set Metrics: 
## =====================
## 
## Extract cross-validation frame with `h2o.getFrame("train_obs_sid_8aae_1")`
## MSE: (Extract with `h2o.mse`) 0.01020355
## RMSE: (Extract with `h2o.rmse`) 0.1010126
## Logloss: (Extract with `h2o.logloss`) 0.06158316
## Mean Per-Class Error: 0.01031644
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## R^2: (Extract with `h2o.r2`) 0.968908
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,xval = TRUE)`
## =======================================================================
## Top-3 Hit Ratios: 
##   k hit_ratio
## 1 1  0.997331
## 2 2  1.000000
## 3 3  1.000000
## 
## 
## 
## 
## Cross-Validation Metrics Summary: 
##                             mean       sd cv_1_valid cv_2_valid cv_3_valid
## accuracy                0.997338 0.001677   0.998058   0.998120   0.998141
## auc                           NA 0.000000         NA         NA         NA
## err                     0.002662 0.001677   0.001942   0.001880   0.001859
## err_count               1.400000 0.894427   1.000000   1.000000   1.000000
## logloss                 0.061556 0.006180   0.066501   0.057156   0.060878
## max_per_class_error     0.025477 0.031714   0.041667   0.003636   0.003891
## mean_per_class_accuracy 0.991241 0.011086   0.986111   0.998788   0.998703
## mean_per_class_error    0.008759 0.011086   0.013889   0.001212   0.001297
## mse                     0.010192 0.002018   0.011436   0.008800   0.010067
## pr_auc                        NA 0.000000         NA         NA         NA
## r2                      0.969086 0.004604   0.966398   0.971659   0.969925
## rmse                    0.100559 0.010001   0.106941   0.093809   0.100335
##                         cv_4_valid cv_5_valid
## accuracy                  0.994340   0.998031
## auc                             NA         NA
## err                       0.005660   0.001969
## err_count                 3.000000   1.000000
## logloss                   0.068987   0.054257
## max_per_class_error       0.074074   0.004115
## mean_per_class_accuracy   0.973975   0.998628
## mean_per_class_error      0.026025   0.001372
## mse                       0.012854   0.007803
## pr_auc                          NA         NA
## r2                        0.962808   0.974638
## rmse                      0.113375   0.088333
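
The `mean` and `sd` columns of the cross-validation summary can be checked against the per-fold columns. The short base R sketch below does this for two of the metrics. The values are copied from the table above and the variable names are illustrative.

```r
# Per-fold values copied from the Cross-Validation Metrics Summary above.
accuracy  <- c(0.998058, 0.998120, 0.998141, 0.994340, 0.998031)
err_count <- c(1, 1, 1, 3, 1)

# The summary columns match the plain mean and the sample standard
# deviation (n - 1 denominator) taken across the five folds.
cv_summary <- data.frame(
  metric = c("accuracy", "err_count"),
  mean   = c(mean(accuracy), mean(err_count)),
  sd     = c(sd(accuracy), sd(err_count))
)
print(cv_summary)

# Pooling the holdout errors over all folds gives the combined
# cross-validation top-1 hit ratio reported for the 2,623 training rows.
n_train <- 2623
pooled_hit_ratio <- 1 - sum(err_count) / n_train
print(round(pooled_hit_ratio, 6))
```

Fold 4 accounts for three of the seven holdout errors, which is why its accuracy (0.994340) sits below the other folds and drives most of the reported standard deviation.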

## [[1]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.027333        0.125172                0.002444
## 2        10.789474      0.027333        0.125172                0.002444
## 3        15.578947      0.027560        0.126184                0.002464
## 4        20.368421      0.027560        0.126184                0.002464
## 5        25.157895      0.028205        0.129376                0.002526
## 6        29.947368      0.029395        0.135107                0.002638
## 7        34.736842      0.034734        0.159177                0.003108
## 8        39.526316      0.037675        0.164382                0.003210
## 9        44.315789      0.043326        0.169844                0.003316
## 10       49.105263      0.047286        0.173566                0.003389
## 11       53.894737      0.052453        0.175913                0.003435
## 12       58.684211      0.055517        0.176615                0.003448
## 13       63.473684      0.056365        0.176703                0.003450
## 14       68.263158      0.064453        0.175871                0.003434
## 15       73.052632      0.136733        0.160498                0.003134
## 16       77.842105      0.137115        0.160632                0.003136
## 17       82.631579      0.146225        0.159158                0.003108
## 18       87.421053      0.146225        0.159158                0.003108
## 19       92.210526      0.146225        0.159158                0.003108
## 20       97.000000      0.146225        0.159158                0.003108
## 
## [[2]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.430804        0.421248                0.008225
## 2        10.789474      0.430804        0.421248                0.008225
## 3        15.578947      0.433192        0.423853                0.008276
## 4        20.368421      0.433192        0.423853                0.008276
## 5        25.157895      0.448231        0.436391                0.008521
## 6        29.947368      0.469186        0.443653                0.008663
## 7        34.736842      0.501733        0.454398                0.008872
## 8        39.526316      0.511731        0.449823                0.008783
## 9        44.315789      0.518265        0.445028                0.008689
## 10       49.105263      0.520400        0.439006                0.008572
## 11       53.894737      0.518227        0.432444                0.008444
## 12       58.684211      0.514798        0.431421                0.008424
## 13       63.473684      0.513600        0.431554                0.008426
## 14       68.263158      0.508399        0.427168                0.008341
## 15       73.052632      0.472102        0.399356                0.007798
## 16       77.842105      0.471681        0.399032                0.007791
## 17       82.631579      0.468653        0.397172                0.007755
## 18       87.421053      0.468653        0.397172                0.007755
## 19       92.210526      0.468653        0.397172                0.007755
## 20       97.000000      0.468653        0.397172                0.007755
## 
## [[3]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.541863        0.433149                0.008457
## 2        10.789474      0.541863        0.433149                0.008457
## 3        15.578947      0.539248        0.435972                0.008513
## 4        20.368421      0.539248        0.435972                0.008513
## 5        25.157895      0.523564        0.448755                0.008762
## 6        29.947368      0.501419        0.454995                0.008884
## 7        34.736842      0.463533        0.461440                0.009010
## 8        39.526316      0.450594        0.456770                0.008919
## 9        44.315789      0.438409        0.453384                0.008853
## 10       49.105263      0.432314        0.448256                0.008752
## 11       53.894737      0.429320        0.443531                0.008660
## 12       58.684211      0.429685        0.443709                0.008664
## 13       63.473684      0.430035        0.443965                0.008669
## 14       68.263158      0.427148        0.440852                0.008608
## 15       73.052632      0.391165        0.403053                0.007870
## 16       77.842105      0.391204        0.403003                0.007869
## 17       82.631579      0.385122        0.396597                0.007744
## 18       87.421053      0.385122        0.396597                0.007744
## 19       92.210526      0.385122        0.396597                0.007744
## 20       97.000000      0.385122        0.396597                0.007744

Confusion matrix and feature explanations for the cross-validated random forest model:

## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##        Large Medium Small  Error      Rate
## Large     48      1     0 0.0204 =  1 / 49
## Medium     0    118     1 0.0084 = 1 / 119
## Small      0      0    32 0.0000 =  0 / 32
## Totals    48    119    33 0.0100 = 2 / 200
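
The Error and Rate columns above can be reproduced directly from the counts. The following is a minimal base-R sketch that re-enters the matrix by hand; the object names (`cm_rf`, `class_error`, `overall_error`) are chosen for illustration and do not come from the original analysis code.

```r
# Counts copied from the random forest confusion matrix above
# (rows = actual class, columns = predicted class).
classes <- c("Large", "Medium", "Small")
cm_rf <- matrix(
  c(48,   1,  0,
     0, 118,  1,
     0,   0, 32),
  nrow = 3, byrow = TRUE,
  dimnames = list(actual = classes, predicted = classes)
)

# Per-class error: share of each actual class that was misclassified.
class_error <- 1 - diag(cm_rf) / rowSums(cm_rf)

# Overall error: all off-diagonal counts divided by the total row count.
overall_error <- 1 - sum(diag(cm_rf)) / sum(cm_rf)

print(round(class_error, 4))
print(round(overall_error, 4))
```

The per-class values correspond to the Error column (1 / 49, 1 / 119, 0 / 32) and the overall value to the Totals row (2 / 200).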


Gradient boosting machine (GBM) model for predicting customer deal size:
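
The report does not show the modelling code. The following is a minimal sketch of how such a model might be fitted with the `h2o` package. The helper name `fit_dealsize_gbm`, the response column name `DEALSIZE`, the fold count, and the seed are all assumptions made for illustration, not the author's settings.

```r
# Hypothetical helper: fits a GBM classifier on an existing H2OFrame.
# Assumes h2o.init() has already been called, and that `train` holds a
# categorical (factor) response column, named DEALSIZE here by assumption.
fit_dealsize_gbm <- function(train, predictors, response = "DEALSIZE",
                             nfolds = 5, seed = 1) {
  if (!requireNamespace("h2o", quietly = TRUE)) {
    stop("The 'h2o' package is required for this sketch.")
  }
  h2o::h2o.gbm(
    x = predictors,            # character vector of predictor column names
    y = response,              # name of the response column
    training_frame = train,    # H2OFrame used for training
    nfolds = nfolds,           # number of cross-validation folds (assumed)
    seed = seed                # fixed seed for reproducibility (assumed)
  )
}
```

With a running cluster, a call might look like `gbm <- fit_dealsize_gbm(train, predictors = c("QUANTITYORDERED", "PRICEEACH", "SALES"))`, where `PRICEEACH` and `SALES` are assumed column names. The fitted model could then be passed to `h2o.confusionMatrix(gbm, newdata = test)` or to `h2o.partialPlot(gbm, train, cols = "QUANTITYORDERED", targets = c("Large", "Medium", "Small"), plot = FALSE)` to obtain output of the kind shown in this section.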




## [[1]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.041643        0.198424                0.003874
## 2        10.789474      0.041643        0.198424                0.003874
## 3        15.578947      0.041643        0.198424                0.003874
## 4        20.368421      0.041643        0.198424                0.003874
## 5        25.157895      0.041643        0.198424                0.003874
## 6        29.947368      0.041643        0.198424                0.003874
## 7        34.736842      0.041643        0.198424                0.003874
## 8        39.526316      0.041642        0.198424                0.003874
## 9        44.315789      0.041642        0.198424                0.003874
## 10       49.105263      0.041642        0.198424                0.003874
## 11       53.894737      0.042064        0.198337                0.003873
## 12       58.684211      0.042064        0.198337                0.003873
## 13       63.473684      0.042064        0.198337                0.003873
## 14       68.263158      0.042064        0.198337                0.003873
## 15       73.052632      0.042064        0.198337                0.003873
## 16       77.842105      0.042064        0.198337                0.003873
## 17       82.631579      0.042064        0.198337                0.003873
## 18       87.421053      0.042064        0.198337                0.003873
## 19       92.210526      0.042064        0.198337                0.003873
## 20       97.000000      0.042064        0.198337                0.003873
## 
## [[2]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.481906        0.498737                0.009738
## 2        10.789474      0.481906        0.498737                0.009738
## 3        15.578947      0.481906        0.498737                0.009738
## 4        20.368421      0.481906        0.498737                0.009738
## 5        25.157895      0.481906        0.498737                0.009738
## 6        29.947368      0.481906        0.498737                0.009738
## 7        34.736842      0.481906        0.498737                0.009738
## 8        39.526316      0.481951        0.498742                0.009738
## 9        44.315789      0.481950        0.498742                0.009738
## 10       49.105263      0.481950        0.498742                0.009738
## 11       53.894737      0.483059        0.497910                0.009722
## 12       58.684211      0.483059        0.497910                0.009722
## 13       63.473684      0.483059        0.497910                0.009722
## 14       68.263158      0.483059        0.497910                0.009722
## 15       73.052632      0.483059        0.497910                0.009722
## 16       77.842105      0.483059        0.497910                0.009722
## 17       82.631579      0.483059        0.497910                0.009722
## 18       87.421053      0.483059        0.497910                0.009722
## 19       92.210526      0.483059        0.497910                0.009722
## 20       97.000000      0.483059        0.497910                0.009722
## 
## [[3]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
##  Large, Medium, Small
##    QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1         6.000000      0.476451        0.498518                0.009734
## 2        10.789474      0.476451        0.498518                0.009734
## 3        15.578947      0.476451        0.498518                0.009734
## 4        20.368421      0.476451        0.498518                0.009734
## 5        25.157895      0.476451        0.498518                0.009734
## 6        29.947368      0.476451        0.498518                0.009734
## 7        34.736842      0.476451        0.498518                0.009734
## 8        39.526316      0.476407        0.498519                0.009734
## 9        44.315789      0.476407        0.498519                0.009734
## 10       49.105263      0.476408        0.498520                0.009734
## 11       53.894737      0.474877        0.497155                0.009707
## 12       58.684211      0.474877        0.497155                0.009707
## 13       63.473684      0.474877        0.497155                0.009707
## 14       68.263158      0.474877        0.497155                0.009707
## 15       73.052632      0.474877        0.497155                0.009707
## 16       77.842105      0.474877        0.497155                0.009707
## 17       82.631579      0.474877        0.497155                0.009707
## 18       87.421053      0.474877        0.497155                0.009707
## 19       92.210526      0.474877        0.497155                0.009707
## 20       97.000000      0.474877        0.497155                0.009707
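
Two features of the tables above can be checked with simple arithmetic. The QUANTITYORDERED column looks like an evenly spaced grid of 20 points between 6 and 97, and the standard-error column appears to equal the standard deviation divided by the square root of the number of rows scored. The base-R sketch below tests both readings using values copied from the first table; the row count it derives is an inference from rounded output, not a figure stated in the report.

```r
# Values copied from the first GBM partial dependence table above.
grid_reported   <- c(6.000000, 10.789474, 15.578947, 97.000000)  # rows 1, 2, 3, 20
stddev_reported <- 0.198424
stderr_reported <- 0.003874

# An evenly spaced grid of 20 points from 6 to 97.
grid <- seq(6, 97, length.out = 20)
grid_matches <- isTRUE(all.equal(round(grid[c(1, 2, 3, 20)], 6), grid_reported))

# If std_error = stddev / sqrt(n), then n is roughly (stddev / std_error)^2.
# The printed values are rounded, so this is only an estimate.
implied_n <- (stddev_reported / stderr_reported)^2

print(grid_matches)
print(round(implied_n))
```

Because the inputs are rounded to six decimal places, the implied row count should be read as approximate.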


Confusion matrix and feature explanations for the gradient boosting model:

## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
##        Large Medium Small  Error      Rate
## Large     49      0     0 0.0000 =  0 / 49
## Medium     0    118     1 0.0084 = 1 / 119
## Small      0      0    32 0.0000 =  0 / 32
## Totals    49    118    33 0.0050 = 1 / 200
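
To compare the two classifiers on the same 200 rows, the counts from both confusion matrices can be placed side by side. This is a minimal base-R sketch with the matrices re-entered by hand; `make_cm`, `accuracy`, and `precision` are illustrative helper names, not functions from the original analysis.

```r
classes <- c("Large", "Medium", "Small")

# Build a 3 x 3 confusion matrix (rows = actual, columns = predicted).
make_cm <- function(counts) {
  matrix(counts, nrow = 3, byrow = TRUE,
         dimnames = list(actual = classes, predicted = classes))
}

# Counts copied from the two confusion matrices in this report.
cm_rf  <- make_cm(c(48, 1, 0,  0, 118, 1,  0, 0, 32))
cm_gbm <- make_cm(c(49, 0, 0,  0, 118, 1,  0, 0, 32))

# Accuracy: correct predictions over all rows.
accuracy <- function(cm) sum(diag(cm)) / sum(cm)

# Precision: correct predictions over all predictions for each class.
precision <- function(cm) diag(cm) / colSums(cm)

comparison <- data.frame(
  model    = c("random forest", "gradient boosting"),
  accuracy = c(accuracy(cm_rf), accuracy(cm_gbm)),
  errors   = c(sum(cm_rf) - sum(diag(cm_rf)), sum(cm_gbm) - sum(diag(cm_gbm)))
)

print(comparison)
print(round(precision(cm_gbm), 4))
```

The two models differ by a single row: the random forest misclassifies one Large order as Medium, while both models place one Medium order in the Small class.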


All models produce similar predictions when compared side by side. Large deals are associated with sales of about $10,683, an MSRP of about $163.50, and classic car purchases. Medium deals are associated with prices between $3,000 and $7,000, unit prices of about $81.80, and order quantities of 30 to 50 units. Small deals are associated with sales of about $3,000 and unit prices of about $80. These results suggest trends in customer purchasing behavior that can be leveraged to increase margins while controlling for product loss.

Data available at https://www.kaggle.com/datasets/anshulranjan2004/retail-performance-dataset.

RStudio 2024.04.2+764 “Chocolate Cosmos” Release (e4392fc9ddc21961fd1d0efd47484b43f07a4177, 2024-06-05) for Ubuntu Jammy, Quarto 1.4.555

Retail Performance Dataset

Chakkapong Burudpakdee

2024-09-06