Comprehensive Sales Data for Retail Analytics and Forecasting
This dataset provides detailed sales transaction information, including order numbers, quantities ordered, unit prices, and sales amounts, alongside related fields such as order dates, order statuses, and customer details. Orders can also be grouped by time (quarter, month, and year) and by categories such as product line, country, and deal size.
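It is well suited to sales forecasting, customer segmentation, clustering analysis, and trend identification, offering a comprehensive view of sales performance over time. This analysis uses it to predict each order's deal size (DEALSIZE: Large, Medium, or Small) with three H2O models: a multinomial GLM, a random forest, and a gradient boosting machine.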
## Connection successful!
##
## R is connected to the H2O cluster:
## H2O cluster uptime: 5 hours 15 minutes
## H2O cluster timezone: America/New_York
## H2O data parsing timezone: UTC
## H2O cluster version: 3.44.0.3
## H2O cluster version age: 8 months and 17 days
## H2O cluster name: H2O_started_from_R_deviancedev01_sga668
## H2O cluster total nodes: 1
## H2O cluster total memory: 15.02 GB
## H2O cluster total cores: 24
## H2O cluster allowed cores: 24
## H2O cluster healthy: TRUE
## H2O Connection ip: localhost
## H2O Connection port: 54321
## H2O Connection proxy: NA
## H2O Internal Security: FALSE
## R Version: R version 4.4.1 (2024-06-14)
## 'data.frame': 2823 obs. of 15 variables:
## $ CUSTOMERNAME : Factor w/ 92 levels "Alpha Cognac",..: 47 68 48 87 24 81 27 42 58 9 ...
## $ ORDERNUMBER : int 10107 10121 10134 10145 10159 10168 10180 10188 10201 10211 ...
## $ QUANTITYORDERED: int 30 34 41 45 49 36 29 48 22 41 ...
## $ PRICEEACH : int 96 81 95 83 100 97 86 100 99 100 ...
## $ ORDERLINENUMBER: int 2 5 2 6 14 1 9 1 2 14 ...
## $ SALES : int 2871 2766 3884 3747 5205 3480 2498 5512 2169 4708 ...
## $ STATUS : Factor w/ 6 levels "Cancelled","Disputed",..: 6 6 6 6 6 6 6 6 6 6 ...
## $ QTR_ID : int 1 2 3 3 4 4 4 4 4 1 ...
## $ MONTH_ID : int 2 5 7 8 10 10 11 11 12 1 ...
## $ YEAR_ID : int 2003 2003 2003 2003 2003 2003 2003 2003 2003 2004 ...
## $ PRODUCTLINE : Factor w/ 7 levels "Classic Cars",..: 2 2 2 2 2 2 2 2 2 2 ...
## $ MSRP : int 95 95 95 95 95 95 95 95 95 95 ...
## $ CITY : Factor w/ 73 levels "Aaarhus","Allentown",..: 49 57 53 54 60 13 29 5 60 53 ...
## $ COUNTRY : Factor w/ 19 levels "Australia","Austria",..: 19 7 7 19 19 19 7 12 19 7 ...
## $ DEALSIZE : Factor w/ 3 levels "Large","Medium",..: 3 3 2 2 2 2 3 2 3 2 ...
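The first ten rows printed by `str()` suggest that DEALSIZE may be a binned version of SALES: every order below 3,000 is Small (code 3) and every order from 3,000 to 7,000 is Medium (code 2). The sketch below checks only those ten rows against a hypothetical rule with cut points at 3,000 and 7,000. The cut points, and the assumption that Large means 7,000 or more, are inferred rather than documented, since none of the printed rows is Large. If such a rule held across all 2,823 rows, SALES alone would largely determine the target, which is worth keeping in mind when reading the high accuracies below.

```r
# The first ten rows shown by str() above (SALES and DEALSIZE only).
# Factor codes from str(): 1 = Large, 2 = Medium, 3 = Small.
sales    <- c(2871, 2766, 3884, 3747, 5205, 3480, 2498, 5512, 2169, 4708)
dealsize <- factor(c("Small", "Small", "Medium", "Medium", "Medium",
                     "Medium", "Small", "Medium", "Small", "Medium"),
                   levels = c("Large", "Medium", "Small"))

# Hypothetical binning rule (assumed, not documented by the dataset):
# Small below 3,000, Medium from 3,000 up to 7,000, Large from 7,000 up.
bin_sales <- function(x) {
  cut(x, breaks = c(-Inf, 3000, 7000, Inf),
      labels = c("Small", "Medium", "Large"), right = FALSE)
}

# Re-level so the rule's factor uses the same level order as DEALSIZE.
rule <- factor(bin_sales(sales), levels = levels(dealsize))
table(observed = dealsize, rule = rule)

# Share of the ten printed rows that the rule reproduces.
mean(as.character(rule) == as.character(dealsize))
```

On the full data frame the same comparison would be `table(df$DEALSIZE, bin_sales(df$SALES))`, where `df` is a placeholder for whatever the data was loaded into.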
Multinomial logistic regression (H2O GLM with elastic net regularization) for predicting customer deal size:
## Model Details:
## ==============
##
## H2OMultinomialModel: glm
## Model ID: GLM_model_R_1725646653168_9007
## GLM Model: summary
## family link regularization
## 1 multinomial multinomial Elastic Net (alpha = 0.5, lambda = 0.07536 )
## number_of_predictors_total number_of_active_predictors number_of_iterations
## 1 621 6 10
## training_frame
## 1 train_obs_sid_8aae_1
##
## Coefficients: glm multinomial coefficients
## names coefs_class_0 coefs_class_1
## 1 Intercept -4.697777 -1.931287
## 2 CUSTOMERNAME.AV Stores, Co. 0.000000 0.000000
## 3 CUSTOMERNAME.Alpha Cognac 0.000000 0.000000
## 4 CUSTOMERNAME.Amica Models & Co. 0.000000 0.000000
## 5 CUSTOMERNAME.Anna's Decorations, Ltd 0.000000 0.000000
## coefs_class_2 std_coefs_class_0 std_coefs_class_1 std_coefs_class_2
## 1 6.105348 -3.361595 -0.536347 -0.693485
## 2 0.000000 0.000000 0.000000 0.000000
## 3 0.000000 0.000000 0.000000 0.000000
## 4 0.000000 0.000000 0.000000 0.000000
## 5 0.000000 0.000000 0.000000 0.000000
##
## ---
## names coefs_class_0 coefs_class_1 coefs_class_2 std_coefs_class_0
## 202 ORDERLINENUMBER 0.000000 0.000000 0.000000 0.000000
## 203 SALES 0.000390 0.000000 -0.000626 0.676534
## 204 QTR_ID 0.000000 0.000000 0.000000 0.000000
## 205 MONTH_ID 0.000000 0.000000 0.000000 0.000000
## 206 YEAR_ID 0.000000 0.000000 0.000000 0.000000
## 207 MSRP 0.000000 0.000000 -0.005117 0.000000
## std_coefs_class_1 std_coefs_class_2
## 202 0.000000 0.000000
## 203 0.000000 -1.085172
## 204 0.000000 0.000000
## 205 0.000000 0.000000
## 206 0.000000 0.000000
## 207 0.000000 -0.185485
##
## H2OMultinomialMetrics: glm
## ** Reported on training data. **
##
## Training Set Metrics:
## =====================
##
## Extract training frame with `h2o.getFrame("train_obs_sid_8aae_1")`
## MSE: (Extract with `h2o.mse`) 0.1025144
## RMSE: (Extract with `h2o.rmse`) 0.3201787
## Logloss: (Extract with `h2o.logloss`) 0.3524951
## Mean Per-Class Error: 0.346966
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## Null Deviance: (Extract with `h2o.nulldeviance`) 4386.959
## Residual Deviance: (Extract with `h2o.residual_deviance`) 1849.189
## R^2: (Extract with `h2o.r2`) 0.6876205
## AIC: (Extract with `h2o.aic`) NaN
## Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
## =========================================================================
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
## Large Medium Small Error Rate
## Large 4 104 0 0.9630 = 104 / 108
## Medium 0 1216 49 0.0387 = 49 / 1,265
## Small 0 49 1201 0.0392 = 49 / 1,250
## Totals 4 1369 1250 0.0770 = 202 / 2,623
##
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
## =======================================================================
## Top-3 Hit Ratios:
## k hit_ratio
## 1 1 0.922989
## 2 2 1.000000
## 3 3 1.000000
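Two identities help read the GLM output above. In a multinomial logit, the log-odds of class k versus class j is the difference of their linear predictors, so a one-unit change in a predictor shifts it by the difference of the two classes' coefficients. The sketch assumes `coefs_class_0`, `coefs_class_1`, and `coefs_class_2` follow the factor-level order Large, Medium, Small from `str()`, and it uses the printed original-scale SALES coefficients, which are rounded and shrunk by the elastic net penalty, so the odds ratios are approximate. It also checks that the residual deviance equals 2 × N × logloss and that the null deviance matches an intercept-only model predicting the training class shares (108, 1,265, and 1,250 rows, taken from the training confusion matrix).

```r
# Original-scale SALES coefficients from the GLM output (classes 0, 1, 2),
# assumed to correspond to Large, Medium, Small.
beta_sales <- c(Large = 0.000390, Medium = 0.000000, Small = -0.000626)

# Odds ratio between classes k and j for a change of `delta` in SALES,
# holding the other predictors fixed: exp(delta * (beta_k - beta_j)).
odds_ratio <- function(k, j, delta = 1000) {
  unname(exp(delta * (beta_sales[k] - beta_sales[j])))
}
or_large_small  <- odds_ratio("Large", "Small")
or_medium_small <- odds_ratio("Medium", "Small")

# Deviance checks using the training class counts.
n_class <- c(Large = 108, Medium = 1265, Small = 1250)
N <- sum(n_class)  # 2,623 training rows

# Null deviance: -2 * log-likelihood of predicting the class shares for every row.
null_deviance <- -2 * sum(n_class * log(n_class / N))

# Residual deviance: logloss is the mean negative log-likelihood,
# so the deviance is 2 * N * logloss.
residual_deviance <- 2 * N * 0.3524951

round(c(or_large_small = or_large_small, or_medium_small = or_medium_small,
        null_deviance = null_deviance, residual_deviance = residual_deviance), 3)
```

With these coefficients, an extra $1,000 in SALES multiplies the odds of Large versus Small by roughly 2.76 and of Medium versus Small by roughly 1.87, holding the other predictors fixed.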
## [[1]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.027041 0.051108 0.000998
## 2 10.789474 0.029364 0.051948 0.001014
## 3 15.578947 0.031679 0.052579 0.001027
## 4 20.368421 0.033969 0.053011 0.001035
## 5 25.157895 0.036216 0.053255 0.001040
## 6 29.947368 0.038408 0.053328 0.001041
## 7 34.736842 0.040537 0.053250 0.001040
## 8 39.526316 0.042593 0.053041 0.001036
## 9 44.315789 0.044571 0.052722 0.001029
## 10 49.105263 0.046467 0.052313 0.001021
## 11 53.894737 0.048276 0.051835 0.001012
## 12 58.684211 0.049992 0.051309 0.001002
## 13 63.473684 0.051611 0.050754 0.000991
## 14 68.263158 0.053125 0.050190 0.000980
## 15 73.052632 0.054529 0.049634 0.000969
## 16 77.842105 0.055817 0.049101 0.000959
## 17 82.631579 0.056986 0.048605 0.000949
## 18 87.421053 0.058032 0.048155 0.000940
## 19 92.210526 0.058955 0.047757 0.000932
## 20 97.000000 0.059759 0.047415 0.000926
##
## [[2]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.263493 0.234985 0.004588
## 2 10.789474 0.298821 0.249011 0.004862
## 3 15.578947 0.335779 0.261226 0.005101
## 4 20.368421 0.373937 0.271279 0.005297
## 5 25.157895 0.412845 0.278865 0.005445
## 6 29.947368 0.452058 0.283735 0.005540
## 7 34.736842 0.491157 0.285701 0.005578
## 8 39.526316 0.529766 0.284650 0.005558
## 9 44.315789 0.567552 0.280556 0.005478
## 10 49.105263 0.604229 0.273488 0.005340
## 11 53.894737 0.639544 0.263614 0.005147
## 12 58.684211 0.673269 0.251198 0.004905
## 13 63.473684 0.705191 0.236587 0.004619
## 14 68.263158 0.735118 0.220192 0.004299
## 15 73.052632 0.762880 0.202468 0.003953
## 16 77.842105 0.788337 0.183899 0.003591
## 17 82.631579 0.811391 0.164987 0.003221
## 18 87.421053 0.831992 0.146243 0.002855
## 19 92.210526 0.850147 0.128174 0.002503
## 20 97.000000 0.865913 0.111270 0.002173
##
## [[3]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.709466 0.272608 0.005323
## 2 10.789474 0.671815 0.286502 0.005594
## 3 15.578947 0.632542 0.298325 0.005825
## 4 20.368421 0.592095 0.307738 0.006009
## 5 25.157895 0.550939 0.314454 0.006140
## 6 29.947368 0.509534 0.318245 0.006214
## 7 34.736842 0.468306 0.318945 0.006228
## 8 39.526316 0.427642 0.316464 0.006179
## 9 44.315789 0.387877 0.310793 0.006068
## 10 49.105263 0.349303 0.302017 0.005897
## 11 53.894737 0.312180 0.290316 0.005669
## 12 58.684211 0.276739 0.275964 0.005388
## 13 63.473684 0.243198 0.259316 0.005063
## 14 68.263158 0.211757 0.240787 0.004701
## 15 73.052632 0.182591 0.220837 0.004312
## 16 77.842105 0.155846 0.199951 0.003904
## 17 82.631579 0.131624 0.178625 0.003488
## 18 87.421053 0.109976 0.157355 0.003072
## 19 92.210526 0.090898 0.136623 0.002668
## 20 97.000000 0.074328 0.116880 0.002282
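Partial dependence averages a model's predicted class probability over the data while one feature is held fixed at each grid value. The GLM tables above are consistent with a 20-point grid from 6 to 97, with `std_error_mean_response` equal to `stddev_response` divided by √2,623 (the training rows), and with the three tables summing to 1 at each grid point; reading [[1]], [[2]], and [[3]] as Large, Medium, and Small is an assumption based on their mean responses. The base-R sketch below reproduces the grid and implements the averaging for any prediction function. `partial_dependence`, the toy data, and `toy_predict` are hypothetical stand-ins for H2O's implementation and the fitted model.

```r
# Grid used in the tables: 20 evenly spaced values from 6 to 97.
grid <- seq(6, 97, length.out = 20)

# Partial dependence: set `var` to each grid value in every row,
# predict, then summarize the predictions across rows.
partial_dependence <- function(predict_fn, data, var, grid) {
  rows <- lapply(grid, function(v) {
    d <- data
    d[[var]] <- v
    p <- predict_fn(d)
    data.frame(value = v,
               mean_response = mean(p),
               stddev_response = sd(p),
               std_error_mean_response = sd(p) / sqrt(nrow(data)))
  })
  do.call(rbind, rows)
}

# Toy stand-in for a fitted model (hypothetical coefficients).
set.seed(1)
toy <- data.frame(QUANTITYORDERED = sample(6:97, 200, replace = TRUE),
                  PRICEEACH = runif(200, 30, 100))
toy_predict <- function(d) plogis(-3 + 0.04 * d$QUANTITYORDERED + 0.01 * d$PRICEEACH)

pd <- partial_dependence(toy_predict, toy, "QUANTITYORDERED", grid)
head(pd, 3)

# Cross-check on the GLM tables: the three class probabilities at the
# first grid point sum to 1.
sum(c(Large = 0.027041, Medium = 0.263493, Small = 0.709466))
```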
Confusion matrix on the 200 held-out observations (2,823 rows minus the 2,623 used for training), with feature explanations, for the GLM:
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
## Large Medium Small Error Rate
## Large 3 46 0 0.9388 = 46 / 49
## Medium 0 118 1 0.0084 = 1 / 119
## Small 0 8 24 0.2500 = 8 / 32
## Totals 3 172 25 0.2750 = 55 / 200
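The gap between the GLM's overall holdout error (27.5%) and its Large-class error (93.9%) reflects class imbalance: Large deals make up 49 of the 200 held-out rows and 108 of the 2,623 training rows, and the model predicts almost all of them as Medium. H2O's mean per-class error averages the per-class errors with equal weight, which is why it reads 0.347 on the training data while the overall training error is 0.077. The sketch recomputes these summaries in base R from the printed matrices; `cm_errors` is a hypothetical helper, not an H2O function.

```r
classes <- c("Large", "Medium", "Small")

# GLM confusion matrix on the 200 held-out rows (rows = actual, columns = predicted).
cm_glm_test <- matrix(c(3, 46, 0,
                        0, 118, 1,
                        0, 8, 24),
                      nrow = 3, byrow = TRUE,
                      dimnames = list(actual = classes, predicted = classes))

# Per-class error (1 - recall), overall error, and mean per-class error.
cm_errors <- function(cm) {
  per_class <- 1 - diag(cm) / rowSums(cm)
  c(per_class,
    overall = 1 - sum(diag(cm)) / sum(cm),
    mean_per_class = mean(per_class))
}

round(cm_errors(cm_glm_test), 4)

# The same formula on the GLM training matrix reproduces
# "Mean Per-Class Error: 0.346966" from the model output.
cm_glm_train <- matrix(c(4, 104, 0,
                         0, 1216, 49,
                         0, 49, 1201),
                       nrow = 3, byrow = TRUE,
                       dimnames = list(actual = classes, predicted = classes))
round(cm_errors(cm_glm_train), 6)
```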
Random forest (H2O DRF, 50 trees) with 5-fold cross-validation for predicting customer deal size:
## Model Details:
## ==============
##
## H2OMultinomialModel: drf
## Model ID: DRF_model_R_1725646653168_9008
## Model Summary:
## number_of_trees number_of_internal_trees model_size_in_bytes min_depth
## 1 50 150 121224 1
## max_depth mean_depth min_leaves max_leaves mean_leaves
## 1 18 9.68000 2 211 50.16667
##
##
## H2OMultinomialMetrics: drf
## ** Reported on training data. **
## ** Metrics reported on Out-Of-Bag training samples **
##
## Training Set Metrics:
## =====================
##
## Extract training frame with `h2o.getFrame("train_obs_sid_8aae_1")`
## MSE: (Extract with `h2o.mse`) 0.009820549
## RMSE: (Extract with `h2o.rmse`) 0.09909868
## Logloss: (Extract with `h2o.logloss`) 0.05494521
## Mean Per-Class Error: 0.01004977
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## R^2: (Extract with `h2o.r2`) 0.9700751
## Confusion Matrix: Extract with `h2o.confusionMatrix(<model>,train = TRUE)`)
## =========================================================================
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
## Large Medium Small Error Rate
## Large 105 3 0 0.0278 = 3 / 108
## Medium 0 1262 3 0.0024 = 3 / 1,265
## Small 0 0 1250 0.0000 = 0 / 1,250
## Totals 105 1265 1253 0.0023 = 6 / 2,623
##
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,train = TRUE)`
## =======================================================================
## Top-3 Hit Ratios:
## k hit_ratio
## 1 1 0.997713
## 2 2 1.000000
## 3 3 1.000000
##
##
##
##
##
## H2OMultinomialMetrics: drf
## ** Reported on cross-validation data. **
## ** 5-fold cross-validation on training data (Metrics computed for combined holdout predictions) **
##
## Cross-Validation Set Metrics:
## =====================
##
## Extract cross-validation frame with `h2o.getFrame("train_obs_sid_8aae_1")`
## MSE: (Extract with `h2o.mse`) 0.01020355
## RMSE: (Extract with `h2o.rmse`) 0.1010126
## Logloss: (Extract with `h2o.logloss`) 0.06158316
## Mean Per-Class Error: 0.01031644
## AUC: (Extract with `h2o.auc`) NaN
## AUCPR: (Extract with `h2o.aucpr`) NaN
## R^2: (Extract with `h2o.r2`) 0.968908
## Hit Ratio Table: Extract with `h2o.hit_ratio_table(<model>,xval = TRUE)`
## =======================================================================
## Top-3 Hit Ratios:
## k hit_ratio
## 1 1 0.997331
## 2 2 1.000000
## 3 3 1.000000
##
##
##
##
## Cross-Validation Metrics Summary:
## mean sd cv_1_valid cv_2_valid cv_3_valid
## accuracy 0.997338 0.001677 0.998058 0.998120 0.998141
## auc NA 0.000000 NA NA NA
## err 0.002662 0.001677 0.001942 0.001880 0.001859
## err_count 1.400000 0.894427 1.000000 1.000000 1.000000
## logloss 0.061556 0.006180 0.066501 0.057156 0.060878
## max_per_class_error 0.025477 0.031714 0.041667 0.003636 0.003891
## mean_per_class_accuracy 0.991241 0.011086 0.986111 0.998788 0.998703
## mean_per_class_error 0.008759 0.011086 0.013889 0.001212 0.001297
## mse 0.010192 0.002018 0.011436 0.008800 0.010067
## pr_auc NA 0.000000 NA NA NA
## r2 0.969086 0.004604 0.966398 0.971659 0.969925
## rmse 0.100559 0.010001 0.106941 0.093809 0.100335
## cv_4_valid cv_5_valid
## accuracy 0.994340 0.998031
## auc NA NA
## err 0.005660 0.001969
## err_count 3.000000 1.000000
## logloss 0.068987 0.054257
## max_per_class_error 0.074074 0.004115
## mean_per_class_accuracy 0.973975 0.998628
## mean_per_class_error 0.026025 0.001372
## mse 0.012854 0.007803
## pr_auc NA NA
## r2 0.962808 0.974638
## rmse 0.113375 0.088333
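The `mean` and `sd` columns of the cross-validation summary appear to be the ordinary mean and sample standard deviation (n - 1 denominator) of the five per-fold values: `err_count` values of 1, 1, 1, 3, and 1 give 1.4 and 0.894. Dividing each fold's `err_count` by its `err` also recovers approximate fold sizes, which add up to the 2,623 training rows. (The model summary's 150 internal trees are consistent with 50 trees grown for each of the 3 classes.) The sketch below checks these relationships in base R using only values printed above.

```r
# Per-fold values from the DRF cross-validation summary (cv_1_valid ... cv_5_valid).
err_count <- c(1, 1, 1, 3, 1)
err       <- c(0.001942, 0.001880, 0.001859, 0.005660, 0.001969)
accuracy  <- c(0.998058, 0.998120, 0.998141, 0.994340, 0.998031)

# Summary columns: mean and sample standard deviation across folds.
c(mean = mean(err_count), sd = sd(err_count))  # 1.4 and about 0.894
c(mean = mean(accuracy), sd = sd(accuracy))

# Fold sizes implied by err = err_count / fold_size, rounded to whole rows.
fold_size <- round(err_count / err)
fold_size
sum(fold_size)  # compare with the 2,623 training rows
```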
## [[1]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.027333 0.125172 0.002444
## 2 10.789474 0.027333 0.125172 0.002444
## 3 15.578947 0.027560 0.126184 0.002464
## 4 20.368421 0.027560 0.126184 0.002464
## 5 25.157895 0.028205 0.129376 0.002526
## 6 29.947368 0.029395 0.135107 0.002638
## 7 34.736842 0.034734 0.159177 0.003108
## 8 39.526316 0.037675 0.164382 0.003210
## 9 44.315789 0.043326 0.169844 0.003316
## 10 49.105263 0.047286 0.173566 0.003389
## 11 53.894737 0.052453 0.175913 0.003435
## 12 58.684211 0.055517 0.176615 0.003448
## 13 63.473684 0.056365 0.176703 0.003450
## 14 68.263158 0.064453 0.175871 0.003434
## 15 73.052632 0.136733 0.160498 0.003134
## 16 77.842105 0.137115 0.160632 0.003136
## 17 82.631579 0.146225 0.159158 0.003108
## 18 87.421053 0.146225 0.159158 0.003108
## 19 92.210526 0.146225 0.159158 0.003108
## 20 97.000000 0.146225 0.159158 0.003108
##
## [[2]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.430804 0.421248 0.008225
## 2 10.789474 0.430804 0.421248 0.008225
## 3 15.578947 0.433192 0.423853 0.008276
## 4 20.368421 0.433192 0.423853 0.008276
## 5 25.157895 0.448231 0.436391 0.008521
## 6 29.947368 0.469186 0.443653 0.008663
## 7 34.736842 0.501733 0.454398 0.008872
## 8 39.526316 0.511731 0.449823 0.008783
## 9 44.315789 0.518265 0.445028 0.008689
## 10 49.105263 0.520400 0.439006 0.008572
## 11 53.894737 0.518227 0.432444 0.008444
## 12 58.684211 0.514798 0.431421 0.008424
## 13 63.473684 0.513600 0.431554 0.008426
## 14 68.263158 0.508399 0.427168 0.008341
## 15 73.052632 0.472102 0.399356 0.007798
## 16 77.842105 0.471681 0.399032 0.007791
## 17 82.631579 0.468653 0.397172 0.007755
## 18 87.421053 0.468653 0.397172 0.007755
## 19 92.210526 0.468653 0.397172 0.007755
## 20 97.000000 0.468653 0.397172 0.007755
##
## [[3]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.541863 0.433149 0.008457
## 2 10.789474 0.541863 0.433149 0.008457
## 3 15.578947 0.539248 0.435972 0.008513
## 4 20.368421 0.539248 0.435972 0.008513
## 5 25.157895 0.523564 0.448755 0.008762
## 6 29.947368 0.501419 0.454995 0.008884
## 7 34.736842 0.463533 0.461440 0.009010
## 8 39.526316 0.450594 0.456770 0.008919
## 9 44.315789 0.438409 0.453384 0.008853
## 10 49.105263 0.432314 0.448256 0.008752
## 11 53.894737 0.429320 0.443531 0.008660
## 12 58.684211 0.429685 0.443709 0.008664
## 13 63.473684 0.430035 0.443965 0.008669
## 14 68.263158 0.427148 0.440852 0.008608
## 15 73.052632 0.391165 0.403053 0.007870
## 16 77.842105 0.391204 0.403003 0.007869
## 17 82.631579 0.385122 0.396597 0.007744
## 18 87.421053 0.385122 0.396597 0.007744
## 19 92.210526 0.385122 0.396597 0.007744
## 20 97.000000 0.385122 0.396597 0.007744
Confusion matrix on the 200 held-out observations, with feature explanations, for the cross-validated random forest:
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
## Large Medium Small Error Rate
## Large 48 1 0 0.0204 = 1 / 49
## Medium 0 118 1 0.0084 = 1 / 119
## Small 0 0 32 0.0000 = 0 / 32
## Totals 48 119 33 0.0100 = 2 / 200
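With only 200 held-out rows, 49 of them Large, small differences in error counts carry a lot of uncertainty. An exact (Clopper-Pearson) binomial interval from base R's `binom.test()` gives a rough range for the random forest's holdout accuracy (198 of 200 correct) and its Large-class recall (48 of 49). This treats the held-out rows as independent draws, which is an assumption.

```r
# Random forest holdout results from the confusion matrix above.
correct_total <- 198; n_total <- 200  # 2 errors out of 200
correct_large <- 48;  n_large <- 49   # 1 Large deal predicted as Medium

# Exact two-sided 95% Clopper-Pearson intervals.
ci_accuracy <- binom.test(correct_total, n_total)$conf.int
ci_large    <- binom.test(correct_large, n_large)$conf.int

round(rbind(accuracy = ci_accuracy, large_recall = ci_large), 3)
```

The Large-class interval is much wider than the accuracy interval because it rests on 49 rows, so one-error differences between models on this holdout set should not be over-read.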
Gradient boosting machine (H2O GBM) for predicting customer deal size:
## [[1]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.041643 0.198424 0.003874
## 2 10.789474 0.041643 0.198424 0.003874
## 3 15.578947 0.041643 0.198424 0.003874
## 4 20.368421 0.041643 0.198424 0.003874
## 5 25.157895 0.041643 0.198424 0.003874
## 6 29.947368 0.041643 0.198424 0.003874
## 7 34.736842 0.041643 0.198424 0.003874
## 8 39.526316 0.041642 0.198424 0.003874
## 9 44.315789 0.041642 0.198424 0.003874
## 10 49.105263 0.041642 0.198424 0.003874
## 11 53.894737 0.042064 0.198337 0.003873
## 12 58.684211 0.042064 0.198337 0.003873
## 13 63.473684 0.042064 0.198337 0.003873
## 14 68.263158 0.042064 0.198337 0.003873
## 15 73.052632 0.042064 0.198337 0.003873
## 16 77.842105 0.042064 0.198337 0.003873
## 17 82.631579 0.042064 0.198337 0.003873
## 18 87.421053 0.042064 0.198337 0.003873
## 19 92.210526 0.042064 0.198337 0.003873
## 20 97.000000 0.042064 0.198337 0.003873
##
## [[2]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.481906 0.498737 0.009738
## 2 10.789474 0.481906 0.498737 0.009738
## 3 15.578947 0.481906 0.498737 0.009738
## 4 20.368421 0.481906 0.498737 0.009738
## 5 25.157895 0.481906 0.498737 0.009738
## 6 29.947368 0.481906 0.498737 0.009738
## 7 34.736842 0.481906 0.498737 0.009738
## 8 39.526316 0.481951 0.498742 0.009738
## 9 44.315789 0.481950 0.498742 0.009738
## 10 49.105263 0.481950 0.498742 0.009738
## 11 53.894737 0.483059 0.497910 0.009722
## 12 58.684211 0.483059 0.497910 0.009722
## 13 63.473684 0.483059 0.497910 0.009722
## 14 68.263158 0.483059 0.497910 0.009722
## 15 73.052632 0.483059 0.497910 0.009722
## 16 77.842105 0.483059 0.497910 0.009722
## 17 82.631579 0.483059 0.497910 0.009722
## 18 87.421053 0.483059 0.497910 0.009722
## 19 92.210526 0.483059 0.497910 0.009722
## 20 97.000000 0.483059 0.497910 0.009722
##
## [[3]]
## PartialDependence: Partial dependency plot for QUANTITYORDERED and classes
## Large, Medium, Small
## QUANTITYORDERED mean_response stddev_response std_error_mean_response
## 1 6.000000 0.476451 0.498518 0.009734
## 2 10.789474 0.476451 0.498518 0.009734
## 3 15.578947 0.476451 0.498518 0.009734
## 4 20.368421 0.476451 0.498518 0.009734
## 5 25.157895 0.476451 0.498518 0.009734
## 6 29.947368 0.476451 0.498518 0.009734
## 7 34.736842 0.476451 0.498518 0.009734
## 8 39.526316 0.476407 0.498519 0.009734
## 9 44.315789 0.476407 0.498519 0.009734
## 10 49.105263 0.476408 0.498520 0.009734
## 11 53.894737 0.474877 0.497155 0.009707
## 12 58.684211 0.474877 0.497155 0.009707
## 13 63.473684 0.474877 0.497155 0.009707
## 14 68.263158 0.474877 0.497155 0.009707
## 15 73.052632 0.474877 0.497155 0.009707
## 16 77.842105 0.474877 0.497155 0.009707
## 17 82.631579 0.474877 0.497155 0.009707
## 18 87.421053 0.474877 0.497155 0.009707
## 19 92.210526 0.474877 0.497155 0.009707
## 20 97.000000 0.474877 0.497155 0.009707
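One compact way to compare the three partial dependence results is the spread of a class's mean response across the QUANTITYORDERED grid (maximum minus minimum). For the Medium class the printed values give about 0.60 for the GLM, 0.09 for the random forest, and 0.001 for the gradient boosting model, so, marginally, the tree models depend much less on quantity than the GLM does; the GLM curve is also smooth and monotone, while the tree-based curves are step-shaped. The sketch below recomputes this from the copied table values, assuming the second table ([[2]]) in each set is the Medium class; `pd_spread` is a hypothetical helper.

```r
# Medium-class mean_response columns copied from the three PD tables.
glm_medium <- c(0.263493, 0.298821, 0.335779, 0.373937, 0.412845, 0.452058,
                0.491157, 0.529766, 0.567552, 0.604229, 0.639544, 0.673269,
                0.705191, 0.735118, 0.762880, 0.788337, 0.811391, 0.831992,
                0.850147, 0.865913)
drf_medium <- c(0.430804, 0.430804, 0.433192, 0.433192, 0.448231, 0.469186,
                0.501733, 0.511731, 0.518265, 0.520400, 0.518227, 0.514798,
                0.513600, 0.508399, 0.472102, 0.471681, 0.468653, 0.468653,
                0.468653, 0.468653)
gbm_medium <- c(rep(0.481906, 7), 0.481951, 0.481950, 0.481950,
                rep(0.483059, 10))

# Spread of a partial dependence curve: max minus min over the grid.
pd_spread <- function(x) diff(range(x))
round(sapply(list(glm = glm_medium, drf = drf_medium, gbm = gbm_medium),
             pd_spread), 4)

# The GLM curve rises at every grid step; the tree-based curves do not.
all(diff(glm_medium) > 0)
```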
Confusion matrix on the 200 held-out observations, with feature explanations, for the gradient boosting model:
## Confusion Matrix: Row labels: Actual class; Column labels: Predicted class
## Large Medium Small Error Rate
## Large 49 0 0 0.0000 = 0 / 49
## Medium 0 118 1 0.0084 = 1 / 119
## Small 0 0 32 0.0000 = 0 / 32
## Totals 49 118 33 0.0050 = 1 / 200
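Per-class recall and precision make the holdout comparison more concrete than a single error rate. For example, the GLM's Medium precision is only 118 of 172, because 46 Large deals are predicted as Medium. The sketch below computes both metrics from the three printed holdout matrices; `as_cm` and `class_metrics` are hypothetical helpers, and with 49 Large, 119 Medium, and 32 Small rows the per-class figures rest on small counts.

```r
classes <- c("Large", "Medium", "Small")
as_cm <- function(v) matrix(v, nrow = 3, byrow = TRUE,
                            dimnames = list(actual = classes, predicted = classes))

# Holdout confusion matrices (rows = actual, columns = predicted).
cms <- list(
  glm = as_cm(c(3, 46, 0,   0, 118, 1,   0, 8, 24)),
  drf = as_cm(c(48, 1, 0,   0, 118, 1,   0, 0, 32)),
  gbm = as_cm(c(49, 0, 0,   0, 118, 1,   0, 0, 32))
)

# Recall = correct / actual (row) total; precision = correct / predicted (column) total.
class_metrics <- function(cm) {
  rbind(recall    = diag(cm) / rowSums(cm),
        precision = diag(cm) / colSums(cm))
}

lapply(cms, function(cm) round(class_metrics(cm), 3))
```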
When compared side by side, the three models point to similar feature patterns, but their holdout accuracy differs: on the 200 held-out orders the GLM misclassifies 55 (27.5%), mostly Large deals predicted as Medium, while the random forest misclassifies 2 (1.0%) and the gradient boosting model 1 (0.5%). Large deals are characterized by sales of about $10,683, an MSRP of about $163.50, and classic car purchases. Medium deals are characterized by sales between $3k and $7k, unit prices of about $81.80, and quantities of 30 to 50 units. Small deals are characterized by sales of about $3k and unit prices of about $80. These results suggest purchasing patterns that could be leveraged to increase margins while controlling for product loss.
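Deal-size profiles like these can be cross-checked with simple group summaries. The sketch applies `aggregate()` to the ten rows printed by `str()`, so its numbers are illustrative only (none of those rows is Large); on the full data the same call would summarize all 2,823 orders. It is a descriptive summary, separate from the model explanations.

```r
# Ten rows from the str() output: SALES, PRICEEACH, QUANTITYORDERED, DEALSIZE.
orders <- data.frame(
  SALES = c(2871, 2766, 3884, 3747, 5205, 3480, 2498, 5512, 2169, 4708),
  PRICEEACH = c(96, 81, 95, 83, 100, 97, 86, 100, 99, 100),
  QUANTITYORDERED = c(30, 34, 41, 45, 49, 36, 29, 48, 22, 41),
  DEALSIZE = c("Small", "Small", "Medium", "Medium", "Medium",
               "Medium", "Small", "Medium", "Small", "Medium")
)

# Mean sales, unit price, and quantity per deal size.
profile <- aggregate(cbind(SALES, PRICEEACH, QUANTITYORDERED) ~ DEALSIZE,
                     data = orders, FUN = mean)
profile
```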
Data available at https://www.kaggle.com/datasets/anshulranjan2004/retail-performance-dataset.
R version 4.4.1 (2024-06-14), “Race for Your Life”. Copyright (C) 2024 The R Foundation for Statistical Computing. Platform: x86_64-pc-linux-gnu.
RStudio 2024.04.2+764 “Chocolate Cosmos” Release (e4392fc9ddc21961fd1d0efd47484b43f07a4177, 2024-06-05) for Ubuntu Jammy. Quarto 1.4.555.