Introduction

The following is a further detailed analysis of frequent itemsets as part of the ongoing process of building a tool for the sales team to intelligently suggest items to their customers. While previous analyses looked at an overall view of historical data, zones, and year-over-year datasets, this study delves into item categories as a way of further developing a robust and informed suggestion tool.

The data used here uses invoice data from the past five years (1/1/2018 - 12/31/2022), and was prepared and cleaned in the same manner as the original Affinity Analysis. It looks at each customer’s purchasing history and which unique categories were purchased over those five years. The same apriori algorithm is applied to analyze purchasing frequency and associations between the various categories.

Data

catbasket = read.transactions("W:/apriori/category analysis/category_basket.csv", sep = ',', rm.duplicates = TRUE)
summary(catbasket)
## transactions as itemMatrix in sparse format with
##  3562 rows (elements/itemsets/transactions) and
##  30 columns (items) and a density of 0.2214393 
## 
## most frequent items:
##    DIST      10     RFQ      03      02 (Other) 
##    2552    2350    2063    2047    1784   12867 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
## 603 390 309 268 241 193 183 200 174 191 153 155 125  81  67  53  49  27  30  28 
##  21  22  23  24  25  26  29 
##   7  14  11   6   2   1   1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   5.000   6.643  10.000  29.000 
## 
## includes extended item information - examples:
##   labels
## 1     01
## 2     02
## 3     03

Overview

– 3,562 unique customers
– Average customer purchased ~6 unique item categories
– Top Five Categories:
  1. Non-CPS/VMI Items
  2. Standard Parts for Chrome
  3. RFQ Items
  4. Standard Screws
  5. Standard Nuts

– 603 customers purchased only 1 item category, 390 bought 2 categories, 309 customers bought 3 categories, and so on…
– Only 1 customer purchased all 29 categories

Item Frequency

itemFrequencyPlot(catbasket, topN = 30, main = "most frequent categories")

Note that Categories 09, 21, 25, 45, and 32 are the least frequently purchased. However, Category 09 seems to be mostly “dummy” items, Air Freight charges, or other “Do Not Use” items. Similarly, Categories 21, 45, and 32 are various PPE, Kits, and otherwise uncommon items in the Marine Fasteners product line.

Interestingly, Category 25 parts consist primarily of Square Deck Screws, which are among the most popular parts for Zone ODF. I believe that patterns such as these will prove to be of the utmost value to the final suggestion tool.

Frequent Itemsets

catrules = eclat(data = catbasket, parameter = list(support = 0.25, minlen = 2, maxlen = 5))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.25      2      5 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 890 
## 
## create itemset ... 
## set transactions ...[30 item(s), 3562 transaction(s)] done [0.00s].
## sorting and recoding items ... [11 item(s)] done [0.00s].
## creating bit matrix ... [11 row(s), 3562 column(s)] done [0.00s].
## writing  ... [155 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].
inspect(sort(catrules, by = 'support'))
##       items                      support   count
## [1]   {10, DIST}                 0.5480067 1952 
## [2]   {03, DIST}                 0.5070185 1806 
## [3]   {03, 10}                   0.4963504 1768 
## [4]   {10, RFQ}                  0.4590118 1635 
## [5]   {03, 10, DIST}             0.4556429 1623 
## [6]   {DIST, RFQ}                0.4505896 1605 
## [7]   {02, 10}                   0.4489051 1599 
## [8]   {02, DIST}                 0.4432903 1579 
## [9]   {02, 03}                   0.4129702 1471 
## [10]  {02, 10, DIST}             0.4121280 1468 
## [11]  {04, 10}                   0.4118473 1467 
## [12]  {03, RFQ}                  0.4104436 1462 
## [13]  {01, 10}                   0.4090399 1457 
## [14]  {04, DIST}                 0.4076362 1452 
## [15]  {10, DIST, RFQ}            0.4048287 1442 
## [16]  {01, DIST}                 0.3986524 1420 
## [17]  {02, 03, 10}               0.3924761 1398 
## [18]  {02, 03, DIST}             0.3893880 1387 
## [19]  {03, 04}                   0.3868613 1378 
## [20]  {01, 10, DIST}             0.3846154 1370 
## [21]  {03, 10, RFQ}              0.3820887 1361 
## [22]  {02, 04}                   0.3815272 1359 
## [23]  {02, RFQ}                  0.3804043 1355 
## [24]  {04, 10, DIST}             0.3801235 1354 
## [25]  {03, DIST, RFQ}            0.3756317 1338 
## [26]  {02, 03, 10, DIST}         0.3731050 1329 
## [27]  {03, 04, 10}               0.3683324 1312 
## [28]  {03, 04, DIST}             0.3638405 1296 
## [29]  {01, 03}                   0.3607524 1285 
## [30]  {02, 04, 10}               0.3601909 1283 
## [31]  {01, 02}                   0.3599102 1282 
## [32]  {02, 10, RFQ}              0.3587872 1278 
## [33]  {02, 04, DIST}             0.3579450 1275 
## [34]  {03, 10, DIST, RFQ}        0.3565413 1270 
## [35]  {01, 03, 10}               0.3531724 1258 
## [36]  {01, 02, 10}               0.3528916 1257 
## [37]  {04, RFQ}                  0.3503650 1248 
## [38]  {02, DIST, RFQ}            0.3500842 1247 
## [39]  {01, 03, DIST}             0.3495227 1245 
## [40]  {03, 04, 10, DIST}         0.3492420 1244 
## [41]  {02, 03, 04}               0.3467153 1235 
## [42]  {01, 02, DIST}             0.3430657 1222 
## [43]  {01, 03, 10, DIST}         0.3427850 1221 
## [44]  {02, 04, 10, DIST}         0.3411005 1215 
## [45]  {01, 02, 10, DIST}         0.3385738 1206 
## [46]  {01, RFQ}                  0.3363279 1198 
## [47]  {02, 10, DIST, RFQ}        0.3360472 1197 
## [48]  {04, 10, RFQ}              0.3354857 1195 
## [49]  {DIST, DIST-A}             0.3343627 1191 
## [50]  {02, 03, 04, 10}           0.3332398 1187 
## [51]  {02, 03, RFQ}              0.3332398 1187 
## [52]  {01, 04}                   0.3312746 1180 
## [53]  {02, 03, 04, DIST}         0.3312746 1180 
## [54]  {01, 10, RFQ}              0.3279057 1168 
## [55]  {02, 03, 10, RFQ}          0.3259405 1161 
## [56]  {01, 02, 03}               0.3245368 1156 
## [57]  {01, 04, 10}               0.3239753 1154 
## [58]  {04, DIST, RFQ}            0.3239753 1154 
## [59]  {01, 02, 03, 10}           0.3217294 1146 
## [60]  {02, 03, 04, 10, DIST}     0.3206064 1142 
## [61]  {01, DIST, RFQ}            0.3177990 1132 
## [62]  {02, 03, DIST, RFQ}        0.3177990 1132 
## [63]  {01, 04, DIST}             0.3175182 1131 
## [64]  {10, DIST-A}               0.3172375 1130 
## [65]  {01, 02, 03, DIST}         0.3166760 1128 
## [66]  {01, 02, 03, 10, DIST}     0.3138686 1118 
## [67]  {04, 10, DIST, RFQ}        0.3138686 1118 
## [68]  {01, 10, DIST, RFQ}        0.3127456 1114 
## [69]  {03, 04, RFQ}              0.3127456 1114 
## [70]  {02, 03, 10, DIST, RFQ}    0.3119034 1111 
## [71]  {10, DIST, DIST-A}         0.3116227 1110 
## [72]  {01, 04, 10, DIST}         0.3116227 1110 
## [73]  {02, 04, RFQ}              0.3107805 1107 
## [74]  {01, 02, 04}               0.3079730 1097 
## [75]  {03, 04, 10, RFQ}          0.3065693 1092 
## [76]  {01, 02, 04, 10}           0.3034812 1081 
## [77]  {01, 03, 04}               0.3034812 1081 
## [78]  {02, 04, 10, RFQ}          0.3017967 1075 
## [79]  {01, 02, RFQ}              0.3003930 1070 
## [80]  {01, 03, 04, 10}           0.2998316 1068 
## [81]  {01, 02, 04, DIST}         0.2981471 1062 
## [82]  {03, 04, DIST, RFQ}        0.2981471 1062 
## [83]  {03, DIST-A}               0.2975856 1060 
## [84]  {01, 03, RFQ}              0.2973049 1059 
## [85]  {01, 03, 04, DIST}         0.2970241 1058 
## [86]  {01, 02, 10, RFQ}          0.2967434 1057 
## [87]  {02, 04, DIST, RFQ}        0.2950590 1051 
## [88]  {03, DIST, DIST-A}         0.2944975 1049 
## [89]  {01, 02, 04, 10, DIST}     0.2944975 1049 
## [90]  {30, DIST}                 0.2939360 1047 
## [91]  {01, 03, 10, RFQ}          0.2939360 1047 
## [92]  {01, 03, 04, 10, DIST}     0.2933745 1045 
## [93]  {03, 04, 10, DIST, RFQ}    0.2930938 1044 
## [94]  {01, 02, 03, 04}           0.2897249 1032 
## [95]  {01, 02, DIST, RFQ}        0.2897249 1032 
## [96]  {01, 03, DIST, RFQ}        0.2888827 1029 
## [97]  {02, 04, 10, DIST, RFQ}    0.2883212 1027 
## [98]  {02, 03, 04, RFQ}          0.2877597 1025 
## [99]  {01, 02, 10, DIST, RFQ}    0.2874789 1024 
## [100] {10, 91}                   0.2869175 1022 
## [101] {03, 10, DIST-A}           0.2869175 1022 
## [102] {01, 02, 03, 04, 10}       0.2869175 1022 
## [103] {01, 03, 10, DIST, RFQ}    0.2857945 1018 
## [104] {03, 10, DIST, DIST-A}     0.2843908 1013 
## [105] {01, 02, 03, 04, DIST}     0.2841101 1012 
## [106] {02, 03, 04, 10, RFQ}      0.2835486 1010 
## [107] {01, 04, RFQ}              0.2804604  999 
## [108] {01, 02, 03, RFQ}          0.2787760  993 
## [109] {02, DIST-A}               0.2784952  992 
## [110] {01, 02, 03, 10, RFQ}      0.2773723  988 
## [111] {02, 03, 04, DIST, RFQ}    0.2773723  988 
## [112] {01, 04, 10, RFQ}          0.2770915  987 
## [113] {02, DIST, DIST-A}         0.2745649  978 
## [114] {91, DIST}                 0.2728804  972 
## [115] {01, 02, 03, DIST, RFQ}    0.2723189  970 
## [116] {02, 10, DIST-A}           0.2717574  968 
## [117] {01, 04, DIST, RFQ}        0.2706345  964 
## [118] {91, RFQ}                  0.2703537  963 
## [119] {01, DIST-A}               0.2703537  963 
## [120] {04, DIST-A}               0.2695115  960 
## [121] {10, 30}                   0.2692308  959 
## [122] {02, 10, DIST, DIST-A}     0.2686693  957 
## [123] {03, 91}                   0.2683885  956 
## [124] {01, 04, 10, DIST, RFQ}    0.2681078  955 
## [125] {01, DIST, DIST-A}         0.2667041  950 
## [126] {04, DIST, DIST-A}         0.2661426  948 
## [127] {01, 10, DIST-A}           0.2653004  945 
## [128] {10, 30, DIST}             0.2647389  943 
## [129] {01, 02, 04, RFQ}          0.2644582  942 
## [130] {04, 10, DIST-A}           0.2633352  938 
## [131] {01, 10, DIST, DIST-A}     0.2624930  935 
## [132] {01, 02, 04, 10, RFQ}      0.2622122  934 
## [133] {02, 91}                   0.2616508  932 
## [134] {01, 03, 04, RFQ}          0.2613700  931 
## [135] {04, 10, DIST, DIST-A}     0.2602471  927 
## [136] {01, 03, 04, 10, RFQ}      0.2594048  924 
## [137] {10, 91, DIST}             0.2588433  922 
## [138] {02, 03, DIST-A}           0.2585626  921 
## [139] {01, 02, 04, DIST, RFQ}    0.2574396  917 
## [140] {02, 03, DIST, DIST-A}     0.2571589  916 
## [141] {03, 10, 91}               0.2568782  915 
## [142] {03, 30}                   0.2565974  914 
## [143] {DIST-A, RFQ}              0.2565974  914 
## [144] {01, 03, 04, DIST, RFQ}    0.2557552  911 
## [145] {10, 91, RFQ}              0.2551937  909 
## [146] {02, 03, 10, DIST-A}       0.2551937  909 
## [147] {03, 30, DIST}             0.2543515  906 
## [148] {03, 04, DIST-A}           0.2540707  905 
## [149] {02, 03, 10, DIST, DIST-A} 0.2537900  904 
## [150] {DIST, DIST-A, RFQ}        0.2529478  901 
## [151] {03, 04, DIST, DIST-A}     0.2521056  898 
## [152] {01, 02, 03, 04, RFQ}      0.2518248  897 
## [153] {03, 91, DIST}             0.2512633  895 
## [154] {02, 10, 91}               0.2509826  894 
## [155] {01, 03, DIST-A}           0.2501404  891

Findings

In terms of category sets, here we find over 150 sets between 2 and 5 that occur ~25-50% of the time. A vast majority of these sets are among the top ten categories, and even more in the top 5. There is a very strong correlation between categories 02, 03, 10, DIST, and RFQ. Moving forward with the Suggested Add-On Tool, we will synthesize this data to help the sales team make informed suggestions with an interactive dashboard-like interface.