As part of the ongoing project of producing a tool for the sales team to suggest add-ons to customers, the following is an extension of the initial Affinity Analysis from November 2022. While the initial analysis was a bird’s-eye overview of the entire history Marine Fasteners, this analysis looks specifically at the past five years, year-by-year, in an attempt to discover patterns and trends in purchasing correlations.
Together with the adjoining Zone Analysis, these data will aid in the development of an informed sales team that can upsell their clients, growing our profits and providing expert service.
The data for this analysis was mined and cleaned in the same way as the previous study. I have shown the code for the first year (2018), but hidden it for all the others for cleanliness.
trans2018 = read.transactions("W:/apriori/year_over_year/csv/2018_basket.csv", sep = ',', rm.duplicates = TRUE)
summary(trans2018)
## transactions as itemMatrix in sparse format with
## 930 rows (elements/itemsets/transactions) and
## 1853 columns (items) and a density of 0.01042251
##
## most frequent items:
## HHCSSS ESN79N PPMSSS WFLSS1 PFMSSS (Other)
## 444 370 350 328 270 16199
##
## element (itemset/transaction) length distribution:
## sizes
## 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## 111 75 70 42 32 39 43 37 29 19 28 22 23 17 25 16 12 16 12 19
## 22 23 24 25 26 27 28 29 30 31 32 33 34 36 37 38 39 40 41 42
## 8 10 9 10 12 7 16 7 6 6 9 5 7 6 2 4 3 5 3 2
## 43 44 45 46 47 48 49 50 52 53 54 55 56 57 58 59 60 61 62 63
## 4 1 1 1 3 5 5 3 3 2 5 5 3 1 1 2 2 1 2 3
## 65 66 67 68 69 70 71 72 73 74 75 76 78 79 80 81 85 87 89 92
## 1 1 2 1 3 1 3 2 2 1 1 1 1 1 2 2 2 1 1 1
## 93 94 95 105 107 109 114 117 118 125 130 140 148 149 155 158 190 259 274 292
## 1 2 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 10.00 19.31 23.00 292.00
##
## includes extended item information - examples:
## labels
## 1 11205
## 2 1401
## 3 170
itemFrequencyPlot(trans2018, topN = 25, main = "2018", xlab = "most frequent items", ylab = "")
rules2018 = eclat(data = trans2018, parameter = list(support = 0.2, minlen = 2, maxlen = 2))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.2 2 2 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 186
##
## create itemset ...
## set transactions ...[1853 item(s), 930 transaction(s)] done [0.00s].
## sorting and recoding items ... [12 item(s)] done [0.00s].
## creating bit matrix ... [12 row(s), 930 column(s)] done [0.00s].
## writing ... [11 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
inspect(sort(rules2018, by = 'support'))
## items support count
## [1] {ESN79N, HHCSSS} 0.3032258 282
## [2] {HHCSSS, WFLSS1} 0.2817204 262
## [3] {HHCSSS, PPMSSS} 0.2591398 241
## [4] {ESN79N, PPMSSS} 0.2580645 240
## [5] {ESN79N, WFLSS1} 0.2440860 227
## [6] {PPMSSS, WFLSS1} 0.2139785 199
## [7] {PFMSSS, PPMSSS} 0.2118280 197
## [8] {HHCSSS, NFISS1} 0.2075269 193
## [9] {HHCSSS, WFLSS3} 0.2021505 188
## [10] {HHCSSS, PFMSSS} 0.2010753 187
## [11] {ESN79N, PFMSSS} 0.2010753 187
– Relatively low purchasing correlations found (20-30%), mostly among the top 5 parts
## transactions as itemMatrix in sparse format with
## 923 rows (elements/itemsets/transactions) and
## 1962 columns (items) and a density of 0.009838613
##
## most frequent items:
## HHCSSS ESN79N PPMSSS WFLSS1 PFMSSS (Other)
## 455 404 315 305 269 16069
##
## element (itemset/transaction) length distribution:
## sizes
## 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## 109 83 66 56 35 36 40 28 28 29 13 20 26 17 12 24 18 14 17 13
## 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
## 10 10 7 14 12 5 7 5 9 5 5 5 4 3 6 5 8 3 4 4
## 42 43 45 47 48 49 50 51 52 53 55 56 57 58 59 60 61 62 63 64
## 3 3 1 4 4 3 3 3 4 5 3 3 2 4 3 1 4 3 1 2
## 65 66 67 68 69 72 75 77 78 79 80 82 85 86 87 89 92 100 103 104
## 1 2 3 2 3 1 3 1 1 1 3 1 1 1 1 1 1 1 1 1
## 105 107 111 112 121 122 131 135 151 160 167 177 181 260 281 297
## 1 1 3 1 1 2 1 1 1 1 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.0 4.0 10.0 19.3 22.0 297.0
##
## includes extended item information - examples:
## labels
## 1 001294
## 2 1401
## 3 2-1159
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.2 2 2 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 184
##
## create itemset ...
## set transactions ...[1962 item(s), 923 transaction(s)] done [0.00s].
## sorting and recoding items ... [10 item(s)] done [0.00s].
## creating bit matrix ... [10 row(s), 923 column(s)] done [0.00s].
## writing ... [9 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
## items support count
## [1] {ESN79N, HHCSSS} 0.3315276 306
## [2] {HHCSSS, WFLSS1} 0.2784399 257
## [3] {ESN79N, PPMSSS} 0.2567714 237
## [4] {ESN79N, WFLSS1} 0.2556880 236
## [5] {HHCSSS, PPMSSS} 0.2502709 231
## [6] {ESN79N, PFMSSS} 0.2199350 203
## [7] {PFMSSS, PPMSSS} 0.2177681 201
## [8] {HHCSSS, PFMSSS} 0.2123510 196
## [9] {HHCSSS, WFLSS3} 0.2069339 191
– Relatively low purchasing correlations found (20-33%), mostly among the top 5 parts
## transactions as itemMatrix in sparse format with
## 918 rows (elements/itemsets/transactions) and
## 1989 columns (items) and a density of 0.01006023
##
## most frequent items:
## HHCSSS ESN79N PPMSSS WFLSS1 PFMSSS (Other)
## 462 405 321 308 273 16600
##
## element (itemset/transaction) length distribution:
## sizes
## 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## 97 74 49 55 37 33 42 34 22 23 26 21 34 17 16 14 15 17 15 12
## 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
## 8 15 12 15 15 9 10 10 3 5 9 13 3 6 3 3 5 1 2 5
## 42 43 44 45 46 47 48 49 50 51 53 54 55 56 57 58 59 60 61 62
## 5 2 4 2 5 4 2 4 6 1 1 1 2 2 4 5 4 3 2 1
## 63 64 65 66 67 69 71 72 73 76 77 79 81 82 83 86 89 90 91 93
## 1 3 1 1 4 1 2 1 2 1 2 1 2 2 2 1 1 1 1 1
## 94 99 101 104 111 115 116 117 120 123 133 138 140 146 154 157 158 255 269 319
## 2 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 5.00 11.00 20.01 24.75 319.00
##
## includes extended item information - examples:
## labels
## 1 001294
## 2 1/2BUS
## 3 100000
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.2 2 2 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 183
##
## create itemset ...
## set transactions ...[1989 item(s), 918 transaction(s)] done [0.00s].
## sorting and recoding items ... [13 item(s)] done [0.00s].
## creating bit matrix ... [13 row(s), 918 column(s)] done [0.00s].
## writing ... [9 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
## items support count
## [1] {ESN79N, HHCSSS} 0.3278867 301
## [2] {HHCSSS, WFLSS1} 0.2799564 257
## [3] {ESN79N, WFLSS1} 0.2570806 236
## [4] {HHCSSS, PPMSSS} 0.2494553 229
## [5] {ESN79N, PPMSSS} 0.2483660 228
## [6] {ESN79N, PFMSSS} 0.2211329 203
## [7] {HHCSSS, PFMSSS} 0.2135076 196
## [8] {PFMSSS, PPMSSS} 0.2135076 196
## [9] {HHCSSS, NFISS1} 0.2015251 185
– Relatively low purchasing correlations found (20-33%), mostly among the top 5 parts
## transactions as itemMatrix in sparse format with
## 960 rows (elements/itemsets/transactions) and
## 1913 columns (items) and a density of 0.0101749
##
## most frequent items:
## HHCSSS ESN79N PPMSSS WFLSS1 PFMSSS (Other)
## 475 413 322 314 264 16898
##
## element (itemset/transaction) length distribution:
## sizes
## 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## 133 77 54 50 44 32 41 30 24 23 29 21 30 13 25 17 13 17 10 15
## 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
## 12 12 13 12 8 5 6 12 10 10 6 6 8 2 6 6 3 6 2 4
## 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60 61
## 4 5 5 3 3 2 5 2 1 1 1 6 3 2 1 1 3 1 1 1
## 62 63 64 65 66 67 68 69 70 71 72 73 75 76 80 84 85 86 87 88
## 1 2 1 2 1 1 1 2 3 4 2 1 2 1 4 2 2 1 1 1
## 89 90 91 93 97 100 104 110 112 113 115 121 129 134 140 142 174 176 177 238
## 2 1 2 1 1 2 1 2 1 1 1 1 1 1 2 1 1 1 1 1
## 259 335
## 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 10.00 19.46 23.00 335.00
##
## includes extended item information - examples:
## labels
## 1 001294
## 2 1101
## 3 1301
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.2 2 2 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 192
##
## create itemset ...
## set transactions ...[1913 item(s), 960 transaction(s)] done [0.00s].
## sorting and recoding items ... [8 item(s)] done [0.00s].
## creating bit matrix ... [8 row(s), 960 column(s)] done [0.00s].
## writing ... [8 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
## items support count
## [1] {ESN79N, HHCSSS} 0.3260417 313
## [2] {HHCSSS, WFLSS1} 0.2666667 256
## [3] {ESN79N, WFLSS1} 0.2500000 240
## [4] {ESN79N, PPMSSS} 0.2468750 237
## [5] {HHCSSS, PPMSSS} 0.2375000 228
## [6] {ESN79N, PFMSSS} 0.2072917 199
## [7] {PFMSSS, PPMSSS} 0.2062500 198
## [8] {HHCSSS, NFISS1} 0.2041667 196
– Relatively low purchasing correlations found (20-33%), mostly among the top 5 parts
## transactions as itemMatrix in sparse format with
## 1040 rows (elements/itemsets/transactions) and
## 1964 columns (items) and a density of 0.009300094
##
## most frequent items:
## HHCSSS ESN79N PPMSSS WFLSS1 PFMSSS (Other)
## 513 438 342 314 262 17127
##
## element (itemset/transaction) length distribution:
## sizes
## 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21
## 113 111 85 51 55 41 37 32 36 32 20 17 24 22 23 15 14 15 23 10
## 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41
## 16 11 13 7 13 9 8 8 10 10 3 5 6 9 8 3 5 3 4 5
## 42 43 44 45 46 47 49 50 51 53 54 55 56 57 58 59 60 61 62 63
## 5 3 3 3 4 6 4 1 2 1 1 1 1 1 1 3 1 2 3 1
## 64 65 66 67 68 69 70 71 72 73 74 75 77 78 79 80 82 84 85 91
## 4 2 2 3 1 1 1 1 2 3 2 1 2 2 4 2 1 1 1 2
## 94 95 96 98 102 103 104 110 113 115 121 128 131 141 155 160 177 189 203 277
## 2 1 1 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1
## 298
## 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.00 4.00 9.00 18.27 22.00 298.00
##
## includes extended item information - examples:
## labels
## 1 001294
## 2 100010
## 3 1301
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.2 2 2 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 208
##
## create itemset ...
## set transactions ...[1964 item(s), 1040 transaction(s)] done [0.00s].
## sorting and recoding items ... [6 item(s)] done [0.00s].
## creating bit matrix ... [6 row(s), 1040 column(s)] done [0.00s].
## writing ... [5 set(s)] done [0.00s].
## Creating S4 object ... done [0.00s].
## items support count
## [1] {ESN79N, HHCSSS} 0.3278846 341
## [2] {HHCSSS, WFLSS1} 0.2490385 259
## [3] {HHCSSS, PPMSSS} 0.2317308 241
## [4] {ESN79N, PPMSSS} 0.2259615 235
## [5] {ESN79N, WFLSS1} 0.2250000 234
– Relatively low purchasing correlations found (22-33%), mostly among the top 5 parts
This analysis proves that over the past five years, our customers have, on the whole, purchased largely in the same patterns year-over-year. For each year since 2018, the top 5 part families have dominated all other parts, and in the same order. While the accompanying Zone Analysis dives deeper into industry-specific deviations, we can see here that overall the trend remains largely the same. Of note is that the apriori algorithm did not find any associations above a 33% occurence, which, while a notable correlation, is not a particularly strong association, given that certain zones see some associations approaching 80%.
One trend to notice is the slow drop and then sharp rise in the number of unique customers with purchase history over the last five years. While this speaks to our continued success as a company, it is notable that the average number of unique part types purchased per year remains largely the same (~20).
Moving forward, the data produced by these analyses will be of paramount importance in developing a suggested add-on tool for the sales team. This particular analysis highlights what is clearly the “bread and butter” of Marine Fasteners, and we should focus on making these parts as profitable as possible, as they are essentially a guaranteed purchase for our clients.