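Market Basket Analysis is an unsupervised learning technique based on association rules, which identify relationships between products purchased together. In this project the Apriori and ECLAT algorithms are applied to a large dataset to analyse these patterns. Apriori operates on a horizontal database layout (one row per transaction) and scans it level by level, whereas ECLAT (Equivalence Class Clustering and bottom-up Lattice Traversal) uses a vertical layout: it stores, for each item, the list of transactions that contain it and finds frequent itemsets by intersecting those lists. The resulting rules are evaluated with three metrics: support (how often the itemset occurs), confidence (the conditional probability of the consequent given the antecedent) and lift (how much more often the items co-occur than expected under independence).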
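The difference between the two layouts can be illustrated with a minimal base R sketch. The baskets and the helper names `count_horizontal` and `count_vertical` are hypothetical and serve only as an illustration; they are not part of the analysis and not how the arules package implements either algorithm.

```r
# Toy baskets (hypothetical data, not taken from the retail dataset)
baskets <- list(
  t1 = c("bread", "milk"),
  t2 = c("bread", "butter", "milk"),
  t3 = c("butter", "milk"),
  t4 = c("bread", "butter")
)

# Horizontal layout (Apriori-style): scan every transaction and test membership
count_horizontal <- function(itemset, baskets) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Vertical layout (ECLAT-style): one list of transaction IDs per item
ids <- rep(names(baskets), lengths(baskets))
tid_lists <- split(ids, unlist(baskets, use.names = FALSE))

# Support count of an itemset = size of the intersection of its tid-lists
count_vertical <- function(itemset, tid_lists) {
  length(Reduce(intersect, tid_lists[itemset]))
}

count_horizontal(c("bread", "milk"), baskets)
count_vertical(c("bread", "milk"), tid_lists)
```

Both functions return the same support count; the vertical version never revisits the transactions once the tid-lists are built, which is the idea ECLAT exploits.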
The main objective of this study is to analyze a UK-based online retail store to uncover structure in customer behavior. I aim to find the strongest rules reflecting shopping habits and identify the top-selling product bundles. I compare the top rules in the Christmas season to the rest of the year and explore 3+ item rules. The results may be useful for retailers looking to optimize the customer experience and increase the average transaction value by, for example, bundling.
The dataset used in this study was sourced from the UCI Machine Learning Repository (https://archive.ics.uci.edu/dataset/352/online+retail). It is a transactional dataset from a UK-based online retailer. The store specializes in all-occasion gifts, and a significant portion of its client base consists of wholesalers rather than individual customers. The dataset covers all transactions between 01/12/2010 and 09/12/2011. The data was imported directly from the UCI repository via URL in its original .xlsx format. The raw data contains 541,909 entries and 8 columns. The variables and their descriptions are presented below.
library(readxl)  # provides read_excel()

url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx"
destfile <- "Online_Retail.xlsx"
# Download the file only if it is not already cached locally
if (!file.exists(destfile)) {
  download.file(url, destfile, mode = "wb")
}
df <- read_excel(destfile)

| Variable Name | Description |
|---|---|
| InvoiceNo | A unique 6-digit identifier for each transaction. Codes starting with ‘C’ indicate cancellations. |
| StockCode | A unique 5-digit integral number assigned to each distinct product. |
| Description | The nominal name of the product or service. |
| Quantity | The quantities of each product (item) per transaction. |
| InvoiceDate | The day and time when each transaction was generated. |
| UnitPrice | Product price per unit in sterling (£). |
| CustomerID | A unique 5-digit identifier assigned to each regular customer. |
| Country | The name of the country where each customer resides. |
Before conducting the market basket analysis, the data had to be cleaned and prepared so that the association rules would be reliable. First, records with a missing Description or CustomerID were deleted. Then, transactions whose invoice number starts with “C” (cancellations) were removed, as those orders do not reflect actual purchase associations. The dataset was also restricted to records with Quantity > 0 and UnitPrice > 0, and surrounding whitespace was trimmed from Description. Finally, the analysis was narrowed down to the United Kingdom, the country with the largest share of customers, which provides a homogeneous sample and keeps the computation manageable.
# Drop rows with a missing Description or CustomerID
df_clean <- df[complete.cases(df$Description, df$CustomerID), ]
# Drop cancelled transactions (InvoiceNo starting with "C"; "^" anchors the match to the start)
df_clean <- df_clean[!grepl("^C", df_clean$InvoiceNo), ]
# Keep rows with Quantity > 0 and UnitPrice > 0
df_clean <- df_clean[df_clean$Quantity > 0 & df_clean$UnitPrice > 0, ]
# Trim leading and trailing spaces from Description
df_clean$Description <- trimws(df_clean$Description)
# UK only
df_uk <- df_clean[df_clean$Country == "United Kingdom", ]

Before running the association rule algorithms, the 10 products that occur most often in the transactions were plotted. The “White Hanging Heart T-Light Holder” clearly dominates, appearing in the highest number of transactions. “Jumbo Bag Red Retrospot” and “Regency Cakestand 3 Tier” are the second and third most popular.
library(ggplot2)

# Top 10 products by number of occurrences
ggplot(data.frame(sort(table(df_uk$Description), decreasing = TRUE)[1:10]),
       aes(x = reorder(Var1, Freq), y = Freq)) +
  geom_bar(stat = "identity", fill = "brown") +
  coord_flip() +
  labs(title = "Top 10 products", x = "Product", y = "Number of occurrences") +
  theme_minimal()

Furthermore, the number of products per invoice gives insight into the basket size distribution. The largest invoice contains 542 line items, and the mean is 21.29. The median invoice contains 15 products, which is plausible given that a significant share of customers are wholesalers. This is beneficial for market basket analysis, as large baskets provide a good basis for identifying co-occurrence patterns between multiple items.
# Number of products in a basket
items_per_invoice <- aggregate(Description ~ InvoiceNo, data = df_uk, length)
summary(items_per_invoice$Description)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 6.00 15.00 21.29 27.00 542.00
The Apriori algorithm is a foundational technique for mining association rules, designed to operate on large transaction databases. It takes a bottom-up approach built on the Apriori principle, which states that every subset of a frequent itemset must also be frequent. It first identifies frequent individual items and then extends them to larger itemsets for as long as they meet a minimum support threshold. In this analysis, the generated rules are evaluated on three key metrics: support, confidence and lift.
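For reference, the three metrics can be written out using their standard textbook definitions. The notation is an assumption introduced here: $T$ is the set of transactions, $A$ and $B$ are disjoint itemsets, and $\operatorname{supp}(X)$ is the share of transactions containing every item of $X$.

```latex
\begin{aligned}
\operatorname{supp}(A \Rightarrow B) &= \operatorname{supp}(A \cup B)
  = \frac{|\{\, t \in T : A \cup B \subseteq t \,\}|}{|T|} \\[4pt]
\operatorname{conf}(A \Rightarrow B) &= \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)} \\[4pt]
\operatorname{lift}(A \Rightarrow B) &= \frac{\operatorname{conf}(A \Rightarrow B)}{\operatorname{supp}(B)}
  = \frac{\operatorname{supp}(A \cup B)}{\operatorname{supp}(A)\,\operatorname{supp}(B)}
\end{aligned}
```

A lift of 1 corresponds to independence between $A$ and $B$; values above 1 indicate that the items appear together more often than independence would predict. Lift is symmetric in $A$ and $B$, which is why a rule and its reverse share the same lift value.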
To prepare the cleaned data for the Apriori algorithm, the dataset was transformed into a sparse transaction matrix. First, the product descriptions were grouped by their invoice numbers. The grouped lists were then converted into a transactions object, a format provided by the arules package. For clarity, the data was also represented as a binary matrix, where each row is an individual transaction and each column a product, with values 1 (purchased) or 0 (not purchased). The matrix consists of 16,646 transactions and 3,833 unique items. The density of 0.54% indicates a highly sparse matrix: only 0.54% of all possible transaction-item cells are filled. The most frequent item, “White Hanging Heart T-Light Holder”, occurs 1,884 times.
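library(arules)

# Group descriptions by invoice and build a sparse transactions object
# (items repeated within one invoice are counted once in the transactions object)
transactions_list <- split(df_uk$Description, df_uk$InvoiceNo)
groceries <- as(transactions_list, "transactions")
summary(groceries)
## transactions as itemMatrix in sparse format with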
## 16646 rows (elements/itemsets/transactions) and
## 3833 columns (items) and a density of 0.00539578
##
## most frequent items:
## WHITE HANGING HEART T-LIGHT HOLDER JUMBO BAG RED RETROSPOT
## 1884 1447
## REGENCY CAKESTAND 3 TIER ASSORTED COLOUR BIRD ORNAMENT
## 1410 1300
## PARTY BUNTING (Other)
## 1290 336942
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 1273 679 594 570 599 550 545 542 546 467 476 438 430 467 485 489
## 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
## 412 402 425 380 346 294 305 266 218 231 212 221 241 189 163 153
## 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48
## 142 151 116 105 112 102 116 104 103 86 81 81 74 80 64 66
## 49 50 51 52 53 54 55 56 57 58 59 60 61 62 63 64
## 71 70 44 50 58 61 59 37 49 33 29 46 30 18 34 33
## 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
## 29 33 24 30 19 28 27 17 19 24 23 17 16 19 9 10
## 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96
## 13 16 14 19 13 14 9 13 9 7 9 12 10 6 3 8
## 97 98 99 100 101 102 103 104 105 106 107 108 109 110 111 112
## 7 10 3 9 6 3 6 7 2 2 6 3 2 4 4 3
## 113 114 115 116 117 118 119 120 121 122 123 124 125 126 127 128
## 3 6 6 7 2 4 4 4 5 7 2 4 1 2 5 1
## 129 130 131 132 134 135 136 137 138 139 140 141 142 144 145 146
## 1 2 2 2 2 2 2 1 2 1 1 4 1 1 2 2
## 148 149 150 151 153 154 156 157 163 165 169 175 176 178 179 180
## 1 2 1 1 1 1 1 1 1 1 2 1 2 1 1 1
## 181 184 187 192 193 195 202 204 208 210 227 249 262 270 280 333
## 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 347 352 363 375 386 419 434 439 525 529 541
## 1 1 1 1 1 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 6.00 15.00 20.68 27.00 541.00
##
## includes extended item information - examples:
## labels
## 1 10 COLOUR SPACEBOY PEN
## 2 12 COLOURED PARTY BALLOONS
## 3 12 DAISY PEGS IN WOOD BOX
##
## includes extended transaction information - examples:
## transactionID
## 1 536365
## 2 536366
## 3 536367
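As a quick consistency check on the summary above, the density can be tied back to the other reported figures. The sketch below uses only numbers printed in the output; the variable names are illustrative.

```r
# Figures copied from the summary() output above
n_transactions <- 16646
n_items        <- 3833
density        <- 0.00539578

# Number of filled cells in the sparse matrix (total item occurrences)
n_cells <- density * n_transactions * n_items

# Mean basket size = item occurrences per transaction
mean_basket <- n_cells / n_transactions

round(n_cells)
round(mean_basket, 2)
```

The filled-cell count agrees with the sum of the "most frequent items" counts (1884 + 1447 + 1410 + 1300 + 1290 + 336942), and the mean basket size matches the mean of 20.68 reported in the length distribution.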
#binary dataframe
df_binary <- as(groceries, "matrix")
df_binary <- as.data.frame(df_binary)
df_binary[] <- lapply(df_binary, as.integer)
head(df_binary)

The Apriori algorithm identified 258 rules that meet the chosen thresholds (support = 0.01, confidence = 0.5). The 1% support threshold ensures that every rule occurs in at least 166 transactions. Given the large number of products, 1% support is sensitive enough to capture specialized product clusters. The confidence threshold of 0.5 focuses the analysis on strong rules: if the antecedent is in the basket, there is at least a 50% chance that the consequent is too. The rules are displayed in an interactive table built with the DT (DataTables) library.
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 166
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3833 item(s), 16646 transaction(s)] done [0.05s].
## sorting and recoding items ... [615 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [258 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## set of 258 rules
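The call that produced this output is not shown. Based on the parameter listing above and on the object name `rules_uk` used in later chunks, it was presumably of the form `rules_uk <- apriori(groceries, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))`; this is an inference, not a quote from the source. A self-contained sketch of the same call on hypothetical toy baskets:

```r
library(arules)

# Toy baskets (hypothetical); the report applies the same call to `groceries`
toy_list <- list(
  c("tea plate", "teacup", "saucer"),
  c("teacup", "saucer"),
  c("tea plate", "teacup"),
  c("teacup", "saucer", "cakestand"),
  c("cakestand")
)
toy_trans <- as(toy_list, "transactions")

# Same thresholds as in the parameter listing: support 0.01, confidence 0.5, minlen 2
toy_rules <- apriori(
  toy_trans,
  parameter = list(support = 0.01, confidence = 0.5, minlen = 2),
  control = list(verbose = FALSE)
)

# Rank by lift and print the strongest rules
inspect(head(sort(toy_rules, by = "lift"), 5))
```

Replacing `by = "lift"` with `"support"` or `"confidence"` gives the other two rankings used in the tables.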
The top 10 rules by each measure are presented below.
The rules with the highest lift involve the herb markers, with a lift of 86.829. Buying a thyme marker makes a rosemary marker roughly 87 times more likely to be in the basket than it would be if the two products were purchased independently. The other rules in the top ten involve tea set items, a toy collection and a decoration set.
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {HERB MARKER THYME} => {HERB MARKER ROSEMARY} | 0.010 | 0.944 | 0.011 | 86.829 | 169 |
| {HERB MARKER ROSEMARY} => {HERB MARKER THYME} | 0.010 | 0.934 | 0.011 | 86.829 | 169 |
| {REGENCY TEA PLATE GREEN} => {REGENCY TEA PLATE ROSES} | 0.012 | 0.846 | 0.014 | 52.930 | 192 |
| {REGENCY TEA PLATE ROSES} => {REGENCY TEA PLATE GREEN} | 0.012 | 0.722 | 0.016 | 52.930 | 192 |
| {POPPY’S PLAYHOUSE LIVINGROOM} => {POPPY’S PLAYHOUSE BEDROOM} | 0.010 | 0.809 | 0.013 | 51.770 | 169 |
| {POPPY’S PLAYHOUSE BEDROOM} => {POPPY’S PLAYHOUSE LIVINGROOM} | 0.010 | 0.650 | 0.016 | 51.770 | 169 |
| {SET OF 3 WOODEN STOCKING DECORATION} => {SET OF 3 WOODEN TREE DECORATIONS} | 0.010 | 0.691 | 0.015 | 50.212 | 172 |
| {SET OF 3 WOODEN TREE DECORATIONS} => {SET OF 3 WOODEN STOCKING DECORATION} | 0.010 | 0.751 | 0.014 | 50.212 | 172 |
| {POPPY’S PLAYHOUSE LIVINGROOM} => {POPPY’S PLAYHOUSE KITCHEN} | 0.011 | 0.852 | 0.013 | 49.226 | 178 |
| {POPPY’S PLAYHOUSE KITCHEN} => {POPPY’S PLAYHOUSE LIVINGROOM} | 0.011 | 0.618 | 0.017 | 49.226 | 178 |
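The lift of the top rule can be reproduced by hand. The sketch below takes the joint count (169) and the number of transactions (16646) from the output; the individual item counts `n_a` and `n_b` are not reported in the source and are back-calculated here from the two confidence values, so they should be read as assumptions.

```r
n    <- 16646  # number of transactions (from the summary output)
n_ab <- 169    # baskets containing both herb markers (count column)
n_a  <- 179    # assumed count of HERB MARKER THYME, implied by confidence 0.944
n_b  <- 181    # assumed count of HERB MARKER ROSEMARY, implied by confidence 0.934

support    <- n_ab / n                # share of baskets with both items
confidence <- n_ab / n_a              # P(rosemary | thyme)
lift       <- confidence / (n_b / n)  # confidence relative to P(rosemary)

round(c(support = support, confidence = confidence, lift = lift), 3)
```

With these counts the three values round to the figures in the first table row (0.010, 0.944 and 86.829), which shows why two rarely bought items that almost always appear together produce such a large lift.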
The top 10 rules by support represent the most common purchase patterns across the entire dataset. Support indicates which bundles are purchased most often. All rules in the top 10 have support of approximately 3%. They reveal that jumbo bags in different colors commonly share a basket; the same holds for teacups and saucers and for lunch bags.
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {JUMBO BAG PINK POLKADOT} => {JUMBO BAG RED RETROSPOT} | 0.030 | 0.623 | 0.049 | 7.169 | 506 |
| {GREEN REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.029 | 0.778 | 0.037 | 19.096 | 476 |
| {ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.029 | 0.702 | 0.041 | 19.096 | 476 |
| {LUNCH BAG PINK POLKADOT} => {LUNCH BAG RED RETROSPOT} | 0.028 | 0.555 | 0.051 | 8.255 | 471 |
| {WOODEN FRAME ANTIQUE WHITE} => {WOODEN PICTURE FRAME WHITE FINISH} | 0.028 | 0.584 | 0.047 | 11.387 | 458 |
| {WOODEN PICTURE FRAME WHITE FINISH} => {WOODEN FRAME ANTIQUE WHITE} | 0.028 | 0.536 | 0.051 | 11.387 | 458 |
| {GARDENERS KNEELING PAD CUP OF TEA} => {GARDENERS KNEELING PAD KEEP CALM} | 0.028 | 0.730 | 0.038 | 16.387 | 458 |
| {GARDENERS KNEELING PAD KEEP CALM} => {GARDENERS KNEELING PAD CUP OF TEA} | 0.028 | 0.617 | 0.045 | 16.387 | 458 |
| {ALARM CLOCK BAKELIKE GREEN} => {ALARM CLOCK BAKELIKE RED} | 0.027 | 0.658 | 0.041 | 14.449 | 454 |
| {ALARM CLOCK BAKELIKE RED} => {ALARM CLOCK BAKELIKE GREEN} | 0.027 | 0.599 | 0.046 | 14.449 | 454 |
Confidence is the probability that the consequent is purchased given that the antecedent is already in the basket. Again, the herb markers take the first and second spots, with a 94% chance that the rosemary marker is bought if the thyme marker is in the basket. Interestingly, the confidence list is dominated by rules with multiple antecedents. A customer with the Wooden Heart and the Wooden Tree is also likely to add the Wooden Star. The Regency series also appears frequently in rules with two antecedents and one consequent.
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {HERB MARKER THYME} => {HERB MARKER ROSEMARY} | 0.010 | 0.944 | 0.011 | 86.829 | 169 |
| {HERB MARKER ROSEMARY} => {HERB MARKER THYME} | 0.010 | 0.934 | 0.011 | 86.829 | 169 |
| {WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN} | 0.010 | 0.929 | 0.011 | 38.562 | 170 |
| {PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.012 | 0.898 | 0.014 | 24.419 | 202 |
| {PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.020 | 0.890 | 0.023 | 24.217 | 341 |
| {GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.012 | 0.878 | 0.014 | 21.563 | 202 |
| {PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.014 | 0.868 | 0.016 | 23.607 | 230 |
| {POPPY’S PLAYHOUSE LIVINGROOM} => {POPPY’S PLAYHOUSE KITCHEN} | 0.011 | 0.852 | 0.013 | 49.226 | 178 |
| {PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.014 | 0.849 | 0.016 | 20.846 | 225 |
| {REGENCY TEA PLATE GREEN} => {REGENCY TEA PLATE ROSES} | 0.012 | 0.846 | 0.014 | 52.930 | 192 |
The scatterplot below gives an overview of the 258 generated association rules, mapping support (x axis) against lift (y axis), with confidence shown as shading. Most rules cluster in the bottom-left corner, with support around 0.010-0.015 and lift above 10.
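The plotting call for the scatterplot is not shown in the report. A minimal sketch of how such a plot can be produced with arulesViz follows; it uses the `Groceries` demo data shipped with arules rather than the retail data, and the name `demo_rules` and the lower confidence threshold are illustrative choices.

```r
library(arules)
library(arulesViz)

# Built-in demo data from arules (not the retail dataset used in this report)
data("Groceries")
demo_rules <- apriori(
  Groceries,
  parameter = list(support = 0.01, confidence = 0.3, minlen = 2),
  control = list(verbose = FALSE)
)

# Support on the x axis, lift on the y axis, confidence as colour shading
plot(demo_rules, method = "scatterplot",
     measure = c("support", "lift"), shading = "confidence")
```

In the report the same call would presumably be applied to `rules_uk`.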
The network graph shows relationships between actual products. The graph below displays the top 20 rules by lift, mapping how specific items are connected. The blue nodes represent products and the circles represent association rules. The darker the color, the higher the confidence of the rule. Arrows point from the antecedent to the consequent. Some tightly coupled pairs can be observed, as well as more complex structures such as the Poppy’s Playhouse series.
library(arulesViz)

# Network graph of the 20 rules with the highest lift
top20rules <- head(sort(rules_uk, by = "lift"), 20)
plot(top20rules, method = "graph", engine = "htmlwidget")

The grouped plot clusters antecedents (LHS) that share the same consequent (RHS), identifying items that consistently trigger the purchase of a particular complementary item. The plot displays the most common antecedents. It reveals that lunch bags in different colors trigger the most consequent purchases, namely other lunch bags.
# Keep only rules whose antecedent is among the 10 most frequent antecedents
lhs_list <- labels(lhs(rules_uk))
counts <- sort(table(lhs_list), decreasing = TRUE)
top_10_hub_labels <- names(head(counts, 10))
rules_to_plot <- rules_uk[lhs_list %in% top_10_hub_labels]
plot(rules_to_plot, method = "grouped")

The parallel coordinates (paracoord) plot shows the flow from the antecedent (position 1) to the consequent (RHS) for the top 10 rules by lift.
By increasing the minlen parameter to 3, the output is restricted to rules that contain at least three items (antecedent and consequent combined). A set of 142 rules was generated.
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 3
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 166
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3833 item(s), 16646 transaction(s)] done [0.05s].
## sorting and recoding items ... [615 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [142 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## set of 142 rules
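As an aside, rerunning the algorithm is not the only way to obtain these rules: with the same support and confidence thresholds, filtering an existing rule set by rule size should give the same result. The sketch below demonstrates this on hypothetical toy baskets; it is offered as an alternative, not as the method used in the report.

```r
library(arules)

# Toy baskets (hypothetical)
toy_trans <- as(list(
  c("green cup", "pink cup", "roses cup"),
  c("green cup", "pink cup", "roses cup", "cakestand"),
  c("green cup", "roses cup"),
  c("pink cup", "cakestand"),
  c("green cup", "pink cup")
), "transactions")

params <- list(support = 0.2, confidence = 0.5)
quiet  <- list(verbose = FALSE)

# Option 1: mine directly with minlen = 3
rules_min3 <- apriori(toy_trans, parameter = c(params, minlen = 3), control = quiet)

# Option 2: mine with minlen = 2, then keep rules with at least three items
rules_min2     <- apriori(toy_trans, parameter = c(params, minlen = 2), control = quiet)
rules_filtered <- rules_min2[size(rules_min2) >= 3]

# Both routes should yield the same number of rules
length(rules_min3) == length(rules_filtered)
```

Here `size()` counts the items on both sides of a rule, so `size(...) >= 3` corresponds to `minlen = 3`.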
The top 10 rules with three or more items are presented below, ranked first by lift, then by support and finally by confidence. A notable result is the dominance of the Regency set across all measures and the presence of the wooden Christmas Scandinavian collection in the top spots by lift and confidence.
| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN} | 0.010 | 0.929 | 0.011 | 38.562 | 170 |
| {WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN STAR CHRISTMAS SCANDINAVIAN} => {WOODEN TREE CHRISTMAS SCANDINAVIAN} | 0.010 | 0.567 | 0.018 | 38.035 | 170 |
| {WOODEN STAR CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN HEART CHRISTMAS SCANDINAVIAN} | 0.010 | 0.817 | 0.012 | 31.276 | 170 |
| {JUMBO BAG APPLES,JUMBO BAG VINTAGE LEAF} => {JUMBO BAG PEARS} | 0.010 | 0.668 | 0.016 | 26.223 | 173 |
| {GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} | 0.012 | 0.762 | 0.016 | 25.738 | 202 |
| {PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.012 | 0.898 | 0.014 | 24.419 | 202 |
| {PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.020 | 0.890 | 0.023 | 24.217 | 341 |
| {GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {PINK REGENCY TEACUP AND SAUCER} | 0.014 | 0.717 | 0.019 | 24.193 | 230 |
| {GREEN REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} | 0.020 | 0.716 | 0.029 | 24.189 | 341 |
| {PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.014 | 0.868 | 0.016 | 23.607 | 230 |

Top 10 rules with 3+ items by support:

| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.020 | 0.844 | 0.024 | 20.723 | 341 |
| {PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.020 | 0.890 | 0.023 | 24.217 | 341 |
| {GREEN REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} | 0.020 | 0.716 | 0.029 | 24.189 | 341 |
| {LUNCH BAG BLACK SKULL.,LUNCH BAG PINK POLKADOT} => {LUNCH BAG RED RETROSPOT} | 0.018 | 0.663 | 0.027 | 9.852 | 293 |
| {LUNCH BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT} => {LUNCH BAG BLACK SKULL.} | 0.018 | 0.622 | 0.028 | 10.397 | 293 |
| {LUNCH BAG BLACK SKULL.,LUNCH BAG RED RETROSPOT} => {LUNCH BAG PINK POLKADOT} | 0.018 | 0.605 | 0.029 | 11.883 | 293 |
| {GREEN REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {REGENCY CAKESTAND 3 TIER} | 0.016 | 0.557 | 0.029 | 6.572 | 265 |
| {GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.016 | 0.826 | 0.019 | 20.268 | 265 |
| {REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.016 | 0.753 | 0.021 | 20.477 | 265 |
| {LUNCH BAG CARS BLUE,LUNCH BAG PINK POLKADOT} => {LUNCH BAG RED RETROSPOT} | 0.015 | 0.641 | 0.024 | 9.533 | 254 |

Top 10 rules with 3+ items by confidence:

| rules | support | confidence | coverage | lift | count |
|---|---|---|---|---|---|
| {WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN} | 0.010 | 0.929 | 0.011 | 38.562 | 170 |
| {PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.012 | 0.898 | 0.014 | 24.419 | 202 |
| {PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.020 | 0.890 | 0.023 | 24.217 | 341 |
| {GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.012 | 0.878 | 0.014 | 21.563 | 202 |
| {PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {GREEN REGENCY TEACUP AND SAUCER} | 0.014 | 0.868 | 0.016 | 23.607 | 230 |
| {PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.014 | 0.849 | 0.016 | 20.846 | 225 |
| {GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.020 | 0.844 | 0.024 | 20.723 | 341 |
| {GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} | 0.016 | 0.826 | 0.019 | 20.268 | 265 |
| {WOODEN STAR CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN HEART CHRISTMAS SCANDINAVIAN} | 0.010 | 0.817 | 0.012 | 31.276 | 170 |
| {JUMBO BAG PINK POLKADOT,JUMBO BAG STRAWBERRY} => {JUMBO BAG RED RETROSPOT} | 0.014 | 0.792 | 0.017 | 9.114 | 225 |
The matrix plot maps the relationships between antecedents (LHS) and their corresponding consequents (RHS). A staircase structure can be observed in the upper-left corner, meaning that the strongest rules are selective and lead to a specific outcome. Some rule clusters show multiple antecedents triggering the same consequent, which confirms the existence of product ecosystems in which items from the same thematic series are bought together.
## Itemsets in Antecedent (LHS)
## [1] "{WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN}"
## [2] "{WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN STAR CHRISTMAS SCANDINAVIAN}"
## [3] "{WOODEN STAR CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN}"
## [4] "{JUMBO BAG APPLES,JUMBO BAG VINTAGE LEAF}"
## [5] "{GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER}"
## [6] "{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER}"
## [7] "{GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER}"
## [8] "{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER}"
## [9] "{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER}"
## [10] "{REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER}"
## [11] "{JUMBO BAG PEARS,JUMBO BAG VINTAGE LEAF}"
## [12] "{ALARM CLOCK BAKELIKE GREEN,ALARM CLOCK BAKELIKE IVORY}"
## [13] "{ALARM CLOCK BAKELIKE GREEN,ALARM CLOCK BAKELIKE PINK}"
## [14] "{ALARM CLOCK BAKELIKE IVORY,ALARM CLOCK BAKELIKE RED}"
## [15] "{ALARM CLOCK BAKELIKE PINK,ALARM CLOCK BAKELIKE RED}"
## [16] "{PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER}"
## [17] "{GREEN REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER}"
## [18] "{LUNCH BAG BLACK SKULL.,LUNCH BAG CARS BLUE,LUNCH BAG RED RETROSPOT}"
## [19] "{JUMBO BAG APPLES,JUMBO BAG PEARS}"
## [20] "{WHITE HANGING HEART T-LIGHT HOLDER,WOODEN FRAME ANTIQUE WHITE}"
## [21] "{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER}"
## [22] "{LUNCH BAG DOLLY GIRL DESIGN,LUNCH BAG RED RETROSPOT}"
## [23] "{LUNCH BAG BLACK SKULL.,LUNCH BAG DOLLY GIRL DESIGN}"
## [24] "{LUNCH BAG APPLE DESIGN,LUNCH BAG CARS BLUE}"
## [25] "{WHITE HANGING HEART T-LIGHT HOLDER,WOODEN PICTURE FRAME WHITE FINISH}"
## [26] "{LUNCH BAG CARS BLUE,LUNCH BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT}"
## [27] "{LUNCH BAG BLACK SKULL.,LUNCH BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT}"
## [28] "{JUMBO BAG RED RETROSPOT,JUMBO BAG STRAWBERRY}"
## [29] "{JUMBO BAG PINK VINTAGE PAISLEY,JUMBO BAG RED RETROSPOT}"
## [30] "{LUNCH BAG PINK POLKADOT,LUNCH BAG WOODLAND}"
## [31] "{LUNCH BAG PINK POLKADOT,LUNCH BAG SPACEBOY DESIGN}"
## [32] "{LUNCH BAG BLACK SKULL.,LUNCH BAG APPLE DESIGN}"
## [33] "{LUNCH BAG BLACK SKULL.,LUNCH BAG CARS BLUE,LUNCH BAG PINK POLKADOT}"
## [34] "{LUNCH BAG CARS BLUE,LUNCH BAG RED RETROSPOT}"
## [35] "{LUNCH BAG RED RETROSPOT,LUNCH BAG WOODLAND}"
## [36] "{LUNCH BAG BLACK SKULL.,LUNCH BAG WOODLAND}"
## [37] "{LUNCH BAG SUKI DESIGN,LUNCH BAG WOODLAND}"
## [38] "{LUNCH BAG RED RETROSPOT,LUNCH BAG SPACEBOY DESIGN}"
## [39] "{LUNCH BAG BLACK SKULL.,LUNCH BAG RED RETROSPOT}"
## [40] "{LUNCH BAG CARS BLUE,LUNCH BAG SPACEBOY DESIGN}"
## [41] "{JUMBO BAG RED RETROSPOT,JUMBO STORAGE BAG SUKI}"
## [42] "{LUNCH BAG PINK POLKADOT,LUNCH BAG SUKI DESIGN}"
## [43] "{LUNCH BAG CARS BLUE,LUNCH BAG WOODLAND}"
## [44] "{LUNCH BAG BLACK SKULL.,LUNCH BAG CARS BLUE}"
## [45] "{LUNCH BAG APPLE DESIGN,LUNCH BAG RED RETROSPOT}"
## [46] "{JUMBO BAG BAROQUE BLACK WHITE,JUMBO BAG RED RETROSPOT}"
## [47] "{LUNCH BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT}"
## [48] "{LUNCH BAG BLACK SKULL.,LUNCH BAG PINK POLKADOT}"
## [49] "{LUNCH BAG RED RETROSPOT,LUNCH BAG SUKI DESIGN}"
## [50] "{LUNCH BAG BLACK SKULL.,LUNCH BAG SPACEBOY DESIGN}"
## [51] "{LUNCH BAG CARS BLUE,LUNCH BAG PINK POLKADOT}"
## [52] "{LUNCH BAG BLACK SKULL.,LUNCH BAG SUKI DESIGN}"
## [53] "{JUMBO BAG RED RETROSPOT,LUNCH BAG PINK POLKADOT}"
## [54] "{LUNCH BAG SPACEBOY DESIGN,LUNCH BAG WOODLAND}"
## [55] "{LUNCH BAG APPLE DESIGN,LUNCH BAG PINK POLKADOT}"
## [56] "{LUNCH BAG SPACEBOY DESIGN,LUNCH BAG SUKI DESIGN}"
## [57] "{LUNCH BAG CARS BLUE,LUNCH BAG SUKI DESIGN}"
## [58] "{JUMBO BAG RED RETROSPOT,LUNCH BAG BLACK SKULL.}"
## [59] "{LUNCH BAG APPLE DESIGN,LUNCH BAG SPACEBOY DESIGN}"
## [60] "{LUNCH BAG APPLE DESIGN,LUNCH BAG SUKI DESIGN}"
## [61] "{JUMBO BAG PINK POLKADOT,JUMBO BAG STRAWBERRY}"
## [62] "{JUMBO BAG STRAWBERRY,JUMBO STORAGE BAG SUKI}"
## [63] "{JUMBO BAG PINK POLKADOT,JUMBO STORAGE BAG SUKI}"
## [64] "{LUNCH BAG DOLLY GIRL DESIGN,LUNCH BAG SPACEBOY DESIGN}"
## [65] "{JUMBO BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT}"
## [66] "{JUMBO BAG PINK POLKADOT,JUMBO SHOPPER VINTAGE RED PAISLEY}"
## [67] "{JUMBO SHOPPER VINTAGE RED PAISLEY,JUMBO STORAGE BAG SUKI}"
## [68] "{JUMBO BAG BAROQUE BLACK WHITE,JUMBO BAG PINK POLKADOT}"
## [69] "{JUMBO BAG PINK POLKADOT,JUMBO BAG PINK VINTAGE PAISLEY}"
## [70] "{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER}"
## Itemsets in Consequent (RHS)
## [1] "{REGENCY CAKESTAND 3 TIER}"
## [2] "{JUMBO BAG RED RETROSPOT}"
## [3] "{LUNCH BAG RED RETROSPOT}"
## [4] "{LUNCH BAG BLACK SKULL.}"
## [5] "{LUNCH BAG CARS BLUE}"
## [6] "{LUNCH BAG SPACEBOY DESIGN}"
## [7] "{LUNCH BAG SUKI DESIGN}"
## [8] "{JUMBO BAG PINK POLKADOT}"
## [9] "{LUNCH BAG PINK POLKADOT}"
## [10] "{WOODEN FRAME ANTIQUE WHITE}"
## [11] "{LUNCH BAG WOODLAND}"
## [12] "{WOODEN PICTURE FRAME WHITE FINISH}"
## [13] "{JUMBO BAG VINTAGE LEAF}"
## [14] "{ALARM CLOCK BAKELIKE GREEN}"
## [15] "{ALARM CLOCK BAKELIKE RED}"
## [16] "{JUMBO BAG APPLES}"
## [17] "{ROSES REGENCY TEACUP AND SAUCER}"
## [18] "{GREEN REGENCY TEACUP AND SAUCER}"
## [19] "{PINK REGENCY TEACUP AND SAUCER}"
## [20] "{JUMBO BAG PEARS}"
## [21] "{WOODEN HEART CHRISTMAS SCANDINAVIAN}"
## [22] "{WOODEN TREE CHRISTMAS SCANDINAVIAN}"
## [23] "{WOODEN STAR CHRISTMAS SCANDINAVIAN}"
The interactive network plot (first 100 rules) reveals a few main clusters of related items. Most rules fall within the lunch bag network, which connects to the Jumbo Bag cluster. Other groups consist of alarm clocks, picture frames, wooden decorations and teacup sets.
This section explores how consumer behavior changes in November and December compared to the rest of the year. Splitting the dataset makes it possible to observe shifts in product popularity and transaction characteristics. First, the InvoiceDate variable was converted to the R Date format, and then the Month variable was created. The data was split into two subsets, and each was transformed into a transaction matrix.
The holiday season contains 4,383 transactions, while the rest of the year contains 12,263. The number of unique items is similar in both subsets (3,344 and 3,656), and the matrix density is higher by about 0.0009 in November and December. The most popular items during this season are, as expected, related to Christmas (PAPER CHAIN KIT 50’S CHRISTMAS and PAPER CHAIN KIT VINTAGE CHRISTMAS).
#christmas vs rest of the year
# InvoiceDate as date
df_uk$InvoiceDate <- as.Date(df_uk$InvoiceDate)
df_uk$Month <- as.numeric(format(df_uk$InvoiceDate, "%m"))
# Nov-Dec subset
df_xmas <- subset(df_uk, Month %in% c(11, 12))
trans_xmas_list <- split(df_xmas$Description, df_xmas$InvoiceNo)
groceries_xmas <- as(trans_xmas_list, "transactions")
# Rest of the year subset
df_rest <- subset(df_uk, !(Month %in% c(11, 12)))
trans_rest_list <- split(df_rest$Description, df_rest$InvoiceNo)
groceries_rest <- as(trans_rest_list, "transactions")
summary(groceries_xmas)
## transactions as itemMatrix in sparse format with
## 4383 rows (elements/itemsets/transactions) and
## 3344 columns (items) and a density of 0.006442696
##
## most frequent items:
## PAPER CHAIN KIT 50'S CHRISTMAS RABBIT NIGHT LIGHT
## 556 457
## WHITE HANGING HEART T-LIGHT HOLDER PAPER CHAIN KIT VINTAGE CHRISTMAS
## 454 397
## CHOCOLATE HOT WATER BOTTLE (Other)
## 349 92216
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 338 197 155 133 163 151 155 139 154 125 122 125 102 130 140 116 116 103 101 94
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 82 82 65 54 47 49 52 52 42 51 43 43 42 37 23 29 28 29 25 29
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 25 18 15 25 30 24 13 21 24 21 12 11 17 20 17 13 10 7 14 18
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
## 10 9 10 9 7 11 5 11 4 10 8 8 6 4 5 4 5 4 4 4
## 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
## 4 3 3 8 4 3 2 3 2 5 3 4 2 3 2 3 1 1 1 4
## 101 102 103 104 106 108 109 111 112 113 114 115 116 117 118 120 121 122 123 126
## 3 1 2 3 1 2 1 1 2 1 3 3 4 1 2 2 1 1 1 2
## 127 128 131 132 134 135 138 139 141 144 145 146 148 149 156 175 176 180 184 193
## 1 1 1 2 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1 1
## 204 208 375 439 525 529 541
## 1 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 6.00 15.00 21.54 27.00 541.00
##
## includes extended item information - examples:
## labels
## 1 10 COLOUR SPACEBOY PEN
## 2 12 COLOURED PARTY BALLOONS
## 3 12 DAISY PEGS IN WOOD BOX
##
## includes extended transaction information - examples:
## transactionID
## 1 536365
## 2 536366
## 3 536367
summary(groceries_rest)
## transactions as itemMatrix in sparse format with
## 12263 rows (elements/itemsets/transactions) and
## 3656 columns (items) and a density of 0.005572704
##
## most frequent items:
## WHITE HANGING HEART T-LIGHT HOLDER PARTY BUNTING
## 1430 1187
## JUMBO BAG RED RETROSPOT REGENCY CAKESTAND 3 TIER
## 1140 1116
## ASSORTED COLOUR BIRD ORNAMENT (Other)
## 980 243991
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
## 935 482 439 437 436 399 390 403 392 342 354 313 328 337 345 373 296 299 324 286
## 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40
## 264 212 240 212 171 182 160 169 199 138 120 110 100 114 93 76 84 73 91 75
## 41 42 43 44 45 46 47 48 49 50 51 52 53 54 55 56 57 58 59 60
## 78 68 66 56 44 56 51 45 47 49 32 39 41 41 42 24 39 26 15 28
## 61 62 63 64 65 66 67 68 69 70 71 72 73 74 75 76 77 78 79 80
## 20 9 24 24 22 22 19 19 15 18 19 9 13 20 18 13 11 15 5 6
## 81 82 83 84 85 86 87 88 89 90 91 92 93 94 95 96 97 98 99 100
## 9 13 11 11 9 11 7 10 7 2 6 8 8 3 1 5 6 9 2 5
## 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120
## 3 2 4 4 2 1 6 1 1 4 3 1 2 3 3 3 1 2 4 2
## 121 122 123 124 125 127 129 130 131 134 136 137 138 140 141 142 145 146 149 150
## 4 6 1 4 1 4 1 2 1 1 2 1 1 1 3 1 1 1 1 1
## 151 153 154 157 163 165 169 176 178 179 181 187 192 195 202 210 227 249 262 270
## 1 1 1 1 1 1 2 1 1 1 1 1 1 1 1 1 1 1 1 1
## 280 333 347 352 363 386 419 434
## 1 1 1 1 1 1 1 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 6.00 15.00 20.37 27.00 434.00
##
## includes extended item information - examples:
## labels
## 1 10 COLOUR SPACEBOY PEN
## 2 12 COLOURED PARTY BALLOONS
## 3 12 DAISY PEGS IN WOOD BOX
##
## includes extended transaction information - examples:
## transactionID
## 1 539993
## 2 540001
## 3 540002
The Apriori algorithm was run with the same parameters as before (support = 0.01, confidence = 0.5, minlen = 2) to compare the rules generated during the peak holiday season with those from the rest of the year. The method yielded 294 rules for the Christmas season and 469 for the rest of the year.
# xmas rules
rules_xmas <- apriori(groceries_xmas,
                      parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 43
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3344 item(s), 4383 transaction(s)] done [0.01s].
## sorting and recoding items ... [646 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [294 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## set of 294 rules
# rest of the year rules
rules_rest <- apriori(groceries_rest,
                      parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 2
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 122
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3656 item(s), 12263 transaction(s)] done [0.04s].
## sorting and recoding items ... [607 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.01s].
## writing ... [469 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## set of 469 rules
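Because the support threshold is relative, the same value of 0.01 translates into a different absolute count in each subset. The sketch below reproduces that arithmetic from the transaction counts in the logs above. It assumes the logged "Absolute minimum support count" is the product rounded down, which matches the reported values of 43 and 122 but is not confirmed from the package source.

```r
# Transaction counts copied from the two Apriori logs above
n_transactions <- c(xmas = 4383, rest = 12263)
min_support <- 0.01

# Assumption: the absolute threshold is the relative one times n, rounded down
abs_min_count <- floor(min_support * n_transactions)
abs_min_count
```

An itemset therefore needs roughly 43 supporting transactions in the Christmas subset but 122 in the rest of the year, which is worth keeping in mind when comparing the raw rule counts of the two periods.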
The interactive tables for both subsets are presented below. When sorted by lift, the table of holiday rules displays a number of Christmas-related items, while the rest-of-the-year table shows top product relationships similar to those found earlier for the whole dataset.
The scatterplots below compare the strength and distribution of the holiday rules with those of the rest of the year.
# xmas plot
p_xmas <- plot(rules_xmas,
               method = "scatterplot",
               measure = c("support", "lift"),
               shading = "confidence",
               main = "November and December")
# rest plot
p_rest <- plot(rules_rest,
               method = "scatterplot",
               measure = c("support", "lift"),
               shading = "confidence",
               main = "Rest of the year")
# grid.arrange() comes from the gridExtra package
grid.arrange(p_xmas, p_rest, ncol = 2)
A keyword frequency analysis for the word Christmas was performed on the association rules for both periods. Counting the rules whose labels contain the word revealed a clear contrast in thematic focus. The Christmas period produced 43 rules involving Christmas-themed products (out of 294 total), while the January through October period produced only 4 (out of 469). That’s 14.6% compared to 0.9%.
# count the rules whose label contains 'CHRISTMAS'
count_christmas <- function(rules_object) {
  if (length(rules_object) == 0) return(0)
  # labels() gives one "{lhs} => {rhs}" string per rule
  rules_text <- labels(rules_object)
  count <- sum(grepl("CHRISTMAS", rules_text))
  return(count)
}
# apply to both rule sets
xmas_rules_count <- count_christmas(rules_xmas)
rest_rules_count <- count_christmas(rules_rest)
xmas_rules_count
## [1] 43
rest_rules_count
## [1] 4
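The shares quoted for the two periods follow directly from these counts. A minimal sketch of the calculation, using only the rule counts reported above (the data frame and its column names are illustrative, not part of the original analysis):

```r
# Rule counts taken from the outputs above
rule_counts <- data.frame(
  period          = c("Christmas season", "Rest of the year"),
  christmas_rules = c(43, 4),
  total_rules     = c(294, 469)
)

# Share of rules mentioning CHRISTMAS, in percent, rounded to one decimal
rule_counts$share_pct <- round(
  100 * rule_counts$christmas_rules / rule_counts$total_rules, 1
)
rule_counts
```

Note that this measures the share of rules whose label mentions the keyword, not the share of transactions or revenue.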
The final step of the analysis uses the ECLAT (Equivalence Class Transformation) algorithm to identify the most frequent item bundles within the UK market. Unlike the Apriori workflow above, which generates ‘if-then’ rules, ECLAT is used here to mine frequent itemsets only and rank them by support, that is, by how often the items are purchased together. It works on a vertical data layout, in which each item is stored together with the list of transactions that contain it. The results show that the most common baskets involve thematic series. With a minimum support of 0.01, 355 itemsets were found. The most common basket contains two variants of jumbo bags (support = 0.0304), and the second most common contains two lunch bags (0.0291). Wooden frames and the Regency sets also rank highly.
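To illustrate the vertical layout, the following base R sketch builds transaction-ID lists for a toy basket set and computes the support of a pair by intersecting two lists. The transactions and item names are made up for illustration, and the code only demonstrates the idea; it is not the implementation used by the `arules` package.

```r
# Toy transactions (hypothetical data, horizontal layout)
transactions <- list(
  t1 = c("bag_red", "bag_pink", "teacup"),
  t2 = c("bag_red", "bag_pink"),
  t3 = c("bag_red", "teacup"),
  t4 = c("teacup")
)

# Vertical layout: for every item, the ids of the transactions containing it
items <- sort(unique(unlist(transactions)))
tidlists <- lapply(setNames(items, items), function(item) {
  which(vapply(transactions, function(tx) item %in% tx, logical(1)))
})

# Support of a pair = size of the tid-list intersection / number of transactions
pair_support <- function(a, b) {
  length(intersect(tidlists[[a]], tidlists[[b]])) / length(transactions)
}

pair_support("bag_pink", "bag_red")
```

Once the lists are built, the support of any larger itemset can be obtained by further intersections, without scanning the original transactions again.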
The plot below displays an interactive map of the top 15 itemsets, which are moderately similar to the Apriori results.
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.01 2 5 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 166
##
## create itemset ...
## set transactions ...[3833 item(s), 16646 transaction(s)] done [0.05s].
## sorting and recoding items ... [615 item(s)] done [0.00s].
## creating sparse bit matrix ... [615 row(s), 16646 column(s)] done [0.00s].
## writing ... [355 set(s)] done [0.28s].
## Creating S4 object ... done [0.00s].
## set of 355 itemsets
## items support count
## [1] {JUMBO BAG PINK POLKADOT,
## JUMBO BAG RED RETROSPOT} 0.03039769 506
## [2] {LUNCH BAG BLACK SKULL.,
## LUNCH BAG RED RETROSPOT} 0.02907605 484
## [3] {GREEN REGENCY TEACUP AND SAUCER,
## ROSES REGENCY TEACUP AND SAUCER} 0.02859546 476
## [4] {LUNCH BAG PINK POLKADOT,
## LUNCH BAG RED RETROSPOT} 0.02829509 471
## [5] {WOODEN FRAME ANTIQUE WHITE,
## WOODEN PICTURE FRAME WHITE FINISH} 0.02751412 458
## [6] {GARDENERS KNEELING PAD CUP OF TEA,
## GARDENERS KNEELING PAD KEEP CALM} 0.02751412 458
## [7] {ALARM CLOCK BAKELIKE GREEN,
## ALARM CLOCK BAKELIKE RED} 0.02727382 454
## [8] {LUNCH BAG BLACK SKULL.,
## LUNCH BAG PINK POLKADOT} 0.02655293 442
## [9] {PAPER CHAIN KIT 50'S CHRISTMAS,
## PAPER CHAIN KIT VINTAGE CHRISTMAS} 0.02625255 437
## [10] {RED HANGING HEART T-LIGHT HOLDER,
## WHITE HANGING HEART T-LIGHT HOLDER} 0.02571188 428
## [11] {LUNCH BAG CARS BLUE,
## LUNCH BAG RED RETROSPOT} 0.02481077 413
## [12] {LUNCH BAG RED RETROSPOT,
## LUNCH BAG SUKI DESIGN} 0.02463054 410
## [13] {LUNCH BAG RED RETROSPOT,
## LUNCH BAG SPACEBOY DESIGN} 0.02463054 410
## [14] {JUMBO BAG RED RETROSPOT,
## JUMBO STORAGE BAG SUKI} 0.02451039 408
## [15] {GREEN REGENCY TEACUP AND SAUCER,
## PINK REGENCY TEACUP AND SAUCER} 0.02427009 404
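The call that produced this output is not shown in the report. A sketch of how such a result could be obtained with `arules::eclat()` is given below on a small made-up transaction set. The toy data and the 0.3 threshold are assumptions for illustration, while `minlen = 2` and `maxlen = 5` mirror the parameter listing above.

```r
library(arules)

# Toy transactions (hypothetical); the report uses the full retail data instead
toy <- as(list(
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B", "C")
), "transactions")

# Frequent itemsets with at least 2 items; the threshold is chosen for the toy data
itemsets <- eclat(toy,
                  parameter = list(support = 0.3, minlen = 2, maxlen = 5),
                  control = list(verbose = FALSE))

# Rank by support and show the top entries, as in the listing above
top_sets <- head(sort(itemsets, by = "support"), 15)
inspect(top_sets)
```

On the real data the same pattern would use the report's transactions object and `support = 0.01`.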
As in the previous part of the analysis, the search was repeated with a minimum length of 3, which produced 61 itemsets. The highest support (around 0.02) belongs to the Regency teacup and saucer collection, followed by a set of three lunch bag variants (around 0.018). A jumbo bag bundle also appears among the top 15.
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.01 3 5 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 166
##
## create itemset ...
## set transactions ...[3833 item(s), 16646 transaction(s)] done [0.05s].
## sorting and recoding items ... [615 item(s)] done [0.00s].
## creating sparse bit matrix ... [615 row(s), 16646 column(s)] done [0.00s].
## writing ... [61 set(s)] done [0.28s].
## Creating S4 object ... done [0.00s].
## set of 61 itemsets
## items support count
## [1] {GREEN REGENCY TEACUP AND SAUCER,
## PINK REGENCY TEACUP AND SAUCER,
## ROSES REGENCY TEACUP AND SAUCER} 0.02048540 341
## [2] {LUNCH BAG BLACK SKULL.,
## LUNCH BAG PINK POLKADOT,
## LUNCH BAG RED RETROSPOT} 0.01760183 293
## [3] {GREEN REGENCY TEACUP AND SAUCER,
## REGENCY CAKESTAND 3 TIER,
## ROSES REGENCY TEACUP AND SAUCER} 0.01591974 265
## [4] {LUNCH BAG CARS BLUE,
## LUNCH BAG PINK POLKADOT,
## LUNCH BAG RED RETROSPOT} 0.01525892 254
## [5] {LUNCH BAG BLACK SKULL.,
## LUNCH BAG CARS BLUE,
## LUNCH BAG PINK POLKADOT} 0.01501862 250
## [6] {LUNCH BAG BLACK SKULL.,
## LUNCH BAG CARS BLUE,
## LUNCH BAG RED RETROSPOT} 0.01483840 247
## [7] {LUNCH BAG BLACK SKULL.,
## LUNCH BAG RED RETROSPOT,
## LUNCH BAG SUKI DESIGN} 0.01447795 241
## [8] {LUNCH BAG BLACK SKULL.,
## LUNCH BAG RED RETROSPOT,
## LUNCH BAG SPACEBOY DESIGN} 0.01423765 237
## [9] {LUNCH BAG PINK POLKADOT,
## LUNCH BAG RED RETROSPOT,
## LUNCH BAG SUKI DESIGN} 0.01393728 232
## [10] {GREEN REGENCY TEACUP AND SAUCER,
## PINK REGENCY TEACUP AND SAUCER,
## REGENCY CAKESTAND 3 TIER} 0.01381713 230
## [11] {LUNCH BAG PINK POLKADOT,
## LUNCH BAG RED RETROSPOT,
## LUNCH BAG SPACEBOY DESIGN} 0.01375706 229
## [12] {PINK REGENCY TEACUP AND SAUCER,
## REGENCY CAKESTAND 3 TIER,
## ROSES REGENCY TEACUP AND SAUCER} 0.01351676 225
## [13] {JUMBO BAG PINK POLKADOT,
## JUMBO BAG RED RETROSPOT,
## JUMBO BAG STRAWBERRY} 0.01351676 225
## [14] {LUNCH BAG RED RETROSPOT,
## LUNCH BAG SPACEBOY DESIGN,
## LUNCH BAG WOODLAND} 0.01339661 223
## [15] {LUNCH BAG CARS BLUE,
## LUNCH BAG RED RETROSPOT,
## LUNCH BAG SUKI DESIGN} 0.01297609 216
The market basket analysis conducted on the UK online retail store provides a comprehensive map of consumer behavior, starting with Apriori ‘if-then’ rules, moving on to the seasonal and multi-item analyses, and finally comparing the results with those of the ECLAT algorithm.
A key strategic finding is the power of collections: customers show a strong drive to complete thematic sets. This is possibly due to the wholesale nature of the store’s client base, and it is most evident in the Regency and the Scandinavian wooden series. Certain high-support products, such as lunch bags or jumbo bags, serve as major sales anchors, triggering the most follow-on purchases. The seasonal analysis showed a marked shift in the market during the holiday season, when Christmas-themed rules represented around 15% of all associations.
Based on these findings, the main recommendation for the retailer is to consider moving from selling individual items to offering pre-packaged thematic bundles, especially for the jumbo bags, lunch bags and the Regency tea sets. In addition, the high confidence of some rules suggests that the site’s recommendation algorithm should automatically suggest related items to maximize cross-selling.
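As a rough illustration of the cross-selling idea, the base R sketch below looks up suggestions for a basket in a table of rules. The rule table, item names and confidence values are entirely hypothetical, and the function `suggest_items()` is not part of the original analysis. It also simplifies by assuming single-item antecedents.

```r
# Hypothetical rule table; in practice it would be derived from the mined rules
rules <- data.frame(
  lhs        = c("item_a", "item_a", "item_b"),
  rhs        = c("item_b", "item_c", "item_c"),
  confidence = c(0.80, 0.55, 0.70),
  stringsAsFactors = FALSE
)

# Suggest consequents of rules triggered by the basket, skipping items
# already in the basket and rules below the confidence cut-off
suggest_items <- function(basket, rules, min_conf = 0.6) {
  hits <- rules[rules$lhs %in% basket &
                  !(rules$rhs %in% basket) &
                  rules$confidence >= min_conf, ]
  hits <- hits[order(-hits$confidence), ]
  unique(hits$rhs)
}

suggestions <- suggest_items("item_a", rules)
suggestions
```

The confidence cut-off `min_conf` is an arbitrary choice here and would need tuning against real purchase data.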