1. Introduction

Market Basket Analysis is an unsupervised learning technique based on association rules, which identify relationships between products that are purchased together. In this project the Apriori and ECLAT algorithms are applied to a large dataset to analyse these patterns. Apriori operates on a horizontal database layout and scans it level by level, whereas ECLAT works on a vertical layout and finds frequent itemsets by intersecting the lists of transaction IDs that contain each item. The resulting rules are evaluated with three metrics: Support (how often an itemset occurs), Confidence (the conditional probability of the consequent given the antecedent) and Lift (the strength of the association relative to independence).
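
To make the difference between the two data layouts concrete, the following minimal sketch uses a handful of made-up baskets (the item names and transaction IDs are hypothetical, not taken from the retail data). It converts a horizontal layout (transaction -> items) into a vertical one (item -> transaction IDs), the layout ECLAT is usually described as working on, and obtains the support of an item pair by intersecting two ID lists.

```r
# Toy baskets (hypothetical data): horizontal layout, one entry per transaction
horizontal <- list(
  T1 = c("tea", "cup"),
  T2 = c("tea", "cup", "plate"),
  T3 = c("cup", "bag"),
  T4 = c("tea", "plate")
)

# Vertical layout: for every item, the IDs of the transactions that contain it
long <- stack(horizontal)  # columns: values (item), ind (transaction ID)
vertical <- split(as.character(long$ind), long$values)

# Support of {tea, cup} via intersection of the two transaction-ID lists
tids_tea_cup <- intersect(vertical$tea, vertical$cup)
support_tea_cup <- length(tids_tea_cup) / length(horizontal)

print(vertical)
print(support_tea_cup)
```

A level-wise approach would instead scan all four baskets and count those that contain both items; both routes give the same support.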

The main objective of this study is to analyze the transactions of a UK-based online retail store to uncover structure in customer behavior. I aim to find the strongest rules reflecting shopping habits and to identify the top-selling product bundles. I also compare the top rules in the Christmas season to those in the rest of the year and explore rules with three or more items. The results may be useful for retailers looking to optimize the customer experience and increase the average transaction value, for example through product bundling.

2. Database

The data set used in this study was sourced from the UCI Machine Learning Repository (https://archive.ics.uci.edu/dataset/352/online+retail). It is a transactional dataset from a UK-based online retailer. The store specializes in all-occasion gifts, and a significant portion of its customer base consists of wholesalers rather than individual customers. The dataset covers all transactions between 01/12/2010 and 09/12/2011. The data was imported directly from the UCI repository via URL in its original .xlsx format. The raw data contains 541,909 entries and 8 columns. The variables and their descriptions are presented below.

library(readxl)  # provides read_excel()

url <- "https://archive.ics.uci.edu/ml/machine-learning-databases/00352/Online%20Retail.xlsx"
destfile <- "Online_Retail.xlsx"
# Download the file only if it is not already present locally
if (!file.exists(destfile)) {
  download.file(url, destfile, mode = "wb")
}

df <- read_excel(destfile)

Variable Definitions

Variable Name   Description
InvoiceNo       A unique 6-digit identifier for each transaction. Codes starting with ‘C’ indicate cancellations.
StockCode       A unique 5-digit integral number assigned to each distinct product.
Description     The nominal name of the product or service.
Quantity        The quantities of each product (item) per transaction.
InvoiceDate     The day and time when each transaction was generated.
UnitPrice       Product price per unit in sterling (£).
CustomerID      A unique 5-digit identifier assigned to each regular customer.
Country         The name of the country where each customer resides.

Preprocessing

Before conducting the market basket analysis, the data had to be cleaned and prepared so that the association rules reflect actual purchases. First, records with a missing Description or CustomerID were deleted. Then, transactions with an invoice number starting with “C” (cancellations) were removed, as those orders do not reflect actual purchase associations. The dataset was also restricted to records with Quantity > 0 and UnitPrice > 0, and surrounding spaces were trimmed from Description. Finally, the analysis was narrowed down to the United Kingdom, the country with the largest share of customers, which provides a homogeneous sample and keeps the computation manageable.

# Deleting NAs
df_clean <- df[complete.cases(df$Description, df$CustomerID), ]
# Deleting cancelled transactions (InvoiceNo starting with "C"; "^" anchors the match to the start)
df_clean <- df_clean[!grepl("^C", df_clean$InvoiceNo), ]
# leaving transactions with quantity>0 and unit price>0
df_clean <- df_clean[df_clean$Quantity > 0 & df_clean$UnitPrice > 0, ]
#Trimming spaces from description
df_clean$Description <- trimws(df_clean$Description)

#UK only
df_uk <- df_clean[df_clean$Country == "United Kingdom", ]

Initial data analysis

Before running the association rule algorithms, the 10 products that occur in the most transactions were plotted. The “White Hanging Heart T-Light Holder” clearly dominates, appearing in the highest number of transactions. “Jumbo Bag Red Retrospot” and “Regency Cakestand 3 Tier” are the second and third most popular.

ggplot(data.frame(sort(table(df_uk$Description), decreasing = TRUE)[1:10]), 
       aes(x = reorder(Var1, Freq), y = Freq)) +
  geom_bar(stat = "identity", fill = "brown") +
  coord_flip() +
  labs(title = "Top 10 products", x = "Product", y = "Number of occurrences") +
  theme_minimal()

Furthermore, the number of products per invoice gives insight into the basket size distribution. The largest invoice contains 542 product lines, while the mean is 21.29 and the median is 15. Such large baskets are plausible because a significant share of the customers are wholesalers. This is beneficial for market basket analysis, as large transactions provide many opportunities to identify co-occurrence patterns between multiple items.

# Number of products in a basket
items_per_invoice <- aggregate(Description ~ InvoiceNo, data = df_uk, length)
summary(items_per_invoice$Description)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    6.00   15.00   21.29   27.00  542.00
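
The summary above counts invoice lines, and its mean (21.29) and maximum (542) are slightly higher than the figures reported later for the transactions object (20.68 and 541). One plausible explanation, stated here as an assumption rather than something verified on the data, is that some invoices list the same description more than once. The sketch below uses a made-up data frame to show how counting lines and counting distinct products can differ.

```r
# Toy invoice lines (hypothetical data); invoice 1002 lists "TEA CUP" twice
toy <- data.frame(
  InvoiceNo   = c("1001", "1001", "1002", "1002", "1002"),
  Description = c("TEA CUP", "TEA PLATE", "TEA CUP", "TEA CUP", "LUNCH BAG"),
  stringsAsFactors = FALSE
)

# Number of invoice lines per invoice (what length() counts)
lines_per_invoice <- aggregate(Description ~ InvoiceNo, data = toy, FUN = length)

# Number of distinct products per invoice
distinct_per_invoice <- aggregate(
  Description ~ InvoiceNo, data = toy,
  FUN = function(x) length(unique(x))
)

print(lines_per_invoice)
print(distinct_per_invoice)
```

For invoice 1002 the first count is 3 and the second is 2, which is the kind of gap that would separate the two summaries.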

3. Apriori algorithm

The Apriori algorithm is a foundational technique for association rule mining, designed to operate on large transaction databases. It employs a bottom-up approach based on the Apriori principle, which states that any subset of a frequent itemset must also be frequent. It first identifies frequent individual items in the database and then extends them to larger itemsets as long as these meet a minimum support threshold. In this analysis, the generated rules are evaluated based on three key metrics:

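  • Support - the share of all transactions that contain the itemset
  • Confidence - the conditional probability P(consequent|antecedent), i.e. the share of transactions containing the antecedent that also contain the consequent
  • Lift - the strength of the association, measured as the ratio of the observed co-occurrence frequency to the frequency expected if the items were independent. Values above 1 indicate that the items appear together more often than expected.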
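
The bottom-up idea behind the Apriori principle can be illustrated with a minimal base R sketch. The baskets, the item names and the threshold `min_count` are hypothetical, and the code only covers the first two levels; it is not how the arules package implements the algorithm. The point is that pair candidates are built only from items that are already frequent on their own.

```r
# Toy transactions (hypothetical data, not from the retail dataset)
baskets <- list(
  c("tea", "cup", "plate"),
  c("tea", "cup"),
  c("tea", "plate"),
  c("cup", "bag"),
  c("tea", "cup", "bag")
)
min_count <- 3  # hypothetical minimum support count

# Support count of an itemset: number of baskets containing all of its items
count_of <- function(itemset) {
  sum(vapply(baskets, function(b) all(itemset %in% b), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(baskets)))
frequent_1 <- items[vapply(items, count_of, integer(1)) >= min_count]

# Level 2: candidate pairs are built from frequent single items only,
# because a pair containing an infrequent item cannot be frequent
candidates_2 <- combn(frequent_1, 2, simplify = FALSE)
frequent_2 <- Filter(function(s) count_of(s) >= min_count, candidates_2)

print(frequent_1)
print(frequent_2)
```

Here "plate" and "bag" are dropped at level 1, so no pair involving them is ever counted, which is what keeps the search tractable on thousands of products.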

Data preparation

To prepare the cleaned data for the Apriori algorithm, the dataset was transformed into a sparse transaction matrix. First, the product descriptions were grouped by invoice number. The grouped lists were then converted into a transactions object, the format used by the arules package. For clarity, the data was also represented as a binary matrix, where each row is an individual transaction and each column a product, with values 1 (purchased) or 0 (not purchased). The matrix consists of 16646 transactions and 3833 unique items. The density of 0.54% indicates a highly sparse matrix: only 0.54% of all transaction-item cells are non-zero. The most frequent item, “White Hanging Heart T-Light Holder”, occurs 1884 times.

# Grouping by invoices and creating a sparse matrix
transactions_list <- split(df_uk$Description, df_uk$InvoiceNo)
groceries <- as(transactions_list, "transactions")
summary(groceries)
## transactions as itemMatrix in sparse format with
##  16646 rows (elements/itemsets/transactions) and
##  3833 columns (items) and a density of 0.00539578 
## 
## most frequent items:
## WHITE HANGING HEART T-LIGHT HOLDER            JUMBO BAG RED RETROSPOT 
##                               1884                               1447 
##           REGENCY CAKESTAND 3 TIER      ASSORTED COLOUR BIRD ORNAMENT 
##                               1410                               1300 
##                      PARTY BUNTING                            (Other) 
##                               1290                             336942 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 1273  679  594  570  599  550  545  542  546  467  476  438  430  467  485  489 
##   17   18   19   20   21   22   23   24   25   26   27   28   29   30   31   32 
##  412  402  425  380  346  294  305  266  218  231  212  221  241  189  163  153 
##   33   34   35   36   37   38   39   40   41   42   43   44   45   46   47   48 
##  142  151  116  105  112  102  116  104  103   86   81   81   74   80   64   66 
##   49   50   51   52   53   54   55   56   57   58   59   60   61   62   63   64 
##   71   70   44   50   58   61   59   37   49   33   29   46   30   18   34   33 
##   65   66   67   68   69   70   71   72   73   74   75   76   77   78   79   80 
##   29   33   24   30   19   28   27   17   19   24   23   17   16   19    9   10 
##   81   82   83   84   85   86   87   88   89   90   91   92   93   94   95   96 
##   13   16   14   19   13   14    9   13    9    7    9   12   10    6    3    8 
##   97   98   99  100  101  102  103  104  105  106  107  108  109  110  111  112 
##    7   10    3    9    6    3    6    7    2    2    6    3    2    4    4    3 
##  113  114  115  116  117  118  119  120  121  122  123  124  125  126  127  128 
##    3    6    6    7    2    4    4    4    5    7    2    4    1    2    5    1 
##  129  130  131  132  134  135  136  137  138  139  140  141  142  144  145  146 
##    1    2    2    2    2    2    2    1    2    1    1    4    1    1    2    2 
##  148  149  150  151  153  154  156  157  163  165  169  175  176  178  179  180 
##    1    2    1    1    1    1    1    1    1    1    2    1    2    1    1    1 
##  181  184  187  192  193  195  202  204  208  210  227  249  262  270  280  333 
##    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1    1 
##  347  352  363  375  386  419  434  439  525  529  541 
##    1    1    1    1    1    1    1    1    1    1    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    6.00   15.00   20.68   27.00  541.00 
## 
## includes extended item information - examples:
##                       labels
## 1     10 COLOUR SPACEBOY PEN
## 2 12 COLOURED PARTY BALLOONS
## 3  12 DAISY PEGS IN WOOD BOX
## 
## includes extended transaction information - examples:
##   transactionID
## 1        536365
## 2        536366
## 3        536367
#binary dataframe
df_binary <- as(groceries, "matrix")
df_binary <- as.data.frame(df_binary)
df_binary[] <- lapply(df_binary, as.integer)
head(df_binary)
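
As an illustration of what the density in the summary measures, the sketch below builds a small binary matrix by hand from made-up baskets (the invoice numbers and item names are only placeholders) and computes the share of cells equal to 1. It is a toy reconstruction in base R, not the arules internals.

```r
# Toy baskets (hypothetical data)
baskets <- list(
  "536365" = c("TEA CUP", "TEA PLATE"),
  "536366" = c("TEA CUP"),
  "536367" = c("TEA CUP", "LUNCH BAG", "TEA PLATE")
)

items <- sort(unique(unlist(baskets)))

# One row per transaction, one column per item: 1 = purchased, 0 = not purchased
binary <- t(vapply(baskets, function(b) as.integer(items %in% b),
                   integer(length(items))))
colnames(binary) <- items

# Density: share of cells in the matrix that are equal to 1
density <- sum(binary) / (nrow(binary) * ncol(binary))

# Density multiplied by the number of items gives the mean basket size
mean_basket_size <- density * ncol(binary)

print(binary)
print(c(density = density, mean_basket_size = mean_basket_size))
```

The same relation holds for the reported figures: 0.00539578 × 3833 items ≈ 20.68, the mean transaction size shown in the summary.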

Apriori

The Apriori algorithm identified 258 rules that meet the chosen thresholds (support = 0.01, confidence = 0.5). The 1% support threshold means that the items of a rule must appear together in at least 166 transactions. Given the vast number of products, a 1% support is low enough to capture specialized product clusters. The confidence threshold of 0.5 restricts the output to strong rules: in at least 50% of the transactions that contain product A, product B is also in the basket. The rules are displayed in an interactive table with inspectDT(), which relies on the DT (DataTables) library.

rules_uk <- apriori(groceries, parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 166 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3833 item(s), 16646 transaction(s)] done [0.05s].
## sorting and recoding items ... [615 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [258 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
print(rules_uk)
## set of 258 rules
inspectDT(rules_uk)

Top 10 rules

The top 10 rules by each measure are presented below.

By lift

The rules with the highest lift involve herb markers, with a lift of 86.829. A basket that contains the thyme marker is over 86 times more likely to also contain the rosemary marker than it would be if the two products were purchased independently. The other rules in the top ten involve tea set items, a toy collection and a decoration set.

Top 10 Association Rules by Lift (Rounded)
rules support confidence coverage lift count
{HERB MARKER THYME} => {HERB MARKER ROSEMARY} 0.010 0.944 0.011 86.829 169
{HERB MARKER ROSEMARY} => {HERB MARKER THYME} 0.010 0.934 0.011 86.829 169
{REGENCY TEA PLATE GREEN} => {REGENCY TEA PLATE ROSES} 0.012 0.846 0.014 52.930 192
{REGENCY TEA PLATE ROSES} => {REGENCY TEA PLATE GREEN} 0.012 0.722 0.016 52.930 192
{POPPY’S PLAYHOUSE LIVINGROOM} => {POPPY’S PLAYHOUSE BEDROOM} 0.010 0.809 0.013 51.770 169
{POPPY’S PLAYHOUSE BEDROOM} => {POPPY’S PLAYHOUSE LIVINGROOM} 0.010 0.650 0.016 51.770 169
{SET OF 3 WOODEN STOCKING DECORATION} => {SET OF 3 WOODEN TREE DECORATIONS} 0.010 0.691 0.015 50.212 172
{SET OF 3 WOODEN TREE DECORATIONS} => {SET OF 3 WOODEN STOCKING DECORATION} 0.010 0.751 0.014 50.212 172
{POPPY’S PLAYHOUSE LIVINGROOM} => {POPPY’S PLAYHOUSE KITCHEN} 0.011 0.852 0.013 49.226 178
{POPPY’S PLAYHOUSE KITCHEN} => {POPPY’S PLAYHOUSE LIVINGROOM} 0.011 0.618 0.017 49.226 178
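
The three measures in the first two rows of the table can be reproduced by hand. In the sketch below, the number of transactions (16646) and the joint count (169) come from the output above; the individual item counts `n_thyme` and `n_rosemary` are assumptions inferred from the reported confidences, not values read from the data.

```r
n_transactions <- 16646  # from the transaction summary
n_both         <- 169    # count reported for both herb marker rules

# Assumption: single-item counts inferred from the reported confidences
n_thyme    <- 179
n_rosemary <- 181

# Support: share of transactions containing both items
support <- n_both / n_transactions

# Confidence depends on the direction of the rule
conf_thyme_to_rosemary <- n_both / n_thyme
conf_rosemary_to_thyme <- n_both / n_rosemary

# Lift is symmetric: observed joint share over the share expected under independence
lift <- support / ((n_thyme / n_transactions) * (n_rosemary / n_transactions))

print(round(c(support = support,
              conf_thyme_to_rosemary = conf_thyme_to_rosemary,
              conf_rosemary_to_thyme = conf_rosemary_to_thyme,
              lift = lift), 3))
```

Under these assumed counts the rounded results match the table (0.010, 0.944, 0.934 and 86.829). The example also shows why the two directions of a rule share the same support and lift but differ in confidence.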

By support

The top 10 rules by support represent the most common purchase patterns across the entire dataset, that is, the bundles that appear in the largest number of transactions. All rules in the top 10 have a support of roughly 3% (0.027 to 0.030). They show that jumbo bags in different designs often end up in the same basket, and the same holds for teacups and saucers, lunch bags, picture frames, kneeling pads and alarm clocks.

Top 10 Most Frequent Association Rules (by Support)
rules support confidence coverage lift count
{JUMBO BAG PINK POLKADOT} => {JUMBO BAG RED RETROSPOT} 0.030 0.623 0.049 7.169 506
{GREEN REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} 0.029 0.778 0.037 19.096 476
{ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.029 0.702 0.041 19.096 476
{LUNCH BAG PINK POLKADOT} => {LUNCH BAG RED RETROSPOT} 0.028 0.555 0.051 8.255 471
{WOODEN FRAME ANTIQUE WHITE} => {WOODEN PICTURE FRAME WHITE FINISH} 0.028 0.584 0.047 11.387 458
{WOODEN PICTURE FRAME WHITE FINISH} => {WOODEN FRAME ANTIQUE WHITE} 0.028 0.536 0.051 11.387 458
{GARDENERS KNEELING PAD CUP OF TEA} => {GARDENERS KNEELING PAD KEEP CALM} 0.028 0.730 0.038 16.387 458
{GARDENERS KNEELING PAD KEEP CALM} => {GARDENERS KNEELING PAD CUP OF TEA} 0.028 0.617 0.045 16.387 458
{ALARM CLOCK BAKELIKE GREEN} => {ALARM CLOCK BAKELIKE RED} 0.027 0.658 0.041 14.449 454
{ALARM CLOCK BAKELIKE RED} => {ALARM CLOCK BAKELIKE GREEN} 0.027 0.599 0.046 14.449 454

By confidence

Confidence is the probability that the consequent is purchased given that the antecedent is already in the basket. Again, the herb markers take the first and second spots, with a 94% chance that the rosemary herb marker is bought if the thyme marker is in the basket. Interestingly, six of the ten rules have more than one item in the antecedent. A customer with the Wooden Heart and the Wooden Tree is also likely to add the Wooden Star. The Regency series also appears frequently in rules with two or three antecedent items and a single consequent.

Top 10 Most Reliable Association Rules (by Confidence)
rules support confidence coverage lift count
{HERB MARKER THYME} => {HERB MARKER ROSEMARY} 0.010 0.944 0.011 86.829 169
{HERB MARKER ROSEMARY} => {HERB MARKER THYME} 0.010 0.934 0.011 86.829 169
{WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN} 0.010 0.929 0.011 38.562 170
{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.012 0.898 0.014 24.419 202
{PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.020 0.890 0.023 24.217 341
{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.012 0.878 0.014 21.563 202
{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {GREEN REGENCY TEACUP AND SAUCER} 0.014 0.868 0.016 23.607 230
{POPPY’S PLAYHOUSE LIVINGROOM} => {POPPY’S PLAYHOUSE KITCHEN} 0.011 0.852 0.013 49.226 178
{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.014 0.849 0.016 20.846 225
{REGENCY TEA PLATE GREEN} => {REGENCY TEA PLATE ROSES} 0.012 0.846 0.014 52.930 192

The scatterplot below gives an overview of the 258 generated association rules, mapping the relation between support (x axis), lift (y axis) and confidence (shading). Most rules are clustered in the bottom left corner, with support around 0.010-0.015 (the highest support of any rule is 0.030) and lift above 10.

plot(rules_uk, method = "scatterplot", measure = c("support", "lift"), shading = "confidence")

The network graph shows the relationships between individual products. The graph below displays the top 20 rules by lift and maps how specific items are connected. The blue nodes represent products and the circles represent association rules. The darker the color, the higher the confidence of a rule. Arrows indicate the direction from the antecedent to the consequent. We can observe some tightly coupled pairs as well as more complex structures, such as the Poppy's Playhouse series.

top20rules <- head(sort(rules_uk, by="lift"), 20)
plot(top20rules, method = "graph", engine = "htmlwidget")

The grouped plot groups antecedents (LHS) that share the same consequent (RHS), which helps to identify items that consistently trigger the purchase of a certain complementary item. The plot is restricted to rules whose antecedent is among the 10 most frequent antecedents. It reveals that lunch bags of different designs trigger the most consequent purchases, which are themselves other lunch bags.

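# Count how often each antecedent (LHS) appears across all rules
lhs_list <- labels(lhs(rules_uk))
counts <- sort(table(lhs_list), decreasing = TRUE)
# Keep only the rules whose antecedent is among the 10 most frequent ones
top_10_hub_labels <- names(head(counts, 10))
rules_to_plot <- rules_uk[lhs_list %in% top_10_hub_labels]
plot(rules_to_plot, method = "grouped")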

The parallel coordinates plot (method = "paracoord") shows the direct flow from the antecedent (Position 1) to the consequent (RHS) for the top 10 rules by lift.

subrules2 <- head(sort(rules_uk, by="lift"), 10)
plot(subrules2, method="paracoord")

Min. 3 element rules

Increasing the minlen parameter to 3 restricts the output to rules that contain at least three items in total (antecedent and consequent combined). A set of 142 rules was generated.

rules_multi <- apriori(groceries, parameter = list(support = 0.01, confidence = 0.5, minlen = 3))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      3
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 166 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3833 item(s), 16646 transaction(s)] done [0.05s].
## sorting and recoding items ... [615 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.01s].
## writing ... [142 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_multi
## set of 142 rules
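
The effect of minlen can be checked on a small example. The sketch below assumes the arules package is installed and uses made-up baskets and thresholds; it mines rules once with minlen = 2 and once with minlen = 3 and compares the rule sizes, where size() counts the items of the antecedent and consequent together.

```r
library(arules)

# Toy baskets (hypothetical data, not from the retail dataset)
toy <- list(
  c("A", "B", "C"),
  c("A", "B", "C"),
  c("A", "B"),
  c("A", "C"),
  c("B", "C"),
  c("A", "B", "C")
)
trans <- as(toy, "transactions")

params <- list(support = 0.3, confidence = 0.5)  # hypothetical thresholds
quiet  <- list(verbose = FALSE)

rules_min2 <- apriori(trans, parameter = c(params, minlen = 2), control = quiet)
rules_min3 <- apriori(trans, parameter = c(params, minlen = 3), control = quiet)

# Distribution of rule sizes (antecedent plus consequent) in the minlen = 2 run
print(table(size(rules_min2)))

# The minlen = 3 run keeps exactly the rules of size 3 or more
print(length(rules_min3) == sum(size(rules_min2) >= 3))
```

With identical support and confidence thresholds, the second run is therefore a subset of the first, so the same selection could also be obtained by filtering the existing rules on size().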

Top 10 rules

The top 10 rules by each measure for rules with three or more items are presented below. An interesting result is the dominance of the Regency set in all measures and the occurrence of the wooden Christmas Scandinavian collection in the top spots by lift and confidence.

By lift
Top 10 Association Rules by Lift
rules support confidence coverage lift count
{WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN} 0.010 0.929 0.011 38.562 170
{WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN STAR CHRISTMAS SCANDINAVIAN} => {WOODEN TREE CHRISTMAS SCANDINAVIAN} 0.010 0.567 0.018 38.035 170
{WOODEN STAR CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN HEART CHRISTMAS SCANDINAVIAN} 0.010 0.817 0.012 31.276 170
{JUMBO BAG APPLES,JUMBO BAG VINTAGE LEAF} => {JUMBO BAG PEARS} 0.010 0.668 0.016 26.223 173
{GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.012 0.762 0.016 25.738 202
{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.012 0.898 0.014 24.419 202
{PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.020 0.890 0.023 24.217 341
{GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {PINK REGENCY TEACUP AND SAUCER} 0.014 0.717 0.019 24.193 230
{GREEN REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.020 0.716 0.029 24.189 341
{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {GREEN REGENCY TEACUP AND SAUCER} 0.014 0.868 0.016 23.607 230
By support
Top 10 Most Frequent Association Rules (by Support)
rules support confidence coverage lift count
{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} 0.020 0.844 0.024 20.723 341
{PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.020 0.890 0.023 24.217 341
{GREEN REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {PINK REGENCY TEACUP AND SAUCER} 0.020 0.716 0.029 24.189 341
{LUNCH BAG BLACK SKULL.,LUNCH BAG PINK POLKADOT} => {LUNCH BAG RED RETROSPOT} 0.018 0.663 0.027 9.852 293
{LUNCH BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT} => {LUNCH BAG BLACK SKULL.} 0.018 0.622 0.028 10.397 293
{LUNCH BAG BLACK SKULL.,LUNCH BAG RED RETROSPOT} => {LUNCH BAG PINK POLKADOT} 0.018 0.605 0.029 11.883 293
{GREEN REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {REGENCY CAKESTAND 3 TIER} 0.016 0.557 0.029 6.572 265
{GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.016 0.826 0.019 20.268 265
{REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.016 0.753 0.021 20.477 265
{LUNCH BAG CARS BLUE,LUNCH BAG PINK POLKADOT} => {LUNCH BAG RED RETROSPOT} 0.015 0.641 0.024 9.533 254
By confidence
Top 10 Most Reliable Association Rules (by Confidence)
rules support confidence coverage lift count
{WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN STAR CHRISTMAS SCANDINAVIAN} 0.010 0.929 0.011 38.562 170
{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.012 0.898 0.014 24.419 202
{PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER} => {GREEN REGENCY TEACUP AND SAUCER} 0.020 0.890 0.023 24.217 341
{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.012 0.878 0.014 21.563 202
{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {GREEN REGENCY TEACUP AND SAUCER} 0.014 0.868 0.016 23.607 230
{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.014 0.849 0.016 20.846 225
{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER} => {ROSES REGENCY TEACUP AND SAUCER} 0.020 0.844 0.024 20.723 341
{GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER} => {ROSES REGENCY TEACUP AND SAUCER} 0.016 0.826 0.019 20.268 265
{WOODEN STAR CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN} => {WOODEN HEART CHRISTMAS SCANDINAVIAN} 0.010 0.817 0.012 31.276 170
{JUMBO BAG PINK POLKADOT,JUMBO BAG STRAWBERRY} => {JUMBO BAG RED RETROSPOT} 0.014 0.792 0.017 9.114 225

The matrix plot maps the relationships between the LHS itemsets and their corresponding RHS. In the upper left corner a staircase structure can be observed, meaning that the strongest rules are selective and lead to a specific outcome. There are also some rule clusters in which multiple antecedents trigger the same consequent, which confirms the existence of product ecosystems where items from the same thematic series are bought together.

plot(rules_multi, method = "matrix", measure = "lift")
## Itemsets in Antecedent (LHS)
##  [1] "{WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN}"                        
##  [2] "{WOODEN HEART CHRISTMAS SCANDINAVIAN,WOODEN STAR CHRISTMAS SCANDINAVIAN}"                        
##  [3] "{WOODEN STAR CHRISTMAS SCANDINAVIAN,WOODEN TREE CHRISTMAS SCANDINAVIAN}"                         
##  [4] "{JUMBO BAG APPLES,JUMBO BAG VINTAGE LEAF}"                                                       
##  [5] "{GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER}"      
##  [6] "{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER}"       
##  [7] "{GREEN REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER}"                                      
##  [8] "{PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER}"                                       
##  [9] "{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER,REGENCY CAKESTAND 3 TIER}"       
## [10] "{REGENCY CAKESTAND 3 TIER,ROSES REGENCY TEACUP AND SAUCER}"                                      
## [11] "{JUMBO BAG PEARS,JUMBO BAG VINTAGE LEAF}"                                                        
## [12] "{ALARM CLOCK BAKELIKE GREEN,ALARM CLOCK BAKELIKE IVORY}"                                         
## [13] "{ALARM CLOCK BAKELIKE GREEN,ALARM CLOCK BAKELIKE PINK}"                                          
## [14] "{ALARM CLOCK BAKELIKE IVORY,ALARM CLOCK BAKELIKE RED}"                                           
## [15] "{ALARM CLOCK BAKELIKE PINK,ALARM CLOCK BAKELIKE RED}"                                            
## [16] "{PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER}"                                
## [17] "{GREEN REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER}"                               
## [18] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG CARS BLUE,LUNCH BAG RED RETROSPOT}"                           
## [19] "{JUMBO BAG APPLES,JUMBO BAG PEARS}"                                                              
## [20] "{WHITE HANGING HEART T-LIGHT HOLDER,WOODEN FRAME ANTIQUE WHITE}"                                 
## [21] "{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER}"                                
## [22] "{LUNCH BAG DOLLY GIRL DESIGN,LUNCH BAG RED RETROSPOT}"                                           
## [23] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG DOLLY GIRL DESIGN}"                                           
## [24] "{LUNCH BAG APPLE DESIGN,LUNCH BAG CARS BLUE}"                                                    
## [25] "{WHITE HANGING HEART T-LIGHT HOLDER,WOODEN PICTURE FRAME WHITE FINISH}"                          
## [26] "{LUNCH BAG CARS BLUE,LUNCH BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT}"                           
## [27] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT}"                       
## [28] "{JUMBO BAG RED RETROSPOT,JUMBO BAG STRAWBERRY}"                                                  
## [29] "{JUMBO BAG PINK VINTAGE PAISLEY,JUMBO BAG RED RETROSPOT}"                                        
## [30] "{LUNCH BAG PINK POLKADOT,LUNCH BAG WOODLAND}"                                                    
## [31] "{LUNCH BAG PINK POLKADOT,LUNCH BAG SPACEBOY DESIGN}"                                             
## [32] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG APPLE DESIGN}"                                                
## [33] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG CARS BLUE,LUNCH BAG PINK POLKADOT}"                           
## [34] "{LUNCH BAG CARS BLUE,LUNCH BAG RED RETROSPOT}"                                                   
## [35] "{LUNCH BAG RED RETROSPOT,LUNCH BAG WOODLAND}"                                                    
## [36] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG WOODLAND}"                                                    
## [37] "{LUNCH BAG SUKI DESIGN,LUNCH BAG WOODLAND}"                                                      
## [38] "{LUNCH BAG RED RETROSPOT,LUNCH BAG SPACEBOY DESIGN}"                                             
## [39] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG RED RETROSPOT}"                                               
## [40] "{LUNCH BAG CARS BLUE,LUNCH BAG SPACEBOY DESIGN}"                                                 
## [41] "{JUMBO BAG RED RETROSPOT,JUMBO STORAGE BAG SUKI}"                                                
## [42] "{LUNCH BAG PINK POLKADOT,LUNCH BAG SUKI DESIGN}"                                                 
## [43] "{LUNCH BAG CARS BLUE,LUNCH BAG WOODLAND}"                                                        
## [44] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG CARS BLUE}"                                                   
## [45] "{LUNCH BAG APPLE DESIGN,LUNCH BAG RED RETROSPOT}"                                                
## [46] "{JUMBO  BAG BAROQUE BLACK WHITE,JUMBO BAG RED RETROSPOT}"                                        
## [47] "{LUNCH BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT}"                                               
## [48] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG PINK POLKADOT}"                                               
## [49] "{LUNCH BAG RED RETROSPOT,LUNCH BAG SUKI DESIGN}"                                                 
## [50] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG SPACEBOY DESIGN}"                                             
## [51] "{LUNCH BAG CARS BLUE,LUNCH BAG PINK POLKADOT}"                                                   
## [52] "{LUNCH BAG  BLACK SKULL.,LUNCH BAG SUKI DESIGN}"                                                 
## [53] "{JUMBO BAG RED RETROSPOT,LUNCH BAG PINK POLKADOT}"                                               
## [54] "{LUNCH BAG SPACEBOY DESIGN,LUNCH BAG WOODLAND}"                                                  
## [55] "{LUNCH BAG APPLE DESIGN,LUNCH BAG PINK POLKADOT}"                                                
## [56] "{LUNCH BAG SPACEBOY DESIGN,LUNCH BAG SUKI DESIGN}"                                               
## [57] "{LUNCH BAG CARS BLUE,LUNCH BAG SUKI DESIGN}"                                                     
## [58] "{JUMBO BAG RED RETROSPOT,LUNCH BAG  BLACK SKULL.}"                                               
## [59] "{LUNCH BAG APPLE DESIGN,LUNCH BAG SPACEBOY DESIGN}"                                              
## [60] "{LUNCH BAG APPLE DESIGN,LUNCH BAG SUKI DESIGN}"                                                  
## [61] "{JUMBO BAG PINK POLKADOT,JUMBO BAG STRAWBERRY}"                                                  
## [62] "{JUMBO BAG STRAWBERRY,JUMBO STORAGE BAG SUKI}"                                                   
## [63] "{JUMBO BAG PINK POLKADOT,JUMBO STORAGE BAG SUKI}"                                                
## [64] "{LUNCH BAG DOLLY GIRL DESIGN,LUNCH BAG SPACEBOY DESIGN}"                                         
## [65] "{JUMBO BAG PINK POLKADOT,LUNCH BAG RED RETROSPOT}"                                               
## [66] "{JUMBO BAG PINK POLKADOT,JUMBO SHOPPER VINTAGE RED PAISLEY}"                                     
## [67] "{JUMBO SHOPPER VINTAGE RED PAISLEY,JUMBO STORAGE BAG SUKI}"                                      
## [68] "{JUMBO  BAG BAROQUE BLACK WHITE,JUMBO BAG PINK POLKADOT}"                                        
## [69] "{JUMBO BAG PINK POLKADOT,JUMBO BAG PINK VINTAGE PAISLEY}"                                        
## [70] "{GREEN REGENCY TEACUP AND SAUCER,PINK REGENCY TEACUP AND SAUCER,ROSES REGENCY TEACUP AND SAUCER}"
## Itemsets in Consequent (RHS)
##  [1] "{REGENCY CAKESTAND 3 TIER}"           
##  [2] "{JUMBO BAG RED RETROSPOT}"            
##  [3] "{LUNCH BAG RED RETROSPOT}"            
##  [4] "{LUNCH BAG  BLACK SKULL.}"            
##  [5] "{LUNCH BAG CARS BLUE}"                
##  [6] "{LUNCH BAG SPACEBOY DESIGN}"          
##  [7] "{LUNCH BAG SUKI DESIGN}"              
##  [8] "{JUMBO BAG PINK POLKADOT}"            
##  [9] "{LUNCH BAG PINK POLKADOT}"            
## [10] "{WOODEN FRAME ANTIQUE WHITE}"         
## [11] "{LUNCH BAG WOODLAND}"                 
## [12] "{WOODEN PICTURE FRAME WHITE FINISH}"  
## [13] "{JUMBO BAG VINTAGE LEAF}"             
## [14] "{ALARM CLOCK BAKELIKE GREEN}"         
## [15] "{ALARM CLOCK BAKELIKE RED}"           
## [16] "{JUMBO BAG APPLES}"                   
## [17] "{ROSES REGENCY TEACUP AND SAUCER}"    
## [18] "{GREEN REGENCY TEACUP AND SAUCER}"    
## [19] "{PINK REGENCY TEACUP AND SAUCER}"     
## [20] "{JUMBO BAG PEARS}"                    
## [21] "{WOODEN HEART CHRISTMAS SCANDINAVIAN}"
## [22] "{WOODEN TREE CHRISTMAS SCANDINAVIAN}" 
## [23] "{WOODEN STAR CHRISTMAS SCANDINAVIAN}"

The interactive network plot (first 100 rules) reveals a few main clusters of related items. Most rules fall within the lunch bag network, which connects to the jumbo bag group. Other groups consist of alarm clocks, picture frames, wooden decorations and teacup sets.

#interactive plot
plot(rules_multi, method="graph", engine="htmlwidget")

Christmas vs the rest of the year

This section explores how consumer behavior changes in November and December compared to the rest of the year. By splitting the data set, we can observe shifts in product popularity and transaction characteristics. First, the InvoiceDate variable was converted to the R Date class, then a Month variable was derived from it. The data was split into two subsets, and each subset was transformed into a transactional matrix.

The holiday season contained 4383 transactions, while the rest of the year contained 12263. The number of unique items was similar in both cases (3344 and 3656) and the matrix density was higher by about 0.0009 for November and December (0.0064 compared to 0.0056). The most popular items during this season are, as expected, related to Christmas (PAPER CHAIN KIT 50’S CHRISTMAS or PAPER CHAIN KIT VINTAGE CHRISTMAS).

#christmas vs rest of the year
# InvoiceDate as date
df_uk$InvoiceDate <- as.Date(df_uk$InvoiceDate)
df_uk$Month <- as.numeric(format(df_uk$InvoiceDate, "%m"))

# Nov-Dec subset
df_xmas <- subset(df_uk, Month %in% c(11, 12))
trans_xmas_list <- split(df_xmas$Description, df_xmas$InvoiceNo)
groceries_xmas <- as(trans_xmas_list, "transactions")
# Rest of the year subset
df_rest <- subset(df_uk, !(Month %in% c(11, 12)))
trans_rest_list <- split(df_rest$Description, df_rest$InvoiceNo)
groceries_rest <- as(trans_rest_list, "transactions")

summary(groceries_xmas)
## transactions as itemMatrix in sparse format with
##  4383 rows (elements/itemsets/transactions) and
##  3344 columns (items) and a density of 0.006442696 
## 
## most frequent items:
##     PAPER CHAIN KIT 50'S CHRISTMAS                 RABBIT NIGHT LIGHT 
##                                556                                457 
## WHITE HANGING HEART T-LIGHT HOLDER  PAPER CHAIN KIT VINTAGE CHRISTMAS 
##                                454                                397 
##         CHOCOLATE HOT WATER BOTTLE                            (Other) 
##                                349                              92216 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
## 338 197 155 133 163 151 155 139 154 125 122 125 102 130 140 116 116 103 101  94 
##  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40 
##  82  82  65  54  47  49  52  52  42  51  43  43  42  37  23  29  28  29  25  29 
##  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60 
##  25  18  15  25  30  24  13  21  24  21  12  11  17  20  17  13  10   7  14  18 
##  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80 
##  10   9  10   9   7  11   5  11   4  10   8   8   6   4   5   4   5   4   4   4 
##  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 
##   4   3   3   8   4   3   2   3   2   5   3   4   2   3   2   3   1   1   1   4 
## 101 102 103 104 106 108 109 111 112 113 114 115 116 117 118 120 121 122 123 126 
##   3   1   2   3   1   2   1   1   2   1   3   3   4   1   2   2   1   1   1   2 
## 127 128 131 132 134 135 138 139 141 144 145 146 148 149 156 175 176 180 184 193 
##   1   1   1   2   1   2   1   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 204 208 375 439 525 529 541 
##   1   1   1   1   1   1   1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    6.00   15.00   21.54   27.00  541.00 
## 
## includes extended item information - examples:
##                       labels
## 1     10 COLOUR SPACEBOY PEN
## 2 12 COLOURED PARTY BALLOONS
## 3  12 DAISY PEGS IN WOOD BOX
## 
## includes extended transaction information - examples:
##   transactionID
## 1        536365
## 2        536366
## 3        536367
summary(groceries_rest)
## transactions as itemMatrix in sparse format with
##  12263 rows (elements/itemsets/transactions) and
##  3656 columns (items) and a density of 0.005572704 
## 
## most frequent items:
## WHITE HANGING HEART T-LIGHT HOLDER                      PARTY BUNTING 
##                               1430                               1187 
##            JUMBO BAG RED RETROSPOT           REGENCY CAKESTAND 3 TIER 
##                               1140                               1116 
##      ASSORTED COLOUR BIRD ORNAMENT                            (Other) 
##                                980                             243991 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
## 935 482 439 437 436 399 390 403 392 342 354 313 328 337 345 373 296 299 324 286 
##  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40 
## 264 212 240 212 171 182 160 169 199 138 120 110 100 114  93  76  84  73  91  75 
##  41  42  43  44  45  46  47  48  49  50  51  52  53  54  55  56  57  58  59  60 
##  78  68  66  56  44  56  51  45  47  49  32  39  41  41  42  24  39  26  15  28 
##  61  62  63  64  65  66  67  68  69  70  71  72  73  74  75  76  77  78  79  80 
##  20   9  24  24  22  22  19  19  15  18  19   9  13  20  18  13  11  15   5   6 
##  81  82  83  84  85  86  87  88  89  90  91  92  93  94  95  96  97  98  99 100 
##   9  13  11  11   9  11   7  10   7   2   6   8   8   3   1   5   6   9   2   5 
## 101 102 103 104 105 106 107 108 109 110 111 112 113 114 115 116 117 118 119 120 
##   3   2   4   4   2   1   6   1   1   4   3   1   2   3   3   3   1   2   4   2 
## 121 122 123 124 125 127 129 130 131 134 136 137 138 140 141 142 145 146 149 150 
##   4   6   1   4   1   4   1   2   1   1   2   1   1   1   3   1   1   1   1   1 
## 151 153 154 157 163 165 169 176 178 179 181 187 192 195 202 210 227 249 262 270 
##   1   1   1   1   1   1   2   1   1   1   1   1   1   1   1   1   1   1   1   1 
## 280 333 347 352 363 386 419 434 
##   1   1   1   1   1   1   1   1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    1.00    6.00   15.00   20.37   27.00  434.00 
## 
## includes extended item information - examples:
##                       labels
## 1     10 COLOUR SPACEBOY PEN
## 2 12 COLOURED PARTY BALLOONS
## 3  12 DAISY PEGS IN WOOD BOX
## 
## includes extended transaction information - examples:
##   transactionID
## 1        539993
## 2        540001
## 3        540002
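
As a rough consistency check on the densities reported above, the density of a transaction matrix can be recovered as the mean basket size divided by the number of item columns. The sketch below uses only figures copied from the two summaries; because the means are rounded in that output, the results match the reported densities only approximately.

```r
# Density = filled cells / all cells
#         = (mean basket size * rows) / (rows * columns)
#         = mean basket size / number of item columns.
# Inputs are copied from the summary() output above; the means are
# rounded there, so these values are approximations.
density_xmas <- 21.54 / 3344
density_rest <- 20.37 / 3656
density_gap  <- density_xmas - density_rest

round(c(xmas = density_xmas, rest = density_rest, gap = density_gap), 4)
```

The gap between the two values is small in absolute terms, which fits the nearly identical basket-size quartiles in both periods.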

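The Apriori algorithm was run with the same parameters as before (support = 0.01, confidence = 0.5, minlen = 2) to compare the rules generated during the peak holiday season with those from the rest of the year. Because support is relative, the same threshold corresponds to a minimum of 43 transactions in the holiday subset and 122 in the rest of the year. The method yielded 294 rules for the Christmas season and 469 for the rest of the year.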

#xmas rules
rules_xmas <- apriori(groceries_xmas, 
                      parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 43 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3344 item(s), 4383 transaction(s)] done [0.01s].
## sorting and recoding items ... [646 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [294 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_xmas
## set of 294 rules
#rest of the year rules
rules_rest <- apriori(groceries_rest, 
                      parameter = list(support = 0.01, confidence = 0.5, minlen = 2))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 122 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[3656 item(s), 12263 transaction(s)] done [0.04s].
## sorting and recoding items ... [607 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.01s].
## writing ... [469 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
rules_rest
## set of 469 rules
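
The "Absolute minimum support count" lines in both logs follow from the shared relative threshold. A minimal sketch of that arithmetic is shown below; rounding down is an assumption here, chosen because it reproduces the logged values of 43 and 122.

```r
# Relative support threshold shared by both runs
min_support <- 0.01

# Number of transactions in each subset (from the summaries above)
n_transactions <- c(xmas = 4383, rest = 12263)

# Assumption: the absolute count is the product rounded down,
# which is consistent with the values printed in the logs.
min_count <- floor(min_support * n_transactions)
min_count
```

An itemset therefore needs far fewer occurrences to qualify in the holiday subset, which is worth keeping in mind when comparing the two rule counts.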

The interactive tables for both subsets are presented below. When sorted by lift, the holiday table displays a number of Christmas-related items, while the rest-of-the-year table shows top product relationships similar to those found earlier for the whole dataset.

#data table
inspectDT(rules_xmas)
inspectDT(rules_rest)

The scatterplots below compare the strength and distribution of the holiday rules with those of the rest of the year.

#xmas plot
p_xmas <- plot(rules_xmas, 
               method = "scatterplot", 
               measure = c("support", "lift"), 
               shading = "confidence",
               main = "November and December")

#rest plot
p_rest <- plot(rules_rest, 
               method = "scatterplot", 
               measure = c("support", "lift"), 
               shading = "confidence",
               main = "Rest of the year")
grid.arrange(p_xmas, p_rest, ncol=2)

A keyword frequency analysis for the word Christmas was performed on the association rules for both periods. Counting the rules whose labels contain CHRISTMAS revealed a contrast in the thematic focus of the two rule sets. The Christmas period yielded 43 rules involving Christmas-themed products (out of 294 total), while the January through October period yielded only 4 (out of 469). That’s 14.6% compared to 0.9%.

# count the rules whose label contains 'CHRISTMAS'
count_christmas <- function(rules_object) {
  if (length(rules_object) == 0) return(0)
  rules_text <- labels(rules_object)
  count <- sum(grepl("CHRISTMAS", rules_text))
  return(count)
}
#count christmas
xmas_rules_count <- count_christmas(rules_xmas)
rest_rules_count <- count_christmas(rules_rest)
xmas_rules_count
## [1] 43
rest_rules_count
## [1] 4
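
The shares quoted for the two periods can be reproduced from the counts above. This is a small sketch using only the numbers already reported; the variable names are illustrative.

```r
# Rules mentioning CHRISTMAS and total rules per period (from the output above)
christmas_rules <- c(xmas = 43, rest = 4)
total_rules     <- c(xmas = 294, rest = 469)

# Share of Christmas-themed rules in percent, rounded to one decimal
christmas_share <- round(100 * christmas_rules / total_rules, 1)
christmas_share
```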

4. ECLAT algorithm

The final step of the analysis uses the ECLAT (Equivalence Class Transformation) algorithm to identify the most frequent item bundles within the UK market. Unlike the Apriori method, which generates ‘if-then’ rules, ECLAT focuses on support to find items that are purchased together. It works on a vertical layout of the data, in which each item is linked to the list of transactions that contain it. The results show that the most common baskets involve thematic series. With a minimum support level of 0.01 and a minimum length of 2, 355 itemsets were found. The most common basket contains two variants of jumbo bags (support = 0.0304), and the second most common contains two lunch bags (0.0291). Wooden frames and the Regency sets are also high in the ranking.

The table below lists the top 15 itemsets by support, and the plot displays an interactive graph of the top 20. The picture is moderately similar to the Apriori results.

frequent_items_eclat <- eclat(groceries, 
                              parameter = list(supp = 0.01, maxlen = 5, minlen=2))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.01      2      5 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 166 
## 
## create itemset ... 
## set transactions ...[3833 item(s), 16646 transaction(s)] done [0.05s].
## sorting and recoding items ... [615 item(s)] done [0.00s].
## creating sparse bit matrix ... [615 row(s), 16646 column(s)] done [0.00s].
## writing  ... [355 set(s)] done [0.28s].
## Creating S4 object  ... done [0.00s].
frequent_items_eclat
## set of 355 itemsets
inspect(sort(frequent_items_eclat, by = "support")[1:15])
##      items                                   support count
## [1]  {JUMBO BAG PINK POLKADOT,                            
##       JUMBO BAG RED RETROSPOT}            0.03039769   506
## [2]  {LUNCH BAG  BLACK SKULL.,                            
##       LUNCH BAG RED RETROSPOT}            0.02907605   484
## [3]  {GREEN REGENCY TEACUP AND SAUCER,                    
##       ROSES REGENCY TEACUP AND SAUCER}    0.02859546   476
## [4]  {LUNCH BAG PINK POLKADOT,                            
##       LUNCH BAG RED RETROSPOT}            0.02829509   471
## [5]  {WOODEN FRAME ANTIQUE WHITE,                         
##       WOODEN PICTURE FRAME WHITE FINISH}  0.02751412   458
## [6]  {GARDENERS KNEELING PAD CUP OF TEA,                  
##       GARDENERS KNEELING PAD KEEP CALM}   0.02751412   458
## [7]  {ALARM CLOCK BAKELIKE GREEN,                         
##       ALARM CLOCK BAKELIKE RED}           0.02727382   454
## [8]  {LUNCH BAG  BLACK SKULL.,                            
##       LUNCH BAG PINK POLKADOT}            0.02655293   442
## [9]  {PAPER CHAIN KIT 50'S CHRISTMAS,                     
##       PAPER CHAIN KIT VINTAGE CHRISTMAS}  0.02625255   437
## [10] {RED HANGING HEART T-LIGHT HOLDER,                   
##       WHITE HANGING HEART T-LIGHT HOLDER} 0.02571188   428
## [11] {LUNCH BAG CARS BLUE,                                
##       LUNCH BAG RED RETROSPOT}            0.02481077   413
## [12] {LUNCH BAG RED RETROSPOT,                            
##       LUNCH BAG SUKI DESIGN}              0.02463054   410
## [13] {LUNCH BAG RED RETROSPOT,                            
##       LUNCH BAG SPACEBOY DESIGN}          0.02463054   410
## [14] {JUMBO BAG RED RETROSPOT,                            
##       JUMBO STORAGE BAG SUKI}             0.02451039   408
## [15] {GREEN REGENCY TEACUP AND SAUCER,                    
##       PINK REGENCY TEACUP AND SAUCER}     0.02427009   404
plot(sort(frequent_items_eclat, by = "support")[1:20], method = "graph", engine = "htmlwidget")
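
To illustrate the vertical layout that ECLAT relies on, the sketch below computes itemset support by intersecting transaction-id lists in base R. The baskets and item names are hypothetical toy data, and the code only shows the core idea; it is not how the eclat() function is implemented internally.

```r
# Toy transactions (hypothetical data, not taken from the retail dataset)
baskets <- list(
  t1 = c("A", "B", "C"),
  t2 = c("A", "B"),
  t3 = c("A", "C"),
  t4 = c("B", "C"),
  t5 = c("A", "B", "C")
)
n <- length(baskets)

# Vertical layout: for each item, the positions of the transactions containing it
items <- sort(unique(unlist(baskets)))
tid_lists <- lapply(items, function(item) {
  which(vapply(baskets, function(b) item %in% b, logical(1)))
})
names(tid_lists) <- items

# Support of an itemset = size of the intersection of its tid lists / n
itemset_support <- function(itemset) {
  tids <- Reduce(intersect, tid_lists[itemset])
  length(tids) / n
}

support_ab  <- itemset_support(c("A", "B"))       # A and B share t1, t2, t5
support_abc <- itemset_support(c("A", "B", "C"))  # all three share t1, t5
```

Because the support of a larger itemset comes from intersecting lists that are already stored, the original transactions do not need to be rescanned for each candidate.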

As in the previous part of the analysis, the search was repeated with a minimum length of 3, which generated 61 itemsets. The itemsets with the highest support were the Regency teacup and saucer collection (0.0205) and a set of three lunch bag variants (0.0176). Most of the remaining top 15 itemsets are further lunch bag combinations.

frequent_items_eclat3 <- eclat(groceries, 
                              parameter = list(supp = 0.01, maxlen = 5, minlen=3))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.01      3      5 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 166 
## 
## create itemset ... 
## set transactions ...[3833 item(s), 16646 transaction(s)] done [0.05s].
## sorting and recoding items ... [615 item(s)] done [0.00s].
## creating sparse bit matrix ... [615 row(s), 16646 column(s)] done [0.00s].
## writing  ... [61 set(s)] done [0.28s].
## Creating S4 object  ... done [0.00s].
frequent_items_eclat3
## set of 61 itemsets
inspect(sort(frequent_items_eclat3, by = "support")[1:15])
##      items                                 support count
## [1]  {GREEN REGENCY TEACUP AND SAUCER,                  
##       PINK REGENCY TEACUP AND SAUCER,                   
##       ROSES REGENCY TEACUP AND SAUCER}  0.02048540   341
## [2]  {LUNCH BAG  BLACK SKULL.,                          
##       LUNCH BAG PINK POLKADOT,                          
##       LUNCH BAG RED RETROSPOT}          0.01760183   293
## [3]  {GREEN REGENCY TEACUP AND SAUCER,                  
##       REGENCY CAKESTAND 3 TIER,                         
##       ROSES REGENCY TEACUP AND SAUCER}  0.01591974   265
## [4]  {LUNCH BAG CARS BLUE,                              
##       LUNCH BAG PINK POLKADOT,                          
##       LUNCH BAG RED RETROSPOT}          0.01525892   254
## [5]  {LUNCH BAG  BLACK SKULL.,                          
##       LUNCH BAG CARS BLUE,                              
##       LUNCH BAG PINK POLKADOT}          0.01501862   250
## [6]  {LUNCH BAG  BLACK SKULL.,                          
##       LUNCH BAG CARS BLUE,                              
##       LUNCH BAG RED RETROSPOT}          0.01483840   247
## [7]  {LUNCH BAG  BLACK SKULL.,                          
##       LUNCH BAG RED RETROSPOT,                          
##       LUNCH BAG SUKI DESIGN}            0.01447795   241
## [8]  {LUNCH BAG  BLACK SKULL.,                          
##       LUNCH BAG RED RETROSPOT,                          
##       LUNCH BAG SPACEBOY DESIGN}        0.01423765   237
## [9]  {LUNCH BAG PINK POLKADOT,                          
##       LUNCH BAG RED RETROSPOT,                          
##       LUNCH BAG SUKI DESIGN}            0.01393728   232
## [10] {GREEN REGENCY TEACUP AND SAUCER,                  
##       PINK REGENCY TEACUP AND SAUCER,                   
##       REGENCY CAKESTAND 3 TIER}         0.01381713   230
## [11] {LUNCH BAG PINK POLKADOT,                          
##       LUNCH BAG RED RETROSPOT,                          
##       LUNCH BAG SPACEBOY DESIGN}        0.01375706   229
## [12] {PINK REGENCY TEACUP AND SAUCER,                   
##       REGENCY CAKESTAND 3 TIER,                         
##       ROSES REGENCY TEACUP AND SAUCER}  0.01351676   225
## [13] {JUMBO BAG PINK POLKADOT,                          
##       JUMBO BAG RED RETROSPOT,                          
##       JUMBO BAG STRAWBERRY}             0.01351676   225
## [14] {LUNCH BAG RED RETROSPOT,                          
##       LUNCH BAG SPACEBOY DESIGN,                        
##       LUNCH BAG WOODLAND}               0.01339661   223
## [15] {LUNCH BAG CARS BLUE,                              
##       LUNCH BAG RED RETROSPOT,                          
##       LUNCH BAG SUKI DESIGN}            0.01297609   216
plot(sort(frequent_items_eclat3, by = "support")[1:20], method = "graph", engine = "htmlwidget")
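
Although ECLAT reports only itemsets, the counts in the two tables above are enough to derive rule confidence by hand. The sketch below does this for the Regency teacup set, using counts copied from the ECLAT output; the variable names are illustrative.

```r
# Counts copied from the ECLAT tables above
count_green_roses      <- 476  # {GREEN, ROSES}
count_green_pink       <- 404  # {GREEN, PINK}
count_green_pink_roses <- 341  # {GREEN, PINK, ROSES}

# confidence(X => Y) = count(X and Y together) / count(X)
conf_add_pink  <- count_green_pink_roses / count_green_roses  # {GREEN, ROSES} => {PINK}
conf_add_roses <- count_green_pink_roses / count_green_pink   # {GREEN, PINK}  => {ROSES}

round(c(add_pink = conf_add_pink, add_roses = conf_add_roses), 3)
```

Roughly 72% of baskets with the green and roses teacups also contain the pink one, and about 84% of baskets with green and pink also contain roses.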

5. Conclusions

The market basket analysis conducted on the UK Online Retail store provides a broad map of consumer behavior, starting with Apriori ‘if-then’ rules, moving on to the multi-item and seasonal analyses, and finally comparing the results with the frequent itemsets found by the ECLAT algorithm.

A key strategic finding is the power of collections: customers show a strong drive to complete a thematic set. This is possibly due to the wholesale nature of the store’s client profile, and it is most evident in the Regency and the Scandinavian wooden series. Certain high-support products, such as lunch bags or jumbo bags, act as major sales anchors and appear in the largest number of rules. The seasonal analysis showed a clear shift in the market during the holiday season. Christmas-themed rules represented around 15% of all associations in November and December, compared with under 1% in the rest of the year.

Based on these findings, the main recommendation for the retailer is to consider moving from selling individual items to offering pre-packaged thematic bundles, especially for the jumbo bags, lunch bags and the Regency tea sets. In addition, the high confidence of some rules suggests that the site should automatically recommend related items to maximize cross-selling.