Introduction

The retail industry has undergone a significant transformation with the advent of online shopping, resulting in the generation of vast amounts of transactional data. This wealth of information holds invaluable insights into customer behavior, preferences, and trends. In this report, we embark on an exploration and analysis of an online retail dataset, aiming to uncover patterns in customer purchasing behavior and identify popular products across different countries.

Data Overview

The dataset comprises the following key attributes:

Rows and columns: 541,909 entries and 8 columns, which are listed below.

InvoiceNo: A unique identifier for each transaction.
StockCode: Code assigned to each product.
Description: Brief text describing the product.
Quantity: The quantity of each product involved in the transaction.
UnitPrice: The price of a single unit of the product.
InvoiceDate: Date and time when the transaction occurred.
CustomerID: Unique identifier for each customer.
Country: The country where the customer is located.

Objective:

The primary objective of this analysis is to uncover patterns and associations within online retail transactions in four countries: the United Kingdom, Germany, Poland, and France. Using exploratory data analysis and association rule mining, we aim to identify popular products and understand the relationships between them in customer transactions. These insights can help the online store enhance its marketing strategies, improve customer experience, and optimize product offerings for each country. For example, the store might suggest to a customer abroad a pair of products that are popular together in that customer's country and offer a good price for buying both, which could raise sales.

Association Rule

Loading Libraries and Reading the Dataset. The analysis begins by loading arules, the R library used here for association rule mining.

# Install (only if missing) and load the library:
if (!requireNamespace("arules", quietly = TRUE)) install.packages("arules")
library(arules)

Load Dataset. The dataset is loaded into the R environment using the read.csv function.

Online_retail <- read.csv("Online_Retail.csv")

The imported "Online_Retail.csv" data is stored in the data frame named Online_retail.

Data Cleaning and Checking Unique Products. I identify the unique products by extracting the Description column from the Online_retail data frame and applying the unique function. This gives a vector of distinct product descriptions.

unique_products <- unique(Online_retail$Description)

Exploring the Description column reveals some entries that are not products, such as Manual, POSTAGE, DOTCOM POSTAGE, CRUK Commission, and Discount. These descriptions mean:
- POSTAGE/DOTCOM POSTAGE: The amount spent by the user on postage.
- CRUK Commission: An initiative to pay part of the sales to Cancer Research UK (CRUK).
- Manual: There is no proper definition, so we can think of this as a manual service provided for the purchase of an item.
- Discount: The discount provided for a product.
Except for Discount, none of these categories directly affects sales, so we can remove them from the data. To ensure the subsequent analyses focus on meaningful transactions, I exclude entries with the descriptions “POSTAGE,” “DOTCOM POSTAGE,” “CRUK Commission,” and “Manual.” Entries with empty descriptions or missing values are also filtered out.

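# Drop non-product rows and rows with blank or missing descriptions
non_product <- c("POSTAGE", "DOTCOM POSTAGE", "CRUK Commission", "Manual")
drop_row <- Online_retail$Description %in% non_product |
  grepl("^\\s*$", Online_retail$Description) |
  is.na(Online_retail$Description)
Online_retail <- Online_retail[!drop_row, ]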
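
Some product descriptions in this dataset carry stray trailing spaces (the rule output further below shows labels such as "ROSES REGENCY TEACUP AND SAUCER "), so one product can end up under two different item labels. The following is a minimal sketch of an extra normalisation step using base R's trimws. It is an assumption, not part of the original pipeline, and the data frame toy_retail is a hypothetical stand-in for Online_retail.

```r
# Hypothetical toy data standing in for Online_retail; the real file is not read here.
toy_retail <- data.frame(
  InvoiceNo   = c("A1", "A1", "A2"),
  Description = c("FRENCH PAISLEY CUSHION COVER",
                  "FRENCH PAISLEY CUSHION COVER ",  # same text with a trailing space
                  " HEART OF WICKER SMALL"),        # leading space
  stringsAsFactors = FALSE
)

# Before trimming, the two cushion-cover labels count as different products.
n_before <- length(unique(toy_retail$Description))

# trimws() removes leading and trailing whitespace from each string.
toy_retail$Description <- trimws(toy_retail$Description)
n_after <- length(unique(toy_retail$Description))

print(c(before = n_before, after = n_after))
```

On the real data the equivalent line would be Online_retail$Description <- trimws(Online_retail$Description), run before the filtering step. Doing so would change the item labels, so the rules reported later in this report could differ.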

After cleaning the data, I want to know how many countries the dataset covers, since I expect there to be many and I only want four sample countries. I use unique to list the countries present in the dataset and print the result.

unique_countries <- unique(Online_retail$Country)
print(unique_countries)
##  [1] "United Kingdom"       "France"               "Australia"           
##  [4] "Netherlands"          "Germany"              "Norway"              
##  [7] "EIRE"                 "Switzerland"          "Spain"               
## [10] "Poland"               "Portugal"             "Italy"               
## [13] "Belgium"              "Lithuania"            "Japan"               
## [16] "Iceland"              "Channel Islands"      "Denmark"             
## [19] "Cyprus"               "Sweden"               "Austria"             
## [22] "Israel"               "Finland"              "Bahrain"             
## [25] "Greece"               "Hong Kong"            "Singapore"           
## [28] "Lebanon"              "United Arab Emirates" "Saudi Arabia"        
## [31] "Czech Republic"       "Canada"               "Unspecified"         
## [34] "Brazil"               "USA"                  "European Community"  
## [37] "Malta"                "RSA"

Then I use length to count the countries:

print(length(unique_countries))
## [1] 38

I decided to choose the United Kingdom, Germany, Poland, and France because they are large countries that are close to each other. The main reason for including Poland is that I am in Poland.

Selection of Target Country:

I start by selecting the target country, which is the United Kingdom.

target_country_UK <- 'United Kingdom'

I create a new data frame, df_country_UK, by subsetting the original dataset Online_retail to include only transactions from the UK.

df_country_UK <- subset(Online_retail, Country == target_country_UK)

This step ensures that subsequent analyses focus solely on transactions within the UK. Then I group the item descriptions in df_country_UK by InvoiceNo, which identifies each transaction, so that each invoice becomes one basket of items.

transactions_UK <- split(df_country_UK$Description, df_country_UK$InvoiceNo)

With the items grouped by invoice, the as function converts the resulting list into a transactions object, trans_UK. This conversion is essential because transactions is the format the arules package expects for association rule mining.

trans_UK <- as(transactions_UK, "transactions")

In this step, the Apriori algorithm is applied to mine frequent itemsets from the transactions data for the United Kingdom. The parameter support = 0.05 is set to specify the minimum support threshold for an itemset to be considered frequent.

frequent_itemsets_UK <- apriori(trans_UK, parameter = list(support = 0.05, target = "frequent itemsets"))

The support of an itemset is defined as the proportion of transactions in the dataset that contain that itemset. By setting a support threshold of 0.05, we aim to identify itemsets that appear in at least 5% of the transactions.
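
To make the definition concrete, here is a small sketch that computes support by hand in base R. The baskets and the function name itemset_support are hypothetical and chosen only for illustration. This is not how arules computes support internally.

```r
# Hypothetical toy baskets (not taken from the dataset), one character vector per invoice.
toy_baskets <- list(
  inv1 = c("TEACUP", "SAUCER", "NAPKINS"),
  inv2 = c("TEACUP", "SAUCER"),
  inv3 = c("NAPKINS"),
  inv4 = c("TEACUP", "PLATES")
)

# Support of an itemset = share of baskets that contain every item in the set.
itemset_support <- function(baskets, itemset) {
  contains_all <- vapply(baskets, function(b) all(itemset %in% b), logical(1))
  mean(contains_all)
}

support_teacup <- itemset_support(toy_baskets, "TEACUP")               # in 3 of 4 baskets
support_pair   <- itemset_support(toy_baskets, c("TEACUP", "SAUCER"))  # in 2 of 4 baskets

print(c(teacup = support_teacup, pair = support_pair))
```

With a threshold of 0.05, both itemsets in this toy example would count as frequent. In the real UK data the same threshold requires an itemset to appear in at least 5% of all invoices.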

Generate Association Rules:

rules_UK <- apriori(trans_UK, parameter = list(support = 0.03, confidence = 0.7, target = "rules"))

After identifying frequent itemsets, the next step is to generate association rules. Here the apriori function is run again on trans_UK with support = 0.03 and confidence = 0.7, the settings that gave the most useful rules after several attempts.

Support: The support parameter sets the minimum support threshold for an association rule to be considered. A support of 0.03 means that the rules must have a support of at least 3%, indicating that the antecedent and consequent of the rule together must occur in at least 3% of the transactions. Choosing a relatively lower support threshold helps include a broader range of associations.

Confidence: The confidence parameter sets the minimum confidence threshold for an association rule. A confidence of 0.7 means that, among the transactions containing the antecedent, at least 70% must also contain the consequent. Choosing a higher confidence threshold filters out weaker or less reliable associations, so the discovered rules are more robust.

After generating the association rules, the inspect function is used to display and examine the discovered rules for the United Kingdom dataset.

inspect(rules_UK)
## Warning in asMethod(object): removing duplicated items in transactions
##     lhs                                   rhs                                  support confidence   coverage     lift count
## [1] {ROSES REGENCY TEACUP AND SAUCER } => {GREEN REGENCY TEACUP AND SAUCER}  0.0327982  0.7108674 0.04613828 15.93149   713
## [2] {GREEN REGENCY TEACUP AND SAUCER}  => {ROSES REGENCY TEACUP AND SAUCER } 0.0327982  0.7350515 0.04462027 15.93149   713

The displayed information includes details about each rule, such as:
- Antecedent and Consequent: The items involved in the association rule.
- Support: The proportion of transactions in the dataset that contain both the antecedent and the consequent.
- Confidence: The proportion of transactions containing the antecedent that also contain the consequent.
- Lift: The ratio of the observed support to what would be expected if the items were independent. A lift greater than 1 indicates that the items are positively correlated.
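
As a check on these definitions, the reported confidence and lift for the first UK rule can be recomputed from the support and coverage columns of the inspect(rules_UK) output above. This is a sketch in base R. The variable names are hypothetical, and the inputs are the rounded values printed in the output, so the results should match the reported values only up to rounding.

```r
# Values copied from the inspect(rules_UK) output above.
support_both   <- 0.0327982   # support of ROSES and GREEN teacups together
coverage_roses <- 0.04613828  # support of the antecedent in rule [1]
coverage_green <- 0.04462027  # support of the antecedent in rule [2]

# Confidence of ROSES => GREEN: support of both divided by support of the antecedent.
confidence_roses_green <- support_both / coverage_roses

# Lift: confidence divided by the support of the consequent,
# which equals support_both / (coverage_roses * coverage_green).
lift_roses_green <- confidence_roses_green / coverage_green

print(round(c(confidence = confidence_roses_green, lift = lift_roses_green), 3))
```

Because lift is symmetric in the two items, both UK rules share the same lift value even though their confidences differ.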

Germany, Poland, France

These analyses follow a similar structure to the UK analysis but use different support and confidence values, chosen according to the characteristics of each country’s dataset.

Germany
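
The code that builds trans_DE, trans_PL, and trans_FR is not shown in this report. Assuming it mirrors the UK steps, a small helper could do the subsetting and grouping for any country. The function name country_baskets and the toy data frame below are hypothetical, and only base R is used. The conversion with as(..., "transactions") from arules would follow as in the UK section.

```r
# Hypothetical helper: keep one country's rows and group item descriptions by invoice.
country_baskets <- function(df, country) {
  df_country <- df[df$Country %in% country, ]
  baskets <- split(df_country$Description, df_country$InvoiceNo)
  # Drop repeated items within an invoice; the original code instead lets the
  # conversion to transactions remove them (with a warning).
  lapply(baskets, unique)
}

# Toy stand-in for Online_retail (made-up rows, not real data).
toy_retail <- data.frame(
  InvoiceNo   = c("D1", "D1", "D1", "D2", "F1"),
  Description = c("RED BAG", "WOODLAND BAG", "RED BAG", "RED BAG", "PAPER CUPS"),
  Country     = c("Germany", "Germany", "Germany", "Germany", "France"),
  stringsAsFactors = FALSE
)

baskets_DE <- country_baskets(toy_retail, "Germany")
print(baskets_DE)
```

Under the same assumption, a call such as as(country_baskets(Online_retail, "Germany"), "transactions") would produce an object playing the role of trans_DE.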

rules_DE <- apriori(trans_DE, parameter = list(support = 0.04, confidence = 0.8, target = "rules"))
inspect(rules_DE)
## Warning in asMethod(object): removing duplicated items in transactions
##     lhs                              rhs                        support confidence   coverage    lift count
## [1] {RED RETROSPOT CHARLOTTE BAG} => {WOODLAND CHARLOTTE BAG} 0.0467128    0.84375 0.05536332 8.26589    27

Poland

rules_PL <- apriori(trans_PL, parameter = list(support = 0.12, confidence = 1, target = "rules"))

# Filter rules based on lift
high_lift_rules_PL <- subset(rules_PL, lift > 5)

# Display the rules
inspect(high_lift_rules_PL)
## Warning in apriori(trans_PL, parameter = list(support = 0.05, target =
## "frequent itemsets")): Mining stopped (maxlen reached). Only patterns up to a
## length of 10 returned!
##      lhs                                       rhs                                   support confidence coverage lift count
## [1]  {FRENCH PAISLEY CUSHION COVER}         => {FRENCH PAISLEY CUSHION COVER }         0.125          1    0.125    6     3
## [2]  {HEART OF WICKER SMALL}                => {FRENCH PAISLEY CUSHION COVER }         0.125          1    0.125    6     3
## [3]  {CERAMIC BOWL WITH STRAWBERRY DESIGN}  => {LARGE HEART MEASURING SPOONS}          0.125          1    0.125    8     3
## [4]  {LARGE HEART MEASURING SPOONS}         => {CERAMIC BOWL WITH STRAWBERRY DESIGN}   0.125          1    0.125    8     3
## [5]  {CERAMIC BOWL WITH STRAWBERRY DESIGN}  => {CERAMIC STRAWBERRY CAKE MONEY BANK}    0.125          1    0.125    8     3
## [6]  {CERAMIC STRAWBERRY CAKE MONEY BANK}   => {CERAMIC BOWL WITH STRAWBERRY DESIGN}   0.125          1    0.125    8     3
## [7]  {LARGE HEART MEASURING SPOONS}         => {CERAMIC STRAWBERRY CAKE MONEY BANK}    0.125          1    0.125    8     3
## [8]  {CERAMIC STRAWBERRY CAKE MONEY BANK}   => {LARGE HEART MEASURING SPOONS}          0.125          1    0.125    8     3
## [9]  {CERAMIC CAKE STAND + HANGING CAKES}   => {DOORMAT HOME SWEET HOME BLUE }         0.125          1    0.125    6     3
## [10] {CERAMIC CAKE DESIGN SPOTTED MUG,                                                                                     
##       LARGE CAKE STAND  HANGING STRAWBERY}  => {CERAMIC STRAWBERRY DESIGN MUG}         0.125          1    0.125    8     3
## [11] {CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       LARGE CAKE STAND  HANGING STRAWBERY}  => {CERAMIC STRAWBERRY DESIGN MUG}         0.125          1    0.125    8     3
## [12] {CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       RECIPE BOX PANTRY YELLOW DESIGN}      => {BLACK KITCHEN SCALES}                  0.125          1    0.125    8     3
## [13] {CERAMIC BOWL WITH STRAWBERRY DESIGN,                                                                                 
##       LARGE HEART MEASURING SPOONS}         => {CERAMIC STRAWBERRY CAKE MONEY BANK}    0.125          1    0.125    8     3
## [14] {CERAMIC BOWL WITH STRAWBERRY DESIGN,                                                                                 
##       CERAMIC STRAWBERRY CAKE MONEY BANK}   => {LARGE HEART MEASURING SPOONS}          0.125          1    0.125    8     3
## [15] {CERAMIC STRAWBERRY CAKE MONEY BANK,                                                                                  
##       LARGE HEART MEASURING SPOONS}         => {CERAMIC BOWL WITH STRAWBERRY DESIGN}   0.125          1    0.125    8     3
## [16] {CERAMIC BOWL WITH STRAWBERRY DESIGN,                                                                                 
##       CERAMIC CAKE BOWL + HANGING CAKES}    => {LARGE HEART MEASURING SPOONS}          0.125          1    0.125    8     3
## [17] {CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       LARGE HEART MEASURING SPOONS}         => {CERAMIC BOWL WITH STRAWBERRY DESIGN}   0.125          1    0.125    8     3
## [18] {CERAMIC BOWL WITH STRAWBERRY DESIGN,                                                                                 
##       CERAMIC CAKE BOWL + HANGING CAKES}    => {CERAMIC STRAWBERRY CAKE MONEY BANK}    0.125          1    0.125    8     3
## [19] {CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       CERAMIC STRAWBERRY CAKE MONEY BANK}   => {CERAMIC BOWL WITH STRAWBERRY DESIGN}   0.125          1    0.125    8     3
## [20] {CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       LARGE HEART MEASURING SPOONS}         => {CERAMIC STRAWBERRY CAKE MONEY BANK}    0.125          1    0.125    8     3
## [21] {CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       CERAMIC STRAWBERRY CAKE MONEY BANK}   => {LARGE HEART MEASURING SPOONS}          0.125          1    0.125    8     3
## [22] {CERAMIC CAKE STAND + HANGING CAKES,                                                                                  
##       LARGE CAKE STAND  HANGING STRAWBERY}  => {DOORMAT HOME SWEET HOME BLUE }         0.125          1    0.125    6     3
## [23] {DOORMAT HOME SWEET HOME BLUE ,                                                                                       
##       LARGE CAKE STAND  HANGING STRAWBERY}  => {CERAMIC CAKE STAND + HANGING CAKES}    0.125          1    0.125    8     3
## [24] {CERAMIC CAKE STAND + HANGING CAKES,                                                                                  
##       PANTRY WASHING UP BRUSH}              => {DOORMAT HOME SWEET HOME BLUE }         0.125          1    0.125    6     3
## [25] {DOORMAT HOME SWEET HOME BLUE ,                                                                                       
##       PANTRY WASHING UP BRUSH}              => {CERAMIC CAKE STAND + HANGING CAKES}    0.125          1    0.125    8     3
## [26] {LARGE CAKE STAND  HANGING STRAWBERY,                                                                                 
##       PANTRY WASHING UP BRUSH}              => {CERAMIC CAKE STAND + HANGING CAKES}    0.125          1    0.125    8     3
## [27] {LARGE CAKE STAND  HANGING STRAWBERY,                                                                                 
##       PANTRY WASHING UP BRUSH}              => {DOORMAT HOME SWEET HOME BLUE }         0.125          1    0.125    6     3
## [28] {CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       CERAMIC CAKE DESIGN SPOTTED MUG,                                                                                     
##       LARGE CAKE STAND  HANGING STRAWBERY}  => {CERAMIC STRAWBERRY DESIGN MUG}         0.125          1    0.125    8     3
## [29] {CERAMIC BOWL WITH STRAWBERRY DESIGN,                                                                                 
##       CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       LARGE HEART MEASURING SPOONS}         => {CERAMIC STRAWBERRY CAKE MONEY BANK}    0.125          1    0.125    8     3
## [30] {CERAMIC BOWL WITH STRAWBERRY DESIGN,                                                                                 
##       CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       CERAMIC STRAWBERRY CAKE MONEY BANK}   => {LARGE HEART MEASURING SPOONS}          0.125          1    0.125    8     3
## [31] {CERAMIC CAKE BOWL + HANGING CAKES,                                                                                   
##       CERAMIC STRAWBERRY CAKE MONEY BANK,                                                                                  
##       LARGE HEART MEASURING SPOONS}         => {CERAMIC BOWL WITH STRAWBERRY DESIGN}   0.125          1    0.125    8     3
## [32] {CERAMIC CAKE STAND + HANGING CAKES,                                                                                  
##       LARGE CAKE STAND  HANGING STRAWBERY,                                                                                 
##       PANTRY WASHING UP BRUSH}              => {DOORMAT HOME SWEET HOME BLUE }         0.125          1    0.125    6     3
## [33] {DOORMAT HOME SWEET HOME BLUE ,                                                                                       
##       LARGE CAKE STAND  HANGING STRAWBERY,                                                                                 
##       PANTRY WASHING UP BRUSH}              => {CERAMIC CAKE STAND + HANGING CAKES}    0.125          1    0.125    8     3

Because Poland produces too many rules, I keep only those with a lift greater than 5.

France

rules_FR <- apriori(trans_FR, parameter = list(support = 0.08, confidence = 0.9, target = "rules"))
inspect(rules_FR)
## Warning in asMethod(object): removing duplicated items in transactions
##     lhs                                       rhs                                support confidence   coverage     lift count
## [1] {SET/6 RED SPOTTY PAPER PLATES}        => {SET/6 RED SPOTTY PAPER CUPS}   0.10859729      0.960 0.11312217 7.857778    48
## [2] {SET/20 RED RETROSPOT PAPER NAPKINS ,                                                                                    
##      SET/6 RED SPOTTY PAPER PLATES}        => {SET/6 RED SPOTTY PAPER CUPS}   0.08823529      0.975 0.09049774 7.980556    39
## [3] {SET/20 RED RETROSPOT PAPER NAPKINS ,                                                                                    
##      SET/6 RED SPOTTY PAPER CUPS}          => {SET/6 RED SPOTTY PAPER PLATES} 0.08823529      0.975 0.09049774 8.619000    39

Conclusion

United Kingdom

The rule suggests that customers who purchase “ROSES REGENCY TEACUP AND SAUCER” are highly likely (71.09%) to also purchase “GREEN REGENCY TEACUP AND SAUCER.” The support of 3.28% indicates that this association occurs in approximately 3.28% of transactions. The lift value of 15.93 indicates a strong positive correlation between these items. Lift greater than 1 suggests that the items are positively correlated, and in this case, it’s almost 16 times more likely that both items are purchased together compared to what would be expected if they were independent.
Similarly, customers who purchase “GREEN REGENCY TEACUP AND SAUCER” are highly likely (73.51%) to also purchase “ROSES REGENCY TEACUP AND SAUCER.”

Germany

The rule indicates that customers in Germany who purchase the “RED RETROSPOT CHARLOTTE BAG” are highly likely (84.38%) to also purchase the “WOODLAND CHARLOTTE BAG.” The support of 4.67% suggests that this association occurs in approximately 4.67% of transactions. The lift value of 8.27 indicates a strong positive correlation between these items. A lift greater than 1 suggests that the items are positively correlated, and in this case, it’s over 8 times more likely that both items are purchased together compared to what would be expected if they were independent.

Poland

Because Poland generated many rules, I discuss only the first three. Each of them rests on a count of just 3 transactions, so they should be read with caution.
Rule 1: The rule links “FRENCH PAISLEY CUSHION COVER” to “FRENCH PAISLEY CUSHION COVER ” with 100% confidence. The two labels differ only by a trailing space in the Description column, so this rule most likely relates a product to itself and reflects inconsistent whitespace in the data, not a real purchasing pattern. The support is 12.5%, and the lift is 6.

Rule 2: Customers in Poland who purchase the “HEART OF WICKER SMALL” are highly likely (100%) to also purchase the “FRENCH PAISLEY CUSHION COVER.” The support is 12.5%, and the lift is 6.

Rule 3: Customers in Poland who purchase the “CERAMIC BOWL WITH STRAWBERRY DESIGN” are highly likely (100%) to also purchase the “LARGE HEART MEASURING SPOONS.” The support is 12.5%, and the lift is 8.

France

Rule 1:
The association rule for France suggests that customers who purchase the “SET/6 RED SPOTTY PAPER PLATES” are highly likely (96%) to also purchase the “SET/6 RED SPOTTY PAPER CUPS.” The support of 10.86% indicates that this association occurs in approximately 10.86% of transactions. The lift value of 7.86 suggests a strong positive correlation between these items. A lift greater than 1 indicates that the items are positively correlated, and in this case, it’s almost 8 times more likely that both items are purchased together compared to what would be expected if they were independent.

Rule 2:
Customers in France who purchase the “SET/20 RED RETROSPOT PAPER NAPKINS” and “SET/6 RED SPOTTY PAPER PLATES” are highly likely (97.5%) to also purchase the “SET/6 RED SPOTTY PAPER CUPS.” The support is 8.82%, and the lift is 7.98.

Rule 3:
Similarly, customers in France who purchase the “SET/20 RED RETROSPOT PAPER NAPKINS” and “SET/6 RED SPOTTY PAPER CUPS” are highly likely (97.5%) to also purchase the “SET/6 RED SPOTTY PAPER PLATES.” The support is 8.82%, and the lift is 8.62.