This study uses advanced data mining and analytics tools to examine the dynamics of product transactions over a 52-week period. Finding significant relationships between products is the main goal, providing a detailed picture of commonly co-purchased goods in a particular week. Utilizing PAM clustering for pattern recognition and the Apriori algorithm for association rule mining, the analysis offers a thorough understanding of customer behavior and product interactions.
Strong association rules are demonstrated by the outcomes, which also highlight important connections and transaction patterns. The study visually illustrates the importance and strength of these rules using visualization techniques such as item frequency charts and matrix plots. Furthermore, PAM clustering helps with strategic decision-making by enhancing the analysis and exposing distinct sales patterns.
Significant findings provide companies with useful information that can be used to improve consumer engagement, inventory control, and marketing tactics. Businesses can improve their operations and overall market responsiveness by identifying weekly patterns and aligning with established product associations.
To sum up, this study adds to the expanding body of knowledge in retail analytics by offering a data-driven method for figuring out what customers want and how they behave. It is a useful tool for companies navigating the difficulties of the contemporary retail environment, helping them make well-informed decisions and achieve long-term market success.
The key elements of 52 weeks’ worth of sales transactions involving 819 different products are captured in this dataset. The records demonstrate how product purchases have changed over time, exhibiting patterns and trends in consumer behavior over a 12-month period. This dataset offers a rich environment for revealing obscure insights into customer preferences and the dynamic nature of product sales in a business setting because it includes both raw and normalized sales data.1
Dataset creator: James Tan
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
DOI: 10.24432/C5XS4Q
Understanding customer behavior and identifying trends in purchase decisions are essential for organizations looking to stay ahead in the dynamic and competitive market landscape of today. Large datasets are readily available, offering a rare chance to explore the complexities of customer preferences and uncover insightful information that might inform strategic choices.
This dataset, an extensive compilation of transactional records, is a veritable gold mine of data that captures the subtleties of consumer interactions with a wide range of items. These records, which are a snapshot in time, provide insight into how the market is changing. Analysis of these transactions is essential as companies manage the difficulties of satisfying customer requests, adjusting to trends, and improving product offers.
As I delve deeper into this dataset, the goal becomes apparent: to learn more about the products that customers purchase and how those products are connected within the intricate network of choices available to them. The analysis’s conclusions may serve as a basis for product bundling, inventory control, and marketing strategy, all of which could lead to a more knowledgeable and responsive company environment.
The purpose of this study is to identify product co-purchasing tendencies within the dataset. We may learn a great deal about consumer preferences by identifying the goods that are typically purchased in tandem over a given week. This information can then be used to guide strategic decisions about product placement, marketing, and inventory control.
Identifying products that are regularly co-purchased makes it possible to improve the overall consumer experience by offering complementary items together.
Finding connections between products offers a chance to improve marketing tactics by recommending products to specific clients.
Understanding co-purchasing trends can help with effective inventory control by guaranteeing that linked products are suitably stocked.
I will use association rule mining, and the Apriori algorithm in particular. This algorithm works well on transaction data for identifying frequent itemsets and generating association rules.
A set of association rules highlighting products that are commonly bought together is the anticipated result. An antecedent (items already in the basket) and a consequent (item suggested to be added to the basket) will make up each rule.
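To make the rule format concrete, here is a minimal, hypothetical sketch (not the analysis itself) of how the arules package expresses such rules; the example baskets and the support/confidence thresholds are illustrative assumptions only.
# Minimal sketch: Apriori on a few hypothetical weekly baskets
library(arules)
example_baskets <- list(
  c("P1", "P3", "P5"),
  c("P1", "P3"),
  c("P3", "P5"),
  c("P1", "P5"),
  c("P1", "P3", "P5")
)
example_trans <- as(example_baskets, "transactions")
# Illustrative thresholds; the real analysis would tune these
example_rules <- apriori(example_trans, parameter = list(supp = 0.4, conf = 0.6))
# Each rule prints as {antecedent} => {consequent} with its support, confidence and lift
inspect(example_rules)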
1. Business Strategy:
The outcomes can help with cross-selling, product grouping, and advertising campaign strategy considerations.
Businesses can improve client involvement through targeted promotions and discounts by analyzing co-purchasing behaviors.
By revealing hidden relationships between products, this study aims to illuminate the subtleties of consumer behavior. The results could lead to significant gains in operational effectiveness, business profitability, and customer satisfaction.
Changing the language to English
Sys.setlocale("LC_ALL","English")
## Warning in Sys.setlocale("LC_ALL", "English"): using locale code page other
## than 65001 ("UTF-8") may cause problems
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Sys.setenv(LANGUAGE='en')
Installing the Packages
# Set the CRAN mirror
options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("readr")
install.packages("stats")
install.packages("factoextra")
install.packages("flexclust")
install.packages("fpc")
install.packages("clustertend")
install.packages("cluster")
install.packages("ClusterR")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("hopkins")
install.packages("NbClust")
install.packages("tidyverse")
install.packages("dendextend")
install.packages("Rtsne")
install.packages("gridExtra")
install.packages("caret")
install.packages("pheatmap")
install.packages("FactoMineR")
install.packages("vioplot")
install.packages("stats")
install.packages("arules")
install.packages("arulesViz")
install.packages("plot3D")
install.packages("dbscan")
Activating the packages with the library() function
library(readr)
library(stats)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(flexclust)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: modeltools
## Loading required package: stats4
library(grid)
library(lattice)
library(modeltools)
library(stats4)
library(hopkins)
library(fpc)
library(clustertend)
## Package `clustertend` is deprecated. Use package `hopkins` instead.
##
## Attaching package: 'clustertend'
## The following object is masked from 'package:hopkins':
##
## hopkins
library(cluster)
library(ClusterR)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(NbClust)
library(tidyverse)
## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v forcats 1.0.0 v tibble 3.2.1
## v purrr 1.0.2 v tidyr 1.3.0
## v stringr 1.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dendextend)
##
## ---------------------
## Welcome to dendextend version 1.17.1
## Type citation('dendextend') for how to cite the package.
##
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
##
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags:
## https://stackoverflow.com/questions/tagged/dendextend
##
## To suppress this message use: suppressPackageStartupMessages(library(dendextend))
## ---------------------
##
##
## Attaching package: 'dendextend'
##
## The following object is masked from 'package:stats':
##
## cutree
library(Rtsne)
library(gridExtra)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(caret)
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(pheatmap)
library(FactoMineR)
library(vioplot)
## Loading required package: sm
## Package 'sm', version 2.2-5.7: type help(sm) for summary information
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(stats)
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
##
## Attaching package: 'arules'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:flexclust':
##
## info
##
## The following object is masked from 'package:modeltools':
##
## info
##
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
library("plot3D")
library(dbscan)
##
## Attaching package: 'dbscan'
##
## The following object is masked from 'package:fpc':
##
## dbscan
##
## The following object is masked from 'package:stats':
##
## as.dendrogram
dataset <- read.csv("C:/Users/User/Desktop/UL Research 3 - Association Rule/Sales_Transactions_Dataset_Weekly.csv")
A brief overview of the dataset
head(dataset)
## Product_Code W0 W1 W2 W3 W4 W5 W6 W7 W8 W9 W10 W11 W12 W13 W14 W15 W16 W17
## 1 P1 11 12 10 8 13 12 14 21 6 14 11 14 16 9 9 9 14 9
## 2 P2 7 6 3 2 7 1 6 3 3 3 2 2 6 2 0 6 2 7
## 3 P3 7 11 8 9 10 8 7 13 12 6 14 9 4 7 12 8 7 11
## 4 P4 12 8 13 5 9 6 9 13 13 11 8 4 5 4 15 7 11 9
## 5 P5 8 5 13 11 6 7 9 14 9 9 11 18 8 4 13 8 10 15
## 6 P6 3 3 2 7 6 3 8 6 6 3 1 1 5 4 3 5 3 5
## W18 W19 W20 W21 W22 W23 W24 W25 W26 W27 W28 W29 W30 W31 W32 W33 W34 W35 W36
## 1 3 12 5 11 7 12 5 9 7 10 5 11 7 10 12 6 5 14 10
## 2 7 9 4 7 2 4 5 3 5 8 5 5 3 1 3 2 3 10 5
## 3 10 7 7 13 11 8 10 8 14 5 3 13 11 9 7 8 7 9 6
## 4 15 4 6 7 11 7 9 6 10 10 2 6 7 2 5 12 5 19 8
## 5 6 13 11 6 10 9 8 12 8 9 13 3 5 3 5 5 9 7 4
## 6 10 8 4 9 7 5 4 2 1 3 2 4 0 3 2 11 2 1 4
## W37 W38 W39 W40 W41 W42 W43 W44 W45 W46 W47 W48 W49 W50 W51 MIN MAX
## 1 9 12 17 7 11 4 7 8 10 12 3 7 6 5 10 3 21
## 2 2 7 3 2 5 2 4 5 1 1 4 5 1 6 0 0 10
## 3 12 12 9 3 5 6 14 5 5 7 8 14 8 8 7 3 14
## 4 6 8 8 12 6 9 10 3 4 6 8 14 8 7 8 2 19
## 5 8 8 5 5 8 7 11 7 12 6 6 5 11 8 9 3 18
## 6 4 3 2 5 4 4 2 4 3 6 5 3 3 10 6 0 11
## Normalized.0 Normalized.1 Normalized.2 Normalized.3 Normalized.4 Normalized.5
## 1 0.44 0.50 0.39 0.28 0.56 0.50
## 2 0.70 0.60 0.30 0.20 0.70 0.10
## 3 0.36 0.73 0.45 0.55 0.64 0.45
## 4 0.59 0.35 0.65 0.18 0.41 0.24
## 5 0.33 0.13 0.67 0.53 0.20 0.27
## 6 0.27 0.27 0.18 0.64 0.55 0.27
## Normalized.6 Normalized.7 Normalized.8 Normalized.9 Normalized.10
## 1 0.61 1.00 0.17 0.61 0.44
## 2 0.60 0.30 0.30 0.30 0.20
## 3 0.36 0.91 0.82 0.27 1.00
## 4 0.41 0.65 0.65 0.53 0.35
## 5 0.40 0.73 0.40 0.40 0.53
## 6 0.73 0.55 0.55 0.27 0.09
## Normalized.11 Normalized.12 Normalized.13 Normalized.14 Normalized.15
## 1 0.61 0.72 0.33 0.33 0.33
## 2 0.20 0.60 0.20 0.00 0.60
## 3 0.55 0.09 0.36 0.82 0.45
## 4 0.12 0.18 0.12 0.76 0.29
## 5 1.00 0.33 0.07 0.67 0.33
## 6 0.09 0.45 0.36 0.27 0.45
## Normalized.16 Normalized.17 Normalized.18 Normalized.19 Normalized.20
## 1 0.61 0.33 0.00 0.50 0.11
## 2 0.20 0.70 0.70 0.90 0.40
## 3 0.36 0.73 0.64 0.36 0.36
## 4 0.53 0.41 0.76 0.12 0.24
## 5 0.47 0.80 0.20 0.67 0.53
## 6 0.27 0.45 0.91 0.73 0.36
## Normalized.21 Normalized.22 Normalized.23 Normalized.24 Normalized.25
## 1 0.44 0.22 0.50 0.11 0.33
## 2 0.70 0.20 0.40 0.50 0.30
## 3 0.91 0.73 0.45 0.64 0.45
## 4 0.29 0.53 0.29 0.41 0.24
## 5 0.20 0.47 0.40 0.33 0.60
## 6 0.82 0.64 0.45 0.36 0.18
## Normalized.26 Normalized.27 Normalized.28 Normalized.29 Normalized.30
## 1 0.22 0.39 0.11 0.44 0.22
## 2 0.50 0.80 0.50 0.50 0.30
## 3 1.00 0.18 0.00 0.91 0.73
## 4 0.47 0.47 0.00 0.24 0.29
## 5 0.33 0.40 0.67 0.00 0.13
## 6 0.09 0.27 0.18 0.36 0.00
## Normalized.31 Normalized.32 Normalized.33 Normalized.34 Normalized.35
## 1 0.39 0.50 0.17 0.11 0.61
## 2 0.10 0.30 0.20 0.30 1.00
## 3 0.55 0.36 0.45 0.36 0.55
## 4 0.00 0.18 0.59 0.18 1.00
## 5 0.00 0.13 0.13 0.40 0.27
## 6 0.27 0.18 1.00 0.18 0.09
## Normalized.36 Normalized.37 Normalized.38 Normalized.39 Normalized.40
## 1 0.39 0.33 0.50 0.78 0.22
## 2 0.50 0.20 0.70 0.30 0.20
## 3 0.27 0.82 0.82 0.55 0.00
## 4 0.35 0.24 0.35 0.35 0.59
## 5 0.07 0.33 0.33 0.13 0.13
## 6 0.36 0.36 0.27 0.18 0.45
## Normalized.41 Normalized.42 Normalized.43 Normalized.44 Normalized.45
## 1 0.44 0.06 0.22 0.28 0.39
## 2 0.50 0.20 0.40 0.50 0.10
## 3 0.18 0.27 1.00 0.18 0.18
## 4 0.24 0.41 0.47 0.06 0.12
## 5 0.33 0.27 0.53 0.27 0.60
## 6 0.36 0.36 0.18 0.36 0.27
## Normalized.46 Normalized.47 Normalized.48 Normalized.49 Normalized.50
## 1 0.50 0.00 0.22 0.17 0.11
## 2 0.10 0.40 0.50 0.10 0.60
## 3 0.36 0.45 1.00 0.45 0.45
## 4 0.24 0.35 0.71 0.35 0.29
## 5 0.20 0.20 0.13 0.53 0.33
## 6 0.55 0.45 0.27 0.27 0.91
## Normalized.51
## 1 0.39
## 2 0.00
## 3 0.36
## 4 0.35
## 5 0.40
## 6 0.55
The number of rows and columns
# The number of rows
nrow(dataset)
## [1] 811
# The number of columns
ncol(dataset)
## [1] 107
This dataset contains 811 rows and 107 columns.
The columns are divided into 4 parts.
1. The first column is Product_Code (P1 to P819): this column contains the unique code or identifier for each product.
2. The columns from 2 to 53 are the Weekly Purchase Quantities (W0 to W51): these columns represent the weekly quantities of each product sold over the 52 weeks.
3. Columns 54 and 55 are the MIN and MAX columns: these show the minimum and maximum number of units sold for each product.
4. The columns from 56 to 107 are the Normalized Weekly Quantities (Normalized.0 to Normalized.51): these columns contain the normalized versions of the weekly quantities. Normalization is a process of scaling data to a standard range, often between 0 and 1.
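The exact normalization procedure is not documented in the dataset, but the stored values are consistent with per-product min–max scaling of the weekly quantities using the MIN and MAX columns. A minimal sketch, under that assumption:
# Sketch: reproduce the Normalized.* columns assuming (W - MIN) / (MAX - MIN) per product
weekly <- dataset[, 2:53]                                          # W0 ... W51
rescaled <- (weekly - dataset$MIN) / (dataset$MAX - dataset$MIN)   # row-wise rescaling
# Products with MAX equal to MIN would produce NaN and need special handling
round(rescaled[1, 1:3], 2)                                         # e.g. (11 - 3) / (21 - 3) = 0.44 for P1, W0
dataset[1, c("Normalized.0", "Normalized.1", "Normalized.2")]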
A strategic division of the data has been defined in order to perform an extensive study of the sales transactions dataset. This division addresses specific aspects of the information and enables a more targeted investigation of patterns and relationships.
The division of the Data set
# primary_data: Weeks, Product Codes, Min, Max
primary_data <- dataset[, c(1, 2:53, 54:55)]
# normalized_data: Product Codes and Normalized Data
normalized_data <- dataset[, c(1, 56:107)]
# Display the dimensions of the subsets
cat("First Part Dimensions:", dim(primary_data), "\n")
## First Part Dimensions: 811 55
cat("Second Part Dimensions:", dim(normalized_data), "\n")
## Second Part Dimensions: 811 53
The view of primary data
head(primary_data)
## Product_Code W0 W1 W2 W3 W4 W5 W6 W7 W8 W9 W10 W11 W12 W13 W14 W15 W16 W17
## 1 P1 11 12 10 8 13 12 14 21 6 14 11 14 16 9 9 9 14 9
## 2 P2 7 6 3 2 7 1 6 3 3 3 2 2 6 2 0 6 2 7
## 3 P3 7 11 8 9 10 8 7 13 12 6 14 9 4 7 12 8 7 11
## 4 P4 12 8 13 5 9 6 9 13 13 11 8 4 5 4 15 7 11 9
## 5 P5 8 5 13 11 6 7 9 14 9 9 11 18 8 4 13 8 10 15
## 6 P6 3 3 2 7 6 3 8 6 6 3 1 1 5 4 3 5 3 5
## W18 W19 W20 W21 W22 W23 W24 W25 W26 W27 W28 W29 W30 W31 W32 W33 W34 W35 W36
## 1 3 12 5 11 7 12 5 9 7 10 5 11 7 10 12 6 5 14 10
## 2 7 9 4 7 2 4 5 3 5 8 5 5 3 1 3 2 3 10 5
## 3 10 7 7 13 11 8 10 8 14 5 3 13 11 9 7 8 7 9 6
## 4 15 4 6 7 11 7 9 6 10 10 2 6 7 2 5 12 5 19 8
## 5 6 13 11 6 10 9 8 12 8 9 13 3 5 3 5 5 9 7 4
## 6 10 8 4 9 7 5 4 2 1 3 2 4 0 3 2 11 2 1 4
## W37 W38 W39 W40 W41 W42 W43 W44 W45 W46 W47 W48 W49 W50 W51 MIN MAX
## 1 9 12 17 7 11 4 7 8 10 12 3 7 6 5 10 3 21
## 2 2 7 3 2 5 2 4 5 1 1 4 5 1 6 0 0 10
## 3 12 12 9 3 5 6 14 5 5 7 8 14 8 8 7 3 14
## 4 6 8 8 12 6 9 10 3 4 6 8 14 8 7 8 2 19
## 5 8 8 5 5 8 7 11 7 12 6 6 5 11 8 9 3 18
## 6 4 3 2 5 4 4 2 4 3 6 5 3 3 10 6 0 11
1. The primary data subset consists of: Product Codes (P1 through P819), the unique product IDs; Weekly Purchase Quantities (W0–W51), columns showing how many units of each product were sold in each of the 52 weeks; and the MIN and MAX columns, showing the minimum and maximum quantities sold for each individual product.
The purpose of this subset is to investigate sales patterns over time, spot trends, and understand how product sales fluctuate from week to week.
The view of normalized_data
head(normalized_data)
## Product_Code Normalized.0 Normalized.1 Normalized.2 Normalized.3 Normalized.4
## 1 P1 0.44 0.50 0.39 0.28 0.56
## 2 P2 0.70 0.60 0.30 0.20 0.70
## 3 P3 0.36 0.73 0.45 0.55 0.64
## 4 P4 0.59 0.35 0.65 0.18 0.41
## 5 P5 0.33 0.13 0.67 0.53 0.20
## 6 P6 0.27 0.27 0.18 0.64 0.55
## Normalized.5 Normalized.6 Normalized.7 Normalized.8 Normalized.9
## 1 0.50 0.61 1.00 0.17 0.61
## 2 0.10 0.60 0.30 0.30 0.30
## 3 0.45 0.36 0.91 0.82 0.27
## 4 0.24 0.41 0.65 0.65 0.53
## 5 0.27 0.40 0.73 0.40 0.40
## 6 0.27 0.73 0.55 0.55 0.27
## Normalized.10 Normalized.11 Normalized.12 Normalized.13 Normalized.14
## 1 0.44 0.61 0.72 0.33 0.33
## 2 0.20 0.20 0.60 0.20 0.00
## 3 1.00 0.55 0.09 0.36 0.82
## 4 0.35 0.12 0.18 0.12 0.76
## 5 0.53 1.00 0.33 0.07 0.67
## 6 0.09 0.09 0.45 0.36 0.27
## Normalized.15 Normalized.16 Normalized.17 Normalized.18 Normalized.19
## 1 0.33 0.61 0.33 0.00 0.50
## 2 0.60 0.20 0.70 0.70 0.90
## 3 0.45 0.36 0.73 0.64 0.36
## 4 0.29 0.53 0.41 0.76 0.12
## 5 0.33 0.47 0.80 0.20 0.67
## 6 0.45 0.27 0.45 0.91 0.73
## Normalized.20 Normalized.21 Normalized.22 Normalized.23 Normalized.24
## 1 0.11 0.44 0.22 0.50 0.11
## 2 0.40 0.70 0.20 0.40 0.50
## 3 0.36 0.91 0.73 0.45 0.64
## 4 0.24 0.29 0.53 0.29 0.41
## 5 0.53 0.20 0.47 0.40 0.33
## 6 0.36 0.82 0.64 0.45 0.36
## Normalized.25 Normalized.26 Normalized.27 Normalized.28 Normalized.29
## 1 0.33 0.22 0.39 0.11 0.44
## 2 0.30 0.50 0.80 0.50 0.50
## 3 0.45 1.00 0.18 0.00 0.91
## 4 0.24 0.47 0.47 0.00 0.24
## 5 0.60 0.33 0.40 0.67 0.00
## 6 0.18 0.09 0.27 0.18 0.36
## Normalized.30 Normalized.31 Normalized.32 Normalized.33 Normalized.34
## 1 0.22 0.39 0.50 0.17 0.11
## 2 0.30 0.10 0.30 0.20 0.30
## 3 0.73 0.55 0.36 0.45 0.36
## 4 0.29 0.00 0.18 0.59 0.18
## 5 0.13 0.00 0.13 0.13 0.40
## 6 0.00 0.27 0.18 1.00 0.18
## Normalized.35 Normalized.36 Normalized.37 Normalized.38 Normalized.39
## 1 0.61 0.39 0.33 0.50 0.78
## 2 1.00 0.50 0.20 0.70 0.30
## 3 0.55 0.27 0.82 0.82 0.55
## 4 1.00 0.35 0.24 0.35 0.35
## 5 0.27 0.07 0.33 0.33 0.13
## 6 0.09 0.36 0.36 0.27 0.18
## Normalized.40 Normalized.41 Normalized.42 Normalized.43 Normalized.44
## 1 0.22 0.44 0.06 0.22 0.28
## 2 0.20 0.50 0.20 0.40 0.50
## 3 0.00 0.18 0.27 1.00 0.18
## 4 0.59 0.24 0.41 0.47 0.06
## 5 0.13 0.33 0.27 0.53 0.27
## 6 0.45 0.36 0.36 0.18 0.36
## Normalized.45 Normalized.46 Normalized.47 Normalized.48 Normalized.49
## 1 0.39 0.50 0.00 0.22 0.17
## 2 0.10 0.10 0.40 0.50 0.10
## 3 0.18 0.36 0.45 1.00 0.45
## 4 0.12 0.24 0.35 0.71 0.35
## 5 0.60 0.20 0.20 0.13 0.53
## 6 0.27 0.55 0.45 0.27 0.27
## Normalized.50 Normalized.51
## 1 0.11 0.39
## 2 0.60 0.00
## 3 0.45 0.36
## 4 0.29 0.35
## 5 0.33 0.40
## 6 0.91 0.55
This subset helps identify patterns that are not impacted by different scales and is especially useful for examining correlations across items using normalized sales data. It also offers insights into relative performance.
The separation of the data into these two subsets makes the analysis simpler and more efficient. Subset 2 makes it easier to explore links and patterns using normalized data, whereas Subset 1 supports the analysis of sales trends and fluctuations over time. By improving the accuracy and applicability of the analysis, this strategic division ultimately leads to a more thorough comprehension of the underlying dynamics of the sales transactions dataset.
# Summary of primary data
summary(primary_data)
## Product_Code W0 W1 W2
## Length:811 Min. : 0.000 Min. : 0.000 Min. : 0.00
## Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Mode :character Median : 3.000 Median : 3.000 Median : 3.00
## Mean : 8.903 Mean : 9.129 Mean : 9.39
## 3rd Qu.:12.000 3rd Qu.:12.000 3rd Qu.:12.00
## Max. :54.000 Max. :53.000 Max. :56.00
## W3 W4 W5 W6
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 4.000 Median : 4.000 Median : 3.000 Median : 4.00
## Mean : 9.718 Mean : 9.575 Mean : 9.466 Mean : 9.72
## 3rd Qu.:13.000 3rd Qu.:13.000 3rd Qu.:12.500 3rd Qu.:13.00
## Max. :59.000 Max. :61.000 Max. :52.000 Max. :56.00
## W7 W8 W9 W10
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 4.000 Median : 4.000 Median : 4.000 Median : 4.00
## Mean : 9.586 Mean : 9.784 Mean : 9.682 Mean : 9.79
## 3rd Qu.:12.500 3rd Qu.:13.000 3rd Qu.:13.000 3rd Qu.:13.00
## Max. :62.000 Max. :63.000 Max. :52.000 Max. :56.00
## W11 W12 W13 W14
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 4.000 Median : 3.000 Median : 4.000 Median : 4.000
## Mean : 9.678 Mean : 9.827 Mean : 9.687 Mean : 9.908
## 3rd Qu.:13.000 3rd Qu.:13.000 3rd Qu.:13.000 3rd Qu.:13.000
## Max. :57.000 Max. :61.000 Max. :55.000 Max. :57.000
## W15 W16 W17 W18
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 4.00 Median : 4.00 Median : 4.000 Median : 4.00
## Mean :10.05 Mean :10.03 Mean : 9.905 Mean :10.01
## 3rd Qu.:14.00 3rd Qu.:13.00 3rd Qu.:13.000 3rd Qu.:13.00
## Max. :59.00 Max. :62.00 Max. :67.000 Max. :57.00
## W19 W20 W21 W22
## Min. : 0.000 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 4.000 Median : 4.00 Median : 4.00 Median : 4.000
## Mean : 9.645 Mean : 9.85 Mean : 9.71 Mean : 9.903
## 3rd Qu.:13.000 3rd Qu.:13.00 3rd Qu.:13.00 3rd Qu.:13.000
## Max. :56.000 Max. :64.00 Max. :58.00 Max. :51.000
## W23 W24 W25 W26
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 4.000 Median : 5.00 Median : 5.000 Median : 3.000
## Mean : 9.862 Mean :10.17 Mean : 8.893 Mean : 6.951
## 3rd Qu.:14.000 3rd Qu.:16.00 3rd Qu.:15.000 3rd Qu.: 9.000
## Max. :72.000 Max. :64.00 Max. :64.000 Max. :46.000
## W27 W28 W29 W30
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 3.000 Median : 3.000 Median : 3.000 Median : 3.000
## Mean : 7.194 Mean : 7.383 Mean : 7.339 Mean : 7.608
## 3rd Qu.:10.000 3rd Qu.: 9.000 3rd Qu.:10.000 3rd Qu.:10.000
## Max. :47.000 Max. :44.000 Max. :42.000 Max. :48.000
## W31 W32 W33 W34
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 3.00 Median : 3.00 Median : 3.000 Median : 3.000
## Mean : 7.61 Mean : 7.76 Mean : 7.906 Mean : 7.993
## 3rd Qu.:10.00 3rd Qu.:10.00 3rd Qu.:10.000 3rd Qu.:10.500
## Max. :47.00 Max. :49.00 Max. :46.000 Max. :46.000
## W35 W36 W37 W38
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 3.000 Median : 3.000 Median : 3.000 Median : 3.000
## Mean : 7.998 Mean : 8.015 Mean : 8.074 Mean : 8.252
## 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.:11.000 3rd Qu.:11.000
## Max. :46.000 Max. :55.000 Max. :47.000 Max. :52.000
## W39 W40 W41 W42
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 3.000 Median : 4.000 Median : 3.00 Median : 4.000
## Mean : 7.965 Mean : 8.182 Mean : 8.24 Mean : 8.395
## 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.:11.00 3rd Qu.:10.000
## Max. :47.000 Max. :48.000 Max. :50.00 Max. :52.000
## W43 W44 W45 W46
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 1.00
## Median : 4.000 Median : 4.000 Median : 4.000 Median : 4.00
## Mean : 8.318 Mean : 8.434 Mean : 8.556 Mean : 8.72
## 3rd Qu.:11.000 3rd Qu.:11.000 3rd Qu.:11.000 3rd Qu.:11.00
## Max. :50.000 Max. :46.000 Max. :46.000 Max. :55.00
## W47 W48 W49 W50
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 4.000 Median : 4.000 Median : 4.000 Median : 5.000
## Mean : 8.671 Mean : 8.674 Mean : 8.895 Mean : 8.862
## 3rd Qu.:12.000 3rd Qu.:12.000 3rd Qu.:12.000 3rd Qu.:13.000
## Max. :49.000 Max. :50.000 Max. :52.000 Max. :57.000
## W51 MIN MAX
## Min. : 0.000 Min. : 0.000 Min. : 1.00
## 1st Qu.: 1.000 1st Qu.: 0.000 1st Qu.: 3.00
## Median : 5.000 Median : 0.000 Median : 9.00
## Mean : 8.889 Mean : 3.781 Mean :16.31
## 3rd Qu.:14.000 3rd Qu.: 4.000 3rd Qu.:21.00
## Max. :73.000 Max. :24.000 Max. :73.00
Across the 52 weeks, the average weekly sales per product fall roughly between 7 and 10 units. The largest quantity sold in any single week is 73 units, and even the lowest-performing product sold at least 1 unit in its best week (the minimum of the MAX column is 1).
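As a quick check on this reading of the summary, a short sketch of how the weekly averages and the extremes can be verified directly:
# Range of the average weekly sales (W0-W51) across all products
range(colMeans(primary_data[, 2:53]))
# Largest single-week quantity and smallest per-product maximum
max(primary_data$MAX)
min(primary_data$MAX)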
sum(is.na(dataset))
## [1] 0
colSums(is.na(dataset))
## Product_Code W0 W1 W2 W3
## 0 0 0 0 0
## W4 W5 W6 W7 W8
## 0 0 0 0 0
## W9 W10 W11 W12 W13
## 0 0 0 0 0
## W14 W15 W16 W17 W18
## 0 0 0 0 0
## W19 W20 W21 W22 W23
## 0 0 0 0 0
## W24 W25 W26 W27 W28
## 0 0 0 0 0
## W29 W30 W31 W32 W33
## 0 0 0 0 0
## W34 W35 W36 W37 W38
## 0 0 0 0 0
## W39 W40 W41 W42 W43
## 0 0 0 0 0
## W44 W45 W46 W47 W48
## 0 0 0 0 0
## W49 W50 W51 MIN MAX
## 0 0 0 0 0
## Normalized.0 Normalized.1 Normalized.2 Normalized.3 Normalized.4
## 0 0 0 0 0
## Normalized.5 Normalized.6 Normalized.7 Normalized.8 Normalized.9
## 0 0 0 0 0
## Normalized.10 Normalized.11 Normalized.12 Normalized.13 Normalized.14
## 0 0 0 0 0
## Normalized.15 Normalized.16 Normalized.17 Normalized.18 Normalized.19
## 0 0 0 0 0
## Normalized.20 Normalized.21 Normalized.22 Normalized.23 Normalized.24
## 0 0 0 0 0
## Normalized.25 Normalized.26 Normalized.27 Normalized.28 Normalized.29
## 0 0 0 0 0
## Normalized.30 Normalized.31 Normalized.32 Normalized.33 Normalized.34
## 0 0 0 0 0
## Normalized.35 Normalized.36 Normalized.37 Normalized.38 Normalized.39
## 0 0 0 0 0
## Normalized.40 Normalized.41 Normalized.42 Normalized.43 Normalized.44
## 0 0 0 0 0
## Normalized.45 Normalized.46 Normalized.47 Normalized.48 Normalized.49
## 0 0 0 0 0
## Normalized.50 Normalized.51
## 0 0
We can see that there are no missing values in the dataset, so data cleaning and missing-value handling are not needed here. If they were needed, we could delete rows containing NA values with the na.omit() function, or substitute NA values with the column mean or median.
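For completeness, a minimal sketch of the two approaches mentioned above (row deletion and mean imputation); since this dataset contains no NAs, neither step is actually applied here.
# Option 1: drop any rows containing NA values
dataset_complete <- na.omit(dataset)
# Option 2: replace NAs in the weekly columns with the respective column mean
imputed <- dataset
for (col in names(imputed)[2:53]) {
  col_mean <- mean(imputed[[col]], na.rm = TRUE)
  imputed[[col]][is.na(imputed[[col]])] <- col_mean
}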
# Set the threshold for considering a point as an outlier (e.g., 1.5 times the IQR)
iqr_threshold <- 1.5
# Function to identify outliers in a single column
find_outliers <- function(column) {
column <- as.numeric(column) # Ensure numeric type
q1 <- quantile(column, 0.25, na.rm = TRUE)
q3 <- quantile(column, 0.75, na.rm = TRUE)
iqr <- q3 - q1
lower_bound <- q1 - iqr_threshold * iqr
upper_bound <- q3 + iqr_threshold * iqr
return(which(column < lower_bound | column > upper_bound))
}
# Identify outliers for each column
outliers_indices <- lapply(normalized_data, find_outliers)
# Combine the indices of outliers from all columns
all_outliers <- unique(unlist(outliers_indices))
# Print the row indices of outliers
cat("Outliers found at rows:", all_outliers, "\n")
## Outliers found at rows: 3 149 204 309 342 418 421 424 442 447 726 774 115 171 258 292 319 348 481 577 583 605 637 744 787 105 147 349 460 625 683 708 722 784 802 203 225 235 278 307 375 477 585 663 704 723 731 199 236 248 288 340 344 423 653 711
outliers_indices
## $Product_Code
## integer(0)
##
## $Normalized.0
## integer(0)
##
## $Normalized.1
## integer(0)
##
## $Normalized.2
## integer(0)
##
## $Normalized.3
## integer(0)
##
## $Normalized.4
## integer(0)
##
## $Normalized.5
## integer(0)
##
## $Normalized.6
## integer(0)
##
## $Normalized.7
## integer(0)
##
## $Normalized.8
## integer(0)
##
## $Normalized.9
## integer(0)
##
## $Normalized.10
## integer(0)
##
## $Normalized.11
## integer(0)
##
## $Normalized.12
## integer(0)
##
## $Normalized.13
## integer(0)
##
## $Normalized.14
## integer(0)
##
## $Normalized.15
## integer(0)
##
## $Normalized.16
## integer(0)
##
## $Normalized.17
## integer(0)
##
## $Normalized.18
## integer(0)
##
## $Normalized.19
## integer(0)
##
## $Normalized.20
## integer(0)
##
## $Normalized.21
## integer(0)
##
## $Normalized.22
## integer(0)
##
## $Normalized.23
## integer(0)
##
## $Normalized.24
## integer(0)
##
## $Normalized.25
## integer(0)
##
## $Normalized.26
## [1] 3 149 204 309 342 418 421 424 442 447 726 774
##
## $Normalized.27
## [1] 115 171 258 292 319 348 481 577 583 605 637 744 787
##
## $Normalized.28
## [1] 105 147 349 460 625 683 708 722 784 802
##
## $Normalized.29
## [1] 203 225 235 278 307 375 477 585 663 704 723 731
##
## $Normalized.30
## [1] 199 236 248 288 319 340 344 421 423 577 653 708 711 784
##
## $Normalized.31
## integer(0)
##
## $Normalized.32
## integer(0)
##
## $Normalized.33
## integer(0)
##
## $Normalized.34
## integer(0)
##
## $Normalized.35
## integer(0)
##
## $Normalized.36
## integer(0)
##
## $Normalized.37
## integer(0)
##
## $Normalized.38
## integer(0)
##
## $Normalized.39
## integer(0)
##
## $Normalized.40
## integer(0)
##
## $Normalized.41
## integer(0)
##
## $Normalized.42
## integer(0)
##
## $Normalized.43
## integer(0)
##
## $Normalized.44
## integer(0)
##
## $Normalized.45
## integer(0)
##
## $Normalized.46
## integer(0)
##
## $Normalized.47
## integer(0)
##
## $Normalized.48
## integer(0)
##
## $Normalized.49
## integer(0)
##
## $Normalized.50
## integer(0)
##
## $Normalized.51
## integer(0)
primary_data %>% select(Product_Code, MIN, MAX) %>%
  filter(MAX > 60) %>% arrange(desc(MAX))
## Product_Code MIN MAX
## 1 P409 23 73
## 2 P83 20 63
## 3 P262 13 63
## 4 P84 21 62
## 5 P621 16 62
## 6 P36 19 61
## 7 P38 20 61
## 8 P43 20 61
Following an extensive review of the dataset, eight products stood out as the most popular during the 52-week period: P409, P83, P262, P84, P621, P36, P38, and P43. With maximum weekly quantities ranging from 61 to 73 units, these best-selling products showed remarkable sales performance and demonstrated their importance within the sales portfolio.
ggplot(dataset, aes(x = MAX)) +
geom_histogram(binwidth = 5, fill = "blue", color = "black") +
labs(title = "Distribution of Maximum Sales Quantities",
x = "Maximum Sales Quantity",
y = "Frequency")
# Create a density plot for sales quantities
ggplot(dataset, aes(x = MAX)) +
geom_density(fill = "skyblue", color = "black") +
labs(title = "Density Plot of Sales Quantities",
x = "Sales Quantity",
y = "Density") +
theme_minimal()
The density plot helps us see that most weekly sales quantities fall within the 0–20 interval.
# Boxplot to display the distribution of sales within weeks
boxplot(primary_data[, 2:53],
main = "Distribution of Sales Within Weeks",
xlab = "Weeks", ylab = "Sales",
col = "lightblue", border = "black")
This boxplot is a good way to see how sales change from week to week.
# Violin plot to display the distribution of sales within weeks
vioplot(primary_data[, 2:53],
names = paste0("Week ", 0:51),
col = "lightblue", border = "black",
main = "Distribution of Sales Within Weeks",
xlab = "Weeks", ylab = "Sales")
This violin plot shows the distribution of sales within weeks.
# Bar plot to display the exact number of sales within weeks
barplot(t(primary_data[, 2:53]),
beside = TRUE, col = "lightblue",
main = "Number of Sales Within Weeks",
xlab = "Weeks", ylab = "Number of Sales")
I used the Bar plot to display the exact number of sales within weeks.
# Correlation between min and max sales quantities
correlation_result <- cor(dataset$MIN, dataset$MAX)
# Display the correlation result
print(paste("Correlation between MIN and MAX:", correlation_result))
## [1] "Correlation between MIN and MAX: 0.948931954310161"
The minimum (MIN) and maximum (MAX) sales amounts have a strong positive linear relationship, as indicated by the correlation coefficient of 0.949. This suggests that there is a consistent sales pattern throughout the dataset, with products that have greater minimum sales also typically having higher maximum sales.
# Create a scatter plot for correlation analysis
ggplot(dataset, aes(x = MIN, y = MAX)) +
geom_point(alpha = 0.7, color = "blue") +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Correlation Analysis: MIN vs MAX Sales Quantities",
x = "Minimum Sales Quantity",
y = "Maximum Sales Quantity") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
This scatter plot demonstrates the correlation between the MIN and MAX sales quantities.
# Create a new column for the total sales performance
dataset$Total_Sales <- rowSums(dataset[, 2:53])
# Rank products based on total sales performance
ranked_products <- dataset %>%
select(Product_Code, Total_Sales) %>%
arrange(desc(Total_Sales)) %>%
mutate(Rank = rank(desc(Total_Sales)))
# Display the ranked products
print(ranked_products)
## Product_Code Total_Sales Rank
## 1 P409 2220 1.0
## 2 P34 1932 2.0
## 3 P178 1925 3.0
## 4 P135 1920 4.0
## 5 P43 1913 5.0
## 6 P190 1912 6.0
## 7 P179 1904 7.0
## 8 P173 1897 8.0
## 9 P92 1896 9.0
## 10 P137 1894 10.0
## 11 P38 1892 11.0
## 12 P174 1886 12.0
## 13 P24 1877 13.0
## 14 P16 1875 14.0
## 15 P40 1864 15.0
## 16 P54 1860 16.0
## 17 P136 1859 17.5
## 18 P193 1859 17.5
## 19 P37 1858 19.0
## 20 P191 1854 20.0
## 21 P180 1853 21.0
## 22 P101 1849 22.0
## 23 P36 1843 23.0
## 24 P66 1842 24.0
## 25 P134 1841 25.0
## 26 P75 1835 26.0
## 27 P129 1832 27.0
## 28 P128 1825 28.0
## 29 P175 1824 29.0
## 30 P63 1823 30.0
## 31 P132 1819 31.0
## 32 P41 1818 32.0
## 33 P72 1813 33.0
## 34 P83 1812 34.0
## 35 P112 1808 35.0
## 36 P15 1805 36.5
## 37 P35 1805 36.5
## 38 P27 1804 39.0
## 39 P48 1804 39.0
## 40 P172 1804 39.0
## 41 P96 1803 41.0
## 42 P186 1799 42.0
## 43 P49 1797 43.0
## 44 P168 1786 44.0
## 45 P184 1783 45.0
## 46 P618 1782 46.0
## 47 P133 1781 47.0
## 48 P185 1768 48.0
## 49 P17 1765 49.5
## 50 P130 1765 49.5
## 51 P39 1756 51.0
## 52 P84 1752 52.0
## 53 P69 1746 53.0
## 54 P131 1745 54.0
## 55 P47 1738 55.5
## 56 P58 1738 55.5
## 57 P140 1735 57.0
## 58 P167 1732 58.0
## 59 P120 1726 59.0
## 60 P57 1724 60.5
## 61 P119 1724 60.5
## 62 P177 1720 62.0
## 63 P46 1716 63.0
## 64 P52 1715 64.5
## 65 P60 1715 64.5
## 66 P176 1712 66.0
## 67 P170 1711 67.0
## 68 P139 1710 68.0
## 69 P90 1709 69.0
## 70 P73 1707 70.0
## 71 P181 1705 71.0
## 72 P86 1704 72.0
## 73 P56 1701 73.0
## 74 P621 1698 74.0
## 75 P196 1697 75.0
## 76 P28 1696 76.0
## 77 P143 1694 77.0
## 78 P622 1693 78.0
## 79 P44 1692 79.5
## 80 P67 1692 79.5
## 81 P548 1691 81.0
## 82 P30 1690 82.0
## 83 P19 1687 83.0
## 84 P192 1684 84.0
## 85 P76 1683 85.0
## 86 P18 1682 86.5
## 87 P79 1682 86.5
## 88 P262 1681 88.0
## 89 P188 1678 89.5
## 90 P189 1678 89.5
## 91 P141 1676 91.0
## 92 P89 1675 92.0
## 93 P623 1673 93.0
## 94 P619 1672 94.0
## 95 P61 1671 95.0
## 96 P42 1670 96.0
## 97 P70 1668 97.0
## 98 P138 1663 98.5
## 99 P142 1663 98.5
## 100 P78 1661 100.0
## 101 P617 1656 101.0
## 102 P45 1651 102.0
## 103 P64 1649 103.0
## 104 P85 1648 104.0
## 105 P208 1645 105.0
## 106 P97 1644 106.0
## 107 P87 1643 107.5
## 108 P102 1643 107.5
## 109 P55 1642 109.0
## 110 P182 1637 110.0
## 111 P549 1635 111.0
## 112 P88 1626 112.0
## 113 P169 1617 113.5
## 114 P183 1617 113.5
## 115 P194 1613 115.0
## 116 P113 1612 116.0
## 117 P25 1602 117.0
## 118 P80 1598 118.0
## 119 P620 1593 119.0
## 120 P187 1579 120.0
## 121 P557 1315 121.0
## 122 P511 1289 122.0
## 123 P615 1265 123.0
## 124 P533 1253 124.0
## 125 P613 1153 125.0
## 126 P261 1069 126.0
## 127 P270 1030 127.0
## 128 P10 1010 128.0
## 129 P516 969 129.0
## 130 P519 967 130.0
## 131 P405 966 131.0
## 132 P512 963 132.0
## 133 P268 960 133.5
## 134 P286 960 133.5
## 135 P407 958 135.0
## 136 P554 956 136.0
## 137 P535 949 137.0
## 138 P566 946 138.0
## 139 P486 945 139.0
## 140 P263 939 140.0
## 141 P51 932 141.0
## 142 P284 930 142.0
## 143 P491 927 143.0
## 144 P513 922 144.0
## 145 P107 921 145.0
## 146 P537 920 146.0
## 147 P540 908 147.0
## 148 P200 905 148.0
## 149 P411 885 149.0
## 150 P507 875 150.0
## 151 P435 867 151.5
## 152 P640 867 151.5
## 153 P410 860 153.0
## 154 P781 855 154.0
## 155 P62 834 155.0
## 156 P545 829 156.0
## 157 P495 826 157.0
## 158 P403 823 158.0
## 159 P530 820 159.0
## 160 P400 818 160.0
## 161 P526 816 161.0
## 162 P503 811 162.0
## 163 P202 805 163.0
## 164 P505 797 164.0
## 165 P556 786 165.0
## 166 P406 691 166.0
## 167 P783 687 167.0
## 168 P612 681 168.0
## 169 P638 667 169.0
## 170 P399 660 170.0
## 171 P538 659 171.0
## 172 P529 658 172.0
## 173 P364 656 173.0
## 174 P210 653 174.0
## 175 P502 651 175.0
## 176 P430 643 176.0
## 177 P95 640 177.5
## 178 P269 640 177.5
## 179 P598 636 179.0
## 180 P525 635 180.0
## 181 P520 633 181.0
## 182 P205 632 182.0
## 183 P506 631 183.0
## 184 P29 628 184.0
## 185 P408 625 185.5
## 186 P487 625 185.5
## 187 P398 624 187.0
## 188 P558 619 188.0
## 189 P14 615 189.0
## 190 P494 613 190.0
## 191 P33 604 191.0
## 192 P536 603 192.0
## 193 P517 602 193.5
## 194 P546 602 193.5
## 195 P11 601 195.0
## 196 P544 600 196.0
## 197 P514 594 197.0
## 198 P636 593 198.0
## 199 P523 591 199.0
## 200 P211 589 200.0
## 201 P559 586 201.0
## 202 P504 582 202.0
## 203 P100 579 203.0
## 204 P115 571 204.0
## 205 P528 566 205.0
## 206 P527 565 206.0
## 207 P488 559 207.0
## 208 P542 558 208.0
## 209 P106 557 209.5
## 210 P493 557 209.5
## 211 P492 556 211.5
## 212 P541 556 211.5
## 213 P522 555 213.0
## 214 P9 539 214.0
## 215 P543 537 215.5
## 216 P555 537 215.5
## 217 P26 535 217.0
## 218 P198 532 218.0
## 219 P521 526 219.0
## 220 P547 525 220.0
## 221 P632 520 221.0
## 222 P634 518 222.0
## 223 P209 515 223.0
## 224 P334 513 224.5
## 225 P524 513 224.5
## 226 P165 512 226.5
## 227 P627 512 226.5
## 228 P314 504 228.0
## 229 P518 503 229.0
## 230 P118 502 231.0
## 231 P166 502 231.0
## 232 P397 502 231.0
## 233 P1 501 233.0
## 234 P197 500 234.5
## 235 P485 500 234.5
## 236 P560 499 236.0
## 237 P71 494 237.0
## 238 P144 493 238.0
## 239 P266 492 239.0
## 240 P633 491 240.0
## 241 P264 490 241.0
## 242 P65 489 242.5
## 243 P109 489 242.5
## 244 P22 487 244.5
## 245 P114 487 244.5
## 246 P149 486 247.5
## 247 P153 486 247.5
## 248 P267 486 247.5
## 249 P625 486 247.5
## 250 P116 485 250.5
## 251 P294 485 250.5
## 252 P99 483 252.0
## 253 P309 482 254.0
## 254 P539 482 254.0
## 255 P629 482 254.0
## 256 P564 480 256.0
## 257 P147 478 258.5
## 258 P285 478 258.5
## 259 P515 478 258.5
## 260 P550 478 258.5
## 261 P413 476 261.5
## 262 P551 476 261.5
## 263 P152 474 264.0
## 264 P299 474 264.0
## 265 P319 474 264.0
## 266 P32 473 267.5
## 267 P160 473 267.5
## 268 P333 473 267.5
## 269 P532 473 267.5
## 270 P324 472 271.0
## 271 P404 472 271.0
## 272 P534 472 271.0
## 273 P13 470 273.5
## 274 P145 470 273.5
## 275 P74 469 275.0
## 276 P496 467 276.0
## 277 P59 466 277.0
## 278 P20 465 278.5
## 279 P164 465 278.5
## 280 P122 464 280.5
## 281 P626 464 280.5
## 282 P81 463 282.5
## 283 P635 463 282.5
## 284 P21 462 285.0
## 285 P436 462 285.0
## 286 P624 462 285.0
## 287 P110 461 287.5
## 288 P121 461 287.5
## 289 P332 460 289.5
## 290 P631 460 289.5
## 291 P50 459 291.0
## 292 P91 457 293.5
## 293 P94 457 293.5
## 294 P429 457 293.5
## 295 P499 457 293.5
## 296 P508 456 296.0
## 297 P93 453 297.0
## 298 P3 452 298.5
## 299 P552 452 298.5
## 300 P8 450 301.0
## 301 P563 450 301.0
## 302 P630 450 301.0
## 303 P162 449 303.0
## 304 P171 448 304.0
## 305 P82 444 305.0
## 306 P31 442 306.5
## 307 P68 442 306.5
## 308 P565 441 308.0
## 309 P5 440 309.0
## 310 P146 439 310.0
## 311 P628 437 311.0
## 312 P125 435 312.0
## 313 P154 434 313.0
## 314 P490 433 314.0
## 315 P4 430 316.0
## 316 P157 430 316.0
## 317 P304 430 316.0
## 318 P103 429 318.0
## 319 P432 414 319.0
## 320 P674 381 320.0
## 321 P510 379 321.0
## 322 P500 324 322.0
## 323 P431 319 323.0
## 324 P437 318 324.0
## 325 P203 316 325.0
## 326 P586 314 326.0
## 327 P702 301 327.0
## 328 P207 294 328.5
## 329 P509 294 328.5
## 330 P791 287 330.0
## 331 P596 285 331.0
## 332 P614 284 332.0
## 333 P368 280 333.0
## 334 P392 279 334.0
## 335 P571 277 335.0
## 336 P745 272 336.0
## 337 P806 269 337.0
## 338 P814 266 338.0
## 339 P359 265 339.5
## 340 P501 265 339.5
## 341 P305 256 341.0
## 342 P531 254 342.5
## 343 P616 254 342.5
## 344 P799 251 344.0
## 345 P331 246 345.0
## 346 P124 240 346.0
## 347 P111 238 347.0
## 348 P53 234 348.5
## 349 P77 234 348.5
## 350 P310 232 350.5
## 351 P497 232 350.5
## 352 P295 231 352.5
## 353 P329 231 352.5
## 354 P489 230 354.0
## 355 P675 229 355.0
## 356 P161 228 356.5
## 357 P484 228 356.5
## 358 P307 227 358.0
## 359 P117 226 360.5
## 360 P195 226 360.5
## 361 P311 226 360.5
## 362 P317 226 360.5
## 363 P126 225 363.0
## 364 P300 223 364.0
## 365 P151 221 365.0
## 366 P6 220 367.5
## 367 P313 220 367.5
## 368 P562 220 367.5
## 369 P637 220 367.5
## 370 P296 219 371.0
## 371 P316 219 371.0
## 372 P320 219 371.0
## 373 P302 217 373.0
## 374 P321 216 374.0
## 375 P23 215 375.0
## 376 P325 214 376.0
## 377 P7 213 377.0
## 378 P159 212 380.5
## 379 P301 212 380.5
## 380 P306 212 380.5
## 381 P312 212 380.5
## 382 P326 212 380.5
## 383 P433 212 380.5
## 384 P323 211 384.0
## 385 P98 210 385.0
## 386 P328 209 386.5
## 387 P330 209 386.5
## 388 P2 207 389.0
## 389 P308 207 389.0
## 390 P315 207 389.0
## 391 P104 206 391.5
## 392 P768 206 391.5
## 393 P150 205 393.5
## 394 P298 205 393.5
## 395 P155 204 396.5
## 396 P163 204 396.5
## 397 P297 204 396.5
## 398 P369 204 396.5
## 399 P12 203 399.5
## 400 P561 203 399.5
## 401 P327 202 401.5
## 402 P553 202 401.5
## 403 P148 201 403.5
## 404 P337 201 403.5
## 405 P303 200 405.0
## 406 P498 199 407.0
## 407 P567 199 407.0
## 408 P599 199 407.0
## 409 P401 198 409.0
## 410 P201 197 411.0
## 411 P293 197 411.0
## 412 P356 197 411.0
## 413 P105 195 413.5
## 414 P610 195 413.5
## 415 P292 193 415.5
## 416 P322 193 415.5
## 417 P318 191 417.5
## 418 P365 191 417.5
## 419 P782 189 419.0
## 420 P590 188 420.0
## 421 P123 187 421.5
## 422 P156 187 421.5
## 423 P370 185 424.5
## 424 P587 185 424.5
## 425 P611 185 424.5
## 426 P803 185 424.5
## 427 P811 184 427.0
## 428 P291 183 428.0
## 429 P336 182 429.5
## 430 P412 182 429.5
## 431 P271 181 431.0
## 432 P158 180 432.0
## 433 P609 179 433.0
## 434 P642 177 434.5
## 435 P788 177 434.5
## 436 P415 176 436.5
## 437 P784 176 436.5
## 438 P593 174 438.0
## 439 P199 173 439.5
## 440 P796 173 439.5
## 441 P580 172 441.0
## 442 P594 171 442.0
## 443 P402 170 443.0
## 444 P335 169 444.5
## 445 P573 169 444.5
## 446 P341 168 446.5
## 447 P568 168 446.5
## 448 P387 167 448.5
## 449 P414 167 448.5
## 450 P764 165 450.0
## 451 P789 164 451.0
## 452 P581 163 452.0
## 453 P676 162 453.5
## 454 P808 162 453.5
## 455 P584 160 455.5
## 456 P673 160 455.5
## 457 P366 159 457.0
## 458 P701 158 458.5
## 459 P727 158 458.5
## 460 P703 157 460.5
## 461 P793 157 460.5
## 462 P388 153 462.5
## 463 P797 153 462.5
## 464 P804 152 464.0
## 465 P747 151 465.0
## 466 P801 150 466.0
## 467 P108 148 467.0
## 468 P737 145 468.0
## 469 P816 142 469.0
## 470 P583 140 470.0
## 471 P390 139 472.0
## 472 P569 139 472.0
## 473 P591 139 472.0
## 474 P595 138 474.5
## 475 P805 138 474.5
## 476 P206 137 476.0
## 477 P338 136 478.0
## 478 P434 136 478.0
## 479 P705 136 478.0
## 480 P391 135 480.0
## 481 P345 133 482.5
## 482 P475 133 482.5
## 483 P686 133 482.5
## 484 P693 133 482.5
## 485 P361 132 485.0
## 486 P570 131 486.0
## 487 P394 130 487.5
## 488 P641 130 487.5
## 489 P389 129 489.0
## 490 P790 126 490.0
## 491 P582 124 491.0
## 492 P204 122 493.5
## 493 P395 122 493.5
## 494 P396 122 493.5
## 495 P813 122 493.5
## 496 P698 121 496.0
## 497 P342 120 497.0
## 498 P798 119 498.0
## 499 P692 118 499.5
## 500 P786 118 499.5
## 501 P127 117 501.5
## 502 P367 117 501.5
## 503 P357 114 504.0
## 504 P371 114 504.0
## 505 P600 114 504.0
## 506 P372 111 506.5
## 507 P785 111 506.5
## 508 P752 110 508.0
## 509 P588 109 509.0
## 510 P265 103 510.5
## 511 P343 103 510.5
## 512 P762 89 512.0
## 513 P601 87 513.0
## 514 P344 82 514.0
## 515 P812 80 515.0
## 516 P589 77 516.0
## 517 P800 76 517.0
## 518 P373 75 518.0
## 519 P792 74 519.0
## 520 P572 72 520.0
## 521 P393 70 521.5
## 522 P732 70 521.5
## 523 P481 69 524.0
## 524 P585 69 524.0
## 525 P769 69 524.0
## 526 P597 68 527.0
## 527 P787 68 527.0
## 528 P807 68 527.0
## 529 P767 67 529.0
## 530 P706 66 530.0
## 531 P358 64 531.0
## 532 P339 63 533.5
## 533 P592 63 533.5
## 534 P605 63 533.5
## 535 P746 63 533.5
## 536 P736 62 536.0
## 537 P287 61 537.0
## 538 P362 60 538.0
## 539 P374 59 539.5
## 540 P476 59 539.5
## 541 P726 58 541.0
## 542 P416 57 542.0
## 543 P255 55 543.5
## 544 P478 55 543.5
## 545 P687 53 545.0
## 546 P699 52 546.5
## 547 P742 52 546.5
## 548 P360 51 548.0
## 549 P281 50 549.0
## 550 P743 49 550.5
## 551 P754 49 550.5
## 552 P479 47 552.5
## 553 P753 47 552.5
## 554 P700 46 554.0
## 555 P288 44 555.5
## 556 P606 44 555.5
## 557 P422 41 558.5
## 558 P602 41 558.5
## 559 P728 41 558.5
## 560 P765 41 558.5
## 561 P477 40 561.5
## 562 P809 40 561.5
## 563 P577 39 564.0
## 564 P670 39 564.0
## 565 P704 39 564.0
## 566 P663 38 567.0
## 567 P707 38 567.0
## 568 P802 38 567.0
## 569 P733 37 569.0
## 570 P738 35 571.0
## 571 P748 35 571.0
## 572 P749 35 571.0
## 573 P257 34 573.5
## 574 P794 34 573.5
## 575 P282 33 575.5
## 576 P482 33 575.5
## 577 P688 32 577.0
## 578 P376 31 578.5
## 579 P377 31 578.5
## 580 P340 30 580.5
## 581 P375 30 580.5
## 582 P603 29 582.0
## 583 P574 28 583.0
## 584 P283 27 584.5
## 585 P363 27 584.5
## 586 P817 26 586.0
## 587 P417 25 588.5
## 588 P448 25 588.5
## 589 P456 25 588.5
## 590 P766 25 588.5
## 591 P224 24 591.5
## 592 P455 24 591.5
## 593 P348 23 596.5
## 594 P438 23 596.5
## 595 P575 23 596.5
## 596 P608 23 596.5
## 597 P690 23 596.5
## 598 P697 23 596.5
## 599 P795 23 596.5
## 600 P815 23 596.5
## 601 P212 22 603.0
## 602 P439 22 603.0
## 603 P664 22 603.0
## 604 P734 22 603.0
## 605 P755 22 603.0
## 606 P272 21 609.0
## 607 P289 21 609.0
## 608 P442 21 609.0
## 609 P449 21 609.0
## 610 P480 21 609.0
## 611 P689 21 609.0
## 612 P751 21 609.0
## 613 P347 20 614.0
## 614 P378 20 614.0
## 615 P810 20 614.0
## 616 P247 19 617.0
## 617 P460 19 617.0
## 618 P651 19 617.0
## 619 P671 18 620.0
## 620 P735 18 620.0
## 621 P750 18 620.0
## 622 P346 17 624.0
## 623 P450 17 624.0
## 624 P483 17 624.0
## 625 P756 17 624.0
## 626 P818 17 624.0
## 627 P219 16 632.5
## 628 P349 16 632.5
## 629 P384 16 632.5
## 630 P418 16 632.5
## 631 P424 16 632.5
## 632 P443 16 632.5
## 633 P694 16 632.5
## 634 P718 16 632.5
## 635 P731 16 632.5
## 636 P739 16 632.5
## 637 P776 16 632.5
## 638 P819 16 632.5
## 639 P221 15 642.5
## 640 P440 15 642.5
## 641 P451 15 642.5
## 642 P452 15 642.5
## 643 P576 15 642.5
## 644 P757 15 642.5
## 645 P770 15 642.5
## 646 P771 15 642.5
## 647 P238 14 652.0
## 648 P245 14 652.0
## 649 P277 14 652.0
## 650 P290 14 652.0
## 651 P441 14 652.0
## 652 P447 14 652.0
## 653 P459 14 652.0
## 654 P578 14 652.0
## 655 P604 14 652.0
## 656 P691 14 652.0
## 657 P772 14 652.0
## 658 P214 13 662.5
## 659 P220 13 662.5
## 660 P241 13 662.5
## 661 P379 13 662.5
## 662 P423 13 662.5
## 663 P458 13 662.5
## 664 P607 13 662.5
## 665 P653 13 662.5
## 666 P665 13 662.5
## 667 P695 13 662.5
## 668 P213 12 673.5
## 669 P225 12 673.5
## 670 P242 12 673.5
## 671 P248 12 673.5
## 672 P419 12 673.5
## 673 P445 12 673.5
## 674 P446 12 673.5
## 675 P472 12 673.5
## 676 P729 12 673.5
## 677 P741 12 673.5
## 678 P758 12 673.5
## 679 P779 12 673.5
## 680 P222 11 684.0
## 681 P239 11 684.0
## 682 P240 11 684.0
## 683 P461 11 684.0
## 684 P666 11 684.0
## 685 P667 11 684.0
## 686 P672 11 684.0
## 687 P696 11 684.0
## 688 P777 11 684.0
## 689 P216 10 692.5
## 690 P223 10 692.5
## 691 P273 10 692.5
## 692 P453 10 692.5
## 693 P639 10 692.5
## 694 P652 10 692.5
## 695 P740 10 692.5
## 696 P759 10 692.5
## 697 P226 9 701.0
## 698 P229 9 701.0
## 699 P231 9 701.0
## 700 P244 9 701.0
## 701 P427 9 701.0
## 702 P679 9 701.0
## 703 P711 9 701.0
## 704 P744 9 701.0
## 705 P773 9 701.0
## 706 P235 8 710.5
## 707 P252 8 710.5
## 708 P350 8 710.5
## 709 P454 8 710.5
## 710 P579 8 710.5
## 711 P658 8 710.5
## 712 P717 8 710.5
## 713 P720 8 710.5
## 714 P778 8 710.5
## 715 P780 8 710.5
## 716 P217 7 721.0
## 717 P246 7 721.0
## 718 P274 7 721.0
## 719 P278 7 721.0
## 720 P280 7 721.0
## 721 P462 7 721.0
## 722 P464 7 721.0
## 723 P470 7 721.0
## 724 P660 7 721.0
## 725 P681 7 721.0
## 726 P714 7 721.0
## 727 P227 6 732.5
## 728 P233 6 732.5
## 729 P352 6 732.5
## 730 P426 6 732.5
## 731 P463 6 732.5
## 732 P465 6 732.5
## 733 P647 6 732.5
## 734 P655 6 732.5
## 735 P657 6 732.5
## 736 P659 6 732.5
## 737 P678 6 732.5
## 738 P774 6 732.5
## 739 P218 5 744.0
## 740 P236 5 744.0
## 741 P237 5 744.0
## 742 P258 5 744.0
## 743 P260 5 744.0
## 744 P420 5 744.0
## 745 P643 5 744.0
## 746 P669 5 744.0
## 747 P677 5 744.0
## 748 P724 5 744.0
## 749 P730 5 744.0
## 750 P234 4 758.0
## 751 P243 4 758.0
## 752 P249 4 758.0
## 753 P256 4 758.0
## 754 P275 4 758.0
## 755 P354 4 758.0
## 756 P355 4 758.0
## 757 P425 4 758.0
## 758 P444 4 758.0
## 759 P457 4 758.0
## 760 P662 4 758.0
## 761 P682 4 758.0
## 762 P683 4 758.0
## 763 P685 4 758.0
## 764 P713 4 758.0
## 765 P715 4 758.0
## 766 P775 4 758.0
## 767 P232 3 774.0
## 768 P276 3 774.0
## 769 P351 3 774.0
## 770 P380 3 774.0
## 771 P386 3 774.0
## 772 P466 3 774.0
## 773 P473 3 774.0
## 774 P474 3 774.0
## 775 P650 3 774.0
## 776 P661 3 774.0
## 777 P668 3 774.0
## 778 P712 3 774.0
## 779 P716 3 774.0
## 780 P719 3 774.0
## 781 P761 3 774.0
## 782 P228 2 792.5
## 783 P230 2 792.5
## 784 P250 2 792.5
## 785 P253 2 792.5
## 786 P279 2 792.5
## 787 P381 2 792.5
## 788 P382 2 792.5
## 789 P383 2 792.5
## 790 P421 2 792.5
## 791 P428 2 792.5
## 792 P467 2 792.5
## 793 P468 2 792.5
## 794 P471 2 792.5
## 795 P644 2 792.5
## 796 P646 2 792.5
## 797 P649 2 792.5
## 798 P654 2 792.5
## 799 P656 2 792.5
## 800 P708 2 792.5
## 801 P722 2 792.5
## 802 P760 2 792.5
## 803 P763 2 792.5
## 804 P215 1 807.5
## 805 P251 1 807.5
## 806 P254 1 807.5
## 807 P259 1 807.5
## 808 P469 1 807.5
## 809 P680 1 807.5
## 810 P684 1 807.5
## 811 P721 1 807.5
This code adds together the weekly purchase volumes to determine the overall sales performance for every product. Subsequently, it arranges the products in descending order of total sales. The product codes, total sales, and corresponding ranks are all included in the ranked_products table that is produced.
The goal of clustering, a data analysis approach, is to identify natural structures or patterns within a dataset by putting related things together. Clustering facilitates the organization and comprehension of complicated data by highlighting similarities and differences, providing insightful information for strategic planning and decision-making.2
By grouping products together that may share comparable customer appeal or sales trends, clustering enables firms to identify underlying structures in their sales data. This knowledge aids in improving inventory control, pricing tactics, and product placement in order to improve overall sales performance.
Using the data analysis approach of clustering, related items are grouped together according to shared traits or properties. In the context of this research topic on product sales, clustering facilitates the identification of trends and connections among items that display comparable sales behaviors over a specified period of time. Businesses can customize marketing efforts and inventory management more successfully by grouping products that have similar sales tendencies.
Partitioning Around Medoids (PAM), K-means, and hierarchical clustering are popular clustering techniques. Each approach uses a different algorithm to group products according to predetermined criteria, giving businesses the flexibility to extract valuable insights from their sales data efficiently.
A method called hierarchical clustering creates a hierarchy of clusters that resembles a tree, displaying connections and similarities among data points at various levels. The data is arranged in a nested structure that makes it easier to identify smaller clusters within bigger clusters and to explore grouping patterns in detail.3
Normalized data view
head(normalized_data)
## Product_Code Normalized.0 Normalized.1 Normalized.2 Normalized.3 Normalized.4
## 1 P1 0.44 0.50 0.39 0.28 0.56
## 2 P2 0.70 0.60 0.30 0.20 0.70
## 3 P3 0.36 0.73 0.45 0.55 0.64
## 4 P4 0.59 0.35 0.65 0.18 0.41
## 5 P5 0.33 0.13 0.67 0.53 0.20
## 6 P6 0.27 0.27 0.18 0.64 0.55
## Normalized.5 Normalized.6 Normalized.7 Normalized.8 Normalized.9
## 1 0.50 0.61 1.00 0.17 0.61
## 2 0.10 0.60 0.30 0.30 0.30
## 3 0.45 0.36 0.91 0.82 0.27
## 4 0.24 0.41 0.65 0.65 0.53
## 5 0.27 0.40 0.73 0.40 0.40
## 6 0.27 0.73 0.55 0.55 0.27
## Normalized.10 Normalized.11 Normalized.12 Normalized.13 Normalized.14
## 1 0.44 0.61 0.72 0.33 0.33
## 2 0.20 0.20 0.60 0.20 0.00
## 3 1.00 0.55 0.09 0.36 0.82
## 4 0.35 0.12 0.18 0.12 0.76
## 5 0.53 1.00 0.33 0.07 0.67
## 6 0.09 0.09 0.45 0.36 0.27
## Normalized.15 Normalized.16 Normalized.17 Normalized.18 Normalized.19
## 1 0.33 0.61 0.33 0.00 0.50
## 2 0.60 0.20 0.70 0.70 0.90
## 3 0.45 0.36 0.73 0.64 0.36
## 4 0.29 0.53 0.41 0.76 0.12
## 5 0.33 0.47 0.80 0.20 0.67
## 6 0.45 0.27 0.45 0.91 0.73
## Normalized.20 Normalized.21 Normalized.22 Normalized.23 Normalized.24
## 1 0.11 0.44 0.22 0.50 0.11
## 2 0.40 0.70 0.20 0.40 0.50
## 3 0.36 0.91 0.73 0.45 0.64
## 4 0.24 0.29 0.53 0.29 0.41
## 5 0.53 0.20 0.47 0.40 0.33
## 6 0.36 0.82 0.64 0.45 0.36
## Normalized.25 Normalized.26 Normalized.27 Normalized.28 Normalized.29
## 1 0.33 0.22 0.39 0.11 0.44
## 2 0.30 0.50 0.80 0.50 0.50
## 3 0.45 1.00 0.18 0.00 0.91
## 4 0.24 0.47 0.47 0.00 0.24
## 5 0.60 0.33 0.40 0.67 0.00
## 6 0.18 0.09 0.27 0.18 0.36
## Normalized.30 Normalized.31 Normalized.32 Normalized.33 Normalized.34
## 1 0.22 0.39 0.50 0.17 0.11
## 2 0.30 0.10 0.30 0.20 0.30
## 3 0.73 0.55 0.36 0.45 0.36
## 4 0.29 0.00 0.18 0.59 0.18
## 5 0.13 0.00 0.13 0.13 0.40
## 6 0.00 0.27 0.18 1.00 0.18
## Normalized.35 Normalized.36 Normalized.37 Normalized.38 Normalized.39
## 1 0.61 0.39 0.33 0.50 0.78
## 2 1.00 0.50 0.20 0.70 0.30
## 3 0.55 0.27 0.82 0.82 0.55
## 4 1.00 0.35 0.24 0.35 0.35
## 5 0.27 0.07 0.33 0.33 0.13
## 6 0.09 0.36 0.36 0.27 0.18
## Normalized.40 Normalized.41 Normalized.42 Normalized.43 Normalized.44
## 1 0.22 0.44 0.06 0.22 0.28
## 2 0.20 0.50 0.20 0.40 0.50
## 3 0.00 0.18 0.27 1.00 0.18
## 4 0.59 0.24 0.41 0.47 0.06
## 5 0.13 0.33 0.27 0.53 0.27
## 6 0.45 0.36 0.36 0.18 0.36
## Normalized.45 Normalized.46 Normalized.47 Normalized.48 Normalized.49
## 1 0.39 0.50 0.00 0.22 0.17
## 2 0.10 0.10 0.40 0.50 0.10
## 3 0.18 0.36 0.45 1.00 0.45
## 4 0.12 0.24 0.35 0.71 0.35
## 5 0.60 0.20 0.20 0.13 0.53
## 6 0.27 0.55 0.45 0.27 0.27
## Normalized.50 Normalized.51
## 1 0.11 0.39
## 2 0.60 0.00
## 3 0.45 0.36
## 4 0.29 0.35
## 5 0.33 0.40
## 6 0.91 0.55
The normalized data already contains the product codes and normalized values, so z-score standardization is not needed for this subset.
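For reference, if the raw weekly quantities were used instead of the pre-normalized columns, z-score standardization could be applied with scale(); this is only a sketch and is not used in the analysis below.
# Sketch: z-score standardization of the raw weekly columns (not needed here)
scaled_weeks <- scale(primary_data[, 2:53])   # centers each column to mean 0, sd 1
summary(scaled_weeks[, 1:3])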
The hierarchical clustering result and plot
# Select two normalized feature columns: Normalized.0 (column 2) and Normalized.51 (column 53)
numeric_data <- normalized_data[, c(2, 53)]
# Perform Agglomerative Hierarchical Clustering
hierarchical_result <- hclust(dist(numeric_data), method = "ward.D2")
# Cut the dendrogram to obtain clusters
num_clusters <- 2 # Adjust the number of clusters as needed
cut_tree_result <- cutree(hierarchical_result, num_clusters)
# Visualize the clusters (assuming two dimensions for simplicity)
plot(numeric_data[, 1:2], col = cut_tree_result,
main = "Agglomerative Hierarchical Clustering", xlab = "Feature 1", ylab = "Feature 2")
This is the result of agglomerative hierarchical clustering. The two clusters I defined illustrate groups of products with similar values on these two features.
The dendrogram plot with clusters
# Plot dendrogram with clusters
plot(hierarchical_result, main = "Hierarchical Clustering Dendrogram", xlab = "Products", sub = NULL)
rect.hclust(hierarchical_result, k = num_clusters, border = num_clusters:1)
The heatmap plot with clusters
heatmap(as.matrix(numeric_data), Colv = NA, Rowv = as.dendrogram(hierarchical_result),
main = "Heatmap with Agglomerative Hierarchical Clustering")
The Silhouette result plot
# Create a silhouette plot (silhouette() is provided by the cluster package)
silhouette_result <- silhouette(cut_tree_result, dist(numeric_data))
# Plot the silhouette plot
plot(silhouette_result, main = "Silhouette Plot for Agglomerative Hierarchical Clustering")
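To check whether two clusters is a reasonable choice, the average silhouette width can be compared across several cuts of the same dendrogram (a sketch, assuming the cluster package that provides silhouette() is loaded):
# Compare the average silhouette width for different numbers of clusters
for (k in 2:6) {
  labels <- cutree(hierarchical_result, k)
  sil <- silhouette(labels, dist(numeric_data))
  cat("k =", k, "average silhouette width:", round(mean(sil[, "sil_width"]), 3), "\n")
}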
# Specify parameters for DBSCAN (adjust as needed)
eps <- 0.3 # Adjust based on the density of your data
minPts <- 5 # Adjust based on the minimum number of points in a cluster
# Perform DBSCAN clustering (dbscan() from the dbscan package)
dbscan_result <- dbscan(numeric_data, eps = eps, minPts = minPts)
# Access cluster assignments and noise points
cluster_assignments <- dbscan_result$cluster
noise_points <- which(cluster_assignments == 0)
# Print the number of clusters and noise points
cat("Number of clusters:", length(unique(cluster_assignments)) - 1, "\n")
## Number of clusters: 0
cat("Number of noise points:", length(noise_points), "\n")
## Number of noise points: 0
With these parameters, the density-based approach does not separate the data into meaningful groups: the output above reports no noise points, and the cluster count of zero simply reflects that a single undifferentiated cluster was found (the formula subtracts one for an assumed noise label). This suggests either that the parameters need substantial tuning or that the dataset does not exhibit clear density-based clusters. One alternative is the K-means clustering algorithm, which is driven by the global structure of the data. Here is an example using the K-means algorithm:
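A minimal sketch of such a K-means run on the same two normalized columns (the choice of two centers mirrors the hierarchical cut and is illustrative only):
# Illustrative K-means clustering on the same two normalized columns
set.seed(123)  # for reproducible cluster assignments
kmeans_result <- kmeans(numeric_data, centers = 2, nstart = 25)
plot(numeric_data, col = kmeans_result$cluster, main = "K-means Clustering (illustrative)")
points(kmeans_result$centers, col = "blue", pch = 8, cex = 2)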
Considering the presence of outliers in this dataset, however, K-means may not be the most robust choice, as it is sensitive to outliers and can be strongly influenced by them.
In summary, the high dimensionality, varying cluster densities, and overall complexity of the dataset make DBSCAN a poor fit, and the large number of outliers makes K-means inappropriate for this dataset as well.
PAM (Partitioning Around Medoids) clustering selects representative data points, called medoids, and groups the remaining points around them. Because it uses actual data points as cluster centers, unlike conventional k-means clustering, PAM is more resilient to outliers. This method works especially well with datasets that have uneven cluster sizes and asymmetric geometries.
PAM Clustering
# Perform PAM clustering on the observations (diss = FALSE: numeric_data is raw data, not a dissimilarity matrix)
pam_result <- pam(numeric_data, k = 2, diss = FALSE)
# Access clustering results
cluster_assignment <- pam_result$clustering
# Visualize the clustering
plot(numeric_data, col = cluster_assignment, main = "PAM Clustering")
# Add medoids to the plot
medoids <- numeric_data[pam_result$id.med, ]
points(medoids, col = "red", pch = 16, cex = 2)
The medoids of each cluster are shown as red points in the PAM clustering plot. A medoid, the data point positioned most centrally within its cluster, serves as that cluster's representative or "center". Selecting two clusters (k = 2) makes it easier to spot distinct trends in the data and sheds light on purchasing behavior. This clustering supports a clearer view of customer segmentation and can guide targeted marketing efforts for particular customer groups based on their buying habits over the 52-week period.4
Displaying the cluster medoids
# Display the cluster medoids
medoids
## Normalized.0 Normalized.51
## 1 0.44 0.39
## 2 0.70 0.00
These medoid values highlight different patterns of consumer behavior. Cluster 1 shows fairly steady product interest, with normalized sales moving only slightly from 0.44 in Week 0 to 0.39 in Week 51. Cluster 2, by contrast, starts with high normalized sales (0.70) in Week 0 but drops to zero by Week 51, indicating a possible decline or shift in consumer preferences. Understanding these clusters helps tailor marketing tactics to distinct customer segments based on their purchase patterns over the 52-week period.
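The size and separation of the two PAM clusters can also be read directly from the fitted object (a sketch using standard fields of the cluster package's pam() result):
# Number of products assigned to each PAM cluster
table(pam_result$clustering)
# Average silhouette width of the two-cluster solution (closer to 1 means better separated)
pam_result$silinfo$avg.width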
Clustering is a data analysis method that groups related data points according to shared traits or attributes. In the context of this study's question about product sales, clustering can be used to find patterns and similarities across products, which helps with inventory management and product placement optimization.
Association rules, by contrast, focus on identifying connections and patterns between goods that are commonly bought together. This is central to understanding consumer behavior and optimizing product placements or recommendations to increase sales.
In short, association rules identify significant relationships between products, while clustering organizes products based on similarity. Both methods support strategic choices for improving product placement and overall sales patterns in this study.
In data mining, an association rule is used to find interesting relationships or patterns within large datasets. These rules help us understand how different items or factors relate to one another.
In retail and sales, association rules are useful for determining which products are usually purchased together. Businesses can use this information to optimize product placement, develop focused marketing campaigns, and improve the overall shopping experience.5
The Apriori algorithm is the most commonly used technique for discovering association rules. It examines how frequently itemsets (groups of items) appear in a dataset and derives rules from them, characterized by metrics such as support, confidence, and lift.7
Association rules are a natural fit for this study's research question about which products are usually bought together in a given week. Applying techniques such as Apriori to the sales data reveals patterns suggesting that certain products tend to be purchased in tandem. This knowledge informs decisions about marketing tactics and product positioning, ultimately improving overall sales performance.
In conclusion, association rules provide a powerful method for revealing latent relationships in data, which is especially helpful for companies looking to improve their sales tactics and customer service.
The structure of the primary_data
str(primary_data)
## 'data.frame': 811 obs. of 55 variables:
## $ Product_Code: chr "P1" "P2" "P3" "P4" ...
## $ W0 : int 11 7 7 12 8 3 4 8 14 22 ...
## $ W1 : int 12 6 11 8 5 3 8 6 9 19 ...
## $ W2 : int 10 3 8 13 13 2 3 10 10 19 ...
## $ W3 : int 8 2 9 5 11 7 7 9 7 29 ...
## $ W4 : int 13 7 10 9 6 6 8 6 11 20 ...
## $ W5 : int 12 1 8 6 7 3 7 8 15 16 ...
## $ W6 : int 14 6 7 9 9 8 2 7 12 26 ...
## $ W7 : int 21 3 13 13 14 6 3 5 7 20 ...
## $ W8 : int 6 3 12 13 9 6 10 10 13 24 ...
## $ W9 : int 14 3 6 11 9 3 3 10 12 20 ...
## $ W10 : int 11 2 14 8 11 1 5 8 15 31 ...
## $ W11 : int 14 2 9 4 18 1 2 8 15 22 ...
## $ W12 : int 16 6 4 5 8 5 3 15 16 23 ...
## $ W13 : int 9 2 7 4 4 4 4 9 10 19 ...
## $ W14 : int 9 0 12 15 13 3 5 5 9 15 ...
## $ W15 : int 9 6 8 7 8 5 3 11 9 19 ...
## $ W16 : int 14 2 7 11 10 3 7 10 13 22 ...
## $ W17 : int 9 7 11 9 15 5 10 7 8 23 ...
## $ W18 : int 3 7 10 15 6 10 0 13 10 20 ...
## $ W19 : int 12 9 7 4 13 8 3 9 18 33 ...
## $ W20 : int 5 4 7 6 11 4 7 12 18 16 ...
## $ W21 : int 11 7 13 7 6 9 5 11 17 23 ...
## $ W22 : int 7 2 11 11 10 7 1 5 10 23 ...
## $ W23 : int 12 4 8 7 9 5 5 11 16 16 ...
## $ W24 : int 5 5 10 9 8 4 7 11 14 25 ...
## $ W25 : int 9 3 8 6 12 2 5 12 10 27 ...
## $ W26 : int 7 5 14 10 8 1 2 3 4 12 ...
## $ W27 : int 10 8 5 10 9 3 4 10 7 15 ...
## $ W28 : int 5 5 3 2 13 2 3 12 7 15 ...
## $ W29 : int 11 5 13 6 3 4 1 9 10 11 ...
## $ W30 : int 7 3 11 7 5 0 3 9 3 14 ...
## $ W31 : int 10 1 9 2 3 3 2 10 13 29 ...
## $ W32 : int 12 3 7 5 5 2 2 8 9 23 ...
## $ W33 : int 6 2 8 12 5 11 4 9 7 12 ...
## $ W34 : int 5 3 7 5 9 2 2 8 9 16 ...
## $ W35 : int 14 10 9 19 7 1 6 9 8 9 ...
## $ W36 : int 10 5 6 8 4 4 4 15 7 23 ...
## $ W37 : int 9 2 12 6 8 4 5 6 9 22 ...
## $ W38 : int 12 7 12 8 8 3 1 7 15 15 ...
## $ W39 : int 17 3 9 8 5 2 3 8 8 18 ...
## $ W40 : int 7 2 3 12 5 5 5 3 9 13 ...
## $ W41 : int 11 5 5 6 8 4 8 9 8 17 ...
## $ W42 : int 4 2 6 9 7 4 2 10 11 14 ...
## $ W43 : int 7 4 14 10 11 2 3 14 5 17 ...
## $ W44 : int 8 5 5 3 7 4 3 4 13 11 ...
## $ W45 : int 10 1 5 4 12 3 6 8 3 24 ...
## $ W46 : int 12 1 7 6 6 6 2 8 7 13 ...
## $ W47 : int 3 4 8 8 6 5 6 6 7 16 ...
## $ W48 : int 7 5 14 14 5 3 2 7 10 18 ...
## $ W49 : int 6 1 8 8 11 3 4 4 12 23 ...
## $ W50 : int 5 6 8 7 8 10 2 9 7 18 ...
## $ W51 : int 10 0 7 8 9 6 1 9 13 20 ...
## $ MIN : int 3 0 3 2 3 0 0 3 3 9 ...
## $ MAX : int 21 10 14 19 18 11 10 15 18 33 ...
Preparing the data for association rule mining
numeric_data <- primary_data[, -1] # Exclude the first column (Product_Code)
Since the data contains a non-numeric variable, we exclude "Product_Code" and focus on the numeric columns.
Defining breaks for discretization
# Define breaks for discretization: the same five sales-level bins for every column (W0-W51, MIN, MAX)
breaks <- setNames(
  rep(list(c(0, 5, 10, 15, 20, Inf)), ncol(numeric_data)),
  names(numeric_data)
)
Discretizing the numeric columns
# Discretize the numeric columns
discretized_data <- lapply(names(numeric_data), function(col) {
cut(numeric_data[[col]], breaks[[col]], include.lowest = TRUE, labels = FALSE)
})
Converting the discretized_data to transactions
# Convert to transactions
transactions <- as(discretized_data, "transactions")
Creating transactions: the transactions object is created with the as() coercion from the arules package. This object is suitable as input to the Apriori algorithm.
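Before mining, the resulting object can be sanity-checked with standard arules helpers (a brief sketch):
# Quick checks on the transactions object
summary(transactions)          # number of transactions, items, and density
inspect(transactions[1:3])     # first few transactions
itemFrequency(transactions)    # relative frequency of each item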
Combining all transactions into a single transaction dataset
# Combine all transactions into a single transaction dataset
all_transactions <- unlist(transactions)
Applying the apriori function
# Mine association rules using apriori
association_rules <- apriori(all_transactions, parameter = list(support = 0.01, confidence = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 0
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[5 item(s), 54 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [80 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
The Apriori method is used on transactional databases to identify frequently occurring itemsets and produce association rules. It operates by iteratively finding and extending itemsets with strong support, which reflects the co-occurrence of items. The resulting rules illuminate the links between products and help with market basket analysis and the optimization of product placement tactics.
Parameters:
confidence: Set to 0.5, meaning the run looks for association rules with at least 50% confidence. Confidence measures how likely the rule is to hold.
minval: The minimum value required for the additional rule evaluation measure (arem); left at its default of 0.1.
smax: The maximum support threshold for itemsets, left at its default of 1 (no upper limit).
arem: The additional rule evaluation measure; "none" means no measure beyond support and confidence is applied.
aval: A logical flag indicating whether to report the value of the additional measure selected with arem; FALSE here.
originalSupport: Set to TRUE, so the traditional support definition (support of the whole itemset, antecedent and consequent together) is used.
maxtime: Set to 5, the maximum time in seconds allowed for checking subsets.
support: Set to 0.01, meaning itemsets present in at least 1% of the transactions are of interest. Support is the proportion of transactions that contain the itemset.
minlen: The minimum length of the rules, set to 1.
maxlen: The maximum length of the rules, set to 10.
target: Set to "rules", indicating that association rules (rather than itemsets) are mined.
ext: Set to TRUE, so extended quality measures such as coverage and count are reported with each rule.
Displaying the generated rules
# Display the generated rules
inspect(association_rules)
## lhs rhs support confidence coverage lift count
## [1] {} => {1} 1 1 1 1 54
## [2] {} => {2} 1 1 1 1 54
## [3] {} => {3} 1 1 1 1 54
## [4] {} => {4} 1 1 1 1 54
## [5] {} => {5} 1 1 1 1 54
## [6] {1} => {2} 1 1 1 1 54
## [7] {2} => {1} 1 1 1 1 54
## [8] {1} => {3} 1 1 1 1 54
## [9] {3} => {1} 1 1 1 1 54
## [10] {1} => {4} 1 1 1 1 54
## [11] {4} => {1} 1 1 1 1 54
## [12] {1} => {5} 1 1 1 1 54
## [13] {5} => {1} 1 1 1 1 54
## [14] {2} => {3} 1 1 1 1 54
## [15] {3} => {2} 1 1 1 1 54
## [16] {2} => {4} 1 1 1 1 54
## [17] {4} => {2} 1 1 1 1 54
## [18] {2} => {5} 1 1 1 1 54
## [19] {5} => {2} 1 1 1 1 54
## [20] {3} => {4} 1 1 1 1 54
## [21] {4} => {3} 1 1 1 1 54
## [22] {3} => {5} 1 1 1 1 54
## [23] {5} => {3} 1 1 1 1 54
## [24] {4} => {5} 1 1 1 1 54
## [25] {5} => {4} 1 1 1 1 54
## [26] {1, 2} => {3} 1 1 1 1 54
## [27] {1, 3} => {2} 1 1 1 1 54
## [28] {2, 3} => {1} 1 1 1 1 54
## [29] {1, 2} => {4} 1 1 1 1 54
## [30] {1, 4} => {2} 1 1 1 1 54
## [31] {2, 4} => {1} 1 1 1 1 54
## [32] {1, 2} => {5} 1 1 1 1 54
## [33] {1, 5} => {2} 1 1 1 1 54
## [34] {2, 5} => {1} 1 1 1 1 54
## [35] {1, 3} => {4} 1 1 1 1 54
## [36] {1, 4} => {3} 1 1 1 1 54
## [37] {3, 4} => {1} 1 1 1 1 54
## [38] {1, 3} => {5} 1 1 1 1 54
## [39] {1, 5} => {3} 1 1 1 1 54
## [40] {3, 5} => {1} 1 1 1 1 54
## [41] {1, 4} => {5} 1 1 1 1 54
## [42] {1, 5} => {4} 1 1 1 1 54
## [43] {4, 5} => {1} 1 1 1 1 54
## [44] {2, 3} => {4} 1 1 1 1 54
## [45] {2, 4} => {3} 1 1 1 1 54
## [46] {3, 4} => {2} 1 1 1 1 54
## [47] {2, 3} => {5} 1 1 1 1 54
## [48] {2, 5} => {3} 1 1 1 1 54
## [49] {3, 5} => {2} 1 1 1 1 54
## [50] {2, 4} => {5} 1 1 1 1 54
## [51] {2, 5} => {4} 1 1 1 1 54
## [52] {4, 5} => {2} 1 1 1 1 54
## [53] {3, 4} => {5} 1 1 1 1 54
## [54] {3, 5} => {4} 1 1 1 1 54
## [55] {4, 5} => {3} 1 1 1 1 54
## [56] {1, 2, 3} => {4} 1 1 1 1 54
## [57] {1, 2, 4} => {3} 1 1 1 1 54
## [58] {1, 3, 4} => {2} 1 1 1 1 54
## [59] {2, 3, 4} => {1} 1 1 1 1 54
## [60] {1, 2, 3} => {5} 1 1 1 1 54
## [61] {1, 2, 5} => {3} 1 1 1 1 54
## [62] {1, 3, 5} => {2} 1 1 1 1 54
## [63] {2, 3, 5} => {1} 1 1 1 1 54
## [64] {1, 2, 4} => {5} 1 1 1 1 54
## [65] {1, 2, 5} => {4} 1 1 1 1 54
## [66] {1, 4, 5} => {2} 1 1 1 1 54
## [67] {2, 4, 5} => {1} 1 1 1 1 54
## [68] {1, 3, 4} => {5} 1 1 1 1 54
## [69] {1, 3, 5} => {4} 1 1 1 1 54
## [70] {1, 4, 5} => {3} 1 1 1 1 54
## [71] {3, 4, 5} => {1} 1 1 1 1 54
## [72] {2, 3, 4} => {5} 1 1 1 1 54
## [73] {2, 3, 5} => {4} 1 1 1 1 54
## [74] {2, 4, 5} => {3} 1 1 1 1 54
## [75] {3, 4, 5} => {2} 1 1 1 1 54
## [76] {1, 2, 3, 4} => {5} 1 1 1 1 54
## [77] {1, 2, 3, 5} => {4} 1 1 1 1 54
## [78] {1, 2, 4, 5} => {3} 1 1 1 1 54
## [79] {1, 3, 4, 5} => {2} 1 1 1 1 54
## [80] {2, 3, 4, 5} => {1} 1 1 1 1 54
Group 1: Rules with Single Product Consequents (e.g., {1}, {2}, {3}, {4}, {5}) These rules suggest that individual products are frequently purchased.
Group 2: Rules with Pairs of Products (e.g., {1, 2}, {1, 3}, {2, 3}) These rules show associations between pairs of products. For instance, if product 1 is in the basket, then product 2 or 3 is likely to be present.
Group 3: Rules with Triplets of Products (e.g., {1, 2, 3}, {2, 3, 4}) These rules suggest associations between sets of three products. For example, if products 1, 2, and 3 are in the basket, then product 4 may also be present.
Group 4: Rules with Quadruplets of Products (e.g., {1, 2, 3, 4}) Similar to the previous groups but with larger sets of products.
Group 5: Rules with Common Consequents (e.g., {1} => {2}, {1} => {3}, {2} => {1}) These rules show that certain products are often purchased together.
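To drill into any one of these groups, the rule set can be filtered by consequent and ranked, for example (a sketch using arules functions; "3" is one of the item labels appearing in the rules above):
# Rules whose right-hand side is item "3", ordered by lift
rules_to_3 <- subset(association_rules, rhs %in% "3")
inspect(head(sort(rules_to_3, by = "lift"), 5))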
# Plot for inspecting association rules
plot(association_rules, method = "graph")
This is a graph visualization of the five item groups and their associations.
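Other layouts from the same plotting package can make dense rule sets easier to read, for instance (a sketch using arulesViz's built-in methods):
# Alternative views of the same rule set
plot(association_rules, method = "grouped")                    # grouped matrix of antecedents vs consequents
plot(association_rules, method = "matrix", measure = "lift")   # matrix shaded by lift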
# Extract support, confidence, and lift from rules
support <- quality(association_rules)$support
confidence <- quality(association_rules)$confidence
lift <- quality(association_rules)$lift
# Create 3D scatterplot (scatter3D() from the plot3D package)
scatter3D(x = support, y = confidence, z = lift, colvar = NULL, pch = 19,
main = "3D Scatterplot of Association Rules",
xlab = "Support", ylab = "Confidence", zlab = "Lift")
Creating a new subset of rules with confidence greater than 0.7
rules <- subset(association_rules, confidence > 0.7)
Keeping only rules with short antecedents and consequents
# Restrict to rules whose left- and right-hand sides contain at most three items each
rules <- subset(rules, size(lhs) <= 3 & size(rhs) <= 3)
Taking the unique rules
rules <- unique(rules)
The scatterplot matrix of rule quality measures
# Plot the rule quality measures (support, confidence, lift, coverage, count) against each other
plot(quality(rules))
This is an association rule mining scatter plot, a common way in data mining to examine relationships between variables in large databases. The plot is a graphical depiction of the rules derived from the data, displaying various indicators of their strength and importance.
Setting support constraints is one of the practical challenges of association rule mining: a high support threshold avoids the combinatorial explosion in frequent itemset discovery, but at the cost of missing interesting low-support patterns. The measures shown in the plot are explained as follows:
Support: This measure indicates how frequently a rule applies to the data, i.e., the proportion of transactions in which the rule's items appear together. Every circle in the plot represents a rule, and its position along a "support" axis reflects that rule's support.
Confidence: Confidence measures how reliable the rule's inference is. A rule with high confidence indicates a strong probability that the consequent occurs whenever the antecedent does. A rule's confidence is reflected by the circle's position along a "confidence" axis.
Lift: Lift quantifies the degree to which the antecedent and consequent of a rule occur together more often than would be expected if they were statistically independent. A lift value greater than 1 indicates a positive correlation between the antecedent and consequent.6
Coverage: The proportion of transactions that contain the rule's antecedent; it is comparable to support, but for the antecedent only.
Count: The absolute number of transactions in which both the rule's antecedent and consequent appear.
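A small worked example (with hypothetical counts, not taken from this dataset) shows how these quantities relate for a single rule A => B:
# Hypothetical counts for a toy rule A => B
n_total <- 100   # total transactions
n_A     <- 40    # transactions containing the antecedent A
n_B     <- 50    # transactions containing the consequent B
n_AB    <- 30    # transactions containing both A and B
rule_support    <- n_AB / n_total                # 0.30: how often the whole rule applies
rule_confidence <- n_AB / n_A                    # 0.75: probability of B given A
rule_coverage   <- n_A / n_total                 # 0.40: support of the antecedent alone
rule_lift       <- rule_confidence / (n_B / n_total)  # 1.5: > 1 means A and B co-occur more than expected
rule_count      <- n_AB                          # 30: absolute number of matching transactions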
A convenient way to view all rules and their metrics at once is a scatter plot matrix. Each row and column of the matrix corresponds to a metric, and each circle inside a cell represents a rule, positioned according to the two metrics that intersect there. The plot is symmetric about its diagonal.
There are no circles on the main diagonal, which contains the names of the metrics, as comparing a measure with itself is unnecessary. For every rule, the link between several metrics is displayed in the off-diagonal cells. For every rule, the location of a circle inside a cell indicates the values of one measure on the x-axis and another measure on the y-axis. For example, circles will be positioned based on the support and confidence of each rule in the cell where the horizontal axis and the vertical axis intersect with “support” and “confidence,” respectively.
It’s important to note that, in contrast to the other metrics, which appear to be scaled between around 0.6 and 1.4, the ‘count’ metric is on a different scale, which explains why it’s shown against the x-axis that runs from 40 to 70.
For interpretation, rules with high support and high confidence are the ones to seek out, since they are both widely applicable and dependable. High lift values are also preferred, as they imply the relationship goes beyond what the individual item frequencies alone would explain.
Rule 1: {Item_1, Item_2} => {Item_3}; support 0.1, confidence 0.8, lift 1.2. Customers who purchase Item_1 and Item_2 are 80% likely to also buy Item_3, and the rule applies to 10% of transactions.
Rule 2: {Item_4} => {Item_5}; support 0.15, confidence 0.9, lift 1.5. Customers who buy Item_4 are highly likely (90%) to also purchase Item_5; the lift of 1.5 indicates a strong positive correlation between these two items.
Rule 3: {Item_2, Item_3} => {Item_1}; support 0.12, confidence 0.75, lift 1.1. When customers buy both Item_2 and Item_3, there is a 75% likelihood of them also buying Item_1.
Observations: Rule 2 has the highest confidence, indicating the strongest association between antecedent and consequent. Rules 1 and 3 have moderate lift, suggesting a moderate positive correlation between their items.
Creating the weekly item frequency data frame
# Calculate the total units sold in each column (each week, plus the MIN and MAX summaries)
item_frequency <- colSums(numeric_data)
# Create a data frame for better visualization
item_frequency_df <- data.frame(Product = names(item_frequency), Frequency = item_frequency)
# Sort the data frame by frequency in descending order
item_frequency_df <- item_frequency_df[order(item_frequency_df$Frequency, decreasing = TRUE), ]
item_frequency_df <- as.data.frame(item_frequency_df)
# Print or visualize the item frequency
# Rename the column "Product" to "Week"
colnames(item_frequency_df)[colnames(item_frequency_df) == "Product"] <- "Week"
print(item_frequency_df)
## Week Frequency
## MAX MAX 13226
## W24 W24 8245
## W15 W15 8147
## W16 W16 8137
## W18 W18 8116
## W14 W14 8035
## W17 W17 8033
## W22 W22 8031
## W23 W23 7998
## W20 W20 7988
## W12 W12 7970
## W10 W10 7940
## W8 W8 7935
## W6 W6 7883
## W3 W3 7881
## W21 W21 7875
## W13 W13 7856
## W9 W9 7852
## W11 W11 7849
## W19 W19 7822
## W7 W7 7774
## W4 W4 7765
## W5 W5 7677
## W2 W2 7615
## W1 W1 7404
## W0 W0 7220
## W49 W49 7214
## W25 W25 7212
## W51 W51 7209
## W50 W50 7187
## W46 W46 7072
## W48 W48 7035
## W47 W47 7032
## W45 W45 6939
## W44 W44 6840
## W42 W42 6808
## W43 W43 6746
## W38 W38 6692
## W41 W41 6683
## W40 W40 6636
## W37 W37 6548
## W36 W36 6500
## W35 W35 6486
## W34 W34 6482
## W39 W39 6460
## W33 W33 6412
## W32 W32 6293
## W31 W31 6172
## W30 W30 6170
## W28 W28 5988
## W29 W29 5952
## W27 W27 5834
## W26 W26 5637
## MIN MIN 3066
The item frequency table depicts the distribution of sales across the weeks, with Week 24 registering the highest volume at 8245 units sold. Weeks 15, 16, and 18 follow closely, demonstrating steady demand and providing insight into product popularity and sales patterns over the observed weeks. (The MAX and MIN rows summarize each product's highest and lowest weekly sales rather than a specific week.) This information is essential for identifying peak periods and optimizing inventory and marketing tactics.
Bar plot for the frequency of sales within weeks
# Assuming df is your data frame with columns "Week" and "Frequency"
barplot(item_frequency_df$Frequency, names.arg = item_frequency_df$Week, col = "skyblue", main = "Weekly Frequency", xlab = "Week", ylab = "Frequency")
The barplot provides a clear depiction of the weekly sales frequency, highlighting the 24th week as the most lucrative period. This visual insight aids in identifying and understanding peak sales trends over the observed weeks.
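Because the MIN and MAX summary columns are carried along in item_frequency_df, a small variation of the same plot restricted to the 52 actual weeks may read more cleanly (a sketch):
# Drop the MIN and MAX summary rows so only the 52 weeks are plotted
weekly_only <- subset(item_frequency_df, !(Week %in% c("MIN", "MAX")))
barplot(weekly_only$Frequency, names.arg = weekly_only$Week, col = "skyblue",
        main = "Weekly Frequency (weeks only)", xlab = "Week", ylab = "Frequency")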
The findings are divided into two parts: cluster analysis and, in particular, association analysis.
Clustering based on product sales patterns can help identify distinct groups of products. This segmentation aids in tailoring strategies for different product clusters, enhancing marketing precision.
PAM Clustering:
The PAM clustering algorithm identifies two medoids representing normalized sales patterns. Interpretation should consider the characteristics of products in each cluster, aiding in targeted decision-making.
Hierarchical clustering:
The hierarchical clustering, a method that creates a tree-like structure of nested clusters, allows for a visual representation of how products group together based on similarity. This approach provides insights into the hierarchical relationships among products, aiding in understanding broader patterns in the sales data. The combination of hierarchical and PAM clustering techniques offers a comprehensive perspective on product groupings, enabling a more nuanced understanding of the underlying structures and associations within the dataset.
The association rules generated from the dataset reveal significant relationships between different products. The rules provide insight into which products are frequently purchased together in a given week, supporting strategic marketing and inventory management decisions.
Visualization Analysis:
The scatter plot showcases the support, confidence, lift, coverage, and count metrics for each rule, providing a comprehensive view of rule characteristics. High support and confidence values are desirable, indicating common and reliable associations between products.
Rule Metrics:
High support implies that the rules are applicable to a substantial portion of transactions. Confidence highlights the reliability of rule predictions. Lift values greater than 1 indicate positive correlations, suggesting meaningful relationships.
Item Frequency Analysis:
Analyzing the weekly sales frequencies provides a snapshot of when products are most popular. This information can guide marketing strategies, promotions, and stock management based on weekly trends.
In conclusion, the examination of the sales dataset yields insightful information about the dynamics of product transactions over a 52-week period. The association rules highlight significant connections between products and provide insight into which items are usually bought together in a particular week. These insights are essential for refining marketing tactics, streamlining inventory control, and ultimately improving customer satisfaction.
Weekly trends in product popularity are highlighted by the visualization analyses, which include matrix plots and item frequency charts; they provide a clear depiction of rule properties. Careful application of clustering techniques, such as PAM clustering, improves our understanding of distinct sales trends and supports more focused decisions.
Businesses can make well-informed decisions about product placement, marketing, and inventory replenishment based on the analysis's findings. By matching marketing plans to the established product associations and making use of weekly trend knowledge, companies can improve customer engagement and market responsiveness.
The analysis's findings are a useful tool for companies looking to increase their market influence as they navigate the complex world of customer behavior and product interactions. A data-driven strategy for understanding consumer preferences and transaction patterns is critical for long-term success in the dynamic retail environment.
Y. Liu, "Study on Application of Apriori Algorithm in Data Mining," 2010 Second International Conference on Computer Modeling and Simulation, Sanya, China, 2010, pp. 111-114, doi: 10.1109/ICCMS.2010.398.↩︎
Tan, James. (2017). Sales_Transactions_Dataset_Weekly. UCI Machine Learning Repository. https://doi.org/10.24432/C5XS4Q.↩︎
Joseph L. Fleiss & Joseph Zubin (1969) On the Methods and Theory of Clustering, Multivariate Behavioral Research, 4:2, 235-250, DOI: 10.1207/s15327906mbr0402_8↩︎
Guess, Michael J.; Wilson, Scott B.. Introduction to Hierarchical Clustering. Journal of Clinical Neurophysiology 19(2):p 144-151, March 2002.↩︎
Erich Schubert, Peter J. Rousseeuw, Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms, Information Systems, Volume 101, 2021, 101804, ISSN 0306-4379, https://doi.org/10.1016/j.is.2021.101804.↩︎
Ramakrishnan Srikant, Rakesh Agrawal, Mining Generalized Association Rules, IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, CA 95120-6099.↩︎
Lin, WY., Tseng, MC., Su, JH. (2002). A Confidence-Lift Support Specification for Interesting Associations Mining. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science(), vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_14↩︎