This study uses advanced data mining and analytics tools to examine the dynamics of product transactions over a 52-week period. Finding significant relationships between products is the main goal, providing a detailed picture of commonly co-purchased goods in a particular week. Utilizing PAM clustering for pattern recognition and the Apriori algorithm for association rule mining, the analysis offers a thorough understanding of customer behavior and product interactions.
Strong association rules are demonstrated by the outcomes, which also highlight important connections and transaction patterns. The study visually illustrates the importance and strength of these rules using visualization techniques such as item frequency charts and matrix plots. Furthermore, PAM clustering helps with strategic decision-making by enhancing the analysis and exposing distinct sales patterns.
Significant findings provide companies with useful information that can be used to improve consumer engagement, inventory control, and marketing tactics. Businesses can improve their operations and overall market responsiveness by identifying weekly patterns and aligning with established product associations.
To sum up, this study adds to the expanding body of knowledge in retail analytics by offering a data-driven method for figuring out what customers want and how they behave. It is a useful tool for companies navigating the difficulties of the contemporary retail environment, helping them make well-informed decisions and achieve long-term market success.
The key elements of 52 weeks’ worth of sales transactions involving 819 different products are captured in this dataset. The records demonstrate how product purchases have changed over time, exhibiting patterns and trends in consumer behavior over a 12-month period. This dataset offers a rich environment for revealing obscure insights into customer preferences and the dynamic nature of product sales in a business setting because it includes both raw and normalized sales data.1
Dataset creator: James Tan
License: Creative Commons Attribution 4.0 International (CC BY 4.0)
DOI: 10.24432/C5XS4Q
Understanding customer behavior and identifying trends in purchase decisions are essential for organizations looking to stay ahead in the dynamic and competitive market landscape of today. Large datasets are readily available, offering a rare chance to explore the complexities of customer preferences and uncover insightful information that might inform strategic choices.
This dataset, an extensive compilation of transactional records, is a veritable gold mine of data that captures the subtleties of consumer interactions with a wide range of items. These records, which are a snapshot in time, provide insight into how the market is changing. Analysis of these transactions is essential as companies manage the difficulties of satisfying customer requests, adjusting to trends, and improving product offers.
As I delve deeper into this dataset, the goal becomes apparent: to learn more about the products that customers purchase and how those products are connected within the intricate network of choices available to them. The analysis’s conclusions may serve as a basis for product bundling, inventory control, and marketing strategy, all of which could lead to a more knowledgeable and responsive company environment.
The purpose of this study is to identify product co-purchasing tendencies within the dataset. We may learn a great deal about consumer preferences by identifying the goods that are typically purchased in tandem over a given week. This information can then be used to guide strategic decisions about product placement, marketing, and inventory control.
Identifying products that are regularly co-purchased makes it possible to improve the overall consumer experience by offering complementary items together.
Finding connections between products offers a chance to improve marketing tactics by recommending products to specific clients.
Understanding co-purchasing trends can help with effective inventory control by guaranteeing that linked products are suitably stocked.
I will use association rule mining, and the Apriori algorithm in particular. This algorithm works well on transaction data for identifying frequent itemsets and generating association rules.
A set of association rules highlighting products that are commonly bought together is the anticipated result. An antecedent (items already in the basket) and a consequent (item suggested to be added to the basket) will make up each rule.
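To make the rule format concrete, here is a minimal, hypothetical sketch (not the analysis itself) of how the arules package expresses such rules; the example baskets and the support/confidence thresholds are illustrative assumptions only.
# Minimal sketch: Apriori on a few hypothetical weekly baskets
library(arules)
example_baskets <- list(
  c("P1", "P3", "P5"),
  c("P1", "P3"),
  c("P3", "P5"),
  c("P1", "P5"),
  c("P1", "P3", "P5")
)
example_trans <- as(example_baskets, "transactions")
# Illustrative thresholds; the real analysis would tune these
example_rules <- apriori(example_trans, parameter = list(supp = 0.4, conf = 0.6))
# Each rule prints as {antecedent} => {consequent} with its support, confidence and lift
inspect(example_rules)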
1. Business Strategy:
The outcomes can help with cross-selling, product grouping, and advertising campaign strategy considerations.
Businesses can improve client involvement through targeted promotions and discounts by analyzing co-purchasing behaviors.
By revealing hidden relationships between products, this study aims to illuminate the subtleties of consumer behavior. The results could lead to significant gains in operational effectiveness, business profitability, and customer satisfaction.
Changing the language to English
Sys.setlocale("LC_ALL","English")
## Warning in Sys.setlocale("LC_ALL", "English"): using locale code page other
## than 65001 ("UTF-8") may cause problems
## [1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
Sys.setenv(LANGUAGE='en')
Installing the Packages
# Set the CRAN mirror
options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("readr")
install.packages("stats")
install.packages("factoextra")
install.packages("flexclust")
install.packages("fpc")
install.packages("clustertend")
install.packages("cluster")
install.packages("ClusterR")
install.packages("dplyr")
install.packages("ggplot2")
install.packages("hopkins")
install.packages("NbClust")
install.packages("tidyverse")
install.packages("dendextend")
install.packages("Rtsne")
install.packages("gridExtra")
install.packages("caret")
install.packages("pheatmap")
install.packages("FactoMineR")
install.packages("vioplot")
install.packages("stats")
install.packages("arules")
install.packages("arulesViz")
install.packages("plot3D")
install.packages("dbscan")
Activating the packages with the library() function
library(readr)
library(stats)
library(factoextra)
## Loading required package: ggplot2
## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa
library(flexclust)
## Loading required package: grid
## Loading required package: lattice
## Loading required package: modeltools
## Loading required package: stats4
library(grid)
library(lattice)
library(modeltools)
library(stats4)
library(hopkins)
library(fpc)
library(clustertend)
## Package `clustertend` is deprecated. Use package `hopkins` instead.
##
## Attaching package: 'clustertend'
## The following object is masked from 'package:hopkins':
##
## hopkins
library(cluster)
library(ClusterR)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(lubridate)
##
## Attaching package: 'lubridate'
## The following objects are masked from 'package:base':
##
## date, intersect, setdiff, union
library(NbClust)
library(tidyverse)
## -- Attaching core tidyverse packages ------------------------ tidyverse 2.0.0 --
## v forcats 1.0.0 v tibble 3.2.1
## v purrr 1.0.2 v tidyr 1.3.0
## v stringr 1.5.0
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
## i Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(dendextend)
##
## ---------------------
## Welcome to dendextend version 1.17.1
## Type citation('dendextend') for how to cite the package.
##
## Type browseVignettes(package = 'dendextend') for the package vignette.
## The github page is: https://github.com/talgalili/dendextend/
##
## Suggestions and bug-reports can be submitted at: https://github.com/talgalili/dendextend/issues
## You may ask questions at stackoverflow, use the r and dendextend tags:
## https://stackoverflow.com/questions/tagged/dendextend
##
## To suppress this message use: suppressPackageStartupMessages(library(dendextend))
## ---------------------
##
##
## Attaching package: 'dendextend'
##
## The following object is masked from 'package:stats':
##
## cutree
library(Rtsne)
library(gridExtra)
##
## Attaching package: 'gridExtra'
##
## The following object is masked from 'package:dplyr':
##
## combine
library(caret)
##
## Attaching package: 'caret'
##
## The following object is masked from 'package:purrr':
##
## lift
library(pheatmap)
library(FactoMineR)
library(vioplot)
## Loading required package: sm
## Package 'sm', version 2.2-5.7: type help(sm) for summary information
## Loading required package: zoo
##
## Attaching package: 'zoo'
##
## The following objects are masked from 'package:base':
##
## as.Date, as.Date.numeric
library(stats)
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
##
## Attaching package: 'arules'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following object is masked from 'package:flexclust':
##
## info
##
## The following object is masked from 'package:modeltools':
##
## info
##
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
library("plot3D")
library(dbscan)
##
## Attaching package: 'dbscan'
##
## The following object is masked from 'package:fpc':
##
## dbscan
##
## The following object is masked from 'package:stats':
##
## as.dendrogram
dataset <- read.csv("C:/Users/User/Desktop/UL Research 3 - Association Rule/Sales_Transactions_Dataset_Weekly.csv")
A brief overview of the dataset
head(dataset)
## Product_Code W0 W1 W2 W3 W4 W5 W6 W7 W8 W9 W10 W11 W12 W13 W14 W15 W16 W17
## 1 P1 11 12 10 8 13 12 14 21 6 14 11 14 16 9 9 9 14 9
## 2 P2 7 6 3 2 7 1 6 3 3 3 2 2 6 2 0 6 2 7
## 3 P3 7 11 8 9 10 8 7 13 12 6 14 9 4 7 12 8 7 11
## 4 P4 12 8 13 5 9 6 9 13 13 11 8 4 5 4 15 7 11 9
## 5 P5 8 5 13 11 6 7 9 14 9 9 11 18 8 4 13 8 10 15
## 6 P6 3 3 2 7 6 3 8 6 6 3 1 1 5 4 3 5 3 5
## W18 W19 W20 W21 W22 W23 W24 W25 W26 W27 W28 W29 W30 W31 W32 W33 W34 W35 W36
## 1 3 12 5 11 7 12 5 9 7 10 5 11 7 10 12 6 5 14 10
## 2 7 9 4 7 2 4 5 3 5 8 5 5 3 1 3 2 3 10 5
## 3 10 7 7 13 11 8 10 8 14 5 3 13 11 9 7 8 7 9 6
## 4 15 4 6 7 11 7 9 6 10 10 2 6 7 2 5 12 5 19 8
## 5 6 13 11 6 10 9 8 12 8 9 13 3 5 3 5 5 9 7 4
## 6 10 8 4 9 7 5 4 2 1 3 2 4 0 3 2 11 2 1 4
## W37 W38 W39 W40 W41 W42 W43 W44 W45 W46 W47 W48 W49 W50 W51 MIN MAX
## 1 9 12 17 7 11 4 7 8 10 12 3 7 6 5 10 3 21
## 2 2 7 3 2 5 2 4 5 1 1 4 5 1 6 0 0 10
## 3 12 12 9 3 5 6 14 5 5 7 8 14 8 8 7 3 14
## 4 6 8 8 12 6 9 10 3 4 6 8 14 8 7 8 2 19
## 5 8 8 5 5 8 7 11 7 12 6 6 5 11 8 9 3 18
## 6 4 3 2 5 4 4 2 4 3 6 5 3 3 10 6 0 11
## Normalized.0 Normalized.1 Normalized.2 Normalized.3 Normalized.4 Normalized.5
## 1 0.44 0.50 0.39 0.28 0.56 0.50
## 2 0.70 0.60 0.30 0.20 0.70 0.10
## 3 0.36 0.73 0.45 0.55 0.64 0.45
## 4 0.59 0.35 0.65 0.18 0.41 0.24
## 5 0.33 0.13 0.67 0.53 0.20 0.27
## 6 0.27 0.27 0.18 0.64 0.55 0.27
## Normalized.6 Normalized.7 Normalized.8 Normalized.9 Normalized.10
## 1 0.61 1.00 0.17 0.61 0.44
## 2 0.60 0.30 0.30 0.30 0.20
## 3 0.36 0.91 0.82 0.27 1.00
## 4 0.41 0.65 0.65 0.53 0.35
## 5 0.40 0.73 0.40 0.40 0.53
## 6 0.73 0.55 0.55 0.27 0.09
## Normalized.11 Normalized.12 Normalized.13 Normalized.14 Normalized.15
## 1 0.61 0.72 0.33 0.33 0.33
## 2 0.20 0.60 0.20 0.00 0.60
## 3 0.55 0.09 0.36 0.82 0.45
## 4 0.12 0.18 0.12 0.76 0.29
## 5 1.00 0.33 0.07 0.67 0.33
## 6 0.09 0.45 0.36 0.27 0.45
## Normalized.16 Normalized.17 Normalized.18 Normalized.19 Normalized.20
## 1 0.61 0.33 0.00 0.50 0.11
## 2 0.20 0.70 0.70 0.90 0.40
## 3 0.36 0.73 0.64 0.36 0.36
## 4 0.53 0.41 0.76 0.12 0.24
## 5 0.47 0.80 0.20 0.67 0.53
## 6 0.27 0.45 0.91 0.73 0.36
## Normalized.21 Normalized.22 Normalized.23 Normalized.24 Normalized.25
## 1 0.44 0.22 0.50 0.11 0.33
## 2 0.70 0.20 0.40 0.50 0.30
## 3 0.91 0.73 0.45 0.64 0.45
## 4 0.29 0.53 0.29 0.41 0.24
## 5 0.20 0.47 0.40 0.33 0.60
## 6 0.82 0.64 0.45 0.36 0.18
## Normalized.26 Normalized.27 Normalized.28 Normalized.29 Normalized.30
## 1 0.22 0.39 0.11 0.44 0.22
## 2 0.50 0.80 0.50 0.50 0.30
## 3 1.00 0.18 0.00 0.91 0.73
## 4 0.47 0.47 0.00 0.24 0.29
## 5 0.33 0.40 0.67 0.00 0.13
## 6 0.09 0.27 0.18 0.36 0.00
## Normalized.31 Normalized.32 Normalized.33 Normalized.34 Normalized.35
## 1 0.39 0.50 0.17 0.11 0.61
## 2 0.10 0.30 0.20 0.30 1.00
## 3 0.55 0.36 0.45 0.36 0.55
## 4 0.00 0.18 0.59 0.18 1.00
## 5 0.00 0.13 0.13 0.40 0.27
## 6 0.27 0.18 1.00 0.18 0.09
## Normalized.36 Normalized.37 Normalized.38 Normalized.39 Normalized.40
## 1 0.39 0.33 0.50 0.78 0.22
## 2 0.50 0.20 0.70 0.30 0.20
## 3 0.27 0.82 0.82 0.55 0.00
## 4 0.35 0.24 0.35 0.35 0.59
## 5 0.07 0.33 0.33 0.13 0.13
## 6 0.36 0.36 0.27 0.18 0.45
## Normalized.41 Normalized.42 Normalized.43 Normalized.44 Normalized.45
## 1 0.44 0.06 0.22 0.28 0.39
## 2 0.50 0.20 0.40 0.50 0.10
## 3 0.18 0.27 1.00 0.18 0.18
## 4 0.24 0.41 0.47 0.06 0.12
## 5 0.33 0.27 0.53 0.27 0.60
## 6 0.36 0.36 0.18 0.36 0.27
## Normalized.46 Normalized.47 Normalized.48 Normalized.49 Normalized.50
## 1 0.50 0.00 0.22 0.17 0.11
## 2 0.10 0.40 0.50 0.10 0.60
## 3 0.36 0.45 1.00 0.45 0.45
## 4 0.24 0.35 0.71 0.35 0.29
## 5 0.20 0.20 0.13 0.53 0.33
## 6 0.55 0.45 0.27 0.27 0.91
## Normalized.51
## 1 0.39
## 2 0.00
## 3 0.36
## 4 0.35
## 5 0.40
## 6 0.55
The number of rows and columns
# The number of rows
nrow(dataset)
## [1] 811
# The number of columns
ncol(dataset)
## [1] 107
This dataset contains 811 rows and 107 columns.
The columns are divided into 4 parts.
1. The first column is Product_Code (P1 to P819): this column contains the unique code or identifier for each product.
2. The columns from 2 to 53 are the Weekly Purchase Quantities (W0 to W51): these columns represent the weekly quantities of each product sold over the 52 weeks.
3. Columns 54 and 55 are the MIN and MAX columns: these show the minimum and maximum number of units sold for each product.
4. The columns from 56 to 107 are the Normalized Weekly Quantities (Normalized.0 to Normalized.51): these columns contain the normalized versions of the weekly quantities. Normalization is a process of scaling data to a standard range, often between 0 and 1.
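The exact normalization procedure is not documented in the dataset, but the stored values are consistent with per-product min–max scaling of the weekly quantities using the MIN and MAX columns. A minimal sketch, under that assumption:
# Sketch: reproduce the Normalized.* columns assuming (W - MIN) / (MAX - MIN) per product
weekly <- dataset[, 2:53]                                          # W0 ... W51
rescaled <- (weekly - dataset$MIN) / (dataset$MAX - dataset$MIN)   # row-wise rescaling
# Products with MAX equal to MIN would produce NaN and need special handling
round(rescaled[1, 1:3], 2)                                         # e.g. (11 - 3) / (21 - 3) = 0.44 for P1, W0
dataset[1, c("Normalized.0", "Normalized.1", "Normalized.2")]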
A strategic division of the data has been defined in order to perform an extensive study of the sales transactions dataset. This division addresses specific aspects of the information and enables a more targeted investigation of patterns and relationships.
The division of the Data set
# primary_data: Weeks, Product Codes, Min, Max
primary_data <- dataset[, c(1, 2:53, 54:55)]
# normalized_data: Product Codes and Normalized Data
normalized_data <- dataset[, c(1, 56:107)]
# Display the dimensions of the subsets
cat("First Part Dimensions:", dim(primary_data), "\n")
## First Part Dimensions: 811 55
cat("Second Part Dimensions:", dim(normalized_data), "\n")
## Second Part Dimensions: 811 53
The view of primary data
head(primary_data)
## Product_Code W0 W1 W2 W3 W4 W5 W6 W7 W8 W9 W10 W11 W12 W13 W14 W15 W16 W17
## 1 P1 11 12 10 8 13 12 14 21 6 14 11 14 16 9 9 9 14 9
## 2 P2 7 6 3 2 7 1 6 3 3 3 2 2 6 2 0 6 2 7
## 3 P3 7 11 8 9 10 8 7 13 12 6 14 9 4 7 12 8 7 11
## 4 P4 12 8 13 5 9 6 9 13 13 11 8 4 5 4 15 7 11 9
## 5 P5 8 5 13 11 6 7 9 14 9 9 11 18 8 4 13 8 10 15
## 6 P6 3 3 2 7 6 3 8 6 6 3 1 1 5 4 3 5 3 5
## W18 W19 W20 W21 W22 W23 W24 W25 W26 W27 W28 W29 W30 W31 W32 W33 W34 W35 W36
## 1 3 12 5 11 7 12 5 9 7 10 5 11 7 10 12 6 5 14 10
## 2 7 9 4 7 2 4 5 3 5 8 5 5 3 1 3 2 3 10 5
## 3 10 7 7 13 11 8 10 8 14 5 3 13 11 9 7 8 7 9 6
## 4 15 4 6 7 11 7 9 6 10 10 2 6 7 2 5 12 5 19 8
## 5 6 13 11 6 10 9 8 12 8 9 13 3 5 3 5 5 9 7 4
## 6 10 8 4 9 7 5 4 2 1 3 2 4 0 3 2 11 2 1 4
## W37 W38 W39 W40 W41 W42 W43 W44 W45 W46 W47 W48 W49 W50 W51 MIN MAX
## 1 9 12 17 7 11 4 7 8 10 12 3 7 6 5 10 3 21
## 2 2 7 3 2 5 2 4 5 1 1 4 5 1 6 0 0 10
## 3 12 12 9 3 5 6 14 5 5 7 8 14 8 8 7 3 14
## 4 6 8 8 12 6 9 10 3 4 6 8 14 8 7 8 2 19
## 5 8 8 5 5 8 7 11 7 12 6 6 5 11 8 9 3 18
## 6 4 3 2 5 4 4 2 4 3 6 5 3 3 10 6 0 11
1. The primary data subset consists of: Product Codes (P1 through P819), the unique product IDs; Weekly Purchase Quantities (W0–W51), columns showing how many units of each product were sold in each of the 52 weeks; and the MIN and MAX columns, showing the minimum and maximum quantities sold for each individual product.
The purpose of this subset is to investigate sales patterns over time, spot trends, and understand how product sales fluctuate from week to week.
The view of normalized_data
head(normalized_data)
## Product_Code Normalized.0 Normalized.1 Normalized.2 Normalized.3 Normalized.4
## 1 P1 0.44 0.50 0.39 0.28 0.56
## 2 P2 0.70 0.60 0.30 0.20 0.70
## 3 P3 0.36 0.73 0.45 0.55 0.64
## 4 P4 0.59 0.35 0.65 0.18 0.41
## 5 P5 0.33 0.13 0.67 0.53 0.20
## 6 P6 0.27 0.27 0.18 0.64 0.55
## Normalized.5 Normalized.6 Normalized.7 Normalized.8 Normalized.9
## 1 0.50 0.61 1.00 0.17 0.61
## 2 0.10 0.60 0.30 0.30 0.30
## 3 0.45 0.36 0.91 0.82 0.27
## 4 0.24 0.41 0.65 0.65 0.53
## 5 0.27 0.40 0.73 0.40 0.40
## 6 0.27 0.73 0.55 0.55 0.27
## Normalized.10 Normalized.11 Normalized.12 Normalized.13 Normalized.14
## 1 0.44 0.61 0.72 0.33 0.33
## 2 0.20 0.20 0.60 0.20 0.00
## 3 1.00 0.55 0.09 0.36 0.82
## 4 0.35 0.12 0.18 0.12 0.76
## 5 0.53 1.00 0.33 0.07 0.67
## 6 0.09 0.09 0.45 0.36 0.27
## Normalized.15 Normalized.16 Normalized.17 Normalized.18 Normalized.19
## 1 0.33 0.61 0.33 0.00 0.50
## 2 0.60 0.20 0.70 0.70 0.90
## 3 0.45 0.36 0.73 0.64 0.36
## 4 0.29 0.53 0.41 0.76 0.12
## 5 0.33 0.47 0.80 0.20 0.67
## 6 0.45 0.27 0.45 0.91 0.73
## Normalized.20 Normalized.21 Normalized.22 Normalized.23 Normalized.24
## 1 0.11 0.44 0.22 0.50 0.11
## 2 0.40 0.70 0.20 0.40 0.50
## 3 0.36 0.91 0.73 0.45 0.64
## 4 0.24 0.29 0.53 0.29 0.41
## 5 0.53 0.20 0.47 0.40 0.33
## 6 0.36 0.82 0.64 0.45 0.36
## Normalized.25 Normalized.26 Normalized.27 Normalized.28 Normalized.29
## 1 0.33 0.22 0.39 0.11 0.44
## 2 0.30 0.50 0.80 0.50 0.50
## 3 0.45 1.00 0.18 0.00 0.91
## 4 0.24 0.47 0.47 0.00 0.24
## 5 0.60 0.33 0.40 0.67 0.00
## 6 0.18 0.09 0.27 0.18 0.36
## Normalized.30 Normalized.31 Normalized.32 Normalized.33 Normalized.34
## 1 0.22 0.39 0.50 0.17 0.11
## 2 0.30 0.10 0.30 0.20 0.30
## 3 0.73 0.55 0.36 0.45 0.36
## 4 0.29 0.00 0.18 0.59 0.18
## 5 0.13 0.00 0.13 0.13 0.40
## 6 0.00 0.27 0.18 1.00 0.18
## Normalized.35 Normalized.36 Normalized.37 Normalized.38 Normalized.39
## 1 0.61 0.39 0.33 0.50 0.78
## 2 1.00 0.50 0.20 0.70 0.30
## 3 0.55 0.27 0.82 0.82 0.55
## 4 1.00 0.35 0.24 0.35 0.35
## 5 0.27 0.07 0.33 0.33 0.13
## 6 0.09 0.36 0.36 0.27 0.18
## Normalized.40 Normalized.41 Normalized.42 Normalized.43 Normalized.44
## 1 0.22 0.44 0.06 0.22 0.28
## 2 0.20 0.50 0.20 0.40 0.50
## 3 0.00 0.18 0.27 1.00 0.18
## 4 0.59 0.24 0.41 0.47 0.06
## 5 0.13 0.33 0.27 0.53 0.27
## 6 0.45 0.36 0.36 0.18 0.36
## Normalized.45 Normalized.46 Normalized.47 Normalized.48 Normalized.49
## 1 0.39 0.50 0.00 0.22 0.17
## 2 0.10 0.10 0.40 0.50 0.10
## 3 0.18 0.36 0.45 1.00 0.45
## 4 0.12 0.24 0.35 0.71 0.35
## 5 0.60 0.20 0.20 0.13 0.53
## 6 0.27 0.55 0.45 0.27 0.27
## Normalized.50 Normalized.51
## 1 0.11 0.39
## 2 0.60 0.00
## 3 0.45 0.36
## 4 0.29 0.35
## 5 0.33 0.40
## 6 0.91 0.55
This subset helps identify patterns that are not impacted by different scales and is especially useful for examining correlations across items using normalized sales data. It also offers insights into relative performance.
The separation of the data into these two subsets makes the analysis simpler and more efficient. Subset 2 makes it easier to explore links and patterns using normalized data, whereas Subset 1 supports the analysis of sales trends and fluctuations over time. By improving the accuracy and applicability of the analysis, this strategic division ultimately leads to a more thorough comprehension of the underlying dynamics of the sales transactions dataset.
# Summary of primary data
summary(primary_data)
## Product_Code W0 W1 W2
## Length:811 Min. : 0.000 Min. : 0.000 Min. : 0.00
## Class :character 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Mode :character Median : 3.000 Median : 3.000 Median : 3.00
## Mean : 8.903 Mean : 9.129 Mean : 9.39
## 3rd Qu.:12.000 3rd Qu.:12.000 3rd Qu.:12.00
## Max. :54.000 Max. :53.000 Max. :56.00
## W3 W4 W5 W6
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 4.000 Median : 4.000 Median : 3.000 Median : 4.00
## Mean : 9.718 Mean : 9.575 Mean : 9.466 Mean : 9.72
## 3rd Qu.:13.000 3rd Qu.:13.000 3rd Qu.:12.500 3rd Qu.:13.00
## Max. :59.000 Max. :61.000 Max. :52.000 Max. :56.00
## W7 W8 W9 W10
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 4.000 Median : 4.000 Median : 4.000 Median : 4.00
## Mean : 9.586 Mean : 9.784 Mean : 9.682 Mean : 9.79
## 3rd Qu.:12.500 3rd Qu.:13.000 3rd Qu.:13.000 3rd Qu.:13.00
## Max. :62.000 Max. :63.000 Max. :52.000 Max. :56.00
## W11 W12 W13 W14
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 4.000 Median : 3.000 Median : 4.000 Median : 4.000
## Mean : 9.678 Mean : 9.827 Mean : 9.687 Mean : 9.908
## 3rd Qu.:13.000 3rd Qu.:13.000 3rd Qu.:13.000 3rd Qu.:13.000
## Max. :57.000 Max. :61.000 Max. :55.000 Max. :57.000
## W15 W16 W17 W18
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.00
## Median : 4.00 Median : 4.00 Median : 4.000 Median : 4.00
## Mean :10.05 Mean :10.03 Mean : 9.905 Mean :10.01
## 3rd Qu.:14.00 3rd Qu.:13.00 3rd Qu.:13.000 3rd Qu.:13.00
## Max. :59.00 Max. :62.00 Max. :67.000 Max. :57.00
## W19 W20 W21 W22
## Min. : 0.000 Min. : 0.00 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 4.000 Median : 4.00 Median : 4.00 Median : 4.000
## Mean : 9.645 Mean : 9.85 Mean : 9.71 Mean : 9.903
## 3rd Qu.:13.000 3rd Qu.:13.00 3rd Qu.:13.00 3rd Qu.:13.000
## Max. :56.000 Max. :64.00 Max. :58.00 Max. :51.000
## W23 W24 W25 W26
## Min. : 0.000 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 1.00 1st Qu.: 1.000 1st Qu.: 0.000
## Median : 4.000 Median : 5.00 Median : 5.000 Median : 3.000
## Mean : 9.862 Mean :10.17 Mean : 8.893 Mean : 6.951
## 3rd Qu.:14.000 3rd Qu.:16.00 3rd Qu.:15.000 3rd Qu.: 9.000
## Max. :72.000 Max. :64.00 Max. :64.000 Max. :46.000
## W27 W28 W29 W30
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 3.000 Median : 3.000 Median : 3.000 Median : 3.000
## Mean : 7.194 Mean : 7.383 Mean : 7.339 Mean : 7.608
## 3rd Qu.:10.000 3rd Qu.: 9.000 3rd Qu.:10.000 3rd Qu.:10.000
## Max. :47.000 Max. :44.000 Max. :42.000 Max. :48.000
## W31 W32 W33 W34
## Min. : 0.00 Min. : 0.00 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.00 1st Qu.: 0.00 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 3.00 Median : 3.00 Median : 3.000 Median : 3.000
## Mean : 7.61 Mean : 7.76 Mean : 7.906 Mean : 7.993
## 3rd Qu.:10.00 3rd Qu.:10.00 3rd Qu.:10.000 3rd Qu.:10.500
## Max. :47.00 Max. :49.00 Max. :46.000 Max. :46.000
## W35 W36 W37 W38
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.000
## Median : 3.000 Median : 3.000 Median : 3.000 Median : 3.000
## Mean : 7.998 Mean : 8.015 Mean : 8.074 Mean : 8.252
## 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.:11.000 3rd Qu.:11.000
## Max. :46.000 Max. :55.000 Max. :47.000 Max. :52.000
## W39 W40 W41 W42
## Min. : 0.000 Min. : 0.000 Min. : 0.00 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 0.00 1st Qu.: 0.000
## Median : 3.000 Median : 4.000 Median : 3.00 Median : 4.000
## Mean : 7.965 Mean : 8.182 Mean : 8.24 Mean : 8.395
## 3rd Qu.:10.000 3rd Qu.:10.000 3rd Qu.:11.00 3rd Qu.:10.000
## Max. :47.000 Max. :48.000 Max. :50.00 Max. :52.000
## W43 W44 W45 W46
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.00
## 1st Qu.: 0.000 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 1.00
## Median : 4.000 Median : 4.000 Median : 4.000 Median : 4.00
## Mean : 8.318 Mean : 8.434 Mean : 8.556 Mean : 8.72
## 3rd Qu.:11.000 3rd Qu.:11.000 3rd Qu.:11.000 3rd Qu.:11.00
## Max. :50.000 Max. :46.000 Max. :46.000 Max. :55.00
## W47 W48 W49 W50
## Min. : 0.000 Min. : 0.000 Min. : 0.000 Min. : 0.000
## 1st Qu.: 0.000 1st Qu.: 1.000 1st Qu.: 1.000 1st Qu.: 1.000
## Median : 4.000 Median : 4.000 Median : 4.000 Median : 5.000
## Mean : 8.671 Mean : 8.674 Mean : 8.895 Mean : 8.862
## 3rd Qu.:12.000 3rd Qu.:12.000 3rd Qu.:12.000 3rd Qu.:13.000
## Max. :49.000 Max. :50.000 Max. :52.000 Max. :57.000
## W51 MIN MAX
## Min. : 0.000 Min. : 0.000 Min. : 1.00
## 1st Qu.: 1.000 1st Qu.: 0.000 1st Qu.: 3.00
## Median : 5.000 Median : 0.000 Median : 9.00
## Mean : 8.889 Mean : 3.781 Mean :16.31
## 3rd Qu.:14.000 3rd Qu.: 4.000 3rd Qu.:21.00
## Max. :73.000 Max. :24.000 Max. :73.00
Across the 52 weeks, the average weekly sales per product fall roughly between 7 and 10 units. The largest quantity sold in any single week is 73 units, and even the lowest-performing product sold at least 1 unit in its best week (the minimum of the MAX column is 1).
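As a quick check on this reading of the summary, a short sketch of how the weekly averages and the extremes can be verified directly:
# Range of the average weekly sales (W0-W51) across all products
range(colMeans(primary_data[, 2:53]))
# Largest single-week quantity and smallest per-product maximum
max(primary_data$MAX)
min(primary_data$MAX)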
sum(is.na(dataset))
## [1] 0
colSums(is.na(dataset))
## Product_Code W0 W1 W2 W3
## 0 0 0 0 0
## W4 W5 W6 W7 W8
## 0 0 0 0 0
## W9 W10 W11 W12 W13
## 0 0 0 0 0
## W14 W15 W16 W17 W18
## 0 0 0 0 0
## W19 W20 W21 W22 W23
## 0 0 0 0 0
## W24 W25 W26 W27 W28
## 0 0 0 0 0
## W29 W30 W31 W32 W33
## 0 0 0 0 0
## W34 W35 W36 W37 W38
## 0 0 0 0 0
## W39 W40 W41 W42 W43
## 0 0 0 0 0
## W44 W45 W46 W47 W48
## 0 0 0 0 0
## W49 W50 W51 MIN MAX
## 0 0 0 0 0
## Normalized.0 Normalized.1 Normalized.2 Normalized.3 Normalized.4
## 0 0 0 0 0
## Normalized.5 Normalized.6 Normalized.7 Normalized.8 Normalized.9
## 0 0 0 0 0
## Normalized.10 Normalized.11 Normalized.12 Normalized.13 Normalized.14
## 0 0 0 0 0
## Normalized.15 Normalized.16 Normalized.17 Normalized.18 Normalized.19
## 0 0 0 0 0
## Normalized.20 Normalized.21 Normalized.22 Normalized.23 Normalized.24
## 0 0 0 0 0
## Normalized.25 Normalized.26 Normalized.27 Normalized.28 Normalized.29
## 0 0 0 0 0
## Normalized.30 Normalized.31 Normalized.32 Normalized.33 Normalized.34
## 0 0 0 0 0
## Normalized.35 Normalized.36 Normalized.37 Normalized.38 Normalized.39
## 0 0 0 0 0
## Normalized.40 Normalized.41 Normalized.42 Normalized.43 Normalized.44
## 0 0 0 0 0
## Normalized.45 Normalized.46 Normalized.47 Normalized.48 Normalized.49
## 0 0 0 0 0
## Normalized.50 Normalized.51
## 0 0
We can see that there are no missing values in the dataset, so data cleaning and missing-value handling are not needed here. If they were needed, we could delete rows containing NA values with the na.omit() function, or substitute NA values with the column mean or median.
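For completeness, a minimal sketch of the two approaches mentioned above (row deletion and mean imputation); since this dataset contains no NAs, neither step is actually applied here.
# Option 1: drop any rows containing NA values
dataset_complete <- na.omit(dataset)
# Option 2: replace NAs in the weekly columns with the respective column mean
imputed <- dataset
for (col in names(imputed)[2:53]) {
  col_mean <- mean(imputed[[col]], na.rm = TRUE)
  imputed[[col]][is.na(imputed[[col]])] <- col_mean
}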
# Set the threshold for considering a point as an outlier (e.g., 1.5 times the IQR)
iqr_threshold <- 1.5
# Function to identify outliers in a single column
find_outliers <- function(column) {
column <- as.numeric(column) # Ensure numeric type
q1 <- quantile(column, 0.25, na.rm = TRUE)
q3 <- quantile(column, 0.75, na.rm = TRUE)
iqr <- q3 - q1
lower_bound <- q1 - iqr_threshold * iqr
upper_bound <- q3 + iqr_threshold * iqr
return(which(column < lower_bound | column > upper_bound))
}
# Identify outliers for each column
outliers_indices <- lapply(normalized_data, find_outliers)
# Combine the indices of outliers from all columns
all_outliers <- unique(unlist(outliers_indices))
# Print the row indices of outliers
cat("Outliers found at rows:", all_outliers, "\n")
## Outliers found at rows: 3 149 204 309 342 418 421 424 442 447 726 774 115 171 258 292 319 348 481 577 583 605 637 744 787 105 147 349 460 625 683 708 722 784 802 203 225 235 278 307 375 477 585 663 704 723 731 199 236 248 288 340 344 423 653 711
outliers_indices
## $Product_Code
## integer(0)
##
## $Normalized.0
## integer(0)
##
## $Normalized.1
## integer(0)
##
## $Normalized.2
## integer(0)
##
## $Normalized.3
## integer(0)
##
## $Normalized.4
## integer(0)
##
## $Normalized.5
## integer(0)
##
## $Normalized.6
## integer(0)
##
## $Normalized.7
## integer(0)
##
## $Normalized.8
## integer(0)
##
## $Normalized.9
## integer(0)
##
## $Normalized.10
## integer(0)
##
## $Normalized.11
## integer(0)
##
## $Normalized.12
## integer(0)
##
## $Normalized.13
## integer(0)
##
## $Normalized.14
## integer(0)
##
## $Normalized.15
## integer(0)
##
## $Normalized.16
## integer(0)
##
## $Normalized.17
## integer(0)
##
## $Normalized.18
## integer(0)
##
## $Normalized.19
## integer(0)
##
## $Normalized.20
## integer(0)
##
## $Normalized.21
## integer(0)
##
## $Normalized.22
## integer(0)
##
## $Normalized.23
## integer(0)
##
## $Normalized.24
## integer(0)
##
## $Normalized.25
## integer(0)
##
## $Normalized.26
## [1] 3 149 204 309 342 418 421 424 442 447 726 774
##
## $Normalized.27
## [1] 115 171 258 292 319 348 481 577 583 605 637 744 787
##
## $Normalized.28
## [1] 105 147 349 460 625 683 708 722 784 802
##
## $Normalized.29
## [1] 203 225 235 278 307 375 477 585 663 704 723 731
##
## $Normalized.30
## [1] 199 236 248 288 319 340 344 421 423 577 653 708 711 784
##
## $Normalized.31
## integer(0)
##
## $Normalized.32
## integer(0)
##
## $Normalized.33
## integer(0)
##
## $Normalized.34
## integer(0)
##
## $Normalized.35
## integer(0)
##
## $Normalized.36
## integer(0)
##
## $Normalized.37
## integer(0)
##
## $Normalized.38
## integer(0)
##
## $Normalized.39
## integer(0)
##
## $Normalized.40
## integer(0)
##
## $Normalized.41
## integer(0)
##
## $Normalized.42
## integer(0)
##
## $Normalized.43
## integer(0)
##
## $Normalized.44
## integer(0)
##
## $Normalized.45
## integer(0)
##
## $Normalized.46
## integer(0)
##
## $Normalized.47
## integer(0)
##
## $Normalized.48
## integer(0)
##
## $Normalized.49
## integer(0)
##
## $Normalized.50
## integer(0)
##
## $Normalized.51
## integer(0)
primary_data %>% select(Product_Code, MIN, MAX) %>%
  filter(MAX > 60) %>% arrange(desc(MAX))
## Product_Code MIN MAX
## 1 P409 23 73
## 2 P83 20 63
## 3 P262 13 63
## 4 P84 21 62
## 5 P621 16 62
## 6 P36 19 61
## 7 P38 20 61
## 8 P43 20 61
Following an extensive review of the dataset, eight products stood out as the most popular during the 52-week period: P409, P83, P262, P84, P621, P36, P38, and P43. With maximum weekly quantities ranging from 61 to 73 units, these best-selling products showed remarkable sales performance and demonstrated their importance within the sales portfolio.
ggplot(dataset, aes(x = MAX)) +
geom_histogram(binwidth = 5, fill = "blue", color = "black") +
labs(title = "Distribution of Maximum Sales Quantities",
x = "Maximum Sales Quantity",
y = "Frequency")
# Create a density plot for sales quantities
ggplot(dataset, aes(x = MAX)) +
geom_density(fill = "skyblue", color = "black") +
labs(title = "Density Plot of Sales Quantities",
x = "Sales Quantity",
y = "Density") +
theme_minimal()
The density plot helps us see that most weekly sales quantities fall within the 0–20 interval.
# Boxplot to display the distribution of sales within weeks
boxplot(primary_data[, 2:53],
main = "Distribution of Sales Within Weeks",
xlab = "Weeks", ylab = "Sales",
col = "lightblue", border = "black")
This boxplot is a good way to see how sales change from week to week.
# Violin plot to display the distribution of sales within weeks
vioplot(primary_data[, 2:53],
names = paste0("Week ", 0:51),
col = "lightblue", border = "black",
main = "Distribution of Sales Within Weeks",
xlab = "Weeks", ylab = "Sales")
This violin plot shows the distribution of sales within weeks.
# Bar plot to display the exact number of sales within weeks
barplot(t(primary_data[, 2:53]),
beside = TRUE, col = "lightblue",
main = "Number of Sales Within Weeks",
xlab = "Weeks", ylab = "Number of Sales")
I used the Bar plot to display the exact number of sales within weeks.
# Correlation between min and max sales quantities
correlation_result <- cor(dataset$MIN, dataset$MAX)
# Display the correlation result
print(paste("Correlation between MIN and MAX:", correlation_result))
## [1] "Correlation between MIN and MAX: 0.948931954310161"
The minimum (MIN) and maximum (MAX) sales amounts have a strong positive linear relationship, as indicated by the correlation coefficient of 0.949. This suggests that there is a consistent sales pattern throughout the dataset, with products that have greater minimum sales also typically having higher maximum sales.
# Create a scatter plot for correlation analysis
ggplot(dataset, aes(x = MIN, y = MAX)) +
geom_point(alpha = 0.7, color = "blue") +
geom_smooth(method = "lm", se = FALSE, color = "red") +
labs(title = "Correlation Analysis: MIN vs MAX Sales Quantities",
x = "Minimum Sales Quantity",
y = "Maximum Sales Quantity") +
theme_minimal()
## `geom_smooth()` using formula = 'y ~ x'
This scatter plot demonstrates the correlation between the MIN and MAX sales quantities.
# Create a new column for the total sales performance
dataset$Total_Sales <- rowSums(dataset[, 2:53])
# Rank products based on total sales performance
ranked_products <- dataset %>%
select(Product_Code, Total_Sales) %>%
arrange(desc(Total_Sales)) %>%
mutate(Rank = rank(desc(Total_Sales)))
# Display the ranked products
print(ranked_products)
## Product_Code Total_Sales Rank
## 1 P409 2220 1.0
## 2 P34 1932 2.0
## 3 P178 1925 3.0
## 4 P135 1920 4.0
## 5 P43 1913 5.0
## 6 P190 1912 6.0
## 7 P179 1904 7.0
## 8 P173 1897 8.0
## 9 P92 1896 9.0
## 10 P137 1894 10.0
## 11 P38 1892 11.0
## 12 P174 1886 12.0
## 13 P24 1877 13.0
## 14 P16 1875 14.0
## 15 P40 1864 15.0
## 16 P54 1860 16.0
## 17 P136 1859 17.5
## 18 P193 1859 17.5
## 19 P37 1858 19.0
## 20 P191 1854 20.0
## 21 P180 1853 21.0
## 22 P101 1849 22.0
## 23 P36 1843 23.0
## 24 P66 1842 24.0
## 25 P134 1841 25.0
## 26 P75 1835 26.0
## 27 P129 1832 27.0
## 28 P128 1825 28.0
## 29 P175 1824 29.0
## 30 P63 1823 30.0
## 31 P132 1819 31.0
## 32 P41 1818 32.0
## 33 P72 1813 33.0
## 34 P83 1812 34.0
## 35 P112 1808 35.0
## 36 P15 1805 36.5
## 37 P35 1805 36.5
## 38 P27 1804 39.0
## 39 P48 1804 39.0
## 40 P172 1804 39.0
## 41 P96 1803 41.0
## 42 P186 1799 42.0
## 43 P49 1797 43.0
## 44 P168 1786 44.0
## 45 P184 1783 45.0
## 46 P618 1782 46.0
## 47 P133 1781 47.0
## 48 P185 1768 48.0
## 49 P17 1765 49.5
## 50 P130 1765 49.5
## 51 P39 1756 51.0
## 52 P84 1752 52.0
## 53 P69 1746 53.0
## 54 P131 1745 54.0
## 55 P47 1738 55.5
## 56 P58 1738 55.5
## 57 P140 1735 57.0
## 58 P167 1732 58.0
## 59 P120 1726 59.0
## 60 P57 1724 60.5
## 61 P119 1724 60.5
## 62 P177 1720 62.0
## 63 P46 1716 63.0
## 64 P52 1715 64.5
## 65 P60 1715 64.5
## 66 P176 1712 66.0
## 67 P170 1711 67.0
## 68 P139 1710 68.0
## 69 P90 1709 69.0
## 70 P73 1707 70.0
## 71 P181 1705 71.0
## 72 P86 1704 72.0
## 73 P56 1701 73.0
## 74 P621 1698 74.0
## 75 P196 1697 75.0
## 76 P28 1696 76.0
## 77 P143 1694 77.0
## 78 P622 1693 78.0
## 79 P44 1692 79.5
## 80 P67 1692 79.5
## 81 P548 1691 81.0
## 82 P30 1690 82.0
## 83 P19 1687 83.0
## 84 P192 1684 84.0
## 85 P76 1683 85.0
## 86 P18 1682 86.5
## 87 P79 1682 86.5
## 88 P262 1681 88.0
## 89 P188 1678 89.5
## 90 P189 1678 89.5
## 91 P141 1676 91.0
## 92 P89 1675 92.0
## 93 P623 1673 93.0
## 94 P619 1672 94.0
## 95 P61 1671 95.0
## 96 P42 1670 96.0
## 97 P70 1668 97.0
## 98 P138 1663 98.5
## 99 P142 1663 98.5
## 100 P78 1661 100.0
## 101 P617 1656 101.0
## 102 P45 1651 102.0
## 103 P64 1649 103.0
## 104 P85 1648 104.0
## 105 P208 1645 105.0
## 106 P97 1644 106.0
## 107 P87 1643 107.5
## 108 P102 1643 107.5
## 109 P55 1642 109.0
## 110 P182 1637 110.0
## 111 P549 1635 111.0
## 112 P88 1626 112.0
## 113 P169 1617 113.5
## 114 P183 1617 113.5
## 115 P194 1613 115.0
## 116 P113 1612 116.0
## 117 P25 1602 117.0
## 118 P80 1598 118.0
## 119 P620 1593 119.0
## 120 P187 1579 120.0
## 121 P557 1315 121.0
## 122 P511 1289 122.0
## 123 P615 1265 123.0
## 124 P533 1253 124.0
## 125 P613 1153 125.0
## 126 P261 1069 126.0
## 127 P270 1030 127.0
## 128 P10 1010 128.0
## 129 P516 969 129.0
## 130 P519 967 130.0
## 131 P405 966 131.0
## 132 P512 963 132.0
## 133 P268 960 133.5
## 134 P286 960 133.5
## 135 P407 958 135.0
## 136 P554 956 136.0
## 137 P535 949 137.0
## 138 P566 946 138.0
## 139 P486 945 139.0
## 140 P263 939 140.0
## 141 P51 932 141.0
## 142 P284 930 142.0
## 143 P491 927 143.0
## 144 P513 922 144.0
## 145 P107 921 145.0
## 146 P537 920 146.0
## 147 P540 908 147.0
## 148 P200 905 148.0
## 149 P411 885 149.0
## 150 P507 875 150.0
## 151 P435 867 151.5
## 152 P640 867 151.5
## 153 P410 860 153.0
## 154 P781 855 154.0
## 155 P62 834 155.0
## 156 P545 829 156.0
## 157 P495 826 157.0
## 158 P403 823 158.0
## 159 P530 820 159.0
## 160 P400 818 160.0
## 161 P526 816 161.0
## 162 P503 811 162.0
## 163 P202 805 163.0
## 164 P505 797 164.0
## 165 P556 786 165.0
## 166 P406 691 166.0
## 167 P783 687 167.0
## 168 P612 681 168.0
## 169 P638 667 169.0
## 170 P399 660 170.0
## 171 P538 659 171.0
## 172 P529 658 172.0
## 173 P364 656 173.0
## 174 P210 653 174.0
## 175 P502 651 175.0
## 176 P430 643 176.0
## 177 P95 640 177.5
## 178 P269 640 177.5
## 179 P598 636 179.0
## 180 P525 635 180.0
## 181 P520 633 181.0
## 182 P205 632 182.0
## 183 P506 631 183.0
## 184 P29 628 184.0
## 185 P408 625 185.5
## 186 P487 625 185.5
## 187 P398 624 187.0
## 188 P558 619 188.0
## 189 P14 615 189.0
## 190 P494 613 190.0
## 191 P33 604 191.0
## 192 P536 603 192.0
## 193 P517 602 193.5
## 194 P546 602 193.5
## 195 P11 601 195.0
## 196 P544 600 196.0
## 197 P514 594 197.0
## 198 P636 593 198.0
## 199 P523 591 199.0
## 200 P211 589 200.0
## 201 P559 586 201.0
## 202 P504 582 202.0
## 203 P100 579 203.0
## 204 P115 571 204.0
## 205 P528 566 205.0
## 206 P527 565 206.0
## 207 P488 559 207.0
## 208 P542 558 208.0
## 209 P106 557 209.5
## 210 P493 557 209.5
## 211 P492 556 211.5
## 212 P541 556 211.5
## 213 P522 555 213.0
## 214 P9 539 214.0
## 215 P543 537 215.5
## 216 P555 537 215.5
## 217 P26 535 217.0
## 218 P198 532 218.0
## 219 P521 526 219.0
## 220 P547 525 220.0
## 221 P632 520 221.0
## 222 P634 518 222.0
## 223 P209 515 223.0
## 224 P334 513 224.5
## 225 P524 513 224.5
## 226 P165 512 226.5
## 227 P627 512 226.5
## 228 P314 504 228.0
## 229 P518 503 229.0
## 230 P118 502 231.0
## 231 P166 502 231.0
## 232 P397 502 231.0
## 233 P1 501 233.0
## 234 P197 500 234.5
## 235 P485 500 234.5
## 236 P560 499 236.0
## 237 P71 494 237.0
## 238 P144 493 238.0
## 239 P266 492 239.0
## 240 P633 491 240.0
## 241 P264 490 241.0
## 242 P65 489 242.5
## 243 P109 489 242.5
## 244 P22 487 244.5
## 245 P114 487 244.5
## 246 P149 486 247.5
## 247 P153 486 247.5
## 248 P267 486 247.5
## 249 P625 486 247.5
## 250 P116 485 250.5
## 251 P294 485 250.5
## 252 P99 483 252.0
## 253 P309 482 254.0
## 254 P539 482 254.0
## 255 P629 482 254.0
## 256 P564 480 256.0
## 257 P147 478 258.5
## 258 P285 478 258.5
## 259 P515 478 258.5
## 260 P550 478 258.5
## 261 P413 476 261.5
## 262 P551 476 261.5
## 263 P152 474 264.0
## 264 P299 474 264.0
## 265 P319 474 264.0
## 266 P32 473 267.5
## 267 P160 473 267.5
## 268 P333 473 267.5
## 269 P532 473 267.5
## 270 P324 472 271.0
## 271 P404 472 271.0
## 272 P534 472 271.0
## 273 P13 470 273.5
## 274 P145 470 273.5
## 275 P74 469 275.0
## 276 P496 467 276.0
## 277 P59 466 277.0
## 278 P20 465 278.5
## 279 P164 465 278.5
## 280 P122 464 280.5
## 281 P626 464 280.5
## 282 P81 463 282.5
## 283 P635 463 282.5
## 284 P21 462 285.0
## 285 P436 462 285.0
## 286 P624 462 285.0
## 287 P110 461 287.5
## 288 P121 461 287.5
## 289 P332 460 289.5
## 290 P631 460 289.5
## 291 P50 459 291.0
## 292 P91 457 293.5
## 293 P94 457 293.5
## 294 P429 457 293.5
## 295 P499 457 293.5
## 296 P508 456 296.0
## 297 P93 453 297.0
## 298 P3 452 298.5
## 299 P552 452 298.5
## 300 P8 450 301.0
## 301 P563 450 301.0
## 302 P630 450 301.0
## 303 P162 449 303.0
## 304 P171 448 304.0
## 305 P82 444 305.0
## 306 P31 442 306.5
## 307 P68 442 306.5
## 308 P565 441 308.0
## 309 P5 440 309.0
## 310 P146 439 310.0
## 311 P628 437 311.0
## 312 P125 435 312.0
## 313 P154 434 313.0
## 314 P490 433 314.0
## 315 P4 430 316.0
## 316 P157 430 316.0
## 317 P304 430 316.0
## 318 P103 429 318.0
## 319 P432 414 319.0
## 320 P674 381 320.0
## 321 P510 379 321.0
## 322 P500 324 322.0
## 323 P431 319 323.0
## 324 P437 318 324.0
## 325 P203 316 325.0
## 326 P586 314 326.0
## 327 P702 301 327.0
## 328 P207 294 328.5
## 329 P509 294 328.5
## 330 P791 287 330.0
## 331 P596 285 331.0
## 332 P614 284 332.0
## 333 P368 280 333.0
## 334 P392 279 334.0
## 335 P571 277 335.0
## 336 P745 272 336.0
## 337 P806 269 337.0
## 338 P814 266 338.0
## 339 P359 265 339.5
## 340 P501 265 339.5
## 341 P305 256 341.0
## 342 P531 254 342.5
## 343 P616 254 342.5
## 344 P799 251 344.0
## 345 P331 246 345.0
## 346 P124 240 346.0
## 347 P111 238 347.0
## 348 P53 234 348.5
## 349 P77 234 348.5
## 350 P310 232 350.5
## 351 P497 232 350.5
## 352 P295 231 352.5
## 353 P329 231 352.5
## 354 P489 230 354.0
## 355 P675 229 355.0
## 356 P161 228 356.5
## 357 P484 228 356.5
## 358 P307 227 358.0
## 359 P117 226 360.5
## 360 P195 226 360.5
## 361 P311 226 360.5
## 362 P317 226 360.5
## 363 P126 225 363.0
## 364 P300 223 364.0
## 365 P151 221 365.0
## 366 P6 220 367.5
## 367 P313 220 367.5
## 368 P562 220 367.5
## 369 P637 220 367.5
## 370 P296 219 371.0
## 371 P316 219 371.0
## 372 P320 219 371.0
## 373 P302 217 373.0
## 374 P321 216 374.0
## 375 P23 215 375.0
## 376 P325 214 376.0
## 377 P7 213 377.0
## 378 P159 212 380.5
## 379 P301 212 380.5
## 380 P306 212 380.5
## 381 P312 212 380.5
## 382 P326 212 380.5
## 383 P433 212 380.5
## 384 P323 211 384.0
## 385 P98 210 385.0
## 386 P328 209 386.5
## 387 P330 209 386.5
## 388 P2 207 389.0
## 389 P308 207 389.0
## 390 P315 207 389.0
## 391 P104 206 391.5
## 392 P768 206 391.5
## 393 P150 205 393.5
## 394 P298 205 393.5
## 395 P155 204 396.5
## 396 P163 204 396.5
## 397 P297 204 396.5
## 398 P369 204 396.5
## 399 P12 203 399.5
## 400 P561 203 399.5
## 401 P327 202 401.5
## 402 P553 202 401.5
## 403 P148 201 403.5
## 404 P337 201 403.5
## 405 P303 200 405.0
## 406 P498 199 407.0
## 407 P567 199 407.0
## 408 P599 199 407.0
## 409 P401 198 409.0
## 410 P201 197 411.0
## 411 P293 197 411.0
## 412 P356 197 411.0
## 413 P105 195 413.5
## 414 P610 195 413.5
## 415 P292 193 415.5
## 416 P322 193 415.5
## 417 P318 191 417.5
## 418 P365 191 417.5
## 419 P782 189 419.0
## 420 P590 188 420.0
## 421 P123 187 421.5
## 422 P156 187 421.5
## 423 P370 185 424.5
## 424 P587 185 424.5
## 425 P611 185 424.5
## 426 P803 185 424.5
## 427 P811 184 427.0
## 428 P291 183 428.0
## 429 P336 182 429.5
## 430 P412 182 429.5
## 431 P271 181 431.0
## 432 P158 180 432.0
## 433 P609 179 433.0
## 434 P642 177 434.5
## 435 P788 177 434.5
## 436 P415 176 436.5
## 437 P784 176 436.5
## 438 P593 174 438.0
## 439 P199 173 439.5
## 440 P796 173 439.5
## 441 P580 172 441.0
## 442 P594 171 442.0
## 443 P402 170 443.0
## 444 P335 169 444.5
## 445 P573 169 444.5
## 446 P341 168 446.5
## 447 P568 168 446.5
## 448 P387 167 448.5
## 449 P414 167 448.5
## 450 P764 165 450.0
## 451 P789 164 451.0
## 452 P581 163 452.0
## 453 P676 162 453.5
## 454 P808 162 453.5
## 455 P584 160 455.5
## 456 P673 160 455.5
## 457 P366 159 457.0
## 458 P701 158 458.5
## 459 P727 158 458.5
## 460 P703 157 460.5
## 461 P793 157 460.5
## 462 P388 153 462.5
## 463 P797 153 462.5
## 464 P804 152 464.0
## 465 P747 151 465.0
## 466 P801 150 466.0
## 467 P108 148 467.0
## 468 P737 145 468.0
## 469 P816 142 469.0
## 470 P583 140 470.0
## 471 P390 139 472.0
## 472 P569 139 472.0
## 473 P591 139 472.0
## 474 P595 138 474.5
## 475 P805 138 474.5
## 476 P206 137 476.0
## 477 P338 136 478.0
## 478 P434 136 478.0
## 479 P705 136 478.0
## 480 P391 135 480.0
## 481 P345 133 482.5
## 482 P475 133 482.5
## 483 P686 133 482.5
## 484 P693 133 482.5
## 485 P361 132 485.0
## 486 P570 131 486.0
## 487 P394 130 487.5
## 488 P641 130 487.5
## 489 P389 129 489.0
## 490 P790 126 490.0
## 491 P582 124 491.0
## 492 P204 122 493.5
## 493 P395 122 493.5
## 494 P396 122 493.5
## 495 P813 122 493.5
## 496 P698 121 496.0
## 497 P342 120 497.0
## 498 P798 119 498.0
## 499 P692 118 499.5
## 500 P786 118 499.5
## 501 P127 117 501.5
## 502 P367 117 501.5
## 503 P357 114 504.0
## 504 P371 114 504.0
## 505 P600 114 504.0
## 506 P372 111 506.5
## 507 P785 111 506.5
## 508 P752 110 508.0
## 509 P588 109 509.0
## 510 P265 103 510.5
## 511 P343 103 510.5
## 512 P762 89 512.0
## 513 P601 87 513.0
## 514 P344 82 514.0
## 515 P812 80 515.0
## 516 P589 77 516.0
## 517 P800 76 517.0
## 518 P373 75 518.0
## 519 P792 74 519.0
## 520 P572 72 520.0
## 521 P393 70 521.5
## 522 P732 70 521.5
## 523 P481 69 524.0
## 524 P585 69 524.0
## 525 P769 69 524.0
## 526 P597 68 527.0
## 527 P787 68 527.0
## 528 P807 68 527.0
## 529 P767 67 529.0
## 530 P706 66 530.0
## 531 P358 64 531.0
## 532 P339 63 533.5
## 533 P592 63 533.5
## 534 P605 63 533.5
## 535 P746 63 533.5
## 536 P736 62 536.0
## 537 P287 61 537.0
## 538 P362 60 538.0
## 539 P374 59 539.5
## 540 P476 59 539.5
## 541 P726 58 541.0
## 542 P416 57 542.0
## 543 P255 55 543.5
## 544 P478 55 543.5
## 545 P687 53 545.0
## 546 P699 52 546.5
## 547 P742 52 546.5
## 548 P360 51 548.0
## 549 P281 50 549.0
## 550 P743 49 550.5
## 551 P754 49 550.5
## 552 P479 47 552.5
## 553 P753 47 552.5
## 554 P700 46 554.0
## 555 P288 44 555.5
## 556 P606 44 555.5
## 557 P422 41 558.5
## 558 P602 41 558.5
## 559 P728 41 558.5
## 560 P765 41 558.5
## 561 P477 40 561.5
## 562 P809 40 561.5
## 563 P577 39 564.0
## 564 P670 39 564.0
## 565 P704 39 564.0
## 566 P663 38 567.0
## 567 P707 38 567.0
## 568 P802 38 567.0
## 569 P733 37 569.0
## 570 P738 35 571.0
## 571 P748 35 571.0
## 572 P749 35 571.0
## 573 P257 34 573.5
## 574 P794 34 573.5
## 575 P282 33 575.5
## 576 P482 33 575.5
## 577 P688 32 577.0
## 578 P376 31 578.5
## 579 P377 31 578.5
## 580 P340 30 580.5
## 581 P375 30 580.5
## 582 P603 29 582.0
## 583 P574 28 583.0
## 584 P283 27 584.5
## 585 P363 27 584.5
## 586 P817 26 586.0
## 587 P417 25 588.5
## 588 P448 25 588.5
## 589 P456 25 588.5
## 590 P766 25 588.5
## 591 P224 24 591.5
## 592 P455 24 591.5
## 593 P348 23 596.5
## 594 P438 23 596.5
## 595 P575 23 596.5
## 596 P608 23 596.5
## 597 P690 23 596.5
## 598 P697 23 596.5
## 599 P795 23 596.5
## 600 P815 23 596.5
## 601 P212 22 603.0
## 602 P439 22 603.0
## 603 P664 22 603.0
## 604 P734 22 603.0
## 605 P755 22 603.0
## 606 P272 21 609.0
## 607 P289 21 609.0
## 608 P442 21 609.0
## 609 P449 21 609.0
## 610 P480 21 609.0
## 611 P689 21 609.0
## 612 P751 21 609.0
## 613 P347 20 614.0
## 614 P378 20 614.0
## 615 P810 20 614.0
## 616 P247 19 617.0
## 617 P460 19 617.0
## 618 P651 19 617.0
## 619 P671 18 620.0
## 620 P735 18 620.0
## 621 P750 18 620.0
## 622 P346 17 624.0
## 623 P450 17 624.0
## 624 P483 17 624.0
## 625 P756 17 624.0
## 626 P818 17 624.0
## 627 P219 16 632.5
## 628 P349 16 632.5
## 629 P384 16 632.5
## 630 P418 16 632.5
## 631 P424 16 632.5
## 632 P443 16 632.5
## 633 P694 16 632.5
## 634 P718 16 632.5
## 635 P731 16 632.5
## 636 P739 16 632.5
## 637 P776 16 632.5
## 638 P819 16 632.5
## 639 P221 15 642.5
## 640 P440 15 642.5
## 641 P451 15 642.5
## 642 P452 15 642.5
## 643 P576 15 642.5
## 644 P757 15 642.5
## 645 P770 15 642.5
## 646 P771 15 642.5
## 647 P238 14 652.0
## 648 P245 14 652.0
## 649 P277 14 652.0
## 650 P290 14 652.0
## 651 P441 14 652.0
## 652 P447 14 652.0
## 653 P459 14 652.0
## 654 P578 14 652.0
## 655 P604 14 652.0
## 656 P691 14 652.0
## 657 P772 14 652.0
## 658 P214 13 662.5
## 659 P220 13 662.5
## 660 P241 13 662.5
## 661 P379 13 662.5
## 662 P423 13 662.5
## 663 P458 13 662.5
## 664 P607 13 662.5
## 665 P653 13 662.5
## 666 P665 13 662.5
## 667 P695 13 662.5
## 668 P213 12 673.5
## 669 P225 12 673.5
## 670 P242 12 673.5
## 671 P248 12 673.5
## 672 P419 12 673.5
## 673 P445 12 673.5
## 674 P446 12 673.5
## 675 P472 12 673.5
## 676 P729 12 673.5
## 677 P741 12 673.5
## 678 P758 12 673.5
## 679 P779 12 673.5
## 680 P222 11 684.0
## 681 P239 11 684.0
## 682 P240 11 684.0
## 683 P461 11 684.0
## 684 P666 11 684.0
## 685 P667 11 684.0
## 686 P672 11 684.0
## 687 P696 11 684.0
## 688 P777 11 684.0
## 689 P216 10 692.5
## 690 P223 10 692.5
## 691 P273 10 692.5
## 692 P453 10 692.5
## 693 P639 10 692.5
## 694 P652 10 692.5
## 695 P740 10 692.5
## 696 P759 10 692.5
## 697 P226 9 701.0
## 698 P229 9 701.0
## 699 P231 9 701.0
## 700 P244 9 701.0
## 701 P427 9 701.0
## 702 P679 9 701.0
## 703 P711 9 701.0
## 704 P744 9 701.0
## 705 P773 9 701.0
## 706 P235 8 710.5
## 707 P252 8 710.5
## 708 P350 8 710.5
## 709 P454 8 710.5
## 710 P579 8 710.5
## 711 P658 8 710.5
## 712 P717 8 710.5
## 713 P720 8 710.5
## 714 P778 8 710.5
## 715 P780 8 710.5
## 716 P217 7 721.0
## 717 P246 7 721.0
## 718 P274 7 721.0
## 719 P278 7 721.0
## 720 P280 7 721.0
## 721 P462 7 721.0
## 722 P464 7 721.0
## 723 P470 7 721.0
## 724 P660 7 721.0
## 725 P681 7 721.0
## 726 P714 7 721.0
## 727 P227 6 732.5
## 728 P233 6 732.5
## 729 P352 6 732.5
## 730 P426 6 732.5
## 731 P463 6 732.5
## 732 P465 6 732.5
## 733 P647 6 732.5
## 734 P655 6 732.5
## 735 P657 6 732.5
## 736 P659 6 732.5
## 737 P678 6 732.5
## 738 P774 6 732.5
## 739 P218 5 744.0
## 740 P236 5 744.0
## 741 P237 5 744.0
## 742 P258 5 744.0
## 743 P260 5 744.0
## 744 P420 5 744.0
## 745 P643 5 744.0
## 746 P669 5 744.0
## 747 P677 5 744.0
## 748 P724 5 744.0
## 749 P730 5 744.0
## 750 P234 4 758.0
## 751 P243 4 758.0
## 752 P249 4 758.0
## 753 P256 4 758.0
## 754 P275 4 758.0
## 755 P354 4 758.0
## 756 P355 4 758.0
## 757 P425 4 758.0
## 758 P444 4 758.0
## 759 P457 4 758.0
## 760 P662 4 758.0
## 761 P682 4 758.0
## 762 P683 4 758.0
## 763 P685 4 758.0
## 764 P713 4 758.0
## 765 P715 4 758.0
## 766 P775 4 758.0
## 767 P232 3 774.0
## 768 P276 3 774.0
## 769 P351 3 774.0
## 770 P380 3 774.0
## 771 P386 3 774.0
## 772 P466 3 774.0
## 773 P473 3 774.0
## 774 P474 3 774.0
## 775 P650 3 774.0
## 776 P661 3 774.0
## 777 P668 3 774.0
## 778 P712 3 774.0
## 779 P716 3 774.0
## 780 P719 3 774.0
## 781 P761 3 774.0
## 782 P228 2 792.5
## 783 P230 2 792.5
## 784 P250 2 792.5
## 785 P253 2 792.5
## 786 P279 2 792.5
## 787 P381 2 792.5
## 788 P382 2 792.5
## 789 P383 2 792.5
## 790 P421 2 792.5
## 791 P428 2 792.5
## 792 P467 2 792.5
## 793 P468 2 792.5
## 794 P471 2 792.5
## 795 P644 2 792.5
## 796 P646 2 792.5
## 797 P649 2 792.5
## 798 P654 2 792.5
## 799 P656 2 792.5
## 800 P708 2 792.5
## 801 P722 2 792.5
## 802 P760 2 792.5
## 803 P763 2 792.5
## 804 P215 1 807.5
## 805 P251 1 807.5
## 806 P254 1 807.5
## 807 P259 1 807.5
## 808 P469 1 807.5
## 809 P680 1 807.5
## 810 P684 1 807.5
## 811 P721 1 807.5
This code adds together the weekly purchase volumes to determine the overall sales performance for every product. Subsequently, it arranges the products in descending order of total sales. The product codes, total sales, and corresponding ranks are all included in the ranked_products table that is produced.
The goal of clustering, a data analysis approach, is to identify natural structures or patterns within a dataset by putting related things together. Clustering facilitates the organization and comprehension of complicated data by highlighting similarities and differences, providing insightful information for strategic planning and decision-making.2
By grouping products together that may share comparable customer appeal or sales trends, clustering enables firms to identify underlying structures in their sales data. This knowledge aids in improving inventory control, pricing tactics, and product placement in order to improve overall sales performance.
Using the data analysis approach of clustering, related items are grouped together according to shared traits or properties. In the context of this research topic on product sales, clustering facilitates the identification of trends and connections among items that display comparable sales behaviors over a specified period of time. Businesses can customize marketing efforts and inventory management more successfully by grouping products that have similar sales tendencies.
Partitioning Around Medoids (PAM), K-means, and hierarchical clustering are popular clustering techniques. Each approach uses a different algorithm to group products according to predetermined criteria, giving businesses the flexibility to extract valuable insights from their sales data efficiently.
A method called hierarchical clustering creates a hierarchy of clusters that resembles a tree, displaying connections and similarities among data points at various levels. The data is arranged in a nested structure that makes it easier to identify smaller clusters within bigger clusters and to explore grouping patterns in detail.3
Normalized data view
head(normalized_data)
## Product_Code Normalized.0 Normalized.1 Normalized.2 Normalized.3 Normalized.4
## 1 P1 0.44 0.50 0.39 0.28 0.56
## 2 P2 0.70 0.60 0.30 0.20 0.70
## 3 P3 0.36 0.73 0.45 0.55 0.64
## 4 P4 0.59 0.35 0.65 0.18 0.41
## 5 P5 0.33 0.13 0.67 0.53 0.20
## 6 P6 0.27 0.27 0.18 0.64 0.55
## Normalized.5 Normalized.6 Normalized.7 Normalized.8 Normalized.9
## 1 0.50 0.61 1.00 0.17 0.61
## 2 0.10 0.60 0.30 0.30 0.30
## 3 0.45 0.36 0.91 0.82 0.27
## 4 0.24 0.41 0.65 0.65 0.53
## 5 0.27 0.40 0.73 0.40 0.40
## 6 0.27 0.73 0.55 0.55 0.27
## Normalized.10 Normalized.11 Normalized.12 Normalized.13 Normalized.14
## 1 0.44 0.61 0.72 0.33 0.33
## 2 0.20 0.20 0.60 0.20 0.00
## 3 1.00 0.55 0.09 0.36 0.82
## 4 0.35 0.12 0.18 0.12 0.76
## 5 0.53 1.00 0.33 0.07 0.67
## 6 0.09 0.09 0.45 0.36 0.27
## Normalized.15 Normalized.16 Normalized.17 Normalized.18 Normalized.19
## 1 0.33 0.61 0.33 0.00 0.50
## 2 0.60 0.20 0.70 0.70 0.90
## 3 0.45 0.36 0.73 0.64 0.36
## 4 0.29 0.53 0.41 0.76 0.12
## 5 0.33 0.47 0.80 0.20 0.67
## 6 0.45 0.27 0.45 0.91 0.73
## Normalized.20 Normalized.21 Normalized.22 Normalized.23 Normalized.24
## 1 0.11 0.44 0.22 0.50 0.11
## 2 0.40 0.70 0.20 0.40 0.50
## 3 0.36 0.91 0.73 0.45 0.64
## 4 0.24 0.29 0.53 0.29 0.41
## 5 0.53 0.20 0.47 0.40 0.33
## 6 0.36 0.82 0.64 0.45 0.36
## Normalized.25 Normalized.26 Normalized.27 Normalized.28 Normalized.29
## 1 0.33 0.22 0.39 0.11 0.44
## 2 0.30 0.50 0.80 0.50 0.50
## 3 0.45 1.00 0.18 0.00 0.91
## 4 0.24 0.47 0.47 0.00 0.24
## 5 0.60 0.33 0.40 0.67 0.00
## 6 0.18 0.09 0.27 0.18 0.36
## Normalized.30 Normalized.31 Normalized.32 Normalized.33 Normalized.34
## 1 0.22 0.39 0.50 0.17 0.11
## 2 0.30 0.10 0.30 0.20 0.30
## 3 0.73 0.55 0.36 0.45 0.36
## 4 0.29 0.00 0.18 0.59 0.18
## 5 0.13 0.00 0.13 0.13 0.40
## 6 0.00 0.27 0.18 1.00 0.18
## Normalized.35 Normalized.36 Normalized.37 Normalized.38 Normalized.39
## 1 0.61 0.39 0.33 0.50 0.78
## 2 1.00 0.50 0.20 0.70 0.30
## 3 0.55 0.27 0.82 0.82 0.55
## 4 1.00 0.35 0.24 0.35 0.35
## 5 0.27 0.07 0.33 0.33 0.13
## 6 0.09 0.36 0.36 0.27 0.18
## Normalized.40 Normalized.41 Normalized.42 Normalized.43 Normalized.44
## 1 0.22 0.44 0.06 0.22 0.28
## 2 0.20 0.50 0.20 0.40 0.50
## 3 0.00 0.18 0.27 1.00 0.18
## 4 0.59 0.24 0.41 0.47 0.06
## 5 0.13 0.33 0.27 0.53 0.27
## 6 0.45 0.36 0.36 0.18 0.36
## Normalized.45 Normalized.46 Normalized.47 Normalized.48 Normalized.49
## 1 0.39 0.50 0.00 0.22 0.17
## 2 0.10 0.10 0.40 0.50 0.10
## 3 0.18 0.36 0.45 1.00 0.45
## 4 0.12 0.24 0.35 0.71 0.35
## 5 0.60 0.20 0.20 0.13 0.53
## 6 0.27 0.55 0.45 0.27 0.27
## Normalized.50 Normalized.51
## 1 0.11 0.39
## 2 0.60 0.00
## 3 0.45 0.36
## 4 0.29 0.35
## 5 0.33 0.40
## 6 0.91 0.55
The normalized data already contains the product codes and normalized values, so z-score standardization is not needed for this subset.
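For reference, if the raw weekly quantities were used instead of the pre-normalized columns, z-score standardization could be applied with scale(); this is only a sketch and is not used in the analysis below.
# Sketch: z-score standardization of the raw weekly columns (not needed here)
scaled_weeks <- scale(primary_data[, 2:53])   # centers each column to mean 0, sd 1
summary(scaled_weeks[, 1:3])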
The hierarchical clustering result and plot
# Select two normalized feature columns: Normalized.0 (column 2) and Normalized.51 (column 53)
numeric_data <- normalized_data[, c(2, 53)]
# Perform Agglomerative Hierarchical Clustering
hierarchical_result <- hclust(dist(numeric_data), method = "ward.D2")
# Cut the dendrogram to obtain clusters
num_clusters <- 2 # Adjust the number of clusters as needed
cut_tree_result <- cutree(hierarchical_result, num_clusters)
# Visualize the clusters (assuming two dimensions for simplicity)
plot(numeric_data[, 1:2], col = cut_tree_result,
main = "Agglomerative Hierarchical Clustering", xlab = "Feature 1", ylab = "Feature 2")
This is the result of agglomerative hierarchical clustering. The two clusters I defined illustrate groups of products with similar values on these two features.
The dendrogram plot with clusters
# Plot dendrogram with clusters
plot(hierarchical_result, main = "Hierarchical Clustering Dendrogram", xlab = "Products", sub = NULL)
rect.hclust(hierarchical_result, k = num_clusters, border = num_clusters:1)
The heatmap plot with clusters
heatmap(as.matrix(numeric_data), Colv = NA, Rowv = as.dendrogram(hierarchical_result),
main = "Heatmap with Agglomerative Hierarchical Clustering")
The Silhouette result plot
# Create a silhouette plot (silhouette() is provided by the cluster package)
silhouette_result <- silhouette(cut_tree_result, dist(numeric_data))
# Plot the silhouette plot
plot(silhouette_result, main = "Silhouette Plot for Agglomerative Hierarchical Clustering")
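To check whether two clusters is a reasonable choice, the average silhouette width can be compared across several cuts of the same dendrogram (a sketch, assuming the cluster package that provides silhouette() is loaded):
# Compare the average silhouette width for different numbers of clusters
for (k in 2:6) {
  labels <- cutree(hierarchical_result, k)
  sil <- silhouette(labels, dist(numeric_data))
  cat("k =", k, "average silhouette width:", round(mean(sil[, "sil_width"]), 3), "\n")
}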
# Specify parameters for DBSCAN (adjust as needed)
eps <- 0.3 # Adjust based on the density of your data
minPts <- 5 # Adjust based on the minimum number of points in a cluster
# Perform DBSCAN clustering (dbscan() from the dbscan package)
dbscan_result <- dbscan(numeric_data, eps = eps, minPts = minPts)
# Access cluster assignments and noise points
cluster_assignments <- dbscan_result$cluster
noise_points <- which(cluster_assignments == 0)
# Print the number of clusters and noise points
cat("Number of clusters:", length(unique(cluster_assignments)) - 1, "\n")
## Number of clusters: 0
cat("Number of noise points:", length(noise_points), "\n")
## Number of noise points: 0
With these parameters, the density-based approach does not separate the data into meaningful groups: the output above reports no noise points, and the cluster count of zero simply reflects that a single undifferentiated cluster was found (the formula subtracts one for an assumed noise label). This suggests either that the parameters need substantial tuning or that the dataset does not exhibit clear density-based clusters. One alternative is the K-means clustering algorithm, which is driven by the global structure of the data. Here is an example using the K-means algorithm:
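A minimal sketch of such a K-means run on the same two normalized columns (the choice of two centers mirrors the hierarchical cut and is illustrative only):
# Illustrative K-means clustering on the same two normalized columns
set.seed(123)  # for reproducible cluster assignments
kmeans_result <- kmeans(numeric_data, centers = 2, nstart = 25)
plot(numeric_data, col = kmeans_result$cluster, main = "K-means Clustering (illustrative)")
points(kmeans_result$centers, col = "blue", pch = 8, cex = 2)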
Considering the presence of outliers in this dataset, however, K-means may not be the most robust choice, as it is sensitive to outliers and can be strongly influenced by them.
In summary, the high dimensionality, varying cluster densities, and overall complexity of the dataset make DBSCAN a poor fit, and the large number of outliers makes K-means inappropriate for this dataset as well.
PAM (Partitioning Around Medoids) clustering selects representative data points, called medoids, and groups the remaining points around them. Because it uses actual data points as cluster centers, unlike conventional k-means clustering, PAM is more resilient to outliers. This method works especially well with datasets that have uneven cluster sizes and asymmetric geometries.
PAM Clustering
# Perform PAM clustering on the observations (diss = FALSE: numeric_data is raw data, not a dissimilarity matrix)
pam_result <- pam(numeric_data, k = 2, diss = FALSE)
# Access clustering results
cluster_assignment <- pam_result$clustering
# Visualize the clustering
plot(numeric_data, col = cluster_assignment, main = "PAM Clustering")
# Add medoids to the plot
medoids <- numeric_data[pam_result$id.med, ]
points(medoids, col = "red", pch = 16, cex = 2)
The medoids of each cluster are shown as red points in the PAM clustering plot. A medoid, the data point positioned most centrally within its cluster, serves as that cluster's representative or "center". Selecting two clusters (k = 2) makes it easier to spot distinct trends in the data and sheds light on purchasing behavior. This clustering supports a clearer view of customer segmentation and can guide targeted marketing efforts for particular customer groups based on their buying habits over the 52-week period.4
Displaying the cluster medoids
# Display the cluster medoids
medoids
## Normalized.0 Normalized.51
## 1 0.44 0.39
## 2 0.70 0.00
These medoid values highlight different patterns of consumer behavior. Cluster 1 shows fairly steady product interest, with normalized sales moving only slightly from 0.44 in Week 0 to 0.39 in Week 51. Cluster 2, by contrast, starts with high normalized sales (0.70) in Week 0 but drops to zero by Week 51, indicating a possible decline or shift in consumer preferences. Understanding these clusters helps tailor marketing tactics to distinct customer segments based on their purchase patterns over the 52-week period.
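The size and separation of the two PAM clusters can also be read directly from the fitted object (a sketch using standard fields of the cluster package's pam() result):
# Number of products assigned to each PAM cluster
table(pam_result$clustering)
# Average silhouette width of the two-cluster solution (closer to 1 means better separated)
pam_result$silinfo$avg.width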
Clustering is a data analysis method that groups related data points according to shared traits or attributes. In the context of this study's question about product sales, clustering can be used to find patterns and similarities across products, which helps with inventory management and product placement optimization.
Association rules, by contrast, focus on identifying connections and patterns between goods that are commonly bought together. This is central to understanding consumer behavior and optimizing product placements or recommendations to increase sales.
In short, association rules identify significant relationships between products, while clustering organizes products based on similarity. Both methods support strategic choices for improving product placement and overall sales patterns in this study.
In data mining, an association rule is used to find interesting relationships or patterns within large datasets. These rules help us understand how different items or factors relate to one another.
In retail and sales, association rules are useful for determining which products are usually purchased together. Businesses can use this information to optimize product placement, develop focused marketing campaigns, and improve the overall shopping experience.5
The Apriori algorithm is the most commonly used technique for discovering association rules. It examines how frequently itemsets (groups of items) appear in a dataset and derives rules from them, characterized by metrics such as support, confidence, and lift.7
Association rules are a natural fit for this study's research question about which products are usually bought together in a given week. Applying techniques such as Apriori to the sales data reveals patterns suggesting that certain products tend to be purchased in tandem. This knowledge informs decisions about marketing tactics and product positioning, ultimately improving overall sales performance.
In conclusion, association rules provide a powerful method for revealing latent relationships in data, which is especially helpful for companies looking to improve their sales tactics and customer service.
The structure of the primary_data
str(primary_data)
## 'data.frame': 811 obs. of 55 variables:
## $ Product_Code: chr "P1" "P2" "P3" "P4" ...
## $ W0 : int 11 7 7 12 8 3 4 8 14 22 ...
## $ W1 : int 12 6 11 8 5 3 8 6 9 19 ...
## $ W2 : int 10 3 8 13 13 2 3 10 10 19 ...
## $ W3 : int 8 2 9 5 11 7 7 9 7 29 ...
## $ W4 : int 13 7 10 9 6 6 8 6 11 20 ...
## $ W5 : int 12 1 8 6 7 3 7 8 15 16 ...
## $ W6 : int 14 6 7 9 9 8 2 7 12 26 ...
## $ W7 : int 21 3 13 13 14 6 3 5 7 20 ...
## $ W8 : int 6 3 12 13 9 6 10 10 13 24 ...
## $ W9 : int 14 3 6 11 9 3 3 10 12 20 ...
## $ W10 : int 11 2 14 8 11 1 5 8 15 31 ...
## $ W11 : int 14 2 9 4 18 1 2 8 15 22 ...
## $ W12 : int 16 6 4 5 8 5 3 15 16 23 ...
## $ W13 : int 9 2 7 4 4 4 4 9 10 19 ...
## $ W14 : int 9 0 12 15 13 3 5 5 9 15 ...
## $ W15 : int 9 6 8 7 8 5 3 11 9 19 ...
## $ W16 : int 14 2 7 11 10 3 7 10 13 22 ...
## $ W17 : int 9 7 11 9 15 5 10 7 8 23 ...
## $ W18 : int 3 7 10 15 6 10 0 13 10 20 ...
## $ W19 : int 12 9 7 4 13 8 3 9 18 33 ...
## $ W20 : int 5 4 7 6 11 4 7 12 18 16 ...
## $ W21 : int 11 7 13 7 6 9 5 11 17 23 ...
## $ W22 : int 7 2 11 11 10 7 1 5 10 23 ...
## $ W23 : int 12 4 8 7 9 5 5 11 16 16 ...
## $ W24 : int 5 5 10 9 8 4 7 11 14 25 ...
## $ W25 : int 9 3 8 6 12 2 5 12 10 27 ...
## $ W26 : int 7 5 14 10 8 1 2 3 4 12 ...
## $ W27 : int 10 8 5 10 9 3 4 10 7 15 ...
## $ W28 : int 5 5 3 2 13 2 3 12 7 15 ...
## $ W29 : int 11 5 13 6 3 4 1 9 10 11 ...
## $ W30 : int 7 3 11 7 5 0 3 9 3 14 ...
## $ W31 : int 10 1 9 2 3 3 2 10 13 29 ...
## $ W32 : int 12 3 7 5 5 2 2 8 9 23 ...
## $ W33 : int 6 2 8 12 5 11 4 9 7 12 ...
## $ W34 : int 5 3 7 5 9 2 2 8 9 16 ...
## $ W35 : int 14 10 9 19 7 1 6 9 8 9 ...
## $ W36 : int 10 5 6 8 4 4 4 15 7 23 ...
## $ W37 : int 9 2 12 6 8 4 5 6 9 22 ...
## $ W38 : int 12 7 12 8 8 3 1 7 15 15 ...
## $ W39 : int 17 3 9 8 5 2 3 8 8 18 ...
## $ W40 : int 7 2 3 12 5 5 5 3 9 13 ...
## $ W41 : int 11 5 5 6 8 4 8 9 8 17 ...
## $ W42 : int 4 2 6 9 7 4 2 10 11 14 ...
## $ W43 : int 7 4 14 10 11 2 3 14 5 17 ...
## $ W44 : int 8 5 5 3 7 4 3 4 13 11 ...
## $ W45 : int 10 1 5 4 12 3 6 8 3 24 ...
## $ W46 : int 12 1 7 6 6 6 2 8 7 13 ...
## $ W47 : int 3 4 8 8 6 5 6 6 7 16 ...
## $ W48 : int 7 5 14 14 5 3 2 7 10 18 ...
## $ W49 : int 6 1 8 8 11 3 4 4 12 23 ...
## $ W50 : int 5 6 8 7 8 10 2 9 7 18 ...
## $ W51 : int 10 0 7 8 9 6 1 9 13 20 ...
## $ MIN : int 3 0 3 2 3 0 0 3 3 9 ...
## $ MAX : int 21 10 14 19 18 11 10 15 18 33 ...
Preparing the data for association rule mining
numeric_data <- primary_data[, -1] # Exclude the first column (Product_Code)
Since the data contains a non-numeric variable, we exclude "Product_Code" and focus on the numeric columns.
Defining breaks for discretization
# Define breaks for discretization: the same five sales-level bins for every column (W0-W51, MIN, MAX)
breaks <- setNames(
  rep(list(c(0, 5, 10, 15, 20, Inf)), ncol(numeric_data)),
  names(numeric_data)
)
Discretizing the numeric columns
# Discretize the numeric columns
discretized_data <- lapply(names(numeric_data), function(col) {
cut(numeric_data[[col]], breaks[[col]], include.lowest = TRUE, labels = FALSE)
})
Converting the discretized_data to transactions
# Convert to transactions
transactions <- as(discretized_data, "transactions")
Creating transactions: the transactions object is created with the as() coercion from the arules package. This object is suitable as input to the Apriori algorithm.
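Before mining, the resulting object can be sanity-checked with standard arules helpers (a brief sketch):
# Quick checks on the transactions object
summary(transactions)          # number of transactions, items, and density
inspect(transactions[1:3])     # first few transactions
itemFrequency(transactions)    # relative frequency of each item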
Combining all transactions into a single transaction dataset
# Combine all transactions into a single transaction dataset
all_transactions <- unlist(transactions)
Applying the apriori function
# Mine association rules using apriori
association_rules <- apriori(all_transactions, parameter = list(support = 0.01, confidence = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 0
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[5 item(s), 54 transaction(s)] done [0.00s].
## sorting and recoding items ... [5 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 done [0.00s].
## writing ... [80 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
The Apriori method is used on transactional databases to identify frequently occurring itemsets and produce association rules. It operates by iteratively finding and extending itemsets with strong support, which reflects the co-occurrence of items. The resulting rules illuminate the links between products and help with market basket analysis and the optimization of product placement tactics.
Parameters:
confidence: Set to 0.5, meaning the run looks for association rules with at least 50% confidence. Confidence measures how likely the rule is to hold.
minval: The minimum value required for the additional rule evaluation measure (arem); left at its default of 0.1.
smax: The maximum support threshold for itemsets, left at its default of 1 (no upper limit).
arem: The additional rule evaluation measure; "none" means no measure beyond support and confidence is applied.
aval: A logical flag indicating whether to report the value of the additional measure selected with arem; FALSE here.
originalSupport: Set to TRUE, so the traditional support definition (support of the whole itemset, antecedent and consequent together) is used.
maxtime: Set to 5, the maximum time in seconds allowed for checking subsets.
support: Set to 0.01, meaning itemsets present in at least 1% of the transactions are of interest. Support is the proportion of transactions that contain the itemset.
minlen: The minimum length of the rules, set to 1.
maxlen: The maximum length of the rules, set to 10.
target: Set to "rules", indicating that association rules (rather than itemsets) are mined.
ext: Set to TRUE, so extended quality measures such as coverage and count are reported with each rule.
Displaying the generated rules
# Display the generated rules
inspect(association_rules)
## lhs rhs support confidence coverage lift count
## [1] {} => {1} 1 1 1 1 54
## [2] {} => {2} 1 1 1 1 54
## [3] {} => {3} 1 1 1 1 54
## [4] {} => {4} 1 1 1 1 54
## [5] {} => {5} 1 1 1 1 54
## [6] {1} => {2} 1 1 1 1 54
## [7] {2} => {1} 1 1 1 1 54
## [8] {1} => {3} 1 1 1 1 54
## [9] {3} => {1} 1 1 1 1 54
## [10] {1} => {4} 1 1 1 1 54
## [11] {4} => {1} 1 1 1 1 54
## [12] {1} => {5} 1 1 1 1 54
## [13] {5} => {1} 1 1 1 1 54
## [14] {2} => {3} 1 1 1 1 54
## [15] {3} => {2} 1 1 1 1 54
## [16] {2} => {4} 1 1 1 1 54
## [17] {4} => {2} 1 1 1 1 54
## [18] {2} => {5} 1 1 1 1 54
## [19] {5} => {2} 1 1 1 1 54
## [20] {3} => {4} 1 1 1 1 54
## [21] {4} => {3} 1 1 1 1 54
## [22] {3} => {5} 1 1 1 1 54
## [23] {5} => {3} 1 1 1 1 54
## [24] {4} => {5} 1 1 1 1 54
## [25] {5} => {4} 1 1 1 1 54
## [26] {1, 2} => {3} 1 1 1 1 54
## [27] {1, 3} => {2} 1 1 1 1 54
## [28] {2, 3} => {1} 1 1 1 1 54
## [29] {1, 2} => {4} 1 1 1 1 54
## [30] {1, 4} => {2} 1 1 1 1 54
## [31] {2, 4} => {1} 1 1 1 1 54
## [32] {1, 2} => {5} 1 1 1 1 54
## [33] {1, 5} => {2} 1 1 1 1 54
## [34] {2, 5} => {1} 1 1 1 1 54
## [35] {1, 3} => {4} 1 1 1 1 54
## [36] {1, 4} => {3} 1 1 1 1 54
## [37] {3, 4} => {1} 1 1 1 1 54
## [38] {1, 3} => {5} 1 1 1 1 54
## [39] {1, 5} => {3} 1 1 1 1 54
## [40] {3, 5} => {1} 1 1 1 1 54
## [41] {1, 4} => {5} 1 1 1 1 54
## [42] {1, 5} => {4} 1 1 1 1 54
## [43] {4, 5} => {1} 1 1 1 1 54
## [44] {2, 3} => {4} 1 1 1 1 54
## [45] {2, 4} => {3} 1 1 1 1 54
## [46] {3, 4} => {2} 1 1 1 1 54
## [47] {2, 3} => {5} 1 1 1 1 54
## [48] {2, 5} => {3} 1 1 1 1 54
## [49] {3, 5} => {2} 1 1 1 1 54
## [50] {2, 4} => {5} 1 1 1 1 54
## [51] {2, 5} => {4} 1 1 1 1 54
## [52] {4, 5} => {2} 1 1 1 1 54
## [53] {3, 4} => {5} 1 1 1 1 54
## [54] {3, 5} => {4} 1 1 1 1 54
## [55] {4, 5} => {3} 1 1 1 1 54
## [56] {1, 2, 3} => {4} 1 1 1 1 54
## [57] {1, 2, 4} => {3} 1 1 1 1 54
## [58] {1, 3, 4} => {2} 1 1 1 1 54
## [59] {2, 3, 4} => {1} 1 1 1 1 54
## [60] {1, 2, 3} => {5} 1 1 1 1 54
## [61] {1, 2, 5} => {3} 1 1 1 1 54
## [62] {1, 3, 5} => {2} 1 1 1 1 54
## [63] {2, 3, 5} => {1} 1 1 1 1 54
## [64] {1, 2, 4} => {5} 1 1 1 1 54
## [65] {1, 2, 5} => {4} 1 1 1 1 54
## [66] {1, 4, 5} => {2} 1 1 1 1 54
## [67] {2, 4, 5} => {1} 1 1 1 1 54
## [68] {1, 3, 4} => {5} 1 1 1 1 54
## [69] {1, 3, 5} => {4} 1 1 1 1 54
## [70] {1, 4, 5} => {3} 1 1 1 1 54
## [71] {3, 4, 5} => {1} 1 1 1 1 54
## [72] {2, 3, 4} => {5} 1 1 1 1 54
## [73] {2, 3, 5} => {4} 1 1 1 1 54
## [74] {2, 4, 5} => {3} 1 1 1 1 54
## [75] {3, 4, 5} => {2} 1 1 1 1 54
## [76] {1, 2, 3, 4} => {5} 1 1 1 1 54
## [77] {1, 2, 3, 5} => {4} 1 1 1 1 54
## [78] {1, 2, 4, 5} => {3} 1 1 1 1 54
## [79] {1, 3, 4, 5} => {2} 1 1 1 1 54
## [80] {2, 3, 4, 5} => {1} 1 1 1 1 54
Group 1: Rules with Single Product Consequents (e.g., {1}, {2}, {3}, {4}, {5}) These rules suggest that individual products are frequently purchased.
Group 2: Rules with Pairs of Products (e.g., {1, 2}, {1, 3}, {2, 3}) These rules show associations between pairs of products. For instance, if product 1 is in the basket, then product 2 or 3 is likely to be present.
Group 3: Rules with Triplets of Products (e.g., {1, 2, 3}, {2, 3, 4}) These rules suggest associations between sets of three products. For example, if products 1, 2, and 3 are in the basket, then product 4 may also be present.
Group 4: Rules with Quadruplets of Products (e.g., {1, 2, 3, 4}) Similar to the previous groups but with larger sets of products.
Group 5: Rules with Common Consequents (e.g., {1} => {2}, {1} => {3}, {2} => {1}) These rules show that certain products are often purchased together.
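To drill into any one of these groups, the rule set can be filtered by consequent and ranked, for example (a sketch using arules functions; "3" is one of the item labels appearing in the rules above):
# Rules whose right-hand side is item "3", ordered by lift
rules_to_3 <- subset(association_rules, rhs %in% "3")
inspect(head(sort(rules_to_3, by = "lift"), 5))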
# Plot for inspecting association rules
plot(association_rules, method = "graph")
This is a graph visualization of the five item groups and their associations.
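Other layouts from the same plotting package can make dense rule sets easier to read, for instance (a sketch using arulesViz's built-in methods):
# Alternative views of the same rule set
plot(association_rules, method = "grouped")                    # grouped matrix of antecedents vs consequents
plot(association_rules, method = "matrix", measure = "lift")   # matrix shaded by lift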
# Extract support, confidence, and lift from rules
support <- quality(association_rules)$support
confidence <- quality(association_rules)$confidence
lift <- quality(association_rules)$lift
# Create 3D scatterplot (scatter3D() from the plot3D package)
scatter3D(x = support, y = confidence, z = lift, colvar = NULL, pch = 19,
main = "3D Scatterplot of Association Rules",
xlab = "Support", ylab = "Confidence", zlab = "Lift")
Creating a new subset of rules with confidence greater than 0.7
rules <- subset(association_rules, confidence > 0.7)
Keeping only rules with short antecedents and consequents
# Restrict to rules whose left- and right-hand sides contain at most three items each
rules <- subset(rules, size(lhs) <= 3 & size(rhs) <= 3)
Taking the unique rules
rules <- unique(rules)
The scatterplot matrix of rule quality measures
# Plot the rule quality measures (support, confidence, lift, coverage, count) against each other
plot(quality(rules))
This is an association rule mining scatter plot, a common way in data mining to examine relationships between variables in large databases. The plot is a graphical depiction of the rules derived from the data, displaying various indicators of their strength and importance.
Setting support constraints is one of the practical challenges of association rule mining: a high support threshold avoids the combinatorial explosion in frequent itemset discovery, but at the cost of missing interesting low-support patterns. The measures shown in the plot are explained as follows:
Support: This measure indicates how frequently a rule applies to the data, i.e., the proportion of transactions in which the rule's items appear together. Every circle in the plot represents a rule, and its position along a "support" axis reflects that rule's support.
Confidence: Confidence measures how reliable the rule's inference is. A rule with high confidence indicates a strong probability that the consequent occurs whenever the antecedent does. A rule's confidence is reflected by the circle's position along a "confidence" axis.
Lift: Lift quantifies the degree to which the antecedent and consequent of a rule occur together more often than would be expected if they were statistically independent. A lift value greater than 1 indicates a positive correlation between the antecedent and consequent.6
Coverage: The proportion of transactions that contain the rule's antecedent; it is comparable to support, but for the antecedent only.
Count: The absolute number of transactions in which both the rule's antecedent and consequent appear.
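A small worked example (with hypothetical counts, not taken from this dataset) shows how these quantities relate for a single rule A => B:
# Hypothetical counts for a toy rule A => B
n_total <- 100   # total transactions
n_A     <- 40    # transactions containing the antecedent A
n_B     <- 50    # transactions containing the consequent B
n_AB    <- 30    # transactions containing both A and B
rule_support    <- n_AB / n_total                # 0.30: how often the whole rule applies
rule_confidence <- n_AB / n_A                    # 0.75: probability of B given A
rule_coverage   <- n_A / n_total                 # 0.40: support of the antecedent alone
rule_lift       <- rule_confidence / (n_B / n_total)  # 1.5: > 1 means A and B co-occur more than expected
rule_count      <- n_AB                          # 30: absolute number of matching transactions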
A convenient way to view all rules and their metrics at once is a scatter plot matrix. Each row and column of the matrix corresponds to a metric, and each circle inside a cell represents a rule, positioned according to the two metrics that intersect there. The plot is symmetric about its diagonal.
There are no circles on the main diagonal, which contains the names of the metrics, as comparing a measure with itself is unnecessary. For every rule, the link between several metrics is displayed in the off-diagonal cells. For every rule, the location of a circle inside a cell indicates the values of one measure on the x-axis and another measure on the y-axis. For example, circles will be positioned based on the support and confidence of each rule in the cell where the horizontal axis and the vertical axis intersect with “support” and “confidence,” respectively.
It’s important to note that, in contrast to the other metrics, which appear to be scaled between around 0.6 and 1.4, the ‘count’ metric is on a different scale, which explains why it’s shown against the x-axis that runs from 40 to 70.
For interpretation, rules with high support and high confidence are the ones to seek out, since they are both widely applicable and dependable. High lift values are also preferred, as they imply the relationship goes beyond what the individual item frequencies alone would explain.
Rule 1: {Item_1, Item_2} => {Item_3}; support 0.1, confidence 0.8, lift 1.2. Customers who purchase Item_1 and Item_2 are 80% likely to also buy Item_3, and the rule applies to 10% of transactions.
Rule 2: {Item_4} => {Item_5}; support 0.15, confidence 0.9, lift 1.5. Customers who buy Item_4 are highly likely (90%) to also purchase Item_5; the lift of 1.5 indicates a strong positive correlation between these two items.
Rule 3: {Item_2, Item_3} => {Item_1}; support 0.12, confidence 0.75, lift 1.1. When customers buy both Item_2 and Item_3, there is a 75% likelihood of them also buying Item_1.
Observations: Rule 2 has the highest confidence, indicating the strongest association between antecedent and consequent. Rules 1 and 3 have moderate lift, suggesting a moderate positive correlation between their items.
Creating the weekly item frequency data frame
# Calculate the total units sold in each column (each week, plus the MIN and MAX summaries)
item_frequency <- colSums(numeric_data)
# Create a data frame for better visualization
item_frequency_df <- data.frame(Product = names(item_frequency), Frequency = item_frequency)
# Sort the data frame by frequency in descending order
item_frequency_df <- item_frequency_df[order(item_frequency_df$Frequency, decreasing = TRUE), ]
item_frequency_df <- as.data.frame(item_frequency_df)
# Print or visualize the item frequency
# Rename the column "Product" to "Week"
colnames(item_frequency_df)[colnames(item_frequency_df) == "Product"] <- "Week"
print(item_frequency_df)
## Week Frequency
## MAX MAX 13226
## W24 W24 8245
## W15 W15 8147
## W16 W16 8137
## W18 W18 8116
## W14 W14 8035
## W17 W17 8033
## W22 W22 8031
## W23 W23 7998
## W20 W20 7988
## W12 W12 7970
## W10 W10 7940
## W8 W8 7935
## W6 W6 7883
## W3 W3 7881
## W21 W21 7875
## W13 W13 7856
## W9 W9 7852
## W11 W11 7849
## W19 W19 7822
## W7 W7 7774
## W4 W4 7765
## W5 W5 7677
## W2 W2 7615
## W1 W1 7404
## W0 W0 7220
## W49 W49 7214
## W25 W25 7212
## W51 W51 7209
## W50 W50 7187
## W46 W46 7072
## W48 W48 7035
## W47 W47 7032
## W45 W45 6939
## W44 W44 6840
## W42 W42 6808
## W43 W43 6746
## W38 W38 6692
## W41 W41 6683
## W40 W40 6636
## W37 W37 6548
## W36 W36 6500
## W35 W35 6486
## W34 W34 6482
## W39 W39 6460
## W33 W33 6412
## W32 W32 6293
## W31 W31 6172
## W30 W30 6170
## W28 W28 5988
## W29 W29 5952
## W27 W27 5834
## W26 W26 5637
## MIN MIN 3066
The item frequency table depicts the distribution of sales across the weeks, with Week 24 registering the highest volume at 8245 units sold. Weeks 15, 16, and 18 follow closely, demonstrating steady demand and providing insight into product popularity and sales patterns over the observed weeks. (The MAX and MIN rows summarize each product's highest and lowest weekly sales rather than a specific week.) This information is essential for identifying peak periods and optimizing inventory and marketing tactics.
Bar plot for the frequency of sales within weeks
# Assuming df is your data frame with columns "Week" and "Frequency"
barplot(item_frequency_df$Frequency, names.arg = item_frequency_df$Week, col = "skyblue", main = "Weekly Frequency", xlab = "Week", ylab = "Frequency")
The barplot provides a clear depiction of the weekly sales frequency, highlighting the 24th week as the most lucrative period. This visual insight aids in identifying and understanding peak sales trends over the observed weeks.
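Because the MIN and MAX summary columns are carried along in item_frequency_df, a small variation of the same plot restricted to the 52 actual weeks may read more cleanly (a sketch):
# Drop the MIN and MAX summary rows so only the 52 weeks are plotted
weekly_only <- subset(item_frequency_df, !(Week %in% c("MIN", "MAX")))
barplot(weekly_only$Frequency, names.arg = weekly_only$Week, col = "skyblue",
        main = "Weekly Frequency (weeks only)", xlab = "Week", ylab = "Frequency")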
The findings are divided into two parts: cluster analysis and, in particular, association analysis.
Clustering based on product sales patterns can help identify distinct groups of products. This segmentation aids in tailoring strategies for different product clusters, enhancing marketing precision.
PAM Clustering:
The PAM clustering algorithm identifies two medoids representing normalized sales patterns. Interpretation should consider the characteristics of products in each cluster, aiding in targeted decision-making.
Hierarchical clustering:
The hierarchical clustering, a method that creates a tree-like structure of nested clusters, allows for a visual representation of how products group together based on similarity. This approach provides insights into the hierarchical relationships among products, aiding in understanding broader patterns in the sales data. The combination of hierarchical and PAM clustering techniques offers a comprehensive perspective on product groupings, enabling a more nuanced understanding of the underlying structures and associations within the dataset.
The association rules generated from the dataset reveal significant relationships between different products. The rules provide insight into which products are frequently purchased together in a given week, supporting strategic marketing and inventory management decisions.
Visualization Analysis:
The scatter plot showcases the support, confidence, lift, coverage, and count metrics for each rule, providing a comprehensive view of rule characteristics. High support and confidence values are desirable, indicating common and reliable associations between products.
Rule Metrics:
High support implies that the rules are applicable to a substantial portion of transactions. Confidence highlights the reliability of rule predictions. Lift values greater than 1 indicate positive correlations, suggesting meaningful relationships.
Item Frequency Analysis:
Analyzing the weekly sales frequencies provides a snapshot of when products are most popular. This information can guide marketing strategies, promotions, and stock management based on weekly trends.
In conclusion, the examination of the sales dataset yields insightful information about the dynamics of product transactions over a 52-week period. The association rules highlight significant connections between products and provide insight into which items are usually bought together in a particular week. These insights are essential for refining marketing tactics, streamlining inventory control, and ultimately improving customer satisfaction.
Weekly trends in product popularity are highlighted by the visualization analyses, which include matrix plots and item frequency charts; they provide a clear depiction of rule properties. Careful application of clustering techniques, such as PAM clustering, improves our understanding of distinct sales trends and supports more focused decisions.
Businesses can make well-informed decisions about product placement, marketing, and inventory replenishment based on the analysis's findings. By matching marketing plans to the established product associations and making use of weekly trend knowledge, companies can improve customer engagement and market responsiveness.
The analysis's findings are a useful tool for companies looking to increase their market influence as they navigate the complex world of customer behavior and product interactions. A data-driven strategy for understanding consumer preferences and transaction patterns is critical for long-term success in the dynamic retail environment.
Y. Liu, "Study on Application of Apriori Algorithm in Data Mining," 2010 Second International Conference on Computer Modeling and Simulation, Sanya, China, 2010, pp. 111-114, doi: 10.1109/ICCMS.2010.398.↩︎
Tan, James. (2017). Sales_Transactions_Dataset_Weekly. UCI Machine Learning Repository. https://doi.org/10.24432/C5XS4Q.↩︎
Joseph L. Fleiss & Joseph Zubin (1969) On the Methods and Theory of Clustering, Multivariate Behavioral Research, 4:2, 235-250, DOI: 10.1207/s15327906mbr0402_8↩︎
Guess, Michael J.; Wilson, Scott B.. Introduction to Hierarchical Clustering. Journal of Clinical Neurophysiology 19(2):p 144-151, March 2002.↩︎
Erich Schubert, Peter J. Rousseeuw, Fast and eager k-medoids clustering: O(k) runtime improvement of the PAM, CLARA, and CLARANS algorithms, Information Systems, Volume 101, 2021, 101804, ISSN 0306-4379, https://doi.org/10.1016/j.is.2021.101804.↩︎
Ramakrishnan Srikant, Rakesh Agrawal, Mining Generalized Association Rules, IBM Research Division, Almaden Research Center, 650 Harry Road, San Jose, CA 95120-6099.↩︎
Lin, WY., Tseng, MC., Su, JH. (2002). A Confidence-Lift Support Specification for Interesting Associations Mining. In: Chen, MS., Yu, P.S., Liu, B. (eds) Advances in Knowledge Discovery and Data Mining. PAKDD 2002. Lecture Notes in Computer Science(), vol 2336. Springer, Berlin, Heidelberg. https://doi.org/10.1007/3-540-47887-6_14↩︎