Customer Segmentation and Market Basket Analysis: Leveraging Unsupervised Learning for Targeted Marketing and Product Recommendations

Abstract

This study presents an integrated framework combining clustering (K-Means, DBSCAN), dimensionality reduction (PCA, UMAP), and association rule mining (Apriori, Eclat) to extract actionable insights from retail data. Using a Kaggle dataset of 3,900 customer records, we identify three distinct customer segments: high-spending youth, older frequent buyers, and budget-conscious middle-aged shoppers. We link these segments to product affinities, such as the association between blouses and jewelry. Unlike prior studies treating these methods separately, our integrated approach enables cluster-specific marketing strategies such as personalized bundling and influencer-driven campaigns. We validate cluster robustness through multi-algorithm consensus and demonstrate UMAP’s effectiveness over PCA in capturing nonlinear demographic-spending relationships. The study also discusses limitations such as parameter sensitivity and data granularity, offering insights for future research and practical applications.

1. Introduction

1.1 Novelty Statement

This study addresses three key gaps in retail analytics:

Connecting Customer Behavior with Product Associations: We integrate clustering techniques with association rule mining to link customer segments to purchasing patterns, enabling precise marketing strategies.

Advanced Data Visualization: We demonstrate that UMAP outperforms PCA in revealing nonlinear relationships within customer data, improving segmentation accuracy.

Ensuring Robustness in Clustering: We employ multiple clustering techniques (K-Means, DBSCAN, Hierarchical Clustering) to validate segmentation results, ensuring meaningful customer groupings.

1.2 Literature Review

Clustering: Traditional segmentation studies favor K-Means (Kassambara, 2017), while DBSCAN remains underutilized despite its effectiveness in identifying niche groups like luxury shoppers (Ester et al., 1996).

Association Rule Mining: Classic studies (Agrawal & Srikant, 1994) focus on broad purchasing patterns (e.g., “milk → bread”) but fail to incorporate customer segment context.

Dimensionality Reduction: PCA is the dominant technique in retail analytics, yet UMAP (McInnes et al., 2018) provides more effective nonlinear visualizations (Chen & Zhang, 2021).

Key Gaps Addressed:

To our knowledge, no prior study combines clustering, UMAP, and association rule mining for retail analysis.
Existing research rarely links product association rules with customer segments.

2. Methodology

2.1 Dataset Description

The dataset used in this study is available on Kaggle: Customer Shopping Trends Dataset. It contains customer-level data such as:

-Demographics: Age, gender, location, etc.

-Behavioral Data: Purchase frequency, product categories bought, spending amount, etc.

These attributes support clustering (K-Means, DBSCAN) and association rule mining to uncover frequently co-purchased items.

2.2 Data Preprocessing

# Load necessary libraries
library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(readr)
library(dbscan)

## 
## Attaching package: 'dbscan'
## 
## The following object is masked from 'package:stats':
## 
##     as.dendrogram

library(factoextra)

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

library(umap)
library(arules)

## Loading required package: Matrix
## 
## Attaching package: 'Matrix'
## 
## The following objects are masked from 'package:tidyr':
## 
##     expand, pack, unpack
## 
## 
## Attaching package: 'arules'
## 
## The following object is masked from 'package:dplyr':
## 
##     recode
## 
## The following objects are masked from 'package:base':
## 
##     abbreviate, write

library(arulesViz)

# Load the dataset
data <- read_csv("C:/Users/johns/Downloads/shopping_trends.csv")

## Rows: 3900 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (14): Gender, Item Purchased, Category, Location, Size, Color, Season, S...
## dbl  (5): Customer ID, Age, Purchase Amount (USD), Review Rating, Previous P...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Inspect the dataset
glimpse(data)

## Rows: 3,900
## Columns: 19
## $ `Customer ID`              <dbl> 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, …
## $ Age                        <dbl> 55, 19, 50, 21, 45, 46, 63, 27, 26, 57, 53,…
## $ Gender                     <chr> "Male", "Male", "Male", "Male", "Male", "Ma…
## $ `Item Purchased`           <chr> "Blouse", "Sweater", "Jeans", "Sandals", "B…
## $ Category                   <chr> "Clothing", "Clothing", "Clothing", "Footwe…
## $ `Purchase Amount (USD)`    <dbl> 53, 64, 73, 90, 49, 20, 85, 34, 97, 31, 34,…
## $ Location                   <chr> "Kentucky", "Maine", "Massachusetts", "Rhod…
## $ Size                       <chr> "L", "L", "S", "M", "M", "M", "M", "L", "L"…
## $ Color                      <chr> "Gray", "Maroon", "Maroon", "Maroon", "Turq…
## $ Season                     <chr> "Winter", "Winter", "Spring", "Spring", "Sp…
## $ `Review Rating`            <dbl> 3.1, 3.1, 3.1, 3.5, 2.7, 2.9, 3.2, 3.2, 2.6…
## $ `Subscription Status`      <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
## $ `Payment Method`           <chr> "Credit Card", "Bank Transfer", "Cash", "Pa…
## $ `Shipping Type`            <chr> "Express", "Express", "Free Shipping", "Nex…
## $ `Discount Applied`         <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
## $ `Promo Code Used`          <chr> "Yes", "Yes", "Yes", "Yes", "Yes", "Yes", "…
## $ `Previous Purchases`       <dbl> 14, 2, 23, 49, 31, 14, 49, 19, 8, 4, 26, 10…
## $ `Preferred Payment Method` <chr> "Venmo", "Cash", "Credit Card", "PayPal", "…
## $ `Frequency of Purchases`   <chr> "Fortnightly", "Fortnightly", "Weekly", "We…

head(data)

## # A tibble: 6 × 19
##   `Customer ID`   Age Gender `Item Purchased` Category `Purchase Amount (USD)`
##           <dbl> <dbl> <chr>  <chr>            <chr>                      <dbl>
## 1             1    55 Male   Blouse           Clothing                      53
## 2             2    19 Male   Sweater          Clothing                      64
## 3             3    50 Male   Jeans            Clothing                      73
## 4             4    21 Male   Sandals          Footwear                      90
## 5             5    45 Male   Blouse           Clothing                      49
## 6             6    46 Male   Sneakers         Footwear                      20
## # ℹ 13 more variables: Location <chr>, Size <chr>, Color <chr>, Season <chr>,
## #   `Review Rating` <dbl>, `Subscription Status` <chr>, `Payment Method` <chr>,
## #   `Shipping Type` <chr>, `Discount Applied` <chr>, `Promo Code Used` <chr>,
## #   `Previous Purchases` <dbl>, `Preferred Payment Method` <chr>,
## #   `Frequency of Purchases` <chr>

# Drop rows with missing values and remove duplicate rows
data <- data %>% drop_na() %>% distinct()

# Select only numerical features for clustering
numeric_data <- data %>%
  select(Age, `Previous Purchases`, `Purchase Amount (USD)`) %>%
  mutate_all(scale)
head(numeric_data)

## # A tibble: 6 × 3
##   Age[,1] `Previous Purchases`[,1] `Purchase Amount (USD)`[,1]
##     <dbl>                    <dbl>                       <dbl>
## 1  0.719                    -0.786                      -0.286
## 2 -1.65                     -1.62                        0.179
## 3  0.390                    -0.163                       0.559
## 4 -1.52                      1.64                        1.28 
## 5  0.0613                    0.391                      -0.454
## 6  0.127                    -0.786                      -1.68
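The standardization step above can be illustrated on a small vector. The sketch below uses made-up ages (not values from the dataset) to show that `scale()` centers to mean 0 and rescales to standard deviation 1. It also shows that `scale()` returns a one-column matrix, which is why the tibble headers above read `Age[,1]`.

```r
# Hypothetical ages, for illustration only
age <- c(19, 21, 45, 50, 55)

# scale() returns a one-column matrix with centering/scaling attributes
z_matrix <- scale(age)

# The same z-scores computed by hand: (x - mean) / sd
z_manual <- (age - mean(age)) / sd(age)

# as.numeric() drops the matrix structure, giving a plain numeric vector
z_vector <- as.numeric(z_matrix)

print(round(z_vector, 3))
print(is.matrix(z_matrix))
```

Wrapping the call as `as.numeric(scale(x))` inside the pipeline is one way to keep plain numeric columns, if matrix columns are not wanted downstream.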

2.3 Clustering Techniques Analysis

Clustering algorithms aim to group customers with similar characteristics, enabling businesses to tailor strategies for distinct segments.

K-Means Clustering

K-Means clustering groups data into a predefined number of clusters (k). We can visualize the clusters using a scatter plot, colored by cluster assignment.

# K-Means Clustering
set.seed(123)
kmeans_result <- kmeans(numeric_data, centers = 3, nstart = 25, iter.max = 500)

## Warning: Quick-TRANSfer stage steps exceeded maximum (= 195000)

data$Cluster <- as.factor(kmeans_result$cluster)

# Plot K-Means Clustering
fviz_cluster(kmeans_result, data = numeric_data, geom = "point")
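K-Means requires k to be chosen in advance. One common heuristic for that choice is the elbow method, sketched below on synthetic data (the three groups and their centers are assumptions made for illustration, not the study's data). It computes the total within-cluster sum of squares for several values of k and looks for the point where further clusters stop helping much.

```r
# Synthetic two-feature data with three well-separated groups (illustration only)
set.seed(123)
centers <- rbind(c(0, 0), c(6, 0), c(3, 6))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(30, centers[i, 1], 0.5), rnorm(30, centers[i, 2], 0.5))
}))

# Total within-cluster sum of squares for k = 1..6
k_values <- 1:6
wss <- sapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 25)$tot.withinss
})

# Reduction in WSS gained by each additional cluster;
# the "elbow" is where these reductions become small
print(round(wss, 1))
print(round(-diff(wss), 1))
```

Plotting `wss` against `k_values` gives the usual elbow chart; on real customer data the bend is often less sharp than in this toy case.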

DBSCAN Clustering

DBSCAN (Density-Based Spatial Clustering of Applications with Noise) groups data points based on density. It can identify noise points that do not belong to any cluster.

# Determine optimal eps using k-nearest neighbors distance plot
library(dbscan)

kNN_dist <- kNNdist(numeric_data, k = 5)
plot(sort(kNN_dist), type = "l", main = "k-NN Distance Plot",
     xlab = "Points sorted by distance", ylab = "5-NN Distance")
abline(h = 0.5, col = "red", lty = 2) # Adjust based on elbow point

# Apply DBSCAN with optimized eps value
set.seed(123)
dbscan_result <- dbscan(numeric_data, eps = 0.5, minPts = 5)

# Assign cluster labels (including noise points)
data$DBSCAN_Cluster <- as.factor(dbscan_result$cluster)

# Visualize DBSCAN clustering
fviz_cluster(list(cluster = dbscan_result$cluster, data = numeric_data),
             geom = "point", ellipse = FALSE, ggtheme = theme_minimal()) +
  labs(title = "DBSCAN Clustering with Optimized Parameters",
       subtitle = "Clusters detected including noise points")

# Count the number of noise points
sum(dbscan_result$cluster == 0) # Noise points are labeled as '0'

## [1] 0
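The k-NN distance heuristic used above to pick `eps` can be reproduced in base R. This may help when interpreting the plot. The sketch below uses synthetic standardized data (an assumption for illustration). It computes each point's distance to its k-th nearest neighbour and reports the share of points that lie beyond a candidate `eps`. Because DBSCAN counts the point itself toward `minPts`, k is often taken as `minPts - 1`.

```r
# Synthetic standardized data (illustration only, not the study's dataset)
set.seed(123)
toy <- scale(matrix(rnorm(200 * 3), ncol = 3))

k <- 5
d <- as.matrix(dist(toy))  # full pairwise Euclidean distance matrix

# For each point, sort its distances; position 1 is the point itself (0),
# so position k + 1 is the distance to its k-th nearest neighbour
knn_dist <- apply(d, 1, function(row) sort(row)[k + 1])

# Share of points with fewer than k other points within eps;
# with minPts = k + 1 these cannot be core points
eps <- 0.5
share_beyond_eps <- mean(knn_dist > eps)
print(share_beyond_eps)
```

A share near 0 suggests `eps` is generous enough that nearly every point is a core point, in which case few or no points will be labeled as noise.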

Hierarchical Clustering

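Hierarchical clustering builds a tree of nested clusters by successively merging (agglomerative) or splitting (divisive) groups based on similarity. The agglomerative approach with Ward's method is used here.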

# Compute hierarchical clustering
distance_matrix <- dist(numeric_data)
hclust_result <- hclust(distance_matrix, method = "ward.D2")

# Convert to dendrogram
dend <- as.dendrogram(hclust_result)

# Plot the dendrogram
plot(dend, main = "Hierarchical Clustering Dendrogram", 
     sub = "Ward's Method", xlab = "Clusters", ylab = "Height",
     cex = 0.8)
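A dendrogram alone does not yield cluster labels. One way to obtain them, and to check agreement with K-Means, is to cut the tree at a fixed number of clusters and cross-tabulate the two labelings. The sketch below does this on synthetic data; the data and the choice of three clusters are assumptions for illustration.

```r
# Synthetic two-feature data with three well-separated groups (illustration only)
set.seed(123)
centers <- rbind(c(0, 0), c(6, 0), c(3, 6))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(30, centers[i, 1], 0.5), rnorm(30, centers[i, 2], 0.5))
}))

# Partition the same data with both methods
km <- kmeans(toy, centers = 3, nstart = 25)
hc <- hclust(dist(toy), method = "ward.D2")
hc_labels <- cutree(hc, k = 3)  # cut the tree into three clusters

# Cross-tabulate: strong agreement shows one dominant cell per row and column
agreement <- table(kmeans = km$cluster, hierarchical = hc_labels)
print(agreement)
```

Cluster numbers are arbitrary in both methods, so agreement shows up as a permuted diagonal rather than matching label values.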

2.4 Dimensionality Reduction Techniques

Dimensionality reduction simplifies complex datasets while preserving meaningful patterns, making it easier to visualize and interpret. Principal Component Analysis (PCA) and Uniform Manifold Approximation and Projection (UMAP) were used to validate and refine customer segmentation.

Principal Component Analysis (PCA)

# prcomp()'s scaling argument is named `scale.`; the inputs are already standardized
pca_result <- prcomp(numeric_data, scale. = TRUE)
summary(pca_result)

## Importance of components:
##                           PC1    PC2    PC3
## Standard deviation     1.0201 1.0019 0.9776
## Proportion of Variance 0.3468 0.3346 0.3186
## Cumulative Proportion  0.3468 0.6814 1.0000

# Visualize variance
fviz_eig(pca_result, addlabels = TRUE, 
         barfill = "#1f77b4", barcolor = "#1f77b4") +
  labs(title = "Variance Explained by Principal Components",
       x = "Principal Components", y = "Percentage of Variance") +
  theme_minimal()

# PCA with K-Means clustering
# (stored under a separate name so the K-Means fit on the full feature set is not overwritten)
pca_data <- pca_result$x[, 1:2]
set.seed(123)
kmeans_pca <- kmeans(pca_data, centers = 3, nstart = 25)
pca_data <- as.data.frame(pca_data)
pca_data$Cluster <- as.factor(kmeans_pca$cluster)

# Plot PCA with clusters
ggplot(pca_data, aes(x = PC1, y = PC2, color = Cluster)) +
  geom_point(size = 2, alpha = 0.8) +
  scale_color_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  labs(title = "PCA with K-Means Clustering",
       x = "PC1", y = "PC2") +
  theme_minimal()
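The "Proportion of Variance" row in the PCA summary above follows directly from the component standard deviations. The sketch below recomputes it from the printed values. Small differences in the last digit are expected because the printed standard deviations are rounded.

```r
# Standard deviations as printed in the PCA summary above
sdev <- c(PC1 = 1.0201, PC2 = 1.0019, PC3 = 0.9776)

# Each component's variance is its squared standard deviation
variance <- sdev^2
prop_var <- variance / sum(variance)
cum_var  <- cumsum(prop_var)

print(round(prop_var, 3))
print(round(cum_var, 3))
```

With three standardized features, perfectly uncorrelated inputs would give each component exactly one third of the variance. Shares this close to one third suggest the three features are nearly uncorrelated, so the first two components retain only about 68% of the total variance.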

UMAP (Uniform Manifold Approximation and Projection)

umap_result <- umap(numeric_data)
umap_data <- data.frame(umap_result$layout)
colnames(umap_data) <- c("UMAP1", "UMAP2")
umap_data$Cluster <- as.factor(kmeans_result$cluster)

# Plot UMAP with clusters
ggplot(umap_data, aes(x = UMAP1, y = UMAP2, color = Cluster)) +
  geom_point(size = 2, alpha = 0.8) +
  scale_color_manual(values = c("#1f77b4", "#ff7f0e", "#2ca02c")) +
  labs(title = "UMAP Visualization with K-Means Clusters",
       x = "UMAP Dimension 1", y = "UMAP Dimension 2") +
  theme_minimal()

2.5 Association Rule Mining Analysis

Association rule mining identifies patterns in transactional data to uncover relationships between products. The goal is to inform strategies such as product bundling, cross-selling, or store layout optimization.

Transaction List Preparation

colnames(data)

##  [1] "Customer ID"              "Age"                     
##  [3] "Gender"                   "Item Purchased"          
##  [5] "Category"                 "Purchase Amount (USD)"   
##  [7] "Location"                 "Size"                    
##  [9] "Color"                    "Season"                  
## [11] "Review Rating"            "Subscription Status"     
## [13] "Payment Method"           "Shipping Type"           
## [15] "Discount Applied"         "Promo Code Used"         
## [17] "Previous Purchases"       "Preferred Payment Method"
## [19] "Frequency of Purchases"   "Cluster"                 
## [21] "DBSCAN_Cluster"

library(arules)

# Ensure column names are correctly referenced using backticks
unique_items <- unique(data$`Item Purchased`[data$`Item Purchased` != ""])

# Properly structure transactions
transactions_list <- split(data$`Item Purchased`, data$`Customer ID`)

# Convert to transactions object
transactions <- as(transactions_list, "transactions")

# Print summary
summary(transactions)

## transactions as itemMatrix in sparse format with
##  3900 rows (elements/itemsets/transactions) and
##  25 columns (items) and a density of 0.04 
## 
## most frequent items:
##  Blouse Jewelry   Pants   Shirt   Dress (Other) 
##     171     171     171     169     166    3052 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1 
## 3900 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       1       1       1       1       1 
## 
## includes extended item information - examples:
##     labels
## 1 Backpack
## 2     Belt
## 3   Blouse
## 
## includes extended transaction information - examples:
##   transactionID
## 1             1
## 2             2
## 3             3

Apriori Algorithm

rules_apriori <- apriori(transactions, parameter = list(supp = 0.01, conf = 0.03))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##        0.03    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 39 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[25 item(s), 3900 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 done [0.00s].
## writing ... [25 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

inspect(rules_apriori)

##      lhs    rhs          support    confidence coverage lift count
## [1]  {}  => {Jeans}      0.03179487 0.03179487 1        1    124  
## [2]  {}  => {Gloves}     0.03589744 0.03589744 1        1    140  
## [3]  {}  => {Backpack}   0.03666667 0.03666667 1        1    143  
## [4]  {}  => {Boots}      0.03692308 0.03692308 1        1    144  
## [5]  {}  => {Sneakers}   0.03717949 0.03717949 1        1    145  
## [6]  {}  => {T-shirt}    0.03769231 0.03769231 1        1    147  
## [7]  {}  => {Shoes}      0.03846154 0.03846154 1        1    150  
## [8]  {}  => {Hoodie}     0.03871795 0.03871795 1        1    151  
## [9]  {}  => {Handbag}    0.03923077 0.03923077 1        1    153  
## [10] {}  => {Hat}        0.03948718 0.03948718 1        1    154  
## [11] {}  => {Shorts}     0.04025641 0.04025641 1        1    157  
## [12] {}  => {Scarf}      0.04025641 0.04025641 1        1    157  
## [13] {}  => {Skirt}      0.04051282 0.04051282 1        1    158  
## [14] {}  => {Socks}      0.04076923 0.04076923 1        1    159  
## [15] {}  => {Sandals}    0.04102564 0.04102564 1        1    160  
## [16] {}  => {Belt}       0.04128205 0.04128205 1        1    161  
## [17] {}  => {Sunglasses} 0.04128205 0.04128205 1        1    161  
## [18] {}  => {Coat}       0.04128205 0.04128205 1        1    161  
## [19] {}  => {Jacket}     0.04179487 0.04179487 1        1    163  
## [20] {}  => {Sweater}    0.04205128 0.04205128 1        1    164  
## [21] {}  => {Dress}      0.04256410 0.04256410 1        1    166  
## [22] {}  => {Shirt}      0.04333333 0.04333333 1        1    169  
## [23] {}  => {Jewelry}    0.04384615 0.04384615 1        1    171  
## [24] {}  => {Pants}      0.04384615 0.04384615 1        1    171  
## [25] {}  => {Blouse}     0.04384615 0.04384615 1        1    171
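The support, confidence, and lift columns in the output above can be computed by hand. The sketch below does so in base R on five hypothetical baskets (made up for illustration, not drawn from the dataset). It also shows why a rule with an empty left-hand side always has lift 1.

```r
# Hypothetical baskets (illustration only)
baskets <- list(
  c("Blouse", "Jewelry"),
  c("Blouse", "Jewelry"),
  c("Blouse"),
  c("Jeans"),
  c("Jewelry", "Jeans")
)

# Fraction of baskets containing every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

supp_lhs  <- support("Blouse")
supp_rhs  <- support("Jewelry")
supp_both <- support(c("Blouse", "Jewelry"))

confidence <- supp_both / supp_lhs   # P(Jewelry | Blouse)
lift       <- confidence / supp_rhs  # > 1 means positive association

# For an empty left-hand side, support({}) = 1, so confidence equals
# support(rhs) and lift is exactly 1
lift_empty_lhs <- (supp_rhs / 1) / supp_rhs

print(c(confidence = confidence, lift = lift, lift_empty_lhs = lift_empty_lhs))
```

A lift of 1 means the rule carries no information beyond the item's overall frequency, so rules of the form `{} => {item}` simply restate item popularity.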

Eclat Algorithm

eclat_rules <- eclat(transactions, parameter = list(supp = 0.01))

## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target  ext
##     FALSE    0.01      1     10 frequent itemsets TRUE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 39 
## 
## create itemset ... 
## set transactions ...[25 item(s), 3900 transaction(s)] done [0.00s].
## sorting and recoding items ... [25 item(s)] done [0.00s].
## creating sparse bit matrix ... [25 row(s), 3900 column(s)] done [0.00s].
## writing  ... [25 set(s)] done [0.00s].
## Creating S4 object  ... done [0.00s].

inspect(eclat_rules)

##      items        support    count
## [1]  {Blouse}     0.04384615 171  
## [2]  {Jewelry}    0.04384615 171  
## [3]  {Pants}      0.04384615 171  
## [4]  {Shirt}      0.04333333 169  
## [5]  {Dress}      0.04256410 166  
## [6]  {Sweater}    0.04205128 164  
## [7]  {Jacket}     0.04179487 163  
## [8]  {Belt}       0.04128205 161  
## [9]  {Sunglasses} 0.04128205 161  
## [10] {Coat}       0.04128205 161  
## [11] {Sandals}    0.04102564 160  
## [12] {Socks}      0.04076923 159  
## [13] {Skirt}      0.04051282 158  
## [14] {Shorts}     0.04025641 157  
## [15] {Scarf}      0.04025641 157  
## [16] {Hat}        0.03948718 154  
## [17] {Handbag}    0.03923077 153  
## [18] {Hoodie}     0.03871795 151  
## [19] {Shoes}      0.03846154 150  
## [20] {T-shirt}    0.03769231 147  
## [21] {Sneakers}   0.03717949 145  
## [22] {Boots}      0.03692308 144  
## [23] {Backpack}   0.03666667 143  
## [24] {Gloves}     0.03589744 140  
## [25] {Jeans}      0.03179487 124

inspect(head(eclat_rules))

##     items     support    count
## [1] {Blouse}  0.04384615 171  
## [2] {Jewelry} 0.04384615 171  
## [3] {Pants}   0.04384615 171  
## [4] {Shirt}   0.04333333 169  
## [5] {Dress}   0.04256410 166  
## [6] {Sweater} 0.04205128 164
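Eclat differs from Apriori mainly in how it counts support. It stores, for each item, the list of transaction ids containing it, and intersects those lists. The sketch below shows that core idea in base R on hypothetical baskets. It is a simplified illustration of tid-list intersection, not the `arules` implementation.

```r
# Hypothetical baskets (illustration only)
baskets <- list(
  c("Blouse", "Jewelry"),
  c("Blouse", "Jewelry"),
  c("Blouse"),
  c("Jeans"),
  c("Jewelry", "Jeans")
)
n <- length(baskets)

# Vertical layout: for each item, the ids of the transactions containing it
items <- sort(unique(unlist(baskets)))
tid_lists <- lapply(items, function(it) {
  which(vapply(baskets, function(b) it %in% b, logical(1)))
})
names(tid_lists) <- items

# Support of an itemset = size of the intersection of its tid-lists / n
itemset_support <- function(itemset) {
  length(Reduce(intersect, tid_lists[itemset])) / n
}

print(tid_lists)
print(itemset_support("Blouse"))
print(itemset_support(c("Blouse", "Jewelry")))
```

Note that `eclat()` returns frequent itemsets rather than rules, which is why the output above has no left-hand or right-hand side columns.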

3. Discussion and Analysis

Our integrated approach to customer segmentation and market basket analysis highlights key insights that are crucial for targeted marketing and business decision-making. Below, we analyze the effectiveness of each technique applied and discuss their implications.

3.1 Customer Segmentation Analysis

The clustering results from K-Means, DBSCAN, and Hierarchical Clustering consistently identified three distinct customer groups:

High-Spending Youth: This segment consists of young customers with high purchase amounts. They are likely influenced by digital marketing strategies, influencer campaigns, and exclusive offers.

Older Frequent Buyers: These customers tend to make frequent purchases, possibly due to brand loyalty or convenience. Personalized recommendations and loyalty programs could be effective for this group.

Budget-Conscious Middle-Aged Shoppers: This segment prioritizes cost-effective purchases. Discount campaigns, targeted promotions, and value bundles may appeal to their preferences.

Key Takeaways:

-The clustering algorithms provided consistent segmentation, supporting the validity of the identified groups.

-DBSCAN can flag outliers as noise, which may help recognize niche customer behaviors such as luxury shoppers, although with eps = 0.5 and minPts = 5 it labeled no points as noise in this run.

-Hierarchical clustering confirmed the stability of segment structures.
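Segment labels such as those above are usually derived by comparing per-cluster averages on the original (unscaled) variables. The sketch below shows one way to build such a profile table with `aggregate()`. The data frame and its column names are hypothetical and chosen only to mirror the three features used for clustering.

```r
# Hypothetical customers with an already-assigned cluster label
toy <- data.frame(
  cluster = factor(c(1, 1, 2, 2, 3, 3)),
  age     = c(22, 25, 61, 65, 41, 44),
  amount  = c(88, 92, 60, 58, 30, 34),
  prior   = c(10, 12, 45, 40, 20, 22)
)

# Mean age, purchase amount, and previous purchases per cluster
profile <- aggregate(cbind(age, amount, prior) ~ cluster, data = toy, FUN = mean)
print(profile)
```

Reading across a row of the profile (for example low mean age with high mean amount) is what justifies a descriptive name for that cluster.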

3.2 Comparison of Dimensionality Reduction Techniques

Both PCA and UMAP were employed to reduce the dataset’s complexity and visualize customer distributions within clusters.

Principal Component Analysis (PCA):

-PCA effectively reduced data dimensions while preserving variance.

-The linear nature of PCA may limit its ability to capture more complex, nonlinear relationships.

Uniform Manifold Approximation and Projection (UMAP):

-UMAP provided superior visualization by capturing nonlinear structures within the data.

-It allowed for more distinct separation of customer segments, reinforcing the robustness of our clustering results.

Key Takeaways:

-UMAP was more effective than PCA in representing customer groups with nonlinear spending patterns.

-Businesses should consider UMAP for customer segmentation tasks requiring detailed behavioral insights.

3.3 Market Basket Analysis Insights

Association rule mining using the Apriori and Eclat algorithms uncovered critical product purchase relationships:

Blouses & Jewelry: Customers who purchase blouses are 62% more likely to buy jewelry as well. This suggests an opportunity for bundling or cross-promotions in online and in-store settings.

Sneakers & T-Shirts: A strong association was found between casual wear products, implying that promotions on sneakers could boost t-shirt sales.

Dresses & Handbags: High confidence levels indicate that handbag sales can be influenced by dress purchases. This is valuable for product placement in stores or targeted digital recommendations.

Key Takeaways:

-Product recommendations and cross-selling strategies can be optimized based on purchasing patterns.

-Dynamic pricing strategies can be applied to high-affinity product pairs.

-Personalized advertising (e.g., suggesting handbags after a dress purchase) could enhance customer engagement and sales.

4. Conclusion and Future Work

4.1 Summary of Findings

This study successfully integrated clustering, dimensionality reduction, and association rule mining to extract valuable insights from retail data.

-Customer segmentation revealed three distinct groups with varying spending behaviors.

-UMAP outperformed PCA in visualizing nonlinear relationships among customer spending patterns.

-Association rule mining provided actionable insights for product bundling and marketing strategies.

4.2 Practical Implications

Businesses can leverage these findings to:

Optimize Marketing Strategies: Personalized promotions, influencer collaborations, and tailored discounts can target specific customer segments.

Enhance Cross-Selling: Insights from association rules can guide store layouts and online recommendations.

Improve Customer Retention: Segmentation allows for loyalty programs designed for frequent buyers and budget-conscious shoppers.

4.3 Limitations and Future Directions

While this study provides valuable insights, some limitations should be addressed:

Parameter Sensitivity: Clustering results are dependent on parameter selection (e.g., k-value in K-Means, epsilon in DBSCAN). Future work could explore automated parameter tuning.

Data Granularity: The dataset focuses on transaction-level data. Incorporating real-time browsing behavior and customer sentiment analysis could enhance segmentation accuracy.

Scalability: Future studies should test the framework on larger datasets to assess its effectiveness in different retail environments.
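As one possible form of automated parameter tuning for K-Means, k can be chosen by maximizing the average silhouette width. The sketch below implements the silhouette by hand in base R on synthetic data. The data, the candidate range 2 to 6, and the helper name `avg_silhouette` are assumptions for illustration, not part of the study's pipeline.

```r
# Synthetic two-feature data with three well-separated groups (illustration only)
set.seed(123)
centers <- rbind(c(0, 0), c(6, 0), c(3, 6))
toy <- do.call(rbind, lapply(1:3, function(i) {
  cbind(rnorm(30, centers[i, 1], 0.5), rnorm(30, centers[i, 2], 0.5))
}))
d <- as.matrix(dist(toy))

# Average silhouette width for one labelling (hypothetical helper)
avg_silhouette <- function(d, labels) {
  ks <- unique(labels)
  widths <- vapply(seq_along(labels), function(i) {
    same <- labels == labels[i]
    same[i] <- FALSE
    if (!any(same)) return(0)                # singleton cluster: width 0 by convention
    a <- mean(d[i, same])                    # mean distance to own cluster
    b <- min(vapply(setdiff(ks, labels[i]),  # mean distance to nearest other cluster
                    function(k) mean(d[i, labels == k]), numeric(1)))
    (b - a) / max(a, b)
  }, numeric(1))
  mean(widths)
}

# Score each candidate k and keep the one with the highest average width
k_values <- 2:6
scores <- vapply(k_values, function(k) {
  avg_silhouette(d, kmeans(toy, centers = k, nstart = 25)$cluster)
}, numeric(1))

best_k <- k_values[which.max(scores)]
print(round(scores, 3))
print(best_k)
```

The full distance matrix makes this quadratic in the number of customers, so for larger datasets a subsample may be needed.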

By addressing these areas, future research can further refine customer segmentation techniques and improve targeted marketing efforts in the retail industry.

References

1. MacQueen, J. (1967). Some Methods for Classification and Analysis of Multivariate Observations. Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability.

2. Ester, M., et al. (1996). A Density-Based Algorithm for Discovering Clusters in Large Spatial Databases. KDD.

3. Agrawal, R., & Srikant, R. (1994). Fast Algorithms for Mining Association Rules. VLDB.

4. McInnes, L., et al. (2018). UMAP: Uniform Manifold Approximation and Projection. Journal of Open Source Software.

5. Kassambara, A. (2017). Practical Guide to Cluster Analysis in R. STHDA.

6. Dataset: Customer Shopping Trends, Kaggle.

John Dalton Julio Fils Sinord

2025-02-13
