Imagine 10000 receipts sitting on your table. Each receipt represents a transaction with items that were purchased. The receipt is a representation of stuff that went into a customer’s basket - and therefore ‘Market Basket Analysis’.

That is exactly what the Groceries Data Set contains: a collection of receipts with each line representing 1 receipt and the items purchased. Each line is called a transaction and each column in a row represents an item. The data set is attached.

Your assignment is to use R to mine the data for association rules. You should report support, confidence and lift and your top 10 rules by lift.

Extra credit: do a simple cluster analysis on the data as well. Use whichever packages you like.

Market Basket Analysis

# Loading library
library(arules)

## Loading required package: Matrix

## 
## Attaching package: 'arules'

## The following objects are masked from 'package:base':
## 
##     abbreviate, write

library(arulesViz)
library(cluster)
library(factoextra)

## Loading required package: ggplot2

## Welcome! Want to learn more? See two factoextra-related books at https://goo.gl/ve3WBa

# Loading the data 
grocery_data =  read.transactions("GroceryDataSet.csv", sep = ",")

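library(readr)
# Note: read_csv() treats the first receipt as a header row by default,
# which is why it reports 9834 rows here while read.transactions() finds 9835 transactions.
GroceryDataSet = read_csv("GroceryDataSet.csv")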

## New names:
## Rows: 9834 Columns: 32
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (32): citrus fruit, semi-finished bread, margarine, ready soups, ...5, ....
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
## • `` -> `...5`
## • `` -> `...6`
## • `` -> `...7`
## • `` -> `...8`
## • `` -> `...9`
## • `` -> `...10`
## • `` -> `...11`
## • `` -> `...12`
## • `` -> `...13`
## • `` -> `...14`
## • `` -> `...15`
## • `` -> `...16`
## • `` -> `...17`
## • `` -> `...18`
## • `` -> `...19`
## • `` -> `...20`
## • `` -> `...21`
## • `` -> `...22`
## • `` -> `...23`
## • `` -> `...24`
## • `` -> `...25`
## • `` -> `...26`
## • `` -> `...27`
## • `` -> `...28`
## • `` -> `...29`
## • `` -> `...30`
## • `` -> `...31`
## • `` -> `...32`

View(GroceryDataSet)

In market basket analysis, we observe how often customers buy products together. For example, if a customer buys a bag of chips, how likely are they to also buy a can of soda or a bottle of water? To understand this relationship, we use a technique called association analysis, which shows how likely one item is to be purchased when another item is already in the basket.

We first check a summary of the data to see how frequently each product was purchased.

summary(grocery_data)

## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics

We can explore the individual items to understand how frequently customers purchased each of them.

# Display the 15 most frequently bought items.
itemFrequencyPlot(grocery_data, topN=15, xlab = "Products", ylab = "Frequency of Items", main = "Top 15 Most Frequently Purchased Grocery Items", col = rainbow(15))


Our plot indicates that whole milk, other vegetables, and rolls/buns were the most frequently purchased items in the store.

These items are related to one another in ways that can be described through “association rules”. Association rules are “if-then” statements that describe the relationship between products in the store. If you go to the store to buy bread and jelly, you are more likely to also buy peanut butter, since together they make a peanut butter and jelly sandwich. This association rule is written as {bread, jelly} -> {peanut butter}.

To evaluate these rules, we use three metrics: support, confidence, and lift. Support is the share of all transactions in which the items of the rule occur together, so it shows how common the combination of bread, jelly, and peanut butter is. Confidence is the probability that peanut butter is purchased given that bread and jelly are purchased, which gives the strength of the rule. Lastly, lift compares how often bread, jelly, and peanut butter are actually purchased together with how often they would be purchased together if the purchases were independent. A lift greater than 1 indicates a positive association, a lift equal to 1 indicates no association, and a lift less than 1 indicates a negative association.
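To make the three metrics concrete, here is a minimal base R sketch that computes them by hand for the rule {bread, jelly} -> {peanut butter}. The six baskets and the helper `contains_all()` are hypothetical and invented for illustration only; they are not taken from the grocery data.

```r
# Hypothetical toy baskets, invented for illustration only
baskets <- list(
  c("bread", "jelly", "peanut butter"),
  c("bread", "jelly", "peanut butter"),
  c("bread", "jelly"),
  c("bread", "milk"),
  c("peanut butter", "milk"),
  c("milk")
)

# Hypothetical helper: TRUE for each basket that contains every item in `items`
contains_all <- function(items) {
  vapply(baskets, function(b) all(items %in% b), logical(1))
}

lhs <- c("bread", "jelly")
rhs <- "peanut butter"

# Share of all baskets that contain lhs and rhs together
support <- mean(contains_all(c(lhs, rhs)))
# Share of the lhs baskets that also contain rhs
confidence <- support / mean(contains_all(lhs))
# Confidence relative to the baseline rate at which rhs is bought
lift <- confidence / mean(contains_all(rhs))

print(c(support = support, confidence = confidence, lift = lift))
```

In this toy data the lift comes out above 1, so buying bread and jelly is positively associated with buying peanut butter.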

Our support parameter (supp = 0.01) sets the minimum support to 1%, so we only consider itemsets that occur in at least 1% of all transactions. The confidence parameter (conf = 0.5) sets the minimum confidence to 50%, so for a rule {A, B} -> {C}, at least 50% of the transactions that contain the antecedent {A, B} must also contain the consequent {C}. The “minlen = 2” setting requires each rule to contain at least two items.

We can now create the rules for the grocery data set using the apriori function from the arules library.

# list() bundles the mining parameters that are passed to apriori().

grocery_rules  =  apriori(grocery_data, parameter = list(supp = 0.01, conf = 0.5, minlen = 2))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      2
##  maxlen target  ext
##      10  rules TRUE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

summary(grocery_rules)

## set of 15 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3 
## 15 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       3       3       3       3       3       3 
## 
## summary of quality measures:
##     support          confidence        coverage            lift      
##  Min.   :0.01007   Min.   :0.5000   Min.   :0.01729   Min.   :1.984  
##  1st Qu.:0.01174   1st Qu.:0.5151   1st Qu.:0.02089   1st Qu.:2.036  
##  Median :0.01230   Median :0.5245   Median :0.02430   Median :2.203  
##  Mean   :0.01316   Mean   :0.5411   Mean   :0.02454   Mean   :2.299  
##  3rd Qu.:0.01403   3rd Qu.:0.5718   3rd Qu.:0.02598   3rd Qu.:2.432  
##  Max.   :0.02227   Max.   :0.5862   Max.   :0.04342   Max.   :3.030  
##      count      
##  Min.   : 99.0  
##  1st Qu.:115.5  
##  Median :121.0  
##  Mean   :129.4  
##  3rd Qu.:138.0  
##  Max.   :219.0  
## 
## mining info:
##          data ntransactions support confidence
##  grocery_data          9835    0.01        0.5
##                                                                                 call
##  apriori(data = grocery_data, parameter = list(supp = 0.01, conf = 0.5, minlen = 2))

Our grocery data yields 15 association rules at these thresholds, and some rules have a stronger association than others. As mentioned previously, lift indicates how strong an association is, so we sort the grocery rules by lift.

# Sort by lift and look at the top 10 rules
inspect(sort(grocery_rules, by="lift")[1:10])

##      lhs                                  rhs                support   
## [1]  {citrus fruit, root vegetables}   => {other vegetables} 0.01037112
## [2]  {root vegetables, tropical fruit} => {other vegetables} 0.01230300
## [3]  {rolls/buns, root vegetables}     => {other vegetables} 0.01220132
## [4]  {root vegetables, yogurt}         => {other vegetables} 0.01291307
## [5]  {curd, yogurt}                    => {whole milk}       0.01006609
## [6]  {butter, other vegetables}        => {whole milk}       0.01148958
## [7]  {root vegetables, tropical fruit} => {whole milk}       0.01199797
## [8]  {root vegetables, yogurt}         => {whole milk}       0.01453991
## [9]  {domestic eggs, other vegetables} => {whole milk}       0.01230300
## [10] {whipped/sour cream, yogurt}      => {whole milk}       0.01087951
##      confidence coverage   lift     count
## [1]  0.5862069  0.01769192 3.029608 102  
## [2]  0.5845411  0.02104728 3.020999 121  
## [3]  0.5020921  0.02430097 2.594890 120  
## [4]  0.5000000  0.02582613 2.584078 127  
## [5]  0.5823529  0.01728521 2.279125  99  
## [6]  0.5736041  0.02003050 2.244885 113  
## [7]  0.5700483  0.02104728 2.230969 118  
## [8]  0.5629921  0.02582613 2.203354 143  
## [9]  0.5525114  0.02226741 2.162336 121  
## [10] 0.5245098  0.02074225 2.052747 107

Sorting the rules by lift shows that {citrus fruit, root vegetables} => {other vegetables} is the strongest relationship, with a lift of about 3.03. A customer who buys citrus fruit and root vegetables is about three times as likely to buy other vegetables as a customer in general. Among the ten rules shown, the weakest is {whipped/sour cream, yogurt} => {whole milk}, with a lift of about 2.05.
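As a sanity check, the quality measures of rule [1] can be recomputed from numbers already printed above. This is a sketch using the standard definitions, not a call into arules; the variable names are my own, and the inputs are copied from the summary and rule table.

```r
n_transactions <- 9835        # from summary(grocery_data)
count_rule     <- 102         # count column for rule [1]
coverage_lhs   <- 0.01769192  # coverage column for rule [1], i.e. the support of the lhs
count_rhs      <- 1903        # "other vegetables" in the most frequent items list

# Share of all transactions that contain lhs and rhs together
support <- count_rule / n_transactions
# Share of the lhs transactions that also contain the rhs
confidence <- support / coverage_lhs
# Confidence relative to the overall share of transactions with the rhs
lift <- confidence / (count_rhs / n_transactions)

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

The recomputed values agree with the support, confidence, and lift columns reported for rule [1] up to rounding.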

Next, we can visualize our model through a basket model plot.

# This scatter plot shows the support, confidence, and lift of the rules
plot(grocery_rules)

plot(grocery_rules, method = "graph")

This graph visualizes how the association rules spread across the purchased food items. Each point helps us understand which items are frequently bought together and how strong the association is. The size of each circle reflects support, so larger circles mark combinations that are bought more often. The arrows show that when one item is bought, the item the arrow points to is likely to be purchased with it. “whole milk” stands out because it has high support, and customers who purchase yogurt or curd are more likely to buy “whole milk”; the darker red around “curd” indicates a strong association between “curd” and “whole milk”.

Clustering

# install.packages("rstatix")

library(rstatix)

## 
## Attaching package: 'rstatix'

## The following object is masked from 'package:stats':
## 
##     filter

Clustering is a technique for grouping similar objects together. Here the objects are the grocery items: items in the same cluster behave similarly to each other, while items in different clusters behave more distinctly.

K-means clustering is a technique that groups grocery items with similar purchase patterns by measuring the distance between their feature vectors. Each item's feature vector records the transactions in which the item was purchased, so it represents customer behavior toward that item.

We will run the algorithm with different numbers of clusters and observe how many items fall into each cluster.
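The idea can be sketched on a tiny example before applying it to the grocery data. The four items and eight baskets below are hypothetical and invented for illustration; `kmeans()` is the base R function from the stats package.

```r
# Hypothetical item-by-basket matrix: rows are items, columns are baskets,
# and 1 means the item was bought in that basket
toy <- rbind(
  milk    = c(1, 1, 1, 1, 0, 0, 0, 0),
  yogurt  = c(1, 1, 1, 0, 0, 0, 0, 0),
  soap    = c(0, 0, 0, 0, 1, 1, 1, 1),
  shampoo = c(0, 0, 0, 0, 1, 1, 1, 0)
)

set.seed(1)
# Two clusters, with 10 random starts to avoid a poor initial choice of centers
fit <- kmeans(toy, centers = 2, nstart = 10)

# Named vector giving the cluster label assigned to each item
print(fit$cluster)
```

Items bought in the same baskets (milk with yogurt, soap with shampoo) have nearby feature vectors and end up with the same cluster label. The label numbers themselves are arbitrary.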

Before we start clustering, we must convert the grocery data into a data frame.

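# Character matrix of the raw CSV columns
groceries_matrix <- as.matrix(GroceryDataSet)

# Item-by-transaction incidence matrix: one row per item, one column per transaction
grocery_dataframe = as.data.frame(as.matrix(grocery_data@data))

# Label each row with its item name so cluster members can be identified
row.names(grocery_dataframe) =  grocery_data@itemInfo$labels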
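For readers unfamiliar with this layout, the following base R sketch builds the same kind of item-by-transaction table from a plain list of baskets. The baskets and the names `incidence` and `item_frame` are hypothetical; the sketch does not use arules and only mirrors the shape of `grocery_dataframe`.

```r
# Hypothetical baskets, invented for illustration only
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("bread"),
  c("milk", "bread", "yogurt")
)
items <- sort(unique(unlist(baskets)))

# One row per item, one column per basket; 1 means the basket contains the item
incidence <- sapply(baskets, function(b) as.integer(items %in% b))
rownames(incidence) <- items
colnames(incidence) <- paste0("t", seq_along(baskets))

item_frame <- as.data.frame(incidence)
print(item_frame)
```

Because each row is an item, running k-means on a table like this groups items, not receipts.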

# Set a seed so the random starts are reproducible
set.seed(451)
# Use 5 clusters, try 75 random starts (nstart), and allow at most 15 iterations per start (iter.max)
cluster_1 = kmeans(grocery_dataframe, centers = 5, nstart = 75,iter.max = 15 )

table(cluster_1$cluster)

## 
##   1   2   3   4   5 
##   2 143   1   1  22

fviz_cluster(cluster_1, 
             data = grocery_dataframe, 
             geom = "point",
             ellipse.type = "convex",  
             main = "K-means Clustering For Cluster 1")

# List the items assigned to cluster 2
cluster_1$cluster[cluster_1$cluster == 2]

##          abrasive cleaner          artif. sweetener            baby cosmetics 
##                         2                         2                         2 
##                 baby food                      bags             baking powder 
##                         2                         2                         2 
##          bathroom cleaner                   berries                 beverages 
##                         2                         2                         2 
##                    brandy               butter milk                  cake bar 
##                         2                         2                         2 
##                   candles                     candy               canned beer 
##                         2                         2                         2 
##               canned fish              canned fruit         canned vegetables 
##                         2                         2                         2 
##                  cat food                   cereals               chewing gum 
##                         2                         2                         2 
##                   chicken                 chocolate     chocolate marshmallow 
##                         2                         2                         2 
##                   cleaner           cling film/bags              cocoa drinks 
##                         2                         2                         2 
##                    coffee            condensed milk         cooking chocolate 
##                         2                         2                         2 
##                  cookware                     cream              cream cheese 
##                         2                         2                         2 
##               curd cheese               decalcifier               dental care 
##                         2                         2                         2 
##                   dessert                 detergent              dish cleaner 
##                         2                         2                         2 
##                    dishes                  dog food  female sanitary products 
##                         2                         2                         2 
##         finished products                      fish                     flour 
##                         2                         2                         2 
##            flower (seeds)    flower soil/fertilizer            frozen chicken 
##                         2                         2                         2 
##            frozen dessert               frozen fish             frozen fruits 
##                         2                         2                         2 
##              frozen meals    frozen potato products         frozen vegetables 
##                         2                         2                         2 
##                    grapes                hair spray                       ham 
##                         2                         2                         2 
##            hamburger meat               hard cheese                     herbs 
##                         2                         2                         2 
##                     honey    house keeping products          hygiene articles 
##                         2                         2                         2 
##                 ice cream            instant coffee     Instant food products 
##                         2                         2                         2 
##                       jam                   ketchup            kitchen towels 
##                         2                         2                         2 
##           kitchen utensil               light bulbs                   liqueur 
##                         2                         2                         2 
##                    liquor        liquor (appetizer)                liver loaf 
##                         2                         2                         2 
##  long life bakery product           make up remover            male cosmetics 
##                         2                         2                         2 
##                mayonnaise                      meat              meat spreads 
##                         2                         2                         2 
##           misc. beverages                   mustard                 nut snack 
##                         2                         2                         2 
##               nuts/prunes                       oil                    onions 
##                         2                         2                         2 
##          organic products           organic sausage packaged fruit/vegetables 
##                         2                         2                         2 
##                     pasta                  pet care                photo/film 
##                         2                         2                         2 
##        pickled vegetables                   popcorn                pot plants 
##                         2                         2                         2 
##           potato products     preservation products          processed cheese 
##                         2                         2                         2 
##                  prosecco            pudding powder               ready soups 
##                         2                         2                         2 
##            red/blush wine                      rice             roll products 
##                         2                         2                         2 
##           rubbing alcohol                       rum            salad dressing 
##                         2                         2                         2 
##                      salt               salty snack                    sauces 
##                         2                         2                         2 
##         seasonal products       semi-finished bread                 skin care 
##                         2                         2                         2 
##             sliced cheese            snack products                      soap 
##                         2                         2                         2 
##               soft cheese                  softener      sound storage medium 
##                         2                         2                         2 
##                     soups            sparkling wine             specialty bar 
##                         2                         2                         2 
##          specialty cheese       specialty chocolate             specialty fat 
##                         2                         2                         2 
##      specialty vegetables                    spices             spread cheese 
##                         2                         2                         2 
##                     sugar             sweet spreads                     syrup 
##                         2                         2                         2 
##                       tea                   tidbits            toilet cleaner 
##                         2                         2                         2 
##                    turkey                  UHT-milk                   vinegar 
##                         2                         2                         2 
##                   waffles                    whisky               white bread 
##                         2                         2                         2 
##                white wine                  zwieback 
##                         2                         2

Our first run splits the grocery items into five clusters, and the graph shows how they differ along two dimensions: Dim1 explains 8.2% of the variation between the items, and Dim2 explains 5.2%. Looking at the table, the largest cluster is cluster 2 with 143 grocery items, followed by cluster 5 with 22 items. Clusters 1, 3, and 4 hold only one or two items each. Cluster 2 is the largest because its items have similar purchase patterns. Its members include items such as abrasive cleaner, baby cosmetics, and baby food, and whole milk, the most frequently purchased item, is not among them, which suggests that cluster 2 gathers the less frequently purchased items. Cluster 5 groups 22 items that share a purchase pattern distinct from the items in cluster 2. The clusters with only one or two items are outliers whose purchase patterns differ sharply from those of every other item.

Next, we increase the number of clusters to see how the grouping changes.
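The Dim1 and Dim2 percentages come from a principal component projection, which, as far as the factoextra documentation describes it, fviz_cluster() applies when the data has more than two columns. The sketch below shows how such percentages are obtained with base R's `prcomp()`. The data is hypothetical random numbers, and standardizing the columns is an assumption of this sketch.

```r
# Hypothetical numeric data, invented for illustration only
set.seed(7)
toy <- matrix(rnorm(60), nrow = 15, ncol = 4)

# Principal component analysis on standardized columns
pca <- prcomp(toy, scale. = TRUE)

# Share of the total variance captured by each component, as a percentage
explained <- 100 * pca$sdev^2 / sum(pca$sdev^2)
print(round(explained, 1))
```

When the first two percentages are small, as with the 8.2% and 5.2% reported here, the two-dimensional plot shows only a small part of the differences between items, so overlapping points in the plot are not necessarily close in the full data.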

set.seed(452)
# Use 10 clusters, try 75 random starts (nstart), and allow at most 15 iterations per start (iter.max)
cluster_2 = kmeans(grocery_dataframe, centers = 10, nstart = 75,iter.max = 15 )
table(cluster_2$cluster)

## 
##   1   2   3   4   5   6   7   8   9  10 
##  21   1 138   2   1   2   1   1   1   1

fviz_cluster(cluster_2, 
             data = grocery_dataframe, 
             geom = "point",
             ellipse.type = "convex",  
             main = "K-means Clustering For Cluster 2")

cluster_2$cluster[cluster_2$cluster == 3]

##          abrasive cleaner          artif. sweetener            baby cosmetics 
##                         3                         3                         3 
##                 baby food                      bags             baking powder 
##                         3                         3                         3 
##          bathroom cleaner                   berries                 beverages 
##                         3                         3                         3 
##                    brandy               butter milk                  cake bar 
##                         3                         3                         3 
##                   candles                     candy               canned beer 
##                         3                         3                         3 
##               canned fish              canned fruit         canned vegetables 
##                         3                         3                         3 
##                  cat food                   cereals               chewing gum 
##                         3                         3                         3 
##     chocolate marshmallow                   cleaner           cling film/bags 
##                         3                         3                         3 
##              cocoa drinks            condensed milk         cooking chocolate 
##                         3                         3                         3 
##                  cookware                     cream              cream cheese 
##                         3                         3                         3 
##               curd cheese               decalcifier               dental care 
##                         3                         3                         3 
##                   dessert                 detergent              dish cleaner 
##                         3                         3                         3 
##                    dishes                  dog food  female sanitary products 
##                         3                         3                         3 
##         finished products                      fish                     flour 
##                         3                         3                         3 
##            flower (seeds)    flower soil/fertilizer            frozen chicken 
##                         3                         3                         3 
##            frozen dessert               frozen fish             frozen fruits 
##                         3                         3                         3 
##              frozen meals    frozen potato products                    grapes 
##                         3                         3                         3 
##                hair spray                       ham            hamburger meat 
##                         3                         3                         3 
##               hard cheese                     herbs                     honey 
##                         3                         3                         3 
##    house keeping products          hygiene articles                 ice cream 
##                         3                         3                         3 
##            instant coffee     Instant food products                       jam 
##                         3                         3                         3 
##                   ketchup            kitchen towels           kitchen utensil 
##                         3                         3                         3 
##               light bulbs                   liqueur                    liquor 
##                         3                         3                         3 
##        liquor (appetizer)                liver loaf  long life bakery product 
##                         3                         3                         3 
##           make up remover            male cosmetics                mayonnaise 
##                         3                         3                         3 
##                      meat              meat spreads           misc. beverages 
##                         3                         3                         3 
##                   mustard                 nut snack               nuts/prunes 
##                         3                         3                         3 
##                       oil                    onions          organic products 
##                         3                         3                         3 
##           organic sausage packaged fruit/vegetables                     pasta 
##                         3                         3                         3 
##                  pet care                photo/film        pickled vegetables 
##                         3                         3                         3 
##                   popcorn                pot plants           potato products 
##                         3                         3                         3 
##     preservation products          processed cheese                  prosecco 
##                         3                         3                         3 
##            pudding powder               ready soups            red/blush wine 
##                         3                         3                         3 
##                      rice             roll products           rubbing alcohol 
##                         3                         3                         3 
##                       rum            salad dressing                      salt 
##                         3                         3                         3 
##               salty snack                    sauces         seasonal products 
##                         3                         3                         3 
##       semi-finished bread                 skin care             sliced cheese 
##                         3                         3                         3 
##            snack products                      soap               soft cheese 
##                         3                         3                         3 
##                  softener      sound storage medium                     soups 
##                         3                         3                         3 
##            sparkling wine             specialty bar          specialty cheese 
##                         3                         3                         3 
##       specialty chocolate             specialty fat      specialty vegetables 
##                         3                         3                         3 
##                    spices             spread cheese                     sugar 
##                         3                         3                         3 
##             sweet spreads                     syrup                       tea 
##                         3                         3                         3 
##                   tidbits            toilet cleaner                    turkey 
##                         3                         3                         3 
##                  UHT-milk                   vinegar                   waffles 
##                         3                         3                         3 
##                    whisky                white wine                  zwieback 
##                         3                         3                         3

The grocery items are now split into 10 clusters, where cluster 3 contains 138 items and cluster 1 contains 21 items. Every cluster contains at least one item. Cluster 3 still holds most of the items, which share a similar purchase pattern. Clusters 2 and 4 through 10 hold only one or two items each, so they are most likely outliers. Increasing the number of clusters creates more groups, but some clusters are more meaningful than others. This can lead to overfitting, which makes the shopping behavior of customers harder to explain.
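One common way to judge whether extra clusters are worthwhile is to compare the total within-cluster sum of squares across several values of k, often called the elbow method. This is a swapped-in technique that the analysis above does not use; the sketch runs it on hypothetical data with three clearly separated groups, using only base R.

```r
# Hypothetical data: three well-separated groups of points, invented for illustration
set.seed(42)
toy <- rbind(
  matrix(rnorm(40, mean = 0,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 5,  sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 10, sd = 0.3), ncol = 2)
)

k_values <- 1:6
# Total within-cluster sum of squares for each candidate number of clusters
wss <- sapply(k_values, function(k) kmeans(toy, centers = k, nstart = 20)$tot.withinss)

print(data.frame(k = k_values, tot_withinss = round(wss, 2)))
```

The sum of squares drops sharply until k reaches the true number of groups and changes little afterwards. Clusters added beyond that point mostly split off single points, which is the pattern seen in the tables for the grocery items.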

set.seed(453)
# Use 20 clusters, try 75 random starts (nstart), and allow at most 15 iterations per start (iter.max)
cluster_3 = kmeans(grocery_dataframe, centers = 20, nstart = 75,iter.max = 15 )
table(cluster_3$cluster)

## 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
##   1   1   1  21   1   1   1   1   1   1   1   1   1   1   1   1   1   1   1 130

fviz_cluster(cluster_3, 
             data = grocery_dataframe, 
             geom = "point",
             ellipse.type = "convex",  
             main = "K-means Clustering For Cluster 3")

The grocery items are now split into 20 clusters, where cluster 4 contains 21 items and cluster 20 contains 130 items. Every cluster contains at least one item. Cluster 20 is dominant: it holds 130 of the 169 items, or about 77% of the data. Increasing the number of clusters has now caused overfitting, since most items remain concentrated in cluster 20 while the other 18 clusters hold a single item each.

cluster_3$cluster[cluster_3$cluster == 1]

## newspapers 
##          1

set.seed(454)
# Use 50 clusters, try 75 random starts (nstart), and allow at most 15 iterations per start (iter.max)
cluster_4 = kmeans(grocery_dataframe, centers = 50, nstart = 75,iter.max = 15 )
table(cluster_4$cluster)

## 
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17  18  19  20 
##   1   1   1   1   1   1   1   1   1   1   1   1   1   5   1   1   1   1   1   1 
##  21  22  23  24  25  26  27  28  29  30  31  32  33  34  35  36  37  38  39  40 
##   1   1   2   1   1   1   1   1   1   1   1   1   1 115   1   1   1   1   1   1 
##  41  42  43  44  45  46  47  48  49  50 
##   1   1   1   1   1   1   1   1   1   1

fviz_cluster(cluster_4, 
             data = grocery_dataframe, 
             geom = "point",
             ellipse.type = "convex",  
             main = "K-means Clustering For Cluster 4") + 
            theme(legend.position = "none")

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): unimplemented
## pch value '30'

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): unimplemented
## pch value '29'

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): unimplemented
## pch value '31'

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): unimplemented
## pch value '27'

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): unimplemented
## pch value '28'

## Warning in grid.Call.graphics(C_points, x$x, x$y, x$pch, x$size): unimplemented
## pch value '26'

cluster_4$cluster[cluster_4$cluster == 34]

##          abrasive cleaner          artif. sweetener            baby cosmetics 
##                        34                        34                        34 
##                 baby food                      bags             baking powder 
##                        34                        34                        34 
##          bathroom cleaner                    brandy                  cake bar 
##                        34                        34                        34 
##                   candles               canned fish              canned fruit 
##                        34                        34                        34 
##         canned vegetables                  cat food                   cereals 
##                        34                        34                        34 
##               chewing gum     chocolate marshmallow                   cleaner 
##                        34                        34                        34 
##           cling film/bags              cocoa drinks            condensed milk 
##                        34                        34                        34 
##         cooking chocolate                  cookware                     cream 
##                        34                        34                        34 
##               curd cheese               decalcifier               dental care 
##                        34                        34                        34 
##                 detergent              dish cleaner                    dishes 
##                        34                        34                        34 
##                  dog food  female sanitary products         finished products 
##                        34                        34                        34 
##                      fish                     flour            flower (seeds) 
##                        34                        34                        34 
##    flower soil/fertilizer            frozen chicken            frozen dessert 
##                        34                        34                        34 
##               frozen fish             frozen fruits    frozen potato products 
##                        34                        34                        34 
##                    grapes                hair spray                     herbs 
##                        34                        34                        34 
##                     honey    house keeping products                 ice cream 
##                        34                        34                        34 
##            instant coffee     Instant food products                       jam 
##                        34                        34                        34 
##                   ketchup            kitchen towels           kitchen utensil 
##                        34                        34                        34 
##               light bulbs                   liqueur                    liquor 
##                        34                        34                        34 
##        liquor (appetizer)                liver loaf           make up remover 
##                        34                        34                        34 
##            male cosmetics                mayonnaise                      meat 
##                        34                        34                        34 
##              meat spreads                   mustard                 nut snack 
##                        34                        34                        34 
##               nuts/prunes          organic products           organic sausage 
##                        34                        34                        34 
## packaged fruit/vegetables                     pasta                  pet care 
##                        34                        34                        34 
##                photo/film        pickled vegetables                   popcorn 
##                        34                        34                        34 
##                pot plants           potato products     preservation products 
##                        34                        34                        34 
##          processed cheese                  prosecco            pudding powder 
##                        34                        34                        34 
##               ready soups            red/blush wine                      rice 
##                        34                        34                        34 
##             roll products           rubbing alcohol                       rum 
##                        34                        34                        34 
##            salad dressing                      salt                    sauces 
##                        34                        34                        34 
##         seasonal products       semi-finished bread                 skin care 
##                        34                        34                        34 
##            snack products                      soap               soft cheese 
##                        34                        34                        34 
##                  softener      sound storage medium                     soups 
##                        34                        34                        34 
##            sparkling wine          specialty cheese             specialty fat 
##                        34                        34                        34 
##      specialty vegetables                    spices             spread cheese 
##                        34                        34                        34 
##             sweet spreads                     syrup                       tea 
##                        34                        34                        34 
##                   tidbits            toilet cleaner                    turkey 
##                        34                        34                        34 
##                   vinegar                    whisky                white wine 
##                        34                        34                        34 
##                  zwieback 
##                        34

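The items in cluster 34 were grouped together because they show similar purchase patterns, which suggests that a customer is likely to buy them together in a transaction. Many are everyday food items that people purchase for breakfast or lunch, such as cereals, jam, and honey, although the cluster also includes household goods such as detergent and light bulbs.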
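
Printing a named vector of cluster labels, as above, repeats the label under every item. A more compact way to read cluster membership is to split the item names by label. The sketch below uses a small made-up vector (`assignments`, with hypothetical labels) that has the same shape as `cluster_4$cluster`; it is not the real result.

```r
# Hypothetical named assignment vector, shaped like cluster_4$cluster
# (item names as names, cluster labels as values; labels are made up)
assignments <- c("abrasive cleaner" = 34, "newspapers" = 1,
                 "zwieback" = 34, "whole milk" = 7)

# Group the item names by their cluster label
members <- split(names(assignments), assignments)

# Keep only clusters that contain more than one item
members[lengths(members) > 1]
```

Applied to the real fit, `split(names(cluster_4$cluster), cluster_4$cluster)` would return one character vector of item names per cluster.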

The grocery items are now split into 50 clusters. Cluster 34 contains 115 items, cluster 14 contains 5, cluster 23 contains 2, and each of the remaining 47 clusters holds a single item. With so many single-item clusters we have overfit the data, and the resulting groups are not useful for understanding customer behavior.
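
One common way to avoid choosing too many clusters is to compare the total within-cluster sum of squares across several values of k and stop where the improvement levels off (the "elbow"), while also counting how many single-item clusters appear. The sketch below is a minimal illustration on made-up data: `toy` is a hypothetical stand-in for `grocery_dataframe`, and `k_values`, `wss`, and `singletons` are names introduced here.

```r
set.seed(454)

# Toy stand-in for grocery_dataframe: 60 rows in 3 well-separated groups
toy <- rbind(
  matrix(rnorm(40, mean = 0, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 3, sd = 0.3), ncol = 2),
  matrix(rnorm(40, mean = 6, sd = 0.3), ncol = 2)
)

k_values <- 1:8

# Fit k-means once for each candidate k
fits <- lapply(k_values, function(k) {
  kmeans(toy, centers = k, nstart = 25, iter.max = 15)
})

# Total within-cluster sum of squares: drops sharply until k matches the data
wss <- sapply(fits, function(fit) fit$tot.withinss)

# Number of clusters that contain exactly one row
singletons <- sapply(fits, function(fit) sum(table(fit$cluster) == 1))

data.frame(k = k_values, wss = round(wss, 2), singletons = singletons)
```

On the toy data the sum of squares falls steeply up to k = 3 and changes little afterwards. The same loop could be run on `grocery_dataframe`, and `fviz_nbclust(grocery_dataframe, kmeans, method = "wss")` from factoextra draws a comparable elbow plot.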

Assignment 10

Nana Frimpong

2025-05-04
