##1 Introduction
Association rule learning is a rule-based machine learning method for discovering interesting relationships between variables in large databases. It uses measures of interestingness to identify strong rules in the data, and new rules can be generated as more data is analyzed. Given a sufficiently large data set, the ultimate goal is to let the machine mimic the human brain's ability to extract features and to form abstract associations for new, unclassified data.
##2 Database
The data set is suitable for market basket analysis, a data-mining task involving many variables (products), and the analysis below is performed on it. The data was taken from the Kaggle platform and contains 9835 grocery-store transactions covering 169 different products. Website link: https://www.kaggle.com/irfanasrullah/groceries?select=groceries.csv
First, load the required packages, import the data and convert it into a transactions object.
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Loading required package: grid
trans<-read.transactions("groceries.csv", format="basket", sep=",", skip=0)
trans
## transactions in sparse format with
## 9835 transactions (rows) and
## 169 items (columns)
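Conceptually, `read.transactions()` with `format = "basket"` treats each line of the file as one transaction and each comma-separated field as an item. arules then stores the result as a sparse binary item matrix. The base-R sketch below builds a dense version of such a matrix from a few hypothetical baskets; the variable names are illustrative and not part of arules. The matrix density (the share of non-zero cells) is the quantity that `summary()` reports. For the real data it equals the mean transaction length divided by the number of items (4.409 / 169 ≈ 0.0261).

```r
# Hypothetical toy basket data: one transaction per line, items separated by commas
lines <- c("citrus fruit,margarine,ready soups",
           "coffee,tropical fruit,yogurt",
           "whole milk",
           "yogurt,whole milk")

# Split each line into items; a transaction is a set, so drop repeated items
baskets <- lapply(strsplit(lines, ",", fixed = TRUE), unique)

# Item labels sorted alphabetically, one column per item
items <- sort(unique(unlist(baskets)))

# Logical incidence matrix: rows are transactions, columns are items
incidence <- t(vapply(baskets, function(b) items %in% b, logical(length(items))))
colnames(incidence) <- items

dim(incidence)     # transactions x items
mean(incidence)    # density: share of TRUE cells
```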
summary(trans)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 169 columns (items) and a density of 0.02609146
##
## most frequent items:
## whole milk other vegetables rolls/buns soda
## 2513 1903 1809 1715
## yogurt (Other)
## 1372 34055
##
## element (itemset/transaction) length distribution:
## sizes
## 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
## 2159 1643 1299 1005 855 645 545 438 350 246 182 117 78 77 55 46
## 17 18 19 20 21 22 23 24 26 27 28 29 32
## 29 14 14 9 11 4 6 1 1 1 1 3 1
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 2.000 3.000 4.409 6.000 32.000
##
## includes extended item information - examples:
## labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3 baby cosmetics
View the first six transactions as lists of items.
LIST(head(trans))
## [[1]]
## [1] "citrus fruit" "margarine" "ready soups"
## [4] "semi-finished bread"
##
## [[2]]
## [1] "coffee" "tropical fruit" "yogurt"
##
## [[3]]
## [1] "whole milk"
##
## [[4]]
## [1] "cream cheese" "meat spreads" "pip fruit" "yogurt"
##
## [[5]]
## [1] "condensed milk" "long life bakery product"
## [3] "other vegetables" "whole milk"
##
## [[6]]
## [1] "abrasive cleaner" "butter" "rice" "whole milk"
## [5] "yogurt"
unique(trans)
## transactions in sparse format with
## 7011 transactions (rows) and
## 169 items (columns)
head(colnames(trans), n = 40)
## [1] "abrasive cleaner" "artif. sweetener" "baby cosmetics"
## [4] "baby food" "bags" "baking powder"
## [7] "bathroom cleaner" "beef" "berries"
## [10] "beverages" "bottled beer" "bottled water"
## [13] "brandy" "brown bread" "butter"
## [16] "butter milk" "cake bar" "candles"
## [19] "candy" "canned beer" "canned fish"
## [22] "canned fruit" "canned vegetables" "cat food"
## [25] "cereals" "chewing gum" "chicken"
## [28] "chocolate" "chocolate marshmallow" "citrus fruit"
## [31] "cleaner" "cling film/bags" "cocoa drinks"
## [34] "coffee" "condensed milk" "cooking chocolate"
## [37] "cookware" "cream" "cream cheese"
## [40] "curd"
View the absolute frequencies of the most frequent items.
head(sort(itemFrequency(trans, type="absolute"), decreasing=TRUE), n=48)
## whole milk other vegetables rolls/buns
## 2513 1903 1809
## soda yogurt bottled water
## 1715 1372 1087
## root vegetables tropical fruit shopping bags
## 1072 1032 969
## sausage pastry citrus fruit
## 924 875 814
## bottled beer newspapers canned beer
## 792 785 764
## pip fruit fruit/vegetable juice whipped/sour cream
## 744 711 705
## brown bread domestic eggs frankfurter
## 638 624 580
## margarine coffee pork
## 576 571 567
## butter curd beef
## 545 524 516
## napkins chocolate frozen vegetables
## 515 488 473
## chicken white bread cream cheese
## 422 414 390
## waffles salty snack long life bakery product
## 378 372 368
## dessert sugar UHT-milk
## 365 333 329
## berries hamburger meat hygiene articles
## 327 327 324
## onions specialty chocolate candy
## 305 299 294
## frozen meals misc. beverages oil
## 279 279 276
##3 Association Rules
Association rules describe dependencies between items: a rule X => Y states that transactions containing itemset X tend also to contain itemset Y. Association rule mining is an important data-exploration technique for uncovering correlations between items in large amounts of data.
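For reference, these are the standard definitions of the measures that arules reports, for a set of transactions T and itemsets X and Y. Here X ∪ Y denotes the itemset containing the items of both sides of the rule, and coverage is simply the support of the left-hand side:

```latex
\mathrm{supp}(X) = \frac{|\{t \in T : X \subseteq t\}|}{|T|}, \qquad
\mathrm{coverage}(X \Rightarrow Y) = \mathrm{supp}(X), \\
\mathrm{conf}(X \Rightarrow Y) = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)}, \qquad
\mathrm{lift}(X \Rightarrow Y) = \frac{\mathrm{conf}(X \Rightarrow Y)}{\mathrm{supp}(Y)}
  = \frac{\mathrm{supp}(X \cup Y)}{\mathrm{supp}(X)\,\mathrm{supp}(Y)}
```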
##3.1 Grouping products
The database contains 169 different items, so there are many possible ways to arrange products on supermarket shelves. To make the rules easier to interpret, the products are grouped into 15 categories: baby products, vegetables, agricultural products, fruits, poultry, dairy products, daily necessities, seasonings, alcoholic, beverages, products, ready food, seafood, bathroom and kitchen products, and other.
names.real<-c( "baby food", "sound storage medium", "preservation products", "bags", "kitchen utensil", "baby cosmetics", "frozen chicken", "toilet cleaner", "make up remover", "salad dressin", "whisky", "liqueur", "butter", "margarine", "cocoa drinks","rubbing alcohol", "hair spray", "frozen fruits", "cream", "decalcifier","honey", "organic products", "specialty vegetables", "ready soups", "flower soil/fertilizer","prosecco", "cooking chocolate", "organic sausage", "pudding powder", "tidbits","pork", "soap", "bathroom cleaner", "cookware", "potato products","fish", "snack products", "nut snack", "artif. sweetener", "canned fruit", "syrup", "nuts/prunes", "abrasive cleaner", "skin care", "specialty fat", "tea", "brandy", "light bulbs", "ketchup", "meat spreads", "rum", "male cosmetics", "cleaner", "curd cheese", "liver loaf", "spices", "jam", "sauces", "softener", "sparkling wine", "cereals", "denta care", "kitchen towels", "female sanitary products", "finished products", "vinegar", "soups", "zwieback", "popcorn", "instant coffee", "rice", "liquor (appetizer)", "Instant food products", "turkey", "house keeping products", "frozen potato products", "dog food", "specialty cheese", "candles", "chocolate marshmallow", "sweet spreads", "mayonnaise", "photo/film", "pet care", "condensed milk", "roll products", "flower (seeds)", "dish cleaner", "canned vegetables", "frozen products", "salt", "liquor", "spread cheese", "cling film/bags", "frozen fish", "mustard", "packaged fruit/vegetables", "cake bar", "seasonal products", "canned fish", "pasta", "herbs", "processed cheese", "soft cheese", "potted plants", "flour", "dishes", "baking powder", "semi-finished bread", "pickled vegetables", "white wine", "detergent", "red/blush wine", "chewing gum", "grapes", "cat food", "hard cheese", "sliced cheese", "ice cream", "meat", "coffee", "ham", "specialty bar", "butter milk", "oil", "frozen meals",
"misc. beverages", "candy", "specialty chocolate", "onions", "hygiene articles", "berries", "hamburger meat", "UHT-milk", "sugar", "products", "long life bakery product", "salty snack", "waffles", "cream cheese", "white bread", "chicken", "frozen vegetables", "chocolate", "napkins", "beef", "curd", "frankfurter", "dental care", "domestic eggs", "brown bread", "whipped/sour cream", "fruit/vegetable juice", "pip fruit", "canned beer", "newspapers", "bottled beer", "citrus fruit", "pastry", "sausage", "shopping bags", "tropical fruit", "root vegetables", "bottled water", "yogurt", "soda", "rolls/buns", "other vegetables", "whole milk")
length(names.real)
## [1] 169
length(unique(names.real))
## [1] 169
# baby products
# vegetables
# agricultural products
# fruits
# poultry
# dairy products
# daily necessities
# seasonings
# alcoholic
# beverages
# products
# ready food
# seafood
# bathroom and kitchen products
# other
names.level1<-c("baby products", "other", "daily necessities", "daily necessities", "daily necessities", "baby products", "poultry", "daily necessities", "daily necessities", "seasonings", "alcoholic", "alcoholic", "dairy products", "dairy products", "beverages", "alcoholic", "other", "fruits", "dairy products", "other", "seasonings", "agricultural products", "vegetables", "ready food", "agricultural products", "alcoholic", "products", "poultry", "products", "ready food", "poultry", "bathroom and kitchen products", "bathroom and kitchen products", "bathroom and kitchen products", "agricultural products", "seafood", "products", "products", "products", "ready food", "products", "products", "bathroom and kitchen products", "daily necessities", "other", "agricultural products", "alcoholic", "daily necessities", "seasonings", "seasonings", "alcoholic", "other", "bathroom and kitchen products", "dairy products", "poultry", "seasonings", "seasonings", "seasonings", "other", "alcoholic", "ready food", "bathroom and kitchen products", "bathroom and kitchen products", "bathroom and kitchen products", "ready food", "seasonings", "seasonings", "products", "agricultural products", "agricultural products", "agricultural products", "alcoholic", "ready food", "poultry", "bathroom and kitchen products", "ready food", "other", "dairy products", "other", "products", "seasonings", "seasonings", "other", "other", "dairy products", "ready food", "agricultural products", "bathroom and kitchen products", "ready food", "ready food", "seasonings", "alcoholic", "dairy products", "daily necessities", "ready food", "seasonings", "agricultural products", "products", "seasonings", "ready food", "agricultural products", "agricultural products", "dairy products", "dairy products", "agricultural products", "agricultural products", "bathroom and kitchen products", "agricultural products", "agricultural products", "ready food", "alcoholic", "bathroom and kitchen products", "alcoholic", "other", 
"fruits", "other", "dairy products", "dairy products", "products", "poultry", "beverages", "poultry", "products", "dairy products", "seasonings", "ready food", "beverages", "products", "products", "agricultural products", "bathroom and kitchen products", "fruits", "poultry", "dairy products", "seasonings", "products", "agricultural products", "products", "products", "dairy products", "agricultural products", "poultry", "ready food", "products", "bathroom and kitchen products", "poultry", "other", "poultry","other", "poultry", "agricultural products", "dairy products", "fruits", "fruits", "alcoholic", "other", "alcoholic", "fruits", "fruits", "poultry", "other", "fruits", "vegetables", "beverages", "dairy products", "beverages", "agricultural products", "vegetables", "dairy products")
library(data.table)
length(names.level1)
## [1] 169
length(unique(names.level1))
## [1] 15
itemInfo(trans) <- data.frame(labels = names.real, level1 = names.level1)
itemInfo(trans)
## labels level1
## 1 baby food baby products
## 2 sound storage medium other
## 3 preservation products daily necessities
## 4 bags daily necessities
## 5 kitchen utensil daily necessities
## 6 baby cosmetics baby products
## 7 frozen chicken poultry
## 8 toilet cleaner daily necessities
## 9 make up remover daily necessities
## 10 salad dressin seasonings
## 11 whisky alcoholic
## 12 liqueur alcoholic
## 13 butter dairy products
## 14 margarine dairy products
## 15 cocoa drinks beverages
## 16 rubbing alcohol alcoholic
## 17 hair spray other
## 18 frozen fruits fruits
## 19 cream dairy products
## 20 decalcifier other
## 21 honey seasonings
## 22 organic products agricultural products
## 23 specialty vegetables vegetables
## 24 ready soups ready food
## 25 flower soil/fertilizer agricultural products
## 26 prosecco alcoholic
## 27 cooking chocolate products
## 28 organic sausage poultry
## 29 pudding powder products
## 30 tidbits ready food
## 31 pork poultry
## 32 soap bathroom and kitchen products
## 33 bathroom cleaner bathroom and kitchen products
## 34 cookware bathroom and kitchen products
## 35 potato products agricultural products
## 36 fish seafood
## 37 snack products products
## 38 nut snack products
## 39 artif. sweetener products
## 40 canned fruit ready food
## 41 syrup products
## 42 nuts/prunes products
## 43 abrasive cleaner bathroom and kitchen products
## 44 skin care daily necessities
## 45 specialty fat other
## 46 tea agricultural products
## 47 brandy alcoholic
## 48 light bulbs daily necessities
## 49 ketchup seasonings
## 50 meat spreads seasonings
## 51 rum alcoholic
## 52 male cosmetics other
## 53 cleaner bathroom and kitchen products
## 54 curd cheese dairy products
## 55 liver loaf poultry
## 56 spices seasonings
## 57 jam seasonings
## 58 sauces seasonings
## 59 softener other
## 60 sparkling wine alcoholic
## 61 cereals ready food
## 62 denta care bathroom and kitchen products
## 63 kitchen towels bathroom and kitchen products
## 64 female sanitary products bathroom and kitchen products
## 65 finished products ready food
## 66 vinegar seasonings
## 67 soups seasonings
## 68 zwieback products
## 69 popcorn agricultural products
## 70 instant coffee agricultural products
## 71 rice agricultural products
## 72 liquor (appetizer) alcoholic
## 73 Instant food products ready food
## 74 turkey poultry
## 75 house keeping products bathroom and kitchen products
## 76 frozen potato products ready food
## 77 dog food other
## 78 specialty cheese dairy products
## 79 candles other
## 80 chocolate marshmallow products
## 81 sweet spreads seasonings
## 82 mayonnaise seasonings
## 83 photo/film other
## 84 pet care other
## 85 condensed milk dairy products
## 86 roll products ready food
## 87 flower (seeds) agricultural products
## 88 dish cleaner bathroom and kitchen products
## 89 canned vegetables ready food
## 90 frozen products ready food
## 91 salt seasonings
## 92 liquor alcoholic
## 93 spread cheese dairy products
## 94 cling film/bags daily necessities
## 95 frozen fish ready food
## 96 mustard seasonings
## 97 packaged fruit/vegetables agricultural products
## 98 cake bar products
## 99 seasonal products seasonings
## 100 canned fish ready food
## 101 pasta agricultural products
## 102 herbs agricultural products
## 103 processed cheese dairy products
## 104 soft cheese dairy products
## 105 potted plants agricultural products
## 106 flour agricultural products
## 107 dishes bathroom and kitchen products
## 108 baking powder agricultural products
## 109 semi-finished bread agricultural products
## 110 pickled vegetables ready food
## 111 white wine alcoholic
## 112 detergent bathroom and kitchen products
## 113 red/blush wine alcoholic
## 114 chewing gum other
## 115 grapes fruits
## 116 cat food other
## 117 hard cheese dairy products
## 118 sliced cheese dairy products
## 119 ice cream products
## 120 meat poultry
## 121 coffee beverages
## 122 ham poultry
## 123 specialty bar products
## 124 butter milk dairy products
## 125 oil seasonings
## 126 frozen meals ready food
## 127 misc. beverages beverages
## 128 candy products
## 129 specialty chocolate products
## 130 onions agricultural products
## 131 hygiene articles bathroom and kitchen products
## 132 berries fruits
## 133 hamburger meat poultry
## 134 UHT-milk dairy products
## 135 sugar seasonings
## 136 products products
## 137 long life bakery product agricultural products
## 138 salty snack products
## 139 waffles products
## 140 cream cheese dairy products
## 141 white bread agricultural products
## 142 chicken poultry
## 143 frozen vegetables ready food
## 144 chocolate products
## 145 napkins bathroom and kitchen products
## 146 beef poultry
## 147 curd other
## 148 frankfurter poultry
## 149 dental care other
## 150 domestic eggs poultry
## 151 brown bread agricultural products
## 152 whipped/sour cream dairy products
## 153 fruit/vegetable juice fruits
## 154 pip fruit fruits
## 155 canned beer alcoholic
## 156 newspapers other
## 157 bottled beer alcoholic
## 158 citrus fruit fruits
## 159 pastry fruits
## 160 sausage poultry
## 161 shopping bags other
## 162 tropical fruit fruits
## 163 root vegetables vegetables
## 164 bottled water beverages
## 165 yogurt dairy products
## 166 soda beverages
## 167 rolls/buns agricultural products
## 168 other vegetables vegetables
## 169 whole milk dairy products
Plot the relative frequencies of the product groups.
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:data.table':
##
## between, first, last
## The following objects are masked from 'package:arules':
##
## intersect, recode, setdiff, setequal, union
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(arulesViz)
# Aggregate items to their level1 group, then plot the relative group frequencies
trans_2<-aggregate(na.omit(trans), by="level1")
itemFrequencyPlot(trans_2, topN=20, type="relative",col="#607CAF", xlab="Item name",
ylab="Frequency (relative)", main="Item Frequency")
The plot shows that the most frequent groups have similar frequencies, with dairy products the most frequent. The absolute counts are listed below.
sort(itemFrequency(trans_2, type="absolute"), decreasing=TRUE)
## dairy products products
## 4387 4331
## agricultural products ready food
## 3810 3563
## bathroom and kitchen products seasonings
## 3275 2789
## alcoholic other
## 2492 2068
## vegetables fruits
## 1940 1665
## poultry daily necessities
## 1642 1286
## beverages baby products
## 790 209
## seafood
## 25
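Conceptually, `aggregate(..., by = "level1")` replaces items with their groups. A transaction therefore contains a group when it contains at least one item of that group, so a group's frequency is usually smaller than the sum of its items' frequencies. Below is a minimal base-R sketch with a hypothetical item-to-group mapping that is matched by item label. It illustrates the idea and is not arules' internal code.

```r
# Hypothetical toy incidence matrix (rows = transactions, columns = items)
incidence <- matrix(c(TRUE,  FALSE, TRUE,  FALSE,
                      FALSE, TRUE,  TRUE,  FALSE,
                      FALSE, FALSE, FALSE, TRUE),
                    nrow = 3, byrow = TRUE,
                    dimnames = list(NULL, c("yogurt", "beef", "whole milk", "soda")))

# Hypothetical item-to-group mapping, keyed by item label
group_of <- c(yogurt = "dairy", beef = "meat", "whole milk" = "dairy", soda = "beverages")

# A transaction contains a group if it contains at least one item of that group
groups <- sort(unique(group_of))
grouped <- sapply(groups, function(g) {
  rowSums(incidence[, names(group_of)[group_of == g], drop = FALSE]) > 0
})

colSums(grouped)   # absolute frequency of each group
```

Here "dairy" appears in 2 transactions, although its two items together account for 3 item occurrences.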
##3.2 Frequency inspection
Next, create a cross-table of how often each pair of groups appears in the same transaction.
ctab <- crossTable(trans_2, sort=TRUE)
ctab
## dairy products products agricultural products
## dairy products 4387 2197 2158
## products 2197 4331 1921
## agricultural products 2158 1921 3810
## ready food 2086 1787 1845
## bathroom and kitchen products 1857 1726 1589
## seasonings 1580 1498 1442
## alcoholic 1244 1212 1061
## other 999 933 800
## vegetables 1186 1031 1063
## fruits 1010 876 893
## poultry 916 852 722
## daily necessities 825 681 685
## beverages 500 416 441
## baby products 147 113 130
## seafood 12 12 15
## ready food bathroom and kitchen products
## dairy products 2086 1857
## products 1787 1726
## agricultural products 1845 1589
## ready food 3563 1560
## bathroom and kitchen products 1560 3275
## seasonings 1369 1194
## alcoholic 1088 977
## other 796 745
## vegetables 1055 879
## fruits 934 741
## poultry 778 718
## daily necessities 680 573
## beverages 437 365
## baby products 135 113
## seafood 14 11
## seasonings alcoholic other vegetables fruits
## dairy products 1580 1244 999 1186 1010
## products 1498 1212 933 1031 876
## agricultural products 1442 1061 800 1063 893
## ready food 1369 1088 796 1055 934
## bathroom and kitchen products 1194 977 745 879 741
## seasonings 2789 798 603 772 677
## alcoholic 798 2492 542 601 489
## other 603 542 2068 405 383
## vegetables 772 601 405 1940 533
## fruits 677 489 383 533 1665
## poultry 605 527 414 443 417
## daily necessities 519 379 318 447 349
## beverages 322 271 182 253 192
## baby products 93 73 45 98 69
## seafood 11 10 5 8 6
## poultry daily necessities beverages baby products
## dairy products 916 825 500 147
## products 852 681 416 113
## agricultural products 722 685 441 130
## ready food 778 680 437 135
## bathroom and kitchen products 718 573 365 113
## seasonings 605 519 322 93
## alcoholic 527 379 271 73
## other 414 318 182 45
## vegetables 443 447 253 98
## fruits 417 349 192 69
## poultry 1642 301 190 61
## daily necessities 301 1286 157 46
## beverages 190 157 790 43
## baby products 61 46 43 209
## seafood 5 6 3 7
## seafood
## dairy products 12
## products 12
## agricultural products 15
## ready food 14
## bathroom and kitchen products 11
## seasonings 11
## alcoholic 10
## other 5
## vegetables 8
## fruits 6
## poultry 5
## daily necessities 6
## beverages 3
## baby products 7
## seafood 25
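With the default count measure, the diagonal of the cross-table holds each group's frequency (for example 4387 for dairy products). Each off-diagonal cell counts the transactions that contain both groups. For a 0/1 incidence matrix M, this is the product t(M) %*% M. The toy base-R illustration below uses hypothetical data and is not arules' internal code.

```r
# Hypothetical toy incidence matrix of product groups (rows = transactions)
m <- matrix(c(1, 1, 0,
              1, 0, 1,
              1, 1, 1,
              0, 1, 0),
            nrow = 4, byrow = TRUE,
            dimnames = list(NULL, c("dairy", "fruits", "vegetables")))

# crossprod(m) computes t(m) %*% m: off-diagonal cells count transactions
# containing both groups, the diagonal holds each group's own frequency
ctab_toy <- crossprod(m)
ctab_toy
```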
chitest<-crossTable(trans_2, measure = "chiSquared", sort=TRUE)
chitest
## dairy products products agricultural products
## dairy products NA 3.699220e-03 1.257789e-02
## products 3.699220e-03 NA 3.584538e-03
## agricultural products 1.257789e-02 3.584538e-03 NA
## ready food 1.578281e-02 3.079020e-03 1.590913e-02
## bathroom and kitchen products 1.092316e-02 5.678440e-03 8.221556e-03
## seasonings 9.223673e-03 6.027105e-03 1.230259e-02
## alcoholic 1.603916e-03 1.217005e-03 9.629802e-04
## other 6.458756e-04 5.563749e-05 1.610851e-07
## vegetables 1.208025e-02 3.715638e-03 1.312432e-02
## fruits 9.782479e-03 2.827460e-03 9.694768e-03
## poultry 4.678000e-03 2.337073e-03 1.179539e-03
## daily necessities 1.119973e-02 2.361655e-03 7.122830e-03
## beverages 6.287126e-03 1.355867e-03 6.051462e-03
## baby products 3.153712e-03 4.855042e-04 3.019552e-03
## seafood 6.564419e-06 9.067483e-06 2.966022e-04
## ready food bathroom and kitchen products
## dairy products 0.0157828099 1.092316e-02
## products 0.0030790203 5.678440e-03
## agricultural products 0.0159091308 8.221556e-03
## ready food NA 1.195774e-02
## bathroom and kitchen products 0.0119577441 NA
## seasonings 0.0129412193 7.704503e-03
## alcoholic 0.0038631194 2.654154e-03
## other 0.0002973790 4.691338e-04
## vegetables 0.0179438451 8.544068e-03
## fruits 0.0184468195 6.383099e-03
## poultry 0.0057329540 5.451810e-03
## daily necessities 0.0100050969 4.976225e-03
## beverages 0.0080791165 4.016089e-03
## baby products 0.0047196814 2.752360e-03
## seafood 0.0002743065 8.740608e-05
## seasonings alcoholic other
## dairy products 9.223673e-03 1.603916e-03 6.458756e-04
## products 6.027105e-03 1.217005e-03 5.563749e-05
## agricultural products 1.230259e-02 9.629802e-04 1.610851e-07
## ready food 1.294122e-02 3.863119e-03 2.973790e-04
## bathroom and kitchen products 7.704503e-03 2.654154e-03 4.691338e-04
## seasonings NA 1.199899e-03 4.753831e-05
## alcoholic 1.199899e-03 NA 6.293011e-05
## other 4.753831e-05 6.293011e-05 NA
## vegetables 9.096925e-03 2.477494e-03 2.129235e-06
## fruits 9.035881e-03 1.085812e-03 3.143860e-04
## poultry 4.241061e-03 3.008318e-03 1.391441e-03
## daily necessities 6.639570e-03 8.815670e-04 8.517333e-04
## beverages 4.356455e-03 2.548293e-03 1.544947e-04
## baby products 1.952040e-03 7.713459e-04 2.568771e-06
## seafood 2.193216e-04 2.156621e-04 1.274922e-06
## vegetables fruits poultry
## dairy products 1.208025e-02 9.782479e-03 4.678000e-03
## products 3.715638e-03 2.827460e-03 2.337073e-03
## agricultural products 1.312432e-02 9.694768e-03 1.179539e-03
## ready food 1.794385e-02 1.844682e-02 5.732954e-03
## bathroom and kitchen products 8.544068e-03 6.383099e-03 5.451810e-03
## seasonings 9.096925e-03 9.035881e-03 4.241061e-03
## alcoholic 2.477494e-03 1.085812e-03 3.008318e-03
## other 2.129235e-06 3.143860e-04 1.391441e-03
## vegetables NA 1.295603e-02 4.453540e-03
## fruits 1.295603e-02 NA 7.069184e-03
## poultry 4.453540e-03 7.069184e-03 NA
## daily necessities 1.498159e-02 8.050075e-03 3.526705e-03
## beverages 6.160624e-03 2.580321e-03 2.602783e-03
## baby products 7.949639e-03 3.247695e-03 1.985987e-03
## seafood 1.941547e-04 7.506654e-05 1.662589e-05
## daily necessities beverages baby products
## dairy products 0.0111997264 6.287126e-03 3.153712e-03
## products 0.0023616550 1.355867e-03 4.855042e-04
## agricultural products 0.0071228298 6.051462e-03 3.019552e-03
## ready food 0.0100050969 8.079116e-03 4.719681e-03
## bathroom and kitchen products 0.0049762252 4.016089e-03 2.752360e-03
## seasonings 0.0066395700 4.356455e-03 1.952040e-03
## alcoholic 0.0008815670 2.548293e-03 7.713459e-04
## other 0.0008517333 1.544947e-04 2.568771e-06
## vegetables 0.0149815875 6.160624e-03 7.949639e-03
## fruits 0.0080500748 2.580321e-03 3.247695e-03
## poultry 0.0035267047 2.602783e-03 1.985987e-03
## daily necessities NA 2.838612e-03 1.297119e-03
## beverages 0.0028386118 NA 4.161279e-03
## baby products 0.0012971185 4.161279e-03 NA
## seafood 0.0002319970 4.981254e-05 8.008521e-03
## seafood
## dairy products 6.564419e-06
## products 9.067483e-06
## agricultural products 2.966022e-04
## ready food 2.743065e-04
## bathroom and kitchen products 8.740608e-05
## seasonings 2.193216e-04
## alcoholic 2.156621e-04
## other 1.274922e-06
## vegetables 1.941547e-04
## fruits 7.506654e-05
## poultry 1.662589e-05
## daily necessities 2.319970e-04
## beverages 4.981254e-05
## baby products 8.008521e-03
## seafood NA
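Read as p-values of pairwise chi-squared tests of independence, all entries in the table are below 0.05. The null hypothesis of independence is therefore rejected for every pair, which means the product groups are not independent of each other.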
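To make the independence test concrete, the sketch below runs a standard Pearson chi-squared test for a single pair of groups. It uses the cross-table counts for dairy products and products. It relies on base R's `chisq.test()` and is not necessarily the exact computation behind `crossTable(measure = "chiSquared")`, so its numbers need not match the table above.

```r
# Counts taken from the cross-table ("dairy products" vs "products")
n    <- 9835   # number of transactions
n_a  <- 4387   # transactions with dairy products
n_b  <- 4331   # transactions with products
n_ab <- 2197   # transactions with both

# 2x2 contingency table of presence/absence of each group
tab <- matrix(c(n_ab,        n_b - n_ab,
                n_a - n_ab,  n - n_a - n_b + n_ab),
              nrow = 2, byrow = TRUE,
              dimnames = list(products = c("yes", "no"),
                              dairy    = c("yes", "no")))

# Expected number of joint occurrences if the two groups were independent
expected_ab <- n_a * n_b / n

# Pearson test (base R applies Yates' continuity correction to 2x2 tables by default)
test <- chisq.test(tab)
test$statistic
test$p.value
```

The observed 2197 joint occurrences are well above the roughly 1932 expected under independence, which is why the test rejects independence for this pair.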
##3.3 Apriori algorithm
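Apriori is a classic algorithm for mining frequent itemsets and association rules. It works level by level. First, it counts the support of every single item and discards the 1-itemsets whose support is below the minimum support. It then builds candidate 2-itemsets from the surviving items, again discards those below the minimum support, and continues with larger itemsets until no new frequent itemset can be found. The pruning is valid because every subset of a frequent itemset must itself be frequent (the Apriori property).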
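The level-wise search can be sketched in a few lines of base R. This is a simplified illustration on hypothetical toy data, not the arules implementation. Candidates of size k are formed by joining pairs of frequent (k-1)-itemsets, pruned with the Apriori property, and kept if their support reaches the minimum.

```r
# Hypothetical toy transactions
transactions <- list(c("milk", "bread"),
                     c("milk", "butter"),
                     c("milk", "bread", "butter"),
                     c("bread"),
                     c("milk", "bread"))

min_support <- 0.4

# Support of an itemset = share of transactions containing all its items
support <- function(itemset) {
  mean(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# Level 1: frequent single items
items <- sort(unique(unlist(transactions)))
level <- Filter(function(s) support(s) >= min_support, as.list(items))
frequent <- level

# Level k: join frequent (k-1)-itemsets, prune by the Apriori property, count support
while (length(level) > 1) {
  k <- length(level[[1]]) + 1
  candidates <- unique(lapply(combn(length(level), 2, simplify = FALSE), function(p) {
    sort(union(level[[p[1]]], level[[p[2]]]))
  }))
  candidates <- Filter(function(s) length(s) == k, candidates)
  # Apriori property: every (k-1)-subset of a candidate must already be frequent
  candidates <- Filter(function(s) {
    all(vapply(combn(s, k - 1, simplify = FALSE),
               function(sub) any(vapply(level, identical, logical(1), sub)),
               logical(1)))
  }, candidates)
  level <- Filter(function(s) support(s) >= min_support, candidates)
  frequent <- c(frequent, level)
}

vapply(frequent, paste, character(1), collapse = ", ")
```

In this toy run, {bread, butter} (support 0.2) is dropped at level 2. The only level-3 candidate, {bread, butter, milk}, is then pruned without being counted, because one of its subsets is infrequent.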
The first step in creating a set of association rules is to choose suitable thresholds for support and confidence. If these values are set too low, the algorithm takes longer to run and returns many rules, most of which are useless. Different support and confidence values can be tried, and the number of rules each combination generates can be compared graphically.
The confidence of a rule is the number of transactions that contain all items of the rule divided by the number of transactions that contain its left-hand side (the antecedent). To obtain any rules to analyze, the confidence threshold had to be lowered from the default of 80% to 57% (with a minimum support of 10%), which yields 16 rules.
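As a worked check of these definitions, the sketch below recomputes the measures of the rule {ready food} => {dairy products} from the counts in the cross-table. There are 3563 transactions with ready food, 4387 with dairy products and 2086 with both, out of 9835. The results agree with the support, confidence, coverage and lift reported by `inspect()` for that rule.

```r
# Counts read off the cross-table of product groups
n      <- 9835   # all transactions
n_lhs  <- 3563   # transactions containing {ready food}
n_rhs  <- 4387   # transactions containing {dairy products}
n_both <- 2086   # transactions containing both

support    <- n_both / n              # share of transactions with lhs and rhs
coverage   <- n_lhs / n               # support of the left-hand side
confidence <- n_both / n_lhs          # equals support / coverage
lift       <- confidence / (n_rhs / n) # confidence relative to the rhs support

round(c(support = support, confidence = confidence,
        coverage = coverage, lift = lift), 6)
```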
rules.trans<-apriori(trans_2, parameter=list(supp=0.1, conf=0.57))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.57 0.1 1 none FALSE TRUE 5 0.1 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 983
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[15 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [12 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [16 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules.by.conf<-sort(rules.trans, by="confidence", decreasing=TRUE)
inspect(sort(rules.by.conf))
## lhs rhs support confidence coverage lift count
## [1] {ready food} => {dairy products} 0.2120996 0.5854617 0.3622776 1.312518 2086
## [2] {agricultural products,
## ready food} => {dairy products} 0.1279105 0.6818428 0.1875953 1.528590 1258
## [3] {dairy products,
## ready food} => {agricultural products} 0.1279105 0.6030681 0.2120996 1.556739 1258
## [4] {agricultural products,
## dairy products} => {ready food} 0.1279105 0.5829472 0.2194204 1.609117 1258
## [5] {agricultural products,
## products} => {dairy products} 0.1257753 0.6439355 0.1953228 1.443607 1237
## [6] {agricultural products,
## dairy products} => {products} 0.1257753 0.5732159 0.2194204 1.301681 1237
## [7] {vegetables} => {dairy products} 0.1205897 0.6113402 0.1972547 1.370534 1186
## [8] {products,
## ready food} => {dairy products} 0.1191662 0.6558478 0.1816980 1.470313 1172
## [9] {bathroom and kitchen products,
## products} => {dairy products} 0.1121505 0.6390498 0.1754957 1.432654 1103
## [10] {bathroom and kitchen products,
## dairy products} => {products} 0.1121505 0.5939688 0.1888155 1.348807 1103
## [11] {agricultural products,
## bathroom and kitchen products} => {dairy products} 0.1093035 0.6765261 0.1615658 1.516671 1075
## [12] {bathroom and kitchen products,
## dairy products} => {agricultural products} 0.1093035 0.5788907 0.1888155 1.494328 1075
## [13] {bathroom and kitchen products,
## ready food} => {dairy products} 0.1084901 0.6839744 0.1586172 1.533369 1067
## [14] {bathroom and kitchen products,
## dairy products} => {ready food} 0.1084901 0.5745827 0.1888155 1.586029 1067
## [15] {products,
## ready food} => {agricultural products} 0.1067616 0.5875769 0.1816980 1.516750 1050
## [16] {fruits} => {dairy products} 0.1026945 0.6066066 0.1692933 1.359922 1010
The inspected rules have antecedents (left-hand sides) with one or two items. All rules have similar support, between about 0.10 and 0.13, except the first one, whose support is 0.21. All confidence values are below 0.7. Lift is another useful measure: it equals the confidence divided by the expected confidence of the rule, i.e. the support of the right-hand side. A lift above 1, as for all 16 rules here, means that the items appear together more often than would be expected if they were independent. Dairy products are the consequent in 9 of the 16 rules. For example, one rule states that customers who buy agricultural products and ready food also tend to buy dairy products (confidence 0.68, lift 1.53).
plot(rules.trans, measure=c("confidence","lift"), shading="support" )
plot(rules.by.conf, method="graph", control=list(layout=igraph::in_circle()))
The scatter plot also suggests that confidence and lift increase together, while the rules with the highest values of both measures tend to be the ones with the smallest support.
is.significant(rules.trans, trans_2)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE
is.maximal(rules.trans)
## [1] TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE TRUE
is.redundant(rules.trans)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE
trans.closed<-apriori(trans_2, parameter=list(target="closed frequent itemsets", support=0.025))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## NA 0.1 1 none FALSE TRUE 5 0.025 1
## maxlen target ext
## 10 closed frequent itemsets TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 245
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[15 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [13 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.00s].
## filtering closed item sets ... done [0.00s].
## sorting transactions ... done [0.00s].
## writing ... [407 set(s)] done [0.00s].
## creating S4 object ... done [0.00s].
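The closed-itemset mining above keeps only itemsets that have no proper superset with the same support. Rules are then induced from them with `ruleInduction()`. The base-R sketch below checks closedness by brute force on hypothetical toy data. It enumerates all itemsets and ignores the minimum-support filter, so it is only suitable for tiny examples.

```r
# Hypothetical toy transactions
transactions <- list(c("a", "b"), c("a", "b", "c"), c("a", "b"), c("c"))

# Absolute support: number of transactions containing all items of the itemset
support <- function(itemset) {
  sum(vapply(transactions, function(t) all(itemset %in% t), logical(1)))
}

# All non-empty itemsets over the item universe
items <- sort(unique(unlist(transactions)))
itemsets <- unlist(lapply(seq_along(items), function(k) combn(items, k, simplify = FALSE)),
                   recursive = FALSE)
counts <- vapply(itemsets, support, integer(1))

# Closed: occurs at least once and every proper superset has strictly lower support
is_closed <- vapply(seq_along(itemsets), function(i) {
  s <- itemsets[[i]]
  supersets <- which(vapply(itemsets,
                            function(x) length(x) > length(s) && all(s %in% x),
                            logical(1)))
  counts[i] > 0 && all(counts[supersets] < counts[i])
}, logical(1))

vapply(itemsets[is_closed], paste, character(1), collapse = "")
```

Here {a} is not closed because {a, b} has the same support (3), so {a, b} already summarizes it.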
rules.closed<-ruleInduction(trans.closed, trans_2, control=list(verbose=TRUE))
## ruleInduction: using method ptree
## preprocessing done [0s].
## searching done [0.014s].
## postprocessing done [0s].
rules.closed
## set of 12 rules
inspect(sort(rules.closed))
## lhs rhs support confidence lift itemset
## [1] {agricultural products,
## bathroom and kitchen products,
## ready food,
## seasonings} => {dairy products} 0.03975597 0.8162839 1.829987 402
## [2] {agricultural products,
## daily necessities,
## ready food} => {dairy products} 0.03457041 0.8037825 1.801961 262
## [3] {agricultural products,
## bathroom and kitchen products,
## ready food,
## vegetables} => {dairy products} 0.03223183 0.8149100 1.826907 392
## [4] {agricultural products,
## ready food,
## seasonings,
## vegetables} => {dairy products} 0.02857143 0.8264706 1.852824 388
## [5] {agricultural products,
## daily necessities,
## seasonings} => {dairy products} 0.02765633 0.8071217 1.809446 254
## [6] {products,
## ready food,
## seasonings,
## vegetables} => {dairy products} 0.02755465 0.8041543 1.802794 389
## [7] {agricultural products,
## bathroom and kitchen products,
## products,
## ready food,
## seasonings} => {dairy products} 0.02755465 0.8312883 1.863625 407
## [8] {daily necessities,
## ready food,
## seasonings} => {dairy products} 0.02714794 0.8115502 1.819374 253
## [9] {bathroom and kitchen products,
## fruits,
## products,
## ready food} => {dairy products} 0.02602949 0.8152866 1.827751 384
## [10] {agricultural products,
## alcoholic,
## ready food,
## seasonings} => {dairy products} 0.02572445 0.8161290 1.829640 396
## [11] {agricultural products,
## fruits,
## ready food,
## seasonings} => {dairy products} 0.02562278 0.8181818 1.834242 382
## [12] {bathroom and kitchen products,
## ready food,
## seasonings,
## vegetables} => {dairy products} 0.02562278 0.8372093 1.876898 386
##3.4 Similarity and Dissimilarity
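Dissimilarity between products can also be measured from how often they occur in the same baskets. A common measure is the Jaccard index, which is based on probability calculus. The corresponding Jaccard dissimilarity is (p(A∪B)-p(A∩B))/p(A∪B) = 1 - p(A∩B)/p(A∪B). The code below actually uses the closely related Dice coefficient (method = "dice"), whose dissimilarity is 1 - 2p(A∩B)/(p(A)+p(B)). The Jaccard variant is available with method = "jaccard".
Compute the dissimilarities between the most frequent product categories (item frequency above 0.2):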
df<-trans_2[,itemFrequency(trans_2)>0.2]
d.jac.i<-dissimilarity(df, which="items", method = "dice")
round(d.jac.i,2)
## agricultural products alcoholic
## alcoholic 0.66
## bathroom and kitchen products 0.55 0.66
## dairy products 0.47 0.64
## other 0.73 0.76
## products 0.53 0.64
## ready food 0.50 0.64
## seasonings 0.56 0.70
## bathroom and kitchen products dairy products
## alcoholic
## bathroom and kitchen products
## dairy products 0.52
## other 0.72 0.69
## products 0.55 0.50
## ready food 0.54 0.48
## seasonings 0.61 0.56
## other products ready food
## alcoholic
## bathroom and kitchen products
## dairy products
## other
## products 0.71
## ready food 0.72 0.55
## seasonings 0.75 0.58 0.57
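To check which coefficient produced the matrix above, the entry for dairy products and ready food can be recomputed by hand. The ready-food rule table later in this section prints p(dairy products) = 0.44605999, p(ready food) = 0.36227758 and p(dairy products ∩ ready food) = 0.21209964. The base-R sketch below uses two small helper functions of our own (`dice_dissim`, `jaccard_dissim`, not arules functions). It gives 0.48 for Dice, which matches the printed matrix, and 0.64 for Jaccard.

```r
# Supports taken from the ready-food rule table in this section
p_a  <- 0.44605999  # p(dairy products)
p_b  <- 0.36227758  # p(ready food)
p_ab <- 0.21209964  # p(dairy products and ready food in the same basket)

# Dice dissimilarity: 1 - 2 p(A and B) / (p(A) + p(B))
dice_dissim <- function(pa, pb, pab) 1 - 2 * pab / (pa + pb)

# Jaccard dissimilarity: 1 - p(A and B) / p(A or B),
# with p(A or B) = p(A) + p(B) - p(A and B)
jaccard_dissim <- function(pa, pb, pab) 1 - pab / (pa + pb - pab)

round(dice_dissim(p_a, p_b, p_ab), 2)     # 0.48
round(jaccard_dissim(p_a, p_b, p_ab), 2)  # 0.64
```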
Additionally, the dendrogram for these categories (Ward's method, ward.D2) is plotted below.
plot(hclust(d.jac.i, method = "ward.D2"), main = "Items Dendrogram")
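The sketch below shows how Ward's method groups the categories. It rebuilds a small dissimilarity matrix by hand from four of the rounded Dice values printed above (agricultural products, dairy products, ready food and other) and clusters it with base R's `hclust()`. Restricting the example to four categories is our simplification for readability.

```r
labs <- c("agricultural products", "dairy products", "ready food", "other")

# Rounded Dice dissimilarities copied from the matrix printed above
m <- matrix(c(0.00, 0.47, 0.50, 0.73,
              0.47, 0.00, 0.48, 0.69,
              0.50, 0.48, 0.00, 0.72,
              0.73, 0.69, 0.72, 0.00),
            nrow = 4, byrow = TRUE, dimnames = list(labs, labs))
d <- as.dist(m)

hc <- hclust(d, method = "ward.D2")
hc$height[1]                # first merge: agricultural products + dairy products
groups <- cutree(hc, k = 2) # cut the tree into two clusters
groups                      # "other" ends up in a cluster of its own
```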
Next, we pick four product categories that are especially associated with healthy or unhealthy eating (vegetables, poultry, ready food and agricultural products) and mine the rules that have each of them as the right-hand side.
rule_veg<-apriori(data=trans_2, parameter=list(supp=0.05,conf = 0.2),
appearance=list(default="lhs", rhs="vegetables"), control=list(verbose=F))
rule_veg_bylift<-sort(rule_veg, by="lift", decreasing=TRUE)
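# sort() orders by support by default, so this table is ordered by support rather than lift
inspect(sort(rule_veg_bylift, by="support"))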
## lhs rhs support confidence coverage lift count
## [1] {dairy products} => {vegetables} 0.12058973 0.2703442 0.4460600 1.370534 1186
## [2] {agricultural products} => {vegetables} 0.10808338 0.2790026 0.3873920 1.414428 1063
## [3] {ready food} => {vegetables} 0.10726995 0.2960988 0.3622776 1.501099 1055
## [4] {products} => {vegetables} 0.10482969 0.2380513 0.4403660 1.206822 1031
## [5] {bathroom and kitchen products} => {vegetables} 0.08937468 0.2683969 0.3329944 1.360662 879
## [6] {seasonings} => {vegetables} 0.07849517 0.2768017 0.2835791 1.403271 772
## [7] {agricultural products,
## dairy products} => {vegetables} 0.07676665 0.3498610 0.2194204 1.773651 755
## [8] {dairy products,
## ready food} => {vegetables} 0.07493645 0.3533078 0.2120996 1.791125 737
## [9] {dairy products,
## products} => {vegetables} 0.07137773 0.3195266 0.2233859 1.619868 702
## [10] {agricultural products,
## ready food} => {vegetables} 0.06914082 0.3685637 0.1875953 1.868466 680
## [11] {agricultural products,
## products} => {vegetables} 0.06731063 0.3446122 0.1953228 1.747042 662
## [12] {bathroom and kitchen products,
## dairy products} => {vegetables} 0.06487036 0.3435649 0.1888155 1.741732 638
## [13] {products,
## ready food} => {vegetables} 0.06415862 0.3531058 0.1816980 1.790101 631
## [14] {alcoholic} => {vegetables} 0.06110829 0.2411717 0.2533808 1.222641 601
## [15] {bathroom and kitchen products,
## ready food} => {vegetables} 0.05734621 0.3615385 0.1586172 1.832851 564
## [16] {agricultural products,
## bathroom and kitchen products} => {vegetables} 0.05653279 0.3499056 0.1615658 1.773877 556
## [17] {dairy products,
## seasonings} => {vegetables} 0.05571937 0.3468354 0.1606507 1.758313 548
## [18] {bathroom and kitchen products,
## products} => {vegetables} 0.05510930 0.3140209 0.1754957 1.591956 542
## [19] {fruits} => {vegetables} 0.05419420 0.3201201 0.1692933 1.622877 533
## [20] {agricultural products,
## dairy products,
## ready food} => {vegetables} 0.05287239 0.4133545 0.1279105 2.095537 520
## [21] {ready food,
## seasonings} => {vegetables} 0.05043213 0.3623083 0.1391967 1.836753 496
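Vegetables are most often purchased together with dairy products: about 12% of all baskets contain both (support 0.12). Agricultural products and ready food follow closely (support about 0.11 each). The strongest rule by lift (2.10) is {agricultural products, dairy products, ready food} => {vegetables}: about 41% of baskets that contain those three categories also contain vegetables.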
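The measures in these tables are linked by simple arithmetic: confidence = support / coverage, and lift = confidence / p(rhs). The sketch below reproduces rule [1], {dairy products} => {vegetables}, from its printed support and coverage. It also uses p(vegetables) = 0.19725470, which appears as the coverage of {vegetables} => {ready food} in the ready-food table later in this section.

```r
supp_rule <- 0.12058973  # p(dairy products and vegetables)
coverage  <- 0.4460600   # p(dairy products), the support of the LHS
p_rhs     <- 0.19725470  # p(vegetables)

confidence <- supp_rule / coverage  # P(vegetables | dairy products)
lift       <- confidence / p_rhs    # ratio to what independence would give

round(c(confidence = confidence, lift = lift), 4)
# confidence 0.2703, lift 1.3705
```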
plot(rule_veg, method="paracoord", control=list(reorder=TRUE))
plot(rule_veg, method="graph", control=list(layout=igraph::in_circle()))
is.significant(rule_veg, trans_2)
## [1] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [16] TRUE TRUE TRUE TRUE TRUE TRUE
is.redundant(rule_veg)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
rules_meat<-apriori(data=trans_2, parameter=list(supp=0.05,conf = 0.1),
appearance=list(default="lhs", rhs="poultry"), control=list(verbose=F))
rules_meat.bylift<-sort(rules_meat, by="lift", decreasing=TRUE)
inspect(sort(rules_meat.bylift, by="support"))
## lhs rhs support confidence coverage lift count
## [1] {} => {poultry} 0.16695475 0.1669548 1.0000000 1.000000 1642
## [2] {dairy products} => {poultry} 0.09313676 0.2087987 0.4460600 1.250631 916
## [3] {products} => {poultry} 0.08662938 0.1967213 0.4403660 1.178291 852
## [4] {ready food} => {poultry} 0.07910524 0.2183553 0.3622776 1.307871 778
## [5] {agricultural products} => {poultry} 0.07341129 0.1895013 0.3873920 1.135046 722
## [6] {bathroom and kitchen products} => {poultry} 0.07300458 0.2192366 0.3329944 1.313150 718
## [7] {seasonings} => {poultry} 0.06151500 0.2169236 0.2835791 1.299296 605
## [8] {dairy products,
## products} => {poultry} 0.05571937 0.2494310 0.2233859 1.494004 548
## [9] {dairy products,
## ready food} => {poultry} 0.05409253 0.2550336 0.2120996 1.527561 532
## [10] {alcoholic} => {poultry} 0.05358414 0.2114767 0.2533808 1.266671 527
## [11] {bathroom and kitchen products,
## dairy products} => {poultry} 0.05144891 0.2724825 0.1888155 1.632074 506
## [12] {agricultural products,
## dairy products} => {poultry} 0.05033045 0.2293791 0.2194204 1.373900 495
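Poultry is most often purchased together with dairy products (support 0.093), the "products" category (0.087) and ready food (0.079). All lifts are modest. The highest is 1.63, for {bathroom and kitchen products, dairy products} => {poultry}.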
plot(rules_meat, method="paracoord", control=list(reorder=TRUE))
# Different style of graph
plot(rules_meat, method="graph")
is.significant(rules_meat, trans_2)
## [1] FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
inspect(rules_meat[!is.significant(rules_meat, trans_2)])
## lhs rhs support confidence coverage lift count
## [1] {} => {poultry} 0.1669548 0.1669548 1 1 1642
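The rule {} => {poultry} has an empty left-hand side. Its coverage is 1, so its confidence equals the support of poultry and its lift is exactly 1, which is why `is.significant()` flags it. As we understand the arules defaults (minlen = 1), `apriori()` keeps such a rule whenever the RHS item alone clears both the support and the confidence threshold. The sketch below applies that condition to the four target categories, using item supports printed in this section. The names `item_supp`, `min_supp`, `min_conf` and `has_empty_rule` are our own. It explains why the vegetable rules contain no empty-LHS rule (p(vegetables) = 0.197 is below conf = 0.2), while the poultry (conf = 0.1), ready food and agricultural products rule sets do.

```r
# Item supports printed in the rule tables of this section
item_supp <- c(vegetables = 0.19725470, poultry = 0.16695475,
               `ready food` = 0.36227758, `agricultural products` = 0.38739197)

# supp and conf thresholds passed to apriori() for each target category
min_supp <- c(vegetables = 0.05, poultry = 0.05,
              `ready food` = 0.05, `agricultural products` = 0.05)
min_conf <- c(vegetables = 0.2, poultry = 0.1,
              `ready food` = 0.2, `agricultural products` = 0.2)

# For {} => {item}: support = confidence = p(item) and lift = 1
has_empty_rule <- item_supp >= min_supp & item_supp >= min_conf
has_empty_rule  # FALSE for vegetables, TRUE for the other three
```

Setting minlen = 2 in the `parameter` list is the usual way to suppress empty-LHS rules.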
is.redundant(rules_meat)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
rules_rf<-apriori(data=trans_2, parameter=list(supp=0.05,conf = 0.2),
appearance=list(default="lhs", rhs="ready food"), control=list(verbose=F))
rules_rf_bylift<-sort(rules_rf, by="lift", decreasing=TRUE)
inspect(sort(rules_rf_bylift, by="support"))
## lhs rhs support confidence coverage lift count
## [1] {} => {ready food} 0.36227758 0.3622776 1.00000000 1.000000 3563
## [2] {dairy products} => {ready food} 0.21209964 0.4754958 0.44605999 1.312518 2086
## [3] {agricultural products} => {ready food} 0.18759532 0.4842520 0.38739197 1.336688 1845
## [4] {products} => {ready food} 0.18169802 0.4126068 0.44036604 1.138924 1787
## [5] {bathroom and kitchen products} => {ready food} 0.15861718 0.4763359 0.33299441 1.314837 1560
## [6] {seasonings} => {ready food} 0.13919675 0.4908569 0.28357905 1.354919 1369
## [7] {agricultural products,
## dairy products} => {ready food} 0.12791052 0.5829472 0.21942044 1.609117 1258
## [8] {dairy products,
## products} => {ready food} 0.11916624 0.5334547 0.22338587 1.472503 1172
## [9] {alcoholic} => {ready food} 0.11062532 0.4365971 0.25338078 1.205145 1088
## [10] {bathroom and kitchen products,
## dairy products} => {ready food} 0.10849009 0.5745827 0.18881546 1.586029 1067
## [11] {vegetables} => {ready food} 0.10726995 0.5438144 0.19725470 1.501099 1055
## [12] {agricultural products,
## products} => {ready food} 0.10676157 0.5465903 0.19532283 1.508761 1050
## [13] {dairy products,
## seasonings} => {ready food} 0.09689883 0.6031646 0.16065074 1.664924 953
## [14] {fruits} => {ready food} 0.09496695 0.5609610 0.16929334 1.548429 934
## [15] {agricultural products,
## bathroom and kitchen products} => {ready food} 0.09344179 0.5783512 0.16156584 1.596431 919
## [16] {bathroom and kitchen products,
## products} => {ready food} 0.09323843 0.5312862 0.17549568 1.466517 917
## [17] {agricultural products,
## seasonings} => {ready food} 0.08652771 0.5901526 0.14661922 1.629007 851
## [18] {products,
## seasonings} => {ready food} 0.08408744 0.5520694 0.15231317 1.523885 827
## [19] {other} => {ready food} 0.08093543 0.3849130 0.21026945 1.062481 796
## [20] {poultry} => {ready food} 0.07910524 0.4738124 0.16695475 1.307871 778
## [21] {agricultural products,
## dairy products,
## products} => {ready food} 0.07808846 0.6208569 0.12577529 1.713760 768
## [22] {dairy products,
## vegetables} => {ready food} 0.07493645 0.6214165 0.12058973 1.715305 737
## [23] {bathroom and kitchen products,
## seasonings} => {ready food} 0.07300458 0.6013400 0.12140315 1.659888 718
## [24] {alcoholic,
## dairy products} => {ready food} 0.07127605 0.5635048 0.12648704 1.555450 701
## [25] {agricultural products,
## bathroom and kitchen products,
## dairy products} => {ready food} 0.07005592 0.6409302 0.10930351 1.769169 689
## [26] {agricultural products,
## vegetables} => {ready food} 0.06914082 0.6396990 0.10808338 1.765770 680
## [27] {daily necessities} => {ready food} 0.06914082 0.5287714 0.13075750 1.459575 680
## [28] {bathroom and kitchen products,
## dairy products,
## products} => {ready food} 0.06832740 0.6092475 0.11215048 1.681715 672
## [29] {dairy products,
## fruits} => {ready food} 0.06609049 0.6435644 0.10269446 1.776440 650
## [30] {agricultural products,
## dairy products,
## seasonings} => {ready food} 0.06598882 0.6656410 0.09913574 1.837378 649
## [31] {alcoholic,
## products} => {ready food} 0.06527707 0.5297030 0.12323335 1.462147 642
## [32] {agricultural products,
## alcoholic} => {ready food} 0.06436197 0.5966070 0.10788002 1.646823 633
## [33] {products,
## vegetables} => {ready food} 0.06415862 0.6120272 0.10482969 1.689387 631
## [34] {dairy products,
## products,
## seasonings} => {ready food} 0.06243010 0.6395833 0.09761057 1.765451 614
## [35] {agricultural products,
## fruits} => {ready food} 0.05958312 0.6562150 0.09079817 1.811360 586
## [36] {agricultural products,
## bathroom and kitchen products,
## products} => {ready food} 0.05907473 0.6274298 0.09415353 1.731903 581
## [37] {dairy products,
## other} => {ready food} 0.05744789 0.5655656 0.10157600 1.561139 565
## [38] {bathroom and kitchen products,
## vegetables} => {ready food} 0.05734621 0.6416382 0.08937468 1.771123 564
## [39] {alcoholic,
## bathroom and kitchen products} => {ready food} 0.05673615 0.5711361 0.09933910 1.576515 558
## [40] {agricultural products,
## products,
## seasonings} => {ready food} 0.05653279 0.6398159 0.08835791 1.766093 556
## [41] {bathroom and kitchen products,
## dairy products,
## seasonings} => {ready food} 0.05561769 0.6630303 0.08388409 1.830172 547
## [42] {fruits,
## products} => {ready food} 0.05521098 0.6198630 0.08906965 1.711017 543
## [43] {dairy products,
## poultry} => {ready food} 0.05409253 0.5807860 0.09313676 1.603152 532
## [44] {agricultural products,
## dairy products,
## vegetables} => {ready food} 0.05287239 0.6887417 0.07676665 1.901144 520
## [45] {seasonings,
## vegetables} => {ready food} 0.05043213 0.6424870 0.07849517 1.773466 496
Ready food is likewise most often purchased with dairy products (support 0.21), followed by agricultural products (0.19) and the "products" category (0.18).
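Lift is symmetric in the two sides of a rule, but confidence is not. The ready-food table contains {vegetables} => {ready food} with lift 1.501099. That is the same value as {ready food} => {vegetables} in the vegetable table, even though the two confidences differ (0.5438 vs 0.2961). The short base-R check below works from the printed supports.

```r
p_veg  <- 0.19725470  # p(vegetables)
p_rf   <- 0.36227758  # p(ready food)
p_both <- 0.10726995  # p(vegetables and ready food)

conf_veg_to_rf <- p_both / p_veg           # confidence of {vegetables} => {ready food}
conf_rf_to_veg <- p_both / p_rf            # confidence of {ready food} => {vegetables}
lift_both      <- p_both / (p_veg * p_rf)  # the same for both directions

round(c(conf_veg_to_rf, conf_rf_to_veg, lift_both), 4)  # 0.5438 0.2961 1.5011
```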
plot(rules_rf, method="paracoord", control=list(reorder=TRUE))
plot(rules_sup1_conf50, method="graph")
plot(rules_rf, method="graph")
is.significant(rules_rf, trans_2)
## [1] FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [37] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
inspect(rules_rf[!is.significant(rules_rf, trans_2)])
## lhs rhs support confidence coverage lift count
## [1] {} => {ready food} 0.36227758 0.3622776 1.0000000 1.000000 3563
## [2] {other} => {ready food} 0.08093543 0.3849130 0.2102694 1.062481 796
is.redundant(rules_rf)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
rules_ap<-apriori(data=trans_2, parameter=list(supp=0.05,conf = 0.2),
appearance=list(default="lhs", rhs="agricultural products"), control=list(verbose=F))
rules_ap_bylift<-sort(rules_ap, by="lift", decreasing=TRUE)
inspect(sort(rules_ap_bylift, by="support"))
## lhs rhs support confidence coverage lift count
## [1] {} => {agricultural products} 0.38739197 0.3873920 1.00000000 1.0000000 3810
## [2] {dairy products} => {agricultural products} 0.21942044 0.4919079 0.44605999 1.2697938 2158
## [3] {products} => {agricultural products} 0.19532283 0.4435465 0.44036604 1.1449554 1921
## [4] {ready food} => {agricultural products} 0.18759532 0.5178221 0.36227758 1.3366877 1845
## [5] {bathroom and kitchen products} => {agricultural products} 0.16156584 0.4851908 0.33299441 1.2524546 1589
## [6] {seasonings} => {agricultural products} 0.14661922 0.5170312 0.28357905 1.3346461 1442
## [7] {dairy products,
## ready food} => {agricultural products} 0.12791052 0.6030681 0.21209964 1.5567387 1258
## [8] {dairy products,
## products} => {agricultural products} 0.12577529 0.5630405 0.22338587 1.4534130 1237
## [9] {bathroom and kitchen products,
## dairy products} => {agricultural products} 0.10930351 0.5788907 0.18881546 1.4943281 1075
## [10] {vegetables} => {agricultural products} 0.10808338 0.5479381 0.19725470 1.4144283 1063
## [11] {alcoholic} => {agricultural products} 0.10788002 0.4257624 0.25338078 1.0990482 1061
## [12] {products,
## ready food} => {agricultural products} 0.10676157 0.5875769 0.18169802 1.5167505 1050
## [13] {dairy products,
## seasonings} => {agricultural products} 0.09913574 0.6170886 0.16065074 1.5929308 975
## [14] {bathroom and kitchen products,
## products} => {agricultural products} 0.09415353 0.5365006 0.17549568 1.3849037 926
## [15] {bathroom and kitchen products,
## ready food} => {agricultural products} 0.09344179 0.5891026 0.15861718 1.5206886 919
## [16] {fruits} => {agricultural products} 0.09079817 0.5363363 0.16929334 1.3844798 893
## [17] {products,
## seasonings} => {agricultural products} 0.08835791 0.5801068 0.15231317 1.4974673 869
## [18] {ready food,
## seasonings} => {agricultural products} 0.08652771 0.6216216 0.13919675 1.6046322 851
## [19] {other} => {agricultural products} 0.08134215 0.3868472 0.21026945 0.9985937 800
## [20] {dairy products,
## products,
## ready food} => {agricultural products} 0.07808846 0.6552901 0.11916624 1.6915428 768
## [21] {dairy products,
## vegetables} => {agricultural products} 0.07676665 0.6365936 0.12058973 1.6432803 755
## [22] {bathroom and kitchen products,
## seasonings} => {agricultural products} 0.07351296 0.6055276 0.12140315 1.5630877 723
## [23] {poultry} => {agricultural products} 0.07341129 0.4397077 0.16695475 1.1350459 722
## [24] {bathroom and kitchen products,
## dairy products,
## ready food} => {agricultural products} 0.07005592 0.6457357 0.10849009 1.6668794 689
## [25] {daily necessities} => {agricultural products} 0.06964921 0.5326594 0.13075750 1.3749883 685
## [26] {alcoholic,
## dairy products} => {agricultural products} 0.06944586 0.5490354 0.12648704 1.4172606 683
## [27] {ready food,
## vegetables} => {agricultural products} 0.06914082 0.6445498 0.10726995 1.6638181 680
## [28] {bathroom and kitchen products,
## dairy products,
## products} => {agricultural products} 0.06842908 0.6101541 0.11215048 1.5750304 673
## [29] {products,
## vegetables} => {agricultural products} 0.06731063 0.6420951 0.10482969 1.6574816 662
## [30] {dairy products,
## ready food,
## seasonings} => {agricultural products} 0.06598882 0.6810073 0.09689883 1.7579284 649
## [31] {alcoholic,
## ready food} => {agricultural products} 0.06436197 0.5818015 0.11062532 1.5018419 633
## [32] {dairy products,
## products,
## seasonings} => {agricultural products} 0.06415862 0.6572917 0.09761057 1.6967096 631
## [33] {dairy products,
## fruits} => {agricultural products} 0.06253177 0.6089109 0.10269446 1.5718212 615
## [34] {alcoholic,
## products} => {agricultural products} 0.06100661 0.4950495 0.12323335 1.2779034 600
## [35] {fruits,
## ready food} => {agricultural products} 0.05958312 0.6274090 0.09496695 1.6195715 586
## [36] {bathroom and kitchen products,
## products,
## ready food} => {agricultural products} 0.05907473 0.6335878 0.09323843 1.6355212 581
## [37] {products,
## ready food,
## seasonings} => {agricultural products} 0.05653279 0.6723096 0.08408744 1.7354762 556
## [38] {bathroom and kitchen products,
## vegetables} => {agricultural products} 0.05653279 0.6325370 0.08937468 1.6328087 556
## [39] {dairy products,
## other} => {agricultural products} 0.05612608 0.5525526 0.10157600 1.4263397 552
## [40] {bathroom and kitchen products,
## dairy products,
## seasonings} => {agricultural products} 0.05541434 0.6606061 0.08388409 1.7052653 545
## [41] {alcoholic,
## bathroom and kitchen products} => {agricultural products} 0.05470259 0.5506653 0.09933910 1.4214680 538
## [42] {fruits,
## products} => {agricultural products} 0.05449924 0.6118721 0.08906965 1.5794652 536
## [43] {dairy products,
## ready food,
## vegetables} => {agricultural products} 0.05287239 0.7055631 0.07493645 1.8213158 520
## [44] {daily necessities,
## dairy products} => {agricultural products} 0.05134723 0.6121212 0.08388409 1.5801082 505
## [45] {dairy products,
## poultry} => {agricultural products} 0.05033045 0.5403930 0.09313676 1.3949515 495
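Agricultural products are most often bought together with dairy products (support 0.22), the "products" category (0.20) and ready food (0.19). The rule with the highest confidence (0.71) and lift (1.82) is {dairy products, ready food, vegetables} => {agricultural products}, but its support is only about 0.05.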
plot(rules_ap, method="paracoord", control=list(reorder=TRUE))
# Grouped matrix plot
plot(rules_ap, method="grouped")
is.significant(rules_ap, trans_2)
## [1] FALSE TRUE TRUE TRUE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [13] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [25] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [37] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
inspect(rules_ap[!is.significant(rules_ap, trans_2)])
## lhs rhs support confidence coverage
## [1] {} => {agricultural products} 0.38739197 0.3873920 1.0000000
## [2] {other} => {agricultural products} 0.08134215 0.3868472 0.2102694
## lift count
## [1] 1.0000000 3810
## [2] 0.9985937 800
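`is.significant()` tests each rule for a positive association between its LHS and RHS. By default it uses Fisher's exact test at alpha = 0.01; this is our reading of the arules documentation, so treat it as an assumption. The sketch below rebuilds the 2x2 contingency tables for two rules of the agricultural-products set from the printed counts (N = 9835 transactions, 3810 containing agricultural products). It then runs base R's one-sided `fisher.test()`. The LHS counts (4387 baskets with dairy products, 2068 with other) are derived by multiplying the printed coverage values by N. The helper `rule_p_value` is hypothetical.

```r
N     <- 9835
n_rhs <- 3810  # baskets containing agricultural products

# One-sided Fisher's exact test for a rule X => Y from basket counts.
# Rows: X present / absent; columns: Y present / absent.
rule_p_value <- function(n_xy, n_x, n_y, n) {
  tab <- matrix(c(n_xy,       n_x - n_xy,
                  n_y - n_xy, n - n_x - n_y + n_xy),
                nrow = 2, byrow = TRUE)
  fisher.test(tab, alternative = "greater")$p.value
}

# {dairy products} => {agricultural products}
p_dairy <- rule_p_value(n_xy = 2158, n_x = 4387, n_y = n_rhs, n = N)
# {other} => {agricultural products}
p_other <- rule_p_value(n_xy = 800, n_x = 2068, n_y = n_rhs, n = N)

c(dairy = p_dairy < 0.01, other = p_other < 0.01)  # TRUE for dairy, FALSE for other
```

The {other} rule fails because it co-occurs with agricultural products slightly less often than independence would predict (lift 0.9986).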
is.redundant(rules_ap)
## [1] FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [13] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [25] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [37] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
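arules calls a rule redundant when a more general rule (same RHS, with an LHS that is a proper subset) has the same or higher confidence. The single TRUE above is consistent with {other} => {agricultural products}. Its confidence, 0.3868472, is slightly below the 0.3873920 of the empty-LHS rule {} => {agricultural products}. The brute-force sketch below applies that definition to five rules copied from the agricultural-products table. The helper code is our own and is not the arules implementation.

```r
# Five rules with RHS {agricultural products}, copied from the table above
lhs  <- list(character(0), "other", "dairy products",
             "alcoholic", c("alcoholic", "products"))
conf <- c(0.3873920, 0.3868472, 0.4919079, 0.4257624, 0.4950495)

# Redundant if some rule whose LHS is a proper subset has confidence >= its own
redundant <- vapply(seq_along(lhs), function(i) {
  more_general <- vapply(seq_along(lhs), function(j)
    length(lhs[[j]]) < length(lhs[[i]]) && all(lhs[[j]] %in% lhs[[i]]),
    logical(1))
  any(more_general) && conf[i] <= max(conf[more_general])
}, logical(1))

setNames(redundant,
         vapply(lhs, function(x) paste0("{", paste(x, collapse = ", "), "}"),
                character(1)))
# only {other} is TRUE
```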
##5 Summary
The analysis reveals clear patterns in how product categories are purchased together. The most frequent pairs are agricultural products with dairy products (support 0.22), dairy products with ready food (0.21) and agricultural products with ready food (0.19). To conclude, we observe a strong pattern between dairy products and ready food. Among the other categories examined, poultry and agricultural products are also bought most often together with dairy products.
Another article on association rules: https://rpubs.com/ghorai77/526368