With this data set, we will use market basket analysis to predict what someone is most likely to buy based on the other items they have bought. The data is already in transaction form, so we do not need to convert it.
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(datasets)
data(Groceries)
library(arulesViz)
## Loading required package: grid
The table below shows the frequency of each grocery item. However, it is cumbersome to draw conclusions from a table this long, so we will also plot the items with the highest frequency.
## frankfurter             sausage
## 0.0589730554            0.0939501779
## liver loaf              ham
## 0.0050838841            0.0260294865
## meat                    finished products
## 0.0258261312            0.0065073716
## organic sausage         chicken
## 0.0022369090            0.0429079817
## turkey                  pork
## 0.0081342145            0.0576512456
## beef                    hamburger meat
## 0.0524656838            0.0332486019
## fish                    citrus fruit
## 0.0029486528            0.0827656329
## tropical fruit          pip fruit
## 0.1049313676            0.0756481952
## grapes                  berries
## 0.0223690900            0.0332486019
## nuts/prunes             root vegetables
## 0.0033553635            0.1089984748
## onions                  herbs
## 0.0310116929            0.0162684291
## other vegetables        packaged fruit/vegetables
## 0.1934926284            0.0130147433
## whole milk              butter
## 0.2555160142            0.0554143366
## curd                    dessert
## 0.0532791052            0.0371123538
## butter milk             yogurt
## 0.0279613625            0.1395017794
## whipped/sour cream      beverages
## 0.0716827656            0.0260294865
## UHT-milk                condensed milk
## 0.0334519573            0.0102694459
## cream                   soft cheese
## 0.0013218099            0.0170818505
## sliced cheese           hard cheese
## 0.0245043213            0.0245043213
## cream cheese            processed cheese
## 0.0396542959            0.0165734621
## spread cheese           curd cheese
## 0.0111845450            0.0050838841
## specialty cheese        mayonnaise
## 0.0085409253            0.0091509914
## salad dressing          tidbits
## 0.0008134215            0.0023385867
## frozen vegetables       frozen fruits
## 0.0480935435            0.0012201322
## frozen meals            frozen fish
## 0.0283680732            0.0116929334
## frozen chicken          ice cream
## 0.0006100661            0.0250127097
## frozen dessert          frozen potato products
## 0.0107778343            0.0084392476
## domestic eggs           rolls/buns
## 0.0634468734            0.1839349263
## white bread             brown bread
## 0.0420945602            0.0648703610
## pastry                  roll products
## 0.0889679715            0.0102694459
## semi-finished bread     zwieback
## 0.0176919166            0.0069140824
## potato products         flour
## 0.0028469751            0.0173868836
## salt                    rice
## 0.0107778343            0.0076258261
## pasta                   vinegar
## 0.0150482969            0.0065073716
## oil                     margarine
## 0.0280630402            0.0585663447
## specialty fat           sugar
## 0.0036603965            0.0338586680
## artif. sweetener        honey
## 0.0032536858            0.0015251652
## mustard                 ketchup
## 0.0119979664            0.0042704626
## spices                  soups
## 0.0051855618            0.0068124047
## ready soups             Instant food products
## 0.0018301983            0.0080325369
## sauces                  cereals
## 0.0054905948            0.0056939502
## organic products        baking powder
## 0.0016268429            0.0176919166
## preservation products   pudding powder
## 0.0002033554            0.0023385867
## canned vegetables       canned fruit
## 0.0107778343            0.0032536858
## pickled vegetables      specialty vegetables
## 0.0178952720            0.0017285206
## jam                     sweet spreads
## 0.0053889171            0.0090493137
## meat spreads            canned fish
## 0.0042704626            0.0150482969
## dog food                cat food
## 0.0085409253            0.0232841891
## pet care                baby food
## 0.0094560244            0.0001016777
## coffee                  instant coffee
## 0.0580579563            0.0074224708
## tea                     cocoa drinks
## 0.0038637519            0.0022369090
## bottled water           soda
## 0.1105236401            0.1743772242
## misc. beverages         fruit/vegetable juice
## 0.0283680732            0.0722928317
## syrup                   bottled beer
## 0.0032536858            0.0805287239
## canned beer             brandy
## 0.0776817489            0.0041687850
## whisky                  liquor
## 0.0008134215            0.0110828673
## rum                     liqueur
## 0.0044738180            0.0009150991
## liquor (appetizer)      white wine
## 0.0079308592            0.0190137265
## red/blush wine          prosecco
## 0.0192170819            0.0020335536
## sparkling wine          salty snack
## 0.0055922725            0.0378240976
## popcorn                 nut snack
## 0.0072191154            0.0031520081
## snack products          long life bakery product
## 0.0030503305            0.0374173869
## waffles                 cake bar
## 0.0384341637            0.0132180986
## chewing gum             chocolate
## 0.0210472801            0.0496187087
## cooking chocolate       specialty chocolate
## 0.0025419420            0.0304016268
## specialty bar           chocolate marshmallow
## 0.0273512964            0.0090493137
## candy                   seasonal products
## 0.0298932384            0.0142348754
## detergent               softener
## 0.0192170819            0.0054905948
## decalcifier             dish cleaner
## 0.0015251652            0.0104728012
## abrasive cleaner        cleaner
## 0.0035587189            0.0050838841
## toilet cleaner          bathroom cleaner
## 0.0007117438            0.0027452974
## hair spray              dental care
## 0.0011184545            0.0057956279
## male cosmetics          make up remover
## 0.0045754957            0.0008134215
## skin care               female sanitary products
## 0.0035587189            0.0061006609
## baby cosmetics          soap
## 0.0006100661            0.0026436197
## rubbing alcohol         hygiene articles
## 0.0010167768            0.0329435689
## napkins                 dishes
## 0.0523640061            0.0175902389
## cookware                kitchen utensil
## 0.0027452974            0.0004067107
## cling film/bags         kitchen towels
## 0.0113879004            0.0059989832
## house keeping products  candles
## 0.0083375699            0.0089476360
## light bulbs             sound storage medium
## 0.0041687850            0.0001016777
## newspapers              photo/film
## 0.0798169802            0.0092526690
## pot plants              flower soil/fertilizer
## 0.0172852059            0.0019318760
## flower (seeds)          shopping bags
## 0.0103711235            0.0985256736
## bags
## 0.0004067107
According to this graph, the most frequent items include whole milk, rolls/buns, and other vegetables. In the frequency table above, thirteen items appear in more than 8% of transactions. Next, we will determine some rules based on these item frequencies.
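The code behind the frequency table and the graph is not shown in this write-up. The following is a minimal sketch of one way to produce both with arules, assuming `itemFrequency()` for the table and `itemFrequencyPlot()` with an 8% support cut-off for the graph. The cut-off and the variable names `item_freq` and `frequent_items` are assumptions for illustration, not taken from the original analysis.

```r
library(arules)   # provides itemFrequency(), itemFrequencyPlot(), and the Groceries data
data(Groceries)

# Relative frequency of every item, sorted from most to least frequent
item_freq <- sort(itemFrequency(Groceries), decreasing = TRUE)

# Items that appear in more than 8% of all transactions
frequent_items <- item_freq[item_freq > 0.08]
print(round(frequent_items, 4))

# Bar chart of the items at or above the assumed 8% cut-off
itemFrequencyPlot(Groceries, support = 0.08)
```

Sorting first makes the long table much easier to scan, and the same cut-off can be raised or lowered to show fewer or more items.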
Below are the rules for what someone is most likely to buy based on what they have already bought. However, not all of these rules are equally important, because they differ in support, confidence, lift, and count. For this reason, the second table displays only the rules with a lift above 2.25, sorted by their level of support.
# Mine rules found in at least 1% of transactions with at least 50% confidence
groceryrules <- apriori(Groceries, parameter = list(support = 0.01, confidence = 0.5))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 98
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
inspect(groceryrules)
## lhs rhs support confidence lift count
## [1] {curd,
## yogurt} => {whole milk} 0.01006609 0.5823529 2.279125 99
## [2] {other vegetables,
## butter} => {whole milk} 0.01148958 0.5736041 2.244885 113
## [3] {other vegetables,
## domestic eggs} => {whole milk} 0.01230300 0.5525114 2.162336 121
## [4] {yogurt,
## whipped/sour cream} => {whole milk} 0.01087951 0.5245098 2.052747 107
## [5] {other vegetables,
## whipped/sour cream} => {whole milk} 0.01464159 0.5070423 1.984385 144
## [6] {pip fruit,
## other vegetables} => {whole milk} 0.01352313 0.5175097 2.025351 133
## [7] {citrus fruit,
## root vegetables} => {other vegetables} 0.01037112 0.5862069 3.029608 102
## [8] {tropical fruit,
## root vegetables} => {other vegetables} 0.01230300 0.5845411 3.020999 121
## [9] {tropical fruit,
## root vegetables} => {whole milk} 0.01199797 0.5700483 2.230969 118
## [10] {tropical fruit,
## yogurt} => {whole milk} 0.01514997 0.5173611 2.024770 149
## [11] {root vegetables,
## yogurt} => {other vegetables} 0.01291307 0.5000000 2.584078 127
## [12] {root vegetables,
## yogurt} => {whole milk} 0.01453991 0.5629921 2.203354 143
## [13] {root vegetables,
## rolls/buns} => {other vegetables} 0.01220132 0.5020921 2.594890 120
## [14] {root vegetables,
## rolls/buns} => {whole milk} 0.01270971 0.5230126 2.046888 125
## [15] {other vegetables,
## yogurt} => {whole milk} 0.02226741 0.5128806 2.007235 219
# Keep only rules with lift above 2.25 and list them by descending support
inspect(sort(subset(groceryrules, subset = lift > 2.25), by = "support"))
## lhs rhs support confidence lift count
## [1] {root vegetables,
## yogurt} => {other vegetables} 0.01291307 0.5000000 2.584078 127
## [2] {tropical fruit,
## root vegetables} => {other vegetables} 0.01230300 0.5845411 3.020999 121
## [3] {root vegetables,
## rolls/buns} => {other vegetables} 0.01220132 0.5020921 2.594890 120
## [4] {citrus fruit,
## root vegetables} => {other vegetables} 0.01037112 0.5862069 3.029608 102
## [5] {curd,
## yogurt} => {whole milk} 0.01006609 0.5823529 2.279125 99
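The support, confidence, lift, and count columns in these tables are tied together by simple arithmetic. As a rough check, the sketch below recomputes them for the rule {citrus fruit, root vegetables} => {other vegetables} using only numbers printed in this report. The left-hand-side count is not printed anywhere, so it is inferred here from count / confidence; that step and all variable names are assumptions for illustration.

```r
# Values read from the output in this report
n_transactions <- 9835          # from the apriori() log
count_rule     <- 102           # transactions containing lhs and rhs together
confidence_out <- 0.5862069     # confidence printed by inspect()
support_rhs    <- 0.1934926284  # item frequency of "other vegetables"

# Assumption: the lhs count is inferred, not reported in the output
count_lhs <- round(count_rule / confidence_out)

support    <- count_rule / n_transactions   # share of all baskets matching the rule
confidence <- count_rule / count_lhs        # share of lhs baskets that also contain rhs
lift       <- confidence / support_rhs      # confidence relative to the rhs base rate

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

A lift of about 3 means that baskets containing citrus fruit and root vegetables include other vegetables roughly three times as often as baskets in general.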
We can look at the support, confidence, and lift levels to determine which rules are the most significant. Overall, these rules have fairly small values for all three: support is about 1%, confidence is between 50% and 59%, and lift is between 2.3 and 3.0. However, I would say that rule 2 in the second table ({tropical fruit, root vegetables} => {other vegetables}) is the most interesting to look at. It has the second highest support, the second highest confidence, and the second highest lift. It also has the second highest count, with 121 transactions in which this rule occurs. I would look at this one rather than the rule with the highest confidence and lift (rule 4) because rule 4 has lower support and a count of only 102.
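The comparison above can also be laid out as a small ranking table. The sketch below copies the five rules from the second table into a data frame and ranks each measure, with 1 as the highest value. The rank sum at the end is a hypothetical summary score added for illustration only; it is not part of arules or of the original analysis.

```r
# The five rules with lift > 2.25, copied from the second table
top_rules <- data.frame(
  rule       = 1:5,
  support    = c(0.01291307, 0.01230300, 0.01220132, 0.01037112, 0.01006609),
  confidence = c(0.5000000, 0.5845411, 0.5020921, 0.5862069, 0.5823529),
  lift       = c(2.584078, 3.020999, 2.594890, 3.029608, 2.279125),
  count      = c(127, 121, 120, 102, 99)
)

# Rank every measure so that 1 marks the highest value
measures <- c("support", "confidence", "lift", "count")
ranks <- sapply(top_rules[measures], function(x) rank(-x))
rownames(ranks) <- paste("rule", top_rules$rule)
print(ranks)

# Hypothetical summary score: sum of the four ranks (lower is better)
rank_sum <- rowSums(ranks)
print(sort(rank_sum))
```

Rule 2 ranks second on every measure, while rule 4 ranks first on confidence and lift but fourth on support and count, which matches the reasoning above.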