Imagine 10,000 receipts sitting on your table. Each receipt represents a transaction and lists the items that were purchased. A receipt records what went into a customer’s basket, hence the name ‘Market Basket Analysis’. That is exactly what the Groceries Data Set contains: a collection of receipts, with each line representing one receipt and the items purchased. Each line is called a transaction, and each column in a row represents an item. The data set is attached.
Your assignment is to use R to mine the data for association rules. You should report the support, confidence, and lift of the rules, along with your top 10 rules by lift.
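To make the three measures concrete, here is a minimal base-R sketch on a made-up set of five baskets. The items and the rule {bread} => {butter} are illustrative assumptions, not taken from the Groceries data. It uses the standard definitions: support is the share of transactions containing every item in the rule, confidence is that support divided by the support of the left-hand side, and lift is confidence divided by the support of the right-hand side.

```r
# Toy baskets, invented for illustration only (not from the Groceries data)
baskets <- list(
  c("bread", "butter", "milk"),
  c("bread", "butter"),
  c("bread", "milk"),
  c("milk", "eggs"),
  c("butter", "eggs")
)

# Share of baskets that contain every item in `items`
support_of <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"
rhs <- "butter"

rule_support    <- support_of(c(lhs, rhs), baskets)          # P(lhs and rhs)
rule_confidence <- rule_support / support_of(lhs, baskets)   # P(rhs | lhs)
rule_lift       <- rule_confidence / support_of(rhs, baskets)

print(c(support = rule_support, confidence = rule_confidence, lift = rule_lift))
```

A lift above 1 suggests that the two sides of the rule appear together more often than they would if they were independent.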
library(plyr)
library(tidyr)
library(readxl)
library(knitr)
library(ggplot2)
library(lubridate)
library(arules)
library(arulesViz)
df <- read.transactions("https://raw.githubusercontent.com/mkivenson/Predictive-Analytics/master/Market%20Basket%20Analysis/GroceryDataSet.csv", header = FALSE, sep = ",")
Use arules to return the top 10 rules by lift.
rules <- apriori(df, parameter = list(supp = 0.001, conf = 0.8))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.8 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [410 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
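The log above reports 9835 transactions and an absolute minimum support count of 9. As a rough check of how the relative threshold supp = 0.001 maps to a number of receipts, the sketch below scales it by the transaction count. Truncating to a whole number is an assumption made here to match the logged value, not something the output itself states.

```r
n_transactions <- 9835   # from the "set transactions" line of the log
min_support    <- 0.001  # supp value passed to apriori()

# Relative support expressed as a number of transactions
raw_count <- min_support * n_transactions

# Assumption: truncating reproduces the logged "Absolute minimum support count"
min_count <- floor(raw_count)

print(c(raw = raw_count, truncated = min_count))
```

In other words, a rule needs to be backed by roughly ten receipts to pass the support filter at this threshold.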
# First 10 rules in the order apriori() returned them (R indexes from 1)
inspect(rules[1:10])
##      lhs                         rhs                support     confidence lift      count
## [1]  {liquor,
##       red/blush wine}         => {bottled beer}     0.001931876 0.9047619  11.235269 19
## [2]  {cereals,
##       curd}                   => {whole milk}       0.001016777 0.9090909  3.557863  10
## [3]  {cereals,
##       yogurt}                 => {whole milk}       0.001728521 0.8095238  3.168192  17
## [4]  {butter,
##       jam}                    => {whole milk}       0.001016777 0.8333333  3.261374  10
## [5]  {bottled beer,
##       soups}                  => {whole milk}       0.001118454 0.9166667  3.587512  11
## [6]  {house keeping products,
##       napkins}                => {whole milk}       0.001321810 0.8125000  3.179840  13
## [7]  {house keeping products,
##       whipped/sour cream}     => {whole milk}       0.001220132 0.9230769  3.612599  12
## [8]  {pastry,
##       sweet spreads}          => {whole milk}       0.001016777 0.9090909  3.557863  10
## [9]  {curd,
##       turkey}                 => {other vegetables} 0.001220132 0.8000000  4.134524  12
## [10] {rice,
##       sugar}                  => {whole milk}       0.001220132 1.0000000  3.913649  12
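The listing above shows ten rules in the order apriori produced them, which is not the same as the ten rules with the highest lift. A minimal sketch of one way to get the top 10 by lift is to sort the rule set before inspecting it. To keep the example self-contained it uses the Groceries data bundled with arules, assumed here to match the CSV since it also has 9835 transactions and 169 items. The names rules_sketch and top10 are illustrative.

```r
library(arules)

# Assumption: the bundled Groceries data stands in for the CSV read above
data("Groceries")

# Same thresholds as in the write-up; verbose = FALSE silences the log
rules_sketch <- apriori(
  Groceries,
  parameter = list(supp = 0.001, conf = 0.8),
  control = list(verbose = FALSE)
)

# Sort by lift (descending by default) and keep the first 10 rules
top10 <- head(sort(rules_sketch, by = "lift"), n = 10)

# Each row reports support, confidence and lift for one rule
inspect(top10)
```

Applied to the rules object built earlier, the same sort and head calls would list its ten highest-lift rules with their support, confidence, and lift.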