For this homework I will run a Market Basket Analysis on Groceries, a data set from the arules package that records one month of point-of-sale transactions from a grocery store, i.e. which groceries were bought together on each shopping trip.
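For reference, these are the standard definitions that arules uses for the three measures reported by `inspect()` below. For a rule $X \Rightarrow Y$ over $N$ transactions, where $\text{count}(X \cup Y)$ is the number of transactions containing every item in $X$ and $Y$:

$$
\text{support}(X \Rightarrow Y) = \frac{\text{count}(X \cup Y)}{N}, \qquad
\text{confidence}(X \Rightarrow Y) = \frac{\text{support}(X \cup Y)}{\text{support}(X)}, \qquad
\text{lift}(X \Rightarrow Y) = \frac{\text{confidence}(X \Rightarrow Y)}{\text{support}(Y)}
$$

With $N = 9835$, the `supp = 0.001` threshold used below means a rule must appear in at least $0.001 \times 9835 \approx 9.8$, i.e. 10, transactions, which is why every `count` in the output is 10 or more. A lift above 1 means the items appear together more often than they would if they were bought independently.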
library(lattice)
library(nutshell)
library(arules)
data("Groceries")
dim(Groceries)
## [1] 9835 169
itemFrequencyPlot(Groceries,topN=10, type = "absolute", col="darkmagenta")
This chart shows the 10 most frequently purchased products. Next I will look at rules for the top three products (whole milk, other vegetables, and rolls/buns), as well as tropical fruit, because tropical fruit is extremely delicious.
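If exact counts are wanted instead of reading them off the chart, a minimal sketch using arules' `itemFrequency()` lists the same top 10 items numerically. The names `top10` and the rounding to three decimals are just choices for this example.

```r
library(arules)
data("Groceries")

# Number of transactions containing each item, largest first; keep the top 10.
# This is the same ranking that itemFrequencyPlot(..., topN = 10, type = "absolute") draws.
top10 <- sort(itemFrequency(Groceries, type = "absolute"), decreasing = TRUE)[1:10]
print(top10)

# Share of all transactions that contain each of the top 10 items
print(round(top10 / length(Groceries), 3))
```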
Wmilkrules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.5),
                      appearance = list(default = "lhs", rhs = "whole milk"),
                      control = list(verbose = FALSE))
inspect(sort(Wmilkrules, by = "lift")[1:7])
## lhs rhs support confidence lift count
## [1] {rice,
## sugar} => {whole milk} 0.001220132 1 3.913649 12
## [2] {canned fish,
## hygiene articles} => {whole milk} 0.001118454 1 3.913649 11
## [3] {root vegetables,
## butter,
## rice} => {whole milk} 0.001016777 1 3.913649 10
## [4] {root vegetables,
## whipped/sour cream,
## flour} => {whole milk} 0.001728521 1 3.913649 17
## [5] {butter,
## soft cheese,
## domestic eggs} => {whole milk} 0.001016777 1 3.913649 10
## [6] {pip fruit,
## butter,
## hygiene articles} => {whole milk} 0.001016777 1 3.913649 10
## [7] {root vegetables,
## whipped/sour cream,
## hygiene articles} => {whole milk} 0.001016777 1 3.913649 10
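In this result, all seven rules have a confidence of 1 and the same lift (3.91), so support is what separates them. Rule 4 is the best because it has the highest support (0.0017, or 17 transactions). Like rules 3, 5, 6, and 7, it has three items on the left-hand side, which could be because the shopper was preparing a meal that includes milk.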
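To see why the seven lifts are identical: every rule has confidence 1, so its lift reduces to the reciprocal of the support of whole milk. Whole milk appears in 2513 of the 9835 transactions (its count in Groceries, which matches the printed lift):

$$
\text{lift} = \frac{1}{\text{support}(\{\text{whole milk}\})} = \frac{9835}{2513} \approx 3.9136
$$

No rule with whole milk on the right-hand side can have a higher lift than this, so among confidence-1 rules, support is the natural tie-breaker.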
OVeggiesRules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.5),
                         appearance = list(default = "lhs", rhs = "other vegetables"),
                         control = list(verbose = FALSE))
inspect(sort(OVeggiesRules, by = "lift")[1:7])
## lhs rhs support confidence lift count
## [1] {citrus fruit,
## root vegetables,
## soft cheese} => {other vegetables} 0.001016777 1 5.168156 10
## [2] {pip fruit,
## whipped/sour cream,
## brown bread} => {other vegetables} 0.001118454 1 5.168156 11
## [3] {tropical fruit,
## grapes,
## whole milk,
## yogurt} => {other vegetables} 0.001016777 1 5.168156 10
## [4] {ham,
## tropical fruit,
## pip fruit,
## yogurt} => {other vegetables} 0.001016777 1 5.168156 10
## [5] {ham,
## tropical fruit,
## pip fruit,
## whole milk} => {other vegetables} 0.001118454 1 5.168156 11
## [6] {tropical fruit,
## butter,
## whipped/sour cream,
## fruit/vegetable juice} => {other vegetables} 0.001016777 1 5.168156 10
## [7] {whole milk,
## rolls/buns,
## soda,
## newspapers} => {other vegetables} 0.001016777 1 5.168156 10
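For other vegetables, all seven rules again have a confidence of 1 and the same lift (5.17), so either rule 2 or rule 5 would be the best rule, since they share the highest support (0.0011, or 11 transactions).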
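Sorting by lift alone leaves tied rules in whatever order apriori produced them, so if more than seven rules share the top lift, `[1:7]` shows only some of them. A minimal sketch that orders by lift and then by support makes the tie-break explicit; the names `ov_rules`, `q`, and `ranked` are just for this example. It still cannot separate rules whose support is also identical, such as rules 2 and 5.

```r
library(arules)
data("Groceries")

# Same mining call as above, for rules whose right-hand side is "other vegetables"
ov_rules <- apriori(data = Groceries,
                    parameter = list(supp = 0.001, conf = 0.5),
                    appearance = list(default = "lhs", rhs = "other vegetables"),
                    control = list(verbose = FALSE))

# Order by lift first, then break ties by support (both descending)
q <- quality(ov_rules)
ranked <- ov_rules[order(-q$lift, -q$support)]
inspect(ranked[1:5])
```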
RBrules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.5),
                   appearance = list(default = "lhs", rhs = "rolls/buns"),
                   control = list(verbose = FALSE))
inspect(sort(RBrules, by = "lift")[1:7])
## lhs rhs support confidence lift count
## [1] {spread cheese,
## newspapers} => {rolls/buns} 0.001220132 0.7500000 4.077529 12
## [2] {beef,
## tropical fruit,
## whole milk,
## yogurt} => {rolls/buns} 0.001321810 0.6842105 3.719851 13
## [3] {citrus fruit,
## whole milk,
## whipped/sour cream,
## pastry} => {rolls/buns} 0.001016777 0.6666667 3.624470 10
## [4] {soda,
## chocolate,
## candy} => {rolls/buns} 0.001220132 0.6315789 3.433709 12
## [5] {other vegetables,
## chocolate,
## napkins} => {rolls/buns} 0.001016777 0.6250000 3.397941 10
## [6] {frankfurter,
## beef,
## root vegetables} => {rolls/buns} 0.001016777 0.6250000 3.397941 10
## [7] {yogurt,
## bottled water,
## soda,
## newspapers} => {rolls/buns} 0.001016777 0.6250000 3.397941 10
For rolls/buns, the best rule is rule 1, since it has the highest confidence (0.75) and lift (4.08). Its support (0.0012, or 12 transactions) is in line with the other rules; only rule 2 is backed by slightly more transactions (13).
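As a check on what these measures mean, rule 1 can be recomputed by hand from the raw transactions. A minimal sketch, assuming `as(Groceries, "matrix")` to get a transactions-by-items matrix whose column names are the item labels; the variable names are illustrative.

```r
library(arules)
data("Groceries")

# Transactions x items matrix; column names are the item labels
m <- as(Groceries, "matrix")

lhs_items <- c("spread cheese", "newspapers")
rhs_item  <- "rolls/buns"

# Baskets that contain every left-hand-side item
has_lhs  <- rowSums(m[, lhs_items, drop = FALSE]) == length(lhs_items)
# Baskets that contain the left-hand side and the right-hand side
has_rule <- has_lhs & m[, rhs_item] > 0

rule_support    <- sum(has_rule) / nrow(m)
rule_confidence <- sum(has_rule) / sum(has_lhs)
rule_lift       <- rule_confidence / (sum(m[, rhs_item] > 0) / nrow(m))

print(c(count = sum(has_rule), support = rule_support,
        confidence = rule_confidence, lift = rule_lift))
```

The printed values should agree with the count, support, confidence, and lift that `inspect()` reports for rule 1.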
TFrules <- apriori(data = Groceries, parameter = list(supp = 0.001, conf = 0.5),
                   appearance = list(default = "lhs", rhs = "tropical fruit"),
                   control = list(verbose = FALSE))
inspect(sort(TFrules, by = "lift")[1:7])
## lhs rhs support confidence lift count
## [1] {citrus fruit,
## grapes,
## fruit/vegetable juice} => {tropical fruit} 0.001118454 0.8461538 8.063879 11
## [2] {ham,
## pip fruit,
## other vegetables,
## yogurt} => {tropical fruit} 0.001016777 0.8333333 7.941699 10
## [3] {grapes,
## other vegetables,
## fruit/vegetable juice} => {tropical fruit} 0.001118454 0.7857143 7.487888 11
## [4] {root vegetables,
## other vegetables,
## whole milk,
## yogurt,
## bottled water} => {tropical fruit} 0.001118454 0.7857143 7.487888 11
## [5] {other vegetables,
## whole milk,
## butter,
## yogurt,
## domestic eggs} => {tropical fruit} 0.001016777 0.7692308 7.330799 10
## [6] {ham,
## pip fruit,
## other vegetables,
## whole milk} => {tropical fruit} 0.001118454 0.7333333 6.988695 11
## [7] {root vegetables,
## whole milk,
## yogurt,
## oil} => {tropical fruit} 0.001118454 0.7333333 6.988695 11
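Rule 1 is the best rule for tropical fruit, with the highest confidence (0.85) and lift (8.06) of the seven. It also makes intuitive sense: the other items in the rule (citrus fruit, grapes, and fruit/vegetable juice) are all fruit-related, so these shoppers are buying fruit with fruit.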
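The numbers for rule 1 can be worked out directly. Its count of 11 and confidence of 0.8461538 imply that the left-hand side appears in 13 transactions. Tropical fruit appears in 1032 of the 9835 transactions (its count in Groceries, consistent with the printed lift):

$$
\text{confidence} = \frac{11}{13} \approx 0.8462, \qquad
\text{lift} = \frac{11/13}{1032/9835} = \frac{108185}{13416} \approx 8.0639
$$

In other words, a basket containing citrus fruit, grapes, and fruit/vegetable juice is about eight times more likely than an average basket to also contain tropical fruit.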