For this optional homework assignment, I decided to experiment with association rules.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.2 ✔ readr 2.1.4
## ✔ forcats 1.0.0 ✔ stringr 1.5.0
## ✔ ggplot2 3.4.2 ✔ tibble 3.2.1
## ✔ lubridate 1.9.2 ✔ tidyr 1.3.0
## ✔ purrr 1.0.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(arules)
## Warning: package 'arules' was built under R version 4.3.3
## Loading required package: Matrix
##
## Attaching package: 'Matrix'
##
## The following objects are masked from 'package:tidyr':
##
## expand, pack, unpack
##
##
## Attaching package: 'arules'
##
## The following object is masked from 'package:dplyr':
##
## recode
##
## The following objects are masked from 'package:base':
##
## abbreviate, write
df <- read.csv("C:\\Users\\Al Haque\\OneDrive\\Desktop\\Data 624\\GroceryDataSet.csv",header = FALSE)
head(df)
## V1 V2 V3 V4
## 1 citrus fruit semi-finished bread margarine ready soups
## 2 tropical fruit yogurt coffee
## 3 whole milk
## 4 pip fruit yogurt cream cheese meat spreads
## 5 other vegetables whole milk condensed milk long life bakery product
## 6 whole milk butter yogurt rice
## V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16 V17 V18 V19 V20 V21
## 1
## 2
## 3
## 4
## 5
## 6 abrasive cleaner
## V22 V23 V24 V25 V26 V27 V28 V29 V30 V31 V32
## 1
## 2
## 3
## 4
## 5
## 6
## To use the arules package, we have to convert the receipt data into transaction data; the items on each receipt are separated by commas
tData <- read.transactions("C:\\Users\\Al Haque\\OneDrive\\Desktop\\Data 624\\GroceryDataSet.csv",sep = ",")
## Take a brief look at the contents of the transaction data
inspect(head(tData,10))
## items
## [1] {citrus fruit,
## margarine,
## ready soups,
## semi-finished bread}
## [2] {coffee,
## tropical fruit,
## yogurt}
## [3] {whole milk}
## [4] {cream cheese,
## meat spreads,
## pip fruit,
## yogurt}
## [5] {condensed milk,
## long life bakery product,
## other vegetables,
## whole milk}
## [6] {abrasive cleaner,
## butter,
## rice,
## whole milk,
## yogurt}
## [7] {rolls/buns}
## [8] {bottled beer,
## liquor (appetizer),
## other vegetables,
## rolls/buns,
## UHT-milk}
## [9] {pot plants}
## [10] {cereals,
## whole milk}
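To make the basket format concrete, here is a minimal base-R sketch of what reading comma-separated receipts into item sets amounts to. The three `receipt_lines` are copied from the first rows shown above; the helper name `parse_baskets` is hypothetical, and this is only an illustration of the idea, not how `read.transactions` works internally.

```r
# Three receipts in "basket" format: one transaction per line, items separated by commas.
receipt_lines <- c(
  "citrus fruit,semi-finished bread,margarine,ready soups",
  "tropical fruit,yogurt,coffee",
  "whole milk"
)

# Hypothetical helper: split each line on commas, trim whitespace,
# and drop empty or duplicated items so each basket is a set.
parse_baskets <- function(lines) {
  lapply(strsplit(lines, ",", fixed = TRUE), function(items) {
    items <- trimws(items)
    sort(unique(items[nzchar(items)]))
  })
}

baskets <- parse_baskets(receipt_lines)
lengths(baskets)  # number of distinct items on each receipt
```

Each element of `baskets` is an unordered set of items, which is why the inspected transactions list their items alphabetically rather than in receipt order.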
frequentitems <- eclat(tData,parameter = list(supp = 0.7,maxlen = 5))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.7 1 5 frequent itemsets TRUE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 6884
## eclat - zero frequent items
# Inspect the frequent itemsets and their support (nothing is listed here, because no itemset reaches the 0.7 minimum support)
inspect(frequentitems)
itemFrequencyPlot(tData, topN=10, type="absolute", main="Item Frequency") # plot frequent items
Looking at the item frequency plot, we see that whole milk, other vegetables, and rolls/buns were the items most consistently purchased across the transactions.
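The empty `eclat` result and the frequency plot are two views of the same quantity, item support. As a rough sketch with made-up baskets (not the grocery data), the base-R snippet below shows how a minimum support of 0.7 can filter out every item while a looser threshold keeps some; `toy_baskets` and `item_support` are illustrative names only.

```r
toy_baskets <- list(
  c("whole milk", "yogurt"),
  c("whole milk", "rolls/buns"),
  c("other vegetables", "whole milk"),
  c("rolls/buns"),
  c("other vegetables", "yogurt")
)

# Support of an item = share of baskets that contain it.
item_counts  <- table(unlist(lapply(toy_baskets, unique)))
item_support <- sort(item_counts / length(toy_baskets), decreasing = TRUE)

names(item_support)[item_support >= 0.7]  # items found in at least 70% of baskets
names(item_support)[item_support >= 0.4]  # items that pass a looser threshold
```

Even the most common toy item appears in only 60% of baskets, so the 0.7 filter returns nothing, which mirrors the "zero frequent items" message above.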
According to the article, a rule A => B holds when A and B appear together in at least s% of the transactions and B occurs in at least c% of the transactions in which A occurs, where s is the minimum support and c is the minimum confidence.
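As a minimal sketch of those two definitions, the base-R snippet below computes support and confidence for a single rule A => B on made-up baskets. The data and the helper `contains_all` are hypothetical, chosen only to keep the arithmetic easy to follow.

```r
toy_baskets <- list(
  c("yogurt", "whole milk"),
  c("yogurt", "whole milk", "butter"),
  c("yogurt", "coffee"),
  c("whole milk"),
  c("rolls/buns")
)

# TRUE for each basket that contains every item in `items`.
contains_all <- function(baskets, items) {
  vapply(baskets, function(b) all(items %in% b), logical(1))
}

A <- "yogurt"
B <- "whole milk"

has_A  <- contains_all(toy_baskets, A)
has_AB <- contains_all(toy_baskets, c(A, B))

support    <- mean(has_AB)              # share of all baskets with both A and B
confidence <- sum(has_AB) / sum(has_A)  # share of A-baskets that also contain B

c(support = support, confidence = confidence)
```

Here two of five baskets contain both items and two of the three yogurt baskets also contain whole milk, so the rule would pass thresholds of, say, s = 0.3 and c = 0.6.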
Setting the minimum support to 0.001 and the minimum confidence to 0.30 (30%) felt right: the rule set stays large enough that we can still see how many rules are available and what kinds of rules appear.
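To get a feel for what these thresholds mean in actual transactions, the relative support can be converted into a count. The sketch below assumes the 9835 transactions that `apriori` reports for this file. Truncating the product reproduces the "Absolute minimum support count" lines printed by `eclat` and `apriori`, though I have not confirmed that this is exactly how arules derives them; `min_count` is a hypothetical helper.

```r
n_transactions <- 9835  # taken from the apriori log for this file

# Relative minimum support expressed as a whole number of transactions.
min_count <- function(supp, n) floor(supp * n)

min_count(0.001, n_transactions)  # threshold used for apriori
min_count(0.7,   n_transactions)  # threshold used for eclat
```

A support of 0.001 asks for roughly ten baskets, while 0.7 asks for nearly seven thousand, which explains why the two runs behave so differently.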
## Adjust the maxlen, supp, and conf arguments in the apriori function to control the number of rules generated; you will have to adjust these based on the sparseness of your data.
rules <- apriori(tData, parameter = list(supp = 0.001, conf = 0.30))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.3 0.1 1 none FALSE TRUE 5 0.001 1
## maxlen target ext
## 10 rules TRUE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 9
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 done [0.01s].
## writing ... [13770 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
rules_conf <- sort(rules,by = "confidence",decreasing = TRUE)
rules_suppor <- sort(rules,by = "support",decreasing = TRUE)
inspect(head(rules_conf,10))
## lhs rhs support confidence coverage lift count
## [1] {rice,
## sugar} => {whole milk} 0.001220132 1 0.001220132 3.913649 12
## [2] {canned fish,
## hygiene articles} => {whole milk} 0.001118454 1 0.001118454 3.913649 11
## [3] {butter,
## rice,
## root vegetables} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [4] {flour,
## root vegetables,
## whipped/sour cream} => {whole milk} 0.001728521 1 0.001728521 3.913649 17
## [5] {butter,
## domestic eggs,
## soft cheese} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [6] {citrus fruit,
## root vegetables,
## soft cheese} => {other vegetables} 0.001016777 1 0.001016777 5.168156 10
## [7] {butter,
## hygiene articles,
## pip fruit} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [8] {hygiene articles,
## root vegetables,
## whipped/sour cream} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [9] {hygiene articles,
## pip fruit,
## root vegetables} => {whole milk} 0.001016777 1 0.001016777 3.913649 10
## [10] {cream cheese,
## domestic eggs,
## sugar} => {whole milk} 0.001118454 1 0.001118454 3.913649 11
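Ranking the rules by confidence, nine of the top ten have whole milk on the right-hand side, which suggests that many combinations of grocery items lead to a milk purchase. Each of these rules has a confidence of 1 but rests on only 10 to 17 transactions, so they should be read with some caution.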
inspect(head(rules_suppor,10))
## lhs rhs support confidence coverage
## [1] {other vegetables} => {whole milk} 0.07483477 0.3867578 0.19349263
## [2] {rolls/buns} => {whole milk} 0.05663447 0.3079049 0.18393493
## [3] {yogurt} => {whole milk} 0.05602440 0.4016035 0.13950178
## [4] {root vegetables} => {whole milk} 0.04890696 0.4486940 0.10899847
## [5] {root vegetables} => {other vegetables} 0.04738180 0.4347015 0.10899847
## [6] {yogurt} => {other vegetables} 0.04341637 0.3112245 0.13950178
## [7] {tropical fruit} => {whole milk} 0.04229792 0.4031008 0.10493137
## [8] {tropical fruit} => {other vegetables} 0.03589222 0.3420543 0.10493137
## [9] {bottled water} => {whole milk} 0.03436706 0.3109476 0.11052364
## [10] {pastry} => {whole milk} 0.03324860 0.3737143 0.08896797
## lift count
## [1] 1.513634 736
## [2] 1.205032 557
## [3] 1.571735 551
## [4] 1.756031 481
## [5] 2.246605 466
## [6] 1.608457 427
## [7] 1.577595 416
## [8] 1.767790 353
## [9] 1.216940 338
## [10] 1.462587 327
Looking at the top ten rules by support, whole milk is again the most common right-hand side (seven of the ten rules, with other vegetables on the remaining three): when people buy common grocery items, they usually buy milk as well.
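The columns in this table are tied together by simple arithmetic, which can be checked against the first row, {other vegetables} => {whole milk}. The sketch below copies the printed values and assumes 9835 transactions (from the apriori log); it relies on coverage being the support of the left-hand side alone.

```r
n_transactions <- 9835

# Values printed for {other vegetables} => {whole milk}
support  <- 0.07483477
coverage <- 0.19349263  # support of the left-hand side alone

confidence_check <- support / coverage        # confidence = supp(lhs and rhs) / supp(lhs)
count_check      <- support * n_transactions  # number of baskets containing both sides

round(confidence_check, 4)
round(count_check)
```

Both results agree with the confidence and count columns printed for that rule, up to rounding.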
rules_lift <- sort(rules,by = "lift",decreasing = TRUE)
inspect(head(rules_lift,10))
## lhs rhs support
## [1] {bottled beer, red/blush wine} => {liquor} 0.001931876
## [2] {ham, white bread} => {processed cheese} 0.001931876
## [3] {bottled beer, liquor} => {red/blush wine} 0.001931876
## [4] {Instant food products, soda} => {hamburger meat} 0.001220132
## [5] {curd, sugar} => {flour} 0.001118454
## [6] {baking powder, sugar} => {flour} 0.001016777
## [7] {processed cheese, white bread} => {ham} 0.001931876
## [8] {popcorn, soda} => {salty snack} 0.001220132
## [9] {baking powder, flour} => {sugar} 0.001016777
## [10] {ham, processed cheese} => {white bread} 0.001931876
## confidence coverage lift count
## [1] 0.3958333 0.004880529 35.71579 19
## [2] 0.3800000 0.005083884 22.92822 19
## [3] 0.4130435 0.004677173 21.49356 19
## [4] 0.6315789 0.001931876 18.99565 12
## [5] 0.3235294 0.003457041 18.60767 11
## [6] 0.3125000 0.003253686 17.97332 10
## [7] 0.4634146 0.004168785 17.80345 19
## [8] 0.6315789 0.001931876 16.69779 12
## [9] 0.5555556 0.001830198 16.40807 10
## [10] 0.6333333 0.003050330 15.04549 19
Sorting the rules by lift, the top rule is {bottled beer, red/blush wine} => {liquor} (lift of about 35.7), followed by {ham, white bread} => {processed cheese} (lift of about 22.9). The right-hand sides here are varied, unlike the top rules by support and confidence, which were dominated by whole milk.
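Lift compares a rule's confidence with how often the right-hand side is bought overall, lift = confidence / supp(rhs), so values far above 1 mean the items co-occur much more often than their base rates would suggest. As a hedged check, the sketch below takes the printed values for the top lift rule and for one of the confidence-1 whole-milk rules and backs out the implied base rate of each right-hand side; the variable names are illustrative.

```r
# Printed values: {bottled beer, red/blush wine} => {liquor}
conf_liquor <- 0.3958333
lift_liquor <- 35.71579

# Printed values: {rice, sugar} => {whole milk}
conf_milk <- 1
lift_milk <- 3.913649

# lift = confidence / supp(rhs), so supp(rhs) = confidence / lift
supp_liquor <- conf_liquor / lift_liquor
supp_milk   <- conf_milk / lift_milk

round(c(liquor = supp_liquor, whole_milk = supp_milk), 3)
```

Liquor turns up in roughly 1% of baskets and whole milk in about a quarter of them, which is why even a confidence-1 milk rule has a modest lift while the liquor rule tops the list.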