homework

Introduction

library(arules)
library(arulesViz)
library(dplyr)


data <- read.transactions('GroceryDataSet.csv', format = 'basket', sep=',')
data

## transactions in sparse format with
##  9835 transactions (rows) and
##  169 items (columns)

summary(data)

## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55   46 
##   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics

data %>% itemFrequencyPlot(topN=15,col= 'Blue', main="Grocery")

The plot shows the 15 most frequent products. The product with the highest frequency is “whole milk”.
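The frequencies behind the plot can also be checked by hand. The sketch below is a minimal base R example that reuses only the counts printed by summary(data) above; the variable names are made up for illustration. It converts the absolute counts of the five most frequent items into relative frequencies, which is the scale itemFrequencyPlot uses by default.

```r
# Counts copied from the summary(data) output above (top five items only).
n_transactions <- 9835
item_counts <- c(
  "whole milk"       = 2513,
  "other vegetables" = 1903,
  "rolls/buns"       = 1809,
  "soda"             = 1715,
  "yogurt"           = 1372
)

# Relative frequency = share of baskets that contain the item.
item_share <- item_counts / n_transactions

# Sort from most to least frequent and print rounded shares.
item_share <- sort(item_share, decreasing = TRUE)
print(round(item_share, 4))
```

Under these figures, whole milk appears in roughly a quarter of all baskets.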

rule <- data %>% apriori(list(supp=0.001, conf=0.8,maxlen=5))

## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.8    0.1    1 none FALSE            TRUE       5   0.001      1
##  maxlen target   ext
##       5  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 9 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [157 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5

## Warning in apriori(., list(supp = 0.001, conf = 0.8, maxlen = 5)): Mining
## stopped (maxlen reached). Only patterns up to a length of 5 returned!

##  done [0.01s].
## writing ... [398 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].

summary(rule)

## set of 398 rules
## 
## rule length distribution (lhs + rhs):sizes
##   3   4   5 
##  29 229 140 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.000   4.000   4.000   4.279   5.000   5.000 
## 
## summary of quality measures:
##     support           confidence          lift            count     
##  Min.   :0.001017   Min.   :0.8000   Min.   : 3.131   Min.   :10.0  
##  1st Qu.:0.001017   1st Qu.:0.8333   1st Qu.: 3.312   1st Qu.:10.0  
##  Median :0.001220   Median :0.8462   Median : 3.588   Median :12.0  
##  Mean   :0.001251   Mean   :0.8658   Mean   : 3.942   Mean   :12.3  
##  3rd Qu.:0.001322   3rd Qu.:0.9091   3rd Qu.: 4.307   3rd Qu.:13.0  
##  Max.   :0.003152   Max.   :1.0000   Max.   :11.235   Max.   :31.0  
## 
## mining info:
##  data ntransactions support confidence
##     .          9835   0.001        0.8
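The relation between the relative support threshold and the counts in the output can be worked out with simple arithmetic. The sketch below uses only numbers from the output above; the interpretation of the printed "Absolute minimum support count: 9" as a truncated value is an assumption, not something the output states.

```r
# Values taken from the apriori() call and its output.
n_transactions <- 9835
min_support    <- 0.001

# Threshold expressed as a number of baskets.
threshold_count <- min_support * n_transactions   # 9.835

# Truncating gives 9, which matches the printed value (assumed explanation).
printed_count <- floor(threshold_count)

# A rule needs a whole number of baskets at or above the threshold.
smallest_valid_count <- ceiling(threshold_count)

# Support of a rule that occurs in exactly that many baskets.
smallest_valid_support <- smallest_valid_count / n_transactions
print(c(printed_count, smallest_valid_count, round(smallest_valid_support, 6)))
```

This is consistent with the summary above, where the minimum count is 10 and the minimum support is 0.001017.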

inspect(head(rule))

##     lhs                                 rhs            support     confidence
## [1] {liquor,red/blush wine}          => {bottled beer} 0.001931876 0.9047619 
## [2] {cereals,curd}                   => {whole milk}   0.001016777 0.9090909 
## [3] {cereals,yogurt}                 => {whole milk}   0.001728521 0.8095238 
## [4] {butter,jam}                     => {whole milk}   0.001016777 0.8333333 
## [5] {bottled beer,soups}             => {whole milk}   0.001118454 0.9166667 
## [6] {house keeping products,napkins} => {whole milk}   0.001321810 0.8125000 
##     lift      count
## [1] 11.235269 19   
## [2]  3.557863 10   
## [3]  3.168192 17   
## [4]  3.261374 10   
## [5]  3.587512 11   
## [6]  3.179840 13

The table shows the first rules produced by the Apriori algorithm. For example, the first rule says that, with about 90% confidence, customers who bought “liquor” and “red/blush wine” also bought “bottled beer”.
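To make the quality measures concrete, the sketch below recomputes them for the first rule in the table, {liquor, red/blush wine} => {bottled beer}. It uses only the numbers printed above (count, confidence, lift and the 9835 transactions); the variable names are illustrative, and the derived basket counts are back-calculated, not read from the data.

```r
# Figures for rule [1], copied from the inspect() output.
n_transactions <- 9835
rule_count     <- 19          # baskets containing all three items
confidence     <- 0.9047619
lift           <- 11.235269

# support = baskets matching the whole rule / all baskets
support <- rule_count / n_transactions

# confidence = rule_count / lhs_count, so lhs_count can be recovered.
lhs_count <- rule_count / confidence      # baskets with liquor and red/blush wine

# lift = confidence / support(rhs), so the rhs support can be recovered.
rhs_support <- confidence / lift          # share of baskets with bottled beer

print(round(c(support = support, lhs_count = lhs_count, rhs_support = rhs_support), 6))
```

Read this way, about 21 baskets contain both liquor and red/blush wine, and 19 of them also contain bottled beer. Bottled beer appears in only about 8% of all baskets, which is why the lift is so high.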

plot(rule[quality(rule)$confidence>0.3],method ="paracoord")

From these results, the high-confidence rules mostly pair breakfast products (cereals, yogurt, butter) with “whole milk”, and liquor products (liquor, red/blush wine) with “bottled beer”.
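Ranking the rules makes this comparison easier to read. The sketch below rebuilds the six rules printed by inspect(head(rule)) as a plain data frame (the name rules_head is made up for illustration) and orders them by lift and by confidence. On the full rule object, sort(rule, by = "lift") from arules should give the same kind of ordering over all 398 rules.

```r
# The six rules shown by inspect(head(rule)), copied by hand.
rules_head <- data.frame(
  lhs = c("liquor, red/blush wine", "cereals, curd", "cereals, yogurt",
          "butter, jam", "bottled beer, soups",
          "house keeping products, napkins"),
  rhs = c("bottled beer", rep("whole milk", 5)),
  confidence = c(0.9047619, 0.9090909, 0.8095238,
                 0.8333333, 0.9166667, 0.8125000),
  lift = c(11.235269, 3.557863, 3.168192, 3.261374, 3.587512, 3.179840),
  stringsAsFactors = FALSE
)

# Strongest association first (lift), then most reliable first (confidence).
by_lift <- rules_head[order(rules_head$lift, decreasing = TRUE), ]
by_conf <- rules_head[order(rules_head$confidence, decreasing = TRUE), ]

print(by_lift)
print(by_conf)
```

Among these six rules, the liquor rule has by far the highest lift, while the highest confidence belongs to {bottled beer, soups} => {whole milk}. The two measures do not have to agree, because lift also accounts for how common the right-hand item is.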

References

Code snippets and references from: https://www.datacamp.com/community/tutorials/market-basket-analysis-r

homework_10

Anthony Munoz

4/22/2020
