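Collect sales data from a large online supermarket and run an association analysis to find relationships among the main everyday products, describing them in terms of support, confidence, and lift; then propose concrete sales strategies.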
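Support, confidence, and lift are the three measures used throughout the analysis, so a tiny worked example may help fix their definitions. The sketch below uses base R only; the `baskets` list and the `support()` helper are hypothetical illustrations and are not part of groceries.csv or the arules package.

```r
# Toy transactions (hypothetical data, not taken from groceries.csv)
baskets <- list(
  c("milk", "bread"),
  c("milk", "yogurt"),
  c("bread", "butter"),
  c("milk", "bread", "butter"),
  c("yogurt")
)

# Hypothetical helper: fraction of baskets that contain every item in `items`
support <- function(items) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

lhs <- "bread"
rhs <- "milk"

supp_rule <- support(c(lhs, rhs))      # share of baskets with both lhs and rhs
conf_rule <- supp_rule / support(lhs)  # share of lhs baskets that also hold rhs
lift_rule <- conf_rule / support(rhs)  # confidence relative to the baseline rate of rhs

print(c(support = supp_rule, confidence = conf_rule, lift = lift_rule))
```

A lift above 1 means the right-hand side appears more often in baskets containing the left-hand side than it does overall; a lift near 1 suggests the two are roughly independent.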
## Set display options
knitr::opts_chunk$set(echo = TRUE,message = FALSE,warning = FALSE,
fig.width = 9.5,fig.height = 6)
rm(list = ls());gc()
##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 364639 19.5     592000 31.7   460000 24.6
## Vcells 552239  4.3    1023718  7.9   786371  6.0
## Preparation: load packages
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
library(gridExtra)
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Loading required package: grid
library(iplots)
## Loading required package: rJava
## Note: On Mac OS X we strongly recommend using iplots from within JGR.
## Proceed at your own risk as iplots cannot resolve potential ev.loop deadlocks.
## 'Yes' is assumed for all dialogs as they cannot be shown without a deadlock,
## also ievent.wait() is disabled.
## More recent OS X version do not allow signle-threaded GUIs and will fail.
theme_set(theme_bw(base_family = "STKaiti"))
The groceries.csv file contains sales data from a store. The dataset has 9,835 rows (transactions) and 169 items.
## groceries.csv
groceries <- read.transactions("groceries.csv",sep = ",")
summary(groceries)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146
##
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda
##             2513             1903             1809             1715
##           yogurt          (Other)
##             1372            34055
##
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55
##   16   17   18   19   20   21   22   23   24   26   27   28   29   32
##   46   29   14   14    9   11    4    6    1    1    1    1    3    1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   3.000   4.409   6.000  32.000
##
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
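The density and mean basket size reported by `summary(groceries)` can be cross-checked from the item counts in the same output. The following is a minimal sketch in base R that only reuses numbers printed above; the variable names are illustrative.

```r
n_transactions <- 9835
n_items <- 169

# Counts from "most frequent items", including the (Other) bucket
item_counts <- c(2513, 1903, 1809, 1715, 1372, 34055)

# Every purchased item is one filled cell of the transaction-by-item matrix
total_purchases <- sum(item_counts)

# Density: filled cells divided by all cells
density <- total_purchases / (n_transactions * n_items)

# Mean basket size: purchased items per transaction
mean_basket_size <- total_purchases / n_transactions

print(c(density = density, mean_basket_size = mean_basket_size))
```

Both values agree with the summary (density 0.02609146, mean length 4.409), which shows that the matrix is very sparse: on average a basket holds fewer than 5 of the 169 items.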
## Analyze the most frequent items
par(family = "STKaiti",cex = 0.8)
itemFrequencyPlot(groceries,topN = 35,main = "Most frequent items",col = "red")
## Frequent itemset mining algorithm
freiter <- arules::eclat(groceries,parameter = list(support = 0.01,
minlen = 2))
## Eclat
##
## parameter specification:
## tidLists support minlen maxlen target ext
## FALSE 0.01 2 10 frequent itemsets FALSE
##
## algorithmic control:
## sparse sort verbose
## 7 -2 TRUE
##
## Absolute minimum support count: 98
##
## create itemset ...
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating sparse bit matrix ... [88 row(s), 9835 column(s)] done [0.00s].
## writing ... [245 set(s)] done [0.01s].
## Creating S4 object ... done [0.00s].
summary(freiter)
## set of 245 itemsets
##
## most frequent items:
##       whole milk other vegetables           yogurt       rolls/buns
##               70               62               38               35
##  root vegetables          (Other)
##               33              284
##
## element (itemset/transaction) length distribution:sizes
## 2 3
## 213 32
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 2.000 2.000 2.000 2.131 2.000 3.000
##
## summary of quality measures:
## support
## Min. :0.01007
## 1st Qu.:0.01149
## Median :0.01423
## Mean :0.01745
## 3rd Qu.:0.02044
## Max. :0.07483
##
## includes transaction ID lists: FALSE
##
## mining info:
## data ntransactions support
## groceries 9835 0.01
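The log line "Absolute minimum support count: 98" and the smallest support in the summary (0.01007) can be reconciled with a little arithmetic. The sketch below uses only numbers printed above. Reading the printed count as the truncated product of the threshold and the number of transactions is an assumption about how that log line is derived, not something the output states.

```r
n_transactions <- 9835
min_support <- 0.01

# Threshold expressed as a (fractional) number of transactions
raw_count <- min_support * n_transactions

# Assumption: the log prints the truncated value
printed_count <- floor(raw_count)

# Smallest whole number of transactions whose support reaches the threshold
needed_count <- ceiling(raw_count)
needed_support <- needed_count / n_transactions

print(c(raw = raw_count, printed = printed_count,
        needed = needed_count, needed_support = needed_support))
```

Under this reading, an itemset needs at least 99 supporting transactions to reach a support of 0.01, which is consistent with the reported minimum support of 0.01007.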
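## Overlap between frequent itemsets: nonzero where two itemsets share an item
aa <- crossprod(freiter@items@data)
stopifnot(isSymmetric(aa))
image(aa)
## Mine association rules with Apriori
guize1 <- apriori(groceries,parameter = list(supp = 0.01,
conf = 0.5))
## Apriori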
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.5 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 10 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 98
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object ... done [0.00s].
## We have 15 rules
summary(guize1)
## set of 15 rules
##
## rule length distribution (lhs + rhs):sizes
## 3
## 15
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 3 3 3 3 3 3
##
## summary of quality measures:
## support confidence lift
## Min. :0.01007 Min. :0.5000 Min. :1.984
## 1st Qu.:0.01174 1st Qu.:0.5151 1st Qu.:2.036
## Median :0.01230 Median :0.5245 Median :2.203
## Mean :0.01316 Mean :0.5411 Mean :2.299
## 3rd Qu.:0.01403 3rd Qu.:0.5718 3rd Qu.:2.432
## Max. :0.02227 Max. :0.5862 Max. :3.030
##
## mining info:
## data ntransactions support confidence
## groceries 9835 0.01 0.5
inspect(sort(guize1,by = "lift"))
## lhs rhs support confidence lift
## [1] {citrus fruit,
## root vegetables} => {other vegetables} 0.01037112 0.5862069 3.029608
## [2] {root vegetables,
## tropical fruit} => {other vegetables} 0.01230300 0.5845411 3.020999
## [3] {rolls/buns,
## root vegetables} => {other vegetables} 0.01220132 0.5020921 2.594890
## [4] {root vegetables,
## yogurt} => {other vegetables} 0.01291307 0.5000000 2.584078
## [5] {curd,
## yogurt} => {whole milk} 0.01006609 0.5823529 2.279125
## [6] {butter,
## other vegetables} => {whole milk} 0.01148958 0.5736041 2.244885
## [7] {root vegetables,
## tropical fruit} => {whole milk} 0.01199797 0.5700483 2.230969
## [8] {root vegetables,
## yogurt} => {whole milk} 0.01453991 0.5629921 2.203354
## [9] {domestic eggs,
## other vegetables} => {whole milk} 0.01230300 0.5525114 2.162336
## [10] {whipped/sour cream,
## yogurt} => {whole milk} 0.01087951 0.5245098 2.052747
## [11] {rolls/buns,
## root vegetables} => {whole milk} 0.01270971 0.5230126 2.046888
## [12] {other vegetables,
## pip fruit} => {whole milk} 0.01352313 0.5175097 2.025351
## [13] {tropical fruit,
## yogurt} => {whole milk} 0.01514997 0.5173611 2.024770
## [14] {other vegetables,
## yogurt} => {whole milk} 0.02226741 0.5128806 2.007235
## [15] {other vegetables,
## whipped/sour cream} => {whole milk} 0.01464159 0.5070423 1.984385
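aa <- as(guize1,"data.frame")
aa
All 15 rules have a lift greater than 1 and a confidence of at least 0.5, and every rule has length 3. Four rules have other vegetables on the right-hand side, and 11 have whole milk.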
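The lift values in the rule table can be checked by hand, since lift is the rule's confidence divided by the overall support of its right-hand side. The sketch below, in base R, combines the item counts from `summary(groceries)` with the confidence values printed for rules [1] and [5]; the variable names are illustrative.

```r
n_transactions <- 9835

# Item counts from the "most frequent items" part of summary(groceries)
count_rhs <- c("other vegetables" = 1903, "whole milk" = 2513)
supp_rhs <- count_rhs / n_transactions

# Confidence printed by inspect() for rule [1] (rhs: other vegetables)
# and rule [5] (rhs: whole milk)
conf_rule <- c("other vegetables" = 0.5862069, "whole milk" = 0.5823529)

# Lift: confidence relative to the baseline frequency of the right-hand side
lift_rule <- conf_rule / supp_rhs

print(round(supp_rhs, 4))
print(round(lift_rule, 4))
```

Because whole milk is the most frequent item (about a quarter of all baskets), a rule pointing to it needs a higher confidence to reach the same lift as a rule pointing to other vegetables. That is why the four other vegetables rules top the list sorted by lift even though their confidence is similar.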
Inspect Associations Interactively using datatable
arulesViz::inspectDT(guize1)
## Interactive association rule plot
library(plotly)
arulesViz::plotly_arules(guize1,method = "scatterplot")
## scatterplot
plot(guize1,method = "scatterplot")
## "graph"
plot(guize1,method = "graph")
## "paracoord"
plot(guize1,method = "paracoord")
From the visualized association rules, we can see the rules themselves, how they were generated, and how they relate to one another.