Experiment 3: Association Rules

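Collect sales data from a large online supermarket and run an association analysis to find relationships among the main everyday products. Describe the results in terms of support, confidence, and lift, and propose concrete sales strategies.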
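As a reminder of what the three measures mean, here is a minimal base-R sketch on five made-up baskets. The baskets and the helper name `support` are illustrative assumptions and are not part of the groceries data. For a rule A => B, support is the share of transactions that contain both A and B, confidence is support(A and B) / support(A), and lift is confidence / support(B).

```r
# Toy baskets, invented for illustration only
baskets <- list(
  c("milk", "bread"),
  c("milk", "butter"),
  c("milk", "bread", "butter"),
  c("bread"),
  c("milk")
)

# Share of baskets that contain every item in `items` (hypothetical helper)
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Rule {bread} => {milk}
supp_rule <- support(c("bread", "milk"), baskets)   # baskets with both items
conf_rule <- supp_rule / support("bread", baskets)  # share of bread baskets that also hold milk
lift_rule <- conf_rule / support("milk", baskets)   # confidence relative to milk's base rate

c(support = supp_rule, confidence = conf_rule, lift = lift_rule)
```

A lift above 1 means the antecedent makes the consequent more likely than its base rate; a lift below 1, as in this toy rule, means it makes it less likely.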

## Set display options
knitr::opts_chunk$set(echo = TRUE,message = FALSE,warning = FALSE,
                      fig.width = 9.5,fig.height = 6)
rm(list = ls());gc()
##          used (Mb) gc trigger (Mb) max used (Mb)
## Ncells 364639 19.5     592000 31.7   460000 24.6
## Vcells 552239  4.3    1023718  7.9   786371  6.0
## Preparation: load packages
library(ggplot2)
## Warning: package 'ggplot2' was built under R version 3.3.2
library(gridExtra)
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
## Loading required package: grid
library(iplots)
## Loading required package: rJava
## Note: On Mac OS X we strongly recommend using iplots from within JGR.
## Proceed at your own risk as iplots cannot resolve potential ev.loop deadlocks.
## 'Yes' is assumed for all dialogs as they cannot be shown without a deadlock,
## also ievent.wait() is disabled.
## More recent OS X version do not allow signle-threaded GUIs and will fail.
theme_set(theme_bw(base_family = "STKaiti"))

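Data preparation

groceries.csv contains sales data from a store. The dataset has 9,835 transactions (rows) and 169 distinct items.

## Read groceries.csv as transactions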
groceries <- read.transactions("groceries.csv",sep = ",")

summary(groceries)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  169 columns (items) and a density of 0.02609146 
## 
## most frequent items:
##       whole milk other vegetables       rolls/buns             soda 
##             2513             1903             1809             1715 
##           yogurt          (Other) 
##             1372            34055 
## 
## element (itemset/transaction) length distribution:
## sizes
##    1    2    3    4    5    6    7    8    9   10   11   12   13   14   15 
## 2159 1643 1299 1005  855  645  545  438  350  246  182  117   78   77   55 
##   16   17   18   19   20   21   22   23   24   26   27   28   29   32 
##   46   29   14   14    9   11    4    6    1    1    1    1    3    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.409   6.000  32.000 
## 
## includes extended item information - examples:
##             labels
## 1 abrasive cleaner
## 2 artif. sweetener
## 3   baby cosmetics
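
The figures in this summary can be cross-checked with a little arithmetic. The sketch below uses only numbers printed above: the density is the share of non-empty cells in the 9835 x 169 transaction matrix, so multiplying it by the matrix size gives the total number of purchased items.

```r
n_trans <- 9835
n_items <- 169
density <- 0.02609146   # from summary(groceries)

# Total number of item occurrences (non-empty cells of the sparse matrix)
total_items <- density * n_trans * n_items
round(total_items)

# Average basket size, to compare with the reported mean of 4.409
total_items / n_trans

# Relative support of the most frequent item, whole milk
2513 / n_trans
```

The rounded total, 43367, equals the sum of the counts listed under "most frequent items" (2513 + 1903 + 1809 + 1715 + 1372 + 34055).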

Data exploration

## Plot the most frequent items

par(family = "STKaiti",cex = 0.8)
itemFrequencyPlot(groceries,topN = 35,main = "Most frequent items",col = "red")

Mining frequent itemsets

## Mine frequent itemsets with the Eclat algorithm

freiter <- arules::eclat(groceries,parameter = list(support = 0.01,
                                         minlen = 2))
## Eclat
## 
## parameter specification:
##  tidLists support minlen maxlen            target   ext
##     FALSE    0.01      2     10 frequent itemsets FALSE
## 
## algorithmic control:
##  sparse sort verbose
##       7   -2    TRUE
## 
## Absolute minimum support count: 98 
## 
## create itemset ... 
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.01s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating sparse bit matrix ... [88 row(s), 9835 column(s)] done [0.00s].
## writing  ... [245 set(s)] done [0.01s].
## Creating S4 object  ... done [0.00s].
summary(freiter)
## set of 245 itemsets
## 
## most frequent items:
##       whole milk other vegetables           yogurt       rolls/buns 
##               70               62               38               35 
##  root vegetables          (Other) 
##               33              284 
## 
## element (itemset/transaction) length distribution:sizes
##   2   3 
## 213  32 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   2.000   2.000   2.000   2.131   2.000   3.000 
## 
## summary of quality measures:
##     support       
##  Min.   :0.01007  
##  1st Qu.:0.01149  
##  Median :0.01423  
##  Mean   :0.01745  
##  3rd Qu.:0.02044  
##  Max.   :0.07483  
## 
## includes transaction ID lists: FALSE 
## 
## mining info:
##       data ntransactions support
##  groceries          9835    0.01
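## Itemset-by-itemset matrix: entry (i, j) counts the items shared by
## frequent itemsets i and j (items are the rows of the underlying sparse matrix)
aa <- crossprod(freiter@items@data)
stopifnot(isSymmetric(aa))
image(aa)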
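
To make the `crossprod()` step concrete, here is a small base-R sketch on a hypothetical incidence matrix with items as rows and itemsets as columns, which mirrors the layout assumed for `freiter@items@data`. The item and itemset names are invented for illustration.

```r
# Hypothetical incidence matrix: rows are items, columns are itemsets
inc <- matrix(
  c(1, 1, 0,   # milk
    1, 0, 1,   # yogurt
    0, 1, 1),  # bread
  nrow = 3, byrow = TRUE,
  dimnames = list(c("milk", "yogurt", "bread"),
                  c("set1", "set2", "set3"))
)

# t(inc) %*% inc: entry (i, j) is the number of items itemsets i and j share
shared <- crossprod(inc)
shared

# The result is symmetric, which is what the stopifnot() check relies on
isSymmetric(shared)
```

The diagonal holds each itemset's size, and the off-diagonal entries show how strongly two itemsets overlap, which is the pattern `image()` displays.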

Mining useful rules

guize1 <- apriori(groceries,parameter = list(supp = 0.01,
                                            conf = 0.5))
## Apriori
## 
## Parameter specification:
##  confidence minval smax arem  aval originalSupport maxtime support minlen
##         0.5    0.1    1 none FALSE            TRUE       5    0.01      1
##  maxlen target   ext
##      10  rules FALSE
## 
## Algorithmic control:
##  filter tree heap memopt load sort verbose
##     0.1 TRUE TRUE  FALSE TRUE    2    TRUE
## 
## Absolute minimum support count: 98 
## 
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[169 item(s), 9835 transaction(s)] done [0.00s].
## sorting and recoding items ... [88 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 done [0.00s].
## writing ... [15 rule(s)] done [0.00s].
## creating S4 object  ... done [0.00s].
## We obtain 15 rules
summary(guize1)
## set of 15 rules
## 
## rule length distribution (lhs + rhs):sizes
##  3 
## 15 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       3       3       3       3       3       3 
## 
## summary of quality measures:
##     support          confidence          lift      
##  Min.   :0.01007   Min.   :0.5000   Min.   :1.984  
##  1st Qu.:0.01174   1st Qu.:0.5151   1st Qu.:2.036  
##  Median :0.01230   Median :0.5245   Median :2.203  
##  Mean   :0.01316   Mean   :0.5411   Mean   :2.299  
##  3rd Qu.:0.01403   3rd Qu.:0.5718   3rd Qu.:2.432  
##  Max.   :0.02227   Max.   :0.5862   Max.   :3.030  
## 
## mining info:
##       data ntransactions support confidence
##  groceries          9835    0.01        0.5
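
The minimum support of 0.01 can be translated into a transaction count. The sketch below is plain arithmetic on the numbers shown above. The log prints an absolute minimum support count of 98, which looks like 98.35 truncated; how arules rounds internally is not shown in this report, so treat that reading as an assumption.

```r
n_trans  <- 9835
min_supp <- 0.01

# Threshold expressed in transactions
min_supp * n_trans

# Smallest whole number of transactions whose share reaches the threshold
min_count <- ceiling(min_supp * n_trans)
min_count

# Corresponding relative support, to compare with the reported minimum of 0.01007
min_count / n_trans
```

The result, 99 transactions, is consistent with the smallest support in the summary (0.01007).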

Inspect the rules, sorted by lift

inspect(sort(guize1,by = "lift"))
##      lhs                     rhs                   support confidence     lift
## [1]  {citrus fruit,                                                           
##       root vegetables}    => {other vegetables} 0.01037112  0.5862069 3.029608
## [2]  {root vegetables,                                                        
##       tropical fruit}     => {other vegetables} 0.01230300  0.5845411 3.020999
## [3]  {rolls/buns,                                                             
##       root vegetables}    => {other vegetables} 0.01220132  0.5020921 2.594890
## [4]  {root vegetables,                                                        
##       yogurt}             => {other vegetables} 0.01291307  0.5000000 2.584078
## [5]  {curd,                                                                   
##       yogurt}             => {whole milk}       0.01006609  0.5823529 2.279125
## [6]  {butter,                                                                 
##       other vegetables}   => {whole milk}       0.01148958  0.5736041 2.244885
## [7]  {root vegetables,                                                        
##       tropical fruit}     => {whole milk}       0.01199797  0.5700483 2.230969
## [8]  {root vegetables,                                                        
##       yogurt}             => {whole milk}       0.01453991  0.5629921 2.203354
## [9]  {domestic eggs,                                                          
##       other vegetables}   => {whole milk}       0.01230300  0.5525114 2.162336
## [10] {whipped/sour cream,                                                     
##       yogurt}             => {whole milk}       0.01087951  0.5245098 2.052747
## [11] {rolls/buns,                                                             
##       root vegetables}    => {whole milk}       0.01270971  0.5230126 2.046888
## [12] {other vegetables,                                                       
##       pip fruit}          => {whole milk}       0.01352313  0.5175097 2.025351
## [13] {tropical fruit,                                                         
##       yogurt}             => {whole milk}       0.01514997  0.5173611 2.024770
## [14] {other vegetables,                                                       
##       yogurt}             => {whole milk}       0.02226741  0.5128806 2.007235
## [15] {other vegetables,                                                       
##       whipped/sour cream} => {whole milk}       0.01464159  0.5070423 1.984385
## Convert the rules to a data frame for further processing
aa <- as(guize1,"data.frame")
aa

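All 15 rules have a lift greater than 1 and a confidence of at least 0.5, and every rule has length 3 (two antecedent items and one consequent). Four rules lead to other vegetables and eleven lead to whole milk.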
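
The lift values can be reproduced from the printed confidence and the item counts in `summary(groceries)`, since lift = confidence / support(rhs). The sketch below checks two of the rules; the variable names are illustrative.

```r
n_trans <- 9835

# Supports of the two consequents, from the item counts in summary(groceries)
supp_other_veg  <- 1903 / n_trans
supp_whole_milk <- 2513 / n_trans

# Rule [1]: {citrus fruit, root vegetables} => {other vegetables}
lift_rule1 <- 0.5862069 / supp_other_veg

# Rule [14]: {other vegetables, yogurt} => {whole milk}
lift_rule14 <- 0.5128806 / supp_whole_milk

round(c(rule1 = lift_rule1, rule14 = lift_rule14), 4)
```

Both values agree with the lift column of the table (3.029608 and 2.007235). Because whole milk is already in about a quarter of all baskets, a confidence just above 0.5 yields a lift of only about 2, while the same confidence for other vegetables yields a lift of about 3.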

Visualizing the rules

Inspect Associations Interactively using datatable

## Inspect Associations Interactively using datatable
arulesViz::inspectDT(guize1)

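Interactive plot of the association rules

## Interactive scatterplot of the rules
## (newer arulesViz releases deprecate plotly_arules(); there,
##  plot(guize1, method = "scatterplot", engine = "plotly") is the replacement)
library(plotly)
arulesViz::plotly_arules(guize1,method = "scatterplot")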

Static rule plots

## scatterplot
plot(guize1,method = "scatterplot")

## "graph"
plot(guize1,method = "graph")

## "paracoord"
plot(guize1,method = "paracoord")

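The visualizations show how the rules are distributed, how each one is composed, and how they relate to one another: the scatterplot places each rule by support and confidence with lift as shading, the graph links items to the rules they appear in, and the parallel-coordinates plot traces each rule from its antecedent items to its consequent.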