1 Goal


The goal of this tutorial is to isolate single-item purchases in a market basket analysis using the arules and arulesViz libraries. Once the logic is clear, the same procedure can be applied to transactions with any number of items n.


2 Loading the data


# We need to load two libraries to perform this task
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
## Loading required package: grid
# This tutorial uses the basket file format: one transaction per line, items separated by commas.
# The file looks like:
# apple, orange, pear
# apple, pear
# orange
# etc

# First we load the data
products <- read.transactions("transactions.csv", format = "basket", sep = ",", rm.duplicates = TRUE)
## Warning in readLines(file, encoding = encoding): incomplete final line
## found on 'transactions.csv'
## distribution of transactions with duplicates:
## items
##   1   2 
## 191  10
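# Keep a random sample of 1000 transactions (call set.seed() first if you need a reproducible sample)
products <- sample(products, 1000)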
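
If transactions.csv is not at hand, the basket format can be tried on a small file first. The following is a minimal sketch, assuming a hypothetical three-line file written to a temporary location; the item names and the object names toy_file and toy are illustrative only.

```r
library(arules)

# Hypothetical toy data: one basket per line, items separated by commas
toy_file <- tempfile(fileext = ".csv")
writeLines(c("apple,orange,pear", "apple,pear", "orange"), toy_file)

# Read the file in basket format, dropping repeated items within a basket
toy <- read.transactions(toy_file, format = "basket", sep = ",",
                         rm.duplicates = TRUE)

length(toy)      # number of transactions read
itemLabels(toy)  # distinct items found in the file
```

Checking length() and itemLabels() on a small file like this is a quick way to confirm that the separator and format arguments match the data before loading the full file.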

3 Read and understand the data


# A transactions object is not a regular data frame, so we must use functions from arules to inspect, extract and filter the data.

# Summarize the transactions
summary(products)
## transactions as itemMatrix in sparse format with
##  1000 rows (elements/itemsets/transactions) and
##  125 columns (items) and a density of 0.033784 
## 
## most frequent items:
##                     iMac                HP Laptop CYBERPOWER Gamer Desktop 
##                      265                      186                      179 
##            Apple Earpods        Apple MacBook Air                  (Other) 
##                      166                      144                     3283 
## 
## element (itemset/transaction) length distribution:
## sizes
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17 
##   1 240 161 123  94 110  65  59  32  37  21  19  11   8   7   3   2   3 
##  19  21  26  27 
##   1   1   1   1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   0.000   2.000   3.000   4.223   6.000  27.000 
## 
## includes extended item information - examples:
##                             labels
## 1 1TB Portable External Hard Drive
## 2 2TB Portable External Hard Drive
## 3                   3-Button Mouse
# To plot the most common items
itemFrequencyPlot(products, topN = 5)

# Plot the individual transactions
image(sample(products, 100)) # Sample a small number of transactions to see the image

# Inspect a single transaction
inspect(products[5, ])
##     items           
## [1] {Dell 2 Desktop}
# Number of items in each transaction (first 100 shown)
size(products)[1:100]
##   [1]  6  1  1  3  1  1  1  5  9  1  1  7  9  2  2  4  7  2 19 12  1 12  2
##  [24]  4  4  3  9  3  3  2  2  3  5  7  2  6  5  8  4  1 10  1  2  2  3  2
##  [47]  6  7  4  5  1  4  1  3  1  5  5  2  5  3  3  6  1  4  2  2  1  3  5
##  [70]  6  2 13  1  3  2  4  6  8  7 11  1  1 10  9  1  2  1  7  6  1  7 11
##  [93]  7  1  9  1  8  2  7  1
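
The length distribution printed by summary() can also be obtained directly by tabulating the output of size(). The sketch below assumes a hypothetical list of four toy baskets rather than the tutorial data.

```r
library(arules)

# Hypothetical toy baskets (not the tutorial data)
baskets <- list(c("apple", "orange", "pear"),
                c("apple", "pear"),
                "orange",
                "apple")
toy <- as(baskets, "transactions")

# One entry per transaction: how many items it contains
sizes <- size(toy)

# Count how many transactions there are of each size
size_counts <- table(sizes)
size_counts

# Number of one-item transactions
size_counts[["1"]]
```

The count under "1" tells us in advance how many transactions the filter in the next section should keep.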

4 Filter the data


# Use size() to keep only the one-item transactions
products_1item <- products[which(size(products) == 1), ]

# Build the cross table for the one-item transactions.
# Because each transaction holds a single item, only the diagonal can be non-zero.
# This is a good moment to read ?crossTable
my_crosstable <- crossTable(products_1item)
my_crosstable[10:14, 10:14]
##                          Alienware Laptop AOC Monitor
## Alienware Laptop                        2           0
## AOC Monitor                             0           1
## APIE Bluetooth Headphone                0           0
## Apple Earpods                           0           0
## Apple MacBook Air                       0           0
##                          APIE Bluetooth Headphone Apple Earpods
## Alienware Laptop                                0             0
## AOC Monitor                                     0             0
## APIE Bluetooth Headphone                        2             0
## Apple Earpods                                   0            13
## Apple MacBook Air                               0             0
##                          Apple MacBook Air
## Alienware Laptop                         0
## AOC Monitor                              0
## APIE Bluetooth Headphone                 0
## Apple Earpods                            0
## Apple MacBook Air                       35
# Find the product most often sold on its own
itemFrequencyPlot(products_1item, topN = 1)

# Count how many times it was sold as the only item in a transaction
my_crosstable["Apple MacBook Air", "Apple MacBook Air"]
## [1] 35
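
The same count can be read without a plot by sorting the absolute item frequencies of the filtered transactions. This is a minimal sketch on hypothetical toy baskets; it also checks that the diagonal of the cross table matches those frequencies, which is what we expect when every transaction contains exactly one item.

```r
library(arules)

# Hypothetical toy baskets (not the tutorial data)
baskets <- list("apple", "apple", "orange",
                c("apple", "pear"),
                c("orange", "pear"))
toy <- as(baskets, "transactions")

# Keep only the one-item transactions
toy_1item <- toy[which(size(toy) == 1), ]

# Absolute counts per item, most frequent first
single_counts <- sort(itemFrequency(toy_1item, type = "absolute"),
                      decreasing = TRUE)
head(single_counts, 1)

# The diagonal of the cross table holds the same counts
ct <- crossTable(toy_1item)
all(diag(ct) == itemFrequency(toy_1item, type = "absolute"))
```

Reading the name from names(single_counts)[1] avoids having to type the item label by hand when indexing the cross table.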

5 Conclusion


In this tutorial we have learnt how to select only the one-item transactions from a transaction file using the arules and arulesViz libraries. The same logic can be extended to filter, subset and select transactions under other conditions when studying cross-selling patterns.
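
As one possible extension, the size filter can be wrapped in a small function and combined with a condition on the items themselves. The sketch below assumes hypothetical toy baskets, and filter_by_size is an illustrative helper, not a function provided by arules.

```r
library(arules)

# Hypothetical toy baskets (not the tutorial data)
baskets <- list("apple",
                c("apple", "pear"),
                c("orange", "pear"),
                c("apple", "orange", "pear"))
toy <- as(baskets, "transactions")

# Hypothetical helper: keep the transactions with exactly n items
filter_by_size <- function(trans, n) {
  trans[which(size(trans) == n), ]
}

# Two-item transactions
two_item <- filter_by_size(toy, 2)

# Of those, the ones that contain "apple"
two_item_apple <- two_item[which(two_item %in% "apple"), ]
inspect(two_item_apple)
```

Here %in% is the arules operator that flags transactions containing any of the listed items, so conditions on size and on content can be chained one after the other.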