The goal of this tutorial is to filter single-item purchases in a basket analysis using the libraries arules and arulesViz. Once the logic is clear, the same procedure can be used to study transactions of any size n.
# We need to load two libraries to perform this task
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Loading required package: grid
# This tutorial uses the basket file format: one transaction per line, with items separated by commas.
# The file looks like:
# apple, orange, pear
# apple, pear
# orange
# etc
# First we load the data
products <- read.transactions("transactions.csv", format = "basket", sep = ",", rm.duplicates = TRUE)
## Warning in readLines(file, encoding = encoding): incomplete final line
## found on 'transactions.csv'
## distribution of transactions with duplicates:
## items
## 1 2
## 191 10
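To make the file layout concrete, here is a minimal sketch that writes the three example lines above to a temporary file and reads them back. The temporary file and the names `toy_file` and `toy` are illustrative assumptions, not part of the original data set.

```r
library(arules)

# Hypothetical toy file mirroring the basket layout: one transaction per line
toy_file <- tempfile(fileext = ".csv")
writeLines(c("apple,orange,pear", "apple,pear", "orange"), toy_file)

# Read it the same way as the real file
toy <- read.transactions(toy_file, format = "basket", sep = ",")

inspect(toy)  # list the items of each transaction
size(toy)     # number of items per transaction
```

Working with a file this small makes it easy to check by eye that each line became one transaction.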
# Keep a random sample of 1,000 transactions for the rest of the tutorial
products <- sample(products, 1000)
# A transactions object is not a regular data frame, so we must use functions from arules to inspect, extract and filter the data.
# Summarize the transactions
summary(products)
## transactions as itemMatrix in sparse format with
## 1000 rows (elements/itemsets/transactions) and
## 125 columns (items) and a density of 0.035136
##
## most frequent items:
##                     iMac                HP Laptop CYBERPOWER Gamer Desktop
##                      223                      199                      198
##            Apple Earpods        Apple MacBook Air                  (Other)
##                      189                      158                     3425
##
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  17  18  20
## 221 174 121  90  90  63  60  53  39  34  18  15   4   5   4   2   1   3
##  21  22
##   1   2
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   3.000   4.392   6.000  22.000
##
## includes extended item information - examples:
## labels
## 1 1TB Portable External Hard Drive
## 2 2TB Portable External Hard Drive
## 3 3-Button Mouse
# To plot the most common items
itemFrequencyPlot(products, topN = 5)
# Plot the individual transactions
image(sample(products, 100)) # Sample a small number of transactions to see the image
# To inspect an individual transaction (here, the fifth)
inspect(products[5, ])
## items
## [1] {Logitech Wireless Mouse}
# To see the number of items per transaction
size(products)[1:100]
## [1] 10 5 4 1 1 8 13 6 7 5 9 3 2 7 7 8 5 3 7 4 3 1 6
## [24] 2 7 4 1 2 1 1 3 1 2 2 4 2 1 6 3 1 7 1 7 4 4 10
## [47] 7 1 2 3 2 3 9 3 5 12 2 10 3 1 12 2 2 7 1 9 1 5 8
## [70] 1 7 11 3 8 3 6 2 5 5 2 1 5 2 2 3 12 9 6 7 6 2 9
## [93] 1 3 9 2 6 8 2 4
# Now we can use size() to keep only the single-item transactions
products_1item <- products[which(size(products) == 1), ]
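The same filter extends to baskets of any size. The sketch below uses a small hand-made transactions object so that it runs on its own; the data and the names `toy`, `n`, `toy_n` and `toy_range` are hypothetical and only illustrate the indexing pattern.

```r
library(arules)

# Hypothetical toy baskets standing in for the real data
toy <- as(
  list(
    c("apple", "orange", "pear"),
    c("apple", "pear"),
    "orange",
    "pear"
  ),
  "transactions"
)

# Keep only the transactions with exactly n items
n <- 2
toy_n <- toy[size(toy) == n]

# Conditions on size() can be combined, e.g. baskets with 2 or 3 items
toy_range <- toy[size(toy) >= 2 & size(toy) <= 3]

inspect(toy_n)
inspect(toy_range)
```

Indexing with a logical vector gives the same result as wrapping the condition in which(), so either form can be used.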
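# Now build the cross table for the single-item transactions.
# Each cell counts the transactions that contain both the row item and the column item,
# so with one item per transaction only the diagonal can be non-zero.
# This is a good moment to read ?crossTable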
my_crosstable <- crossTable(products_1item)
my_crosstable[10:14, 10:14]
##                          Alienware Laptop AOC Monitor
## Alienware Laptop                        3           0
## AOC Monitor                             0           0
## APIE Bluetooth Headphone                0           0
## Apple Earpods                           0           0
## Apple MacBook Air                       0           0
##                          APIE Bluetooth Headphone Apple Earpods
## Alienware Laptop                                0             0
## AOC Monitor                                     0             0
## APIE Bluetooth Headphone                        0             0
## Apple Earpods                                   0            18
## Apple MacBook Air                               0             0
##                          Apple MacBook Air
## Alienware Laptop                         0
## AOC Monitor                              0
## APIE Bluetooth Headphone                 0
## Apple Earpods                            0
## Apple MacBook Air                       35
# We can now find the product most often sold on its own
itemFrequencyPlot(products_1item, topN = 1)
# And find how many times it was sold as the only item in a transaction
my_crosstable["Apple MacBook Air", "Apple MacBook Air"]
## [1] 35
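As a possible cross-check, the diagonal of the cross table should match the absolute item frequencies of the single-item subset, which can be read without plotting. This is a sketch on hypothetical toy data (the names `toy`, `toy_1item`, `counts` and `ct` are assumptions), not the tutorial's own workflow.

```r
library(arules)

# Hypothetical toy baskets: four single-item transactions and one larger one
toy <- as(
  list("pear", "apple", "pear", c("apple", "pear"), "orange"),
  "transactions"
)

# Keep the single-item transactions, as in the tutorial
toy_1item <- toy[size(toy) == 1]

# Absolute counts per item, most frequent first
counts <- sort(itemFrequency(toy_1item, type = "absolute"), decreasing = TRUE)
counts

# The top item's count agrees with its diagonal cell in the cross table
ct <- crossTable(toy_1item)
top_item <- names(counts)[1]
ct[top_item, top_item]
```

Reading the counts from itemFrequency() avoids building the full item-by-item matrix when only the diagonal is of interest.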
In this tutorial we have learnt how to select only the single-item transactions from a transaction file using the libraries arules and arulesViz. The same logic can be extended to filter, subset and select transactions under other conditions in order to study cross-selling patterns.