The goal of this tutorial is to filter the single-item purchases in a market basket analysis using the R libraries arules and arulesViz. Once the logic is clear, the same procedure can be used to study transactions with any number n of items.
# We need to load two libraries to perform this task
library(arules)
library(arulesViz)
# In this tutorial we use a transactions file in basket format:
# one transaction per line, with the items separated by commas.
# The file looks like:
# apple, orange, pear
# apple, pear
# orange
# ...
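# First we load the data. format = "basket" matches the layout above, and
# rm.duplicates = TRUE drops items repeated within the same transaction.
products <- read.transactions("transactions.csv", sep = ",", format = "basket", rm.duplicates = TRUE)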
## Warning in readLines(file, encoding = encoding): incomplete final line
## found on 'transactions.csv'
## distribution of transactions with duplicates:
## items
##   1   2
## 191  10
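To see how `read.transactions()` turns a basket file into transactions, here is a minimal sketch on a tiny made-up file written to a temporary path (toy data, not the tutorial's `transactions.csv`). The second line repeats `pear` on purpose so that `rm.duplicates = TRUE` has something to remove; it should print a duplicates report similar to the one above.

```r
library(arules)

# Toy basket file (assumed data): one transaction per line, comma-separated items
basket_file <- tempfile(fileext = ".csv")
writeLines(c("apple,orange,pear",
             "apple,pear,pear",   # "pear" is repeated on purpose
             "orange"),
           basket_file)

# Same arguments as the tutorial, with format passed by name
toy <- read.transactions(basket_file, format = "basket", sep = ",",
                         rm.duplicates = TRUE)

length(toy)      # number of transactions
itemLabels(toy)  # distinct items found in the file
size(toy)        # items per transaction after duplicates are removed
```

Because `writeLines()` ends the file with a newline, this toy file should not trigger the "incomplete final line" warning seen above.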
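# Keep a random sample of 1000 transactions
# (sample() is random: call set.seed() first if you need reproducible results)
products <- sample(products, 1000)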
# Transaction data is not a regular data frame, so we must use functions from arules to inspect, extract and filter it.
# To get a summary of the transactions:
summary(products)
## transactions as itemMatrix in sparse format with
## 1000 rows (elements/itemsets/transactions) and
## 125 columns (items) and a density of 0.033784
##
## most frequent items:
##                     iMac                HP Laptop CYBERPOWER Gamer Desktop
##                      265                      186                      179
##            Apple Earpods        Apple MacBook Air                  (Other)
##                      166                      144                     3283
##
## element (itemset/transaction) length distribution:
## sizes
##   0   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  17
##   1 240 161 123  94 110  65  59  32  37  21  19  11   8   7   3   2   3
##  19  21  26  27
##   1   1   1   1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   0.000   2.000   3.000   4.223   6.000  27.000
##
## includes extended item information - examples:
##                              labels
## 1 1TB Portable External Hard Drive
## 2 2TB Portable External Hard Drive
## 3                   3-Button Mouse
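The figures reported by `summary()` can also be pulled out as ordinary R objects, which is handy when you want to sort or filter them. A minimal sketch on a few made-up baskets (toy data, not the tutorial's file): `dim()` gives transactions by items, `itemFrequency(type = "absolute")` gives the per-item counts behind the "most frequent items" block, and `table(size())` gives the length distribution.

```r
library(arules)

# Toy baskets (assumed data) coerced from a list of character vectors
toy <- as(list(c("iMac", "Apple Earpods"),
               c("iMac"),
               c("HP Laptop"),
               c("iMac", "HP Laptop", "Apple Earpods")),
          "transactions")

dim(toy)  # number of transactions, number of items

# Per-item counts, most frequent first ("most frequent items" in summary())
item_counts <- itemFrequency(toy, type = "absolute")
sort(item_counts, decreasing = TRUE)

# Transaction length distribution ("sizes" in summary())
size_dist <- table(size(toy))
size_dist
```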
# To plot the 5 most frequent items
itemFrequencyPlot(products, topN = 5)
# Plot the individual transactions as a transaction-by-item matrix
image(sample(products, 100)) # Sample a small number of transactions to keep the image readable
# To see an individual transaction (here the 5th one)
inspect(products[5, ])
## items
## [1] {Dell 2 Desktop}
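`inspect()` prints a transaction, but when you need its items as data (for example to compare or count them), coercing to a list gives plain character vectors. A short sketch on two made-up transactions (toy data, assumed):

```r
library(arules)

# Toy transactions (assumed data)
toy <- as(list(c("Dell 2 Desktop"),
               c("iMac", "Apple Earpods")),
          "transactions")

inspect(toy[2])                     # printed view, as used in the tutorial
items_2 <- as(toy[2], "list")[[1]]  # the same items as a character vector
items_2
length(items_2)                     # equals size(toy[2])
```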
# To see the number of items per transaction (first 100 transactions)
size(products)[1:100]
##   [1]  6  1  1  3  1  1  1  5  9  1  1  7  9  2  2  4  7  2 19 12  1 12  2
##  [24]  4  4  3  9  3  3  2  2  3  5  7  2  6  5  8  4  1 10  1  2  2  3  2
##  [47]  6  7  4  5  1  4  1  3  1  5  5  2  5  3  3  6  1  4  2  2  1  3  5
##  [70]  6  2 13  1  3  2  4  6  8  7 11  1  1 10  9  1  2  1  7  6  1  7 11
##  [93]  7  1  9  1  8  2  7  1
# Now we can use size() to keep only the 1-item transactions
products_1item <- products[which(size(products) == 1), ]
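The filter above depends only on `size()`, so it extends directly to baskets of any size n, which is how the introduction's n-item transactions can be studied. A sketch under that reading, where `keep_size()` is a hypothetical helper introduced only for illustration and the baskets are toy data:

```r
library(arules)

# Hypothetical helper: keep only the transactions with exactly n items
keep_size <- function(trans, n) {
  trans[which(size(trans) == n)]
}

# Toy baskets (assumed data)
toy <- as(list(c("iMac"),
               c("iMac", "Apple Earpods"),
               c("HP Laptop"),
               c("HP Laptop", "Apple Earpods", "3-Button Mouse")),
          "transactions")

one_item   <- keep_size(toy, 1)           # same filter as products_1item
two_items  <- keep_size(toy, 2)
multi_item <- toy[which(size(toy) >= 2)]  # any comparison on size() works
c(one = length(one_item), two = length(two_items), multi = length(multi_item))
```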
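# Now create the cross table for the 1-item transactions (see ?crossTable).
# crossTable() counts how many transactions contain each pair of items; the
# diagonal holds the count for each item on its own.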
my_crosstable <- crossTable(products_1item)
my_crosstable[10:14, 10:14]
##                          Alienware Laptop AOC Monitor
## Alienware Laptop                        2           0
## AOC Monitor                             0           1
## APIE Bluetooth Headphone                0           0
## Apple Earpods                           0           0
## Apple MacBook Air                       0           0
##                          APIE Bluetooth Headphone Apple Earpods
## Alienware Laptop                                0             0
## AOC Monitor                                     0             0
## APIE Bluetooth Headphone                        2             0
## Apple Earpods                                   0            13
## Apple MacBook Air                               0             0
##                          Apple MacBook Air
## Alienware Laptop                         0
## AOC Monitor                              0
## APIE Bluetooth Headphone                 0
## Apple Earpods                            0
## Apple MacBook Air                       35
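Only the diagonal of this table is non-zero, and that is expected: `crossTable()` counts how many transactions contain each pair of items, and in 1-item transactions no two items ever appear together. The diagonal therefore holds how many times each item was bought alone, the same numbers `itemFrequency(type = "absolute")` returns. A small check on toy data (assumed, not the tutorial's file):

```r
library(arules)

# Toy baskets (assumed data): three single-item baskets and one pair
toy <- as(list(c("iMac"), c("iMac"), c("Apple Earpods"),
               c("iMac", "Apple Earpods")),
          "transactions")

singles <- toy[which(size(toy) == 1)]
ct <- crossTable(singles)   # co-occurrence counts, items on both dimensions

# No pair of items co-occurs in a 1-item transaction
off_diagonal <- ct[row(ct) != col(ct)]
all(off_diagonal == 0)

# Diagonal = times each item was bought alone
diag(ct)
itemFrequency(singles, type = "absolute")
```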
# We can now find the product most often bought on its own
itemFrequencyPlot(products_1item, topN = 1)
# And find how many times it was sold as a single product
my_crosstable["Apple MacBook Air", "Apple MacBook Air"]
## [1] 35
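To rank all products by how often they were bought alone, rather than only the top one, you can sort the single-item counts. Dividing them by each item's total count gives a "share bought alone" ratio; this ratio is an extra metric assumed here for illustration, not something the tutorial computes, and the baskets are toy data:

```r
library(arules)

# Toy baskets (assumed data)
toy <- as(list(c("Apple MacBook Air"), c("Apple MacBook Air"),
               c("iMac"), c("iMac", "Apple Earpods"),
               c("iMac", "Apple MacBook Air")),
          "transactions")

singles <- toy[which(size(toy) == 1)]
alone <- itemFrequency(singles, type = "absolute")  # bought on its own
total <- itemFrequency(toy, type = "absolute")      # bought at all

sort(alone, decreasing = TRUE)  # ranking behind itemFrequencyPlot(..., topN = 1)

# Assumed metric: fraction of an item's purchases that were single-item baskets
share_alone <- alone / total
round(sort(share_alone, decreasing = TRUE), 2)
```

Both vectors cover the same items in the same order, because selecting rows of a transactions object keeps all item columns.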
In this tutorial we have learned how to select only the 1-item transactions from a transaction file using the arules and arulesViz libraries. The same logic can be extended to filter, subset and select transactions on other conditions in order to study cross-selling patterns.
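As one example of filtering on another condition, arules provides a `subset()` method that, for transactions, can filter on an `items %in% ...` expression, and it combines naturally with the `size()` filter. A sketch on toy data (assumed), selecting every basket that contains an iMac and then the iMacs bought on their own:

```r
library(arules)

# Toy baskets (assumed data)
toy <- as(list(c("iMac"),
               c("iMac", "Apple Earpods"),
               c("HP Laptop"),
               c("Apple Earpods")),
          "transactions")

# Every transaction that contains an iMac
with_imac <- subset(toy, items %in% "iMac")
inspect(with_imac)

# Combine with the size() filter: iMacs bought on their own
imac_alone <- with_imac[which(size(with_imac) == 1)]
length(imac_alone)
```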