The goal of this tutorial is to remove an item from a transaction file when the item is not interesting for our analysis. This matters because removing an item can change both the number of rules we obtain and the probabilities attached to them.
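To make the effect on the probabilities concrete, here is a minimal base R sketch on a hypothetical toy list of baskets (not the tutorial's data). The helper `support()` and the variable names are assumptions introduced for illustration. It suggests that the support of the remaining itemsets moves only if the baskets left empty by the removal are also dropped, because that changes the number of transactions in the denominator.

```r
# Toy baskets (hypothetical data, not taken from Transactions_all.csv)
baskets <- list(
  c("apple", "orange", "pear"),
  c("apple", "pear"),
  c("orange"),
  c("apple", "orange")
)

# Share of baskets that contain every item in `items`
support <- function(items, baskets) {
  mean(vapply(baskets, function(b) all(items %in% b), logical(1)))
}

# Remove "orange" from every basket; the third basket becomes empty
no_orange <- lapply(baskets, setdiff, "orange")

# Variant that also drops the baskets left empty by the removal
no_orange_nonempty <- no_orange[lengths(no_orange) > 0]

# Support of {apple, pear} in the three versions of the data
support(c("apple", "pear"), baskets)
support(c("apple", "pear"), no_orange)
support(c("apple", "pear"), no_orange_nonempty)

# The removed item can no longer take part in any rule
support("orange", no_orange)
```

In both cases every rule that involved the removed item disappears, so the number of rules can change even when the supports of the other itemsets do not.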
# We need to load two libraries to perform this task
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(arulesViz)
## Loading required package: grid
# In this tutorial we use the transactions ("basket") file format:
# one transaction per line, with items separated by commas.
# The file looks like:
# apple, orange, pear
# apple, pear
# orange
# etc
# First we load the data
products <- read.transactions("Transactions_all.csv", format = "basket", sep = ",", rm.duplicates = TRUE)
## Warning in readLines(file, encoding = encoding): incomplete final line
## found on 'Transactions_all.csv'
## distribution of transactions with duplicates:
## items
## 1 2
## 191 10
# Take a random sample of 1000 transactions (no seed is set, so the sample differs between runs)
products <- sample(products, 1000)
summary(products)
## transactions as itemMatrix in sparse format with
## 1000 rows (elements/itemsets/transactions) and
## 125 columns (items) and a density of 0.035032
##
## most frequent items:
##                     iMac                HP Laptop CYBERPOWER Gamer Desktop
##                      258                      199                      187
##        Apple MacBook Air            Apple Earpods                  (Other)
##                      176                      166                     3393
##
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  18  22
## 208 165 147 117  83  59  49  44  28  29  28  15   8   7   4   4   2   1
##  27
##   2
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##   1.000   2.000   3.000   4.379   6.000  27.000
##
## includes extended item information - examples:
## labels
## 1 1TB Portable External Hard Drive
## 2 2TB Portable External Hard Drive
## 3 3-Button Mouse
# To edit individual items, we also load the same file as a data frame
products_df <- read.csv("Transactions_all.csv", header = FALSE, sep = ",")
head(products_df)
## V1 V2 V3
## 1 Acer Aspire Brother Printer Toner Belkin Mouse Pad
## 2 Dell Desktop Lenovo Desktop Computer Apple Wireless Keyboard
## 3 iMac
## 4 Acer Desktop Lenovo Desktop Computer Intel Desktop
## 5 HP Laptop iMac Epson Black Ink
## 6 iMac ASUS Monitor Lenovo Desktop Computer
## V4 V5 V6 V7 V8 V9 V10 V11 V12
## 1 VGA Monitor Cable
## 2
## 3
## 4 XIBERIA Gaming Headset
## 5 ASUS Desktop
## 6 Mackie CR Speakers Gaming Mouse Professional
## V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30
## 1
## 2
## 3
## 4
## 5
## 6
## V31 V32
## 1
## 2
## 3
## 4
## 5
## 6
# We can now remove a certain product from our list
products_df[products_df == "Acer Aspire"] <- ""
# Now we save the data frame with write.table, as in the tutorial <<Save dataframe without column names>>, and read it back as transactions
write.table(products_df, file = "noAcer.csv", col.names = FALSE, row.names = FALSE, sep = ",")
products <- read.transactions("noAcer.csv", format = "basket", sep = ",", rm.duplicates = TRUE)
## distribution of transactions with duplicates:
## items
## 1 2
## 191 10
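The replace-and-save step can be checked on a small scale before running it on the full file. The sketch below uses a hypothetical two-column data frame and a temporary file; `toy_df` and `out_file` are names made up for illustration. It shows what `write.table` puts on disk after the cells have been blanked: strings are quoted by default, and blanked cells become empty quoted fields.

```r
# Toy wide data frame (hypothetical), one basket per row, "" for unused slots
toy_df <- data.frame(
  V1 = c("Acer Aspire", "Dell Desktop", "iMac"),
  V2 = c("Belkin Mouse Pad", "Acer Aspire", ""),
  stringsAsFactors = FALSE
)

# Blank out every cell that holds the unwanted item
toy_df[toy_df == "Acer Aspire"] <- ""

# Write without header or row names, as in the tutorial
out_file <- tempfile(fileext = ".csv")
write.table(toy_df, file = out_file, col.names = FALSE, row.names = FALSE, sep = ",")

# Inspect the raw lines that were written
raw_lines <- readLines(out_file)
print(raw_lines)

# Confirm the item no longer appears anywhere in the file
any(grepl("Acer Aspire", raw_lines, fixed = TRUE))
```

A row whose only item was the removed product ends up with no items at all, which is consistent with the transactions of size 0 reported by `summary(products)` after the reload.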
summary(products)
## transactions as itemMatrix in sparse format with
## 9835 rows (elements/itemsets/transactions) and
## 124 columns (items) and a density of 0.03467701
##
## most frequent items:
##                     iMac                HP Laptop CYBERPOWER Gamer Desktop
##                     2519                     1909                     1809
##            Apple Earpods        Apple MacBook Air                  (Other)
##                     1715                     1530                    32808
##
## element (itemset/transaction) length distribution:
## sizes
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14
##   17 2196 1659 1320 1015  854  648  539  427  340  231  161  112   77   67
##   15   16   17   18   19   20   21   22   23   25   26   27   28   29
##   54   37   21   17   13    8    8    5    2    1    1    3    1    1
##
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
##     0.0     2.0     3.0     4.3     6.0    29.0
##
## includes extended item information - examples:
## labels
## 1 1TB Portable External Hard Drive
## 2 2TB Portable External Hard Drive
## 3 3-Button Mouse
head(colnames(products), 10)
## [1] "1TB Portable External Hard Drive" "2TB Portable External Hard Drive"
## [3] "3-Button Mouse" "3TB Portable External Hard Drive"
## [5] "5TB Desktop Hard Drive" "Acer Desktop"
## [7] "Acer Monitor" "Ailihen Stereo Headphones"
## [9] "Alienware Laptop" "AOC Monitor"
# Now the Acer Aspire has been deleted from the transactions and we can create new rules
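As a minimal sketch of that next step, the block below mines rules with `apriori()` from a hypothetical toy list of baskets after removing the item. The baskets, the variable names, and the support and confidence thresholds are all assumptions chosen so that the toy data yields a few rules; they are not values from this tutorial.

```r
library(arules)

# Hypothetical toy baskets standing in for the real transactions
baskets <- list(
  c("iMac", "HP Laptop", "Acer Aspire"),
  c("iMac", "HP Laptop"),
  c("iMac", "Acer Aspire"),
  c("HP Laptop", "Apple Earpods"),
  c("iMac", "Apple Earpods", "Acer Aspire"),
  c("iMac", "HP Laptop", "Apple Earpods")
)

# Drop the unwanted item from every basket, then coerce to transactions
cleaned <- lapply(baskets, setdiff, "Acer Aspire")
toy_trans <- as(cleaned, "transactions")

# The removed item should no longer be among the item labels
"Acer Aspire" %in% itemLabels(toy_trans)

# Mine rules; the thresholds are illustrative assumptions
rules <- apriori(
  toy_trans,
  parameter = list(support = 0.3, confidence = 0.5, minlen = 2),
  control = list(verbose = FALSE)
)

# Show the rules ordered by lift
inspect(sort(rules, by = "lift"))
```

On the tutorial's data, the `products` object re-read from `noAcer.csv` would take the place of `toy_trans`, with thresholds tuned to the size of the data set.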
In this tutorial we have learnt how to remove one item from transactional data so that rules can be studied without that item.