1 Goal


The goal of this tutorial is to remove an item from a transaction file when that item is not interesting for our analysis. This matters because an uninteresting item still takes part in rule mining: removing it changes which rules are generated and the figures we report for them.


2 Loading the data


# We need to load two libraries to perform this task
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
## Loading required package: grid
# In this tutorial we are going to use the transactions (basket) file format: one transaction per line.
# The file looks like:
# apple, orange, pear
# apple, pear
# orange
# etc

# First we load the data
products <- read.transactions("Transactions_all.csv", sep = ",", format = "basket", rm.duplicates = TRUE)
## Warning in readLines(file, encoding = encoding): incomplete final line
## found on 'Transactions_all.csv'
## distribution of transactions with duplicates:
## items
##   1   2 
## 191  10
products <- sample(products, 1000)
summary(products)
## transactions as itemMatrix in sparse format with
##  1000 rows (elements/itemsets/transactions) and
##  125 columns (items) and a density of 0.035032 
## 
## most frequent items:
##                     iMac                HP Laptop CYBERPOWER Gamer Desktop 
##                      258                      199                      187 
##        Apple MacBook Air            Apple Earpods                  (Other) 
##                      176                      166                     3393 
## 
## element (itemset/transaction) length distribution:
## sizes
##   1   2   3   4   5   6   7   8   9  10  11  12  13  14  15  16  18  22 
## 208 165 147 117  83  59  49  44  28  29  28  15   8   7   4   4   2   1 
##  27 
##   2 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.000   2.000   3.000   4.379   6.000  27.000 
## 
## includes extended item information - examples:
##                             labels
## 1 1TB Portable External Hard Drive
## 2 2TB Portable External Hard Drive
## 3                   3-Button Mouse
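The sample() call above draws a random subset, so the summary figures will differ from run to run. A minimal sketch of making a random draw repeatable with set.seed(), shown in base R on a hypothetical vector of row indices (the name rows and the seed value 123 are assumptions for illustration, not part of the original analysis). Calling set.seed() before sample(products, 1000) should make that subset repeatable in the same way.

```r
# Hypothetical stand-in for the transactions: 20 row indices
rows <- 1:20

set.seed(123)                  # any fixed integer works; 123 is arbitrary
first_draw <- sample(rows, 5)

set.seed(123)                  # same seed again
second_draw <- sample(rows, 5)

# Both draws are the same because the seed was reset before each one
identical(first_draw, second_draw)   # TRUE
```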

3 Removing one item from the dataset


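# We need to load the data as a data frame.
# stringsAsFactors = FALSE keeps the items as character strings (before R 4.0 the default was factors)
products_df <- read.csv("Transactions_all.csv", header = FALSE, sep = ",", stringsAsFactors = FALSE)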
head(products_df)
##             V1                      V2                      V3
## 1  Acer Aspire   Brother Printer Toner        Belkin Mouse Pad
## 2 Dell Desktop Lenovo Desktop Computer Apple Wireless Keyboard
## 3         iMac                                                
## 4 Acer Desktop Lenovo Desktop Computer          Intel Desktop 
## 5    HP Laptop                    iMac         Epson Black Ink
## 6         iMac            ASUS Monitor Lenovo Desktop Computer
##                       V4                        V5 V6 V7 V8 V9 V10 V11 V12
## 1      VGA Monitor Cable                                                  
## 2                                                                         
## 3                                                                         
## 4 XIBERIA Gaming Headset                                                  
## 5           ASUS Desktop                                                  
## 6     Mackie CR Speakers Gaming Mouse Professional                        
##   V13 V14 V15 V16 V17 V18 V19 V20 V21 V22 V23 V24 V25 V26 V27 V28 V29 V30
## 1                                                                        
## 2                                                                        
## 3                                                                        
## 4                                                                        
## 5                                                                        
## 6                                                                        
##   V31 V32
## 1        
## 2        
## 3        
## 4        
## 5        
## 6
# We can now remove a certain product from our list by replacing it with an empty string
products_df[products_df == "Acer Aspire"] <- ""
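The replacement above relies on logical-matrix indexing of a data frame. A minimal sketch of the same step on a small, hypothetical data frame (toy_df and its contents are made up for illustration and only mimic the wide layout of products_df):

```r
# Hypothetical toy data in the same wide layout as products_df
toy_df <- data.frame(
  V1 = c("Acer Aspire", "iMac", "HP Laptop"),
  V2 = c("iMac", "Acer Aspire", ""),
  V3 = c("", "HP Laptop", ""),
  stringsAsFactors = FALSE
)

# toy_df == "Acer Aspire" is a logical matrix marking every matching cell
sum(toy_df == "Acer Aspire")   # 2 matching cells before removal

# Blank out the matching cells; all other cells are left untouched
toy_df[toy_df == "Acer Aspire"] <- ""

sum(toy_df == "Acer Aspire")   # 0 matching cells after removal
```

Because the item is replaced by an empty string rather than shifted out, a transaction that contained only that item becomes an empty row.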

# Now we save the data frame with write.table, as in the tutorial <<Save dataframe without column names>>

write.table(products_df, file = "noAcer.csv", col.names = FALSE, row.names = FALSE, sep = ",")
products <- read.transactions("noAcer.csv", sep = ",", format = "basket", rm.duplicates = TRUE)
## distribution of transactions with duplicates:
## items
##   1   2 
## 191  10
summary(products)
## transactions as itemMatrix in sparse format with
##  9835 rows (elements/itemsets/transactions) and
##  124 columns (items) and a density of 0.03467701 
## 
## most frequent items:
##                     iMac                HP Laptop CYBERPOWER Gamer Desktop 
##                     2519                     1909                     1809 
##            Apple Earpods        Apple MacBook Air                  (Other) 
##                     1715                     1530                    32808 
## 
## element (itemset/transaction) length distribution:
## sizes
##    0    1    2    3    4    5    6    7    8    9   10   11   12   13   14 
##   17 2196 1659 1320 1015  854  648  539  427  340  231  161  112   77   67 
##   15   16   17   18   19   20   21   22   23   25   26   27   28   29 
##   54   37   21   17   13    8    8    5    2    1    1    3    1    1 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     0.0     2.0     3.0     4.3     6.0    29.0 
## 
## includes extended item information - examples:
##                             labels
## 1 1TB Portable External Hard Drive
## 2 2TB Portable External Hard Drive
## 3                   3-Button Mouse
head(colnames(products), 10)
##  [1] "1TB Portable External Hard Drive" "2TB Portable External Hard Drive"
##  [3] "3-Button Mouse"                   "3TB Portable External Hard Drive"
##  [5] "5TB Desktop Hard Drive"           "Acer Desktop"                    
##  [7] "Acer Monitor"                     "Ailihen Stereo Headphones"       
##  [9] "Alienware Laptop"                 "AOC Monitor"
# Now the Acer Aspire has been deleted from the transactions and we can create new rules
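As an alternative to the round trip through a CSV file, the item can also be dropped directly from a transactions object by subsetting its columns. This is a sketch under assumptions, not the method used in this tutorial: the toy baskets and the names toy_trans, keep and toy_no_acer are hypothetical, and the same subsetting would have to be applied to products in a real analysis.

```r
library(arules)

# Hypothetical toy baskets, used only for illustration
baskets <- list(
  c("Acer Aspire", "iMac"),
  c("iMac", "HP Laptop"),
  c("Acer Aspire")
)
toy_trans <- as(baskets, "transactions")

# Keep every item (column) whose label is not the one we want to drop
keep <- !(itemLabels(toy_trans) %in% "Acer Aspire")
toy_no_acer <- toy_trans[, keep]

# The remaining item labels no longer include "Acer Aspire"
itemLabels(toy_no_acer)

# Optionally drop transactions that became empty after the removal
toy_no_acer <- toy_no_acer[size(toy_no_acer) > 0]
```

Dropping the empty transactions is a design choice: it changes the total number of transactions, and therefore the support of every remaining itemset, so it should be done deliberately.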

4 Conclusion


In this tutorial we have learnt how to remove one item from transactional data so that we can study the rules without that item's influence.