1 Goal


The goal of this tutorial is to build a data frame that lists every product found in a transactions file together with its sales volume, that is, the number of transactions in which the product appears.


2 Loading the data


# We need to load two libraries to perform this task
library(arules)
## Loading required package: Matrix
## 
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
## 
##     abbreviate, write
library(arulesViz)
## Loading required package: grid
# In this tutorial we use the basket transactions file format:
# one transaction per line, with the items separated by commas.
# The file looks like:
# apple, orange, pear
# apple, pear
# orange
# etc

# First we load the data; rm.duplicates = TRUE drops items repeated within a transaction
products <- read.transactions("transactions.csv", format = "basket", sep = ",", rm.duplicates = TRUE)
## Warning in readLines(file, encoding = encoding): incomplete final line
## found on 'transactions.csv'
## distribution of transactions with duplicates:
## items
##   1   2 
## 191  10
# Then we keep a random sample of 1000 transactions
products <- sample(products, 1000)
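
Because `sample()` draws transactions at random, the figures in the rest of this tutorial can change from one run to the next. A minimal sketch of how the draw could be made repeatable, assuming a fixed seed is acceptable (the seed value 123 and the toy baskets are illustrative and not part of the original data):

```r
library(arules)

# Toy transactions standing in for transactions.csv (hypothetical data)
baskets <- list(
  c("apple", "orange", "pear"),
  c("apple", "pear"),
  c("orange"),
  c("apple")
)
toy <- as(baskets, "transactions")

# Fixing the seed before sampling makes the random draw repeatable
set.seed(123)
first_draw <- sample(toy, 2)

set.seed(123)
second_draw <- sample(toy, 2)

# Both draws contain the same transactions
identical(as(first_draw, "list"), as(second_draw, "list"))
```

Note that sampling without replacement needs at least as many transactions as the requested size, so `sample(products, 1000)` only works when the file holds 1000 transactions or more.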

3 Create dataframe with names of products and volume of purchases


# We can ask for the relative frequency of every product, i.e. the share of transactions that contain it
head(itemFrequency(products))
## 1TB Portable External Hard Drive 2TB Portable External Hard Drive 
##                            0.004                            0.005 
##                   3-Button Mouse 3TB Portable External Hard Drive 
##                            0.072                            0.004 
##           5TB Desktop Hard Drive                      Acer Aspire 
##                            0.003                            0.085
# If we multiply the frequency by the total number of transactions we get the volume of purchases
head(itemFrequency(products) * nrow(products))
## 1TB Portable External Hard Drive 2TB Portable External Hard Drive 
##                                4                                5 
##                   3-Button Mouse 3TB Portable External Hard Drive 
##                               72                                4 
##           5TB Desktop Hard Drive                      Acer Aspire 
##                                3                               85
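
As a cross-check, `itemFrequency()` also accepts `type = "absolute"`, which returns the counts directly instead of the relative frequency. A small sketch on toy transactions that mirror the file format described earlier (the baskets and the names `toy`, `by_product` and `absolute` are made up for illustration):

```r
library(arules)

# Toy transactions: apple, orange and pear baskets (hypothetical data)
baskets <- list(
  c("apple", "orange", "pear"),
  c("apple", "pear"),
  c("orange")
)
toy <- as(baskets, "transactions")

# Relative frequency multiplied by the number of transactions ...
by_product <- itemFrequency(toy) * length(toy)

# ... compared with the absolute counts returned by arules
absolute <- itemFrequency(toy, type = "absolute")

# Check that both routes agree, up to floating point tolerance
all.equal(unname(by_product), unname(absolute))
```

Either route should give the same volumes; the absolute form simply avoids the extra multiplication.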
# Now we store this information in a data frame
products_df <- as.data.frame(itemFrequency(products) * nrow(products))
colnames(products_df) <- "Volume"
head(products_df)
##                                  Volume
## 1TB Portable External Hard Drive      4
## 2TB Portable External Hard Drive      5
## 3-Button Mouse                       72
## 3TB Portable External Hard Drive      4
## 5TB Desktop Hard Drive                3
## Acer Aspire                          85
# To have the product names as a variable, we copy them from the row names,
# move that column to the first position and then drop the row names
products_df$Product_name <- row.names(products_df)
products_df <- products_df[c(2, 1)]
rownames(products_df) <- NULL
head(products_df)
##                       Product_name Volume
## 1 1TB Portable External Hard Drive      4
## 2 2TB Portable External Hard Drive      5
## 3                   3-Button Mouse     72
## 4 3TB Portable External Hard Drive      4
## 5           5TB Desktop Hard Drive      3
## 6                      Acer Aspire     85
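
Once the names and volumes sit in ordinary columns, the data frame can be handled with base R like any other. As one possible follow-up, the sketch below orders the products by volume; it rebuilds only the six rows printed above, so it is an illustration and not the full result:

```r
# First six rows of products_df, copied from the output above
products_df <- data.frame(
  Product_name = c(
    "1TB Portable External Hard Drive",
    "2TB Portable External Hard Drive",
    "3-Button Mouse",
    "3TB Portable External Hard Drive",
    "5TB Desktop Hard Drive",
    "Acer Aspire"
  ),
  Volume = c(4, 5, 72, 4, 3, 85),
  stringsAsFactors = FALSE
)

# Order the rows from the highest to the lowest volume
top_products <- products_df[order(products_df$Volume, decreasing = TRUE), ]
rownames(top_products) <- NULL

head(top_products, 3)
```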

4 Conclusion


In this tutorial we have learnt how to build a data frame with the product names and their purchase volumes from a transactions file.