Background

In this paper I present a market basket analysis of data web-scraped from the portal Otomoto.pl. (The scraping process is described in my other paper.)

Data

The basket analysis is performed on three variables describing each car offered for sale: mark (the make of the car), age and location.

library(arules)
library(arulesViz)

# `data` is the data frame produced by the web-scraping step
cars_df <- data[c("mark", "location", "age")]

# convert every column to a factor so that each level becomes one item
cars_df$mark     <- as.factor(cars_df$mark)
cars_df$location <- as.factor(cars_df$location)
cars_df$age      <- as.factor(cars_df$age)

For basket analysis the data must be converted into ‘transactions’ form. Because all columns are factors, the data frame can be coerced directly:

trans <- as(cars_df, "transactions")

inspect(trans)

This gives the following result:

1860 transactions with 54 columns

Example transactions (items, followed by the transaction ID):
[495] {mark=Citroën,location=Zachodniopomorskie,age=9} 495
[496] {mark=Peugeot,location=Wielkopolskie,age=10} 496
[497] {mark=Peugeot,location=Zachodniopomorskie,age=10} 497
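
To make the transaction format more concrete, the following base-R sketch mimics what the coercion does conceptually: every row becomes a basket of "column=value" items, and every distinct item becomes one column of a binary matrix. The data frame `cars_toy` and its values are made up for illustration, and the real transactions object stores this information as a sparse matrix rather than the dense one built here.

```r
# Toy stand-in for the scraped data (values are made up for illustration)
cars_toy <- data.frame(
  mark     = c("Peugeot", "Peugeot", "Seat"),
  location = c("Wielkopolskie", "Zachodniopomorskie", "Zachodniopomorskie"),
  age      = c(10, 10, 9),
  stringsAsFactors = FALSE
)

# Each row becomes a basket of "column=value" items
baskets <- lapply(seq_len(nrow(cars_toy)), function(i) {
  paste0(names(cars_toy), "=", unlist(cars_toy[i, ]))
})

# The item universe is the set of distinct "column=value" labels;
# its size plays the role of the number of columns in the transactions object
items <- sort(unique(unlist(baskets)))

# Binary incidence matrix: one row per transaction, one column per item
incidence <- t(sapply(baskets, function(b) items %in% b))
colnames(incidence) <- items

print(baskets[[1]])
print(dim(incidence))
```

Each row of `incidence` contains exactly three TRUE values, one per variable, which is why every transaction in the example above lists exactly one mark, one location and one age.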

Analysis

The analysis is run with the apriori() function. With a minimum support of 0.002 (and the default minimum confidence of 0.8) it retrieves 11 rules:

trans_rules<-apriori(trans, parameter = list(supp=.002))
inspect(trans_rules[1:11])

     lhs                                           rhs      support     confidence lift     count
[1]  {mark=Saab}                                => {age=10} 0.002143623 1.0        2.677188 4
[2]  {mark=Subaru}                              => {age=10} 0.002143623 0.8        2.141750 4
[3]  {mark=Dodge}                               => {age=10} 0.002143623 0.8        2.141750 4
[4]  {mark=Mini}                                => {age=10} 0.002679528 1.0        2.677188 5
[5]  {mark=Audi}                                => {age=9}  0.002143623 0.8        2.985600 4
[6]  {mark=BMW,location=Pomorskie}              => {age=10} 0.002143623 1.0        2.677188 4
[7]  {mark=Volvo,location=Dolnośląskie}         => {age=10} 0.002143623 1.0        2.677188 4
[8]  {mark=Mazda,location=Świętokrzyskie}       => {age=10} 0.002143623 1.0        2.677188 4
[9]  {mark=Hyundai,location=Zachodniopomorskie} => {age=10} 0.002143623 0.8        2.141750 4
[10] {mark=Seat,location=Kujawsko-pomorskie}    => {age=9}  0.002143623 1.0        3.732000 4
[11] {mark=Peugeot,location=Kujawsko-pomorskie} => {age=9}  0.002143623 0.8        2.985600 4

lhs (left-hand side): the items already in the basket, i.e. the antecedent of the rule.
rhs (right-hand side): the item that the lhs points to, i.e. the consequent of the rule.
lift: a value above 1.0 means that the lhs and rhs occur together more often than they would if they were independent, so the rule is worth considering.

For example, rule [1] says that if the mark of the car is Saab, then with confidence 1.0 the car is 10 years old. Rules like this one are probably an artefact of the small number of supporting observations in our dataset (fewer than 6 per rule).
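
The three quality measures can be recomputed by hand from raw counts. The sketch below does this for rule [1], {mark=Saab} => {age=10}. Only the count of 4 and the confidence of 1.0 are read directly from the table. The total number of transactions (`n`) and the number of 10-year-old cars (`n_rhs`) are assumptions, back-calculated from the reported support and lift. The implied total differs slightly from the 1860 transactions quoted in the Data section, so treat these numbers as illustrative.

```r
# Counts for rule [1] {mark=Saab} => {age=10}
n_both <- 4     # transactions containing both sides (the "count" column)
n_lhs  <- 4     # transactions containing mark=Saab (count / confidence = 4 / 1.0)
n      <- 1866  # ASSUMPTION: total transactions, inferred from support = count / n
n_rhs  <- 697   # ASSUMPTION: transactions with age=10, inferred from the lift

support    <- n_both / n                # share of all transactions matching the rule
confidence <- n_both / n_lhs            # share of lhs transactions that also contain rhs
lift       <- confidence / (n_rhs / n)  # confidence relative to the baseline rate of rhs

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

Seen this way, a lift of about 2.68 says that a 10-year-old car is roughly 2.7 times as likely among Saabs as among all listed cars, but the estimate rests on only four observations.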

To check whether any of the rules are redundant, we can call:

# flag redundant rules
redundant_rules <- is.redundant(trans_rules)
summary(redundant_rules)
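
As a rough illustration of what the redundancy check looks for, the base-R sketch below flags a rule when a more general rule (same rhs, fewer lhs items) has equal or higher confidence. It is a simplified re-implementation of that idea, not the arules code, and the three toy rules and their confidence values are hypothetical.

```r
# Toy rule set (hypothetical values, not taken from the analysis)
rules <- data.frame(
  rhs        = c("age=10", "age=10", "age=9"),
  confidence = c(0.9, 0.8, 1.0),
  stringsAsFactors = FALSE
)
rules$lhs <- list(
  c("mark=BMW"),
  c("mark=BMW", "location=Pomorskie"),
  c("mark=Seat", "location=Pomorskie")
)

# A rule is flagged when some other rule has the same rhs, a strictly
# smaller lhs contained in its own lhs, and equal or higher confidence
is_redundant <- sapply(seq_len(nrow(rules)), function(i) {
  any(sapply(seq_len(nrow(rules)), function(j) {
    j != i &&
      rules$rhs[j] == rules$rhs[i] &&
      length(rules$lhs[[j]]) < length(rules$lhs[[i]]) &&
      all(rules$lhs[[j]] %in% rules$lhs[[i]]) &&
      rules$confidence[j] >= rules$confidence[i]
  }))
})

# Keep only the non-redundant rules
pruned <- rules[!is_redundant, ]
print(is_redundant)
```

Here only the second rule is flagged, because adding the location to {mark=BMW} does not raise the confidence. With the real rule set, the flagged rules could be dropped in the same way with trans_rules[!redundant_rules].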

Plots

Plots make it easy to show how the items interact with each other.

# first 10 rules, in the order returned by apriori()
topRules <- trans_rules[1:10]
plot(topRules, method = "graph")

The graph above shows the first 10 rules. In the centre is the item {age=10}, which appears most often as the right-hand side in our rule set. Around it are the items that occur most frequently together with it.

Another way of visualising the interactions is a grouped matrix plot:

plot(topRules, method = "grouped")

We can also pick one item and check which other items interact with it. This can be done with the subset() function:

# take the subset of rules that contain the item age=10 (cars that are 10 years old)
ten_rules <- subset(trans_rules, items %in% "age=10")
inspect(ten_rules)

plot(ten_rules, method="graph", measure="lift",shading="confidence")

As we can see, the graph is very similar to the previous one, because eight of the first ten rules already have {age=10} as their right-hand side.

Associations and Basket Analysis on data from Otomoto.pl

Piotr Borowski

31 January 2019
