In this paper I present a basket analysis of data web-scraped from the portal Otomoto.pl (the scraping process is described in my other paper).
The basket analysis is performed on three variables describing the cars offered for sale: the make (column mark), the age and the location.
library(arules)
library(arulesViz)
# keep only the three columns used in the analysis
as <- data[c("mark", "location", "age")]
# convert each column to a factor
as$mark <- as.factor(as$mark)
as$location <- as.factor(as$location)
as$age <- as.factor(as$age)
For the basket analysis we need to transform the data into ‘transactions’ form. A data frame of factors can be coerced directly:
# each row becomes one transaction with the items mark=..., location=..., age=...
trans <- as(as, "transactions")
inspect(trans)
which gives us the result:
1860 transactions with 54 columns
Example:
[495] {mark=Citroën,location=Zachodniopomorskie,age=9} 495
[496] {mark=Peugeot,location=Wielkopolskie,age=10} 496
[497] {mark=Peugeot,location=Zachodniopomorskie,age=10} 497
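To make the ‘transactions’ form more concrete, here is a minimal base-R sketch of the idea behind the coercion: every cell of the data frame becomes an item labelled column=value, and each row becomes one basket of such items. The three-row data frame cars is a hypothetical sample that only mimics the structure of the scraped data, and the sketch illustrates the concept rather than how arules stores transactions internally.

```r
# Hypothetical three-row sample with the same columns as the scraped data
cars <- data.frame(
  mark     = c("Citroën", "Peugeot", "Peugeot"),
  location = c("Zachodniopomorskie", "Wielkopolskie", "Zachodniopomorskie"),
  age      = c(9, 10, 10)
)

# Label every cell as "column=value"; the result is a character matrix
# with one row per listing and one column per variable
item_matrix <- sapply(names(cars), function(col) paste0(col, "=", cars[[col]]))

# One basket (character vector of items) per listing
baskets <- split(item_matrix, row(item_matrix))

# Number of distinct items, i.e. the number of item columns
# a transactions object would have for this sample
n_items <- length(unique(as.vector(item_matrix)))

print(baskets[[2]])
print(n_items)
```

In the real data the same counting explains the number of columns reported above: it is the number of distinct mark, location and age values.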
The Apriori analysis is done with the function apriori(). With a minimum support of 0.002 and the default minimum confidence of 0.8 we retrieved 11 rules:
trans_rules <- apriori(trans, parameter = list(supp = 0.002))
inspect(trans_rules[1:11])
#    lhs                                           rhs      support     confidence lift     count
[1]  {mark=Saab}                                => {age=10} 0.002143623 1.0        2.677188 4
[2]  {mark=Subaru}                              => {age=10} 0.002143623 0.8        2.141750 4
[3]  {mark=Dodge}                               => {age=10} 0.002143623 0.8        2.141750 4
[4]  {mark=Mini}                                => {age=10} 0.002679528 1.0        2.677188 5
[5]  {mark=Audi}                                => {age=9}  0.002143623 0.8        2.985600 4
[6]  {mark=BMW,location=Pomorskie}              => {age=10} 0.002143623 1.0        2.677188 4
[7]  {mark=Volvo,location=Dolnośląskie}         => {age=10} 0.002143623 1.0        2.677188 4
[8]  {mark=Mazda,location=Świętokrzyskie}       => {age=10} 0.002143623 1.0        2.677188 4
[9]  {mark=Hyundai,location=Zachodniopomorskie} => {age=10} 0.002143623 0.8        2.141750 4
[10] {mark=Seat,location=Kujawsko-pomorskie}    => {age=9}  0.002143623 1.0        3.732000 4
[11] {mark=Peugeot,location=Kujawsko-pomorskie} => {age=9}  0.002143623 0.8        2.985600 4
In this output:
lhs (left-hand side) is our basket, i.e. the set of items whose presence suggests ‘choosing’ the rhs;
rhs (right-hand side) is the item implied by the lhs;
lift compares how often lhs and rhs occur together with how often they would if they were independent; a value above 1.0 means the rule is worth considering.
For example, rule [1] tells us that if the make of the car is Saab, then with confidence 1.0 the car is 10 years old. Such perfect confidence values are probably a result of the small number of observations behind each rule (fewer than 6 listings).
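The relationship between the columns can be checked by hand. The sketch below takes the reported values of rule [1] {mark=Saab} => {age=10} and uses the standard definitions confidence = count / count(lhs) and lift = confidence / support(rhs) to recover two numbers that the table does not show directly. The variable names are illustrative only.

```r
# Quality measures of rule [1] {mark=Saab} => {age=10}, copied from the table
confidence <- 1.0
lift       <- 2.677188
count      <- 4

# confidence = count / count(lhs), so the number of Saab listings is:
lhs_count <- count / confidence

# lift = confidence / support(rhs), so the share of all listings
# with age=10 is:
rhs_support <- confidence / lift

print(lhs_count)
print(round(rhs_support, 3))
```

In other words, all 4 Saab listings are 10 years old, while roughly 37% of all listings have that age; the rule is perfectly confident but rests on very few cars.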
To check whether the rules are unique (non-redundant) we can call the function is.redundant():
# flag redundant rules
redundant_rules <- is.redundant(trans_rules)
summary(redundant_rules)
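As a hedged illustration of what is.redundant() flags, the sketch below runs it on a small made-up data set. In arules a rule counts as redundant when a more general rule (same rhs, fewer lhs items) with the same or higher confidence is present in the rule set. The data frame toy and the chosen thresholds are assumptions for demonstration and have nothing to do with the Otomoto.pl data.

```r
library(arules)

# Hypothetical toy listings, NOT the Otomoto.pl data
toy <- data.frame(
  mark     = factor(c("Saab", "Saab", "Saab", "Saab",
                      "Audi", "Audi", "Audi", "Audi")),
  location = factor(c("Pomorskie", "Pomorskie", "Mazowieckie", "Mazowieckie",
                      "Pomorskie", "Pomorskie", "Mazowieckie", "Mazowieckie")),
  age      = factor(c("10", "10", "10", "10", "9", "9", "10", "9"))
)

toy_rules <- apriori(as(toy, "transactions"),
                     parameter = list(supp = 0.2, conf = 0.8),
                     control = list(verbose = FALSE))

# Logical vector: TRUE for rules that add nothing over a more general rule,
# e.g. {mark=Saab,location=Pomorskie} => {age=10} next to {mark=Saab} => {age=10}
redundant <- is.redundant(toy_rules)

# Keep only the non-redundant rules
pruned <- toy_rules[!redundant]
inspect(pruned)
```

The same indexing, trans_rules[!redundant_rules], would prune the rule set of this paper if any of its rules were flagged.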
Plots make it easy to show how the components interact with each other:
# first 10 rules of the set
topRules <- trans_rules[1:10]
plot(topRules, method = "graph")
The graph above shows the first 10 rules. In the centre is the item {age=10}, which appears most often as the right-hand side in our rule set. Around it are the items that most frequently occur together with it.
Another way of visualising the interactions is a grouped matrix plot:
plot(topRules, method = "grouped")
We can also choose one component of the transactions and check which other components interact with it. This can be done as follows:
# take the subset of association rules that contain the item age=10 (car is 10 years old)
ten_rules <- subset(trans_rules, items %in% "age=10")
inspect(ten_rules)
plot(ten_rules, method = "graph", measure = "lift", shading = "confidence")
As we can see, the graph is very similar to the previous one, because most of the rules already have {age=10} as their right-hand side.
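Note that items %in% matches an item on either side of a rule, whereas rhs %in% (or lhs %in%) restricts the match to one side. The following sketch shows the difference on a made-up data set; toy and its thresholds are assumptions for illustration, not the scraped data.

```r
library(arules)

# Hypothetical toy listings, NOT the Otomoto.pl data
toy <- data.frame(
  mark = factor(c("Saab", "Saab", "Saab", "Audi", "Audi", "Audi")),
  age  = factor(c("10", "10", "10", "9", "9", "9"))
)

toy_rules <- apriori(as(toy, "transactions"),
                     parameter = list(supp = 0.3, conf = 0.8),
                     control = list(verbose = FALSE))

# Rules with age=10 anywhere (lhs or rhs)
any_side <- subset(toy_rules, items %in% "age=10")

# Rules that predict age=10, i.e. have it on the right-hand side only
rhs_only <- subset(toy_rules, rhs %in% "age=10")

inspect(any_side)
inspect(rhs_only)
```

For the 11 rules of this paper both filters should select the same rules, since age appears only on the right-hand side of the listed rules.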