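Clustering shows which reviewers resemble one another, but it does not show which personal attributes go with which ratings. To look for relationships between a reviewer’s personal attributes and their ratings, market basket analysis (association rule mining) is used. The resulting rules describe co-occurrence in the data, not causation.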
restaurant<- read.csv("C:/DataMining/Data/RestaurantRatersComplete.csv")
resttest <- read.csv("C:/DataMining/Data/RestaurantRatersTest.csv")
rest<-restaurant[,c(-1,-2)]
rest <- rest[,c(-11,-13,-14,-16:-19)]
library(arules)
## Loading required package: Matrix
##
## Attaching package: 'arules'
## The following objects are masked from 'package:base':
##
## abbreviate, write
library(car)
##
## Attaching package: 'car'
## The following object is masked from 'package:arules':
##
## recode
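# Bin birth_year into ordered age groups: (1930,1986] = old, (1986,1989] = middle, (1989,1994] = young
# cut() leaves the lowest break open by default, so birth_year == 1930 becomes NA here
rest[["birth_year"]] <- ordered(cut(rest[["birth_year"]], c(1930,1986,1989,1994)),
labels = c("old", "middle","young"))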
rest$rating <- recode(rest$rating,"'0' = 'poor';'1'='okay';'2'='good'")
rest$rating = as.factor(rest$rating)
rest$food_rating = as.factor(rest$food_rating)
rest$service_rating = as.factor(rest$service_rating)
set.seed(1)
rest1 <- as(rest, "transactions")
summary(rest1)
## transactions as itemMatrix in sparse format with
## 4090 rows (elements/itemsets/transactions) and
## 163 columns (items) and a density of 0.09850451
##
## most frequent items:
## marital_status=single activity=student
## 3919 3655
## Upayment=cash dress_preference=informal
## 3352 2651
## ambience=family (Other)
## 2427 49666
##
## element (itemset/transaction) length distribution:
## sizes
## 15 16 17
## 68 3724 298
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 15.00 16.00 16.00 16.06 16.00 17.00
##
## includes extended item information - examples:
## labels variables levels
## 1 smoker smoker TRUE
## 2 drink_level=abstemious drink_level abstemious
## 3 drink_level=casual drinker drink_level casual drinker
##
## includes extended transaction information - examples:
## transactionID
## 1 1
## 2 2
## 3 3
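The birth_year binning above relies on the default behaviour of cut(), which builds intervals that are open on the left and closed on the right. The following is a minimal sketch using made-up years (they are not taken from the restaurant data) to show how a value equal to the lowest break is handled, and how include.lowest = TRUE changes that. Whether the NA result is intended in the original analysis is not stated, so this is offered only as an illustration.

```r
# Hypothetical birth years chosen to sit on or near the break points
years  <- c(1930, 1940, 1986, 1987, 1989, 1990, 1994)
breaks <- c(1930, 1986, 1989, 1994)
groups <- c("old", "middle", "young")

# Default: intervals are (1930,1986], (1986,1989], (1989,1994],
# so a value of exactly 1930 falls outside every interval and becomes NA
default_bins <- cut(years, breaks, labels = groups, ordered_result = TRUE)

# include.lowest = TRUE closes the first interval on the left: [1930,1986]
closed_bins <- cut(years, breaks, labels = groups,
                   include.lowest = TRUE, ordered_result = TRUE)

print(data.frame(years, default_bins, closed_bins))
```

The two versions differ only for the value 1930. A missing birth_year produces no item in the transactions object, which may be one reason some transactions are shorter than others.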
Association rules are mined from the full data set, and a summary provides basic descriptive statistics about them.
aa <- as(rest1, "matrix") # converts the transactions into a logical incidence matrix
aa[1] # prints only the first cell of the matrix; aa[1, ] would print the whole first row
## [1] FALSE
rules <- apriori(rest1, parameter = list(maxlen=20 ,support = 0.01, confidence = 0.6))
## Apriori
##
## Parameter specification:
## confidence minval smax arem aval originalSupport maxtime support minlen
## 0.6 0.1 1 none FALSE TRUE 5 0.01 1
## maxlen target ext
## 20 rules FALSE
##
## Algorithmic control:
## filter tree heap memopt load sort verbose
## 0.1 TRUE TRUE FALSE TRUE 2 TRUE
##
## Absolute minimum support count: 40
##
## set item appearances ...[0 item(s)] done [0.00s].
## set transactions ...[163 item(s), 4090 transaction(s)] done [0.00s].
## sorting and recoding items ... [71 item(s)] done [0.00s].
## creating transaction tree ... done [0.00s].
## checking subsets of size 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 done [0.56s].
## writing ... [2949837 rule(s)] done [0.66s].
## creating S4 object ... done [1.70s].
rules
## set of 2949837 rules
summary(rules)
## set of 2949837 rules
##
## rule length distribution (lhs + rhs):sizes
## 1 2 3 4 5 6 7 8 9 10
## 4 557 8055 53676 190038 402239 577412 620995 521183 337817
## 11 12 13 14 15
## 164269 57705 13770 1987 130
##
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.000 7.000 8.000 7.929 9.000 15.000
##
## summary of quality measures:
## support confidence lift count
## Min. :0.01002 Min. :0.6000 Min. : 0.6262 Min. : 41.0
## 1st Qu.:0.01320 1st Qu.:0.9091 1st Qu.: 1.5428 1st Qu.: 54.0
## Median :0.01760 Median :1.0000 Median : 2.0931 Median : 72.0
## Mean :0.05023 Mean :0.9370 Mean : 2.6003 Mean : 205.4
## 3rd Qu.:0.03423 3rd Qu.:1.0000 3rd Qu.: 3.2537 3rd Qu.: 140.0
## Max. :0.95819 Max. :1.0000 Max. :41.9487 Max. :3919.0
##
## mining info:
## data ntransactions support confidence
## rest1 4090 0.01 0.6
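Nearly three million rules are too many to read, and most of them do not have a rating on the right-hand side. One possible way to shrink the rule set, sketched here under stated assumptions, is to restrict the right-hand side at mining time with the appearance argument of apriori(), cap the rule length, and drop redundant rules with is.redundant(). The toy data frame, its values, and the maxlen of 4 are hypothetical choices for illustration; they are not the settings used in this analysis.

```r
library(arules)

# Toy stand-in for `rest` (hypothetical values, not the restaurant data)
toy <- data.frame(
  budget = factor(c("low", "low", "low", "medium",
                    "medium", "medium", "low", "medium")),
  smoker = c(TRUE, FALSE, FALSE, FALSE, TRUE, FALSE, FALSE, FALSE),
  rating = factor(c("poor", "poor", "poor", "good",
                    "good", "good", "poor", "good"))
)
toy_trans <- as(toy, "transactions")

# Only rating items may appear on the right-hand side;
# every other item is allowed on the left-hand side only
toy_rules <- apriori(
  toy_trans,
  parameter  = list(support = 0.1, confidence = 0.6, minlen = 2, maxlen = 4),
  appearance = list(rhs = c("rating=poor", "rating=good"), default = "lhs"),
  control    = list(verbose = FALSE)
)

# Drop rules for which a more general rule is at least as confident
toy_rules <- toy_rules[!is.redundant(toy_rules)]
inspect(sort(toy_rules, by = "confidence"))
```

Applied to rest1, the same pattern would list all three rating items in rhs. The thresholds would still need tuning against the real data.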
The top five rules predicting each overall rating (poor, okay, and good) are listed in order of confidence.
rulesPoorRatings <- subset(rules, subset = rhs %in% "rating=poor" & lift > 1.2)
inspect(sort(rulesPoorRatings, by = "confidence", decreasing = TRUE)[1:5])
## lhs rhs support confidence lift count
## [1] {hijos=kids,
## personality=hunter-ostentatious} => {rating=poor} 0.35256724 1 2.262168 1442
## [2] {hijos=kids,
## budget=low} => {rating=poor} 0.35256724 1 2.262168 1442
## [3] {dress_preference=no preference,
## birth_year=old,
## budget=low} => {rating=poor} 0.01075795 1 2.262168 44
## [4] {ambience=friends,
## birth_year=old,
## budget=low} => {rating=poor} 0.01075795 1 2.262168 44
## [5] {birth_year=old,
## budget=low,
## food_rating=0} => {rating=poor} 0.01075795 1 2.262168 44
rulesOkayRatings <- subset(rules, subset = rhs %in% "rating=okay" & lift > 1.2)
inspect(sort(rulesOkayRatings, by = "confidence", decreasing = TRUE)[1:5])
## lhs rhs support confidence lift count
## [1] {ambience=solitary,
## birth_year=middle,
## food_rating=1} => {rating=okay} 0.02029340 1 3.994141 83
## [2] {birth_year=old,
## food_rating=1,
## service_rating=1,
## Upayment=VISA} => {rating=okay} 0.01002445 1 3.994141 41
## [3] {drink_level=social drinker,
## dress_preference=formal,
## ambience=solitary,
## food_rating=1} => {rating=okay} 0.01711491 1 3.994141 70
## [4] {drink_level=social drinker,
## ambience=solitary,
## food_rating=1,
## service_rating=1} => {rating=okay} 0.01613692 1 3.994141 66
## [5] {drink_level=social drinker,
## ambience=solitary,
## interest=technology,
## food_rating=1} => {rating=okay} 0.01711491 1 3.994141 70
rulesGoodRatings <- subset(rules, subset = rhs %in% "rating=good" & lift > 1.2)
inspect(sort(rulesGoodRatings, by = "confidence", decreasing = TRUE)[1:5])
## lhs rhs support confidence lift count
## [1] {service_rating=2,
## Upayment=MasterCard-Eurocard} => {rating=good} 0.01418093 1 3.251192 58
## [2] {ambience=solitary,
## service_rating=2,
## Upayment=MasterCard-Eurocard} => {rating=good} 0.01320293 1 3.251192 54
## [3] {birth_year=old,
## service_rating=2,
## Upayment=MasterCard-Eurocard} => {rating=good} 0.01418093 1 3.251192 58
## [4] {interest=technology,
## service_rating=2,
## Upayment=MasterCard-Eurocard} => {rating=good} 0.01320293 1 3.251192 54
## [5] {drink_level=abstemious,
## service_rating=2,
## Upayment=MasterCard-Eurocard} => {rating=good} 0.01344743 1 3.251192 55
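Because every rule printed above has the same confidence, sorting by confidence alone does not determine which five rules appear first. A small base R sketch, using the quality measures copied from the five rating=poor rules, shows one way to break such ties with a second measure. The choice of count as the tie-breaker is an assumption for illustration, not something the original analysis does.

```r
# Quality measures copied from the five rating=poor rules printed above
poor_rules <- data.frame(
  rule       = c("[1]", "[2]", "[3]", "[4]", "[5]"),
  support    = c(0.35256724, 0.35256724, 0.01075795, 0.01075795, 0.01075795),
  confidence = c(1, 1, 1, 1, 1),
  count      = c(1442, 1442, 44, 44, 44)
)

# Every confidence is identical, so confidence alone cannot rank these rules
all_tied <- length(unique(poor_rules$confidence)) == 1

# Rank by confidence first, then break ties with the larger count
ranked <- poor_rules[order(-poor_rules$confidence, -poor_rules$count), ]
print(ranked)
```

Rules backed by more reviews rise to the top, which separates the two rules covering 1442 reviews from the three that cover only 44.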
All of the rules shown have a confidence of 1, which means that every review in the data that meets all of the conditions on the left-hand side also has the rating listed on the right-hand side. All of the rules have a lift greater than 1, meaning that the conditions on the left make the rating on the right more likely than its overall frequency in the data.
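The three quality measures can be recomputed by hand from counts that appear in this report. The sketch below does so for the first rating=poor rule. The number of poor ratings is taken as the sum of the poor column in the budget-by-rating table in the Conclusion, and the left-hand-side count is assumed to equal the rule count because the reported confidence is 1.

```r
n_total <- 4090                  # transactions in rest1
n_both  <- 1442                  # reviews matching both sides of rule [1]
n_lhs   <- 1442                  # assumption: equals n_both because confidence is 1
n_poor  <- 13 + 12 + 1559 + 224  # poor column of the budget-by-rating table

support    <- n_both / n_total                 # share of all reviews covered by the rule
confidence <- n_both / n_lhs                   # share of lhs matches that also match rhs
lift       <- confidence / (n_poor / n_total)  # confidence relative to the base rate of poor

print(round(c(support = support, confidence = confidence, lift = lift), 6))
```

The results agree with the support and lift reported for that rule. Since confidence cannot exceed 1, the lift of any rule with rating=poor on the right is capped at 4090 / 1808, which is why all five poor rules share the same lift.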
Three of the five rules for a poor rating (rating of 0) include the condition that the user is older, which agrees with the cluster analysis above. The rules also suggest that the overall rating tends to follow the component ratings: food_rating=0 appears in a poor rule, food_rating=1 in all of the okay rules, and service_rating=2 in all of the good rules. The one strange connection is the form of payment, which appears in one okay rule and in all five good rules. It does not seem likely that the form of payment would affect the rating, even when combined with other conditions.
Conclusion
These data come from an internet review site, which attracts users with certain characteristics. The reviewers tend to be younger individuals (born after 1985), students, with low or medium budgets, and they tend to describe themselves as hunter-ostentatious or thrifty-protector.
library(lattice)
barchart(rest$personality,ylab="Personality",col="black")

table(rest$activity)
##
## ? professional student unemployed working-class
## 25 385 3655 17 8
table(resttest$birth_year)
##
## 1930 1940 1943 1952 1967 1969 1979 1981 1982 1983 1984 1985 1986 1987 1988
## 73 144 5 10 6 20 8 9 40 579 20 54 64 116 1646
## 1989 1990 1991 1992 1993 1994
## 365 262 619 42 4 4
rate.budget.tbl <- table(Budget=rest$budget,Rating=rest$rating)
rate.budget.tbl
## Rating
## Budget good okay poor
## ? 18 7 13
## high 42 32 12
## low 194 259 1559
## medium 1004 726 224
barchart(rate.budget.tbl,horizontal=FALSE,groups=FALSE,xlab="Budget",col="black")
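Raw counts are hard to compare across budget groups of very different sizes. A minimal sketch with base R's prop.table() converts the counts into within-group shares. The matrix is typed in by hand from the table printed above; the object names rate_budget and row_share are introduced here for illustration only.

```r
# Counts copied from the budget-by-rating table printed above
rate_budget <- matrix(
  c(  18,   7,   13,
      42,  32,   12,
     194, 259, 1559,
    1004, 726,  224),
  nrow = 4, byrow = TRUE,
  dimnames = list(budget = c("?", "high", "low", "medium"),
                  rating = c("good", "okay", "poor"))
)

# Share of each rating within a budget group (each row sums to 1)
row_share <- prop.table(rate_budget, margin = 1)
print(round(row_share, 3))
```

About 77% of low-budget reviews are rated poor (1559 of 2012), while about 51% of medium-budget reviews are rated good (1004 of 1954).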

plot(rating~personality,data=rest) # x-axis reads left to right: conformist, hard-worker, hunter-ostentatious, thrifty-protector

If we assume that this is an accurate sample of the population of diners at the 130 restaurants, then it is reasonable to draw some conclusions about how to receive higher ratings. Higher ratings are more likely for restaurants that put a lot of effort into their food, attract older customers, dissuade those with ostentatious personalities, and attract customers with medium to high budgets.