Can we find out how many stars a customer is likely to give a local business in Las Vegas?

Jigme Norbu
22 November, 2015

The Big Question!

Can we predict the number of stars a business is likely to get from a user on Yelp based on the business characteristics and user characteristics?

Targeted Question: How about the local businesses in Las Vegas?

  • Which predictors do I choose?

  • Which model is efficient?

  • How should I split my data into training and validation sets?

Methodology

  • The outcome variable is categorical (stars from 1 to 5).
  • The trainControl() function from the caret package sets up the cross-validation method.
  • The model is a prediction tree (rpart), which is a classification method.
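
library(caret)

# 80/20 split into training and testing sets, stratified on the outcome
inTrain <- createDataPartition(Yelp_10_clean_df_v4$stars, p = 0.80, list = FALSE)
training <- Yelp_10_clean_df_v4[inTrain, ]
testing <- Yelp_10_clean_df_v4[-inTrain, ]

# 10-fold cross-validation (p only applies to leave-group-out CV, so it is dropped here)
control <- trainControl(method = "cv", number = 10)

# Classification tree; stars must be a factor for train() to treat this as classification
mfit <- train(stars ~ ., method = "rpart", preProc = c("center", "scale"), trControl = control, data = training)
mfit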
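
To make the cross-validation step more concrete, here is a minimal sketch of what 10-fold cross-validation of a classification tree does, written by hand with rpart. It is not the code used for the Yelp data: the built-in iris data is a stand-in, with Species playing the role of the stars factor, and the folds are drawn at plain random. caret additionally tunes the tree's complexity parameter, which this sketch leaves at the rpart default.

```r
# Sketch: 10-fold cross-validation of a classification tree, done by hand.
# iris is a stand-in dataset; Species plays the role of the stars factor.
library(rpart)

set.seed(42)
data(iris)

k <- 10
# Randomly assign each row to one of k folds of (nearly) equal size
folds <- sample(rep(seq_len(k), length.out = nrow(iris)))

fold_accuracy <- numeric(k)
for (i in seq_len(k)) {
  train_rows <- iris[folds != i, ]
  held_out   <- iris[folds == i, ]

  # Fit the tree on the other k - 1 folds, then predict the held-out fold
  fit  <- rpart(Species ~ ., data = train_rows, method = "class")
  pred <- predict(fit, newdata = held_out, type = "class")

  fold_accuracy[i] <- mean(pred == held_out$Species)
}

# Cross-validated accuracy: the average over the k held-out folds
cv_accuracy <- mean(fold_accuracy)
cv_accuracy
```

Every observation is used for validation exactly once, so the averaged accuracy is less dependent on one lucky or unlucky split than a single training/validation split would be.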

Dendrogram Plot

[Figure: dendrogram of the fitted rpart classification tree]

Results and Discussion

  • Users are most likely to avoid giving 2 or 3 stars and choose between 1, 4 and 5.

  • The accuracy rate is just 0.447.

This means that my model predicts the correct outcome only 44.7% of the time when predicting how many stars a user is likely to give a local business in Las Vegas.
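
An accuracy figure is easier to judge next to a simple baseline, such as always guessing the most common rating. The sketch below shows how both numbers can be computed in base R. The ratings in it are invented purely for illustration and are not taken from the Yelp data, and the slides do not report a baseline. caret's confusionMatrix() reports a similar baseline as the "No Information Rate".

```r
# Hypothetical star ratings, invented purely for illustration
star_levels <- c("1", "2", "3", "4", "5")
actual <- factor(c("5", "4", "1", "5", "3", "4", "5", "2", "4", "5"),
                 levels = star_levels)
predicted <- factor(c("5", "5", "1", "4", "4", "4", "5", "1", "5", "5"),
                    levels = star_levels)

# Confusion table: rows are predictions, columns are the true ratings
confusion <- table(predicted = predicted, actual = actual)
print(confusion)

# Accuracy: share of cases on the diagonal (prediction equals truth)
accuracy <- sum(diag(confusion)) / sum(confusion)

# Baseline: accuracy of always guessing the most common true rating
baseline <- max(table(actual)) / length(actual)

cat("accuracy:", accuracy, " baseline:", baseline, "\n")
```

If the model's accuracy is not clearly above the baseline, the predictors are adding little beyond the overall distribution of stars.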