Jigme Norbu
22 November, 2015
Can we predict how many stars a Yelp user is likely to give a business, based on characteristics of the business and of the user?
Targeted question: can we do this for local businesses in Las Vegas?
Which predictors should I choose?
Which model is most efficient?
How should I split my data into training and validation sets?
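library(caret)
# Hold out 20% of the rows for testing; sampling is stratified on stars
inTrain <- createDataPartition(Yelp_10_clean_df_v4$stars, p = 0.80, list = FALSE)
training <- Yelp_10_clean_df_v4[inTrain, ]
testing <- Yelp_10_clean_df_v4[-inTrain, ]
# 10-fold cross-validation (trainControl's p argument only applies to method = "LGOCV", so it is dropped)
control <- trainControl(method = "cv", number = 10)
# CART tree; stars must be a factor for train() to treat this as classification and report Accuracy
mfit <- train(stars ~ ., method = "rpart", preProc = c("center", "scale"), trControl = control, data = training)
mfit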
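With method = "cv" and number = 10, train() estimates accuracy by 10-fold cross-validation. The loop below is a simplified sketch of that procedure on a hypothetical two-class toy data set: it fits rpart with its default complexity parameter on nine folds and scores the tenth, using unstratified folds for brevity. train() additionally tunes cp, so treat this as an illustration rather than what caret does internally.

```r
library(rpart)

# Hypothetical toy data standing in for the Yelp table.
set.seed(1)
n <- 300
toy <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
toy$stars <- factor(ifelse(toy$x1 + rnorm(n, sd = 0.5) > 0, "5", "1"))

k <- 10
# Assign each row to one of k equal-sized folds, in random order.
folds <- sample(rep(seq_len(k), length.out = n))

# For each fold: train on the other k - 1 folds, score the held-out fold.
fold_acc <- sapply(seq_len(k), function(i) {
  fit  <- rpart(stars ~ ., data = toy[folds != i, ], method = "class")
  pred <- predict(fit, newdata = toy[folds == i, ], type = "class")
  mean(pred == toy$stars[folds == i])
})

# Cross-validated accuracy is the mean over the k held-out folds.
cv_accuracy <- mean(fold_acc)
cv_accuracy
```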
Users tend to avoid giving 2 or 3 stars and mostly choose between 1, 4, and 5.
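One way to check a claim like this is to tabulate the share of each star level. A small sketch on a hypothetical ratings vector; in the real analysis it would be the stars column of Yelp_10_clean_df_v4.

```r
# Hypothetical ratings vector (made up for illustration).
stars <- factor(c(5, 4, 1, 5, 4, 4, 5, 1, 2, 5, 4, 1, 3, 5, 4, 5, 1, 4, 5, 5),
                levels = 1:5)

# Share of each star level; few 2s and 3s show up as small middle proportions.
shares <- prop.table(table(stars))
round(shares, 2)

# Combined share of ratings that are 1, 4 or 5.
sum(shares[c("1", "4", "5")])
```

For this made-up vector, 1, 4 and 5 together account for 90% of the ratings.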
The accuracy is only 0.447: the model predicts the correct number of stars a user gives a local Las Vegas business only 44.7% of the time.
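Accuracy is easier to judge next to a baseline. Because ratings concentrate on 1, 4 and 5, always predicting the most common rating (the no-information rate that caret's confusionMatrix also reports) may already be right a noticeable fraction of the time. A sketch with made-up observed and predicted ratings:

```r
# Hypothetical observed and predicted ratings for a small hold-out set.
lv <- as.character(1:5)
observed  <- factor(c(5, 5, 4, 1, 4, 5, 1, 2, 5, 4), levels = lv)
predicted <- factor(c(5, 4, 4, 1, 5, 5, 5, 1, 5, 5), levels = lv)

# Confusion matrix: rows = predicted, columns = observed
# (the same orientation caret's confusionMatrix prints).
cm <- table(Predicted = predicted, Observed = observed)

# Accuracy = correctly classified / total = diagonal sum / n
accuracy <- sum(diag(cm)) / sum(cm)

# No-information rate: accuracy of always predicting the most common class.
nir <- max(prop.table(table(observed)))

c(accuracy = accuracy, no_information_rate = nir)
```

Here the accuracy is 0.5 against a no-information rate of 0.4; running the same comparison on the real testing set would show how much the tree adds over always guessing the most common rating.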