A Predictive Model of Health Care Professionals' Yelp Scores

NJB
November 22, 2015

A model was constructed to predict the Yelp star ratings of approximately 2,500 healthcare vendors. This will help:

“By Appointment only”, “Extended Hours”, & 6 reflecting the presence of select topics in the text of reviews.

plot of chunk unnamed-chunk-2

Random forest model accuracy 42.6%, sensitivity 92.9%, specificity 9.9% on testing data.

plot of chunk unnamed-chunk-4

Poor accuracy because none of the individual predictors had a strong association with the outcome variable.
Many candidate predictors eliminated prior to model-building for being sparse or invariant – medical services are an immature growth area for Yelp.
Text mining holds promise because topical patterns exist and fields would be well-populated. Requires better effort than I gave it here.