A Predictive Model of Health Care Professionals' Yelp Scores

NJB
November 22, 2015

Motivation

A model was constructed to predict the Yelp star ratings of approximately 2,500 healthcare vendors. This will help:

  • explore available information on medical services
  • gauge usefulness of info in explaining ratings
  • understand edge use cases for Yelp

Candidate Predictors

“By Appointment Only”, “Extended Hours”, and six variables reflecting the presence of select topics in the text of reviews.
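As a rough illustration of what a topic-presence variable could look like, the sketch below flags a review with 1 when any keyword for a topic appears in its text. This is simple keyword matching, not necessarily the method used for this model, and the topic names and keyword lists are hypothetical since the six topics are not listed here.

```python
import re

# Hypothetical topics and keywords, for illustration only;
# the six topics used in the model are not specified.
TOPIC_KEYWORDS = {
    "wait_time": {"wait", "waited", "waiting"},
    "billing": {"bill", "billing", "insurance"},
}


def topic_indicators(review_text, topic_keywords=TOPIC_KEYWORDS):
    """Return a 0/1 flag per topic: 1 if any keyword occurs in the review."""
    # Lowercase and split into word tokens so matching is case-insensitive.
    tokens = set(re.findall(r"[a-z']+", review_text.lower()))
    return {
        topic: int(bool(tokens & words))
        for topic, words in topic_keywords.items()
    }


example = topic_indicators("I waited an hour and the staff was rude.")
print(example)
```

Each flag can then be used as a binary predictor alongside the two business attributes.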

Model Performance

On the testing data, the random forest model achieved 42.6% accuracy, 92.9% sensitivity, and 9.9% specificity.
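For reference, the three metrics can be computed from confusion-matrix counts as sketched below. The sketch assumes a two-class outcome, and the counts are made up for illustration; they are not the counts behind the figures reported above.

```python
def binary_metrics(tp, fn, tn, fp):
    """Compute accuracy, sensitivity, and specificity from confusion-matrix counts."""
    total = tp + fn + tn + fp
    return {
        "accuracy": (tp + tn) / total,     # share of all cases classified correctly
        "sensitivity": tp / (tp + fn),     # share of actual positives caught
        "specificity": tn / (tn + fp),     # share of actual negatives caught
    }


# Made-up counts, chosen only to show the arithmetic.
metrics = binary_metrics(tp=40, fn=10, tn=5, fp=45)
print(metrics)
```

A pattern of high sensitivity with low specificity generally means the classifier assigns most cases to the positive class.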


Conclusion

  • Accuracy was poor because no individual predictor had a strong association with the outcome variable.
  • Many candidate predictors were eliminated before model building for being sparse or invariant; medical services are an immature growth area for Yelp.
  • Text mining holds promise because topical patterns exist and the fields would be well populated. It requires a more thorough effort than I gave it here.
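The screening of sparse or invariant predictors mentioned above could be sketched as follows. This is a minimal sketch and not the procedure actually used: the 50% fill threshold and the column names are assumptions for illustration.

```python
def keep_predictors(columns, min_fill=0.5):
    """Return names of columns that are neither sparse nor invariant.

    columns: dict mapping column name -> list of values (None = missing).
    min_fill: hypothetical minimum share of non-missing values to keep a column.
    """
    kept = []
    for name, values in columns.items():
        present = [v for v in values if v is not None]
        fill_rate = len(present) / len(values) if values else 0.0
        if fill_rate < min_fill:
            continue  # sparse: too few populated values
        if len(set(present)) < 2:
            continue  # invariant: only one distinct value
        kept.append(name)
    return kept


# Toy data with hypothetical column names.
toy = {
    "by_appointment_only": [1, 0, 1, 1],
    "accepts_insurance": [None, None, None, 1],  # sparse
    "open_24_hours": [0, 0, 0, 0],               # invariant
}
kept = keep_predictors(toy)
print(kept)
```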