Kernlab Support Vector Machine (ksvm) Applied to Yelp Data

Randall Shane
November 22, 2015

Yelp Data Analysis

Is it possible to accurately predict a business's star rating from its attributes, categories, review count, and region, together with the quality of its reviews?

Examples of Attributes, Categories, and Review Quality: [figures: nw, nw2, nw2]

Logical ER Diagram

[Figure: YLDM]

Subset: Restaurant/Category/Regions

[Figure: plot for the Restaurant/Category/Regions subset]

Confusion Matrix for Predictions

        X1  X1.5    X2  X2.5    X3  X3.5    X4  X4.5    X5
X1       7     0     0     0     0     0     0     0     0
X1.5     0    39     0     0     0     0     0     0     0
X2       0     1   180     2     0     0     0     0     0
X2.5     0     0     0   432     3     0     1     1     0
X3       0     0     1    12   562     3     0     2     0
X3.5     0     0     2     8    17   557     5     3     0
X4       0     0     0     1     5    12   588     6     1
X4.5     0     0     5     2     1     3     1   592     0
X5       0     0     0     2     0     0     0     0   109
  • Of the 3,166 cases in the matrix, 3,066 fall on the diagonal, so the results are acceptable:
    • Accuracy: 0.9684144
    • Kappa: 0.9621462
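
Both figures can be recomputed from the confusion matrix alone. The following is a minimal Python sketch, not the original analysis code: the matrix values are copied from the table above, and the function name `accuracy_and_kappa` is illustrative. Accuracy is the share of cases on the diagonal. Cohen's kappa compares that share with the agreement expected by chance, which is estimated from the row and column totals. Neither measure depends on whether rows hold the predictions or the actual ratings.

```python
# Confusion matrix copied from the table above, in the order X1, X1.5, ..., X5.
MATRIX = [
    [7,  0,   0,   0,   0,   0,   0,   0,   0],
    [0, 39,   0,   0,   0,   0,   0,   0,   0],
    [0,  1, 180,   2,   0,   0,   0,   0,   0],
    [0,  0,   0, 432,   3,   0,   1,   1,   0],
    [0,  0,   1,  12, 562,   3,   0,   2,   0],
    [0,  0,   2,   8,  17, 557,   5,   3,   0],
    [0,  0,   0,   1,   5,  12, 588,   6,   1],
    [0,  0,   5,   2,   1,   3,   1, 592,   0],
    [0,  0,   0,   2,   0,   0,   0,   0, 109],
]


def accuracy_and_kappa(matrix):
    """Return (accuracy, Cohen's kappa) for a square confusion matrix."""
    total = sum(sum(row) for row in matrix)

    # Observed agreement: fraction of cases on the main diagonal.
    observed = sum(matrix[i][i] for i in range(len(matrix))) / total

    # Chance agreement: sum over classes of (row total * column total) / total^2.
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(col) for col in zip(*matrix)]
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / total ** 2

    kappa = (observed - expected) / (1 - expected)
    return observed, kappa


accuracy, kappa = accuracy_and_kappa(MATRIX)
print(f"Accuracy: {accuracy:.7f}")
print(f"Kappa:    {kappa:.7f}")
```

The printed values should agree with the accuracy and kappa reported above, up to rounding.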