final_presentation

Martin Polacek
11/22/2015

Analysis of bad business features as a way to predict and improve

  • Can one learn from bad reviews (let's say 3 stars or fewer) something substantial about a business, and possibly about the future “shape” of that business?
  • This knowledge could be of great interest for predicting business dynamics in a given region and for knowing which features should be improved.
  • I propose forecasting these bad features and examine several models for time series prediction.

Methodology

  • As a proof of concept I picked the city of Las Vegas and the business type “Cafes”.
  • I used the data from the Yelp Dataset Challenge.
  • To sum up the procedure:
  • (1) I subset the cleaned data by city (Las Vegas), business type (“Cafes”), year, and number of stars (less than or equal to 3).
  • (2) On this subset I applied sentiment analysis together with part-of-speech tagging functions.
  • (3) I looked for nouns in the vicinity of negative words.
  • (4) For each noun I summed its occurrences in the vicinity of negative words within the subset defined in (1); this gives a measure of how bad the noun is.
  • (5) I forecast the resulting time series of bad features.
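
Steps (3) and (4) can be sketched as follows. This is a minimal Python illustration, not the code used for the presentation: the negative-word list, the hand-tagged toy reviews, the window of 2 tokens and the helper name count_bad_nouns are all assumptions made for the example.

```python
from collections import Counter

# Hypothetical negative-word lexicon; a real analysis would use a sentiment dictionary.
NEGATIVE_WORDS = {"bad", "slow", "rude", "cold", "terrible"}

def count_bad_nouns(tagged_reviews, window=2):
    """Count nouns that occur within `window` tokens of a negative word.

    tagged_reviews: list of reviews, each a list of (word, tag) pairs,
    where the tag "NOUN" marks a noun (tags are assumed to come from a POS tagger).
    """
    counts = Counter()
    for review in tagged_reviews:
        words = [w.lower() for w, _ in review]
        for i, (word, tag) in enumerate(review):
            if tag != "NOUN":
                continue
            # Slice of tokens around the noun, clipped to the review boundaries.
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            if any(w in NEGATIVE_WORDS for w in words[lo:hi]):
                counts[word.lower()] += 1
    return counts

# Toy, hand-tagged reviews (invented for illustration only).
reviews = [
    [("The", "DET"), ("service", "NOUN"), ("was", "VERB"), ("slow", "ADJ")],
    [("Terrible", "ADJ"), ("food", "NOUN"), ("and", "CONJ"),
     ("rude", "ADJ"), ("service", "NOUN")],
    [("Nice", "ADJ"), ("place", "NOUN")],
]
bad_nouns = count_bad_nouns(reviews)
print(bad_nouns)
```

Running the same count separately on the reviews of each year would give one row of yearly totals per noun, which is the shape of series that step (5) forecasts.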

Methodology and results

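  • The results for points (1) to (4) are given in the following table (counts per year; the 2015_* rows are the model forecasts and the RMSE_* rows are the root-mean-square errors of the corresponding models):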
            food service experience price place menu burger
2006         0.0     0.0        0.0   0.0   0.0  0.0    0.0
2007         2.0     4.0        0.0   0.0   0.0  0.0    0.0
2008         4.0     6.0        2.0   0.0   0.0  0.0    0.0
2009         5.0    14.0        1.0   0.0   3.0  1.0    0.0
2010         9.0    19.0        0.0   3.0   7.0  3.0    0.0
2011        14.0    12.0        2.0   2.0   5.0  1.0    3.0
2012        22.0    35.0        5.0   2.0   3.0  1.0    3.0
2013        14.0    39.0       10.0   7.0  12.0  6.0    0.0
2014        38.0    74.0        9.0   2.0   2.0  6.0    0.0
2015_ARIMA  24.0    80.0        9.0   6.0   8.0  4.0    1.0
RMSE_ARIMA   5.8     9.9        1.7   1.5   2.9  1.5    0.9
2015_nnetar 25.0   552.0        7.0   3.0   6.0  5.0    1.0
RMSE_nnetar  6.6     8.9        1.0   1.8   2.9  1.8    1.2
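
The RMSE rows report root-mean-square errors. The presentation does not say which values were compared; assuming the usual definition (fitted values against observed counts), the quantity can be sketched in Python as below. The function name rmse and the numbers are invented for the illustration and are not taken from the table.

```python
import math

def rmse(observed, fitted):
    """Root-mean-square error between two equally long sequences."""
    if len(observed) != len(fitted):
        raise ValueError("sequences must have the same length")
    squared = [(o - f) ** 2 for o, f in zip(observed, fitted)]
    return math.sqrt(sum(squared) / len(squared))

# Invented values, for illustration only.
observed = [2.0, 4.0, 6.0]
fitted = [1.0, 4.0, 8.0]
error = rmse(observed, fitted)
print(round(error, 3))
```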

Methodology and results

  • To forecast the time series I used two methods from the R package forecast: (1) nnetar(), a feed-forward neural network autoregression, and (2) Arima(), a univariate ARIMA time series model; the Hyndman and Khandakar algorithm is the automatic order-selection procedure that forecast implements in auto.arima().
  • The predictions for the most significant bad features: [plot of the forecasts]
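
To make the forecasting step concrete, here is a minimal Python sketch of one of the simplest members of the ARIMA family, a random walk with drift (ARIMA(0,1,0) with a constant). It is not the model fitted in the presentation and will not reproduce the table's forecasts; the function name drift_forecast is an assumption, and only the "service" counts come from the table above.

```python
import math

def drift_forecast(series, horizon=1):
    """Forecast with a random walk with drift, i.e. ARIMA(0,1,0) plus a constant.

    The drift is the mean of the first differences; each step ahead adds it once.
    """
    diffs = [b - a for a, b in zip(series, series[1:])]
    drift = sum(diffs) / len(diffs)
    forecasts = [series[-1] + drift * h for h in range(1, horizon + 1)]
    # In-sample one-step residuals: actual difference minus the drift.
    rmse = math.sqrt(sum((d - drift) ** 2 for d in diffs) / len(diffs))
    return forecasts, rmse

# "service" counts for 2006-2014, copied from the table above.
service = [0, 4, 6, 14, 19, 12, 35, 39, 74]
forecasts, in_sample_rmse = drift_forecast(service, horizon=2)
print(forecasts)        # forecasts for 2015 and 2016
print(in_sample_rmse)   # in-sample error of this baseline
```

A baseline like this is mainly useful as a sanity check: a more flexible model is worth its complexity only if it forecasts better than simply extending the average yearly change.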

Conclusion

  • I asked whether one can learn what was wrong with businesses of a given type in a given city in a given year. I answered this question, and the answer is summarized in the table given previously.
  • I obtained a measure of how bad each issue was and how it developed over time.
  • That allowed me to forecast the values of those bad features for the near future (one or two years ahead; see the last set of graphs).
  • The predictions are reasonable; an example is given in the last plot. Complaints about service rose steadily, and the model predicts a further rise. Complaints about food fluctuated more in the past, and the model predicts fluctuations as well, with an overall rise. Normalised predictions are also considered in the work but are not shown here because of the limit on presentation length.