David Manero
22 November 2015
The goal of this project is to study whether the words a restaurant customer uses in a review can predict the score (in stars) he or she gives the restaurant.
This project uses various methods to study the reviews and the words they contain.
The best algorithm found for the prediction is Random Forest.
The models studied are:
In this study, word frequency analysis must be used with care: the most frequent words appear in both positive and negative reviews, so only the words unique to each score should be counted (our "bestword" analysis).
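The idea of counting only the words unique to each score can be sketched as follows. This is a minimal Python sketch, not the author's actual "bestword" code: the input format (a list of (text, stars) pairs), the simple regex tokenizer, the top_n cutoff and the function names are all hypothetical. Words shared across star ratings (such as "the" or "food") are discarded, and only the words that appear exclusively under one rating are ranked by frequency.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplifying assumption: lowercase and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def word_counts_by_star(reviews):
    """Count word frequencies separately for each star rating.

    reviews: iterable of (text, stars) pairs (hypothetical input format).
    """
    counts = defaultdict(Counter)
    for text, stars in reviews:
        counts[stars].update(tokenize(text))
    return counts


def unique_words_by_star(reviews, top_n=10):
    """For each star rating, return the most frequent words that never
    appear in reviews with any other rating."""
    counts = word_counts_by_star(reviews)
    result = {}
    for stars, counter in counts.items():
        # Collect every word used under any *other* rating.
        other_words = set()
        for other_stars, other_counter in counts.items():
            if other_stars != stars:
                other_words.update(other_counter)
        # Keep only words exclusive to this rating, then rank by frequency.
        unique = Counter({w: c for w, c in counter.items() if w not in other_words})
        result[stars] = [w for w, _ in unique.most_common(top_n)]
    return result


if __name__ == "__main__":
    # Tiny made-up example, for illustration only.
    reviews = [
        ("The food was great and the service was great", 5),
        ("Great place, amazing food", 5),
        ("The food was cold and the service was terrible", 1),
        ("Terrible place, never again", 1),
    ]
    print(unique_words_by_star(reviews))
```

In this toy example, frequent words such as "the", "food" and "place" are dropped because they occur under both ratings, which is exactly why raw frequency alone is misleading.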
The comparison of prediction algorithms could be extended to more models, but the size of the dataset made this impossible on my computer.
In any case, the accuracy of the final selected model is more than sufficient.