This work statistically analyzed the words in 1,000 reviews written by Yelp users (Yelpers) in Pittsburgh and 1,000 reviews written by Yelpers in Edinburgh. The reviews were split by rating into two groups, 1-2 stars and 4-5 stars. From the results it was possible to identify the most frequent words in the reviews and to extract useful information about the needs and behaviors of Yelpers in these two cities.
The packages utilized were:
TEXT MINING: The process of structuring the input text (usually parsing, along with the addition or removal of some derived linguistic features), deriving patterns within the structured data, and finally evaluating and interpreting the output.
RANDOM FOREST: Improves the accuracy of the outcomes by generating a large number of trees, classifying with each tree, and combining the results across all of the trees.
CTree: A non-parametric class of regression trees that embeds tree-structured regression models into a well-defined theory of conditional inference procedures. It is applicable to all kinds of regression problems.
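
As an illustration of the text mining step described above, the following is a minimal sketch in Python of counting word frequencies separately for low-rated (1-2 stars) and high-rated (4-5 stars) reviews. It is not the code used in this work: the function names, the short stop-word list, and the sample reviews are all hypothetical and chosen only to show the idea.

```python
import re
from collections import Counter

# Hypothetical, deliberately short stop-word list (an assumption for this sketch);
# a real analysis would use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "was", "is", "it", "to", "of", "i", "in", "but"}


def tokenize(text):
    """Lowercase the text and keep word tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOP_WORDS]


def rating_group(stars):
    """Map a star rating to 'low' (1-2), 'high' (4-5), or None (3 stars, excluded)."""
    if stars in (1, 2):
        return "low"
    if stars in (4, 5):
        return "high"
    return None


def word_frequencies(reviews):
    """Count word occurrences per rating group from (stars, text) pairs."""
    counts = {"low": Counter(), "high": Counter()}
    for stars, text in reviews:
        group = rating_group(stars)
        if group is not None:
            counts[group].update(tokenize(text))
    return counts


# Made-up sample reviews, for illustration only (not data from this work).
sample_reviews = [
    (5, "The food was great and the service was great"),
    (1, "Slow service and cold food"),
    (3, "It was fine"),
    (4, "Great coffee"),
]

freqs = word_frequencies(sample_reviews)
print("high:", freqs["high"].most_common(3))
print("low:", freqs["low"].most_common(3))
```

Running the same counting once per city would give the per-city, per-rating word lists that the analysis compares. The resulting counts could also serve as input features for a tree-based classifier, although that step is not shown here.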