John Slough
November 20, 2015
Capstone Project for Coursera Data Science Specialization
The data comes from the Yelp Dataset Challenge
The task: Identify a question or problem that you are interested in addressing with the data.
The question: Can we predict the sentiment of restaurant reviews (positive or negative) from the words in the text?
This kind of analysis has wide-reaching applications in multiple domains.
Natural Language Processing methods
The answer: Yes, we can.
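As a rough illustration of how word-based sentiment prediction can work, here is a minimal sketch of a bag-of-words logistic regression trained with stochastic gradient descent. It is not the project's actual pipeline: the tokenizer, the helper names (`tokenize`, `train_logistic`, `predict_proba`), the hyperparameters, and the four toy reviews are all assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def predict_proba(text, weights, bias):
    """Return the modeled probability that a review is positive."""
    counts = Counter(tokenize(text))
    z = bias + sum(weights[w] * c for w, c in counts.items())
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic(docs, labels, epochs=200, lr=0.5):
    """Fit one coefficient per word using per-example gradient updates."""
    weights = Counter()  # unseen words read as 0
    bias = 0.0
    for _ in range(epochs):
        for text, y in zip(docs, labels):
            error = y - predict_proba(text, weights, bias)
            bias += lr * error
            for w, c in Counter(tokenize(text)).items():
                weights[w] += lr * error * c
    return weights, bias


# Toy data invented for this sketch (1 = positive, 0 = negative).
reviews = [
    "great food and friendly service",
    "delicious pizza, great place",
    "terrible service and cold food",
    "awful pizza, rude staff",
]
labels = [1, 1, 0, 0]

weights, bias = train_logistic(reviews, labels)
print(predict_proba("great friendly place", weights, bias))  # above 0.5
print(predict_proba("terrible rude staff", weights, bias))   # below 0.5
```

Words that appear only in positive reviews end up with positive coefficients and words that appear only in negative reviews with negative ones, which is what makes the fitted coefficients readable as a sentiment lexicon.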
Going further: Can we predict the number of stars of restaurant reviews (1-5) from the words in the text?
Natural Language Processing methods
The answer: Yes, we can, but not as well.
Emoticon Analysis
Coefficients for sentiment analysis logistic regression model:
For more information on this part of the analysis, check out the R code and the IPython notebook.
Also be sure to check out the full report.
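To give a feel for the emoticon analysis, the sketch below pulls emoticons out of review text so their counts could serve as predictors in a model. The regular expression and the function name `count_emoticons` are assumptions for illustration; the pattern used in the project may differ, and this one only covers simple Western-style emoticons.

```python
import re
from collections import Counter

# Hypothetical pattern: eyes, optional nose, mouth.
# Matches forms such as :) :-( ;D =P and nothing more elaborate.
EMOTICON_RE = re.compile(r"[:;=]-?[()DPp]")


def count_emoticons(text):
    """Count each emoticon that the pattern finds in the text."""
    return Counter(EMOTICON_RE.findall(text))


print(count_emoticons("Loved it :) :) but the wait :-("))
```

A pattern this simple can also match punctuation that is not an emoticon, such as a colon directly followed by an opening parenthesis, so real review text would likely need a stricter rule.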
Conclusion
For a more detailed look at the code and analysis, go to the GitHub repository, and check out the most frequent n-grams.
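For readers unfamiliar with n-grams, here is a minimal sketch of how the most frequent ones can be counted. The function names `ngrams` and `most_frequent_ngrams`, the tokenizer, and the two sample sentences are assumptions made for this example, not taken from the repository.

```python
import re
from collections import Counter


def ngrams(text, n):
    """Return every run of n consecutive word tokens in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def most_frequent_ngrams(docs, n, k):
    """Return the k most common n-grams across all documents."""
    counts = Counter()
    for doc in docs:
        counts.update(ngrams(doc, n))
    return counts.most_common(k)


# Sample sentences invented for this sketch.
docs = ["the food was great", "the food was cold"]
print(most_frequent_ngrams(docs, 2, 2))
```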