R Gehring
Problem: What is the next word?
Data: Sample from Twitter, blogs, and news articles
Solution: Using natural language processing and the supplied data, predict the third word in a sequence of words.
Before any modeling, I cleaned the data using R's tm package.
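The kind of cleaning meant here can be sketched as follows. This is a minimal illustration in Python, not the tm-based R code used for the project. The specific steps (lowercasing, dropping digits and punctuation, optional stopword removal) and the tiny stopword list are assumptions, since the exact cleaning steps are not listed.

```python
import re

# Hypothetical, deliberately tiny stopword list; a real list is much longer.
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "on"}

def clean_text(text, remove_stopwords=True):
    """Lowercase, strip digits and punctuation, and optionally drop stopwords."""
    text = text.lower()
    # Replace every character that is not a lowercase letter or whitespace
    # with a space, which removes digits and punctuation.
    text = re.sub(r"[^a-z\s]", " ", text)
    tokens = text.split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens

print(clean_text("The leaders of 3 countries met in Paris!"))
# ['leaders', 'countries', 'met', 'paris']
```

Note that this simple rule also splits contractions such as "don't" into two tokens, so a real pipeline would likely handle apostrophes separately.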
I created a milestone report, in which I looked at word clouds, word associations, and n-grams (2-, 3-, and 4-grams).
I tried the model with and without stopwords; it produced more meaningful results with the stopwords removed.
Given two words, my model predicts the third word. The model's accuracy ranges from 50% to 70% across different samples.
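One way such an accuracy figure could be computed is top-1 accuracy on held-out three-word sequences: the share of cases where the predicted third word matches the actual one. The sketch below is in Python and is an assumption about the metric, not a description of how the reported numbers were obtained. The function name top1_accuracy, the stand-in predictor, and the held-out examples are all made up for illustration.

```python
def top1_accuracy(predict, test_trigrams):
    """Fraction of (w1, w2, w3) triples where predict(w1, w2) equals w3."""
    if not test_trigrams:
        return 0.0
    hits = sum(1 for w1, w2, w3 in test_trigrams if predict(w1, w2) == w3)
    return hits / len(test_trigrams)

# Stand-in predictor backed by a hypothetical lookup table.
lookup = {("leaders", "around"): "country", ("new", "york"): "city"}

def predict(w1, w2):
    return lookup.get((w1, w2))  # None when the word pair is unknown

# Made-up held-out examples.
held_out = [
    ("leaders", "around", "country"),  # correct
    ("new", "york", "times"),          # wrong: predictor says "city"
    ("new", "york", "city"),           # correct
    ("happy", "new", "year"),          # unknown pair, counted as a miss
]

print(top1_accuracy(predict, held_out))  # 0.5
```

Running the same measurement on several different random samples would give a range of values rather than a single number.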
How does the model work?
I typed in “leaders around” and got a prediction of “country”.
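A lookup of this kind can be sketched with a table of trigram counts: for each pair of words, count which word follows it and return the most frequent one. The Python below is a minimal sketch under that assumption, not the project's actual R code. The function names and the three-sentence toy corpus are hypothetical and chosen only so that the example reproduces the "leaders around" case.

```python
from collections import Counter, defaultdict

def build_trigram_table(token_lists):
    """Map each (w1, w2) pair to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for tokens in token_lists:
        for w1, w2, w3 in zip(tokens, tokens[1:], tokens[2:]):
            table[(w1, w2)][w3] += 1
    return table

def predict_next(table, w1, w2):
    """Return the most frequent follower of (w1, w2), or None if unseen."""
    followers = table.get((w1.lower(), w2.lower()))
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Made-up, already cleaned toy corpus (stopwords removed).
corpus = [
    ["leaders", "around", "country", "met", "today"],
    ["leaders", "around", "country", "agreed"],
    ["leaders", "around", "world", "gathered"],
]

table = build_trigram_table(corpus)
print(predict_next(table, "leaders", "around"))  # country
```

In this sketch an unseen word pair returns None; a fuller model might fall back to bigram counts instead.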
To test-drive the model, enter two words (no stopwords, please) and you will get a third word as the prediction.