P.Wang
December 10, 2014
Using natural language processing techniques, this application performs text mining and predicts the next word when a phrase is entered.
Data Source: This app examines three sets of writing samples: US Twitter (~2.36 M tweets), US Blogs (~0.9 M entries), and US News (~1 M entries).
Data Processing: The Twitter, blog, and news data are first cleaned by removing numbers, punctuation, extra whitespace, and profanity and by converting the text to lowercase, among other steps. The cleaned data are then used to build 3-, 4-, and 5-gram models.
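
The cleaning and n-gram steps can be illustrated with a minimal Python sketch. This is not the app's actual code: the function names clean_text and ngram_counts, the tiny PROFANITY set, and the sample sentence are all hypothetical, and the real pipeline may order or implement the steps differently.

```python
import re
from collections import Counter

# Hypothetical placeholder list; the app's actual profanity list is not given.
PROFANITY = {"darn", "heck"}

def clean_text(text, profanity=PROFANITY):
    """Lowercase, strip numbers and punctuation, collapse whitespace, drop profanity."""
    text = text.lower()
    text = re.sub(r"\d+", " ", text)       # remove numbers
    text = re.sub(r"[^a-z\s]", " ", text)  # remove punctuation (apostrophes split words)
    # split() with no arguments also collapses runs of whitespace
    return [tok for tok in text.split() if tok not in profanity]

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = clean_text("Heck, I saw 3 dogs... and I saw 2 cats!")
trigrams = ngram_counts(tokens, 3)
print(tokens)
print(trigrams.most_common(3))
```

Calling ngram_counts with n set to 3, 4, and 5 yields the three count tables that the models are built from; a next-word predictor can then look up the n-grams whose leading words match the end of the entered phrase.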