As part of the final project for the Coursera Data Science Capstone, I worked on text mining and word prediction.
The corpus consists of English-language news, Twitter, and blog text files.
The text is then tokenized, filtered for profanity, and stripped of punctuation.
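
The cleaning step can be sketched roughly as follows. This is an illustrative Python sketch, not the project's actual code: the function name `clean_tokens`, the tiny `PROFANITY` set, and the choice to drop profane tokens (rather than flag them) and to keep apostrophes are all assumptions made for the example.

```python
import re

# Hypothetical placeholder list (assumption); a real project would load
# a much fuller profanity lexicon from a file.
PROFANITY = {"darn", "heck"}


def clean_tokens(text, banned=PROFANITY):
    """Lowercase the text, strip punctuation, split into tokens, drop banned words."""
    # Replace everything except lowercase letters, digits, apostrophes,
    # and whitespace with a space, so "day." becomes "day ".
    no_punct = re.sub(r"[^a-z0-9'\s]", " ", text.lower())
    # Splitting on whitespace also collapses the repeated spaces left behind.
    tokens = no_punct.split()
    # Assumption: profane tokens are removed outright.
    return [t for t in tokens if t not in banned]


print(clean_tokens("Well, darn! It's a nice day."))
```

Keeping apostrophes preserves contractions such as "it's" as single tokens, which tends to matter for word prediction; a different tokenizer could reasonably split them instead.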
Finally, an n-gram model is built by computing frequency tables that can be queried with a given word or phrase.
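
To make the frequency-table idea concrete, here is a minimal Python sketch, assuming the tokens have already been cleaned. The helper names (`ngram_counts`, `build_next_word_table`, `predict_next`) and the toy sentence are hypothetical, and the sketch uses raw counts only; it does not include smoothing or backoff, which a fuller model might add.

```python
from collections import Counter, defaultdict


def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_next_word_table(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for gram, count in ngram_counts(tokens, n).items():
        context, next_word = gram[:-1], gram[-1]
        table[context][next_word] += count
    return table


def predict_next(table, context, k=3):
    """Return up to k of the most frequent next words for the given context."""
    followers = table.get(tuple(context), Counter())
    return [word for word, _ in followers.most_common(k)]


# Toy corpus (assumption), already lowercased and tokenized.
tokens = "i love data science and i love data mining and i love text".split()
trigram_table = build_next_word_table(tokens, 3)
print(predict_next(trigram_table, ["i", "love"], k=1))
```

Querying the table with a phrase amounts to looking up its last n-1 words as the context and ranking the observed followers by frequency.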