Coursera Data Science Capstone: Course Project

Sriharsha
December 30, 2016

If you haven't tried out the app, go here to try it!

Predicts the next word as the user types a sentence.
Similar to the way most smartphone keyboards work today, using SwiftKey's technology.

Instructions

A subset of the original data is sampled from the three sources (blogs, Twitter and news) and merged into one corpus.
Next, the data is cleaned by converting it to lowercase, stripping extra white space, and removing punctuation and numbers.
The corresponding n-grams are then created (quadgram, trigram and bigram).
Next, term-count tables are extracted from the n-grams and sorted by frequency in descending order.
Lastly, the n-gram objects are saved as compressed R data files (.RData files).
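
The cleaning and counting steps above can be sketched roughly as follows. This is an illustrative Python sketch only, not the project's own code, which is written in R. The function names clean and ngram_counts are hypothetical, and the cleaning rule assumes plain ASCII English text, so accented letters would also be dropped.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text, remove punctuation and numbers, collapse white space."""
    text = text.lower()
    # Assumption: keep only a-z and white space, which drops punctuation and digits.
    text = re.sub(r"[^a-z\s]", "", text)
    return re.sub(r"\s+", " ", text).strip()


def ngram_counts(lines, n):
    """Return (n-gram, count) pairs sorted by count in descending order."""
    counts = Counter()
    for line in lines:
        words = clean(line).split()
        # Slide a window of n words across the cleaned line.
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts.most_common()


if __name__ == "__main__":
    sample = ["I love New York!", "i love new shoes 2 much"]
    for gram, count in ngram_counts(sample, 2):
        print(" ".join(gram), count)
```

Calling ngram_counts with n set to 4, 3 and 2 would give the quadgram, trigram and bigram tables. Saving them to disk is left out here because the .RData format is specific to R.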

The app first checks whether the highest-order n-gram (n = 3, trigram) has been seen. If not, it checks whether the n = 2 n-gram (bigram) has been seen; if that fails too, it checks whether an n = 1 match (unigram) is available. The process always starts with the highest-order n-gram and backs off to lower orders.
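
A minimal sketch of this back-off lookup is given below, again in Python rather than the project's R. It rests on an assumption: n is read as the number of trailing words of the input that are matched against the stored tables. The function predict_next, the table layout and the TOY_TABLES data are all hypothetical and exist only for illustration.

```python
def predict_next(phrase, tables, max_context=3):
    """Try the longest context first, then back off to shorter ones.

    tables maps a context length k to a dict of
    {tuple of k words: list of next words, most frequent first}.
    """
    words = phrase.lower().split()
    # Start at the highest order the input allows and step down to 1.
    for k in range(min(max_context, len(words)), 0, -1):
        context = tuple(words[-k:])
        candidates = tables.get(k, {}).get(context)
        if candidates:
            return candidates[0]
    return None  # no order produced a match


# Toy data, invented for this example.
TOY_TABLES = {
    3: {("thanks", "for", "the"): ["follow", "help"]},
    2: {("for", "the"): ["first", "next"]},
    1: {("the",): ["same", "best"]},
}

if __name__ == "__main__":
    print(predict_next("Thanks for the", TOY_TABLES))   # three-word context matches
    print(predict_next("waiting for the", TOY_TABLES))  # backs off to two words
    print(predict_next("I saw the", TOY_TABLES))        # backs off to one word
```

Returning None when nothing matches is a design choice of this sketch. An app could instead fall back to a fixed list of very common words.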

The code is available on GitHub. Further improvements to this approach are planned, such as predicting entire sentences.