If you haven't tried out the app, go here to try it!
Predicts next word as the user types a sentence
Similar to the predictive text on most smartphone keyboards today, such as those built on SwiftKey's technology
How To Use the App
Getting & Cleaning the Data
A subset of the original data was sampled from the three sources (blogs, Twitter and news) and then merged into one.
Next, the data is cleaned by converting it to lowercase, stripping white space, and removing punctuation and numbers.
The corresponding n-grams are then created (quadgram, trigram and bigram).
Next, the term-count tables are extracted from the n-grams and sorted by frequency in descending order.
Lastly, the n-gram objects are saved as compressed R data files (.RData files).
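The cleaning and counting steps above can be sketched in a few lines. This is a minimal illustration in Python, not the app's actual R code; the function names clean_text and ngram_counts and the sample sentences are made up for the example, and the regular expression assumes plain English text (it also drops accented letters).

```python
import re
from collections import Counter


def clean_text(line):
    """Lowercase, remove punctuation and numbers, and strip extra white space."""
    line = line.lower()
    # Assumption: keep only a-z and white space, so digits and punctuation go.
    line = re.sub(r"[^a-z\s]", "", line)
    return re.sub(r"\s+", " ", line).strip()


def ngram_counts(lines, n):
    """Return (n-gram, count) pairs sorted by count in descending order."""
    counts = Counter()
    for line in lines:
        tokens = clean_text(line).split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts.most_common()


# Hypothetical sample standing in for the merged blogs/Twitter/news subset.
sample = ["I love New York 2 much!", "I love  new york, truly."]
bigram_table = ngram_counts(sample, 2)
trigram_table = ngram_counts(sample, 3)
```

Calling ngram_counts with n set to 2, 3 or 4 gives the bigram, trigram and quadgram tables respectively; each table could then be written to disk in whatever format the application loads at startup.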
Underlying Algorithm
Checks whether the highest-order n-gram (n=3, trigram) has been seen; if not, it checks the next lower order (n=2, bigram).
If the bigram has not been seen either, it falls back to n=1 (unigram). The process starts with the highest-order n-gram and works down to lower orders.
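The back-off lookup described above might look roughly like the following. This is a sketch in Python under stated assumptions, not the app's R implementation: the lookup tables, their contents and the name predict_next are hypothetical, and each table is assumed to already hold only the most frequent continuation for a given context.

```python
# Hypothetical lookup tables: context -> most frequent next word.
TRIGRAMS = {("new", "york"): "city", ("thank", "you"): "for"}
BIGRAMS = {"york": "city", "good": "morning"}
UNIGRAMS = ["the", "to", "and"]  # most frequent single words, best first


def predict_next(text, trigrams, bigrams, unigrams):
    """Try the trigram table first, then the bigram table, then unigrams."""
    tokens = text.lower().split()
    # n=3: use the last two words as context.
    if len(tokens) >= 2 and tuple(tokens[-2:]) in trigrams:
        return trigrams[tuple(tokens[-2:])]
    # n=2: back off to the last word only.
    if len(tokens) >= 1 and tokens[-1] in bigrams:
        return bigrams[tokens[-1]]
    # n=1: fall back to the most frequent word overall.
    return unigrams[0] if unigrams else None


first = predict_next("I live in New York", TRIGRAMS, BIGRAMS, UNIGRAMS)
second = predict_next("have a good", TRIGRAMS, BIGRAMS, UNIGRAMS)
third = predict_next("something unseen", TRIGRAMS, BIGRAMS, UNIGRAMS)
```

Here the first call is answered by the trigram table, the second backs off to the bigram table, and the third falls through to the most frequent unigram. Input to a real lookup would need the same cleaning that was applied when the tables were built.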
Further Exploration
The code is available on GitHub
Further improvement of this approach is planned, such as predicting entire sentences.