Word Predictor App

Algorithm

Bigrams, trigrams, 4-grams and 5-grams were built
They were combined into a single data frame
The data.table package was used to store it
N-grams with a count below 5 were discarded
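
The counting and pruning steps above can be sketched as follows. This is a minimal illustration in Python, not the app's own code (which used R's data.table); the function names `count_ngrams` and `prune` and the toy sentence are hypothetical, and a plain dictionary stands in for the combined data frame.

```python
from collections import Counter

def count_ngrams(tokens, orders=(2, 3, 4, 5)):
    """Count every n-gram of the given orders in a list of tokens."""
    counts = Counter()
    for n in orders:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def prune(counts, min_count=5):
    """Keep only n-grams seen at least min_count times."""
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy corpus: one phrase repeated six times plus a rare variant.
tokens = ("i love you " * 6 + "i love tea").split()
table = prune(count_ngrams(tokens))

print(table[("i", "love")])         # frequent bigram survives pruning
print(("love", "tea") in table)     # rare bigram is discarded
```

Storing all orders in one table keyed by the n-gram keeps the later lookup uniform: the same structure is queried whether the context is one word or four.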

Algorithm (continued)

The next word was predicted using Stupid Backoff
The backoff factor (lambda) was set to 0.4
For out-of-vocabulary words, a default prediction is returned
The top 5 words and their scores are displayed in the app
A prediction takes around 1 second
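
A minimal sketch of Stupid Backoff scoring, under stated assumptions: it is written in Python for illustration rather than taken from the app, the `predict` function, the toy `counts` table and the default word "the" are all hypothetical, and the table is assumed to also hold the count of each context so that a relative frequency can be computed. A candidate word is scored by the relative frequency of the longest matching n-gram, and each step down to a shorter context multiplies the score by 0.4.

```python
LAMBDA = 0.4  # backoff factor stated in the slides

def predict(context, counts, top_k=5, default=("the",)):
    """Return up to top_k (word, score) pairs using Stupid Backoff."""
    context = tuple(context)[-4:]      # 5-grams need at most 4 context words
    scores = {}
    penalty = 1.0
    for start in range(len(context)):  # longest context first
        ctx = context[start:]
        ctx_count = counts.get(ctx)
        if ctx_count:
            for gram, c in counts.items():
                if len(gram) == len(ctx) + 1 and gram[:-1] == ctx:
                    # Keep the score from the highest order that saw the word.
                    scores.setdefault(gram[-1], penalty * c / ctx_count)
        penalty *= LAMBDA              # one backoff step
    if not scores:                     # unseen context: default prediction
        return [(w, 0.0) for w in default]
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:top_k]

# Hypothetical toy counts, including context counts.
counts = {
    ("i",): 10, ("love",): 8,
    ("i", "love"): 6, ("i", "like"): 4,
    ("love", "you"): 5, ("love", "tea"): 3,
    ("i", "love", "you"): 4, ("i", "love", "tea"): 2,
}

print(predict(["i", "love"], counts))   # scored from the trigram level
print(predict(["we", "love"], counts))  # backs off once to the bigram level
print(predict(["zzz"], counts))         # falls through to the default
```

The scan over the whole table inside the loop is deliberately naive; an index keyed by context would avoid it. The scores are not probabilities and do not sum to one, which is the trade-off Stupid Backoff makes for simplicity.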

Further Improvements

Only about 5% of the data could be used due to RAM constraints
The speed could be improved by precomputing scores as well
6-grams or higher-order n-grams could be considered
Kneser-Ney smoothing or other algorithms could be used
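
One way the precomputation idea above might look, as a hedged sketch in Python rather than the app's own code: the relative frequency of every n-gram is computed once and only the top 5 continuations per context are stored, so a prediction becomes a dictionary lookup. The `precompute` function and the toy `counts` table are hypothetical, and the table is assumed to contain the count of each context.

```python
from collections import defaultdict

def precompute(counts, top_k=5):
    """Map each context to its top_k (word, relative frequency) pairs."""
    by_context = defaultdict(list)
    for gram, c in counts.items():
        ctx = gram[:-1]
        if ctx and ctx in counts:      # skip unigrams and unknown contexts
            by_context[ctx].append((gram[-1], c / counts[ctx]))
    return {
        ctx: sorted(cands, key=lambda ws: ws[1], reverse=True)[:top_k]
        for ctx, cands in by_context.items()
    }

# Hypothetical toy counts, including context counts.
counts = {
    ("i",): 10, ("love",): 8,
    ("i", "love"): 6, ("i", "like"): 4,
    ("love", "you"): 5, ("love", "tea"): 3,
    ("i", "love", "you"): 4, ("i", "love", "tea"): 2,
}

lookup = precompute(counts)
print(lookup[("i", "love")])  # continuations of "i love", best first
print(lookup[("i",)])         # continuations of "i", best first
```

The backoff multiplier is not baked into the stored values, because it depends on how many times the query had to shorten its context; it would still be applied at prediction time.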

                 *THANK YOU*