Word Predictor App

Algorithm: Building the N-gram Table

  • Bigram, trigram, four-gram and five-gram frequency tables were built
  • The tables were combined into a single data frame
  • The data.table package was used to store and query it
  • N-grams occurring fewer than 5 times were discarded
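The table-building steps above can be sketched as follows. The deck's own pipeline used the data.table package, so this minimal Python version is only an illustration. It assumes the corpus has already been cleaned and tokenized into a list of words. The function name is hypothetical, and a plain dict keyed by word tuples stands in for the combined data.table.

```python
from collections import Counter
from typing import Dict, Iterable, List, Tuple

Ngram = Tuple[str, ...]


def build_ngram_table(tokens: List[str],
                      orders: Iterable[int] = range(2, 6),
                      min_count: int = 5) -> Dict[Ngram, int]:
    """Count 2- to 5-grams into one combined table and prune rare entries.

    Hypothetical helper: a dict keyed by word tuples stands in for the
    single combined data.table described in the slides.
    """
    counts: Counter = Counter()
    for n in orders:
        # Slide a window of width n over the token list.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    # Discard n-grams seen fewer than min_count times (5 in the slides).
    return {gram: c for gram, c in counts.items() if c >= min_count}
```

For example, `build_ngram_table("a b a b a b a b a b".split())` keeps only `("a", "b")` with count 5. The bigram `("b", "a")` occurs 4 times, and every longer n-gram occurs at most 4 times, so all of them fall below the cutoff.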

Algorithm: Predicting the Next Word

  • The next word is predicted using Stupid Backoff
  • The backoff factor λ is set to 0.4
  • For out-of-vocabulary input, a default prediction is returned
  • The top 5 candidate words and their scores are displayed in the app
  • Each prediction takes about 1 second
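A minimal sketch of the prediction step, assuming the combined n-gram table is available as a dict mapping word tuples to counts and that the input history is tokenized the same way as the corpus. Candidates are scored at the longest matching context first, and the score is multiplied by λ = 0.4 for each order the search backs off. The function name, the `default` word list for out-of-vocabulary input, and the use of summed continuation counts as the context count are all assumptions, not the app's confirmed implementation.

```python
from typing import Dict, List, Sequence, Tuple

Ngram = Tuple[str, ...]


def stupid_backoff(history: List[str],
                   table: Dict[Ngram, int],
                   lam: float = 0.4,
                   max_order: int = 5,
                   top_k: int = 5,
                   default: Sequence[str] = ("the", "to", "and"),
                   ) -> List[Tuple[str, float]]:
    """Rank next-word candidates with Stupid Backoff (sketch)."""
    scores: Dict[str, float] = {}
    penalty = 1.0  # lam ** (number of orders backed off so far)
    # Start at the longest usable context (4 words for 5-grams).
    for n in range(max_order, 1, -1):
        context = tuple(history[-(n - 1):])
        if len(context) < n - 1:
            continue  # history too short for this order; no penalty
        # Continuations of this context among the stored n-grams.
        matches = {g[-1]: c for g, c in table.items()
                   if len(g) == n and g[:-1] == context}
        # Assumption: the context count is approximated by the sum of its
        # continuation counts, since no separate (n-1)-gram count is kept.
        total = sum(matches.values())
        for word, c in matches.items():
            # A word already scored at a higher order keeps that score.
            if word not in scores:
                scores[word] = penalty * c / total
        penalty *= lam
    if not scores:
        # Hypothetical default prediction for out-of-vocabulary input.
        return [(w, 0.0) for w in list(default)[:top_k]]
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

For example, with trigram counts ("i", "want", "to") = 9 and ("i", "want", "a") = 6, and bigram counts ("want", "to") = 10, ("want", "a") = 5 and ("want", "it") = 5, the history ["i", "want"] yields "to" (0.6) and "a" (0.4) from the trigram level and "it" at 0.4 × 5/20 = 0.1 from the bigram level. The linear scan over the table keeps the sketch short; a keyed lookup on the context is what makes this fast in practice.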

Further Improvements

  • Only about 5% of the data could be used due to RAM constraints
  • Prediction could be made faster by precomputing the scores
  • Six-grams and higher-order n-grams could be added
  • Kneser-Ney smoothing or other algorithms could be used
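Precomputing scores moves the relative-frequency arithmetic offline, so each query needs only a few dictionary lookups plus the λ penalties. This Python sketch assumes the same dict-of-counts table used for prediction; both function names are hypothetical. Keeping only the top 5 continuations per context saves memory, but it is an approximation: a word ranked sixth at one order could still reach the final top 5 when some words ranked above it there were already scored, lower, at a higher order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Ngram = Tuple[str, ...]
Scored = List[Tuple[str, float]]


def precompute_scores(table: Dict[Ngram, int], top_k: int = 5) -> Dict[Ngram, Scored]:
    """Map every context to its top_k continuations and relative frequencies."""
    by_context: Dict[Ngram, Dict[str, int]] = defaultdict(dict)
    for gram, count in table.items():
        # Context = all words but the last; contexts of different orders
        # have different lengths, so they never collide as keys.
        by_context[gram[:-1]][gram[-1]] = count
    precomputed: Dict[Ngram, Scored] = {}
    for context, nexts in by_context.items():
        total = sum(nexts.values())
        ranked = sorted(nexts.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
        precomputed[context] = [(w, c / total) for w, c in ranked]
    return precomputed


def predict(history: List[str], precomputed: Dict[Ngram, Scored],
            lam: float = 0.4, max_order: int = 5, top_k: int = 5) -> Scored:
    """Stupid Backoff at query time using only dictionary lookups."""
    scores: Dict[str, float] = {}
    penalty = 1.0
    for n in range(max_order, 1, -1):
        context = tuple(history[-(n - 1):])
        if len(context) < n - 1:
            continue  # history too short for this order
        for word, rel_freq in precomputed.get(context, []):
            scores.setdefault(word, penalty * rel_freq)  # higher order wins
        penalty *= lam
    # An empty result means the caller should fall back to its default.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]
```

The precomputed dict can be built once when the app starts (or saved to disk), trading some extra memory for less work per keystroke.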
                 *THANK YOU*