Next Word Predictor Application

Pawan Mishra
11th March 2018

Objective

Precursors to building the prediction Model

  1. Data acquisition and cleaning
    • For computational efficiency, we used only 1% of the English-language corpus data
  2. We create n-grams (unigrams, bigrams, trigrams and tetragrams) from the processed data
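
The n-gram step above can be sketched roughly as follows. This is a minimal illustration in Python, not the application's actual code; the names `tokenize` and `build_ngram_tables`, the regex-based cleaning, and the two sample sentences are all assumptions made for the example.

```python
from collections import Counter
import re

def tokenize(text):
    """Lowercase the text and keep only letter/apostrophe runs (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def build_ngram_tables(lines, max_n=4):
    """Return {n: Counter of n-word tuples} for n = 1..max_n (unigrams to tetragrams)."""
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            # Slide a window of width n across the tokens of this line.
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables

# Hypothetical two-line sample standing in for the 1% corpus sample.
sample = ["I love New York", "I love New Jersey"]
tables = build_ngram_tables(sample)
```

Keying every table by word tuples keeps the lookup logic identical across n-gram orders, which simplifies the back-off steps described later.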

The Prediction Algorithm

Case 1: if the user enters only 1 word

  • Check if the entered word appears in the training data
    • If the word doesn't exist, suggest the top 3 most frequently occurring words from the Unigram table.
    • If the word does exist, suggest the top 3 most frequently occurring next words from the Bigram table.
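
A minimal sketch of Case 1, assuming the Unigram table is a `Counter` keyed by words and the Bigram table is a `Counter` keyed by word pairs. The function names and the toy counts are hypothetical. The fallback for a known word that never has a follower is also an assumption, since the slides do not cover that situation.

```python
from collections import Counter

def top_k(counter, k=3):
    """Return the k most frequent keys of a Counter."""
    return [item for item, _ in counter.most_common(k)]

def predict_one_word(word, unigrams, bigrams, k=3):
    word = word.lower()
    if word not in unigrams:
        # Unknown word: suggest the most frequent words overall.
        return top_k(unigrams, k)
    # Known word: collect every word that follows it in the Bigram table.
    followers = Counter({w2: c for (w1, w2), c in bigrams.items() if w1 == word})
    if not followers:
        # Assumption (not in the slides): fall back to unigrams if nothing follows.
        return top_k(unigrams, k)
    return top_k(followers, k)

# Toy tables with made-up counts.
unigrams = Counter({"the": 5, "of": 4, "new": 3, "york": 2, "jersey": 1})
bigrams = Counter({("new", "york"): 2, ("new", "jersey"): 1, ("the", "new"): 1})

print(predict_one_word("new", unigrams, bigrams))    # ['york', 'jersey']
print(predict_one_word("zebra", unigrams, bigrams))  # ['the', 'of', 'new']
```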

Case 2: if the user enters 2 words

  • Check if the entered words appear in the training data (Unigram table)

    • If the second (last) word doesn't exist in the Unigram table, we suggest the top 3 most frequently occurring words from the Unigram table.
    • If the first word doesn't exist in the training data but the second does, we proceed as if the user had entered only 1 word (the second one) and predict the next word as described in case 1.

The Prediction Algorithm (cont.)

  • If both words exist in the training data, we check if they ever appear together in the entered order, i.e. we check if the bigram formed by these words exists.

    • If the bigram does exist, we look for the most frequently occurring next word from the Trigram table
    • If the bigram doesn't exist, we proceed as if the user has entered only 1 word (the second one), and predict the next word as described in case 1.
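
The Case 2 decision path might look like the sketch below. It assumes all tables are `Counter` objects keyed by word tuples; `next_words`, `predict_two_words`, and the toy counts are hypothetical. Returning up to `k` trigram candidates rather than a single word is a choice made here for symmetry with Case 1.

```python
from collections import Counter

def next_words(context, table, k=3):
    """Top-k words that follow the tuple `context` in an n-gram table."""
    n = len(context)
    followers = Counter(
        {gram[-1]: c for gram, c in table.items() if gram[:n] == context}
    )
    return [w for w, _ in followers.most_common(k)]

def predict_two_words(w1, w2, unigrams, bigrams, trigrams, k=3):
    top_unigrams = [gram[0] for gram, _ in unigrams.most_common(k)]
    if (w2,) not in unigrams:
        # Last word unknown: suggest the most frequent words overall.
        return top_unigrams
    if (w1,) in unigrams and (w1, w2) in bigrams:
        # Both words known and seen together in this order: use the Trigram table.
        hits = next_words((w1, w2), trigrams, k)
        if hits:
            return hits
    # Otherwise behave as in Case 1, using only the second word.
    return next_words((w2,), bigrams, k) or top_unigrams

# Toy tables with made-up counts.
unigrams = Counter({("i",): 3, ("love",): 2, ("new",): 2,
                    ("york",): 1, ("jersey",): 1, ("hate",): 1})
bigrams = Counter({("i", "love"): 2, ("love", "new"): 2, ("new", "york"): 1,
                   ("new", "jersey"): 1, ("i", "hate"): 1})
trigrams = Counter({("i", "love", "new"): 2, ("love", "new", "york"): 1,
                    ("love", "new", "jersey"): 1})

print(predict_two_words("i", "love", unigrams, bigrams, trigrams))  # ['new']
```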

Case 3: if the user enters 3 words

  • We extend the process described above by one level, using the Tetragram table to predict the 4th word

Case 4: if the user enters more than 3 words

  • We predict the next word considering only the last 3 input words, as in case 3
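
Cases 1 through 4 can be folded into one back-off loop, sketched below. This is a compact approximation rather than the application's code: instead of checking the Unigram table first, it tries the longest context and drops the oldest word on a miss, which should produce the same suggestions when all tables come from the same corpus. The `tables` layout (`{n: Counter of n-word tuples}`), the function name, and the counts are assumptions.

```python
from collections import Counter

def predict_next(text, tables, k=3):
    """Back off from the longest usable context (at most 3 words) to unigrams."""
    context = tuple(text.lower().split())[-3:]  # Case 4: keep only the last 3 words
    while context:
        table = tables[len(context) + 1]  # a 3-word context uses the Tetragram table
        followers = Counter()
        for gram, count in table.items():
            if gram[:-1] == context:
                followers[gram[-1]] += count
        if followers:
            return [w for w, _ in followers.most_common(k)]
        context = context[1:]  # drop the oldest word and retry one level down
    # Nothing matched at any level: most frequent unigrams.
    return [gram[0] for gram, _ in tables[1].most_common(k)]

# Toy tables with made-up counts.
tables = {
    1: Counter({("the",): 3, ("i",): 2, ("love",): 2, ("new",): 2, ("york",): 1}),
    2: Counter({("i", "love"): 2, ("love", "new"): 2,
                ("new", "york"): 1, ("new", "jersey"): 1}),
    3: Counter({("i", "love", "new"): 2, ("love", "new", "york"): 1,
                ("love", "new", "jersey"): 1}),
    4: Counter({("i", "love", "new", "york"): 1, ("i", "love", "new", "jersey"): 1}),
}

print(predict_next("really love", tables))  # ['new']
```

With the input "well I love new", only the last three words are used, so the candidates come from the Tetragram table.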

Application Preview

Application Link: https://pawnypro.shinyapps.io/NextWordPredictor/

Initial Screen

[Screenshot: initial screen of the application]

Screen with prediction

[Screenshot: application screen showing a prediction]