Next Word Predictor Application

Pawan Mishra
11th March 2018

As per https://www.statista.com/statistics/470018/mobile-phone-user-penetration-worldwide/, approximately 66% of the world population uses a mobile phone today.
The objective of the application is to make it easier for people to type on their mobile devices.
This application predicts the next word for a user-entered phrase, using predictive text modelling techniques.

Precursors to building the prediction model

Data acquisition and cleaning
- For computational efficiency, we have used only 1% of the English language corpora data.
- We create n-grams (unigrams, bigrams, trigrams and tetragrams) from the processed data.
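
The sampling and n-gram steps above can be sketched in Python as follows. This is only an illustration, not the application's actual code: the function names (`sample_lines`, `tokenize`, `build_ngram_tables`), the regular expression used for cleaning, and the fixed random seed are all assumptions.

```python
import random
import re
from collections import Counter


def sample_lines(lines, fraction=0.01, seed=42):
    """Return a random subset of the corpus lines (1% by default)."""
    rng = random.Random(seed)  # fixed seed is an assumption, for repeatability
    k = max(1, int(len(lines) * fraction))
    return rng.sample(lines, k)


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_tables(lines, max_n=4):
    """Return {n: Counter of n-word tuples} for n = 1..max_n."""
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            # Slide a window of width n over the tokens of this line.
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables


corpus = ["The cat sat on the mat", "The cat ate the fish"]
tables = build_ngram_tables(corpus)
unigrams, bigrams, trigrams, tetragrams = (tables[n] for n in (1, 2, 3, 4))
```

Here n-grams are counted within a line only, so no n-gram spans two lines. Whether the original application does the same is not stated.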

Case 1: if the user enters only 1 word

Check if the entered word appears in the training data
- If the word doesn't exist, suggest the top 3 most frequently occurring words from the Unigram table.
- If the word does exist, suggest the top 3 most frequently occurring next words from the Bigram table.
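
A minimal sketch of the Case 1 lookup, assuming the Unigram and Bigram tables are held as Python `Counter` objects. The toy counts and the function name `predict_one_word` are hypothetical and exist only for illustration.

```python
from collections import Counter

# Toy frequency tables (hypothetical counts, for illustration only).
unigrams = Counter({"the": 5, "cat": 3, "sat": 2, "mat": 1})
bigrams = Counter({("the", "cat"): 3, ("the", "mat"): 1, ("cat", "sat"): 2})


def predict_one_word(word, unigrams, bigrams, k=3):
    """Suggest up to k next words for a single entered word."""
    word = word.lower()
    if word not in unigrams:
        # Unknown word: fall back to the k most frequent unigrams.
        return [w for w, _ in unigrams.most_common(k)]
    # Known word: rank the words that follow it in the Bigram table.
    followers = Counter(
        {pair[1]: count for pair, count in bigrams.items() if pair[0] == word}
    )
    return [w for w, _ in followers.most_common(k)]


print(predict_one_word("the", unigrams, bigrams))
print(predict_one_word("dog", unigrams, bigrams))
```

In this sketch, a known word that is never followed by another word in the training data yields an empty list. The description above does not say how the application handles that situation.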

Case 2: if the user enters 2 words

Check if the entered words appear in the training data (Unigram table)
- If the second (last) word doesn't exist in the Unigram table, we suggest the top 3 most frequently occurring words from the Unigram table.
- If the first word doesn't exist in the training data but the second does, we proceed as if the user has entered only 1 word (the second one), and predict the next word as described in case 1.

If both words exist in the training data, we check if they ever appear together in the entered order, i.e. we check if the bigram formed by these words exists.
- If the bigram does exist, we look for the most frequently occurring next word from the Trigram table.
- If the bigram doesn't exist, we proceed as if the user has entered only 1 word (the second one), and predict the next word as described in case 1.
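
The Case 2 back-off can be sketched in the same way. The toy tables and the names `top_followers` and `predict_two_words` are hypothetical. Returning up to 3 suggestions from the Trigram table is also an assumption, made for consistency with case 1.

```python
from collections import Counter

# Toy frequency tables (hypothetical counts, for illustration only).
unigrams = Counter({"the": 5, "cat": 3, "sat": 2, "on": 1, "mat": 1})
bigrams = Counter({("the", "cat"): 3, ("cat", "sat"): 2, ("sat", "on"): 1,
                   ("on", "the"): 1, ("the", "mat"): 1})
trigrams = Counter({("the", "cat", "sat"): 2, ("cat", "sat", "on"): 1,
                    ("sat", "on", "the"): 1, ("on", "the", "mat"): 1})


def top_followers(prefix, table, k):
    """Rank the last words of the n-grams in `table` that start with `prefix`."""
    n = len(prefix)
    followers = Counter(
        {gram[-1]: count for gram, count in table.items() if gram[:n] == prefix}
    )
    return [w for w, _ in followers.most_common(k)]


def predict_two_words(first, second, unigrams, bigrams, trigrams, k=3):
    """Suggest up to k next words for a two-word phrase."""
    first, second = first.lower(), second.lower()
    if second not in unigrams:
        # Last word unknown: fall back to the most frequent unigrams.
        return [w for w, _ in unigrams.most_common(k)]
    if first in unigrams and (first, second) in bigrams:
        # Both words known and seen together in this order: use the Trigram table.
        return top_followers((first, second), trigrams, k)
    # First word unknown, or bigram never seen: treat as case 1 on the second word.
    return top_followers((second,), bigrams, k)


print(predict_two_words("the", "cat", unigrams, bigrams, trigrams))
print(predict_two_words("dog", "the", unigrams, bigrams, trigrams))
```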

Case 3: if the user enters 3 words

Case 4: if the user enters more than 3 words

Application Link: https://pawnypro.shinyapps.io/NextWordPredictor/

Initial Screen

Screen with prediction
