Next Word Predictor

Angela
26 May, 2019

The Project

The Next Word Predictor is a Shiny web application that takes input text and predicts the most likely next word based on a corpus of the English language.

Applications include:

  • word-suggestion features for smartphones
  • writing help
  • teaching tool

The Prediction Model

The model is based on the frequency of n-grams in a corpus.
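To make "frequency of n-grams" concrete, here is a minimal Python sketch of counting them. It is an illustration only, not the code behind the app: the `count_ngrams` helper and the toy sentence are assumptions introduced for this example.

```python
from collections import Counter

def count_ngrams(tokens, n):
    """Count how often each run of n consecutive tokens occurs.

    Hypothetical helper for illustration; not taken from the app.
    """
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy corpus (an assumption; the real corpus is far larger).
tokens = "the cat sat on the mat and the cat ran".split()
bigrams = count_ngrams(tokens, 2)

# The pair ("the", "cat") appears twice in the toy corpus.
print(bigrams[("the", "cat")])
```

The same function with n = 3, 4 or 5 would produce the larger tables that the model consults first.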

For example, if the input text has five or more words, the 5-gram table is used to predict the next word. If the input is not found in the 5-gram table, the algorithm checks the 4-gram table, and so on.
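The fallback from longer to shorter n-grams can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not the app's implementation: `build_tables` and `predict_next` are hypothetical names, the toy corpus is invented, and the sketch assumes an n-gram table is keyed by the preceding n-1 words.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n=5):
    """Map n -> {context of n-1 words -> Counter of following words}.

    Hypothetical helper; the table layout is an assumption.
    """
    tables = {}
    for n in range(2, max_n + 1):
        table = defaultdict(Counter)
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            table[context][tokens[i + n - 1]] += 1
        tables[n] = table
    return tables

def predict_next(text, tables, max_n=5):
    """Try the longest n-gram first, then back off to shorter ones."""
    words = text.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough input words for this table
        context = tuple(words[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # no table contains the input

# Toy corpus (an assumption for illustration).
tokens = "i want to go home . i want to eat now . you want to go out".split()
tables = build_tables(tokens)

# Neither the 5-gram nor the 4-gram table knows "they really want to",
# so the lookup backs off until the context "want to" matches.
print(predict_next("they really want to", tables))
```

Returning `None` when nothing matches is a design choice of this sketch; a real application might fall back to the most frequent single word instead.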

Accuracy and Efficiency

The model is built from an existing corpus of the English language. If a string of n words does not appear in the n-gram table, the model backs off to the (n-1)-gram table, and so on.

The predictive modeling approach is quite rudimentary as it relies on huge data tables.

Since we probably don't need all of those n-grams, the frequency tables were cut down to a maximum of 500,000 observations, which should still cover most of the potential input words.
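One plausible way to cut a frequency table down is to keep only its most frequent rows and then check how much of the total count survives. The Python sketch below assumes that strategy; the source does not say how the tables were actually trimmed, and `truncate_table`, `coverage` and the toy counts are all hypothetical.

```python
from collections import Counter

def truncate_table(counts, max_rows):
    """Keep only the max_rows most frequent entries (assumed strategy)."""
    return Counter(dict(counts.most_common(max_rows)))

def coverage(counts, kept):
    """Fraction of all observed occurrences still present after truncation."""
    total = sum(counts.values())
    return sum(kept.values()) / total if total else 0.0

# Toy frequency table (invented numbers, for illustration only).
counts = Counter({
    "of the": 50,
    "in the": 30,
    "to be": 15,
    "zebra crossing": 3,
    "quantum foam": 2,
})

kept = truncate_table(counts, 3)
print(len(kept), coverage(counts, kept))
```

In this toy table, dropping the two rarest rows removes 40% of the entries but only 5% of the observed occurrences, which is the intuition behind capping the tables.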

How to Use

The Next Word Predictor application is currently live at https://angelamhkim.shinyapps.io/NextWordPredictor/. Enter your text into the text box in the left panel.
