Best Next Word Algorithm & App

Carlos I. Patiño
January 24, 2016

The Text Prediction Problem

The objective of the project is to develop and implement a text prediction algorithm.

  • In most applications, especially on mobile, users need to type quickly and easily.
  • Most keyboards today therefore either offer the user the three best candidates for the next word, so the user can simply select one, or automatically fill in the single best prediction.
  • The objective of this web-based app is to predict the next word from the last words the user typed and to add it automatically to the output text.

The Algorithm

We use an n-gram model to come up with the best next word.

  1. Four n-gram tables (1-grams, 2-grams, 3-grams, and 4-grams) are used for making predictions.
  2. Only n-grams with a frequency of at least 2 are kept in the model.
  3. For a string of text passed to the predictor, the algorithm searches the n-gram tables in turn, starting with the 4-gram table.
  4. The last three words of the input are looked up in the 4-gram table. If there is a match, the output is the best next word given those three words.
  5. If there is no match, the search continues in the 3-gram table using the last two words of the input, and so on. If no table yields a match, the prediction is the most common 1-gram.
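The steps above can be illustrated with a minimal Python sketch. It is not the app's actual code: the function names (`tokenize`, `build_tables`, `predict_next_word`), the lowercase whitespace tokenization, and the reading of "best next word" as the most frequent continuation are all assumptions made for illustration.

```python
from collections import Counter

MAX_N = 4      # highest n-gram order used (step 1)
MIN_COUNT = 2  # keep only n-grams seen at least twice (step 2)


def tokenize(text):
    # Assumption: lowercase + whitespace split; the original
    # text-cleaning steps are not described in the slides.
    return text.lower().split()


def build_tables(corpus_lines, max_n=MAX_N, min_count=MIN_COUNT):
    """Return {n: {context_tuple: best_next_word}} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in corpus_lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1

    tables = {}
    for n, counter in counts.items():
        best = {}  # context -> (word, count)
        for gram, count in counter.items():
            if count < min_count:
                continue  # drop rare n-grams
            context, word = gram[:-1], gram[-1]
            # Assumption: "best" means most frequent; ties keep the
            # first n-gram encountered.
            if context not in best or count > best[context][1]:
                best[context] = (word, count)
        tables[n] = {ctx: wc[0] for ctx, wc in best.items()}
    return tables


def predict_next_word(text, tables, max_n=MAX_N):
    """Search from the highest-order table down to the 1-gram fallback."""
    tokens = tokenize(text)
    for n in range(max_n, 1, -1):
        k = n - 1  # number of context words for an n-gram table
        if len(tokens) < k:
            continue
        word = tables[n].get(tuple(tokens[-k:]))
        if word is not None:
            return word
    # No match anywhere: fall back to the most common 1-gram
    # (None if the 1-gram table is empty).
    return tables[1].get(())


# Toy corpus, invented purely for this example.
corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the dog sat on the mat",
]
tables = build_tables(corpus)
print(predict_next_word("we sat on the", tables))  # prints "mat"
```

Storing only the single best continuation per context keeps each lookup to one dictionary access, at the cost of not being able to offer several candidate words.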

3-gram Distribution from Sample Data

[Figure: 3-gram distribution from the sample data]

The Best-Next-Word App

Click here to explore the app

Paste or type the text to be used as input, and the app will automatically append the best next word. Use the other three tabs to explore sample histograms from the data used to build the prediction model.
