Next word prediction in R

Georgy Makarov
May 26, 2020

Next Word Prediction is a Shiny web app that predicts the next word from the user's keyboard input. It uses a 5-gram language model and the Stupid Backoff algorithm. The app returns a choice of 5 possible next words. On randomly generated text, the model's top-1 precision is 13.2% and its top-3 precision is 21.5%. A prediction takes about 96 msec on average.

The app is hosted on a Shiny server.

Project background

People spend a lot of time typing messages on mobile devices. Reducing the time required for typing can improve productivity or free up time for other activities. One way to reduce typing time is to predict the next word, so the user does not have to type it in.

This project develops a Shiny app to demonstrate how an N-gram language model can help people reduce typing time. The result of this project can be used in smart keyboards and in search engines to autocomplete sentences.

Project design

The app uses a 5-gram language model and the Stupid Backoff algorithm to predict next words. It generates a prediction in 2 steps: find at least 5 candidates for the next word, then rank the candidates.

  • Find candidates: the algorithm takes the last 4 words of the input line and finds 5-grams that complete those 4 words. If there are fewer than 5 candidates, the algorithm drops the first of the 4 words and looks for 4-grams that complete the remaining 3 words. This continues, shortening the context by one word each time, until at least 5 candidates are found.

  • Rank candidates: the algorithm ranks the candidates using Stupid Backoff with \( \lambda = 0.4 \), a standard value for this type of model. If there are no candidates, or the input is empty, or the input is unintelligible, the algorithm returns the top-5 words from the 1-gram language model.
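The two steps above can be sketched in base R. This is only a minimal illustration under stated assumptions, not the app's actual code: the toy corpus and the helper names `count_ngrams` and `predict_next` are hypothetical, and the sketch merges candidate search and ranking into one loop. A candidate found with a context of full length is scored by its relative frequency, count(context + word) / count(context); each time the context is shortened by one word, the score is multiplied by \( \lambda \).

```r
# Toy corpus (an assumption for illustration; the real model is far larger).
corpus <- c("i want to go home", "i want to go home",
            "i want to go out", "you want to eat now")

# Count every n-gram of order 1..max_n; names are space-joined words.
count_ngrams <- function(sentences, max_n = 5) {
  keys <- character(0)
  for (s in sentences) {
    w <- strsplit(s, " ", fixed = TRUE)[[1]]
    for (n in seq_len(min(max_n, length(w)))) {
      for (i in seq_len(length(w) - n + 1)) {
        keys <- c(keys, paste(w[i:(i + n - 1)], collapse = " "))
      }
    }
  }
  tab <- table(keys)
  setNames(as.numeric(tab), names(tab))
}

# Return up to k next-word candidates ranked by a Stupid Backoff score.
predict_next <- function(input, counts, k = 5, lambda = 0.4, max_n = 5) {
  words <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  context <- tail(words, max_n - 1)   # at most the last 4 words
  scores <- numeric(0)
  weight <- 1                         # becomes lambda^(number of backoffs)
  repeat {
    if (length(context) == 0) {
      # Unigram level: relative frequency of single words.
      uni <- counts[!grepl(" ", names(counts), fixed = TRUE)]
      level <- uni / sum(uni)
    } else {
      prefix <- paste(context, collapse = " ")
      hits <- counts[startsWith(names(counts), paste0(prefix, " "))]
      nxt <- substring(names(hits), nchar(prefix) + 2)
      keep <- !grepl(" ", nxt, fixed = TRUE)  # exactly one word after prefix
      level <- if (any(keep)) {
        setNames(hits[keep] / counts[[prefix]], nxt[keep])
      } else {
        numeric(0)
      }
    }
    # A word keeps the score from the longest context where it was found.
    level <- level[!(names(level) %in% names(scores))]
    scores <- c(scores, weight * level)
    if (length(scores) >= k || length(context) == 0) break
    context <- context[-1]            # drop the first word and back off
    weight <- weight * lambda
  }
  ranked <- names(scores)[order(-scores)]
  ranked[seq_len(min(k, length(ranked)))]
}

counts <- count_ngrams(corpus)
predict_next("i want to go", counts)
```

With this toy corpus the 5-gram level yields only two candidates ("home" and "out"), so the sketch backs off all the way to unigrams to fill the remaining three slots, and those lower-order words rank below the 5-gram matches because of the \( \lambda \) penalty.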

Algorithm performance

The benchmark, run with the benchmark.R tool on randomly generated text, covers 28445 predictions. On average, the app predicts the next word in 96 msec. The main performance indicators follow:

indicator                  value     units
Overall top-3 score        17.10     %
Overall top-1 precision    13.20     %
Overall top-3 precision    21.50     %
Average runtime            96.10     msec
Number of predictions      28445
Total memory used          210.01    MB
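For readers unfamiliar with the precision indicators, the sketch below shows one common way to compute top-1 and top-3 precision. It assumes that top-k precision means the share of test cases where the true next word appears among the first k predictions. How benchmark.R defines these indicators, including the "top-3 score", is not stated here, so the function name `top_k_precision` and the sample data are hypothetical.

```r
# Share of cases where the actual next word is among the first k predictions.
# `predictions` is a list of ranked character vectors, `actual` the true words.
top_k_precision <- function(predictions, actual, k) {
  hits <- mapply(function(p, a) a %in% head(p, k), predictions, actual)
  mean(hits)
}

# Made-up example data, not benchmark results.
predictions <- list(c("the", "a", "to"),
                    c("home", "out", "now"),
                    c("is", "was", "are"),
                    c("of", "in", "on"))
actual <- c("the", "out", "be", "on")

top1 <- top_k_precision(predictions, actual, 1)  # 1 of 4 first guesses match
top3 <- top_k_precision(predictions, actual, 3)  # 3 of 4 appear in the top 3
```

Under this definition, top-3 precision can never be lower than top-1 precision, which is consistent with the 21.50% and 13.20% figures in the table.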

User instructions

To start the app, go here. Type your text in the text box at the top of the page. Please note that the app accepts English words only.

The app shows a choice of five predictions for the next word in the middle text box, ordered from most likely to least likely. The whole text you have entered appears at the bottom of the page.

If the text box is empty, or if the input is unintelligible or contains numbers, symbols or non-ASCII characters, the app returns the top-5 most frequent words.
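The fallback rule can be sketched as a small input check. This is an assumption-laden illustration rather than the app's code: the names `is_usable_input` and `suggest`, the placeholder word list `top5_unigrams`, and the choice to treat only ASCII letters, apostrophes and spaces as usable are all hypothetical.

```r
# Placeholder list; the app's real top-5 unigrams are not given in the text.
top5_unigrams <- c("the", "to", "and", "a", "of")

# TRUE only for non-empty input made of ASCII letters, apostrophes and spaces
# (an assumed rule; the app's exact cleaning steps are not documented here).
is_usable_input <- function(text) {
  text <- trimws(text)
  nzchar(text) && grepl("^[A-Za-z' ]+$", text, perl = TRUE)
}

# Call the model only on usable input; otherwise return the frequent words.
# `predictor` is any function that maps a text to a character vector.
suggest <- function(text, predictor) {
  if (!is_usable_input(text)) return(top5_unigrams)
  predictor(text)
}

suggest("call 911", function(text) "now")    # digits: falls back
suggest("i want", function(text) "to")       # usable: calls the predictor
```

Keeping the check separate from the predictor makes the fallback easy to test without loading the language model.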