The prediction of the next word in a sentence

Rutger Lakin
December 9, 2016

Introduction

This application tries to predict the next word in a sentence. It is written in R and uses a corpus provided by SwiftKey. It is fast: predicted words are returned within 150 ms, and it can be used in a mobile keyboard application.

In total, 15k lines from blogs, 20k lines from news and 15k tweets have been used.

Algorithm

Katz's Backoff Model is used to predict the next word.

  • First, the model with the longest history is used
  • If no match is found, then a model with a shorter history is used (backoff)

The backoff model has been simplified by not calculating the (discounted) probabilities, as the true probabilities are not needed to find the most likely word.
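
The simplified backoff described above can be sketched as follows. This is a minimal illustration and not the application's code: it is written in Python for brevity, while the application itself is written in R. The function names `build_ngram_tables` and `predict_next_word`, the lowercase whitespace tokenization, and the use of raw counts to rank candidates are all assumptions.

```python
from collections import Counter, defaultdict


def build_ngram_tables(lines, orders=(2, 3)):
    """For each n-gram order, count which word follows each (n-1)-word history."""
    tables = {n: defaultdict(Counter) for n in orders}
    for line in lines:
        words = line.lower().split()  # assumption: simple whitespace tokenization
        for n in orders:
            for i in range(len(words) - n + 1):
                history = tuple(words[i:i + n - 1])
                next_word = words[i + n - 1]
                tables[n][history][next_word] += 1
    return tables


def predict_next_word(sentence, tables):
    """Try the longest history first; back off to a shorter one if it is unseen."""
    words = sentence.lower().split()
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue  # sentence is too short for this history length
        history = tuple(words[-(n - 1):])
        followers = tables[n].get(history)
        if followers:
            # Raw counts are enough to pick the most frequent follower;
            # no discounted probabilities are computed.
            return followers.most_common(1)[0][0]
    return None  # no match at any order


# Hypothetical toy corpus, for illustration only.
corpus = [
    "i like green tea",
    "i like green apples",
    "i like green tea a lot",
    "you like red wine",
]
tables = build_ngram_tables(corpus)
print(predict_next_word("i like green", tables))     # matched by the trigram table
print(predict_next_word("they want green", tables))  # backs off to the bigram table
```

Within a single history, ranking by count and ranking by probability give the same top word, which is why the sketch skips the probability step.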

Shiny Application Development

The application has been written for Shiny in R. The application is divided into a frontend and backend.

  • The frontend shows a text input field where the user can fill in a sentence. If the “Predict Next Word” button is pressed, the sentence is sent to the backend, which returns the predicted word. The frontend has been kept simple and is therefore very mobile-friendly.

  • The backend uses Katz's Backoff Model to search for the predicted word. N-grams of size 2 and 3 (bigrams and trigrams) are used. The matched word with the highest probability is returned in bold, appended to the input sentence. If no match is found, the backend returns the sentence without a bold predicted word.
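
The backend's response format described above can be sketched as a small helper. This is an illustration in Python, not the application's R code; the name `render_prediction` and the choice of an HTML `<b>` tag for the bold word are assumptions.

```python
import html


def render_prediction(sentence, predicted_word):
    """Return the sentence as HTML, with the predicted word appended in bold.

    If there is no prediction (None), the sentence is returned unchanged
    apart from trimming and HTML escaping.
    """
    safe_sentence = html.escape(sentence.strip())
    if predicted_word is None:
        return safe_sentence
    return f"{safe_sentence} <b>{html.escape(predicted_word)}</b>"


print(render_prediction("i like green", "tea"))  # sentence followed by a bold word
print(render_prediction("zebra", None))          # no prediction: sentence only
```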

Shiny Application

An application has been made in Shiny and is accessible using this link.

Screenshot of application

Fill in a sentence in the text input field and press the “Predict Next Word” button to get the next word prediction. The predicted word is shown in bold. If no bold word is shown, no prediction is available.

Please note that the application may need a startup time of 1 minute.