Next Word Predictor

Antoine VILLATTE
Aug 28th, 2020

The Project

The purpose of this project is to build a Shiny app that takes an incomplete sentence as input and predicts the next word.

To do so, we were given three datasets:

  • One containing text taken from blogs,
  • One from news articles,
  • And one from Twitter

These sets were provided by SwiftKey and are available in several languages. For now, the app is built only on the English sets.

You can access my app by clicking <here>

The Model

The model I chose for prediction is Katz's back-off model. Its main advantage is that it does not rely only on matches of the highest-order n-grams: it backs off to lower orders to suggest multiple predictions, which accounts for patterns not found in the data set. The diagram on the right side gives a quick overview of how the model works.
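Here is a minimal Python sketch of the back-off lookup idea only, not the app's actual code. It assumes hypothetical in-memory tables keyed by context length (0 for unigrams, 1 for bigram contexts, and so on), ranks candidates by raw counts, and stops at k words. It deliberately leaves out the discounting and back-off weights that Katz's model uses to turn these counts into proper probabilities.

```python
from collections import Counter


def backoff_candidates(tables, context, k=5):
    """Collect up to k next-word candidates, starting from the longest
    usable context and backing off to shorter ones when matches run out.

    tables:  hypothetical layout, context length -> {context tuple: Counter(next word -> count)}
    context: list of preceding words, already normalized
    """
    candidates, seen = [], set()
    max_len = max(tables)
    # Longest context first, then drop the oldest word at each back-off step.
    for n in range(min(max_len, len(context)), -1, -1):
        ctx = tuple(context[len(context) - n:])  # n == 0 gives the empty context ()
        followers = tables.get(n, {}).get(ctx, Counter())
        for word, _count in followers.most_common():
            if word not in seen:  # a word keeps the rank of its highest-order match
                seen.add(word)
                candidates.append(word)
                if len(candidates) == k:
                    return candidates
    return candidates


tables = {
    0: {(): Counter({"the": 10, "a": 7, "to": 5})},
    1: {("of",): Counter({"the": 4, "a": 2})},
    2: {("one", "of"): Counter({"the": 3, "my": 1})},
}
print(backoff_candidates(tables, ["one", "of"], k=4))  # ['the', 'my', 'a', 'to']
```

Matches from the trigram context come first, and lower orders only fill the remaining slots, which is how the model can still return several suggestions when the longest context has few or no matches.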

Katz's model relies on a smoothing method to discount the counts of observed n-grams and free up probability mass for unseen ones; I chose Good-Turing smoothing.
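As a hedged illustration, here is a small Python sketch of Good-Turing discounting combined with Katz back-off, restricted to bigrams for brevity. It uses the simple ratio r*/r, with r* = (r + 1) N_{r+1} / N_r, rather than Katz's renormalized discount formula. The cutoff k = 5 and the two safeguards (no discount when N_{r+1} is zero, discounts capped at 1) are my assumptions, and the function names are hypothetical.

```python
from collections import Counter


def good_turing_discounts(ngram_counts, k=5):
    """Discount ratio d_r = r*/r for counts r <= k, with
    r* = (r + 1) * N_{r+1} / N_r, where N_r is the number of distinct
    n-grams seen exactly r times. Counts above k keep d_r = 1."""
    n_r = Counter(ngram_counts.values())  # frequency of frequencies
    discounts = {}
    for r in range(1, k + 1):
        if n_r[r] == 0 or n_r[r + 1] == 0:
            discounts[r] = 1.0  # safeguard: no Good-Turing estimate available
        else:
            r_star = (r + 1) * n_r[r + 1] / n_r[r]
            discounts[r] = min(r_star / r, 1.0)  # safeguard: never inflate a count
    return discounts


def katz_bigram_prob(word, prev, bigrams, unigrams, discounts):
    """Katz back-off estimate of P(word | prev) for a bigram model."""
    total = sum(unigrams.values())

    def p_unigram(w):
        return unigrams[w] / total

    # Words observed after `prev`, with their bigram counts.
    followers = {w: c for (v, w), c in bigrams.items() if v == prev}
    history = sum(followers.values())
    if history == 0:
        return p_unigram(word)  # unseen history: back off completely

    def p_discounted(c):
        return discounts.get(c, 1.0) * c / history

    if word in followers:
        return p_discounted(followers[word])
    # Mass freed by discounting the seen bigrams ...
    leftover = 1.0 - sum(p_discounted(c) for c in followers.values())
    # ... is shared among unseen followers in proportion to their unigram probability.
    unseen_mass = 1.0 - sum(p_unigram(w) for w in followers)
    return leftover * p_unigram(word) / unseen_mass if unseen_mass > 0 else 0.0


tokens = "the cat sat on the mat the cat ate the fish".split()
unigrams = Counter(tokens)
bigrams = Counter(zip(tokens, tokens[1:]))
discounts = good_turing_discounts(bigrams)
print(katz_bigram_prob("cat", "the", bigrams, unigrams, discounts))  # 0.5
```

In this toy corpus, "the cat" occurs twice and keeps its full count, while each bigram seen once is discounted to a quarter of its count. The freed mass (0.375 after "the") is what the back-off step hands to words never seen after "the", so the estimates over the vocabulary still sum to 1.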

For more details, I wrote a complete description of how I built the model <here>

The App

The app's layout is simple and self-explanatory:

The user types a sentence. When they click “Predict !”, the prediction is computed and up to 5 candidate words are returned.
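Before any lookup, the typed sentence has to be reduced to the same form as the training data. Here is a minimal Python sketch, assuming a trigram model (so the last two words form the context) and a simple lowercase-and-keep-letters cleaning rule; both are assumptions rather than a description of the app's actual preprocessing, and `prepare_input` is a hypothetical name.

```python
import re


def prepare_input(sentence, order=3):
    """Lowercase the sentence, keep only words (letters and apostrophes),
    and return the last order-1 words as the lookup context."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return words[max(0, len(words) - (order - 1)):]


print(prepare_input("Thanks for the, HELP of"))  # ['help', 'of']
```

Cleaning the input with exactly the same rule as the training text matters: a context like "HELP of" would otherwise never match the lowercase n-grams stored in the frequency tables.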

Features and Computing

Most of the computation is done beforehand. The frequency matrices used by the model were created in advance and uploaded to a GitHub repository. The app therefore only has to load them (a download of around 80 MB) and performs minimal computation afterwards, which greatly improves its speed.
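The precompute-then-load idea can be sketched in Python as follows. This is an illustration of the pattern, not the app's own code: the table layout (context length mapped to per-context counters), the tokenizer, the use of `pickle`, and the function names are all my assumptions.

```python
import pickle
import re
from collections import Counter, defaultdict


def build_tables(lines, max_order=3):
    """Count n-grams up to max_order, grouped as
    context length -> {context tuple: Counter(next word -> count)}."""
    tables = {n: defaultdict(Counter) for n in range(max_order)}
    for line in lines:
        words = re.findall(r"[a-z']+", line.lower())
        for i, word in enumerate(words):
            for n in range(max_order):
                if i - n < 0:
                    break  # not enough preceding words for longer contexts
                tables[n][tuple(words[i - n:i])][word] += 1
    # Plain dicts, so looking up an unseen context cannot insert empty entries.
    return {n: dict(t) for n, t in tables.items()}


def save_tables(tables, path):
    """Offline step: serialize the counts once."""
    with open(path, "wb") as f:
        pickle.dump(tables, f)


def load_tables(path):
    """App start-up: deserialize the counts instead of recomputing them."""
    with open(path, "rb") as f:
        return pickle.load(f)
```

Offline, one would call something like `save_tables(build_tables(open("corpus.txt")), "ngram_tables.pkl")` (hypothetical file names), and the app would only call `load_tables`. Since unpickling can execute arbitrary code, this pattern is only safe for files you produced yourself.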

Besides the predictor panel, I added three panels that link to:

  • My detailed description of how I built the app
  • My Shiny UI code
  • My Shiny server code

Thank you for reading!