2022-10-23

Introduction

Occasionally, it is of interest to predict the next word in a sentence. Several algorithms can accomplish this task, but choosing the right model can be challenging, depending on the available computational resources and training data. This presentation shows a Shiny application that predicts the next word of a sentence using a model built with the stupid backoff algorithm (yes, that’s the name).

The following model and application were developed to complete the Johns Hopkins University Data Science specialization on Coursera.

The model

The model takes the input string and applies the stupid backoff algorithm (Brants et al. 2007) to a pre-computed n-gram frequency table, whose highest order is the 3-gram, to produce the top 3 suggestions.

For example, if I were to input “I wish you a very happy”, it would predict “birthday”. This is because an n-gram receives a higher score when it occurs more frequently than other n-grams of the same order. If not enough n-grams are found, the model backs off (hence the name) to the table for the last n-1 words, that is, to a shorter context.
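The scoring and backoff steps described above can be sketched in a few lines. This is a minimal illustration in Python, not the code behind the app: the function names `build_tables` and `predict_next`, the toy corpus, and the on-the-fly counting are all assumptions made for the example. The backoff factor of 0.4 is the value suggested by Brants et al. (2007).

```python
from collections import Counter

ALPHA = 0.4  # backoff factor suggested by Brants et al. (2007)


def build_tables(sentences, max_order=3):
    """Count n-grams of order 1..max_order; tables[n] maps word tuples to counts."""
    tables = {n: Counter() for n in range(1, max_order + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_order + 1):
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables


def predict_next(text, tables, max_order=3, top_k=3):
    """Return up to top_k (word, score) pairs using stupid backoff."""
    tokens = text.lower().split()
    total_unigrams = sum(tables[1].values())
    scores = {}
    penalty = 1.0
    # Try the longest usable context first, then progressively shorter ones.
    for n in range(min(max_order, len(tokens) + 1), 0, -1):
        context = tuple(tokens[len(tokens) - (n - 1):])
        denom = tables[n - 1][context] if n > 1 else total_unigrams
        if denom > 0:
            for ngram, count in tables[n].items():
                # A word keeps the score from the highest order where it was found.
                if ngram[:-1] == context and ngram[-1] not in scores:
                    scores[ngram[-1]] = penalty * count / denom
        penalty *= ALPHA  # each backoff step is discounted by ALPHA
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]


# Toy corpus, invented for this illustration only.
corpus = [
    "i wish you a very happy birthday",
    "have a very happy birthday",
    "a very happy new year",
    "happy birthday to you",
]
tables = build_tables(corpus)
suggestions = predict_next("I wish you a very happy", tables)
print(suggestions)
```

With this toy corpus, “birthday” ranks first because the 3-gram “very happy birthday” is more frequent than “very happy new”. A real deployment would load pre-computed tables rather than counting at prediction time.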

Features

Speed: The probabilities were all computed in advance and are loaded before execution. The app looks up the input in tables holding thousands of words and returns the most likely next word almost instantly.

Safety: Profanities and bleeped words (e.g. ‘f***’ and ‘f#@%’) were also removed from the tables in advance, so they won’t be suggested.
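One way such a filter could work is sketched below in Python. The block list, the masking-symbol pattern, and the names `is_safe` and `clean_table` are hypothetical; the actual word list and filtering rules used for the app are not given here.

```python
import re

# Hypothetical placeholder block list; the list used for the app is not given.
BLOCKED_WORDS = {"darn", "heck"}

# Assumption: a "bleeped" token is one containing a masking symbol such as * # @ %.
# This simple rule would also drop hashtags and e-mail addresses.
BLEEP_PATTERN = re.compile(r"[*#@%]")


def is_safe(token):
    """True if the token is neither block-listed nor bleeped."""
    return token not in BLOCKED_WORDS and not BLEEP_PATTERN.search(token)


def clean_table(table):
    """Drop every n-gram that contains at least one unsafe token."""
    return {
        ngram: count
        for ngram, count in table.items()
        if all(is_safe(token) for token in ngram)
    }


# Toy table, invented for this illustration only.
raw_table = {
    ("what", "the", "heck"): 5,
    ("f***", "off"): 2,
    ("happy", "birthday"): 3,
}
cleaned = clean_table(raw_table)
print(cleaned)
```

Filtering the tables once, before they are saved, means no check is needed at prediction time.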

Coherence: Stopwords were left in, as they are present in normal language and could be the expected next input from a user.

The application

The goal of this project is to create an application that showcases the prediction model through a user-friendly interface.

The working app is available here.

Thanks for reading :)