Stupid Backoff Word Prediction

Andrew Martinez-Novoa
25/02/2018

Introduction

  • The objective of the Capstone project was to develop an algorithm to predict the next word in a sentence and implement it as a Shiny application

  • The problem was solved with the N-gram model, a well-known approach in natural language processing

  • The final product was a Stupid Backoff model using unigrams, bigrams and trigrams

Why Stupid Backoff?

The implementation of a Stupid Backoff model has two key advantages over other models:

  • Inexpensive: It requires few resources compared with other models such as the Katz Backoff Model, because no discounting or normalization is computed

  • Accurate: With a large training corpus, its accuracy approaches that of Kneser-Ney Smoothing

Algorithm Explained

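In the Stupid Backoff model, the backoff factor Alpha is not computed. To reduce complexity, it is heuristically set to a fixed value (0.4). When an N-gram has not been seen, the model backs off to the next shorter N-gram and multiplies its score by Alpha.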
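For reference, the scoring rule for the trigram case can be written out as follows. This is the standard Stupid Backoff formulation, not something taken from the project's own code; the symbols c (N-gram count), N (total number of words in the corpus) and S (score) are notation introduced here. S is a relative score and not a normalized probability.

```latex
S(w_3 \mid w_1 w_2) =
\begin{cases}
  \dfrac{c(w_1 w_2 w_3)}{c(w_1 w_2)} & \text{if } c(w_1 w_2 w_3) > 0 \\[2ex]
  \alpha \, S(w_3 \mid w_2)          & \text{otherwise}
\end{cases}
\qquad
S(w_3 \mid w_2) =
\begin{cases}
  \dfrac{c(w_2 w_3)}{c(w_2)} & \text{if } c(w_2 w_3) > 0 \\[2ex]
  \alpha \, S(w_3)           & \text{otherwise}
\end{cases}
\qquad
S(w_3) = \frac{c(w_3)}{N}, \quad \alpha = 0.4
```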

In our implementation, the N-gram data (unigrams, bigrams and trigrams) was generated in the usual way. The backoff factor Alpha was applied at prediction time, within the prediction R script.
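The idea of keeping plain N-gram counts and applying Alpha only at prediction time can be illustrated with a minimal sketch. It is written in Python for illustration and is not the project's R script; the function names (build_counts, score, predict), the whitespace tokenization and the toy corpus are all assumptions made for this example.

```python
from collections import Counter

ALPHA = 0.4  # fixed backoff factor, as stated in the slides


def build_counts(sentences):
    """Count unigrams, bigrams and trigrams (hypothetical helper)."""
    counts = {1: Counter(), 2: Counter(), 3: Counter()}
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenization
        for n in (1, 2, 3):
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts):
    """Stupid Backoff score of `word` given up to two preceding words."""
    context = tuple(context)[-2:]
    penalty = 1.0  # becomes ALPHA, then ALPHA**2, as we back off
    while context:
        ngram = context + (word,)
        seen = counts[len(ngram)][ngram]
        if seen > 0:
            # Relative frequency of the N-gram given its context
            return penalty * seen / counts[len(context)][context]
        context = context[1:]  # drop the oldest word and back off
        penalty *= ALPHA
    total = sum(counts[1].values())
    if total == 0:
        return 0.0  # empty corpus: nothing to score
    return penalty * counts[1][(word,)] / total


def predict(text, counts):
    """Return the highest-scoring next word for the typed text."""
    context = text.lower().split()[-2:]
    vocabulary = [w for (w,) in counts[1]]
    return max(vocabulary, key=lambda w: score(w, context, counts))


corpus = [
    "i like green tea",
    "i like green apples",
    "i like green tea a lot",
    "you like black tea",
]
counts = build_counts(corpus)
print(predict("i like green", counts))
```

Scoring every vocabulary word is fine for a toy corpus. A real application would more likely look up only the continuations stored for the current context, which is a design choice this sketch leaves out.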

My Shiny App

  • The Word Prediction app can be found here: https://vradek.shinyapps.io/en_US/

  • It is simple and intuitive to use. Just type in a sentence and click the predict button; the predicted word will then appear