The goal of the Coursera Data Science Capstone Project, run in partnership with SwiftKey, is to develop an R Shiny app that takes some input text and predicts the next word logically and reliably.
This project is an exercise in Natural Language Processing (NLP), the application of computational techniques to analyze and generate text.
Text prediction is carried out using the Stupid Backoff algorithm (see “Large Language Models in Machine Translation” by Thorsten Brants et al., 2007). The model compares the input text to a set of n-grams, starting with the highest-order n-grams and “backing off” to lower-order n-grams if a suitable match isn’t found. Candidate words are ranked by their Stupid Backoff score: the relative frequency of the matching n-gram, multiplied by a fixed discount factor for each backoff step (the paper uses 0.4). A higher score indicates that the word follows the given context more often in the training data.
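To make the backoff step concrete, here is a minimal sketch of Stupid Backoff scoring. It is written in Python for illustration only and is not the app's R code. The function names (`build_ngram_counts`, `predict_next`), the toy corpus, the maximum n-gram order of 3, and the discount factor of 0.4 (the value reported by Brants et al.) are all assumptions made for this example.

```python
from collections import Counter

ALPHA = 0.4  # assumed discount factor, the value reported by Brants et al.


def build_ngram_counts(tokens, max_order=3):
    """Count every n-gram of order 1..max_order in a list of tokens."""
    counts = Counter()
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def predict_next(context, counts, max_order=3, alpha=ALPHA, top_k=3):
    """Rank candidate next words by Stupid Backoff score (hypothetical helper)."""
    total_unigrams = sum(c for gram, c in counts.items() if len(gram) == 1)
    # Keep only as many context words as the highest-order n-gram can use.
    context = tuple(context)[-(max_order - 1):] if max_order > 1 else ()
    scores = {}
    penalty = 1.0
    # Try the longest context first, then drop its leftmost word and retry.
    for start in range(len(context) + 1):
        prefix = context[start:]
        denom = counts[prefix] if prefix else total_unigrams
        if denom > 0:
            for gram, c in counts.items():
                if len(gram) == len(prefix) + 1 and gram[:-1] == prefix:
                    # A word keeps the score from the highest order that saw it.
                    scores.setdefault(gram[-1], penalty * c / denom)
        penalty *= alpha  # each backoff step multiplies the score by alpha
    # Sort by descending score, breaking ties alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:top_k]


if __name__ == "__main__":
    corpus = "the cat sat on the mat the cat ran".split()
    counts = build_ngram_counts(corpus)
    print(predict_next(["the", "cat"], counts))
    print(predict_next(["zzz", "the"], counts))
```

In this toy corpus, “the cat” is followed once each by “sat” and “ran”, so both receive a trigram score of 1/2 with no discount. For the unseen context “zzz the”, no trigram matches, so the sketch backs off to bigrams starting with “the” and multiplies their relative frequencies by 0.4. Scanning every n-gram on each call keeps the example short; a real app would more likely index n-grams by prefix for fast lookup.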