Word Prediction App with Stupid Backoff

P. Fleer
September 5, 2017

Project Idea and Goals

Idea: SwiftKey, the corporate partner in the capstone of Johns Hopkins University's Data Science Specialization, aims to make it easier for people to type on their mobile devices. One cornerstone of its technology is predictive text models. The idea of the capstone project is to build a Shiny App that emulates this technique. The starting point is a large corpus of text documents from blog, news, and Twitter sources provided by SwiftKey.

Goals

  • Build a predictive text App that optimizes performance and accuracy
  • Explore a new data type and implement a useful model in a reasonable period of time
  • Apply the entire data analysis pipeline in a text mining context

User Interface: Elements and Use

(Screenshot: the App's user interface and its main elements)

Prediction Model and Developments

Database: uni-, bi-, and trigram frequency tables (a sketch of one possible layout follows)
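The actual tables are not shown in this pitch; the following R sketch only illustrates one plausible layout. The column names (base, word, count) and the toy counts are assumptions, not the App's real data.

```r
# Hypothetical layout of the n-gram tables (toy counts for illustration):
# the bigram and trigram tables store the first n-1 words as "base", the
# candidate next word, and how often the full n-gram occurred in the corpus.
library(data.table)

unigrams <- data.table(word  = c("the", "to", "and"),
                       count = c(55000L, 31000L, 29000L))

bigrams  <- data.table(base  = c("of", "of", "in"),
                       word  = c("the", "a", "the"),
                       count = c(9000L, 2100L, 6200L))

trigrams <- data.table(base  = c("one of", "one of", "a lot"),
                       word  = c("the", "them", "of"),
                       count = c(800L, 150L, 400L))
```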

Stupid backoff model for prediction (see the sketch after this list):

  1. Extracts the last uni- or bigram from the text input as the base for predicting the following word
  2. If the input contains two or more words, matches the base bigram against the trigram table and scores the candidate third words
  3. If no match is found, backs off to the bigram table and scores the candidate second words
  4. If there is still no match, backs off to the unigram table and returns the three words with the highest likelihood
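A minimal R sketch of this backoff logic, assuming the hypothetical table layout shown above and the commonly used discount factor of 0.4; the App's actual implementation may differ in details.

```r
# Stupid backoff: relative frequency at the highest matching n-gram order,
# discounted by lambda for each level of backoff.
library(data.table)

predict_next <- function(input, unigrams, bigrams, trigrams,
                         lambda = 0.4, top_n = 3) {
  words <- tolower(unlist(strsplit(trimws(input), "\\s+")))
  n <- length(words)

  # Steps 1-2: with two or more input words, try the trigram table first.
  # The denominator is the count of the base, recovered here as the sum
  # over its retained continuations.
  if (n >= 2) {
    base2 <- paste(words[n - 1], words[n])
    hits <- trigrams[base == base2]
    if (nrow(hits) > 0) {
      out <- hits[, .(word, score = count / sum(count))]
      return(head(out[order(-score)], top_n))
    }
  }

  # Step 3: back off to the bigram table, discounted by lambda.
  if (n >= 1) {
    base1 <- words[n]
    hits <- bigrams[base == base1]
    if (nrow(hits) > 0) {
      out <- hits[, .(word, score = lambda * count / sum(count))]
      return(head(out[order(-score)], top_n))
    }
  }

  # Step 4: fall back to the most frequent unigrams.
  out <- unigrams[, .(word, score = lambda^2 * count / sum(count))]
  head(out[order(-score)], top_n)
}

# Example call:
# predict_next("one of", unigrams, bigrams, trigrams)
```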

Developments: The App could be developed in different directions. In particular, we could think of feeding the choices made by users back into the n-gram tables, which would enable the App to learn from these choices and improve its predictive quality (a hypothetical sketch of such an update follows).
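This feedback step is not part of the current App; the sketch below only illustrates the idea, again using the assumed table layout from above.

```r
# Hypothetical feedback step: increment the count of a chosen trigram,
# or insert it with count 1 if the combination has not been seen before.
library(data.table)

record_choice <- function(trigrams, base_bigram, chosen_word) {
  idx <- trigrams[, which(base == base_bigram & word == chosen_word)]
  if (length(idx) > 0) {
    trigrams[idx, count := count + 1L]  # update existing entry in place
  } else {
    trigrams <- rbind(trigrams,
                      data.table(base  = base_bigram,
                                 word  = chosen_word,
                                 count = 1L))
  }
  trigrams
}

# Example call:
# trigrams <- record_choice(trigrams, "one of", "the")
```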

References and Thanks

References (Selection)

Thanks

Particular thanks to Michael Szczepaniak, Len Greski, and Fiona Elisabeth Young, who provided much valuable advice.

Hope you'll like the App! (Try building the longest meaningful word sequence just by clicking on the word buttons, maybe choosing different starting words.)