2023-04-19

Introduction

  • Around the world, people spend an exorbitant amount of time on their mobile devices for email, social networking, banking, and a whole range of other activities; however, typing on mobile devices can be difficult.

  • SwiftKey, the corporate partner for this capstone course, builds smart keyboards that make it easier for people to type on their mobile devices. A cornerstone of their smart keyboard is the predictive text model, which presents three options for the next word. This project’s objective is to build a predictive text model like those used by SwiftKey and incorporate it into a web app.

  • The presented web app uses a backoff model for text prediction, in the tradition of Katz’s backoff model. A backoff model is an n-gram language model that estimates the conditional probability of a word given its history in an n-gram. It does so by backing off to progressively shorter histories when data are sparse (e.g., start with a trigram estimate and back off to a bigram or unigram estimate when the longer n-gram was not observed).

Backoff model

  • The specific backoff model [1] chosen for this project was the Stupid Backoff (SBO) model. The SBO model does not generate normalized probabilities; it produces scores based on relative frequencies.

  • SBO is an inexpensive method, from a resource standpoint, that can easily be run in a distributed environment while approaching the quality of Kneser-Ney smoothing for large amounts of data.

  • The lack of normalization does not affect the functioning of the language model, because ranking candidate words depends on relative rather than absolute score values.

  • The predictive model we developed was built from 75% of the provided SwiftKey dataset using an n-gram order of 2 (bigrams). N-gram orders of 3 and 4 were tested, but they did not improve predictive accuracy enough to justify the additional computational resources required to run these models.
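
The scoring idea described above can be illustrated with a minimal sketch. It is not the app's actual code: the document does not state the implementation language, so Python is used here, and the function names (train, sbo_scores, predict), the toy corpus, and the backoff factor of 0.4 (the value suggested by Brants et al.) are all assumptions made for illustration. The sketch uses an n-gram order of 2: a candidate word is scored by its bigram relative frequency when the bigram was observed, and otherwise by its unigram relative frequency multiplied by the backoff factor.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor from Brants et al. (2007); assumed, not confirmed for the app


def train(sentences):
    """Count unigrams and bigrams from an iterable of token lists."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def sbo_scores(prev_word, unigrams, bigrams, alpha=ALPHA):
    """Return {candidate: score} for the word that follows prev_word.

    S(w | prev) = count(prev, w) / count(prev)   if the bigram was seen
                = alpha * count(w) / N           otherwise (back off)
    The scores are relative frequencies, not normalized probabilities.
    """
    total = sum(unigrams.values())
    scores = {}
    for word, count in unigrams.items():
        seen = bigrams.get((prev_word, word), 0)
        if seen > 0:
            scores[word] = seen / unigrams[prev_word]
        else:
            scores[word] = alpha * count / total
    return scores


def predict(text, unigrams, bigrams, k=3):
    """Return the k highest-scoring next words for the entered text."""
    tokens = text.lower().split()
    prev_word = tokens[-1] if tokens else None  # order 2: only the last word matters
    scores = sbo_scores(prev_word, unigrams, bigrams)
    # Sort by descending score; break ties alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:k]


# Hypothetical toy corpus, for illustration only.
corpus = [
    "i like green tea".split(),
    "i like green apples".split(),
    "i like black tea".split(),
]
unigrams, bigrams = train(corpus)
print(predict("I like", unigrams, bigrams, k=2))  # -> ['green', 'black']
```

Because only the ordering of the scores matters for choosing the top predictions, the missing normalization has no effect on which words are shown.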

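    1. Brants, T., Popat, A. C., Xu, P., Och, F. J., & Dean, J. (2007). “Large language models in machine translation.” In Proceedings of the 2007 Joint Conference on Empirical Methods in Natural Language Processing and Computational Natural Language Learning (EMNLP-CoNLL).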

Web app screenshot

Web app instructions

  • The left sidebar describes the problem and my solution.

  • The top right contains a text input box where the user enters 1 to 4 words; the prediction model uses them to predict the next 1, 2, or 3 words.

    - Immediately underneath the text input box are two action buttons.
    - The first button, START, initiates the modelling process.
    - The second button, REFRESH, refreshes the web app and clears the
      text input box and the predicted words.
  • The bottom right lists the word predicted by the SBO model directly under “predicted word.”