SwiftKey Word Predictor

Vlad Pascal
October 2016

WHAT?

  • A Shiny web application that predicts the next word of a phrase.
  • The user enters a phrase into the search box and presses the submit button.
  • Input: a text string (the phrase typed by the user).
  • Output: the predicted next word, chosen by its relative frequency given the preceding words (the history).

WHAT?

  • Enter a phrase and hit the submit button.

HOW?

  • Uses the corpus created by SwiftKey, which includes news articles, blog posts and Twitter posts.
  • A sample is drawn from each category and the samples are combined. The data are cleaned by removing invalid characters.
  • Using an N-gram model, the data are split into trigrams, bigrams and unigrams, and the relative frequency is computed at each level.
  • The user input is broken into n-grams, which are looked up in the N-gram tables.
  • If an N-gram has zero counts, the model “backs off” to a lower-order N-gram, i.e. it uses the trigram if there is a match, otherwise the bigram, otherwise the unigram.
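
The steps above can be sketched in a few lines. This is a minimal illustration in Python, not the app's code (the app itself is built with Shiny in R). The names `tokenize`, `build_tables` and `predict_next` are hypothetical, and the cleaning rule (lowercase, letters and apostrophes only) is an assumption standing in for "removing invalid characters".

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed cleaning rule: lowercase, keep only runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def build_tables(sentences, max_n=3):
    # tables[k] maps a k-word history (tuple) to a Counter of next words:
    # k=2 holds trigram counts, k=1 bigram counts, k=0 unigram counts.
    tables = {k: defaultdict(Counter) for k in range(max_n)}
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i, word in enumerate(tokens):
            for k in range(max_n):
                if i - k >= 0:
                    history = tuple(tokens[i - k:i])
                    tables[k][history][word] += 1
    return tables


def predict_next(tables, phrase, max_n=3):
    # Try the longest history first, then back off to shorter ones.
    tokens = tokenize(phrase)
    for k in range(max_n - 1, -1, -1):
        if len(tokens) < k:
            continue  # not enough words typed for this history length
        history = tuple(tokens[len(tokens) - k:])
        candidates = tables[k].get(history)
        if candidates:
            word, count = candidates.most_common(1)[0]
            # Relative frequency of the word given this history.
            return word, count / sum(candidates.values())
    return None, 0.0
```

For example, after `tables = build_tables(["I love New York", "I love New York", "I love new shoes"])`, the call `predict_next(tables, "I love new")` finds a trigram match and returns "york" with relative frequency 2/3, while `predict_next(tables, "zzz new")` has no trigram match and backs off to the bigram table.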

DETAILS:

  • The corpus is sampled to keep the application small (approx. 3.1 million trigrams and bigrams).
  • Low-frequency n-grams were removed from the sample.
  • All the data were pre-processed outside of the app.
  • The algorithm doesn't use interpolation (mixing of probabilities), smoothing or discounting.
  • The approach is a variation of stupid backoff based on relative frequency.
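
For reference, the standard stupid backoff score can be sketched as follows. This is an illustration in Python of the textbook formulation, not necessarily the variation used in the app: the back-off factor of 0.4 is the value commonly used with stupid backoff and is an assumption here, as are the names `prune` and `stupid_backoff_score` and the pruning threshold `min_count`.

```python
from collections import Counter

ALPHA = 0.4  # assumed back-off factor (the commonly cited default)


def prune(counts, min_count=2):
    # Drop low-frequency n-grams to keep the lookup tables small.
    return Counter({ng: c for ng, c in counts.items() if c >= min_count})


def stupid_backoff_score(counts, total_words, history, word, alpha=ALPHA):
    # counts maps word tuples of any length (unigrams, bigrams, trigrams)
    # to their frequency; total_words is the number of tokens in the sample.
    history = tuple(history)
    penalty = 1.0
    while history:
        ngram = history + (word,)
        if counts.get(ngram, 0) > 0:
            # Relative frequency of the word given the history, no discounting.
            return penalty * counts[ngram] / counts[history]
        history = history[1:]  # drop the oldest word and back off
        penalty *= alpha
    # Base case: unigram relative frequency.
    return penalty * counts.get((word,), 0) / total_words
```

The scores are not normalized probabilities, which is what keeps the method cheap: no smoothing or discounting pass over the tables is needed, only count lookups.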

WHY AND WHERE?

  • The app is easy to use and navigate.
  • A powerful yet computationally manageable method that avoids “expensive” algorithms.
  • The model can be easily deployed on other platforms and scaled up.

Location:

Open the app at the following link:

https://vpascal.shinyapps.io/word_predictor/