SwiftKey Prediction - Johns Hopkins Data Science Capstone

SwiftKey Prediction Summary

The SwiftKey prediction app is the Johns Hopkins Data Science Capstone project. In general text input scenarios, such as mobile keyboards, search engines, or social media chat, it quickly suggests the next word. Source data link.

When the demo app loads, it presents a text prompt where you can type or paste a partial phrase. Enter at least 4 words and pause; the app then predicts the next word and displays it below the text prompt. Link to the app: SwiftKey Prediction

Why You Should Invest in the App

  • Entering text quickly and with fewer errors is vital in today’s world.
  • Users can spend more time organizing ideas and thoughts.
  • Quick response times when interacting with the app (3-5 seconds).
  • Short model training times (under 1 hour).
  • Fits within a free Shiny app deployment (limited to 1 GB of memory).

Training the Prediction Model

  • The prediction algorithm relies on a trained model made up of a set of tokens (ngrams), the predicted words associated with them, and a ranked estimate for each predicted word.
  • Create sets of ngrams in which the predicted word is the last word.
  • Counting how often each word follows a given ngram, and ranking those frequencies, gives the estimate used to predict the next word.
  • After the training data is created, packaged, and imported into the app, the app is ready to use.
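
The training steps above can be sketched as follows. This is a minimal illustration in Python, not the app's actual code: the function names `tokenize` and `train_ngram_table`, the `max_n` parameter, and the simple whitespace tokenizer are all assumptions made for the example. Each ngram is split into a prefix and its last word, and the last words are ranked by raw count.

```python
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase, split on whitespace, and strip surrounding punctuation.

    Hypothetical tokenizer for illustration only.
    """
    tokens = []
    for raw in text.lower().split():
        word = raw.strip(".,!?;:\"()")
        if word:
            tokens.append(word)
    return tokens


def train_ngram_table(sentences, max_n=4):
    """Build a lookup from ngram prefix to ranked (next_word, count) pairs.

    For every ngram of length 1..max_n, the last word is treated as the
    predicted word and the preceding words form the prefix. The empty
    prefix () therefore holds plain word counts.
    """
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                ngram = tokens[i:i + n]
                prefix, predicted = tuple(ngram[:-1]), ngram[-1]
                counts[prefix][predicted] += 1
    # most_common() returns the candidates sorted by descending count
    return {prefix: c.most_common() for prefix, c in counts.items()}


corpus = ["The cat sat on the mat.", "The cat ate the fish."]
table = train_ngram_table(corpus)
print(table[("the",)])
```

Raw counts are used here as the ranked estimate; dividing each count by the total for its prefix would give a relative frequency with the same ranking.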

Using the Prediction Algorithm

  • Tokenize the text input and create ngrams in the same way as for the training data.
  • Use the Stupid Backoff algorithm to search the training data, starting from the longest available ngram.
  • If a matching ngram is found, return the word with the highest estimate.
  • If no match is found, reduce the ngram length by 1 and search again.
  • Finally, if no match is found, return the most common word as the predicted word.
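
The lookup described above can be sketched as follows. This is a minimal illustration in Python under stated assumptions, not the app's actual code: `predict_next`, `max_prefix_len`, `fallback`, and the toy `TABLE` with its scores are all hypothetical. Because it returns the best word from the first (longest) matching prefix, it does not need the discount factor that full Stupid Backoff scoring applies when comparing candidates across ngram lengths.

```python
# Toy trained model (made-up scores): prefix tuple -> {next_word: estimate}
TABLE = {
    ("thanks", "for", "the"): {"follow": 0.6, "help": 0.4},
    ("for", "the"): {"first": 0.5, "help": 0.3},
    ("the",): {"same": 0.2, "first": 0.1},
}


def predict_next(text, table, max_prefix_len=3, fallback="the"):
    """Return the predicted next word for the given text.

    Tries the longest prefix first (the last max_prefix_len words), then
    shortens it by one word per iteration. If nothing matches, returns
    `fallback`, standing in for the most common word in the training data.
    """
    tokens = text.lower().split()
    longest = min(max_prefix_len, len(tokens))
    for length in range(longest, 0, -1):
        prefix = tuple(tokens[-length:])
        candidates = table.get(prefix)
        if candidates:
            # Highest estimate wins among words seen after this prefix
            return max(candidates, key=candidates.get)
    return fallback


print(predict_next("Thanks for the", TABLE))   # matches the 3-word prefix
print(predict_next("looking for the", TABLE))  # backs off to ("for", "the")
print(predict_next("zebra", TABLE))            # no match, uses the fallback
```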