Data Science Capstone Project

Paul Reiners
August 10, 2015

Algorithm Description

  • Uses n-gram models for n = 4, 3, 2, and 1.
  • Combines them with Katz's back-off model, using k = 0.
  • The data used to build the models is from a corpus called HC Corpora.
  • Most of the real work in creating this application was in data manipulation to create a small, usable model out of the large corpus.
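
The back-off idea above can be sketched in a few lines of Python. This is a simplified illustration, not the app's actual code: it counts n-grams for n = 1 to 4 and falls back to shorter contexts when a longer one has not been seen more than k times, but it omits the discounting and back-off weights of the full Katz model. The names `build_models` and `predict` and the toy sentence are assumptions made for this example.

```python
from collections import Counter, defaultdict

def build_models(tokens, max_n=4):
    """Count next-word frequencies for every context of length 0..max_n-1."""
    models = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])   # empty tuple for unigrams
            models[n][context][tokens[i + n - 1]] += 1
    return models

def predict(models, words, k=0, top=3, max_n=4):
    """Simplified back-off: use the longest context whose continuations
    were seen more than k times, otherwise try a shorter context."""
    for n in range(max_n, 0, -1):
        if len(words) < n - 1:
            continue                                # not enough typed words yet
        context = tuple(words[len(words) - (n - 1):])
        counts = models[n].get(context)
        if not counts:
            continue                                # unseen context: back off
        seen = [w for w, c in counts.most_common() if c > k]
        if seen:
            return seen[:top]
    return []

# Toy corpus (hypothetical, stands in for the real training text).
tokens = "i like green eggs and i like green eggs but i like green tea".split()
models = build_models(tokens)
print(predict(models, ["i", "like", "green"]))    # ['eggs', 'tea']
print(predict(models, ["we", "like", "green"]))   # backs off to the 3-gram model
```

With k = 0, any n-gram seen at least once is trusted, so the sketch only backs off when the context has never occurred.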

SpeedyKey Description

  • The SpeedyKey app is a 'clone' of SwiftKey.
  • SpeedyKey predicts the next word you will type.
  • It has a prediction accuracy of 26%.
  • Predictions appear effectively instantaneously.

SpeedyKey Instructions

  • To use the app, simply start typing in the text box.
  • Suggestions for the next word will appear on the buttons above the text box.
  • The middle button contains the primary prediction.
  • Click on a button to append the word to the text.

How SpeedyKey Works

  • The app stores the model data in a 9,836 KB binary file.
  • Loading the model takes less than 2 seconds.
  • There is a delicate balance between accuracy and memory usage in prediction web apps that are backed by a large amount of data.
  • Since a very slow app can be more annoying than a slightly less accurate one, I decided to emphasize execution speed.
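
One way to make this trade-off concrete is shown below. The slides do not say how the model was actually reduced or serialized, so this is only a sketch under assumptions: it drops rare continuations, keeps a few top candidates per context, and stores the result in a binary file with Python's `pickle`. The function names and the `min_count` and `keep` parameters are hypothetical.

```python
import pickle
from collections import Counter

def prune(model, min_count=2, keep=3):
    """Keep at most `keep` continuations per context, each seen at least
    `min_count` times. Smaller file and faster lookup, at some cost in accuracy."""
    pruned = {}
    for context, counts in model.items():
        best = [(w, c) for w, c in Counter(counts).most_common(keep)
                if c >= min_count]
        if best:                       # drop contexts with nothing left
            pruned[context] = best
    return pruned

def save_model(model, path):
    """Write the model as a compact binary file."""
    with open(path, "wb") as f:
        pickle.dump(model, f, protocol=pickle.HIGHEST_PROTOCOL)

def load_model(path):
    """Read the model back; this is the cost paid once at app start-up."""
    with open(path, "rb") as f:
        return pickle.load(f)

# Toy model (hypothetical counts): context -> {next word: count}.
model = {
    ("i", "like"): {"green": 3, "cats": 1},
    ("like", "green"): {"eggs": 2, "tea": 1},
    ("rare", "context"): {"word": 1},
}
small = prune(model)
```

Raising `min_count` or lowering `keep` shrinks the file and the load time further, while discarding predictions the full model could have made.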

Try the SpeedyKey app here.