Swift Key Word Prediction Algorithm

Philipp Brandl
04/27/2015

Objective

The goal of the Capstone Project is to create a Shiny app that predicts the next word as the user enters text.

Features of the App:

  • presents three predicted words to the user
  • lets the user click a predicted word to enter it into the text box
  • displays the weights of the predicted words

Usage

The first half of the screen contains the Input and Output section. Its text box displays the words entered either through keystrokes or through the word buttons.

Please wait until three predicted words are displayed before using the application. Until then, it is not fully loaded.

Interaction:

  • type a word into the text box
  • click on a predicted word to enter it into the text box automatically

Algorithm

The algorithm relies on two concepts:

  1. Stupid Backoff Algorithm - To score how likely a word is to come next, the algorithm first looks for the word together with its context at the n-gram level. If no n-gram of that size has been seen, it backs off recursively to the (n-1)-gram and multiplies that score by 0.4. The recursion stops at unigrams.

  2. Spooky.x - The Jenkins hash functions are a collection of (non-cryptographic) hash functions for multi-byte keys designed by Bob Jenkins. They can be used to detect identical records in a database and as checksums to detect accidental data corruption. Spooky.32 helps to reduce the search time.
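
The backoff rule in item 1 can be sketched in a few lines. This is a minimal Python illustration under stated assumptions, not the code of the Shiny app (which is built with R): the helper names `build_counts`, `score` and `predict` are hypothetical, n-gram counts are kept in a plain in-memory counter rather than a hashed lookup table, and the unigram score is taken to be the word's relative frequency.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor stated in the text


def build_counts(tokens, max_n=4):
    """Count every 1- to max_n-gram in a token list (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total_unigrams):
    """Stupid Backoff score of `word` following the tuple `context`."""
    if not context:
        # Base case: the recursion stops at the unigram relative frequency.
        return counts[(word,)] / total_unigrams
    ngram = context + (word,)
    if counts[ngram] > 0:
        # Seen n-gram: relative frequency given the context.
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and discount by ALPHA.
    return ALPHA * score(word, context[1:], counts, total_unigrams)


def predict(text, counts, total_unigrams, vocab, k=3, max_n=4):
    """Return the k best (word, weight) pairs for the text typed so far."""
    words = text.lower().split()
    # Keep at most max_n - 1 words of context.
    context = tuple(words[-(max_n - 1):]) if max_n > 1 else ()
    scored = [(w, score(w, context, counts, total_unigrams)) for w in vocab]
    # Highest score first; ties are broken alphabetically.
    scored.sort(key=lambda pair: (-pair[1], pair[0]))
    return scored[:k]


if __name__ == "__main__":
    tokens = "i love you i love cake i like you".split()
    counts = build_counts(tokens)
    for word, weight in predict("I love", counts, len(tokens), set(tokens)):
        print(word, round(weight, 4))
```

Because the scores are discounted relative frequencies rather than normalized probabilities, they are best read as ranking weights for the three suggested words.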

Summary

Application:
1. Dynamic, with multiple choices: recommends three words
2. Easy to use: just type, then click the words as they appear

All recommended words come from text that is:
1. Stripped of extra whitespace
2. Lowercased, with punctuation and numbers removed
3. Tokenized into 1-, 2-, 3- and 4-grams using the RWeka n-gram tokenizer
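
The three preparation steps can be illustrated with a short sketch. It is written in plain Python as a stand-in for the RWeka tokenizer, so the function names `clean` and `ngrams` are hypothetical, and it assumes that only the ASCII letters a-z are kept and that punctuation is deleted rather than replaced by a space.

```python
import re


def clean(text):
    """Lowercase, drop punctuation and digits, and strip extra whitespace."""
    text = text.lower()
    # Assumption: keep only the letters a-z and whitespace.
    text = re.sub(r"[^a-z\s]", "", text)
    # Collapse runs of whitespace and trim both ends.
    return " ".join(text.split())


def ngrams(text, n):
    """Return all n-grams of the cleaned text as space-joined strings."""
    tokens = clean(text).split()
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


if __name__ == "__main__":
    sample = "  The quick brown fox, aged 3, jumps!  "
    for n in range(1, 5):
        print(n, ngrams(sample, n))
```

A text with fewer than n words simply yields an empty list for that n, so short inputs need no special handling.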

Shiny App: https://zyrix.shinyapps.io/swift_key/