Text Predictor App

author: Erika Garcia-Boliou
date: 30 Sep 2016

My Text Predictor App -
(Click here to go to it)

Texting can be laborious and frustrating. This app was designed to streamline texting by predicting the next word you are most likely to use.
My app takes an input string and predicts up to five likely next words. Reading from left to right, the choices are ranked from most likely to least likely. When you hover over a button, it enlarges with a colored shadow. When you click the word you want, the phrase it builds populates at the bottom of the screen. A screenshot of the app is available in the last slide.

Limitations:
Symbols, and therefore contractions, cannot be used. The first search is also a little slow, because the data files are loaded during that first search.

Katz Back-off Algorithm with Good-Turing Smoothing

Algorithm Explanation

Phrases are broken up into their separate words (tokens), and the probability of a word appearing after the n words preceding it is calculated (Katz Back-off).
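To make this concrete, here is a minimal sketch in Python of how phrases could be tokenized and tallied into n-gram tables. The app's own source isn't shown here, so every name and the toy corpus below are illustrative, not the app's actual code:

```python
from collections import Counter

def tokenize(text):
    """Lowercase a phrase and split it into word tokens."""
    return text.lower().split()

def count_ngrams(corpus_lines, n):
    """Count how often every n-word sequence appears in the corpus."""
    counts = Counter()
    for line in corpus_lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy corpus (hypothetical) used in the examples that follow
corpus = ["how are you doing today",
          "how are you today",
          "are you doing well"]
fourgrams = count_ngrams(corpus, 4)
trigrams = count_ngrams(corpus, 3)
```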


For example:

If our search phrase is “how are you”, we calculate the probability of the words that usually follow this phrase (“doing”, “today”, etc.). We can label this ratio \( X_4 \).
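In code, \( X_4 \) is just the observed 4-gram count divided by the count of its 3-word prefix. This sketch continues the hypothetical helpers above:

```python
def ratio(ngram_counts, prefix_counts, prefix, word):
    """X_k: count(prefix + word) / count(prefix), or 0 if the prefix is unseen."""
    hits = ngram_counts[tuple(prefix) + (word,)]
    total = prefix_counts[tuple(prefix)]
    return hits / total if total else 0.0

# X_4 for "doing" after the full phrase "how are you"
x4_doing = ratio(fourgrams, trigrams, ["how", "are", "you"], "doing")
```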

Calculate a residual probability (Good-Turing Smoothing)

We do this by using a subset of our phrase; here, that subset becomes “are you”. The probability of “doing” or “today” following this shorter phrase is counted. We will call this ratio \( X_3 \). The entire value cannot simply be added to our first probability, as that would inflate the overall likelihood of “doing” or “today” appearing. Instead, we include it weighted by a lambda factor of 0.4, a commonly used back-off weight.
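For this example, the combination is just the two ratios joined by the lambda weight (again a sketch built on the hypothetical helpers above):

```python
LAMBDA = 0.4  # back-off weight described above

# X_3 for "doing" after the shorter context "are you"
bigrams = count_ngrams(corpus, 2)
x3_doing = ratio(trigrams, bigrams, ["are", "you"], "doing")

# The shorter-context evidence is damped by lambda, so it boosts
# but never dominates the full-context probability
score_doing = x4_doing * (1 + LAMBDA * x3_doing)
```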

With n representing the number of words in the phrase plus the predicted word, the generalized equation is as follows:
\[ \text{Final Ratio} = X_n \times (1+\lambda X_{n-1}) \times (1+\lambda X_{n-2}) \times \cdots \times (1+\lambda X_2) \]
The final equation for this example is:
\[ \text{Final Ratio} = X_4 \times (1+0.4\,X_3) \]
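As a quick numeric check with hypothetical values \( X_4 = 0.30 \) and \( X_3 = 0.25 \):
\[ \text{Final Ratio} = 0.30 \times (1 + 0.4 \times 0.25) = 0.30 \times 1.10 = 0.33 \]

The generalized equation translates directly into code. This sketch (still using the hypothetical helpers above, not the app's actual source) scores a candidate word against every suffix of the search phrase, then ranks the vocabulary to produce up to five suggestions, as the app does:

```python
def final_ratio(word, prefix, counts_by_n):
    """Final Ratio = X_n * (1 + LAMBDA*X_{n-1}) * ... * (1 + LAMBDA*X_2),
    where X_k conditions on the last (k - 1) words of the prefix."""
    n = len(prefix) + 1
    score = ratio(counts_by_n[n], counts_by_n[n - 1], prefix, word)
    for k in range(n - 1, 1, -1):
        sub_prefix = prefix[-(k - 1):]  # back off to a shorter context
        x_k = ratio(counts_by_n[k], counts_by_n[k - 1], sub_prefix, word)
        score *= 1 + LAMBDA * x_k
    return score

# Build n-gram tables for n = 1..4 and rank candidates for "how are you"
counts_by_n = {n: count_ngrams(corpus, n) for n in range(1, 5)}
vocab = [w for (w,) in counts_by_n[1]]
top5 = sorted(vocab,
              key=lambda w: final_ratio(w, ["how", "are", "you"], counts_by_n),
              reverse=True)[:5]
```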


