rtaph
April 26, 2015
Many people use predictive text every day.
Whether in a Google search or an SMS to a friend, text prediction is increasingly common.
The app in this presentation is written in R. It can be deployed either as an API or through a graphical interface.
Given an input string, it predicts the most likely next word.
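A predictor like this first reduces the input string to the few words it conditions on. The Python sketch below assumes a 4-gram model (so at most three words of context), lowercase matching and a simple letters-and-apostrophes tokenizer. The function name `context_words` is hypothetical, and the app's R preprocessing may differ.

```python
import re

def context_words(text, n=4):
    """Return the last n-1 lowercase words of text: the context an n-gram model conditions on."""
    # Keep runs of letters and apostrophes; anything else acts as a word boundary.
    words = re.findall(r"[a-z']+", text.lower())
    return words[-(n - 1):] if n > 1 else []

print(context_words("I think Roses, are"))  # ['think', 'roses', 'are']
```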

We estimate the likelihood of a word from the words that precede it.
\[ P_{\text{interpolated}}(\text{red} \mid \text{are}) < 0.001 \]
\[ P_{\text{interpolated}}(\text{red} \mid \text{roses are}) = 0.64 \]
Above, the word “red” is unlikely to follow “are” on its own (probability below 0.001, i.e. less than once in a thousand). With the longer context “roses are”, however, “red” becomes far more likely (0.64), so it is the word the app predicts.
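Conditional probabilities like these are built from n-gram counts. In its simplest form, the maximum-likelihood estimate divides how often a word follows a context by how often that context is followed by any word. The Python sketch below shows this. The toy corpus is invented for illustration and is not the app's training data, and `mle` is a hypothetical helper name.

```python
def mle(word, context, tokens):
    """Maximum-likelihood estimate of P(word | context) from a token list."""
    context = tuple(context)
    k = len(context)
    # Collect the word that follows each occurrence of the context.
    followers = [tokens[i + k] for i in range(len(tokens) - k)
                 if tuple(tokens[i:i + k]) == context]
    if not followers:
        return 0.0  # unseen context: no evidence at this order
    return followers.count(word) / len(followers)

tokens = "we are here they are late you are kind roses are red".split()
print(mle("red", ["are"], tokens))           # 0.25
print(mle("red", ["roses", "are"], tokens))  # 1.0
```

Raw counts like these are zero for any context the corpus never contains. That is one motivation for combining several n-gram orders.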
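We estimate these probabilities by interpolating 1-, 2-, 3-, and 4-gram estimates. This combines the chain rule of probability with a Markov assumption: each word is taken to depend only on the three words before it. The estimates are drawn from about half a million text entries.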
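The slide does not say how the orders are weighted. Assuming simple linear interpolation, the estimate has this form, with weights \( \lambda_1 + \lambda_2 + \lambda_3 + \lambda_4 = 1 \):

\[ P_{\text{interpolated}}(w_i \mid w_{i-3} w_{i-2} w_{i-1}) = \lambda_4 P(w_i \mid w_{i-3} w_{i-2} w_{i-1}) + \lambda_3 P(w_i \mid w_{i-2} w_{i-1}) + \lambda_2 P(w_i \mid w_{i-1}) + \lambda_1 P(w_i) \]

The Python sketch below implements that form over maximum-likelihood estimates. The weights in `LAMBDAS` are hypothetical; in practice they are typically tuned on held-out text. The function names are also hypothetical.

```python
def mle(word, context, tokens):
    """Maximum-likelihood estimate of P(word | context) from a token list."""
    context = tuple(context)
    k = len(context)
    # Collect the word that follows each occurrence of the context.
    followers = [tokens[i + k] for i in range(len(tokens) - k)
                 if tuple(tokens[i:i + k]) == context]
    return followers.count(word) / len(followers) if followers else 0.0

# Hypothetical weights for the 1-, 2-, 3- and 4-gram estimates (they sum to 1).
LAMBDAS = (0.125, 0.125, 0.25, 0.5)

def interpolated(word, history, tokens, lambdas=LAMBDAS):
    """Linearly interpolate the unigram through 4-gram estimates of P(word | history)."""
    # Markov assumption: only the last len(lambdas) - 1 words matter.
    history = list(history)[-(len(lambdas) - 1):]
    total, weight = 0.0, 0.0
    for order, lam in enumerate(lambdas, start=1):
        k = order - 1  # context length for this n-gram order
        if k > len(history):
            break  # input too short for this and higher orders
        total += lam * mle(word, history[len(history) - k:], tokens)
        weight += lam
    # Rescale so the weights actually used still sum to 1.
    return total / weight

def predict_next(history, tokens, n=3):
    """Rank every vocabulary word by interpolated probability and return the top n."""
    vocab = sorted(set(tokens))
    return sorted(vocab, key=lambda w: interpolated(w, history, tokens),
                  reverse=True)[:n]

tokens = "we are here they are late you are kind roses are red".split()
print(predict_next(["roses", "are"], tokens, n=1))  # ['red']
```

When the input has fewer than three words, the sketch skips the unusable higher orders and rescales the remaining weights. That is a choice made for this sketch, and the app may handle short input differently. Scoring every vocabulary word is fine for a toy corpus, but over half a million entries a practical implementation would likely score only words actually observed after the context.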
Instructions: Allow approximately 45 seconds for the app to load, then enter text in the input box. The main panel displays the top prediction in red, along with any alternative predictions found. A tickbox lets you filter out profanity.
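The display logic can be sketched as a post-processing step on a ranked candidate list. When the profanity tickbox is on, blocklisted words are dropped. The first remaining word becomes the main prediction and the next few are shown as alternates. This Python sketch assumes that design. The blocklist contents, the function name `display_predictions` and the number of alternates are hypothetical.

```python
# Hypothetical placeholder blocklist; a real app would load a curated word list.
BLOCKLIST = {"darn", "heck"}

def display_predictions(ranked, filter_profanity=True, n_alternates=2):
    """Split a ranked candidate list into (main prediction, alternates)."""
    if filter_profanity:
        ranked = [w for w in ranked if w.lower() not in BLOCKLIST]
    if not ranked:
        return None, []  # nothing left to show
    return ranked[0], ranked[1:1 + n_alternates]

print(display_predictions(["heck", "red", "blue", "pink"]))  # ('red', ['blue', 'pink'])
```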
The algorithm is expected to predict the next word correctly about 22% of the time.
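Assuming the 22% figure is a top-1 accuracy, it is the share of held-out examples where the top prediction equals the word that actually came next. The slide does not describe its test set, so this Python sketch only shows how such a number could be computed. The predictor and the test pairs are invented for illustration.

```python
def top1_accuracy(predict, pairs):
    """Fraction of (history, actual_next_word) pairs where predict(history) is correct."""
    if not pairs:
        raise ValueError("need at least one test pair")
    hits = sum(1 for history, actual in pairs if predict(history) == actual)
    return hits / len(pairs)

# Toy predictor and held-out pairs, invented for illustration only.
def always_red(history):
    return "red"

pairs = [(["roses", "are"], "red"), (["violets", "are"], "blue"),
         (["the", "sky", "is"], "blue"), (["stop", "at", "the"], "red")]
print(top1_accuracy(always_red, pairs))  # 0.5
```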