When you enter text on your smartphone, it offers you several alternatives
for the next word you might want to type.
Your keyboard app compares the words you have written
with those in a text database, and searches for the next most likely word.
This is called an “n-gram
model,” where n refers to the number of words
used to predict the next one.
We have built an n-gram model as an n-th order Markov chain,
a process that undergoes transitions from one state to another
with a given probability.
The Shiny App
Shiny app
hosted on the Shiny Apps server.
Clean and simple interface:
Type some words
Hit Predict button
Get a prediction for the next word
The Model
Our model is an \( n \)-th order Markov chain, as suggested by
Ching et al. (2004).
The state of the system at time \( t \) depends on the \( n \) previous states,
\( X_{t} = \sum_{k=1}^{n} \lambda_{k} X_{t-k} Q_{k} \).
Here, \( Q_{k} \) is a \( k \)-th order transition matrix,
and \( \lambda_{k} \) are (positive) coefficients subject to
\( \lambda_{1} + \cdots + \lambda_{n} = 1 \).
The \( \lambda \)-coefficients are computed by minimizing
\( \left\lVert \sum_{k=1}^{n} \lambda_{k} \hat{X} Q_{k} - \hat{X} \right\rVert \),
where \( \hat{X} \) is the stationary distribution.
Accuracy
Testing against real texts yields 12% accuracy.
Model size (25,000 words, 2nd order Markov chain)
was limited by available computing power.
A larger model would likely perform better,
although size may be limited by speed concerns.
Instead of relying on a “Predict” button,
predict-as-you-type, SwiftKey-style,
should be faster and more accurate.
A future model must be able to learn from the user.