Next Word Prediction

Joe Willage    (@lustyandlewd)
Jan 2016

Overview and Pitch

  • The Next Word Prediction app is a program that uses natural language processing (NLP) to predict a user's next word.
  • The app performs with a relatively high accuracy rate of about 20% for in-sample tests (inputs from which it 'learned') and 10% for out-of-sample tests.
  • Users get a clean and simple, yet high-performing application that offers multiple prediction options in a straightforward design.

NLP Algorithms - Stupid Backoff

  • In ordinary stupid backoff, the input is fed into the highest-order model (5-gram). If a prediction is found, it is returned; if not, the next-highest-order model is searched. This process recurses until there is a match. If there is no match by the 2-gram model, the most frequent 1-gram is returned (typically “the”).
  • This app returns the top \( k \) results, as requested by the user (up to 5). This may span multiple models if the highest-order model doesn't contain \( k \) matches. For example, take the phrase “the san antonio” with \( k = 5 \). This phrase is found twice in the 4-gram model, so to supply the remaining 3 predictions the app returns results from the 3-gram model, as shown in the table and sketch below.

```
       gram    Freq n
      river 1.6e-07 4
 silverfish 1.6e-07 4
      spurs 9.1e-07 3
         tx 7.6e-07 3
        and 6.1e-07 3
```
Note that the lower-order results actually have a higher frequency, but backoff chooses the higher-order matches first, regardless of frequency.
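
As a concrete illustration, here is a minimal R sketch of the top-\( k \) backoff lookup. It assumes each model is a data frame with columns `prefix` (the preceding words), `gram` (the predicted word), `Freq`, and `n` (the model order), passed in a list from highest to lowest order; the function and column names are illustrative, not the app's actual code.

```
# Minimal top-k stupid backoff sketch (assumed data layout, not the app's code)
predict_backoff <- function(phrase, k = 5, models) {
  words <- unlist(strsplit(tolower(phrase), "\\s+"))
  results <- data.frame()
  for (m in models) {                        # models ordered from highest to lowest n
    prefix <- paste(tail(words, m$n[1] - 1), collapse = " ")
    hits <- m[m$prefix == prefix, ]
    hits <- hits[order(-hits$Freq), ]        # most frequent predictions first
    results <- rbind(results, hits)
    if (nrow(results) >= k) break            # stop once k predictions are collected
  }
  if (nrow(results) == 0)                    # no match anywhere: fall back to the most
    results <- data.frame(prefix = NA,       # frequent unigram (hard-coded here as "the")
                          gram = "the", Freq = NA, n = 1)
  head(results, k)
}
```

Under these assumptions, `predict_backoff("the san antonio", 5, models)` would collect the two 4-gram hits first and then fill the remaining slots from the 3-gram model, as in the table above.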

  • Part of speech (POS) tagging - The training data has had its POS analyzed, and the most frequent next word for each POS is aggregated into a model (shown below). If no match for the input is found, rather than returning “the”, the app analyzes the POS of the input’s last word and looks it up against the POS model; a sketch of this fallback follows the table.
```
 pos pred
  JJ   to
 JJS   of
  MD   be
  NN   of
 NNP they
  VB    a
```
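
A minimal R sketch of this fallback, assuming `pos_model` mirrors the table above (columns `pos`, `pred`) and that `tag_pos()` is a hypothetical helper returning the Penn Treebank tag of a single word (for example, built on the openNLP annotators); this is illustrative, not the app's actual code.

```
# POS fallback sketch; tag_pos() is a hypothetical tagger, not the app's code
pos_fallback <- function(phrase, pos_model) {
  last_word <- tail(unlist(strsplit(phrase, "\\s+")), 1)
  tag <- tag_pos(last_word)                      # e.g. "NN" for a common noun
  pred <- pos_model$pred[pos_model$pos == tag]
  if (length(pred) > 0) pred[1] else "the"       # unseen tag: fall back to "the"
}
```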
    
NLP Algorithms - Linear Interpolation

  • All available \( n \)-gram models are utilized to pick the top \( k \) most likely words by score.
  • Scores are computed as \[ Score(w_i) = \lambda_1 \, Freq(w_i \mid w_{i-2}, w_{i-1}) + \\ \lambda_2 \, Freq(w_i \mid w_{i-1}) + \\ \lambda_3 \, Freq(w_i) \] The \( \lambda \) weights are parameterized and can be changed on the fly; the app uses \( 0.4 \) for the highest-order model in which a match is found, \( 0.4^2 \) for the next-highest order, and so on. A brief sketch of this scoring follows the list.
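
A minimal R sketch of the score for a single candidate word, assuming `freq3()`, `freq2()`, and `freq1()` are hypothetical lookups returning \( Freq(w_i \mid w_{i-2}, w_{i-1}) \), \( Freq(w_i \mid w_{i-1}) \), and \( Freq(w_i) \) from the n-gram tables; candidates would then be ranked by this score to pick the top \( k \).

```
# Interpolated score sketch; freq3/freq2/freq1 are hypothetical lookups
interp_score <- function(w, prev1, prev2, lambda = 0.4) {
  lambda   * freq3(w, prev2, prev1) +    # highest-order term, weight 0.4
  lambda^2 * freq2(w, prev1) +           # next order, weight 0.4^2
  lambda^3 * freq1(w)                    # unigram term, weight 0.4^3
}
```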

Using the App

  • When the app is started, it reads in the models saved as RDS files. This should take no more than 15 seconds, so please be patient. When the “Loading” message disappears and input can be entered, the app is ready (a minimal sketch of this startup pattern follows the list).
  • After entering a phrase in the textbox, the user can choose the maximum number of results (\( k \)) and the method of prediction.
  • Results are displayed with the single most likely prediction at the top of the list, in blue. The next most likely results, up to \( k - 1 \), are displayed below that in decreasing order of likelihood.
  • Below the results list, a plot shows a 1-dimensional chart of the scores for each result.
  • For backoff predictions, results in the plot are sized so that higher-order \( n \)-grams are larger and shown in different colors. Because the app supports multiple backoff predictions, there may be lower-order results with a higher score (slide 3).
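
A minimal Shiny-style sketch of the load-once startup and the controls described above; the file name `ngram_models.rds` and the `predict_next()` function are assumptions for illustration, not the app's actual code.

```
# Minimal Shiny sketch: models load once at startup, then drive the predictions table
library(shiny)

models <- readRDS("ngram_models.rds")   # read once at startup (the "Loading" phase)

ui <- fluidPage(
  textInput("phrase", "Enter a phrase"),
  sliderInput("k", "Max results", min = 1, max = 5, value = 3),
  radioButtons("method", "Prediction method", c("Stupid Backoff", "Linear Interpolation")),
  tableOutput("predictions")
)

server <- function(input, output) {
  output$predictions <- renderTable({
    req(input$phrase)                   # wait for user input
    predict_next(input$phrase, k = input$k, method = input$method, models = models)
  })
}

shinyApp(ui, server)
```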


https://jwillage.shinyapps.io/nlprediction/