Word Prediction Shiny Application

Giovanni Valentini
June 18th, 2016

The Algorithm - part 1

To balance accuracy against runtime, the prediction model combines:

Markov chain models
a back-off process

A phrase is a sequence of n words: \( W_{1}, W_{2}, ..., W_{n-2}, W_{n-1}, W_{n} \)
\( W_{n} \) is the missing word to predict.
First, the model searches the trigram dataset for trigrams that begin with \( W_{n-2}, W_{n-1} \).
The probability of each candidate next word is estimated as follows:

\[ P(W_{n} | W_{n-2}, W_{n-1}) = \frac{count(W_{n-2}, W_{n-1}, W_{n})}{count(W_{n-2}, W_{n-1})} \]
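As an illustration of the estimate above, here is a minimal Python sketch (not the app's own code) that computes the ratio of trigram count to bigram count. The toy corpus and the function names `ngrams` and `trigram_probability` are assumptions made for this example.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def trigram_probability(tokens, w1, w2, w3):
    """Estimate P(w3 | w1, w2) as count(w1, w2, w3) / count(w1, w2)."""
    trigram_counts = Counter(ngrams(tokens, 3))
    bigram_counts = Counter(ngrams(tokens, 2))
    context_count = bigram_counts[(w1, w2)]
    if context_count == 0:
        return 0.0  # the context (w1, w2) never occurs in the data
    return trigram_counts[(w1, w2, w3)] / context_count

# Toy corpus, hypothetical and for illustration only.
corpus = "i like green tea i like green apples i like black tea".split()

# "like green" occurs twice and is followed by "tea" once.
p = trigram_probability(corpus, "like", "green", "tea")
print(p)
```

In this toy corpus the context "like green" appears twice and is followed by "tea" once, so the estimate is 1/2.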

The Algorithm - part 2

If no trigram is found, that is, if for every \( W_{n} \) in the dataset:

\[ count(W_{n-2},W_{n-1},W_{n}) = 0 \]

then the search is extended to the dataset of bigrams (back-off to a first-order Markov model).

In this case the probability of the next word is estimated as follows:

\[ P(W_{n}|W_{n-1}) = \frac{count(W_{n-1},W_{n})}{count(W_{n-1})} \]
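The two-stage lookup can be sketched in Python as follows. This is a hedged illustration of the back-off idea, not the application's implementation. The names `build_counts` and `next_word_distribution` and the toy corpus are assumptions, and no discounting or smoothing is applied.

```python
from collections import Counter

def build_counts(tokens):
    """Count unigrams, bigrams, and trigrams in a list of tokens."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri

def next_word_distribution(uni, bi, tri, w1, w2):
    """Return {word: probability}, using trigrams first and bigrams as back-off."""
    # Stage 1: trigrams that start with (w1, w2).
    tri_hits = {t[2]: c for t, c in tri.items() if t[:2] == (w1, w2)}
    if tri_hits:
        return {w: c / bi[(w1, w2)] for w, c in tri_hits.items()}
    # Stage 2: no trigram found, so back off to bigrams that start with w2.
    bi_hits = {b[1]: c for b, c in bi.items() if b[0] == w2}
    return {w: c / uni[w2] for w, c in bi_hits.items()}

# Toy corpus, hypothetical and for illustration only.
corpus = "i like green tea i like green apples i like black tea".split()
uni, bi, tri = build_counts(corpus)

seen = next_word_distribution(uni, bi, tri, "like", "green")     # trigram stage
unseen = next_word_distribution(uni, bi, tri, "drink", "black")  # bigram stage
print(seen, unseen)
```

The context "drink black" never appears in the toy corpus, so the second call falls through to the bigrams that start with "black". If the last word is unknown as well, the sketch returns an empty dictionary.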

Shiny Application Layout Overview

[Figure: Shiny application layout]

Shiny Application Instructions

To get a prediction of the next word:
1. Type a phrase of two or more words in the input box.
2. Press the Predict button.
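Since the model conditions on at most the two preceding words, only the end of the typed phrase matters. The Python sketch below shows one way the context could be extracted. Lowercasing and whitespace splitting are assumptions of this example, as is the function name `last_two_words`; the app's actual preprocessing is not described here.

```python
def last_two_words(phrase):
    """Lowercase the phrase, split it on whitespace, and return its last two words."""
    words = phrase.lower().split()
    if len(words) < 2:
        raise ValueError("enter a phrase of 2 or more words")
    return words[-2], words[-1]

context = last_two_words("I would like  Green")
print(context)
```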

The main panel then shows:

the next word with the highest probability
a barplot of the probable next words
the table of the probable next words
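The top word and the table can both be derived from a mapping of candidate words to probabilities. The Python sketch below is illustrative only. The function name `summarize`, the `top_n` cutoff, and the sample probabilities are assumptions, not values taken from the app.

```python
def summarize(distribution, top_n=5):
    """Rank candidates by probability (ties broken alphabetically).

    Returns the most probable word and the top_n (word, probability) rows.
    """
    ranked = sorted(distribution.items(), key=lambda kv: (-kv[1], kv[0]))
    table = ranked[:top_n]
    best = table[0][0] if table else None
    return best, table

# Hypothetical probabilities, for illustration only.
best, table = summarize({"tea": 0.5, "apples": 0.3, "juice": 0.2}, top_n=2)
print(best)
for word, prob in table:
    print(f"{word}\t{prob:.2f}")
```

The same ranked rows could feed both a bar plot and a table.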

I hope you will enjoy using this app! WordPred-App