Giovanni Valentini
June 18th, 2016
To balance accuracy and runtime, the prediction model combines a trigram model (2nd-order Markov model) with a back-off to a bigram model (1st-order Markov model).
A phrase is a sequence of \( n \) words: \( W_{1}, W_{2}, \ldots, W_{n-2}, W_{n-1}, W_{n} \)
The word \( W_{n} \) is the missing word to predict.
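As a minimal sketch of this setup, the typed phrase supplies \( W_{1}, \ldots, W_{n-1} \), and only its last two words form the history used by the model. The function names `split_phrase` and `history`, and the cleaning rule (lowercase, keep letters and apostrophes), are hypothetical assumptions for illustration, not the app's documented preprocessing.

```python
import re

def split_phrase(phrase):
    """Split a typed phrase into lowercase words W_1 .. W_{n-1}.

    Assumption: lowercasing and keeping only letters/apostrophes is an
    illustrative cleaning step, not necessarily what the app does.
    """
    return re.findall(r"[a-z']+", phrase.lower())

def history(words):
    """Return (W_{n-2}, W_{n-1}), the two words preceding the word to predict."""
    if len(words) < 2:
        raise ValueError("the phrase must contain at least 2 words")
    return words[-2], words[-1]

words = split_phrase("Thanks for the")
print(words)           # ['thanks', 'for', 'the']
print(history(words))  # ('for', 'the')
```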
First, the model looks up candidates for \( W_{n} \) in the trigram dataset, i.e. among the trigrams whose first two words are \( W_{n-2}, W_{n-1} \).
The probability of each candidate next word is estimated as follows:
\[ P(W_{n} | W_{n-2}, W_{n-1}) = \frac{count(W_{n-2}, W_{n-1}, W_{n})}{count(W_{n-2}, W_{n-1})} \]
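The estimate above is a maximum-likelihood ratio of counts. The sketch below computes it on a tiny made-up corpus; the corpus and the names `ngrams` and `p_trigram` are hypothetical and only illustrate the formula, not the app's actual data or code.

```python
from collections import Counter

def ngrams(words, n):
    """All consecutive n-word tuples in a list of tokens."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Toy corpus, purely illustrative (not the app's training data).
corpus = "i want to eat i want to sleep i want a nap".split()
trigram_counts = Counter(ngrams(corpus, 3))
bigram_counts = Counter(ngrams(corpus, 2))

def p_trigram(w2, w1, w):
    """P(w | w2, w1) = count(w2, w1, w) / count(w2, w1); 0.0 if the history is unseen."""
    denom = bigram_counts[(w2, w1)]
    return trigram_counts[(w2, w1, w)] / denom if denom else 0.0

print(p_trigram("want", "to", "eat"))  # 0.5
```

Here "want to" occurs twice and "want to eat" once, so the estimate is \( 1/2 \).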
If no matching trigram is found, that is, if for every \( W_{n} \) in the dataset:
\[ count(W_{n-2},W_{n-1},W_{n}) = 0 \]
then the model backs off to the bigram dataset (a 1st-order Markov model).
In this case, the probability of the next word is estimated as follows:
\[ P(W_{n}|W_{n-1}) = \frac{count(W_{n-1},W_{n})}{count(W_{n-1})} \]
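Putting both steps together, a minimal sketch of the prediction with back-off might look like this. It assumes the simple (undiscounted) back-off described above, not Katz back-off, which would also redistribute probability mass; the toy corpus and the function name `predict` are hypothetical.

```python
from collections import Counter

def ngrams(words, n):
    """All consecutive n-word tuples in a list of tokens."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Toy corpus, purely illustrative (not the app's training data).
corpus = "i want to sleep i want to eat i want to sleep now go to bed".split()
uni = Counter(corpus)            # count(W)
bi = Counter(ngrams(corpus, 2))  # count(W_{n-1}, W_n)
tri = Counter(ngrams(corpus, 3)) # count(W_{n-2}, W_{n-1}, W_n)

def predict(w2, w1):
    """Return (word, probability) for the most likely next word, or None.

    1. Trigram step: candidates are trigrams starting with (w2, w1),
       scored by count(w2, w1, w) / count(w2, w1).
    2. Back-off: if no such trigram exists, use bigrams starting with w1,
       scored by count(w1, w) / count(w1).
    """
    cands = {t[2]: c for t, c in tri.items() if t[:2] == (w2, w1)}
    if cands:
        denom = bi[(w2, w1)]
    else:
        cands = {b[1]: c for b, c in bi.items() if b[0] == w1}
        if not cands:
            return None
        denom = uni[w1]
    best = max(cands, key=cands.get)
    return best, cands[best] / denom

print(predict("want", "to"))  # ('sleep', 0.6666666666666666)
print(predict("you", "to"))   # ('sleep', 0.5), via back-off to bigrams
print(predict("a", "xyz"))    # None
```

In the second call, no trigram starts with "you to", so the model falls back to the bigrams starting with "to".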
To get a prediction of the next word:
1. Type a phrase of 2 or more words in the Input Box.
2. Press the Predict button.
The main panel will then show:
I hope you will enjoy using this app! WordPred-App