Guilherme Folego
2020-04-13
The app is available at https://gfolego.shinyapps.io/CapstoneApp/
It is very straightforward to use. Simply type some text in the input text box, and click to submit and predict.
The predicted word is automatically appended to the end of the text in the input box, so you can click to submit and predict as many times as you'd like.
The basis of the prediction algorithm is a number of n-gram models, with n ranging from 1 to 5.
First, we take the last 4 words from the input sentence and try to find a match in the 5-gram model. In case it fails, we take the last 3 words and try to find a match in the 4-gram model. In case it fails, we take the last 2 words, and so on…
This procedure is repeated, with shorter and shorter contexts, until a match is found in one of the higher-order n-gram models. In case no match is found in any of the models, we predict the most frequent word from the 1-gram model.
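The back-off procedure above can be sketched in a few lines of Python. This is only an illustration, not the app's actual code: the function name `predict_next`, the space-joined string keys, the empty-string key for the 1-gram fallback, and the toy model are all assumptions made for this example.

```python
def predict_next(text, model, max_context=4):
    """Back off from the longest context to the shortest until a key matches."""
    words = text.lower().split()
    # Try the last 4 words (5-gram model), then the last 3, 2 and 1.
    for n in range(min(max_context, len(words)), 0, -1):
        key = " ".join(words[-n:])
        if key in model:
            return model[key]
    # No context matched: fall back to the most frequent single word,
    # stored here under the empty key (a convention of this sketch).
    return model[""]


# Toy model, invented for illustration only.
toy_model = {
    "": "the",
    "merry": "christmas",
    "you a merry": "christmas",
    "a happy new": "year",
}

print(predict_next("I wish you a merry", toy_model))
print(predict_next("", toy_model))
```

With this toy model, the first call misses on the 4-word context "wish you a merry" and matches on the 3-word context "you a merry". The second call has no context at all, so it falls straight through to the 1-gram fallback.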
For efficiency, the complete model is implemented as a single hash table (a key-value lookup) with all n-grams included.
The final model contains a total of 317,029 unique keys that could be matched in order to predict the next word.
It was created by sampling 5% of all lines from each of the English text files, and then keeping only the top 5% most frequent terms for each n-gram order. In the end, if a key predicts more than one word, only the most frequent one is kept.
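The pruning steps can be sketched as follows. This is a rough approximation under stated assumptions, not the original pipeline: the name `build_model`, the `keep_fraction` parameter, whitespace tokenization, and the rule of keeping at least one n-gram per order are all hypothetical. The 5% line sampling is left out, so `lines` is assumed to be already sampled.

```python
from collections import Counter, defaultdict


def build_model(lines, max_n=5, keep_fraction=0.05):
    """Build a {context: next word} lookup from already-sampled lines."""
    next_counts = defaultdict(Counter)  # context key -> counts of next words
    for n in range(1, max_n + 1):
        ngrams = Counter()
        for line in lines:
            words = line.lower().split()
            for i in range(len(words) - n + 1):
                ngrams[tuple(words[i:i + n])] += 1
        # Keep only the most frequent n-grams of this order (at least one).
        keep = max(1, int(len(ngrams) * keep_fraction))
        for gram, count in ngrams.most_common(keep):
            key = " ".join(gram[:-1])  # empty string for 1-grams
            next_counts[key][gram[-1]] += count
    # One prediction per key: the most frequent next word.
    return {key: nexts.most_common(1)[0][0] for key, nexts in next_counts.items()}
```

Storing every n-gram order in one dictionary, with the context as the key, means a prediction costs at most a handful of lookups, one per context length.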
Using 20% of the sampled lines for testing, we estimated an accuracy of around 15% for predicting the next word exactly. In practice, this is a reasonable number.
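One way to measure exact next-word accuracy is sketched below. It assumes that every word position in a held-out line counts as one test case, which may differ from how the 15% figure was actually computed. The name `next_word_accuracy` and the `predict` callable are hypothetical.

```python
def next_word_accuracy(test_lines, predict):
    """Fraction of positions where predict(context) equals the true next word."""
    hits = total = 0
    for line in test_lines:
        words = line.lower().split()
        # Use each proper prefix of the line as context for the word after it.
        for i in range(1, len(words)):
            total += 1
            if predict(" ".join(words[:i])) == words[i]:
                hits += 1
    return hits / total if total else 0.0
```

Here `predict` is any function that maps a context string to a single predicted word.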
In principle, you could provide any text as input, including symbols and special characters. The app cleans up the input internally before predicting, while the text shown in the input box is kept as is.
Try something interesting, such as “I wish you a merry”, and predict the next word a few times.
You can even try predicting with an empty input, and also predict the next word a few times.
Have fun!
:-D