Maurizio M. Murino
13/05/2016
The main goal is to build a Shiny application that predicts the next word, given a preceding one.
This exercise contemplated:
Such a task is too computationally demanding for a PC such as the one I use. The first tests were performed on a 1% sample of the data. Later, the sample was increased to 5%.
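The sampling step can be illustrated with a minimal sketch. It is written in Python purely for illustration (the app itself is built in R), and the function name `sample_lines`, the `seed` parameter and the per-line random draw are assumptions, not the method actually used for this project.

```python
import random

def sample_lines(lines, fraction, seed=42):
    """Keep each line independently with probability `fraction`.

    Hypothetical helper: the name, the seed and the sampling scheme
    are illustrative assumptions, not taken from the project.
    """
    rng = random.Random(seed)  # fixed seed makes the sample reproducible
    return [line for line in lines if rng.random() < fraction]

# Example: draw roughly 5% of a toy corpus of 10,000 lines.
corpus = [f"line {i}" for i in range(10_000)]
sample = sample_lines(corpus, fraction=0.05)
```

Drawing each line independently avoids loading a full index of the corpus into memory, at the cost of a sample size that is only approximately 5%.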
The app works on an n-gram association rule: it checks the n-grams associated with the chosen term in descending order, from the largest to the smallest.
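The largest-to-smallest lookup can be sketched as follows. This is an illustrative Python version under stated assumptions, not the R code of the app: the function names `build_ngram_tables` and `predict_next`, the maximum order `max_n=3`, and the use of relative frequencies as probabilities are all hypothetical choices.

```python
from collections import Counter, defaultdict

def build_ngram_tables(tokens, max_n=3):
    """Map each context (tuple of n-1 words) to a Counter of next words."""
    tables = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            tables[context][tokens[i + n - 1]] += 1
    return tables

def predict_next(tables, phrase, max_n=3, top_k=3):
    """Try the longest available context first, then back off to shorter ones.

    Returns a list of (word, relative frequency) pairs, or an empty list
    when no context matches (for example, a word too rare for the sample).
    """
    words = phrase.lower().split()
    for k in range(min(max_n - 1, len(words)), 0, -1):
        counts = tables.get(tuple(words[-k:]))
        if counts:
            total = sum(counts.values())
            return [(w, c / total) for w, c in counts.most_common(top_k)]
    return []

# Toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ran".split()
tables = build_ngram_tables(tokens)
print(predict_next(tables, "the"))      # candidates following "the"
print(predict_next(tables, "the cat"))  # the two-word context is tried first
```

Checking the longest context first favours the most specific evidence available, while the shorter contexts act as a fallback when the longer one was never observed in the sample.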
Simply enter a word of your choice and push the button. Because of the small sample, some words may be too rare to produce a match. If this occurs, add a second word to the rare one. The app also produces a probability table with the most likely following words.
The app, the data, the presentation and the development tests are hosted on GitHub: https://github.com/Maurizio-Mario/CP_Natural_Language.git
R predictions: https://www.youtube.com/watch?v=0le0ijNVP5M