MJM Beuken
June 21, 2018
In this project I was asked to create a Shiny application that predicts the next word in a sentence.
This presentation explains how the app works.
The application includes the following:
The app uses data from Twitter, news articles, and blogs.
How does it work? The app uses an n-gram model to suggest the next word to the user. It matches the last n-1 words of a given sentence against the n-grams built from the corpus in the database. The predicted word is the n-th word of the matching n-gram with the highest proportion (i.e. the highest probability). Example: for the input sentence “Hello, what are you”, the last 3 (n-1) words are “what are you”, and the predicted word is “doing”.
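The matching step described above can be sketched in a few lines. This is a minimal illustration in Python, not the app's own R code: the toy corpus and the function names `build_ngram_table` and `predict_next` are hypothetical, and tokenization is reduced to lowercasing and splitting on whitespace.

```python
from collections import Counter, defaultdict

def build_ngram_table(sentences, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            table[context][words[i + n - 1]] += 1
    return table

def predict_next(table, sentence, n):
    """Return the most frequent follower of the last n-1 words, or None (assumes n >= 2)."""
    context = tuple(sentence.lower().split()[-(n - 1):])
    followers = table.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical toy corpus standing in for the Twitter, news, and blog data.
toy_corpus = [
    "hello what are you doing",
    "what are you doing today",
    "what are you saying",
]
table = build_ngram_table(toy_corpus, n=4)
prediction = predict_next(table, "Hello, what are you", n=4)
print(prediction)
```

With this toy corpus, the context “what are you” is followed twice by “doing” and once by “saying”, so “doing” has the highest proportion and is returned.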
What if the sentence contains a word that is not in the database? Depending on the length of the sentence, the algorithm starts with the last three words. When it cannot predict the next word (because the last three words contain a word that is not in the database), it falls back to the last two words, and so on. In the unlikely case that even the last word of the sentence cannot be found, the algorithm randomly picks one of the ten most common words (i.e. those with the highest probability in the database).
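The fallback behaviour can be sketched as a simple loop over shrinking contexts. Again this is a Python illustration under stated assumptions rather than the app's R implementation: the names `build_tables` and `predict_with_backoff`, the `max_n=4` default, and the toy corpus are all hypothetical.

```python
import random
from collections import Counter, defaultdict

def build_tables(sentences, max_n=4):
    """Return ({n: {context: Counter(next_word)}} for n = 2..max_n, unigram counts)."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    unigrams = Counter()
    for sentence in sentences:
        words = sentence.lower().split()
        unigrams.update(words)
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                tables[n][tuple(words[i:i + n - 1])][words[i + n - 1]] += 1
    return tables, unigrams

def predict_with_backoff(tables, unigrams, sentence, max_n=4, rng=random):
    words = sentence.lower().split()
    # Try the longest available context first, then shorten it by one word.
    for n in range(min(max_n, len(words) + 1), 1, -1):
        followers = tables[n].get(tuple(words[-(n - 1):]))
        if followers:
            return followers.most_common(1)[0][0]
    # Last resort: a random pick among the ten most frequent words.
    top_ten = [word for word, _ in unigrams.most_common(10)]
    return rng.choice(top_ten)

# Hypothetical toy corpus.
tables, unigrams = build_tables([
    "what are you doing",
    "how are you doing today",
    "are you sure",
])
print(predict_with_backoff(tables, unigrams, "hello, what are you"))  # three-word context matches
print(predict_with_backoff(tables, unigrams, "zebra are you"))        # backs off to "are you"
print(predict_with_backoff(tables, unigrams, "are zebra"))            # random common word
```

Starting from the longest context means a specific match always wins over a more general one; the random pick is only reached when no context of any length is known.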
GitHub link where you can find ui.R, server.R, the data, and the algorithm files:
https://github.com/MJMBeuken/Capstone-project/tree/master/Predictnextword
Shiny application link: