Capstone project: Data Science Specialization

MJM Beuken
June 21, 2018

Overview of project:

In this project I was asked to create a Shiny application that predicts the next word in a sentence.

This presentation explains how the app works.

The application includes the following:

  • A text box.
  • When a sentence is entered, the reactive output shows the predicted next word: the word that most often follows the last three, two or one word(s) of the input.
  • There's also documentation so that a novice user can use the application.
  • The documentation is hosted on the Shiny website itself.

The app uses text data from Twitter, news and blogs.

How the application is built

How does it work? The app uses an n-gram model to predict the next word suggested to the user. It matches the last n-1 words of the input sentence against the n-grams built from the corpus in the database. The predicted word is the n-th word of the matching n-gram with the highest proportion (i.e. the highest estimated probability).

Example: the input sentence is “Hello, what are you”. The last 3 (n-1) words are “what are you”, so n = 4. The predicted word is “doing”.
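The matching step can be illustrated with a minimal Python sketch. It is not the R code from the repository: the toy corpus, the tokenizer and the function names (build_ngram_table, predict_next) are assumptions made only for illustration, and the sketch expects n to be at least 2.

```python
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def build_ngram_table(sentences, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = tokenize(sentence)
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            table[context][words[i + n - 1]] += 1
    return table

def predict_next(table, sentence, n):
    """Return the most frequent follower of the last n-1 words, or None."""
    context = tuple(tokenize(sentence)[-(n - 1):])
    followers = table.get(context)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Toy corpus, invented for illustration; the real app uses Twitter, news and blog text.
corpus = [
    "hello what are you doing",
    "what are you doing today",
    "what are you saying",
]
table = build_ngram_table(corpus, n=4)
prediction = predict_next(table, "Hello, what are you", n=4)
print(prediction)
```

In this toy corpus “what are you” is followed twice by “doing” and once by “saying”, so the sketch picks “doing”, the follower with the highest proportion.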

Extra information:

What if the sentence contains a word that is not in the database? Depending on the length of the sentence, the algorithm starts with the last three words. If it cannot predict the next word from them (because they include an out-of-database word), it falls back to the last two words, and then to the last word. In the unlikely case that even the last word of the sentence cannot be found, the algorithm randomly picks one of the ten most common words in the database (i.e. those with the highest probability).
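The fallback described above can be sketched in Python as follows. This is an illustrative sketch under stated assumptions, not the app's actual R implementation: the toy corpus and the names build_tables and predict_with_backoff are hypothetical.

```python
import random
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def build_tables(sentences, max_context=3):
    """tables[k] maps each k-word context to a Counter of following words."""
    tables = {k: defaultdict(Counter) for k in range(1, max_context + 1)}
    unigrams = Counter()
    for sentence in sentences:
        words = tokenize(sentence)
        unigrams.update(words)
        for k in tables:
            for i in range(len(words) - k):
                tables[k][tuple(words[i:i + k])][words[i + k]] += 1
    return tables, unigrams

def predict_with_backoff(tables, unigrams, sentence, rng=random):
    """Try the last 3 words, then 2, then 1; otherwise pick a common word."""
    words = tokenize(sentence)
    for k in range(min(len(words), max(tables)), 0, -1):
        followers = tables[k].get(tuple(words[-k:]))
        if followers:
            return followers.most_common(1)[0][0]
    # Last resort: a random choice among the ten most common words.
    top_ten = [word for word, _ in unigrams.most_common(10)]
    return rng.choice(top_ten)

# Toy corpus, invented for illustration only.
corpus = [
    "hello what are you doing",
    "what are you doing today",
    "how are you feeling",
    "thank you very much",
]
tables, unigrams = build_tables(corpus)

full_match = predict_with_backoff(tables, unigrams, "Hello, what are you")
two_word_match = predict_with_backoff(tables, unigrams, "zebra thank you")
fallback = predict_with_backoff(tables, unigrams, "you zebra")
```

Here “zebra” stands in for an out-of-database word. The first call matches on three words, the second backs off to the two-word context “thank you”, and the third finds no context at all and returns a random word from the ten most common ones.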

Links to the Shiny application and GitHub:

GitHub link where you can find the ui.R, the server.R, the data and the algorithm files:

https://github.com/MJMBeuken/Capstone-project/tree/master/Predictnextword

Shiny application link:

https://mjmbeuken.shinyapps.io/Wordprediction/