Coursera Data Science Capstone Project

Borja Perez
08/09/2019

Background - Objective

The main objective of this project is to create a Shiny application that predicts the next word of a phrase entered by the user as text input. First, an exploratory data analysis was performed on the data, and the data was cleaned.

Algorithm

To predict the next word, n-grams of several orders were built. Each n-gram table stores the word sequences observed in the previously analysed tweets, blogs and news documents.

  • Unigram (n = 1)
  • Bigram (n = 2)
  • Trigram (n = 3)
  • Quadgram (n = 4)
  • 5-gram (n = 5)
  • Sixgram (n = 6)
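
As an illustration of how such n-gram tables could be built, here is a minimal sketch in base R. The helper name count_ngrams, the tokenisation rule and the sample sentences are assumptions made for this example; they are not taken from the project code.

```r
# Hypothetical helper: count the n-grams of order n in a character vector.
count_ngrams <- function(texts, n) {
  grams <- unlist(lapply(texts, function(txt) {
    # Lowercase, then split on anything that is not a letter or apostrophe.
    words <- strsplit(tolower(txt), "[^a-z']+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    # Slide a window of width n over the words.
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  # Frequency table, most common n-gram first.
  sort(table(grams), decreasing = TRUE)
}

# Toy corpus standing in for the tweets, blogs and news documents.
sample_texts <- c("I love data science", "I love R", "data science is fun")
bigrams <- count_ngrams(sample_texts, 2)

# A table like this could then be stored with saveRDS(); the file name
# below is an assumed instance of the ngram_X.rds pattern.
# saveRDS(bigrams, "ngram_2.rds")
```

On a full corpus this loop would be slow, so a dedicated text-mining package would probably be a better choice; the sketch only shows the idea.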

How the Shiny app works:

The application consists of 2 driving files:

  • ui.R : User Interface
  • server.R : Back end of the system

There are also some files, named ngram_X.rds, in which the training data are saved. The Shiny application asks the user to enter text in an input box and then predicts the 5 most probable next words. To find them, it searches for the most common combinations of the words entered.
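
The lookup described above can be sketched roughly as follows. This is a simplified illustration, not the app's actual server code: the helper predict_next, the layout of the toy tables and the choice to fall back to shorter n-grams when the longest one has no match are all assumptions.

```r
# Toy n-gram counts standing in for the contents of the ngram_X.rds files
# (assumed layout: named counts whose names are space-separated words).
ngrams <- list(
  `2` = c("i love" = 5L, "i like" = 3L, "love data" = 2L, "love r" = 4L),
  `3` = c("i love r" = 3L, "i love data" = 1L)
)

# Hypothetical helper: return up to k candidate next words for the input.
predict_next <- function(input, ngrams, k = 5) {
  words <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  # Try the highest-order table first, then shorter ones (assumed fallback).
  orders <- sort(as.integer(names(ngrams)), decreasing = TRUE)
  for (n in orders) {
    if (length(words) < n - 1) next
    # The last n - 1 words typed form the lookup prefix.
    prefix <- paste(c(tail(words, n - 1), ""), collapse = " ")
    counts <- ngrams[[as.character(n)]]
    hits <- counts[startsWith(names(counts), prefix)]
    if (length(hits) > 0) {
      hits <- sort(hits, decreasing = TRUE)
      # Keep only the final word of each matching n-gram.
      return(head(sub(".* ", "", names(hits)), k))
    }
  }
  character(0)
}

predict_next("I love", ngrams)  # "r" "data"
```

With the default k = 5, the function returns at most five words, matching the number of suggestions the app displays.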

Screenshot of the application:

The app works as follows:

[Figure: screenshot of the Shiny application]