GAEL BERON
03/12/2017
As part of the Data Specialization Capstone project, this Shiny application explores a popular technology: predictive text models. The tool aims to predict text in the same way as device applications such as SwiftKey smart keyboards.
This Shiny app takes a phrase (one or more words) typed into a text box and interactively predicts the next word. It is simple, intuitive and easy to use.
The app builds its data sets from text drawn from HC Corpora's US-English Twitter feeds, blogs and news articles. To prepare the data for the app, a sufficiently large sample was pre-processed to remove non-standard words (such as profanities), contractions, numbers and punctuation. Once the sample was ready for analysis, the Text Mining Package was used to generate n-grams from the corpus (bigrams, trigrams and quadrigrams). The application uses the frequencies of those n-grams as its predictive model.
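The preparation described above can be sketched in a few lines. This is a minimal illustration written in Python, not the app's own code: the function names `clean` and `ngram_counts`, the `BLOCKED_WORDS` list and the sample sentence are all hypothetical, and it assumes that "removing contractions" means dropping the contracted word entirely.

```python
import re
from collections import Counter

# Hypothetical placeholder list; the app's actual profanity list is not given.
BLOCKED_WORDS = {"darn", "heck"}

def clean(text):
    """Lowercase, drop contractions, numbers and punctuation, then blocked words."""
    text = text.lower()
    # Assumption: a contraction such as "don't" is removed as a whole word.
    text = re.sub(r"\b\w+'\w+\b", " ", text)
    # Replace anything that is not a lowercase letter or whitespace.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [w for w in text.split() if w not in BLOCKED_WORDS]

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = clean("I don't know, 2 cats sat on the mat; the cats sat!")
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
quadrigrams = ngram_counts(tokens, 4)
```

In this sample, the bigram `("cats", "sat")` occurs twice, so it would outrank any bigram seen only once when frequencies are compared.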
Finally, because the most frequent words in the data sets are rarely useful as predictions, so-called “stop words” are ignored by the predictive model as often as possible (see the Stop_words page on Wikipedia).
This application simply aims to predict the most probable next word of an input sentence.
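One way to turn n-gram frequencies into a prediction is sketched below, again in Python for illustration only. It makes several assumptions that the description does not confirm: the model tries the longest matching context first and falls back to shorter ones, "ignoring stop words as often as possible" is read as preferring the most frequent non-stop word and returning a stop word only when nothing else is available, and `STOP_WORDS`, `build_model` and `predict` are hypothetical names.

```python
from collections import Counter, defaultdict

# Tiny illustrative list; a real stop-word list is much longer.
STOP_WORDS = {"the", "a", "of", "to", "and"}

def build_model(tokens, max_n=4):
    """Map each context of 1 to max_n - 1 words to a Counter of following words."""
    model = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *context, nxt = tokens[i:i + n]
            model[tuple(context)][nxt] += 1
    return model

def predict(model, phrase, max_n=4):
    """Return the most frequent follower of the longest known context, or None."""
    words = phrase.lower().split()
    # Assumption: try the longest context first, then shorter ones.
    for k in range(min(max_n - 1, len(words)), 0, -1):
        candidates = model.get(tuple(words[-k:]))
        if not candidates:
            continue
        ranked = [w for w, _ in candidates.most_common()]
        non_stop = [w for w in ranked if w not in STOP_WORDS]
        # Prefer a non-stop word; use a stop word only if nothing else exists.
        return (non_stop or ranked)[0]
    return None

tokens = "i like green tea and i like green apples and i like green tea".split()
model = build_model(tokens)
suggestion = predict(model, "I like green")
```

With this toy corpus, `suggestion` is `"tea"`, because "tea" follows "i like green" twice and "apples" only once. An input whose last words never appear in the corpus yields `None`.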
Steps to follow: