Shiny App For Text Prediction

GAEL BERON
03/12/2017

Overview

Built for the Data Science Specialization Capstone project, this Shiny application explores a popular technology: predictive text models. The tool predicts text in the same way as smart keyboard applications such as SwiftKey.

This Shiny app takes a phrase (one or more words) typed into a text box and interactively predicts the next word. It is simple, intuitive and easy to use.

Concept and algorithm

The app is built on text data sets drawn from HC Corpora's US-English Twitter feeds, blogs and news articles. To prepare the data for use in the app, a sufficiently large sample was pre-processed to remove non-standard words (such as profanities), contractions, numbers and punctuation. Once the sample was ready for analysis, the Text Mining Package was used to generate n-grams from the corpus (bigrams, trigrams and quadrigrams). The application uses the frequency of those n-grams as its predictive model.
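The cleaning and counting steps described above can be sketched in a few lines. This is an illustration in Python, not the app's own R code: the function names, the one-word blocked list and the sample sentence are all assumptions made for the example, and contractions are not handled specially here.

```python
import re
from collections import Counter

# Hypothetical placeholder; the app's actual profanity list is not given.
BLOCKED_WORDS = {"darn"}

def clean(text):
    """Lowercase the text, drop digits and punctuation, remove blocked words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # anything that is not a letter or space
    return [w for w in text.split() if w not in BLOCKED_WORDS]

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Made-up sample text, standing in for a slice of the corpus.
tokens = clean("I love New York. I love New Jersey, darn it! 42 times.")
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
quadrigrams = ngram_counts(tokens, 4)
```

With these counts, `bigrams[("i", "love")]` gives how often "love" follows "i" in the sample, which is the kind of frequency the predictive model relies on.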

Finally, because the most frequent words in the data sets carry little value as predictions, so-called “stop words” are ignored by the predictive model as often as possible (see the Stop_words page on Wikipedia).
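One possible reading of "ignored as often as possible" is that stop words are only offered once no other candidate is left. The Python sketch below ranks candidates that way; the stop-word subset, the function name and the counts are invented for illustration and are not taken from the app.

```python
# Illustrative subset only; the app's actual stop-word list is not given.
STOP_WORDS = {"the", "a", "an", "of", "to", "and", "in", "is"}

def rank_candidates(counts):
    """Order candidate next words by frequency, placing stop words last."""
    # Sort key: non-stop words first, then higher counts, then alphabetical.
    return sorted(counts, key=lambda w: (w in STOP_WORDS, -counts[w], w))

# Made-up frequencies for words seen after some context.
candidates = {"the": 50, "beach": 12, "a": 30, "movies": 12}
ranked = rank_candidates(candidates)
```

Here "beach" and "movies" are ranked ahead of "the" and "a" even though the stop words are far more frequent.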

Instructions

This application simply aims to predict the word most likely to follow an input sentence.

Steps to follow:

    The user enters a phrase, or even a single word
    The application uses its built-in n-gram dictionary to determine which word has the highest probability of occurring next
    The application displays the results automatically and instantly (no click needed!)
    Optionally, the user can change the number of words to predict (from 1 to 10) with the dedicated slider
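The lookup behind these steps might work roughly as follows. This Python sketch is an assumption rather than the app's R implementation: it tries the longest context first (up to three words, matching quadrigrams) and falls back to shorter ones, a strategy the description above does not confirm. The function names and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict

def build_model(sentences, max_n=4):
    """Map each context of 1 to max_n - 1 words to a Counter of next words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model

def predict(model, phrase, k=1, max_context=3):
    """Return up to k next-word candidates, most frequent first."""
    k = max(1, min(k, 10))  # the app offers 1 to 10 predictions
    tokens = phrase.lower().split()
    # Assumed fallback: longest known context first, then shorter ones.
    for size in range(min(max_context, len(tokens)), 0, -1):
        context = tuple(tokens[-size:])
        if context in model:
            return [word for word, _ in model[context].most_common(k)]
    return []

# Made-up corpus standing in for the built-in n-gram dictionary.
model = build_model([
    "see you at the beach",
    "see you at the office",
    "meet me at the beach",
    "thanks for the help",
])
top_two = predict(model, "at the", k=2)
```

Calling `predict` again with a different `k` mirrors moving the slider: the same context is looked up and a longer or shorter list of candidates is returned.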

Shiny app screenshot