Data Science Capstone Final Project - Word Prediction App

Victoria O.
July 27, 2020

Shiny App Synopsis

The goal of this final project is to create a product to highlight the prediction algorithm with an interface that can be accessed by others. The project submission includes:

  • A Shiny app that takes a phrase (multiple words) typed into a text box and outputs a prediction of the next word.
  • An RStudio presentation to pitch the word predictor algorithm and app.

Data Cleaning

Data cleanup involved:

  • Converting text characters to lower case
  • Removing all numbers and punctuation
  • Removing extra whitespace
  • Removing stop words
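
The cleaning steps listed above can be sketched in base R with gsub. This is a minimal illustration, not the app's actual code: the helper name clean_text and the tiny stop-word list are assumptions made for the example, and a real run would use a full stop-word list such as the one supplied by the tm package.

```r
# Hypothetical stop-word list: a tiny illustrative subset only.
stop_words <- c("the", "a", "an", "of", "to", "and", "is", "in")

# Hypothetical helper that applies the listed cleaning steps in order.
clean_text <- function(x, stops = stop_words) {
  x <- tolower(x)                                  # lower case
  x <- gsub("[[:digit:]]+", " ", x)                # remove numbers
  x <- gsub("[[:punct:]]+", " ", x)                # remove punctuation
  # Remove stop words, matching whole words only.
  pattern <- paste0("\\b(", paste(stops, collapse = "|"), ")\\b")
  x <- gsub(pattern, " ", x, perl = TRUE)
  x <- gsub("\\s+", " ", x, perl = TRUE)           # collapse runs of whitespace
  trimws(x)                                        # drop leading/trailing spaces
}

cleaned <- clean_text("The 3 dogs ran,   to the park!")
print(cleaned)
```

Punctuation is replaced with a space rather than deleted so that words joined by a comma or dash do not merge into one token.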

The model is built on n-grams. Three sets of n-gram tokens (unigrams, bigrams, and trigrams) were generated from the cleaned text and converted into frequency data frames. The app predicts the next word by looking up the corresponding n-gram frequencies.
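
As a rough sketch of how frequency data frames can drive a prediction, the base R code below counts n-grams in a toy sentence and looks up the most frequent continuation. The function names (tokenize, ngram_freq, predict_next) are hypothetical, and the back-off order (trigram, then bigram, then unigram) is an assumption; the write-up does not say how the app combines the three tables.

```r
# Split a cleaned string into word tokens (assumes single spaces between words).
tokenize <- function(x) strsplit(x, " ", fixed = TRUE)[[1]]

# Count every run of n consecutive tokens; return a data frame sorted by frequency.
ngram_freq <- function(tokens, n) {
  if (length(tokens) < n) {
    return(data.frame(ngram = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  starts <- seq_len(length(tokens) - n + 1)
  grams <- vapply(starts,
                  function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
                  character(1))
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Keep only the final word of an n-gram string.
last_word <- function(ngram) sub(".* ", "", ngram)

# Assumed back-off: try the trigram table, then the bigram table, then
# fall back to the most common single word.
predict_next <- function(phrase, uni, bi, tri) {
  words <- tokenize(phrase)
  k <- length(words)
  if (k >= 2) {
    prefix <- paste0(words[k - 1], " ", words[k], " ")
    hits <- tri$ngram[startsWith(tri$ngram, prefix)]
    if (length(hits) > 0) return(last_word(hits[1]))
  }
  if (k >= 1) {
    prefix <- paste0(words[k], " ")
    hits <- bi$ngram[startsWith(bi$ngram, prefix)]
    if (length(hits) > 0) return(last_word(hits[1]))
  }
  uni$ngram[1]
}

# Toy corpus for illustration only.
tokens <- tokenize("i love cats i love dogs i love cats")
uni <- ngram_freq(tokens, 1)
bi  <- ngram_freq(tokens, 2)
tri <- ngram_freq(tokens, 3)

print(predict_next("i love", uni, bi, tri))
```

Because each table is sorted by descending frequency, the first matching row is the most frequent continuation of the typed words.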

Word Predictor Shiny App

The Shiny app is easy to use. The user types text into the text box, and the app displays the predicted next word in the output box.

The predictive text model was built from a sample of 800,000 lines drawn from a large corpus of blog, news, and Twitter data.

The data was then tokenized and cleaned using the tm package, together with several regular expressions applied through the gsub function.

Word Predictor App User Interface

The Shiny app is hosted on shinyapps.io: https://vickkiee.shinyapps.io/Word_Predictor

The R presentation is available on RPubs: https://rpubs.com/vickkiee/643442

The predicted next word will be shown when the app detects that you have finished typing one or more words.

When entering text, please allow a few seconds for the output to appear. Use the slider to select up to three next-word predictions. The top prediction is shown first, followed by the second and third most likely next words.
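
The ranked output described above can be illustrated with a short base R sketch. The function top_predictions and the hand-made frequency table are hypothetical, and the counts are invented for the example rather than taken from the corpus; k stands in for the value chosen with the slider.

```r
# Hand-made trigram frequency table for illustration only (not real corpus counts).
trigrams <- data.frame(
  ngram = c("happy new year", "happy new home", "happy new job", "happy new week"),
  freq  = c(30L, 7L, 12L, 21L),
  stringsAsFactors = FALSE
)

# Return up to k next words for a prefix, most frequent first.
top_predictions <- function(prefix, freq_table, k = 3) {
  hits <- freq_table[startsWith(freq_table$ngram, paste0(prefix, " ")), ]
  hits <- hits[order(hits$freq, decreasing = TRUE), ]
  head(sub(".* ", "", hits$ngram), k)
}

print(top_predictions("happy new", trigrams, k = 3))
print(top_predictions("happy new", trigrams, k = 1))
```

If the prefix matches nothing in the table, the function returns an empty character vector, which an app could treat as a cue to fall back to a shorter n-gram.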