June 19th, 2020

Project Summary

The purpose of the project is to create a text-prediction application with the R Shiny package that predicts the next word using a natural language processing model.

Given a word or phrase (multiple words) as input, the application tries to predict the next word, much as smartphone keyboards built on technology such as SwiftKey do today.

The predictive model is trained on the HC Corpora, a corpus (a collection of written texts) that has been filtered by language.

Review Criteria

Data Product:

  • Does the link lead to a Shiny app with a text input box that is running on shinyapps.io?
  • Does the app load to the point where it can accept input?
  • When you type a phrase in the input box, do you get a prediction of a single word after pressing submit and/or a suitable delay for the model to compute the answer?
  • Put five phrases drawn from Twitter or news articles in English, leaving out the last word. Did it give a prediction for every one?

Slide Deck:

  • Does the link lead to a 5-slide deck on R Pubs?
  • Does the slide deck contain a description of the algorithm used to make the prediction?
  • Does the slide deck describe the app, give instructions, and describe how it functions?
  • How would you describe the experience of using this app?
  • Does the app present a novel approach and/or is particularly well done?
  • Would you hire this person for your own data science startup company?

Prediction Model

The prediction model applies the principles of tidy data to text mining in R. It involves the following key steps:

  • As input, the model takes raw text files for training.
  • The raw data is cleaned, split into 2-, 3-, and 4-word n-grams, and saved as tibbles.
  • The n-gram tibbles are sorted by frequency and saved as .Rdata files.
  • The n-gram function uses a back-off prediction model:
    • The user enters a word or phrase (multiple words).
    • The model uses the last 3, 2, or 1 words of the input to look up the best-matching 4th, 3rd, or 2nd word in the 4-, 3-, or 2-gram data, backing off to a shorter context when the longer one has no match.
  • As output, the model predicts the next word.
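
The steps above can be illustrated with a minimal sketch. It is not the project's actual code: it uses base R data frames in place of tibbles and tidy text-mining tools so that it runs without extra packages, it takes "best match" to mean the most frequent continuation, and the toy corpus and the function names tokenize, build_ngrams, and predict_next_word are hypothetical.

```r
# Toy corpus standing in for the cleaned training text (hypothetical data).
corpus <- c(
  "thanks for the follow",
  "thanks for the follow back",
  "thanks for the help",
  "looking forward to the weekend"
)

# Split one line into lowercase word tokens (a deliberately simple cleaner).
tokenize <- function(line) {
  words <- strsplit(tolower(line), "[^a-z']+")[[1]]
  words[nzchar(words)]
}

# Build a frequency-sorted table of n-grams of size n, split into the
# (n - 1)-word prefix and the word that follows it.
build_ngrams <- function(lines, n) {
  grams <- unlist(lapply(lines, function(line) {
    w <- tokenize(line)
    if (length(w) < n) return(character(0))
    vapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(
    prefix   = sub(" [^ ]+$", "", names(counts)),
    nextword = sub("^.* ", "", names(counts)),
    freq     = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# tables[[k]] holds the (k + 1)-grams, i.e. the table for k context words.
tables <- lapply(2:4, function(n) build_ngrams(corpus, n))

# Back-off lookup: try the last 3 words, then the last 2, then the last 1.
predict_next_word <- function(phrase, tables) {
  words <- tokenize(phrase)
  if (length(words) == 0) return(NA_character_)
  for (k in min(3, length(words)):1) {
    context <- paste(tail(words, k), collapse = " ")
    hits <- tables[[k]][tables[[k]]$prefix == context, ]
    # Rows are already sorted by frequency, so the first hit is the top match.
    if (nrow(hits) > 0) return(hits$nextword[1])
  }
  NA_character_  # no n-gram matched at any level
}

predict_next_word("thanks for the", tables)  # "follow", found in the 4-grams
predict_next_word("send it to the", tables)  # "weekend", after backing off to the 3-grams
```

In this sketch an input with no match at any level returns NA; a real app would more likely fall back to a common default word.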

Predictor App

Summary: The Predictor app provides a simple user interface to the next-word prediction model. It takes a word or a phrase (multiple words) in the input text box and outputs a prediction of the next word.
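
A user interface of this kind might be wired up roughly as follows. This is only a sketch under assumptions: the input and output IDs, the labels, the submit button, and the predict_stub function (a placeholder for the trained model) are all hypothetical, and the real app's layout may differ.

```r
library(shiny)

# Hypothetical stand-in for the trained back-off model; a real app would
# load the saved n-gram data and look the phrase up there.
predict_stub <- function(phrase) {
  if (!nzchar(trimws(phrase))) return("")
  "the"
}

# One text box for the phrase, a submit button, and a text output.
ui <- fluidPage(
  titlePanel("Next Word Predictor"),
  textInput("phrase", "Enter a word or phrase:"),
  actionButton("submit", "Submit"),
  h4("Predicted next word:"),
  textOutput("prediction")
)

server <- function(input, output, session) {
  # Recompute the prediction only when the button is pressed.
  prediction <- eventReactive(input$submit, {
    predict_stub(input$phrase)
  })
  output$prediction <- renderText(prediction())
}

# Launch the app only in an interactive session.
if (interactive()) shinyApp(ui, server)
```

Tying the prediction to the button with eventReactive keeps the model from being re-run on every keystroke; dropping the button and reading input$phrase directly would give predictions as the user types instead.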

Next Word Predictor App