Coursera Data Science Specialization Capstone Project

Jeff C
5/31/2020

The Project

An app that takes a multi-word phrase typed into a text box and outputs a prediction of the next word.

Next Word Prediction Model

  • Input: raw text files for model training
  • Clean training data; split into 2-word, 3-word, and 4-word n-grams; save as tibbles
  • Sort the n-gram tibbles by frequency; save as repos
  • Model looks up the last 3, 2, or 1 words of the input in the 4-word, 3-word, or 2-word repo and predicts the most frequent 4th, 3rd, or 2nd word
  • Output: next word prediction
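
The pipeline above can be illustrated with a minimal sketch. It is written in Python rather than with the tibbles the project used, so plain dictionaries stand in for the frequency-sorted repos. The function names (tokenize, build_repo, predict_next), the cleaning rule, the toy corpus, and the order in which contexts are tried (longest first, then shorter) are all assumptions made for illustration, not the project's actual code.

```python
from collections import Counter, defaultdict
import re


def tokenize(text):
    """Assumed cleaning step: lowercase, keep runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_repo(lines, n):
    """Map each (n-1)-word prefix to its following words, most frequent first."""
    counts = defaultdict(Counter)
    for line in lines:
        words = tokenize(line)
        for i in range(len(words) - n + 1):
            prefix = tuple(words[i:i + n - 1])
            counts[prefix][words[i + n - 1]] += 1
    return {
        prefix: [word for word, _ in counter.most_common()]
        for prefix, counter in counts.items()
    }


def predict_next(phrase, repos):
    """Try the last 3 words, then the last 2, then the last 1 (assumed order)."""
    words = tokenize(phrase)
    for n in (4, 3, 2):
        if len(words) < n - 1:
            continue  # not enough input words for this n-gram size
        prefix = tuple(words[-(n - 1):])
        candidates = repos[n].get(prefix)
        if candidates:
            return candidates[0]
    return None  # no match in any repo


# Toy training data, invented for this example only.
corpus = [
    "thanks for the follow",
    "thanks for the follow",
    "thanks for the help",
    "for the first time",
]
repos = {n: build_repo(corpus, n) for n in (2, 3, 4)}

print(predict_next("Thanks for the", repos))
```

Sorting each prefix's candidates once at build time keeps prediction down to a dictionary lookup, which matters when the prediction has to refresh as the user types.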

The App's Key Features

  • Text box for user input
  • Predicted next word updates dynamically below the user input
  • Side panel with user instructions