Predicting Next Words

Luis Hernandez
2019-12-09

Coursera Data Science Capstone Project

Executive Summary

Predicting the next word of a sentence is a useful capability, found in word processors and modern smartphone keyboards.

Using data science processes and machine learning, this application predicts the next word after a sentence is entered.

The application also provides a graph of five candidate words, ordered by predicted probability.

This experiment uses data from SwiftKey and was created to demonstrate word prediction based on previously entered phrases.

Application Overview

  • The application has a text entry area and shows the predicted words in a plotly graph.
  • The plotly graph is interactive, so the predicted words can be explored further through its graphical user interface.
  • Clicking the three-line icon next to the title collapses the left side panel to allow more space.
  • Clicking on the “Info” section will provide additional information about the application.

Application Image1

Application Overview (Screenshot)

Application Image

  • Click on the GitHub icon to go to the application repository.
  • Select the info pane for additional info about the developer.

Prediction Algorithm

  • The prediction algorithm is based on N-grams.
  • N-grams of 1, 2, 3, 4, and 5 words were generated from blog, Twitter, and news text data.
  • A database was created so that the engine performs fast enough for real-time prediction. These are the tables:

  • This optimized method allows the software to work around the restrictions in R and provides great performance for large data sets.

  • Various methods were used to improve performance, including reducing the number of words and removing 'badwords'.
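
The steps above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the application's code. The table layout (`ngrams` with `context`, `word`, `count` columns), the use of SQLite, the simple back-off from longer to shorter contexts, the `BADWORDS` placeholder list, and all function names are assumptions made for this example; the project does not state these details.

```python
import re
import sqlite3
from collections import Counter

# Hypothetical placeholder; the project's actual 'badwords' list is not shown.
BADWORDS = {"darn"}

def tokenize(text):
    """Lowercase the text, split it into words, and drop filtered words."""
    words = re.findall(r"[a-z']+", text.lower())
    return [w for w in words if w not in BADWORDS]

def build_table(lines, max_n=5):
    """Count N-grams for n = 1..max_n and store them as (context, word, count) rows."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                gram = tokens[i:i + n]
                # The context is the first n-1 words; the last word is the prediction.
                counts[(" ".join(gram[:-1]), gram[-1])] += 1
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ngrams (context TEXT, word TEXT, count INTEGER)")
    db.executemany(
        "INSERT INTO ngrams VALUES (?, ?, ?)",
        [(context, word, k) for (context, word), k in counts.items()],
    )
    # An index on the context column keeps lookups fast as the table grows.
    db.execute("CREATE INDEX idx_context ON ngrams (context)")
    return db

def predict(db, phrase, top=5, max_n=5):
    """Try the longest available context first, then back off to shorter ones."""
    tokens = tokenize(phrase)
    for k in range(min(max_n - 1, len(tokens)), -1, -1):
        context = " ".join(tokens[len(tokens) - k:])
        rows = db.execute(
            "SELECT word, count FROM ngrams WHERE context = ? "
            "ORDER BY count DESC, word ASC LIMIT ?",
            (context, top),
        ).fetchall()
        if rows:
            total = db.execute(
                "SELECT SUM(count) FROM ngrams WHERE context = ?", (context,)
            ).fetchone()[0]
            # Relative frequency within this context serves as the probability estimate.
            return [(word, count / total) for word, count in rows]
    return []

corpus = [
    "i love data science",
    "i love data analysis",
    "i love coffee",
    "we love data science",
]
db = build_table(corpus)
for word, prob in predict(db, "i love"):
    print(f"{word}: {prob:.2f}")
```

In this sketch an empty context (`""`) holds the single-word counts, so a phrase with no matching longer context still returns the most frequent words overall. Keeping the counts in an indexed table rather than in memory is one way to keep lookups quick on a large corpus.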

Thanks for reviewing my project!