Data Science Capstone: Next Word Prediction Webapp

Yaakov Miller

Data Science Specialization - Johns Hopkins University

Next Word Prediction

Overview

Natural language processing (NLP) was applied to English-language text drawn from tweets, news articles, and blogs.

The data was processed and combined to build a webapp that predicts the next word for a given text input.

The webapp was written in R as a Shiny app using common R packages and is hosted at: https://ykv001.shinyapps.io/dsc10-cap/

Methodology and Modeling

The following steps provide more details on the webapp creation:

  • Data gathering and consolidation
  • Sampling the data
  • Data cleaning (lowercase conversion, removal of special characters, etc.)
  • Data tokenization
  • Creation of frequency dictionaries
  • Modeling prediction by word frequency
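
The cleaning, tokenization, frequency-dictionary, and prediction steps above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the R code behind the webapp. The function names, the toy corpus, and the choice of a bigram (previous word to next word) table are all assumptions made for illustration; the slides do not state which n-gram orders the actual model uses. Data gathering and sampling are omitted.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase the text and replace anything that is not a letter,
    apostrophe, or space with a space (a simple stand-in for the
    cleaning step)."""
    text = text.lower()
    return re.sub(r"[^a-z' ]+", " ", text)


def tokenize(text):
    """Split cleaned text into word tokens on whitespace."""
    return clean(text).split()


def build_bigram_table(documents):
    """Build a frequency dictionary mapping each word to a Counter of
    the words that follow it. Using bigrams here is an assumption."""
    table = defaultdict(Counter)
    for doc in documents:
        tokens = tokenize(doc)
        for prev, nxt in zip(tokens, tokens[1:]):
            table[prev][nxt] += 1
    return table


def predict_next(table, text, k=3):
    """Return up to k candidate next words, most frequent first,
    based only on the last word of the input."""
    tokens = tokenize(text)
    if not tokens:
        return []
    candidates = table.get(tokens[-1])
    if not candidates:
        return []
    return [word for word, _ in candidates.most_common(k)]


# Hypothetical toy corpus standing in for the sampled tweets, news, and blogs.
corpus = [
    "Thanks for the follow!",
    "Thanks for the great day",
    "Thanks for coming",
]
table = build_bigram_table(corpus)
print(predict_next(table, "Thanks for"))
```

In this toy corpus, "the" follows "for" twice and "coming" once, so "the" is ranked first. A fuller model would typically also consult longer word histories and fall back to shorter ones when a history has not been seen.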

Deployed Webapp

Usage is straightforward: simply enter text and see the prediction:

[Screenshot: Next Word Prediction webapp]