Data Science Specialization Capstone Project: Predict the next word

Piyush Verma
2017-03-17

Background

This is the final submission for the Coursera Johns Hopkins Data Science Specialization certificate. The objective of this capstone project is to build a predictive tool, as an R Shiny app, that takes a string of words from the user as input and predicts the next word using a back-off model. The following concepts were used in this capstone project:

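  • Text Mining: Uses the 'tm' package for data cleaning and building the corpus
  • Back-off model: Uses n-gram word counts to predict the next word. The maximum likelihood estimate of the probability that a word follows a given context is its relative frequency: the count of the full n-gram divided by the count of the context. When no n-gram matches the context, the model backs off to a shorter context.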
  • R Shiny application
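
The relative-frequency estimate in the back-off bullet can be sketched in a few lines of base R. This is only an illustration under stated assumptions: the three-line corpus is made up, clean_text is a base R stand-in for the cleaning the report does with 'tm', and the helper names (clean_text, tokenize, mle_next) are hypothetical rather than taken from the project.

```r
# Toy corpus, made up for illustration (not the capstone data).
corpus <- c("Thanks for the follow!", "Thanks for the help.", "Thanks for coming")

# Base R stand-in for the 'tm' cleaning step (an assumption, not the
# report's code): lower-case, keep only letters, apostrophes and spaces.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[^a-z' ]", " ", x)
  trimws(gsub(" +", " ", x))
}

# Split one cleaned line into words.
tokenize <- function(line) strsplit(line, " ", fixed = TRUE)[[1]]

# Collect adjacent word pairs (bigrams) without crossing line boundaries.
bigrams <- unlist(lapply(clean_text(corpus), function(line) {
  w <- tokenize(line)
  if (length(w) < 2) return(character(0))
  paste(head(w, -1), tail(w, -1))
}))

bigram_counts <- table(bigrams)

# MLE of P(next | previous): count of "previous next" divided by the
# total count of all bigrams that start with "previous".
mle_next <- function(previous) {
  hits <- bigram_counts[startsWith(names(bigram_counts), paste0(previous, " "))]
  probs <- as.numeric(hits) / sum(hits)
  names(probs) <- sub("^[^ ]+ ", "", names(hits))
  sort(probs, decreasing = TRUE)
}

mle_next("for")
```

In this toy corpus "for" is followed by "the" twice and by "coming" once, so the estimates are 2/3 and 1/3 and "the" would be the predicted word.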

Description and Functionality

The R Shiny application takes a string of characters from the user and predicts the next word as output. The input can be of any length, but the output is restricted to a single word. Below is a screenshot of the application.

[Image: appimage]
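
A minimal sketch of how such an app could be wired up in Shiny is shown here. The layout, the input and output IDs (phrase, next_word) and the predict_next_word function are assumptions for illustration; in particular, predict_next_word is a hard-coded placeholder standing in for the real n-gram lookup, not the project's model.

```r
library(shiny)

# Hypothetical placeholder predictor. The real app would look up the
# trailing words of the input in its n-gram tables at this point.
predict_next_word <- function(text) {
  words <- strsplit(trimws(tolower(text)), " +")[[1]]
  if (length(words) == 0) return("")
  # Tiny hard-coded lookup, for illustration only.
  lookup <- c(thanks = "for", "for" = "the", the = "follow")
  last <- words[length(words)]
  if (last %in% names(lookup)) unname(lookup[last]) else "the"
}

# One free-text input, one single-word output.
ui <- fluidPage(
  titlePanel("Predict the next word"),
  textInput("phrase", "Type a phrase:"),
  strong("Predicted next word:"),
  textOutput("next_word")
)

# The prediction is recomputed whenever the input text changes.
server <- function(input, output, session) {
  output$next_word <- renderText(predict_next_word(input$phrase))
}

app <- shinyApp(ui, server)
# Launch interactively with: runApp(app)
```

Keeping the predictor as a plain function, separate from the UI code, means it can be tested without starting the app.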

How does it work? - N-Grams

[Image: appimage3]
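
The n-gram lookup with back-off can be sketched as follows. This is a simple count-based variant without discounting, written under assumptions: the report does not state the maximum n-gram order or the exact back-off scheme, so the sketch uses orders 1 to 3, a made-up corpus, and hypothetical names (corpus_lines, ngrams, counts, predict_backoff).

```r
# Toy corpus, already cleaned; made up for illustration.
corpus_lines <- c("thanks for the follow", "thanks for the help",
                  "thanks for coming", "see you at the party",
                  "the end of the day")

# All n-grams of order n from one line, as space-separated strings.
ngrams <- function(line, n) {
  w <- strsplit(line, " ", fixed = TRUE)[[1]]
  if (length(w) < n) return(character(0))
  sapply(seq_len(length(w) - n + 1),
         function(i) paste(w[i:(i + n - 1)], collapse = " "))
}

# Frequency tables for orders 1 to 3, most frequent first.
counts <- lapply(1:3, function(n) {
  sort(table(unlist(lapply(corpus_lines, ngrams, n = n))), decreasing = TRUE)
})

predict_backoff <- function(text) {
  w <- strsplit(trimws(tolower(text)), " +")[[1]]
  for (k in 2:1) {                       # context length: 2 words, then 1
    if (length(w) < k) next
    prefix <- paste0(paste(tail(w, k), collapse = " "), " ")
    tab <- counts[[k + 1]]               # (k+1)-grams extend a k-word context
    hits <- names(tab)[startsWith(names(tab), prefix)]
    if (length(hits) > 0) {
      # Tables are sorted by count, so the first hit is the most frequent.
      return(sub(".* ", "", hits[1]))
    }
  }
  names(counts[[1]])[1]                  # last resort: most frequent word
}

predict_backoff("Thanks for")    # matched at the trigram level
predict_backoff("many thanks")   # no trigram match, backs off to bigrams
predict_backoff("hello world")   # nothing matches, falls back to unigrams
```

With this corpus the three calls would return "the", "for" and "the" respectively: the first is answered from the trigram table, the second from the bigram table, and the third from the most frequent single word.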

This app can be used as a proof of concept (POC) for more evolved products. For example, it could predict a service complaint before a customer finishes typing it, possibly cutting down critical customer service time.

References