Data Science Capstone Project

Karthik Chawala
11/30/2017

Introduction

This application predicts the next word for a given phrase. It is trained on a corpus called HC Corpora.

All NLP and text mining was done using a variety of R packages.

Algorithm Description

The algorithm that predicts the next word in a user-entered text string is based on a classic N-gram model. Using a cleaned subset of the blog, Twitter, and news files from the corpus, Maximum Likelihood Estimates (MLE) of unigram, bigram, and trigram probabilities were computed.
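As a rough illustration of what MLE means for an N-gram model, each conditional probability is a ratio of counts, for example P(w3 | w1, w2) = count(w1 w2 w3) / count(w1 w2). The sketch below is written in Python purely for illustration (the project itself used R); the function names `ngrams` and `mle_tables`, the whitespace tokenization, and the toy sentences are assumptions, not the project's actual code or data.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return all contiguous n-grams in a token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def mle_tables(sentences):
    """Build MLE probability tables for unigrams, bigrams, and trigrams.

    Hypothetical helper: tokenization is a plain lowercase whitespace
    split, which stands in for whatever cleaning the real pipeline did.
    """
    uni, bi, tri = Counter(), Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        uni.update(ngrams(tokens, 1))
        bi.update(ngrams(tokens, 2))
        tri.update(ngrams(tokens, 3))

    total = sum(uni.values())
    # P(w) = count(w) / total number of tokens
    p_uni = {g[0]: c / total for g, c in uni.items()}
    # P(w2 | w1) = count(w1 w2) / count(w1)
    p_bi = {g: c / uni[g[:1]] for g, c in bi.items()}
    # P(w3 | w1 w2) = count(w1 w2 w3) / count(w1 w2)
    p_tri = {g: c / bi[g[:2]] for g, c in tri.items()}
    return p_uni, p_bi, p_tri


# Toy data, made up for this example only.
p_uni, p_bi, p_tri = mle_tables(["the cat sat", "the cat ran", "the dog sat"])
```

In this toy data "the" is followed by "cat" in two of its three occurrences, so the bigram estimate for ("the", "cat") is 2/3.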

Shiny App

A Shiny application was built on this algorithm. It accepts a phrase as input from the user, suggests a next word from the unigrams, and predicts the most likely next word by linear interpolation of the trigram, bigram, and unigram estimates. The web-based application can be found here.
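Linear interpolation scores each candidate word as a weighted sum of its trigram, bigram, and unigram probabilities, and the highest-scoring candidate is returned. A minimal sketch follows, in Python for illustration only (the app itself is an R Shiny application). The names `count_ngrams` and `predict_next`, the weights (0.6, 0.3, 0.1), and the toy token stream are all assumptions; the source does not state the weights the app actually uses.

```python
from collections import Counter


def count_ngrams(tokens, n):
    """Count contiguous n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def predict_next(phrase, uni, bi, tri, lambdas=(0.6, 0.3, 0.1)):
    """Return the word with the highest interpolated score, or None.

    score(w) = l3 * P(w | w1 w2) + l2 * P(w | w2) + l1 * P(w)
    The default weights are illustrative assumptions that sum to 1.
    """
    l3, l2, l1 = lambdas
    words = phrase.lower().split()
    w1 = words[-2] if len(words) >= 2 else None
    w2 = words[-1] if words else None
    total = sum(uni.values())

    best_word, best_score = None, -1.0
    for (w,), count in uni.items():
        p1 = count / total
        # A Counter returns 0 for unseen keys, so missing context
        # simply contributes nothing to the score.
        p2 = bi[(w2, w)] / uni[(w2,)] if uni[(w2,)] else 0.0
        p3 = tri[(w1, w2, w)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
        score = l3 * p3 + l2 * p2 + l1 * p1
        if score > best_score:
            best_word, best_score = w, score
    return best_word


# Toy token stream, made up for this example only.
tokens = "the cat sat on the mat the cat ran".split()
uni, bi, tri = (count_ngrams(tokens, n) for n in (1, 2, 3))
suggestion = predict_next("on the", uni, bi, tri)
```

When the phrase is empty or its context was never seen, only the unigram term is non-zero, so the sketch falls back to the most frequent word.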

Using the Shiny App

The application is user friendly and could be embedded into many commercial applications. When you open the Shiny App in the browser, type a phrase (without any punctuation) in the text box and press Enter to see the predicted next word at the bottom of the screen.