NLP Word Prediction Tool

Ankit Upadhyay

November 28, 2020

Overview

This is a word prediction tool that suggests the most likely next word in a sentence.

Assignment guidelines from Coursera

The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:

How does it work?

N-grams are used to estimate the most likely next word, and input sentences are preprocessed so that they can be matched against the stored n-grams.

Process involved

Significant data cleaning has been done (converting to lower case; removing punctuation marks, numbers and non-printable characters), and lines have been processed from the Twitter, news and blog files (English version only).
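The cleaning steps listed above can be sketched roughly as follows. This is a minimal Python illustration, not the app's actual code; the function name clean_line and the choice to replace punctuation and digits with spaces are assumptions made for the example.

```python
import re
import string

# Translation table that turns every ASCII punctuation mark into a space
# (replacing with a space rather than deleting is an assumption of this sketch).
_PUNCT_TO_SPACE = str.maketrans(string.punctuation, " " * len(string.punctuation))

def clean_line(line: str) -> str:
    """Lower-case a line and strip numbers, punctuation and non-printable characters."""
    line = line.lower()
    # Drop anything outside Python's printable ASCII set (control characters, emoji, ...)
    line = "".join(ch for ch in line if ch in string.printable)
    # Replace runs of digits with a space
    line = re.sub(r"\d+", " ", line)
    # Replace punctuation marks with spaces
    line = line.translate(_PUNCT_TO_SPACE)
    # Collapse repeated whitespace left behind by the steps above
    return " ".join(line.split())

if __name__ == "__main__":
    print(clean_line("I have 2 cats, and 10 DOGS!\x00"))
```

The same function would need to be applied to the user's input at prediction time, so that the typed words match the cleaned training text.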

After data cleaning, the next word is predicted based on the “Stupid Backoff” algorithm.
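For readers unfamiliar with Stupid Backoff, here is a small Python sketch of the idea. A word is scored by its relative frequency after the longest matching context; if that n-gram was never seen, the context is shortened by one word and the score is multiplied by a fixed factor. The factor 0.4 is the value commonly used for this method; the function names, the trigram limit and the toy sentences are assumptions for illustration and do not come from the app.

```python
from collections import Counter

ALPHA = 0.4  # fixed back-off factor (commonly used value; an assumption here)

def count_ngrams(sentences, max_n=3):
    """Count all n-grams of order 1..max_n in already-cleaned sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts

def score(word, context, counts, total_unigrams):
    """Stupid Backoff score of `word` following the tuple `context`."""
    if not context:
        # Base case: plain unigram relative frequency
        return counts[(word,)] / total_unigrams
    ngram = context + (word,)
    if counts[ngram] > 0:
        return counts[ngram] / counts[context]
    # Unseen n-gram: drop the oldest context word and discount the score
    return ALPHA * score(word, context[1:], counts, total_unigrams)

def predict_next(text, counts, max_n=3):
    """Return the vocabulary word with the highest score after `text`."""
    tokens = text.split()
    context = tuple(tokens[-(max_n - 1):]) if max_n > 1 else ()
    vocab = [g[0] for g in counts if len(g) == 1]
    total = sum(counts[(w,)] for w in vocab)
    return max(vocab, key=lambda w: score(w, context, counts, total))

if __name__ == "__main__":
    corpus = [
        "i love natural language",
        "i love natural language processing",
        "i love pizza",
    ]
    counts = count_ngrams(corpus)
    print(predict_next("i love", counts))
```

Note that the scores are not true probabilities (they do not sum to one), which is what makes the method cheap to compute.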

In general, by pruning the n-gram database and using a word-integer hash table, the application keeps memory usage low, which in turn makes prediction faster.
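A rough Python sketch of these two ideas is shown below: rare n-grams are dropped, and each remaining n-gram is stored as a tuple of small integers instead of strings. The function names and the min_count threshold of 2 are hypothetical; the pruning rule actually used in the app is not stated here.

```python
from collections import Counter

def build_vocab(counts):
    """Map every word seen in the n-gram counts to a small integer ID."""
    word_to_id = {}
    for ngram in counts:
        for word in ngram:
            # Assign the next free ID the first time a word is seen
            word_to_id.setdefault(word, len(word_to_id))
    return word_to_id

def prune_and_encode(counts, word_to_id, min_count=2):
    """Drop n-grams seen fewer than min_count times; key the rest by integer IDs."""
    return {
        tuple(word_to_id[w] for w in ngram): c
        for ngram, c in counts.items()
        if c >= min_count
    }

if __name__ == "__main__":
    counts = Counter({("i", "love"): 3, ("love", "pizza"): 1, ("i",): 3})
    vocab = build_vocab(counts)
    print(vocab)
    print(prune_and_encode(counts, vocab))
```

Each word string is then stored once in the vocabulary, and lookups at prediction time only compare integers.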

Shiny app

The app can be found here: https://ankit-techspace.shinyapps.io/NLP_Word_Prediction/