Capstone Data Science Text Predict Project

Andi M Masanam
04 June 2016

Introduction

For this Capstone project, I created a simple next-word prediction app.

The app lets a user enter a sentence fragment and suggests the word most likely to come next.

The app is trained on large samples of text from blogs, news reports and tweets.

You can try out the app using this link!

The Algorithm Used

Lists of n-grams are created from the source text and sorted by frequency.
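As a rough illustration of building and sorting n-gram lists, here is a minimal Python sketch. It is not the app's actual code: the function name ngram_counts, the whitespace tokenization, and the sample sentence are all assumptions made for the example.

```python
from collections import Counter

def ngram_counts(text, n):
    """Count every run of n consecutive words in text (hypothetical helper)."""
    # Assumption: lowercase the text and split on whitespace; a real corpus
    # would need more careful cleaning and tokenization.
    words = text.lower().split()
    # zip over n shifted copies of the word list yields each n-gram as a tuple
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(grams)

sample = "the cat sat on the mat and the cat slept"
bigrams = ngram_counts(sample, 2)

# most_common() returns n-grams ordered from most to least frequent
top_bigrams = bigrams.most_common(3)
```

Calling ngram_counts once for each order (1, 2, 3, and so on) would give one frequency-sorted list per n-gram size.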

This approach was chosen because it offers accuracy comparable to more sophisticated models while being less computationally expensive.

The algorithm matches the end of the user's input against the highest-order n-gram list and predicts the last word of the most frequent matching n-gram.

If there is no match, it recursively searches successive lists of (n-1)-grams until it finds one.

The recursion terminates at the 1-gram level, where it simply outputs the most common unigram if no match has been found.
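The back-off search described above can be sketched in Python roughly as follows. This is an illustrative sketch, not the app's implementation: build_tables, predict_next, the maximum order of 3, and the toy corpus are assumptions, and a loop stands in for the recursion.

```python
from collections import Counter, defaultdict

def build_tables(text, max_n=3):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    words = text.lower().split()
    tables = {}
    for n in range(1, max_n + 1):
        table = defaultdict(Counter)
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])   # empty tuple when n == 1
            table[context][words[i + n - 1]] += 1
        tables[n] = table
    return tables

def predict_next(fragment, tables, max_n=3):
    """Try the longest context first, then back off to shorter ones."""
    words = fragment.lower().split()
    for n in range(max_n, 0, -1):
        context = tuple(words[-(n - 1):]) if n > 1 else ()
        if len(context) < n - 1:
            continue  # fragment is too short for this n-gram order
        followers = tables[n].get(context)
        if followers:
            # most frequent word seen after this context
            return followers.most_common(1)[0][0]
    return None  # only reached if the corpus was empty

tables = build_tables("the cat sat on the mat and the cat slept")
trigram_hit = predict_next("on the", tables)          # matched at the 3-gram level
bigram_hit = predict_next("purple the", tables)       # backs off to the 2-gram level
unigram_hit = predict_next("purple zebra", tables)    # falls through to the unigram
```

Here the 1-gram table has a single empty context, so the final fallback is just the most common word in the corpus.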

What the App Does

The app uses the algorithm to give a best-guess prediction of the next word in a sentence fragment. It also displays the other candidate predictions in an accompanying word cloud, up to a maximum number chosen by the user.
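One way to picture the relationship between the best guess and the word cloud is the Python sketch below. It only selects the words and their weights and does not draw anything; top_candidates and the follower counts are hypothetical values invented for the example.

```python
from collections import Counter

def top_candidates(followers, max_words):
    """Return up to max_words (word, count) pairs, most frequent first."""
    return followers.most_common(max_words)

# Hypothetical counts of the words seen after some sentence fragment
followers = Counter({"day": 5, "time": 3, "way": 2, "deal": 1})

# The single best guess is the most frequent candidate
best_guess = top_candidates(followers, 1)[0][0]

# The word cloud would show the top few candidates, sized by their counts
cloud_words = top_candidates(followers, 3)
```

The counts in cloud_words could then be passed to whatever plotting tool draws the cloud, with larger counts drawn in larger type.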

Instructions for Use

You can try out the app using this link!

Instructions:

Type a sentence fragment into the text box

Select the maximum number of matches to show in the generated word cloud

Press “Predict”

The app will generate the suggested sentence and an accompanying word cloud. Source code for the app can be found here.