JHU Coursera Data Science Capstone

Mike
11/26/2017

Application Overview

We have created an application that predicts the next word in a sentence from the text the user has entered. Because the user could be building any of many possible sentences, it offers three separate predictions, and these update continuously as the user types.

The application is designed to return predictions quickly, creating a dynamic experience for the user.

Methodology

Behind the scenes, this application uses a backoff model.

First, a corpus of text from blogs, tweets, and news articles was cleaned (removing profanity, punctuation, numbers, etc.) and then analyzed for n-grams (sequences of n consecutive words). When the user provides input, it goes through a similar cleaning process, and the application then looks for n-grams that match up to the last three words typed. If it finds a match, it suggests the most frequent following words. If there is no match, it backs off and looks for matches on just the last two words, and if there is still no match, it falls back to the last word alone.
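The cleaning and backoff steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the app's actual code: the function names `clean`, `build_model`, and `predict`, the placeholder `PROFANITY` set, and the four-sentence `CORPUS` are all hypothetical, and it simply returns the top three candidates at the first (longest) context that matches, falling back to the most frequent words overall when nothing matches.

```python
import re
from collections import Counter, defaultdict

# Hypothetical placeholder; a real filter would load a full profanity word list.
PROFANITY = {"darn"}

def clean(text):
    """Lowercase, drop apostrophes, strip punctuation and digits, remove profanity."""
    text = text.lower().replace("'", "")
    text = re.sub(r"[^a-z\s]+", " ", text)  # punctuation and numbers become spaces
    return [w for w in text.split() if w not in PROFANITY]

def build_model(sentences, max_context=3):
    """model[k][context] counts the words that followed each k-word context."""
    model = {k: defaultdict(Counter) for k in range(1, max_context + 1)}
    unigrams = Counter()
    for sentence in sentences:
        tokens = clean(sentence)
        unigrams.update(tokens)
        for i in range(1, len(tokens)):          # i indexes the "next" word
            for k in range(1, max_context + 1):  # k preceding words as context
                if i - k < 0:
                    break
                model[k][tuple(tokens[i - k:i])][tokens[i]] += 1
    return model, unigrams

def predict(text, model, unigrams, n_predictions=3):
    """Back off from the longest available context (up to 3 words) to 1 word."""
    tokens = clean(text)
    for k in range(min(max(model), len(tokens)), 0, -1):
        followers = model[k].get(tuple(tokens[-k:]))  # .get avoids inserting keys
        if followers:
            return [w for w, _ in followers.most_common(n_predictions)]
    # No context matched: fall back to the most frequent words overall.
    return [w for w, _ in unigrams.most_common(n_predictions)]

# Hypothetical toy corpus standing in for the blog/tweet/news data.
CORPUS = [
    "When the truth is out, it is out.",
    "When the truth is told, people listen.",
    "When the day is done, we rest.",
    "The truth is simple.",
]
model, unigrams = build_model(CORPUS)
```

With this toy corpus, `predict("When the", model, unigrams)` returns `["truth", "day"]`, and `predict("When the truth", model, unigrams)` returns `["is"]`, echoing the example use case. Stopping at the first matching context keeps lookups fast; a variant could fill any remaining prediction slots from the shorter contexts.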

Example Use Case

Below are screenshots of the application in action.

As you can see, the user first entered the phrase 'When the', and the application's top prediction was 'truth'. Once the user typed 'truth', the top prediction became 'is'.

[Screenshots of the application in action]

Try it out for yourself...

The app can be found here:

https://mbinz2.shinyapps.io/JHUDataScience-Capstone/

Let us know if you're interested in working together on the next stage of development for the application.