Data Science Specialization Capstone Project

Andy Rosa
2014-12-14

Introduction

Thank you for your interest in my text prediction app! The application takes a phrase or group of words as input, runs the input string through a prediction algorithm, and returns a predicted next word. It mimics the auto-suggestions found in the SwiftKey app for iOS and Android, in Google search, and in a growing number of other sites and applications.

Process

Using the data in this zip file, the following steps were performed to build the application:

  • Cleaning and merging the data from the three distinct sources (Blogs, News and Twitter)

  • Tokenizing the lines of text and creating n-grams to provide frequencies of phrases in the text.

  • Building the algorithm
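
The cleaning, tokenizing, and n-gram counting steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the code behind the application: the cleaning rules, the function names (clean, tokenize, count_ngrams), and the three sample lines are all assumptions made for the example.

```python
import re
from collections import Counter


def clean(line):
    """Lowercase a line and keep only letters, apostrophes, and spaces (assumed rules)."""
    line = line.lower()
    line = re.sub(r"[^a-z' ]+", " ", line)
    return re.sub(r"\s+", " ", line).strip()


def tokenize(line):
    """Split a cleaned line into word tokens."""
    return clean(line).split()


def count_ngrams(lines, n):
    """Count every run of n consecutive tokens across all lines."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical stand-ins for lines from the three sources.
blogs = ["I hope you have a good day."]
news = ["We hope you have a good weekend!"]
twitter = ["have a good day everyone"]

# Merge the sources into one corpus, then build trigram frequencies.
corpus = blogs + news + twitter
trigrams = count_ngrams(corpus, 3)
print(trigrams[("have", "a", "good")])  # 3
```

Counting n-grams per line, rather than over one long merged string, avoids creating phrases that span two unrelated lines.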

Text Prediction Algorithm

The final algorithm does the following:

  • Receive text input (e.g. “I hope you have a good”) and take up to the last five words of the string (e.g. “hope you have a good”)

  • Back off through the n-gram set: compare the five-word sequence first; if there is no match, try the 4-gram (“you have a good”), then the tri-gram (“have a good”), and so on until a match is found
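
The back-off lookup described above might look something like the sketch below. It is an illustration under stated assumptions rather than the application's actual code: the model layout (a mapping from a context of up to five words to counts of the words that follow it), the names build_model and predict, the choice of the most frequent follower as the prediction, and the sample sentences are all hypothetical.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_context=5):
    """Map each context of 0..max_context words to a Counter of following words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for k in range(max_context + 1):
                if i - k < 0:
                    break
                model[tuple(tokens[i - k:i])][word] += 1
    return model


def predict(model, text, max_context=5):
    """Try the longest context first, dropping the leading word until a match is found."""
    tokens = text.lower().split()[-max_context:]
    for start in range(len(tokens) + 1):
        context = tuple(tokens[start:])
        if context in model:
            # Assumption: the most frequent follower of the matched context wins.
            return model[context].most_common(1)[0][0]
    return None


# Hypothetical training sentences.
sentences = [
    "i hope you have a good day",
    "you have a good time",
    "have a good day",
]
model = build_model(sentences)
print(predict(model, "I hope you have a good"))  # day
```

In this sketch the empty context acts as the final fallback, so an input with no matching words still returns the most frequent word in the training text.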

More rigorous statistical techniques were tested, but none improved accuracy over the “back-off” method enough to justify the longer load and wait times for the user.

Using the Application

[Image: screenshot of the application]

Enter your text, click “Predict!”, and the predicted word appears to the right.