Next Word Prediction App - Coursera Data Science Capstone

Caroline Zimmerman
Dec 21, 2016

Introduction

This simple Predict Next Word app lets the user enter a word or phrase into a text box and click the “Predict Next Word” button. It then returns the most likely next word, based on a smoothed n-gram probability model.


The Algorithm (Part 1)

The algorithm behind the app is a probabilistic model based on a large body of English text, available for download here. The steps for building the algorithm are:

  • Reducing the dataset by random sampling, for faster computation.
  • Tokenizing the data into n-word phrases, known as n-grams.
  • Calculating the probability of each possible last word for trigrams and bigrams, and returning the most likely option (“Top Prediction”).

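The three steps above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the function names (sample_lines, build_tables, predict_next), the whitespace tokenizer, and the toy corpus are all assumptions made for the example, and the probabilities are plain relative frequencies with no smoothing.

```python
import random
from collections import Counter, defaultdict


def sample_lines(lines, fraction, seed=0):
    """Keep roughly `fraction` of the lines to speed up later steps."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def tokenize(line):
    """Lowercase and split on whitespace (a real pipeline would clean more)."""
    return line.lower().split()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_tables(lines):
    """Map each one- or two-word context to counts of the words that follow it."""
    tables = {2: defaultdict(Counter), 3: defaultdict(Counter)}
    for line in lines:
        tokens = tokenize(line)
        for n in (2, 3):
            for gram in ngrams(tokens, n):
                tables[n][gram[:-1]][gram[-1]] += 1
    return tables


def predict_next(phrase, tables):
    """Try the trigram table first, then fall back to the bigram table."""
    tokens = tokenize(phrase)
    for n in (3, 2):
        if len(tokens) < n - 1:
            continue  # not enough words typed for this context length
        context = tuple(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            word, count = followers.most_common(1)[0]
            return word, count / sum(followers.values())
    return None, 0.0


# Toy corpus, invented for illustration only.
corpus = [
    "I love New York",
    "I love New Orleans",
    "I love New York pizza",
    "we love old York",
]
tables = build_tables(sample_lines(corpus, fraction=1.0))
print(predict_next("I love new", tables))  # trigram context ("love", "new")
print(predict_next("brand new", tables))   # unseen trigram context, uses bigram
```

With a real corpus the sampling fraction would be well below 1.0; it is set to 1.0 here only so that the toy example keeps every line.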

The Algorithm (Part 2)

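A simple probability model based on raw n-gram counts overestimates the true probability of the n-grams it has seen and leaves no probability for those it has not. Smoothing corrects for this. The algorithm uses Modified Kneser-Ney Smoothing, widely regarded as one of the most effective smoothing methods to date.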

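To make the idea concrete, here is a minimal Python sketch of interpolated Kneser-Ney smoothing for bigrams. It is a simplification and not the app's implementation: it uses a single fixed discount (0.75, an assumed value), whereas the Modified variant uses separate discounts for n-grams seen once, twice, and three or more times. The function name kneser_ney_bigram and the toy sentence are hypothetical.

```python
from collections import Counter, defaultdict


def kneser_ney_bigram(tokens, discount=0.75):
    """Return p(word, prev) under interpolated Kneser-Ney (single discount)."""
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    context_counts = Counter()    # how often prev starts a bigram
    followers = defaultdict(set)  # distinct words seen after prev
    preceders = defaultdict(set)  # distinct words seen before word
    for (prev, word), count in bigram_counts.items():
        context_counts[prev] += count
        followers[prev].add(word)
        preceders[word].add(prev)
    distinct_bigrams = len(bigram_counts)

    def probability(word, prev):
        # Continuation probability: share of distinct bigram types ending in word.
        p_cont = len(preceders.get(word, ())) / distinct_bigrams
        total = context_counts[prev]
        if total == 0:
            return p_cont  # unseen context: rely on the continuation estimate
        # Subtract the discount from every seen bigram count.
        discounted = max(bigram_counts[(prev, word)] - discount, 0) / total
        # The mass removed by discounting is redistributed through p_cont.
        backoff_weight = discount * len(followers[prev]) / total
        return discounted + backoff_weight * p_cont

    return probability


# Toy text, invented for illustration only.
tokens = "the cat sat on the mat the cat ate the fish".split()
p = kneser_ney_bigram(tokens)
print(p("cat", "the"))  # seen bigram: discounted count plus a backoff share
print(p("the", "the"))  # unseen bigram: still gets non-zero probability
print(sum(p(w, "the") for w in set(tokens)))  # sums to 1 up to rounding
```

The seen bigram gives up a little probability, and that mass goes to words such as "the" that follow many different contexts, which is what keeps unseen combinations from being scored as impossible.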

Resources for understanding the formula:

The App

Try the app out for yourself here. (Keep in mind that SwiftKey, the partner company on this project, achieves 30% accuracy in predicting the next word.)

Access the full code for all features of the app on GitHub.