Next Word Prediction App - Coursera Data Science Capstone

Caroline Zimmerman
Dec 21, 2016

Introduction

This simple Next Word Prediction app lets the user enter a word or phrase into a text box and click the “Predict Next Word” button. It then returns up to five candidates for the next word, ranked in order of likelihood.

The Algorithm (Part 1)

The algorithm behind the app is a probabilistic model based on a large body of English text, available for download here. The steps for building the algorithm are:

  • Reducing the dataset by sampling, for faster computation.
  • Tokenizing the data into n-word phrases, known as n-grams.
  • Calculating the probabilities of different last words for trigrams and bigrams, and returning the five most likely options.
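
The steps above can be sketched as follows. This is a minimal Python sketch, not the app's actual code: the function names (sample_lines, build_model, predict), the regex tokenizer, the toy corpus, and the rule of trying the trigram context first and falling back to the bigram are all assumptions made for illustration.

```python
import random
import re
from collections import Counter, defaultdict


def sample_lines(lines, fraction, seed=0):
    """Keep each line with probability `fraction` (simple random sampling)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def tokenize(line):
    """Lowercase the text and split it into words (letters and apostrophes)."""
    return re.findall(r"[a-z']+", line.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_model(lines, max_n=3):
    """model[n][context] counts the words seen after each (n-1)-word context."""
    model = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in model:
            for gram in ngrams(tokens, n):
                model[n][gram[:-1]][gram[-1]] += 1
    return model


def predict(model, phrase, k=5):
    """Return up to k (word, probability) pairs, most likely first."""
    tokens = tokenize(phrase)
    # Try the longest context first (trigram), then fall back to the bigram.
    for n in sorted(model, reverse=True):
        if len(tokens) < n - 1:
            continue
        context = tuple(tokens[-(n - 1):])
        followers = model[n].get(context)
        if followers:
            total = sum(followers.values())
            return [(w, c / total) for w, c in followers.most_common(k)]
    return []


# Toy corpus, purely illustrative.
corpus = [
    "I love data science",
    "I love data",
    "I love coffee",
    "Data science is fun",
]
model = build_model(corpus)
print([word for word, _ in predict(model, "I love")])  # ['data', 'coffee']
```

The probabilities here are raw relative frequencies, so any word never seen after a given context gets no score at all.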

The Algorithm (Part 2)

A simple probability model based on raw counts overestimates the true probability of an n-gram seen in the sample, and leaves no probability for n-grams that were never seen. One of the most effective adjustment methods to date is modified Kneser-Ney smoothing.
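
For reference, interpolated modified Kneser-Ney smoothing is usually written in the form below, following Chen and Goodman's formulation. The notation is the common textbook one and is an assumption here, not necessarily the exact expression the app uses: c(·) is an n-gram count (a continuation count at the lower orders), N_k(h •) is the number of distinct words that follow the context h exactly k times, and n_k is the number of n-grams that occur exactly k times.

```latex
P_{KN}(w_i \mid w_{i-n+1}^{i-1})
  = \frac{c(w_{i-n+1}^{i}) - D\bigl(c(w_{i-n+1}^{i})\bigr)}
         {\sum_{w'} c(w_{i-n+1}^{i-1} w')}
  + \gamma(w_{i-n+1}^{i-1}) \, P_{KN}(w_i \mid w_{i-n+2}^{i-1})

D(c) =
\begin{cases}
  0      & c = 0 \\
  D_1    & c = 1 \\
  D_2    & c = 2 \\
  D_{3+} & c \ge 3
\end{cases}

\gamma(w_{i-n+1}^{i-1})
  = \frac{D_1 N_1(w_{i-n+1}^{i-1}\,\bullet)
        + D_2 N_2(w_{i-n+1}^{i-1}\,\bullet)
        + D_{3+} N_{3+}(w_{i-n+1}^{i-1}\,\bullet)}
         {\sum_{w'} c(w_{i-n+1}^{i-1} w')}

Y = \frac{n_1}{n_1 + 2 n_2}, \quad
D_1 = 1 - 2Y\frac{n_2}{n_1}, \quad
D_2 = 2 - 3Y\frac{n_3}{n_2}, \quad
D_{3+} = 3 - 4Y\frac{n_4}{n_3}
```

The first term subtracts a small discount from every observed count. The second term hands the probability mass freed this way to the lower-order model, so the result still sums to one over the vocabulary.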
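
To make the idea of discounting concrete, here is a small Python sketch of interpolated Kneser-Ney smoothing for bigrams. It is deliberately simpler than the modified variant: it uses a single fixed discount of 0.75 (an assumed value) instead of three count-dependent discounts, and the function name kneser_ney_bigram and the toy counts are hypothetical.

```python
from collections import Counter, defaultdict


def kneser_ney_bigram(bigram_counts, discount=0.75):
    """Return prob(w, v) = P_KN(w | v), built from {(v, w): count} with counts >= 1."""
    followers = defaultdict(Counter)  # followers[v][w] = c(v, w)
    preceders = defaultdict(set)      # preceders[w] = distinct words seen before w
    for (v, w), count in bigram_counts.items():
        followers[v][w] += count
        preceders[w].add(v)
    n_bigram_types = len(bigram_counts)

    def prob(w, v):
        # Continuation probability: in how many distinct contexts does w appear?
        p_cont = len(preceders.get(w, ())) / n_bigram_types
        seen = followers.get(v)
        if not seen:
            # Unknown context: rely on the continuation probability alone.
            return p_cont
        total = sum(seen.values())
        discounted = max(seen[w] - discount, 0) / total
        # Mass removed by discounting, redistributed through p_cont.
        backoff_weight = discount * len(seen) / total
        return discounted + backoff_weight * p_cont

    return prob


# Toy counts, purely illustrative.
counts = Counter({
    ("i", "love"): 3,
    ("i", "like"): 1,
    ("love", "data"): 2,
    ("like", "data"): 1,
    ("data", "science"): 2,
})
prob = kneser_ney_bigram(counts)
vocab = {w for (_, w) in counts}
print(sum(prob(w, "i") for w in vocab))  # sums to 1 (up to rounding)
print(prob("data", "i") > 0)             # True: an unseen bigram gets some mass
```

Because the discount is taken only from bigrams that were actually observed, frequent continuations keep most of their probability while unseen ones receive a share proportional to how many different contexts they appear in.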

Resources for understanding the formula:

The App

Try the app out for yourself here. (Keep in mind that SwiftKey, the partner company on this project, achieves 30% accuracy in predicting the next word.)

Access the full code for all features of the app on GitHub.