Next Word Prediction App - Coursera Data Science Capstone

Caroline Z
Dec 21, 2016

Introduction

This simple Next Word Prediction app invites the user to enter a word or phrase into a text box, and returns up to five possibilities for the next word, ranked in order of likelihood.

The Algorithm (Part 1)

The algorithm behind the app is a probabilistic model based on a large body of English text, available for download here. The steps for building the algorithm are:

  • Reducing the dataset by random sampling, for faster computation.
  • Tokenizing the data into n-word phrases, known as n-grams.
  • Calculating the probabilities of the possible last words of trigrams and bigrams, and returning the five most likely options, like so:

[Image: example of the top five predicted next words]
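
The three steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the app's actual code: the function names (sample_lines, tokenize, build_model, predict), the toy corpus, and the simple regex tokenizer are all hypothetical, and the probabilities are raw relative frequencies with no smoothing.

```python
import random
import re
from collections import Counter, defaultdict


def sample_lines(lines, fraction, seed=42):
    """Step 1: keep a random fraction of the lines to shrink the dataset."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def tokenize(line):
    """Step 2a: lowercase a line and split it into word tokens."""
    return re.findall(r"[a-z']+", line.lower())


def build_model(lines, max_n=3):
    """Step 2b: for every bigram and trigram, count which word ends it.

    The model maps a context (the n-gram minus its last word) to a
    Counter of the last words seen after that context.
    """
    model = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model


def predict(model, phrase, k=5):
    """Step 3: return up to k (word, probability) pairs.

    Tries the trigram context (last two words) first, then falls back
    to the bigram context (last word) if the longer one was never seen.
    """
    tokens = tokenize(phrase)
    for size in (2, 1):
        if len(tokens) < size:
            continue
        followers = model.get(tuple(tokens[-size:]))
        if followers:
            total = sum(followers.values())
            return [(word, count / total)
                    for word, count in followers.most_common(k)]
    return []


# Hypothetical toy corpus, used only to exercise the functions.
corpus = [
    "I love data science",
    "I love data analysis",
    "I love coffee",
    "data science is fun",
]
model = build_model(sample_lines(corpus, fraction=1.0))
print([word for word, _ in predict(model, "I love")])  # ['data', 'coffee']
```

Trying the longer context first and falling back to the shorter one is one simple way to combine trigram and bigram counts; the smoothing described in Part 2 replaces these raw frequencies with adjusted ones.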

The Algorithm (Part 2)

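A simple probability model based on raw counts overestimates the true probability of the n-grams it has seen, and leaves no probability for those it has not. Among count-based n-gram models, the adjustment widely regarded as most effective is Modified Kneser-Ney smoothing.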

[Image: the Modified Kneser-Ney smoothing formula]
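
For reference, the interpolated form of Modified Kneser-Ney smoothing as it is usually stated (following Chen and Goodman) is written out below. This is the textbook formulation, given here as an assumption about what the slide showed; the notation on the original slide may have differed. Here c(·) is the count of an n-gram, N_k(context •) is the number of distinct words that follow the context exactly k times, and n_k is the total number of n-grams that occur exactly k times in the training data.

```latex
\begin{align*}
p_{KN}(w_i \mid w_{i-n+1}^{i-1})
  &= \frac{c(w_{i-n+1}^{i}) - D\big(c(w_{i-n+1}^{i})\big)}
          {\sum_{w} c(w_{i-n+1}^{i-1} w)}
     + \gamma(w_{i-n+1}^{i-1})\, p_{KN}(w_i \mid w_{i-n+2}^{i-1}) \\[6pt]
D(c)
  &= \begin{cases}
       0      & c = 0 \\
       D_1    & c = 1 \\
       D_2    & c = 2 \\
       D_{3+} & c \ge 3
     \end{cases} \\[6pt]
\gamma(w_{i-n+1}^{i-1})
  &= \frac{D_1 N_1(w_{i-n+1}^{i-1}\,\bullet)
         + D_2 N_2(w_{i-n+1}^{i-1}\,\bullet)
         + D_{3+} N_{3+}(w_{i-n+1}^{i-1}\,\bullet)}
          {\sum_{w} c(w_{i-n+1}^{i-1} w)} \\[6pt]
Y &= \frac{n_1}{n_1 + 2 n_2}, \qquad
D_1 = 1 - 2Y\frac{n_2}{n_1}, \qquad
D_2 = 2 - 3Y\frac{n_3}{n_2}, \qquad
D_{3+} = 3 - 4Y\frac{n_4}{n_3}
\end{align*}
```

The first term subtracts a count-dependent discount from each observed n-gram, and the second term redistributes the discounted mass to the lower-order model, which is how the overestimate described above is corrected. In the lower-order distributions, Kneser-Ney uses continuation counts (how many distinct contexts a word appears in) rather than raw counts.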

Resources for understanding the formula:

The App

Try the app out for yourself here. (Keep in mind that SwiftKey, the partner company on this project, achieves 30% accuracy in predicting the next word.)

Access the full code for all features of the app on GitHub.