Next Word Prediction (NWP) Tool

Ilia Semenov
April 2016

Johns Hopkins University Data Science Specialization, Coursera

Overview

Next Word Prediction (NWP) Tool by Ilia Semenov is:

  • A web-based app that predicts the next word of your input
  • Awesome UI design
  • Strong back-end based on N-gram language modeling with Stupid Backoff
  • Project supported by Johns Hopkins University, SwiftKey and Coursera
  • Lots of fun - try it yourself!

Showcase

Data and Algorithm

Data

  • HC Corpora: English
  • 550MB of text, 3M lines, 70M words
  • N-grams up to 4th order generated: 7GB of data
  • N-gram tables truncated to 600MB of data
  • N-gram tables are stored in an SQLite DB with indexes for search performance
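
The storage step above can be sketched as follows. This is a minimal illustration in Python with the standard sqlite3 module, not the tool's own R code. The table layout (n, context, word, freq), the index name, the helper names and the min_count cutoff are all assumptions made for the example; in particular, the slide does not say how the N-gram tables were truncated, so a simple frequency cutoff stands in for that step.

```python
import sqlite3
from collections import Counter


def ngram_counts(lines, n):
    """Count n-grams of order n as (context, next word) pairs."""
    counts = Counter()
    for line in lines:
        tokens = line.lower().split()  # naive whitespace tokenizer (assumption)
        for i in range(len(tokens) - n + 1):
            context = " ".join(tokens[i:i + n - 1])  # empty string for unigrams
            counts[(context, tokens[i + n - 1])] += 1
    return counts


def build_db(lines, max_order=4, min_count=1):
    """Store all n-grams up to max_order in one indexed SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ngrams (n INTEGER, context TEXT, word TEXT, freq INTEGER)"
    )
    for n in range(1, max_order + 1):
        rows = [
            (n, context, word, freq)
            for (context, word), freq in ngram_counts(lines, n).items()
            if freq >= min_count  # hypothetical truncation rule: drop rare n-grams
        ]
        conn.executemany("INSERT INTO ngrams VALUES (?, ?, ?, ?)", rows)
    # The index lets a lookup by (order, context) avoid a full table scan.
    conn.execute("CREATE INDEX idx_ngrams_context ON ngrams (n, context)")
    conn.commit()
    return conn


if __name__ == "__main__":
    db = build_db(["the cat sat", "the cat ran"], max_order=3)
    print(db.execute(
        "SELECT word, freq FROM ngrams WHERE n = 2 AND context = 'the'"
    ).fetchall())
```

Keeping the context and the predicted word in separate columns means a prediction is a single indexed lookup on (n, context), which is the access pattern the back-off search needs.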

Algorithm

  • Stupid Backoff (SBO, Brants et al. 2007): 16% accuracy, 0.13 seconds execution time
  • Take the score from the highest-order N-gram that matches; back off to the next lower order if there is no match
  • Implemented in R with RSQLite for live DB querying
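
The back-off lookup described above can be sketched as follows. This is a minimal illustration in Python with the standard sqlite3 module, not the tool's R/RSQLite implementation. The table layout, the demo rows and the function names are assumptions made for the example. The factor 0.4 is the back-off value suggested by Brants et al. (2007); the context count is approximated by summing the frequencies of all stored continuations of that context.

```python
import sqlite3

ALPHA = 0.4  # back-off factor suggested by Brants et al. (2007)


def make_demo_db():
    """Tiny hand-made N-gram table (hypothetical data, for illustration only)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ngrams (n INTEGER, context TEXT, word TEXT, freq INTEGER)"
    )
    rows = [
        (1, "", "the", 3), (1, "", "cat", 2), (1, "", "sat", 1), (1, "", "dog", 1),
        (2, "the", "cat", 2), (2, "the", "dog", 1), (2, "cat", "sat", 1),
        (3, "the cat", "sat", 1),
    ]
    conn.executemany("INSERT INTO ngrams VALUES (?, ?, ?, ?)", rows)
    conn.execute("CREATE INDEX idx_ngrams_context ON ngrams (n, context)")
    return conn


def predict(conn, text, max_order=4):
    """Return (word, score) from the highest-order N-gram matching the input."""
    tokens = text.lower().split()
    start = min(max_order, len(tokens) + 1)  # highest order the input can fill
    for n in range(start, 0, -1):
        k = n - 1  # number of context words for this order
        context = " ".join(tokens[len(tokens) - k:])
        best = conn.execute(
            "SELECT word, freq FROM ngrams WHERE n = ? AND context = ? "
            "ORDER BY freq DESC, word LIMIT 1",
            (n, context),
        ).fetchone()
        if best is None:
            continue  # no match at this order: back off to a shorter context
        total = conn.execute(
            "SELECT SUM(freq) FROM ngrams WHERE n = ? AND context = ?",
            (n, context),
        ).fetchone()[0]
        word, freq = best
        # Relative frequency, discounted by ALPHA once per back-off step.
        return word, (ALPHA ** (start - n)) * freq / total
    return None, 0.0  # empty table


if __name__ == "__main__":
    db = make_demo_db()
    print(predict(db, "the cat"))
    print(predict(db, "a the"))
```

Because only the top word is requested, the search can stop at the first order that has a match. Ranking several candidates drawn from different orders would require comparing the discounted scores across orders.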

Schema and Roadmap

Schema

Roadmap

  • Introduce sentence start/end recognition
  • Increase data size
  • Predict partial words
  • Introduce interpolation if no more data is available