Shiny App for Next Word Prediction

author: Jürgen Riedel

date: 21.08.2015

Overview

  • A lot of effort has gone into developing smart keyboards that make it easier for people to type on their mobile devices.
  • The goal of this capstone project is to develop a Shiny application to predict the next word in a sentence entered by the user.
  • We developed an application based on Natural Language Processing (NLP). NLP deals with the application of computational models to text or speech data.
  • The application will predict the next word(s) in a typed sentence based on a trained language model.

Model

  • The English data for the Capstone Project were downloaded from the course website (original site: http://www.corpora.heliohost.org).
  • Because the text files are rather large, a sample of about 2 million characters is drawn from each. This is roughly 10% of the characters in each file, which should be large enough to be representative.
  • The predictive model is based on N-grams up to size 4 and handles unknown N-grams by using a closed vocabulary. We trained two models: one implements Good-Turing smoothing with a Katz-style back-off algorithm; the other uses Good-Turing smoothing with interpolation.
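
The ideas in this slide can be illustrated with a minimal Python sketch. It is not the app's actual code: all function names, the `<unk>` placeholder, and the `min_count` threshold are assumptions made for illustration. It treats "closed vocabulary" as mapping out-of-vocabulary words to a placeholder, counts N-grams up to size 4, and shows the standard Good-Turing adjusted count c* = (c + 1) N_{c+1} / N_c. For brevity, the predictor ranks candidates by raw counts and simply backs off to shorter contexts; it does not compute Katz back-off weights or interpolation.

```python
from collections import Counter, defaultdict

UNK = "<unk>"  # hypothetical placeholder for out-of-vocabulary words


def build_vocab(tokens, min_count=2):
    """Closed vocabulary: keep words seen at least min_count times."""
    counts = Counter(tokens)
    return {w for w, c in counts.items() if c >= min_count}


def count_ngrams(tokens, vocab, max_n=4):
    """Map each context tuple (length 0..max_n-1) to a Counter of next words."""
    tokens = [t if t in vocab else UNK for t in tokens]
    table = defaultdict(Counter)
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *context, word = tokens[i:i + n]
            table[tuple(context)][word] += 1
    return table


def good_turing_count(c, count_of_counts):
    """Good-Turing adjusted count c* = (c + 1) * N_{c+1} / N_c.

    Falls back to the raw count when N_c or N_{c+1} is zero.
    """
    n_c = count_of_counts.get(c, 0)
    n_c1 = count_of_counts.get(c + 1, 0)
    if n_c == 0 or n_c1 == 0:
        return float(c)
    return (c + 1) * n_c1 / n_c


def predict_next(text, table, vocab, k=3, max_n=4):
    """Try the longest context first, then back off to shorter ones."""
    words = [w if w in vocab else UNK for w in text.lower().split()]
    for n in range(max_n - 1, -1, -1):
        context = tuple(words[-n:]) if n else ()
        if len(context) == n and context in table:
            ranked = [w for w, _ in table[context].most_common() if w != UNK]
            if ranked:
                return ranked[:k]
    return []
```

For example, after training on the toy token list `"i want to eat i want to sleep i want to eat".split()` with `min_count=1`, calling `predict_next("I want to", table, vocab, k=2)` ranks `eat` ahead of `sleep`, because the 4-gram context was followed by `eat` twice and by `sleep` once.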

User Interface

  • Enter the beginning of a sentence in the text box.
  • Hit the “Predict” button to display a list of most likely next words ordered from left to right by their probabilities.
  • Choose between two models for the prediction.

Try the app: https://thescienceinstitute.shinyapps.io/nextwordprediction

Further steps

  • The models were optimized by calculating perplexity, an intrinsic measure, and by conducting the Shannon test. A more thorough extrinsic model evaluation via statistical testing could follow.
  • There are many more smoothing and back-off/interpolation methods that could be explored.
  • Adaptive learning that uses user feedback could improve the accuracy of the model.
  • The application could be developed into a fully functional word prediction app for cell phones.
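
As a reminder of what the perplexity measure computes, here is a minimal Python sketch. It assumes the model has already assigned a probability to each word of a held-out text; the function name and input format are hypothetical and not taken from the app's source. Perplexity is the exponential of the negative mean log-probability, so lower values indicate a better fit.

```python
import math


def perplexity(word_probs):
    """Perplexity = exp(-mean(log p_i)) over per-word model probabilities."""
    if not word_probs:
        raise ValueError("need at least one probability")
    if any(p <= 0.0 for p in word_probs):
        # A zero probability (no smoothing) makes perplexity infinite.
        return math.inf
    log_sum = sum(math.log(p) for p in word_probs)
    return math.exp(-log_sum / len(word_probs))
```

A model that assigns probability 0.25 to every word of the test text has a perplexity of 4, which matches the intuition of choosing uniformly among four candidates at each step.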

  • Find all source code at: https://github.com/jurgenriedel/Capstone-Project