Next Word Assistant

S.F.C
April 21st, 2016

  • Goal: Offer the 10 most likely next words for the user to choose from after each word is entered.
  • Benefits:
    1. Less typing, more communication.
    2. Can be plugged into all web and mobile applications.
  • Prediction Technique: N-Gram Language Model
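
As a rough illustration of the N-gram idea (a minimal sketch in Python, not the app's own code; the function names `build_ngram_model` and `top_next_words` and the toy sentence are assumptions made for this example), a model can be stored as a map from each (n-1)-word context to the counts of the words that follow it:

```python
from collections import Counter, defaultdict


def build_ngram_model(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        model[context][next_word] += 1
    return model


def top_next_words(model, context, k=10):
    """Return up to k most frequent next words seen after the given context."""
    counts = model.get(tuple(context), Counter())
    return [word for word, _ in counts.most_common(k)]


# Toy corpus, for illustration only.
tokens = "i like tea . i like coffee . i like tea".split()
bigram = build_ngram_model(tokens, 2)
suggestions = top_next_words(bigram, ["like"])
```

Here `suggestions` ranks "tea" ahead of "coffee" because "like tea" occurs twice in the toy corpus and "like coffee" once. An unseen context returns an empty list, which is the case that backoff to a shorter context is meant to cover.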

User Interface - Layout

UI Screenshot

  • For a live demonstration, visit https://mchen-gh.shinyapps.io/nextWordAssistant/.
  • An HTML textarea allowing multiple lines of text input
  • A [Clear text] button for starting a new trial
  • The 10 most likely next words appear below, each as a clickable button

User Interface - Interactive Mechanism

  • A reactive expression holds the 3 words immediately preceding the last space of the input.
  • An observeEvent does the following when the reactive expression changes:
    1. gets the predicted top 10 next words from the model
    2. assigns the 10 words as the labels of the 10 word buttons
    3. assigns the 10 words to a reactiveValues variable
  • 10 observeEvents each monitor one word button and, on click, append its word to the end of the textarea.
  • A jQuery function moves the focus from the clicked word button back to the textarea.
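
The text handling behind this mechanism can be sketched outside Shiny. The Python below is a minimal illustration under stated assumptions, not the app's code: the helper names `context_before_last_space` and `append_word` are hypothetical, and adding a trailing space after a clicked word is an assumption made so that the next prediction would fire immediately.

```python
def context_before_last_space(text, n=3):
    """Return up to n words immediately preceding the last space in text."""
    cut = text.rfind(" ")
    if cut == -1:
        # No space typed yet, so no completed word to use as context.
        return []
    # split() also breaks on newlines, so multi-line input is handled.
    words = text[:cut].split()
    return words[-n:]


def append_word(text, word):
    """Append a clicked word to the end of the text (trailing space assumed)."""
    return text + word + " "


typed = "how are you doing to"
context = context_before_last_space(typed)
after_click = append_word("thanks for the ", "help")
```

For the input above, the word still being typed ("to") is ignored and `context` is the three completed words before it. Because the context only changes when a space is entered, a prediction is requested once per completed word rather than once per keystroke.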

Predictive Model - Generation Processes

Model Generation Processes

Predictive Model - Some Numbers

  • 50,000 lines of en_US.blogs.txt, 50,000 lines of en_US.news.txt, and 100,000 lines of en_US.twitter.txt were used for the demo models.
  • The 4-gram model has 2,592,084 unique context 3-grams; the 3-gram model has 686,351 unique context 2-grams; the 2-gram model has 79,106 unique context 1-grams.
  • The backoff sequence guarantees that 10 words are returned: 4-gram model -> 3-gram model -> 2-gram model -> P-continuation
  • Prediction response time is reduced by partitioning the 4-gram model into 26 smaller sets based on the first character of the context string.
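
The backoff sequence and the partitioned lookup can be sketched together as follows. This is a minimal Python illustration under several assumptions, not the app's implementation: each model is assumed to map a context tuple to an already ranked list of next words, the P-continuation stage is represented by a precomputed ranked word list, results from successive levels are assumed to be merged without duplicates, and all names and sample data are hypothetical.

```python
def first_char_key(context):
    """Partition key: the first character of the context string."""
    return context[0][0] if context and context[0] else ""


def predict_with_backoff(context, four_gram_parts, three_gram, two_gram,
                         fallback_words, k=10):
    """Collect up to k distinct words, backing off to shorter contexts."""
    context = tuple(context[-3:])
    candidate_lists = []
    if len(context) == 3:
        # Only the partition matching the first character is searched.
        part = four_gram_parts.get(first_char_key(context), {})
        candidate_lists.append(part.get(context, []))
    if len(context) >= 2:
        candidate_lists.append(three_gram.get(context[-2:], []))
    if len(context) >= 1:
        candidate_lists.append(two_gram.get(context[-1:], []))
    # Last resort: context-free ranking (stands in for P-continuation).
    candidate_lists.append(fallback_words)

    predictions = []
    for candidates in candidate_lists:
        for word in candidates:
            if word not in predictions:
                predictions.append(word)
                if len(predictions) == k:
                    return predictions
    return predictions


# Hypothetical toy models, for illustration only.
four_gram_parts = {"t": {("thanks", "for", "the"): ["follow", "help"]}}
three_gram = {("for", "the"): ["first", "help", "last"]}
two_gram = {("the",): ["same", "first"]}
fallback_words = ["the", "and", "of"]

top5 = predict_with_backoff(["thanks", "for", "the"], four_gram_parts,
                            three_gram, two_gram, fallback_words, k=5)
```

In this sketch the guarantee of a full result depends on the final fallback list holding at least `k` words, since it is the only stage that does not depend on the context. Partitioning means a 4-gram lookup touches one small table rather than the full set of contexts.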