Next Word Assistant
S.F.C
April 21st, 2016
- Goal:
Offer the 10 most likely next words for the user to choose from after each word is entered.
- Benefits:
- Less typing, more communication.
- Can be plugged into all web and mobile applications.
- Prediction Technique:
N-Gram Language Model
User Interface - Layout
- For a live demonstration, please visit
https://mchen-gh.shinyapps.io/nextWordAssistant/.
- An HTML textarea allowing multiple lines of text input
- A [Clear text] button for quickly starting a new trial
- 10 most likely next words appear below, each as a clickable button
User Interface - Interactive Mechanism
- A reactive expression holds the 3 words immediately preceding the last space in the input.
- An observeEvent does the following when the reactive expression changes:
- gets the predicted top 10 next words from the model
- assigns the 10 words as the labels of the 10 word buttons
- assigns the 10 words to a reactiveValues variable
- 10 observeEvents each monitor one word button and, on click, append that word to the end of the textarea.
- A jQuery function moves the focus from the clicked word button back to the textarea.
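The context extraction described above can be sketched outside Shiny. This is a minimal Python illustration, not the app's R code: the function names `context_words` and `append_word` are hypothetical, and the trailing space added after a clicked word is an assumption.

```python
def context_words(text: str, n: int = 3) -> list[str]:
    """Return up to n words that precede the last space in text."""
    cut = text.rfind(" ")
    if cut == -1:
        return []  # no word has been completed yet
    # Everything after the last space is a word still being typed; ignore it.
    words = text[:cut].split()
    return words[-n:]


def append_word(text: str, word: str) -> str:
    """Mimic a word-button click: append the word to the end of the text.

    Assumption: a trailing space is added so the next prediction fires.
    """
    return text + word + " "


typed = "the quick brown fox jum"
print(context_words(typed))
print(context_words(append_word("the quick brown ", "fox")))
```

Only completed words feed the model, so the prediction changes when a space is typed or a word button is clicked, not on every keystroke.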
Predictive Model - Generation Processes
Predictive Model - Some Numbers
- 50,000 lines of en_US.blogs.txt, 50,000 lines of en_US.news.txt, and 100,000 lines of en_US.twitter.txt were used for the demo models.
- The 4-gram model has 2,592,084 unique context 3-grams; the 3-gram model has 686,351 unique context 2-grams; the 2-gram model has 79,106 unique context 1-grams.
- The backoff sequence guarantees that 10 words are returned: 4-gram model -> 3-gram model -> 2-gram model -> P-continuation
- Prediction response time is reduced by partitioning the 4-gram model into 26 smaller sets, keyed on the first character of the context string.
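The backoff sequence and the first-character partitioning can be illustrated with a small sketch. This is Python under stated assumptions, not the demo's R implementation: all function names are hypothetical, candidates are ranked by raw counts, and "P-continuation" is read here as ranking words by how many distinct words they follow. The sketch partitions on whatever first characters occur, rather than on a fixed set of 26.

```python
from collections import Counter, defaultdict


def build_models(tokens, max_n=4):
    """Map n -> {context tuple: Counter of next words} for n = 2..max_n."""
    models = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *ctx, nxt = tokens[i:i + n]
            models[n][tuple(ctx)][nxt] += 1
    return models


def partition_by_first_char(model):
    """Split a context table into smaller tables keyed by the context's first character."""
    parts = defaultdict(dict)
    for ctx, counts in model.items():
        parts[ctx[0][0]][ctx] = counts
    return parts


def continuation_ranking(tokens):
    """Rank words by the number of distinct words they follow (assumed P-continuation order)."""
    preceders = defaultdict(set)
    for prev, word in zip(tokens, tokens[1:]):
        preceders[word].add(prev)
    return sorted(preceders, key=lambda w: (-len(preceders[w]), w))


def predict(context, models, parts4, fallback, k=10):
    """Back off 4-gram -> 3-gram -> 2-gram -> fallback ranking until k words are found."""
    result = []

    def add(candidates):
        for w in candidates:
            if w not in result:
                result.append(w)
                if len(result) == k:
                    return True
        return False

    for n in (4, 3, 2):
        ctx = tuple(context[-(n - 1):])
        if len(ctx) < n - 1:
            continue  # not enough words typed for this order
        # Only the 4-gram table is partitioned, so only one small part is searched.
        table = parts4.get(ctx[0][0], {}) if n == 4 else models[n]
        counts = table.get(ctx)
        if counts and add(w for w, _ in counts.most_common()):
            return result
    add(fallback)  # returns k words only if the fallback list is long enough
    return result


tokens = "i like green eggs and ham i like green tea and i like red eggs".split()
models = build_models(tokens)
parts4 = partition_by_first_char(models[4])
fallback = continuation_ranking(tokens)
print(predict(["i", "like", "green"], models, parts4, fallback, k=3))
```

The guarantee of a full list rests on the final step: the context-free ranking always has candidates, provided the vocabulary holds at least as many words as are requested.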