wordpredict

A Shiny app for predicting user input

Knut Behrends, Dec. 2014
Pitch presentation for the capstone project of the 'Data Science Specialization', offered by Coursera / Johns Hopkins University

Word prediction app

  • Scenario: the user enters a short phrase into a text field
  • Goal: create a new kind of autocompleter app
  • Rather than completing the word currently being typed, it predicts the next word
  • Link: wordpredict

Data source

  • A text corpus of everyday English

Example queries:

  • "I think that you..." gets completed to "I think that you should".
  • Many queries are more challenging
  • E.g., how can the app avoid predicting profanities?
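
The slides do not say how the app handles profanities. One common approach, sketched here in Python purely as an illustration, is to drop blocklisted words from the ranked candidates before showing them. The names `BLOCKLIST` and `filter_predictions` and the placeholder entries are hypothetical.

```python
# Hypothetical blocklist with mild placeholder entries; a real one would
# come from a curated word list.
BLOCKLIST = {"darn", "heck"}


def filter_predictions(candidates, blocklist=BLOCKLIST):
    """Return the candidates in their original order, minus blocklisted words.

    The comparison is case-insensitive.
    """
    return [word for word in candidates if word.lower() not in blocklist]
```

Filtering at prediction time keeps the n-gram counts intact; removing such words from the corpus before counting would be an alternative with different trade-offs.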

Processing and App Architecture

  • Convert to lowercase, remove punctuation (675 MB)
  • Generate n-grams (1- to 5-grams) and load them into a SQLite database (300 MB with indexes)
  • Upload to Amazon Cloud to avoid storage limits on shinyapps.io
  • The Shiny app only queries web services; it loads no large data structures
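
The processing steps above can be sketched roughly as follows. This is a minimal Python illustration, not the app's actual code: the table layout (`n`, `context`, `word`, `freq`), the function names, and the in-memory database are all assumptions made for the example.

```python
import re
import sqlite3
from collections import Counter


def normalize(text):
    """Lowercase the text, delete every character that is not a-z, 0-9 or
    whitespace, and split the result into tokens."""
    return re.sub(r"[^a-z0-9\s]", "", text.lower()).split()


def ngrams(tokens, n):
    """Return all contiguous n-grams of the token list as tuples."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_db(lines, max_n=5):
    """Count 1- to max_n-grams and load them into an in-memory SQLite table.

    Assumed schema: each row stores the n-gram order, the leading words
    (context), the final word, and the observed frequency.
    """
    counts = Counter()
    for line in lines:
        tokens = normalize(line)
        for n in range(1, max_n + 1):
            counts.update(ngrams(tokens, n))

    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE ngrams (n INTEGER, context TEXT, word TEXT, freq INTEGER)"
    )
    conn.executemany(
        "INSERT INTO ngrams VALUES (?, ?, ?, ?)",
        [(len(g), " ".join(g[:-1]), g[-1], c) for g, c in counts.items()],
    )
    # The index lets a lookup by (order, context) avoid a full table scan.
    conn.execute("CREATE INDEX idx_ngrams_context ON ngrams (n, context)")
    conn.commit()
    return conn
```

With a table like this, a prediction reduces to a query such as `SELECT word, freq FROM ngrams WHERE n = 5 AND context = ? ORDER BY freq DESC`, which a web service could answer without the client holding any n-gram data.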

Prediction algorithm

  • If all words are known: Stupid Backoff, unsmoothed
  • Otherwise: simple interpolation
  • Last resort: predict the most common word, "the"
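
A minimal Python sketch of Stupid Backoff scoring, with the last-resort fallback, might look like this. It is an illustration under stated assumptions, not the app's code: the backoff factor 0.4 is the value suggested in the original Stupid Backoff paper (Brants et al., 2007) and the app's value is not given in these slides, the in-memory `Counter` stands in for the database, the function names are hypothetical, and the interpolation branch is not shown.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor (Brants et al., 2007)


def count_ngrams(sentences, max_n=3):
    """Count all 1- to max_n-grams in a list of tokenized sentences."""
    counts = Counter()
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff_score(word, context, counts, total_unigrams, alpha=ALPHA):
    """Score `word` after `context` (a tuple of tokens).

    Use the relative frequency at the longest context that has been seen
    followed by `word`; each backoff step drops the leftmost context word
    and multiplies the score by alpha. Higher scores are better here.
    """
    penalty = 1.0
    while context:
        seen = counts.get(context + (word,), 0)
        if seen > 0:
            return penalty * seen / counts[context]
        context = context[1:]
        penalty *= alpha
    return penalty * counts.get((word,), 0) / total_unigrams


def predict(context, counts, top_k=3, fallback="the"):
    """Return the top_k words by score, or the fallback for an empty model."""
    vocab = [gram[0] for gram in counts if len(gram) == 1]
    total = sum(counts[(w,)] for w in vocab)
    if total == 0:
        return [fallback]
    # Scoring the whole vocabulary is simple but slow; it is fine for a sketch.
    ranked = sorted(
        vocab,
        key=lambda w: (-stupid_backoff_score(w, tuple(context), counts, total), w),
    )
    return ranked[:top_k]
```

The scores are not normalized probabilities, which is what makes the scheme cheap: no discounting or smoothing pass over the counts is needed.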

My Shiny App: wordpredict

Screenshots: sample text input field and output field; more predictions
Two tabbed panes:
  • The first shows the user input in raw and in sanitized form, along with the string augmented with a predicted word.
  • The second shows ten other predictions, ordered by a likelihood score (lower is better).
App GUI, wordpredict

Self-Evaluation

  • Service-oriented architecture: app performance depends on network availability
  • App characteristics: the smartness of the backoff and the recall and precision of the predictions can always be improved
  • New features: part-of-speech tagging is something to try in the future