Word Prediction - Data Science Capstone

Rod Slagle
9/1/2017

Why is Word Prediction Needed?

  • Explosive use of cell phones and tablets for social media
  • Mobile devices have small screens but still need text input
  • Accurately predicting the “next word” saves typing strokes
  • Reduces typos and spelling errors
  • Enhances user experience by reducing manual typing

What is needed is a mobile App that:

  • Loads fast and responds quickly to text input
  • Has high accuracy to predict the next word

Proposed Solution

My proposed App addresses the problem using a Shiny app that is:

  • Fast Starting: Loads in under 2 seconds
  • Fast Response: Average of 90 ms per prediction
  • Accurate: Benchmarked accuracy of 13.30% (top-1 only)
  • Efficient: Uses only 71 MB of memory
  • Very Easy to Use: Just start the app and then start typing. The predicted next word appears on the fly (i.e., no Submit button)
  • Free and Easy to Access: Internet hosted at https://rslagle.shinyapps.io/Word_Prediction/
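
For readers unfamiliar with the term, "top-1 accuracy" is the share of positions in held-out text where the single predicted word equals the word actually typed next. The slides do not say how the benchmark was run, so the Python sketch below only illustrates the metric; `top1_accuracy`, `toy_predict`, and the two sample sentences are hypothetical, and the app itself is written in R.

```python
def top1_accuracy(sentences, predict):
    """Fraction of next-word positions where predict(context) equals the true word."""
    hits = 0
    total = 0
    for sentence in sentences:
        words = sentence.lower().split()
        # Every word after the first is a prediction target.
        for i in range(1, len(words)):
            total += 1
            if predict(words[:i]) == words[i]:
                hits += 1
    return hits / total if total else 0.0


def toy_predict(context):
    """Hypothetical stand-in predictor that always guesses "the"."""
    return "the"


# Hypothetical held-out sample, used only to exercise the metric.
held_out = ["In the park", "on the way to the shop"]
accuracy = top1_accuracy(held_out, toy_predict)
print(f"top-1 accuracy: {accuracy:.2%}")
```

Any predictor that takes the preceding words and returns one word can be passed in place of `toy_predict`.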

Proposed Solution App Details

  • Based on the full Capstone English Corpus
  • Cleaned, tokenized, and tallied counts for 2-grams through 4-grams (ngram2-4)
  • Removed “same next word” from higher ngram levels
  • Removed non-English Words
  • Removed lower frequency counts (ngram2 < 2, ngram3-4 < 5)
  • Used R data.table indexing
  • App implements a “stupid backoff” process (i.e., ngram4 -> ngram3 -> ngram2) for “next word” selection
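
The counting, pruning, and backoff steps listed above can be sketched roughly as follows. This is a minimal Python illustration under stated assumptions, not the app's R data.table code: the function names (`count_ngrams`, `build_table`, `predict_next`) and the toy corpus are hypothetical, the pruning threshold is a parameter rather than the values quoted above, and the “same next word” and non-English filtering steps are not covered.

```python
from collections import Counter, defaultdict


def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_table(counts, min_count):
    """Map each context (all words but the last) to a Counter of next words,
    dropping n-grams seen fewer than min_count times."""
    table = defaultdict(Counter)
    for gram, count in counts.items():
        if count >= min_count:
            table[gram[:-1]][gram[-1]] = count
    return dict(table)


def predict_next(words, tables):
    """Try the longest available context first, then back off to shorter ones.

    tables maps the n-gram order n to the lookup table built for that order.
    """
    for n in sorted(tables, reverse=True):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # not enough typed words for this order
        candidates = tables[n].get(context)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # no order had a match


# Toy corpus; min_count=1 keeps everything because the sample is tiny.
tokens = "i want to go home i want to go out i want to eat".split()
tables = {n: build_table(count_ngrams(tokens, n), min_count=1) for n in (2, 3, 4)}

print(predict_next(["i", "want", "to"], tables))  # matched at the 4-gram level
print(predict_next(["xyz", "to"], tables))        # backs off to the 2-gram level
```

Returning the first match from the highest order keeps each lookup to at most three dictionary probes, which suits a keystroke-by-keystroke interface.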

What's Next?

Improve Response Time, Accuracy, & Utility

  • Response time is currently decent, but could be faster
  • Explore using Higher Level ngrams (ngram5+)
  • Improve the Training Text Cleaning (build smaller model input data)
  • Incorporate More Training Text (build better model input data)
  • Improve model processing to use better statistical approaches (Katz backoff, Kneser-Ney smoothing, etc.)
  • Configure the App to “learn” the user's individual lexicon
  • Explore geography-specific “lexicons” for regional differences
  • Add a button to append the “predicted next word” to the input phrase