TheNextWord

Bunny Laden
December 2014

1: TheNextWord: A Shiny App

  • User inputs a phrase
  • App responds with up to three word suggestions
  • Uses n-grams of 5, 4, 3, and 2 words
  • Merges previously constructed user-based n-grams into the core n-grams so that suggestions adapt to the individual user
  • N-grams are stored in data frames that hold each n-gram's frequency alongside its individual words
  • N-grams are presorted from high to low frequency
  • Saves unmatched words for later learning
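
As an illustration of how presorted n-gram tables can drive suggestions, here is a minimal sketch in Python (not the app's own R code). It assumes a simple longest-context-first lookup: try the 5-gram table, then fall back to 4-, 3-, and 2-grams until up to three words are found. The names build_tables and suggest are hypothetical, and the slides do not confirm that the app matches in exactly this way.

```python
from collections import Counter

def build_tables(tokens, orders=(5, 4, 3, 2)):
    """Return {n: [(frequency, words), ...]}, each list sorted high to low."""
    tables = {}
    for n in orders:
        counts = Counter(tuple(tokens[i:i + n])
                         for i in range(len(tokens) - n + 1))
        # Presort by descending frequency; ties are broken alphabetically
        # so that the result is deterministic.
        tables[n] = sorted(((freq, gram) for gram, freq in counts.items()),
                           key=lambda row: (-row[0], row[1]))
    return tables

def suggest(phrase, tables, k=3):
    """Suggest up to k next words, trying the longest n-grams first (assumed strategy)."""
    words = phrase.lower().split()
    suggestions = []
    for n in sorted(tables, reverse=True):
        context = tuple(words[-(n - 1):])
        if len(context) < n - 1:
            continue  # the phrase is too short for this n-gram order
        for freq, gram in tables[n]:
            # Rows are presorted, so the first matches are the most frequent.
            if gram[:-1] == context and gram[-1] not in suggestions:
                suggestions.append(gram[-1])
                if len(suggestions) == k:
                    return suggestions
    return suggestions

# Toy corpus, made up for this example only.
corpus = "the cat sat on the mat the cat sat on the rug the cat ran".split()
tables = build_tables(corpus)
print(suggest("on the", tables))
```

Because each table is already sorted by frequency, the lookup can stop as soon as it has collected enough suggestions, which is one reason presorting helps responsiveness.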

2: The Matching Strategy

[Figure: diagram of the matching strategy]

3: Learning from the User

  • Unmatched words can be processed offline, and the resulting n-grams are added to the model the next time the app launches
  • Great feature for people with unusual vocabularies, such as data scientists, physicians, and lawyers
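
The offline learning step described above could look roughly like the following Python sketch (again, not the app's R code). It assumes that the saved input keeps enough surrounding words to form n-grams, and that user counts are simply added to the core counts before re-sorting. The names user_ngrams and merge_tables, and the sample phrases, are hypothetical.

```python
from collections import Counter

def user_ngrams(saved_phrases, n=2):
    """Count n-grams in phrases saved because the model could not match them."""
    counts = Counter()
    for phrase in saved_phrases:
        words = phrase.lower().split()
        counts.update(tuple(words[i:i + n])
                      for i in range(len(words) - n + 1))
    return counts

def merge_tables(core_counts, user_counts):
    """Add user counts to core counts and re-sort from high to low frequency."""
    merged = Counter(core_counts)
    merged.update(user_counts)  # Counter.update adds counts rather than replacing them
    return sorted(((freq, gram) for gram, freq in merged.items()),
                  key=lambda row: (-row[0], row[1]))

# Made-up example data: a tiny core model and two saved user phrases.
core = Counter({("the", "patient"): 5, ("the", "cat"): 2})
saved = ["the patient presented with dyspnea", "presented with dyspnea"]
for freq, gram in merge_tables(core, user_ngrams(saved)):
    print(freq, " ".join(gram))
```

Running the merge at launch, rather than while the user types, keeps the interactive lookup path unchanged: the app still reads one presorted table.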

4: Predictions: TheNextWord and SwiftKey

Phrase (from NY Times)                     | Actual     | TheNextWord              | SwiftKey
-------------------------------------------|------------|--------------------------|---------------------
Despite reforms in the                     | military's | united, world, middle    | world, morning, past
You've roasted and rested                  | your       | on, his                  | to, the, I
The average price of an                    | individual | old, individual, american | old, issue, email
intended to defuse                         | pressure   | on, of, the              | to, the, I
Winter offers a slower                     | less       | than                     | than, pace, and
Demonstrators marched intermittently along | the        | with, the, a             | the, with, to

5: Future Improvements

  • Make the matching algorithm more efficient
  • Finish implementing the learning feature
  • Add more n-grams
  • Use a Hadoop cluster for processing
  • Add part-of-speech (POS) tagging to help when guessing is necessary
  • Investigate a hybrid model that combines semantic context with word frequency when making predictions