Coursera capstone project: predicting the next word

Tomas Klinger
14.12.2014

The application description

  • At this location you may find a web application written in Shiny, which predicts the next word as you type.
  • It also shows a “detailed” chart of the most probable suggestions so that you can choose the second-best option if it makes better sense. This option is hidden by default.
  • Potential areas of usage:

    • A smartphone keyboard which suggests the next word
    • Helping Stephen Hawking speak
    • Learning languages
  • The app was developed as a final project for the Coursera Data Science specialization (more info here)

Instructions

  • Initially, the input textbox contains a short example phrase fragment.
  • Within about a second, the predicted next word should appear below the input box.
  • Checking the checkbox “Show more suggestions…” opens an optional panel with a few of the most probable next words, ordered by their probability. The top one in the list should match the word shown below the input box.

Example screenshot

[Image: screenshot of the application]

Description of the used algorithm

  • The algorithm is based on a simple n-gram model:
    • First, the English Twitter, news and blog data are loaded.
    • Second, each entry is split into sentences.
    • Third, the sentence dataset is cleaned so that it does not contain any non-English characters.
    • Finally, unigrams, bigrams and trigrams are computed.
  • When you start typing, the model looks up the most probable next word in the n-gram database, which is pre-calculated and cached.
  • Smoothing provided little performance improvement on the test set, so it has not been implemented.
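
The steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the app's actual code (the app itself is written in Shiny). The function names, the naive sentence-splitting rule, the ASCII-only cleaning rule, and the fallback from trigram to bigram to unigram contexts are all assumptions made for the example. Probabilities are plain relative frequencies, with no smoothing.

```python
import re
from collections import Counter, defaultdict

def split_sentences(text):
    """Split raw text into sentences on '.', '!' and '?' (a deliberately naive rule)."""
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]

def clean(sentence):
    """Lower-case the text and keep only ASCII letters and apostrophes (assumed rule)."""
    return re.sub(r"[^a-z' ]+", " ", sentence.lower()).split()

def build_ngrams(corpus, max_n=3):
    """Map each context (a tuple of 0 to max_n-1 words) to a Counter of next words."""
    table = defaultdict(Counter)
    for text in corpus:
        for sentence in split_sentences(text):
            words = clean(sentence)
            for n in range(1, max_n + 1):
                for i in range(len(words) - n + 1):
                    *context, nxt = words[i:i + n]
                    table[tuple(context)][nxt] += 1
    return table

def suggest(table, typed, k=3, max_n=3):
    """Return up to k (word, probability) pairs, trying the longest context first."""
    words = clean(typed)
    for n in range(max_n - 1, -1, -1):
        if len(words) < n:
            continue  # not enough typed words for a context of this length
        context = tuple(words[-n:]) if n else ()
        counts = table.get(context)
        if counts:
            total = sum(counts.values())
            return [(w, c / total) for w, c in counts.most_common(k)]
    return []

# Toy corpus standing in for the Twitter, news and blog data.
corpus = ["I love new york. I love new shoes!", "We love new york city."]
table = build_ngrams(corpus)
# "york" follows "love new" in 2 of 3 cases, so it ranks ahead of "shoes".
print(suggest(table, "i love new"))
```

Building the table once and reusing it for every lookup mirrors the pre-calculated, cached n-gram database described above. The ranked list returned by the hypothetical `suggest` corresponds to the optional "Show more suggestions…" panel, and its first entry to the single predicted word.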