Coursera capstone project: predicting the next word
Tomas Klinger 14.12.2014
The application description
At this location you may find a web application written in Shiny, which predicts the next word as you type.
It also shows a “detailed” chart of the most probable suggestions so that you can choose the second-best option if it makes better sense. This option is hidden by default.
Potential areas of usage:
A smartphone keyboard which suggests the next word
Helping Stephen Hawking speak
Learning languages
The app was developed as the final project for the Coursera Data Science Specialization (more info here).
Instructions
When the app starts, the input textbox contains a short example phrase fragment.
Within about a second, the predicted word should appear below the input box.
Checking the “Show more suggestions…” checkbox opens an optional panel with a few of the most probable next words, ordered by probability. The top word in the list should match the one shown below the input box.
Example screenshot
Description of the used algorithm
The algorithm is based on a simple n-gram model:
First, the English Twitter, news and blog datasets are loaded.
Second, the individual texts are split into sentences.
Third, the sentence dataset is cleaned so that it does not contain any non-English characters.
Finally, unigrams, bigrams and trigrams are computed.
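The steps above can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the app's actual code. The function names (`split_sentences`, `clean`, `count_ngrams`), the naive sentence splitter and the rule "keep only ASCII letters and apostrophes" are assumptions made for this example.

```python
import re
from collections import Counter

def split_sentences(text):
    # Naive split after ., ! or ? followed by whitespace (assumed rule).
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

def clean(sentence):
    # Lowercase, replace everything except a-z, apostrophe and space
    # with a space (assumed cleaning rule), then tokenize on whitespace.
    return re.sub(r"[^a-z' ]+", " ", sentence.lower()).split()

def count_ngrams(texts, max_n=3):
    # counts[n] maps each n-gram (a tuple of n words) to its frequency.
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for text in texts:
        for sentence in split_sentences(text):
            tokens = clean(sentence)
            for n in counts:
                # n-grams never cross a sentence boundary.
                for i in range(len(tokens) - n + 1):
                    counts[n][tuple(tokens[i:i + n])] += 1
    return counts

# Tiny made-up corpus standing in for the Twitter, news and blog data.
counts = count_ngrams(["I love data science. I love R!", "Data science is fun."])
```

Counting within sentences rather than across whole documents keeps n-grams such as ("science", "i") from being created at sentence boundaries.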
When you start typing, the model looks up the most probable next word in the n-gram database, which is precomputed and cached.
Because smoothing yielded little performance improvement on the test set, it has not been implemented.
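The lookup might work roughly as in the following sketch, again in Python for illustration and not taken from the app. The names `build_table` and `suggest` are hypothetical, and falling back from the two-word context to shorter contexts when nothing is found is an assumption, since the slides do not say how unseen contexts are handled. Probabilities are plain relative frequencies, with no smoothing.

```python
from collections import Counter, defaultdict

def build_table(sentences, max_n=3):
    # Precompute, for every context of 0..max_n-1 preceding words,
    # a Counter of the words that followed it.
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for k in range(max_n):
                if i - k >= 0:
                    table[tuple(tokens[i - k:i])][word] += 1
    return table

def suggest(table, text, top=3, max_n=3):
    # Return up to `top` (word, probability) pairs, most probable first.
    tokens = text.lower().split()
    # Longest context first, then shorter ones (assumed fallback).
    for k in range(min(max_n - 1, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        followers = table.get(context)
        if followers:
            total = sum(followers.values())
            return [(w, c / total) for w, c in followers.most_common(top)]
    return []

# Tiny made-up corpus; the table is built once and reused for every query.
table = build_table(["i love data science", "i love data", "i love r"])
suggestions = suggest(table, "I love")
best_word = suggestions[0][0]
```

The first element of `suggestions` corresponds to the word shown below the input box, and the full list corresponds to the optional "more suggestions" panel.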