Data Science Capstone: Predict Next Word

Bahadir Girtten
Sun Apr 26 2015

Predict Next Word Shiny App


  • Predict Next Word predicts the next word based on the phrase you type in.

  • The app uses n-grams pre-built from samples of Twitter, blog, and news data to predict the next word.

  • The n-grams built for the app (unigrams, bigrams, trigrams, and quadgrams) are hashed for fast access in the Shiny app.

  • The Milestone Report tab in the app shows the interim milestone report, which explains the procedure for building the n-grams.
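
The hashed n-gram lookup described above can be illustrated with a short sketch. This is not the app's actual code (the app is written in R and its source is not shown here); it is a minimal Python illustration that assumes whitespace tokenization and keeps only the most frequent continuation for each history. The function name build_ngram_table and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict

def build_ngram_table(sentences, n):
    """Map each (n-1)-word history to its most frequent next word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()  # assumption: simple whitespace tokenizer
        for i in range(len(tokens) - n + 1):
            history = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            counts[history][next_word] += 1
    # Keep only the top continuation so a prediction is one hash lookup.
    return {h: c.most_common(1)[0][0] for h, c in counts.items()}

# Hypothetical sample corpus, for illustration only.
corpus = [
    "thanks for the follow",
    "thanks for the help",
    "thanks for the follow back",
]
trigrams = build_ngram_table(corpus, 3)
print(trigrams[("thanks", "for")])  # the
print(trigrams[("for", "the")])     # follow
```

Keying a dictionary by the history tuple gives roughly constant-time access, which is the property the hashing bullet refers to.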

Usage

You

  • Start typing in the text input box
  • Select whether you want to disable profanity filtering

The app will

  • Predict the next word reactively as you type
  • Apply profanity filtering if it is enabled (enabled by default)
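
One possible way to apply an optional profanity filter to predictions is sketched below. The source does not say whether the app filters the input, the predictions, or the training data, so this is an assumption: it filters candidate predictions against a blocklist and is enabled by default, as in the app. The function name filter_prediction and the placeholder blocklist are hypothetical.

```python
def filter_prediction(candidates, blocklist, filtering_enabled=True):
    """Return the first candidate that passes the filter, or None if none does."""
    for word in candidates:
        if filtering_enabled and word.lower() in blocklist:
            continue  # skip blocked words only while filtering is on
        return word
    return None

# Placeholder terms standing in for a real profanity list.
BLOCKLIST = {"darn", "heck"}

print(filter_prediction(["heck", "yes"], BLOCKLIST))                           # yes
print(filter_prediction(["heck", "yes"], BLOCKLIST, filtering_enabled=False))  # heck
```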

Sample Screenshot for Sample Input

Detailed Algorithm - Katz's Back-off Model

Katz's back-off model performs an ordered lookup across n-gram models of decreasing order, given the history of the input. Pseudocode for “backing off” through the models, starting with quadgrams, is as follows:

if (predict with quadgrams == Success) {
  return prediction
} else if (predict with trigrams == Success) {
  return prediction
} else if (predict with bigrams == Success) {
  return prediction
} else {
  return a random word from the list of the top 50 unigrams
}
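
The ordered lookup in the pseudocode can be written as a small runnable sketch. It is a Python illustration rather than the app's R code, and it mirrors only the lookup order: it does not compute the discounted probabilities and back-off weights of the full Katz model. The function name predict_next_word, the table layout (history tuple mapped to one predicted word), and the sample tables are assumptions made for the example.

```python
import random

def predict_next_word(phrase, quadgrams, trigrams, bigrams, top_unigrams, rng=random):
    """Try the longest history first and back off to shorter ones."""
    tokens = phrase.lower().split()
    # (table, number of history words that table is keyed by)
    for table, history_len in ((quadgrams, 3), (trigrams, 2), (bigrams, 1)):
        if len(tokens) < history_len:
            continue  # not enough typed words for this model
        history = tuple(tokens[-history_len:])
        if history in table:
            return table[history]
    # No n-gram matched: fall back to a random frequent unigram.
    return rng.choice(top_unigrams)

# Hypothetical tables, for illustration only.
quadgrams = {("thanks", "for", "the"): "follow"}
trigrams = {("for", "the"): "first"}
bigrams = {("the",): "same"}
top_unigrams = ["the", "to", "and"]

print(predict_next_word("Thanks for the", quadgrams, trigrams, bigrams, top_unigrams))  # follow
print(predict_next_word("wait for the", quadgrams, trigrams, bigrams, top_unigrams))    # first
print(predict_next_word("the", quadgrams, trigrams, bigrams, top_unigrams))             # same
```

An input with no match in any table, such as "hello" here, returns one of the entries in top_unigrams at random.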

For more information, see the Wikipedia article on Katz's back-off model.