getWord App

Mridula Velagapudi
March 06, 2017

About getWord App

The getWord App predicts the next word for you with certain probability

  • Type text in text box
  • Click “Predict My Word”
  • You will see the predicted word in the main panel
  • Get the “getWord App” here

alt text

How does getWord App predict the next word?

  • The algorithm was built using Katz Back-off Model
  • This model reserves the probablity from the n-gram and distributes it over the unseen words. If the combination of words is not found in higher order n-gram, then it backs-off to lower order n-grams till it reaches zero count.
  • Redistribution of probability is done by first using the Good-Turing method of discounting the frequencies. The count of an event occurring \( a \) times is discounted with \( \displaystyle d_a = (a+1) \frac{c_{a+1}}{a\,.\,c_a} \)
  • BUT, there is a limitation - it fails if there is a case where \( c_a = 0 \), or as \( a \) increases the count \( c_a \) tends to zero.

Katz BackOff Model Helps Us Out...

  • A solution to Good-Turing problem was proposed by Katz, where he defines a cut-off value \( k \) at which counts \( a \) for \( a > k \) are not discounted, considering these more frequently observed counts as reliable.
  • The recommended value of \( k \) is 5. So, for higher \( a \) the model does not fail.
  • It takes care of the unseen words by adjusting the maximum likelihood estimates to reserve probability mass for unseen words combinations.

References:

App Performance

  • The full code takes total of 870ms to run.
  • The maximum time is consumed in loading the libraries, which are required to form the n-grams.


    Happy Word Predicting!

alt text