November 24, 2017

Next Word Prediction in R

This project was done as part of the Capstone project offered by Johns Hopkins University on Coursera.org.

Natural language processing is highly relevant, and challenging, in today's era of heavy reliance on the IoT (Internet of Things). The course provided a large amount of text data collected from Twitter, blogs, and news. Such a collection of texts is called a corpus.

A language model is a model that computes either the probability of a sequence of words or the probability of the nth word given the previous (n-1) words. The probability of a sequence of words W consisting of w1, w2, …, wn can be determined using various models (the chain-rule factorization they all approximate is sketched after the list below). In the following two slides, I discuss the two modeling methods I implemented to predict the next word:

  • Markov chain
  • Kneser-Ney
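For concreteness, the textbook chain-rule factorization that both methods approximate (standard notation, not taken from the original slides) is:

```latex
P(w_1, \dots, w_n) = \prod_{i=1}^{n} P(w_i \mid w_1, \dots, w_{i-1})
\;\approx\; \prod_{i=1}^{n} P(w_i \mid w_{i-2}, w_{i-1})
\quad \text{(trigram Markov assumption)}
```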

Markov Chain Modeling

Under the Markov assumption, the probability of the next word is computed by considering only the last few words in the sequence, instead of the entire sequence. This is the basis for the MLE (maximum likelihood estimate). Consider the example sentence: "The first argument can be a list of data". Instead of considering all 9 words to predict the 10th word, typically only the last few words are considered. Perhaps only 'of data' (called a bigram, a sequence of 2 words) or 'list of data' (called a trigram, a sequence of 3 words) is used to predict the next word.
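To make the MLE idea concrete, here is a minimal sketch in base R using the toy sentence above; the names (predict_mle, bigram_counts) are my own illustrative choices, not this project's actual code:

```r
# Minimal sketch of maximum-likelihood next-word prediction in base R.
# Assumes `words` is a character vector of tokens from the corpus.
words <- c("the", "first", "argument", "can", "be", "a", "list", "of", "data")

# Bigram counts: how often each word follows the previous one.
bigrams <- paste(head(words, -1), tail(words, -1))
bigram_counts <- table(bigrams)

# MLE prediction: given a context string, pick the most frequent continuation.
predict_mle <- function(context, counts) {
  matches <- counts[startsWith(names(counts), paste0(context, " "))]
  if (length(matches) == 0) return(NA_character_)         # unseen context
  tail(strsplit(names(which.max(matches)), " ")[[1]], 1)  # last word of best n-gram
}

predict_mle("of", bigram_counts)  # "data" in this toy corpus
```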

The advantage of this method is its computational simplicity. The disadvantage is that it assigns a probability of zero to unseen n-grams (please see the references on the last slide for more details).

In my code I used at most trigram prediction. If this did not give a likely prediction, I backed off to bigram and then unigram prediction to obtain the most likely next word.
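A sketch of that back-off logic, reusing the predict_mle helper from the sketch above and assuming trigram_counts and unigram_counts tables built the same way (all names illustrative, not the project's actual code):

```r
# Back off from trigram to bigram to unigram, returning the first hit.
# `context` holds the preceding words, e.g. c("a", "list", "of").
predict_backoff <- function(context, trigram_counts, bigram_counts, unigram_counts) {
  n <- length(context)
  if (n >= 2) {
    hit <- predict_mle(paste(context[n - 1], context[n]), trigram_counts)
    if (!is.na(hit)) return(hit)   # trigram match: last two words as context
  }
  if (n >= 1) {
    hit <- predict_mle(context[n], bigram_counts)
    if (!is.na(hit)) return(hit)   # bigram match: last word as context
  }
  names(which.max(unigram_counts)) # unigram fallback: most frequent word overall
}
```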

Kneser-Ney Modeling

Kneser-Ney modeling depends on a concept called discounting. A part of the probability mass from the observed, higher-count n-grams is discounted and redistributed to n-grams that would otherwise have zero probability. This smooths the probability distribution. In Kneser-Ney, the lower-order factor (a continuation probability rather than a plain maximum-likelihood estimate) becomes significant when no higher-order matches are found.
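In standard notation, the textbook interpolated Kneser-Ney formula for bigrams (not copied from this project's code) is:

```latex
P_{\mathrm{KN}}(w_i \mid w_{i-1})
  = \frac{\max\!\left(c(w_{i-1} w_i) - d,\; 0\right)}{c(w_{i-1})}
  + \lambda(w_{i-1})\, P_{\mathrm{cont}}(w_i),
\qquad
P_{\mathrm{cont}}(w_i)
  = \frac{\left|\{\, w' : c(w' w_i) > 0 \,\}\right|}
         {\left|\{\, (w', w'') : c(w' w'') > 0 \,\}\right|}
```

Here d is the discount subtracted from each observed bigram count, and the weight λ(w_{i-1}) is chosen so the probabilities sum to 1. The continuation probability rewards words that follow many different contexts, not merely frequent words.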

In my Shiny app, a comparison between the two models is shown. A confidence score is also reported, based on the square root of the n-gram count.
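My guess at the shape of such a count-based score, purely illustrative (the scaling constant is arbitrary and not from the app):

```r
# Map an n-gram count to a bounded confidence score; the scaling constant
# is arbitrary and just illustrates the square-root-of-count idea.
confidence <- function(count, scale = 10) min(1, sqrt(count) / scale)
confidence(25)  # 0.5
```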

Shiny App

My Shiny App and references