This project was completed as part of the Capstone project offered by Johns Hopkins University on Coursera.org.
Natural language processing is highly relevant and challenging in today's era of heavy reliance on IoT (Internet of Things). The course provided a large amount of text data collected from Twitter, blogs, and news sources. Such a collection of texts is called a corpus (plural: corpora).
A language model computes either the probability of a sequence of words or the probability of the nth word given the preceding (n-1) words. The probability of a sequence of words W consisting of w1, w2, ..., wn can be estimated using various models. The following two slides discuss the two modeling methods that I implemented to predict the next word:
- Markov chain
- Kneser-Ney smoothing
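
To make the two methods concrete, here is a minimal Python sketch of next-word prediction on a toy corpus. It is not the implementation used in this project. The function names, the toy sentence, the restriction to bigrams (one word of context), and the discount of 0.75 are all assumptions made for illustration. The Markov chain version uses raw relative frequencies, while the Kneser-Ney version subtracts a fixed discount from each observed count and redistributes that mass according to how many distinct contexts each word follows.

```python
from collections import Counter


def train_bigrams(tokens):
    """Count adjacent word pairs (prev, next) in a list of tokens."""
    return Counter(zip(tokens, tokens[1:]))


def markov_next_word(bigrams, prev):
    """First-order Markov chain: P(w | prev) = c(prev, w) / c(prev, .)."""
    followers = {w: c for (v, w), c in bigrams.items() if v == prev}
    total = sum(followers.values())
    if total == 0:
        return {}  # unseen context: this simple model cannot predict
    return {w: c / total for w, c in followers.items()}


def kneser_ney_next_word(bigrams, prev, discount=0.75):
    """Interpolated Kneser-Ney for bigrams (discount value is an assumption)."""
    followers = {w: c for (v, w), c in bigrams.items() if v == prev}
    total = sum(followers.values())
    if total == 0:
        return {}  # unseen context is not handled in this sketch
    # Continuation count: number of distinct words that precede w.
    continuation = Counter(w for (_, w) in bigrams)
    n_bigram_types = len(bigrams)
    # Probability mass freed by discounting, spread over continuation probs.
    lam = discount * len(followers) / total
    return {
        w: max(followers.get(w, 0) - discount, 0) / total
        + lam * continuation[w] / n_bigram_types
        for w in continuation
    }


# Toy corpus (hypothetical, for illustration only).
tokens = "the cat sat on the mat the cat ran".split()
bigrams = train_bigrams(tokens)

markov = markov_next_word(bigrams, "the")
kn = kneser_ney_next_word(bigrams, "the")

print("Markov chain:", max(markov, key=markov.get))
print("Kneser-Ney:  ", max(kn, key=kn.get))
```

Unlike the plain Markov chain, the Kneser-Ney sketch assigns a small non-zero probability to words that were never seen after the given context, which is the main reason to smooth in the first place.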