Diego Menin
21/04/2015
Developing a Natural Language Processing Model on Word Prediction
Coursera and SwiftKey Partnership
The data used to build the model was provided by Coursera and consists of sentences extracted from Twitter, news feeds and blogs;
The model was built using n-grams (n = 1, 2, 3 and 4), which were stored using Markov chains;
The next word of a sentence is predicted by looking up its last N words in the chain (recursively in the 4-gram, then the 3-gram, and so on…), and the match with the highest frequency is returned;
A low-frequency match on a higher-order n-gram carries more weight than a high-frequency match on a lower-order n-gram.
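The lookup described above can be sketched in Python as a simple back-off over n-gram frequency tables. This is only an illustration under stated assumptions, not the model's actual code: the names `build_ngram_tables` and `predict_next` are hypothetical, tokenization is a plain lowercase whitespace split, and a dictionary of counters stands in for however the Markov chains were really stored.

```python
from collections import Counter, defaultdict


def build_ngram_tables(sentences, max_n=4):
    """Count, for each n, which word follows each (n-1)-word context.

    Assumption: lowercase whitespace tokenization; a real pipeline
    would likely clean and filter the text first.
    """
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])  # first n-1 words
                word = tokens[i + n - 1]              # word that follows
                tables[n][context][word] += 1
    return tables


def predict_next(tables, text, max_n=4):
    """Back off from the highest-order n-gram to the unigram.

    The first order with any match wins, so a rare 4-gram match
    beats a frequent 2-gram match. Returns None if nothing matches.
    """
    tokens = text.lower().split()
    for n in range(max_n, 0, -1):
        k = n - 1  # context length needed for this order
        if k > len(tokens):
            continue
        context = tuple(tokens[len(tokens) - k:])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None


# Toy corpus (made up for illustration only).
corpus = ["we like green tea", "green apples", "green apples", "green apples"]
tables = build_ngram_tables(corpus)
print(predict_next(tables, "we like green"))    # tea (4-gram match, seen once)
print(predict_next(tables, "they want green"))  # apples (falls back to 2-gram)
```

In the toy corpus, "green apples" occurs three times and "we like green tea" only once, yet the first query returns "tea" because the 4-gram match is consulted before any lower-order table.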