The algorithm predicts the next word in a user-entered phrase. It is based on a classic N-gram model built from a subset of cleaned data from blog, Twitter, and news files collected from the Internet. Maximum likelihood estimates (MLE) of unigram, bigram, and trigram probabilities were computed.
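As a rough illustration of the MLE step, the sketch below counts n-grams in a toy token list and divides each n-gram count by the count of its context. It is not the code used for this project: the toy corpus and the names `ngram_counts` and `mle_probability` are assumptions made for the example.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous n-gram (stored as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def mle_probability(ngram, counts_n, counts_context, total_tokens):
    """MLE of P(last word | preceding words).

    Unigram: count(word) / total tokens.
    Higher orders: count(ngram) / count(context), where context is the
    n-gram without its last word.
    """
    if len(ngram) == 1:
        return counts_n[ngram] / total_tokens if total_tokens else 0.0
    context_count = counts_context[ngram[:-1]]
    return counts_n[ngram] / context_count if context_count else 0.0

# Toy corpus (an assumption; the real data came from blogs, Twitter, and news).
tokens = "the cat sat on the mat the cat ran".split()
unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

p_the = mle_probability(("the",), unigrams, None, len(tokens))                      # 3 / 9
p_cat_given_the = mle_probability(("the", "cat"), bigrams, unigrams, len(tokens))   # 2 / 3
p_sat_given_the_cat = mle_probability(("the", "cat", "sat"), trigrams, bigrams, len(tokens))  # 1 / 2
```

Unseen n-grams get probability zero under plain MLE, which is the main reason a smoothing step is needed.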
To improve accuracy, [Jelinek-Mercer smoothing](http://www.ee.columbia.edu/~stanchen/papers/h015l.final.pdf) was used, which linearly interpolates the trigram, bigram, and unigram probabilities. Where interpolation failed to produce a prediction, [part-of-speech tagging (POST)](http://en.wikipedia.org/wiki/Part-of-speech_tagging) was used to provide default predictions by part of speech. Suggested word completions were based on the unigrams. A profanity filter, driven by a list of bad words, was also applied to all output.
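The interpolation idea can be sketched as a weighted sum of the three MLE estimates, P(w | w1, w2) ≈ λ3·P_tri + λ2·P_bi + λ1·P_uni with the weights summing to 1. The example below is a minimal sketch under stated assumptions, not the project's implementation: the weights in `LAMBDAS`, the toy corpus, the function names, and the `banned` set standing in for the profanity list are all hypothetical. In practice the weights would normally be tuned on held-out data.

```python
from collections import Counter

# Assumed weights for (trigram, bigram, unigram); they must sum to 1.
LAMBDAS = (0.6, 0.3, 0.1)

def ngram_counts(tokens, n):
    """Count every contiguous n-gram (stored as a tuple) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def build_model(tokens):
    """Return unigram, bigram, and trigram counts keyed by n-gram order."""
    return {n: ngram_counts(tokens, n) for n in (1, 2, 3)}

def interpolated_prob(word, w1, w2, model, lambdas=LAMBDAS):
    """Linearly interpolate the MLE estimates of P(word | w1, w2)."""
    uni, bi, tri = model[1], model[2], model[3]
    total = sum(uni.values())
    p_uni = uni[(word,)] / total if total else 0.0
    p_bi = bi[(w2, word)] / uni[(w2,)] if uni[(w2,)] else 0.0
    p_tri = tri[(w1, w2, word)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    l_tri, l_bi, l_uni = lambdas
    return l_tri * p_tri + l_bi * p_bi + l_uni * p_uni

def predict_next(w1, w2, model, banned=frozenset()):
    """Return the highest-scoring vocabulary word that is not in `banned`."""
    candidates = [w for (w,) in model[1] if w not in banned]
    if not candidates:
        return None
    return max(candidates, key=lambda w: interpolated_prob(w, w1, w2, model))

# Toy corpus (an assumption made for this example).
tokens = "the cat sat on the mat the cat sat down".split()
model = build_model(tokens)
best = predict_next("the", "cat", model)
best_filtered = predict_next("the", "cat", model, banned={"sat"})
```

Because the unigram term is non-zero for every word in the vocabulary, each candidate receives some probability even when its trigram and bigram were never observed.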