Craig Covey
10/6/2016
The objective of the capstone project is to build a predictive text model. A familiar example of predictive text is the three suggested word choices shown above a smartphone keyboard as the user types. SwiftKey, a popular third-party smartphone keyboard app, is built around this kind of prediction.
A predictive model takes a series of words or a phrase as input and predicts the most likely next word. The most common method uses n-grams: the counts of every sequence of n consecutive words in a corpus (a body of text). For example, the 2-grams of the sentence “The cow jumps over the moon” are “the cow”, “cow jumps”, “jumps over”, “over the”, and “the moon”, with a count recorded each time a particular two-word combination occurs. Using the counts of every n-gram in a corpus, one can predict the next word of a given phrase.
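To make the counting step concrete, here is a minimal Python sketch (illustrative only, not the app's actual code) that tallies the 2-grams of the example sentence:

    from collections import Counter

    def ngrams(tokens, n):
        # Every sequence of n consecutive tokens, in order.
        return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

    tokens = "The cow jumps over the moon".lower().split()
    bigram_counts = Counter(ngrams(tokens, 2))
    # Each 2-gram occurs once in this tiny example; in a large corpus
    # the counts would vary widely:
    # Counter({('the', 'cow'): 1, ('cow', 'jumps'): 1, ('jumps', 'over'): 1,
    #          ('over', 'the'): 1, ('the', 'moon'): 1})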
One issue with n-gram models is that they cannot predict the next word of a phrase if that phrase never appears in the corpus. A popular solution is an additional algorithm called Stupid Backoff. Stupid Backoff was created at Google and “is inexpensive to train on large data sets and approaches the quality of Kneser-Ney Smoothing as the amount of training data increases” (Brants et al., 2007). Stupid Backoff starts with the highest-order n-gram available; if no exact match is found, it drops the oldest word of the phrase and searches the (n-1)-grams, discounting each backed-off score by a fixed penalty (0.4 in the original paper). This continues down to single words, so the model can always produce a score. The final result is a ranked list of the most likely next words.
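The back-off logic can be sketched in a few lines of Python. The names below (build_counts, score, predict, ALPHA) are illustrative choices, not the app's actual code, and a real model would be trained on a far larger corpus:

    from collections import Counter

    ALPHA = 0.4  # fixed back-off penalty recommended in the original paper

    def build_counts(text, n_max=3):
        # Count every n-gram of order 1 through n_max in the text.
        tokens = text.lower().split()
        counts = Counter()
        for n in range(1, n_max + 1):
            counts.update(tuple(tokens[i:i + n])
                          for i in range(len(tokens) - n + 1))
        return counts, len(tokens)

    def score(word, context, counts, total_tokens):
        # Stupid Backoff score of `word` given a tuple of preceding words.
        penalty = 1.0
        while context:
            ngram = context + (word,)
            if counts[ngram] > 0:
                return penalty * counts[ngram] / counts[context]
            context = context[1:]  # drop the oldest word and back off
            penalty *= ALPHA       # each back-off step costs a factor of ALPHA
        return penalty * counts[(word,)] / total_tokens  # unigram base case

    def predict(phrase, counts, total_tokens, k=3, n_max=3):
        # Rank every word in the vocabulary as a candidate next word.
        context = tuple(phrase.lower().split())[-(n_max - 1):]
        vocab = {g[0] for g in counts if len(g) == 1}
        ranked = sorted(vocab, reverse=True,
                        key=lambda w: score(w, context, counts, total_tokens))
        return ranked[:k]

    counts, total = build_counts("The cow jumps over the moon")
    print(predict("over the", counts, total))  # 'moon' ranks first

Because the penalty compounds with each back-off step, matches from longer contexts always outrank those recovered from shorter ones, which is the behavior described above.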
The Simple Word Prediction App can be found here.