Paul Barry
20 May 2017
The goal of this application is to let the user type in a phrase and have the application respond with a list of suggested next words, ordered from the most likely down. The application also suggests the next two words.
The application uses what is called an n-gram model, where an n-gram is a sequence of n words. Thus “the cow jumped” is a 3-gram. The application uses a store of 1-grams, 2-grams, 3-grams, 4-grams and 5-grams gleaned from a selection of tweets, blogs and news feeds. It seeks to match the last n-1 words of the user's phrase to the first n-1 words of an n-gram; the n-th (last) word of the matching n-gram is then used as the prediction. Because the longest stored n-grams are 5-grams, a phrase longer than 4 words is first cut down to its last 4 words, which are then matched against the available 5-grams. If no match is found, the phrase is cut down to its last 3 words and matched against the 4-grams, and so on. The phrase may match more than one n-gram, in which case we take the next word from the most popular (most frequently seen) matched n-gram.
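The matching procedure described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the application's actual code: the names build_store and suggest_next, the toy sentences, and the use of raw counts to rank matches are all hypothetical, and the sketch assumes that the search falls back to the most frequent 1-grams when no longer context matches.

```python
from collections import Counter

MAX_N = 5  # the store holds 1-grams through 5-grams


def build_store(sentences, max_n=MAX_N):
    """Count every n-gram (n = 1..max_n) found in a list of sentences."""
    store = {n: Counter() for n in range(1, max_n + 1)}
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(words) - n + 1):
                store[n][tuple(words[i:i + n])] += 1
    return store


def suggest_next(phrase, store, max_n=MAX_N, top_k=3):
    """Suggest next words, backing off from long contexts to short ones."""
    words = phrase.lower().split()
    # Start with the last max_n - 1 words (or fewer if the phrase is short),
    # then drop the leading word each time no n-gram matches.
    for k in range(min(len(words), max_n - 1), -1, -1):
        context = tuple(words[len(words) - k:])
        matches = Counter()
        for gram, count in store[k + 1].items():
            if gram[:k] == context:
                # The last word of the matching (k+1)-gram is the candidate.
                matches[gram[-1]] += count
        if matches:
            return [word for word, _ in matches.most_common(top_k)]
    return []


# Toy stand-in for the tweets, blogs and news feeds (assumed data).
store = build_store([
    "the cow jumped over the moon",
    "the cow jumped over the fence",
    "the cow jumped over the moon again",
    "the dog jumped",
])

print(suggest_next("I saw the cow jumped over the", store))  # ['moon', 'fence']
print(suggest_next("a purple cow", store))                   # ['jumped']
```

In the first call the phrase is cut down to its last 4 words, "cow jumped over the", which match two different 5-grams, so the more frequent ending comes first. In the second call no 4-gram or 3-gram matches, so the search backs off to the single word "cow" and the 2-grams.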
The application is available at https://pbarry.shinyapps.io/Predict_New/