To ensure accuracy and relevance, up to 4 preceding words are used to predict the next one
Algorithm Description
Sample data from 3 corpora: news, blogs and Twitter
Clean up and stem the combined corpus
Create a term matrix that contains n-grams of 2 to 5 words
A function takes a line of text and predicts the next word from the longest available run of preceding words: it starts with 4, then tries 3, and so on down to 1. The input does not need to be stemmed
The function outputs the 5 most likely next words based on their frequency of occurrence in the corpus. Results go through stem completion, which outputs the most prevalent full-word forms based on the same combined corpus (not stemmed)
If no matches are found, the function returns “no match”
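The lookup order described above (try 4 preceding words, then 3, down to 1) can be sketched as follows. This is a minimal illustration in Python, not the app's actual code: the names `build_ngram_table` and `predict_next` are hypothetical, tokenization is a plain whitespace split, and the stemming and stem-completion steps are left out for brevity.

```python
from collections import Counter, defaultdict

MAX_CONTEXT = 4  # up to 4 preceding words, as in the description above


def build_ngram_table(lines, max_context=MAX_CONTEXT):
    """Map each context (a tuple of 1..max_context words) to next-word counts.

    Contexts of 1 to 4 words plus the next word correspond to 2- to 5-grams.
    Hypothetical helper: stemming is intentionally omitted here.
    """
    table = defaultdict(Counter)
    for line in lines:
        tokens = line.lower().split()
        for i in range(1, len(tokens)):
            for n in range(1, max_context + 1):
                if i - n < 0:
                    break  # not enough preceding words for this context length
                context = tuple(tokens[i - n:i])
                table[context][tokens[i]] += 1
    return table


def predict_next(table, text, top_k=5, max_context=MAX_CONTEXT):
    """Return up to top_k candidates, backing off from the longest context."""
    tokens = text.lower().split()
    for n in range(min(max_context, len(tokens)), 0, -1):
        candidates = table.get(tuple(tokens[-n:]))
        if candidates:
            # Most frequent continuations of the longest matching context
            return [word for word, _ in candidates.most_common(top_k)]
    return ["no match"]


# Toy corpus, made up for illustration only
corpus = [
    "thanks for the follow",
    "thanks for the support",
    "thanks for the follow",
    "see you at the game",
]
table = build_ngram_table(corpus)
print(predict_next(table, "many thanks for the"))  # ['follow', 'support']
print(predict_next(table, "unseen"))               # ['no match']
```

In the first call the 4-word context is not in the table, so the lookup falls back to the 3-word context "thanks for the" and ranks its continuations by count.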
App Description
Enter text in the “Text Entry” box
Click “Predict”
The five most likely predictions will appear in order of likelihood
Select the appropriate suggestion and click “Accept”
Note: if you continue typing, suggestions will appear automatically and there is no need to click “Predict”