Philip Mayfield
1/22/2018
Predictive text using Markov Chain prediction methods.
The source text (Twitter, blogs, and news) was cleaned to remove punctuation, non-English words, and numbers, and then divided into nGrams using the brand new (Nov 2017), aptly named R package called “ngrams”. An “nGram” is a sequence of n consecutive words; the useful ones here are those that occur frequently in the source text. For example, a commonly used 4-word nGram is “the end of the”. Thus, if someone types “the end of”, the algorithm will predict “the” as the next word.
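The “the end of” example can be sketched in a few lines. This is only an illustration of the idea, written in Python rather than R, and it is not the code used in this project: the tiny corpus and the function names build_ngram_table and predict_next are made up for the example.

```python
from collections import Counter, defaultdict

def build_ngram_table(tokens, n):
    """Map each (n-1)-word prefix to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        prefix = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        table[prefix][next_word] += 1
    return table

def predict_next(table, typed_words, n):
    """Return the most common follower of the last n-1 typed words, or None."""
    prefix = tuple(typed_words[-(n - 1):])
    followers = table.get(prefix)
    if not followers:
        return None
    return followers.most_common(1)[0][0]

# Hypothetical, already-cleaned toy corpus (lowercase, no punctuation or numbers).
corpus = "at the end of the day it was the end of the road".split()
table = build_ngram_table(corpus, 4)

print(predict_next(table, "the end of".split(), 4))
```

With this toy corpus, “the end of” is followed by “the” both times it appears, so “the” is the prediction. A prefix that never appears in the corpus returns None, which is the case a shorter nGram would need to cover.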
My algorithm creates 2-, 3-, and 4-word nGrams, ranked from most to least common. The algorithm then uses the following order of priority.