Prediction Model:
According to Wikipedia (N-gram, n.d.), “an n-gram is a contiguous sequence of n items from a given sequence of text or speech.” This package takes a key word or phrase, looks it up against the first n-1 words of each entry in a term-document matrix (TDM) of n-word terms, and returns the nth word of the most frequent matching entry.
Of course, not all possible words or phrases exist in the corpus from which the TDM was derived. For this reason, a simplified version of Katz’s back-off model is used: when a key is not found in the larger n-gram, the model backs off to the next smaller n-gram. The largest n-gram handled is a trigram. The word returned is the match from the largest n-gram in which the key is found. When the key is not found even at the unigram level, the most common word in the corpus, “will”, is returned. This function is demonstrated in a Shiny app hosted on shinyapps.io.
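
The back-off lookup described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the package's own code. The names build_tables and predict_next, the toy corpus, and the use of plain n-gram counters in place of a TDM are all assumptions made for illustration. The sketch also treats the final unigram step as simply returning the fallback word, which is one possible reading of the description.

```python
from collections import Counter

# Fallback word taken from the description above ("will" is stated to be
# the most common word in the corpus).
FALLBACK_WORD = "will"


def build_tables(tokens, max_n=3):
    """Count every n-gram (as a tuple of words) for n = 2..max_n.

    Hypothetical stand-in for the TDM of n-word terms.
    """
    tables = {}
    for n in range(2, max_n + 1):
        tables[n] = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return tables


def predict_next(phrase, tables, max_n=3, fallback=FALLBACK_WORD):
    """Return the predicted next word using a simplified back-off.

    Start with the largest n-gram table and back off to smaller ones
    until the last n-1 words of the phrase match the start of an entry.
    """
    words = phrase.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # phrase too short to form a key for this table
        key = tuple(words[-(n - 1):])
        # Keep only entries whose first n-1 words equal the key.
        candidates = {
            gram: count
            for gram, count in tables[n].items()
            if gram[:-1] == key
        }
        if candidates:
            best = max(candidates, key=candidates.get)
            return best[-1]  # the nth word of the most frequent match
    return fallback  # nothing matched at any level


# Toy corpus, invented for this example only.
corpus = "i will go home i will go out i will stay home".split()
tables = build_tables(corpus)

print(predict_next("i will", tables))     # matched in the trigram table
print(predict_next("they will", tables))  # backs off to the bigram table
print(predict_next("zebra", tables))      # no match, returns the fallback
```

In this sketch the key is always the last n-1 words of the input, so a longer phrase is truncated rather than rejected. Unlike the full Katz model, no discounting or probability redistribution is performed. Only raw frequency decides the match.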