This program combines a Markov chain (n-gram) model with a stupid backoff strategy. The algorithm was trained on 20% samples of three large text files: us_news, us_blogs and us_twitter.
Bi-grams, 3-grams and 4-grams were built from this dataset with the tidytext package.
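A minimal sketch of what the sampling and n-gram construction step could look like with tidytext; the file paths, sampling seed and helper names (`sample_file`, `make_ngrams`) are illustrative assumptions, not taken from the original code:

```r
library(dplyr)
library(tidytext)

set.seed(1234)

# Read a source file and keep a random 20% sample of its lines
sample_file <- function(path, rate = 0.20) {
  lines <- readLines(path, encoding = "UTF-8", skipNul = TRUE)
  tibble(text = sample(lines, size = floor(length(lines) * rate)))
}

# Hypothetical local paths for the three corpora
corpus <- bind_rows(
  sample_file("data/en_US.news.txt"),
  sample_file("data/en_US.blogs.txt"),
  sample_file("data/en_US.twitter.txt")
)

# unnest_tokens() with token = "ngrams" splits each line into overlapping
# n-grams; count() then gives the raw frequency of each n-gram
make_ngrams <- function(df, n) {
  df %>%
    unnest_tokens(ngram, text, token = "ngrams", n = n) %>%
    filter(!is.na(ngram)) %>%
    count(ngram, sort = TRUE)
}

bigrams   <- make_ngrams(corpus, 2)
trigrams  <- make_ngrams(corpus, 3)
fourgrams <- make_ngrams(corpus, 4)
```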
Building predictions
Word frequencies were computed from those n-grams to estimate P(word_n | previous n-1 words).
In other words: given the last words typed by the user (the last 3, 2 or 1 words), what is the probability that the next word is word_n?
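One way this conditional probability could be derived from the 4-gram counts built above: split each 4-gram into its 3-word prefix and its final word, then take the final word's relative frequency within its prefix group as P(next word | previous 3 words). The 3-gram and 2-gram tables would be built analogously; object and column names here are illustrative:

```r
library(dplyr)
library(tidyr)

fourgram_probs <- fourgrams %>%
  # Split "w1 w2 w3 next_word" into separate columns
  separate(ngram, into = c("w1", "w2", "w3", "next_word"), sep = " ") %>%
  # Recombine the first three words into the lookup key (the prefix)
  unite(prefix, w1, w2, w3, sep = " ") %>%
  group_by(prefix) %>%
  mutate(prob = n / sum(n)) %>%   # P(next_word | previous 3 words)
  ungroup() %>%
  arrange(prefix, desc(prob))
```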
As a first attempt, the program takes the last 3 words typed by the user and looks for a match in its 4-gram table.
If a match exists, it returns 3 predictions for the next word, sorted by descending probability.
If no match is found, the program backs off to the 3-gram table, then to the 2-gram table if the 3-gram lookup also fails (see the sketch below).
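A hedged sketch of this backoff lookup, assuming probability tables fourgram_probs, trigram_probs and bigram_probs (columns: prefix, next_word, prob) built as in the previous step; the function and table names are illustrative, not the program's actual code:

```r
library(dplyr)

predict_next <- function(input, k = 3) {
  words <- unlist(strsplit(tolower(trimws(input)), "\\s+"))

  # Look up the last n_prefix words of the input in one probability table
  lookup <- function(tbl, n_prefix) {
    if (length(words) < n_prefix) return(tibble())
    key <- paste(tail(words, n_prefix), collapse = " ")
    tbl %>% filter(prefix == key) %>% arrange(desc(prob)) %>% head(k)
  }

  # Try 4-grams (last 3 words), then back off to 3-grams and 2-grams
  hits <- lookup(fourgram_probs, 3)
  if (nrow(hits) == 0) hits <- lookup(trigram_probs, 2)
  if (nrow(hits) == 0) hits <- lookup(bigram_probs, 1)
  hits$next_word
}

# Example usage: returns up to 3 candidate next words for the typed text
# predict_next("thanks for the")
```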