Next-word suggestion based on n-gram frequencies computed from a random sample of the input corpora.
Automatic filtering of profanities and of the most common French and Spanish words.
2-grams to 4-grams are ordered by frequency and split “(n-1)+1” as “input + next word”, with a minimum number of occurrences that depends on chain length.
2-grams are complemented by a list of synonyms, and longer word chains by a list of common expressions.
Finally, the Input | Next Word pairs are gathered in a two-column database ordered by likelihood.
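
The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code. The blocklist contents, the per-length thresholds in MIN_COUNT, the sampling fraction, and all function names are hypothetical. Dropping any n-gram that contains a blocklisted word is one possible reading of "filtering". The longest-match lookup in suggest() is an assumption about how the table might be queried. Raw counts stand in for likelihood, and the synonym and common-expression lists are left out.

```python
import random
from collections import Counter

# Hypothetical values: the source does not give the blocklist contents
# or the minimum occurrence count for each chain length.
BLOCKLIST = {"le", "el", "darn"}
MIN_COUNT = {2: 2, 3: 1, 4: 1}


def sample_lines(lines, fraction, seed=0):
    """Draw a reproducible random sample of the corpus lines."""
    rng = random.Random(seed)
    k = max(1, int(len(lines) * fraction))
    return rng.sample(lines, k)


def count_ngrams(lines, n):
    """Count n-grams, skipping any that contain a blocklisted word."""
    counts = Counter()
    for line in lines:
        words = line.lower().split()
        for i in range(len(words) - n + 1):
            gram = tuple(words[i:i + n])
            if not BLOCKLIST.intersection(gram):
                counts[gram] += 1
    return counts


def build_table(lines):
    """Return (input, next_word, count) rows, most frequent first."""
    rows = []
    for n in (2, 3, 4):
        for gram, count in count_ngrams(lines, n).items():
            if count >= MIN_COUNT[n]:
                # Split "(n-1)+1": the first n-1 words are the input,
                # the last word is the suggested next word.
                rows.append((" ".join(gram[:-1]), gram[-1], count))
    rows.sort(key=lambda row: -row[2])
    return rows


def suggest(rows, text, k=3):
    """Try the last 3, then 2, then 1 typed words as the lookup key."""
    words = text.lower().split()
    for size in (3, 2, 1):
        if len(words) < size:
            continue
        key = " ".join(words[-size:])
        hits = [nxt for inp, nxt, _ in rows if inp == key]
        if hits:
            return hits[:k]
    return []
```

With a real corpus, one would call sample_lines() first, pass the result to build_table(), and store the first two columns of each row as the Input | Next Word table.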