The model has several features that speed up predictions for the user.
- Pruning. N-grams that occur fewer than four times in the corpus have been removed from the n-gram tables loaded into the app, since they have little predictive value.
- Data tables. The app takes advantage of the fast keyed indexing provided by the data.table package and its setkey function, which yields predictions faster than lookups on data frames.
- Output length. The algorithm stops and returns its list of predicted words as soon as it has generated the number of next-word predictions that the user specifies with the slider input. The model only “backs off” to lower-order n-grams if it needs to add more items to the list.
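
As a rough illustration of how these three features could fit together, here is a minimal sketch in R. It is not the app's actual code: the table layout (columns prefix, word, n), the toy counts, and the function names prune_and_index and predict_next are all assumptions made for the example. Only data.table, setkey, and the count threshold of four come from the description above.

```r
library(data.table)

# Toy n-gram tables (hypothetical data). `prefix` holds the preceding words,
# `word` the candidate next word, and `n` the corpus count.
trigrams <- data.table(
  prefix = c("thanks for", "thanks for", "thanks for", "for the"),
  word   = c("the", "your", "coming", "first"),
  n      = c(120L, 85L, 3L, 40L)
)
bigrams <- data.table(
  prefix = c("for", "for", "for", "the"),
  word   = c("the", "a", "your", "same"),
  n      = c(900L, 400L, 150L, 60L)
)
# Unigrams have no context; the empty prefix only keeps the layout uniform.
unigrams <- data.table(
  prefix = "",
  word   = c("the", "to", "and"),
  n      = c(5000L, 3000L, 2500L)
)

# Pruning + indexing: drop n-grams seen fewer than `min_count` times,
# then key the table on `prefix` so lookups use binary search.
prune_and_index <- function(dt, min_count = 4L) {
  kept <- dt[n >= min_count]
  setkey(kept, prefix)
  kept
}

# Ordered from highest to lowest order: trigram, bigram, unigram.
tables <- lapply(list(trigrams, bigrams, unigrams), prune_and_index)

# Back off through the tables, stopping as soon as `k` predictions exist.
predict_next <- function(words, tables, k = 3L) {
  preds <- character(0)
  for (i in seq_along(tables)) {
    ctx_len <- length(tables) - i        # context words needed: 2, 1, 0
    if (length(words) < ctx_len) next    # input too short for this order
    tbl <- tables[[i]]
    hits <- if (ctx_len == 0L) {
      tbl[order(-n), word]               # unigrams: most frequent words
    } else {
      ctx <- paste(tail(words, ctx_len), collapse = " ")
      tbl[.(ctx), nomatch = NULL][order(-n), word]   # keyed lookup
    }
    preds <- unique(c(preds, hits))
    if (length(preds) >= k) break        # enough predictions: no more backoff
  }
  head(preds, k)
}

predict_next(c("thanks", "for"), tables, k = 2L)  # trigram table alone suffices
predict_next(c("thanks", "for"), tables, k = 3L)  # backs off once to bigrams
```

In this sketch, asking for two predictions never touches the bigram or unigram tables, while asking for three triggers a single backoff step. The pruned trigram "thanks for coming" (count 3) is never offered.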