- Interpolation did not produce significantly different results from the backoff model and required more resources, so it was abandoned.
- Profanity was kept in the n-gram search set, but filtered out of the predictions. If Plus One predicts profanity, (censored) is returned.
- 4-grams are given the highest priority when matching for prediction. This may have overfit the model to the training set, though tests on text outside the initial data showed good predictions.
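
The notes above can be illustrated with a minimal Python sketch. It is not Plus One's actual code: the function names `build_tables` and `predict`, the `PROFANITY` set, and the choice to return the single most frequent continuation are all assumptions made for illustration. It shows a simplified backoff (no discounting or scoring) that tries the 4-gram context first, falls back to shorter contexts, and keeps profanity in the lookup tables while masking it in the output.

```python
from collections import Counter, defaultdict

# Hypothetical placeholder; the app's real profanity list is not given here.
PROFANITY = {"darn"}


def build_tables(tokens, max_n=4):
    """Map each (n-1)-word context to a Counter of next words, for n = 2..max_n."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            tables[n][context][next_word] += 1
    return tables


def predict(tables, words, max_n=4):
    """Try the longest context first (4-gram), then back off to shorter ones."""
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough input words for this n-gram order
        context = tuple(words[-(n - 1):])
        candidates = tables[n].get(context)
        if candidates:
            best = candidates.most_common(1)[0][0]
            # Profanity stays in the search tables but is masked in the output.
            return "(censored)" if best in PROFANITY else best
    return None  # no context matched at any order
```

Because the profane tokens remain in the tables, contexts that contain them can still be matched; only the returned prediction is masked.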
See the app here: http://kbrenchley.shinyapps.io/PlusOne/