- The Kneser-Ney smoothing probability prediction algorithm was less than 1% more accurate than the frequency/count prediction algorithm.
- The probability prediction algorithm took 28.83 minutes to predict the first 1000 lines of the test data, while the frequency/count prediction algorithm only took 3.33 minutes.
- The stored data/variable files differ substantially in size: for quadgrams, the probability predictor files are about 10x larger than their frequency/count equivalents.
- The frequency/count prediction algorithm is recommended, and used, for its smaller size and shorter computation time at minimal cost in accuracy.
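To make the comparison concrete, here is a minimal sketch of the two kinds of predictor. It makes several assumptions that are not from the original analysis. It works on bigrams rather than quadgrams and is written in Python. The functions `train`, `predict_count`, `kneser_ney_bigram` and `predict_kn` are hypothetical names. The discount `d = 0.75` is a conventional choice, not the value used here. The count predictor simply returns the word seen most often after the context. The Kneser-Ney predictor uses the standard interpolated bigram form: it discounts each observed count and redistributes the freed mass according to each word's continuation probability, meaning the number of distinct contexts it follows.

```python
from collections import Counter, defaultdict


def train(tokens):
    """Map each context word to a Counter of the words that follow it."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following


def predict_count(following, prev):
    """Frequency/count predictor: the word seen most often after `prev`.

    Returns None when `prev` was never seen as a context (hypothetical choice;
    a real predictor would back off to a lower-order model).
    """
    if prev not in following:
        return None
    return following[prev].most_common(1)[0][0]


def kneser_ney_bigram(following, d=0.75):
    """Build an interpolated Kneser-Ney bigram model (d = absolute discount).

    Returns (prob, vocab), where prob(prev, w) is P_KN(w | prev).
    """
    # Continuation count: number of distinct contexts each word follows.
    continuation = Counter()
    for counts in following.values():
        for w in counts:
            continuation[w] += 1
    bigram_types = sum(continuation.values())  # number of distinct bigrams

    def prob(prev, w):
        p_cont = continuation[w] / bigram_types
        counts = following.get(prev)  # .get avoids inserting into the defaultdict
        if not counts:
            return p_cont  # unseen context: use the continuation probability alone
        c_prev = sum(counts.values())
        discounted = max(counts[w] - d, 0) / c_prev
        lam = d * len(counts) / c_prev  # probability mass freed by discounting
        return discounted + lam * p_cont

    return prob, list(continuation)


def predict_kn(prob, vocab, prev):
    """Probability predictor: score every vocabulary word and take the best."""
    return max(vocab, key=lambda w: prob(prev, w))


if __name__ == "__main__":
    tokens = "the cat sat on the mat the cat ate the fish".split()
    following = train(tokens)
    prob, vocab = kneser_ney_bigram(following)
    for prev in ("the", "cat"):
        print(prev, predict_count(following, prev), predict_kn(prob, vocab, prev))
    # prints "the cat cat" then "cat sat the"
```

In this sketch the count predictor answers with a single table lookup, while the Kneser-Ney predictor scores every vocabulary word on each query and also needs the extra continuation counts. That illustrates, though it does not prove, why a probability-based predictor can be slower and larger. On this toy corpus the two agree after "the" but differ after "cat", where the discount shifts mass toward "the" because it follows many distinct contexts.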
Full methods documentation can be found here