Charin
The backoff model is a Markov-chain-based model in which the highest-order n-grams (quad-grams in our case) are tried first to determine the next word. If there is no match, progressively lower-order n-grams are used, ending with uni-grams, which effectively select the single most probable word in the corpus.
   user  system elapsed
  0.754   0.018   0.774
      pred        pkn ngram
1:    most 0.11220957     4
2:    best 0.08201405     4
3:   first 0.03617392     4
4:   worst 0.02200244     4
5: biggest 0.01932833     4
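The backoff lookup described above can be sketched in a few lines. This is a minimal illustration in Python, not the implementation that produced the output above: the function names `build_ngram_counts` and `predict_next` are hypothetical, and candidates are ranked by raw counts rather than by the probabilities shown in the pkn column. The second return value plays the role of the ngram column, reporting which n-gram order produced the match.

```python
from collections import Counter, defaultdict


def build_ngram_counts(tokens, max_order=4):
    """Count next-word frequencies for every context of length 0..max_order-1.

    The empty context () holds plain uni-gram counts.
    """
    counts = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for ctx_len in range(max_order):
            if i - ctx_len < 0:
                break  # not enough preceding words for this context length
            context = tuple(tokens[i - ctx_len:i])
            counts[context][word] += 1
    return counts


def predict_next(counts, history, max_order=4, top_k=3):
    """Try the longest context first, then back off to shorter ones.

    Returns (predicted words, order of the n-gram that matched).
    Ranking uses raw counts (an assumption; no smoothing is applied).
    """
    for ctx_len in range(max_order - 1, -1, -1):
        if ctx_len > len(history):
            continue  # history is shorter than this context length
        context = tuple(history[len(history) - ctx_len:])
        if context in counts:
            ranked = counts[context].most_common(top_k)
            return [w for w, _ in ranked], ctx_len + 1
    return [], 0  # only reached when the model holds no counts at all


if __name__ == "__main__":
    corpus = "one of the most one of the best one of the most".split()
    model = build_ngram_counts(corpus)
    print(predict_next(model, ["one", "of", "the"]))  # matched by a quad-gram
    print(predict_next(model, ["zzz"]))               # backs off to uni-grams
```

With an unseen history such as `["zzz"]`, every higher-order lookup fails and the function falls through to the empty context, which is the uni-gram fallback the paragraph describes.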
Below are the benchmarking results using Jan-san's implementation. The numbers in parentheses are those of the baseline predictions.
Overall top-3 score: 14.31 % (6.64 %)
Overall top-1 precision: 10.23 % (5.42 %)
Overall top-3 precision: 17.82 % (8.11 %)
Average runtime: 5.00 msec (0.09 msec)
Number of predictions: 28464
Total memory used: 704.73 MB (286.76 MB)
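For readers unfamiliar with the precision figures, the sketch below shows one common way to compute them. It assumes that top-k precision means the share of test cases whose true next word appears among the first k predictions; the benchmark used above may define its metrics differently, and the function name `top_k_precision` is hypothetical. The "top-3 score" is not reproduced here because its weighting is not stated in this report.

```python
def top_k_precision(predictions, truths, k):
    """Fraction of cases whose true word is among the first k predictions.

    predictions: list of ranked word lists, one per test case
    truths:      list of the actual next words, same length
    """
    if not truths:
        return 0.0
    hits = sum(
        1 for preds, truth in zip(predictions, truths) if truth in preds[:k]
    )
    return hits / len(truths)


if __name__ == "__main__":
    # Toy data, invented for illustration only.
    predictions = [
        ["most", "best", "first"],
        ["the", "a", "an"],
        ["of", "to", "in"],
    ]
    truths = ["best", "dog", "of"]
    print(f"top-1 precision: {top_k_precision(predictions, truths, 1):.2%}")
    print(f"top-3 precision: {top_k_precision(predictions, truths, 3):.2%}")
```

Under this definition top-3 precision can never be lower than top-1 precision, which is consistent with the figures reported above.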