Joyce Clemente
2024-10-17 (v.1); 2024-10-17 (last update)
| ngram | n_size | c | c_size |
|---|---|---|---|
| ugram | 398317 | -5 | 1506385 |
| bgram | 4689189 | -6 | 1415277 |
| tgram | 2593472 | -7 | 1367120 |
| qgram | 1868130 | | |
Approach
Sample calculation
| | Min. | 1st Qu. | Median | Mean | 3rd Qu. | Max. |
|---|---|---|---|---|---|---|
| no_letter | 1 | 462 | 2725 | 9173 | 12396 | 44264 |
| one_letter | 0 | 23 | 122 | 510 | 642 | 4986 |
| two_letters | 0 | 4 | 20 | 96 | 92 | 2347 |
Tested on 3 sets of 4,000 phrases; validated on 1 set of 4,000 phrases.
Large number of total predictions per phrase (thousands); the target word is among them in 81-82% of phrases.
Highest accuracy from models n and nc.
Low accuracy for the top-1 match (m01 ~0.12-0.53 out of 1).
Accuracy improves with a clue (i.e. the user provides the first n letters of the target word).
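
A minimal sketch in Python of how a clue could narrow a ranked prediction list, and how top-k accuracy could then be measured. The function names, the toy predictions, and the assumption that a clue is applied as a simple prefix filter are all hypothetical; none of this is taken from the original analysis.

```python
def filter_by_clue(ranked, target, n_letters):
    """Keep candidates starting with the first n_letters of the target (assumed clue rule)."""
    clue = target[:n_letters]
    return [word for word in ranked if word.startswith(clue)]


def top_k_hit(ranked, target, k=1, n_letters=0):
    """True if the target is among the top k candidates left after the clue filter."""
    return target in filter_by_clue(ranked, target, n_letters)[:k]


def top_k_accuracy(cases, k=1, n_letters=0):
    """Share of (ranked predictions, target) pairs where the target is a top-k hit."""
    hits = sum(top_k_hit(ranked, target, k, n_letters) for ranked, target in cases)
    return hits / len(cases)


# Toy data (hypothetical): ranked predictions per phrase, paired with the target word.
cases = [
    (["of", "time", "the"], "time"),    # target ranked 2nd: missed without a clue
    (["the", "you", "your"], "your"),   # "you" still outranks "your" after the clue
    (["a", "and", "are"], "a"),         # target already ranked 1st
    (["in", "is", "it"], "house"),      # target absent from the predictions
]

acc_no_clue = top_k_accuracy(cases, k=1, n_letters=0)
acc_one_letter = top_k_accuracy(cases, k=1, n_letters=1)
print(acc_no_clue, acc_one_letter)
```

In this toy set, the last case stays a miss at any clue length because the target never appears among the predictions, which is the situation for the phrases outside the 81-82% coverage reported above.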