2025-01-31
https://satoshiohnishi.shinyapps.io/word_prediction_app/
The base model is a 4-gram model, which uses the previous 3 words to predict the next one. Low-frequency combinations tend to give unreliable predictions, so combinations seen fewer than 10 times in the corpus are excluded.
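Here is a minimal Python sketch of the counting and pruning step. It assumes the corpus is already split into sentences and uses a naive lowercase, whitespace-split tokenizer. The app's actual preprocessing is not described here. The function name `build_ngram_table` and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def build_ngram_table(sentences, n, min_count=10):
    """Count n-grams and keep those seen at least `min_count` times.

    Returns {context tuple of n-1 words: Counter(next word -> count)}.
    Tokenization is a naive lowercase whitespace split (an assumption,
    not the app's documented preprocessing).
    """
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        # Slide a window of n tokens over the sentence.
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1

    table = defaultdict(Counter)
    for gram, count in counts.items():
        if count >= min_count:  # exclude rare combinations
            table[gram[:-1]][gram[-1]] = count
    return dict(table)


# Hypothetical toy corpus: "dream" follows "I have a" 12 times, "pen" only 3 times.
corpus = ["I have a dream"] * 12 + ["I have a pen"] * 3
print(build_ngram_table(corpus, n=4))
# {('i', 'have', 'a'): Counter({'dream': 12})}
```

The 3-gram and 2-gram tables would be built the same way with `n=3` and `n=2`.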
If the last 3 words do not match any retained 4-gram, the prediction backs off to a 3-gram model (the preceding 2 words), again using only combinations seen at least 10 times. If that also fails, it backs off to a 2-gram model (the preceding word) with the same threshold. At each level, the predicted word is the one with the highest frequency among the matching combinations.
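The backoff described above can be sketched as follows. This sketch assumes each table maps a context tuple to a `Counter` of next-word counts that has already been pruned to at least 10 occurrences. The name `predict_next`, the toy counts, and returning `None` when no level matches are assumptions, since the app's behavior in that case is not described.

```python
from collections import Counter


def predict_next(words, tables):
    """Predict the next word with 4-gram -> 3-gram -> 2-gram backoff.

    `tables` maps n (4, 3, 2) to {context tuple: Counter(next word -> count)},
    already pruned to combinations seen at least 10 times.
    """
    tokens = [w.lower() for w in words]
    for n in (4, 3, 2):
        context = tuple(tokens[-(n - 1):])
        if len(context) < n - 1:
            continue  # too few input words for this order
        candidates = tables.get(n, {}).get(context)
        if candidates:
            # Highest-frequency next word for this context.
            return candidates.most_common(1)[0][0]
    return None  # assumed behavior when no order matches


# Hypothetical pruned counts, for illustration only.
tables = {
    4: {("i", "have", "a"): Counter({"dream": 34, "pen": 10})},
    3: {("have", "a"): Counter({"lot": 50, "dream": 40})},
    2: {("a",): Counter({"few": 80, "lot": 60})},
}
print(predict_next(["I", "have", "a"], tables))     # dream (4-gram match)
print(predict_next(["we", "have", "a"], tables))    # lot (backs off to 3-gram)
print(predict_next(["not", "quite", "a"], tables))  # few (backs off to 2-gram)
```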
| Previous 3 words | Next word | Probability |
|---|---|---|
| I have a | dream | 34.5% |
| I have a | pen | 10.2% |
| I have a | book | 7.5% |
An N-gram model predicts the next word from the words that precede it. It counts word combinations across a large collection of documents and predicts the next word with the highest probability.
In the table above, the most likely word after ‘I have a’ is ‘dream’. This app uses an N-gram model to predict the next word.
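One common way to obtain probabilities like those in the table is relative frequency. Under that approach, a candidate's count after the previous 3 words is divided by the total count of all continuations of those 3 words. This interpretation is an assumption, and it is not stated whether the app divides by all continuations or only by the retained ones. The sketch below uses hypothetical counts and a hypothetical function name, `next_word_probabilities`.

```python
from collections import Counter


def next_word_probabilities(next_counts, top_k=3):
    """Return the top_k (word, probability) pairs for one context.

    Probability = count(context + word) / total count of the context's
    continuations, i.e. a relative-frequency (maximum-likelihood) estimate.
    """
    total = sum(next_counts.values())
    return [(word, count / total) for word, count in next_counts.most_common(top_k)]


# Hypothetical next-word counts after "i have a", for illustration only.
counts = Counter({"dream": 6, "pen": 3, "book": 1})
for word, p in next_word_probabilities(counts):
    print(f"{word}: {p:.1%}")
# dream: 60.0%
# pen: 30.0%
# book: 10.0%
```

When `top_k` is smaller than the number of candidates, the listed probabilities sum to less than 100%, as in the table above.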
The model is built from about 3 million lines of English news, blog, and Twitter text. However, it looks at no more than the previous 3 words and picks the most frequent continuation, so it may not always give the result you want.
Some models capture context more accurately, such as Transformer-based models. Given the chance, I would like to take on that challenge.
Thank you!