The next word prediction application

Satoshi Ohnishi

2025-01-31

How to use the application

https://satoshiohnishi.shinyapps.io/word_prediction_app/

How the next word is predicted

  • The base model is a 4-gram model: it looks at the preceding 3 words. Rare combinations tend to give unreliable predictions, so any combination seen fewer than 10 times in the corpus is excluded.

  • If the preceding 3 words do not match any combination in the 4-gram model, the prediction falls back to a 3-gram model (the preceding 2 words), again keeping only combinations seen at least 10 times. If that also fails, it falls back to a 2-gram model (the preceding word) with the same threshold. At each level, the word with the highest frequency among the matching combinations is predicted.
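The back-off procedure above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the app's actual code: the corpus is assumed to be already lowercased and split into a list of words, and the function names `build_tables` and `predict_next` are hypothetical. Counts are taken on the raw token stream, so n-grams may span sentence boundaries.

```python
from collections import Counter, defaultdict

MIN_COUNT = 10  # frequency threshold described above


def build_tables(tokens, max_n=4, min_count=MIN_COUNT):
    """Map each n (2..max_n) to {context tuple: Counter of next words},
    keeping only n-grams seen at least min_count times."""
    tables = {}
    for n in range(2, max_n + 1):
        # Count every n-gram in the token stream.
        counts = Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        table = defaultdict(Counter)
        for gram, c in counts.items():
            if c >= min_count:  # drop rare combinations
                table[gram[:-1]][gram[-1]] = c
        tables[n] = table
    return tables


def predict_next(tables, words, max_n=4):
    """Try the 4-gram table first, then back off to 3-grams and 2-grams.
    Returns None if no table has a matching context."""
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough preceding words for this order
        context = tuple(words[-(n - 1):])
        candidates = tables[n].get(context)  # .get does not create empty entries
        if candidates:
            return candidates.most_common(1)[0][0]  # most frequent follower
    return None
```

Pruning with `min_count` happens once when the tables are built, so each prediction is just a few dictionary lookups followed by picking the most frequent follower.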

What are N-grams?

Previous 3 words   Next word   Probability
I have a           dream       34.5%
I have a           pen         10.2%
I have a           book         7.5%
  • An N-gram model predicts the next word from the words that precede it. It counts word combinations in a large collection of documents and predicts the next word with the highest probability.

  • In this example, the predicted word after ‘I have a’ is ‘dream’, the candidate with the highest probability (34.5%). This app uses N-grams to predict the next word.
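One common way to obtain probabilities like those in the table is relative frequency: the number of times the context is followed by a given word, divided by the number of times the context is followed by any word. The sketch below assumes a whitespace-tokenized toy corpus and a hypothetical helper `next_word_probabilities`; it is an illustration of the idea, not necessarily how the app computed its figures.

```python
from collections import Counter


def next_word_probabilities(tokens, context):
    """Relative frequency of each word that follows `context` in `tokens`,
    ordered from most to least likely."""
    k = len(context)
    # Collect the word right after every occurrence of the context.
    followers = Counter(
        tokens[i + k]
        for i in range(len(tokens) - k)
        if tuple(tokens[i:i + k]) == tuple(context)
    )
    total = sum(followers.values())
    if total == 0:
        return {}  # context never seen
    return {word: count / total for word, count in followers.most_common()}


tokens = "i have a dream . i have a pen . i have a dream .".split()
probs = next_word_probabilities(tokens, ("i", "have", "a"))
```

Here "i have a" is followed twice by "dream" and once by "pen", so `probs` gives 2/3 for "dream" and 1/3 for "pen", and "dream" would be predicted.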

Limitations and Challenges

The model is built from about 3 million lines of English news, blog, and Twitter text. Because it looks at no more than the preceding 3 words and always picks the most frequent continuation, it ignores wider context and may not return the word you want.

Models such as the Transformer capture context more accurately. Given the chance, I would like to take on that challenge.

Thank you!