Next Word App

Ricardo S. Carvalho
11 June 2016

Capstone Project

Data Science Specialization

How it works

How does the app calculate the probability of the next word?

  • It uses language modeling with trigrams and Modified Kneser-Ney Smoothing.

Why does the app use Modified Kneser-Ney Smoothing?

How does the app deal with unknown trigrams, bigrams, or unigrams?

  • It backs off from unknown trigrams to bigrams and from unknown bigrams to unigrams, and it uses only known unigrams.
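The backoff order described above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the table names (`TRIGRAMS`, `BIGRAMS`, `UNIGRAMS`), the toy probabilities, and the `predict` function are all hypothetical.

```python
# Hypothetical toy tables: context tuple -> {next word: probability}.
# The values are made up for illustration only.
TRIGRAMS = {("thanks", "for"): {"the": 0.6, "your": 0.3, "all": 0.1}}
BIGRAMS = {("for",): {"the": 0.5, "a": 0.3, "me": 0.2}}
UNIGRAMS = {"the": 0.5, "to": 0.3, "and": 0.2}


def top_n(dist, n=3):
    """Return the n most probable words, most probable first."""
    ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
    return [word for word, _ in ranked[:n]]


def predict(text, n=3):
    """Try the trigram table, back off to bigrams, then to unigrams."""
    words = text.lower().split()
    last_two = tuple(words[-2:])
    if len(words) >= 2 and last_two in TRIGRAMS:
        return top_n(TRIGRAMS[last_two], n)
    last_one = tuple(words[-1:])
    if len(words) >= 1 and last_one in BIGRAMS:
        return top_n(BIGRAMS[last_one], n)
    # Nothing matched: fall back to the known unigrams.
    return top_n(UNIGRAMS, n)


print(predict("Thanks for"))   # two-word context found in TRIGRAMS
print(predict("waiting for"))  # unknown trigram context, backs off to BIGRAMS
print(predict("hello world"))  # unknown bigram context, backs off to UNIGRAMS
```

Each call returns a list ordered from most to least probable, which matches the left-to-right order of the suggestions in the app.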

Kneser-Ney Smoothing

The only difference in the modified version is that the discount is not a single fixed value: it depends on the n-gram's count, with separate discounts for counts of one, two, and three or more.
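To make the count-dependent discounts concrete, here is a small Python sketch that estimates them with the closed-form formulas from Chen and Goodman, a common choice for Modified Kneser-Ney. The slides do not say how the app obtains its discounts, so treat this as an assumption; the function name `modified_kn_discounts` and the toy counts are hypothetical.

```python
from collections import Counter


def modified_kn_discounts(ngram_counts):
    """Estimate (D1, D2, D3plus) from the counts-of-counts of one n-gram order.

    Uses the Chen & Goodman closed-form estimates (assumed here, not
    confirmed by the slides):
        Y   = n1 / (n1 + 2 * n2)
        D1  = 1 - 2 * Y * n2 / n1
        D2  = 2 - 3 * Y * n3 / n2
        D3+ = 3 - 4 * Y * n4 / n3
    where nk is the number of distinct n-grams seen exactly k times.
    """
    count_of_counts = Counter(ngram_counts.values())
    n1, n2, n3, n4 = (count_of_counts[k] for k in (1, 2, 3, 4))
    if min(n1, n2, n3) == 0:
        raise ValueError("need n-grams with counts 1, 2 and 3 to estimate discounts")
    y = n1 / (n1 + 2 * n2)
    d1 = 1 - 2 * y * n2 / n1
    d2 = 2 - 3 * y * n3 / n2
    d3_plus = 3 - 4 * y * n4 / n3
    return d1, d2, d3_plus


# Toy bigram counts: four seen once, two seen twice, one seen three times,
# one seen four times.
toy_counts = {
    ("a", "b"): 1, ("a", "c"): 1, ("b", "c"): 1, ("c", "d"): 1,
    ("d", "e"): 2, ("e", "f"): 2,
    ("f", "g"): 3,
    ("g", "h"): 4,
}
d1, d2, d3_plus = modified_kn_discounts(toy_counts)
print(d1, d2, d3_plus)
```

During smoothing, an n-gram seen once has `d1` subtracted from its count, one seen twice has `d2` subtracted, and any n-gram seen three or more times has `d3_plus` subtracted.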

How to use

How to start using?

  • Start by typing any text in the input box on the left side of the app page.
  • Right above the input, the app shows three suggestions for the next word based on the text provided.

What are these suggestions for the next word?

  • From LEFT TO RIGHT, they are the most probable words you would type next, based on the input provided.
  • Therefore, the FIRST WORD on the LEFT is the MOST PROBABLE WORD you would type based on the input provided.

Conclusion

How does the app show the results so quickly?

  • It loads pre-computed probabilities, so it does not recalculate them for every input; it only performs a fast lookup.
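The precompute-then-lookup idea can be illustrated with a short Python sketch. It assumes the smoothed probabilities are already available as a dictionary; the names `PROBS`, `build_lookup`, `LOOKUP`, and `suggest`, and the toy values, are hypothetical and do not reflect the app's real data structures.

```python
# Hypothetical smoothed probabilities: context tuple -> {next word: probability}.
PROBS = {
    ("i", "am"): {"a": 0.4, "not": 0.3, "so": 0.2, "the": 0.1},
    ("am",): {"i": 0.5, "not": 0.3, "a": 0.2},
}


def build_lookup(probabilities, n=3):
    """Run once, ahead of time: keep only the top-n words per context."""
    lookup = {}
    for context, dist in probabilities.items():
        ranked = sorted(dist.items(), key=lambda kv: kv[1], reverse=True)
        lookup[context] = [word for word, _ in ranked[:n]]
    return lookup


# The ranking work happens here, before any user input arrives.
LOOKUP = build_lookup(PROBS)


def suggest(context_words):
    """At typing time, a suggestion is a single dictionary access."""
    return LOOKUP.get(tuple(context_words), [])


print(suggest(["i", "am"]))
print(suggest(["unseen", "context"]))
```

Because sorting and truncation happen once in `build_lookup`, the cost per keystroke is a hash lookup that does not depend on the vocabulary size.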

What is the novel approach here?

  • Modified Kneser-Ney Smoothing combined with backoff, delivering very fast next-word suggestions.

Link to the app: https://ricardosc.shinyapps.io/NextWord/