Cassie (Xi) Guo
Dec 20, 2016
The goal of the project is to build a Shiny app that takes two input words and predicts the next word. The n-gram model is built from text drawn from blogs, news, and Twitter. The project includes the following tasks:
Modeling and testing
Final product
For an observed third word: use the discounted probability (default discount: 0.5)
For an unobserved third word: the discounted probability mass is distributed among the unobserved third words
Final prediction: compute probabilities from both the bigram and trigram models, then take the word with the highest probability
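
The three steps above can be sketched in a few lines of Python. This is an illustration only, not the app's actual code (the app itself is built with Shiny). It assumes the discount is subtracted from each observed trigram count, and that the freed-up mass is shared among unobserved third words in proportion to their bigram counts after the second word; the names `train` and `predict` are hypothetical.

```python
from collections import Counter


def train(tokens):
    """Count bigrams and trigrams in a list of tokens."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    trigrams = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return bigrams, trigrams


def predict(w1, w2, bigrams, trigrams, discount=0.5):
    """Return (best_word, probabilities) for the word following (w1, w2)."""
    # Third words actually seen after the context (w1, w2).
    observed = {t[2]: c for t, c in trigrams.items() if t[:2] == (w1, w2)}
    context_count = sum(observed.values())

    probs = {}
    if context_count > 0:
        # Observed third word: discounted probability (assumed absolute discount).
        for w3, c in observed.items():
            probs[w3] = (c - discount) / context_count
        # Probability mass freed up by discounting.
        leftover = discount * len(observed) / context_count
    else:
        # Context never seen: all mass goes to the bigram back-off.
        leftover = 1.0

    # Unobserved third word: share the leftover mass using bigram counts (w2, w3).
    backoff = {b[1]: c for b, c in bigrams.items()
               if b[0] == w2 and b[1] not in observed}
    total = sum(backoff.values())
    for w3, c in backoff.items():
        probs[w3] = leftover * c / total

    if not probs:
        return None, probs
    # Final prediction: the word with the highest probability.
    return max(probs, key=probs.get), probs


tokens = "i love you i love cats i love you too you love dogs".split()
bigrams, trigrams = train(tokens)
best, probs = predict("i", "love", bigrams, trigrams)
print(best)  # prints "you"
```

In this toy corpus, "you" and "cats" are observed after "i love" and receive discounted probabilities, while "dogs" is seen only after "love" and receives the leftover mass.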