Kapil Malik
25 Apr, 2015
This application predicts the next word by looking at the previous 1, 2 or 3 words of the user's input text. It builds an n-gram model from an English text corpus consisting of -
Similarly build bigram and trigram models.
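As an illustration of the counting step, here is a minimal Python sketch. It is not the original Spark job: the function name `build_model`, the whitespace tokenization and the toy sentence are all assumptions made for this example. It follows the convention used in this deck, where a model of order n maps the last n words to the words seen after them.

```python
from collections import Counter, defaultdict

def build_model(tokens, n):
    """Map each n-word context to a Counter of the words that follow it.

    Hypothetical helper for illustration; the original pipeline used Apache Spark.
    """
    model = defaultdict(Counter)
    for i in range(len(tokens) - n):
        context = tuple(tokens[i:i + n])      # the n words seen so far
        model[context][tokens[i + n]] += 1    # the word that came next
    return model

# Toy corpus (an assumption; the real corpus is far larger).
tokens = "i like green tea and i like green apples and i like tea".split()

unigram = build_model(tokens, 1)  # 1 word of context
bigram = build_model(tokens, 2)   # 2 words of context
trigram = build_model(tokens, 3)  # 3 words of context
```

The same function covers all three orders, so the bigram and trigram models differ from the unigram model only in the length of the context tuple.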
Note: I used Apache Spark to process the raw data and write the output as CSV files. These were in turn converted to RDS files using R so that they can be loaded by the application.
The application uses all 500,000+ unigram models, but only 1 million of the 13 million bigram models (representing 80% of all bigrams) and only 200,000 trigram models.
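The slides do not say how the subset was chosen. One plausible approach, shown here purely as an assumption, is to keep the most frequent entries and measure what share of all occurrences they cover. The helper name `prune` and the toy counts are hypothetical.

```python
from collections import Counter

def prune(counts, keep):
    """Keep the `keep` most frequent entries and report the share of
    total occurrences they cover (hypothetical helper for illustration)."""
    total = sum(counts.values())
    kept = dict(counts.most_common(keep))
    coverage = sum(kept.values()) / total if total else 0.0
    return kept, coverage

# Toy bigram counts (made up for this example).
counts = Counter({
    ("of", "the"): 50,
    ("in", "the"): 30,
    ("to", "be"): 15,
    ("a", "rare"): 3,
    ("very", "rare"): 2,
})

kept, coverage = prune(counts, 3)
```

Because n-gram frequencies are heavily skewed, a small fraction of distinct entries can account for most occurrences, which is why a cut of this kind keeps the files small at little cost in coverage.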
Please note that an n-gram model here looks up the last n words to predict the next word.
Backoff Prediction
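A minimal sketch of backoff lookup, assuming the simplest scheme: try the longest available context first and fall back to shorter ones when it has not been seen. The application's exact scoring is not described here, so this sketch applies no discounting or weighting. The function `predict_next` and the toy `models` table are hypothetical.

```python
def predict_next(text, models, max_n=3):
    """Return the most frequent follower of the longest known context.

    `models[n]` maps an n-word context tuple to {next_word: count}.
    Simple backoff only; no discounting is applied (an assumption).
    """
    words = text.lower().split()
    # Start with the longest context the input allows, then back off.
    for n in range(min(max_n, len(words)), 0, -1):
        context = tuple(words[-n:])
        candidates = models.get(n, {}).get(context)
        if candidates:
            return max(candidates, key=candidates.get)
    return None  # no context of any length was found

# Toy models (made up for this example).
models = {
    3: {("i", "like", "green"): {"tea": 2, "apples": 1}},
    2: {("drink", "green"): {"tea": 4}},
    1: {("green",): {"light": 5, "tea": 4}},
}

first = predict_next("I like green", models)     # found with 3 words of context
second = predict_next("we drink green", models)  # backs off to 2 words
third = predict_next("paint it green", models)   # backs off to 1 word
```

In this toy table the three calls resolve at different levels, so the answer for "green" alone differs from the answer when more context is available.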
The application is available here
Input Panel
You can find more about the application and dataset here -