Cho Seng Mong
16 April 2016
The purpose of this project is to build a natural language model that suggests an appropriate next word for a user-specified sequence of words. Text from three sources (Twitter, news, and blogs) was used to train the model.
Before building the word prediction algorithm, the following steps were performed to clean the data files
The next word prediction model is based on the Katz back-off algorithm. Here are the steps involved in predicting the next word of a user-specified sentence:
- Load four compressed data sets containing sorted n-grams with cumulative frequencies.
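The back-off idea can be illustrated with a minimal sketch, written in Python purely for illustration. It assumes the four data sets hold unigram through 4-gram counts, and it builds those counts from a toy token list instead of loading compressed files. It is a simplified back-off lookup: it omits the discounting step that full Katz back-off uses to redistribute probability mass to lower-order n-grams. The names `build_ngram_tables` and `predict_next` are hypothetical and do not come from the project.

```python
from collections import Counter, defaultdict


def build_ngram_tables(tokens, max_order=4):
    """Count n-grams of order 1..max_order, grouped by their context.

    tables[n][context][word] is how often `word` followed the
    (n - 1)-word tuple `context` in `tokens`.
    """
    tables = {n: defaultdict(Counter) for n in range(1, max_order + 1)}
    for n in range(1, max_order + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            word = tokens[i + n - 1]
            tables[n][context][word] += 1
    return tables


def predict_next(tables, words, max_order=4):
    """Return the most frequent continuation of `words`.

    Tries the longest available context first and backs off to
    shorter contexts until a match is found. Returns None only
    when the tables are empty.
    """
    for n in range(max_order, 0, -1):
        context_len = n - 1
        if context_len > len(words):
            continue  # not enough input words for this order
        context = tuple(words[len(words) - context_len:])
        candidates = tables[n].get(context)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None


# Toy corpus standing in for the Twitter, news, and blog text.
corpus = "i want to go to the park i want to go home i want to eat".split()
tables = build_ngram_tables(corpus)
suggestion = predict_next(tables, ["we", "want", "to"])
```

In this toy run the 4-gram context ("we", "want", "to") is unseen, so the lookup backs off to the trigram context ("want", "to") and picks its most frequent continuation. A production version would store precomputed, pruned tables and apply discounted probabilities instead of raw counts.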
A Shiny application was developed based on the next word prediction model described above. Here are the key features of the app