P Bhanuprakash Reddy
02-Dec-2017
Objective: Create an application that predicts the next word based on the words provided by the user.
Methodology: The application is built on the Shiny interface. The algorithm used is the “Katz Back Off” algorithm, where we use n-grams of up to 3 words (unigrams, bigrams and trigrams) to predict the next word.
Dataset: The application uses the dataset provided with the assignment.
The algorithm used is “Katz Back Off”. Its definition (from Wikipedia) is given below.
Katz back-off is a generative n-gram language model that estimates the conditional probability of a word given its history in the n-gram. It accomplishes this estimation by “backing-off” to models with smaller histories under certain conditions. By doing so, the model with the most reliable information about a given history is used to provide the better results.
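For reference, the definition above can be written out for the trigram case. This is the standard textbook form of Katz back-off, not a formula taken from the application itself. Here C(.) is an n-gram count from the training data, k is a count threshold (often 0), d is a discount factor (typically obtained by Good-Turing estimation), and alpha is the back-off weight that makes the probabilities for a given history sum to one.

```latex
P_{bo}(w_3 \mid w_1 w_2) =
\begin{cases}
d_{w_1 w_2 w_3} \, \dfrac{C(w_1 w_2 w_3)}{C(w_1 w_2)} & \text{if } C(w_1 w_2 w_3) > k \\[2ex]
\alpha_{w_1 w_2} \, P_{bo}(w_3 \mid w_2) & \text{otherwise}
\end{cases}
```

The bigram estimate P_bo(w_3 | w_2) is defined the same way and backs off in turn to the unigram estimate.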
We have used unigrams, bigrams and trigrams for predicting the next word. Hence, the application needs at least two words to predict the next word.
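The back-off order described above (trigram first, then bigram, then unigram) can be sketched as follows. This is a minimal illustration in Python, not the application's own Shiny code. It is a simplified count-based back-off lookup and leaves out the discounting and back-off weights that full Katz back-off requires. The function names and the toy corpus are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def build_ngram_tables(tokens):
    """Count single words, and which word follows each 1-word and 2-word history."""
    unigrams = Counter(tokens)
    bigrams = defaultdict(Counter)   # previous word -> counts of next word
    trigrams = defaultdict(Counter)  # (two previous words) -> counts of next word
    for i in range(len(tokens) - 1):
        bigrams[tokens[i]][tokens[i + 1]] += 1
    for i in range(len(tokens) - 2):
        trigrams[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1
    return unigrams, bigrams, trigrams


def predict_next_word(text, unigrams, bigrams, trigrams):
    """Simplified back-off: use the longest history that was seen in training."""
    words = text.lower().split()
    # 1. Try the trigram table with the last two words as history.
    if len(words) >= 2:
        history = (words[-2], words[-1])
        if history in trigrams:
            return trigrams[history].most_common(1)[0][0]
    # 2. Back off to the bigram table with the last word as history.
    if words and words[-1] in bigrams:
        return bigrams[words[-1]].most_common(1)[0][0]
    # 3. Back off to the most frequent single word.
    return unigrams.most_common(1)[0][0]


# Toy corpus (hypothetical), standing in for the assignment dataset.
corpus = (
    "i like green tea i like green apples "
    "i like green tea i drink black coffee"
).split()
unigrams, bigrams, trigrams = build_ngram_tables(corpus)

print(predict_next_word("I like green", unigrams, bigrams, trigrams))
print(predict_next_word("really drink", unigrams, bigrams, trigrams))
print(predict_next_word("hello world", unigrams, bigrams, trigrams))
```

The first call finds the history "like green" in the trigram table, the second falls back to the bigram table because "really drink" was never seen, and the third falls back to the most frequent single word. A full Katz implementation would return discounted probabilities at each level rather than raw most-frequent counts.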