Sougata Biswas
26 April, 2015
The end goal of the Data Science Specialization Capstone Project is to produce a predictive text algorithm in R: given a user's text input, the system suggests the most likely next word.
Given a partial or incomplete sentence, the algorithm predicts the word that has the highest probability of occurring next.
The steps are:
Behind the scenes, the application loads the 1-gram, 2-gram, 3-gram, and 4-gram data frame files. These data are already cleaned, with the n-grams sorted by frequency in descending order. The text was converted to lower case, and punctuation, numbers, extra white space, and non-printable characters were removed. The algorithm then uses a Markov chain model for prediction.
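The cleaning and lookup described above can be sketched in base R roughly as follows. This is a minimal illustration, not the project's actual code: the function names `clean_text` and `predict_next`, the column names `ngram` and `freq`, the toy tables, and the choice to fall back to shorter n-grams when a longer context has no match are all assumptions made for the example. Within a given context, the n-gram with the highest count is also the continuation with the highest conditional probability, which is the Markov chain idea.

```r
# Hypothetical helper mirroring the cleaning steps listed above.
clean_text <- function(x) {
  x <- tolower(x)                      # lower case
  x <- gsub("[^[:print:]]", " ", x)    # non-printable characters
  x <- gsub("[[:punct:]]", "", x)      # punctuation
  x <- gsub("[[:digit:]]", "", x)      # numbers
  x <- gsub("\\s+", " ", x)            # collapse runs of white space
  trimws(x)
}

# Toy tables standing in for the real n-gram files (assumed layout),
# each sorted by frequency in descending order.
bigrams <- data.frame(
  ngram = c("of the", "the end", "the first"),
  freq  = c(50, 20, 10),
  stringsAsFactors = FALSE
)
trigrams <- data.frame(
  ngram = c("of the day", "end of the", "of the year"),
  freq  = c(30, 15, 12),
  stringsAsFactors = FALSE
)

# tables[[k]] must hold the (k + 1)-grams; top_unigram is the last resort.
predict_next <- function(input, tables, top_unigram) {
  words <- strsplit(clean_text(input), " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  # Try the longest available context first, then shorter ones (assumed fallback).
  for (k in rev(seq_along(tables))) {
    if (length(words) < k) next
    prefix <- paste0(paste(tail(words, k), collapse = " "), " ")
    hits <- tables[[k]][startsWith(tables[[k]]$ngram, prefix), ]
    if (nrow(hits) > 0) {
      best <- hits$ngram[which.max(hits$freq)]
      return(sub(prefix, "", best, fixed = TRUE))  # keep only the final word
    }
  }
  top_unigram
}

predict_next("At the End of the", list(bigrams, trigrams), "the")
```

With the real data, the list passed as `tables` would hold the 2-gram, 3-gram, and 4-gram data frames in that order, and `top_unigram` would be the first row of the 1-gram table.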
The predictive model works: it returns a next word every time. However, its prediction accuracy is not yet good. I am working to improve it and hope to present a more accurate model in the near future.