The end goal of the Data Science Specialization Capstone Project is to produce a predictive text algorithm in R that, based on a user’s text input, suggests the next most likely word to be entered.
Sougata Biswas
Data Scientist
Given a partial or incomplete sentence, the algorithm predicts the immediate next word, namely the word with the highest probability of occurring.
The steps are:
1. The user enters an incomplete or partial sentence.
2. The algorithm uses its trained Decision Tree to determine the next word.
3. The algorithm returns the output, i.e. the predicted "word".
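The three steps can be sketched in base R. This is only a minimal illustration under stated assumptions, not the project's actual code: the trained model is assumed to be a set of n-gram frequency tables, the lookup is assumed to back off from the longest context to shorter ones, and the names `ngrams`, `predict_next_word` and the columns `prefix`, `word`, `freq` are hypothetical, as is the toy data.

```r
# Toy n-gram tables (hypothetical data and column names).
# Element k of the list holds the k-grams: a context `prefix` of k - 1 words,
# the following `word`, and its frequency `freq`.
ngrams <- list(
  data.frame(prefix = "", word = c("the", "to", "and"),
             freq = c(50, 40, 30), stringsAsFactors = FALSE),
  data.frame(prefix = c("thank", "of"), word = c("you", "the"),
             freq = c(9, 7), stringsAsFactors = FALSE),
  data.frame(prefix = c("a lot", "one of"), word = c("of", "the"),
             freq = c(5, 4), stringsAsFactors = FALSE),
  data.frame(prefix = "thanks a lot", word = "for",
             freq = 3, stringsAsFactors = FALSE)
)

predict_next_word <- function(input, tables = ngrams) {
  # Step 1: take the partial sentence and split it into lower-case words.
  words <- strsplit(trimws(tolower(input)), "\\s+")[[1]]
  words <- words[nzchar(words)]

  # Step 2: try the longest usable context first, then back off to shorter ones.
  max_context <- min(length(words), length(tables) - 1)
  for (n in seq(max_context, 0)) {
    prefix <- paste(tail(words, n), collapse = " ")
    tbl <- tables[[n + 1]]
    hits <- tbl[tbl$prefix == prefix, ]
    if (nrow(hits) > 0) {
      # Step 3: return the most frequent continuation of this context.
      return(hits$word[which.max(hits$freq)])
    }
  }
  NA_character_  # no table matched (only possible if the 1-gram table is empty)
}

predict_next_word("Thanks a lot")  # matches the 4-gram table: "for"
predict_next_word("I ate a lot")   # backs off to the 3-gram table: "of"
predict_next_word("zzz")           # falls back to the top 1-gram: "the"
```

Backing off to the most frequent single word guarantees that some word is always returned, even for unseen input, though that fallback is rarely a good guess.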
Behind the scenes, the application loads data frame files of 1-grams, 2-grams, 3-grams and 4-grams. These data are already cleansed and sorted by N-gram frequency in descending order. The text was converted to lower case, and punctuation, numbers, extra white space and non-printing characters were removed. The algorithm then uses a Markov Chain Model for prediction.
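The cleaning and counting described above could look roughly like this in base R. It is a sketch under assumptions rather than the pipeline actually used: the function names `clean_text` and `ngram_table` and the column names `ngram` and `freq` are made up for illustration, "white spaces removed" is read as collapsing repeated white space, and a real corpus would probably call for a dedicated text-mining package.

```r
# Normalise raw text following the cleaning steps listed above.
clean_text <- function(x) {
  x <- tolower(x)                     # convert to lower case
  x <- gsub("[^[:print:]]", " ", x)   # replace non-printing characters
  x <- gsub("[[:punct:]]", "", x)     # remove punctuation
  x <- gsub("[[:digit:]]", "", x)     # remove numbers
  x <- gsub("\\s+", " ", x)           # collapse repeated white space
  trimws(x)
}

# Build a data frame of n-grams with counts, most frequent first.
ngram_table <- function(lines, n) {
  grams <- unlist(lapply(clean_text(lines), function(line) {
    words <- strsplit(line, " ", fixed = TRUE)[[1]]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- table(grams)
  out <- data.frame(ngram = names(counts), freq = as.integer(counts),
                    stringsAsFactors = FALSE)
  out <- out[order(-out$freq, out$ngram), ]  # descending frequency
  rownames(out) <- NULL
  out
}

clean_text("Hello,  World! 123")                  # "hello world"
ngram_table(c("The cat sat.", "The cat ran!"), 2) # "the cat" comes first, freq 2
```

Under a Markov chain reading, the probability of a word given the preceding n - 1 words would then be estimated from such tables as the n-gram's count divided by the count of its (n - 1)-word prefix.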
The predictive model works: it predicts a next word every time. However, its prediction accuracy is not yet good. I am working on improving it and may be able to present a more accurate model in the near future.