1st August 2020
Word prediction using the Katz backoff algorithm
This project explores next-word prediction with natural language processing and the choice of a suitable algorithm for the task.
Link to the app for the prediction model: https://parichayk.shinyapps.io/predict_word/.
The data used in the model is the dataset provided by Johns Hopkins University. Because the full dataset was too large and slow to process, we kept a random subset of roughly 10% of it, selected with the rbinom function.
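The sampling step can be sketched as one Bernoulli draw per line, which is the idea behind using rbinom for subsetting. This is a minimal illustration in Python rather than the project's own R code; the function name `sample_lines`, the seed, and the placeholder corpus are assumptions made for the example.

```python
import random

def sample_lines(lines, keep_prob=0.10, seed=42):
    """Keep each line independently with probability keep_prob."""
    rng = random.Random(seed)  # fixed seed so the subset is reproducible
    # One Bernoulli draw per line, analogous to rbinom(n, 1, keep_prob) in R.
    return [line for line in lines if rng.random() < keep_prob]

# Placeholder corpus (hypothetical data, not the real dataset).
corpus = [f"line {i}" for i in range(10_000)]
subset = sample_lines(corpus)
print(len(subset))  # close to 10% of the corpus, not exactly 1000
```

Because each line is drawn independently, the subset size varies around 10% instead of being fixed, which is usually acceptable for building n-gram counts.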
The data was cleaned and tokenized with tm_map, a function from the tm package, and profane words were removed to improve the quality of the predictions.
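As a rough illustration of this cleaning step, the sketch below lowercases the text, keeps alphabetic words, and drops anything on a banned list. It is written in Python and does not reproduce the tm transformations used in the project; the `PROFANITY` set is a tiny hypothetical stand-in for a real word list, and `tokenize` is an illustrative name.

```python
import re

# Hypothetical stand-in; a real project would load a full profanity word list.
PROFANITY = {"darn", "heck"}

def tokenize(text, banned=PROFANITY):
    """Lowercase the text, keep alphabetic words, and drop banned words."""
    # Words may contain one internal apostrophe, e.g. "it's".
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in banned]

tokens = tokenize("Well, heck: it's 2020 and the model WORKS!")
print(tokens)  # ['well', "it's", 'and', 'the', 'model', 'works']
```

Numbers and punctuation are discarded here because they rarely help next-word prediction, although that is a design choice rather than a requirement.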
The text was first split into n-grams; the bigrams and trigrams were then processed and smoothed for use in the prediction algorithm.
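Building the n-gram tables amounts to sliding a window of size n over the token stream and counting each window. The Python sketch below shows that idea only; the helper name `ngram_counts` and the toy sentence are assumptions, and the smoothing step is not shown here.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    # range() is empty when there are fewer than n tokens, giving an empty Counter.
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy input (hypothetical), already tokenized.
tokens = "i like green tea and i like black tea".split()
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(bigrams[("i", "like")])           # 2
print(trigrams[("i", "like", "green")])  # 1
```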
The prediction algorithm used in the model is the Katz backoff model, which falls back to a lower-order n-gram when the higher-order one was not seen in the data.
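To make the backoff idea concrete, here is a small Python sketch of a trigram model that backs off to bigrams and then to unigrams. It is not the app's implementation: the fixed absolute discount of 0.5 is an assumption (Katz's original formulation uses Good-Turing discounting), the unigram fallback is an assumption, and the class and method names are hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

class KatzBackoff:
    def __init__(self, tokens, discount=0.5):
        self.d = discount  # assumed fixed discount taken from each seen count
        self.uni = ngram_counts(tokens, 1)
        self.bi = ngram_counts(tokens, 2)
        self.tri = ngram_counts(tokens, 3)
        self.total = sum(self.uni.values())

    @staticmethod
    def _followers(counts, context):
        """Map each word observed right after `context` to its count."""
        k = len(context)
        return {g[-1]: c for g, c in counts.items() if g[:k] == context}

    def _backoff(self, seen, lower):
        """Discount seen words, then give the freed mass to unseen words."""
        if not seen:
            return dict(lower)  # context never observed: use lower order as is
        ctx = sum(seen.values())
        probs = {w: (c - self.d) / ctx for w, c in seen.items()}
        leftover = 1.0 - sum(probs.values())
        unseen_mass = sum(p for w, p in lower.items() if w not in seen)
        if unseen_mass > 0:
            for w, p in lower.items():
                if w not in seen:
                    probs[w] = leftover * p / unseen_mass
        return probs

    def unigram_probs(self):
        return {g[0]: c / self.total for g, c in self.uni.items()}

    def bigram_probs(self, w2):
        return self._backoff(self._followers(self.bi, (w2,)), self.unigram_probs())

    def trigram_probs(self, w1, w2):
        return self._backoff(self._followers(self.tri, (w1, w2)), self.bigram_probs(w2))

    def predict(self, w1, w2, k=3):
        """Return the k most probable next words after the pair (w1, w2)."""
        probs = self.trigram_probs(w1, w2)
        return sorted(probs, key=probs.get, reverse=True)[:k]

# Toy corpus (hypothetical).
tokens = "i like green tea and i like black tea and i like green apples".split()
model = KatzBackoff(tokens)
print(model.predict("i", "like", k=2))    # ['green', 'black']
print(model.predict("purple", "tea", k=1))  # ['and'], via the bigram fallback
```

In the second call the pair ("purple", "tea") never occurs, so the model ignores the first word and ranks candidates by what followed "tea" alone.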