Ken Mwai
Data Science Specialization: Capstone Project
The main objective of this project is to create an algorithm that predicts the next word based on the previous words typed by a user.
We utilise data sets from a corpus called HC Corpora.
We use Natural Language Processing (NLP) techniques to build the predictions.
We use the Katz back-off model for word prediction.
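The back-off idea can be illustrated with a minimal sketch. It is written in Python for illustration only and is not the project's own code. It assumes a bigram model that backs off to unigrams, and it swaps the Good-Turing discounting of the original Katz scheme for a fixed absolute discount (`DISCOUNT`, an assumed value). The names `train` and `backoff_prob` are hypothetical.

```python
from collections import Counter, defaultdict

DISCOUNT = 0.5  # assumed fixed discount; Katz's original scheme uses Good-Turing


def train(tokens):
    """Count unigrams and, for each word, the words that follow it."""
    unigrams = Counter(tokens)
    followers = defaultdict(Counter)
    for prev, word in zip(tokens, tokens[1:]):
        followers[prev][word] += 1
    return unigrams, followers


def backoff_prob(word, prev, unigrams, followers, discount=DISCOUNT):
    """Estimate P(word | prev), backing off to unigrams for unseen bigrams."""
    total = sum(unigrams.values())
    seen = followers.get(prev)
    if not seen:
        # The context was never observed: use the unigram estimate directly.
        return unigrams[word] / total
    context_count = sum(seen.values())
    if word in seen:
        # Seen bigram: discounted relative frequency.
        return (seen[word] - discount) / context_count
    # Probability mass freed by discounting the seen bigrams.
    alpha = discount * len(seen) / context_count
    # Unigram mass of the words that were NOT seen after this context.
    unseen_mass = sum(c for w, c in unigrams.items() if w not in seen) / total
    if unseen_mass == 0:
        return 0.0
    # Share the freed mass among unseen words in proportion to unigram frequency.
    return alpha * (unigrams[word] / total) / unseen_mass


tokens = "the cat sat on the mat the cat ran".split()
unigrams, followers = train(tokens)
p_seen = backoff_prob("cat", "the", unigrams, followers)
p_unseen = backoff_prob("sat", "the", unigrams, followers)
```

In this sketch a seen bigram such as "the cat" keeps most of its observed frequency, while an unseen one such as "the sat" receives a small share of the discounted mass, so the probabilities for a given context still sum to one.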
The initial step in model training and building is learning about the n-grams in the data set. I focused on
Each set of n-grams was then converted to a data frame, and the conditional a-posteriori probabilities were computed using the naiveBayes() function.
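As a rough illustration of what such a table of n-grams and conditional probabilities looks like, the sketch below builds one from plain relative-frequency counts. It is written in Python and does not use naiveBayes(); the relative-frequency estimate is a stand-in chosen for simplicity, and the names `ngrams` and `ngram_table` are hypothetical.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_table(tokens, n):
    """Build table rows: history, next word, count and P(next | history)."""
    counts = Counter(ngrams(tokens, n))
    # Total count of each history (the first n-1 words of the n-gram).
    history_totals = Counter()
    for gram, c in counts.items():
        history_totals[gram[:-1]] += c
    rows = [
        {
            "history": " ".join(gram[:-1]),
            "next": gram[-1],
            "count": c,
            "prob": c / history_totals[gram[:-1]],
        }
        for gram, c in counts.items()
    ]
    # Most probable continuation first within each history.
    rows.sort(key=lambda r: (r["history"], -r["prob"], r["next"]))
    return rows


tokens = "i love data i love r i like data".split()
bigram_rows = ngram_table(tokens, 2)
```

Each row plays the role of one line of the data frame: a history, a candidate next word, and the estimated probability of that word given the history.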
A Shiny application lets the user enter a phrase and returns either one or two predictions, depending on the user's choice. The prediction algorithm
The word predictions are pre-calculated and stored, so lookup is quick. The application shows predictions from both the bi-gram and tri-gram models, giving the user two choices to select from.
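The pre-calculated lookup can be sketched as follows. This is a Python illustration under stated assumptions, not the application's code: it assumes each model stores only the single most frequent next word per history, and the names `build_lookup` and `predict` are hypothetical.

```python
from collections import Counter, defaultdict


def build_lookup(tokens, n):
    """Pre-compute the most frequent next word for every (n-1)-word history."""
    followers = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        history = tuple(tokens[i:i + n - 1])
        followers[history][tokens[i + n - 1]] += 1
    # Keep only the best word per history; ties are broken alphabetically.
    return {
        history: min(counts.items(), key=lambda kv: (-kv[1], kv[0]))[0]
        for history, counts in followers.items()
    }


def predict(phrase, bigram_lookup, trigram_lookup):
    """Return one prediction from each model (None if the history is unknown)."""
    words = phrase.lower().split()
    return {
        "bigram": bigram_lookup.get(tuple(words[-1:])),
        "trigram": trigram_lookup.get(tuple(words[-2:])),
    }


corpus = "thank you very much thank you so much thank you very kindly".split()
bigram_lookup = build_lookup(corpus, 2)
trigram_lookup = build_lookup(corpus, 3)
result = predict("Thank you", bigram_lookup, trigram_lookup)
```

Because all counting happens once in `build_lookup`, answering a query is just two dictionary lookups, which is why pre-calculating the predictions keeps the response fast.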
Improvements
Offer a score for the most common word