Maher Harb
December 7, 2014
Presentation for the Coursera/JHU Data Science capstone project
The implemented n-gram prediction algorithm assumes that the next word in a phrase can be predicted from only the previous n-1 words (the Markov approximation). The following were key steps in the implementation of the algorithm:
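To make the Markov approximation concrete, here is a minimal, illustrative lookup sketch (not the implementation steps referenced above). It assumes n-gram frequency tables named ngram4, ngram3, and ngram2, each a data frame with columns prefix, word, and count, have already been built from the training corpus; those names and the simple highest-order-first backoff are assumptions for illustration only.

```r
# Hypothetical next-word lookup: try the longest prefix first, then back off
# to shorter n-gram tables until a match is found.
predict_next <- function(phrase, tables = list(ngram4, ngram3, ngram2), top_n = 3) {
  tokens <- tolower(unlist(strsplit(trimws(phrase), "\\s+")))
  for (i in seq_along(tables)) {
    prefix_len <- length(tables) - i + 1        # 3 words for the 4-gram table, etc.
    if (length(tokens) < prefix_len) next
    prefix <- paste(tail(tokens, prefix_len), collapse = " ")
    hits <- tables[[i]][tables[[i]]$prefix == prefix, ]
    if (nrow(hits) > 0) {
      hits <- hits[order(-hits$count), ]        # most frequent continuation first
      return(head(hits$word, top_n))
    }
  }
  character(0)                                  # no match at any n-gram order
}
```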
The main findings of the validation exercise were an out-of-sample prediction accuracy of ~14% and that increasing the n-gram length beyond n=4 had no appreciable effect on accuracy.
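As a rough sketch of how such out-of-sample accuracy might be estimated (not the author's exact validation procedure), one can score a held-out set of test phrases, counting a prediction as correct when the true next word appears among the top suggestions; the predict_next function and the example data below are hypothetical.

```r
# Proportion of held-out phrases for which the true next word is among the
# top_n suggested words.
evaluate_accuracy <- function(test_phrases, true_words, top_n = 3) {
  correct <- mapply(function(phrase, truth) {
    truth %in% predict_next(phrase, top_n = top_n)
  }, test_phrases, true_words)
  mean(correct)
}

# Example usage with hypothetical held-out data:
# evaluate_accuracy(c("thanks for the", "happy new"), c("follow", "year"))
```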
The model was packaged as a Shiny application, supported by the jQuery UI autocomplete widget, which lets users quickly pick predicted words from the keyboard. The instructions for using the application are:
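Separately from the usage instructions, the sketch below shows one way a prediction function could be wired into a Shiny interface. It is illustrative only: the actual app integrates the jQuery UI autocomplete widget, whereas this sketch simply renders the top suggestions as text, and predict_next is the hypothetical lookup wrapper from the earlier sketch.

```r
library(shiny)

ui <- fluidPage(
  titlePanel("Next-word prediction"),
  textInput("phrase", "Type a phrase:", value = ""),
  verbatimTextOutput("suggestions")
)

server <- function(input, output) {
  output$suggestions <- renderText({
    words <- predict_next(input$phrase)   # hypothetical prediction wrapper
    paste(words, collapse = ", ")
  })
}

shinyApp(ui = ui, server = server)
```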
The model achieves acceptable prediction accuracy as a baseline. Further improvements may focus on two areas: