Umut Kahramankaptan
January 2nd, 2017
Data Science Specialization - Capstone Project
Johns Hopkins University - Coursera
1 - A predictive text model is built via statistical analysis of the corpus.
2 - Uni-grams, bi-grams, tri-grams and quad-grams are extracted for statistical prediction.
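As a rough illustration, n-grams can be extracted from a tokenized corpus with a few lines of base R (a minimal sketch; the function name and tokenization are illustrative, not the project's actual pipeline):

```r
# Minimal n-gram extraction sketch: slide a window of length n over the
# token vector and paste each window into one string.
extract_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  vapply(seq_len(length(tokens) - n + 1L),
         function(i) paste(tokens[i:(i + n - 1L)], collapse = " "),
         character(1))
}

tokens <- c("this", "is", "a", "predictive", "text", "model")
extract_ngrams(tokens, 2)  # bi-grams: "this is", "is a", ...
extract_ngrams(tokens, 4)  # quad-grams
```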
Katz back-off estimates the conditional probability of a word from consecutive levels of n-grams, using as many preceding words as the observed data allows. More details can be found on Wikipedia.
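The back-off lookup can be sketched as follows (a simplified sketch that omits Katz's Good-Turing discounting and back-off weights, so it is closer to a "stupid back-off"; the table structure and names are hypothetical, not this application's actual code):

```r
# Try the longest n-gram table first, then fall back to shorter histories.
# `ngram_tables` is a hypothetical list of data.frames ordered uni -> quad,
# each with columns: history (the n-1 preceding words, "" for uni-grams),
# word, count.
predict_next <- function(history, ngram_tables, k = 3) {
  for (n in rev(seq_along(ngram_tables))) {      # quad -> tri -> bi -> uni
    tbl  <- ngram_tables[[n]]
    h    <- paste(tail(history, n - 1), collapse = " ")
    hits <- tbl[tbl$history == h, ]
    if (nrow(hits) > 0) {
      hits <- hits[order(-hits$count), ]         # most frequent first
      return(head(hits$word, k))                 # top k candidates
    }
  }
  character(0)                                   # nothing found at any level
}
```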
Prediction Performance: The Katz back-off implementation used in this application takes 85.92 ms (± 1.84 ms) on average to produce 3 proposed next words, which is well under the mean visual stimulus response time of approximately 190 ms [Wikipedia].
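Timings of this kind can be obtained, for example, with the microbenchmark package (a hedged sketch reusing the hypothetical predict_next and ngram_tables from above; the project's actual benchmark setup may differ):

```r
library(microbenchmark)
# Repeatedly time the predictor on a sample history and summarize.
microbenchmark(
  predict_next(c("thanks", "for", "the"), ngram_tables, k = 3),
  times = 100
)
```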
A predictive keyboard enables mobile users to type faster and with fewer typos, even on small screens. This Shiny application demonstrates its potential value and can be converted into a web service.
Future Work:
Looking for opportunities to suggest not only the next word, but also completions of the word currently being typed. The n-gram suggestion might be altered accordingly.
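A hypothetical sketch of such in-word completion, filtering uni-gram candidates by the typed prefix (future-work illustration only; the names and table structure are assumptions):

```r
# Rank uni-gram candidates whose word starts with the partial input.
# `unigrams` is a hypothetical data.frame with columns: word, count.
complete_word <- function(prefix, unigrams, k = 3) {
  hits <- unigrams[startsWith(unigrams$word, prefix), ]
  head(hits$word[order(-hits$count)], k)
}
```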
With more computational power available, I would like to use WordNet from Princeton University to remove words or correct them to their proper English form; for example, “aaaalright” would become “alright” or be removed.