Pier Lorenzo Paracchini
28.05.2016
Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain.
When someone types, for example, “I went to the”, the application should present at least three options for what the next word might be, and it should run responsively as a mobile/web app.
The data used for building the predictive model comes from the HC corpora. The corpora are a collection of three different corpora (twitter, news and blogs), assembled with the aim of getting a varied and comprehensive sample of current language use.
The original corpora, restricted to the English language (en_US), include:
Different models were implemented: n-gram models (n = 1, 2, 3), linear interpolation of those n-gram models with Good-Turing smoothing, and “Stupid” Backoff (with no discounting).
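To make the “Stupid” Backoff idea concrete, here is a minimal Python sketch. It is not the code behind the app: the function names (`train`, `score`, `predict`), the toy corpus and the backoff factor of 0.4 are all assumptions made for illustration. The score is the relative frequency of the longest matching n-gram, multiplied by the backoff factor once for every time the context had to be shortened.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor; an assumed value, not stated in this deck


def train(sentences, max_n=3):
    """Count every n-gram (n = 1..max_n) in whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(counts, total_unigrams, context, word):
    """Stupid Backoff score of `word` after `context` (a score, not a probability)."""
    factor = 1.0
    context = tuple(context)
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: plain relative frequency, no discounting.
            return factor * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest context word
        factor *= ALPHA        # pay the backoff penalty
    return factor * counts[(word,)] / total_unigrams


def predict(counts, text, k=3, max_n=3):
    """Return the k highest-scoring next words for the given text."""
    tokens = text.lower().split()
    context = tokens[-(max_n - 1):] if max_n > 1 else []
    vocab = [gram[0] for gram in counts if len(gram) == 1]
    total = sum(counts[(w,)] for w in vocab)
    # Sort by descending score, breaking ties alphabetically.
    ranked = sorted(vocab, key=lambda w: (-score(counts, total, context, w), w))
    return ranked[:k]


# Toy corpus, invented for this example only.
corpus = [
    "i went to the store",
    "i went to the park",
    "i went to the store again",
    "she went to the gym",
]
counts = train(corpus)
suggestions = predict(counts, "I went to the")
```

With this toy corpus, `suggestions` holds three candidate next words, matching the "at least three options" requirement. A real model would also need proper tokenization, sentence boundary markers and pruning of rare n-grams.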
The models were evaluated using perplexity on an ad-hoc testing dataset (around 40 sentences). The “Stupid” Backoff model was the one that achieved the lowest perplexity.
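As a reminder of what the perplexity measurement computes, here is a small Python sketch. The `perplexity` function and the `prob(context, word)` callback are hypothetical names, not the evaluation code used for this project. Perplexity is 2 raised to the negative average log2 probability per word, so lower values mean the model is less "surprised" by the test sentences.

```python
import math


def perplexity(sentences, prob):
    """Perplexity of a model over test sentences.

    `prob(context, word)` is an assumed callback that returns the model's
    probability (greater than 0) of `word` given the list of preceding tokens.
    """
    log_sum = 0.0
    n_words = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            log_sum += math.log2(prob(tokens[:i], word))
            n_words += 1
    return 2 ** (-log_sum / n_words)


def uniform_model(context, word):
    """Sanity-check model: a uniform choice among 8 words."""
    return 1 / 8


uniform_pp = perplexity(["a b c", "d e"], uniform_model)
```

A uniform model over 8 words gives a perplexity of 8, which is a handy sanity check. Note that “Stupid” Backoff produces scores rather than normalized probabilities, so a perplexity computed from those scores is best read as a relative measure for comparing models on the same test set.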
I would like to express my deepest appreciation to the great professors of Johns Hopkins University for making this specialization available at Coursera. Special kudos to all of the participants of this Capstone project for the valuable discussions, tips and tricks made available in the forums. If you want to keep in contact please just add my LinkedIn profile to your LinkedIn connections.
It has been a long and challenging journey with ups and downs, worth every single moment. Thank you all!