Renato P. dos Santos
1st March 2017
Built to fulfill the Johns Hopkins Data Science Specialization Capstone Project requirements.
As people around the world spend an increasing amount of time on their mobile and accessibility devices, predictive text has become a much-needed input technology. However, older systems, such as T9 and WordWise, sometimes produced hilarious “damn you autocorrect” results.
This more advanced approach, based on probabilistic language modeling, 'knows' how certain words tend to be combined in our language and tends to be far more accurate.
It was developed in a partnership between Johns Hopkins and SwiftKey, the leading company in predictive text input for Android and iOS keyboards.
Speed: All probabilities are precomputed and loaded before execution. The app searches tables holding millions of words and retrieves the most likely next word almost instantly.
Versatility: The algorithm handles many contractions used in Internet language: e.g. “2 b or not 2” will be translated as “to be or not to” and “be” will be suggested as the next word.
Safety: Profanities and bleeped words (e.g. 'f***' and 'f#@%') are removed from user input, just as they were previously removed from the tables.
Naturalness: Stopwords, on the other hand, were left in, as they are present in normal language and could be the expected next input from a user.
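The Versatility and Safety steps above could be sketched roughly as follows. This is only an illustration, not the app's actual code: the `SHORTHAND` table, the `BLOCKLIST` set, the `BLEEP` pattern and the `normalize` function are all hypothetical names, and the real app's word lists are not given here. Stopwords are deliberately left untouched.

```python
import re

# Hypothetical shorthand table; the app's real mapping is not shown in this document.
SHORTHAND = {"2": "to", "b": "be", "u": "you", "4": "for"}

# Placeholder blocklist standing in for a real profanity list (assumption).
BLOCKLIST = {"darn"}

# Crude bleep detector: two or more masking symbols in a row, as in 'f***' or 'f#@%'.
BLEEP = re.compile(r"[*#@%]{2,}")


def normalize(text):
    """Lowercase the input, drop profane or bleeped tokens, expand shorthand."""
    cleaned = []
    for token in text.lower().split():
        if token in BLOCKLIST or BLEEP.search(token):
            continue  # remove the token entirely
        cleaned.append(SHORTHAND.get(token, token))
    return cleaned


if __name__ == "__main__":
    print(normalize("2 b or not 2"))
    print(normalize("what the f*** is this"))
```

A plain dictionary lookup keeps this step cheap, which matters because it runs on every keystroke before the tables are searched.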
It is based on the widely used 4-gram language model and on the “Stupid Backoff” approach. More details can be found here.
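For readers unfamiliar with Stupid Backoff, here is a minimal sketch of the idea under stated assumptions. It counts n-grams from a toy sentence at run time, whereas the app loads precomputed tables; the backoff factor of 0.4 is the value suggested in the original Stupid Backoff paper, and whether the app uses the same value is not stated here. The names `build_counts`, `score` and `predict` are illustrative only.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor from the Stupid Backoff paper (assumed, not confirmed for the app)


def build_counts(tokens, max_n=4):
    """Count every n-gram of length 1..max_n in a token list."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total):
    """Relative frequency of `word` after `context`, backing off to shorter contexts."""
    context = tuple(context)
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            return penalty * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest word and try again
        penalty *= ALPHA       # each backoff step multiplies the score by ALPHA
    return penalty * counts[(word,)] / total  # unigram fallback


def predict(context, counts, vocab, total, max_n=4):
    """Return the highest-scoring next word given the last max_n - 1 words."""
    context = tuple(context[-(max_n - 1):])
    return max(vocab, key=lambda w: score(w, context, counts, total))


if __name__ == "__main__":
    tokens = "to be or not to be that is the question".split()
    counts = build_counts(tokens)
    print(predict(["or", "not", "to"], counts, set(tokens), len(tokens)))
```

The scores are not normalized probabilities, which is what makes the method "stupid" but also very cheap to evaluate against large tables.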
The goal of this project was to create a product to highlight the prediction algorithm built and to provide an interface that can be easily accessed by others.
The working app is available here
It is simple and intuitive to use. Just type in the first few words of a sentence and the suggested next word will immediately show up on the right, as shown below.