Luis Espinosa
March 22, 2017
Text predictors are now installed on every smartphone. They speed up writing by reducing the amount of typing a user has to do to compose a message.
EASY TEXT uses more than 3 million sentences downloaded from 3 different sources (news, Twitter and blogs). The text is cleaned of unwanted sentences and formats, then organized into a tidy database of 2- to 6-word phrases sorted by how commonly they are used.
Using these databases, EASY PREDICTOR lets the user enter a phrase, and the system predicts the next word.
Downloaded 3 databases with more than 4.25 million phrases from three sources (news, blogs and Twitter) and extracted a 10% sample.
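The 10% sampling step could be sketched as follows. This is an illustration in Python, not the author's code (the app itself was built with Shiny); the function name `sample_lines`, the fixed seed, and the choice of sampling without replacement are all assumptions.

```python
import random

def sample_lines(lines, fraction=0.10, seed=2017):
    """Return a random sample containing `fraction` of the input lines.

    Sampling without replacement and the fixed seed are assumptions made
    so that the sketch is reproducible.
    """
    rng = random.Random(seed)
    k = round(len(lines) * fraction)
    return rng.sample(lines, k)

# Usage: sample 10% of a small made-up corpus.
corpus = [f"sentence number {i}" for i in range(1000)]
subset = sample_lines(corpus)
print(len(subset))
```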
Phrases were cleaned by removing hashtags, Twitter mentions, links, non-ASCII symbols, numbers and profanities.
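A minimal sketch of such a cleaning pass, written in Python for illustration only. The regular expressions, the lowercasing, the removal of leftover punctuation, and the tiny `PROFANITY` set are assumptions; the source does not say which rules or which profanity list were actually used.

```python
import re

# Hypothetical placeholder list; the real profanity list is not given in the source.
PROFANITY = {"darn", "heck"}

def clean_phrase(text, profanity=PROFANITY):
    """Strip links, mentions, hashtags, non-ASCII symbols, numbers and profanities."""
    text = text.lower()                                     # assumption: case is ignored
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)      # links
    text = re.sub(r"[@#]\w+", " ", text)                    # Twitter mentions and hashtags
    text = text.encode("ascii", "ignore").decode("ascii")   # non-ASCII symbols
    text = re.sub(r"\d+", " ", text)                        # numbers
    text = re.sub(r"[^a-z' ]+", " ", text)                  # assumption: drop other punctuation
    words = [w for w in text.split() if w not in profanity]
    return " ".join(words)

print(clean_phrase("Check this out @bob #wow http://t.co/abc 123 times, darn!"))
```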
Phrases were separated into 5 databases, each containing phrases of a single length, from 2 words to 6 words. This strategy was chosen to improve accuracy by using the context of the words.
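One way to build one frequency table per phrase length is sketched below in Python. This is illustrative only: the function name `count_ngrams` and the representation of a phrase as a tuple of words are assumptions, not the author's implementation.

```python
from collections import Counter

def count_ngrams(phrases, n_min=2, n_max=6):
    """Count every run of n consecutive words, for n from n_min to n_max.

    Returns a dict mapping n to a Counter of word tuples.
    """
    tables = {n: Counter() for n in range(n_min, n_max + 1)}
    for phrase in phrases:
        words = phrase.split()
        for n in tables:
            for i in range(len(words) - n + 1):
                tables[n][tuple(words[i:i + n])] += 1
    return tables

# Usage with two made-up cleaned phrases.
tables = count_ngrams(["i love new york", "i love new shoes"])
print(tables[2][("i", "love")])
```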
The phrases were filtered by how frequently they appear in the text, on the assumption that the most common phrases make the best predictions. Only the 3 most common phrases for each “last word prediction” were kept, to reduce the size of the databases and speed up prediction.
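The pruning step might look like the Python sketch below. It assumes that "last word prediction" means grouping phrases by all of their words except the last one and keeping the three most frequent final words per group; that reading, and the name `top_predictions`, are assumptions.

```python
from collections import Counter, defaultdict

def top_predictions(ngram_counts, keep=3):
    """For each context (all words but the last), keep the `keep` most
    frequent final words, ordered from most to least common."""
    by_context = defaultdict(Counter)
    for ngram, count in ngram_counts.items():
        by_context[ngram[:-1]][ngram[-1]] += count
    return {
        context: [word for word, _ in next_words.most_common(keep)]
        for context, next_words in by_context.items()
    }

# Usage with made-up 2-word phrase counts.
counts = Counter({("i", "love"): 5, ("i", "like"): 3, ("i", "am"): 2, ("i", "was"): 1})
print(top_predictions(counts))
```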
The final product uses only these 5 reduced databases. Depending on the length of the phrase entered, it searches in cascade from 5-word down to 2-word phrases, looking for the best prediction.
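The cascade can be sketched in Python as a simple back-off lookup: try the longest available context first and fall back to shorter ones until a match is found. The layout of `tables` (phrase length mapped to a dictionary from context to ranked words), the `max_context` of 5 words, and the sample data are assumptions made for illustration, not the author's implementation.

```python
def predict_next(phrase, tables, max_context=5):
    """Return ranked next-word candidates for `phrase`, or an empty list.

    `tables` maps a phrase length n to a dict from context tuples
    (n - 1 words) to a ranked list of next words.
    """
    words = phrase.lower().split()
    # Start from the longest usable context and back off one word at a time.
    for k in range(min(max_context, len(words)), 0, -1):
        context = tuple(words[-k:])
        candidates = tables.get(k + 1, {}).get(context)
        if candidates:
            return candidates
    return []

# Usage with tiny made-up tables.
tables = {
    3: {("new", "york"): ["city", "times"]},
    2: {("the",): ["best", "first", "most"]},
}
print(predict_next("I love New York", tables))
```

Returning the whole ranked list, rather than a single word, leaves room for showing the second-best candidate on a later request.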
EASY PREDICTOR is 100% online, which means you don't need to install anything on your devices. It was built with RStudio Shiny.
It can be used directly at https://ebouvy.shinyapps.io/textpredictor/
Using EASY PREDICTOR is simple and fast: write or paste a phrase in English and press PREDICT once to see the best prediction for the next word. Press PREDICT again to see the second-best prediction.
EASY PREDICTOR was made in collaboration with SwiftKey and the Johns Hopkins University Data Science course on Coursera.
Luis Espinosa
espinosabouvy@gmail.com
Mexico
March 2017