Flavio Oliveri
2021/08/12
Johns Hopkins University
Coursera Data Science Specialization
The goal of this presentation is to pitch the Word Prediction app, with a brief explanation of the algorithm used for text prediction.
The user interface is also described.
The Word Prediction application suggests the next word in a phrase using an n-gram algorithm.
The text used to build the model was collected from blogs, news and Twitter data. Bigrams, trigrams and 4-grams were extracted from this corpus and used to build the model.
To build the model, a sample of 1,000,000 lines from the blogs, news and Twitter data was used. The sample was tokenized and cleaned by applying the following conversions:
2-grams, 3-grams and 4-grams were built from the resulting tokens.
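Building the n-gram tables amounts to sliding a window of size n over each tokenized line and counting every window. The Python sketch below is illustrative only: the cleaning in the hypothetical `tokenize` helper (lowercasing and keeping letters and apostrophes) is an assumption, not the conversions the app actually applied, and `build_ngrams` is likewise a hypothetical name.

```python
import re
from collections import Counter

def tokenize(text):
    # ASSUMED cleaning for illustration: lowercase, keep only letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def build_ngrams(lines, n):
    """Count n-grams of size n; windows never cross line boundaries."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

sample = ["I love new york", "I love new shoes", "new york is big"]
tables = {n: build_ngrams(sample, n) for n in (2, 3, 4)}
print(tables[2][("new", "york")])          # 2
print(tables[3][("love", "new", "york")])  # 1
```

Keeping each table as a frequency count makes the later lookup a matter of filtering by prefix and ranking by count.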
Using the text entered by the user, the algorithm searches the n-grams for one whose leading words match the end of the input. The prediction is taken from the longest matching n-gram with the highest frequency.
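The longest-match lookup can be sketched as a simple backoff. Try the 4-gram table with the last three words typed, then fall back to 3-grams and 2-grams. Among the n-grams whose prefix matches, rank the candidate next words by frequency. This is a minimal sketch that assumes the tables are plain in-memory counts. It reuses the same hypothetical `tokenize` cleaning, and `build_tables` and `predict` are illustrative names. The `k=3` default mirrors the up to three predictions the app displays.

```python
import re
from collections import Counter

def tokenize(text):
    # ASSUMED cleaning for illustration: lowercase, keep only letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def build_tables(lines, sizes=(2, 3, 4)):
    """Frequency tables of n-grams for each size in `sizes`."""
    tables = {n: Counter() for n in sizes}
    for line in lines:
        tokens = tokenize(line)
        for n in sizes:
            for i in range(len(tokens) - n + 1):
                tables[n][tuple(tokens[i:i + n])] += 1
    return tables

def predict(text, tables, k=3):
    """Return up to k next words from the longest n-gram whose prefix matches the input."""
    words = tokenize(text)
    for n in sorted(tables, reverse=True):  # 4-grams first, then 3-grams, then 2-grams
        prefix = tuple(words[-(n - 1):])
        if len(prefix) < n - 1:
            continue  # not enough words typed for this n-gram size
        candidates = Counter({g[-1]: c for g, c in tables[n].items() if g[:-1] == prefix})
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return []  # no match at any n-gram size

sample = ["I love new york", "I love new york city", "I love new shoes", "we love new music"]
tables = build_tables(sample)
print(predict("I love new", tables))  # ['york', 'shoes']
```

Here "I love new" is answered by the 4-gram table. An input such as "we really love new" has no 4-gram match, so it falls back to the 3-gram table and its prefix "love new".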
Once the user finishes typing in the text box, up to 3 predictions appear alongside it.