C. Giner-Baixauli
June 30, 2018
Next Text is an application which uses predictive text models to make it easier for people to type on their mobile devices.
The application gets an incomplete sentence as input and uses a dictionary to find a word that can continue the sentence. That word is given as output.
First of all, we created a corpus from the HC Corpora data.
We took a sample and cleaned it by converting it to lowercase and removing punctuation, extra white space, numbers and other special characters.
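The cleaning step could look roughly like the sketch below. It is only an illustration in Python, not the code used by the app. The function name clean_text is hypothetical, and dropping apostrophes so that contractions stay in one piece is an assumption.

```python
import re

def clean_text(text: str) -> str:
    """Lowercase text, drop apostrophes, and replace anything that is not a-z with a space."""
    text = text.lower()
    # Assumption: delete apostrophes so "it's" becomes "its" instead of "it s".
    text = re.sub(r"['’]", "", text)
    # Replace digits, punctuation and other special characters with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of white space and trim both ends.
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Hello, World! 123  It's 5pm."))  # hello world its pm
```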
Then, we tokenized the data sample into n-grams and created frequency dictionaries of bigrams, trigrams and tetragrams.
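As a minimal sketch of this step, the frequency dictionaries can be built by counting every run of n consecutive words. The helper ngram_counts and the toy sentence are hypothetical and only stand in for the real data sample.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy sample (an assumption, not taken from the HC Corpora data).
tokens = "i want to go to the park and i want to sleep".split()

bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)
tetragrams = ngram_counts(tokens, 4)

print(bigrams[("want", "to")])        # 2
print(trigrams[("i", "want", "to")])  # 2
```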
We also obtained a list of profanity terms in order to filter the prediction results.
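One simple way to apply such a list is to drop any candidate word that appears in it before showing a prediction. The sketch below assumes a case-insensitive exact match. The function name and the placeholder words are hypothetical.

```python
def filter_profanity(candidates, banned):
    """Return the candidates, in order, that are not in the banned list."""
    banned_set = {word.lower() for word in banned}
    return [word for word in candidates if word.lower() not in banned_set]

# Placeholder terms stand in for the real profanity list.
print(filter_profanity(["the", "Darn", "park"], ["darn", "heck"]))  # ['the', 'park']
```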
The app works in a simple way: it takes an incomplete sentence and counts its words.
To avoid errors, if the algorithm finds no word for the prediction, the app falls back to the pronoun “it”, one of the most common words in English.
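The lookup can be sketched as follows. The text does not say how the word count is used, so this sketch assumes it selects the longest context the dictionaries can handle (three words for tetragrams) and then backs off to shorter contexts. The names build_model and predict_next are hypothetical, and the toy sentence replaces the real dictionaries.

```python
from collections import Counter, defaultdict

FALLBACK_WORD = "it"  # default word named in the text

def build_model(tokens, max_n=4):
    """Map each context of 1 to max_n - 1 words to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *context, next_word = tokens[i:i + n]
            model[tuple(context)][next_word] += 1
    return model

def predict_next(sentence, model, max_n=4):
    """Return the most frequent continuation, trying the longest context first."""
    words = sentence.lower().split()
    # Assumption: the word count decides the longest context to try.
    longest = min(len(words), max_n - 1)
    for k in range(longest, 0, -1):
        context = tuple(words[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    # No context matched, so use the default word.
    return FALLBACK_WORD

model = build_model("i want to go to the park and i want to sleep".split())
print(predict_next("I want", model))  # to
print(predict_next("zebra", model))   # it
```

Trying the longest context first favours the most specific evidence. The default word is reached only when even the last single word is unknown.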
The application is available at https://ginerbaixauli.shinyapps.io/NextText