Next Text: a natural language processing application

C. Giner-Baixauli
June 30, 2018

Next Text is an application which uses predictive text models to make it easier for people to type on their mobile devices.

The application takes an incomplete sentence as input, looks up a word that can continue it in a frequency dictionary, and returns that word as output.

Developing (I)

First of all, we created a corpus from the HC Corpora data.

We drew a sample and cleaned it by converting it to lowercase and removing punctuation, extra white space, numbers, and other special characters.
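A minimal sketch of this kind of cleaning step, written in Python for illustration only; it is not the app's own code. The function name `clean_text` is hypothetical, and two choices are assumptions: apostrophes are dropped so that contractions stay together, and everything outside the letters a to z is treated as a special character.

```python
import re

def clean_text(text: str) -> str:
    """Lowercase the text and keep only the letters a-z and single spaces."""
    text = text.lower()
    # Assumption: drop apostrophes so "it's" becomes "its" rather than "it s"
    text = text.replace("'", "")
    # Replace punctuation, digits and other special characters with a space
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of white space and trim both ends
    return re.sub(r"\s+", " ", text).strip()

print(clean_text("Hello,  World! 123 It's"))
```

The cleaned string can then be split on spaces to obtain the tokens used in the next step.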

Then, we tokenized the data sample into n-grams and created frequency dictionaries of bigrams, trigrams and tetragrams.
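As an illustration of how such frequency dictionaries can be built, here is a small Python sketch. The helper `ngram_counts` and the toy token list are hypothetical; the real dictionaries were built from the HC Corpora sample.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Toy token list standing in for the cleaned corpus sample
tokens = "the cat sat on the cat sat".split()

bigrams = ngram_counts(tokens, 2)     # pairs of words
trigrams = ngram_counts(tokens, 3)    # triples of words
tetragrams = ngram_counts(tokens, 4)  # runs of four words

print(bigrams[("the", "cat")])
print(trigrams[("the", "cat", "sat")])
```

Each dictionary maps an n-gram to the number of times it occurs, so the most frequent continuation of a given context can be looked up later.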

We also obtained a list of profanity terms in order to filter the prediction results.
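One simple way to apply such a list is to drop any candidate word that appears in it. The Python sketch below assumes the candidates arrive as an ordered list; the function name `filter_profanity` and the placeholder terms are hypothetical.

```python
def filter_profanity(candidates, banned):
    """Return the candidates, in their original order, that are not banned."""
    return [word for word in candidates if word not in banned]

# Placeholder terms standing in for the real profanity list
banned_terms = {"darn", "heck"}

print(filter_profanity(["day", "darn", "time"], banned_terms))
```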

Developing (II)

The app works in a simple way: it takes an incomplete sentence and counts its words.

If the sentence has one word, the app uses the bigrams dataset to predict the next word.
If it has two words, the app uses the trigrams dataset.
If it has three or more words, the app applies the tetragrams dataset to the last three words of the sentence.

To avoid errors, if the algorithm finds no matching entry for the prediction, the app falls back to the word “it”, a very common English pronoun.
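The selection rule above can be sketched in Python as follows. This is an illustration under stated assumptions, not the app's own code: the names `build_model`, `predict_next` and `FALLBACK` are hypothetical, the toy corpus is invented, and the sketch picks the most frequent follower of the context, which is one plausible reading of "frequency dictionaries".

```python
from collections import Counter

FALLBACK = "it"  # word returned when no n-gram matches

def build_model(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = {}
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        follower = tokens[i + n - 1]
        model.setdefault(context, Counter())[follower] += 1
    return model

def predict_next(sentence, models):
    """Pick the n-gram order from the sentence length, then look up a follower."""
    words = sentence.lower().split()
    if not words:
        return FALLBACK
    # 1 word -> bigrams, 2 words -> trigrams, 3 or more -> tetragrams
    n = min(len(words), 3) + 1
    context = tuple(words[-(n - 1):])
    followers = models[n].get(context)
    if followers:
        return followers.most_common(1)[0][0]
    return FALLBACK

# Toy corpus standing in for the real dictionaries
corpus = "i want to go home and i want to sleep now".split()
models = {n: build_model(corpus, n) for n in (2, 3, 4)}

print(predict_next("i", models))                 # uses bigrams
print(predict_next("i want", models))            # uses trigrams
print(predict_next("you and i want", models))    # uses tetragrams on the last three words
print(predict_next("purple elephants", models))  # no match, so the fallback word
```

Because the function returns the fallback word whenever the lookup fails, it always produces a prediction, even for input that never appears in the corpus.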

Why Next Text?

Quick execution: the app predicts while you are typing the sentence.
Low memory usage: it can be loaded on smartphones and tablets.
Error-proof: it will always predict a word.
Profanity filter: inappropriate or obscene words are filtered out of the predictions.
Simple and intuitive design: anybody can use it.

The application is available at https://ginerbaixauli.shinyapps.io/NextText