PreDetective

A predictive text app by Nacho Rodriguez
December 14, 2014

What is PreDetective?

PreDetective is a free web application and a predictive text algorithm implementation.

How does it work?

- The user types text into the input box and automatically gets a prediction for the next word of the phrase

  • It is easy! No further action is required: no buttons, no dropdown selections…

Analytical Challenge: Learning from texts

The predictive algorithm uses an N-gram approach:

  • Starting from a large amount of text data extracted from news, blogs and Twitter, the data is:
    • Read & Sampled, because using all the data would be neither useful nor practical
    • Tokenized, which means that the texts are split into small pieces (words)
    • Standardized, handling lower and upper case, numbers, blanks…
    • and Cleaned, to avoid using “bad words” or abbreviations
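
The four preparation steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind PreDetective: the function names, the `<num>` marker for numbers, the sampling fraction and the placeholder `BLOCKED_WORDS` set are all hypothetical, since the slides do not give those details.

```python
import random
import re

# Hypothetical placeholder list; the slides do not say which words were filtered.
BLOCKED_WORDS = {"darn", "heck"}


def sample_lines(lines, fraction, seed=0):
    """Read & Sample: keep roughly `fraction` of the lines (reproducible via seed)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def tokenize(line):
    """Tokenize & Standardize: lowercase, mark digit runs as <num>, split into words."""
    line = line.lower()
    line = re.sub(r"\d+", " <num> ", line)  # assumption: numbers collapse to one marker
    return re.findall(r"<num>|[a-z']+", line)


def clean(tokens):
    """Clean: drop every token found in the blocked-word list."""
    return [t for t in tokens if t not in BLOCKED_WORDS]


def preprocess(lines, fraction=1.0, seed=0):
    """Run the whole pipeline and return one token list per sampled line."""
    return [clean(tokenize(line)) for line in sample_lines(lines, fraction, seed)]
```

For example, `preprocess(["Hello World", "Oh heck 42 times"])` yields one token list per line, with the blocked word removed and the number replaced by the marker.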

Then, the frequencies of sequences of 2, 3 and 4 consecutive words are counted, and the most frequent sequence is selected as the prediction.
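
A minimal sketch of this counting step in Python follows. The names `build_model` and `predict` are hypothetical, and trying the longest matching context first before falling back to shorter ones is an assumption: the slides only state that the most frequent sequence is chosen.

```python
from collections import Counter, defaultdict


def build_model(sentences, orders=(2, 3, 4)):
    """Map each context (a tuple of n-1 words) to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for tokens in sentences:
        for n in orders:
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model


def predict(model, text, max_context=3):
    """Return the most frequent continuation, trying the longest context first."""
    words = text.lower().split()
    # Assumption: fall back from 3 words of context to 2, then to 1.
    for size in range(min(max_context, len(words)), 0, -1):
        context = tuple(words[-size:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # nothing learned for this input


sentences = [
    ["i", "love", "new", "york"],
    ["i", "love", "new", "shoes"],
    ["i", "love", "new", "york"],
]
model = build_model(sentences)
print(predict(model, "I love new"))  # prints "york" (seen twice, "shoes" once)
```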

Technical Challenge: Quick web

The algorithm described above is implemented as a free web application. A quick response from the server while using the app was established as a key success factor.

In order to achieve such a quick response, three different data structures are defined:

  • A dictionary of all the words known from the “learning” phase
  • A multidimensional matrix with all the relationships among words
  • A hash environment that maps every word to a number and vice versa, so every data structure can be written and read by direct access, optimizing the process
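
To illustrate how the three structures could fit together, here is a small Python sketch. It is a simplification under stated assumptions rather than the app's actual implementation: the `Vocabulary` class and both functions are hypothetical names, and the matrix is reduced to two dimensions (previous word by next word) to keep the example short.

```python
class Vocabulary:
    """Dictionary of known words plus a two-way word <-> number mapping."""

    def __init__(self, words):
        self.id_to_word = sorted(set(words))  # number -> word
        self.word_to_id = {w: i for i, w in enumerate(self.id_to_word)}  # word -> number

    def encode(self, word):
        return self.word_to_id.get(word)  # None for unknown words

    def decode(self, idx):
        return self.id_to_word[idx]


def build_bigram_matrix(sentences, vocab):
    """matrix[a][b] counts how often word number b followed word number a."""
    size = len(vocab.id_to_word)
    matrix = [[0] * size for _ in range(size)]
    for tokens in sentences:
        for prev, nxt in zip(tokens, tokens[1:]):
            matrix[vocab.encode(prev)][vocab.encode(nxt)] += 1
    return matrix


def predict_next(word, vocab, matrix):
    """Look up the row for `word` by index and return its most frequent follower."""
    row_id = vocab.encode(word)
    if row_id is None:
        return None
    row = matrix[row_id]
    best = max(range(len(row)), key=row.__getitem__)
    return vocab.decode(best) if row[best] > 0 else None


sentences = [
    ["good", "morning", "world"],
    ["good", "morning", "all"],
    ["good", "night"],
    ["morning", "world"],
]
vocab = Vocabulary(word for tokens in sentences for word in tokens)
matrix = build_bigram_matrix(sentences, vocab)
print(predict_next("good", vocab, matrix))  # prints "morning"
```

Because every word is turned into a number first, a prediction needs one hash lookup and one indexed row read, with no text search at request time.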

Summary

PreDetective is a free, easy-to-use, powerful and quick web app for predictive text.

As a pilot project, it has demonstrated how to get through the:

  • Analytical challenge of creating a powerful text-prediction algorithm
  • Technical challenge of delivering it through a very responsive web app

There is, of course, room for improvement:

  • Building a more complex and accurate algorithm by using larger N-grams or syntactic models
  • Designing a prettier web user interface while keeping it as quick as it is right now

Thank you for your time, hope you like it!