VOICED

GE
4/15/2015

Virtual Online Input Capable Expression Dissector

VOICED

An online application that predicts the next word of output based on previous inputs.

Useful for:

  • Mobile text keyboard suggestions
  • Generating filler text
  • Creating online internet bot personalities

VOICED is currently hosted on shinyapps.io.

How is it made?

This application was developed by learning from actual human authors. At its heart is what I call a stupid N-gram back-off model. Millions of internet posts were consumed from:

  • Twitter
  • Blogs
  • News Articles

Features were extracted from this data, cleaned, and stored as N-grams. An N-gram is a sequence of N words that appear consecutively in text. As an example, “New York City” is a frequently found 3-gram, also called a trigram.
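
To make the idea of an N-gram concrete, here is a minimal Python sketch of counting them. It is not the application's actual code: the `tokenize` and `count_ngrams` helpers and the sample sentence are assumptions made up for illustration, and the cleaning step is reduced to lowercasing and dropping punctuation.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())

def count_ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Hypothetical sample text, standing in for millions of posts.
tokens = tokenize("I love New York City. New York City never sleeps.")
trigrams = count_ngrams(tokens, 3)
print(trigrams[("new", "york", "city")])  # 2
```

Note that this simple version lets N-grams run across sentence boundaries; a more careful cleaner would probably split on sentences first.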

How does the application work?

The application looks at the last 3 words of the input and checks whether its database contains a 4-gram that begins with them. If it does not, it backs off to a matching trigram, then a bigram, and ultimately a unigram. It always prefers the longest match and, among matches of that length, the most frequent one.

The same procedure is used for shorter inputs, except that the search starts with a correspondingly smaller N-gram. A two-word input, for example, starts at trigrams.
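
The back-off search described above can be sketched roughly as follows. This is an illustration under assumptions, not the deployed model: `build_tables`, `predict_next`, and the toy corpus are hypothetical, and the sketch simply returns the most frequent continuation of the longest matching context without any score weighting.

```python
from collections import Counter, defaultdict

def build_tables(tokens, max_n=4):
    """For each n in 1..max_n, map a context of n-1 words to counts of the next word."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            tables[n][context][tokens[i + n - 1]] += 1
    return tables

def predict_next(tables, words, max_n=4):
    """Try the longest usable context first, then back off to shorter ones."""
    for n in range(min(max_n, len(words) + 1), 0, -1):
        context = tuple(words[len(words) - (n - 1):])  # last n-1 words
        candidates = tables[n].get(context)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None  # only reached when the tables are empty

# Hypothetical toy corpus.
corpus = "the cat sat on the mat the cat sat on the sofa the cat ran".split()
tables = build_tables(corpus)

print(predict_next(tables, ["the", "cat", "sat"]))  # on  (4-gram match)
print(predict_next(tables, ["dog", "cat", "sat"]))  # on  (backs off to a trigram)
print(predict_next(tables, ["zebra"]))              # the (backs off to the top unigram)
```

Keeping one table per N-gram order makes the back-off loop a plain walk from the largest order down to unigrams, and a short input just starts the walk lower.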

Each predicted word takes less than a second to find. If multiple words are requested at once, then each word is appended to the end of the input text, and the model is run again.
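
Generating several words at once is just a loop around the single-word predictor. The sketch below is self-contained and assumes a simplified stand-in for the model: `train`, `predict_next`, `generate`, and the toy corpus are hypothetical names and data, not the application's own.

```python
from collections import Counter, defaultdict

def train(tokens, max_n=4):
    """Map each context of 0..max_n-1 preceding words to counts of the word that follows."""
    table = defaultdict(Counter)
    for i, word in enumerate(tokens):
        for k in range(max_n):  # k is the context length
            if i - k >= 0:
                table[tuple(tokens[i - k:i])][word] += 1
    return table

def predict_next(table, words, max_n=4):
    """Return the most frequent continuation of the longest context found in the table."""
    for k in range(min(max_n - 1, len(words)), -1, -1):
        context = tuple(words[len(words) - k:])
        if context in table:
            return table[context].most_common(1)[0][0]
    return None

def generate(table, text, count):
    """Append one predicted word at a time, re-running the model on the grown text."""
    words = text.lower().split()
    for _ in range(count):
        next_word = predict_next(table, words)
        if next_word is None:
            break
        words.append(next_word)
    return " ".join(words)

# Hypothetical toy corpus.
table = train("the cat sat on the mat and the cat ran home".split())
print(generate(table, "on the", 3))  # on the mat and the
```

Because each new word becomes part of the context for the next prediction, the cost grows linearly with the number of words requested.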

Uses

Since this application can be trained with any human text input, we could use it to generate interesting and often funny bots. Some ideas:

  • Create a Twitter account that posts poems in the style of Shakespeare by feeding it his complete works.
  • Make a virtual “reporter” that summarizes the day's events by reading each day's top articles and generating text.
  • Make a bot to post to your internet forum to make it look more active than it actually is.

However you use it, be careful to sanitize your input or you'll end up repeating Microsoft's mistake of creating an evil robot.
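
One very simple way to sanitize input is to drop any training post that contains a blocked word before it ever reaches the model. The sketch below assumes a hand-maintained blocklist; `BLOCKLIST`, `is_clean`, and `sanitize` are hypothetical names, and the placeholder terms stand in for a real list. A word filter like this is only a first line of defense, not a complete solution.

```python
import re

# Hypothetical placeholder terms; a real list would be much longer.
BLOCKLIST = {"badword", "slur"}

def is_clean(post, blocklist=BLOCKLIST):
    """True if none of the post's words appear in the blocklist (case-insensitive)."""
    words = set(re.findall(r"[a-z']+", post.lower()))
    return words.isdisjoint(blocklist)

def sanitize(posts, blocklist=BLOCKLIST):
    """Keep only the posts that pass the blocklist check."""
    return [post for post in posts if is_clean(post, blocklist)]

posts = ["Hello world", "You are a BADWORD", "Nice weather today"]
print(sanitize(posts))  # ['Hello world', 'Nice weather today']
```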