Babble App

JJ Espinoza
December 2015

Executive Summary

  • Over 80% of information today is unstructured and based on natural language (source: IBM)
  • The success of films can be predicted by analyzing their scripts (source: NY Times)
  • Netflix has used text analytics to categorize and produce successful TV shows (source: The Atlantic)
  • The success of songs based on their lyrics is a continuing area of research (source: M.I.T.)
  • Given the rich applications of text analytics, the Babble App adds to the conversation via a free online predictive service

Babble Uses Robust Data

The data is from a corpus called HC Corpora (www.corpora.heliohost.org). See the readme file for details on the corpora available. The text data consist of blogs, tweets, and news stories collected via a web crawler.


Katz's Backoff Model

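  1. The frequencies of word triples, doubles, and singles are calculated and saved in lookup tables

  2. The model uses these lookup tables to predict the following word: it first looks up the last two words typed and, if they are not found, backs off to the last word alone

  3. If the phrase is not found in any table, the model returns the word “the”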

Last Two Words         Next Word     Frequency
I love                 you           100
I love                 bacon         75

Second to Last Word    Third Word    Frequency
love                   it            200
love                   sucks         150
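
The lookup-and-backoff steps above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the app's actual code: the function names `build_tables` and `predict_next` are hypothetical, tokenization is a plain whitespace split, and the sketch picks the most frequent continuation from raw counts. A full Katz backoff model would also discount the counts and apply backoff weights, which is omitted here, as is the single-word table, since the fallback described above is the fixed word “the”.

```python
from collections import Counter, defaultdict


def build_tables(sentences):
    """Count next-word frequencies keyed by the previous two words and by the previous word.

    Hypothetical helper: the names and the whitespace tokenization are assumptions.
    """
    trigrams = defaultdict(Counter)  # (word1, word2) -> Counter of following words
    bigrams = defaultdict(Counter)   # word -> Counter of following words
    for sentence in sentences:
        words = sentence.lower().split()
        for first, second in zip(words, words[1:]):
            bigrams[first][second] += 1
        for first, second, third in zip(words, words[1:], words[2:]):
            trigrams[(first, second)][third] += 1
    return trigrams, bigrams


def predict_next(text, trigrams, bigrams, default="the"):
    """Return the most frequent next word, backing off from two words of context to one."""
    words = text.lower().split()
    # Step 1: try the last two words in the triples table.
    if len(words) >= 2 and tuple(words[-2:]) in trigrams:
        return trigrams[tuple(words[-2:])].most_common(1)[0][0]
    # Step 2: back off to the last word in the doubles table.
    if words and words[-1] in bigrams:
        return bigrams[words[-1]].most_common(1)[0][0]
    # Step 3: nothing matched, so return the default word.
    return default


# Toy corpus for illustration only; it is not taken from HC Corpora.
toy_corpus = ["I love you", "I love you", "I love bacon", "we love it"]
trigram_table, bigram_table = build_tables(toy_corpus)
print(predict_next("I love", trigram_table, bigram_table))
print(predict_next("they love", trigram_table, bigram_table))
print(predict_next("unseen phrase", trigram_table, bigram_table))
```

With this toy corpus, the first call is answered from the triples table, the second backs off to the doubles table because “they love” was never seen, and the third falls through to the default word.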

Babble User Guide

  • Click here to go to the Babble Predictive Text App
  • Enter text in the side panel and click Submit
  • Babble uses predictive text modeling to return the next word
  • The user is provided with an accurate prediction