Babble App

JJ Espinoza
December 2015

Executive Summary

Over 80% of information today is unstructured and based on natural language (source: IBM)
The success of films can be predicted by analyzing their scripts (source: NY Times)
Netflix has used text analytics to categorize and produce successful TV shows (source: The Atlantic)
Predicting the success of songs from their lyrics is a continuing area of research (source: M.I.T.)
Given the rich applications of text analytics, the Babble App adds to the conversation via a free online predictive text service

Babble Uses Robust Data

The data come from a corpus called HC Corpora (www.corpora.heliohost.org); see the corpus readme file for details on the corpora available. The text data consist of blogs, tweets, and news stories collected via a web crawler.

Katz's Backoff Model

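The frequencies of word triples (trigrams), pairs (bigrams), and single words (unigrams) are calculated and saved in lookup tables
The model uses these lookup tables to predict the next word, trying the last two words first and backing off to the last word alone
If the phrase is not found, the model returns the word “the”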
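
The counting step described above can be sketched in a few lines of Python. This is a minimal illustration, not the app's actual code: the function name `build_ngram_tables`, the whitespace tokenizer, and the three sample sentences are all assumptions made for the example.

```python
from collections import Counter


def build_ngram_tables(sentences):
    """Count single words, word pairs, and word triples in a list of sentences."""
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for sentence in sentences:
        # Naive lowercase whitespace tokenizer (an assumption for this sketch).
        words = sentence.lower().split()
        unigrams.update(words)
        # zip over shifted copies yields consecutive pairs and triples.
        bigrams.update(zip(words, words[1:]))
        trigrams.update(zip(words, words[1:], words[2:]))
    return unigrams, bigrams, trigrams


# Hypothetical sample text, used only to show the shape of the tables.
unigrams, bigrams, trigrams = build_ngram_tables([
    "I love you",
    "I love bacon",
    "I love you",
])
```

Each table maps a word or a tuple of words to its count, so `trigrams[("i", "love", "you")]` holds how often that triple was seen.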

Last Two Words	Next Word	Frequency
I love	you	100
I love	bacon	75

Last Word	Next Word	Frequency
love	it	200
love	sucks	150
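
The lookup order implied by the two tables above can be sketched as follows. This is a simplified, frequency-only backoff written for illustration: it omits the probability discounting that a full Katz backoff model applies, and the names `TRIGRAM_TABLE`, `BIGRAM_TABLE`, and `predict_next_word` are hypothetical, not taken from the app.

```python
# Toy lookup tables mirroring the example rows above (lowercased).
TRIGRAM_TABLE = {("i", "love"): {"you": 100, "bacon": 75}}
BIGRAM_TABLE = {"love": {"it": 200, "sucks": 150}}
DEFAULT_WORD = "the"  # fallback when no table has a match


def predict_next_word(text):
    """Return the most frequent continuation, backing off to shorter contexts."""
    words = text.lower().split()

    # 1. Try the last two words against the trigram table.
    if len(words) >= 2:
        candidates = TRIGRAM_TABLE.get((words[-2], words[-1]))
        if candidates:
            return max(candidates, key=candidates.get)

    # 2. Back off to the last word against the bigram table.
    if words:
        candidates = BIGRAM_TABLE.get(words[-1])
        if candidates:
            return max(candidates, key=candidates.get)

    # 3. Nothing matched: return the default word.
    return DEFAULT_WORD


print(predict_next_word("I love"))       # you
print(predict_next_word("we love"))      # it
print(predict_next_word("hello world"))  # the
```

Keeping each context level in its own dictionary makes the backoff order explicit: a longer context always wins when it has a match, and the default word is reached only when every table misses.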

Babble User Guide

Click here to go to the Babble Predictive Text App
Enter text in the side panel and click Submit
Babble uses predictive text modeling to return the next word
The predicted word is displayed to the user