CTPDeck

Carlos Mercado

December 5, 2017

The Plan

Using over 4 million text entries drawn from Twitter, blog posts, and news articles, we will create a predictive text engine in a Shiny app.

The steps will be as follows:

The Benefits

Creating a text prediction app allows for more automation throughout web systems.

For example, many websites contain FAQs, but often people will search the website through a search bar instead. They use natural language and trust the search engine to get them to the answer. When they have trouble, they often go to the “contact” or “comment” feature of a site and leave a comment written in natural language as well.

A predictive text feature could be added to any website to narrow down the most common searches on that site. From there, you could better connect pre-written FAQs to predicted searches. This saves the business time and money by reducing the need for staff to read customers' comments. It also better serves customers by steering their searches toward those that return pre-written FAQs.

The Process

The app takes an input, then searches the corpus to find which entries contain that input. It then repeats the search with words removed from the input to increase flexibility (by default, it removes everything but the last two words). Entries that contain more of the input words are duplicated in the results, so they carry more weight.
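The matching and weighting step can be sketched roughly as below. This is only an illustration in Python, not the Shiny app's code: the function name `matching_entries`, the `min_words` parameter, and the choice to drop words from the front of the input are all assumptions, and the words are assumed to be cleaned already.

```python
def matching_entries(input_words, entries, min_words=2):
    """Collect entries that contain the input phrase or a shorter tail of it.

    Assumption: words are dropped from the front of the input until only
    `min_words` remain. An entry is appended once per phrase it matches,
    so entries sharing more words with the input appear more often.
    """
    matches = []
    last_start = max(0, len(input_words) - min_words)
    for start in range(last_start + 1):
        phrase = " ".join(input_words[start:])
        for entry in entries:
            # Pad with spaces so only whole words are matched.
            if f" {phrase} " in f" {entry} ":
                matches.append(entry)
    return matches

# Hypothetical toy corpus, for illustration only.
toy_corpus = [
    "want meet someone new",
    "like meet someone else",
    "meet later",
]
weighted = matching_entries(["want", "meet", "someone"], toy_corpus)
print(weighted)
```

In this toy run the first entry matches both the full phrase and the two-word tail, so it is listed twice, while the second entry matches only the tail and is listed once.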

“I want to go to the beach because I like _________”

First, it removes stopwords, which are words that are extremely common in English and often interchangeable. Our example sentence becomes:

“want go beach like”
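As a minimal sketch of this step in Python (an illustration, not the app's code), stopword removal amounts to a filter over the input words. The `STOPWORDS` set here is a hypothetical, deliberately tiny list; the list the app actually uses is not shown in this deck.

```python
import re

# Hypothetical stopword list, just large enough for the example sentence.
STOPWORDS = {"i", "to", "the", "because", "a", "an", "and", "of", "in", "is"}

def remove_stopwords(sentence):
    """Lowercase the sentence, split it into words, and drop stopwords."""
    words = re.findall(r"[a-z']+", sentence.lower())
    return [word for word in words if word not in STOPWORDS]

cleaned = remove_stopwords("I want to go to the beach because I like")
print(" ".join(cleaned))
```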

From there, it builds n-grams of the relevant size (the number of input words + 1) from the matching entries, counts them, and sorts them by count. It returns the last word of each top n-gram (i.e. the predicted word), with its count expressed as a proportion of all the counts.
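The counting step might look something like the following Python sketch. It is an illustration under stated assumptions rather than the app's implementation: `predict_next`, its `top` parameter, and the toy entries are hypothetical, and entries are assumed to be tokenized with stopwords already removed.

```python
from collections import Counter

def predict_next(context, entries, top=3):
    """Rank the words that follow `context` in the given entries.

    context: list of input words
    entries: list of entries, each a list of words
    Returns (word, proportion) pairs, most frequent first.
    """
    n = len(context) + 1  # n-gram size: number of input words + 1
    counts = Counter()
    for words in entries:
        for i in range(len(words) - n + 1):
            gram = words[i:i + n]
            # Keep only n-grams that start with the input words.
            if gram[:-1] == context:
                counts[gram[-1]] += 1
    total = sum(counts.values())
    return [(word, count / total) for word, count in counts.most_common(top)]

# Hypothetical toy entries, for illustration only.
toy_entries = [
    "want meet someone new".split(),
    "hope meet someone new".split(),
    "would meet someone else".split(),
]
print(predict_next(["meet", "someone"], toy_entries))
```

Because weighted entries appear more than once in the input list, their n-grams are counted more than once, which is how duplication would feed into the final proportions in this sketch.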

Some Test Cases

One example you can test is provided:

“I want to meet someone” predicts: new, else, like