Predict App Pitch

Sofia Riccomagno
02/08/2020

Overview

This presentation is a pitch for the Predict App.

The Shiny app is available at the following link:

Predict App

The source code is available at the following link:

GitHub repo

The Shiny App

Predict App is a Shiny app that predicts the next word of a sentence from the text a user has entered so far. The suggestion comes from an n-gram algorithm. An n-gram is a contiguous sequence of n words from a given sequence of text; for example, "the quick" is a 2-gram and "the quick brown" is a 3-gram.
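To make the n-gram definition concrete, here is a minimal sketch in Python. It is an illustration only, not the app's own code (the app is built in R), and the function name ngrams is a hypothetical helper.

```python
def ngrams(words, n):
    """Return every contiguous run of n words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]


words = "the quick brown fox".split()

bigrams = ngrams(words, 2)   # all 2-grams of the sentence
trigrams = ngrams(words, 3)  # all 3-grams of the sentence

print(bigrams)
print(trigrams)
```

A four-word sentence yields three 2-grams and two 3-grams; asking for an n larger than the sentence returns an empty list.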

The text used to build the predictive text model came from a large corpus of blogs, news and Twitter data. N-grams were extracted from the corpus and then used to build the model.

The Model

The predictive text model was built from a sample of 800,000 lines extracted from the large corpus of blogs, news and Twitter data. The sample was cleaned using the tm package and a number of regular expressions applied with the gsub function. During cleaning, the data was converted to lowercase, and all non-ASCII characters, URLs, email addresses, Twitter handles, hashtags, ordinal numbers, profane words, punctuation and extra whitespace were removed. The cleaned data was then split into tokens (n-grams).
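The cleaning steps listed above can be sketched as a chain of regular-expression substitutions. This is a Python illustration under stated assumptions, not the app's R code: the patterns, the PROFANITY placeholder list, the choice to keep apostrophes, and the function name clean_line are all hypothetical.

```python
import re

# Hypothetical placeholder; a real profanity list would be much longer.
PROFANITY = {"darn"}


def clean_line(text):
    """Apply the cleaning steps in order and return the cleaned line."""
    text = text.lower()
    # Drop non-ASCII characters.
    text = text.encode("ascii", "ignore").decode("ascii")
    # URLs, then email addresses (before handles, so "me@x.org" goes whole).
    text = re.sub(r"(https?://|www\.)\S+", " ", text)
    text = re.sub(r"\S+@\S+", " ", text)
    # Twitter handles and hashtags.
    text = re.sub(r"@\w+", " ", text)
    text = re.sub(r"#\w+", " ", text)
    # Ordinal numbers such as 1st, 2nd, 3rd, 4th.
    text = re.sub(r"\b\d+(st|nd|rd|th)\b", " ", text)
    # Punctuation (apostrophes are kept here, which is an assumption).
    text = re.sub(r"[^a-z0-9\s']", " ", text)
    # Remove profane words and collapse whitespace.
    words = [w for w in text.split() if w not in PROFANITY]
    return " ".join(words)


print(clean_line("Check https://example.com NOW, @bob! My 2nd #win: mail me@x.org"))
```

The order of the substitutions matters: removing punctuation first would break URLs and email addresses into fragments that the later patterns no longer recognise.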

As the user enters text, the algorithm iterates from the longest n-gram (4-gram) to the shortest (2-gram) looking for a match with the last words typed. The predicted next word is taken from the longest matching n-gram, and among matches of that length, from the most frequent one. This is a simple back-off strategy: the algorithm only falls back to a shorter n-gram when no longer one matches.
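The back-off lookup described above could look roughly like the following Python sketch. It is an illustration, not the app's R implementation: the toy corpus, the dictionary-of-counters layout, and the names build_model and predict_next are assumptions.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=4):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                model[context][words[i + n - 1]] += 1
    return model


def predict_next(model, text, max_n=4):
    """Try the longest context first, backing off to shorter ones."""
    words = text.lower().split()
    for n in range(max_n, 1, -1):
        if len(words) < n - 1:
            continue  # not enough typed words for this n-gram order
        context = tuple(words[-(n - 1):])
        if context in model:
            # Most frequent continuation of the longest matching context.
            return model[context].most_common(1)[0][0]
    return None  # no n-gram of any order matched


# Toy corpus, assumed to be already cleaned and lowercased.
corpus = [
    "i want to go home",
    "i want to go out",
    "i want to go home now",
    "you have to leave",
]
model = build_model(corpus)

print(predict_next(model, "I want to"))
print(predict_next(model, "we have to"))
```

In the second call no 4-gram matches "we have to", so the lookup backs off to the 3-gram context "have to". A 2-gram lookup on "to" alone would have suggested a different, more frequent word, which shows why the longest match takes priority.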

The User Interface

The UI is intuitive and user-friendly, so even less tech-savvy users can use the app easily.

The UI is split into two tabs, “Home” and “About”.

The Home tab holds the app itself, where you enter text and get your prediction.

The About tab gives instructions on how to operate the app and contains a link to the source code on GitHub.