2024-01-10

Background

  • Over the past few years, predictive text algorithms have become commonplace across applications and electronic devices
  • Thanks to a collaboration with SwiftKey, I was able to use their curated text corpora to create my own predictive text application

(Image: an example of a common predictive text feature)

The Data

  • As previously mentioned, SwiftKey provided the data, drawn from a corpus of text called HC Corpora
  • The data used to train the model consists of web-scraped lines from three different sources: news pages, blog postings, and Twitter posts
  • The training set consists of 100,000 lines (40,000 from news sources, 40,000 from blog postings, and 20,000 from Twitter posts); a sketch of how such a sample could be drawn is shown below
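
The post does not reproduce the sampling code, but a minimal sketch along these lines would produce the 40,000/40,000/20,000 split described above. The file names assume the standard HC Corpora / SwiftKey layout (en_US.news.txt, etc.) and the seed is arbitrary:

    # Sample 40k news, 40k blog, and 20k Twitter lines into one training set
    set.seed(42)  # arbitrary seed, for reproducibility only

    news   <- readLines("final/en_US/en_US.news.txt",    skipNul = TRUE)
    blogs  <- readLines("final/en_US/en_US.blogs.txt",   skipNul = TRUE)
    tweets <- readLines("final/en_US/en_US.twitter.txt", skipNul = TRUE)

    train <- c(sample(news,   40000),
               sample(blogs,  40000),
               sample(tweets, 20000))

    writeLines(train, "train_100k.txt")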

Algorithm

  • For this text prediction model, I decided to use the Stupid Backoff algorithm, as implemented in the sbo package
  • The sbo_predictor() function does much of the heavy lifting when building the model, as its integrated arguments make it convenient to preprocess and filter the text data to a user’s needs
  • The model is built as an sbo_predtable() object, which is saved to disk as an .rds file rather than kept in physical memory; the Shiny app can then quickly recover/load it at startup
  • This means that model predictions are essentially instant within the app, with almost nonexistent lag or delay; a sketch of this workflow follows below
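
The exact model-building code is not shown in this post, but a minimal sketch with the sbo package could look like the following. The N-gram order, dictionary coverage, and file names are illustrative assumptions rather than the app's actual settings:

    library(sbo)

    train <- readLines("train_100k.txt")

    # Build the Stupid Backoff prediction table: a trigram model with a
    # dictionary covering ~75% of word occurrences (both values assumed
    # here), using the package's built-in preprocessing.
    pt <- sbo_predtable(train,
                        N = 3,                    # N-gram order (assumed)
                        dict = target ~ 0.75,     # dictionary coverage (assumed)
                        .preprocess = sbo::preprocess,
                        EOS = ".?!:;",            # end-of-sentence characters
                        L = 3L)                   # keep the top 3 predictions

    # Save the prediction table to disk so it stays out of physical memory...
    saveRDS(pt, "predtable.rds")

    # ...then, inside the Shiny app, load it and wrap it in a predictor:
    pt <- readRDS("predtable.rds")
    p  <- sbo_predictor(pt)
    predict(p, "thanks for the")  # returns the 3 most probable next words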

Using the App

  • To use the app, enter the sentence or phrase whose next word you wish to predict into the text box
  • From there, simply click the button to generate the next word, and the algorithm will return the three most probable words that follow the entered phrase; a minimal sketch of this interaction is included after this list
  • NOTE: This is a very basic, first-attempt implementation of a text prediction algorithm. There are certainly ways to improve the accuracy of such models that are outside the scope of this project
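
The app's source is not included in this post, but a minimal Shiny sketch of the interaction described above might look like the following; the UI labels and the predtable.rds file name are assumptions carried over from the earlier sketch, not the app's actual code:

    library(shiny)
    library(sbo)

    # Load the saved prediction table once at startup and build the predictor
    # (file name assumed from the earlier sketch).
    pt <- readRDS("predtable.rds")
    p  <- sbo_predictor(pt)

    ui <- fluidPage(
      titlePanel("Next-Word Prediction"),
      textInput("phrase", "Enter a sentence or phrase:"),
      actionButton("go", "Predict next word"),
      tableOutput("predictions")
    )

    server <- function(input, output) {
      # Compute the top-3 predictions only when the button is clicked
      preds <- eventReactive(input$go, {
        req(input$phrase)
        predict(p, input$phrase)
      })
      output$predictions <- renderTable({
        data.frame(Prediction = preds())
      })
    }

    shinyApp(ui, server)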