Capstone Data Science Specialisation: The TextPredictR App

07/10/2021

This app could help your business…

Does your business need to improve its technology game? The following text prediction app could help you do just that!

This simple-to-use application, hosted on shinyapps.io, uses a wide corpus of tweets, blogs and news articles to predict the next word in a sentence.
The app has been designed to balance speed for the user with prediction accuracy

The Text Prediction App

The course data was kindly supplied by SwiftKey in the form of numerous Tweets, Blogs and News articles
In total there were over four million lines of data to process, which made the balance between speed and accuracy all the more important

To make the app as quick as possible, the processing was completed using the following R packages:
- quanteda: A text processing package used to create tokens and N-Gram data
- data.table: A faster alternative to dplyr and data.frames in R
- sqldf: Quickly matches input text to the N-Gram data using SQL queries
- doParallel: Parallelises processing tasks to speed up run time

All text provided by SwiftKey was processed first and then reduced by keeping only the N-Grams that occur more than 5 or 10 times in the text
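The pruning step can be illustrated with a minimal sketch. The app itself is built in R; the Python below is only an illustration, and the function names, toy corpus and threshold of 1 are assumptions made for the example (the app uses thresholds of 5 or 10).

```python
from collections import Counter


def ngram_counts(token_lists, n):
    """Count every n-gram (as a tuple of tokens) across tokenised lines."""
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def prune(counts, min_count):
    """Keep only the n-grams that occur more than min_count times."""
    return {gram: c for gram, c in counts.items() if c > min_count}


# Toy corpus standing in for the tweets, blogs and news lines (hypothetical).
corpus = [
    ["once", "upon", "a", "time"],
    ["once", "upon", "a", "hill"],
    ["upon", "a", "time"],
]

# Illustrative threshold; rare bigrams such as ("a", "hill") are dropped.
bigrams = prune(ngram_counts(corpus, 2), min_count=1)
```

Dropping rare N-Grams shrinks the lookup tables considerably, which is where most of the speed gain comes from, at the cost of losing predictions for unusual phrases.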

The app is based on pre-computed N-Gram tables, from bigrams up to 5-Grams, containing prediction words and scores

An input string is converted to tokens, with punctuation, stopwords, etc. removed
Each N-Gram table contains strings of words in one column, a predictor word and a frequency / score - e.g. String: “once_upon_a”, Predictor: “time”, Score: n
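As a rough sketch of the lookup described above (Python for illustration only, since the app is written in R): the input is cleaned into tokens, the last few tokens are joined with underscores to form the string column, and the highest-scoring predictor is returned. The helper names, the regular expression and the scores in the toy table are all assumptions, not the app's actual code or data.

```python
import re


def clean_tokens(text, stopwords=frozenset()):
    """Lowercase the text, drop punctuation, and remove words in `stopwords`."""
    words = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return [w for w in words if w not in stopwords]


def lookup_key(tokens, n_words):
    """Join the last n_words tokens with underscores, as in the string column."""
    return "_".join(tokens[-n_words:])


# Hypothetical 4-Gram table: string -> list of (predictor, score) rows.
fourgram_table = {
    "once_upon_a": [("time", 12), ("hill", 1)],
}

tokens = clean_tokens("Once upon a...")
key = lookup_key(tokens, 3)
# Pick the row with the highest score, or None when the string is not in the table.
best = max(fourgram_table.get(key, []), key=lambda row: row[1], default=None)
```

Storing the preceding words as a single underscore-joined key keeps each lookup to one exact match, which suits both SQL queries and keyed tables.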
A backoff model is used to find the most likely next word prediction
- Match the last four words of the input string to the 5-Gram table and select the most frequent word
- If no match exists, back off to the next lower-order N-Gram table, apply a stupid backoff score, and try to match again
- If no matches are found at any order, the most popular unigrams are selected
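The backoff steps above can be sketched as follows. This is a simplified illustration in Python rather than the app's R code: the table layout, the scores, the `predict` function and the discount of 0.4 (a value commonly quoted for stupid backoff) are assumptions. The sketch also stops at the first order that matches, whereas a fuller implementation could compare discounted candidates across orders.

```python
ALPHA = 0.4  # discount applied per backoff step; assumed value


def predict(tokens, tables, top_unigrams, max_order=5):
    """Return (word, score) from the highest-order table that has a match.

    tables[n] maps a context of n-1 underscore-joined words to {word: score}.
    top_unigrams is a list of (word, score) pairs, most popular first.
    """
    discount = 1.0
    for n in range(max_order, 1, -1):
        context_len = n - 1
        if len(tokens) < context_len:
            continue  # not enough input words to consult this table
        context = "_".join(tokens[-context_len:])
        candidates = tables.get(n, {}).get(context)
        if candidates:
            word = max(candidates, key=candidates.get)
            return word, discount * candidates[word]
        discount *= ALPHA  # no match: back off and penalise the lower order
    word, score = top_unigrams[0]  # final fallback: most popular unigram
    return word, discount * score


# Hypothetical tables with made-up scores.
tables = {
    3: {"upon_a": {"time": 0.9, "hill": 0.1}},
    2: {"a": {"lot": 0.5, "time": 0.3}},
}
top_unigrams = [("the", 0.05)]

word, score = predict(["once", "upon", "a"], tables, top_unigrams)
```

Here the 4-Gram lookup for "once_upon_a" finds nothing, so the model backs off once and takes the best trigram match for "upon_a", with its score multiplied by the discount.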

The application is hosted on shinyapps.io and predicts the next word for a user-supplied input string

It is simple to use: just type a text string into the left navigation bar, and the predicted word will appear on the right-hand side
The prediction only updates once the button on the left-hand side is clicked
Note that the app currently supports English only

The application can be run from the following address: https://ma3pab.shinyapps.io/TextPredictR/