07/10/2021

This app could help your business…

Does your business need to improve its technology game? The following text prediction app could help you do just that!

How

  • This simple-to-use application, hosted on ShinyApps.io, uses a large corpus of tweets, blogs and news articles to predict the next word in a sentence.

  • The app is designed to balance speed for the user against prediction accuracy

The Text Prediction App

Balancing data and speed…

Data

  • The course data was kindly supplied by SwiftKey in the form of numerous tweets, blogs and news articles
  • In total there were over four million lines of data to process, which made the balance between speed and accuracy all the more important

Speed

  • To make the app as quick as possible, the processing was completed using the following R packages:
    • quanteda: A text processing package used to create tokens and N-Gram data
    • data.table: A faster alternative to dplyr and data.frames in R
    • sqldf: Quickly matches input text to the N-Gram data using SQL queries
    • doParallel: Parallelises processing tasks to reduce run time

The algorithm behind the app…

Complexity reduction

  • All text provided by SwiftKey was processed initially and then reduced by keeping only N-Grams which occur more than 5 or 10 times in the text
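
The pruning step above can be sketched as follows. This is a minimal illustration in Python rather than the app's own R code; the toy corpus, the helper names `ngrams` and `prune`, and the choice of 5 as the threshold for this example are all assumptions.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def prune(counts, min_count):
    """Keep only the N-Grams seen more than min_count times."""
    return {gram: c for gram, c in counts.items() if c > min_count}

# Toy corpus (hypothetical); the real data runs to millions of lines.
lines = ["once upon a time"] * 7 + ["once upon a hill"] * 2

# Count every 4-gram, then drop the rare ones.
counts = Counter(g for line in lines for g in ngrams(line.split(), 4))
kept = prune(counts, min_count=5)
print(kept)
```

Here the 4-gram ending in "hill" appears only twice and is dropped, while the one ending in "time" appears seven times and is kept. Dropping rare N-Grams shrinks the lookup tables at the cost of losing some unusual predictions.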

Prediction process

The app is based on pre-computed N-Gram tables, from bigrams up to 5-Grams, containing prediction words and scores

  1. An input string is converted to tokens, with punctuation, stopwords etc. removed
  2. Each N-Gram table contains a string of preceding words in one column, a predictor word and a frequency / score, e.g. String: “once_upon_a”, Predictor: “time”, Score: n
  3. A backoff model is used to find the most likely next word prediction
    • Match the last four words of the input string to the 5-Gram table and select the most frequent word
    • If no match exists, back off to a lower-order N-Gram table, applying a Stupid Backoff score, and try to match again
    • If no matches are found at any order, the most popular unigrams are selected
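
The lookup described above can be sketched as follows. This is an illustrative Python version, not the app's R implementation, and it makes several assumptions: tokenisation is a plain lowercase whitespace split with no stopword removal, the function names `build_tables` and `predict` are hypothetical, and the discount `ALPHA = 0.4` is the value commonly used with Stupid Backoff rather than one stated for this app. It also simplifies by returning the top word from the highest-order table that matches, instead of scoring every candidate word.

```python
from collections import Counter, defaultdict

ALPHA = 0.4  # assumed discount applied once per failed lookup

def build_tables(lines, max_n=5):
    """Map n -> {context string -> Counter of next words}, for n = 2..max_n."""
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    unigrams = Counter()
    for line in lines:
        tokens = line.lower().split()
        unigrams.update(tokens)
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = "_".join(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables, unigrams

def predict(text, tables, unigrams, max_n=5):
    """Return (word, score) from the highest-order table that matches."""
    tokens = text.lower().split()
    steps = 0  # number of lookups that failed so far
    for n in range(max_n, 1, -1):
        if len(tokens) < n - 1:
            continue  # input too short for this table
        context = "_".join(tokens[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            word, count = followers.most_common(1)[0]
            return word, ALPHA ** steps * count / sum(followers.values())
        steps += 1
    # Nothing matched at any order: fall back to the most popular unigram.
    word, count = unigrams.most_common(1)[0]
    return word, ALPHA ** steps * count / sum(unigrams.values())

# Toy corpus (hypothetical).
corpus = [
    "once upon a time there was a king",
    "once upon a time in a land",
    "a long time ago",
]
tables, unigrams = build_tables(corpus)
print(predict("Once upon a", tables, unigrams))
print(predict("she sat upon a", tables, unigrams))
```

In the second call the 5-Gram and 4-Gram lookups both fail, so the match on "upon_a" in the trigram table is discounted twice. Keeping the context as a single underscore-joined string mirrors the table layout described above, where each lookup is one exact match on one column.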

The app itself…

Description

  • The application is hosted on ShinyApps.io and predicts the next word for a string entered by the user

How to Use the App

  • The app is simple to use: type a text string into the left navigation bar, and the predicted word will appear on the right-hand side
  • The prediction only updates once the refresh button on the left-hand side is clicked
  • Note that the app currently works only with English text

How to Find the App