Ngram-based text prediction

Yudhanjaya Wijeratne
6/8/2020

Introduction

Text prediction is extraordinarily useful. It appears in smartphones, accessibility applications, search engines and many other settings where helping the user enter data improves quality of service.

This project demonstrates a simple implementation of text prediction that draws its patterns from a corpus of text provided by SwiftKey: news, blogs and Twitter.

After cleaning the text, we built unigram, bigram and trigram tables, and devised an algorithm that chooses where to search based on the length of the input query: if the query is a single word, it looks among bigram pairs; if it is longer, it looks through trigrams. In both cases it suggests the single word most likely to come next.
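
The lookup described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code behind the app: the function names build_tables and predict_next are hypothetical, the tokenisation is a plain lowercase split, and the fallback from trigrams to bigrams when a two-word context has never been seen is an assumption that the description above does not confirm.

```python
from collections import Counter, defaultdict


def build_tables(sentences):
    """Count which word follows each one-word and two-word context."""
    bigrams = defaultdict(Counter)   # word -> counts of the next word
    trigrams = defaultdict(Counter)  # (word1, word2) -> counts of the next word
    for sentence in sentences:
        words = sentence.lower().split()  # simplistic tokenisation (assumption)
        for i in range(len(words) - 1):
            bigrams[words[i]][words[i + 1]] += 1
        for i in range(len(words) - 2):
            trigrams[(words[i], words[i + 1])][words[i + 2]] += 1
    return bigrams, trigrams


def predict_next(query, bigrams, trigrams):
    """Suggest the most frequent next word, or None if nothing matches."""
    words = query.lower().split()
    if not words:
        return None
    if len(words) >= 2:
        # Longer queries: use the last two words as the trigram context.
        followers = trigrams.get((words[-2], words[-1]))
        if followers:
            return followers.most_common(1)[0][0]
    # Single-word queries use the bigram table. Falling back to it when the
    # trigram context is unseen is an assumption made for this sketch.
    followers = bigrams.get(words[-1])
    if followers:
        return followers.most_common(1)[0][0]
    return None


# Tiny made-up corpus, for illustration only.
corpus = [
    "i love new york",
    "i love new shoes",
    "i love new york city",
    "new york is big",
]
bigrams, trigrams = build_tables(corpus)
print(predict_next("new", bigrams, trigrams))
print(predict_next("i love new", bigrams, trigrams))
```

Storing the counts in dictionaries keyed by context keeps each prediction to one or two lookups, which fits the goal of staying light on resources.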

The demo in action, working in your browser

Goals

The purpose was to create an application that would be:

  • easy for anyone to use, with an intuitive interface
  • light on resources, so that it can be used in mobile phone applications
  • true to the language that people actually use and encounter in their daily lives

Usage

Using the app is simple: head on over to https://yudhanjaya.shinyapps.io/Simplewordprediction/

Enter a word or a set of words in the space.

Click the button.

And voilà! The program predicts your next word based on common language seen across the Internet. This is just a demo; it could be made far more sophisticated, faster, and easier to use.