8/28/2020

Project Background

In today’s connected world, we spend more and more time on our devices, whether for social media, email, or countless other reasons. Yet typing on many of these devices can be difficult and frustrating.

Through a joint project between Johns Hopkins University and SwiftKey, offered via Coursera, we have developed text-prediction software that can help solve this problem.

Model Description

The model sources text data from a variety of sources: blogs, news stories, and Twitter. The data has been cleaned and organized to create ranked n-grams, or word combinations, ranging from single words up to n-grams of five or more words.
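As a rough illustration (not the app's actual code), ranked n-grams can be built by counting every run of n consecutive words in the cleaned corpus and sorting by frequency. The corpus and cleaning steps below are placeholders.

```python
from collections import Counter

def ngrams(tokens, n):
    """Yield each run of n consecutive words from a list of tokens."""
    for i in range(len(tokens) - n + 1):
        yield tuple(tokens[i:i + n])

def ranked_ngrams(corpus_lines, n):
    """Count every n-gram in the corpus and return them ranked by frequency."""
    counts = Counter()
    for line in corpus_lines:
        tokens = line.lower().split()   # real cleaning would be more thorough
        counts.update(ngrams(tokens, n))
    return counts.most_common()

# Toy example corpus:
corpus = ["once upon a time there was a fox", "once upon a midnight dreary"]
print(ranked_ngrams(corpus, 3)[:3])     # "once upon a" appears twice, so it ranks first
```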

The Nextword app implements the Katz back-off model, which first tries to match the longest n-gram that has been typed, then ‘backs off’ to shorter n-grams until it finds a match.

For instance, if you have typed “Once upon a”, it will return “time”. However, if no 4-grams began with that phrase, it would back off and check whether any 3-grams match, and so on.
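The lookup itself can be sketched as below. This is a simplified back-off search, not the app's actual code: it omits Katz's discounting weights, and the table layout (a dictionary per n-gram order, keyed by the preceding words) is an assumption for illustration.

```python
def predict_next(typed_words, tables, max_n=5):
    """Try the longest available prefix first, then back off to shorter ones."""
    for n in range(min(max_n, len(typed_words) + 1), 1, -1):
        prefix = tuple(typed_words[-(n - 1):])      # last n-1 typed words
        candidates = tables.get(n, {}).get(prefix)  # (next_word, count) pairs, sorted by count
        if candidates:                              # a match at this n-gram order
            return candidates[0][0]                 # highest-count next word
    return "the"                                    # unigram fallback when nothing matches

# Example: "Once upon a" is matched against 4-grams first, then 3-grams, etc.
tables = {4: {("once", "upon", "a"): [("time", 42)]}}
print(predict_next(["once", "upon", "a"], tables))  # -> "time"
```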

Ease of use…

As you type, your text, along with the next predicted word, is clearly displayed for you to see.

And whenever you work with large amounts of data, there is a trade-off between accuracy and speed; the right balance must be found.
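One common way to strike that balance (an assumption here, not the app's documented approach) is to prune rare n-grams so the lookup tables stay small enough to search quickly, at a small cost in accuracy.

```python
def prune(ranked, min_count=2, top_k=100_000):
    """Keep n-grams seen at least min_count times, capped at top_k entries."""
    kept = [(gram, c) for gram, c in ranked if c >= min_count]
    return kept[:top_k]   # ranked is already sorted by count, so this keeps the most frequent

# Example: the singleton 3-gram is dropped
ranked = [(("once", "upon", "a"), 42), (("upon", "a", "time"), 40), (("a", "time", "machine"), 1)]
print(prune(ranked))
```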

Acknowledgements