Word Prediction Application

Murtuza Ali Lakhani
January 20, 2016

Introduction

We are well into the era of big data and small devices, including wearables, smartphones, and tablets.
A drawback of small devices is the lack of a traditional keyboard for text and data entry.
Providing users with preemptive assistance in typing can help alleviate the shortcoming of small devices.
This application is a first step toward creating a knowledge-based algorithm that predicts the next word, thereby making text and data entry easier for users.

Methodology

The foundation of this algorithm is a given corpus consisting of blogs, news articles, and Twitter feeds.

A small random sample is drawn from this corpus to build a training set, which forms the basis for training the model.
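The sampling step can be illustrated with a minimal Python sketch. It is not the application's actual code: the function name `sample_lines`, the 5% fraction, the seed, and the stand-in corpus are all assumptions chosen for illustration.

```python
import random

def sample_lines(lines, fraction, seed=0):
    """Keep each line independently with probability `fraction`.

    `fraction` and `seed` are illustrative parameters, not values
    taken from the application.
    """
    rng = random.Random(seed)  # seeded so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]

# Stand-in corpus; the real corpus would be read from the blog,
# news, and Twitter text files.
corpus = [f"line {i}" for i in range(1000)]
training = sample_lines(corpus, fraction=0.05, seed=42)
```

Fixing the seed makes the training set reproducible, which helps when comparing model changes against the same sample.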

Unigrams and trigrams are extracted, and the probabilities of the terms are captured in transition matrices.
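As a rough sketch of this step, the Python below counts trigrams and converts the counts into conditional probabilities of the third word given the two preceding words. The whitespace tokenizer, the function names, and the nested-dictionary layout (standing in for a transition matrix) are assumptions for illustration, not the application's implementation.

```python
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()

def ngrams(tokens, n):
    """Return all contiguous n-word tuples in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def transition_probabilities(sentences):
    """Map each two-word context to {next_word: P(next_word | context)}."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for w1, w2, w3 in ngrams(tokenize(sentence), 3):
            counts[(w1, w2)][w3] += 1
    probs = {}
    for context, followers in counts.items():
        total = sum(followers.values())
        probs[context] = {w: c / total for w, c in followers.items()}
    return probs

sentences = ["i love new york", "i love new music", "i love new york"]
probs = transition_probabilities(sentences)
# probs[("love", "new")] holds "york" with probability 2/3
# and "music" with probability 1/3.
```

Each row of the dictionary sums to 1, which is the property a row of a transition matrix must have.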

A trigram Markov model reads these probability matrices to determine the word most likely to follow a given word, phrase, or sentence.
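A trigram predictor of this kind might look like the following Python sketch. It conditions on the last two words of the input and, when that context was never seen, falls back to the most frequent single word. The class name, the fallback rule, and the toy training sentences are assumptions; the source does not describe how the application handles unseen contexts.

```python
from collections import Counter, defaultdict

class TrigramPredictor:
    """Toy trigram Markov model with a unigram fallback (illustrative only)."""

    def __init__(self):
        self.trigram = defaultdict(Counter)  # (w1, w2) -> counts of w3
        self.unigram = Counter()             # overall word counts

    def train(self, sentences):
        for sentence in sentences:
            tokens = sentence.lower().split()
            self.unigram.update(tokens)
            for i in range(len(tokens) - 2):
                self.trigram[(tokens[i], tokens[i + 1])][tokens[i + 2]] += 1

    def predict(self, text):
        """Return the most likely next word, or None if the model is empty."""
        context = tuple(text.lower().split()[-2:])
        followers = self.trigram.get(context)
        if followers:
            return followers.most_common(1)[0][0]
        if self.unigram:
            # Assumed fallback: most frequent word in the training data.
            return self.unigram.most_common(1)[0][0]
        return None

model = TrigramPredictor()
model.train(["the cat sat on the mat", "the cat sat on the sofa", "the cat ran"])
seen = model.predict("yesterday the cat")  # context ("the", "cat") was seen
unseen = model.predict("zebra")            # unseen context, uses the fallback
```

Only the final two words of the input matter, which is the Markov assumption: earlier words in the phrase or sentence do not affect the prediction.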

This prediction model is connected to a user interface that enables web-based interaction and publication.

Data Product: Word Prediction

The word prediction application can be found at: https://alakhani.shinyapps.io/PredApp/

This application is available 24 x 7. The product has three tabs to guide users:
1. The PREDICTION RESULT tab displays the word predicted from the text the user submits.
2. The DESCRIPTION tab provides a brief description of the application.
3. The HOW TO USE THE TOOL tab gives step-by-step instructions for using the application.

This application is not yet perfect. In many cases, the predictions do not match the expected results, which leaves room for continued improvement.

Why this Matters

This word prediction application leverages the power of big data and machine learning to address a shortcoming of small devices.

By predicting the next word, this application is a humble attempt at enhancing user experience and effectiveness.

Ideas and inspiration for this project came from several sources, including:
1. NLP course at Stanford/Coursera.
2. Data Science certification material at JHU/Coursera.
3. Work of exceptional previous learners, such as Dormantroot, TomFonte, Ivanliu, and others.