Word Prediction Application

Riya Sutaria
15/12/2020

Introduction

The Coursera Data Science Specialization Capstone project by Johns Hopkins University (JHU) asks students to create a public data product that showcases their skills to potential employers. For the capstone, JHU partnered with SwiftKey (http://swiftkey.com/en/) to apply data science to the field of Natural Language Processing (NLP).

The objective of this project was to build a model that predicts the next word following an input text. The data used to build the model came from the Coursera-SwiftKey dataset (https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip).

Algorithm Development

An n-gram model [1] forms the basis of the next-word prediction algorithm. Because the dataset is very large, only a small portion of it was used to compute the n-grams: unigrams, bigrams, and trigrams.
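The sampling and counting step described above can be sketched as follows. This is an illustrative Python sketch, not the author's implementation (the original application is an R/Shiny app). The tokenizer regex, the helper names `tokenize`, `sample_lines`, and `count_ngrams`, the seed, and the sampling fraction are all hypothetical choices; the toy corpus stands in for the real dataset.

```python
import random
import re
from collections import Counter


def tokenize(line):
    # Lowercase the line and keep runs of letters and apostrophes as tokens (a simplifying assumption)
    return re.findall(r"[a-z']+", line.lower())


def sample_lines(lines, fraction, seed=42):
    # Keep each line with probability `fraction`, so only a portion of a large corpus is processed
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def count_ngrams(lines, n):
    # Count every contiguous n-token sequence within each line, keyed by a tuple of tokens
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


corpus = [
    "I want to go home",
    "I want to eat now",
    "You want to go out",
]
sample = sample_lines(corpus, fraction=1.0)  # a real run would use a small, hypothetical fraction such as 0.01
unigrams = count_ngrams(sample, 1)
bigrams = count_ngrams(sample, 2)
trigrams = count_ngrams(sample, 3)
print(bigrams[("want", "to")])          # 3
print(trigrams[("want", "to", "go")])   # 2
```

Counting within each line (rather than across line boundaries) avoids creating n-grams that span unrelated sentences.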

To improve accuracy, smoothing was applied by combining the unigram, bigram, and trigram probabilities. Default predictions were generated using part-of-speech tagging (POST) [2].
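The text does not name the exact smoothing scheme. One common way to combine the three estimates is linear interpolation, P(w | w1, w2) = l3·P_tri(w | w1, w2) + l2·P_bi(w | w2) + l1·P_uni(w), with l1 + l2 + l3 = 1. The sketch below assumes that scheme with maximum-likelihood estimates; the weights (0.1, 0.3, 0.6) and the helper names are hypothetical, and in practice the weights would be tuned on held-out text.

```python
from collections import Counter


def ngram_counts(tokens, n):
    # Count contiguous n-token tuples in a single list of tokens
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prefix_count(counts, prefix):
    # Total occurrences of n-grams whose first len(prefix) tokens equal `prefix`
    k = len(prefix)
    return sum(c for gram, c in counts.items() if gram[:k] == prefix)


def interpolated_prob(word, context, uni, bi, tri, weights=(0.1, 0.3, 0.6)):
    # context = (w1, w2), the two preceding words; weights = (l1, l2, l3) are hypothetical and sum to 1
    l1, l2, l3 = weights
    w1, w2 = context
    total = sum(uni.values())
    bi_hist = prefix_count(bi, (w2,))
    tri_hist = prefix_count(tri, (w1, w2))
    # Maximum-likelihood estimates; an unseen history contributes 0 for that order
    p_uni = uni[(word,)] / total if total else 0.0
    p_bi = bi[(w2, word)] / bi_hist if bi_hist else 0.0
    p_tri = tri[(w1, w2, word)] / tri_hist if tri_hist else 0.0
    return l1 * p_uni + l2 * p_bi + l3 * p_tri


tokens = "i want to go home i want to eat".split()
uni, bi, tri = (ngram_counts(tokens, n) for n in (1, 2, 3))
p = interpolated_prob("go", ("want", "to"), uni, bi, tri)
print(round(p, 4))  # 0.1 * 1/9 + 0.3 * 1/2 + 0.6 * 1/2 = 0.4611
```

Because the lower-order terms are always included, a word never seen after "want to" still receives some probability from the bigram and unigram estimates, which is the point of smoothing.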

[1] https://en.wikipedia.org/wiki/N-gram

[2] https://en.wikipedia.org/wiki/Part-of-speech_tagging

The Shiny Application

Once the algorithm was developed, it was used to build a Shiny application that accepts a phrase as input and predicts the most likely next word based on the n-grams. The application is available at https://riyasutaria.shinyapps.io/app-10/.
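A minimal sketch of the phrase-in, word-out step, assuming a simple backoff lookup: try the last two words of the phrase, then the last word, then fall back to the most frequent word overall. This backoff rule is a deliberately simpler technique than the smoothing described earlier and is not the application's actual code; `build_model` and `predict_next` are hypothetical names, and a Shiny server would call something like `predict_next` whenever the input text changes.

```python
import re
from collections import Counter, defaultdict


def build_model(lines):
    # Map each 1- or 2-word context to a Counter of the words that follow it
    followers = defaultdict(Counter)
    unigrams = Counter()
    for line in lines:
        tokens = re.findall(r"[a-z']+", line.lower())
        unigrams.update(tokens)
        for i, word in enumerate(tokens):
            if i >= 1:
                followers[(tokens[i - 1],)][word] += 1
            if i >= 2:
                followers[(tokens[i - 2], tokens[i - 1])][word] += 1
    return followers, unigrams


def predict_next(phrase, followers, unigrams):
    # Back off from the last two words to the last word, then to the most frequent word overall
    tokens = re.findall(r"[a-z']+", phrase.lower())
    for n in (2, 1):
        if len(tokens) >= n:
            context = tuple(tokens[-n:])
            if context in followers:  # membership test does not create a new entry
                return followers[context].most_common(1)[0][0]
    return unigrams.most_common(1)[0][0] if unigrams else None


corpus = ["I want to go home", "I want to eat now", "You want to go out"]
followers, unigrams = build_model(corpus)
print(predict_next("Do you want to", followers, unigrams))  # go
```

Precomputing the follower counts once means each prediction is only a few dictionary lookups, which keeps an interactive app responsive.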

Application Usage

The application is well organised and easy to use, and it could be adapted to a range of educational and commercial uses. The figure below shows the application interface and how to use it: enter a phrase in the input text box, and the predicted word appears in the output text box below it.

[Figure: the Word Prediction application interface]