Word Prediction Algorithm

Henrys Kasereka
December 26, 2020

Introduction

The Coursera Data Science Specialization Capstone project from Johns Hopkins University (JHU) allows students to create a usable public data product that can show their skills to potential employers. For this iteration of the class, JHU partnered with SwiftKey (http://swiftkey.com/en/) to apply data science in the area of natural language processing.

  • The goal of the capstone project was to build a Shiny application that takes an English sentence as input and predicts the next word. Predictors of this kind are well known, e.g. from smartphone keyboards. The task draws heavily on the wide and complex field of Natural Language Processing (NLP).
  • The data used to build the prediction model came from English texts drawn from Twitter, news, and blog sources.
  • My Shiny App for the capstone can be found at: https://henryskas.shinyapps.io/Data-Science-coursera-capstone-project-JHU/
  • To use the app, type your sentence in the left-hand panel and press the key on your keyboard that requests the next word prediction (NWP). The top five single-word predictions, if available, are displayed with their respective probabilities in the right-hand panel of the app.

Instructions and possible developments

  • Input can be entered.
  • Different topics can be explored.
  • Predictions are displayed.

Algorithm Development

The algorithm developed to predict the next word in a user-entered text string is based on a classic N-gram model, built from a subset of cleaned data taken from blog, Twitter, and news Internet files.
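To make the N-gram idea concrete, here is a minimal sketch in Python of how unigram, bigram, and trigram counts might be collected from cleaned text. The toy corpus, the tokenizer, and the names `tokenize` and `count_ngrams` are illustrative assumptions, not the code used for the capstone.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (apostrophes allowed)."""
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(sentences, n):
    """Count every run of n consecutive tokens within each sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Toy corpus standing in for the cleaned blog, Twitter, and news sample.
corpus = [
    "I love data science",
    "I love natural language processing",
    "Data science is fun",
]

unigrams = count_ngrams(corpus, 1)
bigrams = count_ngrams(corpus, 2)
trigrams = count_ngrams(corpus, 3)

print(bigrams.most_common(2))
```

Counting within each sentence, rather than across the whole corpus as one stream, keeps N-grams from spanning sentence boundaries.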

The Shiny Application

Using the algorithm, a Shiny (http://shiny.rstudio.com/) application was developed that accepts a phrase as input, suggests a word completion from the unigrams, and predicts the most likely next word by linear interpolation of trigrams, bigrams, and unigrams. The web-based application can be found at the URL given in the Introduction.

The Shiny Application

Use of the application is straightforward, and it can be easily adapted to many educational and commercial uses. As depicted below, the user begins by typing some text, without punctuation, in the supplied input box. As the user types, the text is echoed in the field below along with a suggested word completion. At the bottom of the screen, the predicted next word in the phrase is shown.
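One plausible way to produce the suggested word completion is to match the partially typed last word against the unigram table and return the most frequent match. The Python sketch below assumes this prefix-matching approach; the counts in `unigram_counts` are made up for illustration, and `complete_word` and `suggest_completion` are hypothetical names, not functions from the app.

```python
from collections import Counter

# Made-up unigram counts, for illustration only.
unigram_counts = Counter({"the": 50, "they": 12, "then": 9, "there": 7, "data": 5})


def complete_word(partial, counts, k=1):
    """Return up to k known words starting with `partial`, most frequent first."""
    partial = partial.lower()
    if not partial:
        return []
    matches = [(w, c) for w, c in counts.items() if w.startswith(partial)]
    # Sort by descending count, breaking ties alphabetically.
    matches.sort(key=lambda wc: (-wc[1], wc[0]))
    return [w for w, _ in matches[:k]]


def suggest_completion(text, counts):
    """Complete the last word of `text`, or return None if nothing applies."""
    # A trailing space means the last word is finished, so nothing to complete.
    if not text or text[-1].isspace():
        return None
    found = complete_word(text.split()[-1], counts)
    return found[0] if found else None


print(suggest_completion("I saw th", unigram_counts))
```

A linear scan over the vocabulary is enough for a sketch; a sorted list or a trie would be a natural replacement for a large unigram table.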