October 12, 2018

Introduction

This Data Science capstone project aims to develop an application that predicts the next word in a sentence, based on what the user has typed so far.

Through a partnership with SwiftKey, text corpora from Twitter, blogs, and news sites were made available for designing the model.

Algorithm

The data was first loaded and then pre-processed by:

  • Removing unwanted characters
  • Removing numbers and punctuation
  • Converting all text to lowercase
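
The cleaning steps listed above could be sketched roughly as follows. This is only an illustration in Python, not the code used by the application; the function name preprocess and the choice to treat every non-letter character as "unwanted" are assumptions.

```python
import re

def preprocess(text: str) -> list[str]:
    """Clean one line of raw text and split it into word tokens.

    Hypothetical helper: anything that is not a letter or whitespace
    (numbers, punctuation, symbols) is replaced by a space.
    """
    text = text.lower()                     # convert to lowercase
    text = re.sub(r"[^a-z\s]", " ", text)   # drop numbers, punctuation, other characters
    return text.split()                     # split on whitespace into tokens

tokens = preprocess("Hello, World! It is 2018 :)")
print(tokens)
```

Note that this simple rule also splits contractions such as "can't" into two tokens; a real pipeline would need to decide how to handle apostrophes.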

Prediction Model

From the processed data, n-grams were created (from one-grams up to five-grams), and for each n-gram the frequency of every word that follows it was calculated.
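
As a rough illustration of this counting step, the sketch below builds a table that maps each context (the preceding zero to four words) to the counts of the words seen after it. It is written in Python under stated assumptions: the function name build_tables, the dictionary layout, and the toy corpus are all hypothetical and not taken from the application.

```python
from collections import Counter, defaultdict

def build_tables(sentences, max_n=5):
    """Map each context of 0..max_n-1 words to a Counter of following words.

    An n-gram is split into its first n-1 words (the context) and its
    last word (the next word). The empty context holds plain word counts.
    """
    tables = defaultdict(Counter)
    for tokens in sentences:
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *context, next_word = tokens[i:i + n]
                tables[tuple(context)][next_word] += 1
    return tables

# Toy corpus of already tokenized sentences (made up for illustration).
corpus = [
    ["i", "love", "new", "york"],
    ["i", "love", "you"],
    ["i", "love", "you", "too"],
]
tables = build_tables(corpus)
print(tables[("i", "love")].most_common())
```

On the full corpora such a table can become very large, so pruning rare n-grams is a common design choice, although the source does not say whether this was done here.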

For every text input, the algorithm checks the highest-order n-gram it can use and, for that model, suggests to the user the next word with the highest frequency.
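
The lookup described above can be read as a simple back-off: try the longest context first and fall back to shorter ones until a match is found. The Python sketch below shows that idea on a small hand-written table. The name predict_next, the table contents, and the fallback to the most frequent single word are assumptions for illustration, not the application's actual implementation.

```python
from collections import Counter

# Hypothetical toy table: context (preceding words) -> next-word counts.
TABLES = {
    ("i", "love"): Counter({"you": 2, "new": 1}),
    ("love",): Counter({"you": 2, "new": 1}),
    ("new",): Counter({"york": 1}),
    (): Counter({"i": 3, "you": 2}),
}

def predict_next(tokens, tables, max_n=5):
    """Return the most frequent next word for the longest matching context."""
    longest = min(max_n - 1, len(tokens))
    for size in range(longest, -1, -1):
        # Take the last `size` typed words as the context (empty when size is 0).
        context = tuple(tokens[len(tokens) - size:])
        counts = tables.get(context)
        if counts:
            return counts.most_common(1)[0][0]
    return None  # nothing known at all, not even single-word counts

print(predict_next(["i", "love"], TABLES))
print(predict_next(["brand", "new"], TABLES))
print(predict_next(["hello"], TABLES))
```

In this toy table the first call matches the two-word context directly, the second backs off from "brand new" to "new", and the third finds no context and falls back to the overall word counts.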

Instructions

To use the application, the user starts typing an English sentence in the input box. The output box automatically shows the most likely next word, based on what has been typed.

Example: