Capstone Project: Prediction Model

12 December 2018

Introduction

This project uses natural language processing to predict the next word from the text entered by the user. Text prediction is becoming increasingly important as people spend more and more time on their mobile devices for email, social networking, banking and many other activities.

Text Prediction App

The app lets users enter text and predicts the most likely next word. Its prediction algorithm is based on a large data set that includes text from news, blogs and Twitter.

To use the app:

  1. Enter text and click Submit
  2. The prediction algorithm lists candidates for the next word, ranked by likelihood

It is that simple to use!

Prediction Algorithm

The prediction algorithm is built on the capstone dataset, which includes English-language entries from blogs, news and Twitter.

A random sample of the dataset is used, as not all the data is required to build a model. Often, relatively few randomly selected chunks can yield an accurate approximation to the results that would be obtained using all the data. The sample is then cleaned to ensure consistency.
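
The sampling and cleaning steps might look roughly like the Python sketch below. It is an illustration only, not the project's actual code: the function names `sample_lines` and `clean_line`, the 50% sampling fraction, the fixed seed and the specific cleaning rules (lowercasing, keeping only letters and apostrophes) are all assumptions made for the example.

```python
import random
import re


def sample_lines(lines, fraction, seed=42):
    """Keep each line independently with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]


def clean_line(line):
    """Lowercase, replace anything but letters, apostrophes and spaces, collapse whitespace.

    These cleaning rules are assumed for illustration; the original
    project may have cleaned its text differently.
    """
    line = line.lower()
    line = re.sub(r"[^a-z' ]+", " ", line)
    return re.sub(r"\s+", " ", line).strip()


# Tiny stand-in corpus; the real data set is far larger.
corpus = [
    "Hello, World! Visit us at 10am.",
    "The quick brown fox...",
    "It's a GREAT day, isn't it?",
]

subset = sample_lines(corpus, fraction=0.5)
cleaned = [clean_line(line) for line in subset]
```

Sampling line by line with a fixed probability avoids loading decisions that depend on the total file size, which is convenient when the source files are read as a stream.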

Appropriate tokens are identified, and n-grams are built from them. The prediction algorithm uses these n-grams, which capture how often individual words and word sequences occur in the data set.
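
As a rough illustration of how n-gram counts can drive next-word prediction, here is a minimal Python sketch. It assumes whitespace tokenization and a simple back-off from trigrams to bigrams to unigrams; the source does not say which n-gram orders or fallback strategy the app uses, so `build_ngrams`, `predict_next` and the toy sentences are hypothetical.

```python
from collections import Counter, defaultdict


def build_ngrams(sentences, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            model[context][tokens[i + n - 1]] += 1
    return model


def predict_next(models, text, k=3):
    """Return up to k candidate next words, most frequent first.

    Tries the longest available context first and backs off to shorter
    ones (an assumed strategy, chosen here for simplicity).
    """
    tokens = text.lower().split()
    for n in sorted(models, reverse=True):
        context = tuple(tokens[-(n - 1):]) if n > 1 else ()
        if len(context) == n - 1 and context in models[n]:
            return [word for word, _ in models[n][context].most_common(k)]
    return []


# Toy training text, already cleaned and lowercased.
sentences = [
    "i love new york",
    "i love new music",
    "i love you",
    "new york is big",
]
models = {n: build_ngrams(sentences, n) for n in (1, 2, 3)}

print(predict_next(models, "I love"))   # ['new', 'you']
print(predict_next(models, "big new"))  # ['york', 'music']
```

In the second call the trigram context "big new" never occurs in the toy text, so the sketch falls back to the bigram context "new" and ranks its followers by count.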

End

Thank you!