Prediction of the Next Word

Gangathren Pillay
20 April 2016

Data Science Specialization Capstone Project

Introduction

The Next Word Prediction Shiny App predicts text using Natural Language Processing (NLP) techniques and works much like the predictive keyboards found on smartphones. The reference data used to train the prediction model consists of text from blogs, tweets and news articles.

Application Features

  • Predicts the next word as the user types a word or phrase, choosing the word that most frequently follows the preceding word(s)
  • Shows the 40 most likely next words as a word cloud, with each word sized in proportion to its probability
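The word-cloud feature can be sketched as follows. This is a minimal Python illustration, not the app's own (R/Shiny) code. It assumes the candidate next words are already available as a `Counter` of follow-on counts, treats each word's probability as its relative frequency among those candidates, and scales font size linearly so the most likely word is drawn at `max_size`. The function name `cloud_sizes` and the default size are hypothetical.

```python
from collections import Counter


def cloud_sizes(next_word_counts, top_n=40, max_size=40.0):
    """Return (word, probability, font_size) tuples for the top_n candidates.

    Hypothetical helper: probability is the word's relative frequency among
    all candidates, and font size is proportional to that probability, with
    the most likely word drawn at max_size.
    """
    total = sum(next_word_counts.values())
    if total == 0:
        return []
    top = next_word_counts.most_common(top_n)
    top_count = top[0][1]
    result = []
    for word, count in top:
        probability = count / total
        # count / top_count equals probability / (probability of the top word)
        font_size = max_size * count / top_count
        result.append((word, probability, font_size))
    return result


if __name__ == "__main__":
    counts = Counter({"the": 6, "a": 3, "my": 1})
    for word, p, size in cloud_sizes(counts):
        print(f"{word}: p={p:.2f}, size={size:.1f}")
```

Scaling relative to the top word keeps the largest word at a fixed size regardless of how spread out the probabilities are; a plotting library would then receive these sizes directly.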

Predictive Model

The key aspect of the algorithm is its use of n-grams. An n-gram is a contiguous sequence of n items (here, words) from a given text. Before the n-grams were created (unigrams, bigrams and trigrams for this study), the sample text corpus was cleaned by removing numbers and punctuation, converting to lowercase and stripping extra whitespace; the cleaned text was then tokenized.
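The cleaning and n-gram steps above can be sketched in a few lines of Python. This is an illustration only: the app itself is an R Shiny app and its exact cleaning rules are not given here. The sketch assumes English text, treats only ASCII punctuation as punctuation, and (for simplicity) counts n-grams across sentence boundaries. The function names are hypothetical.

```python
import re
import string
from collections import Counter


def clean_and_tokenize(text):
    """Lowercase, drop numbers and punctuation, then split into word tokens."""
    text = text.lower()                                               # transform to lowercase
    text = re.sub(r"\d+", "", text)                                   # remove numbers
    text = text.translate(str.maketrans("", "", string.punctuation))  # remove ASCII punctuation
    return text.split()  # split() also strips and collapses whitespace


def build_ngrams(tokens, n):
    """Count every contiguous run of n tokens (n = 1, 2, 3 for uni/bi/trigrams)."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


if __name__ == "__main__":
    tokens = clean_and_tokenize("I love NLP. I love it 100%!")
    tables = {n: build_ngrams(tokens, n) for n in (1, 2, 3)}
    print(tokens)
    print(tables[2].most_common(1))
```

Here the input reduces to the tokens `i love nlp i love it`, so the bigram `("i", "love")` is counted twice.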

The algorithm matches the words entered by the user against the one-, two- and three-word n-grams in the database and returns the most likely next words.
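The matching step can be illustrated with a simple backoff scheme. This is an assumption, since the source does not say how matches of different lengths are combined. The Python sketch below (hypothetical names, not the app's R code) looks up the last two input words among trigrams, falls back to the last word among bigrams, and finally returns the most frequent unigrams, ranking candidates by raw frequency as described in the feature list.

```python
from collections import Counter


def build_ngrams(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def predict_next(user_input, tables, k=3):
    """Return up to k next-word guesses using simple backoff.

    tables maps n (1, 2, 3) to a Counter of n-gram tuples. The last two
    input words are looked up among trigrams, then the last word among
    bigrams; if neither matches, the most frequent unigrams are returned.
    """
    words = user_input.lower().split()
    for n in (3, 2):
        if len(words) < n - 1:
            continue  # not enough input words for this context length
        context = tuple(words[-(n - 1):])
        candidates = Counter()
        for gram, count in tables[n].items():
            if gram[:-1] == context:
                candidates[gram[-1]] += count
        if candidates:
            return [word for word, _ in candidates.most_common(k)]
    return [gram[0] for gram, _ in tables[1].most_common(k)]


if __name__ == "__main__":
    corpus = "i love data i love nlp i love data science".split()
    tables = {n: build_ngrams(corpus, n) for n in (1, 2, 3)}
    print(predict_next("I love", tables))    # trigram match
    print(predict_next("you love", tables))  # backs off to bigrams
    print(predict_next("hello", tables))     # backs off to unigrams
```

Scanning every n-gram on each query keeps the sketch short; a production version would more likely index each table by its leading words so a lookup becomes a single dictionary access.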

Link to Application