Next Word Predictor

Author: Vijay Mishra

This presentation pitches an application that predicts the next word a user is likely to type.

The Objective Statement

The main goal of this capstone project is to build a Shiny application that predicts the next word from text entered by the user.

The work was divided into seven subtasks, including data cleansing, exploratory data analysis, and the creation of a predictive model.

All text data used to build the frequency dictionaries, and thus to predict the next word, comes from a corpus called HC Corpora.

The Methods & Models

A sample of the corpus was then tokenized into so-called n-grams.

In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. (Source Link)
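To make the definition concrete, the following minimal sketch splits a sentence into word-level n-grams. It is written in Python for illustration only and is not the code behind the application; the helper names tokenize and ngrams, and the simple lowercase-and-split tokenization rule, are assumptions.

```python
import re

def tokenize(text):
    """Lowercase the text and split it into word tokens (a simplifying assumption)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = tokenize("Thanks for the follow")
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Here bigrams holds the three adjacent word pairs of the sentence and trigrams the two adjacent word triples.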

The aggregated bi-, tri-, and quadgram term frequency matrices were then converted into frequency dictionaries.
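One possible shape for such a frequency dictionary is sketched below in Python, purely as an illustration. The layout (each prefix mapped to counts of the words that follow it), the function name build_frequency_dictionary, and the toy trigrams are assumptions, not the structure used in the application.

```python
from collections import Counter, defaultdict

def build_frequency_dictionary(ngram_list):
    """Map each prefix (all words but the last) to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for gram in ngram_list:
        prefix, next_word = gram[:-1], gram[-1]
        table[prefix][next_word] += 1
    return table

# Toy trigrams standing in for an aggregated term frequency matrix.
sample_trigrams = [
    ("one", "of", "the"),
    ("one", "of", "the"),
    ("one", "of", "my"),
    ("a", "lot", "of"),
]
trigram_table = build_frequency_dictionary(sample_trigrams)
```

Looking up trigram_table[("one", "of")] then gives the observed continuations together with their counts, which is the information a prediction step needs.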

The resulting data.frames are used to predict the next word: the text entered by the user is looked up in the n-gram tables, and the prediction is based on the frequencies of the matching n-grams.
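A lookup of this kind could work roughly as in the Python sketch below. It is an illustration under stated assumptions, not the application's actual algorithm: the function predict_next_word, the nested-dictionary layout of tables, the toy counts, and in particular the strategy of falling back from longer to shorter contexts are all hypothetical.

```python
def predict_next_word(text, tables):
    """Return the most frequent continuation for the longest matching context, or None."""
    words = text.lower().split()
    # Try the longest context first: 3 words (quadgram table), then 2, then 1.
    for context_len in (3, 2, 1):
        if len(words) < context_len:
            continue
        context = tuple(words[-context_len:])
        candidates = tables.get(context_len, {}).get(context)
        if candidates:
            # Pick the continuation with the highest observed frequency.
            return max(candidates, key=candidates.get)
    return None

# Toy frequency tables keyed by context length (made-up counts).
tables = {
    3: {("thanks", "for", "the"): {"follow": 5, "help": 2}},
    2: {("for", "the"): {"first": 9, "follow": 5}},
    1: {("the",): {"same": 12, "first": 9}},
}
```

With these toy tables, the input "Thanks for the" is answered from the quadgram table, while "waiting for the" has no three-word match and falls back to the trigram table.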

The Usage of this Application

The user enters text (1); the field with the predicted next word (2) refreshes instantly, and the whole text input (3) is displayed as well.

Application Screenshot

About