Data Science Capstone Project: Next Word Prediction

Nils Gimpl

This presentation gives a short introduction to the next-word prediction app, which is part of JHU's final data science project.

Objective

The main goal of this capstone project is to build a Shiny application that predicts the next word of a user's text input.

The work was divided into seven tasks, including data cleaning, exploratory data analysis, and development of a predictive model.

All text mining and natural language processing was done with a variety of R packages.

Methods and Models

A data sample was created and then cleaned by converting it to lowercase and removing punctuation, links, extra white space, numbers, and special characters. The cleaned sample was then tokenized into n-grams.
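The cleaning and tokenization steps described above can be sketched roughly as follows. The project itself was built in R; this Python version is only an illustration, and the function names `clean_text` and `ngrams` as well as the exact regular expressions are assumptions rather than the author's code.

```python
import re

def clean_text(text):
    """Lowercase the text and strip links, numbers, punctuation and extra white space."""
    text = text.lower()
    text = re.sub(r"(https?://|www\.)\S+", " ", text)  # remove links
    text = re.sub(r"[^a-z\s]", " ", text)               # remove numbers, punctuation, special characters
    return re.sub(r"\s+", " ", text).strip()            # collapse runs of white space

def ngrams(text, n):
    """Split cleaned text on white space and return all n-grams as tuples of words."""
    words = text.split()
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

# Hypothetical input, used only to show the effect of each step.
cleaned = clean_text("Visit https://example.com NOW!!  It is 100% free.")
bigrams = ngrams(cleaned, 2)
print(cleaned)  # prints: visit now it is free
```

Replacing unwanted characters with a space instead of deleting them keeps neighbouring words from being fused together; the final substitution then collapses the extra spaces.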

Next, the bi-, tri-, and quadgrams were aggregated into frequency dictionaries. The resulting n-gram data frames are then used to predict the next word of the user's text input in the app.
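As a rough illustration of how frequency dictionaries can drive a prediction, the sketch below counts which word follows each context and looks up the most frequent one. It is written in Python rather than the project's R, and several parts are assumptions: the names `build_frequency_dicts` and `predict_next`, the toy corpus, and in particular the strategy of trying the longest context first and falling back to shorter ones, which the presentation does not spell out.

```python
from collections import Counter, defaultdict

def build_frequency_dicts(words, orders=(2, 3, 4)):
    """For each n-gram order, map every (n-1)-word context to a Counter of following words."""
    tables = {}
    for n in orders:
        table = defaultdict(Counter)
        for i in range(len(words) - n + 1):
            context = tuple(words[i:i + n - 1])
            table[context][words[i + n - 1]] += 1
        tables[n] = table
    return tables

def predict_next(tables, text):
    """Try the longest context first, then fall back to shorter ones (assumed strategy)."""
    words = text.lower().split()
    for n in sorted(tables, reverse=True):
        context = tuple(words[-(n - 1):])
        if len(context) == n - 1 and context in tables[n]:
            return tables[n][context].most_common(1)[0][0]
    return None  # no known context matches the input

# Hypothetical toy corpus, already cleaned and split into words.
corpus = "i want to go home i want to go out i want to sleep".split()
tables = build_frequency_dicts(corpus)
print(predict_next(tables, "I want to"))  # prints: go
```

With this toy corpus, an input such as "we have to" has no matching quadgram or trigram context, so the lookup falls back to the bigram context "to" and still returns "go".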

Usage

App screenshot

  • Enter your text into the text box.
