Coursera: Data Science

Capstone Presentation

This presentation is submitted in fulfilment of the requirements for the Johns Hopkins University Data Science Specialization Capstone coursework, as delivered through Coursera.org.

Objectives

The objectives of this project are:

Develop an algorithm that predicts the next word as a user enters a word or phrase
Present the prediction algorithm as a Shiny app

Process of Algorithm Development: 1

The algorithm is developed in the following steps:

Download Data and Build Corpus:
The provided corpus is randomly sampled to obtain training, validation and test sets.
Process Data and Create N-grams:
The training set is then cleaned (removal of HTML tags, email addresses, Twitter handles, punctuation, etc.) and tokenized into N-grams.
Create Text Model:
Develop N-gram frequency tables and a text model from them.
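
The three steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the code behind the app: the function names (`split_corpus`, `clean_text`, `ngram_counts`), the 60/20/20 split and the regular expressions are all assumptions made for the example.

```python
import random
import re
from collections import Counter


def split_corpus(lines, train=0.6, valid=0.2, seed=42):
    """Randomly split lines into training, validation and test sets.

    The 60/20/20 proportions are an assumption for this sketch.
    """
    shuffled = list(lines)
    random.Random(seed).shuffle(shuffled)
    n_train = int(len(shuffled) * train)
    n_valid = int(len(shuffled) * valid)
    return (
        shuffled[:n_train],
        shuffled[n_train:n_train + n_valid],
        shuffled[n_train + n_valid:],
    )


def clean_text(text):
    """Lowercase and strip HTML tags, emails, Twitter handles and punctuation."""
    text = text.lower()
    text = re.sub(r"<[^>]+>", " ", text)        # HTML tags
    text = re.sub(r"\S+@\S+\.\S+", " ", text)   # email addresses (before handles)
    text = re.sub(r"@\w+", " ", text)           # Twitter handles
    text = re.sub(r"[^a-z'\s]", " ", text)      # punctuation, digits, other symbols
    return re.sub(r"\s+", " ", text).strip()    # collapse repeated whitespace


def ngram_counts(lines, n):
    """Build a frequency table mapping each N-gram (a tuple of words) to its count."""
    counts = Counter()
    for line in lines:
        tokens = clean_text(line).split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


if __name__ == "__main__":
    corpus = ["I like tea.", "I like coffee!", "<b>Tea</b> is great @bob"]
    train_set, valid_set, test_set = split_corpus(corpus)
    print(ngram_counts(corpus, 2).most_common(3))
```

Emails are removed before Twitter handles so that the `@` inside an address is not mistaken for the start of a handle.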

Continues…

Process of Algorithm Development: 2

Build Prediction Function:
Build a function that uses the N-gram frequency tables and text model to predict the next word.
Model Validation and Prediction:
Predict the next word based upon the algorithm developed.
Accuracy and Remodel Algorithm:
Check the prediction accuracy and, if needed, revise the algorithm and predict again.
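
As a rough illustration of these steps, the Python sketch below builds a small N-gram model, predicts the next word, and measures accuracy on held-out sentences. The presentation does not say how the app handles unseen phrases, so the back-off to shorter contexts used here is an assumption, as are the function names (`build_model`, `predict_next`, `accuracy`) and the trigram limit.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=3):
    """Map each context (a tuple of up to max_n - 1 words) to next-word counts."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *context, word = tokens[i:i + n]
                model[tuple(context)][word] += 1
    return model


def predict_next(model, phrase, max_n=3):
    """Return the most frequent next word, backing off to shorter contexts.

    Assumed strategy: try the longest context first, then drop the
    leftmost word until a known context is found (the empty context
    falls back to the most frequent single word).
    """
    tokens = phrase.lower().split()
    for k in range(min(max_n - 1, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # the model is empty


def accuracy(model, sentences, max_n=3):
    """Fraction of words predicted correctly from the words that precede them."""
    hits = total = 0
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i in range(1, len(tokens)):
            total += 1
            if predict_next(model, " ".join(tokens[:i]), max_n) == tokens[i]:
                hits += 1
    return hits / total if total else 0.0


if __name__ == "__main__":
    training = ["i like green tea", "i like green apples", "i like green tea a lot"]
    model = build_model(training)
    print(predict_next(model, "like green"))
    print(accuracy(model, ["i like green apples"]))
```

If the accuracy on the validation set is too low, parameters such as `max_n` or the cleaning rules can be changed and the model rebuilt, which corresponds to the remodel step.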

Final Project

Instructions:

Type any phrase in the text box in the Shiny app.
The prediction algorithm behind the app will try to predict the next word.

The Shiny App

The app can be accessed from this link:
Next Word Prediction ShinyApp