Coursera Data Science Final Capstone Project

Arezu A.
October 2016

This presentation describes the application built for the Data Science Capstone project from Johns Hopkins University.

The Objective

The goal of this project is to create a Shiny application that showcases the prediction algorithm that was built and provides an interface that others can access.

The app takes a phrase (multiple words) typed into a text box as input and outputs a prediction of the next word.

The Applied Methods

In this project, a sample file was built from three larger files: Twitter, News, and Blogs. The sample was cleaned by converting it to lowercase and removing punctuation, links, numbers, extra white space, and other special characters. The cleaned sample was then tokenized into n-grams.
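The cleaning and tokenizing steps above can be sketched as follows. This is a minimal illustration in Python, not the code used in the project: the function names `clean_text` and `ngrams`, the regular expressions, and the sample sentence are all assumptions made for the example.

```python
import re

def clean_text(text):
    """Lowercase the text and strip links, numbers, punctuation, and extra white space."""
    text = text.lower()
    # Remove links first, before punctuation removal breaks them apart.
    text = re.sub(r"(https?://|www\.)\S+", " ", text)
    # Replace everything that is not a lowercase letter or white space
    # (digits, punctuation, other characters) with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of white space into single spaces.
    return re.sub(r"\s+", " ", text).strip()

def ngrams(text, n):
    """Split cleaned text on spaces and return all n-word sequences as tuples."""
    tokens = text.split()
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Hypothetical input line, standing in for one line of the sample file.
sample = "Check THIS out: http://example.com 2 great days!!"
cleaned = clean_text(sample)   # "check this out great days"
bigrams = ngrams(cleaned, 2)   # first element: ("check", "this")
```

Note that this simple rule also splits contractions such as "don't" into two tokens; a real pipeline would likely need a more careful treatment of apostrophes.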

The aggregated bigrams, trigrams, and quadgrams were then converted into frequency dictionaries. The resulting data files are used to predict the next word of a partial phrase entered by the user in the application's text box.
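One way such frequency dictionaries could drive a prediction is sketched below in Python. The lookup strategy shown here (try the longest matching context first, then fall back to shorter ones) is an assumption, since the slides do not say how the three dictionaries are combined. The names `build_tables` and `predict_next` and the toy token list are hypothetical.

```python
from collections import Counter, defaultdict

def build_tables(tokens, orders=(2, 3, 4)):
    """For each n, map every (n-1)-word context to a Counter of the words that follow it."""
    tables = {n: defaultdict(Counter) for n in orders}
    for n in orders:
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            tables[n][context][next_word] += 1
    return tables

def predict_next(phrase, tables):
    """Return the most frequent follower of the longest known context, or None."""
    words = phrase.lower().split()
    # Assumed strategy: start with quadgrams, then trigrams, then bigrams.
    for n in sorted(tables, reverse=True):
        if len(words) < n - 1:
            continue  # the phrase is too short for this n-gram order
        context = tuple(words[-(n - 1):])
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None

# Toy corpus standing in for the cleaned sample data.
tokens = "thank you for the follow and thank you for the support".split()
tables = build_tables(tokens)
print(predict_next("thank you", tables))  # prints "for"
```

An unseen phrase returns `None` in this sketch; an application would probably substitute a default such as the most common word overall.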

The User Interface

The user interface accepts a partially entered phrase in the provided text box. The next-word prediction is then displayed along with the original phrase entered by the user:

[Figure: User Interface]

Additional Information