Coursera Data Science Capstone Final Project

Abdelghani Elsagher
January 18th 2016

Introduction

This presentation introduces an application built for the capstone project of the Coursera Data Science Specialization, offered by Johns Hopkins University in cooperation with SwiftKey.

About the Capstone Project

The capstone project lets students build a usable, public data product that demonstrates their skills to potential employers. The project uses real-world data. The goal is a product that showcases a prediction algorithm through an app interface that others can easily use.

The Objective

The goal of the project is to build an application, based on real-world data, that takes a string of words as input and predicts the next word.

The prediction algorithm is built on a corpus of three documents containing text from blogs, news articles, and tweets.

The data used to build the dictionary for next-word prediction comes from the HC Corpora collection.

From this collection we used the following three files:

en_US.blogs.txt
en_US.news.txt
en_US.twitter.txt
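The raw files are large, and the cleaning step below works on "sample data", so a random subset of lines is typically drawn first. The following is a minimal Python sketch of reproducible line sampling, not the author's R code; the helper name `sample_lines`, the 1% rate in the usage note, and the fixed seed are all assumptions for illustration.

```python
import random

def sample_lines(lines, fraction, seed=1234):
    """Keep each line independently with probability `fraction`.

    Hypothetical helper: the project's actual sampling method and rate
    are not stated, so both are assumptions. `lines` can be any iterable
    of strings, such as an open text file.
    """
    rng = random.Random(seed)  # fixed seed so the same sample is drawn every run
    return [line.rstrip("\n") for line in lines if rng.random() < fraction]
```

For example, `with open("en_US.blogs.txt", encoding="utf-8", errors="ignore") as f: blogs = sample_lines(f, 0.01)` would keep roughly 1% of the blog lines; repeating this for the news and Twitter files gives the combined sample.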

Data Analysis and Manipulation

After the corpus was built from the HC Corpora data, the analysis showed that the text had to be cleaned before a prediction algorithm could achieve a high success rate.

The sample data was transformed by removing extra whitespace, numbers, punctuation, and profanity, and by converting the text to lower case. The processing relied on R natural language processing functions and techniques, primarily those in the “tm” package.
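The cleaning steps above can be sketched in Python with regular expressions. This is an illustration of the same transformations, not the author's tm pipeline; the two-word `PROFANITY` set is a hypothetical placeholder, since the profanity list actually used is not given. Like tm's default punctuation removal, it deletes punctuation characters outright, so "it's" becomes "its".

```python
import re

# Hypothetical placeholder; the project's real profanity list is not given.
PROFANITY = {"damn", "hell"}

def clean_text(text, profanity=PROFANITY):
    """Apply the cleaning steps described above to one line of text."""
    text = text.lower()                     # convert to lower case
    text = re.sub(r"\d+", "", text)         # remove numbers
    text = re.sub(r"[^\w\s]|_", "", text)   # remove punctuation characters
    # split() drops all runs of whitespace; rejoining with single spaces
    # removes extra whitespace, and the filter drops profane words.
    words = [w for w in text.split() if w not in profanity]
    return " ".join(words)
```

For example, `clean_text("Hello, World!  It's 2016 -- damn.")` returns `"hello world its"`.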

The cleaned dataset was then used to build three N-gram files:

Unigrams (single words)
Bigrams (two-word sequences)
Trigrams (three-word sequences)
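Building the three N-gram tables amounts to sliding a window of width 1, 2, or 3 over each cleaned line and counting what it sees. The Python sketch below, assuming whitespace-tokenized input from the cleaning step, shows the idea with `collections.Counter`; the function names are hypothetical, and the original R implementation may have differed.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return all n-grams (as tuples) in a list of tokens."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_ngram_tables(sentences, max_n=3):
    """Count unigrams, bigrams and trigrams over cleaned sentences.

    Returns {1: Counter, 2: Counter, 3: Counter}. N-grams are counted per
    sentence, so none of them spans two separate lines of the corpus.
    """
    tables = {n: Counter() for n in range(1, max_n + 1)}
    for sentence in sentences:
        tokens = sentence.split()
        for n, table in tables.items():
            table.update(ngrams(tokens, n))
    return tables
```

Counting per line keeps spurious N-grams (the last word of one tweet followed by the first word of the next) out of the tables.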

The Application

The application predicts the next English word from English input text. Its interactive interface refreshes the predicted word as text is entered.

To use the application, simply type in a word, phrase, or sentence. The app shows the top predicted next word. The user can enter additional words or change the entry, and the app responds to the new input.
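The presentation does not describe how the app chooses among the N-gram tables. One common approach is a simple back-off: match the last two typed words against the trigrams, fall back to the last word against the bigrams, and finally return the most frequent unigram. The Python sketch below shows that approach; the back-off rule is an assumption rather than the author's confirmed method, and the input is expected to be cleaned the same way as the corpus (only lower-casing is done here).

```python
from collections import Counter

def predict_next(text, tables):
    """Predict the next word with a simple back-off over n-gram counts.

    Assumes `tables` maps n -> Counter of n-gram tuples for n = 1, 2, 3.
    """
    words = text.lower().split()
    for n in (3, 2):
        if len(words) < n - 1:
            continue  # not enough typed words for this context length
        context = tuple(words[-(n - 1):])
        # Collect every word that followed this context, with its count.
        candidates = Counter({gram[-1]: count
                              for gram, count in tables.get(n, {}).items()
                              if gram[:-1] == context})
        if candidates:
            return candidates.most_common(1)[0][0]
    if tables.get(1):
        return tables[1].most_common(1)[0][0][0]  # most frequent single word
    return None
```

Scanning a whole table for each keystroke is fine for a sketch; a responsive app would index the tables by their context words in advance.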

The application is available as a Shiny App.