Coursera Data Science Capstone: Final Project

AquilaT
December 17th, 2016

Presenting APP PREDICT, an application that predicts the next word using an n-gram model.

THE PROJECT SUMMARY

The final goal of the Data Science Capstone Project is to create a Shiny app built on a prediction algorithm that takes a word or a phrase (multiple words) as input and outputs a prediction of the next word.

In this capstone I applied data science to natural language processing (NLP) and built a predictive model based on n-grams, that is, sequences of n consecutive words.

I assume that the word that we are trying to predict depends on the word(s) that precede(s) it.
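This assumption can be written down as the usual n-gram approximation. The formula below is a sketch in standard notation, not taken from the original deck: w_i denotes the i-th word, n is the n-gram order, and C(...) is the number of times a word sequence occurs in the corpus. The relative-frequency estimate on the second line is an assumption made for illustration; the deck does not state which estimator the app uses.

```latex
P(w_i \mid w_1, \dots, w_{i-1}) \approx P(w_i \mid w_{i-n+1}, \dots, w_{i-1})

P(w_i \mid w_{i-n+1}, \dots, w_{i-1}) \approx
  \frac{C(w_{i-n+1}, \dots, w_{i-1}, w_i)}{C(w_{i-n+1}, \dots, w_{i-1})}
```

For a 3-gram model, for example, the next word is predicted from the two words immediately before it.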

The goal of this presentation is to highlight the prediction algorithm that I have built and to provide an interface that others can access.

  • This deck contains a description of the algorithm used to make the prediction.
  • This deck describes the app, gives instructions, and describes how it functions.
  • The link provided in this deck leads to a Shiny app with a text input box that runs on shinyapps.io.

ALGORITHM AND RELATED STEPS

In the course of creating APP PREDICT, I went through the following steps:

  • Loading the data and selecting the US dataset (written in English), which contains Internet blogs, Internet news, and Twitter messages.
  • Sampling the 3 files and building a corpus from the resulting subsets.
  • Cleaning the text: tokenization, stop-word removal, stemming, and profanity filtering.
  • Building the n-gram model.
  • Creating 2-gram, 3-gram, and 4-gram frequency matrices and organizing them into frequency dictionaries.
  • Building the predictive model from the frequency dictionaries.
  • Building the APP PREDICT Shiny app.
  • Deploying APP PREDICT at shinyapps.io.
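The frequency-dictionary step above can be sketched as follows. This is a minimal illustration in Python, not the code behind the app: the names tokenize, build_ngram_dictionary, and toy_corpus are hypothetical, the three-sentence corpus stands in for the sampled blog, news, and Twitter text, and stop-word removal, stemming, and profanity filtering are left out to keep the example short.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_dictionary(sentences, n):
    """Map each (n-1)-word prefix to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Slide a window of n tokens over the sentence.
        for i in range(len(tokens) - n + 1):
            prefix = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            table[prefix][next_word] += 1
    return table


# Hypothetical toy corpus (an assumption for illustration only).
toy_corpus = [
    "I love data science",
    "I love data analysis",
    "we love data science",
]

# One frequency dictionary per n-gram order: 2-grams, 3-grams, and 4-grams.
dictionaries = {n: build_ngram_dictionary(toy_corpus, n) for n in (2, 3, 4)}

# Words seen after the prefix "love data", most frequent first.
print(dictionaries[3][("love", "data")].most_common())
```

Keying each dictionary by the prefix makes the later lookup a single dictionary access: given the last words of the input, the candidate next words and their counts are retrieved directly.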

THE "APP PREDICT" SHINY APP AND ITS USER INTERFACE

[Image: screenshot of the APP PREDICT user interface]

INSTRUCTIONS FOR THE USER AND HOW THE ALGORITHM WORKS

INSTRUCTIONS FOR THE USER: THE USER NEEDS TO TAKE ONLY 2 STEPS:

  • Under the “Enter Your Word(s) in Below Box” title, the user types a word or phrase into the input box.
  • The user clicks the blue “PREDICT” button, located just below the input box.

HOW THE APP PREDICT ALGORITHM WORKS:

  • The app cleans the input and tokenizes the words.
  • The entered word(s) are passed to the prediction algorithm.
  • The prediction function searches the n-gram dictionaries for the top-ranked candidate next words.
  • The suggested next word is displayed in the Shiny app.
  • Other available predictions are displayed along with their likelihoods.
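The lookup described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's actual code: DICTIONARIES, clean_input, and predict_next are hypothetical names, the counts are invented, and the strategy of trying the longest prefix first and falling back to shorter ones is an assumption, since the deck does not say how the 2-gram, 3-gram, and 4-gram dictionaries are combined. Likelihoods are shown here as relative frequencies within the matching dictionary entry.

```python
import re
from collections import Counter

# Hypothetical frequency dictionaries: prefix tuple -> Counter of next words.
# The counts are made up for illustration.
DICTIONARIES = {
    4: {("i", "love", "data"): Counter({"science": 3, "analysis": 1})},
    3: {("love", "data"): Counter({"science": 4, "analysis": 2})},
    2: {("data",): Counter({"science": 5, "analysis": 3, "mining": 2})},
}


def clean_input(text):
    """Lowercase the input and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def predict_next(text, dictionaries=DICTIONARIES, top_k=3):
    """Return up to top_k (word, relative frequency) pairs, best first."""
    tokens = clean_input(text)
    # Try the highest-order dictionary first, then back off (assumed strategy).
    for n in sorted(dictionaries, reverse=True):
        prefix = tuple(tokens[-(n - 1):])
        if len(prefix) < n - 1:
            continue  # not enough input words for this n-gram order
        counts = dictionaries[n].get(prefix)
        if counts:
            total = sum(counts.values())
            return [(word, count / total)
                    for word, count in counts.most_common(top_k)]
    return []  # no dictionary contains a matching prefix


# The first pair is the suggested word; the rest are the alternatives.
for word, likelihood in predict_next("We really love DATA"):
    print(f"{word}: {likelihood:.2f}")
```

With this toy data, the input "We really love DATA" has no matching 4-gram prefix, so the lookup falls back to the 3-gram entry for "love data" and ranks "science" ahead of "analysis".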

APP PREDICT can be accessed and used at the following link: https://aquilat.shinyapps.io/appredict