Coursera Data Science Capstone: Final Project

AquilaT
December 17th, 2016

Presenting APP PREDICT, an application that predicts the next word using an n-gram model.

APP PREDICT can be accessed and used with the following link: https://aquilat.shinyapps.io/appredict

THE PROJECT SUMMARY

The final goal of the Data Science Capstone Project is to create a Shiny app that uses a prediction algorithm: it takes a word or a phrase (multiple words) as input and outputs a prediction of the next word.

In this capstone I applied data science to natural language processing (NLP) and built a predictive model based on n-grams (sequences of n consecutive words).

The goal of this presentation is to highlight the prediction algorithm that I built and to provide an interface that others can access.

  • This deck describes the algorithm used to make the prediction.
  • This deck describes the app, gives instructions for using it, and explains how it functions.
  • The link provided in this deck leads to a Shiny app with a text input box that runs on shinyapps.io.

ALGORITHM AND RELATED STEPS

In the course of creating APP PREDICT I went through the following steps:

  • Loading the data and selecting the US dataset (written in English), which contains Internet blogs, Internet news, and Twitter messages.
  • Sampling the 3 files and building a corpus from the 3 subsets.
  • Text cleaning: tokenization, stopword removal, stemming, and profanity filtering.
  • Building the n-gram model.
  • Creating 2-gram, 3-gram, and 4-gram frequency matrices and organizing them into frequency dictionaries.
  • Building the predictive model using the frequency dictionaries.
  • Building the APP PREDICT Shiny app and deploying it at shinyapps.io.
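
To make the frequency-dictionary step concrete, here is a minimal sketch in Python. It is not the app's actual code: the three-sentence corpus and the names `tokenize` and `build_ngram_dictionary` are made up for illustration, and the cleaning is reduced to lowercasing and keeping letters (stopword removal, stemming, and profanity filtering are left out). It shows one possible way to turn 2-grams, 3-grams, and 4-grams into lookup tables that map a context to the counts of the words that follow it.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_ngram_dictionary(sentences, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = tokenize(sentence)
        # Slide a window of n tokens over the sentence.
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            table[context][next_word] += 1
    return table


# Hypothetical stand-in for the sampled blogs/news/Twitter corpus.
sample = [
    "I love data science",
    "I love data analysis",
    "We love data science",
]

# One frequency dictionary per n-gram order used in this deck.
dictionaries = {n: build_ngram_dictionary(sample, n) for n in (2, 3, 4)}

print(dictionaries[3][("love", "data")].most_common())
# -> [('science', 2), ('analysis', 1)]
```

Keying each table by its context means a prediction becomes a single dictionary lookup followed by a sort on counts.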

THE "APP PREDICT" SHINY APP AND ITS USER INTERFACE

[Image: the APP PREDICT user interface]

USER INSTRUCTIONS AND THE APP'S ALGORITHM

INSTRUCTIONS FOR THE USER: THE USER NEEDS TO TAKE ONLY 2 STEPS:

  • Under the “Enter Your Word(s) in Below Box” title, the user types a word or phrase into the input box.
  • The user clicks the blue “PREDICT” button, which is placed just below the input box.

HOW THE APP PREDICT ALGORITHM WORKS:

  • The app cleans the input and tokenizes it into words.
  • The entered word(s) are passed to the prediction algorithm.
  • The prediction function searches the n-gram dictionaries for the top predicted words.
  • The suggested next word is displayed in the Shiny app.
  • Other available predictions are displayed along with their likelihoods.
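
The lookup described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the app's implementation: the `TABLES` data and the `predict` function are hypothetical, the search is assumed to try the longest available context first and back off to shorter ones, and the "likelihood" is assumed to be the relative frequency of each candidate within the matched context.

```python
from collections import Counter

# Hypothetical toy dictionaries, keyed by context length:
# 1 = from 2-grams, 2 = from 3-grams, 3 = from 4-grams.
TABLES = {
    1: {
        ("data",): Counter({"science": 2, "analysis": 1}),
        ("love",): Counter({"data": 3}),
    },
    2: {("love", "data"): Counter({"science": 2, "analysis": 1})},
    3: {("i", "love", "data"): Counter({"science": 1, "analysis": 1})},
}


def predict(phrase, tables, top_k=3):
    """Return up to top_k (word, likelihood) pairs for the next word.

    Assumed strategy: try the longest context first, then back off
    to shorter contexts until a match is found.
    """
    tokens = phrase.lower().split()
    longest = min(max(tables), len(tokens))
    for size in range(longest, 0, -1):
        context = tuple(tokens[-size:])
        counts = tables.get(size, {}).get(context)
        if counts:
            total = sum(counts.values())
            # Likelihood here is the relative frequency within this context.
            return [(word, count / total) for word, count in counts.most_common(top_k)]
    return []  # No context matched, so there is no suggestion.


suggestions = predict("We really love data", TABLES)
print(suggestions[0][0])  # top suggestion: science
for word, likelihood in suggestions[1:]:
    print(f"alternative: {word} ({likelihood:.2f})")
```

In this example the three-word context "really love data" is not in the tables, so the search backs off to "love data". The first entry of the result corresponds to the suggested next word, and the remaining entries to the other options shown with their likelihoods.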

APP PREDICT can be accessed and used with the following link: https://aquilat.shinyapps.io/appredict