Capstone Project

Data Specialization

Vaibhav Bhatnagar

Capstone Project

The goal of the project is to create a "Word Prediction" app. Below are the main tasks to be performed on the raw data:

  • Data acquisition and cleaning: Read and clean the data
  • Exploratory analysis: Explore n-grams
  • Statistical modeling: Create a statistical model on the basis of the data exploration
  • Predictive modeling: Create a prediction model
  • Creating a data product: Deploy the final app to Shiny

Approach - Algorithm

  1. Create a dataset by sampling the raw files (Twitter 5%, news 3%, and blogs 3%)
  2. Calculate 2-grams and 3-grams
  3. Create a data frame of the n-grams and their frequencies
  4. Remove entries with a frequency of 1
  5. Prediction model...
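
The sampling, counting, and pruning steps above can be sketched as follows. This is a minimal illustration in Python, not the app's own code: the function names (sample_lines, ngram_counts, prune), the whitespace tokenizer, and the three-line toy corpus are all assumptions made for the example.

```python
import random
from collections import Counter

def sample_lines(lines, fraction, seed=0):
    """Keep roughly `fraction` of the lines, chosen at random (hypothetical helper)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]

def tokenize(text):
    """Lowercase and split on whitespace; real cleaning would do more than this."""
    return text.lower().split()

def ngram_counts(lines, n):
    """Count every run of n consecutive words across all lines."""
    counts = Counter()
    for line in lines:
        words = tokenize(line)
        for i in range(len(words) - n + 1):
            counts[tuple(words[i:i + n])] += 1
    return counts

def prune(counts, min_count=2):
    """Drop n-grams seen fewer than min_count times (frequency 1 by default)."""
    return {gram: c for gram, c in counts.items() if c >= min_count}

# Toy corpus standing in for the sampled Twitter, news, and blog lines.
corpus = [
    "thank you for the follow",
    "thank you for the help",
    "thank you so much",
]
bigrams = prune(ngram_counts(corpus, 2))
trigrams = prune(ngram_counts(corpus, 3))
```

On a real corpus, sample_lines would be called once per source with its own fraction (for example 0.05 for Twitter) before counting. Pruning the frequency-1 entries keeps the lookup tables small, at the cost of losing rare continuations.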

Predictive Model

  1. The predictive model starts with the highest-order n-gram available; in this case, the 3-gram.
  2. The last two words of the sentence are looked up in the 3-grams; if no match is found, the last word is looked up in the 2-grams.
  3. Once a match is found, the app lists the 4 words with the highest frequency, in decreasing order.
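
The lookup described above can be sketched as a simple back-off search. This is a hedged Python illustration rather than the app's implementation: the predict function, the dictionary layout (word tuples mapped to counts), and the example frequencies are assumptions made for the sketch.

```python
from collections import Counter

def predict(sentence, trigrams, bigrams, k=4):
    """Return up to k candidate next words, most frequent first.

    trigrams and bigrams map word tuples to counts (assumed layout).
    """
    words = sentence.lower().split()

    # Step 1: look up the last two words in the 3-gram table.
    if len(words) >= 2:
        context = tuple(words[-2:])
        found = Counter({g[2]: c for g, c in trigrams.items() if g[:2] == context})
        if found:
            return [w for w, _ in found.most_common(k)]

    # Step 2: back off to the last word in the 2-gram table.
    if words:
        last = words[-1]
        found = Counter({g[1]: c for g, c in bigrams.items() if g[0] == last})
        if found:
            return [w for w, _ in found.most_common(k)]

    # Nothing matched in either table.
    return []

# Made-up frequencies, for illustration only.
trigrams = {
    ("thank", "you", "for"): 5,
    ("thank", "you", "so"): 3,
    ("thank", "you", "very"): 2,
}
bigrams = {("you", "are"): 7, ("you", "for"): 5, ("for", "the"): 4}

print(predict("I want to thank you", trigrams, bigrams))  # ['for', 'so', 'very']
print(predict("see you", trigrams, bigrams))              # ['are', 'for']
```

The second call shows the back-off: "see you" has no 3-gram match, so the search falls through to the 2-grams that start with "you". Scanning the whole table on every call is fine for a sketch; a deployed app would more likely index the tables by context.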

How to use App

Enter your sentence in the box on the left and press the 'Submit' button. The predicted word will appear on the right side of the screen.