9/1/2019

Summary

This document was rendered on September 11, 2019 at 22:43:10.

The capstone project covers 8 areas and produces a predictive text app. The steps are:

  1. Understanding the problem
  2. Data acquisition and cleaning
  3. Exploratory analysis
  4. Statistical modeling
  5. Predictive modeling
  6. Creative exploration
  7. Creating a data product
  8. Creating a short slide deck pitching your product

The datasets used are:

  1. Blogs
  2. News
  3. Twitter

Predictive Text App and How to use

Logic to build the predictive text app:

  1. extract.R : Reads the three input files, takes a 0.1% sample of the data, cleans it, tokenises it and creates a TermDocumentMatrix using the RWeka package. The unigram to quadgram tables are all stored as .RDS files.
  2. predict_next.R : Defines two functions: one cleans up the text input, and the other is the core function that runs the Katz back-off N-gram model to find the next term. The core function returns 5 possible next terms.
  3. server.R : The Shiny app server, which renders the 5 possible next terms as text.
  4. ui.R : The Shiny app UI, which takes text input in English and displays the predicted text.
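
The extraction step in item 1 can be sketched in base R as follows. This is a minimal illustration, not the contents of extract.R: the helper names (sample_lines, clean_text, count_ngrams) are hypothetical, base R string functions stand in for the RWeka tokeniser, plain named count vectors stand in for the TermDocumentMatrix, and the input lines are made up.

```r
# Illustrative sketch only: helper names are hypothetical and base R
# stands in for the RWeka tokeniser used by the real extract.R.

# Draw a random fraction of the lines (0.001 corresponds to the 0.1% sample).
sample_lines <- function(lines, frac = 0.001) {
  n <- max(1L, as.integer(round(frac * length(lines))))
  lines[sample.int(length(lines), n)]
}

# Lower-case the text and keep only letters, apostrophes and single spaces.
clean_text <- function(x) {
  x <- gsub("[^a-z' ]+", " ", tolower(x))
  trimws(gsub(" +", " ", x))
}

# Count n-grams of size n; returns a named integer vector, most frequent first.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(strsplit(clean_text(lines), " ", fixed = TRUE), function(w) {
    w <- w[nzchar(w)]
    if (length(w) < n) return(character(0))
    vapply(seq_len(length(w) - n + 1),
           function(i) paste(w[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  counts <- table(grams)
  sort(setNames(as.integer(counts), names(counts)), decreasing = TRUE)
}

# Made-up input lines standing in for the blogs, news and Twitter files.
corpus <- c("The cat sat on the mat.", "The cat ran off!", "A dog sat on the mat")

# Build unigram to quadgram tables and store each one as an .RDS file.
out_dir <- tempdir()
for (n in 1:4) {
  saveRDS(count_ngrams(corpus, n), file.path(out_dir, paste0("ngram_", n, ".RDS")))
}

# Inspect the three most frequent bigrams.
head(readRDS(file.path(out_dir, "ngram_2.RDS")), 3)
```

On the real corpus, sample_lines would be applied to each file before counting. The toy corpus here is too small for a 0.1% sample to be meaningful, so it is counted in full.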

How to use the app

The app predicts the next word based on what you type in the text box. All you need to do is start typing, and it will give you 5 suggestions. Please type only English words, as the app supports only English.
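
To make the "type text, get 5 suggestions" behaviour concrete, here is a simplified back-off lookup in base R. It is a sketch under stated assumptions, not the code in predict_next.R: it tries the longest matching n-gram context first and falls back to shorter ones, but it skips the probability discounting that full Katz back-off performs. The function name predict_next_sketch and the toy counts are hypothetical.

```r
# Toy n-gram counts, made up for illustration; the list names give the n-gram order.
ngrams <- list(
  `3` = c("thanks for the" = 5L, "for the follow" = 2L, "for the first" = 1L),
  `2` = c("for the" = 6L, "the best" = 4L, "the first" = 3L, "the follow" = 2L),
  `1` = c("the" = 10L, "for" = 6L, "best" = 4L, "first" = 3L, "follow" = 2L)
)

# Simplified back-off: rank by raw count within each order, highest order first.
# NOTE: this is not full Katz back-off (no discounting or back-off weights).
predict_next_sketch <- function(input, ngrams, k = 5) {
  # Clean the input the same way the n-gram tables are assumed to be cleaned.
  words <- strsplit(trimws(gsub("[^a-z' ]+", " ", tolower(input))), " +")[[1]]
  words <- words[nzchar(words)]
  out <- character(0)
  for (n in seq(max(as.integer(names(ngrams))), 1)) {
    if (length(out) >= k) break
    tab <- ngrams[[as.character(n)]]
    if (is.null(tab)) next
    if (n == 1) {
      # Last resort: the most frequent single words.
      cand <- names(sort(tab, decreasing = TRUE))
    } else {
      if (length(words) < n - 1) next
      # Match n-grams whose first n - 1 words equal the end of the input.
      prefix <- paste(tail(words, n - 1), collapse = " ")
      hits <- tab[startsWith(names(tab), paste0(prefix, " "))]
      cand <- sub(".* ", "", names(sort(hits, decreasing = TRUE)))
    }
    out <- unique(c(out, cand))
  }
  head(out, k)
}

predict_next_sketch("Thanks for the", ngrams)
# returns "follow" "first" "best" "the" "for"
```

Trigram matches come first, bigram matches fill the next slots, and the most frequent unigrams pad the list out to 5 suggestions.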

The key constraints while developing this project were the lack of an appropriately clean data set and the limited compute capacity available on shinyapps.io with a free account.

Shiny App

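# Capture a screenshot of the deployed app; webshot needs PhantomJS
# (install once with webshot::install_phantomjs()).
library(webshot)
webshot("https://arpagithub.shinyapps.io/PredictTextApp/", "r.png")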

All Links