January 4, 2017

Objective

  • The main goal of this capstone project is to build a Shiny application that predicts the next word.

  • The work was divided into seven subtasks, including data cleansing, exploratory analysis, building a predictive model with Good-Turing smoothing and a Katz back-off implementation, and building the user interface as a Shiny application.

  • All text mining and natural language processing was done with well-known R packages: tm (corpus handling) and RWeka, wordnet for the word cloud, and data.table.

The Applied Methods & Models

Data cleansing steps:

  • Sample 1% of the data from the three source documents (3 files).
  • Remove non-ASCII characters, punctuation, numbers, and stop words, then apply stemming.
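
The cleansing steps above can be sketched roughly as follows. This is a minimal, standard-library Python illustration, not the project's R code: the function names, the tiny stop-word list, the lowercasing step, and the crude suffix-stripping "stemmer" are all assumptions made for the example, standing in for what a text-mining package would provide.

```python
import random
import re
import string

# Tiny illustrative stop-word list (an assumption; real lists are far longer).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it"}


def sample_lines(lines, fraction=0.01, seed=42):
    """Keep roughly `fraction` of the lines, chosen at random."""
    if not lines:
        return []
    rng = random.Random(seed)
    k = max(1, round(len(lines) * fraction))
    return rng.sample(lines, k)


def naive_stem(word):
    """Crude suffix stripping, standing in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def clean_line(line):
    """Return the cleaned, stemmed tokens of one line of text."""
    # Drop non-ASCII characters.
    text = line.encode("ascii", "ignore").decode("ascii")
    # Lowercase so stop-word matching works (an assumption of this sketch).
    text = text.lower()
    # Remove numbers, then punctuation.
    text = re.sub(r"[0-9]+", " ", text)
    text = text.translate(str.maketrans("", "", string.punctuation))
    # Remove stop words, then stem what is left.
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return [naive_stem(t) for t in tokens]
```

For example, `clean_line("The café served 3 dogs, running in the park!")` yields `["caf", "serv", "dog", "runn", "park"]`, which shows both the intended effect and the roughness of naive stemming.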

Using N-gram Model
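
As a rough illustration of how an n-gram model can propose a next word, the sketch below counts which words follow each context and falls back to shorter contexts when the longer one was never seen. It is a simplified back-off in Python and does not apply Good-Turing discounting, so it should not be read as the Katz back-off implementation used in the project. The function names and the maximum order of 3 are assumptions for the example.

```python
from collections import Counter, defaultdict


def build_ngram_tables(tokens, max_n=3):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            context = tuple(tokens[i : i + n - 1])
            next_word = tokens[i + n - 1]
            tables[n][context][next_word] += 1
    return tables


def predict_next(tables, history):
    """Try the longest context first, backing off to shorter ones."""
    for n in range(max(tables), 0, -1):
        # The unigram table (n == 1) uses the empty context.
        context = tuple(history[-(n - 1):]) if n > 1 else ()
        followers = tables[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # nothing was ever counted


tokens = "i like green tea i like green apples i like black tea".split()
tables = build_ngram_tables(tokens)
print(predict_next(tables, ["i", "like"]))        # uses the trigram table
print(predict_next(tables, ["really", "black"]))  # backs off to the bigram table
```

Here the first call finds the context `("i", "like")` in the trigram table, while the second has no matching trigram context and falls back to the bigram context `("black",)`.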

Usage of the Application

Reference