Data Science Capstone Project: Natural Language Processing - Text Prediction

Rowen Remis R. Iral
2015 April 19

Application of Natural Language Processing, Text Prediction

This project applies natural language processing (NLP) to build a Shiny app that predicts the next word a user is likely to type.

Features:

  • Accepts a word or phrase as input
  • Predicts likely next words, presented in ranked order
  • Runs in a web browser

Concept

The goal of the project is to apply natural language processing (NLP) to create a text prediction algorithm in a Shiny app, using the following sources as the basis for prediction:

  • Samples from Blogs
  • Samples from Tweets (Twitter)
  • Samples from News

Applied Concepts

The project applies NLP to split the sample text into n-grams: unigrams (single words), bigrams, trigrams and quadgrams (four-word sequences).

These n-grams form manageable chunks of processed information. Determiners and other stop words are not removed, so that the predictor can suggest the words that naturally occur in phrases and sentences.
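As a rough illustration of the n-gram step, the sketch below splits one sentence into unigrams through quadgrams while keeping stop words. It uses only base R so that it runs without the packages listed later; the helper names tokenize and make_ngrams and the sample sentence are assumptions made for this example, not the app's actual code.

```r
# Lower-case the text, replace everything except letters, apostrophes and
# spaces with a space, then split on whitespace. Stop words are kept.
tokenize <- function(text) {
  cleaned <- gsub("[^a-z' ]", " ", tolower(text))
  tokens <- strsplit(cleaned, "\\s+")[[1]]
  tokens[tokens != ""]
}

# Build every run of n consecutive tokens, joined by single spaces.
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts,
         function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical sample sentence standing in for the blog, tweet and news text.
tokens    <- tokenize("The quick brown fox jumps over the lazy dog.")
unigrams  <- make_ngrams(tokens, 1)
bigrams   <- make_ngrams(tokens, 2)
trigrams  <- make_ngrams(tokens, 3)
quadgrams <- make_ngrams(tokens, 4)
```

Here the stop word "the" appears twice among the unigrams, and the first bigram is "the quick", which shows how retaining stop words preserves ordinary phrase structure.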

The algorithm clears memory after each step to keep memory consumption low. The n-grams are used to rank candidate next words by probability, and the ranking determines the predicted word.
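One simple way to rank candidates, shown here only as a sketch, is to estimate each candidate's probability as its n-gram count divided by the total count of all n-grams that share the same prefix. The counts, the function name rank_next_words and the use of rm() with gc() to release memory are assumptions for illustration; the deck does not say how the app computes its probabilities or frees memory.

```r
# Toy trigram counts (hypothetical data, for illustration only).
trigram_counts <- c("thank you for"  = 6,
                    "thank you very" = 3,
                    "thank you so"   = 1)

# Rank candidate next words after a prefix by relative frequency.
rank_next_words <- function(counts, prefix) {
  hits <- counts[startsWith(names(counts), paste0(prefix, " "))]
  if (length(hits) == 0) return(numeric(0))
  probs <- hits / sum(hits)
  names(probs) <- sub(".* ", "", names(hits))  # keep only the final word
  sort(probs, decreasing = TRUE)
}

ranked <- rank_next_words(trigram_counts, "thank you")

# Drop the large intermediate table once the ranking is done,
# then ask R to run garbage collection.
rm(trigram_counts)
invisible(gc())
```

With these toy counts, "for" ranks first with an estimated probability of 0.6, followed by "very" and "so".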

Searching Function

The algorithm uses the grep function to search the n-grams for candidate words, which are then output as the predicted word.
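A minimal sketch of such a lookup with base R's grep is shown below. The n-gram vector, the function name predict_with_grep and the choice to match only n-grams that begin with the typed phrase are assumptions for this example; the app's real search may differ.

```r
# Hypothetical n-gram table, stored as plain strings.
ngrams <- c("i love you", "i love it", "love you too", "do you love me")

# Find n-grams that begin with the typed phrase and return their final words.
# Assumes the phrase contains only letters and spaces; regex metacharacters
# in user input would need escaping before being pasted into the pattern.
predict_with_grep <- function(ngrams, phrase) {
  pattern <- paste0("^", tolower(trimws(phrase)), " ")
  matches <- grep(pattern, ngrams, value = TRUE)
  unique(sub(".* ", "", matches))
}

candidates <- predict_with_grep(ngrams, "I love ")
```

For the input "I love ", the candidates are "you" and "it"; "do you love me" is skipped because the pattern is anchored to the start of each n-gram.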

Packages used in Constructing the App:

  • NLP - Natural Language Processing
  • qdap
  • RWeka - Weka
  • stringdist
  • tm - Text Mining

App URL

Location of ShinyApp for NLP:

Thank you.

Rowen Remis R. Iral