Text_Prediction

Soubhagya Laxmi
22-12-2016

Overview

The goal of this project is to work with real-world text data and build a next-word prediction algorithm. This document describes the exploratory analysis and the goals for the eventual app and algorithm. It covers only the major features of the data I have identified and briefly summarizes my plans for creating the prediction algorithm and Shiny app, in a way that is understandable to a general audience.

The objective

This presentation serves as an introduction to an NLP application that predicts the next word. For the project, SwiftKey provided a large sample of text drawn from news articles, tweets, and blog posts.

To build the corpora, I used the following three files:

en_US.blogs.txt
en_US.news.txt
en_US.twitter.txt

Data Analysis

For the data analysis, I took a sample subset of the three files. From this sample, a corpus was created and then cleaned by converting the text to lowercase and removing punctuation, numbers, extra white space, non-alphabetic characters, and profanity.
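As an illustration of the cleaning steps listed above, here is a minimal Python sketch. It is not the code used for this project. The function name `clean_text` and the tiny `PROFANITY` set are hypothetical placeholders, since the report does not say which profanity list or tooling was used.

```python
import re

# Hypothetical placeholder list; the actual profanity list is not given in this report.
PROFANITY = {"darn", "heck"}

def clean_text(text, banned=PROFANITY):
    """Lowercase, strip non-letters, collapse white space, and drop banned words."""
    text = text.lower()
    text = text.replace("'", "")            # join contractions: "it's" -> "its"
    text = re.sub(r"[^a-z\s]", " ", text)   # punctuation, digits, other non-alpha -> space
    words = text.split()                    # split() also collapses runs of white space
    return " ".join(w for w in words if w not in banned)

print(clean_text("Hello, World!  It's 2016... darn"))
```

Deleting apostrophes before the other punctuation keeps contractions as a single token instead of splitting them into fragments such as "it" and "s".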

The corpus was then tokenized into n-grams, i.e., contiguous sequences of n words, resulting in the following n-gram sets: 2-grams, 3-grams, 4-grams, 5-grams and 6-grams.
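To make the idea of an n-gram concrete, the following Python sketch slides a window of n words over each cleaned sentence and counts how often each sequence occurs. The helper names `ngrams` and `count_ngrams` are assumptions made for this illustration and are not taken from the project code.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def count_ngrams(sentences, n):
    """Count n-gram frequencies across already-cleaned sentences."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.split(), n))
    return counts

bigram_counts = count_ngrams(["the cat sat", "the cat ran"], 2)
print(bigram_counts.most_common(1))
```

A sentence shorter than n words simply contributes no n-grams, so the same function can be reused for n = 2 through n = 6.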

The prediction model is simple: it takes the words the user has entered, compares them against the n-gram model, and predicts the next word from the matching n-grams.
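One way such a lookup could work is sketched below in Python. The report does not describe the matching rule, so this sketch assumes a simple back-off: try the longest available context first and fall back to shorter ones until a match is found. The names `build_model` and `predict_next`, and the toy training sentences, are hypothetical.

```python
from collections import Counter, defaultdict

def build_model(sentences, max_n=6):
    """Map each context (1 to max_n - 1 words) to a Counter of the words that follow it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model

def predict_next(model, text, max_n=6):
    """Assumed back-off rule: longest matching context wins; None if nothing matches."""
    tokens = text.lower().split()
    for k in range(min(max_n - 1, len(tokens)), 0, -1):
        context = tuple(tokens[-k:])
        if context in model:  # membership test does not create a new entry
            return model[context].most_common(1)[0][0]
    return None

model = build_model(["i love new york", "i love new shoes", "i love new york city"])
print(predict_next(model, "I love new"))
```

With this rule, an input whose full context was never seen (for example "brand new") still gets a prediction from the shorter context "new", while an input with no known words returns `None`.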

The application

To use the application, type a word, phrase, or sentence into the text box at the top left and click the Submit button. A predicted word will appear on the right side of the screen. The user can enter additional words or change the entry, and the app will respond to the new input. The application is available as a Shiny app.