NLP PROJECT

Frank Murphy-Hernandez
2015-04-19

Introduction

The purpose of this presentation is to explain how we approached an NLP problem. This problem is the final assignment of the Capstone of the Data Science Specialization from Johns Hopkins University.

We want to develop a predictive data product: given some text, the product predicts the next word, in the same way that Google makes suggestions.

The data was supplied by SwiftKey. It is an English-language corpus of tweets, blogs, and news.

The Model

An n-gram is a string of n consecutive words, which we extract from the text we already have. We extract all the 1-grams, 2-grams, and 3-grams, and stop there because we want a light application that can run on a mobile device.
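
As an illustration of the extraction step, here is a minimal Python sketch that counts 1-grams, 2-grams, and 3-grams in a toy sentence. The function names `tokenize` and `count_ngrams`, the whitespace tokenizer, and the sample sentence are assumptions made for this example; they are not taken from the project itself.

```python
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately simple tokenizer)."""
    return text.lower().split()


def count_ngrams(tokens, max_n=3):
    """Return {n: Counter of n-word tuples} for n = 1..max_n."""
    counts = {}
    for n in range(1, max_n + 1):
        # Slide a window of n words over the token list.
        counts[n] = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


# Toy corpus (hypothetical); the real data is far larger.
tokens = tokenize("the cat sat on the mat and the cat slept")
ngrams = count_ngrams(tokens)

print(ngrams[1].most_common(2))
print(ngrams[2].most_common(2))
print(ngrams[3].most_common(2))
```

Storing the counts in one table per order keeps lookups cheap, which matters for an application meant to stay light.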

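We make the prediction with the Kneser-Ney smoothing model. It is a generative n-gram language model that estimates the conditional probability of a word given the words before it. It starts from the n-gram counts, subtracts a small discount from each observed count, and gives the freed probability mass to a lower-order estimate.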

In plain words, the model suggests the word with the highest empirical probability.
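
To make the idea concrete, here is a minimal sketch of interpolated Kneser-Ney restricted to 2-grams, followed by a top-k suggestion step. It is a simplification for illustration only: the model described here also uses 3-grams, and the names `train_bigram_kn` and `suggest`, the discount of 0.75, and the toy sentence are all assumptions rather than details of the project.

```python
from collections import Counter, defaultdict


def train_bigram_kn(tokens, discount=0.75):
    """Build an interpolated Kneser-Ney bigram model (illustrative sketch)."""
    bigrams = Counter(zip(tokens, tokens[1:]))
    context_totals = Counter()      # how often each word occurs as a context
    followers = defaultdict(set)    # distinct words seen after a context
    preceders = defaultdict(set)    # distinct contexts seen before a word
    for (prev, word), count in bigrams.items():
        context_totals[prev] += count
        followers[prev].add(word)
        preceders[word].add(prev)
    n_bigram_types = len(bigrams)

    def prob(word, prev):
        # Continuation probability: in how many distinct contexts does
        # `word` appear, relative to the number of distinct bigrams?
        p_cont = len(preceders.get(word, ())) / n_bigram_types
        total = context_totals[prev]
        if total == 0:
            return p_cont  # unseen context: fall back to continuation only
        discounted = max(bigrams[(prev, word)] - discount, 0) / total
        backoff_weight = discount * len(followers[prev]) / total
        return discounted + backoff_weight * p_cont

    return prob, set(tokens)


def suggest(phrase, prob, vocab, k=5):
    """Return the k most probable next words after the last word of phrase."""
    words = phrase.lower().split()
    if not words:
        return []
    prev = words[-1]
    # Sort by descending probability; break ties alphabetically.
    return sorted(vocab, key=lambda w: (-prob(w, prev), w))[:k]


# Toy corpus (hypothetical).
tokens = "the cat sat on the mat and the cat slept".split()
prob, vocab = train_bigram_kn(tokens)
print(suggest("I saw the", prob, vocab, k=5))
```

In this sketch the probabilities for a seen context sum to one over the vocabulary, so ranking the vocabulary by `prob` gives the suggestions directly.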

The Application

The application is here:

Project

Please give it a try.

Just enter a phrase and the app will suggest the five most likely next words.