Data Science Specialization Capstone Project

Diego Menin
21/04/2015

Developing a Natural Language Processing Model on Word Prediction
Coursera and SwiftKey Partnership

Goals

Build a predictive model of english text using Natural Language Processing Model and Text mining;
Build a shiny App that uses the model to predict the next word on a sentence;

The Data and the Model

The data used to build the model was provided by Coursera and consits of sentences extracted from Twitter, News feeds and blogs;
The model was buit using n-grams (1, 2, 3 and 4), which were stored using Markov chains;
A sentence is predicted by looking up it's last N words on the chain (recursively on the 4 gram, 3 gram and so on…) and the match with the highest frequency is returned;
A match with small frequency on a higher gram has more weight than a match with high frequency on a smaller gram.

The Application

App

The app is composed only by one input were you type your text;

This project is being developed in partnership with Swiftkey, which develops text prediction apps for cellphones - thus the cellphone backgroud picture;

The predicted word will be outputed on a button right under the text imput - this is to simulte the same behaviour we see on a cellphone;

The button where the predicted word is returned is for display pourpouses only (nothing is meant to happen if you click it);

Link to the App;

Important Considerations

If you are one of my colleages from Coursera evaluation the App and it is unavailable, please email me at dmenin@gmail.com - My Shiny App account is close to the free usage limit and has actually been locked once so I'm afraid it can happen again during evaluation time;

There is a checkbox called “advanced” which , uppon clicked, will output the TOP 3 predictions plus a graph with the grams and frequency.

Advanced