Data Science Capstone - JHU

Reinaldo Maciel
December 03, 2017

Introduction

  • The goal of this project is to develop a Natural Language Processing data product for SwiftKey that predicts the next word;
  • The source data is unstructured English-language text.

Algorithm and training data

The training data used was obtained from the following website: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip

The algorithm developed to predict the next word is based on a classic n-gram model (http://en.wikipedia.org/wiki/N-gram).

1. The text prediction algorithm builds a vocabulary of n-grams from the training data.

2. The n-grams are sorted in descending order of frequency.

3. The user input is split into n-grams in the same way.

4. The n-grams from the user input are compared with those in the model.

5. The matching n-gram with the highest frequency supplies the predicted next word.
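
The steps above can be illustrated with a minimal Python sketch. It is not the application's actual code: the function names (tokenize, build_model, predict_next), the maximum n-gram order of 3, the toy corpus, and the fallback to shorter contexts when a longer one has no match are all assumptions made for illustration.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def build_model(corpus, max_n=3):
    """Count n-grams up to max_n words (assumed order, not from the source).

    Each n-gram is stored as context -> Counter of next words, where the
    context is the tuple of the first n-1 words.
    """
    model = defaultdict(Counter)
    for line in corpus:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                *context, next_word = tokens[i:i + n]
                model[tuple(context)][next_word] += 1
    return model


def predict_next(model, text, max_n=3):
    """Return the most frequent next word for the end of the input text.

    Assumption: the longest context is tried first, and shorter contexts
    are used only when the longer one was never seen in training.
    """
    tokens = tokenize(text)
    for k in range(min(max_n - 1, len(tokens)), -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        candidates = model.get(context)
        if candidates:
            return candidates.most_common(1)[0][0]
    return None


# Toy corpus standing in for the real training data.
corpus = [
    "I love data science",
    "I love data analysis",
    "I love data science projects",
    "data science is fun",
]
model = build_model(corpus)
print(predict_next(model, "I love data"))  # prints "science"
```

Storing the counts per context means the highest-frequency match can be read off directly, which plays the same role as sorting the n-grams by frequency in step 2.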

The Application

The application is available at the following link:

https://rmaciel1988.shinyapps.io/jhu-datasciencecapstone/

Enjoy it! ;)