JHU - DataScience Capstone Presentation

Francisco Gonzalez Alonso
2016/12/30

In these slides I present my Data Science Capstone application, which predicts the next word.

The Target

The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others.

For this project I submitted:

The Data

The data comes from a corpus called Corpus Download. It is used to build a frequency dictionary of n-grams (unigrams, bigrams and trigrams) and to predict the next word.
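The n-gram frequency dictionary can be sketched as follows. The application itself was built in R (with tm and RWeka). This is a minimal Python sketch, not the author's code. It assumes a simple lower-casing, letters-only tokenizer and counts n-grams within each line of the corpus, so no n-gram spans two lines. The function names are hypothetical.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed cleaning step: lower-case and keep runs of letters/apostrophes as words
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(tokens, n):
    # Count every contiguous sequence of n tokens, joined with a single space
    return Counter(" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def build_dictionary(lines):
    # Hypothetical helper: one frequency Counter each for unigrams, bigrams and trigrams
    unigrams, bigrams, trigrams = Counter(), Counter(), Counter()
    for line in lines:
        tokens = tokenize(line)
        unigrams.update(ngram_counts(tokens, 1))
        bigrams.update(ngram_counts(tokens, 2))
        trigrams.update(ngram_counts(tokens, 3))
    return unigrams, bigrams, trigrams


# Toy corpus (illustrative only, not the real data)
corpus = ["I want to eat", "I want to sleep", "you want to eat"]
uni, bi, tri = build_dictionary(corpus)
print(bi["want to"])  # 3
```

Keeping the three tables separate makes each lookup a direct search in a single frequency table. This mirrors the unigram/bigram/trigram split described in these slides.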

The Applied Methods and Models

All text mining and natural language processing was done with the following R packages: NLP, tm, rJava, RWeka, SnowballC and ggplot2.

The key idea behind my prediction application is to use unigrams, bigrams and trigrams to find the best fit for the next word. I generated these n-gram tables, and the application searches them for the words typed in the input field.

The application returns:

  • The first 5 trigrams.
  • The first 5 bigrams.
  • The first 5 unigrams.

These results are shown when they are available. Otherwise, the application shows “WITHOUT RECOMMENDATIONS”.
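The lookup can be sketched in Python as follows. This is not the author's R implementation. It assumes the n-gram tables are frequency counters keyed by space-joined words and that matches are ranked by frequency. It also assumes the unigram list is simply the most frequent words, whatever the input. `predict` and `candidates` are hypothetical names.

```python
from collections import Counter

NO_RESULT = "WITHOUT RECOMMENDATIONS"


def candidates(table, prefix, k=5):
    # Collect the last word of every n-gram whose leading words equal `prefix`,
    # then keep the k most frequent (frequency ranking is an assumption)
    hits = Counter()
    for ngram, count in table.items():
        words = ngram.split(" ")
        if words[:-1] == prefix:
            hits[words[-1]] += count
    return [word for word, _ in hits.most_common(k)]


def predict(text, unigrams, bigrams, trigrams, k=5):
    # Hypothetical helper: first k trigram, bigram and unigram predictions
    words = text.lower().split()
    result = {
        # Trigrams are matched on the last two input words
        "trigram": candidates(trigrams, words[-2:], k) if len(words) >= 2 else [],
        # Bigrams are matched on the last input word
        "bigram": candidates(bigrams, words[-1:], k) if words else [],
        # Unigrams have an empty prefix, so this returns the k most frequent words
        "unigram": candidates(unigrams, [], k),
    }
    # An empty level is replaced by the application's "no match" message
    return {level: hits or [NO_RESULT] for level, hits in result.items()}


# Toy frequency tables (illustrative data, not from the real corpus)
uni = Counter({"want": 3, "to": 3, "i": 2, "eat": 2, "sleep": 1, "you": 1})
bi = Counter({"want to": 3, "i want": 2, "to eat": 2, "to sleep": 1, "you want": 1})
tri = Counter({"i want to": 2, "want to eat": 2, "want to sleep": 1, "you want to": 1})

print(predict("I want to", uni, bi, tri))
```

For an input with no known context, such as "hello world", the trigram and bigram levels both fall back to “WITHOUT RECOMMENDATIONS”, while the unigram level still returns the most frequent words.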

The Usage Of The Application

You only need to enter one or more words in the “Input your text” field, and the application does the rest.

It's clear, easy and simple.

Application Screenshot

The “Predicted word” field shows the best trigram, bigram and unigram result, in that order, when they are available.
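One way to read this ordering is as a small Python sketch, assuming each n-gram level has already produced a frequency-ranked list of candidate words. This is an illustration, not the author's R code. `predicted_words` is a hypothetical name. It takes the best word from each available level, in trigram, bigram, unigram order.

```python
NO_RESULT = "WITHOUT RECOMMENDATIONS"


def predicted_words(trigram_hits, bigram_hits, unigram_hits):
    # Take the best (first) word of each level in trigram -> bigram -> unigram order,
    # skipping levels that produced no matches
    best = [hits[0] for hits in (trigram_hits, bigram_hits, unigram_hits) if hits]
    # Fall back to the application's "no match" message when every level is empty
    return best or [NO_RESULT]


print(predicted_words([], ["go"], ["the"]))  # ['go', 'the']
```

If only one word is wanted, the first element of this list gives a strict backoff: the trigram prediction when it exists, else the bigram prediction, else the unigram prediction.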