Capstone Final Project

Ruben Nuñez

3/8/2020

Summary

This is the presentation for the final assignment of the Johns Hopkins University Data Science Capstone, part of the Data Science Specialization.

This application is designed to predict sentences of at least 7 words, using text drawn from three sources:

    - Blogs
    - News
    - Twitter

A 3% sample is taken from each source.
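The sampling step can be sketched in base R as follows. This is only an illustration under stated assumptions: `sample_lines`, the seed, and the toy vectors are hypothetical stand-ins, not the project's actual code or data; the real sources would be read with something like `readLines()`.

```r
# Hypothetical helper: keep a random fraction of the lines of a character vector.
sample_lines <- function(lines, fraction = 0.03, seed = 1234) {
  set.seed(seed)  # fixed seed (an assumption) so the sample is reproducible
  n_keep <- max(1, round(length(lines) * fraction))
  lines[sample(length(lines), n_keep)]
}

# Toy stand-ins for the three sources.
blogs   <- paste("blog line", 1:1000)
news    <- paste("news line", 1:500)
twitter <- paste("tweet", 1:2000)

# Combine the three 3% samples into one vector for the cleaning step.
data.sample <- c(sample_lines(blogs), sample_lines(news), sample_lines(twitter))
length(data.sample)
```

Sampling each source separately keeps all three represented in proportion to their own size, rather than letting the largest file dominate.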

Application Cleaning

After taking the samples, all undesired characters, such as numbers or punctuation signs, must be deleted. This is done with the corpus functions.

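    library(tm)
    # Build a corpus from the sampled lines, then normalise the text.
    corpus <- VCorpus(VectorSource(data.sample))
    # content_transformer() keeps each document a text document after tolower
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, stripWhitespace)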

Application Algorithm

A corpus is created to store the text, and it is used to extract groups of words (n-grams) from one to 7 words long.
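To make the idea of word groups concrete, here is a minimal base R sketch that splits one cleaned line into n-grams of sizes 1 to 7. The helper `make_ngrams` and the sample sentence are hypothetical; the project may well have used a dedicated tokenizer instead.

```r
# Hypothetical helper: all n-grams of size n from one cleaned line of text.
make_ngrams <- function(text, n) {
  words <- strsplit(trimws(text), "\\s+")[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))  # line too short for this size
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

line <- "the quick brown fox jumps over the lazy dog"

# One character vector of n-grams per size, from 1 word up to 7 words.
ngrams <- lapply(1:7, function(n) make_ngrams(line, n))
names(ngrams) <- paste0("n", 1:7)
ngrams$n2
```

A line shorter than `n` words simply contributes no n-grams of that size.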

Matrices of these word groups and their frequencies are then built to obtain the final word distribution. The results are stored in files that are supplied to the Shiny app.
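The counting and storing step might look roughly like the sketch below. It uses base R only; the toy bigrams, the column names, and the file name `bigram_freq.rds` are assumptions made for illustration, not the files the app actually ships with.

```r
# Toy bigrams standing in for the n-grams extracted from the corpus.
bigrams <- c("i am", "i am", "i was", "you are", "i am", "you are")

# Count each n-gram and sort from most to least frequent.
counts <- sort(table(bigrams), decreasing = TRUE)
freq <- data.frame(ngram = names(counts),
                   freq  = as.integer(counts),
                   stringsAsFactors = FALSE)

# Store the table in a file and read it back, as an app could do at start-up.
path <- file.path(tempdir(), "bigram_freq.rds")  # hypothetical file name
saveRDS(freq, path)
loaded <- readRDS(path)
head(loaded)
```

Precomputing the frequency tables and saving them means the app only has to load small files instead of rebuilding the corpus on every launch.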

Inside the Shiny app:

App Appearance and Usage

The use is simple: