Capstone Final Project

Ruben Nuñez

3/8/2020

Summary

This is the presentation for the final assignment of the Johns Hopkins University Data Science Capstone, part of the Data Science Specialization.

This application is designed to predict words for sentences of at least 7 words, using text from the following sources:

    - Blogs
    - News
    - Twitter

A 3% sample is taken from each source.
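The 3% sampling could be sketched as follows. The vectors `blogs`, `news` and `twitter` are small hypothetical stand-ins for the lines read from each source, and the helper name `sample_lines` and the seed are assumptions for illustration, not the project's actual code.

```r
# Hypothetical stand-ins for the lines read from each source file.
blogs   <- paste("blog line", 1:1000)
news    <- paste("news line", 1:2000)
twitter <- paste("tweet", 1:3000)

# Draw a fixed fraction of the lines, without replacement.
sample_lines <- function(lines, fraction = 0.03) {
  n <- max(1, round(length(lines) * fraction))
  sample(lines, n)
}

set.seed(1234)  # assumed seed, only to make the sample reproducible
data.sample <- c(sample_lines(blogs), sample_lines(news), sample_lines(twitter))
length(data.sample)
```

Sampling each source separately keeps the three sources represented in proportion to their size.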

Application Cleaning

After taking the samples, all unwanted characters, such as numbers or punctuation marks, are removed using the corpus functions of the tm package:

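    library(tm)  # provides VCorpus, tm_map and the cleaning transformations
    corpus <- VCorpus(VectorSource(data.sample))
    # base functions such as tolower must be wrapped in content_transformer()
    corpus <- tm_map(corpus, content_transformer(tolower))
    corpus <- tm_map(corpus, removePunctuation)
    corpus <- tm_map(corpus, removeNumbers)
    corpus <- tm_map(corpus, stripWhitespace)
    corpus <- tm_map(corpus, PlainTextDocument)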

Application Algorithm

A corpus was created to store the text. It is the source for groups of words (n-grams) of one to seven words.

For each group size, a matrix of the word groups and their frequencies was built to obtain the final word distribution. These distributions are stored in files that are provided to the Shiny app.
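A minimal base R sketch of counting word groups and storing the result is shown here. The helper names `tokenize` and `ngram_freq`, the toy input, the RDS format and the file name are all assumptions for illustration; the project itself may have built its matrices with tm or another tokenizer.

```r
# Split text into lower-case word tokens.
tokenize <- function(text) {
  words <- unlist(strsplit(tolower(text), "[^a-z']+"))
  words[nzchar(words)]
}

# Count every run of n consecutive words, most frequent first.
ngram_freq <- function(texts, n) {
  grams <- unlist(lapply(texts, function(text) {
    words <- tokenize(text)
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  freq <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = names(freq), freq = as.integer(freq),
             stringsAsFactors = FALSE)
}

texts <- c("the cat sat on the mat", "the cat ate the fish")
bigrams <- ngram_freq(texts, 2)
head(bigrams, 3)

# Store the table in a file that the app can load later.
path <- file.path(tempdir(), "bigrams.rds")  # hypothetical file name
saveRDS(bigrams, path)
```

Calling `ngram_freq(texts, n)` for `n` from 1 to 7 would give one frequency table per group size.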

Inside the Shiny app:

First, the length of the sentence is detected.
Then, a function is applied to look up the sentence in the corresponding distribution.
Finally, the word with the highest frequency is shown in the main panel.
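The three steps above can be sketched in base R as follows. The tables in `freq_tables` are tiny hypothetical examples and `predict_next` is an illustrative name. Falling back to a shorter context when nothing matches is an assumption added here so that the function always returns a word; the text above does not say how that case is handled.

```r
# Hypothetical frequency tables: names are n-grams, values are counts.
# The position in the list is the n-gram length.
freq_tables <- list(
  c("the" = 5, "cat" = 3),
  c("the cat" = 2, "the mat" = 1, "cat sat" = 1),
  c("the cat sat" = 1, "the cat ate" = 3)
)

predict_next <- function(sentence, tables) {
  words <- unlist(strsplit(tolower(trimws(sentence)), "\\s+"))
  words <- words[nzchar(words)]
  # Step 1: detect the sentence length, capped by the longest table.
  k <- min(length(words), length(tables) - 1)
  while (k >= 1) {
    # Step 2: look up the last k words in the table of (k + 1)-grams.
    prefix <- paste(c(tail(words, k), ""), collapse = " ")
    table_k <- tables[[k + 1]]
    hits <- table_k[startsWith(names(table_k), prefix)]
    if (length(hits) > 0) {
      # Step 3: keep the continuation with the highest frequency.
      best <- names(hits)[which.max(hits)]
      return(substring(best, nchar(prefix) + 1))
    }
    k <- k - 1  # assumed fallback: try a shorter context
  }
  # No context matched: return the most frequent single word.
  names(tables[[1]])[which.max(tables[[1]])]
}

predict_next("the cat", freq_tables)
predict_next("on the", freq_tables)
```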

App Appearance and Usage

Using the app is simple:

The user types some words in the text box.
Clicking on the prediction model displays the next suggested word.
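A minimal Shiny sketch of this interaction might look like the following. The layout, the input and output ids, the button label and the placeholder `predict_next` function are all assumptions for illustration; the actual app may be organised differently.

```r
library(shiny)

# Hypothetical placeholder for the real lookup against the stored distributions.
predict_next <- function(sentence) {
  if (nzchar(trimws(sentence))) "example" else ""
}

ui <- fluidPage(
  titlePanel("Next Word Prediction"),
  sidebarLayout(
    sidebarPanel(
      textInput("sentence", "Type some words:"),
      actionButton("predict", "Predict")
    ),
    mainPanel(textOutput("next_word"))
  )
)

server <- function(input, output) {
  # Recompute the prediction only when the button is clicked.
  prediction <- eventReactive(input$predict, predict_next(input$sentence))
  output$next_word <- renderText(prediction())
}

app <- shinyApp(ui, server)  # start it with runApp(app)
```

Tying the prediction to the button through `eventReactive` means the lookup runs once per click rather than on every keystroke.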