Ruben Nuñez
3/8/2020
This is the presentation for the final assignment of the Johns Hopkins University Data Science Capstone, part of the Data Science Specialization.
This application is designed to predict sentences of at least 7 words, using text from the following sources:
- Blogs
- News
- Twitter
A 3% sample is taken from each source.
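The sampling step could be sketched roughly as follows. This is a minimal illustration in base R, not the code used for the app: `sample_lines`, the seed value, and the toy input vectors are assumptions made for the example, while `data.sample` is the name used in the cleaning code of this presentation.

```r
# Hypothetical helper: keep a random fraction of the lines of one source.
sample_lines <- function(lines, fraction = 0.03, seed = 1234) {
  set.seed(seed)                                    # make the sample reproducible
  n_keep <- max(1, round(length(lines) * fraction)) # number of lines to keep
  lines[sample.int(length(lines), n_keep)]          # draw without replacement
}

# Toy stand-ins for the blogs, news and Twitter lines (not the real data).
blogs   <- paste("blog line", 1:1000)
news    <- paste("news line", 1:500)
twitter <- paste("tweet", 1:2000)

# Combine the 3% samples into a single character vector.
data.sample <- c(sample_lines(blogs), sample_lines(news), sample_lines(twitter))
length(data.sample)
```

With the real data, the three vectors would typically come from reading each source file line by line, for example with `readLines`.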
After sampling, all unwanted characters, such as numbers and punctuation marks, are removed using corpus functions from the tm package:
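library(tm)
# Build a corpus from the sampled text, then clean it step by step
corpus <- VCorpus(VectorSource(data.sample))
corpus <- tm_map(corpus, content_transformer(tolower))  # convert to lowercase
corpus <- tm_map(corpus, removePunctuation)             # drop punctuation marks
corpus <- tm_map(corpus, removeNumbers)                 # drop digits
corpus <- tm_map(corpus, stripWhitespace)               # collapse repeated spaces
corpus <- tm_map(corpus, PlainTextDocument)             # keep documents as plain text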
A corpus is created to store the cleaned text, and it is used to extract groups of one to seven words (n-grams).
Matrices of these word groups and their frequencies are then built to obtain the final word distribution. The results are stored in files that are provided to the Shiny app.
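To make the idea of word groups and their frequencies concrete, here is a small base R sketch. It is an illustration under stated assumptions, not the implementation behind the app: the helpers `make_ngrams` and `ngram_freq`, the toy input `clean_lines`, and the output file names are all hypothetical, and the real pipeline works on the tm corpus instead of a plain character vector.

```r
# Hypothetical helper: all n-grams of size n from one cleaned line of text.
make_ngrams <- function(line, n) {
  words <- strsplit(line, " +")[[1]]            # split on runs of spaces
  words <- words[nzchar(words)]                 # drop empty tokens
  if (length(words) < n) return(character(0))   # line too short for this n
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Hypothetical helper: frequency table of n-grams, most frequent first.
ngram_freq <- function(lines, n) {
  grams <- unlist(lapply(lines, make_ngrams, n = n))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy cleaned text (lowercase, no punctuation or numbers).
clean_lines <- c("the cat sat on the mat", "the cat ran away")

# One frequency table per group size, from 1 to 7 words.
freqs <- lapply(1:7, function(n) ngram_freq(clean_lines, n))

# Store each table in its own file (file names are an assumption).
out_dir <- tempdir()
for (n in 1:7) {
  saveRDS(freqs[[n]], file.path(out_dir, paste0("ngram_", n, ".rds")))
}
```

Saving one file per group size keeps each table small, and an app could then load only the tables it needs with `readRDS`.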
Inside the Shiny app:
The use is simple: