Isabel Méndez
18 March 2021
This Capstone project is part of a 10-course certification, which I strongly recommend because it strengthens your skills and teaches the material in depth.
Link to the course: Data Science track by Johns Hopkins University on Coursera.
The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:
A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.
A slide deck consisting of no more than 5 slides created with R Studio Presenter pitching the algorithm and app as if you were presenting to your boss or an investor.
The data is from a corpus called Coursera Capstone.
The data consists of three txt files: Twitter, Blog and News. In the end I used the RWeka library.
I used a random 90% sample of the raw data to build the final model.
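The sampling step can be illustrated with a minimal base R sketch. The helper name `sample_lines`, the seed, and the toy input are assumptions made for this example, not the code actually used in the project; in practice the input would be the lines read from each of the three text files.

```r
# Hypothetical helper: keep a random fraction of the lines in a character vector.
sample_lines <- function(lines, fraction = 0.9, seed = 1234) {
  set.seed(seed)                                   # fixed seed so the sample is reproducible
  n_keep <- round(length(lines) * fraction)        # number of lines to keep
  keep <- sort(sample.int(length(lines), n_keep))  # sorted indices preserve the original order
  lines[keep]
}

# Toy stand-in for one raw text file (not the real corpus).
toy_lines <- paste("line", 1:20)
sampled <- sample_lines(toy_lines, fraction = 0.9)
length(sampled)  # 18 of the 20 toy lines are kept
```

Sampling indices rather than the lines themselves makes it easy to keep the complement as a held-out set if one is needed later.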
Other libraries were used to compare n-grams, to produce graphs, and to build the Milestone Report:
You can find the Milestone Report here: Milestone Report
The tokenization I made:
Once the data was cleaned, I built the n-grams: unigrams, bigrams, trigrams, and quadgrams. The data was saved in an RData file to be read by the server.R code, where I pre-process the data again with the same filtering as in the previous step. I then filtered the data with these n-grams. In ui.R I set up the formatting and added a Shiny theme.
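To make the n-gram idea concrete, here is a minimal sketch in base R of building n-gram frequency tables and looking up the next word with a simple back-off (try the longest matching n-gram first, then shorter ones). This is an illustration under assumptions, not the project's actual code: the project used RWeka for the n-grams, while the function names `clean_tokens`, `make_ngrams`, `ngram_table`, and `predict_next`, the cleaning rule, and the toy corpus below are all hypothetical.

```r
# Assumed cleaning rule: lower-case, keep only letters, apostrophes and spaces.
clean_tokens <- function(text) {
  text <- gsub("[^a-z' ]", " ", tolower(text))
  tokens <- strsplit(text, "\\s+")[[1]]
  tokens[tokens != ""]
}

# All n-grams of order n from one token vector.
make_ngrams <- function(tokens, n) {
  if (length(tokens) < n) return(character(0))
  starts <- seq_len(length(tokens) - n + 1)
  vapply(starts, function(i) paste(tokens[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Frequency table for order n, split into prefix (first n-1 words) and last word.
ngram_table <- function(lines, n) {
  grams <- unlist(lapply(lines, function(l) make_ngrams(clean_tokens(l), n)))
  if (length(grams) == 0) {
    return(data.frame(prefix = character(0), word = character(0),
                      freq = integer(0), stringsAsFactors = FALSE))
  }
  counts <- sort(table(grams), decreasing = TRUE)
  gram <- names(counts)
  data.frame(
    prefix = if (n > 1) sub(" [^ ]+$", "", gram) else rep("", length(gram)),
    word = sub("^.* ", "", gram),
    freq = as.integer(counts),
    stringsAsFactors = FALSE
  )
}

# Simple back-off: quadgrams first, then trigrams, then bigrams,
# and finally the most frequent unigram if nothing matches.
predict_next <- function(phrase, tables) {
  tokens <- clean_tokens(phrase)
  for (n in rev(seq_along(tables))) {
    k <- n - 1                      # number of context words needed
    if (k < 1 || length(tokens) < k) next
    prefix <- paste(tail(tokens, k), collapse = " ")
    hits <- tables[[n]][tables[[n]]$prefix == prefix, ]
    if (nrow(hits) > 0) return(hits$word[which.max(hits$freq)])
  }
  tables[[1]]$word[which.max(tables[[1]]$freq)]
}

# Toy corpus standing in for the cleaned Twitter, Blog and News text.
toy_corpus <- c(
  "thank you so much for the help",
  "thank you so much for everything",
  "thank you so much for the follow",
  "i love you so much",
  "the help was great"
)
tables <- lapply(1:4, function(n) ngram_table(toy_corpus, n))

predict_next("Thank you so", tables)     # "much"
predict_next("you so much for", tables)  # "the"
```

Tables like these could be written once with `save()` and read back with `load()` in server.R, so the app only performs the lookup at run time.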
You can find the app here: Shiny App