Jun Zhang
May 1, 2020
This is the presentation for the Coursera Data Science Capstone final project. The goal of the project is to build a prediction algorithm, delivered as a Shiny App, that takes a word or phrase typed into a text box and outputs a prediction of the next word.
Here is the link to my Shiny App.
The data come from a corpus called HC Corpora and are available to download through the course website. The files are in .txt format and were collected from blogs, news, and Twitter.
Because all three data sets are large and training on them in full takes a long time, I trained on only a random sample of the data (about 333,670 lines of text).
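The random sampling step could be sketched as follows. This is a minimal illustration in Python, not the code behind the app (which is built with Shiny in R); the function name `sample_lines`, the `fraction` parameter, and the seed value are assumptions made for the example.

```python
import random


def sample_lines(lines, fraction, seed=2020):
    """Keep each line independently with probability `fraction`.

    `lines` can be any iterable of strings, for example an open text
    file. The seed is fixed so the same subset is drawn on every run.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


if __name__ == "__main__":
    # Hypothetical stand-in for the lines of one corpus file.
    corpus = [f"line {i}" for i in range(1000)]
    subset = sample_lines(corpus, fraction=0.1)
    print(len(subset), "of", len(corpus), "lines kept")
```

Sampling line by line keeps memory use low, because the full file never has to be held in memory when `lines` is a file handle.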
Below is an example of how this works. If you enter “Thanks for the,” the predicted next word is shown as “follow.”
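The slides do not state which model produces the prediction. One common approach to next-word prediction is an n-gram model with backoff, and the sketch below shows that idea in Python under that assumption. The function names, the `max_n` parameter, and the three-line toy corpus are all hypothetical and chosen only so that the example mirrors the one above; they are not the app's data or code.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only alphabetic words and apostrophes."""
    return re.findall(r"[a-z']+", text.lower())


def build_model(lines, max_n=3):
    """Count which word follows each context of 1 to max_n - 1 words."""
    model = defaultdict(Counter)
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n):
            for i in range(len(tokens) - n):
                context = tuple(tokens[i:i + n])
                model[context][tokens[i + n]] += 1
    return model


def predict_next(model, phrase, max_n=3):
    """Try the longest context first, then back off to shorter ones."""
    tokens = tokenize(phrase)
    for n in range(max_n - 1, 0, -1):
        if len(tokens) < n:
            continue
        context = tuple(tokens[-n:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None  # no context from the phrase was seen in training


if __name__ == "__main__":
    # Toy corpus invented for illustration only.
    toy_corpus = [
        "Thanks for the follow",
        "Thanks for the follow back",
        "Thanks for the help",
    ]
    model = build_model(toy_corpus)
    print(predict_next(model, "Thanks for the"))
```

On this toy corpus, "follow" appears twice after "for the" and "help" once, so the sketch returns "follow" for the phrase "Thanks for the".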