Jon Ting
22/08/2020
This is the capstone project of the Data Science specialization offered by Johns Hopkins University. The project involves building a Shiny application that predicts the next word of a short phrase. The application is the highlight of this presentation.
The English SwiftKey dataset is used to create this application. It contains blog entries, news entries, and Twitter feeds collected from publicly available sources by web crawlers.
The whole English dataset is rather large, consisting of more than 3 million entries in total. Twitter feeds dominate the dataset, while news entries contribute the least to the collection.
To keep the Shiny application practical, only 1% of the whole English dataset was sampled to create the training corpus, that is, the collection of text from which the app's dictionary of words is built.
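As a rough illustration of the 1% sampling step, here is a minimal Python sketch. It is not the project's actual code (that lives in the repository mentioned below). The function name `sample_entries`, the fixed seed, and the toy input are assumptions made for the example only.

```python
import random


def sample_entries(entries, fraction=0.01, seed=2020):
    """Return a reproducible random sample of roughly `fraction` of `entries`.

    `sample_entries`, `fraction`, and `seed` are hypothetical names chosen
    for this sketch; they do not come from the original project.
    """
    if not entries:
        return []
    # Keep at least one entry so that small inputs still yield a corpus.
    k = max(1, round(len(entries) * fraction))
    rng = random.Random(seed)  # fixed seed so the sample is repeatable
    return rng.sample(entries, k)


# Toy stand-in for the blog, news, and Twitter entries.
toy_entries = [f"entry {i}" for i in range(1000)]
training_sample = sample_entries(toy_entries)
print(len(training_sample))
```

Sampling with a fixed seed keeps the training corpus small enough for a responsive app while making the result repeatable between runs.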
Try out the app hosted on shinyapps.io!
The code is documented in a GitHub repository if you are interested.
Hints: