Ashley You
25th March, 2018
This data product was built to fulfill the requirements of the Coursera Data Science Capstone Project.
The goal of the Coursera Capstone project is to build a Shiny application that is able to predict the next word.
Little Wordie-Teller was designed for this purpose and deployed on shinyapps.io. It is a lightweight app built on a large corpus of text from HC Corpora, provided through the Coursera website.
This presentation aims to discuss:
The dataset was downloaded using R. The English text files, i.e. en_US.blogs.txt, en_US.news.txt and en_US.twitter.txt, were used.
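As a rough illustration of the loading step, the sketch below reads one of the text files into R. The helper name `read_corpus` and the choice to open the file in binary mode are assumptions made for this example, not necessarily what the app does.

```r
# Hypothetical helper: read up to `n` lines from one corpus file.
# n = -1 reads the whole file.
read_corpus <- function(path, n = -1L) {
  # Assumption: binary mode is used so that stray control characters
  # in the raw text do not cut the read short on some platforms.
  con <- file(path, open = "rb")
  on.exit(close(con))
  readLines(con, n = n, encoding = "UTF-8", skipNul = TRUE)
}

# Example use, assuming the three files sit in the working directory:
# files  <- c("en_US.blogs.txt", "en_US.news.txt", "en_US.twitter.txt")
# corpus <- lapply(files, read_corpus, n = 10000L)
```

Reading only the first `n` lines is a simple way to experiment on a sample before processing the full files.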
Little Wordie-Teller is built on a Markov n-gram model, which assumes the next word can be predicted from the previous N words. A simple backoff strategy is used to choose the next word: the longest available context is looked up first, then progressively shorter ones. If the input is not found in the training data, the “best guess” is sampled from the top 100 unigrams.
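The backoff idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not the app's actual code: the tables `trigrams`, `bigrams` and `top_unigrams` and the function `predict_next` are hypothetical, and the toy data stands in for the real n-gram tables (five unigrams here instead of the top 100).

```r
# Toy lookup tables (hypothetical data). Each context maps to candidate
# next words, already ordered from most to least frequent.
trigrams <- list("thank you" = c("for", "so"))
bigrams  <- list("you" = c("are", "can"), "happy" = c("birthday"))
top_unigrams <- c("the", "to", "and", "a", "of")  # stand-in for the top 100

predict_next <- function(input) {
  # Normalise the input and split it into words.
  words <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  n <- length(words)

  # Try the longest context first (last two words) ...
  if (n >= 2) {
    key <- paste(words[(n - 1):n], collapse = " ")
    if (!is.null(trigrams[[key]])) return(trigrams[[key]][1])
  }
  # ... then back off to the last word only.
  if (n >= 1) {
    key <- words[n]
    if (!is.null(bigrams[[key]])) return(bigrams[[key]][1])
  }
  # Nothing matched: sample a "best guess" from the frequent unigrams.
  sample(top_unigrams, 1)
}

predict_next("Thank you")  # matched in the trigram table
predict_next("are you")    # backs off to the bigram table
predict_next("xyzzy")      # falls through to a sampled unigram
```

Storing the candidates pre-sorted keeps each lookup to a single table access, which is one plausible way to get the near-instant response described for the app.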
Little Wordie-Teller is hosted here. It is built with the flexdashboard package for Shiny. To speed up loading, the app uses the shiny_prerendered runtime. Packages, n-gram data tables and prediction functions are prerendered, so the app should load and predict quickly.
Little Wordie-Teller is fun to use because it responds almost instantly. However, its prediction accuracy is low because of the size limits on Shiny app deployment.
My future directions from this project are:
For more details, please see my GitHub page here.