A simple next-word prediction RShiny Application
Sean Dobbs
As the Capstone Project for the Data Science Specialization from Johns Hopkins University in partnership with SwiftKey, a simple RShiny application was created to offer predictions/suggestions for the next word in a sequence, given input from a user.
This presentation briefly explains: A) the data that was used, B) the underlying algorithm and methodology, C) limitations of the model and ways to improve it, and D) the online user interface.
To skip all of the boring stuff and just play with the app, you can find it here:
https://deebo415.shinyapps.io/predictR
If you want an excess of the boring stuff, you can find the code at:
https://github.com/oraclejavanet/coursera-data-science-capstone/
The data for the model was a huge collection of text from Twitter users, bloggers, and news articles. News is very structured. Twitter, blogs, and other social media are not. Ideally, we'd like to use all data from everywhere. But this application needs to be fast while still having reasonable accuracy, so a subset of the data was used. More emphasis was placed on Twitter data and blogs, and less on news articles, because tweets and blogs are more likely to resemble what an everyday human might type.
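To illustrate the idea of taking a source-weighted subset, here is a minimal sketch in Python (the app itself is written in R, so this is not the project's code). The sampling rates, the `sample_corpus` helper, and the toy input lines are all hypothetical; the presentation says only that Twitter and blogs were emphasized over news, not what the actual proportions were.

```python
import random

# Hypothetical sampling rates: the presentation does not state the real ones,
# only that Twitter and blogs were weighted more heavily than news.
SAMPLE_RATES = {"twitter": 0.10, "blogs": 0.10, "news": 0.02}


def sample_corpus(corpora, rates, seed=42):
    """Draw a fixed fraction of lines from each source, without replacement."""
    rng = random.Random(seed)  # seeded so the same subset is drawn every run
    subset = []
    for source, lines in corpora.items():
        k = round(len(lines) * rates[source])  # number of lines to keep
        subset.extend(rng.sample(lines, k))
    return subset


if __name__ == "__main__":
    # Toy stand-ins for the real text files.
    corpora = {
        "twitter": [f"tweet {i}" for i in range(1000)],
        "blogs": [f"blog line {i}" for i in range(1000)],
        "news": [f"news line {i}" for i in range(1000)],
    }
    subset = sample_corpus(corpora, SAMPLE_RATES)
    print(len(subset), "lines kept")
```

Sampling a fixed count per source, rather than flipping a coin per line, keeps the size of the subset predictable, which matters when the goal is a model small enough to respond quickly.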
N-gram methodology is only good for predicting human output given an extremely short human input (at or around the maximum n-gram size that the model considers). We could build a 20-, 30-, or 100-gram model; while that may appear to be more accurate, large n-gram models like this have drawbacks of their own.
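To make the n-gram idea concrete, the following is a minimal sketch in Python of next-word lookup from n-gram counts. It is not the app's actual R implementation. The maximum order `MAX_N`, the function names, the toy training sentences, and the back-off to shorter contexts when a longer one has never been seen are all assumptions made for illustration; the presentation does not say how predictR handles unseen contexts.

```python
from collections import Counter, defaultdict

MAX_N = 3  # hypothetical maximum n-gram order (trigrams)


def build_model(sentences, max_n=MAX_N):
    """Count which word follows each context of 0 to max_n - 1 words."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # k is the context length; k = 0 gives plain word frequencies.
            for k in range(min(i, max_n - 1) + 1):
                context = tuple(words[i - k:i])
                model[context][word] += 1
    return model


def predict_next(model, text, max_n=MAX_N, top=3):
    """Suggest next words, backing off to shorter contexts when needed."""
    words = text.lower().split()
    # Try the longest usable context first, then progressively shorter ones.
    for k in range(min(len(words), max_n - 1), -1, -1):
        context = tuple(words[len(words) - k:])
        if context in model:
            return [w for w, _ in model[context].most_common(top)]
    return []


if __name__ == "__main__":
    # Toy training data, invented for this example.
    model = build_model(["i love data science", "i love new york", "i love data"])
    print(predict_next(model, "i love"))
    print(predict_next(model, "we really love"))
```

Only the last `MAX_N - 1` words of the input are ever consulted, which is why this kind of model works from very short inputs and cannot make use of anything typed earlier in the sentence.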
With a continued investment in predictR!, future versions of the application could be based upon multi-level neural networks or other deep learning methodologies. Users will love the simplicity and fun of predictR! Version 0.1. They will love the incredible accuracy and capabilities of Versions 2.0, 3.0, and beyond!

This app is easy, fast, fun, reliable, and accurate