Data Science Capstone

Zach Eisenstein
July 2018

Summary


  • The objective of this project was to develop a predictive model of English text, delivered via a Shiny app.
  • The model was built on a large corpus of text scraped from tweets, blogs, and news articles, providing a broad sample of English-language expression.
  • The predictive text app can be accessed here

Applied Methods

  • The model employed within the app is a probabilistic n-gram language model with backoff (a Markov model).
  • The next word is predicted by looking back at the preceding word groupings (n-grams).

Example
Input text: “I am going to the”
trigram = “going to the”, bigram = “to the”, unigram = “the”
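
A minimal R sketch of extracting these lookback contexts from the input (the function name and tokenization are illustrative, not the app's exact code):

get_contexts <- function(text) {
  words <- strsplit(tolower(trimws(text)), "\\s+")[[1]]   # simple whitespace tokenizer
  list(
    trigram = paste(tail(words, 3), collapse = " "),
    bigram  = paste(tail(words, 2), collapse = " "),
    unigram = tail(words, 1)
  )
}

get_contexts("I am going to the")
# $trigram "going to the", $bigram "to the", $unigram "the"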

n-gram next-word frequencies retrieved from the training set:

context        next word   freq
going to the   gym           41
going to the   game          19
going to the   movies        17
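
A minimal R sketch of the backoff lookup over such frequency tables (a simplified scheme; the app's exact scoring isn't shown here, and tri, bi, uni, and predict_next are illustrative names):

# tri and bi are data frames of (context, word, freq); uni is (word, freq)
predict_next <- function(tri, bi, uni, ctx) {
  hits <- subset(tri, context == ctx$trigram)                     # try the trigram first
  if (nrow(hits) == 0) hits <- subset(bi, context == ctx$bigram)  # back off to the bigram
  if (nrow(hits) == 0) hits <- uni                                # last resort: unigrams
  hits$score <- hits$freq / sum(hits$freq)                        # relative frequency
  head(hits[order(-hits$score), c("word", "score")], 3)           # top 3 candidates
}

With the trigram table above, this would rank gym (41/77), game (19/77), and movies (17/77).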

About the Application

The app has a simple interface suitable for mobile devices.

As the user types, the interface updates in real time, using the n-gram backoff algorithm to predict the next word.
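
A minimal sketch of that loop in Shiny, reusing the hypothetical get_contexts() and predict_next() sketches above and assuming the frequency tables are already loaded:

library(shiny)

ui <- fluidPage(
  textInput("phrase", "Type a phrase:"),
  tableOutput("preds")                      # top-3 words and their scores
)

server <- function(input, output) {
  output$preds <- renderTable({
    req(nzchar(input$phrase))               # wait until something is typed
    predict_next(tri, bi, uni, get_contexts(input$phrase))
  })
}

shinyApp(ui, server)

Because renderTable() depends on input$phrase, the prediction table refreshes on every keystroke.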

The top 3 choices and their respective “scores” are shown for reference.

The app recognizes sentence-ending punctuation and stops the lookback at that point.
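
One plausible way to implement that cutoff in R (the app's exact rule isn't shown; the function name is illustrative):

trim_at_sentence_end <- function(text) {
  sub(".*[.!?]\\s*", "", text)   # drop everything up to the last ., !, or ?
}

trim_at_sentence_end("That was fun. I am going to the")
# [1] "I am going to the"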

References

  • The predictive text app can be accessed here
  • The source data can be accessed from the following link
  • The ngram package was used in building this application
  • Click here to learn more about the Johns Hopkins Data Science Specialization from Coursera

Contact:
Zach Eisenstein
z.eisenstein2@gmail.com
LinkedIn