Capstone Presentation

Scott Purvis

Capstone Project Description

As people spend increasing amounts of time on mobile devices typing emails, commenting on social networks, and doing a whole range of other activities, a predictive text model can make typing easier.

The goal of this project is to:

Create a simple Shiny app that takes a phrase (multiple words) as input and predicts the next word.
Build a 5-slide deck pitching the algorithm and app.

Building the Model

Text sources for the prediction model include blog, news, and Twitter files provided by SwiftKey. A sample of American Humor Writings* was added for diversity of language.

The model was derived from the text sources above, which were combined into a single corpus and cleaned. The cleaned text was tokenized into bigrams (2 words) and trigrams (3 words), and these were then combined into a single model.
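As a rough illustration of the tokenization step, the sketch below cleans a few lines of text and counts bigrams and trigrams. It is written in Python for brevity and is not the code behind the app; the cleaning rule (lowercase, keep only letters and apostrophes) and the names `clean`, `ngrams`, and `build_model` are assumptions made for this example.

```python
import re
from collections import Counter

def clean(text):
    """Lowercase the text and split it into word tokens.

    Assumed cleaning rule: anything other than a-z, apostrophes, and
    spaces is replaced by a space.
    """
    text = re.sub(r"[^a-z' ]+", " ", text.lower())
    return text.split()

def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_model(lines):
    """Count bigrams and trigrams over all lines, keyed by n-gram size."""
    model = {2: Counter(), 3: Counter()}
    for line in lines:
        tokens = clean(line)
        for n in model:
            model[n].update(ngrams(tokens, n))
    return model

# Tiny made-up corpus standing in for the blog, news, and Twitter samples.
corpus = ["I love New York.", "I love new shoes!"]
model = build_model(corpus)
```

N-grams are counted line by line here, so no n-gram spans two separate source lines.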

*http://www.gutenberg.org/ebooks/18464

Stupid Backoff Implementation

The Stupid Backoff method is used in text prediction to assign probabilities to predicted words.

In this simple implementation, the program tries to find a trigram match; if that fails, it “backs off” to a bigram match, and so on down to a unigram. With each “backoff”, the probability of the predicted word is weighted by a factor (lambda = 0.4).
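The backoff logic described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the app's actual code: the toy corpus and the names `count_ngrams`, `score`, and `predict` are hypothetical, and only the factor 0.4 comes from the description above.

```python
from collections import Counter

LAMBDA = 0.4  # backoff factor from the slide

def count_ngrams(tokens, max_n=3):
    """Count all unigrams, bigrams, and trigrams in one Counter of tuples."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def score(word, context, counts, total):
    """Stupid Backoff score of `word` following `context` (a list of words).

    Try the longest context first (two words, i.e. a trigram match).
    On a miss, drop the oldest context word and multiply by LAMBDA.
    """
    weight = 1.0
    context = tuple(context[-2:])
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            return weight * seen / counts[context]
        context = context[1:]
        weight *= LAMBDA
    # Unigram fallback: relative frequency of the word in the corpus.
    return weight * counts[(word,)] / total

def predict(phrase, counts, total, vocab):
    """Return the vocabulary word with the highest score after `phrase`."""
    context = phrase.lower().split()
    return max(vocab, key=lambda w: score(w, context, counts, total))

# Toy corpus (made up for this example).
tokens = "the cat sat on the mat the cat ate the fish".split()
counts = count_ngrams(tokens)
total = len(tokens)
vocab = set(tokens)
```

With this toy corpus, `predict("on the", counts, total, vocab)` finds a direct trigram match, while a phrase with unseen words falls through to the most frequent unigram at weight 0.4 × 0.4 = 0.16. Strictly speaking, the values produced are scores rather than normalized probabilities, since the backed-off weights do not sum to one.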

Shiny Implementation

https://scottpurvis.shinyapps.io/Capstone_Text_Prediction/