Data Science Capstone Final Project

Chris Rucker
07-Apr-15

Preamble

The goal of this exercise is to create a product to highlight the prediction algorithm that I have built and to provide an interface that can be accessed by others.

  • final project
  • shiny app
  • slide deck

Algorithm

The Katz back-off language model was used for this project. Katz back-off is a generative n-gram language model that estimates the conditional probability of a word given its history in the n-gram. The equation for the Katz back-off model is:

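\[
P_{bo}(w_i \mid w_{i-n+1} \cdots w_{i-1}) =
\begin{cases}
d_{w_{i-n+1} \cdots w_i} \, \dfrac{C(w_{i-n+1} \cdots w_{i-1} w_i)}{C(w_{i-n+1} \cdots w_{i-1})} & \text{if } C(w_{i-n+1} \cdots w_i) > k \\
\alpha_{w_{i-n+1} \cdots w_{i-1}} \, P_{bo}(w_i \mid w_{i-n+2} \cdots w_{i-1}) & \text{otherwise}
\end{cases}
\]

Here C(x) is the number of times x appears in training, d is a discount factor, and \alpha is the back-off weight.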
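The back-off scheme in this section can be illustrated with a small sketch. This is not the code behind the app: it is a simplified bigram-to-unigram version in Python, and it assumes a fixed absolute discount in place of the Good-Turing discounting that Katz back-off normally uses. The function names (`train`, `katz_bigram_prob`, `predict_next`), the toy corpus, and the default values of `k` and `discount` are all hypothetical.

```python
from collections import Counter


def train(tokens):
    """Count unigrams and bigrams in a list of tokens."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def katz_bigram_prob(word, prev, unigrams, bigrams, k=0, discount=0.5):
    """Simplified back-off estimate of P(word | prev).

    Assumption: a fixed absolute discount replaces Good-Turing discounting.
    The discount must be smaller than k + 1 so discounted counts stay positive.
    """
    total = sum(unigrams.values())
    # Number of bigrams observed with `prev` as the history.
    context_count = sum(c for (p, _), c in bigrams.items() if p == prev)
    if context_count == 0:
        # History never seen: use the unigram estimate directly.
        return unigrams[word] / total

    count = bigrams[(prev, word)]
    if count > k:
        # Seen more than k times: discounted maximum likelihood estimate.
        return (count - discount) / context_count

    # Otherwise back off to the unigram model, scaled by alpha so that
    # the probabilities over the vocabulary still sum to one.
    seen = [w for (p, w), c in bigrams.items() if p == prev and c > k]
    seen_mass = sum((bigrams[(prev, w)] - discount) / context_count for w in seen)
    unseen_unigram_mass = 1 - sum(unigrams[w] for w in seen) / total
    if unseen_unigram_mass <= 0:
        return 0.0
    alpha = (1 - seen_mass) / unseen_unigram_mass
    return alpha * unigrams[word] / total


def predict_next(prev, unigrams, bigrams, **kwargs):
    """Return the vocabulary word with the highest back-off probability."""
    return max(
        unigrams,
        key=lambda w: katz_bigram_prob(w, prev, unigrams, bigrams, **kwargs),
    )


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ran".split()
unigrams, bigrams = train(tokens)
print(predict_next("the", unigrams, bigrams))  # prints "cat"
```

In this toy corpus, "the cat" is seen twice, so its probability comes from the discounted bigram count, while a pair such as "the sat" is never seen and falls back to the scaled unigram estimate. A full implementation would extend the same recursion to longer histories.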

Instructions

Katz back-off estimates the conditional probability of a word by backing off to models with shorter histories under certain conditions, so that the model with the most reliable information about a given history provides the prediction. Essentially, if an n-gram has been seen more than k times in training, the conditional probability of a word given its history is proportional to the maximum likelihood estimate of that n-gram. Otherwise, the model backs off to the estimate from the next shorter history.

Use the app as follows:

  • enter word
  • press button
  • view n-gram

Image

A Shiny app was created that accepts an n-gram as input and predicts the next word!
