Capstone Project Presentation

KRL
December 2017

Objective

This capstone project gives us the opportunity to apply the skills we have learned in the first nine modules of the Data Science Specialization to the area of natural language processing (NLP).

Data for this project is provided by SwiftKey, the industry partner for the Capstone Project. Our ultimate goal is to build a predictive algorithm that guesses the next word as a user enters text on a keyboard. For example, given “I love…”, the algorithm might guess “you.”

Prediction Algorithm

We cleaned the data sets provided by our industry partner, SwiftKey, sampled the data, and then built a prediction model. Our model uses the Katz Backoff Algorithm. Briefly, it looks for the last words of the user's phrase among the highest-order N-grams and, if they aren't found there, “backs off” to a lower-order N-gram.
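The back-off idea can be illustrated with a minimal Python sketch. This is not the project's code: the function names, the toy corpus, and the trigram limit are all assumptions made for illustration. It is also a simplified count-based back-off, so it omits the discounting step that full Katz back-off uses to redistribute probability to unseen N-grams.

```python
from collections import Counter, defaultdict


def build_ngram_tables(sentences, max_order=3):
    """Count which words follow each context of 0 to max_order - 1 words.

    Hypothetical helper: the empty context () holds plain unigram counts.
    """
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for k in range(max_order):
                if i - k < 0:
                    break  # not enough preceding words for this context size
                context = tuple(tokens[i - k:i])
                tables[context][word] += 1
    return tables


def predict_next(tables, phrase, max_order=3):
    """Return the most frequent follower of the longest context that was seen.

    Simplification (assumption): no discounting, only a count lookup that
    backs off from longer contexts to shorter ones.
    """
    tokens = phrase.lower().split()
    longest = min(max_order - 1, len(tokens))
    for k in range(longest, -1, -1):
        context = tuple(tokens[len(tokens) - k:])
        followers = tables.get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # nothing was learned at all


# Toy corpus, invented for this example only.
corpus = ["i love you", "i love you too", "i love pizza", "you love me"]
tables = build_ngram_tables(corpus)

print(predict_next(tables, "I love"))     # context ("i", "love") was seen
print(predict_next(tables, "they love"))  # unseen, backs off to ("love",)
```

In this toy corpus, “I love” is matched directly at the trigram level, while “they love” was never seen and is answered from the bigram counts for “love”.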


Shiny App

Our app can be found here: https://oceangirl07.shinyapps.io/CapstoneFinal/

The user enters a phrase, and the next word is predicted.


References and Shout Out

  • Katz, Slava M. 1987. Estimation of probabilities from sparse data for the language model component of a speech recognizer. IEEE Transactions on Acoustics, Speech and Signal Processing, ASSP-35(3):400–401.
  • Tran, Thach-Ngoc. 2016. Katz's Backoff Model Implementation in R. WordPress, April 12, 2016.
  • The Coursera Discussion Boards and particularly the Mentor comments.

Thank you!