Coursera Data Science Capstone Final Project

Quang V. Nguyen
8/23/15

Objective

My goal was to build a natural language processing app that predicts the next word a user types, applying the data science skills I had developed. Tackling this challenge required everything I had learned in the previous nine courses. The first step was to understand our data and our goals:

  • Understand the problem
  • Data acquisition and cleaning
  • Exploratory analysis

The results of these steps are in the milestone report: http://rpubs.com/quangface/milestone_report

Next Steps

Having explored our data, the next step was to model and build the app.

The finished app is available here: https://quangface.shinyapps.io/wordpredictor

App Overview

The Word Predictor App is very easy to use. Here are the basics:

  • User inputs text
  • App predicts the next word
  • App shows a list of up to 10 potential matches
  • User is impressed with the app!

Behind the Scenes

The app was built from a sample of the provided HC Corpora data, which includes text from blogs, news articles, and tweets. We cleaned this sample by removing punctuation, numbers, and special characters, converting everything to lowercase, and so on.
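The cleaning step can be sketched in a few lines. This is a minimal Python illustration, not the app's actual code: the function name `clean_text` is hypothetical, and it assumes ASCII-only text and that apostrophes inside words (as in "it's") should be kept so contractions stay as single tokens.

```python
import re


def clean_text(text: str) -> str:
    """Lowercase text and strip punctuation, digits, and special characters.

    Assumptions (hypothetical choices, not taken from the original app):
    only ASCII letters a-z are kept, and apostrophes are kept only when
    they sit between two letters, so contractions survive as one token.
    """
    text = text.lower()
    # Replace every character that is not a letter, apostrophe, or whitespace
    text = re.sub(r"[^a-z'\s]", " ", text)
    # Remove apostrophes that are not between two letters (e.g. quote marks)
    text = re.sub(r"(?<![a-z])'|'(?![a-z])", " ", text)
    # Collapse runs of whitespace into single spaces
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_text("Hello, World! It's 2015 #DataScience"))
```

Run on the sample sentence, this prints `hello world it's datascience`: the comma, exclamation mark, digits, and hashtag symbol are gone, while the contraction is preserved.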

Once the data was clean, we tokenized it into n-grams and used their frequencies to predict the next word with the Stupid Backoff model.
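To make the modeling step concrete, here is a minimal Python sketch of n-gram counting plus Stupid Backoff scoring. It is an illustration under stated assumptions, not the app's implementation: the function names are hypothetical, it assumes n-grams up to trigrams (`max_n=3`), uses the backoff factor 0.4 commonly paired with Stupid Backoff, and returns up to 10 candidates to mirror the app's list of matches.

```python
from collections import Counter
from typing import Dict, List, Tuple


def build_ngram_counts(tokens: List[str], max_n: int = 3) -> Dict[int, Counter]:
    """Count every n-gram of length 1..max_n, stored as tuples of tokens."""
    counts: Dict[int, Counter] = {}
    for n in range(1, max_n + 1):
        counts[n] = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


def stupid_backoff(context: List[str], counts: Dict[int, Counter],
                   max_n: int = 3, alpha: float = 0.4,
                   k: int = 10) -> List[Tuple[str, float]]:
    """Rank candidate next words with Stupid Backoff and return the top k.

    Starts with the longest usable context; each time it backs off to a
    shorter n-gram order, scores are multiplied by alpha. A word keeps the
    score from the highest order where it was seen. Scores are relative
    frequencies, not normalized probabilities.
    """
    # Only the last max_n - 1 words of context can be used
    context = context[-(max_n - 1):] if max_n > 1 else []
    scores: Dict[str, float] = {}
    penalty = 1.0
    for order in range(len(context) + 1, 0, -1):
        prefix = tuple(context[-(order - 1):]) if order > 1 else ()
        # Denominator: count of the prefix, or total tokens for unigrams
        if order > 1:
            denom = counts[order - 1][prefix]
        else:
            denom = sum(counts[1].values())
        if denom > 0:
            # Linear scan for clarity; a real app would index n-grams by prefix
            for ngram, c in counts[order].items():
                if ngram[:-1] == prefix and ngram[-1] not in scores:
                    scores[ngram[-1]] = penalty * c / denom
        penalty *= alpha
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:k]


if __name__ == "__main__":
    text = "i love data science and i love data and i love coding"
    counts = build_ngram_counts(text.split())
    for word, score in stupid_backoff(["i", "love"], counts):
        print(f"{word}: {score:.4f}")
```

On the toy sentence, the context "i love" ranks "data" first (score 2/3, seen twice after "i love") and "coding" second (1/3); words never seen after that context only enter through the unigram level, discounted by 0.4 twice. Stupid Backoff skips the normalization of Katz-style backoff, which keeps it cheap to compute on large n-gram tables.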

Summary

This is a top word prediction app: https://quangface.shinyapps.io/wordpredictor

I hope you think so too!