Piyush Neupane
06/01/2016
Coursera Data Science Capstone project
This App does the following:
Access the App here: https://piyush.shinyapps.io/shinyapp/
The App might take some time to load initially. Once loaded, it should respond quickly.
Here is how the App looks: https://piyush.shinyapps.io/shinyapp/
Next, we will talk about how the Prediction algorithm was built.
The corpus provided in the course was used, and a 20% sample (about 700k records) was extracted from it.
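A sampling step like this one could be sketched as follows. This is a minimal Python illustration, not the App's actual code: the function name `sample_lines`, the seed, and the choice to treat each line as one record are all assumptions made for the example.

```python
import random


def sample_lines(lines, fraction=0.2, seed=42):
    """Keep each record with probability `fraction`.

    Hypothetical helper: the fixed seed only makes the sample reproducible.
    The result holds roughly, not exactly, `fraction` of the input.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]
```

Sampling record by record means the corpus can be streamed from disk without knowing its total length in advance.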
Given the size of the dataset, the Stupid Backoff algorithm was used. It is less resource-intensive than alternatives such as Katz's Backoff Model or Kneser-Ney Smoothing, and on large datasets its prediction quality is comparable to that of those more intensive models.
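To make the idea concrete, here is a minimal Python sketch of Stupid Backoff scoring. It is an illustration under assumptions, not the App's implementation: the function names, the trigram limit (`ORDER = 3`), and the whitespace tokenizer are all hypothetical. The backoff factor of 0.4 is the value commonly used with this method. If the full n-gram was seen, the score is its relative frequency. Otherwise the context is shortened by one word and the score is multiplied by the backoff factor.

```python
from collections import Counter

ALPHA = 0.4  # backoff factor commonly used with Stupid Backoff
ORDER = 3    # assumption for this sketch: use up to trigrams


def build_counts(sentences):
    """Count all 1..ORDER-grams, stored as tuples of lowercase tokens."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()  # naive whitespace tokenizer
        for n in range(1, ORDER + 1):
            for i in range(len(tokens) - n + 1):
                counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total_unigrams):
    """Stupid Backoff score of `word` after `context`.

    This is a relative score, not a normalized probability.
    """
    context = tuple(context)[-(ORDER - 1):]
    penalty = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            return penalty * counts[ngram] / counts[context]
        context = context[1:]  # drop the oldest word and back off
        penalty *= ALPHA
    # No context left: fall back to the unigram relative frequency.
    return penalty * counts[(word,)] / total_unigrams


def predict_next(context, counts, vocab, total_unigrams):
    """Return the highest-scoring candidate word from `vocab`."""
    return max(vocab, key=lambda w: score(w, context, counts, total_unigrams))
```

Because the scores are never normalized, no discounting or redistribution of probability mass is needed, which is what keeps the method cheap on large corpora.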
A Collection of NLP notes: https://gist.github.com/ttezel/4138642
Coursera Stanford Natural Language Processing: https://www.coursera.org/course/nlp
Speech and Language Processing. Daniel Jurafsky & James H. Martin. https://lagunita.stanford.edu/c4x/Engineering/CS-224N/asset/slp4.pdf
Basic Text Mining in R: https://rstudio-pubs-static.s3.amazonaws.com/31867_8236987cf0a8444e962ccd2aec46d9c3.html