Data Science Specialization Final Presentation

HM
8/23/2015

Background

  • In the last decade or so, there has been an explosion of “smart” phones and other “smart” electronic devices.
  • In parallel, computer chips have become smaller and more powerful at a breathtaking pace.
  • This is all good! However, creating data input devices that are small enough to fit onto those small gadgets remains a major challenge.
  • Part of the solution is to reduce typing by allowing the user to pick the next word in one tap as data is being entered.
  • This capstone project is a rough example of how such technology works.

Methods

  • The data used was graciously made available at: http://www.corpora.heliohost.org/

  • After loading the data sets, I created a corpus from each of them.

  • I cleaned each corpus using the tm package in R, removing punctuation, extra white space, and profanity.

  • I then tokenized each corpus using the RWeka package.

  • This resulted in sets of 2-grams, 3-grams, 4-grams, and 5-grams.

  • Finally, duplicates were removed and the frequency of each term was calculated.
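
The cleaning, tokenizing, and counting steps above can be sketched roughly as follows. This is a minimal Python illustration, not the project's R code (which used tm and RWeka). The function names, the sample sentences, and the BLOCKED_WORDS list are hypothetical placeholders; the project's actual profanity list and data are not shown here.

```python
import re
from collections import Counter

# Hypothetical placeholder list; the project's real profanity list is not given.
BLOCKED_WORDS = {"darn", "heck"}

def clean(text):
    """Lowercase, strip punctuation, collapse white space, drop blocked words."""
    text = text.lower()
    text = re.sub(r"[^a-z\s']", " ", text)  # keep letters, white space, apostrophes
    return [w for w in text.split() if w not in BLOCKED_WORDS]

def ngrams(words, n):
    """Return every run of n consecutive words as a tuple."""
    return [tuple(words[i:i + n]) for i in range(len(words) - n + 1)]

def ngram_frequencies(lines, n):
    """Count how often each n-gram occurs across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(clean(line), n))
    return counts

# Made-up sample text standing in for the real corpus.
sample = ["I like green tea.", "I like green apples, darn it!"]

# One frequency table per n-gram size, 2 through 5.
tables = {n: ngram_frequencies(sample, n) for n in range(2, 6)}
```

Counting with a Counter merges duplicate n-grams and records their frequency in one step. Note that in this sketch, dropping a blocked word joins the words on either side of it into one n-gram.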

Modeling

  • The R maxent package was used to model the predictions.
  • The training set was 70% of the data and the test set was 30%.
  • The output was converted to a data.table that is used by the app for prediction.
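
To illustrate the 70/30 split and the lookup-table idea, here is a rough Python sketch. It is an assumption-laden stand-in: instead of a maximum entropy model (the maxent package), it simply picks the most frequent third word for each two-word context, and a plain dictionary plays the role of the data.table. All function names and the toy trigrams are hypothetical.

```python
import random
from collections import Counter, defaultdict

def split_70_30(items, seed=0):
    """Shuffle with a fixed seed, then split 70% train / 30% test."""
    items = list(items)
    random.Random(seed).shuffle(items)
    cut = len(items) * 7 // 10  # integer arithmetic avoids float rounding
    return items[:cut], items[cut:]

def build_lookup(trigrams):
    """Map each (word1, word2) context to its most frequent next word.

    This frequency rule is a simple substitute for the maxent model.
    """
    counts = defaultdict(Counter)
    for w1, w2, w3 in trigrams:
        counts[(w1, w2)][w3] += 1
    return {ctx: c.most_common(1)[0][0] for ctx, c in counts.items()}

def predict(lookup, w1, w2, default=None):
    """Return the predicted next word, or default for an unseen context."""
    return lookup.get((w1.lower(), w2.lower()), default)

def accuracy(lookup, trigrams):
    """Fraction of trigrams whose third word matches the prediction."""
    if not trigrams:
        return 0.0
    hits = sum(1 for w1, w2, w3 in trigrams if lookup.get((w1, w2)) == w3)
    return hits / len(trigrams)

# Toy trigrams standing in for the real tokenized corpus.
trigrams = (
    [("i", "like", "green")] * 3
    + [("i", "like", "tea")]
    + [("like", "green", "tea")] * 2
)

train, test = split_70_30(trigrams)
model = build_lookup(train)
print("held-out accuracy:", accuracy(model, test))
```

Once the table is built, a prediction is a single dictionary lookup, which is the property that makes a precomputed table convenient for an interactive app.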

The App

  • The app first asks the user to choose the type of writing activity they intend to do.
  • The user then types two words.
  • Once the user clicks on the “Submit” button, the predicted next word appears on the page next to the side panel.
  • It's that simple!
  • The app can be found at: https://hm-datascience.shinyapps.io/DSSAPP
  • Hope you like it! ENJOY!!