Data Science Specialization Final Presentation

HM
8/23/2015

In the last decade or so, there has been an explosion of “smart” phones and other “smart” electronic devices.
In parallel, computer chips have become smaller and more powerful at a breathtaking pace.
This is all good! However, creating data input devices that are small enough to fit onto those small gadgets remains a major challenge.
Part of the solution is to reduce typing by allowing the user to pick the next word in one tap as data is being entered.
This capstone project is a rough example of how such technology works.

The data used was graciously made available at: http://www.corpora.heliohost.org/
After loading the data sets, I created corpora from them.
I cleaned each corpus using the tm package in R, removing punctuation, extra white space, and profanity.
I then tokenized them using the RWeka package.
This resulted in sets of 2-grams, 3-grams, 4-grams, and 5-grams.
Finally, duplicates were removed and the frequency of each term was calculated.
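
The cleaning, tokenizing, and counting steps above can be sketched roughly as follows. This is a minimal illustration in plain Python, not the tm/RWeka code used in the project. The function names, the sample sentence, and the tiny BLOCKED_WORDS list are hypothetical, and lowercasing the text is an assumption that the original steps do not mention.

```python
import re
from collections import Counter

# Hypothetical placeholder list; the word list actually used is not given.
BLOCKED_WORDS = {"darn", "heck"}

def clean(text):
    """Strip punctuation, collapse white space, and drop blocked words."""
    text = text.lower()                      # assumption: case is folded
    text = re.sub(r"[^\w\s]", " ", text)     # punctuation -> space
    # split() with no argument also collapses runs of white space
    return [w for w in text.split() if w not in BLOCKED_WORDS]

def ngram_counts(words, n):
    """Count every run of n consecutive words."""
    grams = zip(*(words[i:] for i in range(n)))
    return Counter(" ".join(g) for g in grams)

words = clean("Thanks for the follow! Thanks for the, heck, support.")
bigrams = ngram_counts(words, 2)
trigrams = ngram_counts(words, 3)
```

Counting with a Counter merges duplicate n-grams and records their frequency in one pass, which corresponds to the final step described above.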

The R maxent package was used to model the predictions.
The training set was 70% of the data and the test set was the remaining 30%.
The output was converted to a data.table that is used by the app for prediction.
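
To give a feel for the split and the lookup table, here is a rough sketch in plain Python. It does not reproduce the maxent model: it swaps in a simple "most frequent next word" lookup, and the function names, the seed, and the sample counts are all made up for illustration.

```python
import random
from collections import Counter

def split_70_30(items, seed=0):
    """Shuffle and split into 70% training and 30% test data."""
    items = list(items)
    random.Random(seed).shuffle(items)       # seed is an arbitrary choice
    cut = len(items) * 7 // 10
    return items[:cut], items[cut:]

def build_table(trigram_counts):
    """Map each two-word prefix to its most frequent next word."""
    best = {}
    for gram, count in trigram_counts.items():
        w1, w2, nxt = gram.split()
        key = (w1, w2)
        if key not in best or count > best[key][1]:
            best[key] = (nxt, count)
    return {key: nxt for key, (nxt, _) in best.items()}

def predict(table, two_words):
    """Look up the last two typed words; None if the pair is unseen."""
    key = tuple(two_words.lower().split()[-2:])
    return table.get(key)

# Made-up counts, for illustration only.
counts = Counter({"thanks for the": 5, "thanks for your": 2, "for the follow": 3})
table = build_table(counts)
train, test = split_70_30(range(10))
```

With these sample counts, predict(table, "Thanks for") returns "the", and a pair that never appeared in the data returns None.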

The app first asks the user to choose the type of writing activity he/she intends to do.
The user then types two words.
Once the user clicks on the “Submit” button, the predicted next word appears on the page next to the side panel.
It's that simple!
The app can be found at: https://hm-datascience.shinyapps.io/DSSAPP
Hope you like it! ENJOY!!