S Das
Shiny App
03/20/2015
Cleaning the data requires removing punctuation, converting words to lower case, removing stop words, and stemming. The cleaning task was done using a program that also selected the n-grams; in this prototype application we used bi-grams and tri-grams.
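The cleaning and n-gram steps above can be sketched roughly as follows. This is a minimal Python illustration, not the bash and ruby scripts actually used for the app: the stop-word list is a tiny placeholder, `crude_stem` is a hypothetical stand-in for a real stemmer (such as Porter's), and the function names are assumptions made for the example.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (an assumption; real lists are much longer).
STOP_WORDS = {"a", "an", "the", "of", "to", "and", "in", "is"}

def crude_stem(word):
    """Very rough suffix stripping; a stand-in for a real stemmer."""
    for suffix in ("ing", "ed", "s"):
        # Only strip when at least three characters of the word remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def clean(text):
    """Lower-case, drop punctuation, remove stop words, then stem."""
    text = text.lower()
    text = re.sub(r"[^a-z\s]", " ", text)  # anything but letters/whitespace
    return [crude_stem(w) for w in text.split() if w not in STOP_WORDS]

def ngrams(tokens, n):
    """Count every run of n consecutive tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

tokens = clean("The cats jumped, and the dog is barking!")
bigrams = ngrams(tokens, 2)
trigrams = ngrams(tokens, 3)
```

Keeping the counts in a `Counter` keyed by word tuples makes it easy to look up how often a given bi-gram or tri-gram occurred.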
Once the various bash and Ruby scripts were worked out, processing the corpus takes less than two hours from start to finish. Moreover, the entire backend package was run on a single Amazon Micro instance, which was provided free.
In terms of the presentation layer, the overall performance is acceptable: the system predicts the next word with an error rate of at least 50%. This is reasonable considering that the promoter's own product, SwiftKey, has an error rate greater than 50% on single-choice next-word prediction.
Link of the app: http://subasish.shinyapps.io/app2
In the final milestone, I will attempt to use the n-grams to build an algorithm that predicts the next word.
The model will take a phrase as input and capture its last n-1 words, which will then be matched against the n-grams.
It will first compare against the highest-order n-grams (namely 3-grams). If nothing matches, it will proceed to the 2-grams, followed by the 1-grams.
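One way this back-off lookup might work is sketched below in Python. It is only an illustration of the idea under simple assumptions: candidates are ranked by raw counts, and the names `build_model` and `predict` as well as the `top_k` parameter are hypothetical, not taken from the app.

```python
from collections import Counter, defaultdict

def build_model(tokens, max_n=3):
    """For each n, map a context (tuple of n-1 words) to next-word counts."""
    model = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            *context, nxt = tokens[i:i + n]
            model[n][tuple(context)][nxt] += 1
    return model

def predict(model, phrase, max_n=3, top_k=3):
    """Try the 3-grams first, then back off to 2-grams, then 1-grams."""
    words = phrase.lower().split()
    for n in range(max_n, 0, -1):
        # The context is the last n-1 words (empty for 1-grams).
        context = tuple(words[-(n - 1):]) if n > 1 else ()
        if len(context) < n - 1:
            continue  # phrase too short for this n-gram order
        candidates = model[n].get(context)
        if candidates:
            return [w for w, _ in candidates.most_common(top_k)]
    return []

corpus = "i like green eggs i like green ham i like red apples".split()
model = build_model(corpus)
suggestions = predict(model, "I like")
```

For the 1-gram level the context is empty, so the fallback simply returns the most frequent words in the corpus.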
A Shiny app will be used in the final presentation. The user will simply enter a phrase and submit it.
After submission, the Shiny app will show all possible 'next' words.