Marco M M
August 8, 2015
For the development of this project we had 3 text databases from:
I selected a 3% sample of these 3 databases to build matrices of trigram, bigram, and single-word counts. The packages used for this activity were tm, RWeka, and slam. Cleaning the data removed stopwords, because this improved the accuracy of the predictions (for the quizzes!).
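The sampling and counting step can be sketched as follows. This is a minimal illustration in Python, not the R code behind the app (which used tm, RWeka, and slam). The function names, the abbreviated stopword list, and the random seed are all assumptions made for the example.

```python
import random
import re
from collections import Counter

# Hypothetical, abbreviated stopword list; real lists are much longer.
STOPWORDS = {"a", "an", "the", "of", "to", "and", "in", "is"}

def sample_lines(lines, fraction=0.03, seed=1):
    """Keep each line with probability `fraction` (about 3% by default)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]

def tokenize(line):
    """Lowercase the line, keep alphabetic words, and drop stopwords."""
    words = re.findall(r"[a-z']+", line.lower())
    return [w for w in words if w not in STOPWORDS]

def ngram_counts(lines, n):
    """Count every run of n consecutive tokens across all lines."""
    counts = Counter()
    for line in lines:
        tokens = tokenize(line)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts
```

Calling `ngram_counts(sample_lines(lines), 3)`, and the same with `2` and `1`, would give the three count tables. Note that removing stopwords before counting means the n-grams span the remaining words only.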
When you enter a phrase in the app, it uses a prediction algorithm based on a 3-gram model:
I entered a phrase from Twitter, "When you meet someone" (line 2 of the Twitter archive), and pressed Submit.
The algorithm predicted the next word correctly (using 3-grams)! The app also tells you which data frame was used for the prediction: 3-gram, 2-gram, or unigram.
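Falling back from the 3-gram table to the 2-gram and unigram tables reads like a simple backoff scheme, although the slides do not spell out the exact rule. A minimal Python sketch under that assumption (not the app's R code) is shown here. The toy corpus and the names `build_tables` and `predict_next` are hypothetical.

```python
from collections import Counter

def build_tables(sentences):
    """Build unigram, bigram, and trigram count tables from raw sentences."""
    uni, bi, tri = Counter(), Counter(), Counter()
    for sentence in sentences:
        t = sentence.lower().split()
        uni.update(t)
        bi.update(zip(t, t[1:]))
        tri.update(zip(t, t[1:], t[2:]))
    return uni, bi, tri

def predict_next(phrase, uni, bi, tri):
    """Return (predicted word, name of the table that produced it)."""
    words = phrase.lower().split()
    # 1. Try the trigram table with the last two words as context.
    if len(words) >= 2:
        ctx = tuple(words[-2:])
        cands = {g[2]: c for g, c in tri.items() if g[:2] == ctx}
        if cands:
            return max(cands, key=cands.get), "3-gram"
    # 2. Back off to the bigram table with the last word as context.
    if words:
        cands = {g[1]: c for g, c in bi.items() if g[0] == words[-1]}
        if cands:
            return max(cands, key=cands.get), "2-gram"
    # 3. Back off to the most frequent single word.
    if uni:
        return uni.most_common(1)[0][0], "unigram"
    return None, "none"

# Toy corpus, invented for illustration only.
corpus = [
    "when you meet someone special",
    "you meet someone new",
    "meet someone special today",
]
uni, bi, tri = build_tables(corpus)
print(predict_next("When you meet someone", uni, bi, tri))  # ('special', '3-gram')
```

Returning the table name alongside the word mirrors how the app reports which data frame produced the prediction.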
With R we can build a prediction model and publish it with slidify: https://marcomtzmtz.shinyapps.io/AppfinalCapstone
The main problem during the process was the size of the database, which I mitigated by cleaning the data.