htann
2017/9/2
This slide deck presents the motivation, methodology and user manual for a prediction app built with Shiny. It was developed as part of the data science specialisation. The purpose of the app is to predict the next word based on one or more previous words.
The task was to analyse and use pre-existing corpora to build an app in Shiny. The three given corpora were taken from blogs, news and Twitter. (Source link: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip)
After cleansing the data of special characters such as $, ! and *, the corpora were used to create repositories of n-grams. Three n-gram orders were used, each with enough distinctive data:
1-gram (unigram), 2-gram (bigram), 3-gram (trigram)
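The cleaning and n-gram steps can be illustrated with a minimal sketch. It is written in Python for brevity and is not the app's actual code; the function names `clean` and `ngram_counts`, the sample sentence, and the exact cleaning rule (lowercase, keep only letters, apostrophes and spaces) are assumptions made for illustration.

```python
import re
from collections import Counter

def clean(text):
    """Lowercase the text and replace special characters with spaces.

    Assumed rule: keep only a-z, apostrophes and spaces.
    """
    text = text.lower()
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens as a tuple."""
    return Counter(zip(*(tokens[i:] for i in range(n))))

# Hypothetical sample input, standing in for a line of the corpora.
tokens = clean("I love data! I love #R & Shiny.")

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(tokens)
print(bigrams.most_common(1))
```

Each table maps an n-gram (a tuple of words) to the number of times it was seen, which is the kind of repository the later lookup would draw on.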
Due to the enormous size of the resulting tables, all n-grams with low counts (a cutoff of 10 occurrences) were discarded. This gave a sensible compromise between accuracy, runtime and memory usage.
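A rough sketch of this pruning step, again in Python and not the app's actual code: the name `prune`, the sample counts, and the choice to keep n-grams seen at least 10 times (rather than more than 10) are assumptions.

```python
from collections import Counter

# Assumed cutoff; whether a count of exactly 10 is kept is a guess.
MIN_COUNT = 10

def prune(counts, min_count=MIN_COUNT):
    """Keep only n-grams seen at least min_count times."""
    return {ngram: c for ngram, c in counts.items() if c >= min_count}

# Hypothetical bigram counts for illustration.
bigram_counts = Counter({
    ("of", "the"): 250,
    ("the", "cat"): 10,
    ("cat", "sat"): 3,
})

pruned = prune(bigram_counts)
print(pruned)
```

Dropping rare n-grams shrinks the lookup tables considerably, at the cost of losing predictions for unusual phrases.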
Libraries
Algorithm
Repository
The app is hosted here. After a short loading time, it shows the following GUI:
