JGG
2017
This capstone project aims to build a PREDICTIVE TEXT app in Shiny, a package from RStudio for building interactive web pages with R.
Practical applications of predictive text include text messaging, email, search engine sites, customer management sites, and chat apps, among others.
In building the App, the following concepts and models were used:
N-Gram: a sequence of N words (e.g., "beautiful life" is a 2-gram and "I am home" is a 3-gram).
Markov Chain: the probability that a word comes next depends only on the previous N-1 words, not on the whole preceding text.
Stupid Backoff for smoothing: use the 4-gram table if it contains a match for the last three words typed; otherwise use the 3-gram table; otherwise use the 2-gram table.
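The three ideas above combine into one scoring rule. The sketch below is a minimal Python illustration, not the App's R code, assuming whitespace tokenization and the Stupid Backoff formulation of Brants et al. (2007): a candidate word w scores count(context + w) / count(context) at the longest context where it was observed, multiplied by 0.4 for each step down to a shorter context. Following the description above, it stops at 2-grams instead of falling back to single-word frequencies. The function names `build_ngram_counts`, `stupid_backoff_scores`, and `predict_next` and the toy corpus are hypothetical.

```python
from collections import Counter


def build_ngram_counts(tokens, max_n=4):
    """Count every n-gram of length 1..max_n, keyed by a tuple of words."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def stupid_backoff_scores(counts, context, max_n=4, alpha=0.4):
    """Score candidate next words, backing off from 4-grams down to 2-grams.

    A word keeps the score from the longest context it follows;
    each back-off step multiplies the score by alpha.
    """
    context = tuple(context[-(max_n - 1):])  # Markov assumption: last N-1 words only
    scores = {}
    penalty = 1.0
    for k in range(len(context), 0, -1):  # context length 3, then 2, then 1
        ctx = context[-k:]
        ctx_count = counts.get(ctx, 0)
        if ctx_count:
            for gram, c in counts.items():
                if len(gram) == k + 1 and gram[:-1] == ctx and gram[-1] not in scores:
                    scores[gram[-1]] = penalty * c / ctx_count
        penalty *= alpha  # back off one order
    return scores


def predict_next(counts, text, top=3):
    """Return up to `top` candidate next words, highest score first."""
    tokens = text.lower().split()
    scores = stupid_backoff_scores(counts, tokens)
    return sorted(scores, key=lambda w: (-scores[w], w))[:top]


corpus = "i am home . i am happy . i am home now . you are home"
counts = build_ngram_counts(corpus.split())
print(predict_next(counts, "I am"))     # ['home', 'happy']
print(predict_next(counts, "They am"))  # unseen context, backs off to 2-grams: ['home', 'happy']
```

Scanning every stored n-gram on each lookup keeps the sketch short; a deployed app would more likely index each table by its context tuple so a prediction is a single dictionary lookup.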
Reference: D. Jurafsky & J. Martin (2014). Speech and Language Processing, Chapter 4: N-Grams.
LOADING AND PROCESSING THE DATA
The data set was provided by SwiftKey, our corporate partner for this project.
RESULTS
Source    Lines       Words        Training lines   Training words   Test cases   Accuracy
Blogs     899,288     37,334,131   50,000           2,053,168        1,022        15%
Twitter   2,360,148   30,373,543   200,000          2,523,971        178          11%
News      77,259      2,643,969    54,081           1,843,581        592          15%
Number of stored 4-grams: 372,223. Accuracy can be improved by storing more 4-grams.
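The table does not say how Accuracy was measured. One common definition, assumed here, is top-1 next-word accuracy: for each held-out n-gram, predict from all but its last word and count a hit when the first suggestion equals the actual last word. The Python sketch below computes that, plus top-k for an app that shows several suggestions; `top_k_accuracy`, `toy_predict`, and the test n-grams are hypothetical stand-ins, not the project's test set.

```python
def top_k_accuracy(test_ngrams, predict, k=1):
    """Share of held-out n-grams whose last word is among the top-k predictions.

    test_ngrams: list of word tuples; all but the last word form the context.
    predict: any function mapping a context tuple to a ranked list of words.
    """
    if not test_ngrams:
        return 0.0
    hits = 0
    for gram in test_ngrams:
        context, actual = gram[:-1], gram[-1]
        if actual in predict(context)[:k]:
            hits += 1
    return hits / len(test_ngrams)


# Hypothetical toy predictor: ranked next words looked up by the last two words.
lookup = {("i", "am"): ["home", "happy"], ("you", "are"): ["home"]}


def toy_predict(context):
    return lookup.get(tuple(context[-2:]), [])


test = [("i", "am", "home"), ("i", "am", "happy"), ("you", "are", "home"), ("we", "were", "there")]
print(top_k_accuracy(test, toy_predict))       # 0.5
print(top_k_accuracy(test, toy_predict, k=3))  # 0.75
```

Under this definition, storing more 4-grams helps because fewer test contexts fall through to the shorter, less specific tables.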
CURRENT FEATURES of the App are:
CLICK HERE to try the App! CLICK HERE for the reproducible code!
FUTURE ENHANCEMENTS