7 August 2018

What is Typedict?

In this age of technology, information sharing is at its peak, and much of that information travels as text, whether in an email or through an instant messenger. Around the world, people are spending an increasing amount of time on their mobile devices for email, social networking, banking and a whole range of other activities. But typing on mobile devices can be a serious pain. SwiftKey, our corporate partner in this capstone, builds a smart keyboard that makes it easier for people to type on their mobile devices. One cornerstone of their smart keyboard is predictive text models. When someone types: I went to the

the keyboard presents three options for what the next word might be. For example, the three words might be gym, store, restaurant. When you're typing away on your PC or mobile phone, do you know what would make your life easier? TYPEDICT! It predicts the next word you're most likely to type.

How does it do what it does?

  • Built in collaboration with SwiftKey, this app uses data generously donated by them
  • Accepts English words only
  • The data is sourced from blogs, tweets and news articles
  • The collected data is sampled and cleaned
  • Uses n-gram prediction models
  • Profanities are filtered out
  • Check out the app here: Typedict
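
To make the n-gram idea concrete, here is a minimal base R sketch of next-word prediction from bigram counts. It is an illustration only, not the app's actual model: the toy sentences and the helper name `predict_next` are made up for this example, and it looks at just one preceding word.

```r
# Toy corpus (made up for illustration; the app itself uses the SwiftKey data)
toy <- c("i went to the gym", "i went to the store",
         "i went to the gym today", "she went to the restaurant")

# Split each sentence into words and pair every word with its successor
words_list <- strsplit(tolower(toy), " +")
bigrams <- do.call(rbind, lapply(words_list, function(w) {
  data.frame(first = head(w, -1), second = tail(w, -1),
             stringsAsFactors = FALSE)
}))

# Hypothetical helper: the n most frequent words seen right after `word`
predict_next <- function(word, n = 3) {
  followers <- bigrams$second[bigrams$first == tolower(word)]
  if (length(followers) == 0) return(character(0))  # unseen word: no guess
  counts <- sort(table(followers), decreasing = TRUE)
  head(names(counts), n)
}

predict_next("the")  # up to three candidates, most frequent first
```

In this toy data "gym" follows "the" twice, so it ranks first, and a word never seen in the data returns no suggestion. A fuller model would typically also try longer n-grams first and fall back to shorter ones.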

Playing with the Corpus

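  • Cleaning the Sample
library(tm)  # VCorpus, tm_map and the cleaning transformations
# Pool the three sampled sources into one character vector
sample <- c(sampleusb, sampleusn, sampleust)
samplecorpus <- VCorpus(VectorSource(sample))
# Remove punctuation first so the stemmer sees bare words
samplecorpus <- tm_map(samplecorpus, removePunctuation)
samplecorpus <- tm_map(samplecorpus, stemDocument)
samplecorpus <- tm_map(samplecorpus, removeWords, profanities)
samplecorpus <- tm_map(samplecorpus, stripWhitespace)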
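  • Tokenization
library(RWeka)  # NGramTokenizer and Weka_control
# Return the 10 most frequent n-grams of length ngramCount
ngramTokenizer <- function(theCorpus, ngramCount) {
  ngramFunction <- NGramTokenizer(theCorpus,
    Weka_control(min = ngramCount, max = ngramCount,
                 delimiters = " \\r\\n\\t.,;:\"()?!"))
  # Tabulate the n-grams and sort by descending frequency
  ngramFunction <- data.frame(table(ngramFunction))
  ngramFunction <- ngramFunction[order(ngramFunction$Freq,
                                       decreasing = TRUE), ]
  head(ngramFunction, 10)  # top 10, returned visibly
}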
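
For readers without a Java/RWeka setup, the counting step can be approximated in base R. This is only a rough sketch of the same idea: `count_ngrams` is a hypothetical helper, the two sentences are made up, and its word splitting (lowercasing, keeping letters and apostrophes) is an assumption that does not match the Weka delimiters exactly.

```r
# Hypothetical base R helper: frequency table of n-grams in a character vector
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(strsplit(tolower(lines), "[^a-z']+"), function(w) {
    w <- w[w != ""]                        # drop empty tokens from leading delimiters
    if (length(w) < n) return(character(0))
    # embed() returns each window of n consecutive indices in reverse order,
    # so flip the columns to restore reading order
    idx <- embed(seq_along(w), n)[, n:1, drop = FALSE]
    apply(idx, 1, function(i) paste(w[i], collapse = " "))
  }))
  freq <- as.data.frame(table(ngram = grams), stringsAsFactors = FALSE)
  freq[order(freq$Freq, decreasing = TRUE), ]
}

toy <- c("I went to the gym.", "I went to the store!")
head(count_ngrams(toy, 2), 3)  # the most frequent bigrams in the toy text
```

Calling it with n = 1, 2 or 3 gives unigram, bigram and trigram tables in the same shape, which is handy for a quick look at a small sample before running the full tokenizer.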

Prediction Example