Scott Purvis
As people spend increasing amounts of time on mobile devices typing emails, commenting on social networks, and doing a range of other activities, a predictive text model can make typing easier.
The goal of this project is to:
Text sources for the prediction model include the blog, news, and Twitter files provided by SwiftKey. A sample of American Humor Writings* was added for diversity of language.
The model was built from these text sources, which were combined into a single corpus and cleaned. The cleaned text was tokenized into bigrams (2 words) and trigrams (3 words), which were then combined into a single model.
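A minimal Python sketch of the cleaning and tokenizing steps follows. It assumes the sources are already loaded as a list of strings. The cleaning rules (lowercasing, keeping only letters and apostrophes) and the names `clean`, `ngrams`, and `build_model` are hypothetical illustrations, not the project's actual code. Unigram counts are kept alongside the bigrams and trigrams because a backoff down to single words needs them.

```python
import re
from collections import Counter


def clean(text: str) -> list[str]:
    """Lowercase, replace everything except letters, apostrophes and
    whitespace with spaces, then split into words.
    (Hypothetical cleaning rules, chosen for illustration.)"""
    text = text.lower()
    text = re.sub(r"[^a-z'\s]", " ", text)
    return text.split()


def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """Return every contiguous n-word sequence in the token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def build_model(documents: list[str]) -> dict[int, Counter]:
    """Clean each document and count its 1-, 2- and 3-grams.

    N-grams are counted per document, so no n-gram spans two documents.
    All counts go into one combined model keyed by n-gram order."""
    model = {1: Counter(), 2: Counter(), 3: Counter()}
    for doc in documents:
        tokens = clean(doc)
        for n in model:
            model[n].update(ngrams(tokens, n))
    return model


if __name__ == "__main__":
    model = build_model(["I love you.", "I love my dog!"])
    print(model[2][("i", "love")])  # 2
    print(model[3][("i", "love", "you")])  # 1
```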
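The Stupid Backoff method is used to score predicted words. Strictly speaking, these scores are relative frequencies rather than true probabilities, because Stupid Backoff does not normalize them to sum to one.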
In this simple implementation, the program first looks for a trigram match. If that fails, it backs off to a bigram match, and then to a unigram. Each backoff multiplies the predicted word's score by a fixed factor (lambda = 0.4).
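The backoff loop can be sketched in Python as below. This is an illustrative version, not the project's code: `count_ngrams`, `predict`, and the plain whitespace tokenization are assumptions. Each candidate word is scored at the highest order where it appears, as count(history + word) / count(history), multiplied by 0.4 once for every backoff taken. Unigram candidates are scored as count(word) / total word count.

```python
from collections import Counter

LAMBDA = 0.4  # backoff factor stated on the slide


def count_ngrams(sentences: list[str], max_n: int = 3) -> dict[int, Counter]:
    """Hypothetical helper: count 1..max_n-grams using whitespace tokens."""
    model = {n: Counter() for n in range(1, max_n + 1)}
    for s in sentences:
        tokens = s.lower().split()
        for n in model:
            for i in range(len(tokens) - n + 1):
                model[n][tuple(tokens[i:i + n])] += 1
    return model


def predict(model: dict[int, Counter], context: list[str], k: int = 3):
    """Rank next-word candidates with Stupid Backoff scores.

    Starts from the last two context words (trigram level) and drops the
    oldest word on each backoff, multiplying the weight by LAMBDA.
    A word keeps the score from the first (highest) level where it matches.
    """
    scores: dict[str, float] = {}
    weight = 1.0
    history = tuple(w.lower() for w in context)[-2:]
    while True:
        n = len(history) + 1
        if history:
            denom = model[n - 1][history]  # 0 if the history was never seen
            if denom:
                for gram, c in model[n].items():
                    if gram[:-1] == history and gram[-1] not in scores:
                        scores[gram[-1]] = weight * c / denom
        else:
            # Unigram level: relative frequency of each word in the corpus.
            total = sum(model[1].values())
            for (w,), c in model[1].items():
                if w not in scores:
                    scores[w] = weight * c / total
        # Back off only while fewer than k candidates have been found.
        if len(scores) >= k or not history:
            break
        history = history[1:]
        weight *= LAMBDA
    # Highest score first; ties broken alphabetically.
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]


if __name__ == "__main__":
    model = count_ngrams(["I love you", "I love you", "I love my dog", "you love me"])
    print(predict(model, ["I", "love"], k=3))
```

Because the loop stops as soon as it has `k` candidates, it follows the "back off only when the match fails" rule described above. A fuller ranking would score candidates at every order, since a discounted lower-order word can occasionally outrank a weak trigram match.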