Sarathy Jay
May 15 2016
Key Features
User Interface
Background Process
The app’s algorithm is based on n-grams. An n-gram is a contiguous sequence of n items from a given sequence of text or speech. Depending on the application, the items can be phonemes, syllables, letters, words or base pairs. N-grams are typically collected from a text or speech corpus.
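As a small illustration of the definition above, word-level n-grams can be extracted with a sliding window. This is only a sketch in Python: the helper name `ngrams` and the sample sentence are made up for the example and are not taken from the app.

```python
def ngrams(tokens, n):
    """Return every contiguous run of n tokens, in order of appearance."""
    # A window of width n slides one token at a time; if the sequence is
    # shorter than n, the range is empty and no n-grams are returned.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


# Hypothetical sample sentence, split on whitespace into word tokens.
tokens = "the quick brown fox".split()

bigrams = ngrams(tokens, 2)   # pairs of adjacent words
trigrams = ngrams(tokens, 3)  # triples of adjacent words
```

The same function works for letters or any other item type, since it only relies on slicing a sequence.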
Corpus
The corpus consisted of blogs, news articles and tweets in the English language. From these, we built a set of n-grams (unigrams, bigrams, trigrams and quadgrams) to help predict the most likely next word in a sentence, based on how frequently that word was used in the corpus we analyzed.
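One way such frequency-based prediction could work is sketched below in Python. It rests on several assumptions that the description above does not confirm: the function names `build_tables` and `predict_next` are hypothetical, the four-sentence corpus is a toy stand-in for the real blogs, news and tweets, and the rule of trying the longest matching context first and then falling back to shorter ones is a guess at how the four n-gram sizes are combined.

```python
from collections import Counter, defaultdict


def build_tables(sentences, max_n=4):
    """Count, for each context of 0..max_n-1 words, which word follows it."""
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                # The last word of the n-gram is the prediction target;
                # the words before it form the context (empty for unigrams).
                *context, word = tokens[i:i + n]
                tables[tuple(context)][word] += 1
    return tables


def predict_next(tables, text, max_n=4):
    """Return the most frequent follower of the longest context seen before."""
    tokens = text.lower().split()
    # Assumed strategy: try the last 3 words, then 2, then 1, then no context.
    for k in range(max_n - 1, -1, -1):
        if k > len(tokens):
            continue  # not enough typed words for a context this long
        context = tuple(tokens[len(tokens) - k:])
        followers = tables.get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # the tables are empty


# Toy corpus, invented for this example only.
toy_corpus = [
    "i love new york",
    "i love new shoes",
    "i love new york city",
    "we love pizza",
]
tables = build_tables(toy_corpus)

full_match = predict_next(tables, "i love new")  # uses the 3-word context
fallback = predict_next(tables, "they love")     # falls back to "love" alone
unseen = predict_next(tables, "zebra")           # falls back to unigram counts
```

Keeping all n-gram sizes in one dictionary keyed by context keeps the lookup simple; a production app would likely prune rare n-grams to keep the tables small.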
The app is available through Shiny for exploration.
Link to Shiny app: Word Predictor
References