The data was downloaded from Coursera-SwiftKey.zip. The blog, news and Twitter datasets were read from the English-language files and combined into a text corpus (a collection of written texts) using VCorpus. The corpus was then processed with tm_map to remove punctuation, numbers, extra whitespace and stopwords, convert the text to lower case, and stem the words with stemDocument.
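A minimal sketch of this loading and cleaning step using the tm package is shown below; the file paths under final/en_US/ are assumptions based on the usual layout of the Coursera-SwiftKey download.

```r
library(tm)

# Read the three English-language source files (paths are assumed)
blogs   <- readLines("final/en_US/en_US.blogs.txt",   encoding = "UTF-8", skipNul = TRUE)
news    <- readLines("final/en_US/en_US.news.txt",    encoding = "UTF-8", skipNul = TRUE)
twitter <- readLines("final/en_US/en_US.twitter.txt", encoding = "UTF-8", skipNul = TRUE)

# Build a volatile corpus from the combined text
corpus <- VCorpus(VectorSource(c(blogs, news, twitter)))

# Clean the corpus: lower case, strip punctuation, numbers and extra
# whitespace, drop English stopwords, then stem each document
corpus <- tm_map(corpus, content_transformer(tolower))
corpus <- tm_map(corpus, removePunctuation)
corpus <- tm_map(corpus, removeNumbers)
corpus <- tm_map(corpus, stripWhitespace)
corpus <- tm_map(corpus, removeWords, stopwords("english"))
corpus <- tm_map(corpus, stemDocument)
```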
Next we apply tokenization, which is the splitting of a phrase, sentence, paragraph, or entire text document into smaller units such as individual words or terms. The processed corpus was then tokenized into n-gram frequency tables, namely 2-grams, 3-grams and 4-grams, each paired with its frequency of occurrence.
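One common way to build these tables is to combine RWeka's NGramTokenizer with tm's TermDocumentMatrix; the report does not state which tokenizer was used, so the sketch below is an assumption. It also converts each term-document matrix to a dense matrix, which is only practical on a sampled corpus.

```r
library(tm)
library(RWeka)

# n-gram tokenizers built on RWeka's NGramTokenizer
bigramTokenizer   <- function(x) NGramTokenizer(x, Weka_control(min = 2, max = 2))
trigramTokenizer  <- function(x) NGramTokenizer(x, Weka_control(min = 3, max = 3))
quadgramTokenizer <- function(x) NGramTokenizer(x, Weka_control(min = 4, max = 4))

# One term-document matrix per n-gram order, built from the cleaned corpus
tdm2 <- TermDocumentMatrix(corpus, control = list(tokenize = bigramTokenizer))
tdm3 <- TermDocumentMatrix(corpus, control = list(tokenize = trigramTokenizer))
tdm4 <- TermDocumentMatrix(corpus, control = list(tokenize = quadgramTokenizer))

# Collapse a term-document matrix into a frequency table sorted by count
ngramFreq <- function(tdm) {
  freq <- sort(rowSums(as.matrix(tdm)), decreasing = TRUE)
  data.frame(ngram = names(freq), frequency = freq, row.names = NULL)
}

freq2 <- ngramFreq(tdm2)   # 2-gram frequencies
freq3 <- ngramFreq(tdm3)   # 3-gram frequencies
freq4 <- ngramFreq(tdm4)   # 4-gram frequencies
```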