devank
10 Apr 2016
Johns Hopkins University - Coursera Data Science Capstone Project in cooperation with SwiftKey.
The following tasks were performed.
The data comes from a corpus called HC Corpora. Existing R packages were used for text mining and natural language processing.
The model reads the last few words of a sentence and uses statistics about a large collection of English sentences to predict the most probable next word. The sample is drawn from blogs, news and Twitter.
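The prediction idea described above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the project's R code. The function names (`build_model`, `predict_next`), the toy sentences, the choice of up to three-word n-grams, and the fallback from longer to shorter contexts are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_model(sentences, max_n=3):
    """Map each context (a tuple of 1..max_n-1 words) to counts of the word that follows it."""
    model = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for n in range(2, max_n + 1):
            for i in range(len(words) - n + 1):
                context = tuple(words[i:i + n - 1])
                next_word = words[i + n - 1]
                model[context][next_word] += 1
    return model


def predict_next(model, text, max_n=3):
    """Return the most frequent next word for the last few words of `text`.

    Assumption: the longest context is tried first, then shorter ones.
    Returns None when no context has been seen.
    """
    words = text.lower().split()
    for k in range(max_n - 1, 0, -1):
        if len(words) < k:
            continue
        context = tuple(words[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None


# Toy data standing in for the blogs, news and Twitter sample (hypothetical).
toy_sentences = [
    "i love new york",
    "i love new ideas",
    "we love new york",
]
model = build_model(toy_sentences)
print(predict_next(model, "they love new"))
print(predict_next(model, "brand new"))
```

The first call finds the two-word context "love new" in the table. The second call has not seen "brand new", so it falls back to the single word "new".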
First the data sample was cleaned. Then the sample data was tokenized into n-grams. The n-grams were counted to build the frequency dictionaries. To get a prediction, enter a word in the input field provided.
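The cleaning, tokenizing and counting steps can be sketched as follows. This is a Python illustration under stated assumptions, not the R pipeline used in the project: the cleaning rules (lowercasing, dropping digits and punctuation other than apostrophes) and the helper names `clean`, `ngrams` and `frequency_dictionary` are hypothetical, since the source does not list the exact transformations.

```python
import re
from collections import Counter


def clean(text):
    """Lowercase the text and keep only letters, apostrophes and single spaces (assumed rules)."""
    text = text.lower()
    text = re.sub(r"[^a-z'\s]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def frequency_dictionary(lines, n):
    """Count how often each n-gram occurs; n-grams never span two lines."""
    counts = Counter()
    for line in lines:
        counts.update(ngrams(clean(line).split(), n))
    return counts


# Toy lines standing in for the corpus sample (hypothetical).
sample = ["The cat sat.", "The cat ran!", "A dog sat."]
unigrams = frequency_dictionary(sample, 1)
bigrams = frequency_dictionary(sample, 2)
print(bigrams.most_common(1))
```

Building one dictionary per n-gram size keeps each lookup table small and lets a predictor consult them separately.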