The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others.
In order to prepare data for the prediction model:
- Blogs, Twitter and News data are combined into a single main data source.
- The character “I” is substituted with “i”, since the lower() function turns the uppercase letter “I” into the lowercase letter “i”.
- All letters are converted to lowercase. All punctuation, numbers, symbols and URLs are removed.
- N-grams (unigrams, bigrams, trigrams and quadgrams) are extracted and sorted by term frequency, and the resulting data frames are saved.
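
The preparation steps above can be sketched roughly as follows. This is a minimal illustration in Python, not the code used for the project: the tiny `blogs`, `twitter` and `news` samples, the helper names `clean` and `ngram_counts`, and the two regular expressions are all assumptions made for the example. It also keeps only the letters a-z (so apostrophes and accented letters are dropped) and returns plain sorted lists instead of saving data frames.

```python
import re
from collections import Counter

# Assumed patterns: a simple URL matcher and "anything that is not a-z or whitespace".
URL_RE = re.compile(r"(?:https?://|www\.)\S+")
NON_LETTER_RE = re.compile(r"[^a-z\s]")


def clean(text):
    """Lowercase the text, drop URLs, then drop punctuation, numbers and symbols."""
    text = text.lower()
    text = URL_RE.sub(" ", text)         # remove URLs before stripping symbols
    text = NON_LETTER_RE.sub(" ", text)  # keep only a-z and whitespace
    return text.split()


def ngram_counts(lines, n):
    """Return (ngram, count) pairs sorted by descending frequency."""
    counts = Counter()
    for line in lines:
        tokens = clean(line)
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts.most_common()


# Hypothetical stand-ins for the three sources.
blogs = ["I love data science!"]
twitter = ["Data science rocks :) http://example.com/x1"]
news = ["I love data, 24/7."]

# Combine the sources into one corpus, then build the four n-gram tables.
corpus = blogs + twitter + news
tables = {n: ngram_counts(corpus, n) for n in (1, 2, 3, 4)}

for n, rows in tables.items():
    print(n, rows[:3])
```

Here `tables[1]` holds the unigram frequencies and `tables[4]` the quadgram frequencies. In a real pipeline each table would be written to disk so the prediction model can load it without recomputing the counts.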