The corpus used to build this app comes from three source files (blogs, news, and Twitter tweets), which were provided by SwiftKey for this project.
To improve accuracy, I also incorporated Amazon review data taken from the Amazon site.
Basic cleaning and preprocessing steps were performed on the corpus before feeding it to the algorithm.
NOTE: All of the data were used to build the model and algorithm for the web app.
  feature          freq  pred  base
1 one_of_the      20730  the   one_of
2 a_lot_of        20062  of    a_lot
3 thanks_for_the  14545  the   thanks_for
4 to_be_a         13744  a     to_be
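
Each row above splits a trigram into a two-word base and the word that follows it (pred), together with how often the trigram occurred. A minimal Python sketch of how such rows could drive a next-word lookup is shown here. The function names (`split_trigram`, `build_lookup`, `predict_next`) are hypothetical and chosen for illustration only; the app's actual implementation and language are not given in this text, and only the four rows from the table are used.

```python
from collections import defaultdict

# Trigram counts copied from the four table rows above.
trigram_freq = {
    "one_of_the": 20730,
    "a_lot_of": 20062,
    "thanks_for_the": 14545,
    "to_be_a": 13744,
}


def split_trigram(feature):
    """Split 'w1_w2_w3' into the base 'w1_w2' and the predicted word 'w3'."""
    base, pred = feature.rsplit("_", 1)
    return base, pred


def build_lookup(freqs):
    """Map each base to its candidate next words, most frequent first."""
    lookup = defaultdict(list)
    for feature, freq in freqs.items():
        base, pred = split_trigram(feature)
        lookup[base].append((pred, freq))
    for candidates in lookup.values():
        candidates.sort(key=lambda pair: pair[1], reverse=True)
    return dict(lookup)


def predict_next(lookup, text, k=3):
    """Return up to k predictions based on the last two words of text."""
    words = text.lower().split()
    base = "_".join(words[-2:])
    return [pred for pred, _ in lookup.get(base, [])[:k]]


lookup = build_lookup(trigram_freq)
print(predict_next(lookup, "She is one of"))  # ['the']
```

Keying the lookup on the base means a prediction is a single dictionary access on the last two typed words. A base that never appeared in the corpus simply returns an empty list in this sketch.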