This presentation illustrates the main features of a word prediction app developed for the Capstone project of the Data Science Specialization course offered by Johns Hopkins University (JHU) and Coursera in partnership with SwiftKey.
The objective of the Capstone project is to build a word prediction app and to demonstrate how data science can be applied to natural language processing.
The Data
The data used to develop this application comes from a corpus called HC Corpora. More details on the corpora can be found HERE.
A small sample of English text from blogs, Twitter, and news articles published on the web was used to develop this application. The sample text was converted to lowercase before processing. Numbers, punctuation, extra whitespace, non-ASCII characters, and profanity were also removed.
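The cleaning steps described above can be sketched roughly as follows. This is a minimal illustration in Python, not the code used to build the app: the function name `clean_text` and the tiny `PROFANITY` set are hypothetical placeholders, and the order of the steps is an assumption. Punctuation is replaced by a space here, so a contraction such as "don't" becomes two tokens; the original pipeline may have handled this differently.

```python
import re

# Hypothetical placeholder list; the source does not say which profanity list was used.
PROFANITY = {"darn", "heck"}


def clean_text(text, profanity=PROFANITY):
    """Lowercase the text and strip non-ASCII characters, numbers,
    punctuation, profanity, and extra whitespace."""
    text = text.lower()
    # Drop any character that cannot be encoded as ASCII.
    text = text.encode("ascii", errors="ignore").decode("ascii")
    # Replace digits with a space so neighbouring words stay separate.
    text = re.sub(r"[0-9]+", " ", text)
    # Replace punctuation and any remaining symbols with a space.
    text = re.sub(r"[^a-z\s]", " ", text)
    # split() with no arguments also collapses runs of whitespace.
    tokens = [t for t in text.split() if t not in profanity]
    return " ".join(tokens)


if __name__ == "__main__":
    print(clean_text("Heck, I ate 3 cafés   today!"))
```

Applying the same function to every line of the sampled blogs, Twitter, and news text would yield the normalized input from which word sequences can then be counted.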