Henrys Kasereka
December 26, 2020
The Coursera Data Science Specialization Capstone project from Johns Hopkins University (JHU) allows students to create a usable public data product that can show their skills to potential employers. For this iteration of the class, JHU partnered with SwiftKey (http://swiftkey.com/en/) to apply data science in the area of natural language processing.
The algorithm developed to predict the next word in a user-entered text string is based on a classic N-gram model, built from a subset of cleaned data drawn from blog, Twitter, and news Internet files.
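To make the N-gram idea concrete, the following is a minimal Python sketch of counting unigrams, bigrams, and trigrams from cleaned text. It is an illustration only and not the project's own code: the function names `tokenize` and `ngram_counts`, the cleaning rule (lowercase, letters and apostrophes only), and the two-sentence toy corpus are all assumptions made for this example.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep only runs of letters and apostrophes.

    This cleaning rule is an assumption for the sketch; the original
    project's cleaning steps are not described in detail.
    """
    return re.findall(r"[a-z']+", text.lower())


def ngram_counts(sentences, n):
    """Count every run of n consecutive tokens across the sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = tokenize(sentence)
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


# Hypothetical toy corpus standing in for the blog, Twitter, and news sample.
corpus = [
    "I love data science",
    "I love natural language processing",
]

unigrams = ngram_counts(corpus, 1)
bigrams = ngram_counts(corpus, 2)
trigrams = ngram_counts(corpus, 3)

print(bigrams[("i", "love")])  # number of times "i love" occurs in the toy corpus
```

Each table maps a tuple of words to the number of times that sequence appears, which is the raw material a next-word predictor needs.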
Using the algorithm, a Shiny (http://shiny.rstudio.com/) application was developed that accepts a phrase as input, suggests a word completion from the unigrams, and predicts the most likely next word based on linear interpolation of trigrams, bigrams, and unigrams. The web-based application can be found here.
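The two behaviours described above, word completion from unigrams and next-word prediction by linear interpolation, can be sketched in a few lines of Python. This is a hedged illustration rather than the application's actual implementation: the function names, the toy token list, and the interpolation weights (0.6, 0.3, 0.1) are assumptions chosen for the example, and the real weights would normally be tuned on held-out data.

```python
from collections import Counter


def build_counts(tokens):
    """Return unigram, bigram, and trigram counts for a token list."""
    uni = Counter(tokens)
    bi = Counter(zip(tokens, tokens[1:]))
    tri = Counter(zip(tokens, tokens[1:], tokens[2:]))
    return uni, bi, tri


def interpolated_prob(word, w1, w2, uni, bi, tri, lambdas=(0.6, 0.3, 0.1)):
    """Weighted mix of trigram, bigram, and unigram estimates for `word`
    following the context (w1, w2). The weights are assumed values."""
    l3, l2, l1 = lambdas
    total = sum(uni.values())
    p_uni = uni[word] / total if total else 0.0
    # Maximum-likelihood estimates; a term is zero when its context is unseen.
    p_bi = bi[(w2, word)] / uni[w2] if uni[w2] else 0.0
    p_tri = tri[(w1, w2, word)] / bi[(w1, w2)] if bi[(w1, w2)] else 0.0
    return l3 * p_tri + l2 * p_bi + l1 * p_uni


def predict_next(phrase, uni, bi, tri):
    """Return the vocabulary word with the highest interpolated score."""
    words = phrase.lower().split()
    padded = [None, None] + words  # pad so short phrases still have a context
    w1, w2 = padded[-2], padded[-1]
    return max(uni, key=lambda w: interpolated_prob(w, w1, w2, uni, bi, tri))


def complete_word(prefix, uni):
    """Suggest the most frequent unigram that starts with `prefix`."""
    candidates = [w for w in uni if w.startswith(prefix.lower())]
    return max(candidates, key=lambda w: uni[w]) if candidates else None


# Hypothetical toy data standing in for the cleaned corpus sample.
tokens = "the cat sat on the mat the cat sat on the rug the cat ran".split()
uni, bi, tri = build_counts(tokens)

print(predict_next("the cat", uni, bi, tri))  # most likely next word
print(complete_word("ma", uni))               # completion for a partial word
```

Because the unigram term is always available, the interpolated score stays non-zero for any known word even when the typed context was never seen in the training sample, which is the main reason to blend the three models instead of relying on trigrams alone.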
Use of the application is straightforward, and it can be easily adapted to many educational and commercial uses. As depicted below, the user begins by typing some text, without punctuation, in the supplied input box. As the user types, the text is echoed in the field below along with a suggested word completion. At the bottom of the screen, the predicted next word in the phrase is shown.