Jose Bergiste
January, 2016
The goal of the Coursera Data Science Specialization project (sponsored by SwiftKey) is to create an application that predicts the next word in a phrase or sentence. Such an application is useful for a mobile keyboard: it helps a user type faster by anticipating the next word.
On the following slides, I will explain the details of the application created to fulfill the objective. We will go over:
The Word Predict App (https://bergiste.shinyapps.io/word-predict-app/) is simple and easy to use, yet powerful!
Simply start typing in the text field, and up to 4 possible next words will automatically appear below it. Each predicted word is clickable: clicking a word adds it to your phrase and triggers a prediction for the next word.
The application uses text documents collected from blogs, news articles, and Twitter to statistically model language patterns. N-grams, a Markov model, and Katz's back-off model were used to predict the next word. Because the modeling process used a large data set, it was run as a batch process. The batch process produced a much smaller data set, which the application uses for fast real-time performance. The diagram below shows the architecture of the application:
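To make the batch-then-lookup idea concrete, here is a minimal sketch in Python. It is not the app's actual code (the app is hosted on shinyapps.io), and it uses a simplified back-off without the probability discounting that full Katz back-off applies. The function names `build_ngram_table` and `predict_next`, the toy corpus, and the trigram limit are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def build_ngram_table(sentences, max_n=3, top_k=4):
    """Batch step (illustrative): count n-grams, then keep only the
    top_k most frequent next words for each context."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])  # () for unigrams
                next_word = tokens[i + n - 1]
                counts[context][next_word] += 1
    # The pruned table is far smaller than the raw counts.
    return {
        context: [word for word, _ in counter.most_common(top_k)]
        for context, counter in counts.items()
    }


def predict_next(table, phrase, max_n=3, top_k=4):
    """Real-time step (illustrative): try the longest context first,
    then back off to shorter contexts until top_k words are found."""
    tokens = phrase.lower().split()
    predictions = []
    for n in range(max_n, 0, -1):
        if len(tokens) < n - 1:
            continue  # phrase is too short for this context length
        context = tuple(tokens[len(tokens) - (n - 1):])
        for word in table.get(context, []):
            if word not in predictions:
                predictions.append(word)
            if len(predictions) == top_k:
                return predictions
    return predictions


# Hypothetical toy corpus standing in for the blog, news, and Twitter text.
corpus = [
    "i love new york",
    "i love new york city",
    "i love new shoes",
]
table = build_ngram_table(corpus)
print(predict_next(table, "I love new"))
```

Words seen after the longest matching context come first, and shorter contexts only fill the remaining slots. Keeping just a few candidates per context is one plausible way to get a table small enough for real-time lookups.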
This project only scratches the surface of natural language processing and predictive analytics. There are many possibilities for improvement, including:
I found this project and the Data Science Program to be very enjoyable! I learned quite a bit and had fun doing it.
Here is a list of resources for this project: