Capstone Project Presentation

Jordi Overeem
30 July 2018

Objective

The Coursera Data Science Specialization Capstone Project sponsored by SwiftKey has two main objectives:

1) create an application that predicts the next word in a phrase or sentence. This helps end users of any keyboard (PCs, mobile devices, …) type faster by anticipating the next word.

2) create this slide deck, which:

  • includes a description of the algorithm used to make the prediction;
  • describes the app, contains instructions, and describes how it functions.

The following slides explain the application in these sections:

  • Application Overview
  • Technical details
  • Possible extensions and improvements
  • Conclusion

Application Overview

The “Predict Next Word - App” (https://testomgevingvanjordi.shinyapps.io/capstone_project_joov_001/) is simple and easy to use! Just start typing a word, sentence or text and the app predicts which word should follow. Imagine being offered several words, or even whole sentences, to choose from: you could click your lines together!

Application screenshot

Technical details

The application uses text documents collected from blogs, news articles, and Twitter as its source. From these, n-grams are created and stored so the app can look them up quickly. If this lookup yields no prediction, two further lookup methods are currently implemented as fallbacks. See the diagram below:

Technical details diagram
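
To make the lookup idea concrete, here is a minimal Python sketch. It is not the app's actual code, and the two fallback methods used by the app are not spelled out on this slide. The sketch assumes a common approach: try the longest stored n-gram first, back off to shorter ones, and finally fall back to the most frequent word. The names build_ngram_table and predict_next_word and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def build_ngram_table(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        next_word = tokens[i + n - 1]
        table[context][next_word] += 1
    return table


def predict_next_word(text, tables, fallback_word):
    """Try the longest n-gram first, then back off to shorter ones.

    Assumption: this back-off order stands in for the app's fallbacks,
    which the slides do not describe in detail.
    """
    words = text.lower().split()
    for n in sorted(tables, reverse=True):
        context_size = n - 1
        if len(words) < context_size:
            continue  # not enough typed words for this n-gram size
        context = tuple(words[-context_size:])
        candidates = tables[n].get(context)
        if candidates:
            return candidates.most_common(1)[0][0]
    # Last resort: no stored n-gram matched at all.
    return fallback_word


# Toy corpus standing in for the blog, news and Twitter texts.
tokens = "the cat sat on the mat and the cat ate the fish".split()

# Precompute the tables once so each lookup is a cheap dictionary access.
tables = {n: build_ngram_table(tokens, n) for n in (2, 3, 4)}
most_frequent_word = Counter(tokens).most_common(1)[0][0]

print(predict_next_word("sat on the", tables, most_frequent_word))  # mat
print(predict_next_word("I saw the", tables, most_frequent_word))   # cat
print(predict_next_word("zebra", tables, most_frequent_word))       # the
```

Building the tables ahead of time mirrors the idea of storing the n-grams for fast lookup: the expensive counting happens once, and each prediction is only a few dictionary reads.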

Possible extensions and improvements

This app covers only the very basics of natural language processing and predictive analytics. Possible extensions and improvements are:

  • show the top-n options from the prediction,
  • parallel data processing to improve speed and prediction quality,
  • include larger n-gram libraries, e.g. from https://www.ngrams.info/,
  • create larger n-grams, e.g. 5-grams and 6-grams,
  • use more data sources for more complete n-gram libraries,
  • create machine learning models so that new n-grams are stored and can be used for later predictions.
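
The first extension, showing the top-n options, could look roughly like the Python sketch below. It assumes the follower counts for the current context are already available as a Counter. The function name top_n_predictions and the counts are made up for illustration.

```python
from collections import Counter


def top_n_predictions(candidates, n=3):
    """Return the n most frequent followers with their relative frequency."""
    total = sum(candidates.values())
    if total == 0:
        return []  # nothing was observed after this context
    return [(word, count / total) for word, count in candidates.most_common(n)]


# Hypothetical counts of words seen after some context.
followers = Counter({"cat": 5, "mat": 3, "fish": 2})
top_two = top_n_predictions(followers, n=2)
print(top_two)  # [('cat', 0.5), ('mat', 0.3)]
```

Returning the relative frequency alongside each word would let the app show how confident each suggestion is.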

Conclusion

THANKS! I enjoyed this project a lot! The quizzes in particular challenged me to improve my app, and I learned a lot from that.

Resources for reproducibility: