Jasmin Pielorz
August 23rd, 2015
Background: This capstone project is part of the Johns Hopkins Data Science Specialization offered by Coursera.
Aim: Create a Shiny app that takes a phrase typed into a text box as input and predicts the next word.
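The prediction step can be sketched as follows. This is a minimal illustration in Python, not the app's own code (Shiny apps are written in R). It assumes a plain longest-context backoff over word n-gram counts: try the last two words of the phrase, then the last word, then fall back to the most frequent single word. This is not Katz backoff or any smoothed model, and the names `tokenize`, `build_model` and `predict_next` are hypothetical.

```python
import re
from collections import Counter, defaultdict
from typing import Optional

def tokenize(text: str) -> list:
    # Assumed preprocessing: lowercase and keep runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def build_model(corpus, max_n: int = 3) -> dict:
    # model[n][context] counts the words that follow an (n-1)-word context;
    # n = 1 uses the empty context, i.e. plain unigram counts.
    model = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for line in corpus:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[n][context][tokens[i + n - 1]] += 1
    return model

def predict_next(model: dict, phrase: str) -> Optional[str]:
    tokens = tokenize(phrase)
    # Back off from the longest context to shorter ones and return the
    # most frequent follower of the first context seen during training.
    for n in sorted(model, reverse=True):
        if n - 1 > len(tokens):
            continue  # phrase too short for this n-gram order
        context = tuple(tokens[len(tokens) - (n - 1):])
        followers = model[n].get(context)
        if followers:
            return followers.most_common(1)[0][0]
    return None  # nothing was counted, e.g. an empty training corpus

model = build_model(["I want to go home", "I want to eat", "I want to go out"])
print(predict_next(model, "I want"))    # prints "to" (trigram context)
print(predict_next(model, "never to"))  # prints "go" (backs off to "to")
```

Storing counts per context keeps each lookup a single dictionary access. A real corpus of news, blogs and tweets would likely need pruning of rare n-grams to fit in a Shiny app's memory.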
The presentation gives an overview of:
The training data come from English news articles, blogs and Twitter messages, which together form the text corpus.
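Here is a minimal Python sketch of how n-gram frequencies can be extracted from such a corpus. The cleaning rules (lowercasing and removing URLs, digits and punctuation) are assumptions for illustration, not the preprocessing documented for this project. The function names `clean` and `ngram_frequencies` are hypothetical.

```python
import re
from collections import Counter

def clean(line: str) -> list:
    # Illustrative cleaning steps (assumed, not the project's documented ones):
    line = line.lower()                                 # 1. lowercase
    line = re.sub(r"https?://\S+|www\.\S+", " ", line)  # 2. drop URLs
    line = re.sub(r"[^a-z' ]+", " ", line)              # 3. drop digits/punctuation
    return line.split()                                 # 4. split on whitespace

def ngram_frequencies(lines, n: int) -> Counter:
    counts = Counter()
    for line in lines:
        tokens = clean(line)
        # zip over n shifted copies of the token list yields consecutive
        # n-grams; n-grams never span two lines.
        counts.update(" ".join(gram) for gram in zip(*(tokens[i:] for i in range(n))))
    return counts

sample = ["The cat sat on the mat.", "The cat ran! See http://example.com", "the CAT sat 123"]
print(ngram_frequencies(sample, 2).most_common(2))  # [('the cat', 3), ('cat sat', 2)]
```

Calling `most_common(k)` for n = 1, 2 and 3 gives the top unigrams, bigrams and trigrams that a frequency analysis of the corpus would typically inspect.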
To analyze n-gram frequencies, the following preprocessing steps were performed:
Steps for building a prediction model:
I would like to thank the entire team at Johns Hopkins University and Coursera for offering a very interesting and inspiring specialization in Data Science. Special thanks go to C.H. Lampert for introducing me to the Natural Language Toolkit (NLTK) for Python.
Useful References for the Capstone: