Capstone Project - Data Science Specialization

Klever Mera

This App was created as a Capstone Project of the Data Science Specialization by Johns Hopkins University on Coursera.

This project builds a model that predicts the next word a user will type and presents it in a Shiny web application.

There were 8 tasks defined:

Understanding the problem, Data acquisition and cleaning, Exploratory analysis, Statistical modeling, Predictive modeling, Creative exploration, Creating a data product, and Creating a short slide deck pitching your product.

This slide deck presents the model and the instructions for using the app.

This part also covers some initial tasks, which include:

Tokenization: segment the text into individual words; the quanteda library was used for this. Cleaning tasks were also applied: removing numbers, punctuation, symbols and hyphens, lemmatization, stemming, and making sure the language is English.
Word frequencies: three n-gram models were built (1-gram, 2-gram and 3-gram).
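
The tokenization and frequency steps above can be sketched as follows. This is a minimal illustration in Python, not the project's quanteda code: the function names tokenize and ngram_counts are hypothetical, the cleaning is reduced to a single regular expression that keeps only runs of ASCII letters, and lemmatization, stemming and language detection are left out.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and keep only runs of ASCII letters.

    This drops numbers, punctuation and symbols, and splits hyphenated
    words into their parts. No stemming or lemmatization is done here.
    """
    return re.findall(r"[a-z]+", text.lower())

def ngram_counts(tokens, n):
    """Count every run of n consecutive tokens, keyed by a tuple of words."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

# Made-up example sentence, not taken from the course data.
tokens = tokenize("The cat sat on the mat. The cat ran off in 2 well-known ways!")

unigrams = ngram_counts(tokens, 1)
bigrams = ngram_counts(tokens, 2)
trigrams = ngram_counts(tokens, 3)

print(unigrams[("the",)])       # 3
print(bigrams[("the", "cat")])  # 2
```

Keying each count by a tuple of words makes it easy to split an n-gram into its context (all words but the last) and the word it predicts (the last one).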

For Modeling:

Markov assumption: the next word depends only on the last one or two words, not on the whole preceding text.
Back-off model: start with the 3-gram; if there isn't enough evidence, back off to the 2-gram and then to the 1-gram. Only the blog file was used, and it was sampled using a binomial distribution. Words with a frequency of 1 were removed.
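
The modeling steps above can be sketched as follows. This is a simplified Python illustration, not the app's actual code, and it rests on several assumptions: the names sample_lines, build_counts and predict_next are hypothetical, sampling is done with one Bernoulli draw per line (a binomial with size 1), the frequency cut-off is applied to every n-gram order rather than to single words only, and "enough evidence" is read as "at least one surviving n-gram matches the context".

```python
import random
from collections import Counter

def sample_lines(lines, p, seed=0):
    """Keep each line with probability p (one Bernoulli draw per line)."""
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < p]

def build_counts(lines, max_n=3, min_count=2):
    """Count 1- to max_n-grams, then drop those seen fewer than min_count times."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = line.lower().split()
        for n in counts:
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return {n: Counter({g: c for g, c in ctr.items() if c >= min_count})
            for n, ctr in counts.items()}

def predict_next(counts, text):
    """Try the longest context first and back off to shorter ones."""
    tokens = text.lower().split()
    for n in sorted(counts, reverse=True):
        if len(tokens) < n - 1:
            continue  # not enough typed words for this order
        context = tuple(tokens[len(tokens) - (n - 1):])
        # Words that followed this context, with how often they did.
        candidates = {g[-1]: c for g, c in counts[n].items() if g[:-1] == context}
        if candidates:
            return max(candidates, key=candidates.get)
    return None  # empty model

# Made-up corpus; p=1.0 keeps every line so the tiny demo is deterministic.
corpus = [
    "i love new york",
    "i love new york",
    "i love new shoes",
    "new york is big",
    "new york is big",
]
counts = build_counts(sample_lines(corpus, p=1.0))

print(predict_next(counts, "i love"))    # new  (3-gram match)
print(predict_next(counts, "big york"))  # is   (no 3-gram, backs off to 2-gram)
print(predict_next(counts, "hello"))     # new  (backs off to the most frequent word)
```

In this corpus "shoes" occurs once, so it is removed and can never be suggested, which is the effect of eliminating words with a frequency of 1.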

The app starts predicting as soon as the first word is entered.

You can keep writing as long as you want.

The app shows messages to tell you when it is ready and to warn you to enter only English words.