Data Science Capstone Project

Web Application

We have created a web application that predicts the word a person is going to write next. It works very simply:

Data

The training data contains about 550 MB of text from tweets, news, and blogs. It consists of 4,000,000 lines, which we process by:

Predictive Algorithm: Simple model

Since the free Shiny server restricts file sizes, we keep only the n-grams that appear at least twice in the training data.
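
The pruning step can be sketched as follows. This is a minimal illustration in Python, not the application's actual code: the function names `count_ngrams` and `prune`, the whitespace tokenization, and the toy corpus are all assumptions made for the example.

```python
from collections import Counter


def count_ngrams(lines, n):
    """Count every n-gram (tuple of n consecutive tokens) in the lines."""
    counts = Counter()
    for line in lines:
        # Assumed tokenization: lowercase, then split on whitespace.
        tokens = line.lower().split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def prune(counts, min_count=2):
    """Keep only the n-grams seen at least min_count times."""
    return {gram: c for gram, c in counts.items() if c >= min_count}


# Toy corpus standing in for the training data (hypothetical).
corpus = [
    "thanks for the follow",
    "thanks for the help",
    "see you at the game",
]
bigrams = count_ngrams(corpus, 2)
kept = prune(bigrams)
```

In this toy corpus only the bigrams `thanks for` and `for the` survive; the six bigrams seen once are dropped, which is what keeps the stored tables small.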

Accuracy

The accuracy of the model is 20%. We did not remove stop words, since keeping them leads to better accuracy.
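
One common way to measure such a figure is top-1 next-word accuracy: the fraction of held-out positions where the predicted word equals the actual next word. The sketch below assumes this definition, which the text does not state. The names `top1_accuracy` and `predict`, the lookup table, and the held-out lines are hypothetical.

```python
def top1_accuracy(predict, lines):
    """Fraction of positions where predict(previous_words) equals the actual next word."""
    correct = 0
    total = 0
    for line in lines:
        tokens = line.lower().split()
        for i in range(1, len(tokens)):
            total += 1
            if predict(tokens[:i]) == tokens[i]:
                correct += 1
    return correct / total if total else 0.0


# Hypothetical predictor: look up the last typed word in a tiny table.
# The table deliberately contains stop words such as "for" and "the".
table = {"thanks": "for", "for": "the", "the": "game"}


def predict(context):
    """Return the table's guess for the next word, or None if the word is unknown."""
    return table.get(context[-1])


held_out = ["thanks for the help", "see you at the game"]
accuracy = top1_accuracy(predict, held_out)
```

Under this definition, stop words count as prediction targets like any other word, so a model that never proposes them cannot score on those positions.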