Introduction

The goal of this project is to show that I am comfortable working with the data and that I am on track to create my prediction algorithm.

The motivation for this project is to:

- Demonstrate that I have downloaded the data and successfully loaded it.
- Create a basic report of summary statistics about the data sets.
- Report any interesting findings I have made so far.
- Get feedback on my plans for creating a prediction algorithm and Shiny app.

First, we do the basic initialization:

### Setting the working directory
setwd("~/_R_Projects/Capstone")

### Load libraries
library(stringi)
library(scales)
library(ggplot2)
library(tm)
library(RWeka)

Once the n-gram matrices are ready, I will start working on the prediction algorithm. So far, I am considering Bayesian networks. I wanted to have the n-grams ready for this report, but R keeps crashing while I build the matrices, even though I am using only a 10% sample of the data.
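
To make the sampling and n-gram counting idea concrete, here is a minimal sketch in base R. It is not the tm/RWeka pipeline loaded above: it swaps in a plain word-splitting counter that returns a frequency table instead of a full term-document matrix, which may be easier on memory. The toy corpus and the helper names `sample_lines` and `count_ngrams` are assumptions made for illustration only.

```r
# Hypothetical toy corpus standing in for the real data files.
lines <- c("the quick brown fox",
           "the quick red fox",
           "a lazy dog",
           "the lazy dog sleeps")

set.seed(42)  # make the random sample reproducible

# Keep a random fraction of the lines (at least one line).
sample_lines <- function(x, fraction) {
  n <- max(1, floor(length(x) * fraction))
  x[sample.int(length(x), n)]
}

# Count word n-grams line by line and return a frequency table,
# sorted from most to least frequent.
count_ngrams <- function(x, n = 2) {
  grams <- unlist(lapply(x, function(line) {
    # Lowercase, then split on anything that is not a letter or apostrophe.
    words <- strsplit(tolower(line), "[^a-z']+")[[1]]
    words <- words[nzchar(words)]
    if (length(words) < n) return(character(0))
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  sort(table(grams), decreasing = TRUE)
}

# On the real data this would be something like
# count_ngrams(sample_lines(lines, 0.1)); the toy corpus is counted in full.
bigrams <- count_ngrams(lines, n = 2)
print(head(bigrams, 3))
```

Because lines are processed one at a time, the same function could be applied to the sample in smaller chunks and the resulting tables merged, if a single pass still uses too much memory.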