Peer-graded Assignment: Final Project Submission

Ausrine
7 September 2018

Intro

The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:

A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.
A slide deck consisting of no more than 5 slides created with RStudio Presenter (https://support.rstudio.com/hc/en-us/articles/200486468-Authoring-R-Presentations) pitching your algorithm and app as if you were presenting to your boss or an investor.

The previous stage of the project can be found here: http://rpubs.com/ninja555/milestone

To see the Shiny app, go here:

Summary of Project Steps

Loading Libraries - The first step is to load all the libraries needed to complete the tasks outlined in the introduction.
Loading Data - The data used in this project can be downloaded from https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
Summarizing Data Files - Create a basic overview of the data files: file names, file sizes, number of lines in each file and word counts.
Creating a Data Sample - Because the data files are very large, we sample 1000 lines from each of the three files, for a total sample of 3000 lines.
Cleaning Data - Convert all text to lowercase, then remove punctuation, numbers, extra whitespace and English stop words.
Creating N-gram Frequencies - Build frequency tables for the 1- to 4-grams in the cleaned sample.
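The sampling, cleaning and n-gram counting steps above can be sketched as follows. This is a minimal Python illustration, not the project's R code. The seed, the tiny STOP_WORDS set and the helper names sample_lines, clean and ngram_counts are hypothetical choices made only for this example.

```python
import random
import re
from collections import Counter

# Hypothetical, tiny stop-word set; the project used a full English list.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def sample_lines(lines, k, seed=2018):
    """Draw up to k lines at random, without replacement (seed is arbitrary)."""
    rng = random.Random(seed)
    return rng.sample(lines, min(k, len(lines)))

def clean(line):
    """Lowercase, replace punctuation and digits with spaces, split, drop stop words."""
    line = re.sub(r"[^a-z\s]", " ", line.lower())
    return [w for w in line.split() if w not in STOP_WORDS]

def ngram_counts(token_lists, n):
    """Count n-grams (joined with spaces) without crossing line boundaries."""
    counts = Counter()
    for tokens in token_lists:
        for i in range(len(tokens) - n + 1):
            counts[" ".join(tokens[i:i + n])] += 1
    return counts

lines = ["The cat sat on the mat.", "The cat sat down, 2 times!"]
tokens = [clean(line) for line in sample_lines(lines, 1000)]
tables = {n: ngram_counts(tokens, n) for n in range(1, 5)}
print(tables[2].most_common(1))  # [('cat sat', 2)]
```

Counting within each line keeps n-grams from spanning two unrelated sentences. Note that splitting on every non-letter also breaks contractions such as "don't" into two tokens.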

Algorithm

An N-gram model is used, with N ranging from 1 to 4.
A Stupid Backoff strategy is implemented.
IMPORTANT: Because building the n-grams took a long time, the sample size was reduced from 3000 to 1000 lines.
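Stupid Backoff (Brants et al., 2007) scores a candidate word by the relative frequency of the longest matching n-gram. If that n-gram was never seen, it drops the oldest context word and multiplies the score by a fixed factor. The result is a ranking score, not a probability. The sketch below is a Python illustration under stated assumptions, not the project's R implementation. The factor 0.4 is the value suggested in the original paper and is not tuned here, and the helpers build_tables, stupid_backoff and top_k are hypothetical names.

```python
from collections import Counter

ALPHA = 0.4  # back-off factor suggested by Brants et al. (2007); assumed, not tuned

def build_tables(tokens, max_n=4):
    """Map n -> Counter of n-gram tuples, for n = 1..max_n."""
    return {
        n: Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
        for n in range(1, max_n + 1)
    }

def stupid_backoff(word, context, tables):
    """Stupid Backoff score of `word` following `context` (a sequence of words)."""
    max_n = max(tables)
    context = tuple(context)[-(max_n - 1):] if max_n > 1 else ()
    factor = 1.0
    while context:
        n = len(context) + 1
        hits = tables[n][context + (word,)]
        if hits:
            # Relative frequency of the full n-gram given its (n-1)-word prefix.
            return factor * hits / tables[n - 1][context]
        factor *= ALPHA          # penalize each step down to a shorter context
        context = context[1:]    # drop the oldest context word
    total = sum(tables[1].values())
    return factor * tables[1][(word,)] / total if total else 0.0

def top_k(context, tables, k=3):
    """Rank every vocabulary word by its Stupid Backoff score."""
    vocab = [w for (w,) in tables[1]]
    return sorted(vocab, key=lambda w: stupid_backoff(w, context, tables), reverse=True)[:k]

tokens = "i love new york i love new jersey i love pizza".split()
tables = build_tables(tokens)
print(top_k(["i", "love"], tables))
```

Scoring the whole vocabulary is fine for a toy corpus. A real app would usually restrict candidates to words that actually follow the context in the tables.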

Shiny App Interface

The user types a word or phrase into the app's text box.
The app passes the input to the prediction algorithm.
The algorithm predicts the next word.
The prediction comes from the N-gram frequency tables, searched from the longest to the shortest N-gram.
The predicted word is displayed.
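The interface flow above amounts to a longest-to-shortest lookup over the N-gram tables. Here is a minimal Python sketch, not the app's actual code. It assumes that the user's input is cleaned the same way as the training text (stop-word removal is omitted here for brevity) and that the app returns the single most frequent continuation. The helper names build_next_word_tables and predict_next are hypothetical.

```python
import re
from collections import Counter, defaultdict

def build_next_word_tables(tokens, max_n=4):
    """Map each context (tuple of 1..max_n-1 words) to a Counter of next words."""
    nxt = defaultdict(Counter)
    for n in range(2, max_n + 1):
        for i in range(len(tokens) - n + 1):
            nxt[tuple(tokens[i:i + n - 1])][tokens[i + n - 1]] += 1
    return nxt, Counter(tokens)

def predict_next(phrase, nxt, unigrams, max_n=4):
    """Try the last max_n-1 words first, then back off to shorter contexts."""
    words = re.sub(r"[^a-z\s]", " ", phrase.lower()).split()
    for k in range(min(len(words), max_n - 1), 0, -1):
        candidates = nxt.get(tuple(words[-k:]))  # .get avoids creating empty entries
        if candidates:
            return candidates.most_common(1)[0][0]
    # No context matched: fall back to the most frequent single word.
    return unigrams.most_common(1)[0][0] if unigrams else None

tokens = "i love new york i love new jersey i love pizza".split()
nxt, unigrams = build_next_word_tables(tokens)
print(predict_next("I love", nxt, unigrams))  # new
```

Returning the first match found at the longest context is a simpler rule than comparing Stupid Backoff scores across all orders. The two usually agree, but not always.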