Coursera Data Science Capstone - Final Project

Maxime Verges

14th of May, 2019

Introduction

Application

  • The main challenge is to create an application that predicts the next word from a sequence of words entered by the user

  • The application needs three datasets, bigram.RData, trigram.RData and quadgram.RData, which can be produced with the milestone report (http://rpubs.com/maximeverges/495853). They contain, respectively, sequences of 2, 3 and 4 words.

  • The source code of the application and of this presentation can be found at https://github.com/maximeverges/Data-Science-Capstone. Five files are included: ui.R, server.R, bigram.RData, trigram.RData and quadgram.RData

Prediction Model

The prediction model for the next word is based on a back-off method, detailed below:

  • Compressed data are loaded
  • The user enters a sequence of words
  • First, quadgram.RData is used if the last 3 words of the input are in the dataset
  • else, trigram.RData is used if the last 2 words of the input are in the dataset
  • else, bigram.RData is used if the last word of the input is in the dataset
  • else, the word with the highest frequency is returned
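
The back-off steps above can be sketched in R as follows. This is only an illustration under stated assumptions, not the code in server.R: the small data frames stand in for the three .RData files, and the column names (prefix, word, freq), the helpers lookup and predict_next, and the default word "the" are all hypothetical.

```r
# Toy n-gram tables standing in for quadgram.RData, trigram.RData and
# bigram.RData. The layout (prefix, word, freq) is an assumption.
quadgram <- data.frame(prefix = c("thanks for the", "one of the"),
                       word   = c("follow", "best"),
                       freq   = c(12, 9),
                       stringsAsFactors = FALSE)
trigram <- data.frame(prefix = c("for the", "of the", "for the"),
                      word   = c("first", "year", "follow"),
                      freq   = c(30, 22, 15),
                      stringsAsFactors = FALSE)
bigram <- data.frame(prefix = c("the", "the", "of"),
                     word   = c("same", "first", "the"),
                     freq   = c(80, 75, 200),
                     stringsAsFactors = FALSE)

# Hypothetical helper: most frequent word following `prefix`, or NA if
# the prefix is not in the table.
lookup <- function(table, prefix) {
  hits <- table[table$prefix == prefix, ]
  if (nrow(hits) == 0) return(NA_character_)
  hits$word[which.max(hits$freq)]
}

# Back-off: try the last 3 words, then the last 2, then the last 1,
# and finally fall back to a default word (assumed here to be "the").
predict_next <- function(input, default = "the") {
  words <- strsplit(tolower(trimws(input)), "\\s+")[[1]]
  words <- words[words != ""]
  tables <- list(quadgram, trigram, bigram)  # context sizes 3, 2, 1
  for (i in seq_along(tables)) {
    n <- 4 - i                               # number of context words
    if (length(words) >= n) {
      prefix <- paste(utils::tail(words, n), collapse = " ")
      guess <- lookup(tables[[i]], prefix)
      if (!is.na(guess)) return(guess)
    }
  }
  default
}

predict_next("thanks for the")   # found in the quadgram table
predict_next("waiting for the")  # backs off to the trigram table
predict_next("hello world")      # no match: default word
```

Checking the longest context first means the most specific evidence wins, and shorter contexts are consulted only when the longer one has never been observed.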

Shiny Application Tutorial

A basic tutorial is shown below:

Tutorial

In addition, the "help" tab explains how the application works, while the "about" tab provides useful information such as links.