Final Project Data Science

Luis Felipe Albarracin
19-Aug-2020

Introduction

This presentation is part of the final assignment of the online course Data Science Capstone (https://www.coursera.org/learn/data-science-project).

The purpose of these slides is to present the key information about the final project.

As required by the assignment, the slides were generated with RStudio, specifically R Presentations (Rpres).

Description of the Application

The idea is to create an application that predicts the next word from an n-gram entered by the user (the input n-gram can be at most 6 words long).

For this, three datasets are used: bigram.RData, trigram.RData and quadgram.RData. These three datasets can be downloaded from http://rpubs.com/maximeverges/495853.
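The internal layout of the three .RData files is not described here, so the following is only a minimal sketch, assuming each n-gram table can be thought of as a mapping from the preceding words to counts of the word that follows them. The function name build_ngram_tables and the toy corpus are hypothetical and only illustrate the kind of data a bigram, trigram or quadgram table holds.

```python
from collections import Counter, defaultdict


def build_ngram_tables(tokens, max_n=4):
    """Hypothetical helper: count n-grams for n = 2..max_n.

    Each table maps a tuple of the n-1 preceding words to a Counter
    of the words observed right after that prefix.
    """
    tables = {n: defaultdict(Counter) for n in range(2, max_n + 1)}
    for n in range(2, max_n + 1):
        # Slide a window of length n over the token list.
        for i in range(len(tokens) - n + 1):
            prefix = tuple(tokens[i:i + n - 1])
            next_word = tokens[i + n - 1]
            tables[n][prefix][next_word] += 1
    return tables


# Toy corpus (illustrative only, not the course data).
corpus = "i want to eat i want to sleep i want to eat".split()
tables = build_ngram_tables(corpus)
print(tables[3][("want", "to")].most_common(1))  # [('eat', 2)]
```

With max_n=4 this yields the equivalent of a bigram, trigram and quadgram table; a larger max_n would produce the longer n-grams mentioned in the algorithm.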

The application is available at:

https://lfasanchez.shinyapps.io/Word_Predictor/

Algorithm

The model that predicts the next word/sentence is based on the following algorithm:

  • The compressed n-gram data are loaded.
  • The user inputs a sequence of words.
  • The length of the input n-gram is then compared against the available n-gram predictors (2-gram, 3-gram, ..., 6-gram).
  • If no match is found, a message tells the user that a prediction is not possible with the available samples.
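The steps above can be sketched as a simple backoff lookup. This is our reading of the algorithm, not the app's actual code: it assumes each table maps a tuple of preceding words to a Counter of next words, tries the longest usable prefix first, and falls back to shorter n-grams. The toy tables, predict_next_word and the NO_PREDICTION message are hypothetical.

```python
from collections import Counter

# Hypothetical toy tables standing in for bigram/trigram/quadgram.RData:
# each maps a tuple of preceding words to a Counter of observed next words.
NGRAM_TABLES = {
    2: {("to",): Counter({"eat": 2, "sleep": 1})},
    3: {("want", "to"): Counter({"eat": 2, "sleep": 1})},
    4: {("i", "want", "to"): Counter({"eat": 2, "sleep": 1})},
}

# Hypothetical message shown when no n-gram matches the input.
NO_PREDICTION = "Not enough samples to predict the next word for this input."


def predict_next_word(text, tables=NGRAM_TABLES):
    """Return the most frequent next word, backing off from long to short n-grams."""
    words = text.lower().split()
    max_n = max(tables)
    # An input of k words can use at most a (k+1)-gram table.
    for n in range(min(len(words) + 1, max_n), 1, -1):
        prefix = tuple(words[-(n - 1):])
        candidates = tables[n].get(prefix)
        if candidates:
            return candidates.most_common(1)[0][0]
    # Nothing matched at any n-gram length.
    return NO_PREDICTION


print(predict_next_word("I want to"))    # eat (quadgram match)
print(predict_next_word("you want to"))  # eat (backs off to the trigram)
print(predict_next_word("hello world"))  # falls through to NO_PREDICTION
```

Because the maximum n is taken from the tables themselves, adding 5-gram and 6-gram tables would extend the lookup without changing the function.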

User Interface

The following is the user interface of the app:

Figure: User Interface