Capstone_Project

LyPu
6/6/2020

Requirements & Summary

  • The goal of this exercise is to create a product that highlights the prediction algorithm you have built and to provide an interface that others can access.

  • This presentation briefly covers the following:

    1. An introduction to the prediction algorithm

    2. A description of the Shiny app and instructions for using it

Prediction Model

  • Dataset:

The training data comes from Coursera-SwiftKey.zip. It combines a 1% random sample of the English-language news, blogs, and Twitter datasets.
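As an illustration of the sampling step, here is a minimal sketch in Python (not the app's own code). The function name `sample_lines`, the seed, and the in-memory list input are assumptions made for this example; it keeps each line independently with probability equal to the requested fraction, so the resulting sample is approximately, not exactly, 1% of the input.

```python
import random


def sample_lines(lines, fraction=0.01, seed=42):
    """Keep each line independently with probability `fraction`.

    `seed` makes the sample reproducible. The result is an approximate
    sample: its size varies around fraction * len(lines).
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]


def build_training_sample(sources, fraction=0.01, seed=42):
    """Sample each source (e.g. news, blogs, Twitter) and combine the results.

    `sources` is a hypothetical dict mapping a source name to its list of lines.
    """
    combined = []
    for name in sorted(sources):
        combined.extend(sample_lines(sources[name], fraction, seed))
    return combined
```

Fixing the seed means the same lines are selected on every run, which keeps the downstream n-gram tables reproducible.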

  • Language Model:

The text is modeled with quad-grams, tri-grams, bi-grams, and uni-grams, and Kneser-Ney smoothing is used to calculate the probability of each candidate next word. Following a backoff model, the few words with the highest probabilities are recommended.
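The backoff idea can be sketched as follows. This is a simplified Python illustration, not the app's implementation: it ranks candidates by raw n-gram counts instead of Kneser-Ney probabilities, and the names `build_ngram_tables` and `predict_next` are hypothetical. The lookup starts with the longest available context (three preceding words, matching a quad-gram) and backs off to shorter contexts until enough suggestions are found.

```python
from collections import Counter, defaultdict


def build_ngram_tables(sentences, max_order=4):
    """Map each context tuple (0 to max_order - 1 words) to a Counter of next words."""
    tables = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for k in range(max_order):
                if i - k < 0:
                    break
                context = tuple(tokens[i - k:i])  # the k words before `word`
                tables[context][word] += 1
    return tables


def predict_next(tables, text, top_n=3, max_order=4):
    """Return up to `top_n` next-word suggestions, backing off to shorter contexts.

    Candidates are ranked by raw count within each context (an assumption of
    this sketch; the presentation describes Kneser-Ney smoothing instead).
    """
    tokens = text.lower().split()
    suggestions = []
    longest = min(max_order - 1, len(tokens))
    for k in range(longest, -1, -1):
        context = tuple(tokens[len(tokens) - k:]) if k else ()
        for word, _count in tables.get(context, Counter()).most_common():
            if word not in suggestions:
                suggestions.append(word)
            if len(suggestions) == top_n:
                return suggestions
    return suggestions
```

For example, with tables built from the four sentences "i love new york", "i love new york", "i love new shoes", and "new york city", calling `predict_next(tables, "i love new", top_n=2)` returns `["york", "shoes"]`. An unseen context such as "zzz new" falls back to the bi-gram table for "new".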

Shiny App - Description & Instruction

The Shiny app has two tabs: App and Exploratory.

On the App tab, there is a text input box and a word cloud.

  • Type some words into the text box, and a few predicted next words, based on the training dataset, will appear below it.

  • Below the text box, a word cloud shows the predicted next words.

Shiny App - Description & Instruction

The Exploratory tab shows an exploratory report containing descriptive analysis of the original dataset.

  • This report comes from the milestone project, which can be found here.

The app has been deployed to the ShinyApps.io server.