27/05/2020

Introduction

The goal of this project was to create a product that highlights the prediction algorithm built for it and provides an interface that others can access.

  • The primary product is the Shiny app, which takes one or more words as input and returns a prediction for the next word.
  • The project was completed using R scripts, shinyapps.io, and an R presentation, along with over 60 hours of research and testing to find the best approach to complete the app.

Note: the data set was retrieved from: https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip

Three datasets:

  • en_US.blogs.txt
  • en_US.news.txt
  • en_US.twitter.txt

Developer

Retrieved the data and checked its size. Because the full data set was large, took a sample of it to build a new corpus. The corpus was then cleaned.
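The sampling and cleaning step can be sketched in base R as follows. This is only an illustration under stated assumptions: the helper names sample_lines and clean_text, the 5% sampling fraction, the seed, and the specific cleaning rules (lower-casing, dropping everything except letters, apostrophes, and spaces) are hypothetical and are not taken from the project's scripts.

```r
# Hypothetical helper: keep a random fraction of lines from a character vector.
# With the real data, `lines` would come from something like
#   readLines("en_US.blogs.txt", encoding = "UTF-8", skipNul = TRUE)
sample_lines <- function(lines, fraction = 0.05, seed = 1234) {
  set.seed(seed)                                  # make the sample reproducible
  n <- ceiling(length(lines) * fraction)          # number of lines to keep
  lines[sample.int(length(lines), n)]
}

# Hypothetical helper: basic text cleaning (assumed rules, not the author's).
clean_text <- function(x) {
  x <- tolower(x)                                 # normalise case
  x <- gsub("[^a-z' ]", " ", x)                   # replace digits, punctuation, symbols with a space
  x <- gsub("\\s+", " ", x)                       # collapse repeated whitespace
  trimws(x)                                       # drop leading/trailing spaces
}

raw <- c("Hello, World!  This is a TEST.",
         "R is fun: 100% of the time.",
         "Don't stop now")
cleaned <- clean_text(raw)
print(cleaned)
```

Sampling before cleaning keeps memory use down, which matters given the size of the three source files.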

Next, built the n-grams (unigrams, bigrams, trigrams, and quadgrams) so that predictions could be looked up from precomputed frequency tables rather than from the raw text.
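As a rough sketch of what building an n-gram frequency table involves, the base R function below counts n-grams of a given size over already cleaned lines. The function name count_ngrams and the toy corpus are hypothetical; the project may well have used a dedicated text-mining package instead.

```r
# Hypothetical helper: count n-grams of size n across cleaned text lines.
count_ngrams <- function(lines, n) {
  grams <- unlist(lapply(strsplit(lines, " ", fixed = TRUE), function(words) {
    words <- words[nzchar(words)]                 # ignore empty tokens
    if (length(words) < n) return(character(0))   # line too short for this n
    starts <- seq_len(length(words) - n + 1)
    vapply(starts,
           function(i) paste(words[i:(i + n - 1)], collapse = " "),
           character(1))
  }))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), freq = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- sort(table(grams), decreasing = TRUE) # most frequent first
  data.frame(ngram = names(counts), freq = as.integer(counts),
             stringsAsFactors = FALSE)
}

# Toy corpus, for illustration only.
corpus <- c("i love new york", "i love new ideas", "new york is big")
bigrams <- count_ngrams(corpus, 2)
trigrams <- count_ngrams(corpus, 3)
print(bigrams)
```

Storing each table sorted by frequency means the most likely continuation for a given prefix is found near the top of the table.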

Then developed the word prediction algorithms.
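The slides do not say which prediction algorithm was used. One common approach with n-gram tables is a simple back-off: try the longest n-gram that matches the end of the input, and fall back to shorter ones when nothing matches. The sketch below assumes that approach; the function predict_next and the toy frequency tables are hypothetical, and the final fallback to "the" mirrors the default value described in the note under Instructions.

```r
# Toy frequency tables keyed by n-gram, ordered from longest to shortest.
# The counts are made up for illustration, not taken from the real corpus.
ngram_tables <- list(
  c("thanks for the help" = 3L, "thanks for the support" = 5L),  # quadgrams
  c("for the first" = 4L, "for the win" = 2L),                   # trigrams
  c("of the" = 9L, "of course" = 7L)                             # bigrams
)

# Hypothetical back-off predictor (an assumed technique, not the author's code).
predict_next <- function(input, tables, default = "the") {
  words <- strsplit(trimws(tolower(input)), "\\s+")[[1]]
  words <- words[nzchar(words)]
  for (tbl in tables) {
    n <- lengths(strsplit(names(tbl)[1], " "))    # order of this table
    k <- n - 1                                    # prefix length needed
    if (length(words) < k) next                   # input too short, back off
    prefix <- paste(utils::tail(words, k), collapse = " ")
    hits <- tbl[startsWith(names(tbl), paste0(prefix, " "))]
    if (length(hits) > 0) {
      best <- names(hits)[which.max(hits)]        # most frequent match
      return(sub(".* ", "", best))                # keep only the last word
    }
  }
  default                                         # nothing matched at any level
}

print(predict_next("Thanks for the", ngram_tables))
print(predict_next("waiting for the", ngram_tables))
print(predict_next("zebra", ngram_tables))
```

Returning a default rather than an empty value keeps the interface predictable when the input contains words the sampled corpus never saw.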

The last step was to build the Shiny app and publish it on shinyapps.io.

Instructions

  • Step one, enter one or more words in the space provided to receive the next predicted word
  • Step two, click the “submit” button
  • Step three, the predicted word is presented to the right

Note: I researched online to determine the most commonly used English word and found it was “the”. I set that as the default value rather than returning an empty value. The app has limitations, due primarily to available memory.

Links