chintan
31 May 2020
The goal of this project was to create a product that showcases the word prediction algorithm I built and to provide an interface that others can access.
Note: the dataset was retrieved from https://d396qusza40orc.cloudfront.net/dsscapstone/dataset/Coursera-SwiftKey.zip
Three datasets: en_US.blogs.txt, en_US.news.txt, en_US.twitter.txt
First, I retrieved the data and checked its size. Because the files were large, I took a sample of the data to build a new corpus. The corpus was then cleansed.
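The sampling and cleansing step can be sketched roughly as follows. This is a minimal illustration in Python, not the code behind the app: the function names `sample_lines` and `clean_line`, the seed, and the specific cleaning rules (lowercasing, dropping URLs, digits, and punctuation) are all assumptions, since the exact rules used are not listed here.

```python
import random
import re


def sample_lines(lines, fraction, seed=42):
    """Keep each line with probability `fraction` (hypothetical helper)."""
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    return [line for line in lines if rng.random() < fraction]


def clean_line(line):
    """Apply assumed cleaning rules: lowercase, strip URLs, keep letters only."""
    line = line.lower()
    line = re.sub(r"https?://\S+", " ", line)   # drop URLs
    line = re.sub(r"[^a-z' ]+", " ", line)      # drop digits and punctuation
    return re.sub(r"\s+", " ", line).strip()    # collapse repeated whitespace


if __name__ == "__main__":
    raw = ["Hello, World! Visit http://x.com NOW 123", "Second line."]
    corpus = [clean_line(line) for line in sample_lines(raw, fraction=1.0)]
    print(corpus)
```

Sampling line by line keeps memory use low, because the full files never need to be held as a single cleaned corpus.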
Next, I built the n-grams (unigrams, bigrams, trigrams, and quadgrams) so that the data could be processed more efficiently when predicting the next word.
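As an illustration of what building n-grams involves, the sketch below counts word sequences of length 1 to 4 in a few cleaned sentences. It is written in Python for illustration only; the names `ngrams` and `count_ngrams` and the whitespace tokenization are assumptions, not the app's actual implementation.

```python
from collections import Counter


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def count_ngrams(sentences, n):
    """Count n-grams across sentences, never crossing a sentence boundary."""
    counts = Counter()
    for sentence in sentences:
        counts.update(ngrams(sentence.split(), n))
    return counts


if __name__ == "__main__":
    corpus = ["the cat sat on the mat", "the cat ran"]
    for n in range(1, 5):  # unigrams through quadgrams
        print(n, count_ngrams(corpus, n).most_common(2))
```

Precomputing these frequency tables once means the app only has to do lookups at prediction time.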
I then developed the word prediction algorithms.
The last step was to build the Shiny app hosted on shinyapps.io.
Step one: enter one or more words in the space provided to receive the next predicted word.
Step two: click the “submit” button.
Step three: the predicted word is presented to the right.
Note: I researched online to determine the most commonly used English word and found that it is “the”. I set that as the default value rather than returning an empty value. The app has limitations, primarily due to available memory.
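One way a prediction with a default of “the” could work is a simple backoff: try the longest available context first, fall back to shorter ones, and return the default when nothing matches. The Python sketch below is an assumption about the general approach, not the app's actual algorithm; `build_model`, `predict_next`, and `DEFAULT_WORD` are hypothetical names.

```python
from collections import Counter, defaultdict

DEFAULT_WORD = "the"  # fallback when no n-gram matches the input


def build_model(sentences, max_n=4):
    """Map each context of 1..max_n-1 words to counts of the word that follows."""
    model = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.split()
        for n in range(2, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                model[context][tokens[i + n - 1]] += 1
    return model


def predict_next(model, text, max_n=4):
    """Back off from the longest context to the shortest, then to the default."""
    tokens = text.lower().split()
    for k in range(max_n - 1, 0, -1):
        if len(tokens) < k:
            continue  # not enough words typed for a context this long
        context = tuple(tokens[-k:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return DEFAULT_WORD


if __name__ == "__main__":
    corpus = ["thanks for the help", "thanks for the follow", "thanks for the help"]
    model = build_model(corpus)
    print(predict_next(model, "thanks for the"))
    print(predict_next(model, "zzz"))
```

Keeping only the most frequent continuation per context, instead of full count tables, would be one way to reduce the memory footprint mentioned above.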