Step 1: Read the Twitter file from the Coursera-SwiftKey dataset, which is the largest of the provided corpora.
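A minimal sketch of this step, assuming the standard Coursera-SwiftKey folder layout (the path below is an assumption; adjust it to your local setup):

```r
# Read the Twitter corpus; a binary connection with skipNul = TRUE
# avoids warnings from embedded nul characters in the raw file.
con <- file("final/en_US/en_US.twitter.txt", open = "rb")
twitter_lines <- readLines(con, encoding = "UTF-8", skipNul = TRUE)
close(con)
length(twitter_lines)  # total number of tweets read
```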
Step 2: Initialize the object with ModelGenerator$new, which, for performance reasons, samples 0.1% of the Twitter file and cleans the data, generating sample-clean.txt with 2,361 lines. The algorithm does not stem the words; it keeps the original words, with a minimum frequency of one.
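A sketch of this initialization; the constructor arguments shown (input_file, sample_rate, output_file) are hypothetical names for illustration, not the documented ModelGenerator API:

```r
# Hypothetical argument names: the actual ModelGenerator$new signature
# may differ. Internally, the step samples 0.1% of the tweets, cleans
# the text, and keeps original (unstemmed) words with frequency >= 1.
set.seed(1234)  # make the 0.1% sample reproducible
generator <- ModelGenerator$new(
  input_file  = "final/en_US/en_US.twitter.txt",
  sample_rate = 0.001,               # 0.1% of the tweets
  output_file = "sample-clean.txt"   # ~2,361 cleaned lines
)
```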
Step 3: Generate the 4-gram model by applying the generate_model method of the object created in Step 2. The output is the file def-model-twitter.RDS.
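A sketch of this step, assuming generate_model needs no further arguments and that the resulting 4-gram model is persisted with saveRDS (the actual method signature may differ):

```r
# Build the 4-gram model from the cleaned sample and save it to disk
# so the Shiny app can load it later without redoing this work.
model <- generator$generate_model()
saveRDS(model, "def-model-twitter.RDS")
```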
Step 4: Predict the next word for the sentence entered by the user, applying the predict_word method to the 4-gram model. Additionally, the app shows four more predicted candidate words.
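A sketch of the prediction call, assuming predict_word takes the loaded 4-gram model, the user's phrase, and the number of extra candidates; these parameter names are assumptions for illustration:

```r
# Load the precomputed 4-gram model and predict the next word plus
# four additional candidate words for a sample phrase.
model <- readRDS("def-model-twitter.RDS")
predict_word(model, phrase = "thanks for the", n_candidates = 4)
```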
Selecting Data and Printing Outcomes
The user chooses a sentence, and the app predicts its next word plus four more candidate words.
The outcome is printed in the side box.
I run steps 1, 2, and 3 separately, outside the app, for best performance, because this process takes significant time to execute. It generates the file def-model-twitter.RDS, so the Shiny app uses only this file as input; it contains the 4-gram model used to predict words.
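A minimal Shiny sketch of this design: only def-model-twitter.RDS is loaded at startup, and the prediction appears in the side box. The input names and the predict_word call are illustrative assumptions, not the app's exact code:

```r
library(shiny)

# Load the 4-gram model built offline in steps 1-3.
model <- readRDS("def-model-twitter.RDS")

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      textInput("phrase", "Type a sentence:"),
      verbatimTextOutput("prediction")   # side box with the outcome
    ),
    mainPanel()
  )
)

server <- function(input, output) {
  output$prediction <- renderPrint({
    req(input$phrase)
    # Assumed signature: next word plus four more candidates.
    predict_word(model, input$phrase, n_candidates = 4)
  })
}

shinyApp(ui, server)
```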