Shiny App for Next Word Prediction
https://elenena810.shinyapps.io/word_predictor/
Coursera's Data Science Specialisation CAPSTONE Project
The files required to run the app total 308 MB, so it's easy to share with other RStudio users.
When running, it requires 800 MB of RAM (as reported by the shinyapps.io metrics).
To test the app, I used 853,900 HC Corpora sentences that weren't used to train the model, and for each one I predicted the last word.
Mean prediction time according to the test (run in a non-interactive R session) was 0.0982 seconds.
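The test procedure described above (hide the last word of each held-out sentence, predict it, and time the call) can be sketched roughly as follows. This is only an illustration, not the code used for the reported figures: `toy_model`, `predict_next` and `evaluate` are hypothetical names, and the tiny lookup table stands in for the real model.

```r
# Hypothetical stand-in for the app's predictor: a tiny lookup table keyed by
# the last typed word. The real model is not reproduced here.
toy_model <- c(of = "the", the = "same", thank = "you")

predict_next <- function(context_words) {
  last <- tail(context_words, 1)
  if (length(last) == 1 && last %in% names(toy_model)) {
    unname(toy_model[last])
  } else {
    "the"  # fallback guess when the context is unseen or empty
  }
}

# Evaluate on held-out sentences: hide the last word, predict it from the
# preceding words, and record whether the guess was right and how long it took.
evaluate <- function(sentences) {
  hits <- logical(length(sentences))
  secs <- numeric(length(sentences))
  for (i in seq_along(sentences)) {
    words <- strsplit(tolower(sentences[i]), "\\s+")[[1]]
    context <- head(words, -1)
    target <- tail(words, 1)
    elapsed <- system.time(guess <- predict_next(context))[["elapsed"]]
    hits[i] <- identical(guess, target)
    secs[i] <- elapsed
  }
  list(accuracy = mean(hits), mean_seconds = mean(secs))
}

held_out <- c("I want to thank you", "one of the", "see you later")
result <- evaluate(held_out)
```

Here `result$accuracy` is the share of sentences whose last word was predicted correctly and `result$mean_seconds` is the mean time per prediction. Grouping the hits by the length of `context` would give accuracy as a function of the number of typed words.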
Accuracy increases with the number of typed words. In detail:
To build the app, I started by processing a training set containing 70% of the HC Corpora and extracting, for each (n-1)-gram, the most frequent n-gram that begins with it (with n from 2 to 5). The key elements of my final algorithm are:
I encourage you to visit https://github.com/Elenena/NLP_Capstone, where you can find more detailed explanations in the README file, download the test results, and look at my complete R code.
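To illustrate the starting point mentioned above (keeping, for each (n-1)-gram, its most frequent continuation), here is a minimal base-R sketch on a toy corpus. It is an assumption-laden illustration, not the repository's code: the names `ngrams` and `top_continuations` are hypothetical, tokenisation is a plain whitespace split, and ties are not handled in any special way.

```r
# Toy corpus standing in for the training set.
corpus <- c("i love you", "i love you so much", "i love pizza", "you love me")

# Build all n-grams of a tokenised sentence as space-joined strings.
ngrams <- function(words, n) {
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts, function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# For each (n-1)-gram prefix, keep the final word of its most frequent n-gram.
top_continuations <- function(sentences, n) {
  tokens <- strsplit(tolower(sentences), "\\s+")
  grams <- unlist(lapply(tokens, ngrams, n = n))
  counts <- table(grams)
  parts <- strsplit(names(counts), " ")
  prefix <- vapply(parts, function(p) paste(p[-n], collapse = " "), character(1))
  last_word <- vapply(parts, function(p) p[n], character(1))
  df <- data.frame(prefix, last_word, freq = as.integer(counts),
                   stringsAsFactors = FALSE)
  df <- df[order(df$prefix, -df$freq), ]   # most frequent n-gram first per prefix
  df <- df[!duplicated(df$prefix), ]       # keep only that first row
  setNames(df$last_word, df$prefix)        # named vector: prefix -> predicted word
}

bigram_table <- top_continuations(corpus, 2)
trigram_table <- top_continuations(corpus, 3)
```

With these tables, `trigram_table[["i love"]]` looks up the predicted word after "i love", and `bigram_table[["love"]]` could serve as a shorter-context fallback when the longer prefix has not been seen. Storing only one continuation per prefix keeps the lookup tables small.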