2/5/23
To give the app a try, risk free, please visit the WordCrystalBall Shiny App: https://zerimar.shinyapps.io/WordCrystalBall/
- This project involves Natural Language Processing. The critical task is to take a user’s input phrase (group of words) and to output a predicted next word.
- The app predicts a sequence of words as the user types a sentence.
- This is similar to how many smartphone keyboards work today, for example those using SwiftKey technology.
- A subset of the original data was sampled from three sources (blogs, Twitter, and news) and then merged into a single corpus.
- Next, data cleaning is done by transforming to lowercase letters, stripping white space, and removing punctuation and numbers.
- The corresponding n-grams are then created (i.e., Bigram, Trigram, Quadgram, and Quintgram).
- Next, term-count tables are extracted from the n-grams and sorted by frequency in descending order.
- Last, the n-gram objects are saved as R-Compressed files (.RData files).
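The steps above can be sketched in base R. This is a minimal illustration under stated assumptions, not the app's actual code (which is not published): the function names `clean_text`, `make_ngrams`, and `ngram_table`, the three-line toy corpus, and the output file name are all hypothetical, and the real pipeline may well use a text-mining package instead.

```r
# Toy stand-in for the sampled and merged blogs/Twitter/news lines (hypothetical data).
corpus <- c("The cat sat.", "The cat ran 2 miles!", "A dog sat.")

# Cleaning: lowercase, remove punctuation and numbers, strip extra white space.
clean_text <- function(x) {
  x <- tolower(x)
  x <- gsub("[[:punct:]]+", "", x)
  x <- gsub("[[:digit:]]+", "", x)
  x <- gsub("\\s+", " ", x)
  trimws(x)
}

# Build all n-grams (as space-separated strings) from one cleaned line.
make_ngrams <- function(line, n) {
  words <- strsplit(line, " ", fixed = TRUE)[[1]]
  words <- words[nzchar(words)]
  if (length(words) < n) return(character(0))
  starts <- seq_len(length(words) - n + 1)
  vapply(starts,
         function(i) paste(words[i:(i + n - 1)], collapse = " "),
         character(1))
}

# Term-count table for a given n, sorted by descending frequency.
ngram_table <- function(lines, n) {
  grams <- unlist(lapply(clean_text(lines), make_ngrams, n = n))
  if (length(grams) == 0) {
    return(data.frame(ngram = character(0), count = integer(0),
                      stringsAsFactors = FALSE))
  }
  counts <- sort(table(grams), decreasing = TRUE)
  data.frame(ngram = names(counts), count = as.integer(counts),
             stringsAsFactors = FALSE)
}

bigrams  <- ngram_table(corpus, 2)
trigrams <- ngram_table(corpus, 3)

# Save one table as an R-compressed file (written to a temp directory here).
out_file <- file.path(tempdir(), "bigrams.RData")
save(bigrams, file = out_file)
```

The same `ngram_table` call with `n = 4` and `n = 5` would produce the quadgram and quintgram tables; on a real corpus these tables grow quickly, which is one reason to sample a subset first.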
The next word prediction app provides a simple user interface to the next word prediction model.
- A simple text box for user input.
- The predicted next word is displayed dynamically, right below the user input.
- Tabs with plots of the most frequent n-grams in the dataset.
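How the predicted word is looked up is not spelled out here. One common approach, assumed for this sketch, is a simple backoff: match the last words of the input against the longest n-gram table first and fall back to shorter tables when nothing matches. The function name `predict_next` and the toy tables below are hypothetical.

```r
# Toy term-count tables (hypothetical data), in the shape of ngram/count columns.
trigrams <- data.frame(
  ngram = c("i love you", "i love it", "love you so"),
  count = c(5L, 3L, 2L),
  stringsAsFactors = FALSE
)
bigrams <- data.frame(
  ngram = c("love you", "so much", "thank you"),
  count = c(7L, 4L, 3L),
  stringsAsFactors = FALSE
)

# Predict the next word for a phrase.
# `tables` is a list of n-gram tables ordered from longest n to shortest.
predict_next <- function(phrase, tables) {
  # Apply the same kind of cleaning as was used to build the tables.
  cleaned <- trimws(tolower(gsub("[[:punct:][:digit:]]+", "", phrase)))
  words <- strsplit(cleaned, "\\s+")[[1]]
  words <- words[nzchar(words)]
  for (tbl in tables) {
    if (nrow(tbl) == 0) next
    # Infer n from the first entry of the table.
    n <- length(strsplit(tbl$ngram[1], " ", fixed = TRUE)[[1]])
    if (length(words) < n - 1) next
    prefix <- paste(tail(words, n - 1), collapse = " ")
    hits <- tbl[startsWith(tbl$ngram, paste0(prefix, " ")), ]
    if (nrow(hits) > 0) {
      best <- hits$ngram[which.max(hits$count)]
      return(sub(".* ", "", best))  # keep only the last word
    }
  }
  NA_character_  # no table contains a matching prefix
}

predict_next("I love", list(trigrams, bigrams))   # matched in the trigram table
predict_next("you so", list(trigrams, bigrams))   # backs off to the bigram table
```

In a Shiny app, a function like this would typically be called from a reactive expression tied to the text box, so the output updates as the user types.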
Rapid response time.
Method allows for large training sets, leading to better next word predictions.
The Algorithm is expandable to other languages, such as German and Finnish.
Additional work can address the main weakness of this approach: it cannot use long-range context (> 4-grams).
Future work could handle this by clustering the underlying training corpus and predicting which cluster the entire sentence falls into.
This would allow the app to predict using only the data subset that fits the long-range context of the sentence, while preserving the performance characteristics of the n-gram prediction model structure.
Tidy Data: https://cran.r-project.org/web/packages/tidyr/vignettes/tidy-data.html
Text Mining with R, A Tidy Approach: https://www.tidytextmining.com/tidytext.html
Shiny App: https://zerimar.shinyapps.io/WordCrystalBall/
To protect the proprietary nature of the app and algorithm, the R code is available upon request.