Gabriel
30/12/2021
This Shiny application is a smart keyboard that predicts the next word as you type, similar to the one developed by SwiftKey.
The app runs on a predictive text mining model. The model was trained on the HC Corpora data set provided by Coursera and SwiftKey. The language is English.
I will be giving a short description of both the prediction model and the application in this presentation.
The app can be accessed at: https://cogabi.shinyapps.io/WordPrediction/
The basic building block of a language prediction model is an n-gram, which is a contiguous sequence of n words. For this app, I have assembled a collection of 1-grams, 2-grams and 3-grams, extracted from the HC Corpora.
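To make the idea of an n-gram concrete, here is a minimal Python sketch that counts 1-grams, 2-grams and 3-grams. It is only an illustration, not the code behind the app: the `ngrams` helper and the toy sentence are assumptions standing in for the real extraction from the HC Corpora.

```python
from collections import Counter

def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Toy corpus (an assumption for illustration; the app uses the HC Corpora).
tokens = "the cat sat on the mat the cat ran".split()

# Frequency tables keyed by n-gram tuple.
unigram_counts = Counter(ngrams(tokens, 1))
bigram_counts = Counter(ngrams(tokens, 2))
trigram_counts = Counter(ngrams(tokens, 3))

print(unigram_counts[("the",)])        # how often "the" occurs
print(bigram_counts[("the", "cat")])   # how often "the cat" occurs
```

Storing the counts per n-gram order keeps the later look-ups simple: each table can be searched by the words that precede the one being predicted.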
The algorithm behind the app is an implementation of Katz's back-off model. The basic concept of this model is that a conditional probability is estimated for a word given its history, that is, the words that precede it in the n-gram. The algorithm first estimates probabilities using 3-grams that begin with the last two words typed; if none are observed, it backs off to 2-grams that begin with the last word; and if no such bigrams exist, it falls back to 1-gram probabilities, which depend only on how frequent each word is overall.
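The back-off order can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the app's implementation: it ranks candidates by raw counts and leaves out the discounting and back-off weights of the full Katz model, and the names `predict_next`, `top_k` and the toy corpus are hypothetical.

```python
from collections import Counter

def ngram_counts(tokens, n):
    """Count every contiguous run of n tokens."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))

def top_k(candidates, k):
    """Sort candidate words by descending count, then alphabetically."""
    ranked = sorted(candidates.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]

def predict_next(history, tri, bi, uni, k=5):
    """Return up to k next-word candidates, backing off 3-gram -> 2-gram -> 1-gram."""
    history = tuple(history)
    # Step 1: trigrams whose first two words match the last two words typed.
    if len(history) >= 2:
        found = {g[2]: c for g, c in tri.items() if g[:2] == history[-2:]}
        if found:
            return top_k(found, k)
    # Step 2: bigrams whose first word matches the last word typed.
    if len(history) >= 1:
        found = {g[1]: c for g, c in bi.items() if g[:1] == history[-1:]}
        if found:
            return top_k(found, k)
    # Step 3: no context matched, so fall back to the most frequent words.
    return top_k({g[0]: c for g, c in uni.items()}, k)

# Toy corpus (an assumption for illustration).
tokens = "the cat sat on the mat the cat ran".split()
uni, bi, tri = (ngram_counts(tokens, n) for n in (1, 2, 3))

print(predict_next(["the", "cat"], tri, bi, uni))     # served by the trigram table
print(predict_next(["big", "cat"], tri, bi, uni))     # backs off to bigrams
print(predict_next(["purple", "dog"], tri, bi, uni))  # backs off to unigrams
```

Returning up to five candidates mirrors the five suggestions the app displays; the alphabetical tie-break is only there to make the toy output deterministic.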
The model also uses a smoothing method to reserve some probability for unseen n-grams. A fixed absolute discount is subtracted from the counts of observed trigrams, and the probability mass this frees up is given to unobserved ones.
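A small numeric sketch shows how an absolute discount frees up probability mass. The discount of 0.5, the function name `discounted_probs` and the toy counts are assumptions chosen for illustration; the value used in the app is not stated here.

```python
def discounted_probs(counts, discount=0.5):
    """Apply absolute discounting to the words observed after one context.

    counts maps each observed next word to its count.
    Returns (probabilities of observed words, mass left for unseen words).
    """
    total = sum(counts.values())
    # Every observed count is lowered by the same fixed amount.
    probs = {word: (count - discount) / total for word, count in counts.items()}
    # The subtracted amounts add up to the mass reserved for unseen words.
    leftover = discount * len(counts) / total
    return probs, leftover

# Toy counts of words seen after some two-word context (hypothetical).
observed = {"sat": 3, "ran": 1}
probs, leftover = discounted_probs(observed, discount=0.5)

print(probs)     # discounted probabilities of the observed words
print(leftover)  # probability mass available to lower-order n-grams
```

The discounted probabilities and the leftover mass always sum to one, so the back-off step can distribute the leftover among words that were never seen after this context.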
The data was cleaned extensively before being passed to the model. As a result, the training text contains no capital letters, punctuation marks, special characters or numbers.
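A cleaning step of this kind might look like the following Python sketch. The exact rules used for the app are not given, so everything here is an assumption: the function name `clean_text`, the choice to lower-case rather than drop capitalised words, the handling of apostrophes, and the restriction to the letters a to z.

```python
import re

def clean_text(text):
    """Lower-case the text and keep only letters separated by single spaces."""
    text = text.lower()
    # Drop apostrophes so contractions stay one token ("it's" -> "its").
    text = text.replace("'", "")
    # Replace digits, punctuation and any other non-letter with a space.
    # Note: this also removes accented letters, which may not be desired.
    text = re.sub(r"[^a-z\s]", " ", text)
    # Collapse runs of whitespace into single spaces.
    return " ".join(text.split())

print(clean_text("Hello, World! It's 2021..."))
```

Applying the same cleaning to whatever the user types keeps the input consistent with the n-grams stored in the model.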
One of the best features of the application is its ease of use. Simply type words separated by spaces in the text field and the algorithm will find the best word to finish your sentence. However, if a word is not in the model's dictionary, you might get an error. The best approach is to type the first word and let the app suggest the continuation.
The app will produce 5 buttons with predictions for the next possible word with likelihoods decreasing from left to right. By simply clicking on one of these buttons, you can add the word to the text box and complete the sentence. The app will automatically predict the next word. If you do not like any of the choices, you can type a word yourself.
The app takes a bit to load the data at the start, but after that it runs very quickly!
Thank you for your attention!