R. Holley
Nov. 24, 2020
This text-prediction application was built with R for data wrangling and Shiny for the UI and hosting. The initial model-building data was provided by SwiftKey for the Johns Hopkins University/Coursera Data Science Certificate Program.
This app is easy to use and has a clean, simple interface. A few of its salient features are highlighted below.
In this screenshot, the user has entered 'rather' in the text box on the left and selected an output length of 5 on the slider. The results on the right list possible next words, ranked by probability.
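For illustration, here is a minimal Shiny sketch of an interface like the one described: a text box, a slider for the number of predictions, and a ranked table of results. The widget names, layout, and the tiny stand-in dictionary (with made-up probabilities) are assumptions for this sketch, not the app's actual code.

```r
library(shiny)

# Tiny stand-in dictionary so the sketch is self-contained;
# the words and probabilities below are invented for illustration.
dictionary <- list(
  rather = c(than = 0.62, a = 0.08, the = 0.05, be = 0.04, more = 0.03)
)

ui <- fluidPage(
  sidebarLayout(
    sidebarPanel(
      textInput("word", "Enter text:", value = "rather"),
      sliderInput("n", "Output length:", min = 1, max = 10, value = 5)
    ),
    mainPanel(tableOutput("predictions"))
  )
)

server <- function(input, output) {
  output$predictions <- renderTable({
    w <- tolower(trimws(input$word))
    if (!nzchar(w) || is.null(dictionary[[w]])) return(NULL)
    probs <- dictionary[[w]]
    head(data.frame(word = names(probs), probability = probs,
                    row.names = NULL), input$n)
  })
}

# shinyApp(ui, server)  # uncomment to run locally
```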
This app uses a bigram Markov-chain model to build a 'dictionary.' The name of each dictionary entry is the input word; the entry itself is a named vector of probabilities, where each name is a possible next word and the vector is sorted from highest to lowest probability.
Below are the first few lines of the dictionary entry named apple, printed in table format for easy reading.
apple
and 0.05120232
store 0.04104478
pie 0.04104478
cider 0.03565506
is 0.03130182
has 0.02611940
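A rough sketch of how an entry like this could be built from raw bigram counts, assuming the structure described above (a named list of probability-sorted named vectors). The counts and helper names here are illustrative, not the author's actual code or data.

```r
# Illustrative bigram counts (invented numbers, not real corpus data)
bigrams <- data.frame(
  word1 = c("apple", "apple", "apple", "apple"),
  word2 = c("and", "store", "pie", "cider"),
  count = c(190, 152, 152, 132)
)

# Turn one input word's bigram counts into a named probability vector,
# sorted so the most likely next word comes first.
build_entry <- function(df) {
  probs <- df$count / sum(df$count)   # relative frequency as probability
  names(probs) <- df$word2
  sort(probs, decreasing = TRUE)
}

# One list element per input word, named by that word
dictionary <- lapply(split(bigrams, bigrams$word1), build_entry)
```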
This type of structure makes retrieving entries very efficient. Because each entry is named, it can be accessed directly by name, without needing a search function to scan all entries looking for a match.
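Continuing the sketch above, the lookup amounts to a single indexing operation by name rather than a loop over the whole dictionary:

```r
# Pull the entry for an input word directly by name
next_words <- dictionary[["apple"]]
head(next_words, 5)   # top five candidate next words, most probable first
```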
When a user inputs an unknown word, the app falls back to the last known word as input. For example, if the user accidentally enters “Christmas trer” instead of “Christmas tree,” the algorithm skips the unknown word “trer” and uses “Christmas” as the input instead.
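A hedged sketch of that fallback, under the assumptions of the earlier snippets: walk backward through the user's words and use the first one that has a dictionary entry. The function and variable names are hypothetical.

```r
predict_next <- function(text, dictionary, n = 5) {
  words <- rev(strsplit(tolower(trimws(text)), "\\s+")[[1]])
  for (w in words) {                    # most recent word first
    if (!is.null(dictionary[[w]])) {
      return(head(dictionary[[w]], n))  # first known word wins
    }
  }
  NULL                                  # no known word found
}

# predict_next("Christmas trer", dictionary) skips "trer" and
# falls back to "Christmas" as the input word.
```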
For further questions on this application and its development, visit the GitHub repository here.