Fariz Abdul Rahman
8/31/2017
A word prediction app based on a Stupid Backoff 5-gram model
Type some text into the app and, after a few seconds, the top 5 candidates for the next word are served up.
Hence the product name, WordUp!
Under the hood, the app has two main components:
The n-gram frequency matrix
The prediction algorithm
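As a rough illustration of the first component, the sketch below counts every n-gram of order 1 to 5 in a toy sentence. It is written in Python for brevity and is not the app's R code; the function name `ngram_counts`, the dictionary-based layout, and the toy sentence are all assumptions made for this example.

```python
from collections import Counter

def ngram_counts(tokens, max_n=5):
    """Count every n-gram of order 1..max_n in a list of tokens.

    Hypothetical helper: the real app stores its frequencies in an
    n-gram frequency matrix whose exact layout is not described here.
    """
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

# Toy corpus, invented for illustration only.
tokens = "the cat sat on the mat and the cat ran".split()
counts = ngram_counts(tokens)

print(counts[("the",)])        # how often "the" occurs
print(counts[("the", "cat")])  # how often "the cat" occurs
```

Keying the table by word tuples keeps lookups for any order in a single structure, which is convenient when the prediction step needs to fall back from 5-grams to shorter contexts.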
The Stupid Backoff algorithm is inexpensive and, for large training datasets, approaches the quality of more expensive algorithms such as Kneser-Ney smoothing.
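To make the idea concrete, here is a minimal Python sketch of Stupid Backoff prediction, not the app's actual R implementation. A candidate word is scored by its relative frequency after the longest matching context; each time the context has to be shortened, the score is multiplied by a fixed factor. The factor 0.4 is the value suggested in the original Stupid Backoff paper and is an assumption here, as are the names `build_counts` and `predict` and the toy sentence.

```python
from collections import Counter

ALPHA = 0.4  # assumed backoff factor; the app's actual value is not stated

def build_counts(tokens, max_n=5):
    """Count all n-grams of order 1..max_n (hypothetical helper)."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts

def predict(counts, total_tokens, context, k=5, max_n=5):
    """Return the top-k (word, score) pairs for the next word."""
    context = tuple(context)[-(max_n - 1):]  # keep at most 4 words of context
    scores = {}
    penalty = 1.0
    while True:
        # Denominator: count of the context, or corpus size for unigrams.
        denom = counts.get(context, 0) if context else total_tokens
        if denom > 0:
            for gram, c in counts.items():
                if len(gram) == len(context) + 1 and gram[:-1] == context:
                    word = gram[-1]
                    # Keep the score from the longest context that saw the word.
                    if word not in scores:
                        scores[word] = penalty * c / denom
        if not context:
            break
        context = context[1:]  # back off to a shorter context
        penalty *= ALPHA
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))[:k]

tokens = "the cat sat on the mat and the cat ran".split()
counts = build_counts(tokens)
result = predict(counts, len(tokens), ["the"])
print(result)
```

The linear scan over all n-grams keeps the sketch short; a real implementation would index n-grams by their context so that each lookup avoids touching the whole table. Note that the scores are not probabilities, which is part of what makes the method cheap.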
Its low cost helps offset the slow performance of R, which is great for setting up models and graphics but not for processing large amounts of data.
The corpus provided contains more than 75 million words and takes up 556 MB of storage.
On a 64-bit desktop with 12 GB of RAM, the largest attainable training dataset was a 15% random sample of the corpus.
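A 15% random sample can be drawn in a single pass without loading everything into memory first. The Python sketch below assumes the corpus is sampled line by line and uses a fixed seed for reproducibility; both choices, and the name `sample_lines`, are assumptions for illustration and not a description of how the app's training set was actually built.

```python
import random

def sample_lines(lines, fraction=0.15, seed=42):
    """Keep each line independently with probability `fraction`.

    Hypothetical helper: sampling unit (lines) and seed are assumed.
    """
    rng = random.Random(seed)
    return [line for line in lines if rng.random() < fraction]

# Stand-in for the real corpus, invented for illustration only.
lines = [f"line {i}" for i in range(10000)]
sample = sample_lines(lines)
print(len(sample))  # roughly 15% of the input, not exactly
```

Because each line is kept or dropped independently, the sample size varies slightly around 15%, but the same seed always yields the same sample.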
After launching, the app should look like the screenshot on the left.
A sample text input with the corresponding result will be visible.
The scores indicate the weight of each predicted word relative to the other words in the list.
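One plausible way to obtain such relative weights is to divide each candidate's raw score by the sum of the scores in the list, so the weights add up to 1. The Python sketch below shows that normalization; whether the app computes its displayed scores this way is an assumption, and the function name `relative_weights` and the sample scores are invented for the example.

```python
def relative_weights(scored_words):
    """Rescale (word, score) pairs so the scores sum to 1.

    Hypothetical helper: the app's actual scoring display may differ.
    """
    total = sum(score for _, score in scored_words)
    if total == 0:
        return [(word, 0.0) for word, _ in scored_words]
    return [(word, score / total) for word, score in scored_words]

# Made-up raw scores for three candidate words.
raw = [("cat", 0.6), ("mat", 0.3), ("the", 0.1)]
weights = relative_weights(raw)
print(weights)
```

Normalizing within the list makes the numbers easy to compare at a glance, at the price of hiding how confident the model is overall.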
The app can be accessed here.