gsk
April 2015
WordPredictor is a prototype of a statistical learning model for word prediction.
Given some input text, WordPredictor suggests one or more candidates for the next word.
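To make the idea of next-word suggestion concrete, here is a minimal sketch in Python (the prototype itself is written in R). It assumes a plain bigram frequency model, which is an illustration only and not necessarily the model WordPredictor uses; the names `train_bigrams` and `suggest` and the tiny corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_bigrams(sentences):
    """Count, for each word, which words follow it in the training text."""
    followers = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for prev, nxt in zip(tokens, tokens[1:]):
            followers[prev][nxt] += 1
    return followers


def suggest(followers, text, k=3):
    """Return up to k candidate next words, most frequent first."""
    tokens = text.lower().split()
    if not tokens:
        return []
    # Only the last word of the input is used in this bigram sketch.
    counts = followers.get(tokens[-1])
    if not counts:
        return []  # unseen word: this sketch has no fallback
    return [word for word, _ in counts.most_common(k)]


# Hypothetical toy corpus, standing in for the blogs data set.
corpus = ["i love my dog", "i love my cat", "i love pizza", "my dog is happy"]
model = train_bigrams(corpus)
print(suggest(model, "They love"))
```

A real model would typically use longer n-grams and some form of smoothing or back-off for unseen contexts; this sketch leaves those out.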
The current prototype of WordPredictor was trained on more than 400,000 documents from a blogs data set.
WordPredictor has an average accuracy rate of 15%, 3% and 0.1% for its first, second and third suggested words, respectively.
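One way to compute per-suggestion accuracy figures like these is sketched below in Python. It assumes that "accuracy for the n-th suggestion" means the share of test cases where the n-th suggested word equals the actual next word; that reading, the function name `accuracy_by_rank`, and the sample data are assumptions, not taken from the project.

```python
def accuracy_by_rank(suggestions, targets, k=3):
    """Fraction of test cases where the suggestion at each rank is correct."""
    hits = [0] * k
    for suggested, target in zip(suggestions, targets):
        for rank, word in enumerate(suggested[:k]):
            if word == target:
                hits[rank] += 1
                break  # a target is credited to one rank only
    total = len(targets)
    if total == 0:
        return [0.0] * k
    return [h / total for h in hits]


# Hypothetical evaluation data: ranked suggestions and the true next word.
suggestions = [
    ["the", "a", "my"],
    ["dog", "cat", "car"],
    ["is", "was", "be"],
    ["on", "in", "at"],
]
targets = ["the", "cat", "were", "on"]
print(accuracy_by_rank(suggestions, targets))
```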
WordPredictor was developed entirely from scratch in R. It includes the following capabilities.
WordPredictor uses an algorithm for encoding n-grams as floating-point numbers that allows for fast searches and efficient memory usage. The prototype application, code and data structures included, requires no more than 25 MB of storage.
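The encoding scheme itself is not described here, so the following Python sketch shows only one plausible way to pack an n-gram into a single floating-point number: treat the word IDs as digits of a base-`VOCAB_SIZE` integer and store it as a double, which represents integers exactly below 2^53. The vocabulary size and the function names `encode_ngram`, `decode_ngram` and `contains` are all hypothetical.

```python
import bisect

VOCAB_SIZE = 100_000  # hypothetical vocabulary size, used as the number base
MAX_EXACT = 2 ** 53   # doubles represent every integer below this exactly


def encode_ngram(word_ids, base=VOCAB_SIZE):
    """Pack a sequence of word IDs into one float (positional encoding)."""
    code = 0
    for wid in word_ids:
        if not 0 <= wid < base:
            raise ValueError(f"word id {wid} outside [0, {base})")
        code = code * base + wid
    if code >= MAX_EXACT:
        raise OverflowError("n-gram too long to store exactly in a double")
    return float(code)


def decode_ngram(code, n, base=VOCAB_SIZE):
    """Recover the n word IDs from an encoded n-gram."""
    value = int(code)
    ids = []
    for _ in range(n):
        value, wid = divmod(value, base)
        ids.append(wid)
    return ids[::-1]


def contains(sorted_codes, word_ids):
    """Binary search for an n-gram in a sorted list of encoded n-grams."""
    code = encode_ngram(word_ids)
    i = bisect.bisect_left(sorted_codes, code)
    return i < len(sorted_codes) and sorted_codes[i] == code


# Hypothetical trigram table, stored as one sorted list of floats.
table = sorted(encode_ngram(t) for t in [(1, 2, 3), (1, 2, 4), (7, 8, 9)])
print(contains(table, (1, 2, 4)), contains(table, (1, 2, 5)))
```

Under these assumptions each n-gram costs 8 bytes and lookups take logarithmic time in the table size. With a base of 100,000 the scheme holds trigrams exactly but overflows for 4-grams, so a larger n would need a smaller vocabulary or a different packing.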
You can find more information about WordPredictor here.