Juan Carlos Mayo
The model is based on a 4-gram table that collects the most frequent word combinations. The prediction function takes a text, splits it into words and looks for matches in the table
It takes the last three words available (a trigram) and looks them up, returning the fourth word if there is a match
If no match is found, it repeats the process with the last two words (a bigram), and if that also fails, with the last word alone (a unigram)
If even the last word has no match, it returns three of the most common words, chosen at random
Matches are sorted by their frequency and the three most frequent results are returned
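The back-off lookup described above can be sketched as follows. This is a minimal illustration in Python, not the app's own code: it uses a plain dictionary in place of the R data.table, and the names `build_table`, `predict` and `common_words` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def build_table(sentences, max_n=4):
    """Map each context of 1 to max_n-1 words to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for sentence in sentences:
        words = sentence.lower().split()
        for i, word in enumerate(words):
            # Record this word under every preceding context that fits in the sentence.
            for k in range(1, max_n):
                if i - k >= 0:
                    table[tuple(words[i - k:i])][word] += 1
    return table


def predict(text, table, common_words, k=3, rng=random):
    """Return up to k suggestions, backing off from three context words to one."""
    words = text.lower().split()
    for n in (3, 2, 1):
        if len(words) < n:
            continue
        context = tuple(words[-n:])
        if context in table:
            # Most frequent continuations first.
            return [w for w, _ in table[context].most_common(k)]
    # Assumed fallback: a random pick from a supplied list of common words.
    return rng.sample(common_words, min(k, len(common_words)))


corpus = [
    "the cat sat on the mat",
    "the cat sat on the sofa",
    "the cat sat on the mat",
]
table = build_table(corpus)
common_words = ["the", "to", "and", "a"]
print(predict("sat on the", table, common_words))
print(predict("dog sat on", table, common_words))
```

In this toy corpus the first call matches on three words of context, while the second finds no match for "dog sat on" and backs off to "sat on".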
The model relies on the data.table format, which makes it really fast to query and return results
The accuracy of the prediction function is limited: on a test corpus of more than 40K sentences, the last word of each sentence was predicted correctly 1.11% of the time
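A last-word accuracy figure of this kind could be computed roughly as below. This is a hedged sketch in Python rather than the evaluation actually used: the source does not say whether only the top suggestion or any of the three counts as correct, so the `top_k` parameter is an assumption, and `last_word_accuracy` and `toy_predict` are hypothetical names.

```python
def last_word_accuracy(sentences, predict_fn, top_k=3):
    """Fraction of sentences whose last word is among the top_k suggestions for the rest."""
    hits = 0
    total = 0
    for sentence in sentences:
        words = sentence.lower().split()
        if len(words) < 2:
            continue  # no preceding words to predict from
        prefix, target = " ".join(words[:-1]), words[-1]
        if target in predict_fn(prefix)[:top_k]:
            hits += 1
        total += 1
    return hits / total if total else 0.0


def toy_predict(prefix):
    # Stand-in predictor that always offers the same three words (illustration only).
    return ["the", "mat", "you"]


test_sentences = ["the cat sat on the mat", "thank you", "see you later", "hello"]
print(last_word_accuracy(test_sentences, toy_predict))
print(last_word_accuracy(test_sentences, toy_predict, top_k=1))
```

Single-word sentences are skipped because there is nothing to condition the prediction on.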
This indicates that the prediction function could be improved with more advanced methods, which in turn would require far more computing power
This simple model is, nevertheless, fast and lightweight and provides reasonable accuracy for common text input
As the user types in the text box, three suggestions are shown above it. Clicking on a suggestion adds it to the text
If no suggestion can be found, an empty button is shown
Test it yourself at ShinyApps!