Because the free Shiny hosting plan imposes strong restrictions, our application must keep memory footprint and CPU usage low enough for the model to compute a prediction within an acceptable delay. This constrains both the size of the training data set and the choice of model.
The next-word prediction model is based on the “Stupid Katz Back-off” algorithm, chosen because it scales well to web-scale data and works well in practice (see More details).
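To make the back-off idea concrete, here is a minimal R sketch of the scoring rule on toy bigram/unigram counts. The data, the function name, and the 0.4 discount are illustrative assumptions, not the application's actual tables or tuning.

```r
# Toy n-gram counts (hypothetical, for illustration only)
bigram_counts  <- c("new york" = 50, "new car" = 10)   # "prefix word" counts
unigram_counts <- c("new" = 60, "york" = 55, "car" = 40, "the" = 500)
total_words    <- sum(unigram_counts)

stupid_backoff <- function(prefix, word, lambda = 0.4) {
  key <- paste(prefix, word)
  if (!is.na(bigram_counts[key])) {
    # Observed bigram: use its relative frequency given the prefix
    unname(bigram_counts[key] / unigram_counts[prefix])
  } else {
    # Unseen bigram: back off to the unigram frequency, discounted by lambda
    cnt <- unigram_counts[word]
    if (is.na(cnt)) cnt <- 0
    lambda * unname(cnt / total_words)
  }
}

stupid_backoff("new", "york")  # seen bigram:   50 / 60
stupid_backoff("new", "the")   # unseen bigram: 0.4 * 500 / 655
```

In the real model the same rule is applied recursively from higher-order n-grams down to unigrams, always picking the candidate word with the highest score.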
Our implementation of the “Stupid Katz Back-off” algorithm achieves an accuracy of ~20%, compared to SwiftKey's accuracy of >30% (we could not find any official numbers). Removing stop words and stemming did not help. The novel aspect of our approach was optimizing the code to run on the entire data set quickly using parallelized, vectorized functions.
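As a rough illustration of that optimization, the sketch below scores a whole batch of prefixes with a single keyed data.table join and spreads the chunks across cores; the table names, columns, and data are hypothetical, not the application's code.

```r
library(data.table)
library(parallel)

# Hypothetical pre-computed n-gram table: best next word and score per prefix
ngrams <- data.table(prefix = c("new", "of the"),
                     word   = c("york", "day"),
                     score  = c(0.83, 0.12))
setkey(ngrams, prefix)   # keyed joins make each lookup fast

test <- data.table(prefix = c("new", "of the", "in a"))

predict_chunk <- function(chunk) {
  # One vectorized join scores the whole chunk; unmatched prefixes return NA
  ngrams[chunk, on = "prefix", mult = "first"]$word
}

# Split the test set into chunks and score them in parallel (forking is only
# available on Unix-alikes, so fall back to a single core elsewhere)
chunks  <- split(test, cut(seq_len(nrow(test)), 2, labels = FALSE))
n_cores <- if (.Platform$OS.type == "unix") max(1L, detectCores() - 1L) else 1L
preds   <- unlist(mclapply(chunks, predict_chunk, mc.cores = n_cores))
preds
```

Replacing row-by-row loops with joins of this kind is what lets the evaluation cover the entire data set in a reasonable time.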
Some possible improvements:
The references for this application are listed under More/References.
Below we give the instructions and describe how the application functions:
Below is an example of the results:
This tool is offered under the standard Beerware license.