Vincent Force
March 23rd, 2018
The purpose is to design an algorithm that predicts the next word from at most the 3 preceding words, following the main steps above.
The resulting algorithm has to use less than 1 GB of RAM at runtime to fit within the limit of a free shinyapps.io plan, and has to return predictions quickly enough to be used interactively.
We have to keep in mind that the initial target is a smartphone!
Data processing involves many steps, including statistical computations, NLP-specific operations and optimisation operations.
We tried the stemming methods provided by the quanteda package (e.g. an inflected form such as going is reduced to its stem go) to increase the number of possible matches.
Four different backoff prediction algorithms have been implemented, in order to perform a benchmark.
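To make the backoff idea concrete, here is a minimal sketch in Python of the simplest variant: look up the longest available context (at most 3 preceding words) and fall back to shorter contexts when nothing matches. It is an illustration only, not one of the four algorithms benchmarked here; the function names `train` and `predict`, the whitespace tokenisation and the absence of any discounting or scoring are all assumptions made for the example.

```python
from collections import Counter, defaultdict

MAX_CONTEXT = 3  # at most 3 preceding words, as stated in the text


def train(sentences, max_context=MAX_CONTEXT):
    """Count which word follows each context of 0..max_context words.

    Hypothetical helper: tokenisation is a plain lowercase whitespace split.
    """
    table = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            for n in range(max_context + 1):
                if i - n < 0:
                    break  # not enough preceding words for this context size
                context = tuple(tokens[i - n:i])
                table[context][word] += 1
    return table


def predict(table, phrase, k=3, max_context=MAX_CONTEXT):
    """Return up to k candidate next words, backing off from the longest context."""
    tokens = phrase.lower().split()
    longest = min(max_context, len(tokens))
    for n in range(longest, -1, -1):
        # n == 0 gives the empty context, i.e. plain word frequencies
        context = tuple(tokens[len(tokens) - n:])
        followers = table.get(context)
        if followers:
            return [word for word, _ in followers.most_common(k)]
    return []


corpus = [
    "i want to go home",
    "i want to go out",
    "i want to go home now",
    "we want to eat",
]
table = train(corpus)
print(predict(table, "i want to go"))   # longest context matches directly
print(predict(table, "they need to"))   # backs off to the single word "to"
```

A real implementation would normally weight or discount the shorter contexts instead of simply taking the first match, and would store the tables compactly to respect the memory limit.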
Accuracy is reported for different training sizes (percentage of the initial data used to train the model) and for different accuracy levels.
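One possible way to measure such accuracy is sketched below in Python, assuming that an "accuracy level" means top-k accuracy: a prediction counts as correct when the true next word is among the first k candidates. That interpretation, the function name `top_k_accuracy` and the `predict_fn` callback are assumptions for illustration, not the measurement code behind the reported figures.

```python
def top_k_accuracy(predict_fn, text, k=1, max_context=3):
    """Share of words in `text` found among the top-k predictions.

    predict_fn is a hypothetical callback: it takes the preceding words as a
    single string and returns a ranked list of candidate next words.
    """
    tokens = text.lower().split()
    hits = 0
    total = 0
    for i in range(1, len(tokens)):
        # Use at most `max_context` preceding words as input
        context = " ".join(tokens[max(0, i - max_context):i])
        candidates = predict_fn(context)[:k]
        hits += tokens[i] in candidates
        total += 1
    return hits / total if total else 0.0


def toy_predictor(context):
    """Stand-in predictor used only to exercise the metric."""
    return ["b", "c"] if context.endswith("a") else ["a"]


print(top_k_accuracy(toy_predictor, "a b a c", k=1))
print(top_k_accuracy(toy_predictor, "a b a c", k=2))
```

Evaluating on text that was held out from training, and repeating the measurement for each training size, gives one accuracy figure per combination of training size and k.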
A responsive application has been designed for you to test the algorithms. It provides an interactive next word prediction interface and a tool that measures accuracy on a text supplied by the user.
Click to go to shiny application
Interactive next word prediction interface
Type an input phrase and hit the space bar; the interface then provides: