Additional filtering was required for performance reasons: the full corpus of more than 7 million texts caused performance problems for the product hosted on shiny.io. Criteria drawn from discounted Kneser-Ney smoothing (http://mkoerner.de/media/bachelor-thesis.pdf) guided the filtering. For example, the preceding 1, 2, or 3 words are held fixed, and maximum variability is allowed in the 4th word. This reduced the dataset to 100,000 lines.
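A minimal sketch of this kind of pruning, written in Python for illustration only (the actual product is a Shiny app). The function names `ngrams` and `prune_ngrams` and the parameters `keep_per_context` and `min_count` are hypothetical. The sketch assumes a simple count-based rule: hold the first n-1 words (the context) fixed, keep only the most frequent last words for each context, and record N1+(context •), the number of distinct words seen after each context. Kneser-Ney uses that quantity in its backoff weight. This is not necessarily the exact criterion the author applied.

```python
from collections import Counter, defaultdict


def ngrams(tokens, n):
    """Yield n-grams (as tuples) from a list of tokens."""
    return zip(*(tokens[i:] for i in range(n)))


def prune_ngrams(lines, n=4, keep_per_context=5, min_count=1):
    """Count n-grams, then keep only the top continuations per fixed context.

    Returns (pruned, distinct_followers):
      pruned             -- {n-gram tuple: count} after pruning
      distinct_followers -- {context tuple: N1+(context .)}, counted before pruning
    """
    counts = Counter()
    for line in lines:
        counts.update(ngrams(line.lower().split(), n))

    # Group by context (first n-1 words); the last word is the varying slot.
    by_context = defaultdict(list)
    for gram, c in counts.items():
        by_context[gram[:-1]].append((gram[-1], c))

    pruned, distinct_followers = {}, {}
    for context, followers in by_context.items():
        distinct_followers[context] = len(followers)
        # Highest count first; ties broken alphabetically for determinism.
        followers.sort(key=lambda wc: (-wc[1], wc[0]))
        kept = [(w, c) for w, c in followers if c >= min_count][:keep_per_context]
        for word, c in kept:
            pruned[context + (word,)] = c
    return pruned, distinct_followers


corpus = ["the cat sat", "the cat ran", "the cat sat", "the dog sat"]
pruned, distinct = prune_ngrams(corpus, n=3, keep_per_context=1)
print(pruned)    # {('the', 'cat', 'sat'): 2, ('the', 'dog', 'sat'): 1}
print(distinct)  # {('the', 'cat'): 2, ('the', 'dog'): 1}
```

Raising `min_count` or lowering `keep_per_context` trades coverage for a smaller table, which is the lever that matters when the app must load quickly on a hosted server.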
A backoff mechanism was implemented: it first looks for a match among five-grams, then four-grams, and so on down to unigrams.
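The backoff lookup can be sketched in Python as follows. This is a plain "first match wins" backoff with no Katz-style discount weights. The names `build_tables` and `predict` and the table layout (`tables[n][context]` mapping to a `Counter` of next words) are hypothetical and are not taken from the original app.

```python
from collections import Counter, defaultdict


def build_tables(lines, max_n=5):
    """Build next-word counts: tables[n][context] -> Counter of following words.

    For n=1 the context is the empty tuple (), i.e. plain unigram counts.
    """
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for line in lines:
        tokens = line.lower().split()
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])
                tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict(tables, text, k=5):
    """Try the highest order first, backing off until a context matches."""
    tokens = text.lower().split()
    for n in range(max(tables), 0, -1):
        if n - 1 > len(tokens):
            continue  # not enough input words to form this order's context
        # Last n-1 input words; for n=1 this slice is empty, giving ().
        context = tuple(tokens[len(tokens) - (n - 1):])
        followers = tables[n].get(context)  # .get avoids inserting empty entries
        if followers:
            return [w for w, _ in followers.most_common(k)]
    return []


tables = build_tables([
    "i love new york",
    "i love new shoes",
    "i love new york city",
    "we love pizza",
])
print(predict(tables, "I love new", k=2))       # ['york', 'shoes'] (4-gram match)
print(predict(tables, "hello world new", k=2))  # ['york', 'shoes'] (backs off to bigrams)
print(predict(tables, "zzz", k=1))              # ['love'] (unigram fallback)
```

Returning the first order that matches keeps the lookup cheap. A variant could instead fill the remaining slots from lower orders when a high-order context has fewer than k continuations.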
Data Product Description
In the sidebar, enter your text.
The 5 predicted words with the highest probability are shown below it.
In the main panel, up to 30 of the highest-probability words are displayed in a word cloud.