Product Presentation: N-Gram Word Prediction

Konstantin Mingoulin
March 17, 2018

Cool Features

  • Provide up to 5 suggestions
  • To ensure accuracy and relevance, up to 4 preceding words are used to predict the next one
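Stated a little more formally, the two features combine as follows. This sketch assumes that suggestions are ranked by raw n-gram counts $c(\cdot)$ in the stemmed corpus. Let $w_1, \dots, w_{i-1}$ be the words typed so far. The model uses the longest context of at most 4 words that has any continuation in the corpus, and it returns up to 5 of the most frequent continuations:

$$k^{*} = \max\left\{ k \in \{1, \dots, \min(4,\, i-1)\} \;:\; \sum_{w} c(w_{i-k}, \dots, w_{i-1}, w) > 0 \right\}$$

$$\hat{W} = \text{the (up to) 5 words } w \text{ with the largest } c(w_{i-k^{*}}, \dots, w_{i-1}, w) > 0$$

If no $k$ satisfies the condition, there is no prediction.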

Algorithm Description

  • Sample data from 3 corpora: news, blogs and Twitter
  • Clean up and stem the combined corpus
  • Create a term matrix of n-grams with n from 2 to 5
  • A function takes a line of text and predicts the next word from as many preceding words as possible. It starts with the last 4 words and backs off to 3, then 2, and finally 1. The input does not need to be stemmed beforehand
  • The function outputs the 5 most likely next words, ranked by their frequency of occurrence in the corpus. The results then pass through stem completion, which maps each stem to its most prevalent full word in the same combined corpus (not stemmed)
  • If no match is found, the function returns “no match”
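The algorithm above can be sketched in Python as follows. This is a minimal, hypothetical sketch and not the author's implementation, which the slides do not show. Several parts are stand-ins. The toy `stem` function replaces a real stemmer such as Porter's. Nested dictionaries of counts replace the term matrix. The names `tokenize`, `stem`, `build_model` and `predict` are illustrative.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Simplified clean-up: lowercase and keep alphabetic runs only.
    return re.findall(r"[a-z]+", text.lower())


def stem(word):
    # Hypothetical toy stemmer standing in for a real one (e.g. Porter):
    # strips a few common suffixes if at least 3 letters remain.
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def build_model(lines, max_n=5):
    # ngrams[k][context] counts the stems that follow a k-stem context,
    # built from n-grams with n = 2..max_n (so contexts of 1..max_n-1 words).
    ngrams = defaultdict(lambda: defaultdict(Counter))
    # prevalent[stem] counts the unstemmed words that reduce to that stem.
    prevalent = defaultdict(Counter)
    for line in lines:
        words = tokenize(line)
        stems = [stem(w) for w in words]
        for w, s in zip(words, stems):
            prevalent[s][w] += 1
        for n in range(2, max_n + 1):
            for i in range(len(stems) - n + 1):
                context = tuple(stems[i:i + n - 1])
                ngrams[n - 1][context][stems[i + n - 1]] += 1
    return ngrams, prevalent


def predict(text, ngrams, prevalent, max_context=4, top_k=5):
    # The raw input is stemmed here, so callers need not stem it.
    stems = [stem(w) for w in tokenize(text)]
    # Back off from the longest available context (at most 4 words) to 1.
    for k in range(min(max_context, len(stems)), 0, -1):
        followers = ngrams[k].get(tuple(stems[-k:]))
        if followers:
            # Stem completion: replace each predicted stem with the
            # most frequent full word that produced it in the corpus.
            return [prevalent[s].most_common(1)[0][0]
                    for s, _ in followers.most_common(top_k)]
    return ["no match"]


if __name__ == "__main__":
    corpus = [
        "I am going to the store",
        "I am going to the park",
        "we are going to the store tomorrow",
        "she likes going to the beach",
    ]
    ngrams, prevalent = build_model(corpus)
    print(predict("I think we are going to the", ngrams, prevalent))  # ['store']
    print(predict("she", ngrams, prevalent))  # ['likes'] (stem "like" completed)
    print(predict("xyz", ngrams, prevalent))  # ['no match']
```

This design is pure backoff: the longest matching context wins outright, and lower-order counts are consulted only when it has no continuation. Smoothed schemes such as Stupid Backoff or Kneser-Ney would instead blend evidence across orders.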

App Description

  • Enter text in the “Text Entry” box
  • Click “Predict”
  • The five most likely predictions will appear, in order of likelihood
  • Select the appropriate suggestion and click “Accept”

Note: if you continue typing, suggestions will appear automatically and there is no need to click “Predict”