Natural Language Processing for word suggestion

Chiara Todaro
28 June 2019

HOW: A dictionary of 1-, 2-, and 3-grams with their associated frequencies is built from a training data set of emails, blogs, news, etc.

  1gram freq.1     2gram freq.2       3gram freq.3
1  that  89737  has been   7914  one of the    741
2  with  66476 more than   6779 going to be    367
3   was  59086   as well   4915 some of the    296
4  said  52832 they were   4090  the end of    293
5    he  48397   he said   3723  if you are    283

A dictionary with ~166,000 1-, 2-, and 3-grams occupies only 20 MB.
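As a rough illustration of how such a frequency dictionary can be built, here is a minimal Python sketch. The tokenizer, the function names (`tokenize`, `build_dictionary`), and the two-sentence toy corpus are assumptions made for this example; they are not the app's actual code or training data.

```python
from collections import Counter
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens (letters and apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())


def build_dictionary(documents, max_n=3):
    """Return {n: Counter}, where each Counter maps n-gram tuples to frequencies."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_n + 1):
            # Slide a window of length n over the token list.
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


# Toy corpus (hypothetical), standing in for the emails/blogs/news training set.
corpus = [
    "It has been one of the best days",
    "This has been going to be great",
]
ngrams = build_dictionary(corpus)
print(ngrams[2].most_common(3))
```

Storing the counts per N keeps lookups simple: `ngrams[3]` holds only 3-grams, so a search for a given two-word prefix never has to scan the shorter N-grams.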

HOW: Given a phrase \( w_1 ... w_{n-1} w_n \)

  1. The dictionary is searched for N-grams of the type [\( w_{n-1} w_n \sim \)]
  2. The phrase probability is calculated by multiplying the probabilities of consecutive N-grams
  3. The probabilities of single N-grams are weighted by coefficients proportional to the predicted word
  4. The suggested word is extracted from the most likely phrase
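The steps above can be sketched in Python as follows. This is a simplified stand-in, not the app's algorithm: it ranks candidates only by the last conditional factor \( P(w \mid w_{n-1} w_n) \), since the earlier factors in the product are the same for every candidate, and it replaces the weighting of step 3 with a fixed back-off weight. The counts, the function name `suggest`, and the value `backoff_weight=0.4` are all hypothetical.

```python
from collections import Counter

# Toy dictionary with made-up counts (not the app's data).
BIGRAMS = Counter({("going", "to"): 5, ("to", "be"): 4, ("to", "the"): 3, ("to", "go"): 1})
TRIGRAMS = Counter({("going", "to", "be"): 3, ("going", "to", "the"): 1})


def suggest(phrase, k=3, backoff_weight=0.4):
    """Return up to k candidate next words, most likely first."""
    words = phrase.lower().split()
    scores = {}

    # Step 1: search for 3-grams whose first two words match the end of the phrase.
    if len(words) >= 2:
        context = tuple(words[-2:])
        context_count = BIGRAMS[context]
        if context_count:
            for gram, count in TRIGRAMS.items():
                if gram[:2] == context:
                    # Relative frequency of the 3-gram given its 2-word prefix.
                    scores[gram[2]] = count / context_count

    # Back off to 2-grams for words the 3-grams did not propose,
    # down-weighted by the (assumed) fixed coefficient.
    if words:
        last = words[-1]
        total = sum(c for g, c in BIGRAMS.items() if g[0] == last)
        for gram, count in BIGRAMS.items():
            if gram[0] == last and gram[1] not in scores:
                scores[gram[1]] = backoff_weight * count / total

    # Step 4: the suggestion is the candidate with the highest score.
    return sorted(scores, key=scores.get, reverse=True)[:k]


print(suggest("we are going to"))
```

The parameter `k` corresponds to the number of choices offered for the suggested word; a phrase with no matching N-grams yields an empty list.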

link to app

WHAT: An app that

  • INPUT: a sequence of words

  • OUTPUT: the next word in the sequence

  • SIZE: 20 MB (approximately the dictionary size)

  • TIME: ~2 s per phrase



Customizable:

  • number of choices for the suggested word
  • N-gram used for prediction
  • chance to create your own dictionary

WHY:

Word suggestion / predictive text

… and, with a few modifications:

  • Spelling correction
  • Machine translation
  • Part-of-speech tagging [noun, adjective, verb]