Natural Language Processing for word suggestion

Chiara Todaro
28 June 2019

HOW: A training data set of emails, blogs, news, etc. is used to build a dictionary of 1-, 2- and 3-grams with their associated frequencies.

Rank  1-gram  Freq.   2-gram     Freq.  3-gram       Freq.
1     that    89737   has been   7914   one of the   741
2     with    66476   more than  6779   going to be  367
3     was     59086   as well    4915   some of the  296
4     said    52832   they were  4090   the end of   293
5     he      48397   he said    3723   if you are   283

The dictionary, with ~166,000 1-, 2- and 3-grams, occupies only 20 MB
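The dictionary-building step can be sketched in Python. This is a minimal sketch, assuming a simple tokenizer that lowercases the text and keeps only runs of letters and apostrophes; the function names, the `min_count` pruning threshold and the two-sentence corpus are hypothetical and not taken from the app.

```python
import re
from collections import Counter

def tokenize(text):
    # Assumption: lowercase the text and keep only runs of letters/apostrophes.
    return re.findall(r"[a-z']+", text.lower())

def ngrams(tokens, n):
    # All contiguous n-word sequences, joined by spaces as in the table above.
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def build_dictionary(documents, max_n=3, min_count=1):
    """Return {n: Counter(n-gram -> frequency)} for n = 1..max_n."""
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for doc in documents:
        tokens = tokenize(doc)
        for n in counts:
            counts[n].update(ngrams(tokens, n))
    # Hypothetical pruning step: drop rare n-grams to keep the dictionary small.
    return {n: Counter({g: c for g, c in cnt.items() if c >= min_count})
            for n, cnt in counts.items()}

corpus = [
    "He said that one of the things was going to be fine.",
    "One of the best, he said.",
]
dictionary = build_dictionary(corpus, min_count=2)
print(dictionary[3].most_common())  # [('one of the', 2)]
```

Raising `min_count` is one simple way to trade coverage for dictionary size; how the 20 MB dictionary was actually reduced is not stated.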

HOW: Given a phrase \( w_1 \ldots w_{n-1} w_n \):

N-grams of the form \( w_{n-1}\, w_n\, \sim \), where \( \sim \) stands for any candidate next word, are searched in the dictionary
The probability of a phrase is computed as the product of the probabilities of its consecutive N-grams
The probabilities of individual N-grams are weighted by coefficients proportional to the predicted word
The suggested word is taken from the most likely phrase
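The four steps above can be sketched in Python. Under a trigram (second-order Markov) assumption, \( P(w_1 \ldots w_n\, x) \approx P(w_1 \ldots w_n)\, P(x \mid w_{n-1} w_n) \), so the prefix factor is the same for every candidate \( x \) and only the last factor decides the ranking. The sketch assumes the weighting coefficients are fixed linear-interpolation weights over the 3-, 2- and 1-gram estimates (`weights`, hypothetical values); this is one plausible reading, not necessarily the app's actual scheme. The function names and the counts are made up for illustration.

```python
from collections import Counter

def candidates(counts, prefix):
    """Yield (next_word, count) for every n-gram in `counts` of the form 'prefix ~'."""
    for gram, c in counts.items():
        head, _, last = gram.rpartition(" ")
        if head == prefix:
            yield last, c

def suggest(phrase, unigrams, bigrams, trigrams, weights=(0.6, 0.3, 0.1)):
    """Return the most likely next word, or None if the dictionary is empty.

    `weights` are hypothetical interpolation coefficients for the
    3-gram, 2-gram and 1-gram estimates (they should sum to 1).
    """
    words = phrase.lower().split()
    context2 = " ".join(words[-2:])        # w_{n-1} w_n
    context1 = words[-1] if words else ""  # w_n
    lam3, lam2, lam1 = weights
    scores = Counter()
    for table, prefix, lam in ((trigrams, context2, lam3), (bigrams, context1, lam2)):
        matches = dict(candidates(table, prefix))
        total = sum(matches.values())
        for word, c in matches.items():
            # Maximum-likelihood estimate of P(word | prefix), scaled by its weight.
            scores[word] += lam * c / total
    total1 = sum(unigrams.values())
    for word, c in unigrams.items():
        scores[word] += lam1 * c / total1  # unigram fallback P(word)
    return scores.most_common(1)[0][0] if scores else None

# Made-up counts in the same "n-gram -> frequency" shape as the dictionary.
unigrams = Counter({"the": 50, "a": 30, "be": 20, "end": 5})
bigrams = Counter({"of the": 12, "of a": 4, "to be": 9, "the end": 3})
trigrams = Counter({"one of the": 7, "one of a": 1, "going to be": 5})

print(suggest("He is one of", unigrams, bigrams, trigrams))   # the
print(suggest("I am going to", unigrams, bigrams, trigrams))  # be
```

Scanning every N-gram per query is the simplest choice; indexing the dictionary by prefix would make each lookup faster.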

link to app

WHAT: An app that

INPUT: a sequence of words
OUTPUT: the next word in the sequence
SIZE: 20 MB (mostly taken up by the dictionary)
TIME: ~2 s per phrase

Customizable:

number of choices offered for the suggested word
type of N-gram (1-, 2- or 3-gram) used for prediction
option to create your own dictionary
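The three customizable options (number of choices, N-gram size, user-supplied dictionary) could be exposed as parameters. The class below is a hypothetical sketch: `NextWordModel`, `order` and `k` are illustrative names, and it swaps in a simple backoff-style fallback (take candidates from the longest matching context first, then from shorter ones) instead of the weighted phrase probabilities of the HOW section.

```python
from collections import Counter, defaultdict

class NextWordModel:
    """Hypothetical sketch of a next-word suggester built from the user's own texts."""

    def __init__(self, texts, order=3):
        self.order = order  # largest N-gram size used for prediction
        # Maps a context tuple (0 .. order-1 preceding words) to counts of the next word.
        self.table = defaultdict(Counter)
        for text in texts:
            tokens = text.lower().split()
            for i, word in enumerate(tokens):
                for size in range(min(order, i + 1)):
                    self.table[tuple(tokens[i - size:i])][word] += 1

    def suggest(self, phrase, k=3):
        """Return up to k suggestions, backing off from long to short contexts."""
        tokens = phrase.lower().split()
        chosen = []
        for size in range(min(self.order - 1, len(tokens)), -1, -1):
            context = tuple(tokens[len(tokens) - size:])
            for word, _ in self.table.get(context, Counter()).most_common():
                if len(chosen) >= k:
                    return chosen
                if word not in chosen:
                    chosen.append(word)
        return chosen

texts = ["one of the best", "one of the worst", "one of a kind", "the end of the story"]
model = NextWordModel(texts, order=3)
print(model.suggest("It is one of", k=2))  # ['the', 'a']
```

Passing a different `texts` list plays the role of a custom dictionary; `order=2` restricts prediction to bigrams.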

WHY:

Word suggestion / predictive text

… and, with a few modifications:

Spelling correction
Machine translation
Part-of-speech tagging [noun, adjective, verb]