Chiara Todaro
28 June 2019
HOW: A dictionary of 1-, 2-, and 3-grams with their associated frequencies is built from emails, blogs, news, etc. (the training data set).
Rank  1-gram  Freq.   2-gram     Freq.  3-gram       Freq.
1     that    89737   has been   7914   one of the   741
2     with    66476   more than  6779   going to be  367
3     was     59086   as well    4915   some of the  296
4     said    52832   they were  4090   the end of   293
5     he      48397   he said    3723   if you are   283
The dictionary of ~166,000 1-, 2-, and 3-grams occupies only 20 MB.
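As a rough illustration of how such a frequency dictionary could be built, here is a minimal Python sketch. The tokenizer, the function names (`tokenize`, `build_ngram_counts`), and the two-sentence toy corpus are assumptions made for this example; they are not the preprocessing or the data used for the dictionary above.

```python
import re
from collections import Counter


def tokenize(text):
    # Assumed tokenizer: lowercase, keep alphabetic words (with inner apostrophes).
    return re.findall(r"[a-z]+(?:'[a-z]+)*", text.lower())


def build_ngram_counts(lines, max_n=3):
    # One Counter per n-gram order; keys are tuples of words, values are frequencies.
    counts = {n: Counter() for n in range(1, max_n + 1)}
    for line in lines:
        tokens = tokenize(line)
        for n in range(1, max_n + 1):
            # Slide a window of width n over the tokens of this line.
            for i in range(len(tokens) - n + 1):
                counts[n][tuple(tokens[i:i + n])] += 1
    return counts


# Toy corpus (hypothetical), standing in for the emails/blogs/news training data.
corpus = [
    "He said that one of the best days has been today.",
    "One of the reasons is that he said so.",
]
counts = build_ngram_counts(corpus)
print(counts[3].most_common(1))
```

Counting n-grams per line keeps them from spanning document boundaries; a real dictionary would probably also prune rare n-grams to stay within the size budget.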
HOW: Given a phrase \( w_1 ... w_{n-1} w_n \)
INPUT: a sequence of words
OUTPUT: the next word in the sequence
SIZE: 20 MB (roughly the size of the dictionary)
TIME: ~2 s per phrase
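The INPUT/OUTPUT contract above can be sketched in Python as follows. The slides do not state here how the next word is selected, so the simple back-off rule below (try the 3-gram context, then the 2-gram context, then the most frequent 1-gram) is an assumption, as are the name `predict_next` and the small `counts` dictionary, which reuses a few frequencies from the table for illustration.

```python
from collections import Counter


def predict_next(phrase, counts, max_n=3):
    # Assumed back-off scheme: longest matching context wins.
    tokens = phrase.lower().split()
    for n in range(max_n, 1, -1):
        context = tuple(tokens[-(n - 1):])
        if len(context) < n - 1:
            continue  # phrase too short for this n-gram order
        # Collect every word that followed this context, with its frequency.
        candidates = Counter({
            gram[-1]: freq
            for gram, freq in counts.get(n, {}).items()
            if gram[:-1] == context
        })
        if candidates:
            return candidates.most_common(1)[0][0]
    # No context matched: fall back to the most frequent single word.
    unigrams = counts.get(1, {})
    if unigrams:
        return max(unigrams, key=unigrams.get)[0]
    return None


# Illustrative dictionary (hypothetical subset of the table above).
counts = {
    1: {("that",): 89737, ("with",): 66476},
    2: {("has", "been"): 7914, ("more", "than"): 6779, ("he", "said"): 3723},
    3: {("one", "of", "the"): 741, ("going", "to", "be"): 367},
}
print(predict_next("this is one of", counts))
```

This version scans every n-gram of a given order on each call; indexing the dictionary by context would make lookups faster at the cost of extra memory.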
Customizable:
WHY:
Word suggestion / predictive text
… and with a few modifications: