Predictive Text App
Aliakbar Safilian
June 10, 2019
What?
Suggesting words the user may wish to insert in a text field.
The user can choose the language model (n-gram)
The user can choose the number of top-suggestions
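The two user-facing choices above (model order and number of suggestions) can be sketched as a small Python interface. This is only an illustration under stated assumptions: the names `build_counts` and `suggest`, the raw-count ranking, and the back-off to shorter contexts are hypothetical and are not taken from the app, which ranks candidates with smoothed probabilities.

```python
from collections import Counter, defaultdict

def build_counts(tokens, n):
    """Map each (n-1)-word context to a Counter of the words that follow it."""
    table = defaultdict(Counter)
    for i in range(len(tokens) - n + 1):
        context = tuple(tokens[i:i + n - 1])
        table[context][tokens[i + n - 1]] += 1
    return table

def suggest(text, tables, n=4, k=3):
    """Return up to k candidate next words for `text`.

    `tables` maps a model order (2, 3, 4, ...) to the table built by
    build_counts. Assumption: candidates are ranked by raw count, and the
    lookup falls back to a shorter context when the longer one is unseen.
    """
    words = text.lower().split()
    for order in range(n, 1, -1):
        if order not in tables or len(words) < order - 1:
            continue
        context = tuple(words[-(order - 1):])
        followers = tables[order].get(context)
        if followers:
            return [word for word, _ in followers.most_common(k)]
    return []

tokens = "the cat sat on the mat the cat ran".split()
tables = {order: build_counts(tokens, order) for order in (2, 3)}
print(suggest("on the", tables, n=3, k=1))
```

Here `n` plays the role of the selectable language model and `k` the selectable number of top suggestions.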
How? - Preprocessing
Corpus:
The original corpus: 4,269,678 lines & 102,080,244 words
Random sampled corpus: 50% of the original
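A minimal sketch of drawing a random line-level sample from a corpus, in Python. The function name `sample_lines`, the fixed seed, and the choice to sample whole lines while preserving their order are assumptions made for illustration; the slides only state the 50% fraction.

```python
import random

def sample_lines(lines, fraction=0.5, seed=2019):
    """Keep a reproducible random subset of the lines, in their original order."""
    rng = random.Random(seed)          # seed value is arbitrary (assumption)
    k = round(len(lines) * fraction)   # number of lines to keep
    keep = sorted(rng.sample(range(len(lines)), k))
    return [lines[i] for i in keep]

corpus = [f"line {i}" for i in range(10)]
print(len(sample_lines(corpus)))
```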
Preprocessing:
lower-case conversion & removing hyphens
removing Twitter & other symbols
removing separators & punctuation
removing numbers & words containing numbers
removing profanities & non-English words
etc…
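The cleaning steps listed above could look roughly like the following Python sketch. Several details are assumptions, since the slides do not spell them out: hyphens are replaced by spaces, "Twitter symbols" is read as @mentions and #hashtags, "non-English" is approximated as "contains characters outside a-z and the apostrophe" unless a vocabulary set is supplied, and the profanity list is a caller-supplied placeholder.

```python
import re

def preprocess(line, profanity=frozenset(), vocabulary=None):
    """Return the cleaned tokens of one line of text."""
    text = line.lower()                       # lower-case conversion
    text = re.sub(r"[@#]\w+", " ", text)      # Twitter mentions and hashtags
    text = text.replace("-", " ")             # hyphens (assumed: split the word)
    tokens = []
    for raw in text.split():                  # split() also drops separators
        if any(ch.isdigit() for ch in raw):   # numbers and words containing numbers
            continue
        # Drop punctuation and other symbols, keeping inner apostrophes.
        word = "".join(ch for ch in raw if ch.isalpha() or ch == "'").strip("'")
        if not word or word in profanity:
            continue
        # Crude non-English filter (assumption): ASCII letters only,
        # or membership in an explicit vocabulary when one is given.
        if not re.fullmatch(r"[a-z']+", word):
            continue
        if vocabulary is not None and word not in vocabulary:
            continue
        tokens.append(word)
    return tokens

print(preprocess("The well-known CAFÉ has 24 seats on the 3rd floor!! #yum @bob"))
```

A real pipeline would likely use a dictionary or language detector for the non-English step; the character test here only stands in for it.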
How? - Language Modeling
4-gram & 3-gram models
Follow the Markov assumption
Use the Kneser-Ney smoothing method for n-gram probabilities
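To make the two ideas above concrete: under the Markov assumption, the next word depends only on the previous n-1 words, and Kneser-Ney smoothing subtracts a fixed discount from every observed count and redistributes that mass according to how many distinct contexts a word appears in. The Python sketch below uses a bigram model to stay short; the app's 3-gram and 4-gram models would apply the same idea recursively. The class name `KneserNeyBigram` and the discount of 0.75 are assumptions, not values from the slides.

```python
from collections import Counter, defaultdict

class KneserNeyBigram:
    """Interpolated Kneser-Ney estimates for P(word | previous word)."""

    def __init__(self, tokens, discount=0.75):
        self.d = discount
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        self.context_total = Counter()       # c(v): how often v is a context
        self.followers = defaultdict(set)    # distinct words seen after v
        preceders = defaultdict(set)         # distinct words seen before w
        for (v, w), count in self.bigrams.items():
            self.context_total[v] += count
            self.followers[v].add(w)
            preceders[w].add(v)
        n_types = len(self.bigrams)          # number of distinct bigrams
        # Continuation probability: share of bigram types that end in w.
        self.p_cont = {w: len(vs) / n_types for w, vs in preceders.items()}

    def prob(self, word, context):
        p_cont = self.p_cont.get(word, 0.0)
        total = self.context_total.get(context, 0)
        if total == 0:                       # unseen context: lower order only
            return p_cont
        discounted = max(self.bigrams.get((context, word), 0) - self.d, 0) / total
        backoff_weight = self.d * len(self.followers[context]) / total
        return discounted + backoff_weight * p_cont

    def top_k(self, context, k=3):
        ranked = sorted(self.p_cont, key=lambda w: self.prob(w, context), reverse=True)
        return ranked[:k]

model = KneserNeyBigram("the cat sat on the mat and the cat ran".split())
print(model.top_k("the", k=1))
```

For a seen context the probabilities over the vocabulary sum to one, because the mass removed by the discount is exactly the weight given to the continuation distribution.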
Evaluation:
Testing data: 1,392 lines & 28,658 words
Top-3 precision: 21.48%
Top-1 precision: 13.45%
Memory used: 109 MB
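One plausible way to compute a top-k precision figure like those above is sketched below in Python. The definition used here is an assumption: for every word in the test data after the first word of its line, the model is asked for k suggestions given the preceding words, and a hit is counted when the actual word is among them. The function name `top_k_precision` and the `predict(context, k)` callback signature are hypothetical.

```python
def top_k_precision(test_lines, predict, k=3, context_size=3):
    """Fraction of next-word predictions whose top-k list contains the true word.

    test_lines: list of token lists (already preprocessed).
    predict:    callable taking (context_tokens, k) and returning a ranked list.
    """
    hits = total = 0
    for tokens in test_lines:
        for i in range(1, len(tokens)):
            context = tokens[max(0, i - context_size):i]
            total += 1
            if tokens[i] in predict(context, k)[:k]:
                hits += 1
    return hits / total if total else 0.0

# Toy predictor that always proposes the same ranked list.
always = lambda context, k: ["b", "c"][:k]
print(top_k_precision([["a", "b", "c"]], always, k=1))
```

With `context_size=3` the callback receives at most three preceding words, which matches what a 4-gram model can use.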
Resources
Application
Repository
Technical Report
Any comments would be much appreciated.