As part of the Coursera Data Science Specialization, this text prediction application was developed using Natural Language Processing (NLP) techniques.
The process is illustrated below:
July 27, 2018
The text analysis was done with the quanteda package in R. N-grams were built for every order from 1-grams up to 5-grams.
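To make the n-gram step concrete, here is a minimal sketch of counting every n-gram from order 1 to 5 over a token list. It is written in plain Python purely as an illustration and is not the quanteda code used for the app; the function name `ngram_counts`, the whitespace tokenization, and the toy sentence are all assumptions made for the example.

```python
from collections import Counter


def ngram_counts(tokens, max_n=5):
    """Count every n-gram of order 1..max_n in a list of tokens.

    Returns a dict mapping the order n to a Counter whose keys are
    tuples of n consecutive tokens.
    """
    counts = {}
    for n in range(1, max_n + 1):
        counts[n] = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


# Toy input (hypothetical); a real corpus would be cleaned and tokenized first.
tokens = "the cat sat on the mat".split()
counts = ngram_counts(tokens)

print(counts[1].most_common(1))   # most frequent 1-gram
print(len(counts[5]))             # number of distinct 5-grams
```

A six-token sentence yields six 1-grams but only two 5-grams, which is why higher orders become sparse quickly and some form of smoothing is needed.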
For smoothing, that is, to reserve some probability for unseen n-grams, I used linear interpolation, defined for a 3-gram as:
\[ \begin{aligned} P_{interp}(w_i \mid w_{i-2}w_{i-1}) ={} & \lambda_1 P(w_i \mid w_{i-2}w_{i-1}) \\ & + \lambda_2 P(w_i \mid w_{i-1}) \\ & + \lambda_3 P(w_i) \end{aligned} \]
where
\[ \lambda_1 + \lambda_2 + \lambda_3 = 1 \]
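The interpolation formula above can be sketched in a few lines. This is an illustration in plain Python, not the app's R code: the function name `interpolated_trigram_prob`, the toy corpus, and the weights (0.6, 0.3, 0.1) are assumptions chosen for the example, and each component probability is a simple maximum-likelihood estimate from raw counts.

```python
from collections import Counter


def interpolated_trigram_prob(word, context, uni, bi, tri,
                              lambdas=(0.6, 0.3, 0.1)):
    """Return P_interp(word | context) for a two-word context.

    uni, bi and tri are Counters of 1-, 2- and 3-gram counts.
    lambdas = (lambda_1, lambda_2, lambda_3) must sum to 1; the default
    values are illustrative assumptions, not tuned weights.
    """
    l1, l2, l3 = lambdas
    if abs(l1 + l2 + l3 - 1.0) > 1e-9:
        raise ValueError("lambdas must sum to 1")
    w2, w1 = context
    total = sum(uni.values())
    # Maximum-likelihood estimates; a zero denominator yields probability 0.
    p_uni = uni[word] / total if total else 0.0
    p_bi = bi[(w1, word)] / uni[w1] if uni[w1] else 0.0
    p_tri = tri[(w2, w1, word)] / bi[(w2, w1)] if bi[(w2, w1)] else 0.0
    return l1 * p_tri + l2 * p_bi + l3 * p_uni


# Toy corpus (hypothetical) and its n-gram counts.
tokens = "the cat sat on the mat the cat ran".split()
uni = Counter(tokens)
bi = Counter(zip(tokens, tokens[1:]))
tri = Counter(zip(tokens, tokens[1:], tokens[2:]))

p_seen = interpolated_trigram_prob("sat", ("the", "cat"), uni, bi, tri)
p_unseen = interpolated_trigram_prob("mat", ("the", "cat"), uni, bi, tri)
print(p_seen, p_unseen)
```

The trigram "the cat mat" never occurs in the toy corpus, yet `p_unseen` is still greater than zero because the unigram term contributes; that is the rebalancing effect the smoothing is meant to provide.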
Main: enter text and get predictions for the next words.
Stats: shows some useful statistics about the usage of the app.