July 27, 2018

NLP Word Prediction App

As part of the Coursera Data Science Specialization, this text prediction application was developed using Natural Language Processing (NLP) techniques.

The process is illustrated below:

Process

Text analysis was performed with the quanteda package in R. N-grams were built for every order from n = 1 (unigrams) up to n = 5 (5-grams).
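For readers outside R, the counting step can be sketched in plain Python. This is not the app's quanteda code (quanteda provides `tokens()` and `tokens_ngrams()` for this job). The regex tokenizer, the toy corpus, and the helper names `tokenize` and `count_ngrams` are hypothetical illustrations. The sketch only shows the idea of building one frequency table per n-gram order.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters/apostrophes as tokens (assumed rule)."""
    return re.findall(r"[a-z']+", text.lower())


def count_ngrams(tokens, max_n=5):
    """Return {n: Counter of n-gram tuples} for every order n = 1..max_n."""
    counts = {}
    for n in range(1, max_n + 1):
        # Slide a window of width n over the token list; too-short input yields an empty Counter.
        counts[n] = Counter(
            tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
        )
    return counts


corpus = "the cat sat on the mat and the cat slept"
tables = count_ngrams(tokenize(corpus), max_n=5)
print(tables[2][("the", "cat")])  # 2
print(len(tables[5]))             # 6 distinct 5-grams
```

Keeping a separate table per order is what allows the smoothing step to mix estimates from different n-gram lengths.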

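For smoothing, that is, reassigning some probability mass to n-grams that never appeared in the training text, I used linear interpolation, which mixes the estimates of the different n-gram orders. For the trigram case it is defined as: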

\[ \begin{aligned} P_{\text{interp}}(w_i \mid w_{i-2}w_{i-1}) &= \lambda_1 P(w_i \mid w_{i-2}w_{i-1}) \\ &\quad + \lambda_2 P(w_i \mid w_{i-1}) \\ &\quad + \lambda_3 P(w_i) \end{aligned} \]

where

\[ \lambda_1 + \lambda_2 + \lambda_3 = 1, \qquad \lambda_j \ge 0. \]
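The trigram formula can be made concrete with a small sketch. It assumes maximum-likelihood estimates for each term, for example P(w_i | w_{i-2}w_{i-1}) = C(w_{i-2}w_{i-1}w_i) / C(w_{i-2}w_{i-1}). The weights (0.6, 0.3, 0.1), the toy corpus, and the function name `interpolated_prob` are hypothetical. The source does not say which lambda values the app uses.

```python
import math
from collections import Counter


def interpolated_prob(w, w2, w1, counts, lambdas=(0.6, 0.3, 0.1)):
    """P_interp(w | w2 w1) = l1*P(w | w2 w1) + l2*P(w | w1) + l3*P(w).

    counts[n] maps n-gram tuples to frequencies for n = 1, 2, 3.
    The default lambdas are illustrative assumptions, not the app's values.
    """
    l1, l2, l3 = lambdas
    if not math.isclose(l1 + l2 + l3, 1.0):
        raise ValueError("lambdas must sum to 1")

    def ratio(num, den):
        # Maximum-likelihood estimate; an unseen history contributes 0.
        return num / den if den else 0.0

    p3 = ratio(counts[3][(w2, w1, w)], counts[2][(w2, w1)])  # trigram term
    p2 = ratio(counts[2][(w1, w)], counts[1][(w1,)])         # bigram term
    p1 = ratio(counts[1][(w,)], sum(counts[1].values()))     # unigram term
    return l1 * p3 + l2 * p2 + l3 * p1


# Toy corpus (assumed) and its 1- to 3-gram tables.
tokens = "the cat sat on the mat and the cat slept".split()
counts = {
    n: Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    for n in (1, 2, 3)
}

print(interpolated_prob("sat", "the", "cat", counts))  # approx. 0.46
print(interpolated_prob("mat", "the", "cat", counts))  # approx. 0.01 (unseen trigram)
```

With these weights, the seen trigram "the cat sat" gets 0.6·0.5 + 0.3·0.5 + 0.1·0.1 = 0.46. The unseen trigram "the cat mat" still receives 0.1·0.1 = 0.01 from the unigram term, which is exactly the rebalancing that smoothing is meant to provide. In practice, the lambdas are usually tuned on held-out text rather than fixed by hand.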

App Sections

  • Main: enter text and get predictions for the next word.

  • Stats: view statistics about how the app is used.

Try the App!