23/4/2021

Shiny app for text prediction

How does this work?

  • Enter a phrase, and the algorithm looks for the most probable next words.
  • The app uses 10% of the total corpus built from the 3 given documents: News, Blogs and Twitter content.
  • Initially, an n-gram model was built with the “quanteda” package, but it had problems when unknown words were entered. So a “stupid backoff” model was implemented instead.
  • This algorithm is among the most effective simple approaches for handling unseen n-grams. When the typed context was not seen in the corpus, the model backs off to a shorter context with a discounted score, so a new word does not drive the score of every candidate to zero.
  • The model was implemented with the “sbo” package, a very good tool for building language predictors based on n-gram models.
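
The backoff idea described above can be sketched in a few lines. This is only an illustration in Python, not the code of the “sbo” package or of the app: the function names (train, score, predict), the toy corpus and the discount value 0.4 are assumptions made for the example.

```python
from collections import Counter

LAMBDA = 0.4  # backoff discount; an assumed value, often quoted for stupid backoff


def train(tokens, max_n=3):
    """Count every n-gram of order 1..max_n in a list of tokens."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def score(word, context, counts, total):
    """Stupid backoff score (not a true probability) of `word` after `context`."""
    discount = 1.0
    while context:
        ngram = context + (word,)
        if counts[ngram] > 0:
            # Seen n-gram: relative frequency, discounted once per backoff step.
            return discount * counts[ngram] / counts[context]
        context = context[1:]   # drop the oldest word and retry
        discount *= LAMBDA
    # No context left: fall back to the unigram frequency.
    return discount * counts[(word,)] / total


def predict(text, counts, total, vocab, k=3, max_n=3):
    """Return the k highest-scoring next words for the typed text."""
    words = text.lower().split()
    context = tuple(words[-(max_n - 1):]) if max_n > 1 else ()
    ranked = sorted(vocab, key=lambda w: (-score(w, context, counts, total), w))
    return ranked[:k]


# Toy corpus, for illustration only.
tokens = "the cat sat on the mat the cat ate the fish".split()
counts = train(tokens)
total = len(tokens)
vocab = sorted({gram[0] for gram in counts if len(gram) == 1})

seen = predict("the cat", counts, total, vocab)       # context seen in the corpus
unseen = predict("zebra dog", counts, total, vocab)   # unknown words only
```

Even when every typed word is unknown, the prediction falls back to the most frequent single words instead of failing, which is the behaviour that motivated the change of model.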

Process

To build the application, the following steps were taken:

  • Get and download the data
  • Interpret the data
  • Perform a quality analysis of the data
  • Clean the data
  • Perform an exploratory analysis of the data
  • Build a good corpus of data
  • Create an n-gram model of the corpus
  • Find ways to improve the n-gram algorithm
  • Design the Shiny application

Evaluation

To evaluate the model, the eval_sbo_predictor() function of the “sbo” package was used.
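
The general idea behind this kind of evaluation is to hide the next word of held-out text and check whether it appears among the model's top predictions. The sketch below illustrates that idea in Python; it is not the implementation of eval_sbo_predictor(), and top_k_accuracy, toy_predict and the test sentences are hypothetical names and data invented for the example.

```python
def top_k_accuracy(predict_fn, test_sentences, k=3):
    """Share of positions where the true next word is among the top-k predictions."""
    hits = 0
    trials = 0
    for sentence in test_sentences:
        words = sentence.lower().split()
        for i in range(1, len(words)):
            context = " ".join(words[:i])   # text typed so far
            truth = words[i]                # word the user typed next
            if truth in predict_fn(context)[:k]:
                hits += 1
            trials += 1
    return hits / trials if trials else 0.0


def toy_predict(context):
    """Hypothetical stand-in for a trained predictor: looks up the last word only."""
    table = {
        "the": ["cat", "dog", "end"],
        "cat": ["sat", "ate", "ran"],
    }
    return table.get(context.split()[-1], ["the", "a", "of"])


accuracy = top_k_accuracy(toy_predict, ["the cat sat", "the bird sang"])
```

Here two of the four hidden words are predicted correctly, so accuracy is 0.5. A real evaluation would use text that was not part of the training corpus.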

This was a wonderful experience for me! And for you?

THANK YOU VERY MUCH

If you have any questions, you can write to me at andres25@utp.edu.co