Natural Language Processing Project

23/4/2021

Shiny app for text prediction

This app provides input boxes where you can type any phrase, and it returns the three most probable next words, much like the word suggestions in WhatsApp. Try it here: https://andres25.shinyapps.io/NLP-capstone-project/

How does this work?

Enter a phrase, and the algorithm looks for the most probable next words.
The app uses 10% of the total corpus built from the three given documents: News, Blogs and Twitter content.
Initially, an n-gram model was built with the “quanteda” package, but it had problems when unknown words were entered. So a “stupid backoff” model was implemented.
Stupid backoff is a simple scheme that is competitive with more elaborate n-gram smoothing methods on large corpora. When the typed context has not been seen, it backs off to a shorter context with a fixed penalty, so a new word does not drive the score of every candidate to zero. The results are relative scores rather than true probabilities.
The model was implemented with the “sbo” package, a very good tool for building next-word predictors based on n-gram models.
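
The backoff idea described above can be illustrated with a small, self-contained sketch. It is written in Python only for illustration and is not the app's code nor the “sbo” implementation. The function names (train, score, predict), the whitespace tokenizer, and the penalty value 0.4 are assumptions made for this example.

```python
from collections import Counter

LAMBDA = 0.4  # backoff penalty; 0.4 is a commonly used value, assumed here


def train(sentences, n=3):
    """Count every k-gram (k = 1..n) in whitespace-tokenized sentences."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        for k in range(1, n + 1):
            for i in range(len(tokens) - k + 1):
                counts[tuple(tokens[i:i + k])] += 1
    return counts


def score(counts, context, word, total):
    """Stupid backoff score of `word` following `context` (a tuple of words)."""
    penalty = 1.0
    while context:
        seen = counts[context + (word,)]
        if seen > 0:
            # The n-gram was observed: use its relative frequency.
            return penalty * seen / counts[context]
        # Otherwise drop the oldest context word and apply the penalty.
        context = context[1:]
        penalty *= LAMBDA
    # No context left: fall back to the unigram frequency.
    return penalty * counts[(word,)] / total


def predict(counts, phrase, n=3, top=3):
    """Return the `top` highest-scoring next words for `phrase`."""
    tokens = tuple(phrase.lower().split())
    context = tokens[-(n - 1):] if n > 1 else ()
    vocab = [gram[0] for gram in counts if len(gram) == 1]
    total = sum(counts[(w,)] for w in vocab)
    # Sort by descending score; break ties alphabetically for stable output.
    ranked = sorted(vocab, key=lambda w: (-score(counts, context, w, total), w))
    return ranked[:top]
```

With this sketch, a phrase made only of unseen words still gets three suggestions, because the score falls back to the most frequent single words instead of becoming zero.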

Process

To build the application, the following steps were taken:

Get and download the data
Interpret the data
Perform a quality analysis of the data
Clean the data
Perform an exploratory analysis of the data
Build a good corpus of data
Create a model of n-grams that will model the corpus
Find ways to improve the n-grams algorithm
Design the Shiny application

Evaluation

The model was evaluated with the eval_sbo_predictor() function of the “sbo” package.
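
As a rough illustration of what evaluating a next-word predictor involves, the sketch below measures how often the true next word appears among the top three suggestions. It is written in Python for illustration only and does not reproduce how eval_sbo_predictor() works internally. The names top_k_accuracy and always_common are hypothetical, and the stand-in predictor exists only to keep the example self-contained.

```python
def top_k_accuracy(predict_fn, test_sentences, k=3):
    """Share of positions where the true next word is among the top-k guesses.

    `predict_fn` is any function that takes a phrase (str) and returns a
    list of candidate next words, best first.
    """
    hits = 0
    trials = 0
    for sentence in test_sentences:
        tokens = sentence.lower().split()
        # Use each prefix of the sentence as input; the following word is the target.
        for i in range(1, len(tokens)):
            guesses = predict_fn(" ".join(tokens[:i]))[:k]
            hits += tokens[i] in guesses
            trials += 1
    return hits / trials if trials else 0.0


def always_common(phrase):
    """Hypothetical stand-in predictor that ignores its input."""
    return ["the", "a", "of"]
```

Any predictor with the same call shape can be passed in place of always_common, so the same check can be reused to compare models on held-out sentences.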

This was a wonderful experience for me! And for you?

THANK YOU VERY MUCH

If you have any questions, you can write to me at andres25@utp.edu.co