Daniela Varela Tabares
May 2021
Natural Language Processing (NLP) is a subfield of artificial intelligence that focuses on enabling machines to interpret, understand and process human languages.
The goal of this project was to create an app that suggests possible next words for a phrase entered by the user.
To that end, several steps were needed: data extraction and exploration, modeling, programming the predictive algorithm, creating the data product, etc.
Here I present some of the most relevant concepts utilized in this solution:
The dataset was collected from publicly available sources by a web crawler. Text from blogs, news sites and Twitter was used.
10% of each source was sampled, and the samples were merged into a single corpus so that everything could be processed together.
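The sampling step can be sketched as follows. The original project was written in R; this Python version is only an illustration, and the function names `sample_source` and `build_corpus`, as well as the seed value, are hypothetical. It assumes each source has already been read into a list of lines.

```python
import random


def sample_source(lines, fraction=0.10, seed=2021):
    """Return a reproducible random sample containing `fraction` of one source's lines."""
    rng = random.Random(seed)  # fixed seed so the sample can be regenerated
    k = round(len(lines) * fraction)
    return rng.sample(lines, k)  # sampling without replacement


def build_corpus(sources, fraction=0.10, seed=2021):
    """Sample each source separately, then merge the samples into one corpus."""
    corpus = []
    for lines in sources.values():
        corpus.extend(sample_source(lines, fraction, seed))
    return corpus


sources = {
    "blogs": [f"blog line {i}" for i in range(1000)],
    "news": [f"news line {i}" for i in range(500)],
    "twitter": [f"tweet {i}" for i in range(2000)],
}
corpus = build_corpus(sources)
print(len(corpus))  # 350 (100 + 50 + 200)
```

Sampling each source on its own, rather than the concatenated text, keeps every source represented in the same proportion in the merged corpus.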
For more details on the exploratory analysis, see: https://rpubs.com/dvarelat/760261
After random sampling, cleaning and pre-processing the data, the algorithm was implemented to do the following tasks:
The functions mentioned above are in the functions.R script; the code that runs them and builds the full dictionaries is in modelling.R, and the prediction process is in pred.R.
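The idea of building word dictionaries and using them for prediction can be illustrated with a small sketch. This is not the code in functions.R, modelling.R or pred.R. The cleaning rules, the use of n-grams up to trigrams, and the ranking are all assumptions: the sketch uses a plain longest-context-first backoff with raw counts and no smoothing. The names `clean`, `build_dictionaries` and `predict` are hypothetical.

```python
import re
from collections import Counter, defaultdict


def clean(text):
    """Lowercase, drop URLs, replace anything but letters/apostrophes with spaces, then split."""
    text = text.lower()
    text = re.sub(r"https?://\S+|www\.\S+", " ", text)
    text = re.sub(r"[^a-z' ]+", " ", text)
    return text.split()


def build_dictionaries(corpus, max_n=3):
    """For n = 1..max_n, map each (n-1)-word context to a Counter of the words that followed it."""
    tables = {n: defaultdict(Counter) for n in range(1, max_n + 1)}
    for line in corpus:
        tokens = clean(line)
        for n in range(1, max_n + 1):
            for i in range(len(tokens) - n + 1):
                context = tuple(tokens[i:i + n - 1])  # empty tuple for unigrams
                tables[n][context][tokens[i + n - 1]] += 1
    return tables


def predict(tables, phrase, k=3):
    """Suggest up to k next words, trying the longest matching context first (simple backoff)."""
    tokens = clean(phrase)
    suggestions = []
    for n in range(max(tables), 0, -1):
        if len(tokens) < n - 1:
            continue  # phrase too short for this context length
        context = tuple(tokens[len(tokens) - (n - 1):])
        followers = tables[n].get(context)  # .get avoids inserting empty entries
        if not followers:
            continue
        for word, _count in followers.most_common():
            if word not in suggestions:
                suggestions.append(word)
            if len(suggestions) == k:
                return suggestions
    return suggestions


corpus = [
    "I want to go to the beach",
    "I want to eat pizza",
    "We want to go home",
    "Check https://example.com, I want to see it!",
]
tables = build_dictionaries(corpus)
print(predict(tables, "I want to"))  # ['go', 'eat', 'see']
```

When the last two words of the phrase were never seen together, the sketch falls back to the last word alone and finally to the most frequent words overall, so the app can always show some options.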
The interface is simple: it contains just a text input, a button, and a few messages to guide the user.