Leonardo Gomez Castañeda
May 26, 2020
Text mining, also known as text data mining and roughly equivalent to text analytics, is the process of deriving high-quality information from text. Such information is typically obtained by identifying patterns and trends, for example through statistical pattern learning.
Text mining generally involves structuring the input text (usually by parsing it, adding some derived linguistic features, removing others, and inserting the result into a database), deriving patterns within the structured data and, finally, evaluating and interpreting the output. “High quality” in text mining generally refers to a combination of relevance, novelty and interest. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relationships between named entities).
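The three stages described above (structuring the text, deriving patterns, interpreting the output) can be illustrated with a minimal sketch. It is written in Python purely for illustration and is not the code behind this project's R Shiny app; the function names `tokenize` and `bigram_counts` and the two-sentence corpus are hypothetical, and the "pattern" derived here is simply how often pairs of adjacent words occur.

```python
import re
from collections import Counter


def tokenize(text):
    """Structure raw text: lowercase it and keep only word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def bigram_counts(documents):
    """Derive a simple pattern: how often each pair of adjacent words occurs."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # zip pairs every token with its successor, yielding the bigrams
        counts.update(zip(tokens, tokens[1:]))
    return counts


# Hypothetical toy corpus, for illustration only.
docs = [
    "Text mining derives information from text.",
    "Text mining structures the input text first.",
]

counts = bigram_counts(docs)

# Interpretation step: inspect the most frequent word pair.
top_pair, top_count = counts.most_common(1)[0]
print(top_pair, top_count)
```

On a real corpus the same counts would typically be stored in a table or database and filtered by frequency before being interpreted.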
The goal of this exercise is to create a product to highlight the prediction algorithm that you have built and to provide an interface that can be accessed by others. For this project you must submit:
A Shiny app that takes as input a phrase (multiple words) in a text box input and outputs a prediction of the next word.
A slide deck consisting of no more than 5 slides created with R Studio Presenter (https://support.rstudio.com/hc/en-us/articles/200486468-Authoring-R-Presentations) pitching your algorithm and app as if you were presenting to your boss or an investor.
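To make the first deliverable concrete, the sketch below shows one simple way a phrase could be mapped to a predicted next word. It is an assumption for illustration, not the algorithm used in this project: it counts which word follows each one-word and two-word context in a toy corpus, then predicts from the longest context it has seen, backing off to a shorter one when needed. The names `train` and `predict_next`, the `max_context` parameter and the sample sentences are all hypothetical, and the code is in Python rather than the R used by the Shiny app.

```python
import re
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase the text and keep only word-like tokens."""
    return re.findall(r"[a-z']+", text.lower())


def train(documents, max_context=2):
    """Map each context of 1..max_context words to counts of the word that follows."""
    model = defaultdict(Counter)
    for doc in documents:
        tokens = tokenize(doc)
        for n in range(1, max_context + 1):
            for i in range(len(tokens) - n):
                context = tuple(tokens[i:i + n])
                model[context][tokens[i + n]] += 1
    return model


def predict_next(model, phrase, max_context=2):
    """Return the most frequent follower of the longest known context, or None."""
    tokens = tokenize(phrase)
    # Try the longest context first, then back off to shorter ones.
    for n in range(min(max_context, len(tokens)), 0, -1):
        context = tuple(tokens[-n:])
        if context in model:
            return model[context].most_common(1)[0][0]
    return None


# Hypothetical toy corpus, for illustration only.
docs = [
    "I want to go home",
    "I want to go to the park",
    "They want to eat",
]
model = train(docs)

print(predict_next(model, "We want to"))   # uses the two-word context ("want", "to")
print(predict_next(model, "purple want"))  # backs off to the one-word context ("want",)
print(predict_next(model, "zebra"))        # unseen context, so no prediction
```

In an app of the kind described, a function like `predict_next` would be called each time the text box changes, and its result shown as the output.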
The data set to be processed consists of text extracted from the web (news portals, tweets and blogs) in four languages: English, German, Russian and Finnish.