2025-03-25

Libraries used for the app

  • shiny: Creates an interactive web application, allowing users to input text and receive predictions dynamically.
  • tm: Provides tools for text mining and preprocessing, such as cleaning and transforming text data.
  • tokenizers: Splits text into words, sentences, or n-grams, enabling efficient text analysis and modeling.
  • dplyr: Facilitates data manipulation and transformation, making it easier to filter, arrange, and summarize text data.

tm

Function to clean a vector of texts (the steps shown rely on str_replace_all() from the stringr package)

library(stringr)  # provides str_replace_all()

clean_text <- function(texts) {
  # Convert to lowercase
  texts <- tolower(texts)

  # Remove URLs (http, https)
  texts <- str_replace_all(texts, "http[s]?://\\S+", "")

  # Remove mentions and hashtags (especially for Twitter)
  texts <- str_replace_all(texts, "@\\w+", "")
  texts <- str_replace_all(texts, "#\\w+", "")

  # Remove numbers
  texts <- str_replace_all(texts, "\\d+", "")
  # ...

  return(texts)
}
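
A minimal usage sketch of the cleaning steps shown above, applied to a made-up tweet. The sample string is hypothetical, the steps elided in the original ("# ...") are not reproduced, and the final str_squish() call is an addition for illustration (it collapses the whitespace left behind by the removals), not necessarily part of the app.

```r
library(stringr)

# Same steps as shown above; the elided steps are intentionally left out
clean_text <- function(texts) {
  texts <- tolower(texts)
  texts <- str_replace_all(texts, "http[s]?://\\S+", "")  # URLs
  texts <- str_replace_all(texts, "@\\w+", "")            # mentions
  texts <- str_replace_all(texts, "#\\w+", "")            # hashtags
  texts <- str_replace_all(texts, "\\d+", "")             # numbers
  return(texts)
}

# Hypothetical input, for illustration only
sample_text <- "Check https://example.com NOW @user #rstats 2025"

# str_squish() is an assumption added here: it trims the ends and
# collapses repeated spaces left where tokens were removed
cleaned <- str_squish(clean_text(sample_text))
print(cleaned)  # "check now"
```

Without the squish step, the result still contains the spaces that surrounded the removed URL, mention, hashtag, and number.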

tokenizers

Function to obtain n-grams using tokenize_ngrams() from the tokenizers package

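library(tokenizers)  # provides tokenize_ngrams()

get_ngram <- function(texts, n) {
  # n_min = n keeps only n-grams of exactly n words
  ngrams <- tokenize_ngrams(texts, n = n, n_min = n)
  # Flatten the per-text list into a single character vector
  return(unlist(ngrams))
}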
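
A short sketch of how get_ngram() might be used, assuming two made-up sentences as input. Counting the n-grams with table() is an illustrative assumption about a next step, not necessarily how the app builds its predictions.

```r
library(tokenizers)

get_ngram <- function(texts, n) {
  ngrams <- tokenize_ngrams(texts, n = n, n_min = n)
  return(unlist(ngrams))
}

# Hypothetical input, for illustration only
texts <- c("the quick brown fox", "the quick red fox")

# Extract all bigrams across both texts
bigrams <- get_ngram(texts, 2)
print(bigrams)

# Count how often each bigram occurs, most frequent first
bigram_freq <- sort(table(bigrams), decreasing = TRUE)
print(head(bigram_freq, 3))
```

Here "the quick" appears in both sentences, so it ends up at the top of the frequency table.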

App display