Lightning Talks: Translating and Detecting Languages in a Diverse and Multilingual World

Brianna Brooks

I Bet You Didn’t Know…

  • That there are over 7,000 languages in the entire world! In fact:
    • 50% of all languages are spoken by just 1% of the population
    • Roughly 40% of languages are endangered

There is no single way to speak!

Lost in Translation?

  • Multilingual environments can result in confusion and misunderstanding!
    • Real world data is multilingual and not uniform
    • Hard to analyze consistently

Is there a way we can work to better understand each other in multilingual environments?

Tongues Finally Untied: polyglotr

Creator: Tomer Iwan

Year: 2024

polyglotr is an R package that:

  • serves as a language translation tool for the R programming language
  • enables consistent analysis and insights across diverse linguistic data sources

polyglotr in the Real World!

Examples of use in multilingual environments include:

  • Product Reviews
  • Social Media Analysis (e.g., video comments)
  • Customer Support Automation
  • Intelligence

polyglotr in action!

  • The code below is an excerpt from the Language Detection and Conditional Translation vignette. It:
    • Automatically detects languages
    • Translates non-English content

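# Core Functions
# This excerpt sits inside a function body (note the return() call), so
# input_text, detected_lang, and target_language are assumed to be
# defined earlier in that function.

# Skip English: return the text unchanged
if (detected_lang == "en") {
  return(tibble(
    original_text = input_text,
    english_text = input_text,
    was_translated = FALSE,
    detected_language = detected_lang
  ))
}

# Translate non-English (with error handling)
translated_text <- tryCatch(
  # Named arguments keep the source and target languages from being swapped
  google_translate(input_text,
                   target_language = target_language,
                   source_language = "auto"),
  # Fall back to a marker string instead of stopping on an error
  error = function(e) "[TRANSLATION FAILED]"
)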

Input and Output:

# Input
     id user_feedback
  <int> <chr>
      1 Great product, very satisfied!
      2 Excelente producto, muy satisfecho!
      3 Produit fantastique, je le recommande!
      4 This service exceeded my expectations.
      5 Der Service war wirklich hervorragend.

# Output
  english_text                           was_translated detected_language
  <chr>                                  <lgl>          <chr>
1 Great product, very satisfied!         FALSE          en
2 Excellent product, very satisfied!     TRUE           es
3 Fantastic product, I recommend it!     TRUE           fr
4 This service exceeded my expectations. FALSE          en
5 The service was really excellent.      TRUE           de
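
The excerpt and the input/output above suggest a detect, skip, translate flow applied to each row. The sketch below is one way that flow might be put together, not the vignette's actual code. It uses base R data frames rather than tibbles so that it needs no packages, and the `detect_fn` and `translate_fn` arguments are hypothetical injection points (they are not part of polyglotr) that let the control flow run offline with stand-in functions.

```r
# Sketch: detect the language, skip text already in the target language,
# translate everything else, and never stop on a translation error.
# detect_fn and translate_fn are hypothetical arguments, not polyglotr API.
detect_and_translate <- function(input_text, detect_fn, translate_fn,
                                 target_language = "en") {
  detected_lang <- detect_fn(input_text)

  # Skip text that is already in the target language
  if (identical(detected_lang, target_language)) {
    return(data.frame(
      original_text = input_text,
      english_text = input_text,
      was_translated = FALSE,
      detected_language = detected_lang,
      stringsAsFactors = FALSE
    ))
  }

  # Translate, falling back to a marker string if the call errors
  translated_text <- tryCatch(
    translate_fn(input_text, target_language),
    error = function(e) "[TRANSLATION FAILED]"
  )
  failed <- identical(translated_text, "[TRANSLATION FAILED]")

  data.frame(
    original_text = input_text,
    english_text = translated_text,
    was_translated = !failed,
    detected_language = detected_lang,
    stringsAsFactors = FALSE
  )
}

# Offline stand-ins for a real detector and translator (illustrative only)
fake_detect <- function(text) if (grepl("^Der ", text)) "de" else "en"
fake_translate <- function(text, target) stop("no network in this sketch")

feedback <- c("Great product, very satisfied!",
              "Der Service war wirklich hervorragend.")

# Apply the function to each comment and stack the one-row results
results <- do.call(rbind, lapply(feedback, detect_and_translate,
                                 detect_fn = fake_detect,
                                 translate_fn = fake_translate))
print(results)
```

With polyglotr installed and a network connection, the stand-ins could presumably be swapped for the package's `language_detect()` and a small wrapper around `google_translate(text, target_language = target, source_language = "auto")`. That substitution is an assumption about how the pieces fit, so check it against the package documentation.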

Real World Application:

Detection and Translation of Multilingual Social Media Comments

I used polyglotr on multilingual comments from the Guns N’ Roses “November Rain” music video. For each comment, it:

  • detects the language
  • returns a translation of the comment

# Input (four comments shown; id must match the number of rows)
library(tibble)

mixed_data <- tibble(
  id = 1:4,
  user_feedback = c(
    "In my opinion, one of the best bands in the world.",
    "Siempre me encanta..así yo no entienda nada.hasta me hace llorar!",
    "La France t’AIME.",
    "Einfach episch.")
)
# Output
  english_text                                                                    was_translated
1 In my opinion, one of the best bands in the world.                              FALSE
2 I always love it... even if I don't understand anything. It even makes me cry! TRUE
3 France LOVES you.                                                               TRUE
4 Simply epic.                                                                    TRUE

Say It Right, Say It All:

  • polyglotr successfully detected and translated the comments
  • Below is an analysis of languages detected by polyglotr in the previous scenario:
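
A tally of detected languages like the one described above could be produced along these lines. This is a minimal base R sketch, and the `detected_language` vector holds made-up placeholder codes for illustration, not the talk's actual results.

```r
# Hypothetical detected-language codes, one per comment
# (placeholder values, not the talk's actual results)
detected_language <- c("en", "es", "fr", "de", "es", "en")

# Count comments per language, most common first
language_counts <- sort(table(detected_language), decreasing = TRUE)

# Convert the counts to percentages of all comments
language_share <- round(100 * prop.table(language_counts), 1)

summary_table <- data.frame(
  language = names(language_counts),
  comments = as.integer(language_counts),
  percent = as.numeric(language_share)
)
print(summary_table)
```

In practice the vector would come from the `detected_language` column of the translated results rather than being typed by hand.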

Making the World Understandable, One Translation at a Time

Understanding each other shouldn’t depend on the language we speak.

  • polyglotr breaks down linguistic barriers and allows for

    • More consistent analysis
    • Better decision-making
    • Greater inclusivity across global audiences

THANK YOU

  • Link to GitHub repo:

https://github.com/madisonbri06-cpu/Polyglot7-Translating-and-Detecting-Languages-in-a-Diverse-and-Multilingual-World