# Load reticulate into current R session
library(reticulate)
library(here)
use_python("env/Scripts/python.exe")
# Retrieve/force initialization of Python
reticulate::py_config()
Hello Transformers from R
Before we begin, it’s worth mentioning that this is just an exploratory coding blog post (for now) on how to leverage some of the libraries in the Hugging Face ecosystem from R. It is based on the Python code from the first chapter of the book Natural Language Processing with Transformers (amazing book, by the way!), so if you’d like a detailed explanation of Transformers and NLP in general, please refer to it. I hope to write a much more detailed post to go along with the code.
That said, let’s get started 🤗!
To leverage 🤗 Transformers from R, we’ll need the reticulate package, which provides a comprehensive set of tools for interoperability between Python and R. Let’s begin by creating a virtual environment where we’ll install 🤗 Transformers, as described in the 🤗 Transformers docs. In Python, virtual environments let you create isolated Python installations, making it easier to manage different projects and avoid compatibility issues between dependencies. Take a look at this guide on creating virtual environments for your OS.
Once you have a Python installation on your machine, start by installing virtualenv, which is used to manage Python packages for different projects.
The following commands create a virtual environment on Windows and are to be executed at the command prompt:
py -m pip install --user virtualenv
Create a virtual environment. venv will create a virtual Python installation in the env folder:
py -m venv env
Activating a virtual environment:
.\env\Scripts\activate
That’s it! The Python virtual environment is created and activated. Now we can hop into RStudio (or VS Code) and carry on from there.
# Check if python is available
reticulate::py_available()
[1] TRUE
Now we’re ready to install 🤗 Transformers with the following command:
# Install Python package into virtual environment
reticulate::py_install("transformers", pip = TRUE)
# Also install PyTorch (a backend for the pipelines) and sentencepiece (needed for translation later)
reticulate::py_install(c("torch", "sentencepiece"), pip = TRUE)
1 A Tour of Transformer Applications
🤗 Transformers has a layered API that allows you to interact with the library at various levels of abstraction. Pipelines abstract away all the steps needed to convert raw text into a set of predictions from a fine-tuned model (much like workflows in R’s tidymodels). In Transformers, we instantiate a pipeline by calling the pipeline() function and providing the name of the task we are interested in.
Every NLP task starts with a piece of text, like the following made-up customer feedback about a certain online order:
text <- ("Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee.")
1.1 Text Classification
Let’s begin with sentiment analysis which is part of the broader topic of text classification. Let’s have a look at what it takes to extract the sentiment from our piece of text using 🤗 Transformers.
Functions and other data within Python modules and classes can be accessed via the $ operator (analogous to the way you would interact with an R list, environment, or reference class).
# Import 🤗 Transformers into the R session
transformers <- reticulate::import("transformers")
# Instantiate a pipeline
classifier <- transformers$pipeline(task = "text-classification")
Now that we have a pipeline, let’s generate some predictions. Each prediction is a named list (in Python, this corresponds to a dictionary), so we can use the tibble package to display the predictions nicely as a data frame:
# Load Tidyverse
library(tidyverse)
# Generate predictions
outputs <- classifier(text)
# Convert predictions to tibble
outputs %>%
  pluck(1) %>%
  as_tibble()
# A tibble: 1 x 2
  label    score
  <chr>    <dbl>
1 NEGATIVE 0.902
In this case the model is very confident that the text has a negative sentiment, which makes sense given that we’re dealing with a complaint from an angry customer!
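To make the list-to-data-frame step concrete without downloading a model, here is a minimal base R sketch. The mock_outputs object is a hand-written stand-in (an assumption, not real pipeline output) shaped like what reticulate hands back: a list with one named list per prediction. The second entry and its score are invented purely for illustration.

```r
# Hand-written stand-in for pipeline output (assumption: not produced by a model).
# reticulate converts a Python list of dicts into an R list of named lists.
mock_outputs <- list(
  list(label = "NEGATIVE", score = 0.902),
  list(label = "POSITIVE", score = 0.750)  # invented second prediction
)

# Turn each named list into a one-row data frame, then stack the rows.
rows <- lapply(mock_outputs, as.data.frame, stringsAsFactors = FALSE)
predictions <- do.call(rbind, rows)

print(predictions)
```

With the tidyverse loaded, the pluck() and as_tibble() combination used in this post does the same job for a single prediction; the base R version is only meant to show what happens underneath.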
1.2 Named Entity Recognition
Predicting the sentiment of customer feedback is a good first step, but you often want to know if the feedback was about a particular item or service. In NLP, real-world objects like products, places, and people are called named entities, and extracting them from text is called named entity recognition (NER). We can apply NER by loading the corresponding pipeline and feeding our customer review to it:
# Download model for ner task
ner_tagger <- transformers$pipeline(task = "ner", aggregation_strategy = "simple")
# Make predictions
outputs <- ner_tagger(text)
# Convert predictions to tibble
# This takes a bit of effort since some of the variables are numpy objects
# Function that takes the index of a prediction and returns that
# prediction with its score converted to a plain R double
to_r <- function(idx){
  # Obtain a particular output from the entire named list
  output_idx <- outputs %>%
    pluck(idx)
  # Convert score from a numpy float to an R double (via character)
  output_idx$score <- paste(output_idx$score) %>%
    as.double()
  return(output_idx)
}
# Convert outputs to tibble
map_dfr(seq_along(outputs), ~to_r(.x))
# A tibble: 10 x 5
   entity_group score word          start   end
   <chr>        <dbl> <chr>         <int> <int>
 1 ORG          0.879 Amazon            5    11
 2 MISC         0.991 Optimus Prime    36    49
 3 LOC          1.00  Germany          90    97
 4 MISC         0.557 Mega            208   212
 5 PER          0.590 ##tron          212   216
 6 ORG          0.670 Decept          253   259
 7 MISC         0.498 ##icons         259   264
 8 MISC         0.775 Megatron        350   358
 9 MISC         0.988 Optimus Prime   367   380
10 PER          0.812 Bumblebee       502   511
Extracting all the named entities in a text is nice, but sometimes we would like to ask more targeted questions. This is where we can use question answering.
The start and end integers correspond to the character indices of the word in the text. Python collections are addressed using 0-based indices rather than the 1-based indices you might be familiar with from R, and the end index is exclusive (Amazon spans 5 to 11 but is six characters long). So to locate a word in R, add 1 to start and use end as is.
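As a quick check of the index arithmetic, the sketch below slices two entities out of a shortened copy of the feedback with base R’s substr(). The offsets are copied from the NER table above; the shortened snippet and the helper name extract_span are assumptions made for this illustration. Because Python’s end offset is exclusive, only start needs the +1 shift.

```r
# Shortened copy of the customer feedback (assumption: truncated for brevity;
# the opening words are unchanged, so the offsets from the NER table still apply).
snippet <- "Dear Amazon, last week I ordered an Optimus Prime action figure"

# Hypothetical helper: map Python-style offsets (0-based start, exclusive end)
# to substr(), which uses 1-based, inclusive positions.
extract_span <- function(x, start, end) {
  substr(x, start + 1L, end)
}

amazon <- extract_span(snippet, 5L, 11L)
optimus <- extract_span(snippet, 36L, 49L)

print(amazon)   # "Amazon"
print(optimus)  # "Optimus Prime"
```

The same helper should work on the full text with any start and end pair from the table.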
1.3 Question Answering
In question answering, we provide the model with a passage of text called the context, along with a question whose answer we’d like to extract. The model then returns the span of text corresponding to the answer. Let’s see what we get when we ask a specific question about our customer feedback:
# Specify task
reader <- transformers$pipeline(task = "question-answering")
# Question we want answered
question <- "What does the customer want?"
# Provide model with question and context
outputs <- reader(question = question, context = text)
outputs %>%
  as_tibble()
# A tibble: 1 x 4
  score start   end answer
  <dbl> <int> <int> <chr>
1 0.631   335   358 an exchange of Megatron
1.4 Summarization
The goal of text summarization is to take a long text as input and generate a short version with all the relevant facts.
R and Python have different default numeric types. 56 is considered a floating-point number in R, whereas 56 in Python is considered an integer. This means that when a Python API expects an integer, you need to be sure to use the L suffix within R. See reticulate’s Calling Python from R vignette.
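The difference is easy to see in plain R, without touching Python. This small sketch only inspects R’s own types; the variable names are illustrative.

```r
# A bare numeric literal is a double in R.
x_double <- 56
# The L suffix makes it an integer, which reticulate passes to Python as an int.
x_integer <- 56L

print(typeof(x_double))      # "double"
print(typeof(x_integer))     # "integer"
print(is.integer(x_double))  # FALSE

# Values that arrive as doubles can be converted explicitly before the call.
max_length <- as.integer(x_double)
print(identical(max_length, x_integer))  # TRUE
```

This is why the pipeline calls in this post write max_length = 56L rather than max_length = 56.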
summarizer <- transformers$pipeline("summarization")
outputs <- summarizer(text, max_length = 56L, clean_up_tokenization_spaces = TRUE)
outputs
[[1]]
[[1]]$summary_text
[1] " Bumblebee ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead. As a lifelong enemy of the Decepticons, I"
The summary isn’t too bad. The model was still able to capture the essence of the problem and correctly identify that “Bumblebee” (which appeared at the end) was the author of the complaint.
1.5 Translation
Let’s use a translation pipeline to translate an English text to German:
# This requires the Python package sentencepiece
sentencepiece <- reticulate::import("sentencepiece")
# Explicitly specifying the model you want
translator <- transformers$pipeline(
  task = "translation_en_to_de",
  model = "Helsinki-NLP/opus-mt-en-de")
outputs <- translator(text, clean_up_tokenization_spaces = TRUE,
                      min_length = 100L)
outputs
[[1]]
[[1]]$translation_text
[1] "Sehr geehrter Amazon, letzte Woche habe ich eine Optimus Prime Action Figur aus Ihrem Online-Shop in Deutschland bestellt. Leider, als ich das Paket öffnete, entdeckte ich zu meinem Entsetzen, dass ich stattdessen eine Action Figur von Megatron geschickt worden war! Als lebenslanger Feind der Decepticons, Ich hoffe, Sie können mein Dilemma verstehen. Um das Problem zu lösen, Ich fordere einen Austausch von Megatron für die Optimus Prime Figur habe ich bestellt. Anbei sind Kopien meiner Aufzeichnungen über diesen Kauf. Ich erwarte, bald von Ihnen zu hören. Aufrichtig, Bumblebee."
1.6 Text Generation
Let’s say you would like to be able to provide faster replies to customer feedback by having access to an autocomplete function. With a text generation model you can do this as follows:
generator <- transformers$pipeline("text-generation")
response <- "Dear Bumblebee, I am sorry to hear that your order was mixed up."
prompt <- paste(text, "\n\nCustomer service response:\n", response)
outputs <- generator(prompt, max_length = 200L)
outputs %>%
  pluck(1, "generated_text") %>%
  cat()
Dear Amazon, last week I ordered an Optimus Prime action figure from your online store in Germany. Unfortunately, when I opened the package, I discovered to my horror that I had been sent an action figure of Megatron instead! As a lifelong enemy of the Decepticons, I hope you can understand my dilemma. To resolve the issue, I demand an exchange of Megatron for the Optimus Prime figure I ordered. Enclosed are copies of my records concerning this purchase. I expect to hear from you soon. Sincerely, Bumblebee.
Customer service response:
Dear Bumblebee, I am sorry to hear that your order was mixed up. I wish to thank you for your purchase of the Optimus Prime Optimus Prime figures via you. I do plan on doing many other things with the Transformers series as well as am planning on having a few other Transformers comics to add to my collection.
After a somewhat difficult decision, I have decided to return the order. Sincerely
OK, maybe we wouldn’t want to use this completion to calm Bumblebee down, but you get the general idea 😄.
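One small detail about how the prompt was assembled: paste() joins its arguments with a single space by default, so the prompt ends up with a space on either side of the separator. If that matters for your use case, paste0() joins without a separator. The sketch below uses a shortened, made-up feedback string (an assumption) in place of the full text.

```r
feedback <- "Sincerely, Bumblebee."  # shortened stand-in for the full text
response <- "Dear Bumblebee, I am sorry to hear that your order was mixed up."
separator <- "\n\nCustomer service response:\n"

# Default sep = " " adds one space before and one after the separator.
with_spaces <- paste(feedback, separator, response)
# paste0() concatenates the pieces exactly as given.
without_spaces <- paste0(feedback, separator, response)

print(nchar(with_spaces) - nchar(without_spaces))  # 2
cat(without_spaces)
```

Whether the extra spaces change the generated text noticeably is something to check empirically; the point here is only what ends up in the prompt string.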
That’s it. So far so good 🙂! Seems it’s not too difficult to use 🤗 Transformers from R. Can’t wait to explore the second chapter of the book!
Happy Learning 🤗,
Eric.