Large Language Models (LLMs) like GPT-4 have revolutionized how we communicate and understand information. In this activity, we’ll explore how to leverage LLMs in R for Data Science using the openai package.
We’ll start by experimenting with the OpenAI API for free-text and structured responses. Then, we’ll use LLMs to answer textual questions and to grade the answers.
Objectives:
Prerequisites
Install the openai, httr, tidyverse, and jsonlite packages.
Download the Reasoning 20k dataset and place the combined_reasoning.json file somewhere you can find it.
# Install and load required packages
# install.packages("openai");
# install.packages("httr");
# install.packages("jsonlite");
# install.packages("furrr");
# install.packages("kableExtra")
library(openai);
library(httr);
##
## Attaching package: 'httr'
## The following object is masked from 'package:openai':
##
## upload_file
library(jsonlite);
library(tidyverse);
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ✖ httr::upload_file() masks openai::upload_file()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(furrr); # for parallel map
## Loading required package: future
library(kableExtra);
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
Utility function to format tables nicely:
# Function to truncate text in all string (character) columns of a data frame
format_table <- function(df, max_length = 150) {
head_df <- data.frame(df %>% head(5))
# Function to truncate individual text entries
truncate_text <- function(text, max_length) {
text <- gsub("[\r\n]", "", text)
return (
ifelse(nchar(text) > max_length,
paste0(substr(text, 1, max_length), "..."),
text)
)
}
# Loop over all columns that are character type and apply truncation
for (col in colnames(head_df)) {
if (is.character(head_df[[col]])) {
head_df[[col]] <- sapply(head_df[[col]], truncate_text, max_length = max_length)
}
}
# Return the modified data frame
head_df %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size=12)
}
# Set your OpenAI API key. Shilad or your instructor will give this to you.
Sys.setenv(OPENAI_API_KEY = "YOUR API KEY")
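Hard-coding a key in a script makes it easy to leak when the file is shared. A minimal sketch, assuming the key has instead been stored in the OPENAI_API_KEY environment variable (for example through an .Renviron file); api_key_is_set is a hypothetical helper written for this illustration, not part of the openai package:

```r
# Hypothetical helper: TRUE when the named environment variable holds
# something other than an empty string or the placeholder text.
api_key_is_set <- function(name = "OPENAI_API_KEY") {
  value <- Sys.getenv(name, unset = "")
  nzchar(value) && value != "YOUR API KEY"
}

if (!api_key_is_set()) {
  message("OPENAI_API_KEY is not set; API calls will fail until it is.")
}
```

Running a check like this before the first API call turns a confusing authentication error into an explicit reminder.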
Our interaction with the LLM will be through the OpenAI Chat Completions API. This API allows us to interact with the LLM by providing prompts and receiving completions. As of 2024, this API is one of the most widely used ways to interact with LLMs. In this activity we will use this API to ask questions, generate text, and even grade responses.
To begin, let’s start with a simple question answering task using the GPT-4o-mini model. We are going to encapsulate the question answering logic in a function called completion. It effectively asks the LLM for an answer to a question.
completion <- function(prompt, max_tokens = 100) {
# Get the response from the LLM
response <- openai::create_chat_completion(
model = "gpt-4o-mini",
messages = list(list(role = "system", content = prompt)),
temperature = 0.1,
max_tokens = max_tokens
)
# Return the response
return (response$choices$message.content);
}
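The completion function sends the whole prompt as a single message with the "system" role. The Chat Completions API also accepts a "user" role, which is commonly used for the question itself while "system" carries general instructions. A minimal sketch of how such a message list could be built; build_messages and its default instruction text are hypothetical and are not used elsewhere in this activity:

```r
# Hypothetical helper: separate general instructions (system role)
# from the question being asked (user role).
build_messages <- function(question,
                           instructions = "You are a helpful assistant.") {
  list(
    list(role = "system", content = instructions),
    list(role = "user", content = question)
  )
}

messages <- build_messages("Why is the sky blue?")
length(messages)       # number of messages that would be sent
messages[[2]]$role     # role attached to the question
```

The resulting list has the same shape as the messages argument passed to create_chat_completion above, so it could be swapped in if you want to experiment with different system instructions.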
Experiment with the function by asking a simple question as shown in the example below. Change the question below to several that you are interested in or have expertise related to. How does it perform? What does it do well? What does it do poorly? Put your example questions and analysis in the code below.
# Ask a few questions. In comments answer: what does it do well? What does it do poorly?
completion("Why is the sky blue?", 200);
## [1] "The sky appears blue primarily due to a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it is made up of different colors, each with varying wavelengths. Blue light has a shorter wavelength compared to other colors like red or yellow.\n\nAs sunlight passes through the atmosphere, it interacts with air molecules and small particles. Because blue light is scattered in all directions more effectively than other colors due to its shorter wavelength, we perceive the sky as blue during the day.\n\nAt sunrise and sunset, the sun is lower on the horizon, and its light has to pass through a greater thickness of the atmosphere. This increased distance scatters the shorter blue wavelengths out of our line of sight, allowing the longer wavelengths, such as red and orange, to dominate the sky's appearance during those times."
For programmatic use, it’s often helpful to have the LLM return structured responses from which we can extract several different pieces of information.
json_completion <- function(prompt, max_tokens = 200) {
# Get the response from the LLM
response <- openai::create_chat_completion(
model = "gpt-4o-mini",
messages = list(list(role = "system", content = prompt)),
temperature = 0.0,
max_tokens = max_tokens
)
json <- response$choices$message.content;
# Shilad: This is a hack to strip the ```json ... ``` fence that 4o-mini occasionally wraps around its JSON.
# Ideally we would pass response_format = {type: json_object} to avoid this, but it's not supported by the R OpenAI wrapper.
pattern <- regex("```json(.*?)```", dotall = TRUE);
if (str_detect(json, pattern)) {
json <- str_match(json, pattern)[, 2]; # Extract the matched JSON content
}
return (fromJSON(json));
}
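The fence-stripping step can be tried offline, without calling the API. A minimal sketch in base R, assuming a response shaped like the ones described in the comment above; strip_json_fence is a hypothetical stand-alone variant of the hack (it also accepts a fence without the json tag), not the code json_completion itself runs:

```r
# Hypothetical helper: return the text inside a ```json ... ``` (or plain
# ``` ... ```) fence, or the input unchanged when no fence is present.
strip_json_fence <- function(text) {
  # (?s) lets "." match newlines; the capture group holds the fenced body.
  pattern <- "(?s)```(?:json)?\\s*(.*?)\\s*```"
  match <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  if (length(match) == 2) match[2] else text
}

fenced <- "```json\n{\"score\": 8}\n```"
plain <- "{\"score\": 8}"

strip_json_fence(fenced)
strip_json_fence(plain)
```

Both calls should yield the same bare JSON string, which can then be handed to fromJSON as in json_completion.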
Below you can find an example of calling this structured completion. The return value will be an object with fields response$attendees, etc.
response <- json_completion("
Your task is to extract a structured calendar invite by analyzing a short text.
The return value should be a JSON object with the following fields:
- attendees: A list of strings with attendee names.
- when: The starting date and time for the calendar event
- subject: Short description of the event
- description: Detailed few-sentence description of the event.
Perform this task for the following text:
Matthew and Ellen should meet Sunday at 4pm to discuss the future of the budget.
")
response
## $attendees
## [1] "Matthew" "Ellen"
##
## $when
## [1] "2023-10-29T16:00:00"
##
## $subject
## [1] "Budget Discussion"
##
## $description
## [1] "Matthew and Ellen will meet to discuss the future of the budget. This meeting will focus on planning and strategizing for upcoming financial decisions."
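Because the structured response is an ordinary R list, its fields can be checked programmatically. The input text only said "Sunday at 4pm", so the model had to pick a concrete date. A minimal sketch, using values copied from the output printed above (the invite variable is introduced here for illustration), that checks whether the chosen date is in fact a Sunday at 4pm:

```r
# Values copied from the printed response above.
invite <- list(
  attendees = c("Matthew", "Ellen"),
  when = "2023-10-29T16:00:00"
)

# Parse the ISO-style timestamp; UTC is an assumption, since the
# response carries no time zone.
when <- as.POSIXct(invite$when, format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# POSIXlt stores the weekday as 0 (Sunday) through 6 (Saturday).
is_sunday <- as.POSIXlt(when)$wday == 0
is_4pm <- as.POSIXlt(when)$hour == 16
```

Checks of this kind are one way to catch structured output that is well-formed JSON but inconsistent with the source text.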
# Create your own example task (not a calendar invite) that produces structured output
# using the example above. Push the LLM with a hard example. Does it get it correct?
response <- json_completion("
Your task is to ....
")
response
# In your comments, reflect on: How might you use this for Data Science purposes?
In this assignment we will answer questions from the Reasoning 20k dataset. The dataset contains a set of challenging factual questions along with their answers. We will load the dataset, filter example questions, and then interact with the LLM to answer the questions. We chose this dataset because it was created in October 2024, after the model was trained, so the LLM could not have “cheated” by training on this exact dataset.
Download the JSON dataset to your computer and read it into a variable named reasoning_20k_df using code similar to the following:
reasoning_20k_df <-
as.data.frame(fromJSON("~/Downloads/combined_reasoning.json")) %>%
select(user, assistant) %>%
rename(question = user, answer = assistant) %>%
mutate(id = row_number()) %>%
relocate(id, .before = question);
format_table(reasoning_20k_df)
id | question | answer |
---|---|---|
1 | Prove that the difference between two consecutive cubes cannot be divisible by 5, using the fact that the only possible remainders when a cube is divi… | Let the two consecutive cubes be \(n^3\) and \((n+1)^3\). Their difference is:\[(n+1)^3 - n^3 = 3n^2 + 3n + 1.\]When \(n^3\) is divided by 5, the possible r… |
2 | How can I integrate the function \(\arcsin(\sqrt{x+1}-\sqrt{x})\)? Is there an easier way than using the formula $f(x),dx=x f(x)-_{… | |
3 | Given the expression \(\frac{x^3+x+1}{(x^2+1)^2}\), decompose it into partial fractions. | The decomposition of \(\frac{x^3+x+1}{(x^2+1)^2}\) can be directly observed as \(\dfrac{x}{x^2+1}+\dfrac{1}{(x^2+1)^2}\). This is because \(x^3+x\) can be f… |
4 | Is it true that a man named Mûrasi from India is 179 years old, as claimed by certain sources?Sources:- eface India- News Origin- World News Daily Rep… | No, this claim is not accurate. The source of this information, the World News Daily Report, is known to publish fake news. They claim that Mûrasi was… |
5 | Find an example of a linear operator whose norm is not equal to the norm of its inverse. | Consider the linear operator T from \((\mathbb{R}^2, \|\cdot\|_{sup})\) to \((\mathbb{R}^2, \|\cdot\|_1)\) defined by \(T(x,y) = (y,x)\). The norm of T is 1… |
Now that we have the dataset loaded, pick some interesting questions to ask the LLM. Open the dataset in the built-in RStudio dataset viewer and use the search field to find 5-10 questions that interest you. Write down their ids and create a dataframe called interesting_df that contains just those questions.
# These are questions that interest Shilad related to Music theory.
# Pick ones that interest you. Locate them using the search function built in RStudio dataset viewer.
question_ids <- c(8248, 14377, 7769, 7311, 2568);
interesting_df <- reasoning_20k_df %>% filter(id %in% question_ids);
format_table(interesting_df)
id | question | answer |
---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … |
Carefully read through the questions and answers, and make sure they are accurate and make sense!
Throughout this activity we are going to use a helper function called map_table_rows that applies a function to each row of a dataframe and returns a new dataframe with the original columns and the new columns. This function is useful for applying the LLM to each row of a dataset.
We are going to use the future_pmap function from the furrr package, a parallelized version of purrr’s pmap, to speed up the process.
map_table_rows <- function(df, mapping_function) {
result <- future_pmap(df, mapping_function);
return (cbind(df, bind_rows(result)));
}
Below is an example of how to use the map_table_rows function. Take a look at the code and add a comment explaining what it is doing.
Add a second example that adds the square of x + y to the dataframe.
# Add a comment below indicating exactly what is happening
df <- data.frame(
x = c(1, 2, 3, 4),
y = c(5, 6, 7, 8),
z = c(9, 10, 11, 12)
);
example_mapper <- function(x, y, ...) {
return (list(
sum = x + y,
product = x * y
))
}
df %>% map_table_rows(example_mapper)
## x y z sum product
## 1 1 5 9 6 5
## 2 2 6 10 8 12
## 3 3 7 11 10 21
## 4 4 8 12 12 32
# Add a second example that adds the square of x + y to the dataframe.
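To see what map_table_rows does without any parallel machinery, it can help to look at a sequential equivalent. A minimal sketch in base R; map_table_rows_sequential and the difference mapper are hypothetical names used only for this illustration, and the real helper above relies on future_pmap and bind_rows instead:

```r
# Hypothetical sequential stand-in for map_table_rows.
map_table_rows_sequential <- function(df, mapping_function) {
  rows <- lapply(seq_len(nrow(df)), function(i) {
    # Pass the columns of row i as named arguments, as pmap would.
    args <- as.list(df[i, , drop = FALSE])
    as.data.frame(do.call(mapping_function, args))
  })
  # Stack the per-row results and attach them to the original columns.
  cbind(df, do.call(rbind, rows))
}

# The mapper receives one argument per column; "..." absorbs unused columns.
difference_mapper <- function(x, y, ...) {
  list(difference = y - x)
}

small_df <- data.frame(x = c(1, 2), y = c(5, 6), z = c(9, 10))
result <- map_table_rows_sequential(small_df, difference_mapper)
```

The mapper is called once per row with that row's column values, and each named element of the list it returns becomes a new column.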
To speed up this mapping, we will ask furrr to work on up to 10 rows in parallel. This means we will execute up to 10 LLM calls at the same time.
plan(multisession, workers = 10)
## Warning in checkNumberOfLocalWorkers(workers): Careful, you are setting up 10
## localhost parallel workers with only 8 CPU cores available for this R process
## (per 'system'), which could result in a 125% load. The soft limit is set to
## 100%. Overusing the CPUs has negative impact on the current R process, but also
## on all other processes of yours and others running on the same machine. See
## help("parallelly.options", package = "parallelly") for how to override the soft
## and hard limits
When you run this, depending on how many cores your laptop has, you may see a warning message indicating that this setting may saturate your CPU. Why is this unlikely when we use the parallel functions to interact with the OpenAI API?
Create a new dataset called simple_answers that takes interesting_df and uses the map_table_rows helper we just created to add a predicted column with the generated answer.
# Complete your implementation of the function below.
# It should be a one-liner that calls the `completion` function.
# The name of the new column MUST be `predicted`
add_simple_answer <- function(question, ...) {
return (list(predicted = completion(question)));
}
simple_answers <-
interesting_df %>%
map_table_rows(add_simple_answer);
format_table(simple_answers)
id | question | answer | predicted |
---|---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | A music synthesizer built with a chain of astable multivibrator circuits can experience detuning over time for several reasons, even when using fixed … |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | In an equal-tempered musical scale, there are 12 intervals in an octave. These intervals are typically referred to as semitones or half steps. Each se… |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | To find the number of different groups of 4 that can be formed from 142 people, we can use the combination formula, which is given by:[C(n, r) = … |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | To prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples, we need to analyze the … |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | The debate over tuning musical instruments to 432Hz versus the standard 440Hz is a topic of interest among musicians, sound therapists, and some alter… |
Take a look at the predicted answers and compare them to the actual answers. What do you notice?
To evaluate the quality of responses, we would traditionally use human experts to grade them. The rise of LLMs offers a new approach for labeling datasets using LLMs. The LLM-as-Judge paradigm uses the LLM to evaluate responses, providing feedback and scoring based on correctness and completeness. This allows us to assess the performance of the LLM or other models over a set of questions.
Think to yourself: What are the costs and benefits of using LLM-as-judge vs humans? When may it make sense to use one vs. the other?
Below is code that takes a dataframe and returns the grades for each question:
grade_predicted_answer <- function(question, answer, predicted, ...) {
prompt <- paste(
"
You are an expert grader.
Evaluate the student's answer to the following question.
Provide your response as a JSON object with the following attributes:
- feedback: A brief summary of feedback on the correctness of the student's answer.
- score: A score out of 10 based on the quality of the student's response.
Perform this task for the following question:
",
toJSON(
list(question = question, answer = answer, predicted = predicted),
auto_unbox = TRUE
)
);
grading_response <- json_completion(prompt);
return (list(
score = grading_response$score,
feedback = grading_response$feedback
));
};
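The grading function trusts the LLM to return score as a number between 0 and 10, but nothing enforces that. A minimal sketch of a defensive check, assuming the score might arrive as a string, be missing, or fall out of range; normalize_score is a hypothetical helper and is not called by grade_predicted_answer above:

```r
# Hypothetical helper: coerce an LLM-provided score to a number in
# [0, max_score], or NA when it cannot be interpreted as one number.
normalize_score <- function(score, max_score = 10) {
  value <- suppressWarnings(as.numeric(score))
  if (length(value) != 1 || is.na(value)) {
    return(NA_real_)
  }
  min(max(value, 0), max_score)
}

normalize_score(7)        # already a valid number
normalize_score("8")      # numeric string
normalize_score("8/10")   # not parseable as a single number
normalize_score(NULL)     # field missing from the JSON
normalize_score(12)       # out of range
```

Wrapping grading_response$score in a check like this would keep one malformed response from breaking the score column for the whole table.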
Use the function above along with map_table_rows to assign grades to each predicted answer. In comments, note whether the scores and feedback make sense to you.
# Use map_table_rows and grade_predicted_answer to assign grades to simple_answers
simple_answers %>%
map_table_rows(grade_predicted_answer) %>%
format_table()
id | question | answer | predicted | score | feedback |
---|---|---|---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | A music synthesizer built with a chain of astable multivibrator circuits can experience detuning over time for several reasons, even when using fixed … | 7 | The student’s answer correctly identifies several factors that can lead to detuning in a music synthesizer built with astable multivibrators, such as … |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | In an equal-tempered musical scale, there are 12 intervals in an octave. These intervals are typically referred to as semitones or half steps. Each se… | 8 | The student’s answer is correct in stating that there are 12 intervals in an octave in an equal-tempered scale. However, it could be improved by menti… |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | To find the number of different groups of 4 that can be formed from 142 people, we can use the combination formula, which is given by:[C(n, r) = … | 7 | The student correctly identifies the combination formula and applies it to calculate the number of groups of 4 from 142 people. However, the answer is… |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | To prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples, we need to analyze the … | 8 | The student’s answer correctly identifies the mathematical relationship between the powers of 3 and 2, and effectively uses the Fundamental Theorem of… |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | The debate over tuning musical instruments to 432Hz versus the standard 440Hz is a topic of interest among musicians, sound therapists, and some alter… | 8 | The student’s answer accurately addresses the question by highlighting the lack of scientific evidence supporting the benefits of 432Hz tuning over 44… |
# Look at the scores and feedback. Does it make sense to you?
Chain of Thought is a prompting technique where you ask the LLM to work through the problem in structured steps. This can be useful for generating more detailed answers or exploring a topic in depth.
Apply the pattern you see in the grading function to create a new data frame called cot_answers (for “Chain of Thought”).
* Use the json_completion function to generate a more detailed response to each question. You will need to raise the second argument from the default number of tokens to something higher (e.g. 1000).
* You should ask the LLM to produce a JSON object with the following fields:
  - plan: A step-by-step plan for solving the problem.
  - details: A detailed step-by-step solution to the problem.
  - answer: The final answer to the question.
* Note that the order of the fields in the JSON object is important. You must force the LLM to generate output in the order it “thinks.”
add_cot_answer <- function(question, ...) {
prompt <- paste(
"
Your task is to provide a detailed response to the following question.
Generate a JSON object with the following fields:
- plan: A single string containing step-by-step plan for solving the problem.
- details: A single string containing detailed step-by-step solution to the problem.
- answer: The single string containing final answer to the question.
Perform this task for the following question:
",
question
);
cot_response <- json_completion(prompt, max_tokens = 1000);
return (list(
plan = cot_response$plan,
details = cot_response$details,
predicted = cot_response$answer
));
};
cot_answers <-
interesting_df %>%
map_table_rows(add_cot_answer);
format_table(cot_answers)
id | question | answer | plan | details | predicted |
---|---|---|---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | | | A music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours due to temperature variations affecting resistor an… |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | | An octave in music theory is the interval between one musical pitch and another with double its frequency. In an equal-tempered scale, an octave is di… | There are 12 intervals in an octave in an equal-tempered musical scale. |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | | | 16242880 |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | | | There are no positive integer solutions for the equation (3/2)^m = (1/2)^n, proving that stacking just intonation pure fifths will never result in a p… |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | | | There is no significant scientific evidence to support that tuning musical instruments to 432Hz provides benefits to human well-being compared to the … |
Finally, use the same procedure you did earlier to grade the new responses. Do you see any interesting differences?
cot_answers %>%
map_table_rows(grade_predicted_answer) %>%
format_table()
id | question | answer | plan | details | predicted | score | feedback |
---|---|---|---|---|---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | | | A music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours due to temperature variations affecting resistor an… | 8 | The student’s answer provides a comprehensive explanation of the factors contributing to detuning in a music synthesizer, including power supply fluct… |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | | An octave in music theory is the interval between one musical pitch and another with double its frequency. In an equal-tempered scale, an octave is di… | There are 12 intervals in an octave in an equal-tempered musical scale. | 10 | The student’s answer is correct and accurately states that an octave in an equal-tempered musical scale is divided into 12 intervals. The phrasing is … |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | | | 16242880 | 7 | The student correctly explained the combination formula and applied it to the problem. However, the final calculation of the number of groups is incor… |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | | | There are no positive integer solutions for the equation (3/2)^m = (1/2)^n, proving that stacking just intonation pure fifths will never result in a p… | 7 | The student’s answer correctly identifies that there are no positive integer solutions to the equation, but it lacks clarity in the mathematical reaso… |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | | | There is no significant scientific evidence to support that tuning musical instruments to 432Hz provides benefits to human well-being compared to the … | 8 | The student’s answer accurately addresses the question by stating that there is insufficient scientific evidence to support the benefits of 432Hz tuni… |
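One way to compare the two grading runs is to line the scores up by question id. A minimal sketch in base R, with the scores typed in by hand from the two graded tables in this activity; the data frame and column names are introduced here for illustration:

```r
# Scores copied from the graded tables for the simple and the
# Chain of Thought answers.
simple_scores <- data.frame(
  id = c(2568, 7311, 7769, 8248, 14377),
  score = c(7, 8, 7, 8, 8)
)
cot_scores <- data.frame(
  id = c(2568, 7311, 7769, 8248, 14377),
  score = c(8, 10, 7, 7, 8)
)

# Join on id; suffixes keep the two score columns apart.
comparison <- merge(simple_scores, cot_scores, by = "id",
                    suffixes = c("_simple", "_cot"))
comparison$change <- comparison$score_cot - comparison$score_simple

mean_simple <- mean(comparison$score_simple)
mean_cot <- mean(comparison$score_cot)
```

With only five questions and a grader whose output can vary between runs, a difference in these means is suggestive at best; a larger sample would be needed before drawing conclusions.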
Retrieval-Augmented Generation (RAG) is a technique where you first retrieve documents relevant to the question from an external source, then place the retrieved text in the prompt so that the LLM can ground its answer in it. This can be useful when a question depends on facts the model does not know or recalls unreliably.
To perform RAG, we will ask the LLM to first generate search queries based on the questions.
# Step 1: Get search keywords
add_keywords <- function(question, ...) {
# Ask the LLM for three Wikipedia search queries for this question
prompt <- paste(
"
Your task is to identify three diverse, detailed search queries to gather information from Wikipedia about the following question.
The queries are going to be run through Wikipedia's internal search engine to retrieve entire relevant articles.
So make sure they are in the \"goldilocks\" zone of specificity.
They should be specific enough to identify the best related article, but not more specific than that.
Provide your response as a JSON object with the following attributes:
- query1: A string containing the first search query
- query2: A string containing the second search query
- query3: A string containing the third search query
",
question
);
query_response <- json_completion(prompt, max_tokens = 500);
return (list(
query1 = query_response$query1,
query2 = query_response$query2,
query3 = query_response$query3
));
}
# Add keywords to the dataset
rag_df <-
interesting_df %>%
map_table_rows(add_keywords);
rag_df %>% format_table()
id | question | answer | query1 | query2 | query3 |
---|---|---|---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | astable multivibrator circuits in music synthesizers | synthesizer tuning stability and temperature effects | effects of resistor values on synthesizer circuit performance |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | equal-tempered musical scale intervals in an octave | octave structure in music theory | number of semitones in an octave |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | combinatorial mathematics groups of four | combinations formula example | binomial coefficient calculation |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | just intonation and pure fifths music theory | mathematical proof stacking fifths octave tuning | properties of just intonation and octave equivalence |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | 432Hz tuning benefits human well-being | comparison of 432Hz and 440Hz musical tuning | effects of musical tuning frequencies on health |
Next, we use these search queries to fetch the article text from Wikipedia. After doing so, we ask the LLM to summarize the salient facts from the articles that are relevant to the question.
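The code in this step calls a helper named search_wikipedia that returns a list with title and text fields; its definition does not appear in this section. A minimal sketch of what such a helper might look like, assuming the public MediaWiki Action API with its search generator and plain-text extracts. The first_page helper, the user agent string, and the choice to fetch only the introduction (exintro) are assumptions made for this illustration, not the activity's actual implementation:

```r
# Hypothetical helper: pick the first page out of the parsed "pages"
# list that the MediaWiki API returns (a list keyed by page id).
first_page <- function(pages) {
  if (length(pages) == 0) {
    return(list(title = NA_character_, text = NA_character_))
  }
  page <- pages[[1]]
  list(
    title = page$title,
    text = if (is.null(page$extract)) NA_character_ else page$extract
  )
}

# Hypothetical sketch: return the title and intro text of the top
# Wikipedia search hit for a query.
search_wikipedia <- function(query) {
  resp <- httr::GET(
    "https://en.wikipedia.org/w/api.php",
    query = list(
      action = "query", format = "json",
      generator = "search", gsrsearch = query, gsrlimit = 1,
      prop = "extracts", explaintext = 1, exintro = 1
    ),
    httr::user_agent("llm-activity-sketch (classroom exercise)")
  )
  httr::stop_for_status(resp)
  parsed <- httr::content(resp, as = "parsed", type = "application/json")
  first_page(parsed$query$pages)
}
```

Keeping the response parsing in its own function makes it possible to test that part with a hand-built list, without touching the network.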
# Step 2: Get article text for each query
add_articles <- function(question, query1, query2, query3, ...) {
# Get the article text for each query
article1 <- search_wikipedia(query1)
article2 <- search_wikipedia(query2)
article3 <- search_wikipedia(query3)
# Use the LLM to extract key facts the LLM should focus on.
facts <- completion(paste(
"You are an expert synthesizer. Your job is to extract the most salient facts from three different articles related to a question.",
"Write three to five sentences capturing the most important facts needed to answer the question.",
toJSON(list(question = question, articles = list(article1$text, article2$text, article3$text)))
))
# Return the results
return (list(
article_title1 = article1$title,
article_title2 = article2$title,
article_title3 = article3$title,
article_text1 = article1$text,
article_text2 = article2$text,
article_text3 = article3$text,
facts=facts
));
}
# Add the article titles, article texts, and extracted facts to the dataset
rag_facts_df <-
rag_df %>%
map_table_rows(add_articles);
rag_facts_df %>%
format_table()
id | question | answer | query1 | query2 | query3 | article_title1 | article_title2 | article_title3 | article_text1 | article_text2 | article_text3 | facts |
---|---|---|---|---|---|---|---|---|---|---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | astable multivibrator circuits in music synthesizers | synthesizer tuning stability and temperature effects | effects of resistor values on synthesizer circuit performance | Feedback | Crystal oscillator | Thermistor | Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can… | A crystal oscillator is an electronic oscillator circuit that uses a piezoelectric crystal as a frequency-selective element. The oscillator frequency … | A thermistor is a semiconductor type of resistor whose resistance is strongly dependent on temperature, more so than in standard resistors. The word t… | A music synthesizer built with a chain of astable multivibrator circuits can detune after a few hours due to several factors. Even with fixed resistor… |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | equal-tempered musical scale intervals in an octave | octave structure in music theory | number of semitones in an octave | Equal temperament | Music and mathematics | Scale (music) | An equal temperament is a musical temperament or tuning system that approximates just intervals by dividing an octave (or other interval) into steps s… | Music theory analyzes the pitch, timing, and structure of music. It uses mathematics to study elements of music such as tempo, chord progression, form… | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In an equal-tempered musical scale, specifically the most common system known as 12-tone equal temperament (12 TET), an octave is divided into 12 equa… |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | combinatorial mathematics groups of four | combinations formula example | binomial coefficient calculation | Combinatorics | Combination | Binomial coefficient | Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties … | In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter (unlike p… | In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficie… | To determine how many different groups of 4 can be formed from 142 people, we use the concept of combinations in combinatorics. The number of ways to … |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | just intonation and pure fifths music theory | mathematical proof stacking fifths octave tuning | properties of just intonation and octave equivalence | Just intonation | List of guitar tunings | Semitone | In music, just intonation or pure intonation is the tuning of musical intervals as whole number ratios (such as 3:2 or 4:3) of frequencies. An interva… | This article contains a list of guitar tunings that supplements the article guitar tunings. In particular, this list contains more examples of open an… | A semitone, also called a minor second, half step, or a half tone, is the smallest musical interval commonly used in Western tonal music, and it is co… | In music theory, stacking a series of just intonation pure fifths (ratios of 3:2) will never yield a perfectly tuned octave (ratio of 2:1) or its mult… |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | 432Hz tuning benefits human well-being | comparison of 432Hz and 440Hz musical tuning | effects of musical tuning frequencies on health | NA | NA | Psychoacoustics | NA | NA | Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branc… | Tuning musical instruments to 432Hz is often claimed to provide benefits to human well-being, with proponents suggesting it is more “natural” than the… |
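The last row above shows that a query can come back without an article (`NA` title and text). A minimal sketch of one way to tidy the inputs before building the summarization prompt follows. It assumes that `search_wikipedia()` returns a list with `title` and `text` elements holding a single string or `NA`; `prepare_articles` and `max_chars` are hypothetical names that are not part of the original activity, and the sample text is made up.

```r
# Hypothetical helper (not part of the original activity): keep only the
# articles that were actually found and cap each text at `max_chars` characters.
prepare_articles <- function(articles, max_chars = 4000) {
  # Pull out the text of each article as a single string (NA if missing)
  texts <- vapply(
    articles,
    function(a) if (is.null(a$text)) NA_character_ else as.character(a$text),
    character(1)
  )
  # Drop missing or empty texts
  texts <- texts[!is.na(texts) & nzchar(texts)]
  # Truncate long articles so the prompt stays small
  unname(substr(texts, 1, max_chars))
}

# Made-up input that mimics the 432Hz row: two misses and one hit
articles <- list(
  list(title = NA, text = NA),
  list(title = NA, text = NA),
  list(title = "Psychoacoustics", text = "Psychoacoustics is the study of sound perception.")
)

usable <- prepare_articles(articles, max_chars = 20)
length(usable)  # number of articles that survived the filter
nchar(usable)   # length of each truncated text
```

Passing only the usable texts to `toJSON()` keeps `NA` placeholders out of the prompt, and the character cap is one simple way to limit token usage.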
Finally, we use these facts to answer the questions.
add_rag_answer <- function(question, facts, ...) {
prompt <- paste(
"
Your task is to provide a detailed response to the following question.
You will be provided with facts from Wikipedia articles that may (or may not!) be relevant to the question.
Read the question and then the facts, and then generate a response as a JSON object with the following fields:
- plan: A single string containing a step-by-step plan for solving the problem.
- details: A single string containing a detailed step-by-step solution to the problem.
- answer: A single string containing the final answer to the question.
Perform this task for the following question:
",
toJSON(list(question = question, facts = facts))
);
rag_response <- json_completion(prompt, max_tokens = 2000);
return (list(
plan = rag_response$plan,
details = rag_response$details,
predicted = rag_response$answer
));
};
rag_answers <-
rag_facts_df %>%
mutate_if(is.character, list(~na_if(.,""))) %>% # Convert empty strings to NA
map_table_rows(add_rag_answer);
rag_answers %>% format_table()
id | question | answer | query1 | query2 | query3 | article_title1 | article_title2 | article_title3 | article_text1 | article_text2 | article_text3 | facts | plan | details | predicted |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | astable multivibrator circuits in music synthesizers | synthesizer tuning stability and temperature effects | effects of resistor values on synthesizer circuit performance | Feedback | Crystal oscillator | Thermistor | Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can… | A crystal oscillator is an electronic oscillator circuit that uses a piezoelectric crystal as a frequency-selective element. The oscillator frequency … | A thermistor is a semiconductor type of resistor whose resistance is strongly dependent on temperature, more so than in standard resistors. The word t… | A music synthesizer built with a chain of astable multivibrator circuits can detune after a few hours due to several factors. Even with fixed resistor… | | To understand why a music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours, we need to consider several fact… | A music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours due to temperature changes affecting resistance and… |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | equal-tempered musical scale intervals in an octave | octave structure in music theory | number of semitones in an octave | Equal temperament | Music and mathematics | Scale (music) | An equal temperament is a musical temperament or tuning system that approximates just intervals by dividing an octave (or other interval) into steps s… | Music theory analyzes the pitch, timing, and structure of music. It uses mathematics to study elements of music such as tempo, chord progression, form… | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In an equal-tempered musical scale, specifically the most common system known as 12-tone equal temperament (12 TET), an octave is divided into 12 equa… | | The question asks about the number of intervals in an octave within an equal-tempered musical scale. The relevant fact states that in the 12-tone equa… | There are 12 intervals in an octave in an equal-tempered musical scale. |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | combinatorial mathematics groups of four | combinations formula example | binomial coefficient calculation | Combinatorics | Combination | Binomial coefficient | Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties … | In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter (unlike p… | In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficie… | To determine how many different groups of 4 can be formed from 142 people, we use the concept of combinations in combinatorics. The number of ways to … | | To find the number of different groups of 4 that can be formed from 142 people, we will use the combinations formula, which is given by C(n, k) = n! /… | 16234517 |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | just intonation and pure fifths music theory | mathematical proof stacking fifths octave tuning | properties of just intonation and octave equivalence | Just intonation | List of guitar tunings | Semitone | In music, just intonation or pure intonation is the tuning of musical intervals as whole number ratios (such as 3:2 or 4:3) of frequencies. An interva… | This article contains a list of guitar tunings that supplements the article guitar tunings. In particular, this list contains more examples of open an… | A semitone, also called a minor second, half step, or a half tone, is the smallest musical interval commonly used in Western tonal music, and it is co… | In music theory, stacking a series of just intonation pure fifths (ratios of 3:2) will never yield a perfectly tuned octave (ratio of 2:1) or its mult… | | | Stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples, as demonstrated by the lack of positi… |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | 432Hz tuning benefits human well-being | comparison of 432Hz and 440Hz musical tuning | effects of musical tuning frequencies on health | NA | NA | Psychoacoustics | NA | NA | Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branc… | Tuning musical instruments to 432Hz is often claimed to provide benefits to human well-being, with proponents suggesting it is more “natural” than the… | | First, we need to examine the claims made by proponents of 432Hz tuning, who argue that it is more natural and beneficial for human well-being compare… | There is no significant scientific evidence to support the claim that tuning musical instruments to 432Hz provides benefits to human well-being compar… |
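Some rows above have blank `plan` or `details` cells. One possible cause, which the output alone cannot confirm, is that the model returned a field as a JSON array or left it out instead of returning a single string. The sketch below shows one way to normalize such fields before storing them; `as_single_string` and `fake_response` are hypothetical names, and the response content is made up for illustration.

```r
# Hypothetical helper (not part of the original activity): turn whatever the
# model returned for a field into exactly one string, or NA if it is missing.
as_single_string <- function(x) {
  if (is.null(x) || length(x) == 0) return(NA_character_)
  # Flatten arrays or nested lists and join the pieces with line breaks
  paste(unlist(x), collapse = "\n")
}

# Made-up parsed response: `plan` is an array and `details` is absent
fake_response <- list(
  plan = list("Recall the ratio of a pure fifth.", "Compare powers of 3 and 2."),
  answer = "No stack of pure fifths equals a whole number of octaves."
)

plan    <- as_single_string(fake_response$plan)     # two steps joined by "\n"
details <- as_single_string(fake_response$details)  # NA, field was missing
answer  <- as_single_string(fake_response$answer)   # unchanged single string
```

Wrapping `rag_response$plan`, `rag_response$details`, and `rag_response$answer` in a helper like this would keep every column a plain character value, whatever shape the model chose.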
We then evaluate the predicted answers using the LLM-as-judge approach.
rag_answers %>%
map_table_rows(grade_predicted_answer) %>%
format_table()
id | question | answer | query1 | query2 | query3 | article_title1 | article_title2 | article_title3 | article_text1 | article_text2 | article_text3 | facts | plan | details | predicted | score | feedback |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | astable multivibrator circuits in music synthesizers | synthesizer tuning stability and temperature effects | effects of resistor values on synthesizer circuit performance | Feedback | Crystal oscillator | Thermistor | Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can… | A crystal oscillator is an electronic oscillator circuit that uses a piezoelectric crystal as a frequency-selective element. The oscillator frequency … | A thermistor is a semiconductor type of resistor whose resistance is strongly dependent on temperature, more so than in standard resistors. The word t… | A music synthesizer built with a chain of astable multivibrator circuits can detune after a few hours due to several factors. Even with fixed resistor… | | To understand why a music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours, we need to consider several fact… | A music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours due to temperature changes affecting resistance and… | 8 | The student’s answer provides a comprehensive explanation of the factors contributing to detuning in a music synthesizer, including power supply fluct… |
7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | equal-tempered musical scale intervals in an octave | octave structure in music theory | number of semitones in an octave | Equal temperament | Music and mathematics | Scale (music) | An equal temperament is a musical temperament or tuning system that approximates just intervals by dividing an octave (or other interval) into steps s… | Music theory analyzes the pitch, timing, and structure of music. It uses mathematics to study elements of music such as tempo, chord progression, form… | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In an equal-tempered musical scale, specifically the most common system known as 12-tone equal temperament (12 TET), an octave is divided into 12 equa… | | The question asks about the number of intervals in an octave within an equal-tempered musical scale. The relevant fact states that in the 12-tone equa… | There are 12 intervals in an octave in an equal-tempered musical scale. | 10 | The student’s answer is correct and accurately states that an octave in an equal-tempered musical scale is divided into 12 intervals. The phrasing is … |
7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | combinatorial mathematics groups of four | combinations formula example | binomial coefficient calculation | Combinatorics | Combination | Binomial coefficient | Combinatorics is an area of mathematics primarily concerned with counting, both as a means and as an end to obtaining results, and certain properties … | In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter (unlike p… | In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficie… | To determine how many different groups of 4 can be formed from 142 people, we use the concept of combinations in combinatorics. The number of ways to … | | To find the number of different groups of 4 that can be formed from 142 people, we will use the combinations formula, which is given by C(n, k) = n! /… | 16234517 | 7 | The student correctly explained the combination formula and applied it to the problem. However, the final calculation of the number of groups is incor… |
8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ \left(\frac{3}{2}\right)^m = \left(\frac{1}{2}\right)^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | just intonation and pure fifths music theory | mathematical proof stacking fifths octave tuning | properties of just intonation and octave equivalence | Just intonation | List of guitar tunings | Semitone | In music, just intonation or pure intonation is the tuning of musical intervals as whole number ratios (such as 3:2 or 4:3) of frequencies. An interva… | This article contains a list of guitar tunings that supplements the article guitar tunings. In particular, this list contains more examples of open an… | A semitone, also called a minor second, half step, or a half tone, is the smallest musical interval commonly used in Western tonal music, and it is co… | In music theory, stacking a series of just intonation pure fifths (ratios of 3:2) will never yield a perfectly tuned octave (ratio of 2:1) or its mult… | | | Stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples, as demonstrated by the lack of positi… | 7 | The student’s answer correctly identifies the mathematical reasoning behind the lack of solutions to the equation, but it could be clearer in its expl… |
14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | 432Hz tuning benefits human well-being | comparison of 432Hz and 440Hz musical tuning | effects of musical tuning frequencies on health | NA | NA | Psychoacoustics | NA | NA | Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branc… | Tuning musical instruments to 432Hz is often claimed to provide benefits to human well-being, with proponents suggesting it is more “natural” than the… | | First, we need to examine the claims made by proponents of 432Hz tuning, who argue that it is more natural and beneficial for human well-being compare… | There is no significant scientific evidence to support the claim that tuning musical instruments to 432Hz provides benefits to human well-being compar… | 9 | The student’s answer accurately addresses the question by highlighting the lack of scientific evidence supporting the benefits of 432Hz tuning over 44… |
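For questions with a single numeric answer, we do not have to rely on the judge alone. As a small worked check, the quartet question asks for the binomial coefficient C(142, 4) = 142! / (4! · 138!), which base R computes exactly with `choose()`. The variable names below are illustrative, and `predicted` is copied from the `predicted` column of the table above.

```r
# Exact number of 4-person groups that can be formed from 142 people
expected <- choose(142, 4)

# Value taken from the `predicted` column for question 7769
predicted <- 16234517

format(expected, big.mark = ",")  # exact answer, formatted for reading
predicted == expected             # does the LLM's number match?
predicted - expected              # size of the arithmetic error
```

The exact value is 16,234,505, so the predicted 16,234,517 is off by 12. This agrees with the judge's feedback that the method was right but the final calculation was not, and it illustrates why LLM arithmetic is worth verifying with code whenever the answer is computable.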
Through this activity, you’ve learned how to:
Understanding how to use LLMs for grading can help in assessing model performance and automating evaluation tasks.