Large Language Models (LLMs) like GPT-4 have revolutionized how we
communicate and understand information. In this activity, we’ll explore
how to leverage LLMs in R for Data Science using the openai
package.
We’ll start by experimenting with the OpenAI API for free-text and structured responses. Then, we’ll use LLMs to answer textual questions and to grade the answers.
Objectives:
Prerequisites
* The openai, httr, tidyverse, and jsonlite packages (the code below also uses furrr and kableExtra).
* Download the Reasoning 20k dataset and place the combined_reasoning.json file somewhere you can find it.
# Install (once) and load the required packages
# install.packages("openai");
# install.packages("httr");
# install.packages("jsonlite");
# install.packages("tidyverse");
# install.packages("furrr");
# install.packages("kableExtra")
library(openai);
library(httr);
##
## Attaching package: 'httr'
## The following object is masked from 'package:openai':
##
## upload_file
library(jsonlite);
library(tidyverse);
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ purrr::flatten() masks jsonlite::flatten()
## ✖ dplyr::lag() masks stats::lag()
## ✖ httr::upload_file() masks openai::upload_file()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(furrr); # for parallel map
## Loading required package: future
library(kableExtra);
##
## Attaching package: 'kableExtra'
##
## The following object is masked from 'package:dplyr':
##
## group_rows
Utility function to format tables nicely:
# Show the first 5 rows of a data frame as a styled table, truncating long text in character (string) columns
format_table <- function(df, max_length = 150) {
head_df <- data.frame(df %>% head(5))
# Function to truncate individual text entries
truncate_text <- function(text, max_length) {
text <- gsub("[\r\n]", "", text)
return (
ifelse(nchar(text) > max_length,
paste0(substr(text, 1, max_length), "..."),
text)
)
}
# Loop over all columns that are character type and apply truncation
for (col in colnames(head_df)) {
if (is.character(head_df[[col]])) {
head_df[[col]] <- sapply(head_df[[col]], truncate_text, max_length = max_length)
}
}
# Return the modified data frame
head_df %>%
kbl() %>%
kable_styling(bootstrap_options = c("striped", "hover", "condensed"), font_size=12)
}
# Set your OpenAI API key. Shilad or your instructor will give this to you.
Sys.setenv(OPENAI_API_KEY = "YOUR API KEY")
Our interaction with the LLM will be through the OpenAI Chat Completions API. This API allows us to interact with the LLM by providing prompts and receiving completions. As of 2024, this API is by far the most popular way to interact with LLMs. In this activity we will use this API to ask questions, generate text, and even grade responses.
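To make the API concrete, here is a sketch of the request body a Chat Completions call carries. The helper names `build_chat_body` and `send_chat_request` are hypothetical and are not part of the openai package. The endpoint URL and the `model`, `messages`, `temperature`, and `max_tokens` fields follow OpenAI's Chat Completions API. Only `build_chat_body` runs offline. `send_chat_request` needs httr, network access, and a valid `OPENAI_API_KEY`.

```r
library(jsonlite)

# Hypothetical helper: the request body for a single-message chat completion,
# as an R list. The prompt uses the "system" role, mirroring completion() below.
build_chat_body <- function(prompt, model = "gpt-4o-mini",
                            temperature = 0.1, max_tokens = 100) {
  list(
    model = model,
    messages = list(list(role = "system", content = prompt)),
    temperature = temperature,
    max_tokens = max_tokens
  )
}

# Hypothetical helper: POST the body to the Chat Completions endpoint with httr.
# Requires the httr package, network access, and the OPENAI_API_KEY variable.
send_chat_request <- function(prompt, ...) {
  resp <- httr::POST(
    "https://api.openai.com/v1/chat/completions",
    httr::add_headers(Authorization = paste("Bearer", Sys.getenv("OPENAI_API_KEY"))),
    body = build_chat_body(prompt, ...),
    encode = "json"
  )
  httr::stop_for_status(resp)
  # The generated text is in the first choice's message
  httr::content(resp)$choices[[1]]$message$content
}

# Print the JSON this body serializes to (no network call needed)
cat(toJSON(build_chat_body("Why is the sky blue?"), auto_unbox = TRUE, pretty = TRUE))
```

The openai package hides these details. The sketch only shows what travels over the wire: a model name, a list of role/content messages, and sampling settings.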
To begin, we’ll try a simple question answering task using the
GPT-4o-mini model. We encapsulate the question answering
logic in a function called completion, which sends a prompt
to the LLM and returns its answer.
completion <- function(prompt, max_tokens = 100) {
# Get the response from the LLM
response <- openai::create_chat_completion(
model = "gpt-4o-mini",
messages = list(list(role = "system", content = prompt)),
temperature = 0.1,
max_tokens = max_tokens
)
# Return the response
return (response$choices$message.content);
}
Experiment with the function by asking a simple question as shown in the example below. Change the question to several that you are interested in or have expertise in. How does it perform? What does it do well? What does it do poorly? Put your example questions and analysis in the code below.
# Ask a few questions. In comments answer: what does it do well? What does it do poorly?
completion("Why is the sky blue?", 200);
## [1] "The sky appears blue primarily due to a phenomenon called Rayleigh scattering. When sunlight enters the Earth's atmosphere, it is made up of different colors, each with varying wavelengths. Blue light has a shorter wavelength compared to other colors like red or yellow.\n\nAs sunlight passes through the atmosphere, it collides with air molecules and small particles. Because blue light is scattered in all directions more than other colors due to its shorter wavelength, we see a predominance of blue when we look up at the sky.\n\nDuring sunrise and sunset, the sky can appear red or orange because the sunlight has to pass through a thicker layer of the atmosphere. This longer path scatters the shorter blue wavelengths out of our line of sight, allowing the longer red wavelengths to dominate."
For programmatic responses, it’s often helpful to have the LLM return structured responses (here, JSON) from which we can extract a variety of different types of information.
json_completion <- function(prompt, max_tokens = 200) {
# Get the response from the LLM
response <- openai::create_chat_completion(
model = "gpt-4o-mini",
messages = list(list(role = "system", content = prompt)),
temperature = 0.0,
max_tokens = max_tokens
)
json <- response$choices$message.content;
# Shilad: This is a hack to unwrap occasional gpt-4o-mini responses that wrap the JSON in a ```json ... ``` fence.
# Ideally we would request response_format = {"type": "json_object"} to avoid this, but the R openai wrapper does not support it.
# The "json" tag is optional so that untagged ``` fences are unwrapped too.
pattern <- regex("```(?:json)?(.*?)```", dotall = TRUE);
if (str_detect(json, pattern)) {
json <- str_match(json, pattern)[, 2]; # Extract the matched JSON content
}
return (fromJSON(json));
}
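The fence-stripping step can also be written in base R, which makes it easy to test without an API key. This is a sketch of the same idea. It assumes only that a fenced response looks like ```` ```json ... ``` ```` or an untagged ```` ``` ... ``` ````. The name `strip_json_fence` is hypothetical.

```r
# Hypothetical base-R helper: return the JSON inside a ``` or ```json fence,
# or return the text unchanged when there is no fence.
strip_json_fence <- function(text) {
  # (?s) lets . match newlines; the "json" tag after the opening fence is optional
  pattern <- "(?s)```(?:json)?\\s*(.*?)\\s*```"
  match <- regmatches(text, regexec(pattern, text, perl = TRUE))[[1]]
  # A successful match yields the full match plus one capture group
  if (length(match) == 2) match[2] else text
}

fenced <- "```json\n{\"subject\": \"Budget Discussion\"}\n```"
cat(strip_json_fence(fenced))
```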
Below is an example of calling this structured completion.
The return value is a list with fields such as
response$attendees, response$when, and so on.
response <- json_completion("
Your task is to extract a structured calendar invite by analyzing a short text.
The return value should be a JSON object with the following fields:
- attendees: A list of strings with attendee names.
- when: The starting date and time for the calendar event
- subject: Short description of the event
- description: Detailed few-sentence description of the event.
Perform this task for the following text:
Matthew and Ellen should meet Sunday at 4pm to discuss the future of the budget.
")
response
## $attendees
## [1] "Matthew" "Ellen"
##
## $when
## [1] "2023-10-29T16:00:00"
##
## $subject
## [1] "Budget Discussion"
##
## $description
## [1] "Matthew and Ellen will meet to discuss the future of the budget. This meeting aims to address key financial strategies and planning for upcoming projects."
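Structured output is most useful for data science once its fields are converted to proper R types. As a small sketch, the `when` string above can be parsed with base R's `as.POSIXct`. The input text only said "Sunday" and gave no date, so the model picked 2023-10-29 on its own. A simple check like the one below is one way to catch such guesses. The UTC time zone is an assumption, because the model returned none.

```r
# Parse the ISO 8601 timestamp returned in response$when (value copied from the output above).
# UTC is assumed because the model did not include a time zone.
when <- as.POSIXct("2023-10-29T16:00:00", format = "%Y-%m-%dT%H:%M:%S", tz = "UTC")

# The source text only said "Sunday"; check that the chosen date really is one.
# POSIXlt stores the weekday in $wday, where 0 means Sunday (independent of locale).
is_sunday <- as.POSIXlt(when)$wday == 0

format(when, "%H:%M")
is_sunday
```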
# Create your own example task (not a calendar invite) that produces structured output
# using the example above. Push the LLM with a hard example. Does it get it correct?
response <- json_completion("
Your task is to ....
")
response
# In your comments, reflect on: How might you use this for Data Science purposes?
In this assignment we will answer questions from the Reasoning 20k dataset, which contains a set of challenging reasoning and factual questions along with their answers. We will load the dataset, select some example questions, and then ask the LLM to answer them. We chose this dataset because it was created in October 2024, which makes it unlikely that the LLM could “cheat” by having trained on this data.
Download the JSON dataset to your computer and read it into a
variable named reasoning_20k_df using code similar to the
following:
reasoning_20k_df <-
as.data.frame(fromJSON("~/Downloads/combined_reasoning.json")) %>%
select(user, assistant) %>%
rename(question = user, answer = assistant) %>%
mutate(id = row_number()) %>%
relocate(id, .before = question);
format_table(reasoning_20k_df)
| id | question | answer |
|---|---|---|
| 1 | Prove that the difference between two consecutive cubes cannot be divisible by 5, using the fact that the only possible remainders when a cube is divi… | Let the two consecutive cubes be \(n^3\) and \((n+1)^3\). Their difference is:\[(n+1)^3 - n^3 = 3n^2 + 3n + 1.\]When \(n^3\) is divided by 5, the possible r… |
| 2 | How can I integrate the function \(\arcsin(\sqrt{x+1}-\sqrt{x})\)? Is there an easier way than using the formula $f(x),dx=x f(x)-_{… | |
| 3 | Given the expression \(\frac{x^3+x+1}{(x^2+1)^2}\), decompose it into partial fractions. | The decomposition of \(\frac{x^3+x+1}{(x^2+1)^2}\) can be directly observed as \(\dfrac{x}{x^2+1}+\dfrac{1}{(x^2+1)^2}\). This is because \(x^3+x\) can be f… |
| 4 | Is it true that a man named Mûrasi from India is 179 years old, as claimed by certain sources?Sources:- eface India- News Origin- World News Daily Rep… | No, this claim is not accurate. The source of this information, the World News Daily Report, is known to publish fake news. They claim that Mûrasi was… |
| 5 | Find an example of a linear operator whose norm is not equal to the norm of its inverse. | Consider the linear operator T from \((\mathbb{R}^2, \|\cdot\|_{sup})\) to \((\mathbb{R}^2, \|\cdot\|_1)\) defined by \(T(x,y) = (y,x)\). The norm of T is 1… |
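To check the loading steps before working with the full file, you can run them on a tiny inline sample. The two records below are made up. The only assumption about the real file is the one the code above already makes: each record has `user` and `assistant` fields (the real file may contain more).

```r
library(jsonlite)
library(dplyr)

# Made-up two-record sample with the user/assistant fields the loading code selects
sample_json <- '[
  {"user": "What is 2 + 2?", "assistant": "2 + 2 = 4."},
  {"user": "Name a prime number greater than 10.", "assistant": "11 is prime."}
]'

# Same pipeline as above: keep the two fields, rename them, and add a leading id column
sample_df <-
  as.data.frame(fromJSON(sample_json)) %>%
  select(user, assistant) %>%
  rename(question = user, answer = assistant) %>%
  mutate(id = row_number()) %>%
  relocate(id, .before = question)

sample_df
```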
Now that we have the dataset loaded, pick some interesting questions
to ask the LLM. Open the dataset in the built-in RStudio data viewer
(e.g. with View(reasoning_20k_df)) and use its search field to find 5-10 questions
that interest you. Write down their ids and create a data frame called
interesting_df that contains just those questions.
# These are questions related to music theory that interest Shilad.
# Pick ones that interest you. Locate them using the search box in the RStudio data viewer.
question_ids <- c(8248, 14377, 7769, 7311, 2568);
interesting_df <- reasoning_20k_df %>% filter(id %in% question_ids);
format_table(interesting_df)
| id | question | answer |
|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ ()^m = ()^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … |
Carefully read through the questions and answers, and make sure they are accurate and make sense!
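As a scriptable alternative to the viewer's search box, you can filter by keyword with dplyr and stringr. The helper name `find_questions` and the toy data below are hypothetical. On the real data, `find_questions(reasoning_20k_df, "music")` would list candidate rows whose ids you could copy into `question_ids`.

```r
library(dplyr)
library(stringr)

# Hypothetical helper: keep rows whose question mentions a keyword (case-insensitive)
find_questions <- function(df, keyword) {
  df %>% filter(str_detect(question, regex(keyword, ignore_case = TRUE)))
}

# Made-up toy data for illustration
toy_df <- data.frame(
  id = 1:3,
  question = c(
    "In music theory, what is an octave?",
    "Prove that 2 is prime.",
    "Why is 432Hz tuning popular in Music circles?"
  )
)

find_questions(toy_df, "music")
```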
Throughout this activity we are going to use a helper function called
map_table_rows that applies a function to each row of a
dataframe and returns a new dataframe with the original columns and the
new columns. This function is useful for applying the LLM to each row of
a dataset.
We use the future_pmap function from the furrr package, a parallel
version of purrr's pmap, to speed up the process.
map_table_rows <- function(df, mapping_function) {
result <- future_pmap(df, mapping_function);
return (cbind(df, bind_rows(result)));
}
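Because `future_pmap` uses the same calling convention as `purrr::pmap`, a sequential version of the helper is a one-word change. It can be handy while debugging a mapping function, since it runs in the current R session and tools like `browser()` work inside the mapper. The name `map_table_rows_sequential` is hypothetical.

```r
library(purrr)
library(dplyr)

# Hypothetical sequential counterpart of map_table_rows: each row's columns are
# passed to mapping_function as named arguments, and the returned lists are
# bound into new columns next to the original ones.
map_table_rows_sequential <- function(df, mapping_function) {
  result <- pmap(df, mapping_function)
  cbind(df, bind_rows(result))
}

df <- data.frame(x = c(1, 2, 3, 4), y = c(5, 6, 7, 8), z = c(9, 10, 11, 12))
# The ... absorbs columns (here z) that the mapper does not use
example_mapper <- function(x, y, ...) list(sum = x + y, product = x * y)

map_table_rows_sequential(df, example_mapper)
```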
Below is an example of how to use the map_table_rows
function. Take a look at the code and add a comment explaining what it
is doing.
Add a second example that adds the square of x + y to the dataframe.
# Add a comment below indicating exactly what is happening
df <- data.frame(
x = c(1, 2, 3, 4),
y = c(5, 6, 7, 8),
z = c(9, 10, 11, 12)
);
example_mapper <- function(x, y, ...) {
return (list(
sum = x + y,
product = x * y
))
}
df %>% map_table_rows(example_mapper)
## x y z sum product
## 1 1 5 9 6 5
## 2 2 6 10 8 12
## 3 3 7 11 10 21
## 4 4 8 12 12 32
# Add a second example that adds the square of x + y to the dataframe.
To speed up this mapping, we ask furrr to work on up
to 10 rows in parallel. This means we execute up to 10 LLM
calls at the same time.
plan(multisession, workers = 10)
## Warning in checkNumberOfLocalWorkers(workers): Careful, you are setting up 10
## localhost parallel workers with only 8 CPU cores available for this R process
## (per 'system'), which could result in a 125% load. The soft limit is set to
## 100%. Overusing the CPUs has negative impact on the current R process, but also
## on all other processes of yours and others running on the same machine. See
## help("parallelly.options", package = "parallelly") for how to override the soft
## and hard limits
When you run this, depending on how many cores your laptop has, you may see a warning message indicating that this setting may saturate your CPU. Why is this unlikely when we use the parallel functions to interact with the OpenAI API?
Create a new dataset called simple_answers that takes
interesting_df and uses the map_table_rows helper we just
created to add a predicted column with the generated
answer.
# Complete your implementation of the function below.
# It should be a one-liner that calls the `completion` function.
# The name of the new column MUST be `predicted`
add_simple_answer <- function(question, ...) {
return (list(predicted = completion(question)));
}
simple_answers <-
interesting_df %>%
map_table_rows(add_simple_answer);
format_table(simple_answers)
| id | question | answer | predicted |
|---|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | A music synthesizer built with a chain of astable multivibrator circuits can experience detuning over time for several reasons, even when using fixed … |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | In an equal-tempered musical scale, there are 12 intervals in an octave. These intervals are typically referred to as semitones or half steps. Each se… |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | To find the number of different groups of 4 that can be formed from 142 people, we can use the combination formula, which is given by:[C(n, r) = … |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ ()^m = ()^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | To prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples, we need to analyze the … |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | The debate over tuning musical instruments to 432Hz versus the standard 440Hz is a topic of interest among musicians, sound healers, and some wellness… |
Take a look at the predicted answers and compare them to the actual answers. What do you notice?
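For questions with a computable answer, R itself can serve as ground truth. The counting question (id 7769) asks for the number of 4-person groups that can be formed from 142 people. That is the binomial coefficient C(142, 4) = 142·141·140·139 / 4!, and base R's `choose()` computes it directly.

```r
# Exact answer to question 7769: ways to choose 4 people out of 142
n_groups <- choose(142, 4)

# Same value from the formula n! / (r! (n - r)!) written as a falling product
by_formula <- 142 * 141 * 140 * 139 / factorial(4)

format(n_groups, big.mark = ",")
```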
To evaluate the quality of responses, we would traditionally have human experts grade them. LLMs offer a new approach: using an LLM to label the dataset. The LLM-as-Judge paradigm uses an LLM to evaluate responses, providing feedback and a score based on correctness and completeness. This lets us assess the performance of an LLM (or another model) over a set of questions.
Think to yourself: What are the costs and benefits of using LLM-as-judge vs humans? When may it make sense to use one vs. the other?
Below is a function that grades a single predicted answer against the reference answer. Combined with map_table_rows, it grades every row of a data frame:
grade_predicted_answer <- function(question, answer, predicted, ...) {
prompt <- paste(
"
You are an expert grader.
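Evaluate the student's answer to the following question.
The input below is a JSON object with the question, a reference answer (answer), and the student's answer (predicted).
Provide your response as a JSON object with the following attributes: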
- feedback: A brief summary of feedback on the correctness of the student's answer.
- score: A score out of 10 based on the quality of the student's response.
Perform this task for the following question:
",
toJSON(
list(question = question, answer = answer, predicted = predicted),
auto_unbox = TRUE
)
);
grading_response <- json_completion(prompt);
return (list(
score = grading_response$score,
feedback = grading_response$feedback
));
};
Use the function above along with map_table_rows to
assign grades to each predicted answer.
# Use map_table_rows and grade_predicted_answer to assign grades to simple_answers
simple_answers %>%
map_table_rows(grade_predicted_answer) %>%
format_table()
| id | question | answer | predicted | score | feedback |
|---|---|---|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | A music synthesizer built with a chain of astable multivibrator circuits can experience detuning over time for several reasons, even when using fixed … | 8 | The student’s answer provides a comprehensive explanation of the factors contributing to detuning in a music synthesizer built with astable multivibra… |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | In an equal-tempered musical scale, there are 12 intervals in an octave. These intervals are typically referred to as semitones or half steps. Each se… | 8 | The student’s answer is correct in stating that an octave in an equal-tempered musical scale is divided into 12 intervals. However, it could be improv… |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | To find the number of different groups of 4 that can be formed from 142 people, we can use the combination formula, which is given by:[C(n, r) = … | 8 | The student correctly identifies the use of the combination formula and provides a clear explanation of the calculation process. However, the final an… |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ ()^m = ()^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | To prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples, we need to analyze the … | 6 | The student’s answer correctly identifies the equation and attempts to prove that there are no solutions for positive integers m and n. However, the e… |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | The debate over tuning musical instruments to 432Hz versus the standard 440Hz is a topic of interest among musicians, sound healers, and some wellness… | 8 | The student’s answer effectively addresses the question by highlighting the lack of scientific evidence supporting the benefits of 432Hz tuning over 4… |
# Look at the scores and feedback. Does it make sense to you?
Chain of Thought is a prompting technique where you ask the LLM to work through the problem in structured steps. This can be useful for generating more detailed answers or exploring a topic in depth.
Apply the pattern you see in the grading function to create a new
data frame called cot_answers (for “Chain of Thought”).
* Use the json_completion function to generate a more detailed response to each question. You will need to raise the second argument (max_tokens) from its default of 200 to something higher (e.g. 1000).
* Ask the LLM to produce a JSON object with the following fields:
  - plan: A step-by-step plan for solving the problem.
  - details: A detailed step-by-step solution to the problem.
  - answer: The final answer to the question.
* Note that the order of the fields in the JSON object is important. You must force the LLM to generate output in the order it “thinks.” Because the model writes its response token by token, putting plan and details before answer means the final answer is generated after, and conditioned on, the reasoning.
add_cot_answer <- function(question, ...) {
prompt <- paste(
"
Your task is to provide a detailed response to the following question.
Generate a JSON object with the following fields:
- plan: A single string containing step-by-step plan for solving the problem.
- details: A single string containing detailed step-by-step solution to the problem.
- answer: The single string containing final answer to the question.
Perform this task for the following question:
",
question
);
cot_response <- json_completion(prompt, max_tokens = 1000);
return (list(
plan = cot_response$plan,
details = cot_response$details,
predicted = cot_response$answer
));
};
cot_answers <-
interesting_df %>%
map_table_rows(add_cot_answer);
format_table(cot_answers)
| id | question | answer | plan | details | predicted |
|---|---|---|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | | | A music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours due to temperature variations affecting resistor an… |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | | | 12 |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | | | 16242880 |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ ()^m = ()^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | | | Stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | | | There is no substantial scientific evidence to support that tuning musical instruments to 432Hz provides significant benefits to human well-being comp… |
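In the table above, the plan and details columns render empty. One plausible cause (not confirmed here) is that the model returned these fields as JSON arrays or nested objects rather than single strings, despite the prompt. A defensive option is to normalize each field before returning it from add_cot_answer, e.g. `plan = as_single_string(cot_response$plan)`. The helper `as_single_string` is a hypothetical sketch.

```r
# Hypothetical helper: coerce a parsed JSON field that may be a string, an array,
# a nested list, or missing into a single string (NA when missing).
as_single_string <- function(x) {
  if (is.null(x) || length(x) == 0) {
    return(NA_character_)
  }
  # unlist flattens nested lists; paste joins the pieces with spaces
  paste(unlist(x), collapse = " ")
}

as_single_string("Use the combination formula.")
as_single_string(c("Step 1: identify n and r.", "Step 2: apply C(n, r)."))
as_single_string(NULL)
```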
Finally, use the same procedure you did earlier to grade the new responses. Do you see any interesting differences?
cot_answers %>%
map_table_rows(grade_predicted_answer) %>%
format_table()
| id | question | answer | plan | details | predicted | score | feedback |
|---|---|---|---|---|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | | | A music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours due to temperature variations affecting resistor an… | 8 | The student’s answer provides a comprehensive explanation of the factors contributing to detuning in a music synthesizer, including power supply fluct… |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | | | 12 | 10 | The student’s answer is correct and accurately states that there are 12 intervals in an octave in an equal-tempered musical scale. |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | | | 16242880 | 7 | The student correctly explained the combination formula and applied it to the problem, but the final calculation of the number of groups is incorrect…. |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ ()^m = ()^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | | | Stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. | 7 | The student’s answer correctly identifies the equation to be proven and provides a logical argument using the Fundamental Theorem of Arithmetic. Howev… |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | | | There is no substantial scientific evidence to support that tuning musical instruments to 432Hz provides significant benefits to human well-being comp… | 8 | The student’s answer accurately addresses the question by highlighting the lack of scientific evidence supporting the benefits of 432Hz tuning over 44… |
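To compare the two prompting strategies question by question, you can join the graded tables by id and compute the score change. The scores below are copied from the two graded tables in this activity. In your own code you would join the graded data frames directly, keeping only id and score. With five questions and a single grading run, differences this small may well be noise.

```r
library(dplyr)

# Scores copied from the graded tables: simple prompt vs. chain of thought
simple_scores <- data.frame(id = c(2568, 7311, 7769, 8248, 14377), score = c(8, 8, 8, 6, 8))
cot_scores <- data.frame(id = c(2568, 7311, 7769, 8248, 14377), score = c(8, 10, 7, 7, 8))

# Pair the scores for each question and compute the per-question change
comparison <- simple_scores %>%
  inner_join(cot_scores, by = "id", suffix = c("_simple", "_cot")) %>%
  mutate(delta = score_cot - score_simple)

comparison
comparison %>% summarise(mean_simple = mean(score_simple), mean_cot = mean(score_cot))
```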
Retrieval-Augmented Generation (RAG) is a technique where we first retrieve documents relevant to a question (here, Wikipedia articles) and then include information from them in the prompt, so the LLM can ground its answer in that text. This can be useful when the answer depends on facts the model may not know or may misremember.
To perform RAG, we will ask the LLM to first generate search queries based on the questions.
# Step 1: Get search keywords
add_keywords <- function(question, ...) {
# Ask the LLM for three Wikipedia search queries related to the question
prompt <- paste(
"
Your task is to identify three diverse, detailed search queries to gather information from Wikipedia about the following question.
The queries are going to be run through Wikipedia's internal search engine to retrieve relevant entire articles.
So make sure they are in the \"goldilocks\" zone of specificity.
They should be specific enough to identify the best related article, but not more specific than that.
Provide your response as a JSON object with the following attributes:
- query1: A string containing the first search query
- query2: A string containing the second search query
- query3: A string containing the third search query
Perform this task for the following question:
",
question
);
query_response <- json_completion(prompt, max_tokens = 500);
return (list(
query1 = query_response$query1,
query2 = query_response$query2,
query3 = query_response$query3
));
}
# Add keywords to the dataset
rag_df <-
interesting_df %>%
map_table_rows(add_keywords);
rag_df %>% format_table()
| id | question | answer | query1 | query2 | query3 |
|---|---|---|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | astable multivibrator circuits in music synthesizers | synthesizer tuning stability and temperature effects | effects of component aging on electronic circuits |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | equal-tempered musical scale intervals in an octave | octave definition in music theory | music intervals and scales explanation |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | combinatorial mathematics groups of four | combinations formula example | binomial coefficient calculation |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ ()^m = ()^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | just intonation and pure fifths music theory | mathematical proof stacking fifths octave tuning | properties of just intonation and octave equivalence |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | 432Hz tuning benefits human well-being | comparison of 432Hz and 440Hz musical tuning | effects of musical tuning frequencies on health |
Next, we use these search queries to fetch the article text from Wikipedia. After doing so, we ask the LLM to summarize the salient facts from the articles that are relevant to the question.
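The add_articles function below relies on a search_wikipedia helper that is not defined in this part of the activity. It is expected to return a list with `title` and `text` fields. If you need to write one, here is a hypothetical sketch built on the public MediaWiki Action API (`api.php`). It uses a search generator limited to the top hit, plus the `extracts` property with `explaintext` to get plain-text article bodies. The parsing step is split out so it can be tested without network access. Both function bodies are assumptions, not the activity's own helper.

```r
# Hypothetical parser: return the title and plain-text extract of the first page
# in a MediaWiki Action API response requested with formatversion=2.
parse_wikipedia_search <- function(json_text) {
  parsed <- jsonlite::fromJSON(json_text, simplifyVector = FALSE)
  pages <- parsed$query$pages
  if (length(pages) == 0) {
    # No search results: return an empty article instead of failing
    return(list(title = NA_character_, text = ""))
  }
  page <- pages[[1]]
  text <- if (is.null(page$extract)) "" else page$extract
  list(title = page$title, text = text)
}

# Hypothetical search_wikipedia: fetch the top English Wikipedia search hit for a
# query as plain text. Requires the httr package and network access.
search_wikipedia <- function(query) {
  resp <- httr::GET(
    "https://en.wikipedia.org/w/api.php",
    query = list(
      action = "query", format = "json", formatversion = 2,
      generator = "search", gsrsearch = query, gsrlimit = 1,
      prop = "extracts", explaintext = 1
    )
  )
  httr::stop_for_status(resp)
  parse_wikipedia_search(httr::content(resp, as = "text", encoding = "UTF-8"))
}
```

Whole articles can be long. If prompts grow too large, truncating each `text` (e.g. with `substr`) before summarizing is one option.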
# Step 2: Get article text for each query
add_articles <- function(question, query1, query2, query3, ...) {
# Get the article text for each query
article1 <- search_wikipedia(query1)
article2 <- search_wikipedia(query2)
article3 <- search_wikipedia(query3)
# Use the LLM to extract key facts the LLM should focus on.
facts <- completion(paste(
"You are an expert synthesizer. Your job is to extract the most salient facts from three different articles related to a question.",
"Write three to five sentences capturing the most important facts needed to answer the question.",
toJSON(list(question = question, articles = list(article1$text, article2$text, article3$text)))
), max_tokens = 300) # raised from the default of 100 to leave room for three to five sentences
# Return the results
return (list(
article_title1 = article1$title,
article_title2 = article2$title,
article_title3 = article3$title,
article_text1 = article1$text,
article_text2 = article2$text,
article_text3 = article3$text,
facts=facts
));
}
# Add article titles, article text, and extracted facts to the dataset
rag_facts_df <-
rag_df %>%
map_table_rows(add_articles);
rag_facts_df %>%
format_table()
| id | question | answer | query1 | query2 | query3 | article_title1 | article_title2 | article_title3 | article_text1 | article_text2 | article_text3 | facts |
|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | astable multivibrator circuits in music synthesizers | synthesizer tuning stability and temperature effects | effects of component aging on electronic circuits | Feedback | Crystal oscillator | Digital electronics | Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can… | A crystal oscillator is an electronic oscillator circuit that uses a piezoelectric crystal as a frequency-selective element. The oscillator frequency … | Digital electronics is a field of electronics involving the study of digital signals and the engineering of devices that use or produce them. This is … | A music synthesizer built with a chain of astable multivibrator circuits can detune after a few hours due to several factors, even when using fixed re… |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | equal-tempered musical scale intervals in an octave | octave definition in music theory | music intervals and scales explanation | Equal temperament | Scale (music) | Scale (music) | An equal temperament is a musical temperament or tuning system that approximates just intervals by dividing an octave (or other interval) into steps s… | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In an equal-tempered musical scale, specifically the most common system known as 12-tone equal temperament (12 TET), there are 12 intervals in an octa… |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | combinatorial mathematics groups of four | combinations formula example | binomial coefficient calculation | Discrete mathematics | Combination | Binomial coefficient | Discrete mathematics is the study of mathematical structures that can be considered “discrete” (in a way analogous to discrete variables, having a bij… | In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter (unlike p… | In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficie… | To determine how many different groups of 4 can be formed from 142 people participating in a musical event called extreme quarteting, we can use the c… |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ (\frac{3}{2})^m = (\frac{1}{2})^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | just intonation and pure fifths music theory | mathematical proof stacking fifths octave tuning | properties of just intonation and octave equivalence | Just intonation | List of guitar tunings | Semitone | In music, just intonation or pure intonation is the tuning of musical intervals as whole number ratios (such as 3:2 or 4:3) of frequencies. An interva… | This article contains a list of guitar tunings that supplements the article guitar tunings. In particular, this list contains more examples of open an… | A semitone, also called a minor second, half step, or a half tone, is the smallest musical interval commonly used in Western tonal music, and it is co… | In music theory, stacking a series of just intonation pure fifths (ratios of 3:2) will not yield a perfectly tuned octave (ratio of 2:1) or its multip… |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | 432Hz tuning benefits human well-being | comparison of 432Hz and 440Hz musical tuning | effects of musical tuning frequencies on health | NA | NA | Psychoacoustics | NA | NA | Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branc… | Tuning musical instruments to 432Hz is often claimed to provide benefits to human well-being and is considered by some to be more “natural” than the s… |
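The table shows two weak spots in this step: for row 7311 two queries returned the same article (Scale (music)), so its text reaches the summarizer twice, and for row 14377 two searches returned nothing (NA), yet all three slots are still sent in the prompt. Below is a minimal sketch of a hypothetical helper, collect_article_texts(), that drops missing and duplicate articles before the prompt is built. It assumes search_wikipedia() returns a list with title and text fields, as add_articles() uses it, and that a failed search yields NA or NULL rather than an error.

```r
# Hypothetical helper (not part of the original tutorial): keep only article
# texts that are present, non-empty, and not duplicated.
collect_article_texts <- function(...) {
  articles <- list(...)
  texts <- vapply(articles, function(a) {
    # Assumption: a failed search gives NULL, NA, or a list whose text is NA
    if (!is.list(a) || is.null(a$text) || length(a$text) == 0 || is.na(a$text[1])) {
      return(NA_character_)
    }
    as.character(a$text[1])
  }, character(1))
  # Drop missing or empty texts, then remove exact duplicates
  texts <- texts[!is.na(texts) & nzchar(texts)]
  unique(texts)
}
```

Inside add_articles(), the prompt could then pass articles = collect_article_texts(article1, article2, article3) instead of the fixed three-element list, so the summarizer only sees distinct, non-missing articles.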
Finally, we use these facts to answer the questions.
add_rag_answer <- function(question, facts, ...) {
prompt <- paste(
"
Your task is to provide a detailed response to the following question.
You will be provided with facts from Wikipedia articles that may (or may not!) be relevant to the question.
Read the question and the facts, then respond with a JSON object containing the following fields:
- plan: A single string containing a step-by-step plan for solving the problem.
- details: A single string containing a detailed step-by-step solution to the problem.
- answer: A single string containing the final answer to the question.
Perform this task for the following question:
",
toJSON(list(question = question, facts = facts))
);
rag_response <- json_completion(prompt, max_tokens = 2000);
return (list(
plan = rag_response$plan,
details = rag_response$details,
predicted = rag_response$answer
));
};
rag_answers <-
rag_facts_df %>%
mutate(across(where(is.character), ~ na_if(.x, ""))) %>% # Convert empty strings to NA
map_table_rows(add_rag_answer);
rag_answers %>% format_table()
| id | question | answer | query1 | query2 | query3 | article_title1 | article_title2 | article_title3 | article_text1 | article_text2 | article_text3 | facts | plan | details | predicted |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | astable multivibrator circuits in music synthesizers | synthesizer tuning stability and temperature effects | effects of component aging on electronic circuits | Feedback | Crystal oscillator | Digital electronics | Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can… | A crystal oscillator is an electronic oscillator circuit that uses a piezoelectric crystal as a frequency-selective element. The oscillator frequency … | Digital electronics is a field of electronics involving the study of digital signals and the engineering of devices that use or produce them. This is … | A music synthesizer built with a chain of astable multivibrator circuits can detune after a few hours due to several factors, even when using fixed re… | | To understand why a music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours, we first need to identify the co… | A music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours due to component instability from temperature chang… |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | equal-tempered musical scale intervals in an octave | octave definition in music theory | music intervals and scales explanation | Equal temperament | Scale (music) | Scale (music) | An equal temperament is a musical temperament or tuning system that approximates just intervals by dividing an octave (or other interval) into steps s… | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In an equal-tempered musical scale, specifically the most common system known as 12-tone equal temperament (12 TET), there are 12 intervals in an octa… | | The question asks about the number of intervals in an octave within an equal-tempered musical scale. The relevant fact states that in the 12-tone equa… | There are 12 intervals in an octave in an equal-tempered musical scale. |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | combinatorial mathematics groups of four | combinations formula example | binomial coefficient calculation | Discrete mathematics | Combination | Binomial coefficient | Discrete mathematics is the study of mathematical structures that can be considered “discrete” (in a way analogous to discrete variables, having a bij… | In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter (unlike p… | In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficie… | To determine how many different groups of 4 can be formed from 142 people participating in a musical event called extreme quarteting, we can use the c… | | To find the number of different groups of 4 that can be formed from 142 people, we will use the combinations formula C(n, k) = n! / (k! * (n - k)!), w… | 16515035 |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ (\frac{3}{2})^m = (\frac{1}{2})^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | just intonation and pure fifths music theory | mathematical proof stacking fifths octave tuning | properties of just intonation and octave equivalence | Just intonation | List of guitar tunings | Semitone | In music, just intonation or pure intonation is the tuning of musical intervals as whole number ratios (such as 3:2 or 4:3) of frequencies. An interva… | This article contains a list of guitar tunings that supplements the article guitar tunings. In particular, this list contains more examples of open an… | A semitone, also called a minor second, half step, or a half tone, is the smallest musical interval commonly used in Western tonal music, and it is co… | In music theory, stacking a series of just intonation pure fifths (ratios of 3:2) will not yield a perfectly tuned octave (ratio of 2:1) or its multip… | | | Stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples, as the equation (3/2)^m = (1/2)^n has… |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | 432Hz tuning benefits human well-being | comparison of 432Hz and 440Hz musical tuning | effects of musical tuning frequencies on health | NA | NA | Psychoacoustics | NA | NA | Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branc… | Tuning musical instruments to 432Hz is often claimed to provide benefits to human well-being and is considered by some to be more “natural” than the s… | | First, we need to examine the claims surrounding 432Hz tuning, which is often said to be more harmonious with nature and beneficial for human well-bei… | There is no significant scientific evidence that tuning musical instruments to 432Hz provides benefits to human well-being compared to the standard 44… |
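The prompt asks for each field as a single string, but nothing forces the model to comply: a field may be missing, or a plan may come back as a JSON array of steps, which arrives in R as a vector or list rather than one string. Below is a minimal sketch of a hypothetical normalizer, as_single_string(), that could wrap each field returned in add_rag_answer(). It assumes json_completion() returns the parsed JSON as an R list, as its use above suggests.

```r
# Hypothetical helper (not part of the original tutorial): collapse a parsed
# JSON field into one string, or NA if nothing usable came back.
as_single_string <- function(x, sep = "\n") {
  values <- unlist(x, use.names = FALSE)  # flatten arrays/lists of steps
  values <- values[!is.na(values)]        # drop missing entries
  if (length(values) == 0) {
    return(NA_character_)
  }
  paste(as.character(values), collapse = sep)
}
```

For example, plan = as_single_string(rag_response$plan) keeps every cell a single string (or NA), so each row of the results table keeps the same number of columns.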
Then we grade the RAG answers with the LLM-as-judge function, grade_predicted_answer.
rag_answers %>%
map_table_rows(grade_predicted_answer) %>%
format_table()
| id | question | answer | query1 | query2 | query3 | article_title1 | article_title2 | article_title3 | article_text1 | article_text2 | article_text3 | facts | plan | details | predicted | score | feedback |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2568 | Why does a music synthesizer built with a chain of astable multivibrator circuits detune after a few hours, even with fixed resistor values instead of… | The detuning in your analog synthesizer is due to various factors that affect the oscillation frequency over time:1. Power supply voltage fluctuations… | astable multivibrator circuits in music synthesizers | synthesizer tuning stability and temperature effects | effects of component aging on electronic circuits | Feedback | Crystal oscillator | Digital electronics | Feedback occurs when outputs of a system are routed back as inputs as part of a chain of cause-and-effect that forms a circuit or loop. The system can… | A crystal oscillator is an electronic oscillator circuit that uses a piezoelectric crystal as a frequency-selective element. The oscillator frequency … | Digital electronics is a field of electronics involving the study of digital signals and the engineering of devices that use or produce them. This is … | A music synthesizer built with a chain of astable multivibrator circuits can detune after a few hours due to several factors, even when using fixed re… | | To understand why a music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours, we first need to identify the co… | A music synthesizer built with a chain of astable multivibrator circuits detunes after a few hours due to component instability from temperature chang… | 8 | The student’s answer provides a comprehensive explanation of the factors contributing to detuning in a music synthesizer, including power supply fluct… |
| 7311 | In an equal-tempered musical scale, how many intervals are there in an octave? | An octave in an equal-tempered musical scale is divided into 12 intervals. | equal-tempered musical scale intervals in an octave | octave definition in music theory | music intervals and scales explanation | Equal temperament | Scale (music) | Scale (music) | An equal temperament is a musical temperament or tuning system that approximates just intervals by dividing an octave (or other interval) into steps s… | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In music theory, a scale is “any consecutive series of notes that form a progression between one note and its octave”, typically by order of pitch or … | In an equal-tempered musical scale, specifically the most common system known as 12-tone equal temperament (12 TET), there are 12 intervals in an octa… | | The question asks about the number of intervals in an octave within an equal-tempered musical scale. The relevant fact states that in the 12-tone equa… | There are 12 intervals in an octave in an equal-tempered musical scale. | 10 | The student’s answer is correct and accurately states that an octave in an equal-tempered musical scale is divided into 12 intervals. The phrasing is … |
| 7769 | In a musical event called extreme quarteting, 142 people participated. How many different groups of 4 can be formed from these 142 people? | To calculate the number of different groups of 4 that can be formed from 142 people, we use the combination formula, which gives the number of ways to… | combinatorial mathematics groups of four | combinations formula example | binomial coefficient calculation | Discrete mathematics | Combination | Binomial coefficient | Discrete mathematics is the study of mathematical structures that can be considered “discrete” (in a way analogous to discrete variables, having a bij… | In mathematics, a combination is a selection of items from a set that has distinct members, such that the order of selection does not matter (unlike p… | In mathematics, the binomial coefficients are the positive integers that occur as coefficients in the binomial theorem. Commonly, a binomial coefficie… | To determine how many different groups of 4 can be formed from 142 people participating in a musical event called extreme quarteting, we can use the c… | | To find the number of different groups of 4 that can be formed from 142 people, we will use the combinations formula C(n, k) = n! / (k! * (n - k)!), w… | 16515035 | 7 | The student correctly explains the combination formula and applies it to the problem, but the final calculation of the number of groups is incorrect. … |
| 8248 | In music theory, prove that stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples. Mathemati… | To prove this, we start with the given equation: $ (\frac{3}{2})^m = (\frac{1}{2})^n $. We can rewrite this as $ 3^m = 2^{m-n} $. Since 2 and 3 are di… | just intonation and pure fifths music theory | mathematical proof stacking fifths octave tuning | properties of just intonation and octave equivalence | Just intonation | List of guitar tunings | Semitone | In music, just intonation or pure intonation is the tuning of musical intervals as whole number ratios (such as 3:2 or 4:3) of frequencies. An interva… | This article contains a list of guitar tunings that supplements the article guitar tunings. In particular, this list contains more examples of open an… | A semitone, also called a minor second, half step, or a half tone, is the smallest musical interval commonly used in Western tonal music, and it is co… | In music theory, stacking a series of just intonation pure fifths (ratios of 3:2) will not yield a perfectly tuned octave (ratio of 2:1) or its multip… | | | Stacking a series of just intonation pure fifths will never result in a perfectly tuned octave or its multiples, as the equation (3/2)^m = (1/2)^n has… | 8 | The student’s answer correctly identifies the equation and provides a valid mathematical argument to show that there are no solutions for positive int… |
| 14377 | Does tuning musical instruments to 432Hz provide any significant benefits to human well-being or is it more “natural” compared to the standard 440Hz t… | There is currently insufficient scientific evidence to support the claim that 432Hz tuning is objectively superior or more beneficial to human beings … | 432Hz tuning benefits human well-being | comparison of 432Hz and 440Hz musical tuning | effects of musical tuning frequencies on health | NA | NA | Psychoacoustics | NA | NA | Psychoacoustics is the branch of psychophysics involving the scientific study of the perception of sound by the human auditory system. It is the branc… | Tuning musical instruments to 432Hz is often claimed to provide benefits to human well-being and is considered by some to be more “natural” than the s… | | First, we need to examine the claims surrounding 432Hz tuning, which is often said to be more harmonious with nature and beneficial for human well-bei… | There is no significant scientific evidence that tuning musical instruments to 432Hz provides benefits to human well-being compared to the standard 44… | 8 | The student’s answer accurately addresses the question by highlighting the lack of scientific evidence supporting the benefits of 432Hz tuning over 44… |
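The judge's feedback for row 7769 flags the final count as incorrect. Unlike the open-ended questions, this one has an exact answer, so it can be checked without an LLM: base R's choose(n, k) computes the binomial coefficient C(n, k) = n! / (k! * (n - k)!) directly. A quick deterministic cross-check of the predicted value from the table:

```r
# Exact number of ways to choose 4 people out of 142 (order does not matter)
n_groups <- choose(142, 4)
format(n_groups, big.mark = ",", scientific = FALSE)  # "16,234,505"

# Same result from the formula, cancelling (n - k)! by hand
142 * 141 * 140 * 139 / (4 * 3 * 2 * 1)              # 16234505

# Compare against the model's predicted answer shown above
predicted <- 16515035
n_groups == predicted                                 # FALSE
```

Where an answer is computable like this, a deterministic check is cheaper and more reliable than an LLM judge, which still gave this answer 7 out of 10 despite the wrong number.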
Through this activity, you’ve learned how to:
Understanding how to use LLMs for grading can help in assessing model performance and automating evaluation tasks.