Working with the OpenAI API

Initialization

Today, we’re going to move away from traditional social media data to work with an emerging source of data: large language models. Specifically, we will be working with OpenAI’s API. OpenAI is the company behind ChatGPT, and as we previously learned in class, an API (or Application Programming Interface) is an intermediary that lets you send requests to a company’s software or data.

Typically, you would have to create an account with OpenAI. You can either use your ChatGPT account if you have one, or create one here: https://platform.openai.com/docs/overview

Then, navigate to the Settings page, and click on the API Keys tab on the left. You may remember from our Bluesky session that you need to create a key or password to connect your R session to your account with the platform. Copy this key somewhere safe.

Today, you can skip these steps: I will be giving you API keys linked to my account.

Now, we install the openai R package, load the library, and store the API key as an environment variable. Replace the placeholder below with the key you were given.

install.packages("openai")
library(openai)

Sys.setenv(
  OPENAI_API_KEY = "YOUR_API_KEY_HERE"
)
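Before sending any requests, it can help to confirm that the key is actually visible to your R session. The sketch below is one way to do that. `has_api_key()` is a hypothetical helper name, not part of the openai package, and it only checks that the variable is non-empty, not that the key is valid.

```r
# Hypothetical helper (not part of the openai package): returns TRUE when
# the named environment variable holds a non-empty value.
has_api_key <- function(var = "OPENAI_API_KEY") {
  nzchar(Sys.getenv(var))
}

if (has_api_key()) {
  message("API key found.")
} else {
  message("No API key found; run Sys.setenv(OPENAI_API_KEY = ...) first.")
}
```

The check never prints the key itself, which keeps it out of knitted output.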

Simulating a chatbot conversation

Now, we’re going to use the API to query the OpenAI model for responses. We will be using the create_chat_completion() function. Familiarize yourself with the function in the Help window.

?openai::create_chat_completion

Let’s break down how we can use this function.
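The chunk that created `answer` is not reproduced here, so the following is a minimal sketch rather than the original code. The system message and the user prompt are assumptions chosen for illustration; the model and temperature mirror the settings used later in this lab. The request is only sent if an API key is set.

```r
# Assumed prompt and system message (illustrative only; not the original query).
system_message <- "You are a helpful assistant."
user_prompt <- "In a few sentences, what happened at the US Capitol on January 6, 2021?"

# Each message is a list with a role and its content.
messages <- list(
  list("role" = "system", "content" = system_message),
  list("role" = "user", "content" = user_prompt)
)

# Only query the API when a key is available in this session.
if (nzchar(Sys.getenv("OPENAI_API_KEY"))) {
  answer <- openai::create_chat_completion(
    model = "gpt-3.5-turbo",
    temperature = 0,
    messages = messages
  )
}
```

The system message sets the role the model assumes, and the user message carries the specific request.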

To view the outputs of this query, we can look into the ‘answer’ variable we created.

answer$choices$message.content
## [1] "On January 6, 2021, a violent mob of supporters of then-President Donald Trump stormed the United States Capitol in an attempt to overturn the results of the 2020 presidential election. The riot resulted in multiple injuries, deaths, and widespread damage to the Capitol building. The event led to the evacuation and lockdown of lawmakers, the certification of the Electoral College results being temporarily halted, and the eventual impeachment of President Trump for incitement of insurrection."

There are several different use cases for this. One of them could be to study the model itself - what kind of political content does it output? How does it respond to different kinds of political prompts? Does the text show a certain political bias?

Another use case is data labelling or other tedious research tasks. Note that there is almost always an alternative that is more secure and less costly to the environment, but learning how to do it through the OpenAI API is a good start.

For example, let’s give the API a series of inputs. Below is a vector of tweets made by UK politicians ahead of the 2024 UK General Election:

tweets <- c(
  "The injustice faced by women born in the 1950s continues. The DWP were found to have acted with maladministration by the Ombudsman. Compensation needs to be paid to all @WASPI_Campaign women who have been affected including over 5000 across Coatbridge, Chryston and Bellshill. http://t.co/f19YKFWckE",
  "“We deny that any government should use any coercive measure—including zoning laws or permits—to restrict religious speech or worship, based on the theological content of that speech or worship” From the 2011 SBC “On Religious Liberty In A Global Society”",
  "Law of Unintended Consequences. Republic of Ireland and Scotland shows the impact of rent freezes & rent controls reduces supply. Will UK legislative changes have same effect? Either way; we need to build more houses, more affordable houses, community housing, more Almshouses.",
  "Whatever the failures of Boris Johnson’s time as PM, I will always be thankful for him joining the Brexit campaign.",
  "It has never been a crime until the recent act that fascists like you have tried to force on Scotland.",
  "The extinction of Corbynite Podemos and Liberal Ciudadanos is almost complete. http://t.co/QkItADlDg5",
  "The UK does not need any more ugly 'housing units' that ruin our towns, cities and countryside We need to control our borders and stop giving out over 1,000,000 visas a year to foreign students doing noddy degrees, cheap foreign labourers and bogus asylum seekers.",
  "The @UKLabour government has left a generation an investment nest egg. Make sure you claim yours.",
  "This is absolutely disgusting and disgraceful. So she should be ‘shot’ just because she is a black woman. You may disagree with someone, not share their views and opinions but to incite racist hate is totally unacceptable."
)

Let’s say we want the model to label each tweet’s political ideology. Let’s create a function to do that.

# Note: this reuses the name `answer`, so it replaces the response object created earlier.
answer <- function(tweet) {
  openai::create_chat_completion(
    # specify the OpenAI model you want to query; let's go with GPT-3.5
    model = "gpt-3.5-turbo",

    # this controls the randomness of the response and ranges from 0 to 2;
    # lower values make the output more deterministic
    temperature = 0,

    # this is where you prompt the API
    messages = list(
      # the system message sets the role the model assumes
      list(
        "role" = "system",
        "content" = "You are a political analyst, classifying the political ideology of content. For each query, respond with a classification of Left, Right, or Uncertain."
      ),

      # the user message is where you make a specific request
      list(
        "role" = "user",
        "content" = tweet
      )
    )
  )
}
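Even with a constrained prompt, the model may return small variations such as "Left." or "right". A small post-processing step can map responses onto the three allowed labels. This is a sketch: `normalize_label()` is a hypothetical helper, and treating anything unrecognized as NA is a design choice rather than a requirement.

```r
# Hypothetical helper: map raw model output onto the allowed labels.
normalize_label <- function(x, allowed = c("Left", "Right", "Uncertain")) {
  # Drop surrounding whitespace and punctuation, then compare case-insensitively.
  cleaned <- tolower(gsub("[[:punct:]]", "", trimws(x)))
  # match() returns NA for anything outside the allowed set.
  allowed[match(cleaned, tolower(allowed))]
}

normalize_label(c("Left.", " right", "Uncertain", "Centre-left"))
```

Returning NA for unexpected responses makes them easy to spot and review by hand.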

Now let’s apply this function to our vector of tweets.

classifications <- sapply(tweets, answer)

And view a sample classification.

classifications["choices",][[1]]$message.content
## [1] "Left"
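Each element of `tweets` triggers a separate request, so a single failed call (for example, a timeout or a rate limit) would stop `sapply()` and lose the results gathered so far. One option is to wrap the classifier so that failures return NA, and to pause briefly between calls. The sketch below assumes this approach. `classify_safely()` and its `pause` argument are hypothetical, and the demonstration uses a stand-in function instead of the real API.

```r
# Hypothetical wrapper: call `classifier` on each text, returning NA on error
# and pausing `pause` seconds after each call.
classify_safely <- function(texts, classifier, pause = 0.5) {
  lapply(texts, function(txt) {
    result <- tryCatch(classifier(txt), error = function(e) NA)
    Sys.sleep(pause)
    result
  })
}

# Stand-in classifier for demonstration; it fails on empty input.
fake_classifier <- function(txt) {
  if (!nzchar(txt)) stop("empty input")
  "Uncertain"
}

results <- classify_safely(c("some tweet", ""), fake_classifier, pause = 0)
```

With the function from this lab, the call would be `classify_safely(tweets, answer)`. Note that this returns a list rather than the matrix produced by `sapply()`, so any later indexing would need to be adapted.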

Now let’s create a data frame of tweets and classifications.

df <- data.frame(tweet = colnames(classifications))
df$ideology <- ""

for (i in seq_len(nrow(df))) {
  df$ideology[i] <- classifications["choices", ][[i]]$message.content
}
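The loop above can also be written with `vapply()`, which extracts one string per response and raises an error if a response has an unexpected shape. The sketch below uses mock responses so that it runs without the API. The mock structure (a `choices` data frame with a `message.content` column) is an assumption based on the indexing used above.

```r
# Mock responses shaped like the indexing used in this lab (an assumption):
# each response is a list whose `choices` element is a data frame.
mock_answer <- function(tweet) {
  list(
    model = "mock",
    choices = data.frame(message.content = "Uncertain", stringsAsFactors = FALSE)
  )
}

mock_tweets <- c("first example post", "second example post")
mock_classifications <- sapply(mock_tweets, mock_answer)

# Pull the first message from each response as a single character value.
labels <- vapply(
  mock_classifications["choices", ],
  function(choice) choice$message.content[1],
  character(1)
)

mock_df <- data.frame(
  tweet = colnames(mock_classifications),
  ideology = unname(labels),
  stringsAsFactors = FALSE
)
```

Replacing `mock_classifications` with `classifications` should give the same data frame as the loop, without pre-filling an empty column.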

head(df)
##                                                                                                                                                                                                                                                                                                         tweet
## 1 The injustice faced by women born in the 1950s continues. The DWP were found to have acted with maladministration by the Ombudsman. Compensation needs to be paid to all @WASPI_Campaign women who have been affected including over 5000 across Coatbridge, Chryston and Bellshill. http://t.co/f19YKFWckE
## 2                                             “We deny that any government should use any coercive measure—including zoning laws or permits—to restrict religious speech or worship, based on the theological content of that speech or worship” From the 2011 SBC “On Religious Liberty In A Global Society”
## 3                       Law of Unintended Consequences. Republic of Ireland and Scotland shows the impact of rent freezes & rent controls reduces supply. Will UK legislative changes have same effect? Either way; we need to build more houses, more affordable houses, community housing, more Almshouses.
## 4                                                                                                                                                                                         Whatever the failures of Boris Johnson’s time as PM, I will always be thankful for him joining the Brexit campaign.
## 5                                                                                                                                                                                                      It has never been a crime until the recent act that fascists like you have tried to force on Scotland.
## 6                                                                                                                                                                                                       The extinction of Corbynite Podemos and Liberal Ciudadanos is almost complete. http://t.co/QkItADlDg5
##    ideology
## 1      Left
## 2 Uncertain
## 3 Uncertain
## 4     Right
## 5 Uncertain
## 6 Uncertain

Exercise

1a. Use the list of posts you generated from Bluesky for Labs 3 and 4. (Note: you can get a new one if you want to work with another Bluesky user!) For each post text, use the OpenAI API to give it a one-word classification on a dimension other than ideology (e.g., sentiment or political topic). You can decide what this dimension is, and prompt the model accordingly. Look at the first 10 classifications; are they generally correct in your assessment? Are there any you disagree with?

1b. Add this classification as a column in your data frame of the Bluesky user’s posts. Analyze the relationship between the classification and the number of likes a post gets.

2a. Prompt the OpenAI model to produce a policy recommendation on a topic of your choice (e.g., healthcare, climate, etc.) from a Democratic perspective.

2b. Re-prompt it to produce the recommendation from a Republican perspective.