Introduction

This code has three main parts: it authenticates with Reddit’s API, collects data, and saves the data to a CSV file.

Steps

1. OAuth2 Authorization Setup

In this step, we set up OAuth2 authentication to securely access Reddit’s API. First, we define the endpoints required for authorization and token exchange. I registered an app on Reddit under the name “Shaheen,” which gave me the client ID and secret used below.

Next, the script generates an authorization URL based on these credentials. When I visit this URL, Reddit prompts me to grant permission to access my account data (in this case, read-only access). Once permission is granted, Reddit provides a temporary authorization code, which I use in the next step for token exchange. This code is essential for securely obtaining an access token, allowing the script to make further API requests without needing to log in again.

# Load httr, which provides the OAuth helpers and HTTP functions used below
library(httr)

# Define Reddit OAuth2 endpoints
endpoint <- oauth_endpoint(
  authorize = "https://www.reddit.com/api/v1/authorize",
  access = "https://www.reddit.com/api/v1/access_token"
)

# App credentials
app <- oauth_app(
  "Shaheen",  # App name registered on Reddit
  key = "VbG**************",  # Client ID (masked here)
  secret = "yiHZNw******taIyWYX7X7******"  # Client secret (masked here)
)


# Generate the authorization URL
auth_url <- oauth2.0_authorize_url(
  endpoint = endpoint,
  app = app,
  redirect_uri = "http://localhost",
  scope = "read",
  state = "randomstring"
)

print(auth_url)  # Verify the constructed URL
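
After I grant permission, Reddit redirects the browser to the redirect URI with the authorization code attached as a query parameter. The following is a minimal sketch of pulling the code out of that URL and checking the state value, assuming the redirect URL has been copied from the browser's address bar. The `redirect_url` value and `EXAMPLE_CODE` are placeholders for illustration, not output from the original script.

```r
library(httr)

# Hypothetical redirect URL copied from the browser's address bar;
# the code value is a placeholder, not a real authorization code.
redirect_url <- "http://localhost/?state=randomstring&code=EXAMPLE_CODE"

# parse_url() splits the URL into components; $query is a named list
params <- parse_url(redirect_url)$query

# The state should match the value sent in the authorization request
expected_state <- "randomstring"
if (!identical(params$state, expected_state)) {
  stop("State mismatch: the redirect does not match the authorization request.")
}

# This is the value to use as the authorization code in the token exchange
code <- params$code
```

Comparing the returned state with the one that was sent is a cheap guard against using a code from an unrelated or forged redirect.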

2. Token Exchange and Access Request

Once I receive the authorization code from Reddit, the next step is to exchange it for an access token. The script sends a POST request to Reddit’s token endpoint, including essential details like the client credentials (ID and secret), the authorization code, and the redirect URI (http://localhost).

If everything is set up correctly and the request is successful, Reddit returns an access token. This token is crucial because it allows the script to make further API requests without needing to re-authenticate. The token is stored and used to access Reddit data in subsequent steps. If the request fails, the script prints an error message and stops execution.

# Define credentials and the authorization code
client_id <- "VbGF******************"
client_secret <- "yiHZ*************************"
redirect_uri <- "http://localhost"
code <- "3VHUx*******************"  # Authorization code returned by Reddit

# Make a POST request to exchange the code for an access token
token_response <- POST(
  url = "https://www.reddit.com/api/v1/access_token",
  authenticate(client_id, client_secret),  # Client ID and Secret for basic auth
  body = list(
    grant_type = "authorization_code",  # Grant type for code exchange
    code = code,
    redirect_uri = redirect_uri
  ),
  encode = "form"
)

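# Check the response
if (http_status(token_response)$category == "Success") {
  token <- content(token_response, as = "parsed", type = "application/json")
  # The JSON body may carry an `error` field even on a successful HTTP status
  if (!is.null(token$error)) {
    stop("Token endpoint returned an error: ", token$error)
  }
  print(token)  # Access token and other details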
} else {
  print(content(token_response, as = "text"))
  stop("Failed to exchange authorization code for access token.")
}
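
The data collection code itself is not shown here, so the following is only a sketch of how the access token could be used to fetch posts and turn them into a data frame. It assumes Reddit's OAuth host `https://oauth.reddit.com` and a listing response whose posts sit under `data$children`. The function names `fetch_posts` and `listing_to_df`, the subreddit, the user agent string, and the selected fields are illustrative choices, not part of the original script.

```r
library(httr)

# Fetch one page of posts from a subreddit using the bearer token.
# The subreddit, limit, and user agent are hypothetical defaults.
fetch_posts <- function(access_token, subreddit = "rstats", limit = 25) {
  response <- GET(
    url = paste0("https://oauth.reddit.com/r/", subreddit, "/hot"),
    add_headers(Authorization = paste("bearer", access_token)),
    user_agent("r:reddit-csv-example:v0.1 (hypothetical)"),
    query = list(limit = limit)
  )
  stop_for_status(response)  # Raise an R error on HTTP 4xx/5xx
  content(response, as = "parsed", type = "application/json")
}

# Flatten a parsed listing into a data frame with one row per post.
listing_to_df <- function(listing) {
  posts <- lapply(listing$data$children, function(child) {
    data.frame(
      title = child$data$title,
      score = child$data$score,
      num_comments = child$data$num_comments,
      stringsAsFactors = FALSE
    )
  })
  do.call(rbind, posts)
}

# Usage, assuming `token` holds the parsed token response from above:
# df <- listing_to_df(fetch_posts(token$access_token))
```

Keeping the network call and the flattening step in separate functions makes it possible to check the flattening logic on a small hand-built list without contacting Reddit.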

3. Data Saving

After collecting the data from Reddit, the script processes it and saves it to a CSV file named reddit_posts.csv. This step ensures that the data is stored locally for further analysis or sharing.

Once the file is successfully created, the script prints a confirmation message to let me know that the file has been saved without any issues. This provides a final checkpoint to verify that the data collection process was completed successfully.

# Save the dataset to a CSV file (df is the data frame of collected posts)
write.csv(df, file = "reddit_posts.csv", row.names = FALSE)

# Confirm the file was saved
cat("File 'reddit_posts.csv' has been saved successfully!\n")
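
The confirmation message above is printed regardless of what ended up on disk. As a slightly stronger checkpoint, the file can be read back and compared with the data frame. The sketch below uses a small made-up data frame and a temporary directory so that it runs on its own; both are assumptions for illustration, and in the real script `df` would be the collected posts.

```r
# Small hypothetical data frame standing in for the collected posts
df <- data.frame(
  title = c("First post", "Second post"),
  score = c(10L, 25L),
  stringsAsFactors = FALSE
)

# Write to a temporary directory so the example leaves no stray files
out_file <- file.path(tempdir(), "reddit_posts.csv")
write.csv(df, file = out_file, row.names = FALSE)

# Read the file back and compare dimensions before reporting success
saved <- read.csv(out_file, stringsAsFactors = FALSE)
if (file.exists(out_file) && nrow(saved) == nrow(df) && ncol(saved) == ncol(df)) {
  cat("File saved with", nrow(saved), "rows:", out_file, "\n")
} else {
  stop("CSV verification failed: row or column count differs from the data frame.")
}
```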