ES300: Portland Energy Analysis

Intro

Add lab info here.

library(tidyverse)
if (!requireNamespace("eia", quietly = TRUE)) install.packages("eia")  # install only if missing
library(eia)

API Key

To access the data, you’ll need to register for a free API key from the US Energy Information Administration (EIA). An API key is a unique string of characters that identifies you to a web service. With a key, the data server knows who is making each request and can set limits on what you access. You can think of it like a library card for checking out data.

Go to the EIA’s API request page and enter your information. You will first receive an email asking you to verify your address, and then a second email containing the key, which looks like a long string of gibberish.

Save this key as a variable called api_key.

# replace the string below with the key from your own email
api_key <- "uWJ0V7dHw0bsqbfluEWxWRwwcGLh4rTAz6JVzY1n"
eia_set_key(api_key)
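Typing the key directly into a script means that anyone who sees the script also sees the key. Here is a minimal sketch of one alternative, assuming you have saved the key in an environment variable. The variable name EIA_KEY and the .Renviron entry are assumptions chosen for illustration, not something this lab requires.

```r
# Assumption: your .Renviron file contains a line like
#   EIA_KEY=your_key_here
# (the variable name EIA_KEY is chosen here for illustration).
api_key <- Sys.getenv("EIA_KEY")

if (nzchar(api_key) && requireNamespace("eia", quietly = TRUE)) {
  # Register the key with {eia}, just as in the chunk above
  eia::eia_set_key(api_key)
} else {
  message("No key found in EIA_KEY (or {eia} is not installed).")
}
```

Sys.getenv() returns an empty string when the variable is not set, which is why the sketch checks nzchar() before registering the key.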

Exploring the Database

Now we can look at how the database is organized so we know which data to pull. The data is structured as a set of nested folders, and you can use commands to look through each one to see what is available to you.

The eia_dir() function with no arguments shows all the top-level categories that are available (coal, electricity, natural gas, petroleum, etc.). Each category has an id, which is the name you use in code, and a short description.

# view all
eia_dir()

You can then pick one of these and pass its id to the function to see more detail about that specific dataset.

# view electricity
eia_dir("electricity")

This shows the sub-categories of datasets within electricity. One of those is rto, which stands for Regional Transmission Organization and contains the daily grid operations data we’re looking for.

To Do

Use eia_dir() to go one level deeper to see what data exists inside the rto folder.

Use the / symbol to separate folder names.

eia_dir("electricity/rto")

We’ll use the daily demand data inside the daily-region-data folder. The eia_metadata() function shows what exists in that folder.

eia_metadata("electricity/rto/daily-region-data")

This shows you the data available, what you can filter it by (Facets), and what the data format will be. We can look at each of these facets to see what the options are with the function eia_facets(). Our options are respondent, type, and timezone.

eia_facets("electricity/rto/daily-region-data", "respondent")

# the output has many rows, so only the first few are printed
# as the message suggests, use print() with n = to show them all

print(eia_facets("electricity/rto/daily-region-data", "respondent"), n = 100)

You can look through the list and see that PGE (Portland General Electric) is one of the options.
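Rather than reading every row by eye, you can also filter a table for the entry you want. The sketch below uses a small made-up stand-in for the facet table. The column names id and name are assumptions, so check names() on your real result before adapting it.

```r
library(dplyr)

# Made-up stand-in for the facet table; the column names `id` and `name`
# are assumptions, so check names() on your real result first.
respondents <- tibble(
  id   = c("BPAT", "PACW", "PGE"),
  name = c("Bonneville Power Administration",
           "PacifiCorp West",
           "Portland General Electric Company")
)

# Keep only the rows whose name mentions Portland (case-insensitive)
portland <- filter(respondents, grepl("portland", name, ignore.case = TRUE))
portland
```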

To Do

Now look through the type and timezone facets to see your options.

eia_facets("electricity/rto/daily-region-data", "type")
eia_facets("electricity/rto/daily-region-data", "timezone")

From this you should see that we want type D (demand) and the Pacific time zone. The metadata also shows that the available time period is 2019-01-01 to 2026-01-31. (Depending on when you’re looking at this, your end date may be later.)

Now we’ll use the eia_data() function to pull the data we want, using the correct names from the metadata. We’ll store the result in an object called demand.

demand <- eia_data(
  dir = "electricity/rto/daily-region-data",
  data = "value",
  facets = list(respondent = "PGE", 
                type = "D", 
                timezone = "Pacific"),
  freq = "daily",
  start = "2019-01-01",
  end = "2026-01-31"
)

Now let’s look at our data to see what we’ve got. You can click the word demand in the Global Environment or use View(demand). You should have a data frame with 9 columns and 2579 rows (or more, depending on your end date).
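One quick sanity check is to compare the number of rows you received with the number of calendar days in the range you requested. This is a sketch that assumes the data has one row per day and that period is stored as text in YYYY-MM-DD form. If you have fewer rows than days, some days may be missing or the data may end before your end date.

```r
# Count the calendar days in the requested range, both ends included
start_date <- as.Date("2019-01-01")
end_date   <- as.Date("2026-01-31")
expected_days <- as.numeric(end_date - start_date) + 1
expected_days  # 2588

# With the real data loaded, compare against what came back
# (assumes one row per day and `period` stored as "YYYY-MM-DD" text):
# nrow(demand)
# all_days <- as.character(seq(start_date, end_date, by = "day"))
# setdiff(all_days, demand$period)  # dates that have no row
```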

Visualizing the Data

We can now graph the daily demand over time. We’ll use the package {ggplot2} for this, which was loaded when we loaded {tidyverse}.

This package has a simple, common syntax for any type of graph:

ggplot(data = _______, aes(x = _______, y = _______)) +
  geom_TYPEOFPLOT()

Let’s parse that out:

  • ggplot() is the base command that makes graphs; it comes from the {ggplot2} package

  • data = the data set we want to use

  • aes() stands for aesthetics; it’s where you tell R which variables you want on the graph

  • x = is where you name your x variable

  • y = is where you name your y variable

  • the line ends with a +, showing you that the code continues on the next line

  • geom_TYPEOFPLOT() is where you specify what kind of graph you want to make; the most popular options are:

    • geom_point()
    • geom_line()
    • geom_col() or geom_bar()
    • geom_histogram()
    • geom_boxplot()

There are many other add-ons, but just those two lines of code will get you started for most types of graphs.
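To see the template filled in, here is a small sketch that uses economics, a sample data set that ships with {ggplot2}, rather than the lab data. The column names date and unemploy belong to that sample data set.

```r
library(ggplot2)

# `economics` is a sample data set that ships with {ggplot2};
# `date` and `unemploy` are two of its columns.
p <- ggplot(data = economics, aes(x = date, y = unemploy)) +
  geom_line()

p  # printing the object draws the graph
```

Swapping geom_line() for another geom changes the type of graph without touching the first line.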

The most appropriate type of graph for a time series is a line graph. Before we make our graph, we need to make sure our data is in the right format. Use glimpse() to show the data type for each column.

glimpse(demand)

We can see that all our variables are <chr> which means they are character data.

We want value to be numeric, which we can change using as.numeric().

demand$value <- as.numeric(demand$value)

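We also need to ensure period is treated as a date. We’ll do that with ymd(), a function from the package {lubridate} that converts year-month-day text to a date format. ({lubridate} is loaded along with {tidyverse} in version 2.0.0 and later; otherwise run library(lubridate) first.)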

demand$period <- ymd(demand$period)
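The same two conversions can be tried on a tiny made-up data frame, which makes it easy to see what changes. The values below are invented for illustration and are not real demand figures.

```r
library(lubridate)

# Toy stand-in with invented values: both columns start out as character
toy <- data.frame(
  period = c("2019-01-01", "2019-01-02", "2019-01-03"),
  value  = c("61512", "63120", "60987"),
  stringsAsFactors = FALSE
)

toy$value  <- as.numeric(toy$value)  # "61512" becomes the number 61512
toy$period <- ymd(toy$period)        # "2019-01-01" becomes a Date

class(toy$value)   # "numeric"
class(toy$period)  # "Date"
```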

To Do

Recheck your data by using glimpse() again to see if the data types have changed.

Now we should be able to make our line graph.

To Do

Alter the code below to have period be the x variable and value be the y variable for our line graph.

ggplot(data = _______, aes(x = _______, y = _______)) +
  geom_TYPEOFPLOT()

ggplot(data = demand, aes(x = period, y = value)) +
  geom_line()

Making Prettier Graphs

The graph shows the data, but we can make a number of improvements to make our graph easier to understand and more professional.

Start by using the labs() command to specify the axis labels. To do this, add a + and another line of code that specifies the labels for each axis.

To Do

Add labels to your axes by putting the name inside the quotation marks.

ggplot(data = demand, aes(x = period, y = value)) +
  geom_line() + 
  labs(x = "_____",
       y = "_____")

You can also change the overall look of the graph. If you don’t like the gray background, look at the Themes section of this workshop to see how you can change it. If you don’t like the tick placement or labeling look at the Axis Ticks section of this workshop to change them.

You can also add color to your graph by adding a color option to geom_line().

ggplot(data = demand, aes(x = period, y = value)) +
  geom_line(color = "aquamarine4")

R knows a lot of colors by name. Here’s a document that lists them. You can also use hex values or specific color packages like this Wes Anderson palette (scroll down to see it).
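Base R can tell you whether it recognizes a color name and what the matching hex value is. A short sketch using the color from the graph above:

```r
# Is this a name that R recognizes?
"aquamarine4" %in% colors()  # TRUE

# Convert the name to its hex code
rgb_values <- col2rgb("aquamarine4")
hex_code <- rgb(rgb_values[1], rgb_values[2], rgb_values[3], maxColorValue = 255)
hex_code  # "#458B74"
```

Either form works as a color value, so color = "aquamarine4" and color = "#458B74" draw the same line.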


Figure it Out

Here’s a hard one for you to explore.

One good thing to graph might be all the years on top of each other so you can make more direct comparisons. To do this, you’ll need to do two things:

  1. Make a new column that holds the year as a categorical variable, not a numeric one.

  2. Make a column for day of year that counts up from 1 to 365 (366 in leap years).

To make a new column, the simplest way is to use the syntax dataset$newcolumn <- x where newcolumn is what you want the name to be and x is the value you want in it.

If I wanted to make a new column called average and have it be the average of all demand values, I would do:

demand$average <- mean(demand$value)
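The value on the right-hand side can be a single number or a vector with one entry per row. This sketch on a made-up three-row data frame shows both cases:

```r
# Made-up data for illustration
toy <- data.frame(value = c(10, 20, 60))

# A single value is repeated in every row
toy$average <- mean(toy$value)

# A vector with one entry per row fills the column row by row
toy$above_average <- toy$value > toy$average

toy
```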

Use that format to create your new columns. The functions you will need to use can be found in the {lubridate} package. Use the following resources and/or Googling to figure out what to add to this code to make the correct new columns.

To Do

Add to this code to create the columns.

demand$year <- ____(demand$period)
demand$day_of_year <- ____(demand$period)

Now you need to make these columns a factor and a number, respectively. Run this code to do that.

demand$year <- as.factor(demand$year)
demand$day_of_year <- as.numeric(demand$day_of_year)
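The distinction matters for graphing: {ggplot2} treats a numeric variable as continuous and a factor as a set of separate groups. A short sketch on made-up values shows what as.factor() changes:

```r
# Made-up values for illustration
years <- c(2019, 2019, 2020, 2021)
class(years)     # "numeric"

years_f <- as.factor(years)
class(years_f)   # "factor"
levels(years_f)  # "2019" "2020" "2021"
```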

Now you’re ready to graph. This time day_of_year will go on the x-axis, and year will separate the data into one line per year. You do this by setting color to the year variable. Look at the Using Color to Add Complexity section of this workshop to see how to do that.

From there, use Google or the {ggplot2} documentation to help make your graph.

To Do

Add your graph code.