Temperature Sensitive Demand

This tutorial walks you through how to analyze the relationship between temperature and electrical demand. The first thing we always need to do is load our data.

To Do

Read in your data and store it as a variable called full_data.

library(tidyverse)  # provides read_csv(), the dplyr verbs, and ggplot2
full_data <- read_csv("full_data.csv")

Visualize the Temperature vs Energy

To begin we’ll visualize the daily temperature versus the electricity demand. To do this we’ll use a scatterplot which in {ggplot2} world is geom_point().

Use glimpse() to remind yourself of the columns in the data and consider what you want to graph.

glimpse(full_data)

If there are NAs in year, run this line of code. (Might as well just run it.)

full_data <- full_data |>
  mutate(year = year(date))

We only want to graph the days that we actually have energy values for, so let’s filter to remove anything where energy is NA. To do this you can use !, which means “not”, in conjunction with the is.na() function: !is.na(x) is TRUE wherever x is not missing.
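
As a quick illustration on a toy vector (the values and the name x are made up and are not part of full_data), is.na() marks the missing entries and ! flips each TRUE to FALSE and vice versa:

```r
# Toy vector with one missing value (made-up numbers, not from full_data)
x <- c(10, NA, 30)

is.na(x)    # TRUE where a value is missing
!is.na(x)   # TRUE where a value is present

# Keep only the values that are present
x[!is.na(x)]
```

Inside filter(), the same logical test decides which rows are kept.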

To Do

See if you can add to the code to make the filter.

energy_record <- full_data |>
  _______________

The filter would look like !is.na(energy).

energy_record <- full_data |>
  filter(!is.na(energy))

To Do

Add to this code to create the scatterplot.

ggplot(data = ______, aes( x = ______, y = ______)) +
  ______

ggplot(data = energy_record, aes(x = temp, y = energy)) +
  geom_point()

Fitting a Model

We can add a trend line to our data by adding to the code for the plot. The typical trend line you may have seen is a linear model, which is just a straight line through the points. Our data obviously wouldn’t fit a straight line. We can make a line that better fits our data using a loess regression instead of a linear regression. Loess stands for LOcally Estimated Scatterplot Smoothing. It doesn’t assume any specific shape for the curve.

We can add a loess regression line to our graph with the function geom_smooth(). We just need to specify that the method of smoothing we want is “loess”.

ggplot(data = energy_record, aes(x = temp, y = energy)) +
  geom_point() +
  geom_smooth(method = "loess") 

Now we’ll actually create the model. This will calculate the actual curve that is being drawn. If it were a linear model, the line formula would be y = mx + b. A loess model doesn’t have as straightforward an equation, but it is essentially doing the same thing. The fitted model is now stored as loess_model.

# Fit loess model on historical data
loess_model <- loess(energy ~ temp, data = energy_record)
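
To see why a loess curve suits U-shaped data better than a straight line, here is a minimal sketch on simulated data. The data frame toy, its values, and the helper rmse() are made up for illustration and are not part of the tutorial’s dataset. It fits both kinds of model and compares how far each one’s fitted values typically are from the points.

```r
# Simulated U-shaped data (made-up values, not the tutorial's dataset)
set.seed(1)
toy <- data.frame(temp = runif(300, min = 20, max = 100))
toy$energy <- 100 + 0.05 * (toy$temp - 60)^2 + rnorm(300, sd = 2)

# Fit a straight line and a loess curve to the same data
toy_lm    <- lm(energy ~ temp, data = toy)
toy_loess <- loess(energy ~ temp, data = toy)

# Root mean squared error: the typical size of each model's misses
rmse <- function(model) sqrt(mean(residuals(model)^2))
rmse_lm    <- rmse(toy_lm)
rmse_loess <- rmse(toy_loess)

c(linear = rmse_lm, loess = rmse_loess)
```

On data shaped like this, the loess error should come out much smaller than the linear one.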

Finding our Balance Point from the Loess Curve

We can find the balance point from the model’s predictions. It will be the lowest part of our loess curve. Because the loess curve doesn’t have traditional parameters like a y = mx + b line, the easiest way to find the lowest point is to use the model to predict values for the same time period and then take the lowest of those values.

To do this we’ll create a new dataset called predicted_data from our historical data that we will add the predicted curve values to.

predicted_data <- full_data |>
  filter(scenario == "historical")

Now we’ll use the predict() function to fill in the energy values. For our input temperature values we’ll use the temp column, the same column the model was fit on.

input_temps <- data.frame(temp = predicted_data$temp)

predicted_energy_values <- predict(loess_model, newdata = input_temps)

Now we have a vector (the same as a single column, just not yet in a table) of energy predictions, that we can add to our predicted dataset.

predicted_data <- predicted_data |>
  mutate(predicted_energy = predicted_energy_values)

Now we’ll add the predicted energy usage to the graph. We can do this just by adding another geom_point() statement.

ggplot(data = energy_record, aes(x = temp, y = energy)) +
  geom_point() +
  geom_smooth(method = "loess") +
  # color is set outside aes() so the predicted points are actually drawn in red
  geom_point(data = predicted_data, aes(x = temp, y = predicted_energy), color = "red")

You can see that the predicted values fit the curve exactly.

Now we can find the minimum of these values and that will be our balance point temperature. The energy usage at this point reflects our lowest level of energy use.

To find the lowest value, we use slice(), which keeps the rows at the positions we give it. We’ll put our data in order of predicted energy use, lowest first, and keep the first row.

balance_point <- predicted_data |>
  arrange(predicted_energy) |>
  slice(1)

Now we can run select() on balance_point to see just the relevant information.

balance_point |>
  select(temp, predicted_energy)

From this you should see the temp value, which is our estimated balance point temperature, and predicted_energy which is the estimated minimum amount of energy usage.
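
An alternative way to locate the minimum, shown here only as a sketch, is to predict over an evenly spaced grid of temperatures instead of the observed days. The example uses simulated data; toy, temp_grid, and toy_balance_point are illustrative names that do not appear elsewhere in this tutorial.

```r
# Simulated data with a known minimum near temp = 60 (made-up values)
set.seed(1)
toy <- data.frame(temp = runif(300, min = 20, max = 100))
toy$energy <- 100 + 0.05 * (toy$temp - 60)^2 + rnorm(300, sd = 2)

toy_loess <- loess(energy ~ temp, data = toy)

# Evenly spaced temperatures inside the observed range
# (with default settings, loess predictions outside that range are NA)
temp_grid <- data.frame(temp = seq(min(toy$temp), max(toy$temp), by = 0.1))
temp_grid$predicted_energy <- predict(toy_loess, newdata = temp_grid)

# Row with the smallest predicted energy
toy_balance_point <- temp_grid[which.min(temp_grid$predicted_energy), ]
toy_balance_point
```

A grid gives the same resolution everywhere on the curve, while predicting on the observed days only checks temperatures that actually occurred.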

Find the Extreme Heat and Cold Degree Days

We’re using a balance point temperature of 60 to find our heating and cooling degree days, which measure how far below or above the balance point the temperature is. To do this we’ll subtract 60 from each temperature, then categorize the positive differences (days warmer than 60) as cooling and the negative ones (days colder than 60) as heating.

full_data <- full_data |>
  mutate(degrees_off = temp - 60) |>
  mutate(day_type = case_when(
    degrees_off > 0 ~ "cooling",
    degrees_off < 0 ~ "heating",
    .default = NA
  ))

The only days left as NA here, meaning the ones not set to heating or cooling, are those where degrees_off is exactly 0 (the temperature is 60) or where temp itself is missing.
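
Degree days are often stored as two non-negative columns rather than one signed difference. A minimal sketch of that idea on made-up temperatures, using the same balance point of 60; the names balance_point_temp, toy_temps, cooling_degrees, and heating_degrees are illustrative and do not appear elsewhere in this tutorial:

```r
balance_point_temp <- 60
toy_temps <- c(45, 60, 75, 82)   # made-up daily temperatures

# Degrees above the balance point (0 on days at or below it)
cooling_degrees <- pmax(toy_temps - balance_point_temp, 0)

# Degrees below the balance point (0 on days at or above it)
heating_degrees <- pmax(balance_point_temp - toy_temps, 0)

sum(cooling_degrees)   # total cooling degree days
sum(heating_degrees)   # total heating degree days
```

Keeping the two totals separate makes it easy to compare how much of the year needs cooling versus heating.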

Now we’ll visualize these days versus the electricity demand. To do this we’ll use a scatterplot which in {ggplot2} world is geom_point().

To Do

Add to this code to create the scatterplot.

ggplot(data = ______, aes( x = ______, y = ______)) +
  ______

ggplot(data = full_data, aes(x = degrees_off, y = energy)) +
  geom_point()

We can add a color option to our aes() to color the days by whether they are heating or cooling.

To Do

Add a color option with color = and set it equal to day_type.

ggplot(data = full_data, aes(x = degrees_off, y = energy, color = day_type)) +
  geom_point()

To Do

Make your plot look nicer. At minimum make sure the axis labels are informative, but you can also look at picking your own colors with manual coloring or changing the background from gray.
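
One possible starting point, sketched on made-up rows that stand in for full_data. The axis labels and colors below are placeholders chosen for illustration, so swap in wording, units, and colors that fit your data.

```r
library(ggplot2)

# Made-up rows standing in for full_data
toy <- data.frame(
  degrees_off = c(-20, -5, 10, 25),
  energy      = c(140, 105, 110, 150),
  day_type    = c("heating", "heating", "cooling", "cooling")
)

p <- ggplot(data = toy, aes(x = degrees_off, y = energy, color = day_type)) +
  geom_point() +
  labs(
    x = "Degrees from balance point",  # placeholder label
    y = "Energy demand",               # placeholder label; add your units
    color = "Day type"
  ) +
  # Named vector: each day_type value gets the color you choose
  scale_color_manual(values = c(cooling = "firebrick", heating = "steelblue")) +
  theme_minimal()                      # replaces the gray background

p
```

labs() sets the axis and legend titles, scale_color_manual() assigns specific colors, and theme_minimal() is one of several built-in themes.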


This stuff is for next time, sorry! You can ignore it for now!


Predicting Future Energy Use

Now we can use that model to predict future energy needs. We’ll overwrite predicted_data with everything except the historical data, so it contains just the modeled scenarios.

predicted_data <- full_data |>
  filter(scenario != "historical")

Now we’ll use the predict() function to fill in the energy values. For our input temperature values we’ll use the tmax temperatures.

input_temps <- data.frame(temp = predicted_data$tmax)

predicted_energy_values <- predict(loess_model, newdata = input_temps)

Now we have a vector (the same as a single column, just not yet in a table) of energy predictions, that we can add to our predicted dataset.

predicted_data <- predicted_data |>
  mutate(predicted_energy = predicted_energy_values)

To Do

Add to the code to make the plot.

ggplot(predicted_data, aes(x = ______, y = ______, color = ______)) +
  geom_line(alpha = 0.5)

ggplot(predicted_data, aes(x = date, y = predicted_energy, color = scenario)) +
  geom_line(alpha = 0.5) 

Since every date is represented, this is too messy to really see much. Let’s aggregate by month to see if trends stick out more. We’ll group our data by month and year, as well as by scenario. Then for each group, we’ll take the mean energy prediction, which will get us one prediction per month. We’ll also change the date column to just be month and year.

monthly_data <- predicted_data |>
  group_by(year, month, scenario) |>
  summarize(avg_energy = mean(predicted_energy, na.rm = TRUE)) |>
  mutate(date = make_date(year, month, 1))

Note: The na.rm statement is to remove any missing values.
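
A quick check on a made-up vector (toy_energy is an illustrative name) shows what na.rm changes:

```r
toy_energy <- c(100, NA, 140)    # made-up values with one missing entry

mean(toy_energy)                 # NA, because one value is missing
mean(toy_energy, na.rm = TRUE)   # 120, the mean of the two known values
```

Without na.rm = TRUE, a single missing day would turn the whole month’s average into NA.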

Now we can plot our monthly data. We’ll also add a loess line to see the relationship better.

ggplot(monthly_data, aes(x = date, y = avg_energy, color = scenario)) +
  geom_point() +
  geom_smooth(method = "loess") 

This looks better, but still messy. We can actually drop the geom_point() line and just graph the loess lines. The se = FALSE part removes a gray standard error strip. You can set it to TRUE to see the difference.

ggplot(monthly_data, aes(x = date, y = avg_energy, color = scenario)) +
  geom_smooth(method = "loess", se = FALSE) 

This is much better for showing our trends!

To Do

Make your plot look nicer. At minimum make sure the axis labels are informative, but you can also look at picking your own colors with manual coloring or changing the background from gray.

Summarizing Annual Energy Predictions

Now we’ll look at the predicted annual energy expenditure over time. To do this we’ll use another summarize() statement to add up energy for the whole year. We’ll group just by year and scenario. Then we’ll use sum() to find the total.

annual_energy <- predicted_data |>
  group_by(year, scenario) |>
  summarize(total_energy = sum(predicted_energy, na.rm = TRUE))

You can click annual_energy in the Environment pane to see the totals.
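
When summarize() follows a group_by() on more than one column, dplyr keeps the result grouped by all but the last grouping column and prints a message saying so. As a sketch on made-up rows (toy and toy_annual are illustrative names), adding .groups = "drop" returns an ungrouped table and silences the message:

```r
library(dplyr)

# Made-up rows standing in for predicted_data
toy <- tibble(
  year             = c(2030, 2030, 2030, 2031),
  scenario         = c("rcp45", "rcp45", "rcp85", "rcp45"),
  predicted_energy = c(10, NA, 30, 5)
)

toy_annual <- toy |>
  group_by(year, scenario) |>
  summarize(
    total_energy = sum(predicted_energy, na.rm = TRUE),
    .groups = "drop"   # return a plain, ungrouped table
  )

toy_annual
```

Dropping the groups is optional here, but it avoids surprises if you later add mutate() or another summarize() step.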

Now we’ll graph total energy and color by scenario.

ggplot(annual_energy, aes(x = year, y = total_energy, color = scenario)) +
  geom_line()

We have fewer distinct points now so it might be nicer to have them also show on the graph. We can do this by adding a geom_point() function in addition to the geom_line().

ggplot(annual_energy, aes(x = year, y = total_energy, color = scenario)) +
  geom_line() +
  geom_point()

To Do

Again make sure to label your plot well. You should keep a consistent color scheme for the models across all of your plots. If you want to change the axes ticks or labels, see this tutorial on graph aesthetics.

Comparing the Decades

Now we’ll combine the historical energy_record data with the future predicted_data. To show things clearly, we’ll just include the average from each decade rather than every year.

To get monthly averages for each decade, we’ll first combine the years into decade groups with case_when().

monthly_by_decade <- predicted_data |>
  mutate(decade = case_when(
    year %in% 2020:2029 ~ "2020",
    year %in% 2030:2039 ~ "2030", 
    year %in% 2040:2049 ~ "2040",
    year %in% 2050:2059 ~ "2050",
    year %in% 2060:2069 ~ "2060",
    year %in% 2070:2079 ~ "2070",
    year %in% 2080:2089 ~ "2080",
    year %in% 2090:2099 ~ "2090",
    .default = NA
  )) 
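
As an alternative sketch, the decade can also be computed arithmetically with integer division instead of listing every range. The names toy_years and toy_decades are made up for illustration.

```r
toy_years <- c(2020, 2029, 2035, 2099)   # made-up example years

# %/% is integer division: 2029 %/% 10 is 202, and 202 * 10 is 2020
toy_decades <- as.character((toy_years %/% 10) * 10)

toy_decades
```

Unlike the case_when() version, this labels every year, including any outside 2020 to 2099, so those years would not become NA and would not be removed by a later filter on missing decades.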

From this dataset, we want to group together the months, decades and scenarios. For each of these groups we want to find the mean energy usage.

monthly_by_decade <- monthly_by_decade |>
  filter(!is.na(decade)) |>
  group_by(month, decade, scenario) |>
  summarize(avg_energy = mean(predicted_energy, na.rm = TRUE))

Now we’ll do the same for our historical recorded data. We’ll summarize by month and find the average. Then we’ll add decade and scenario columns to match the monthly_by_decade data.

historical_monthly <- energy_record |>
  group_by(month) |>
  summarize(avg_energy = mean(energy)) |>
  mutate(decade = "Historical (2001-2019)",
         scenario = "historical")

Now we can combine these datasets.

combined_monthly <- bind_rows(historical_monthly, monthly_by_decade)

Now we’ll plot the combined data. We’ll color by decade and we’ll change the linetype by scenario.

ggplot(combined_monthly, aes(x = month, y = avg_energy, color = decade, linetype = scenario)) +
  geom_line()

Make the Graph Prettier

This graph looks pretty good, but you can use the graphing workshops or online guides to improve it. For starters, changing the order or the style of the line types will help distinguish rcp45 from rcp85.

The STHDA site walks you through how to make changes to line graphs. The R Graph Gallery has information on many types of plots, including line plots.

To Do

Make whatever changes you like to lines, colors, backgrounds, decade aggregation, and more to create a graph you are happy with.