Predicting Future Temperature Sensitive Demand

To see how demand may change in the future, we’ll analyze the minimum demand over time.

From the previous analysis, we found the predicted balance point and minimum energy demand for our historical data.

balance_point <- predicted_data |>
  arrange(predicted_energy) |>
  slice(1)

From this you should see that the predicted balance point temperature for all historical data together is 61 and the energy usage at that temp, which is the minimum energy usage, is 48250.

Let’s save these as variables.

bp_temp <- 61
min_energy <- 48250

We can compare this to the actual minimum recorded each year.

historical_min_energy <- full_data |>
  filter(!is.na(energy)) |>
  group_by(year) |>
  summarize(minimum = min(energy))

When viewing that data, you can see that the actual minimum energy usages are around the same figure as the model prediction with the exception of 2018. Here’s some information on what was happening in 2018.

Predicting Future Energy Use

Now we can use the model from last time to predict future energy needs. We’ll create a new dataset called predicted_future that is everything except the historical data, so it’s just the modeled data from RCP4.5 and RCP8.5.

predicted_future <- full_data |>
  filter(scenario != "historical")

Now we’ll use the predict() function to fill the energy values. For our input temperature values we’ll the tmax temperatures.

input_temps <- data.frame(temp = predicted_future$tmax)

predicted_energy_values <- predict(loess_model, newdata = input_temps)

Now we have a vector (the same as a single column, just not yet in a table) of energy predictions, that we can add to our predicted dataset.

predicted_future <- predicted_future |>
  mutate(predicted_energy = predicted_energy_values)

From these predicted energy values, we can calculate demand due to heating or cooling by subtracting the minimum energy days. First we’ll do that by subtracting the estimated minimum demand, then later we’ll classify days by whether they are above or below the balance point temperature.

To get the increased energy demand over our baseline minimum, we just add a column and the subtraction formula.

predicted_future <- predicted_future |>
  mutate(increased_energy = predicted_energy - min_energy)

Now let’s graph increase energy by date.

To Do

Add to the code to make the plot.

ggplot(predicted_future, aes(x = ______, y = ______, color = ______)) +
  geom_line(alpha = 0.5)

Answer

ggplot(predicted_future, aes(x = date, y = increased_energy, color = scenario)) +
  geom_point(alpha = 0.5)

Since every date is represented, this is too messy to really see much. Let’s aggregate by year to see if trends stick out more. We’ll group our data by year and scenario, then for each group, we’ll take the mean energy prediction, which will get us one prediction per year.

yearly_data <- predicted_future |>
  group_by(year, scenario) |>
  summarize(avg_energy = mean(increased_energy, na.rm = TRUE))

Note: The na.rm statement is to remove any missing values.

Now we can plot our yearly data. We’ll also add a loess line to see the relationship better.

ggplot(yearly_data, aes(x = year, y = avg_energy, color = scenario)) +
  geom_point() +
  geom_smooth(method = "loess")

This looks better, but still messy. We can actually drop the geom_point() line and just graph the loess lines to very clearly show the trends.

ggplot(yearly_data, aes(x = year, y = avg_energy, color = scenario)) +
  geom_smooth(method = "loess", se = FALSE)

Note: The se = FALSE part removes the gray standard error strip. You can set it to TRUE to see the difference.

This is much better for showing our trends!

To Do

Make your plot look nicer. At minimum make sure the axis labels are informative, but you can also look at picking your own colors with manual coloring or changing the background from gray.

Analyze Frequency of Minimum Demand Days

We know that there will be days of minimum demand when our temperature is at the balance point of 61. We’ll use the tmax value to start and then you can go through the code again with tmin. Obviously, each of these will skew a different direction, but we have to work with the data we have, and if we did the average of tmax and tmin that makes our data look milder than it actually is.

predicted_future <- predicted_future |>
  mutate(pred_temp = tmax)

Now, we’ll add our historical data to our predicted data. This way we can summarize historical and future minimum demand days.

add_future <- predicted_future |>
  filter(year > 2025)

add_historical <- full_data |>
  filter(!is.na(temp), scenario == "historical")
  
all_data <- bind_rows(add_historical, add_future)

Now we’ll just move the historical, recorded temps to the avg_temp column, so that we can just analyze that column.

all_data <- all_data |>
  mutate(pred_temp = coalesce(pred_temp, temp))

Now we can classify the days as “heating”, “cooling”, or “MDD” (minimum demand day).

all_data <- all_data |>
  mutate(degrees_off = 61 - pred_temp) |>
  mutate(day_type = case_when(
    degrees_off > 0 ~ "cooling",
    degrees_off < 0 ~ "heating", 
    degrees_off == 0 ~ "MDD",
    .default = NA
))

Now we’ll add a decade column to our data to see how many minimum demand days we have per decade. Note that the label is the middle of the time frame.

all_data <- all_data |>
  mutate(decade = case_when(
    year %in% 2016:2025 ~ 2020,
    year %in% 2026:2035 ~ 2030,
    year %in% 2036:2045 ~ 2040, 
    year %in% 2046:2055 ~ 2050,
    year %in% 2056:2065 ~ 2060,
    year %in% 2066:2075 ~ 2070,
    year %in% 2076:2085 ~ 2080,
    year %in% 2086:2095 ~ 2090,
    .default = NA
  ))

Now we can summarize the number of heating, cooling, or MDDs per decade by scenario. We’ll use one summarize() to get the total number of days, then another to get total by type, so that we can have frequency.

mdd_summary <- all_data |>
  group_by(decade, scenario, day_type) |>
  summarize(count = n(), .groups = "drop") |>
  group_by(decade, scenario) |>
  mutate(frequency = count / sum(count))

Now we can graph these frequencies over time.

Try either of these:

ggplot(mdd_summary, aes(x = decade, y = frequency, color = day_type)) +
  geom_line()

ggplot(mdd_summary, aes(x = decade, y = frequency, color = day_type, line_type = scenario)) +
  geom_line()

To Do

Both graphs are lacking a little bit of information. Play around with the graph types and with filtering to get a graph you are satisfied with.

These calculations used a very strict cutoff of 61 degrees exactly. You can add a wider window to get a more realistic depiction of changes over time.

To Do

Adjust this code to what you think is a good standard for each day type. Then rerun the rest of the code.

all_data <- all_data |>
  mutate(degrees_off = 61 - pred_temp) |>
  mutate(day_type = case_when(
    degrees_off > 0 ~ "cooling",
    degrees_off < 0 ~ "heating", 
    degrees_off == 0 ~ "MDD",
    .default = NA
))

ANOVA: Testing for Differences Across Decades

Now that we’ve visualized frequency of day type across decades, we want to test whether the differences we see are statistically significant. To do this, we’ll use an ANOVA (Analysis of Variance). ANOVA is a statistical test that compares the means of three or more groups to see if at least one group is significantly different from the others.

We can’t run the ANOVA on the summarized data because there is only one data point per decade. So, we’ll repeat our code, but calculate frequency by year instead. We’ll also add in a complete() function to include 0 as a possible count.

mdd_yearly_summary <- all_data |>
  group_by(year, scenario, day_type) |>
  summarize(count = n(), decade = first(decade), .groups = "drop") |>
  complete(year, scenario, day_type, fill = list(count = 0)) |>
  group_by(year, scenario) |>
  mutate(frequency = count / sum(count))

With our ANOVA we want to see if a single y-variable varies by our groupings. So, we will have to measure MDD, heating, and cooling days separately.

Let’s filter to get those subsets.

mdd_for_anova <- mdd_yearly_summary |>
  filter(day_type == "MDD")
  
heating_for_anova <- mdd_yearly_summary |>
  filter(day_type == "heating")
  
cooling_for_anova <- mdd_yearly_summary |>
  filter(day_type == "cooling")

Now we’re set to run our ANOVAs. The command to run aov() and the syntax we use for most models in R is y ~ x, where the tilde can be read as “is a function of” or “as predicted by”. So, we’re looking at if the y-values can be predicted by the x-variable groupings.

We use this syntax but plug in our variables and state what dataset we’re pulling from.

anova_mdd <- aov(frequency ~ decade, data = mdd_for_anova)

Now we run summary() to see the output of the model.

summary(anova_mdd)

From summary() the p-value is Pr(>F). The asterisks correspond to significance codes. An ANOVA tells you whether or not there is a difference between our groups. It doesn’t tell you where the difference lies, meaning what groups are driving the difference, you would use other graphs for that.

Another option for looking at model output is to use the {broom} package. It has a function tidy() that is very useful. If we want to use something from a package but not load the full library we use the library::function syntax. Run the code and see if it is easier to interpret.

broom::tidy(anova_mdd)

Note: you may have to run install.packages(“broom”) for this to work.

We could also have run our ANOVA with scenario instead of decade.

anova_mdd_2 <- aov(frequency ~ scenario, data = mdd_for_anova)
broom::tidy(anova_mdd_2)

Thi shows us if scenario affects the variance in MDD days.

We can also combine terms in an ANOVA. If you wanted to use both decade and scenario you would just add a + or an * between them. The * will look at the interaction of the two terms. Run this code:

anova_mdd_3 <- aov(frequency ~ decade * scenario, data = mdd_for_anova)
broom::tidy(anova_mdd_3)

Think

Why is the interaction of scenario and decade highly not significant?

Answer

For scenario = “historical” there is only one decade, so there’s no way to evaluate the pairings of historical with the other decades. A better model would just use the + to evaluate each term separately:

anova_mdd_4 <- aov(frequency ~ decade + scenario, data = mdd_for_anova)
broom::tidy(anova_mdd_4)

To Do

Run ANOVAs for the other day types. Is there a difference in frequency of heating and cooling days by decade?

Visualizing Frequency Difference

Another way we can see the shift in heating and cooling days over time is to plot the same u-shaped energy by temperature plot for our different decades.

First we’ll prep our data to be combined with the historical data and we’ll do the same for the historical data.

energy_record <- energy_record |> 
    mutate(year = year(date),
           temp_used = temp,
           energy_used = energy) |>
    filter(year == 2020) |>
    mutate(decade = "2020 (Historical)")
  
predicted_data <- predicted_data |>
    filter(year %in% c(2030, 2040, 2050, 2060, 2070)) |>
    mutate(temp_used = tmax,
           energy_used = predicted_energy,
           decade = paste0(year, " (", scenario, ")"))

combined_data <- bind_rows(energy_record, predicted_data)

ggplot(combined_data, aes(x = tmax, y = predicted_energy , color = decade)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE)

There’s a lot of overlap between our curves, so we’ll add facet_wrap() to separate by decade.

ggplot(combined_data, aes(x = tmax, y = predicted_energy , color = decade)) +
  geom_point(alpha = 0.3) +
  geom_smooth(method = "loess", se = FALSE) +
  facet_wrap(~decade)

Now we can see the shift from a u-shape to more of a check mark as the number of days shifts towards hotter temps and higher energy usage.

To Do

Edit this graph until you are satisfied. Find out more about facet_wrap() here.