Future Heating-demand (as based on Cold Events)

Introduction

In this analysis, we’ll examine the heat wave patterns of both historical data and future climate projections from moderate emissions (RCP4.5) and high emissions (RCP8.5) scenarios

If you’re starting a new script, be sure to load your libraries and data. If you’re carrying on, you should still have full_data and cold_events_summary. Either way, make sure you’re starting with a dataset that represents all the combined data and one that is a summary of just the cold events. You can use glimpse() to ensure you have all the needed columns.

glimpse(full_data)
glimpse(cold_events_summary)

Cold Wave Analysis

Frequency

To understand our cold events let’s summarize them by finding the frequency of cold waves over time, the intensity of past and future events, and the duration of cold events. First we’ll ensure that we can map events by year by adding a year column.

cold_events_summary <- cold_events_summary |>
  mutate(year = year(start_date))

Now we can use the summarize() function in a similar fashion as before to add up the number of cold events per year. To do this, you use the function n(), leaving it blank inside so that it just calculates for every group.

Complete the code so that we can see total cold events per year for each scenario.

To Do

cold_wave_frequency <- cold_events_summary |>
  group_by(_________) |>
  summarize(num_events = n())

Hint

You’ll need two arguments inside the group_by().

Answer

cold_wave_frequency <- cold_events_summary  |>
  group_by(scenario, year) |>
  summarize(num_events = n())

Now we can plot this frequency over time and color by each emissions scenario. Complete the code and also ensure your axes are labeled well.

To Do

ggplot(________, aes(x = year, y = ________, color = scenario)) +
  geom_line()

Answer

ggplot(cold_wave_frequency, aes(x = year, y = num_events, color = scenario)) +
  geom_line() +
  labs(x = "", y = "")

Intensity

To see if cold waves are getting colder, let’s compare the minimum temperatures reached during cold waves. We can group our data by decade to simplify the graph and make it more interpretable.

First add the decade column. case_when will use our year range to assign a new value, for example, year %in% 2020:2029 ~ 2020 means for any year between 2020 and 2029, label the decade 2020.

cold_events_summary <- cold_events_summary |>
  mutate(decade = case_when(
    year %in% 1970:1979 ~ 1970,
    year %in% 1980:1989 ~ 1980,
    year %in% 1990:1999 ~ 1990,
    year %in% 2000:2009 ~ 2000,
    year %in% 2010:2019 ~ 2010,
    year %in% 2020:2029 ~ 2020,
    year %in% 2030:2039 ~ 2030, 
    year %in% 2040:2049 ~ 2040,
    year %in% 2050:2059 ~ 2050,
    year %in% 2060:2069 ~ 2060,
    year %in% 2070:2079 ~ 2070,
    year %in% 2080:2089 ~ 2080,
    year %in% 2090:2099 ~ 2090
  )) |>
  filter(!is.na(decade))

Then summarize by minimum.

cold_wave_intensity <- cold_events_summary |>
  group_by(scenario, decade) |>
  summarize(coldest_temp = min(min_temp))

Now we’ll make a bar graph and color by scenario. geom_col() stands for columns. The position = "dodge" term tells it to put different groups next to each other rather than stacked on top of each other. You can remove that part to see the difference.

ggplot(cold_wave_intensity, aes(x = decade, y = coldest_temp, fill = scenario)) +
  geom_col(position = "dodge")

This graph is okay, but we can force the graph to make all the bars the same width even if there aren’t scenarios for that decade.

To Do

Add labels and anything else you’d like to improve your graph.

ggplot(cold_wave_intensity, aes(x = decade, y = coldest_temp, fill = scenario)) +
  geom_col(position = position_dodge2(preserve = "single"))

Duration

We can visualize the duration of cold waves in a similar fashion as we did finding the intensity. Instead of using the maximum value as we did for intensity, use the average value for duration.

To Do

Alter the code to visualize duration.

# This is copied code for intensity, change relevant variables.
cold_wave_intensity <- cold_events_summary |>
  group_by(scenario, decade) |>
  summarize(coldest_temp = min(coldest_temp))
  
ggplot(cold_wave_intensity, aes(x = decade, y = coldest_temp, fill = scenario)) +
  geom_col(position = "dodge")

Hint

cold_wave________ <- cold_event_summary |>
  group_by(scenario, decade) |>
  summarize(____ = ____(_______))
  
ggplot(cold_wave________, aes(x = decade, y = ________, fill = scenario)) +
  geom_col(position = "dodge")

Answer

cold_wave_duration <- cold_events_summary |>
  group_by(scenario, decade) |>
  summarize(max_duration = max(duration))
  
ggplot(cold_wave_duration, aes(x = decade, y = max_duration, fill = scenario)) +
  geom_col(position = "dodge") 

# Use the additional code from above to make your bars even if you'd like.

Fitting a Model to Frequency

Let’s test whether the frequency of cold waves is significantly increasing over time using a linear regression. Linear regression at its most simple is calculating the slope of a line where slope is m in y = mx + b.

The function to create this linear regression is lm() for linear model. The syntax we use for models in R is y ~ x where the tilde can be read as “is a function of” or “as predicted by”. So, we’re looking at how well our x-variable will predict our y-variable.

We use this syntax but plug in our variables and state what dataset we’re pulling from.

model <- lm(num_events ~ year, data = cold_wave_frequency)

Now we run summary() to see the output of the model.

summary(model)

Your output will look something like this (just the format, not the numbers):

We interpret this by looking at the Coefficients section and seeing the Estimate for our variable. Here, the estimate for year is 0.066. That value is our slope coefficient, ‘m’. The -130 value is the intercept, ‘b’; we typically don’t care about this value. The Pr(>|t|) is our p-value. <2e-16> means it is very small, this is the smallest number R will calculate unless explicitly told to go further.

Another option for looking at model output is to use the {broom} package. It has a function tidy() that is very useful. If we want to use something from a package but not load the full library we use the library::function syntax. Run the code and see if it is easier to interpret.

broom::tidy(model)

Note: you may have to run install.packages(“broom”) for this to work.

Think

What do our results say about the frequency of future cold events?

We can add another term to our model to add complexity. If we do so, the equation is no longer y = mx + b, but we can still get coefficients from it the same as we would the simple model.

Here’s the scenario variable added to the model.

model2 <- lm(num_events ~ year + scenario, data = cold_wave_frequency)

To Do

Run the summary() or broom::tidy() functions to see the output.

Think

What does the more complex model say about cold events by year and scenario?

Heating Demand Analysis

Building a Demand by Temperature Model

We can use our data to create a model that predicts future temperature and demand based on historical temperatures. To look at just heating days, we’ll first filter our data for only temperatures below our balance point temperature of 61 degrees.

heating_days <- full_data |>
  filter(temp < 61)

To create our model we want to use actual historical data, so we’ll further filter for actual recorded temperatures.

energy_record <- heating_days |>
  filter(scenario == "historical")

Then we can fit our model, just a linear model, a y = mx + b model.

heating_model <- lm(energy ~ temp, data = energy_record)

summary(heating_model)
broom::tidy(heating_model)

Predict Future Energy Demand

Now we can use our model and the predict() function to find the energy values for the future. To do this we’ll use the tmin temperatures as our x-values and predict the y-value for demand.

We need just our future values for this, so we’ll filter out historical. We’ll also ensure that all rows have a value for year and month. We just want heating days so we’ll make sure temp is less than 61

predicted_heating <- full_data |>
  filter(scenario != "historical", 
         tmin < 61) |>
  mutate(year = year(date),
         month = month(date))

From that, we’ll pull out our tmin values to be used as the input to our model.

input_temps <- data.frame(temp = predicted_heating$tmin)

Now we’ll get predicted demand from the heating_model by plugging in the input_temps values as x and getting out predicted_demand_values, which are the ys.

predicted_demand_values <- predict(heating_model, newdata = input_temps)

Now we have a vector (the same as a single column, just not yet in a table) of energy predictions, that we can add to our predicted dataset.

predicted_heating <- predicted_heating |>
  mutate(predicted_demand = predicted_demand_values)

We can check to see if our data looks as expected with a graph.

ggplot(predicted_heating, aes(x = year, y = predicted_demand, color = scenario)) +
  geom_point()

Now we’ll add a column for decade so that it’s easier for us to view large changes over time.

predicted_heating <- predicted_heating |>
  mutate(decade = case_when(
    year %in% 2020:2029 ~ 2020,
    year %in% 2030:2039 ~ 2030, 
    year %in% 2040:2049 ~ 2040,
    year %in% 2050:2059 ~ 2050,
    year %in% 2060:2069 ~ 2060,
    year %in% 2070:2079 ~ 2070,
    year %in% 2080:2089 ~ 2080,
    year %in% 2090:2099 ~ 2090
  )) |>
  filter(!is.na(decade))

Now we can summarize our data to see the pattern in total predicted energy demand. We’ll group by decade and scenario and for each of those pairs we’ll use sum() to add up demand.

annual_energy <- predicted_heating |>
  group_by(decade, scenario) |>
  summarize(total_demand = sum(predicted_demand))

Then we’ll graph our summarized data.

ggplot(annual_energy, aes(x = decade, y = total_demand, color = scenario)) +
  geom_line()

We can add our historical energy demand to this table by putting it in the same format and then binding it.

Here we summarize the historical data in the same way as the predicted.

historic_heating <- energy_record |>
  group_by(scenario) |>
  summarize(total_demand = sum(energy, na.rm = TRUE)) |>
  mutate(decade = 2010)

Now bind them together.

total_energy <- bind_rows(historic_heating, annual_energy)

To find the amount of energy that is due to heating days, you can subtract the minimum energy usage, which is the usage at the balance point. Another group has calculated that to be 48250. Add an increased_energy column using this value.

total_energy <- total_energy |>
  mutate(increased_energy = total_demand - 48250)

Now you can graph total increased demand for heating days.

ggplot(total_energy, aes(x = decade, y = increased_energy, fill = scenario)) +
  geom_col(position = "dodge")

Make the Graphs Prettier

Use the graphing resources from previous tutorials to improve your graphs.

To Do

Create versions of the last two graphs that you are satisfied with. Make sure the labels make the graphs easy to interpret.

ANOVA: Testing for Differences Across Decades

Now that we’ve visualized cooling demand across decades, we want to test whether the differences we see are statistically significant. To do this, we’ll use an ANOVA (Analysis of Variance). ANOVA is a statistical test that compares the means of three or more groups to see if at least one group is significantly different from the others.

The command to run an ANOVA is aov(). The syntax is similar to that of our linear model.

anova_model <- aov(total_demand ~ decade, data = total_energy)

summary(anova_model)
broom::tidy(anova_model)

From summary() the p-value is Pr(>F). It is more obvious from broom.

Think

What was our null hypothesis and what can this p-value tell you about it? What conclusions can we draw? Why might we not see what we expected?

Answer

One thing that will skew our results is that we are using tmin to make predictions in the future data, but we only have the average daily temperature, temp, for the historical data. This is just a limitation of the data, but if you wanted you could use the model values for historical dates rather than the recorded temps.