Future Cooling-demand (as based on Heat Events)

Introduction

In this analysis, we’ll examine the heat wave patterns of both historical data and future climate projections from moderate emissions (RCP4.5) and high emissions (RCP8.5) scenarios

If you’re starting a new script, be sure to load your libraries and data. If you’re carrying on, you should still have full_data and heat_event_summary. Either way, make sure you’re starting with a dataset that represents all the combined data and one that is a summary of just the heat events. You can use glimpse() to ensure you have all the needed columns.

glimpse(full_data)
glimpse(heat_event_summary)

Heat Wave Analysis

Frequency

To understand our heat events let’s summarize them by finding the frequency of heat wave events over time, the intensity of past and future events, and the duration of heat events. First we’ll ensure that we can map events by year by adding a year column.

heat_event_summary <- heat_event_summary |>
  mutate(year = year(start_date))

Now we can use the summarize() function in a similar fashion as before to add up the number of heat events per year. To do this, you use the function n(), leaving it blank inside so that it just calculates for every group.

Complete the code so that we can see total heat events per year for each scenario.

To Do
heat_wave_frequency <- heat_event_summary |>
  group_by(_________) |>
  summarize(num_events = n())

You’ll need two arguments inside the group_by().

heat_wave_frequency <- heat_event_summary |>
  group_by(scenario, year) |>
  summarize(num_events = n())

Now we can plot this frequency over time and color by each emissions scenario. Complete the code and also ensure your axes are labeled well.

To Do
ggplot(________, aes(x = year, y = ________, color = scenario)) +
  geom_line()
ggplot(heat_wave_frequency, aes(x = year, y = num_events, color = scenario)) +
  geom_line() +
  labs(x = "", y = "")

Intensity

To see if heat waves are getting hotter, let’s compare the maximum temperatures reached during heat waves. We can group our data by decade to simplify the graph and make it more interpretable.

First add the decade column. case_when will use our year range to assign a new value, for example, year %in% 2020:2029 ~ 2020 means for any year between 2020 and 2029, label the decade 2020.

heat_event_summary <- heat_event_summary |>
  mutate(decade = case_when(
    year %in% 1970:1979 ~ 1970,
    year %in% 1980:1989 ~ 1980,
    year %in% 1990:1999 ~ 1990,
    year %in% 2000:2009 ~ 2000,
    year %in% 2010:2019 ~ 2010,
    year %in% 2020:2029 ~ 2020,
    year %in% 2030:2039 ~ 2030, 
    year %in% 2040:2049 ~ 2040,
    year %in% 2050:2059 ~ 2050,
    year %in% 2060:2069 ~ 2060,
    year %in% 2070:2079 ~ 2070,
    year %in% 2080:2089 ~ 2080,
    year %in% 2090:2099 ~ 2090
  )) |>
  filter(!is.na(decade))

Then summarize by maximum.

heat_wave_intensity <- heat_event_summary |>
  group_by(scenario, decade) |>
  summarize(max_temp = max(max_temp))

Now we’ll make a bar graph and color by scenario. geom_col() stands for columns. The position = "dodge" term tells it to put different groups next to each other rather than stacked on top of each other. You can remove that part to see the difference.

ggplot(heat_wave_intensity, aes(x = decade, y = max_temp, fill = scenario)) +
  geom_col(position = "dodge") 

This graph is okay, but a y-axis that started higher would improve it and allow us to see the differences more easily.

To Do

Adjust the y-axis limits in the coord_cartesian() function to improve the graph.

ggplot(heat_wave_intensity, aes(x = decade, y = max_temp, fill = scenario)) +
  geom_col(position = "dodge") +
  coord_cartesian(ylim = c(50, 150))

You can also make the bars all the same width by subbing geom_col(position = position_dodge2(preserve = "single")) for the geom_col line in that code.

Duration

We can visualize the duration of heat waves in a similar fashion as we did finding the intensity. Instead of using the maximum value as we did for intensity, use the average value for duration.

To Do

Alter the code to visualize duration.

# This is copied code for intensity, change relevant variables.
heat_wave_intensity <- heat_event_summary |>
  group_by(scenario, decade) |>
  summarize(hottest_temp = max(hottest_temp))
  
ggplot(heat_wave_intensity, aes(x = decade, y = hottest_temp, fill = scenario)) +
  geom_col(position = "dodge") 
heat_wave________ <- heat_event_summary |>
  group_by(scenario, decade) |>
  summarize(____ = ____(_______))
  
ggplot(heat_wave________, aes(x = decade, y = ________, fill = scenario)) +
  geom_col(position = "dodge") 
heat_wave_duration <- heat_event_summary |>
  group_by(scenario, decade) |>
  summarize(avg_duration = mean(duration))
  
ggplot(heat_wave_duration, aes(x = decade, y = avg_duration, fill = scenario)) +
  geom_col(position = "dodge") 

Fitting a Model to Frequency

Let’s test whether the frequency of heat waves is significantly increasing over time using a linear regression. Linear regression at its most simple is calculating the slope of a line where slope is m in y = mx + b.

The function to create this linear regression is lm() for linear model. The syntax we use for models in R is y ~ x where the tilde can be read as “is a function of” or “as predicted by”. So, we’re looking at how well our x-variable will predict our y-variable.

We use this syntax but plug in our variables and state what dataset we’re pulling from.

model <- lm(num_events ~ year, data = heat_wave_frequency)

Now we run summary() to see the output of the model.

summary(model)

Your output will look something like this:

We interpret this by looking at the Coefficients section and seeing the Estimate for our variable. Here, the estimate for year is 0.066. That value is our slope coefficient, ‘m’. The -130 value is the intercept, ‘b’; we typically don’t care about this value. The Pr(>|t|) is our p-value. <2e-16> means it is very small, this is the smallest number R will calculate unless explicitly told to go further.

Another option for looking at model output is to use the {broom} package. It has a function tidy() that is very useful. If we want to use something from a package but not load the full library we use the library::function syntax. Run the code and see if it is easier to interpret.

broom::tidy(model)

Note: you may have to run install.packages(“broom”) for this to work.

We can add another term to our model to add complexity. If we do so, the equation is no longer y = mx + b, but we can still get coefficients from it the same as we would the simple model.

Here’s the scenario variable added to the model.

model2 <- lm(num_events ~ year + scenario, data = heat_wave_frequency)
To Do

Run the summary() or broom::tidy() functions to see the output.

Think

What do our results say about the frequency of future heat events?


Cooling Demand Analysis

Building a Demand by Temperature Model

We can use our data to create a model that predicts future temperature and demand based on historical temperatures. To look at just cooling days, we’ll first filter our data for only temperatures above our balance point temperature of 61 degrees.

cooling_days <- full_data |>
  filter(temp > 61)

To create our model we want to use actual historical data, so we’ll further filter for actual recorded temperatures.

energy_record <- cooling_days |>
  filter(scenario == "historical")

Then we can fit our model, just a linear model, a y = mx + b model.

cooling_model <- lm(energy ~ temp, data = energy_record)

summary(cooling_model)
broom::tidy(cooling_model)

Predict Future Energy Demand

Now we can use our model and the predict() function to find the energy values for the future. To do this we’ll use the tmax temperatures as our x-values and predict the y-value for demand.

We need just our future values for this, so we’ll filter out historical. We’ll also ensure that all rows have a value for year and month. We just want cooling days so we’ll make sure temp is more than 61

predicted_cooling <- full_data |>
  filter(scenario != "historical", 
         tmax > 61) |>
  mutate(year = year(date),
         month = month(date))

From that, we’ll pull out our tmax values to be used as the input to our model.

input_temps <- data.frame(temp = predicted_cooling$tmax)

Now we’ll get predicted demand from the cooling_model by plugging in the input_temps values as x and getting out predicted_demand_values, which are the ys.

predicted_demand_values <- predict(cooling_model, newdata = input_temps)

Now we have a vector (the same as a single column, just not yet in a table) of energy predictions, that we can add to our predicted dataset.

predicted_cooling <- predicted_cooling |>
  mutate(predicted_demand = predicted_demand_values)

We can check to see if our data looks as expected with a graph.

ggplot(predicted_cooling, aes(x = year, y = predicted_demand, color = scenario)) +
  geom_point()

Now we’ll add a column for decade so that it’s easier for us to view large changes over time.

predicted_cooling <- predicted_cooling |>
  mutate(decade = case_when(
    year %in% 2020:2029 ~ 2020,
    year %in% 2030:2039 ~ 2030, 
    year %in% 2040:2049 ~ 2040,
    year %in% 2050:2059 ~ 2050,
    year %in% 2060:2069 ~ 2060,
    year %in% 2070:2079 ~ 2070,
    year %in% 2080:2089 ~ 2080,
    year %in% 2090:2099 ~ 2090
  )) |>
  filter(!is.na(decade))

Now we can summarize our data to see the pattern in total predicted energy demand. We’ll group by decade and scenario and for each of those pairs we’ll use sum() to add up demand.

annual_energy <- predicted_cooling |>
  group_by(decade, scenario) |>
  summarize(total_demand = sum(predicted_demand)) 

Then we’ll graph our summarized data.

ggplot(annual_energy, aes(x = decade, y = total_demand, color = scenario)) +
  geom_line() 

We can add our historical energy demand to this table by putting it in the same format and then binding it.

Here we summarize the historical data in the same way as the predicted.

historic_cooling <- energy_record |>
  group_by(scenario) |>
  summarize(total_demand = sum(energy)) |>
  mutate(decade = 2010)

Now bind them together.

total_energy <- bind_rows(historic_cooling, annual_energy)

To find the amount of energy that is due to cooling days, you can subtract the minimum energy usage, which is the usage at the balance point. Another group has calculated that to be 48250. Add an increased_energy column using this value.

total_energy <- total_energy |>
  mutate(increased_energy = total_demand - 48250)

Now you can graph total increased demand for cooling days.

ggplot(total_energy, aes(x = decade, y = increased_energy, fill = scenario)) +
  geom_col(position = "dodge") 

Make the Graphs Prettier

Use the graphing resources from previous tutorials to improve your graphs.

To Do

Create versions of the last two graphs that you are satisfied with.

Think

One thing to consider is that our graphs will be skewed by using tmax to make predictions in the future data, but we only have the average daily temperature, temp, for the historical data. Trying to use the average of tmax and tmin as a correspondent to temp just makes the data seem milder than it is, so that’s not useful either. This is just a limitation of the data, but if you wanted you could use the model values for historical dates rather than the recorded temps.

ANOVA: Testing for Differences Across Decades

Now that we’ve visualized cooling demand across decades, we want to test whether the differences we see are statistically significant. To do this, we’ll use an ANOVA (Analysis of Variance). ANOVA is a statistical test that compares the means of three or more groups to see if at least one group is significantly different from the others.

The command to run an ANOVA is aov(). The syntax is similar to that of our linear model.

anova_model <- aov(total_demand ~ decade, data = total_energy)

summary(anova_model)
broom::tidy(anova_model)

From summary() the p-value is Pr(>F). It is more obvious from broom.

Think

What was our null hypothesis and what can this p-value tell you about it? What conclusions can we draw?