Cold Events & Heating Degree Days

What Makes a Cold Event?

You’ll need to define the threshold for what constitutes a cold event. This document will give some example parameters, but you need to choose the variables you want and their values.

For the example, we’ll define a cold event as a period of cold that is lower than 28 degrees for more than five consecutive days.

Example criteria:

threshold temp: 28 degrees
5 consecutive days

To Do

Decide what will define a cold event.

Finding Cold Events in Our Data

To Do

Read in your data and store it as a variable called full_data.

Answer

full_data <- read_csv("full_data.csv")

First we’ll add a threshold value to our data. For the example criteria, this should be a static 28 degrees. Use the mutate() function to make the new column. We’ll also add arrange() to make sure our data is ordered by date.

To Do

Add to the code to create the threshold column

full_data <- full_data |>
  mutate(____________) |>
  arrange(date)

Answer

Add to the code to create the threshold column

full_data <- full_data |>
  mutate(threshold = 28) |>
  arrange(date)

Now we’ll make a new column to signify if the min temp is lower than the cutoff threshold. If it is, we’ll put TRUE in the extreme column, if it isn’t we’ll put FALSE. To decide which case each row is we’ll use case_when(). You can learn more on how to use case_when() here.

full_data <- full_data |>
  mutate(extreme = case_when(      # name the new column extreme
    tmin < threshold ~ TRUE,       # put TRUE if tmax is more than threshold
    .default = FALSE               # otherwise put FALSE
  ))

Check your data. You should see the extreme column. It will mostly be FALSE but there should be some TRUE values.

Now we’ll count up the consecutive days. To do this we’ll use a for loop to go through each line and see how it compares to the previous line. We’ll make a count column that keeps track of days based on if there is TRUE or FALSE in the extreme column, and we’ll make a run_id column to keep track of the consecutive runs of hot days.

We need to do this for each of our different scenarios separately in order to make sure we count the runs correctly. So, we’ll make the for loop into a function we can use on each scenario dataset.

count_days <- function(dataset) {       # name the function
  dataset$count <- NA                   # add the count column and start it out with NA
  dataset$run_id <- NA                  # add a column to keep track of consecutive runs
  run_counter <- 0                      # start the run counter at 0

  for (i in 1:nrow(dataset)) {          # this says to go through the data one row at a time

    # see if extreme is FALSE for that row
    if (dataset$extreme[i] == FALSE) {  
      dataset$count[i] <- NA            # if yes, put NA
      dataset$run_id[i] <- NA           # the run_id is also NA

      # see if extreme is TRUE for the current row & FALSE for the previous row
    } else if (i == 1 || full_data$extreme[i-1] == FALSE) { 
      dataset$count[i] <- 1                          # if yes, put 1 to mark the first extreme day
      run_counter <- run_counter + 1                 # restart run counter at a new run
      dataset$run_id[i] <- as.character(run_counter) # make the run_id equal to the current count

      # if both of the previous don't work, then the row is TRUE & the one before it is TRUE
    } else {                            
      dataset$count[i] <- 1 + dataset$count[i-1]     # set the count equal to 1 + the previous row count
      dataset$run_id[i] <- as.character(run_counter) # make the run_id equal to the current count
    }
  }
  return(dataset)   # the output is the updated dataset  
}

Now we can divide our data into the different scenarios and run them through the function. This might take a moment since it has to go through each row one at a time. When it finishes, you can check to see if the count and run_id columns are there.

historical <- full_data |>
  filter(scenario == "historical")
rcp45 <- full_data |>
  filter(scenario == "rcp45")
rcp85 <- full_data |>
  filter(scenario == "rcp85")
  
historical <- count_days(historical)
rcp45 <- count_days(rcp45)
rcp85 <- count_days(rcp85)

Now we can recombine the data after calculating their runs individually.

recombined <- rbind(historical, rcp45, rcp85)

Now let’s simplify the data down to just the heat events and not the normal days. We just need to filter out the NA values. To do this you can use ! which means “does not equal” in conjunction with the is.na() function.

To Do

See if you can add to the code to make the filter.

extremes <- recombined |>
  _______________

Hint

The filter would look like !is.na(count).

Answer

extremes <- recombined |>
  filter(!is.na(count))

Summarizing the Cold Events

Now we have a dataset with all the extreme cold days, but we need to filter it by our consecutive day threshold, which we put at 5.

We’ll use group_by() to make sure our run_id distinguishes each distinct string of extreme cold days. We’ll group our runs together and then see if there are any where the max number of days, the max count is more than 5.

cold_events <- extremes |>
  group_by(run_id) |>
  filter(max(count) >= 5)

Now we’ll make a summary table to show when the events are, how long they last, and the min temperature reached. To do that we use summarize() which will make a new table with whatever stats we dictate. Within summarize() anything before the = becomes the column name and the stats after are what it fills with.

To Do

Add to this code to create the table.

cold_events_summary <- cold_events |>
  summarize(
    start_date = min(date),
    end_date = ____(date),
    duration = max(_____),
    min_temp = _________
  )

Answer

cold_events_summary <- cold_events |>
  summarize(
    start_date = min(date),
    end_date = max(date),
    duration = max(count),
    min_temp = min(tmin)
  )

You should now see a new object that shows the cold event summary. Let’s add arrange() again, just to make sure it’s in order.

cold_events_summary <- cold_events_summary |>
  arrange(start_date)

Find the Heating Degree Days

We’re using a balance point temperature of 60 to find our cooling degree days, which shows how far above the balance point the temperature is. To do this we’ll find all of the differences from 60 and then we’ll categorize the positive ones as cooling and the negatives as heating.

full_data <- full_data |>
  mutate(degrees_off = 60 - temp) |>
  mutate(day_type = case_when(
    degrees_off > 0 ~ "cooling",
    degrees_off < 0 ~ "heating", 
    .default = NA
))

The only default values here, meaning the ones not being set to heating or cooling, will be 0s–days where the temperature is 60.

Now we’ll visualize these days versus the electricity demand. To do this we’ll use a scatterplot which in {ggplot2} world is geom_point().

To Do

Add to this code to create the scatterplot

ggplot(data = ______, aes( x = ______, y = ______)) +
  ______

Answer

ggplot(data = full_data, aes(x = degrees_off, y = energy)) +
  geom_point()

We can add a color option to our aes() to color the days by whether they are heating or cooling.

To Do

Add a color option with color = and set it equal to day_type.

Answer

ggplot(data = full_data, aes(x = degrees_off, y = energy, color = day_type)) +
  geom_point()

Fitting a Model

We can add a trend line to our data by adding to the code for the plot. The typical trend line you may have seen is a linear model, which is just a straight line through the points. Our data wouldn’t fit a straight line as much as it would a line that starts with a small slope and then increases. We can make a line that better fits our data using a loess regression instead of linear regression. loess stands for LOcally Estimated Scatterplot Smoothing, it doesn’t have the assumption of any specific shape.

We can add a loess regression line to our graph with the function geom_smooth(). We just need to specify that the method of smoothing we want is “loess”.

ggplot(data = full_data, aes(x = degrees_off, y = energy, color = day_type)) +
  geom_point() +
  geom_smooth(method = "loess", color = "black")

To Do

Make your plot look nicer. At minimum make sure the axis labels are informative, but you can also look at picking your own colors with manual coloring or changing the background from gray.