Welcome to the first script of Module 3!
At the example of weather stations, you will here learn how to:

  • identify extreme weather events
  • summarize the occurrence of extreme weather events in time
  • define own functions in R

Please read the following code instructions carefully, try to understand the code that follows each instruction, execute it and see what happens. Do not hesitate to change the code or write your own code, and execute it as well!


1. Preparations

We start off again with defining our working directory and loading all the packages that we will need:

dir <- "C:/Users/Max/Desktop/IAMO_neu/eLearning/Module_3_Extreme_weather/M03_Part01_Weather-stations/"

library(reshape2)
library(dplyr)
library(ggplot2)
library(Kendall)

In this script, we will work again with the measurements from the weather station “Sisian”. In module 2, we already learned how to edit and transform the original data. Here, we import a different version than in module 2 which is already pre-processed and contains only five parameters:

sisian_raw <- read.csv(paste0(dir, "Weather_by_station/Sisian_edit.csv"), sep=",")
head(sisian_raw)
STATION PARA_SHORT PARA_LONG YEAR MONTH DAY VALUE DATE DOY
Sisian PRCP Precipitation 1995 1 1 0 1995-01-01 1
Sisian PRCP Precipitation 1995 1 2 0 1995-01-02 2
Sisian PRCP Precipitation 1995 1 3 0 1995-01-03 3
Sisian PRCP Precipitation 1995 1 4 0 1995-01-04 4
Sisian PRCP Precipitation 1995 1 5 0 1995-01-05 5
Sisian PRCP Precipitation 1995 1 6 0 1995-01-06 6
unique(sisian_raw$PARA_SHORT)
## [1] "PRCP"     "TAVG"     "TMAX"     "TMIN"     "WIND_MAX"

We will only work with the TMAX data in this script, so we remove everything else, and also take out the columns “STATION”, “PARA_SHORT” and “PARA_LONG”. We also rename the column “VALUE” to “TMAX”:

sisian <- filter(sisian_raw, PARA_SHORT == "TMAX")
sisian <- select(sisian, c("YEAR", "MONTH", "DAY", "DATE", "DOY", "VALUE"))
names(sisian)[6] <- "TMAX"
head(sisian)
YEAR MONTH DAY DATE DOY TMAX
1995 1 1 1995-01-01 1 2.1
1995 1 2 1995-01-02 2 6.5
1995 1 3 1995-01-03 3 9.6
1995 1 4 1995-01-04 4 9.3
1995 1 5 1995-01-05 5 10.3
1995 1 6 1995-01-06 6 4.7

Let us create a copy of “sisian”, we will need it later:

sisian_copy <- sisian

2. Identify heat days by fix thresholds

The easiest way of investigating extreme weather is to define a certain threshold that needs to be exceeded by a certain variable. For example, we could define heat days as days with maximum temperatures above 25 degrees Celsius. To identify the days on which TMAX exceeds this threshold, we can use the which() function. This function identifies the positions of the “TRUE” values in a boolean vector. Accordingly, the following command returns the IDs of those rows where the value in the column “TMAX” is above 25:

heatdays <- which(sisian$TMAX > 25)
head(heatdays)
## [1] 142 170 171 172 179 180

To label the respective days in our table, we can create a new column called “above25”. We first fill this column with zeros, and then replace “0” by “1” at all positions in the column where the maximum temperature exceeds 25 degrees:

sisian$above25 <- 0
sisian$above25[heatdays] <- 1

Have a look at the table and scroll down until you find a “1” in the column “above25” (in row 142!):

# View(sisian)

For now, we have only identified the days on which the temperature threshold of 25 degrees is exceeded. This does not tell us much about the actual heat intensity. To do so, let us calculate the difference between the measured maximum temperature and the threshold (25 °C) for all days during which the threshold was exceeded. We write the result to a new column called “above25_diff”:

sisian$above25_diff <- 0
sisian$above25_diff[heatdays] <- sisian$TMAX[heatdays] - 25

Again, have a look at the table and scroll down until you find a value larger than zero in the column “above25_diff”:

# View(sisian)

Above, we have arbitrarily defined a threshold of 25 degrees Celsius. Imagine you would like to change this threshold, but not run the entire code again and again. In such instances, it is very convenient to define own functions in R, because it makes the code more efficient. We have already learned how to define functions in module 0. Let’s apply this here to the weather station data and define a function that takes two arguments: (i) the name of a data frame object that is to be processed (“table”), and (ii) the temperature threshold that should be applied (“threshold”). Let’s name this function “HeatEvents”. Note that in order to assign dynamic column names inside the function definition, we have to change the syntax and index colums by double square brackets [[ ]] instead of dollar signs $:

HeatEvents <- function(table, threshold) {
  
  heatdays2 <- which(table$TMAX > threshold)
  
  newcolumnA  <- paste0("above", threshold)
  table[[newcolumnA]] <- 0
  table[[newcolumnA]][heatdays2] <- 1

  newcolumnB  <- paste0("above", threshold, "_diff")
  table[[newcolumnB]] <- 0
  table[[newcolumnB]][heatdays2] <- table$TMAX[heatdays2] - threshold
  
  return(table)
}

Let us now apply this function to the copy of the Sisian data that we created before (because this copy only contains the original six columns), with a threshold of 18 degrees Celsius:

head(HeatEvents(table = sisian_copy, threshold = 18))
YEAR MONTH DAY DATE DOY TMAX above18 above18_diff
1995 1 1 1995-01-01 1 2.1 0 0
1995 1 2 1995-01-02 2 6.5 0 0
1995 1 3 1995-01-03 3 9.6 0 0
1995 1 4 1995-01-04 4 9.3 0 0
1995 1 5 1995-01-05 5 10.3 0 0
1995 1 6 1995-01-06 6 4.7 0 0

You can save the result of the function “HeatEvents” in a new object. Let’s execute the function with different thresholds:

sisian_20 <- HeatEvents(table = sisian_copy, threshold = 20)
sisian_21 <- HeatEvents(table = sisian_copy, threshold = 21)
sisian_22 <- HeatEvents(table = sisian_copy, threshold = 22)

# head(sisian_20)
# head(sisian_21)
# head(sisian_22)

If we want to include the results of several thresholds in the same table, we can achieve this by slightly expanding the previous function. Let’s create a new function called “HeatEventsMulti” that can handle cases when “threshold” is a vector and not a single value:

HeatEventsMulti <- function(table, threshold) {
  
 for (i in 1:length(threshold)) {
  heatdays2 <- which(table$TMAX > threshold[i])
  
  newcolumnA  <- paste0("above", threshold[i])
  table[[newcolumnA]] <- 0
  table[[newcolumnA]][heatdays2] <- 1

  newcolumnB  <- paste0("above", threshold[i], "_diff")
  table[[newcolumnB]] <- 0
  table[[newcolumnB]][heatdays2] <- table$TMAX[heatdays2] - threshold[i]
 }
  return(table)
}

Let us now apply the new function to the copy of the Sisian data again, with a threshold ranging from 21 to 25 degrees Celsius. You can see that the object “sisian_21_25” contains ten new columns:

sisian_21_25 <- HeatEventsMulti(table = sisian_copy, threshold = c(21:25))
head(sisian_21_25)
YEAR MONTH DAY DATE DOY TMAX above21 above21_diff above22 above22_diff above23 above23_diff above24 above24_diff above25 above25_diff
1995 1 1 1995-01-01 1 2.1 0 0 0 0 0 0 0 0 0 0
1995 1 2 1995-01-02 2 6.5 0 0 0 0 0 0 0 0 0 0
1995 1 3 1995-01-03 3 9.6 0 0 0 0 0 0 0 0 0 0
1995 1 4 1995-01-04 4 9.3 0 0 0 0 0 0 0 0 0 0
1995 1 5 1995-01-05 5 10.3 0 0 0 0 0 0 0 0 0 0
1995 1 6 1995-01-06 6 4.7 0 0 0 0 0 0 0 0 0 0

By re-arranging our data, let us count for each year and each temperature threshold from 21 to 25 degrees Celsius (i) the number of heat days, and (ii) the sum of heat differences as a measure of heat intensity, and visualize that. You see that, obviously, the amount of heat days and the sum of heat differences decreases when we increase the threshold:

sisian_melted  <- melt(sisian_21_25, id.vars = c("YEAR", "MONTH", "DAY", "DATE", "DOY", "TMAX")) 
sisian_summary <- sisian_melted %>% group_by(YEAR, variable) %>% summarize (heatsum = sum(value)) %>% as.data.frame()

sisian_summary$threshold <- as.factor(paste0(substring(sisian_summary$variable, 6,7), " °C"))
sisian_summary$YEAR <- as.factor(sisian_summary$YEAR)
sisian_summary$indicator <- NA
sisian_summary$indicator[sisian_summary$variable %in% c("above21", "above22", 
                                                        "above23", "above24", "above25")] <- "heat days"

sisian_summary$indicator[sisian_summary$variable %in% c("above21_diff", "above22_diff", 
                                                        "above23_diff", "above24_diff", "above25_diff")] <- "heat differences"

ggplot(data=sisian_summary, aes(x=threshold, y=heatsum, group=YEAR, color=YEAR)) +
    geom_line(size=1) + 
    geom_point(size=2) +
    labs(x="", y="") +
    ggtitle("Heat days and differences for different TMAX thresholds, by year") +
    facet_wrap(~indicator, scales="free")

3. Identify heat days by percentile thresholds

Extreme weather means different things in different contexts, or in different regions of the world. For example, a daily temperature of 30 degrees Celsius is very hot for Scandinavia, but not for the Arabian peninsula. To account for this, we can define a threshold based on the distribution of our historical data at a given location. This is done by quantile distributions. For example, by calculating the 95th percentile of a set of temperature measurements, we can define the highest 5% of our data as extreme events.

To illustrate this, we first define a vector that contains all January temperature measurements from Sisian station, from all years 1995 to 2020:

sisian_january <- sisian$TMAX[which(sisian$MONTH==1)]

The average January temperature from 1995 to 2020 was about 3 degrees Celsius, with minima up to -9.6 and maxima up to 14.6 degrees Celsius. This is also illustrated in the histogram:

mean(sisian_january)
## [1] 3.047146
min(sisian_january)
## [1] -9.6
max(sisian_january)
## [1] 14.6
hist(sisian_january)

In the histogram, we see that there were a few extremely cold and extremely cold days. We can use the function quantile() to determine the “limit” for a measurement to be, for example, among the 5 % lowest measurements (= below the probability of 0.05), or 5 % highest measurements (= above the probability of 0.95):

quantile(sisian_january, probs = c(0.05, 0.95))
##     5%    95% 
## -3.875  8.675

The result of the chunk above states that temperatures below -3.875 °C were measured on 5 % of all days in January from 1995 to 2020. Temperatures below 8.675 °C were measured on 95 % of all days, and consequently, days with temperatures above 8.675 °C were measured on 5 % of all days. Let us calculate the 95 % threshold for all 12 months separately. The resulting vector “monthly_thresholds” tells us that a day in February counts as heat day when it is warmer than 10.7 °C, but for a March day to count as heat day, it has to be warmer than 16.875 °C, etc.:

monthly_thresholds <- NA

for (i in 1:12)
{ sisian_month <- sisian$TMAX[which(sisian$MONTH == i)]
  monthly_thresholds[i] <- quantile(sisian_month, probs = 0.95) }

monthly_thresholds
##  [1]  8.675 10.700 16.875 21.500 24.500 28.000 30.100 31.075 28.500 23.730
## [11] 17.455 12.530

Let us now integrate these 12 monthly thresholds into the table “sisian_copy”:

sisian_copy$threshold <- monthly_thresholds[sisian_copy$MONTH]
head(sisian_copy)
YEAR MONTH DAY DATE DOY TMAX threshold
1995 1 1 1995-01-01 1 2.1 8.675
1995 1 2 1995-01-02 2 6.5 8.675
1995 1 3 1995-01-03 3 9.6 8.675
1995 1 4 1995-01-04 4 9.3 8.675
1995 1 5 1995-01-05 5 10.3 8.675
1995 1 6 1995-01-06 6 4.7 8.675

Analogously to the procedure above, we now calculate the number of heat days and heat differences, but this time based on the thresholds defined in the column “threshold”:

sisian_copy$heatday  <- 0
sisian_copy$heatdiff <- 0

heatdays2 <- which(sisian_copy$TMAX > sisian_copy$threshold)

sisian_copy$heatday[heatdays2] <- 1
sisian_copy$heatdiff[heatdays2] <- sisian_copy$TMAX[heatdays2] - sisian_copy$threshold[heatdays2]

Finally, we visualize the results of this percentile threshold approach to investigate whether there is an increase in heat days in recent years. It seems that there is no overall pattern of increasing numbers of monthly heat days over time. However, for example, the figure suggests there is a trend for July, and the Mann-Kendall test confirms this!

sisian_copy_summary <- sisian_copy %>% group_by(YEAR, MONTH) %>% summarize (heatdays_sum = sum(heatday)) %>% as.data.frame()

ggplot(data=sisian_copy_summary, aes(x=YEAR, y=heatdays_sum)) +
    geom_bar(stat="identity") +
    labs(x="", y="") +
    geom_smooth(method="lm", se=F) +
    theme(axis.text.x=element_text(angle=45, hjust=1)) +
    ggtitle("Heat days at Sisian station during different months, by year") +
    facet_wrap(~MONTH)

sisian_copy_summary_July <- filter(sisian_copy_summary, MONTH == 7)

teststat <- MannKendall(sisian_copy_summary_July$heatdays_sum)  
teststat[2]$sl[1] < 0.05
## [1] TRUE

Congratulations, you made it to the end of the first script! You can now go on with script 02, or solve the exercises 01 to 05.