Welcome to the first script of Module 2!
At the example of weather stations, you will here learn how to:

  • visualize time series
  • calculate temporal means of climate parameters
  • calculate time trends and assess their significance

Please read the following code instructions carefully, try to understand the code that follows each instruction, execute it and see what happens. Do not hesitate to change the code or write your own code, and execute it as well!


1. Import and edit weather station measurements

We start off again with defining our working directory and loading all the packages that we will need:

dir <- "C:/Users/Max/Desktop/IAMO_neu/eLearning/Module_2_Climate_trends/M02_Part01_Weather-stations/"

library(reshape2)
library(dplyr)
library(ggplot2)
library(viridis)
library(pals)
library(lubridate)
library(Kendall)

Let us import the data of the station “Sisian”, and the file “parameters.csv” which contains an overview about the different parameters measured at each station:

sisian <- read.csv(paste0(dir, "data/Weather_by_station/Sisian.csv"), sep=";")
paras  <- read.csv(paste0(dir, "data/parameters.csv"), sep=";")
head(sisian[,c(1:10)])
station parameter year_month day1 day2 day3 day4 day5 day6 day7
Sisian 2 1995-01 2.1 6.5 9.6 9.3 10.3 4.7 4.9
Sisian 2 1995-02 6.4 5.3 8 6.3 -1.6 2.9 6
Sisian 2 1995-03 8.8 7.6 10.3 11.5 11.7 11.7 12.9
Sisian 2 1995-04 16.3 17.5 14.8 10.6 7.5 9.5 12
Sisian 2 1995-05 19.5 13.3 17 17 17.4 18.4 15.2
Sisian 2 1995-06 21.8 22.5 18.2 22 21.8 19.3 19.8
head(paras)
parameter parameter_long parameter_short
2 Maximum_Temperature TMAX
3 Minimum_Temperature TMIN
4 Average_Temperature TAVG
5 Precipitation PRCP
17 Relative_Humidity HUMI
19 Vapour_Pressure VAPR

You can see that the data in “sisian” is not yet in a format that allows us to efficiently work with it. We need to apply some processing steps:

### substitute hyphens by NA
sisian[sisian == " -"] <- NA
sisian[sisian == "-"]  <- NA

# class(sisian[,4])  ## the data is not formatted as numeric!

### transform data entries to numbers
sisian[, 4:34] = sapply(sisian[, 4:34], as.numeric)  
### transform column names to numbers
colnames(sisian)[4:34] <- gsub("day", "", names(sisian))[4:34]
  
### melt dataframe
sisian <- melt(sisian, id.vars=c("station", "parameter", "year_month"))

### create a column for year and month
sisian$YEAR  <- as.numeric(substr(sisian$year_month, 1, 4))
sisian$MONTH <- as.numeric(substr(sisian$year_month, 6, 7))
  
### join "sisian" with "paras" 
sisian <- merge(sisian, paras, by="parameter")
  
### rename, reorder and delete some columns
names(sisian) <- c("PARA_CODE", "STATION", "YEAR_MONTH", "DAY", "VALUE", "YEAR", "MONTH", "PARA_LONG", "PARA_SHORT")
sisian     <- sisian[,c(2,9,8,6,7,4,5)]
sisian$DAY <- as.numeric(sisian$DAY)

### create a date-column and delete all entries with non-existing dates (e.g. February 30th, etc.)
sisian$DATE = as.Date(paste(sisian$YEAR, sisian$MONTH, sisian$DAY, sep = "-"))
sisian = sisian[!is.na(sisian$DATE), ]

Let’s have a look at the final data set. It contains measurements of a total of 17 parameters measured during the years 1995 to 2020:

head(sisian)
STATION PARA_SHORT PARA_LONG YEAR MONTH DAY VALUE DATE
Sisian TMAX Maximum_Temperature 1995 1 1 2.1 1995-01-01
Sisian TMAX Maximum_Temperature 1995 2 1 6.4 1995-02-01
Sisian TMAX Maximum_Temperature 1995 3 1 8.8 1995-03-01
Sisian TMAX Maximum_Temperature 1995 4 1 16.3 1995-04-01
Sisian TMAX Maximum_Temperature 1995 5 1 19.5 1995-05-01
Sisian TMAX Maximum_Temperature 1995 6 1 21.8 1995-06-01
unique(sisian$PARA_LONG)
##  [1] "Maximum_Temperature"      "Minimum_Temperature"     
##  [3] "Average_Temperature"      "Precipitation"           
##  [5] "Relative_Humidity"        "Vapour_Pressure"         
##  [7] "Snow_Cover"               "Average_Wind_Speed"      
##  [9] "Maximum_Wind_Speed"       "Soil_Temperature_05cm"   
## [11] "Soil_Temperature_10cm"    "Soil_Temperature_20cm"   
## [13] "Soil_Temperature_15cm"    "Maximum_Soil_Temperature"
## [15] "Minimum_Soil_Temperature" "Average_Soil_Temperature"
## [17] "Sunshine_Duration"
unique(sisian$PARA_SHORT)
##  [1] "TMAX"      "TMIN"      "TAVG"      "PRCP"      "HUMI"      "VAPR"     
##  [7] "SNOW"      "WIND_AVG"  "WIND_MAX"  "T_SOIL_05" "T_SOIL_10" "T_SOIL_20"
## [13] "T_SOIL_15" "TMAX_SOIL" "TMIN_SOIL" "TAVG_SOIL" "SUNSHINE"
unique(sisian$YEAR)
##  [1] 1995 1996 1997 1998 1999 2000 2001 2002 2003 2004 2005 2006 2007 2008 2009
## [16] 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020

2. Visualize weather station measurements

From the newly created column “DATE”, let us also calculate the DOY, i.e. day of the year, with the help of the function yday() from the package lubridate:

sisian$DOY <- yday(sisian$DATE)

We then sort the entire dataframe by the column “DATE”:

sisian <- sisian[order(sisian$PARA_SHORT, sisian$DATE),]

Let us now plot the average daily temperature from 1995 to 1999. The function scale_x_date() ensures that every year is written once on the x-axis:

sisian_sel1 <- filter(sisian, PARA_SHORT == "TAVG", YEAR <= 1999 )

ggplot(data=sisian_sel1, aes(x=DATE, y=VALUE)) +
    geom_line(size=1) + labs(x="", y="avg. temp. in °C") +
    ggtitle("Average daily temperature at Sisian station") +
    scale_x_date(date_breaks = "years" , date_labels = "%Y")

We can use the previously created column “DOY” to calculate the average temperature of each day of the year between 1995 and 1999. Let’s also calculate the standard deviation, and plot it as a shadow around the line of the mean using geom_ribbon():

sisian_sel1_mean <- sisian_sel1 %>% group_by(DOY) %>%  summarize(mean  = mean(na.omit(VALUE)), sd = sd(na.omit(VALUE))) %>% as.data.frame()

ggplot(data=sisian_sel1_mean, aes(x=DOY, y=mean)) +
    geom_line() + 
    geom_ribbon(aes(ymin = mean-sd, ymax = mean+sd), alpha=0.2) +
    labs(x="Day of the year", y="avg. temp. in °C") +
    ggtitle("Average daily temperature at Sisian station, mean 1995-1999")

Obviously, there is some year-to-year variation in our data. We are interested in knowing out whether this variation is random, or whether there is some tendency in it, for example a trend of warming temperatures over time. To find that out, we have to further process our data. Let us start off with calculating monthly temperatures for all years:

sisian_tavg         <- filter(sisian, PARA_SHORT == "TAVG")
sisian_tavg_monthly <- sisian_tavg %>% group_by(YEAR, MONTH) %>%  summarize(mean  = mean(na.omit(VALUE))) %>% as.data.frame()

head(sisian_tavg_monthly)

We select the June temperatures and plot them as a line:

sisian_JUN <- filter(sisian_tavg_monthly, MONTH == 6)

ggplot(data=sisian_JUN, aes(x=YEAR, y=mean)) +
    geom_line() + 
    geom_point(size=2) +
    theme(axis.text.x=element_text(angle=45, hjust=1)) +
    ggtitle("June temperatures at Sisian station") + 
    ylab("avg. temp. in °C") + xlab("")

We calculate the differences between yearly June temperatures and the average June temperature of the reference period 2000-2015 to represent anomalies:

mean_2000_2015 <- mean(filter(sisian_JUN, YEAR %in% c(2000:2015))$mean)
sisian_JUN$diff <- sisian_JUN$mean - mean_2000_2015

ggplot(data=sisian_JUN, aes(x=YEAR, y=diff, fill=diff)) +
    geom_bar(stat="identity") + 
    scale_fill_gradientn(colors=coolwarm(10), limits=c(-2.5, 2.5)) +
    theme(axis.text.x=element_text(angle=45, hjust=1)) +
    ggtitle("June temperature anomalies at Sisian station, reference: 2000-2015") + 
    ylab("avg. temp. in °C") + xlab("")