Preparing Environment
library(data.table)
library(ggplot2)
colset_4 <- c("#D35C37", "#BF9A77", "#D6C6B9", "#97B8C2")
key_stations <- c('DOMA', 'BASR', 'KOEL')  # stations compared in the boxplots below
runoff_summary <- readRDS('./data/runoff_summary.rds')
runoff_stats <- readRDS('./data/runoff_stats.rds')
runoff_day <- readRDS('./data/runoff_day.rds')
runoff_month <- readRDS('./data/runoff_month.rds')
runoff_summer <- readRDS('./data/runoff_summer.rds')
runoff_winter <- readRDS('./data/runoff_winter.rds')
runoff_year <- readRDS('./data/runoff_year.rds')
runoff_summary_key <- readRDS('data/runoff_summary_key.rds')
runoff_day_key <- readRDS('data/runoff_day_key.rds')
runoff_month_key <- readRDS('data/runoff_month_key.rds')
runoff_summer_key <- readRDS('data/runoff_summer_key.rds')
runoff_winter_key <- readRDS('data/runoff_winter_key.rds')
runoff_year_key <- readRDS('data/runoff_year_key.rds')
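# Number of stations, used to size the colour palette in the regression plots
n_stations <- max(uniqueN(runoff_winter$sname), uniqueN(runoff_summer$sname))
# Standardise seasonal runoff per station (z-scores) so that stations are comparable
runoff_winter[, value_norm := as.numeric(scale(value)), by = sname]
runoff_summer[, value_norm := as.numeric(scale(value)), by = sname]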
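The value_norm column holds per-station z-scores, which is why stations with very different discharge can share one axis in the regression plots. A minimal sketch of what scale() computes, using made-up runoff values (the vector x is purely illustrative and not taken from the data):

```r
# Illustrative runoff values for a single hypothetical station
x <- c(120, 150, 90, 200, 140)

# scale() centres on the mean and divides by the standard deviation
z_scale <- as.numeric(scale(x))

# The same z-scores computed by hand
z_manual <- (x - mean(x)) / sd(x)

print(all.equal(z_scale, z_manual))
```

The result is dimensionless, so plots of value_norm show deviations from each station's own mean in units of its standard deviation rather than runoff in m3/s.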
Navigator’s Tasks
- In our boxplot comparison of DOMA, BASR and KOEL we used the summer and winter periods. Can you repeat it for annual and monthly data? Does it reveal any useful new information?
runoff_year_key <- runoff_year_key[sname %in% key_stations]
runoff_year_key[, period := factor(ifelse(year <= 2000, 'pre_2000', 'aft_2000'),
                                   levels = c('pre_2000', 'aft_2000'))]
ggplot(runoff_year_key, aes(period, value, fill = period)) +
  geom_boxplot() +
  facet_wrap(~sname, scales = 'free_y') +
  scale_fill_manual(values = colset_4[c(4, 1)]) +
  theme_bw()
runoff_month_key <- runoff_month[sname %in% key_stations]
runoff_month_key[, period := factor(ifelse(year <= 2000, 'pre_2000', 'aft_2000'),
                                    levels = c('pre_2000', 'aft_2000'))]
ggplot(runoff_month_key, aes(factor(month), value, fill = period)) +
  geom_boxplot() +
  facet_wrap(~sname, scales = 'free_y') +
  scale_fill_manual(values = colset_4[c(4, 1)]) +
  theme_bw()
- In their research, Middelkoop and colleagues also mentioned changes in high and low runoff. Do our data agree with their results? We define high runoff as daily runoff above the 0.9 quantile and low runoff as daily runoff below the 0.1 quantile. We can then estimate the mean high/low runoff per station. Finally, we also compare the number of days with values above the 0.9 quantile and below the 0.1 quantile (hint: the .N symbol in data.table might help).
runoff_day_key <- runoff_day[sname %in% key_stations]
runoff_day_key[, period := factor(ifelse(year <= 1986, 'pre_1986', 'aft_1986'),
                                  levels = c('pre_1986', 'aft_1986'))]
# Quantile thresholds are computed per station and calendar month
runoff_day_key[, q_10 := quantile(value, 0.1), by = .(sname, month)]
runoff_day_key[, q_90 := quantile(value, 0.9), by = .(sname, month)]
# Classify each day as low, normal or high runoff
runoff_day_key[, runoff_type := 'normal']
runoff_day_key[value <= q_10, runoff_type := 'low']
runoff_day_key[value >= q_90, runoff_type := 'high']
runoff_day_key[, runoff_type := factor(runoff_type, levels = c('normal', 'low', 'high'))]
# Number of days of each type per station, season and year
runoff_day_key[, n_days := .N, by = .(sname, runoff_type, season, year)]
to_plot <- unique(runoff_day_key[, .(sname, n_days, period, runoff_type, season, year)])
ggplot(to_plot[season %in% c('winter', 'summer')], aes(season, n_days, fill = period)) +
  geom_boxplot() +
  facet_wrap(runoff_type ~ sname, scales = 'free_y') +
  scale_fill_manual(values = colset_4[c(4, 1)]) +
  xlab(label = "Season") +
  ylab(label = "Number of days")
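The task also asks for the mean high/low runoff per station, which the day-count plot does not show. A minimal sketch of that aggregation follows. It runs on a small synthetic table so that it is self-contained: the station names 'AAA' and 'BBB', the simulated values, and the names toy and mean_extremes are illustrative assumptions, not part of the course data. For brevity the thresholds here are computed per station rather than per station and month.

```r
library(data.table)

# Synthetic daily runoff for two hypothetical stations (illustrative values only)
set.seed(1)
toy <- data.table(
  sname = rep(c('AAA', 'BBB'), each = 1000),
  value = c(rgamma(1000, shape = 2, scale = 50),
            rgamma(1000, shape = 2, scale = 500))
)

# Station-wise 0.1 and 0.9 quantiles, then classify each day
toy[, q_10 := quantile(value, 0.1), by = sname]
toy[, q_90 := quantile(value, 0.9), by = sname]
toy[, runoff_type := 'normal']
toy[value <= q_10, runoff_type := 'low']
toy[value >= q_90, runoff_type := 'high']

# Mean runoff and number of days per station and runoff type
mean_extremes <- toy[runoff_type != 'normal',
                     .(mean_runoff = mean(value), n_days = .N),
                     by = .(sname, runoff_type)]
print(mean_extremes)
```

With the real data, the same grouping could presumably be applied to runoff_day_key, adding period to the by clause to compare the two periods.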
- How sensitive are slopes to adding new data? Redo the 1950-today regression plots, but instead of 2016, use data till 2010. What do you observe? What if you used linear regression instead of loess?
# loess regression
ggplot(runoff_winter[year > 1950 & year < 2010], aes(x = year, y = value_norm, col = sname)) +
  geom_smooth(method = 'loess', formula = y ~ x, se = FALSE) +
  ggtitle('Winter runoff') +
  xlab(label = "Year") +
  ylab(label = "Normalised runoff (z-score)") +
  theme_bw()
ggplot(runoff_summer[year > 1950 & year < 2010], aes(x = year, y = value_norm, col = sname)) +
  geom_smooth(method = 'loess', formula = y ~ x, se = FALSE) +
  ggtitle('Summer runoff') +
  xlab(label = "Year") +
  ylab(label = "Normalised runoff (z-score)") +
  theme_bw()
# linear regression
ggplot(runoff_winter[year > 1950 & year < 2010], aes(x = year, y = value_norm, col = sname)) +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE) +
  scale_color_manual(values = colorRampPalette(colset_4)(n_stations)) +
  ggtitle('Winter runoff') +
  xlab(label = "Year") +
  ylab(label = "Normalised runoff (z-score)") +
  theme_bw()
ggplot(runoff_summer[year > 1950 & year < 2010], aes(x = year, y = value_norm, col = sname)) +
  geom_smooth(method = 'lm', formula = y ~ x, se = FALSE) +
  scale_color_manual(values = colorRampPalette(colset_4)(n_stations)) +
  ggtitle('Summer runoff') +
  xlab(label = "Year") +
  ylab(label = "Normalised runoff (z-score)") +
  theme_bw()
Removing seven years of data changes the regression lines considerably. There is no longer an upward trend in summer runoff, while winter runoff appears to decrease abruptly instead of remaining steady or increasing. This example highlights how sensitive loess regression is to data availability. Linear regression is less sensitive to adding or removing data.
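One way to put a number on this sensitivity is to fit the linear model on both time windows and compare the slopes directly. The sketch below does this on a synthetic normalised series, not on the Rhine data. The helper slope_until, the data frame toy, and the simulated trend are hypothetical and only illustrate the comparison.

```r
# Synthetic normalised runoff series (illustrative only, not the Rhine data)
set.seed(42)
toy <- data.frame(year = 1951:2016)
toy$value_norm <- 0.01 * (toy$year - 1951) + rnorm(nrow(toy))

# Hypothetical helper: OLS slope of value_norm against year,
# using only the years up to and including end_year
slope_until <- function(dat, end_year) {
  fit <- lm(value_norm ~ year, data = dat[dat$year <= end_year, ])
  unname(coef(fit)['year'])
}

# Slopes with and without the last seven years
slopes <- c(until_2009 = slope_until(toy, 2009),
            until_2016 = slope_until(toy, 2016))
print(slopes)
```

Applied per station to runoff_winter and runoff_summer, the difference between the two slopes would give a simple measure of how much the last seven years influence the linear trend.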
Explorer’s questions
- In retrospect, is DOMA a representative station? Why do you think its behaviour is so different from that of the other stations?
There are several reasons to think that it is not. It lies in a mountainous region and has a small catchment area. In addition, there is an abrupt change in runoff in the 1960s. A reasonable hypothesis is anthropogenic interference, such as the construction of a dam. Indeed, one of the studies we found during our preparation (Pfeiffer and Ionita, 2017) contains the following paragraph:
“We have shown that Rhine River is only moderately altered since the middle of the 20th century, except for Ems hydrological station, which is significantly affected. The Ems station is located in Domat/Ems, Switzerland, where a run-of-the-river power station is operated. The Reichenau power plant was built between 1959 and 1962 in the Alpenrhein. It is located at Domat/Ems only a few kilometers below the confluence of Vorderrhein and Hinterrhein near Reichenau. The Rhine River is dammed up above Domat/Ems and the water is led into a channel about 1 km long, which partly runs underground. At the end of this canal lies the power plant, after which the water flows back into the natural riverbed. As shown above, drastic changes in the hydrologic regime at Ems are observed after about 1960, coinciding with the time when the Reichenau power plant was built.”
- In our analysis, we have used only river runoff. Precipitation is a factor strongly linked with runoff. Can you perform a similar analysis (boxplots and regression) for precipitation? Precipitation data averaged over the whole Rhine region can be found in the file precip_day.rds in folder data. What do you observe?
precip_day <- readRDS('./data/raw/precip_day.rds')
precip_day[, year := year(date)]
precip_day[, month := month(date)]
precip_day <- precip_day[year < 2019]
precip_day[month == 12 | month == 1 | month == 2, season := 'winter']
precip_day[month == 3 | month == 4 | month == 5, season := 'spring']
precip_day[month == 6 | month == 7 | month == 8, season := 'summer']
precip_day[month == 9 | month == 10 | month == 11, season := 'autumn']
precip_day[, season := factor(season, levels = c('winter', 'spring', 'summer', 'autumn'))]
precip_winter <- precip_day[season == 'winter', .(value = sum(value)), by = year]
precip_summer <- precip_day[season == 'summer', .(value = sum(value)), by = year]
year_thres <- 1980
to_plot <- rbind(cbind(precip_winter, season = factor('winter')),
                 cbind(precip_summer, season = factor('summer')))
to_plot <- to_plot[year >= 1950]
# Label each period by the years it actually covers
to_plot[year < year_thres, period := paste0('1950-', year_thres - 1)]
to_plot[year >= year_thres, period := paste0(year_thres, '-', max(year))]
to_plot[, period := factor(period)]
ggplot(to_plot, aes(season, value, fill = period)) +
  geom_boxplot() +
  scale_fill_manual(values = colset_4[c(4, 1)]) +
  xlab(label = "Season") +
  ylab(label = "Precipitation (mm)") +
  theme_bw()
ggplot(to_plot, aes(year, value, col = season)) +
  geom_point() +
  geom_line() +
  scale_color_manual(values = colset_4[c(4, 1)]) +
  geom_smooth(method = 'loess', formula = y ~ x, se = FALSE) +
  theme_bw()
- What are your thoughts about the changes in Rhine runoff after completing the EDA?
- What future analyses or other factors should be examined? Present some arguments related to the findings so far.