EDA of European Rivers
Exploratory Data Analysis in Geosciences
Yannis Markonis
For the final project in Exploratory Data Analysis in Geosciences, you will explore 208 time series from 153 European rivers, each with at least 80 years of data. The dataset was provided by the Global Runoff Data Centre, and the stations are located between 45 – 70 deg N and -20 – 70 deg E.
Your objective is to explore how winter and summer runoff have changed since 1980.
The project should consist of three main parts. Below are some ideas for what each part could contain. Feel free to use them or add your own.
1. Stations
Using the information in the file runoff_eu_info.rds, try to answer the following questions:
- Where are the stations located?
- How many stations/rivers exist per country?
- How many stations exist per river?
- What is the distribution of stations in space (latitude, longitude and altitude)?
- What is the distribution of record length?
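A minimal sketch of how the counting questions above could be approached in base R. The toy table stands in for runoff_eu_info.rds, and every column name in it (id, river, country, start, end) is an assumption, so check the real file with str() first.

```r
# Toy stand-in for the station table. In the project this would come from
# readRDS("runoff_eu_info.rds"); all column names below are assumptions.
stations <- data.frame(
  id      = 1:5,
  river   = c("Rhine", "Rhine", "Elbe", "Danube", "Danube"),
  country = c("DE", "NL", "DE", "AT", "DE"),
  start   = c(1900, 1910, 1890, 1920, 1905),
  end     = c(2010, 2012, 2009, 2011, 2010)
)

# Number of stations per country
stations_per_country <- table(stations$country)

# Number of distinct rivers per country
rivers_per_country <- tapply(
  stations$river, stations$country,
  function(x) length(unique(x))
)

# Number of stations per river
stations_per_river <- table(stations$river)

# Record length in years (both end points included)
stations$record_length <- stations$end - stations$start + 1

print(stations_per_country)
print(rivers_per_country)
print(stations_per_river)
print(summary(stations$record_length))
```

The same tables can then feed bar plots or histograms, for example hist(stations$record_length).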
2. Annual and seasonal runoff
There are two datasets available. Annual runoff is available in runoff_eu_year.rds (daily average, minimum and maximum), while daily runoff can be found in runoff_eu_day.rds.
- Using the annual data, estimate and present descriptive statistics including, but not limited to, the mean, coefficient of variation, minimum and maximum runoff.
- Estimate and present the ratios of mean/high and mean/low runoff.
- Using the information from statistics, as well as the station location (latitude, longitude and altitude), create some meaningful categories for the stations.
- Estimate and present the change in the mean/high and mean/low runoff ratios between the periods before and after 1980, for each station separately and for the categories.
- Aggregate daily runoff to winter/summer runoff.
- Estimate and present the percentage change in winter/summer runoff between the periods before and after 1980. Then create a map showing positive and negative change in winter/summer runoff after 1980.
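The steps above can be sketched in base R as follows. This is only an illustration on toy data for one hypothetical station. The column names, the definition of winter as December to February and summer as June to August, and the choice to count 1980 itself as "after" are all assumptions that you should adapt and justify.

```r
# --- Annual data: descriptive statistics and ratios (toy values) ---
annual <- data.frame(
  year = 1976:1983,
  mean = c(100, 110, 90, 100, 120, 130, 110, 120),
  min  = c(40, 50, 30, 40, 60, 60, 50, 70),
  max  = c(300, 320, 280, 300, 330, 350, 310, 330)
)

stats <- c(
  mean = mean(annual$mean),
  cv   = sd(annual$mean) / mean(annual$mean),  # coefficient of variation
  min  = min(annual$min),
  max  = max(annual$max)
)

annual$mean_high <- annual$mean / annual$max
annual$mean_low  <- annual$mean / annual$min
annual$period    <- ifelse(annual$year < 1980, "pre_1980", "post_1980")

# Average ratios before and after 1980
ratio_by_period <- aggregate(
  cbind(mean_high, mean_low) ~ period, data = annual, FUN = mean
)

# --- Daily data: aggregate to seasons and compare periods (toy values) ---
daily <- data.frame(
  date = seq(as.Date("1978-01-01"), as.Date("1981-12-31"), by = "day")
)
month     <- as.integer(format(daily$date, "%m"))
year      <- as.integer(format(daily$date, "%Y"))
is_winter <- month %in% c(12, 1, 2)   # assumed winter months
is_summer <- month %in% 6:8           # assumed summer months
is_post   <- year >= 1980

# Synthetic runoff: winter rises and summer falls after 1980
daily$value <- ifelse(is_winter, ifelse(is_post, 220, 200),
               ifelse(is_summer, ifelse(is_post, 90, 100), 150))
daily$season <- ifelse(is_winter, "winter", ifelse(is_summer, "summer", NA))
daily$period <- ifelse(is_post, "post_1980", "pre_1980")

seasonal <- daily[!is.na(daily$season), ]

# Mean runoff per season (rows) and period (columns)
m <- tapply(seasonal$value, list(seasonal$season, seasonal$period), mean)

# Percentage change from the pre-1980 to the post-1980 period
pct_change <- 100 * (m[, "post_1980"] - m[, "pre_1980"]) / m[, "pre_1980"]
print(pct_change)  # toy data: summer -10, winter 10
```

With the full dataset, the station identifier would be added to the grouping in aggregate() and tapply(). Note that this sketch assigns each December to its own calendar year; assigning it to the following winter is an alternative worth considering.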
3. Five rivers
Focus on five stations on different rivers. Justify your choice of stations by explaining how their properties differ. Then perform the following analyses.
- Find information on the internet about any human interference (dams, reservoirs etc.) on the rivers where the stations are located, covering the whole record length.
- Present the runoff seasonality at monthly scale.
- Estimate and present the correlation matrix at annual scale.
- Using regression, present the slopes in winter/summer runoff per station.
- Assess whether the stations you have chosen are representative of the rest.
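A minimal sketch of the correlation and regression steps in base R, on noise-free toy series for two hypothetical stations "A" and "B". The column names are assumptions, and the correlation matrix is read here as the correlation between the stations' annual series, which is one possible interpretation.

```r
# Toy winter runoff for two hypothetical stations; column names are assumptions.
years <- 1980:1989
winter <- data.frame(
  year  = rep(years, times = 2),
  id    = rep(c("A", "B"), each = length(years)),
  value = c(100 + 2 * (years - 1980),   # station A: rising by 2 per year
            80 - 1 * (years - 1980))    # station B: falling by 1 per year
)

# Linear regression slope (runoff units per year) for each station
slopes <- sapply(
  split(winter, winter$id),
  function(d) coef(lm(value ~ year, data = d))[["year"]]
)
print(slopes)

# Wide year-by-station matrix, then correlation between the stations
wide    <- tapply(winter$value, list(winter$year, winter$id), mean)
cor_mat <- cor(wide, use = "pairwise.complete.obs")
print(cor_mat)
```

Real series are noisy, so it is worth reporting the uncertainty of each slope as well, for example with summary() on the fitted model.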
Deliverables
After performing the EDA, you should prepare a report on your findings with R Markdown and publish it on RPubs. The report should include your assessment of how winter and summer runoff have changed since 1980. In addition, all the code should be uploaded to your GitHub repository. The deadline for submission is the 1st of June.

This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.