This script illustrates a comprehensive approach to evaluating hydrological data (here, EgaEnEstellaQts, a series of streamflow measurements on the Ega River at the Estella station in Spain) through different methods. First, it applies a ROC (Receiver Operating Characteristic) analysis to test the discriminative capacity of the predictions. It then implements performance measures (goodness-of-fit) to compare observations and simulations, supported by graphical visualizations such as adjustment curves, scatter plots, and time series. Finally, the code provides descriptive statistics and an analysis of the data distribution. The purpose of this script is to combine statistical and graphical tools in order to assess the quality of the simulations, detect potential biases, and provide a comprehensive view of model performance when applied to environmental or hydrological datasets.
We begin by loading the necessary R packages for ROC analysis, visualization, and goodness-of-fit evaluation.
library(pROC)
library(ggplot2)
library(hydroGOF)
library(psych)
par(col = "#FF0000", col.axis = "#FF0000", col.lab = "#FF0000")
The EgaEnEstellaQts dataset is loaded, and we check for
missing values to ensure data integrity.
data(EgaEnEstellaQts)
obs <- EgaEnEstellaQts
# Check for missing values
any(is.na(obs))
## [1] TRUE
obs <- na.omit(obs) # Uncomment if missing values need to be removed
ROC analysis is performed to evaluate the performance of a simulated predictive model. A binary classification variable is created using the median as a threshold, and the ROC curve is plotted with the Area Under the Curve (AUC) metric.
set.seed(123)
predictions <- rnorm(length(obs), mean = mean(obs), sd = sd(obs))
# Create binary classification variable
binary_obs <- as.factor(ifelse(obs > median(obs), 1, 0))
print(table(binary_obs))
## binary_obs
## 0 1
## 1825 1825
# Vérifier si nous avons au moins deux classes
seuil <- quantile(obs, 0.4) # 40ème percentile au lieu de la médiane
binary_obs <- as.factor(ifelse(obs > seuil, 1, 0))
# Compute ROC curve
roc_obj <- roc(binary_obs, predictions)
## Setting levels: control = 0, case = 1
## Setting direction: controls < cases
auc_value <- auc(roc_obj)
# Plot ROC curve
plot(roc_obj,
main = "ROC Curve for EgaEnEstellaQts",
col = "#FF0000",
lwd = 2,
print.auc = TRUE,
auc.polygon = TRUE,
auc.polygon.col = "#FFB3B3",
max.auc.polygon = TRUE)
plot(roc_obj,
main = "ROC Curve for Extreme Events (95th percentile)",
col = "#FF0000",
lwd = 2,
print.auc = TRUE,
auc.polygon = TRUE,
auc.polygon.col = "#FFB3B3",
max.auc.polygon = TRUE)
To assess the agreement between observed and simulated values, we compute goodness-of-fit metrics. Simulated data is generated by adding random noise to a subset of the observations.
sim <- obs
n <- min(2000, length(obs))
sim[1:n] <- obs[1:n] + rnorm(n, mean = 10)
# Compute goodness-of-fit measures
gof_measures <- gof(sim = sim, obs = obs)
print(gof_measures)
## [,1]
## ME 5.47
## MAE 5.47
## MSE 55.19
## RMSE 7.43
## NRMSE % 37.10
## PBIAS % 34.60
## RSR 0.37
## rSD 1.04
## NSE 0.86
## mNSE 0.57
## rNSE -0.56
## d 0.97
## md 0.78
## rd 0.63
## cp 0.42
## r 0.97
## R2 0.94
## bR2 0.83
## KGE 0.65
## VE 0.65
# Visualize goodness-of-fit
ggof(sim = sim, obs = obs, ftype = "dm", FUN = mean) +
theme_minimal() +
scale_color_manual(values = c("#FF0000", "#CC0000")) +
scale_fill_manual(values = c("#FFB3B3", "#FFD1D1"))
## NULL
A scatter plot is created to visually compare observed and simulated values, with a 1:1 line for reference.
plot(obs, sim,
main = "Observations vs. Simulations",
xlab = "Observations",
ylab = "Simulations",
col = "#FF0000")
abline(0, 1, col = "#FF0000") # 1:1 line in red
Descriptive statistics are calculated to summarize the central tendency, dispersion, and shape of the dataset.
describe(obs)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 3650 15.81 20.02 8.7 11.55 7.78 1.1 240 238.9 3.58 19.24 0.33
Assuming the data is temporal, a time series plot is generated to explore trends and patterns.
plot(obs, type = "l", lwd = 2,
main = "Time Series of EgaEnEstellaQts",
col = "#FF0000")
A histogram is plotted to examine the distribution of the
EgaEnEstellaQts data.
hist(obs,
main = "Distribution of EgaEnEstellaQts",
xlab = "Values",
col = "#FFB3B3",
border = "#FF0000")
Abdi-Basid ADAN
abdi-basid@outlook.com