Comprehensive Exam - Professor Marshall Questions

By: Cássio Rampinelli

Date: May 20, 2020

_____________________________________________________________________

QUESTION 1: Numerical Concerns in Hydrologic Model Implementation The paper ‘Numerical troubles in conceptual hydrology: Approximations, absurdities and impact on hypothesis testing’ by Kavetski and Clark outlines issues in numerically implementing hydrologic models. The paper notes that numerical artifacts may result in errors that can dwarf structural errors of the model conceptualization.

a) Briefly outline the main types of errors that are described in the paper. Give specific examples (in your own words) of modeling case studies that may encounter such errors.

SOLUTION

______________________________________________________________________________

The paper focus on numerical errors resulting from approximations of numerical solutions implemented for hydrologic models based on the first order linear store. The authors highlight the importance of being aware about such issues since they might impact model interpretations in regular modeling steps that usually don’t address numerical inconsistencies.

For instance, when calibrating or validating a model, poor performance or unexpected behaviors are usually credited to the complex behavior or nonlinearity of the objective function. Although such explanation is reasonable, non-linearities are also found in the solution of the hydrological model structure itself and might play a significant role in explaining model poor performance. Neglecting such assessment may mislead sensitivity and uncertainty analysis of model parameters, and affect the reliability of model predictions.

b) Page 663 of the paper considers a simple exponential storage and notes the model may be: “applied in two near-identical cases: both cases use the same parameter values…and receive the same precipitation input P, but case A has a higher initial storage \(S_n\) than case B. Because of the exponential dependence of the outflow on the storage, there is always a point beyond which increasing the storage \(S_n\) at the start of the step will result in decreasing the storage estimate at the end of the step. Thus, case A could end up drier than case B despite having higher initial storage (with all parameters and forcing being identical). Such behaviour makes no hydrological or mathematical sense… “ Undertake a numerical exercise showing this effect. Note you must specify some values for the model parameters, the input data and initial storage values. Explain why this error occurs. Explain what the authors mean by “Such behavior makes no hydrological or mathematical sense”.

SOLUTION

______________________________________________________________________________

We first recover the single bucket hydrologic model presented in Kavetski and Clark (2011).

\[ \frac{dS(t)}{dt}=P(t)-Q(S(t)) \] \[ Q=b \ exp(kS) \] where \(S(t)\), \(P(t)\), and \(Q(S(t))\) are, respectively, the storage, inflow and outflow at time \(t\), \(k\), and \(b\) are positive model parameters, and \(S(t_0)=S_0\) is the initial condition.

The numerical implementation of the model is presented below:

#Defining the model
model<-function(P,para,S0) 
  
{
  
  ##################################
  #INPUTs:
  # Precipitation
  # Model parameters vector (b,k)
  # Initial storage
  ###################################
  
  
  #Forcing Data 
  ####################################
  Prec<-P#Precipitation
  
  ###################################
  #Model Parameters 
  ####################################
  b=para[1] 
  k=para[2] 
  
  ###################################
  #State Variables
  ####################################
  S<-S0       #Soil storage
  outflow<-0  #Simulated outflow
  
  
  #Compute water balance
  for(i in 1:length(Prec)) {
    
    outflow[i]=b*exp(k*S[i])
    
    S[i+1]=S[i]+Prec[i]-outflow[i]
    
    
  }
  
  #Grouping outputs
  output=list(outflow,S)
  
  return(output) 
  
  ##################################
  #OUTPUTs:
  # Outflow volume
  # Storage
  ###################################
  
}

Now, we consider three scenarios with three distinct precipitation input vectors for time length of 20 units. Precipitation is constant along all time, for the sake simplicity.

Scenario A: Precipitation 1.5 volume units .
Scenario B: Precipitation 2 volume units.
Scenario C: Precipitation 2.5 volume units.

Model parameters are kept constant, and the initial storage is set to 0:

\(b=1\),\(k=1.42\), \(S_0=0\)

The code presented below runs the three scenarios, and as a result, a plot comparing the storage behavior across time is shown. The plot is similar to the one presented in Kavetski and Clark (2011). The only difference is that we included a new scenario with larger precipitation volume.

#Precipitation vectors
PA<-rep(1.5,20)
PB<-rep(2,20)
PC<-rep(2.5,20)

#Model parameters and initial storage
para<-c(1,1.42)
S0=0

#Running the three scenarios
scenarioA<-model(PA,para,S0)
scenarioB<-model(PB,para,S0)
scenarioC<-model(PC,para,S0)

#Accessing storage along time for each scenarios
SA<-scenarioA[[2]]
SB<-scenarioB[[2]]
SC<-scenarioC[[2]]

########################
#Plot- Outputs
########################
plot(SA,ylim=c(-5,2),main="",type="p",pch=19,col="blue",xlab="Time Step (time unit)",ylab="Volume (volume unit)",axes=F)
axis(1, 1:length(SA))
axis(2,-5:2)
abline(v=1:length(SA),col="gray", lty=3)
abline(h=seq(from=-5,to=2,by=1),col="gray", lty=3)

lines(SA,col="blue",lwd=2)

points(SB,pch=1,col="red",lwd=2)
lines(SB,col="red",lwd=2,lty=2)

points(SC,pch=8,col="black",lwd=2)
lines(SC,col="black",lwd=2,lty=4)

#Legend
legend("bottomright", c("A (P=1.5)", "B (P=2)", "C(P=2.5)"), col = c("blue","red","black"),
       lty = c(1,2,4), pch = c(19,1,8))

The three scenarios start at time step 1, at the same reservoir water level (\(S_0=0\)), however, scenario A receives 1.5 units of precipitation (or inflow), scenario B, 2 units and scenario C, 2.5 units. As expected, from the first to the second time step, the reservoir which receives more water has larger increase in water level (black line), followed by scenario B, that receives the second largest inflow (red line), and scenario A. So far, the process behavior is as expected.

From time step 2 to 3, since reservoirs for the three scenarios are the same (model parameters \(b\), and \(k\) are the same), one should expect that the reservoir with more storage would remain with more amount of water for the next step, when comparing to the other ones. However, as can be seen in the plot this surprisingly doesn’t happen. Which explains the argument of the authors that such a behavior has no hydrological or mathematical sense.

Actually, what happens is exactly the opposite. Reservoir from scenario C, that has about 1.5 units of volume in time step 2 (the highest storage) goes to time step 3, reaching about -4.5 units, followed by reservoir from scenario B that passes from 1 to about -1 units, and finally the reservoir from scenario A, that had less volume in the previous time step, ends up having ”higher” (near zero) volume in time step 3.

This error occurs because of numerical approximations when implementing the finite difference solution for the continuous domain. When implementing the numerical solution in time domain, linear approximations are established for a solution that is well-known as non-linear. Thus, while jumping across time steps such errors might also accumulate along the simulation.

Note: here we don’t consider the sign of the number, since negative values for storage wouldn’t make much sense in practice. Since the idea was only to reproduce the numerical exercise proposed by the authors, we stick with the same values they adopted. However, such behavior is also expected in case of positive values for the initial storage

Additional comments for easy numerical approximation improvement

Reminding about the episode the authors mentioned related to updating the fluxes at the beginning or at the end of the time step, one might think an easy numerical alternative to attenuate this issue. Based on the results reproduced above, it could be hypothesized that the correct solution should be in between the initial and final computed values. Therefore, one easy-applicable numerical alternative would be adding a correction factor when updating the storage level at the end of the time step. This factor, for instance, might be the average between the initial and final volumes computed along the time step. Then, the volume at the end of the time step would be the initial storage added by this corrector factor.

In the code lines below we try to demonstrate this proposal. We repeat the same process, now including a correction factor to update the storage. We name the function model B, and we suppress scenario C, for the sake of simplicity. Code and Plot are presented next.

#Defining the model
modelB<-function(P,para,S0) 
  
{
  
  ##################################
  #INPUTs:
  # Precipitation
  # Model parameters vector (b,k)
  # Initial storage
  ###################################
  
  
  #Forcing Data 
  ####################################
  Prec<-P#Precipitation
  
  ###################################
  #Model Parameters 
  ####################################
  b=para[1] 
  k=para[2] 
  
  ###################################
  #State Variables
  ####################################
  S<-S0       #Soil storage
  outflow<-0  #Simulated outflow
  
  
  #Compute water balance
  for(i in 1:length(Prec)) {
    
    outflow[i]=b*exp(k*S[i])
    
    S[i+1]=S[i]+Prec[i]-outflow[i]
    
    #Computing the correction factor
    correctionF<-(S[i+1]+S[i])/2
    
    #Updating the storage with the correction factor
    S[i+1]<-S[i]+correctionF
    
  }
  
  #Grouping outputs
  output=list(outflow,S)
  
  return(output) 
  
  ##################################
  #OUTPUTs:
  # Outflow volume
  # Storage
  ###################################
  
}

#Precipitation vectors
PA<-rep(1.5,20)
PB<-rep(2,20)


#Model parameters and initial storage
para<-c(1,1.42)
S0=0

#Running the three scenarios
scenarioA<-modelB(PA,para,S0)
scenarioB<-modelB(PB,para,S0)


#Accessing storage along time for each scenarios
SA<-scenarioA[[2]]
SB<-scenarioB[[2]]


########################
#Plot- Outputs
########################
plot(SA,ylim=c(-0,1.5),main="",type="p",pch=19,col="blue",xlab="Time Step (time unit)",ylab="Volume (volume unit)",axes=F)
axis(1,1:length(SA))
axis(2,0:2)
abline(v=1:length(SA),col="gray", lty=3)
abline(h=seq(from=-5,to=2,by=1),col="gray", lty=3)

lines(SA,col="blue",lwd=2)

points(SB,pch=1,col="red",lwd=2)
lines(SB,col="red",lwd=2,lty=2)


#Legend
legend("bottomright", c("A (P=1.5)", "B (P=2)"), col = c("blue","red"),
       lty = c(1,2), pch = c(19,1))

As can be seen in the plot, now both scenarios behave accordingly. This simple initiative of implementing a correction factor would be a numerical strategy to attenuate approximation errors in this case.

c) In your own work, you have implemented multiple models describing catchment behavior that are solved on some temporal discretization. Describe potential examples in your work of how numerical troubles may affect your model results. How you might guard against these or test for their effect?

SOLUTION

______________________________________________________________________________

As described by the authors, interpretations related to sensitivity analysis, parameter inference and uncertainty analysis might be affected by numerical troubles. Addressing results by applying a Bayesian approach would allow not only considering hydrologic and model parameters uncertainty, but also the ones related to the model structure itself (including approximation, or numerical errors). However, discrimination between the sources of uncertainties related to the model would be a difficult issue. The distinction between the ones related to conceptual limitations, and the ones that account for numerical approximations, would require specific frameworks.

Simulation tests as the ones proposed by the authors to evaluate the storage behavior as a function of time for finer time resolution would be an alternative to screen and identify if such effects would be of great concern when compared to the other types of errors. Simple numerical adjustments as the one presented in the previous assessment would also be an alternative to mitigate errors associated to numerical approximation for models based on linear reservoirs.

d) Oreskes, 1994 notes “Verification and validation of numerical models of natural systems is impossible. This is because natural systems are never closed and because model results are always non- unique.” In the model classification study, explain what you see as the biggest sources of error or uncertainty in your analysis. How can you assess your model performance and validity given these types of uncertainties?

SOLUTION

______________________________________________________________________________

Based on some simulations I have been performing I observed that the likelihood assumptions might significantly affect the parameters behavior. Therefore, uncertainty arising from the error model should be more relevant than uncertainty related to model parameters. Therefore, exploring the impact of the likelihood function on the parameter patterns, or verify compatible performance between likelihood and residuals behavior for the study sites might be an alternative to assess model validity.

In addition, another issue that really drawn my attention during several simulations was how the sensitivity, and the uncertainty analysis might be dramatically affected by the searching parameters range. Some simulations have indicated a completely different result when expanding or restricting the parameter space, even by adopting reasonable acceptable interval ranges. For instance, if we consider the parameter SAT that represents maximum soil storage capacity for SMAP model, for a specific catchment, if we use [1500; 5000], or [1000; 8000], as lower and upper searching limits the results for the posteriors distribution would converge to completely different posteriors, when using DREAM algorithm, depending on the interaction among other parameters. Testing frameworks with different MCMC algorithms would be a strategy to investigate if this is a converging issue related to the algorithm.

QUESTION 2: Benchmark Studies List two papers that have most inspired you during your PhD studies. How have the methods or opinions expressed in these studies helped shape your own point of view regarding your research or discipline?

Several papers have been inspiring me along my research journey. As my background evolves in the way to my PhD degree, the potential to explore and to understand different topics in hydrologic systems improves, opening new branches to explore and to acquire knowledge in areas I have never imagined I would be capable of understanding.

However, If I had to highlight two inspiring papers, I certainly would list the ones that brought me here and push me to the challenge of applying Bayesian analysis in hydrologic modeling.

Marshall, L., Nott, D., & Sharma, A. (2004). A comparative study of Markov chain Monte Carlo methods for conceptual rainfall-runoff modeling. Water Resources Research, 40(2), 1–11. https://doi.org/0043-1397/04/2003WR002378.
Smith, T., Marshall, L., & Sharma, A. (2015). Modeling residual hydrologic errors with Bayesian inference. Journal of Hydrology, 528, 29–37. https://doi.org/10.1016/j.jhydrol.2015.05.051

The first paper was significantly helpful in clarifying the mistery behind the MCMC algorithms functioning. Actually, I found this paper after accessing the PhD thesis related to it: Bayesian analysis of rainfall-runoff models: Insights to parameter estimation, model comparison and hierarchical model development. Marshall (2005). At the time I was at the beginning of everything. My interest in this field of research started by influence of my former Masters’ advisor, professor Dirceu Reis, who earned his PhD at Cornell University, under the supervision of Professor Stendinger.

He motivated me a lot to study this area, and provided me with a lot of support to get started. Since I was struggling to understand MCMC using the Gelman’s Book at first, and my chains were not converging, he shared this PhD Thesis with me. After some sleepless weeks I could finally implement my first AM algorithm, what brought me a lot of enthusiasm.

The next stage was how to use/select suitable likelihoods. Then, I found the second paper (this time searching for the first author of the first paper). The framework presented in the Likelihood paper provided helpful guidance in relating likelihoods functions to residuals behavior, and consequently, selecting the appropriate ones.

REFERENCES

Kavetski, D., & Clark, M. P. (2011). Numerical troubles in conceptual hydrology: Approximations, absurdities and impact on hypothesis testing. Hydrological Processes, 25(4), 661–670. https://doi.org/10.1002/hyp.7899.

Marshall, L., Nott, D., & Sharma, A. (2004). A comparative study of Markov chain Monte Carlo methods for conceptual rainfall-runoff modeling. Water Resources Research, 40(2), 1–11. https://doi.org/0043-1397/04/2003WR002378.

Smith, T., Marshall, L., & Sharma, A. (2015). Modeling residual hydrologic errors with Bayesian inference. Journal of Hydrology, 528, 29–37. https://doi.org/10.1016/j.jhydrol.2015.05.051.