We compare the three models (J, DD, RJD) on both in-sample and out of sample durations.

In-sample durations (1, 24, 36, 48, 72 hours)

The model performance on in-sample durations treats locally fit GEV distributions as “truth”. Comparing QDF model output to these local GEV fits allows us to compute two different scoring measures, the integrated quantile distance (IQD) and the mean absolute percentage error (MAPE).

Overall model ranking with the IQD

We use the IQD over all quantiles to get an overall model ranking.

IQD <- function(muQ,sigmaQ,xiQ,muG,sigmaG,xiG){
  integrand <- function(x){
    a = SpatialExtremes::pgev(x,loc=muQ, scale=sigmaQ, shape=xiQ)
    b = SpatialExtremes::pgev(x,loc=muG, scale=sigmaG, shape=xiG)
    return( (a-b)^2 )
    }
  iqd = integrate(integrand,-Inf,Inf)$value
  return(iqd)
}

medp[,
     iqd:=lapply(.SD,
                 function(cols)   IQD(muQ=med.mu.x,
                                      sigmaQ=med.sigma.x,
                                      xiQ=med.xi.x,
                                      muG=gev.mu,
                                      sigmaG=gev.sigma,
                                      xiG=gev.xi)),
     by = list(lnxID,d,type)]

medp[,.(iqd_score=mean(iqd)),by=list(type)]
##    type  iqd_score
## 1:   DD 0.03479781
## 2:    J 0.03664582
## 3:  RJD 0.04283840

So RJD comes in last in the overall ranking as measured by the IQD. Looking at this ranking broken up by duration we see:

medp[,.(iqd_score=mean(iqd)),by=list(type,d)]
##     type  d   iqd_score
##  1:   DD  1 0.055311409
##  2:    J  1 0.074313301
##  3:  RJD  1 0.078874040
##  4:   DD 24 0.043531423
##  5:    J 24 0.045788541
##  6:  RJD 24 0.054753174
##  7:   DD 36 0.020500921
##  8:    J 36 0.020781124
##  9:  RJD 36 0.024378903
## 10:   DD 48 0.011165912
## 11:    J 48 0.003385071
## 12:  RJD 48 0.010714084
## 13:   DD 72 0.043479406
## 14:    J 72 0.038961083
## 15:  RJD 72 0.045471795

So DD wins the 1 hour and 24 hour durations, barely edges out J on the 36 hour duration, and then J takes the 48 and 72 hour durations. RJD doesn’t win any durations.

High quantile model ranking with MAPE

We found that the IQD was not sensitive enough to differences in the tails to be able to use the weighted IQD to get a measure of fit for the upper quantiles, so we turn to the mean absolute percentage error (MAPE) at the 100 year and 1000 year return periods.

thesequant <- returnlevels[which(returnlevels$rp==100 | returnlevels$rp==1000),]

thesequant[, 
            mape_score:=1/.N * sum(abs((gev.rl-qdf.rl)/gev.rl)) * 100, 
            by = list(rp,type,d,lnxID)]

It makes less sense to average the MAPE across stations and / or durations like we did with the IQD (less robust, since we only evaluate the curves at one selected quantile and we find a huge spread among durations and stations, so taking the mean muddies the signal). If we break it up by duration we see a pattern that mimics what we saw with the IQD, where DD wins the lower durations and loses the higher ones:

hundredyr <- thesequant[rp==100]
hundredyr[,mean(mape_score),by=list(d,type)]
##      d type        V1
##  1:  1   DD  6.759489
##  2:  1    J  9.044136
##  3:  1  RJD  8.078574
##  4: 24   DD  8.286621
##  5: 24    J  6.663262
##  6: 24  RJD  7.574936
##  7: 36   DD  8.397746
##  8: 36    J  5.114164
##  9: 36  RJD  6.747481
## 10: 48   DD 10.838496
## 11: 48    J  5.140138
## 12: 48  RJD  7.151901
## 13: 72   DD 15.180731
## 14: 72    J  6.470893
## 15: 72  RJD  9.319452
thousandyr <- thesequant[rp==1000]
thousandyr[,mean(mape_score),by=list(d,type)]
##      d type        V1
##  1:  1   DD 11.500956
##  2:  1    J 13.273314
##  3:  1  RJD 12.399695
##  4: 24   DD 12.327831
##  5: 24    J 10.096227
##  6: 24  RJD 11.197660
##  7: 36   DD 11.879291
##  8: 36    J  8.710750
##  9: 36  RJD 10.869300
## 10: 48   DD 15.124248
## 11: 48    J  9.066035
## 12: 48  RJD 11.849739
## 13: 72   DD 20.135787
## 14: 72    J 10.768168
## 15: 72  RJD 14.556801

Notice that now the RJD model falls in between J and DD instead of coming in third like it did in the IQD ranking; this is because for these specific quantiles the RJD model always falls in between J and DD; this is not always the case when we look at the entirety of the curve. The return level for the RJD model often falls in between J and DD but is not guaranteed to do this because it is a sample from some non-standard density estimate (this is something I mixed up before and explains why RJD can come in last place in the IQD rankings).

We can break this up further and, taking only the DD and J models, look at MAPE by duration and station:

The stations are ordered by catchment area and the background color of each panel indicates the model with the lowest MAPE. The dots inside each panel indicate the actual MAPE value.

There’s no one model that wins at every duration for every station, but overall it seems like the smaller catchments switch over to Javelle around 24 hours, while the larger catchments (>100km^2) switch over to Javelle later, maybe after 48 hours or so. A bit hard to tell with only 12 stations.

Out-of-sample durations (predicting 1 and 12 hours from 24-72 hour data)

Evaluating 1 and 12 hour predictions with IQD

We use IQD to compare the predicted QDF output to (1) GEV distributions locally fit to the 1 & 12 hour durations and (2) the ecdf of the data points from the 1 & 12 hour durations.

Comparing QDF predictions to locally fit GEV, 1 & 12 hours

pmedp[,mean(iqd),by=list(type)]
##    type        V1
## 1:   DD 0.3342309
## 2:    J 0.3575667
## 3:  RJD 0.3718160
pmedp[,mean(iqd),by=list(type,d)]
##    type  d        V1
## 1:   DD  1 0.5416779
## 2:    J  1 0.5924942
## 3:  RJD  1 0.6030700
## 4:   DD 12 0.1267839
## 5:    J 12 0.1226392
## 6:  RJD 12 0.1405621

DD wins the one hour prediction but J wins the 12 hour prediction as measured by the IQD over the entire distribution. We repeat this analysis but using the ecdf generated from the data points (so not comparing to the local GEV fit).

Comparing QDF predictions to ecdf from data points from 1 & 12 hours

for(station in unique(medp$lnxID)){
  for(dd in unique(medp$d)){
    for(thismodel in unique(medp$type)){
      subP <- medp[lnxID==station&d==dd&type==thismodel]
      subDatpts <- datapts[lnxID==station&d==dd]
      
      integrand <- function(x){
        a = SpatialExtremes::pgev(x,loc=subP$med.mu.x, 
                                  scale=subP$med.sigma.x, 
                                  shape=subP$med.xi.x)
        fn <- ecdf(subDatpts$pt)(x)
        return( (a-fn)^2 )
      }
      iqd = integrate(integrand,-Inf,Inf,subdivisions = 5000)$value
      
      ecdfiqd <- rbind(ecdfiqd,data.table(lnxID=station,d = dd,
                                          type = thismodel,iqd = iqd))
    }
  }
}

ecdfiqd[,mean(iqd),by=list(type)]
##    type        V1
## 1:   DD 0.4729314
## 2:    J 0.4103947
## 3:  RJD 0.5004813
ecdfiqd[,mean(iqd),by=list(type,d)]
##    type  d         V1
## 1:   DD  1 0.70343519
## 2:    J  1 0.72749998
## 3:  RJD  1 0.75712105
## 4:   DD 12 0.24242770
## 5:    J 12 0.09328942
## 6:  RJD 12 0.24384156

Now Javelle wins even more on the 12 hour duration.

Evaluating 1 and 12 hour predictions with MAPE

We can also compare local GEV fits to QDF predictions at the 100 and 1000 year return periods using MAPE:

pmape[,mean(V1),by=type]
##    type       V1
## 1:   DD 11.93027
## 2:    J 13.80599
## 3:  RJD 13.31287
pmape[,mean(V1),by=list(type,d)]
##    type  d        V1
## 1:   DD  1 14.491762
## 2:    J  1 15.807899
## 3:  RJD  1 15.377455
## 4:   DD 12  9.368775
## 5:    J 12 11.804083
## 6:  RJD 12 11.248295
pmape[,mean(V1),by=list(type,d,rp)]
##     type  d   rp        V1
##  1:   DD  1  100 12.394464
##  2:    J  1  100 13.880617
##  3:  RJD  1  100 13.519472
##  4:   DD  1 1000 16.589060
##  5:    J  1 1000 17.735181
##  6:  RJD  1 1000 17.235437
##  7:   DD 12  100  7.339755
##  8:    J 12  100 10.276979
##  9:  RJD 12  100  9.556842
## 10:   DD 12 1000 11.397794
## 11:    J 12 1000 13.331187
## 12:  RJD 12 1000 12.939748

The DD model wins both the 1 and 12 hour predictions at both the 100 year and 1000 year return period.