In my last set of calibration results I looked at the difference made by adding the volscl parameter. But we noticed that some of the calibration results were a little iffy (we suspect that optim went to crazy town or found a local minimum). Previously I had used the default Hector values as my initial parameter guess; this time I used what I am calling the “best guess” for the parameters. I compared the ESM comparison data we are calibrating to with the large ensemble of Hector results we generated for the PC analysis. The parameter combination that corresponds to the results that best resemble the ESM comparison data is now being used as the initial parameter guess for optim (however, for inmcm4 I manually set the volscl parameter value to 0 since that model does not do volcanoes). I also increased the maximum number of iterations that optim can do.
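For context, here is a minimal sketch of what the new setup looks like in R. Everything in it is a stand-in (I am treating MSR as the mean squared residual that optim minimizes); the real workflow runs Hector and compares its output against the ESM comparison data.

```r
# Minimal sketch of the new calibration setup; all data, functions, and
# parameter values are placeholders for the real Hector-vs-ESM comparison.
set.seed(42)
esm_data <- rnorm(50)  # stand-in for the ESM comparison data

# Stand-in for running Hector with a given parameter combination.
run_model <- function(par) rep(par[["S"]] - par[["diff"]], length(esm_data))

# Objective: the mean squared residual (MSR) between model output and data.
msr_fn <- function(par) mean((run_model(par) - esm_data)^2)

# Initial guess from the ensemble member that best matched the ESM data
# (values made up); for inmcm4, volscl gets manually set to 0.
best_guess <- c(S = 3.0, diff = 2.3, alpha = 1.0, volscl = 1.0)

# maxit raised from 500 to 800.
fit <- optim(par = best_guess, fn = msr_fn, control = list(maxit = 800))
```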
Then I looked at the following:
How many runs converged when we increased the maximum number of iterations and used a better initial parameter guess? (Before, I was using the default Hector parameters as the initial parameter guess; now I use the parameter combination that corresponds to the Hector results that most closely resemble the comparison data from the large Hector ensemble we generated for the PC analysis.)
Convergence from last time
Convergence of the best-guess calibration
With the best guesses and the higher maximum number of iterations we see an increase in the number of runs that converged. Now there are only 4 runs that do not converge, whereas before there were 14.
Can we determine why they are now passing?
The best_guess and old_rslts columns contain the convergence codes. If the value is 0 then the run converged; if the value is 1 then maxit (the maximum number of iterations) was too low. The best_guess_fn_count column contains the function evaluation counts from the best-guess calibration. As part of the best-guess calibration I increased maxit to 800, whereas before it was set at 500.
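These columns come straight out of optim’s return value; using the `fit` object from the sketch above:

```r
fit$convergence          # 0 = converged, 1 = maxit was reached
fit$counts[["function"]] # number of objective evaluations (the fn_count)
```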
This table only contains info for the models that had a change in the convergence code between the two different calibration experiments.
| model | best_guess | old_rslts | best_guess_fn_count |
|---|---|---|---|
| ACCESS1-3 | 0 | 1 | 615 |
| CESM1-BGC | 0 | 1 | 637 |
| CESM1-FASTCHEM | 0 | 1 | 691 |
| CESM1-WACCM | 0 | 1 | 603 |
| CMCC-CESM | 0 | 1 | 531 |
| CNRM-CM5-2 | 1 | 0 | 801 |
| EC-EARTH | 0 | 1 | 781 |
| FGOALS-g2 | 0 | 1 | 539 |
| GISS-E2-R-CC | 0 | 1 | 695 |
| IPSL-CM5B-LR | 0 | 1 | 663 |
| MPI-ESM-MR | 0 | 1 | 595 |
| MRI-ESM1 | 0 | 1 | 433 |
Takeaways
MRI-ESM1 was the only model with a best_guess_fn_count less than 500, which tells me that providing different initial parameter values made a big difference there. For the other runs that now converge but have a function evaluation count greater than 500, I know that increasing maxit made a difference, and I would like to assume that changing the initial parameters helped too. But I think that I will have to take a look at some other results before I draw any conclusions.

CNRM-CM5-2 used to converge but now does not. That is not really what I was expecting… What about the runs that still do not converge?
| model | convergence | fn_count | S | alpha | volscl | diff |
|---|---|---|---|---|---|---|
| CNRM-CM5-2 | 1 | 801 | 0.240 | -0.031 | 3.568 | 0.000 |
| GISS-E2-H-CC | 1 | 801 | 21.175 | -0.831 | 4.670 | 202.113 |
| MPI-ESM-P | 1 | 801 | 1446.024 | 0.206 | 4.821 | 2547.386 |
| MRI-CGCM3 | 1 | 801 | 275237.716 | 1.638 | 2.005 | 49.588 |
Hmmm, it looks like all of these calibration attempts hit maxit on their way to crazy town…
Did using a more informed initial parameter combination affect the results?
How did the MSR change for the calibration fits?
If the best-guess method worked better, we would expect to see a decrease in MSR.
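A sketch of how that comparison can be framed (the data frames, column names, and values here are made up):

```r
# Pair the two calibrations by model and look at the change in MSR; a
# negative delta means the best-guess calibration fit better.
old_rslts  <- data.frame(model = c("A", "B"), msr = c(1.2e-04, 3.0e-05))
best_rslts <- data.frame(model = c("A", "B"), msr = c(1.1e-04, 3.1e-05))

comparison <- merge(best_rslts, old_rslts, by = "model",
                    suffixes = c("_best_guess", "_old"))
comparison$delta_msr <- comparison$msr_best_guess - comparison$msr_old
summary(comparison$delta_msr)
```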
Summary of the change in MSR
```
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## -2.784e-05 -1.145e-09 -2.000e-10 -8.592e-07 2.696e-09 1.691e-05
```
Green indicates that the best-guess method returns a better value (a smaller MSR), whereas red indicates that the best-guess method returns a worse value.
Bar Plot by Model
For a lot of the models there is little to no change in the MSR. However, for NorESM1-ME and inmcm4 we see a pretty large decrease in MSR. And for CSIRO-Mk3-6-0 the best-guess method does markedly worse :(
Scatter Plot
I’ve included a 1:1 line to highlight where there is no change.
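The scatter amounts to something like this (ggplot2, reusing the made-up comparison frame from the sketch above):

```r
library(ggplot2)

# Points on the dashed 1:1 line had no change in MSR between calibrations;
# points below it improved under the best-guess method.
ggplot(comparison, aes(x = msr_old, y = msr_best_guess)) +
  geom_point() +
  geom_abline(slope = 1, intercept = 0, linetype = "dashed") +
  labs(x = "old calibration MSR", y = "best-guess calibration MSR")
```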
Most of the calibration results are pretty close to the 1:1 line, so where there is a change in fit performance it is pretty small. Since the change in MSR was so small, it is hard to tell whether that means the parameters have changed; I can’t really get a sense of whether the change in method changed anything.
Did changing the initial parameter guess impact the parameters returned by optim?
Summary info about the absolute change in the parameters
| param | min | mean | max | sd |
|---|---|---|---|---|
| alpha | 0 | 0.00 | 0.02 | 0.01 |
| diff | 0 | 0.78 | 15.62 | 3.49 |
| S | 0 | 1.18 | 23.53 | 5.26 |
| volscl | 0 | 0.00 | 0.05 | 0.01 |
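The table above can be built along these lines (the per-model parameter tables here are invented for illustration):

```r
# Toy per-model parameter tables, one row per model (values invented).
old_pars  <- data.frame(alpha = c(1.00, 0.90), diff = c(2.3, 1.1),
                        S = c(3.0, 2.5), volscl = c(1.00, 0.80))
best_pars <- data.frame(alpha = c(1.00, 0.91), diff = c(2.3, 1.3),
                        S = c(3.2, 2.5), volscl = c(1.00, 0.82))

# Absolute change in each fitted parameter, summarised across models.
abs_change <- abs(best_pars - old_pars)
round(sapply(abs_change, function(x)
  c(min = min(x), mean = mean(x), max = max(x), sd = sd(x))), 2)
```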
It looks like for at least some of the models there was no change in the parameter values, which is not surprising given that some runs had little to no change in MSR. However, for at least some runs there was a change in the diff and S values, which, fingers crossed, was in the right direction and not towards crazy town. Once again the points on the 1:1 line changed a little bit between the calibrations; I’ve tried to label the more interesting points.
Change in S
Well, it looks like our friend inmcm4 went further into crazy town for S :( Let’s also take a look at the plot when we exclude inmcm4.
Most of the models are clustered towards the lower end of the S range; only one is above 7, which is not surprising and reflects what we have assumed to be true about the S prior.
Change in diff
Once again our dear buddy inmcm4 is wonky, but this time the new calibration improved it slightly, I guess.
What happens when we exclude inmcm4?
What is the min value for the diff?
It looks like we still get some high diff values: CSIRO is above 20! And NorESM1-ME has a diff value that is essentially 0, which would mean that the ocean is not absorbing any heat. I think that is a yellow flag if not a red flag.
Change in aero
Well it looks like inmcm4 finally has a reasonable parameter value but now GISS-E2-H has a negative aerosol scalar which counts as a red flag!
Change in volscl
CMCC-CMS and CMCC-CM have negative volscl parameter values, which is unlikely. Also, inmcm4 has a high volscl even though we would expect it to have a value closer to 0.
Takeaways
Here I compare the results from the new best-guess calibration with the old calibration. So far I have only plotted the values for the models we highlighted as wonky yesterday, along with the quote from the Slack channel about each one.
Let’s look at the calibration results for the run that had the largest change in the MSR, which was NorESM1-ME.
Hmmm, they are not that different from one another, and it looks like both calibrations have diff values that are essentially 0.
Let’s look at inmcm4 because we know it is a troubled one.
There has really been no change in the MSR and the output despite very different parameter values, particularly S.
Models we talked about in the Slack channel yesterday
CSIRO-Mk3L-1-2 : adding the volscl parameter looks like it helped things, but it still looks way off. Also the diffusivity is over 20, which seems a little questionable.
It looks like nothing really changed here.
CMCC-CM : another volscl < 0, diffusivity around 0.8, plus the fit just doesn’t look that good.
No change :(
CESM1-CAM5 : Parameters look ok, but the fit just doesn’t look that good. Also, this is an important one to get right.
No change again. This is actually the plot that made me wonder if we need to use weights for the ensemble members. I followed up and tried to calibrate again, weighting the average by the number of ensemble members for each experiment, but it did not change the answer :|
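For reference, the weighting I tried amounts to something like this (experiment names and values are placeholders):

```r
# Weight each experiment's ensemble average by its number of ensemble
# members instead of taking a plain mean across experiments.
experiment_means <- c(historical = 0.42, rcp45 = 0.61)  # made-up averages
n_members        <- c(historical = 5, rcp45 = 2)

weighted.mean(experiment_means, w = n_members)
```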
CCSM4 : diffusivity is 0.1, and the fit doesn’t look very good.
I also think that something is wrong with the diagnostic plot because it looks like we are missing several new calibration runs.
ACCESS1-0 : Fit doesn’t look right. Surely we can get closer.
The calibration got pretty much the same answers.