This experiment requires the results of the Monte Carlo runs produced by the experiment-1*.R programs in this directory. Those programs produce the *.rds files loaded in this section.
library('hectorcal')
library('hector')
library('coda')
library('dplyr')
library('foreach')
mcrslts_mesa_full <- readRDS('mcrslts_mesa_full.rds')      # Part A: full range
mcmc_mesa_full <- metrosamp2coda(mcrslts_mesa_full)
mcrslts_hi10 <- readRDS('mcrslts_hi10.rds')                # Part B: upper decile
mcmc_hi10 <- metrosamp2coda(mcrslts_hi10)
mcrslts_wide_priors <- readRDS('mcrslts_wide_priors.rds')  # Part C: wide priors
mcmc_wide_priors <- metrosamp2coda(mcrslts_wide_priors)
ncore <- 8
doParallel::registerDoParallel(cores=ncore)
In the runs we did for the poster back in the fall, it looked as if the sampling might not be covering the full range of the CMIP results. Moreover, there seemed to be a marked falloff in density near the edges of the CMIP range, despite the fact that the likelihood function was flat over much of that range. A third observation was that the Hector output trajectories for most of the sampled parameter sets seemed to run approximately parallel to one another; there were very few runs, for example, that started high and finished low, or vice versa.
All of these observations were made using spaghetti plots of Hector outputs for a subsample of the Monte Carlo parameter samples, which is not a very precise way of diagnosing these effects. The purpose of this experiment is therefore to examine these matters more carefully and ensure that the Monte Carlo sampling is behaving as expected.
In this part (Part A) we reran the full-range Monte Carlo calculations. The code for this run is in experiment-1A.R; the results were saved as mcrslts_mesa_full.rds.
We were having trouble getting good mixing in these calculations. Different chains were converging to different, non-overlapping regions of the parameter space. We were able to mitigate this problem by increasing the scale factor for the proposal distribution. This had the side effect of introducing a lot of autocorrelation into the Markov chains, necessitating longer runs than we had anticipated. Even with 8 chains of 10,000 samples each, our effective sample size is only a few hundred. Together with the Gelman-Rubin diagnostic, this suggests that our production runs could stand to be a bit longer; however, these results should suffice for the purposes of this experiment.
autocorr.diag(mcmc_mesa_full)
S aero kappa beta q10 pre_co2
Lag 0 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000 1.0000000
Lag 1 0.9945311 0.9770455 0.9973593 0.9740759 0.9906464 0.9816639
Lag 5 0.9729339 0.8937591 0.9868652 0.8832526 0.9545899 0.9123663
Lag 10 0.9469552 0.8033823 0.9736485 0.7904637 0.9128460 0.8335643
Lag 50 0.7742524 0.4228882 0.8773665 0.3913000 0.6407137 0.4097901
effectiveSize(mcmc_mesa_full)
S aero kappa beta q10 pre_co2
219.4074 868.8586 105.4508 865.1348 369.4425 734.8216
gelman.diag(mcmc_mesa_full)
Potential scale reduction factors:
Point est. Upper C.I.
S 1.03 1.06
aero 1.03 1.05
kappa 1.15 1.32
beta 1.01 1.02
q10 1.03 1.05
pre_co2 1.01 1.01
Multivariate psrf
1.15
The basic results of the calculation are below. The marginal densities are consistent with what we saw in the poster results.
summary(mcmc_mesa_full)
Iterations = 1:10000
Thinning interval = 1
Number of chains = 8
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
S 3.1520 0.9580 0.0033869 0.068024
aero 0.8622 0.7125 0.0025192 0.024917
kappa 2.9394 1.5821 0.0055935 0.182824
beta 0.1868 0.1166 0.0004122 0.003985
q10 3.0319 1.4021 0.0049571 0.074766
pre_co2 283.1705 11.4316 0.0404168 0.420561
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
S 1.70606 2.42789 2.9986 3.7336 5.4052
aero -0.66319 0.41213 0.9071 1.3542 2.1168
kappa 0.26928 1.69093 2.9037 4.0637 6.2236
beta 0.01327 0.09586 0.1719 0.2578 0.4506
q10 0.70319 1.99792 2.8879 3.8998 6.1183
pre_co2 261.04290 275.26891 282.9438 290.9784 305.6866
densplot(mcmc_mesa_full)
Do these outputs in fact cover the whole range of the CMIP outputs? We can compare the ranges in the comparison data to the distribution of values produced by Hector runs using our sampled parameters at selected times.
testtimes <- c(2006, 2050, 2100)
pnames <- c(ECS(), AERO_SCALE(), DIFFUSIVITY(), BETA(), Q10_RH(), PREINDUSTRIAL_CO2())
comp_esmrcp85 <- filter(esm_comparison, experiment=='esmrcp85')
esmrcp85_ini <- system.file("input/hector_rcp85.ini", package = "hector")
hcores <- lapply(1:ncore, function(i){newcore(esmrcp85_ini)})
samps_all <- do.call(rbind, lapply(mcrslts_mesa_full, function(x){x$samples}))
filter(comp_esmrcp85, year %in% testtimes)
hector_output_stats(samps_all, hcores, pnames, nsamp=1000,
times=testtimes, quantiles=c(0,0.1, 0.5, 0.9, 1))
Tgav.2006 Tgav.2050 Tgav.2100 Ca.2006 Ca.2050 Ca.2100
0% 0.0346022 1.684036 2.882639 375.7247 566.9888 959.5308
10% 0.6345116 2.134996 3.960110 390.3704 579.2609 1002.7980
50% 1.0437802 2.554989 4.872801 401.3136 593.5946 1048.9146
90% 1.4524874 2.957057 5.859509 412.2670 609.7593 1102.5479
100% 1.8783490 3.416993 6.743401 425.2825 637.6514 1163.1063
The mina and maxb columns in the comparison data give the range of the CMIP ensemble, and as the quantile stats show, our minimum and maximum values actually fall a tiny bit outside that range. This is because our likelihood function is not a sharp cutoff, so values can stray slightly outside the range at some times, especially if the run was in range over the rest of its span. So, we do seem to be producing the full temperature and CO2 range.
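To make the "not a sharp cutoff" point concrete, here is a purely illustrative sketch of a mesa-shaped likelihood; hectorcal's actual mesa function may use a different functional form, and the falloff scale w = 0.25 here is an arbitrary choice.
## Flat log-likelihood inside [a, b]; smooth Gaussian falloff of scale w outside.
mesa_ll <- function(x, a, b, w) {
    ifelse(x < a, -(a - x)^2 / (2 * w^2),
    ifelse(x > b, -(x - b)^2 / (2 * w^2), 0))
}
curve(exp(mesa_ll(x, 3.36, 6.25, 0.25)), from=2.5, to=7.5,
      xlab='Tgav in 2100 (deg C)', ylab='relative likelihood')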
On the other hand, the 10-90 percentile range for temperature in our MC samples is a little narrower than the corresponding range for the CMIP models. This indicates that the probability density falls off toward the upper and lower edges of the range, relative to what we see in the CMIP models. We observed this in the poster results as well, and it's still not entirely clear why it should happen, since the likelihood function is mostly flat within the CMIP range. We do have some hypotheses; the leading one is that the volume of parameter space that can produce runs in these ranges is rather small, making it hard to land in that part of the output space.
Interestingly, our 10-90 percentile range for CO2 seems to match the corresponding CMIP range more closely than is the case for temperature. This observation supports the restricted volume hypothesis. CO2 outputs depend only on the parameters that affect the carbon cycle, while temperature outputs depend on all six parameters jointly. Thus, in order to get temperatures toward, say, the high end of the range, you need a confluence of high CO2 values and high temperature parameters. By contrast, to get high CO2 values, you need only have the right carbon cycle parameters; the temperature parameters can be anything. The part of the parameter space that produces high temperature values is therefore approximately a subset of the part that produces high CO2 values.
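A quick way to put numbers on this comparison (a sketch; it assumes hector_output_stats returns its quantile table as a matrix with percentage row labels, as the printout above suggests):
qs <- hector_output_stats(samps_all, hcores, pnames, nsamp=1000,
                          times=testtimes, quantiles=c(0.1, 0.9))
## 10-90 percentile width for each output and year; compare against the
## corresponding percentile columns of comp_esmrcp85.
qs['90%', ] - qs['10%', ]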
Looking at a sample of the Hector output traces is also instructive.
spaghetti_plot(mcrslts_mesa_full, 512, hcores, pnames, alpha=0.1) + ggplot2::ggtitle('Full range')
First, we can see that the lack of any historical constraint is allowing some slightly absurd models into the sample. When we rerun with our new dataset in place, we can expect the parameter distributions to be more constrained (though the difference may or may not be apparent on casual inspection). Another thing that seems apparent is that the concern about the lack of crossing traces, at least for temperature, was unfounded. The CO2 results, on the other hand, show much less diversity in the outputs. Evidently, the model doesn't have as much flexibility to produce CO2 curves with dramatically different shapes.
To investigate the falloff toward the edge of the range, we set up a Monte Carlo run (Part B) that forces the temperature to end the century in the upper decile of the CMIP range, while the final CO2 concentration is constrained only to the full range. Temperature and CO2 in 2006 and 2050 were also constrained to the full range. As might be expected, the end-of-century temperature constraint leaves us with a changed set of viable parameters.
summary(mcmc_mesa_full) # Results from the previous run
Iterations = 1:10000
Thinning interval = 1
Number of chains = 8
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
S 3.1520 0.9580 0.0033869 0.068024
aero 0.8622 0.7125 0.0025192 0.024917
kappa 2.9394 1.5821 0.0055935 0.182824
beta 0.1868 0.1166 0.0004122 0.003985
q10 3.0319 1.4021 0.0049571 0.074766
pre_co2 283.1705 11.4316 0.0404168 0.420561
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
S 1.70606 2.42789 2.9986 3.7336 5.4052
aero -0.66319 0.41213 0.9071 1.3542 2.1168
kappa 0.26928 1.69093 2.9037 4.0637 6.2236
beta 0.01327 0.09586 0.1719 0.2578 0.4506
q10 0.70319 1.99792 2.8879 3.8998 6.1183
pre_co2 261.04290 275.26891 282.9438 290.9784 305.6866
summary(mcmc_hi10) # Results from this run
Iterations = 1:10000
Thinning interval = 1
Number of chains = 8
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
S 5.9121 2.8150 0.0099524 0.229596
aero 1.2845 0.5279 0.0018666 0.014871
kappa 3.5094 1.8435 0.0065178 0.162424
beta 0.2319 0.1374 0.0004858 0.004625
q10 3.1448 1.3716 0.0048492 0.072611
pre_co2 286.8444 11.5337 0.0407777 0.419723
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
S 2.90946 4.1666 5.0203 6.3994 14.3436
aero 0.22112 0.9345 1.2940 1.6355 2.3103
kappa 0.31101 2.3017 3.3689 4.6017 7.6025
beta 0.01577 0.1230 0.2186 0.3245 0.5319
q10 0.95156 2.1159 2.9989 4.0181 6.1722
pre_co2 263.41385 279.1949 286.9464 294.8994 308.3523
As might be expected, the major difference here is that the equilibrium climate sensitivity, \(S\), is substantially higher. The aerosol scaling factor is also pushed higher, and extremely low values for that parameter are eliminated entirely. However, there doesn’t seem to be any evidence that the parameters need to be fine-tuned to land in this range of outputs. In particular, the width of the distributions of the temperature parameters (whether measured by standard deviation or by interquartile range) isn’t much smaller than what we saw in Part A. Likewise, apart from a slight increase in \(\beta\), the values of the carbon cycle parameters aren’t much different in the Part B run, as compared to Part A. This translates to CO2 concentrations that are roughly consistent between the two runs.
samps_hi10 <- do.call(rbind, lapply(mcrslts_hi10, function(x){x$samples}))
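As a direct check on the fine-tuning question, we can compare the spread of the sampled parameters between the two runs; if landing in the upper decile required finely tuned parameter combinations, the Part B spreads should be much narrower.
## Interquartile range of each parameter, Part A vs. Part B.
rbind(partA = apply(samps_all, 2, IQR),
      partB = apply(samps_hi10, 2, IQR))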
hector_output_stats(samps_all, hcores, pnames, nsamp=1000, times=testtimes, quantiles=c(0,0.1, 0.5, 0.9, 1))  # Part A samples, for comparison
Tgav.2006 Tgav.2050 Tgav.2100 Ca.2006 Ca.2050 Ca.2100
0% 0.001665378 1.631145 2.992223 364.4389 558.8366 972.8673
10% 0.666938457 2.152579 3.956133 390.3788 578.5180 1002.7479
50% 1.070544658 2.586149 4.886327 401.1097 593.2610 1045.3772
90% 1.482269918 2.977646 5.911557 412.6221 610.3333 1099.8711
100% 2.031461483 3.422937 6.839982 425.2825 627.6173 1165.6513
hector_output_stats(samps_hi10, hcores, pnames, nsamp=1000, times=testtimes, quantiles=c(0,0.1, 0.5, 0.9, 1))  # Part B samples
Tgav.2006 Tgav.2050 Tgav.2100 Ca.2006 Ca.2050 Ca.2100
0% -0.2654081 2.097500 5.092173 370.8038 545.7227 973.5479
10% 0.4901906 2.603464 5.702038 384.8988 572.5231 1013.3611
50% 0.9975547 2.973607 6.221850 399.3375 592.7377 1067.9391
90% 1.4611747 3.301513 6.709304 412.0814 611.9490 1120.1316
100% 2.0387268 3.906699 7.359927 427.1280 629.8856 1156.9528
The output traces don’t show anything particularly unexpected.
spaghetti_plot(mcrslts_hi10, 512, hcores, pnames, alpha=0.1) + ggplot2::ggtitle('Upper decile, final year only')
Based on these results, the fine-tuning hypothesis doesn't seem viable to me anymore. Our next best hypothesis is that this effect is being driven by our prior. In particular, consider the prior for climate sensitivity, \(S\).
ps <- function(x) {dlnorm(x, log(3.0), log(3.0))}
curve(ps, from=0, to=10, ylab='p(S)', xlab='S')
The prior probability mass over the range 2.5–3.5 (i.e., the values favored in the full-range calculation) is 0.12, while the prior probability mass over the 5.5–6.5 range favored by the high-decile calculation is 0.05. This difference in prior probability is not enough to exclude these higher temperature runs, but it could cause the reduction in probability density we have observed in the output traces. We could test this by broadening the priors to make them less informative and rerunning the full-range calibration test.
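These masses follow directly from the prior CDF:
## Prior probability mass of S in the two ranges quoted above.
pmass <- function(lo, hi) {
    plnorm(hi, log(3.0), log(3.0)) - plnorm(lo, log(3.0), log(3.0))
}
pmass(2.5, 3.5)    # ~0.12: values favored by the full-range run
pmass(5.5, 6.5)    # ~0.05: values favored by the high-decile run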
To investigate this hypothesis, we reran the calculation from Part A, but with the priors replaced by very non-informative distributions (Part C). Specifically, we made them all normal with \(\sigma = 10\), except for the preindustrial CO2 prior, which got \(\sigma = 60\), owing to the larger scale of that parameter.
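For concreteness, here is a minimal sketch of the wide log-prior described above. This is not the code actually used (that lives in the corresponding experiment-1*.R program); the centers are illustrative placeholders, and only the \(\sigma\) values come from the description.
## Sketch only: centers are illustrative guesses; sigmas as described above.
lprior_wide <- function(p) {
    dnorm(p['S'],       3,   10, log=TRUE) +
    dnorm(p['aero'],    1,   10, log=TRUE) +
    dnorm(p['kappa'],   3,   10, log=TRUE) +
    dnorm(p['beta'],    0.2, 10, log=TRUE) +
    dnorm(p['q10'],     3,   10, log=TRUE) +
    dnorm(p['pre_co2'], 285, 60, log=TRUE)
}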
The first thing to observe is that the convergence of the Markov chains in this part of the experiment is marginal at best. Looking at the trace plots below, we see that several of the variables have chains with lengthy excursions into very large or very small parameter values. The chains are also highly autocorrelated; despite running 8 chains of 10,000 samples each, the total effective sample size for \(S\) is just 90. This poor convergence results from losing the stabilizing influence of the weakly informative priors we used previously. If we were using this run for production results, we would want to get better convergence, but for purposes of comparing the edge behavior, what we have here should suffice.
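The effective sample size quoted above comes from the same diagnostic we used in Part A:
effectiveSize(mcmc_wide_priors)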
plot(mcmc_wide_priors)
The density plots side-by-side with the trace plots give some idea of the effect of the priors on the posterior distributions. Comparing these to the density plots from Part A, we can see that \(S\) extends to much larger and smaller values than it did there, as do \(\alpha\) (aerosol scaling), \(\kappa\) (diffusivity), and \(Q_{10}\). Looking at the distribution statistics for Parts A and C gives a quantitative version of the same story.
summary(mcmc_mesa_full)
Iterations = 1:10000
Thinning interval = 1
Number of chains = 8
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
S 3.1520 0.9580 0.0033869 0.068024
aero 0.8622 0.7125 0.0025192 0.024917
kappa 2.9394 1.5821 0.0055935 0.182824
beta 0.1868 0.1166 0.0004122 0.003985
q10 3.0319 1.4021 0.0049571 0.074766
pre_co2 283.1705 11.4316 0.0404168 0.420561
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
S 1.70606 2.42789 2.9986 3.7336 5.4052
aero -0.66319 0.41213 0.9071 1.3542 2.1168
kappa 0.26928 1.69093 2.9037 4.0637 6.2236
beta 0.01327 0.09586 0.1719 0.2578 0.4506
q10 0.70319 1.99792 2.8879 3.8998 6.1183
pre_co2 261.04290 275.26891 282.9438 290.9784 305.6866
summary(mcmc_wide_priors)
Iterations = 1:10000
Thinning interval = 1
Number of chains = 8
Sample size per chain = 10000
1. Empirical mean and standard deviation for each variable,
plus standard error of the mean:
Mean SD Naive SE Time-series SE
S 4.9574 2.4869 0.0087924 0.30105
aero 0.4056 0.8858 0.0031317 0.03759
kappa 8.7586 5.3000 0.0187383 1.08013
beta 0.3361 0.2121 0.0007499 0.01030
q10 6.9971 4.0875 0.0144516 0.48575
pre_co2 282.0648 21.7733 0.0769801 1.21702
2. Quantiles for each variable:
2.5% 25% 50% 75% 97.5%
S 1.7060 2.95302 4.346 6.6770 10.4281
aero -1.7241 -0.05292 0.500 1.0110 1.8440
kappa 0.5866 4.29884 8.609 12.5540 19.9255
beta 0.0227 0.16939 0.305 0.4743 0.8011
q10 1.2412 3.95600 6.191 9.3219 17.3408
pre_co2 237.5422 267.76099 282.097 297.3004 323.4053
Under the wide priors the posterior spreads grow substantially: the standard deviation of \(S\) increases from about 0.96 to 2.5, \(\kappa\) from 1.6 to 5.3, and \(Q_{10}\) from 1.4 to 4.1, while the spread in preindustrial CO2 roughly doubles. The medians of \(\kappa\) and \(Q_{10}\) also shift markedly upward.
Based on these results, we would expect to see a less pronounced density falloff in the Hector output traces.
spaghetti_plot(mcrslts_wide_priors, 512, hcores, pnames, alpha=0.1) + ggplot2::ggtitle('Full range calibration with wide priors')
It's hard to say for sure whether there is less density falloff than there was in the Part A runs. Only the traces that end the century at temperatures between 3.4 and 6.25 deg C count for this purpose; outside that range the density falls off because the mesa function is telling us that results don't belong out there.
samps_wide_priors <- do.call(rbind, lapply(mcrslts_wide_priors, function(x){x$samples}))
hector_output_stats(samps_all, hcores, pnames, nsamp=1000, times=testtimes, quantiles=c(0,0.1, 0.5, 0.9, 1))  # Part A samples, for comparison
Tgav.2006 Tgav.2050 Tgav.2100 Ca.2006 Ca.2050 Ca.2100
0% 0.08067625 1.631145 2.930196 372.8817 558.8366 959.5308
10% 0.63484255 2.128760 3.930256 390.1173 579.2021 1003.0830
50% 1.05632824 2.585504 4.909802 401.1243 593.3297 1050.5649
90% 1.48176851 2.981726 5.932453 411.8916 609.5447 1099.1570
100% 2.02096994 3.566074 6.839982 434.8031 628.9196 1161.2056
hector_output_stats(samps_wide_priors, hcores, pnames, nsamp=1000, times=testtimes, quantiles=c(0,0.1, 0.5, 0.9, 1))  # Part C samples
Tgav.2006 Tgav.2050 Tgav.2100 Ca.2006 Ca.2050 Ca.2100
0% 0.09174705 1.620861 2.587488 364.1836 560.1422 956.9549
10% 0.70877789 2.141236 3.936543 384.5894 579.4271 1012.9432
50% 1.15510507 2.544393 4.860366 400.3080 592.9650 1069.4272
90% 1.61379722 2.938590 5.836865 415.3286 610.0192 1125.0223
100% 2.39935260 3.524353 6.823063 443.1595 627.8412 1175.7873
Let's do what we probably should have done from the start, namely, compute kernel density estimates for the Hector outputs directly.
hs_all <- run_hector_samples(samps_all, 2048, hcores, c(2050, 2100), pnames, c(GLOBAL_TEMP(), ATMOSPHERIC_CO2()))
hs_all$runtype <- 'part A'
hs_wp <- run_hector_samples(samps_wide_priors, 2048, hcores, c(2050, 2100), pnames, c(GLOBAL_TEMP(), ATMOSPHERIC_CO2()))
hs_wp$runtype <- 'part C'
hs <- rbind(hs_all, hs_wp)
ggplot2::ggplot(data=hs, ggplot2::aes(x=value, fill=runtype)) + ggplot2::geom_density(alpha=0.5) +
ggplot2::facet_wrap(~year+variable, scales='free') +
ggthemes::theme_economist() + ggthemes::scale_fill_economist()
Now we arrive at the truth of the matter: despite the dramatic differences in parameter distributions between the two runs, the output distributions look quite similar. The \(C_a\) distribution extends a little further on the high end, right up to the edge of the full-range value (1142 ppm). Evidently our prior was causing us to exclude a little bit of that range.
The \(T_g\) values look remarkably similar between the two runs, and in 2100, at least, they appear reasonably flat-topped over the CMIP range of 3.36–6.25 deg C for that year. The Part C \(C_a\) distribution is also reasonably flat over the CMIP range. The remaining peakiness in the distributions I'm willing to ascribe to less-than-perfect mixing in the Markov chains. (One thing we have learned from this experiment is that we will likely have to take more samples than we expected.) At this point, however, I'm not convinced that there is any genuine edge falloff to be found. It appears to be another artifact of trying to diagnose the effect from the spaghetti plots, which are very lightly sampled compared to these density estimates.
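As a crude check on the flatness claim, we can compute the fraction of sampled 2100 temperatures landing inside the CMIP range for each run (a sketch; it assumes the variable labels in hs match the hector variable names, as the plotting code above suggests).
hs %>%
    filter(year == 2100, variable == GLOBAL_TEMP()) %>%
    group_by(runtype) %>%
    summarise(frac_in_range = mean(value >= 3.36 & value <= 6.25))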
Final verdict: I think this is a non-issue, except insofar as it may be indicative of poor convergence in our Monte Carlo runs, and we have other ways of diagnosing that. Unless someone else on the team sees something that really concerns them, I'm going to consider the myth of edge density falloff busted.