Analysis of MH CosmoSIS output

Metropolis-Hastings analysis

First we read the Metropolis-Hastings output files, and create a dataframe. The beginning of the dataframe looks like:

dhm <- read.metropolis.hastings("status_metro_wide_concen/chain_metro_*.txt")

## Loading required namespace: testthat

head(dhm) %>% knitr::kable()

omega_m	sigma8_input	concentration	chain	sample
0.4108063	0.9324733	117.2818	1	1
0.4108063	0.9324733	117.2818	1	2
0.4231948	0.9045363	121.2760	1	3
0.4231106	0.9046792	121.2714	1	4
0.4231106	0.9046792	121.2714	1	5
0.4231106	0.9046792	121.2714	1	6

We have 50 chains, each with 10300 samples, for a total of 510200 samples.

The burn-in period

To study the burn-in period, we’ll be looking only at the first nanim values in each chain.

nanim = 300

First we look at scatterplots of the first 300 samples from each chain. Each plot shows a different pair of variables. Each point corresponds to one sample. The color of the point indicated which of the 50 chains was sampled. Note that the palette of colors is limited, so the same color is used for many different chains.

p1a <-
  dhm %>%
  filter(sample < nanim) %>%
  ggplot(aes(omega_m, sigma8_input, color = factor(chain))) +
  geom_point(show.legend = FALSE)

p1a

p1b <-
  dhm %>%
  filter(sample < nanim) %>%
  ggplot(aes(omega_m, concentration, color = factor(chain))) +
  geom_point(show.legend = FALSE)

p1b

p1c <-
  dhm %>%
  filter(sample < nanim) %>%
  ggplot(aes(sigma8_input, concentration, color = factor(chain))) +
  geom_point(show.legend = FALSE)

p1c

These plots don’t provide a good idea of the path each chain takes from sample to sample. We can show this by animating the plots, with one animation frame for each sample number.

p1a +
  transition_time(sample) +
  labs(title = "Sample: {frame_time}")

p1b +
  transition_time(sample) +
  labs(title = "Sample: {frame_time}")

p1c +
  transition_time(sample) +
  labs(title = "Sample: {frame_time}")

Have the chains converged?

One way to evaluate convergence of the chains is to look at how they progress as a function of sample number; this is often called a trace for the chain. We will do this for the first 3000 values from each chain.

One thing we want is see is mixing. It is important that the different chains do not produce traces that stay separated; they should “forget” their initial values, and produce traces that eventually are well-mixed.

Since there are too many chains to visualize this, we look only at the first 4 chains:

q1 <-
  dhm %>%
  filter(sample < 3000, sample %% 50 == 0, chain %in% 1:4) %>%
  ggplot(aes(sample, omega_m, color = factor(chain))) +
  geom_point(show.legend = FALSE) +
  geom_path(show.legend = FALSE)

q2 <-
  dhm %>%
  filter(sample < 3000, sample %% 50 == 0, chain %in% 1:4) %>%
  ggplot(aes(sample, sigma8_input, color = factor(chain))) +
  geom_point(show.legend = FALSE) +
  geom_path(show.legend = FALSE)

q3 <-
  dhm %>%
  filter(sample < 3000, sample %% 50 == 0, chain %in% 1:4) %>%
  ggplot(aes(sample, concentration, color = factor(chain))) +
  geom_point(show.legend = FALSE) +
  geom_path(show.legend = FALSE)

grid.arrange(q3, q1, q2, nrow = 3)

An animation showing all the chains helps visual the mixing. To make this animation, we first must transform the data from a form that has one row per sample per chain, to one that has one row per sample per chain per physical parameter. This reshaping allows us to condition the plot on the name of the physical parameter, and thus to have a single graphic to animate.

The transformation of the data is done using tidyr::pivot_longer. We select only 1 in 50 samples, to avoid overcrowding the plot.

x <-
  dhm %>% #select(-c(prior, loglike, like)) %>%
  #filter(sample < 3000, sample %% 50 == 0) %>%
  filter(sample %% 50 == 0) %>%
  pivot_longer(-c(sample,chain), names_to = "param", values_to = "val")

Producing the plot is now simple.

p <-
  ggplot(x, aes(sample, val, color = factor(chain))) +
  geom_path(show.legend = FALSE) +
  facet_wrap(vars(param), scales = "free", ncol = 1)
p + transition_reveal(sample)

We can also look at how the marginal density for each parameter converges as we go through fixed-size batches of samples. We’ll group the data by 100-sample batches (for each chain). As above, we need to re-organize the dhm dataframe, but this time we don’t thin out the samples, and we also assign samples to batches.

b <- dhm %>%
  filter(sample < 3000) %>%
  mutate(batch = sample %/% 100) %>%
  pivot_longer(-c(sample, chain, batch), names_to = "param", values_to = "val")

We can then histogram the posterior densities, and observe how they change from batch to batch.

ggplot(b, aes(val)) +
  geom_histogram(aes(y = ..density..), bins = 50) + 
  geom_histogram(bins = 100) +
  facet_wrap(vars(param), scales = "free", ncol = 1) +
  transition_time(batch)

Posterior densities

nburn <- 3000

For all our analysis of posterior distributions, we’ll drop the first 3000 elements of each chain.

1-d marginal densities

We use a combined histogram and kernel density plot to show the 1-d marginal distributions.

x %>%
  filter(sample > nburn) %>%
  ggplot(aes(val)) +
  geom_histogram(aes(y = ..density..), bins = 50) + 
  geom_density(color = "red") +
  facet_wrap(vars(param), scales = "free", ncol = 1)

The posterior for concentration has a long high-end tail, and so a log scale in x can be useful.

x %>%
  filter(param == "concentration", sample > nburn) %>%
  ggplot(aes(val)) +
  geom_histogram(aes(y = ..density..), bins = 50) +
  geom_density(color = "red") +
  xlab("concentration") +
  scale_x_log10()

2-d marginal densities

We are using hexbin plots to show 2-d posterior densities because this produces few, and easy to understand, artifacts in the plotting. We use a log scale for the density because of wide range of density. We also plot the contours produced by a 2D kernel density estimator, for comparison with the hexbin results.

dhm %>%
  filter(sample > nburn) %>%
  plot_density_2d(omega_m, sigma8_input)

dhm %>%
  filter(sample > nburn) %>%
  plot_density_2d(omega_m, concentration, scale_y = scale_y_log10)

dhm %>%
  filter(sample > nburn) %>%
  plot_density_2d(sigma8_input, concentration, scale_y = scale_y_log10)

Statistical summaries

dhm %>%
  select(-c(chain, sample)) %>%
  pivot_longer(everything(), names_to = "param", values_to = "val") %>%
  group_by(param) %>%
  summarize(mean = mean(val),
            median = median(val),
            sd = sd(val),
            mad = mad(val),
            "5%" = quantile(val, 0.05),
            "95%" = quantile(val, 0.95)) %>%
  knitr::kable()

param	mean	median	sd	mad	5%	95%
concentration	35.9032551	11.2241586	57.9608391	9.6863447	3.3271743	175.9267765
omega_m	0.2828005	0.2811674	0.0319972	0.0287739	0.2365419	0.3349876
sigma8_input	0.8589840	0.8561743	0.0557183	0.0461338	0.7813546	0.9384601