Voronoi PIT histogram

Objective:

Generate a PIT histogram for the ETAS model applied to the Hector Mine sequence using the Voronoi partition.

1. Load in components

The Hector Mine sequence has been loaded in as catalog. Next we load the constants associated with the ETAS model.

mu <- 0.0392
b <- 2.5
K  <- 0.456
c0 <- 0.0593
p  <- 1.37
a  <- 1.14
d  <- 0.000103
q0 <- 1.67
M0 <- 3.0
endT <- 434
theta <- c(mu, b, K, c0, p, a, d, q0)
const1  <- K * (p - 1) * c0^(p - 1) * (q0 - 1) * d ^ (q0 - 1) / pi
w <- deldir(catalog[, 1], catalog[, 2], rw = c(0, 1, 0, 1))

## 
##      PLEASE NOTE:  The components "delsgs" and "summary" of the 
##      object returned by deldir() are now DATA FRAMES rather than 
##      matrices (as they were prior to release 0.0-18). 
##      See help("deldir").
##  
##      PLEASE NOTE: The process that deldir() uses for determining
##      duplicated points has changed from that used in version
##      0.0-9 of this package (and previously). See help("deldir").

polygon_list <- tile.list(w)

2. Tesselate and compute \(\bar{\lambda}\)

The observed tesselation is subsetted to exclude cells on the boundary and MC integration is used to calculate \(\bar{\lambda}\) for each.

ind_boundary <- which(w$summary[, 7] > 0)
polygon_list_inner <- polygon_list[-ind_boundary]
npts <- 1e6
sourceCpp("../etasCPP.cpp")
meanlambdas <- unlist(lapply(polygon_list_inner, FUN = integrateLambda, npts, endT, gp,
                             catalog, mu, const1, c0, p, a, d, q0))
write.csv(meanlambdas, "vor_meanlambdas.csv", row.names = F)

3. Calculate E and PIT values

The expected counts in each of the interior cells are the product of \(\bar{\lambda}\) (calculated via MC integration) and the area of the cell. This quantity is also known as the reduced area and its distribution can be approximated by \(Gamma(3.5, 3.5)\) for a homogeneous PPP or suitably smooth inhomogeneous one. Therefore we try to calculate PIT values under that distribution.

ind_boundary <- which(w$summary[, 7] > 0)
meanlambdas <- unlist(read.csv("vor_meanlambdas.csv"))
sum(meanlambdas)

## [1] 456956

head(meanlambdas)

##       x1       x2       x3       x4       x5       x6 
## 6902.794 7043.693    3.866  621.132 1888.642  155.584

areas <- w$summary$dir.area[-ind_boundary]
sum(areas)

## [1] 0.4772

expected <- meanlambdas * areas
sum(expected)

## [1] 33.84

u <- pgamma(expected, 3.5, 3.5)
hist(u)

plot of chunk expecteds

Either this demonstrates an extremely poor fit between ETAS and the observed pattern or that the Gamma distribution is inappropriate. We can take a look at the distribution of expected counts compared to the Gamma distribution.

par(mfrow = c(2, 1))
hist(expected)
hist(rgamma(1000, 3.5, 3.5))

plot of chunk sumexp

The most notable problem isn’t the shape but the scale: the sum of expected counts across all 522 (interior) cells is only 33.8391, which vastly underestimates the observed count of 522.

4. Generate a PP from ETAS

One way to check to see if the Gamma distribution is appropriate is to simulate under the null hypothesis and look for uniformity in the PIT values.

set.seed(387)
z <- simetas(theta, x1 = 1, y1 = 1, T = endT, sor=1)
w2 <- deldir(z$lon, z$lat, rw = c(0, 1, 0, 1))
polygon_list2 <- tile.list(w2)
ind_boundary2 <- which(w2$summary[, 7] > 0)
polygon_list_inner2 <- polygon_list2[-ind_boundary2]

npts <- 1e6
sourceCpp("../etasCPP.cpp")
meanlambdas <- unlist(lapply(polygon_list_inner2, FUN = integrateLambda, npts, endT, gp,
                             catalog = matrix(c(z$lon, z$lat, z$t, z$hm), ncol = 4),
                             mu, const1, c0, p, a, d, q0))
write.csv(meanlambdas, "vor_meanlambdas_null.csv", row.names = F)

meanlambdas2 <- unlist(read.csv("vor_meanlambdas_null.csv"))
areas2 <- w2$summary$dir.area[-ind_boundary2]
expected2 <- meanlambdas2 * areas2
u2 <- pgamma(expected2, 3.5, 3.5)
hist(u2, breaks = 4)

plot of chunk expectedsnull

It looks like the Gamma distribution is indeed not a good fit to ETAS. Maybe this isn’t too surprising as the main thing driving this distribution is the distribution of cell areas, which is likely to more skewed than the Gamma due to the clustering of ETAS. One way that I can think of to address this is to come up with the distribution of expected counts in Voronoi cells by simulating from ETAS, similar to the way that we used the empirical cdf of \(N(A)\) for observed counts in the pixel case. This would require, for every iteration, generating a PP, tesslating it, throwing out the boundary cells, and finding the \(\bar{\lambda}\)s. We could then calculate the expected counts and store them to form the empirical cdf. What do you think?

By the way, I just thought up something that might be an interesting way to do this in 3D. Way back when we talked about tessellating in 3D, so each cell is some sort of 3D cell that we would then find \(\bar{\lambda}\) for. Something that might make more sense for this sort of spatio-temporal process (and drawing inspiration from the way the ETAS triggering function works), we could tesselate according to the normal rule for space (cells contains all points closer to that observation than any other) but with a one-sided rule for time (cells contain all points in time after that observation until the next observation).

4. A Corrected cdf for PIT values

Instead of using the Gamma cdf to find the PIT values, use the simulated distribution of \(E(N(A))\). We expect that a PIT histogram resulting from generating from ETAS will look fairly uniform.

E_vec <- unlist(read.csv("../simulations/E_vec.csv"))
u <- pETAS(expected2, E_vec)
hist(u, breaks = 4)

plot of chunk expectedsnull2

It looks much more in line with expectations, but there is quite a bit of deviation from the uniform even though this histogram is generated from the null hypothesis. This indicates that the confidence bands will likely be quite wide.

5. Plot observed catalog histogram with 90% confidence bands

# Simulate
npts <- 1e6
it <- 30
umat <- matrix(rep(NA, 5 * it), ncol = it)
for (i in 1:it){
  # generate a pattern
  z <- simetas(theta, x1 = 1, y1 = 1, T = endT, sor=1)
  w <- deldir(z$lon, z$lat, rw = c(0, 1, 0, 1))
polygon_list <- tile.list(w)
ind_boundary <- which(w$summary[, 7] > 0)
polygon_list_inner <- polygon_list[-ind_boundary]
catalog <- matrix(c(z$lon, z$lat, z$t, z$hm), ncol = 4)

  meanlambdas <- unlist(lapply(polygon_list_inner, FUN = integrateLambda, npts, endT, gp,
                             catalog, mu, const1, c0, p, a, d, q0))
areas <- w$summary$dir.area[-ind_boundary]
expected <- meanlambdas * areas
  u <- pETAS(expected, E_vec)
  umat[,i] <- table(cut(u, breaks = seq(0, 1, .2)))
}

write.csv(umat, "../simulations/vor_umat2.csv", row.names=F)

u <- pETAS(expected, E_vec)
umat <- as.matrix(read.csv("../simulations/umat2.csv"))
qmat <- apply(umat, 1, quantile, probs = c(.05, .95))
ymax <- 1.1 * max(table(cut(u, breaks = seq(0, 1, .2))), max(qmat[2, ]))
hist(u, breaks = seq(0, 1, .2), col = "gray", border = "gray", xlab = "PIT value", ylab = "Frequency",
     ylim = c(0, ymax), main = "ETAS model")
break_x <- seq(0, 1, .2)
for(i in 1:dim(qmat)[2]) {
  lines(x = c(break_x[i], break_x[i + 1]), y = rep(qmat[1, i], 2), lty = 2)
  lines(x = c(break_x[i], break_x[i + 1]), y = rep(qmat[2, i], 2), lty = 2)
}

plot of chunk ci_bands

Does this seems right to you? To break down what’s going on here, we can look at the distribution of \(\hat{E}(N(A))\) against the distribution of expected values that we see in simulation.

par(mfrow = c(2, 1))
hist(expected, main = "Observed expected counts")
hist(E_vec, main = "Reference distribution")

plot of chunk compare

Note in observed expected counts that the sum across all 522 interior cells is 33.8391. If it was a well-fit model, at least in terms of the overall intensity, these two numbers should be about the same, no?