Geostatistical Model-Based Adaptive Sampling

Rationale

This project aims to extend the framework of adaptive geostatistical designs (Chipeta et al. 2015 and subsequent work) to include targeting (in the spirit of Andreis and Bonetti, 2018).

Targeting makes it possible to address objectives such as:

oversampling units with prescribed characteristics (more expected cases, higher estimated uncertainty, combinations thereof, …)
implementing constraints on unit selection (spatial, temporal or monetary constraints).

We assess the methodology via simulation on a number of scenarios based on estimated real malaria profiles for Ghana, Malawi and Cameroon.

Methods

For a fixed sample size and number of steps, sampling is carried out sequentially: each step after the first is informed by what was observed in the previous ones. Steps 3 and 4 below encode the extension of the method to include targeting.

1. select an initial sample using a non-adaptive design (e.g., simple random sampling, stratified, cluster, inhibitory, …)
2. fit a suitable geostatistical model (with covariates) and obtain prediction surfaces (prevalence, predictive variance, exceedance probabilities)
3. obtain weights for the out-of-sample units for the next step by feeding the prediction surfaces through a series of functions that can boost or deflate them
   1. depending on prescribed (or adaptive) targeting objectives
   2. subject to constraints (monetary, time, …)
4. select the next sample using a probability-proportional-to-size design (e.g., Pareto sampling)
5. do NOT rinse: aggregate the samples collected so far, and GOTO 2 until the last step is reached.
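Steps 3 and 4 can be illustrated with a minimal sketch, which is not the project's implementation. It assumes that the weights from step 3 are turned into target inclusion probabilities summing to the per-step sample size (capping any value above 1), and that the next sample is drawn with Pareto πps sampling (Rosén's ranking variable). The function names `inclusion_probabilities` and `pareto_sample`, and the random gamma weights standing in for steps 2 and 3, are hypothetical.

```python
import numpy as np


def inclusion_probabilities(weights, n):
    """Convert non-negative weights into target inclusion probabilities that
    sum to n, capping values above 1 at exactly 1 (certainty units)."""
    w = np.asarray(weights, dtype=float)
    if np.count_nonzero(w > 0) < n:
        raise ValueError("need at least n units with positive weight")
    lam = np.zeros_like(w)
    certain = np.zeros(w.shape, dtype=bool)
    while True:
        lam[certain] = 1.0
        free = ~certain
        n_free = n - certain.sum()
        if n_free == 0:
            lam[free] = 0.0
            return lam
        lam[free] = n_free * w[free] / w[free].sum()
        over = free & (lam >= 1.0)
        if not over.any():
            return lam
        certain |= over  # cap these units and redistribute the remainder


def pareto_sample(lam, n, rng):
    """Pareto pi-ps sampling: keep the n units with the smallest ranking
    variable Q_i = [U_i / (1 - U_i)] / [lam_i / (1 - lam_i)].
    Realised inclusion probabilities only approximate lam."""
    lam = np.asarray(lam, dtype=float)
    u = rng.uniform(size=lam.shape)      # U_i in [0, 1)
    q = np.full(lam.shape, np.inf)       # lam_i == 0: never selected
    certain = lam >= 1.0
    q[certain] = -np.inf                 # lam_i == 1: always selected
    mid = (lam > 0) & ~certain
    q[mid] = (u[mid] / (1 - u[mid])) / (lam[mid] / (1 - lam[mid]))
    return np.sort(np.argsort(q)[:n])


# Toy sequential run: 200 candidate locations, 20 units per step, 4 steps.
rng = np.random.default_rng(1)
N, n_step = 200, 20
sampled = rng.choice(N, size=n_step, replace=False)  # step 1: SRS
for step in range(2, 5):
    # Hypothetical stand-in for steps 2-3 (model fit and targeting weights).
    weights = rng.gamma(2.0, size=N)
    weights[sampled] = 0.0               # weights only for out-of-sample units
    lam = inclusion_probabilities(weights, n_step)
    new = pareto_sample(lam, n_step, rng)  # step 4
    sampled = np.union1d(sampled, new)     # step 5: aggregate, do not rinse
```

Pareto sampling is used here because the text names it as an example; any fixed-size πps design could be substituted in `pareto_sample`.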

Simulation

Competing designs

Four designs are compared:

simple random sampling [non-adaptive]
stratified sampling [non-adaptive, strata based on supposed prevalence levels]
model-based geostatistical design [adaptive, Chipeta/Kabaghe/…]
targeted adaptive design [adaptive, Chipeta & Andreis].

Performance criteria

We look at the following performance measures:

Table 1. Criteria for comparing sampling strategies. A geostatistical model is fitted to every final sample, and the Monte Carlo distributions of quantities such as bias, predictive variance, travel length and inclusion probabilities are considered.
criterion	type	short description	measure
logistics	practical	cost implied by the sampling strategy	travel length, duration and cost, cost/case or cost/hotspot found
targeting	practical/statistical	assess the imbalance of the sample	empirical inclusion probabilities
accuracy	statistical	recovery of true prevalence map	model-based, bias (absolute, relative, mean squared…), exceedance probabilities
uncertainty	statistical	uncertainty in prediction	model-based, estimated predictive variance
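The accuracy criterion in Table 1 can be made concrete with a short sketch. It assumes that the prevalence predicted from each run's final sample is stored as one row of an array over a common set of prediction locations, and computes the location-wise Monte Carlo bias \(p-\widehat{p}\) (the convention used for the bias maps), absolute and relative summaries, and the mean squared error. The function name and output keys are illustrative, and the relative bias assumes every true prevalence is positive.

```python
import numpy as np


def accuracy_summary(p_true, p_hat_runs):
    """p_true: (N,) true prevalence at N prediction locations.
    p_hat_runs: (R, N) predicted prevalence, one row per Monte Carlo run."""
    p_true = np.asarray(p_true, dtype=float)
    p_hat = np.asarray(p_hat_runs, dtype=float)
    err = p_true[None, :] - p_hat          # bias convention: p - p_hat
    bias_map = err.mean(axis=0)            # Monte Carlo bias per location
    mse_map = (err ** 2).mean(axis=0)      # Monte Carlo MSE per location
    return {
        "bias_map": bias_map,
        "mean_abs_bias": float(np.abs(bias_map).mean()),
        "mean_rel_bias": float((bias_map / p_true).mean()),  # needs p_true > 0
        "mse_map": mse_map,
        "rmse": float(np.sqrt(mse_map.mean())),
    }


res = accuracy_summary([0.2, 0.4], [[0.1, 0.5], [0.3, 0.5]])
```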

Scenarios

For each country, we consider combinations of:

sample size
targeting objectives.

Moreover, the original model-based geostatistical design is considered under both the predictive variance and the exceedance probability criteria.

Simulation results

Here are some preliminary results for the Ghanaian scenarios. For simplicity, travel distances are based on Hamiltonian paths through the sampled locations (for the adaptive designs, the paths of successive steps are chained together).
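Finding the shortest Hamiltonian path is NP-hard, so in practice a heuristic is needed. The sketch below is one possible approach, not necessarily the one used for these results: a nearest-neighbour path improved by 2-opt segment reversals, with Euclidean distances on projected coordinates (an assumption; real travel would follow roads). Chaining is read here as starting each step's path at the last location visited in the previous step. All function names are hypothetical.

```python
import numpy as np


def path_length(points, order):
    """Total Euclidean length of an open path visiting points in order."""
    p = points[order]
    return float(np.linalg.norm(np.diff(p, axis=0), axis=1).sum())


def nearest_neighbour_path(points, start):
    """Greedy open path: always move to the closest unvisited point."""
    unvisited = set(range(len(points))) - {start}
    order = [start]
    while unvisited:
        last = points[order[-1]]
        nxt = min(unvisited, key=lambda j: np.linalg.norm(points[j] - last))
        order.append(nxt)
        unvisited.remove(nxt)
    return order


def two_opt(points, order, fixed_start=True):
    """Reverse path segments while this shortens the open path."""
    order = list(order)
    best = path_length(points, order)
    first = 1 if fixed_start else 0     # keep the start point if required
    improved = True
    while improved:
        improved = False
        for i in range(first, len(order) - 1):
            for j in range(i + 1, len(order)):
                cand = order[:i] + order[i:j + 1][::-1] + order[j + 1:]
                length = path_length(points, cand)
                if length < best - 1e-12:
                    order, best, improved = cand, length, True
    return order, best


def chained_travel_length(points, steps):
    """steps: list of index lists, one per sampling step (a single entry
    for non-adaptive designs). Each step starts where the last one ended."""
    total, current = 0.0, None
    for idx in steps:
        sub = points[list(idx)]
        if current is not None:
            sub = np.vstack([current, sub])  # prepend previous end point
        order = nearest_neighbour_path(sub, start=0)
        order, length = two_opt(sub, order, fixed_start=current is not None)
        total += length
        current = sub[order[-1]]
    return total
```

The 2-opt loop recomputes full path lengths for clarity, which is only practical for small samples.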

Example path

Path length

A first comparison looks at the Monte Carlo distributions of the travel length under the competing designs. This gives a general sense of the costs implied by each strategy.

srs: simple random sampling; strat: stratified sampling; agd_pv: adaptive geostatistical design, predictive variance criterion; agd_ep: adaptive geostatistical design, exceedance probability criterion; tagd: targeted adaptive geostatistical design.

Targeting/sample imbalance

To assess whether targeting is achieved, and what it entails in terms of sample imbalance, it is useful to look at maps of the empirical inclusion probabilities. In a real setting, these would need to be estimated (an open problem) and can serve as weights to produce the final estimates. For visualisation purposes, all maps are smoothed from the estimates at the (finite) set of potential sampling locations.
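In the simulation, empirical inclusion probabilities can be obtained as the fraction of Monte Carlo runs in which each candidate location ends up in the final sample. The sketch below shows that computation and, as one possible way of using them as weights, an inverse-probability (Hájek) weighted mean. This weighting is an illustrative assumption, not the method adopted here; how best to weight adaptive samples is still listed among the open questions. Locations never selected (probability 0) cannot be weighted this way.

```python
import numpy as np


def empirical_inclusion_probabilities(sample_runs, N):
    """sample_runs: iterable of index arrays, the final sample of each run.
    Returns, for each of the N candidate locations, the share of runs
    in which it was selected."""
    runs = list(sample_runs)
    counts = np.zeros(N)
    for s in runs:
        counts[np.unique(s)] += 1
    return counts / len(runs)


def hajek_mean(y, pi):
    """Inverse-probability weighted (Hajek) mean of observed values y,
    with pi the inclusion probabilities of the same units (all > 0)."""
    w = 1.0 / np.asarray(pi, dtype=float)
    return float(np.sum(w * np.asarray(y, dtype=float)) / np.sum(w))


pi_hat = empirical_inclusion_probabilities([[0, 1], [1, 2], [1, 3], [0, 1]], N=5)
est = hajek_mean([1.0, 0.0], pi_hat[[0, 2]])
```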

tagd is set to target locations with an estimated prevalence greater than the estimated third quartile of the prevalence distribution. Much more complex targeting objectives can be implemented.
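A targeting function of this kind can be sketched as a multiplicative boost. The sketch assumes that the third quartile is computed over the estimated prevalence at the candidate locations, that units above it have their weight multiplied by a hypothetical `boost` factor, and, to illustrate a constraint, that units farther than a hypothetical `max_dist` get weight zero. The base weights, the boost value and the distance rule are all assumptions, not the settings used for tagd.

```python
import numpy as np


def targeting_weights(p_hat, base=None, boost=5.0, dist=None, max_dist=None):
    """Hypothetical tagd-style weights for out-of-sample candidate units.
    p_hat: estimated prevalence; base: optional starting weights."""
    p_hat = np.asarray(p_hat, dtype=float)
    w = np.ones_like(p_hat) if base is None else np.asarray(base, float).copy()
    q3 = np.quantile(p_hat, 0.75)        # estimated third quartile
    w[p_hat > q3] *= boost               # oversample likely high-prevalence units
    if max_dist is not None:
        w[np.asarray(dist, dtype=float) > max_dist] = 0.0  # hard constraint
    return w


w1 = targeting_weights([0.1, 0.2, 0.3, 0.4, 0.5])
w2 = targeting_weights([0.1, 0.2, 0.3, 0.4, 0.5],
                       dist=[0, 10, 0, 0, 0], max_dist=5)
```

Weights like these could then be converted into inclusion probabilities and passed to a probability-proportional-to-size design.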

Accuracy

The maps below show the Monte Carlo average prevalence and bias estimated via geostatistical modelling, based on the final samples for each design. The number of simulation runs behind these outputs is limited, so they should not be regarded as final results.

Prevalence

Exceedance probability

The threshold for the exceedance is set to the true average prevalence.
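With the threshold set to the true average prevalence, the exceedance probability at each location can be estimated as the share of predictive draws above it, assuming draws from the fitted model's predictive distribution are available (e.g. by simulation). The Brier score against the true exceedance indicator is added only as an illustrative way of scoring these maps; it is an assumption, not one of the criteria reported here.

```python
import numpy as np


def exceedance_summary(pred_draws, p_true):
    """pred_draws: (S, N) predictive draws of prevalence at N locations.
    p_true: (N,) true prevalence; the threshold is its average."""
    p_true = np.asarray(p_true, dtype=float)
    threshold = p_true.mean()
    ep = (np.asarray(pred_draws, dtype=float) > threshold).mean(axis=0)
    truth = (p_true > threshold).astype(float)  # true exceedance indicator
    brier = float(np.mean((ep - truth) ** 2))
    return ep, brier


draws = [[0.1, 0.5], [0.3, 0.6], [0.2, 0.1], [0.4, 0.7]]
ep, brier = exceedance_summary(draws, [0.2, 0.3])
```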

Bias \((p-\widehat{p})\)

Uncertainty

Open questions/future developments

Many, including:

Exploring adjustment mechanisms to weight observations coming from an adaptive survey, to account for imbalance (see Andreis et al 2018 for a possible approach)
Implementing more complex targeting functions, to reflect policymakers' needs and budget constraints
Implementing an interface with services such as Google Maps or similar, to leverage actual travel distances and times
Developing an easy-to-use application (possibly a Shiny app) that can be used with little or no supervision for a few selected pre-implemented strategies
Exploring the use of the methods for different diseases and contexts (in particular, for surveying elusive or hard-to-reach populations).