This project aims to extend the framework of adaptive geostatistical designs (Chipeta et al., 2015, and subsequent works) to include targeting (in the spirit of Andreis and Bonetti, 2018).
Targeting makes it possible to address issues such as:
We assess the methodology via simulation on a number of scenarios based on estimated real malaria profiles for Ghana, Malawi and Cameroon.
For a fixed sample size and number of steps, the sampling effort is carried out sequentially, with each step after the first informed by what was observed in the previous ones. Steps 3 and 4 below encode the extension of the method to include targeting.
1. select a sample using a non-adaptive design (e.g., srs, stratified, cluster, inhibitory, …)
2. fit a suitable geostatistical model (with covariates) and obtain a prediction surface (prevalence, predictive variance, exceedance probabilities)
3. obtain weights for out-of-sample units for the next step by feeding the prediction surface through a series of functions that can boost/deflate them
   - depending on prescribed (or adaptive) targeting objectives
   - subject to constraints (monetary, time, …)
4. select the next sample using a probability proportional to size design (e.g., Pareto)
5. do NOT rinse, aggregate the samples, and GOTO 2 until the last step is reached.
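The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's implementation: `targeting_weights`, `pareto_sample`, `adaptive_design`, the `boost` factor and the `predict` callable (a stand-in for fitting the geostatistical model and returning prevalence and predictive variance at every location) are all hypothetical names. The weight function simply inflates the predictive variance where the estimated prevalence exceeds its third quartile, and the Pareto step caps the target inclusion probabilities crudely rather than rescaling them.

```python
import numpy as np


def targeting_weights(prevalence, pred_var, threshold, boost=3.0):
    """Hypothetical weight function (step 3): predictive variance,
    multiplied by `boost` where estimated prevalence exceeds `threshold`."""
    prevalence = np.asarray(prevalence, dtype=float)
    pred_var = np.asarray(pred_var, dtype=float)
    return pred_var * np.where(prevalence > threshold, boost, 1.0)


def pareto_sample(weights, n, rng):
    """Pareto sampling (step 4): pick n distinct units with target
    inclusion probabilities roughly proportional to `weights`."""
    weights = np.asarray(weights, dtype=float)
    lam = n * weights / weights.sum()
    # Crude cap (assumption): keeps the odds finite; a careful
    # implementation would fix capped units and rescale the rest.
    lam = np.clip(lam, 1e-12, 1 - 1e-12)
    u = rng.uniform(size=weights.size)
    # Ranking variables: odds of u divided by odds of lam.
    q = (u / (1 - u)) / (lam / (1 - lam))
    return np.argsort(q)[:n]  # the n smallest ranking variables


def adaptive_design(n_units, batch_sizes, predict, rng, boost=3.0):
    """Sequential loop over steps 1 to 5; returns indices of sampled units."""
    # Step 1: initial non-adaptive sample (simple random sampling here).
    sampled = rng.choice(n_units, size=batch_sizes[0], replace=False)
    for n_k in batch_sizes[1:]:
        # Step 2: `predict` stands in for model fitting and prediction.
        prevalence, pred_var = (np.asarray(a, dtype=float) for a in predict(sampled))
        candidates = np.setdiff1d(np.arange(n_units), sampled)
        # Step 3: weights for out-of-sample units, targeting the top quartile.
        threshold = np.quantile(prevalence, 0.75)
        w = targeting_weights(prevalence[candidates], pred_var[candidates],
                              threshold, boost)
        # Step 4: probability proportional to size selection.
        new = candidates[pareto_sample(w, n_k, rng)]
        # Step 5: aggregate the samples and repeat.
        sampled = np.concatenate([sampled, new])
    return sampled
```

Constraints (monetary, time, …) are left out of this sketch; one way to include them would be to fold a cost term into the weights before the selection step.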
Four designs compared:
We look at the following performance measures:
criterion | type | short description | measure |
---|---|---|---|
logistics | practical | cost implied by the sampling strategy | travel length, duration and cost, cost/case or cost/hotspot found |
targeting | practical/statistical | assess the imbalance of the sample | empirical inclusion probabilities |
accuracy | statistical | recovery of true prevalence map | model-based, bias (absolute, relative, mean squared…), exceedance probabilities |
uncertainty | statistical | uncertainty in prediction | model-based, estimated predictive variance |
For each country, we consider combinations of:
Moreover, the original model-based geostatistical design is considered under both the predictive variance and the exceedance probabilities criteria.
Here are some preliminary results for the Ghanaian scenarios. For simplicity, travel distances are based on (chained, for the adaptive designs) Hamiltonian paths through sampled locations.
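As an illustration of how such a travel length could be computed, the sketch below chains one path per sampling step, each starting where the previous one ended. It rests on assumptions not stated above: the path through each batch is built with a greedy nearest-neighbour rule (a common heuristic, not necessarily the construction used for these results), distances are Euclidean, and travel starts from a hypothetical `depot` location.

```python
import numpy as np


def greedy_path_length(points, start):
    """Length of a nearest-neighbour path that starts at `start` and visits
    every point once. Returns (length, last point visited)."""
    remaining = [np.asarray(p, dtype=float) for p in points]
    current = np.asarray(start, dtype=float)
    total = 0.0
    while remaining:
        dists = [np.linalg.norm(p - current) for p in remaining]
        j = int(np.argmin(dists))      # closest unvisited location
        total += dists[j]
        current = remaining.pop(j)
    return total, current


def chained_travel_length(batches, depot):
    """Total length over all sampling steps; each step's path starts
    where the previous step's path ended."""
    total = 0.0
    current = depot
    for batch in batches:
        length, current = greedy_path_length(batch, current)
        total += length
    return total
```

For a non-adaptive design the same function applies with a single batch containing all sampled locations.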
A first comparison looks at the Monte Carlo distributions of the travel length under the competing designs. This is useful to get a general sense of the costs implied by each strategy.
Design labels: srs (simple random sampling); strat (stratified); agd_pv (adaptive geostatistical, predictive variance); agd_ep (adaptive geostatistical, exceedance probability); tagd (targeted adaptive geostatistical).
To assess whether targeting is achieved, and what this entails in terms of sample imbalance, it is useful to look at maps of the empirical inclusion probabilities. In a real setting, these would need to be estimated (an open problem) and could serve as weights to produce the final estimates. For visualisation purposes, all maps are smoothed from the estimates at the (finite) set of potential sampling locations.
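In a simulation, the empirical inclusion probability of a location is simply the share of Monte Carlo runs in which it ends up in the final sample. A minimal sketch, assuming each run is stored as a list of location indices (the function name is hypothetical):

```python
import numpy as np


def empirical_inclusion_probabilities(samples, n_units):
    """Share of Monte Carlo runs in which each unit was sampled.
    `samples` holds one collection of unit indices per run."""
    counts = np.zeros(n_units)
    for s in samples:
        counts[np.unique(s)] += 1      # count each unit once per run
    return counts / len(samples)
```

A design that samples all locations evenly gives a flat map, while targeting shows up as higher values in the targeted areas.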
tagd is set to target locations with an estimated prevalence greater than the estimated third quartile of the prevalence distribution. Much more complex targeting objectives can be implemented.
The maps below show the Monte Carlo average prevalence and bias estimated via geostatistical modelling, based on the final samples for each design. The number of simulation runs for these outputs is limited, so these are not to be seen as final results.
The threshold for the exceedance is set to the true average prevalence.
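For reference, an exceedance probability at a location is the predictive probability that prevalence there is above the threshold. When predictive draws are available, it can be approximated by the share of draws exceeding the threshold. The sketch below assumes a hypothetical array of draws with one row per draw and one column per location; how the draws are produced depends on the fitted model and is not shown.

```python
import numpy as np


def exceedance_probability(draws, threshold):
    """Monte Carlo estimate of P(prevalence > threshold) per location.
    `draws` has shape (n_draws, n_locations)."""
    draws = np.asarray(draws, dtype=float)
    return (draws > threshold).mean(axis=0)
```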
Many, including: