Sensitivity analysis of neighborhood point pattern models in high dimension

4/26/2022

Introduction to spatial epidemiology

Grid-based approaches (Baker, 2014)
Spatial logistic regression models estimate the odds of disease associated with distance to an exposure site
Grid-based estimates work at a low-level scale, whereas parametric models work at a high-level scale

Background

Distances to exposure sites can determine the level of exposure: high/moderate/low/unexposed (Baker, 2014)
Case-control comparisons estimate the odds of disease associated with exposure
Estimating disease odds is a challenge for large data sets with numerous observations and multiple points of exposure
A local neighborhood approach using a Floating Catchment Area (FCA) estimates supply/demand, which could be used to compare cases and controls
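The distance-to-exposure classification above can be sketched as follows. This is a minimal illustration only: the cut-points in `CUTS`, the function name `exposure_level`, and the coordinates are hypothetical, since the slides do not state the thresholds that were used. With several exposure sites, the sketch assumes that the nearest site determines the level.

```python
import math

# Hypothetical cut-points (upper distance limit, label), in coordinate units.
# The slides do not give the thresholds actually used.
CUTS = [(1.0, "high"), (3.0, "moderate"), (6.0, "low")]

def exposure_level(point, sites, cuts=CUTS):
    """Classify `point` by its distance to the nearest exposure site."""
    nearest = min(math.dist(point, s) for s in sites)
    for limit, label in cuts:
        if nearest <= limit:
            return label
    return "unexposed"

# Made-up exposure sites and observation points, for illustration only.
sites = [(0.0, 0.0), (10.0, 10.0)]
points = [(0.5, 0.5), (2.0, 1.0), (4.0, 3.0), (5.0, 5.0)]
levels = [exposure_level(p, sites) for p in points]
print(levels)
```

Cases and controls can then be cross-tabulated against these levels to obtain odds ratios relative to the unexposed group.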

Objectives

Determine the concordance of non-parametric/parametric models in a simplistic setting
Determine the concordance of these models in a more complex setting
Expect less concordance for highly complex data, and suggest a local neighborhood methodology
Estimate localized neighborhood differences between cases/controls

Data resources

The simple (single) exposure site data were obtained from the Chorley-Ribble cancer data in Lancashire, England, between 1974 and 1983 (Diggle, 1990).
A complex data set was created in R using Poisson point process estimates from ecological models (Gelfand, 2010; Baddeley, 2000) of Swedish saplings across 44 regions in Sweden (decimetres).

Simplistic relative risk of distance based exposure (Diggle, 1990)

OR: 0.987 (0.923, 1.06; p=0.36) using spatial logistic regression
Chorley-Ribble cancer data in Lancashire, England, between 1974 and 1983.
58 cases of larynx cancer, with 978 cases of lung cancer serving as the controls.

Contingency table odds ratio of singular exposure (Diggle, 1990)

	estimate	lower	upper
exposed.vs.unexposed	0.908	0.549	1.537
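A contingency table odds ratio of this kind can be sketched as below. The counts are hypothetical (the slides do not give the underlying 2x2 table), and the Wald interval on the log scale is an assumption; the interval reported in the table above may have been computed with a different method.

```python
import math

def odds_ratio_wald(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval for a 2x2 table.

    a: exposed cases      b: exposed controls
    c: unexposed cases    d: unexposed controls
    """
    estimate = (a * d) / (b * c)
    # Standard error of log(OR) under the usual large-sample approximation.
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(estimate) - z * se_log)
    upper = math.exp(math.log(estimate) + z * se_log)
    return estimate, lower, upper

# Hypothetical counts, NOT the Chorley-Ribble data.
estimate, lower, upper = odds_ratio_wald(12, 88, 20, 180)
print(round(estimate, 3), round(lower, 3), round(upper, 3))
```

An interval that covers 1, as in the table above, indicates no detectable difference in odds between exposed and unexposed.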

Simplistic model concordance

For simplistic data sets, the estimated odds are concordant between the logistic model and the grid-based method (Baker, 2014).

Challenging ecology application with multiple points of exposure

Gelfand (2010) and Baddeley (2000) discuss ecological models of Swedish saplings across 44 regions in Sweden (decimetres)
The hypothesis is that seeding trees cluster with other seeding trees.

Example region

Ecological region summary

There are 4 different land types, each containing large counts of tree coordinates (points).

	control	mixed	dense	rich
longleaf (unflowered)	2106	2321	2921	99
red pine (unflowered)	NA	1462	338	222
mountain pine (flowered)	145	140	141	7
mountain pine (unflowered)	34	53	68	10
longleaf (flowered)	NA	245	597	417
pine (flowered)	NA	113	NA	62
pine (unflowered)	NA	11	33	6
red pine (flowered)	NA	39	21	134

Seeding pine trees spatial association

Seeded pine trees were 18.9 (13.7, 26.8) times as likely to be in close proximity to seeded mountain pines compared to unseeded (p<0.001).

	estimate	lower	upper
unexposed	1.000	NA	NA
low exposure	4.013	2.893	5.695
mod. exposure	7.326	5.344	10.292
high exposure	18.938	13.691	26.838

Seeded red pine trees spatial association

Seeded red pines are 3.5 (2.5, 5.04) times as likely to be in close proximity to seeding mountain pines compared to unseeded (p<0.01).

	estimate	lower	upper
unexposed	1.000	NA	NA
low exposure	3.479	2.480	4.959
mod. exposure	3.641	2.616	5.156
high exposure	3.496	2.462	5.042

Seeded longleaf pines spatial association

Seeded longleaf pines are 5.8 (5.04, 6.6) times as likely to be in close proximity to mountain pines that have seeded compared to unseeded (p<0.001).

	estimate	lower	upper
unexposed	1.000	NA	NA
low exposure	3.952	3.445	4.551
mod. exposure	4.546	3.972	5.221
high exposure	5.776	5.039	6.646

Global point intensity estimation

Performs spatial logistic regression across all 44 regions, and pools the estimates into a robust summary statistic.
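The slides do not say how the regional estimates were pooled. One common choice, shown here purely as an assumption, is fixed-effect inverse-variance weighting of the per-region slopes. The function name `pool_fixed_effect` and the three regional values are hypothetical.

```python
import math

def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean of per-region estimates and its SE."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = math.sqrt(1.0 / total)
    return pooled, pooled_se

# Hypothetical per-region slopes and standard errors (three regions only).
region_betas = [-0.0030, -0.0015, -0.0020]
region_ses = [0.0010, 0.0005, 0.0010]

pooled_beta, pooled_se = pool_fixed_effect(region_betas, region_ses)
z_score = pooled_beta / pooled_se  # Wald statistic for the pooled slope
print(pooled_beta, pooled_se, z_score)
```

Regions with smaller standard errors dominate the pooled value. A random-effects or mixed model would be the alternative if slopes are expected to vary between regions.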

Pooled pine model

Across all 44 regions, for a 1 unit increase in distance to the flowered pine tree, the log-intensity of flowered mountain pine decreases by 0.0021 (SE=3.5e-04, p<0.001).
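To read this coefficient on the intensity scale, the slope can be exponentiated. The sketch below assumes a log-linear intensity in distance to the nearest flowered pine, with distance measured in the coordinate units (decimetres); the helper name `intensity_ratio` is hypothetical.

```python
import math

# Pooled slope from the slide: change in log-intensity per unit of distance.
beta = -0.0021

def intensity_ratio(slope, delta):
    """Multiplicative change in intensity when distance grows by `delta` units."""
    return math.exp(slope * delta)

per_unit = intensity_ratio(beta, 1)    # change over one unit of distance
per_100 = intensity_ratio(beta, 100)   # change over 100 units of distance
print(per_unit, per_100)
```

Under these assumptions, moving 100 units further away multiplies the intensity of flowered mountain pine by about 0.81.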

Pooled red pine model

Across all 44 regions, for a 1 unit increase in distance to seeding red pine trees, the log intensity of seeding mountain pines increases by 1.4e-03 (p<0.001). Intensity rises with distance, hence there is no evidence that seeding mountain pines cluster near red pines.

Pooled longleaf estimate

Across all regions, for a 1 unit increase in distance to seeding longleaf pine trees, the log intensity of seeding mountain pines decreases by 2.7e-03 (SE=4.1e-04, p<0.001), hence there is an association between the seeding mountain pines and nearest distance to seeding longleaf pines.

Global estimated odds of seeding saplings associated with ‘exposure’

For (continuous) distance-based measures, we do not see a trend of increased odds of exposure when comparing seeding saplings to un-seeded saplings at the global level.

Local neighborhood estimation

Estimates differences between the neighborhoods of cases and controls

Methodology for local estimates for neighborhood analysis comparing mountain pines

Draw a disc with a small radius (45 decimetres), representing a catchment area.
Count all tree types (a total of 12 possible types) within this radius.
Fit a multinomial model for hypothesis testing.
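The disc-and-count step can be sketched as follows. The tree coordinates and the function name `neighborhood_counts` are hypothetical, and the brute-force distance scan is a simplification; a spatial index would be preferable for regions with thousands of trees.

```python
import math
from collections import Counter

def neighborhood_counts(center, trees, radius=45.0):
    """Count trees by label inside a disc of `radius` around `center`.

    `trees` is a list of (x, y, label) tuples. The focal tree itself is
    assumed to have been left out of `trees` by the caller.
    """
    counts = Counter()
    for x, y, label in trees:
        if math.dist(center, (x, y)) <= radius:
            counts[label] += 1
    return counts

# Made-up coordinates in decimetres, for illustration only.
trees = [
    (10.0, 10.0, "longleaf (flowered)"),
    (30.0, 20.0, "red pine (flowered)"),
    (40.0, 40.0, "longleaf (flowered)"),
    (200.0, 150.0, "pine (unflowered)"),
]
focal_mountain_pine = (20.0, 20.0)
counts = neighborhood_counts(focal_mountain_pine, trees)
print(dict(counts))
```

Repeating this for every seeded and unseeded mountain pine gives the per-type count vectors that the multinomial comparison works on.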

Multi-type catchment area estimation of local neighborhood

Similar to a Floating Catchment Area (supply vs. demand)
H0: Neighborhood proportions\(_{\text{seeded MP}}\) = Neighborhood proportions\(_{\text{unseeded MP}}\)
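The slides do not specify how the multinomial model was fitted. As a simple stand-in for it, the null hypothesis of equal neighborhood proportions can be checked with a Pearson chi-square test of homogeneity on the pooled neighborhood counts. The count vectors below are hypothetical and the function name `chi_square_homogeneity` is illustrative.

```python
def chi_square_homogeneity(counts_a, counts_b):
    """Pearson chi-square statistic and degrees of freedom for two count
    vectors defined over the same ordered set of categories."""
    total_a, total_b = sum(counts_a), sum(counts_b)
    grand = total_a + total_b
    stat = 0.0
    used = 0
    for a, b in zip(counts_a, counts_b):
        column = a + b
        if column == 0:
            continue  # category absent from both groups, carries no information
        used += 1
        expected_a = total_a * column / grand
        expected_b = total_b * column / grand
        stat += (a - expected_a) ** 2 / expected_a
        stat += (b - expected_b) ** 2 / expected_b
    return stat, used - 1

# Hypothetical neighborhood counts for three tree types.
seeded_mp = [30, 20, 50]
unseeded_mp = [20, 30, 50]
stat, df = chi_square_homogeneity(seeded_mp, unseeded_mp)
print(stat, df)
```

The statistic is compared with a chi-square distribution on `df` degrees of freedom; for 2 degrees of freedom the 5% critical value is 5.991. This test treats the counts as independent, which overlapping catchment discs would violate, so it is a rough screen rather than a replacement for the model.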

Mountain pine neighborhood seeding

Conclusion

For high dimensional data, non-parametric estimated odds of disease are not concordant with parametric models (scaling!).
Neighborhood estimates for each tree label were obtained simultaneously using a multinomial model.
The neighborhood proportions for seeded red pines and pines were significantly higher in seeded mountain pine compared to un-seeded \((p<0.05)\)
The neighborhood proportions for un-seeded trees were not different among mountain pines.

References

Baddeley, A. and Turner, R. Practical maximum pseudolikelihood for spatial point patterns. Australian and New Zealand Journal of Statistics. 42:283-322. 2000.
Baker, David and Valleron, Alain-Jacques. An open source software for fast grid-based data-mining in spatial epidemiology (FGBASE). International Journal of Health Geographics. 13:46. 2014.
Diggle, P.J. A point process modelling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point. Journal of the Royal Statistical Society, Series A. 153:349-362. 1990.
Gelfand, Alan E., Diggle, Peter J., Fuentes, Montserrat, Guttorp, Peter. Handbook of Spatial Statistics. Handbooks of Modern Statistical Methods. Chapman & Hall/CRC Press. 2010.
Kelly-Schwartz, Alexia, Stockard, Jean, Doyle, Scott, Schlossberg, Marc. Is Sprawl Unhealthy? A Multilevel Analysis of the Relationship of Metropolitan Sprawl to the Health of Individuals. Journal of Planning Education and Research. 24:184-196. 2004.
Ripley, B.D. Spatial Statistics. John Wiley and Sons. 1981.
Strand, L. A model for stand growth. IUFRO Third Conference Advisory Group of Forest Statisticians, INRA, Institut National de la Recherche Agronomique, Paris. Pages 207-216. 1972.