4/26/2022

Introduction to spatial epidemiology

  • Grid-based approaches (Baker, 2014)
  • Spatial logistic regression models estimate the odds of disease associated with distance to an exposure site
  • Grid-based estimates work at a low-level (local) scale, whereas parametric models work at a high-level (global) scale

Background

  • Distance to an exposure site can determine the level of exposure: high, moderate, low, or unexposed (Baker, 2014)
  • Case-control comparisons estimate the odds of disease associated with exposure
  • Estimating disease odds is a challenge for large data sets with numerous observations and multiple points of exposure
  • A local neighborhood approach based on the Floating Catchment Area (FCA), which estimates supply versus demand, could be used to compare cases and controls
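
As a minimal sketch of the distance-based exposure idea above, the following assigns each point an exposure level from its distance to the nearest exposure site. The cutoffs, the level names as a tuple, and the helper names `nearest_distance` and `exposure_level` are illustrative assumptions, not values or functions taken from Baker (2014).

```python
from bisect import bisect_right
from math import hypot

# Hypothetical distance cutoffs (same units as the coordinates); not from the source.
CUTOFFS = (1.0, 3.0, 6.0)
LEVELS = ("high", "moderate", "low", "unexposed")


def nearest_distance(point, sites):
    """Euclidean distance from a point to its nearest exposure site."""
    x, y = point
    return min(hypot(x - sx, y - sy) for sx, sy in sites)


def exposure_level(point, sites, cutoffs=CUTOFFS):
    """Map the nearest-site distance to an exposure category."""
    d = nearest_distance(point, sites)
    # bisect_right counts how many cutoffs are <= d, which indexes the level.
    return LEVELS[bisect_right(cutoffs, d)]


# Made-up coordinates for illustration only.
sites = [(0.0, 0.0), (10.0, 0.0)]
for p in [(0.5, 0.0), (2.0, 0.0), (5.0, 0.0), (5.0, 20.0)]:
    print(p, exposure_level(p, sites))
```

With multiple exposure sites, using the nearest site is only one possible choice; a sum or kernel-weighted combination of distances would be another.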

Objectives

  • Determine the concordance of non-parametric and parametric models in a simple setting
  • Determine the concordance of these models in a more complex setting
  • We expect less concordance for highly complex data, and suggest a local neighborhood methodology
  • Estimate localized neighborhood differences between cases and controls

Data resources

  • The simple (single exposure site) data are the Chorley-Ribble cancer data from Lancashire, England, 1974-1983 (Diggle, 1990).
  • The complex data set was created in R using Poisson point process estimates from ecological models (Gelfand, 2010; Baddeley, 2000) of Swedish saplings across 44 regions in Sweden (coordinates in decimetres).

Simplistic relative risk of distance based exposure (Diggle, 1990)

  • OR: 0.987 (0.923, 1.06; p = 0.36) using spatial logistic regression
  • Chorley-Ribble cancer data in Lancashire, England between 1974-1983.
  • 58 cases of larynx cancer, with 978 cases of lung cancer serving as controls.
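
To illustrate how an odds ratio per unit of distance arises, here is a rough sketch of an ordinary case-control logistic regression with distance as the only covariate, fitted by Newton-Raphson. This is a simplification and not the spatial logistic regression used for the estimate above; the function name `fit_logistic` and the toy data are assumptions for illustration.

```python
from math import exp


def fit_logistic(x, y, iters=25):
    """Newton-Raphson fit of logit P(case) = b0 + b1 * x for one covariate."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # score for the intercept
            g1 += (yi - p) * xi     # score for the slope
            h00 += w                # entries of the information matrix X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        # Solve the 2x2 Newton system explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Made-up distances and case (1) / control (0) labels, for illustration only.
distance = [1, 2, 3, 4, 5, 6, 7, 8]
is_case = [1, 1, 0, 1, 0, 1, 0, 0]

b0, b1 = fit_logistic(distance, is_case)
print("odds ratio per unit distance:", exp(b1))
```

The reported quantity is `exp(b1)`: a value below 1 means the odds of being a case fall as distance from the site grows, and a value near 1 means little change with distance.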

Contingency table odds ratio of singular exposure (Diggle, 1990)

                      estimate  lower  upper
exposed.vs.unexposed     0.908  0.549  1.537
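
For reference, a contingency-table odds ratio of this kind can be sketched as below. The Wald interval is one common choice and is an assumption here: the source does not say which interval method produced the table above, and the counts in the example are made up rather than taken from the Chorley-Ribble data.

```python
from math import exp, log, sqrt


def odds_ratio(a, b, c, d, z=1.96):
    """Odds ratio and Wald interval for a 2x2 table.

    a = exposed cases,    b = unexposed cases,
    c = exposed controls, d = unexposed controls.
    """
    or_ = (a * d) / (b * c)
    se_log = sqrt(1 / a + 1 / b + 1 / c + 1 / d)  # standard error of log(OR)
    lower = exp(log(or_) - z * se_log)
    upper = exp(log(or_) + z * se_log)
    return or_, lower, upper


# Hypothetical counts for illustration only.
or_, lower, upper = odds_ratio(a=20, b=10, c=10, d=20)
print(round(or_, 3), round(lower, 3), round(upper, 3))
```

An interval that contains 1, as in the table above, is consistent with no difference in odds between exposed and unexposed groups.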

Simplistic model concordance

  • For simple data sets, the odds estimated by the logistic model and by the grid-based method are concordant (Baker, 2014).

Challenging ecology application with multiple points of exposure

  • Gelfand (2010) and Baddeley (2000) discuss ecological models of Swedish saplings across 44 regions in Sweden (coordinates in decimetres)
  • Hypothesis: seeding trees cluster with other seeding trees.

Example region

Ecological region summary

  • There are 4 land types, each containing large counts of tree coordinates (points).
                            control  mixed  dense  rich
longleaf (unflowered)          2106   2321   2921    99
red pine (unflowered)            NA   1462    338   222
mountain pine (flowered)        145    140    141     7
mountain pine (unflowered)       34     53     68    10
longleaf (flowered)              NA    245    597   417
pine (flowered)                  NA    113     NA    62
pine (unflowered)                NA     11     33     6
red pine (flowered)              NA     39     21   134

Seeding pine trees spatial association

  • Seeded pine trees were 18.9 (13.7, 26.8) times as likely to be in close proximity to seeded mountain pines compared to unseeded (p<0.001)
               estimate   lower   upper
unexposed         1.000      NA      NA
low exposure      4.013   2.893   5.695
mod. exposure     7.326   5.344  10.292
high exposure    18.938  13.691  26.838

Seeded red pine trees spatial association

  • Seeded red pines are 3.5 (2.46, 5.04) times as likely to be in close proximity to seeding mountain pines compared to unseeded (p<0.01).
               estimate  lower  upper
unexposed         1.000     NA     NA
low exposure      3.479  2.480  4.959
mod. exposure     3.641  2.616  5.156
high exposure     3.496  2.462  5.042

Seeded longleaf pines spatial association

  • Seeded longleaf pines are 5.8 (5.04, 6.65) times as likely to be in close proximity to mountain pines that have seeded compared to unseeded (p<0.001).
               estimate  lower  upper
unexposed         1.000     NA     NA
low exposure      3.952  3.445  4.551
mod. exposure     4.546  3.972  5.221
high exposure     5.776  5.039  6.646

Global point intensity estimation

  • Performs spatial logistic regression across all 44 regions and pools the estimates into a robust summary statistic.
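
The source does not state how the regional estimates were pooled. One common choice, shown here purely as an assumed stand-in, is fixed-effect inverse-variance weighting of the per-region slopes. The function name `pool_fixed_effect` and the example numbers are hypothetical.

```python
from math import sqrt


def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted mean of per-region estimates and its standard error."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total
    pooled_se = sqrt(1.0 / total)
    return pooled, pooled_se


# Hypothetical per-region slopes and standard errors, for illustration only.
slopes = [-0.002, -0.003, -0.001]
ses = [0.001, 0.002, 0.001]

pooled, pooled_se = pool_fixed_effect(slopes, ses)
print(pooled, pooled_se, pooled / pooled_se)  # estimate, SE, z statistic
```

Regions with smaller standard errors receive more weight, so a few precisely estimated regions can dominate the pooled slope.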

Pooled pine model

Across all 44 regions, for a 1 unit increase in distance to the flowered pine tree, the log-intensity of flowered mountain pine decreases by 0.0021 (SE=3.5e-04, p<0.001).


Pooled red pine model

Across all 44 regions, for a 1 unit increase in distance to seeding red pine trees, the log intensity of seeding mountain pines increases by 1.4e-03 (p<0.001). Because intensity rises with distance rather than falling, there is no evidence that seeding mountain pines cluster near seeding red pines.


Pooled longleaf estimate

Across all regions, for a 1 unit increase in distance to seeding longleaf pine trees, the log intensity of seeding mountain pines decreases by 2.7e-03 (SE=4.1e-04, p<0.001); hence there is an association between seeding mountain pines and the nearest distance to seeding longleaf pines.
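
Read as a log-linear intensity model (an assumed form; the slides report only the fitted slopes), the statement above corresponds to

\[
\log \lambda(u) = \beta_0 + \beta_1\, d(u), \qquad \frac{\lambda(u \mid d + 1)}{\lambda(u \mid d)} = e^{\beta_1},
\]

where \(\lambda(u)\) is the intensity of seeding mountain pines at location \(u\) and \(d(u)\) is the distance from \(u\) to the nearest seeding longleaf pine. With \(\beta_1 = -2.7 \times 10^{-3}\), each additional unit of distance multiplies the intensity by \(e^{-0.0027} \approx 0.9973\), and 100 units of distance multiply it by \(e^{-0.27} \approx 0.76\).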


Global estimated odds of seeding saplings associated with ‘exposure’

  • For continuous distance-based measures, we do not see a trend of increased odds of exposure when comparing seeding saplings to unseeded saplings at the global level

Local neighborhood estimation

  • Estimates differences between the neighborhoods of cases and controls

Methodology for local estimates for neighborhood analysis comparing mountain pines

  • Draw a disc with a small radius (45 decimetres), representing a catchment area.
  • Count all tree types (12 possible types in total) within this radius.
  • Fit a multinomial model for hypothesis testing.
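
The counting step above can be sketched as follows. The function name `neighborhood_counts`, the rule that the focal tree is skipped by matching coordinates, and the example coordinates are all assumptions for illustration; only the 45 decimetre radius comes from the slides.

```python
from collections import Counter
from math import hypot


def neighborhood_counts(focal, trees, radius=45.0):
    """Count tree types inside a disc of `radius` centred on the focal tree.

    `trees` is an iterable of (x, y, label). A tree at the focal coordinates
    is treated as the focal tree itself and is not counted (an assumption).
    """
    fx, fy = focal
    counts = Counter()
    for x, y, label in trees:
        if (x, y) == (fx, fy):
            continue
        if hypot(x - fx, y - fy) <= radius:
            counts[label] += 1
    return counts


# Made-up coordinates in decimetres, for illustration only.
trees = [
    (0.0, 0.0, "mountain pine (flowered)"),
    (10.0, 10.0, "red pine (flowered)"),
    (30.0, 30.0, "longleaf (unflowered)"),
    (40.0, 40.0, "longleaf (unflowered)"),
    (5.0, -20.0, "red pine (flowered)"),
]
print(neighborhood_counts((0.0, 0.0), trees))
```

Repeating this for every mountain pine and summing the counts separately for seeded and unseeded focal trees gives the two count vectors that a multinomial comparison would use.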

Multi-type catchment area estimation of local neighborhood

  • Similar to a Floating Catchment Area (supply vs. demand)
  • H0: Neighborhood proportions\(_{\text{seeded MP}}\) = Neighborhood proportions\(_{\text{unseeded MP}}\)
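
The slides do not specify the multinomial model used to test this hypothesis. As a simple assumed stand-in, a Pearson chi-square test of homogeneity compares the pooled neighbor counts of the two groups. The function name `chi_square_homogeneity` and the counts below are hypothetical.

```python
def chi_square_homogeneity(row_a, row_b):
    """Pearson chi-square statistic and degrees of freedom for a 2 x K count table."""
    col_totals = [a + b for a, b in zip(row_a, row_b)]
    n_a, n_b = sum(row_a), sum(row_b)
    n = n_a + n_b
    stat = 0.0
    for a, b, c in zip(row_a, row_b, col_totals):
        if c == 0:
            continue  # category absent from both groups contributes nothing
        e_a = n_a * c / n  # expected count under H0 for group A
        e_b = n_b * c / n  # expected count under H0 for group B
        stat += (a - e_a) ** 2 / e_a + (b - e_b) ** 2 / e_b
    df = sum(1 for c in col_totals if c > 0) - 1
    return stat, df


# Hypothetical neighbor counts for three tree types, for illustration only.
seeded_mp = [30, 10, 60]
unseeded_mp = [20, 20, 60]

stat, df = chi_square_homogeneity(seeded_mp, unseeded_mp)
print(stat, df)
```

The statistic is compared with a chi-square reference distribution on `df` degrees of freedom. Because neighbors from overlapping discs are not independent, the resulting p-value should be treated as approximate.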

Mountain pine neighborhood seeding

Conclusion

  • For high-dimensional data, non-parametric estimates of the odds of disease are not concordant with parametric models (scaling!).
  • Neighborhood estimates for each tree label were made simultaneously using a multinomial model.
  • The neighborhood proportions of seeded red pines and pines were significantly higher around seeded mountain pines than around unseeded ones \((p<0.05)\).
  • The neighborhood proportions of unseeded trees did not differ among mountain pines.

References

  • Baddeley, A. and Turner, R. (2000) Practical maximum pseudolikelihood for spatial point patterns. Australian and New Zealand Journal of Statistics 42, 283–322.
  • Baker, D. and Valleron, A.-J. (2014) An open source software for fast grid-based data-mining in spatial epidemiology (FGBASE). International Journal of Health Geographics 13, 46.
  • Gelfand, A.E., Diggle, P.J., Fuentes, M. and Guttorp, P. (2010) Handbook of Spatial Statistics. Handbooks of Modern Statistical Methods. Chapman & Hall/CRC Press.
  • Kelly-Schwartz, A., Stockard, J., Doyle, S. and Schlossberg, M. (2004) Is Sprawl Unhealthy? A Multilevel Analysis of the Relationship of Metropolitan Sprawl to the Health of Individuals. Journal of Planning Education and Research 24, 184–196.
  • Diggle, P.J. (1990) A point process modelling approach to raised incidence of a rare phenomenon in the vicinity of a prespecified point. Journal of the Royal Statistical Society, Series A 153, 349–362.
  • Ripley, B.D. (1981) Spatial statistics. John Wiley and Sons.
  • Strand, L. (1972). A model for stand growth. IUFRO Third Conference Advisory Group of Forest Statisticians, INRA, Institut National de la Recherche Agronomique, Paris. Pages 207–216.