University of Wollongong, 19/01/2015

Talk outline

What I'll be talking about

  • Introduction
  • The need for performance testing
  • Performance testing in the literature
  • Testing the IPF algorithm
  • Broadening the tests
  • Wider context/implications

Introduction

What is spatial microsimulation?

"An approach to the analysis of individual-level phenomena over geographical space that involves the creation, analysis and modelling of spatial microdata" (Lovelace and Dumont 2015).

Why spatial microsimulation?

1: To tackle the modifiable areal unit problem

Why spatial microsimulation?

2: To estimate missing data

Population synthesis can be used as a method of small area estimation. E.g.:

  • Local variations in average income
  • The rate of smoking across a city (Tomintz 2008)
  • Geographical variability in transport use

Spatial microdata can also shed light on intra-zone variability at the individual level.

Why spatial microsimulation?

3: As a foundation for modelling

Many real-world processes operate at the individual or household level.

Geographical factors almost always play a role.

"Everything is related to everything else, but near things are more related than distant things." (Tobler's 'first law of geography', 1970)

Spatial microdata provide a foundation for:

  • Spatially explicit and realistic agent-based models (ABM)
  • 'What if' scenarios of change to external influences
  • Full dynamic spatial microsimulation models

What spatial microsimulation is not

It's important to define terms early on.

Spatial microsimulation in geographical analysis is not:

  • Agent-based modelling (ABM) (but can be an input for ABM)
  • A replacement for real data

Sometimes, 'spatial microsimulation' research is not even:

  • Spatial: space is often just a mutually exclusive categorical variable!
  • Simulation: it's often just reweighting in the first instance

Many ways to generate spatial microdata

Opportunities to broaden tests

Main methods of population synthesis

Reweighting algorithms, also labelled as synthetic reconstruction (Barthelemy and Toint, 2012):

  • Iterative proportional fitting (IPF) (Deming and Stephan 1940; Beckman et al. 1996)
  • Generalised Regression Reweighting (GREGWT)
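The reweighting idea behind IPF can be sketched in base R for a single zone. This is a minimal illustration, not the code used in the tests: the toy microdata (ind), the constraint totals (cons) and the function name ipf_zone() are all hypothetical. Each pass rescales the weights of everyone in a category so that their weighted total matches the known count for that category, one constraint at a time.

```r
# Minimal, hypothetical sketch of individual-level IPF for one zone (base R).
# Toy microdata: one row per survey individual
ind <- data.frame(
  age = c("a0_49", "a50+", "a50+", "a0_49", "a0_49"),
  sex = c("m", "m", "f", "f", "f")
)
# Known zone totals for each constraint variable (both sum to 12)
cons <- list(
  age = c("a0_49" = 8, "a50+" = 4),
  sex = c("f" = 6, "m" = 6)
)

ipf_zone <- function(ind, cons, n_iter = 10) {
  w <- rep(1, nrow(ind))                 # initial weights: 1 per individual
  for (iter in seq_len(n_iter)) {
    for (v in names(cons)) {             # fit one constraint at a time
      for (lev in names(cons[[v]])) {
        sel <- ind[[v]] == lev           # individuals in this category
        w[sel] <- w[sel] * cons[[v]][[lev]] / sum(w[sel])
      }
    }
  }
  w
}

w <- ipf_zone(ind, cons)
tapply(w, ind$age, sum)  # should match cons$age
tapply(w, ind$sex, sum)  # should match cons$sex
```

Because every age-sex combination appears at least once in ind, the weights converge to fit both constraints; a combination missing from the microdata (an 'empty cell') would prevent an exact fit.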

Combinatorial optimisation algorithms

  • 'Hill climbing' algorithms
  • Simulated annealing via the 'FMF' (Harland 2013)
  • Genetic algorithms
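A bare-bones hill-climbing version of combinatorial optimisation can also be sketched in base R. It is a hypothetical illustration rather than how FMF or any other tool works: it starts from a random selection of individuals (drawn with replacement) and repeatedly swaps one selected individual for a random one from the microdata, keeping the swap whenever the total absolute error (TAE) against the zone constraints does not get worse. The data and the helper tae_fit() are assumptions made for the example.

```r
# Hypothetical hill-climbing sketch of combinatorial optimisation (CO).
set.seed(1)
# Toy microdata (hypothetical)
ind <- data.frame(
  age = c("a0_49", "a50+", "a50+", "a0_49", "a0_49"),
  sex = c("m", "m", "f", "f", "f")
)
# Zone constraints, in the order: a0_49, a50+, f, m (hypothetical)
cons <- c(8, 4, 6, 6)

# Total absolute error (TAE) of a selection of row indices
tae_fit <- function(sel) {
  counts <- c(
    as.vector(table(factor(ind$age[sel], levels = c("a0_49", "a50+")))),
    as.vector(table(factor(ind$sex[sel], levels = c("f", "m"))))
  )
  sum(abs(counts - cons))
}

n_pop <- sum(cons[1:2])                           # zone population: 12
sel <- sample(nrow(ind), n_pop, replace = TRUE)   # random starting selection
fit0 <- fit <- tae_fit(sel)
for (i in 1:5000) {
  cand <- sel
  cand[sample(n_pop, 1)] <- sample(nrow(ind), 1)  # swap one individual
  cand_fit <- tae_fit(cand)
  if (cand_fit <= fit) {                          # keep moves that are no worse
    sel <- cand
    fit <- cand_fit
  }
  if (fit == 0) break                             # perfect fit reached
}
c(start = fit0, end = fit)
```

Accepting equal-fit ('sideways') moves helps the search cross flat regions; simulated annealing goes further by sometimes accepting worse moves early in the search.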

Synthetic reconstruction, a refinement of IPF without reliance on a microdata sample (Barthelemy and Toint, 2012).

Motivation

  • Dozens of methods available for (spatial) microsimulation
  • Difficult to choose from options
  • Testing can be time-consuming and tricky (Harland et al. 2013)

Need for fast and consistent testing framework

Broader motivations

  • To save the world!
  • Assuming that climate science is correct, we must radically alter how we live
  • This is the greatest challenge in planning, engineering and logistics of the 21st century

Problem: each researcher has their own 'horse' in the race

Past testing efforts in the literature

The 'model experiment' genre

  • Harland et al. (2012): combinatorial optimisation (CO) approaches work better
  • Williamson (2001), a long (86-page!) working paper: CO performs slightly better than IPF-based approaches
  • Ryan et al. (2009): CO methods slightly outperform IPF
  • Barthelemy and Toint (2012): new method outperforms IPF

My model experiments 1

Testing optimised IPF in R (ipfp) vs 'FMF'

My model experiments 2

Testing different techniques for 'integerisation'
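Integerisation converts fractional weights (such as those produced by IPF) into whole individuals. One option is 'truncate, replicate, sample' (TRS; Lovelace and Ballas 2013), which can be sketched in base R as below. The function name int_trs() and the example weights are hypothetical; the sketch assumes the weights for one zone are held in a numeric vector.

```r
# Hypothetical sketch of 'truncate, replicate, sample' (TRS) integerisation
int_trs <- function(w) {
  w_int <- floor(w)                       # truncate: keep integer parts
  r <- w - w_int                          # decimal remainders
  deficit <- round(sum(w)) - sum(w_int)   # individuals still to allocate
  if (deficit > 0) {
    # sample: top up without replacement, probability ~ remainder
    top_up <- sample(length(w), size = deficit, prob = r)
    w_int[top_up] <- w_int[top_up] + 1
  }
  rep(seq_along(w), w_int)                # replicate: selected row indices
}

set.seed(42)
w <- c(2.4, 0.7, 3.9, 1.0)   # illustrative weights, summing to 8
sel <- int_trs(w)
table(sel)
```

Individuals with weights of 2.4 and 3.9 are guaranteed at least 2 and 3 copies respectively; the remaining places are filled at random in proportion to the decimal remainders, so the zone total (8) is preserved.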

Are more tests warranted?

  • Of course!
  • Constant testing and scepticism are cornerstones of science
  • Methods, algorithms and software are constantly shifting
  • Many useful findings, but often the researcher's own model comes out 'best'
  • Few conclusive results from past research
  • Many past tests are not reproducible or compare different things
  • Testing YOUR model will help understand and improve it

But how to measure performance?

Typical results from model tests

Testing the IPF algorithm

Based on a recently accepted paper

In the Journal of Artificial Societies and Social Simulation

Lovelace et al. (2015) revisited an established technique

Setting up the model experiments

  • 'Scrambled' versions of official datasets used
  • 3 example datasets used, progressively larger/more complex
  • Weight matrices saved after every constraint and iteration

Project organisation

|-- data-big (just README links)
|-- figure
|-- input-data
|   |-- sheffield
|   |-- simple
|   `-- small-area-eg
|-- literature
|-- models
|   |-- ipfinr
|   |-- FMF
|   |-- simSALUD
|   `-- GREGWT
`-- output

Try it yourself!

Replicable results

Reproducible example:

source("models/etsim.R") # run the code 'live'

Results 1: impacts of different parameters

Root mean-squared error (RMSE)
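RMSE compares simulated and observed counts across all zones and constraint categories. A one-line base R version, assuming sim and obs are numeric vectors or matrices of the same shape (the names and values are illustrative):

```r
# Root mean squared error between simulated and observed counts;
# mean() flattens matrices, so zones x categories inputs also work
rmse <- function(sim, obs) sqrt(mean((sim - obs)^2))

obs <- c(8, 4, 6, 6)          # observed constraint counts (illustrative)
sim <- c(7.9, 4.1, 6.0, 6.0)  # simulated counts (illustrative)
rmse(sim, obs)                # about 0.0707
```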

Results 2: summary for modellers

  • Additional iterations had a big impact up to 3 iterations, with very little benefit after 10
  • 'Empty cells' were found to have the largest impact on fit
  • Initial weights had very little impact

Impact of initial weights

Results 3: computational efficiency

  • C code (ipfp package): 50-fold speed increase
  • R code can also be made faster with the dplyr package
  • mipfp not yet tested!

Results 4: 'Goodness of fit' measures

Choice of measure had little impact on results
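Besides RMSE, common fit measures include total absolute error (TAE), standardised absolute error (SAE, here TAE divided by the total observed population) and Pearson's correlation between simulated and observed counts. A base R sketch, with a hypothetical gof() helper and illustrative two-zone data:

```r
# Several goodness-of-fit measures for simulated vs observed counts
gof <- function(sim, obs) {
  tae <- sum(abs(sim - obs))
  c(TAE = tae,
    SAE = tae / sum(obs),
    r   = cor(as.vector(sim), as.vector(obs)))
}

# Rows = zones, columns = constraint categories (illustrative values)
obs <- rbind(c(8, 4, 6, 6), c(5, 5, 7, 3))
sim <- rbind(c(7.9, 4.1, 6.0, 6.0), c(5, 5, 6.5, 3.5))
round(gof(sim, obs), 3)
```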

Broadening the tests

CO in FMF vs IPF in R

  • New project to test techniques on very large microdatasets
  • Challenge: allocate 569,741 individuals to 7,787 zones
  • Almost 60 million people in output spatial microdata!
  • New methodology for IPF developed

External validation

  • More important than 'internal validation' is how well results fit reality
  • Opportunity provided by a Census variable on well-being
  • Simulated at small area level with FMF (Harland, 2013) and R
  • Esteban Muñoz and I have a plan for this - advice appreciated

Work in progress

  • Compare different approaches in terms of timing, model fit and ease of use
  • External validation
  • Use alternative methods to generate same output: GREGWT? SimObesity? simSALUD?

An R package for testing population synthesis methods

Esteban Muñoz and I are working on a package to facilitate testing: testmsim

Introduction to testmsim

The aim of the R package testmsim is to provide functions, data and documentation for the testing of spatial microsimulation.

library(devtools) # package for R developers
install_github("robinlovelace/testmsim") # install latest version
library(testmsim) # load the package
help(package = testmsim) # show package documentation
  • Very early days
  • Main benefit: facilitate others to do testing

Wider context of spatial microsimulation

Issues within the field

  • "Little attention is paid to the choice of programming language used" for microsimulation (Clarke and Holm 1987)
  • Lack of reproducibility (Lovelace and Ballas, 2013)
  • Hard to get started
  • Few simple examples: applications tend to be big and complicated
  • Few introductory teaching resources

Teaching spatial microsimulation

  • Two courses in May (Leeds) and August (Cambridge)
  • Taught basic principles of spatial microsimulation
  • And implementation in R
  • Feedback: students grateful for first rung on ladder
  • More success with the latter course, which focussed on applications

Spatial microsimulation introductory textbook

  • Contract with CRC Press as part of their R Series
  • Draft of book available online in its entirety
  • Open 'wiki' style allows anyone to contribute
  • Any feedback/input gratefully received
  • Check it out here: robinlovelace.net/spatial-microsim-book/

Discussion with SMART people

Collaboration for mutual benefit

  • What would YOU most like to see in a practical textbook on spatial microsimulation?
  • Allocating individuals to households (Pritchard and Miller, 2012)
  • Alternative methods of population synthesis
  • Thinking about smsim-tools package
  • Applications
  • Code sharing and collaboration can make the world a better place

Key References

Barthelemy, J. and Toint, P. L. (2012) ‘Synthetic Population Generation Without a Sample’, Transportation Science. INFORMS, 47(2), pp. 266–279. doi: 10.1287/trsc.1120.0408.

Harland, K. (2013) ‘Microsimulation model user guide: flexible modelling framework’, National Centre for Research Methods. Leeds: University of Leeds; NCRM (NCRM working papers). Available at: http://eprints.ncrm.ac.uk/3177/2/microsimulation_model.pdf.

Lovelace, R. and Ballas, D. (2013) ‘“Truncate, replicate, sample”: A method for creating integer weights for spatial microsimulation’, Computers, Environment and Urban Systems. Elsevier Ltd, 41, pp. 1–11. doi: 10.1016/j.compenvurbsys.2013.03.004.

Lovelace, R., Ballas, D., Leeuwen, E. van and Birkin, M. (2015) ‘Evaluating the performance of Iterative Proportional Fitting for spatial microsimulation: new tests for an established technique’, Journal of Artificial Societies and Social Simulation.

Pritchard, D. R. and Miller, E. J. (2012) ‘Advances in population synthesis: fitting many attributes per agent and fitting to household and person margins simultaneously’, Transportation, 39(3), pp. 685–704. doi: 10.1007/s11116-011-9367-4.