Establishing the Idea
Why Geocomputation?
- … and not
  - GI Science
  - Geography
  - Spatial Statistics
  - … for example?
Need more than ‘standard’ GIS:
“GIS was, for some, a backwards step because the data models and analysis methods provided were simply not rich enough in geographical concepts and understanding to meet their needs.”
http://www.geocomputation.org/what.html
Needs such as:
- Fitting new (but more appropriate) models
- Searching for spatial pattern
- Visualisation
- Knowledge discovery
- Exploratory data analysis?
Fitting new models
It's not just about the models - it's also about the approaches
‘Classical’ spatial statistics:
Assume \(y = X\beta + \epsilon\) where \(\epsilon = \lambda W\epsilon + \nu\) and \(\nu_i \sim N(0,\sigma^2)\)
- But why? Where does this model come from?
- Why is it linear?
- Why does it depend on a specific set of spatial units? What about the MAUP?
“This lack of a well-defined link between process and form is commonplace in spatial analysis, and is well-documented in fields such as point set clustering and fractal analysis. That it also applies here, in spatial regression modeling, should come as no surprise.” de Smith, Goodchild and Longley (2007) Geospatial Analysis: A Comprehensive Guide to Principles, Techniques and Software Tools, p. 243
Process-oriented approaches
- Some excellent ones exist
- Cellular automata
- Agent-based models
- Microsimulation
- All more grounded in reality
- But some issues not fully addressed
- Calibration
- Model selection
- Hypothesis testing
- Maybe these are done better by classical approaches
A geocomputation solution?
- Approximate Bayesian Computation (ABC)
- See e.g. Marjoram, P., Molitor, J., Plagnol, V. and Tavaré, S. (2003) Markov chain Monte Carlo without likelihoods. Proceedings of the National Academy of Sciences USA 100: 15324–15328.
- Simplifying massively:
- this allows you to make Bayesian inferences about processes you can simulate
- even if the likelihood is intractable
A quick overview
- Draw parameter values from a prior distribution
- Use these to run the simulation
- Keep them in a set of successful parameters if the simulated output is sufficiently ‘near’ the real data
- Repeat these steps LOTS of times
- The successful parameters have a distribution that should approximate the Bayesian posterior
- Throw mud at the wall and see what sticks - see the sketch below
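As a rough, purely illustrative sketch of the recipe above (not code from the talk), here is ABC rejection sampling for a deliberately simple case - inferring the mean of a normal distribution using only the ability to simulate from it, never evaluating a likelihood. All names and tuning values are assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)
observed = rng.normal(loc=3.0, scale=1.0, size=100)   # stand-in for 'real' data
s_obs = observed.mean()                               # summary statistic

accepted = []
for _ in range(100_000):                              # repeat LOTS of times
    mu = rng.uniform(-10, 10)                         # draw from the prior
    simulated = rng.normal(loc=mu, scale=1.0, size=100)
    if abs(simulated.mean() - s_obs) < 0.1:           # sufficiently 'near'?
        accepted.append(mu)                           # keep the parameter

# The accepted draws approximate the Bayesian posterior for mu
print(f"{len(accepted)} kept; posterior mean ~ {np.mean(accepted):.2f}")
```

Tighter tolerances and better-chosen summary statistics give a closer approximation to the true posterior, at the cost of more rejected simulations.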
Example - 2D hard-core point process

- Random points, but
  - always separated by at least a distance \(d\)
- Models
  - Locations of coins on a fairground game
  - Locations of settlements?
  - Locations of animal nests?
- Easy to simulate
- Hard to handle analytically
  - How to estimate \(d\)? See the sketch below
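A minimal, purely illustrative sketch (not code from the talk): simulate a hard-core pattern on the unit square by rejection (‘dart throwing’), then estimate \(d\) with the ABC recipe above, using the minimum inter-point distance as the summary statistic. Function names, the prior range and the tolerance are all assumptions:

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_hardcore(n, d, max_tries=5000):
    """Place n uniform points on the unit square, rejecting any candidate
    within d of an already-accepted point; None if it cannot be completed."""
    pts = np.empty((0, 2))
    for _ in range(max_tries):
        p = rng.uniform(0, 1, 2)
        if len(pts) == 0 or np.hypot(*(pts - p).T).min() >= d:
            pts = np.vstack([pts, p])
            if len(pts) == n:
                return pts
    return None

def min_distance(pts):
    """Summary statistic: the smallest inter-point distance in the pattern."""
    diff = pts[:, None, :] - pts[None, :, :]
    dist = np.hypot(diff[..., 0], diff[..., 1])
    np.fill_diagonal(dist, np.inf)
    return dist.min()

# An 'observed' pattern generated with a known d, so the answer can be checked
true_d = 0.05
s_obs = min_distance(simulate_hardcore(50, true_d))

# ABC rejection: uniform prior on d; keep draws whose simulated pattern has
# a summary statistic close to the observed one
accepted = []
while len(accepted) < 200:
    d = rng.uniform(0.0, 0.09)                 # draw from the prior
    sim = simulate_hardcore(50, d)
    if sim is not None and abs(min_distance(sim) - s_obs) < 0.005:
        accepted.append(d)

print(f"estimated d ~ {np.mean(accepted):.3f}  (true value {true_d})")
```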
It is certainly computation
- Based on code-based simulation
- Embarrassingly parallel
- Suited to cloud-based computing
- Suited to a map-reduce approach - see the sketch below
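A minimal sketch of why this parallelises so cleanly (the structure, not any particular library choice, is the point): every prior draw can be simulated and scored independently, so the work maps directly onto processes, a cluster, or a map-reduce job. The toy simulator simply reuses the normal-mean example from earlier; everything here is an assumed illustration:

```python
import numpy as np
from concurrent.futures import ProcessPoolExecutor

def simulate_and_score(seed):
    """One independent ABC trial: draw a parameter, simulate,
    and return (parameter, distance from the observed summary)."""
    rng = np.random.default_rng(seed)
    mu = rng.uniform(-10, 10)
    sim = rng.normal(loc=mu, scale=1.0, size=100)
    return mu, abs(sim.mean() - 3.0)   # 3.0 stands in for the observed summary

if __name__ == "__main__":
    with ProcessPoolExecutor() as pool:                        # the 'map' step
        trials = list(pool.map(simulate_and_score, range(20_000), chunksize=500))
    kept = [mu for mu, dist in trials if dist < 0.1]           # the 'reduce' step
    print(f"{len(kept)} accepted; posterior mean ~ {np.mean(kept):.2f}")
```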
It is certainly geography
- A very specific part
- But we want to be able to
- Answer geography-led questions
- Assess geography-led models
- Microsimulation
- ABMs
- time-evolving models
- IT HAS SOMETHING TO OFFER!!
Visualisation
Visual Explanation
- ‘Honest mapping’
- Draw crisp lines with caution!
- Issues of fuzziness and uncertainty don’t go away with ‘big data’
- How can these be conveyed?
Fuzzy Travel To Work Areas
Sketchy Map of House Price Estimates
Why is this geocomputation?
- Needs to consider computer graphics techniques
- Needs an understanding of fuzzy algorithms
- Needs to put these in a spatial context
- Has a geographical interpretation
- Although not seen here, needs to consider interaction as well
- No ‘off the shelf’ solution
  - Finding a solution and implementing it is research in itself (a toy sketch follows below)
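Purely as an illustration of the flavour of such a solution (this is not the technique behind the maps above), one simple way to avoid a crisp line is to render many perturbed, semi-transparent versions of a boundary, so the ink itself conveys the uncertainty. The boundary and perturbation model below are arbitrary assumptions:

```python
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(7)

# An arbitrary closed 'boundary' - in practice this might be a TTWA edge
t = np.linspace(0, 2 * np.pi, 300)

fig, ax = plt.subplots(figsize=(5, 4))
for _ in range(40):
    # each pass draws a smoothly perturbed version of the boundary
    amp = rng.normal(scale=0.05)
    phase = rng.uniform(0, 2 * np.pi)
    r = 1 + amp * np.sin(3 * t + phase)
    ax.plot(r * np.cos(t), 0.6 * r * np.sin(t),
            color="steelblue", alpha=0.1, lw=2)

ax.set_aspect("equal")
ax.set_axis_off()
ax.set_title("Many faint lines, no false precision")
plt.show()
```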
Making the future happen…
Exercise Caution
A Worrying Statement
“Petabytes allow us to say ‘Correlation is enough’. We can stop looking for models. We can analyze the data without hypotheses about what it might show. We can throw the numbers into the biggest computing clusters the world has ever seen and let statistical algorithms find patterns where science cannot …
Correlation supersedes causation, and science can advance without coherent models, unified theories, or really any mechanistic explanation at all.”
Chris Anderson, Wired, June 23rd 2008
Hopefully not in Geocomputation!
- Any observed correlations depend heavily on the data collection process
- Simpson’s paradox
  - Patterns that appear within subsets of the data can disappear when the subsets are merged
  - Different patterns can appear in the merged data set - see the numerical sketch below
- A notable shift is away from the designed experiment
- Data has to be taken as given - little control over its collection
- Inference based on statistical models assumes some kind of survey design or experimental design
- Ignore the data collection process at your peril.
- GIGO still holds!
Brunsdon, C. (2014) Spatial Science - Looking Outward. Dialogues in Human Geography, 4(1), 45-49.
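As a numerical aside on Simpson's paradox (the figures are the well-known kidney-stone treatment data often used to illustrate it, not anything from the slides): treatment A does better within every subgroup, yet B looks better once the subgroups are merged.

```python
groups = {
    # subgroup: (A successes, A total, B successes, B total)
    "small stones": (81, 87, 234, 270),
    "large stones": (192, 263, 55, 80),
}

totals = [0, 0, 0, 0]
for name, counts in groups.items():
    a_s, a_n, b_s, b_n = counts
    print(f"{name:>12}:  A {a_s / a_n:.0%}   B {b_s / b_n:.0%}")
    totals = [t + c for t, c in zip(totals, counts)]

# Merge the subgroups: the direction of the comparison reverses
a_s, a_n, b_s, b_n = totals
print(f"{'merged':>12}:  A {a_s / a_n:.0%}   B {b_s / b_n:.0%}")
```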
When Venn Diagrams go Bad