A cheap, simple framework for distributed MATSim
Pieter Fourie - Future Cities Laboratory
14 March 2016
Singapore scenario
- 300+ lines, 800+ routes, 2000+ stops, 70k+ links
- Needs PT on the network because PT is a dominant mode here; detailed PT analysis is essential
- Initially, plans have negative scores because of low car availability and stranded agents
- Large solution space -> long runs, e.g. 2 days on 24 cores
- Long initialization time: 2 hrs on our 24-core server
Example run of Singapore, 30% replanning
Problem
- QSim is the limiting factor in performance.
- QSim performance increases with the number of cores
- BUT you can only improve QSim so much (tightly coupled, simple network decomposition)
Motivation: performance
- Reduce iterations (QSim expensive, limited parallelization)
- Reduce time per iteration, initialization (Large networks, transit routing)
- Increase agent plan memory
- Increase sample size (esp. important for transit: scalability; error-prone because many configuration settings have to be changed)
- Use cheap computation, e.g. Amazon EC2 or an office network
- i.e. remove the CPU and RAM limitations of a single SMP machine
Design aims: integration
- Monolithic replanning modules are hard to maintain in an evolving project
- Need flexibility to grow with main MATSim project, easily integrate new functionality
- Produce comparable results to MATSim
Design

- Builds on previous work with Johannes Illenberger and Kai Nagel
- Extension to the MATSim model; a meta-model that runs for a number of cycles in series
- Fires agent events based on link travel times from the preceding QSim iteration
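The idea of firing agent events from the previous QSim iteration's link travel times can be illustrated with a minimal sketch. The `Leg` type, the `pseudo_simulate_leg` function and the event tuples below are hypothetical names chosen for illustration; they are not MATSim classes, and the real PSim may differ in detail.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

# Hypothetical stand-in for a planned car leg; not a MATSim class.
@dataclass
class Leg:
    departure_time: float  # seconds after midnight
    route: List[str]       # ordered link ids

# Assumed lookup: (link_id, entry_time) -> travel time in seconds,
# as measured in the preceding QSim iteration.
TravelTimeFn = Callable[[str, float], float]

def pseudo_simulate_leg(leg: Leg, link_travel_time: TravelTimeFn) -> List[Tuple[float, str, str]]:
    """Walk the route and emit (time, event_type, link_id) tuples.

    No queues and no interaction between agents: each link simply
    costs whatever the last QSim iteration observed for it.
    """
    events = []
    now = leg.departure_time
    for link_id in leg.route:
        events.append((now, "enter", link_id))
        now += link_travel_time(link_id, now)
        events.append((now, "leave", link_id))
    return events

# Usage with made-up, time-independent travel times.
observed = {"a": 10.0, "b": 20.0}
events = pseudo_simulate_leg(Leg(100.0, ["a", "b"]), lambda link, t: observed[link])
```

Because agents do not interact in this sketch, legs can be processed independently, which is what makes this kind of step cheap to distribute.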
Design
- Experiments with the car-only version of PSim showed that it could dramatically reduce simulation times
- Having it run in series means that QSim waits for plans
Design elements
- Transit PSim
- Plan and person serialization
- Load balancing
Operation modes: serial vs parallel (parallel implies operating on information one iteration older)
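The trade-off between the two operation modes can be sketched as follows. This is an idealized illustration, not the framework's scheduler: the function names are hypothetical, communication and startup costs are ignored, and a "PSim block" here means all PSim cycles run between two QSim iterations.

```python
def travel_time_source(k: int, mode: str) -> int:
    """QSim iteration whose travel times feed the PSim block
    that prepares plans for QSim iteration k (meaningful for k >= 2)."""
    if mode == "serial":
        return k - 1   # PSim starts only after QSim k-1 has finished
    if mode == "parallel":
        return k - 2   # PSim runs while QSim k-1 is still executing
    raise ValueError(f"unknown mode: {mode}")

def wall_clock(n_cycles: int, t_qsim: float, t_psim_block: float, mode: str) -> float:
    """Idealized run time for n_cycles of QSim plus PSim block."""
    if mode == "serial":
        # QSim waits for plans, PSim waits for travel times.
        return n_cycles * (t_qsim + t_psim_block)
    if mode == "parallel":
        # Both run concurrently; the slower one sets the pace.
        return n_cycles * max(t_qsim, t_psim_block)
    raise ValueError(f"unknown mode: {mode}")

# Made-up timings: 10 cycles, 60 s per QSim, 40 s per PSim block.
serial_time = wall_clock(10, 60.0, 40.0, "serial")
parallel_time = wall_clock(10, 60.0, 40.0, "parallel")
```

Under these assumptions the parallel mode hides the PSim time behind QSim, at the price of replanning against travel times that are one QSim iteration staler.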
Full transit performance transmission
Stochastic boarding model for PSim
Works with most modules
Performance of distributed system
- Works, in principle
- Parallel is preferable
Meta-model performance: Sioux Falls
Meta-model characteristics
- Increases the number of mutation combinations performed on a plan
- Co-evolutionary algorithm relies on mutation rate and number of iterations
- Number of unique mutation combinations performed on plans follows a multinomial distribution
- Each plan can be said to have a 'genome' of mutations, e.g. I000A001B003A004G010
- Standard MATSim can extend any given genome by a maximum of 1 letter per iteration
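A genome string such as the one above can be split into individual genes with a few lines of code. This is a minimal sketch that assumes each gene is one uppercase letter followed by a three-digit number; reading the letter as a mutation type and the number as an iteration index is an assumption, and `parse_genome` is a hypothetical helper, not part of MATSim.

```python
import re
from collections import Counter
from typing import List, Tuple

# Assumed gene format: one uppercase letter plus a three-digit number.
GENE = re.compile(r"([A-Z])([0-9]{3})")

def parse_genome(genome: str) -> List[Tuple[str, int]]:
    """Split a genome string into (letter, number) genes."""
    genes = [(m.group(1), int(m.group(2))) for m in GENE.finditer(genome)]
    # Reject strings containing anything other than well-formed genes.
    if "".join(f"{letter}{number:03d}" for letter, number in genes) != genome:
        raise ValueError(f"malformed genome: {genome!r}")
    return genes

genes = parse_genome("I000A001B003A004G010")
genome_length = len(genes)                               # number of mutations applied
letter_counts = Counter(letter for letter, _ in genes)   # how often each letter occurs
```

Genome length and letter counts computed this way are the kind of quantities that can be compared between a reference run and a meta-model run.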
Genomic analysis of plans
How does genome length relate to exploration of solution space?
Can genomes possibly tell us something about scenario characteristics/complexity?
Can genome analysis inform effective replanning strategies, e.g. by varying replanning rates or chaining strategies based on effective combinations ('genes') observed in the genome?
Gene length: Sioux Falls
MATSim reference run
Q:P = 1:10, parallel
Gene score: Sioux Falls
MATSim reference run
Q:P = 1:10, parallel
Outstanding issues
Design
- Make PSim a replanning module, no regeneration of data structures for replanning
Performance tests
- Number of slaves vs PSim iters vs number of threads per slave vs time
New strategies
- Diversification, large agent memories, randomized routing
Ensemble simulations
- Instead of a single master, have multiple masters running different combinations of agent plans
- Requires new modes of analysis