- What we did: Big Data in the PCT
- What is Big Data: worked example (with code!)
- Implications for ITS
- Lessons learned
2017-02-27
It's been an intense 2 years!
Concept (PhD) -> Job at UoL (2009 - 2013)
Discovery of R programming and shiny (2013)
'Propensity to Cycle' bid by DfT via SDG (2014)
Link-up with Cambridge University and colleagues (2015)
Implementation on national OD dataset, 700k routes (2016)
Completed LSOA phase (4 million lines!) (2017)
Robin Lovelace (Lead Developer, University of Leeds)
"The PCT is a brilliant example of using Big Data to better plan infrastructure investment. It will allow us to have more confidence that new schemes are built in places and along travel corridors where there is high latent demand."
"The PCT shows the country’s great potential to get on their bikes, highlights the areas of highest possible growth and will be a useful innovation for local authorities to get the greatest bang for their buck from cycling investments and realise cycling potential."
Some of the files we're working with:
Input: A mass of data
Big data is an umbrella term.
"unconventional datasets that are difficult to analyze using established methods. Often this difficulty relates to size but the form, format, and complexity are equally important" (Lovelace, Birkin, et al. 2016).
Here's a 'big' dataset courtesy of tech startup TransportAPI (see here):
{{"request_time":"2016-03-15T16:53:42+00:00","source":"Traveline southeast
journey planning API","acknowledgements":"Traveline
southeast","routes":[{"duration":"03:27:00","route_parts":[{"mode":"foot","from_point_name":"Calverley
Street","to_point_name":"Leeds Rail
Station","destination":"","line_name":"","duration":"00:13:00","departure_time":"17:02","arrival_time":"17:15","coordinates":[[-1.54909,53.80075],[-1.54909,53.8007],[-1.5491,53.80046],[-1.5491,53.79994],[-1.5491,53.79992],[-1.54912,53.79982],[-1.54911,53.79964],[-1.54911,53.79957],[-1.54913,53.79908],[-1.54922,53.79872],[-1.54935,53.7984],[-1.54946,53.79808],[-1.54946,53.79808],[-1.54949,53.79798],[-1.54949,53.79798],[-1.54955,53.79785],[-1.54963,53.79772],[-1.54969,53.79768],[-1.54986,53.79756],[-1.54999,53.79747],[-1.55012,53.79735],[-1.54994,53.79725],[-1.54935,53.79692],[-1.54828,53.79629],[-1.54813,53.79623],[-1.54802,53.79618],[-1.54822,53.79616],[-1.54829,53.79615],[-1.54848,53.79611],[-1.54848,53.79611],[-1.54845,53.79604],[-1.54845,53.796],[-1.54845,53.79597],[-1.5482,53.79597],[-1.54777,53.79428]]},{"mode":"train","from_point_name":"Leeds","to_point_name":"Doncaster","destination":"London
King's
This code automatically pulls out the geographical essentials.
request = "http://fcc.transportapi.com/v3/uk/public/journey/from/lonlat:-1.5490774,53.8007554/to/lonlat:0.121817,52.205337.json?region=southeast&"
txt <- httr::content(httr::GET(request), as = "text")
## No encoding supplied: defaulting to UTF-8.
obj <- jsonlite::fromJSON(txt)
coords <- obj$routes$route_parts[[1]]$coordinates
coords <- do.call(rbind, coords)
Credit: Ali Abbas's code from the LandorLinks TransportHack.
plot(coords)
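The last step of the snippet above, stacking the coordinates of each journey leg into one matrix, can be tried offline. This is a minimal sketch in base R, assuming the parsed `coordinates` element is a list with one two-column (lon, lat) matrix per leg. The `leg_foot` and `leg_rest` objects are hypothetical stand-ins, built from the first walking points in the JSON above and the destination in the request URL, not real API output.

```r
# Hypothetical stand-in for obj$routes$route_parts[[1]]$coordinates:
# a list with one (lon, lat) matrix per leg of the journey.
leg_foot <- rbind(
  c(-1.54909, 53.80075),
  c(-1.54909, 53.80070),
  c(-1.54910, 53.80046)
)
# Illustrative second leg: a straight line from the last walking point
# to the destination given in the request URL (not real API output).
leg_rest <- rbind(
  c(-1.54777, 53.79428),
  c(0.121817, 52.205337)
)
coords_list <- list(leg_foot, leg_rest)

# Stack the per-leg matrices into a single matrix, one row per point
coords <- do.call(rbind, coords_list)
colnames(coords) <- c("lon", "lat")

dim(coords)  # 5 rows, 2 columns
```

Calling `plot(coords, type = "l")` on the result draws the stacked points as a single line.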
This code makes the above steps user friendly (relatively speaking!) by bundling the code inside a custom function. (Credit Ali Abbas and other software developers.)
devtools::install_github("ropensci/stplanr")
library(stplanr)
rf = route_transportapi_public("Leeds", "Cambridge, UK")
m = mapview::mapview(rf)@map
htmlwidgets::saveWidget(m, "leeds-cam-data.html")
The code is not pretty, but the results are. …
See result on rpubs or http://robinlovelace.net/figure/leeds-cam-data.html
<iframe src="../figures/map-cyclenet.html" width="100%" height="600"></iframe>
cp ../figures/leeds-cam-data.html ~/repos/robinlovelace.github.io/
It depends on data from diverse, non-official sources
It's crucial that we can share our work to avoid reinventing the wheel -> 'Data Science'
Multi-level methods (e.g. spatial microsimulation and agent-based modelling) are needed to represent the complexity of the real world (Burgoine and Monsivais, 2015).
"A condition of publication in a Nature journal is that authors are required to make materials, data, code, and associated protocols promptly available to readers without undue qualifications. Any restrictions … must be disclosed."
DfT project: flows ‘super’ data set (Census)
CONTENT: England residence-to-work commuter trips (LSOA -> LSOA). ~25 million daily trips
FORMAT: Matrix (data frame in R terminology) with over 2,000 million cells (8 million rows by 256 columns per flow; categories based on cross-tabs: age, gender, ...).
CONFIDENTIALITY: Instance of Big Data ‘issues’: non-confidential cross-tabbed data (age × sex × travel mode). Half the flows have FEWER than 2 individuals!
DfT 2.0: NTS + microsimulation + Dept for Education layer data. SMS mitigates confidentiality issues?
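The size quoted for the flows matrix (8 million rows by 256 columns) can be checked with a few lines of R. The memory estimate below assumes every cell is stored as an 8-byte double, which is an assumption rather than a statement about how the data was actually held. The `toy_flows` data frame is made up purely to show how a "share of flows with fewer than 2 individuals" figure could be computed; it is not the Census data.

```r
# Size of the flows matrix described above
n_rows <- 8e6   # "8 million rows"
n_cols <- 256   # "256 columns"
n_cells <- n_rows * n_cols
n_cells / 1e6   # 2048, i.e. over 2,000 million cells

# Assumption: each cell is an 8-byte double
size_gb <- n_cells * 8 / 1e9
size_gb         # roughly 16.4 GB

# Made-up toy flows, only to illustrate the small-cell check
toy_flows <- data.frame(
  origin = c("a", "a", "b", "b", "c", "c"),
  dest   = c("b", "c", "a", "c", "a", "b"),
  n      = c(1, 0, 5, 1, 12, 3)
)

# Share of flows with fewer than 2 individuals
share_small <- mean(toy_flows$n < 2)
share_small     # 0.5 for this toy example
```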
Manchester Traffic model
Transport for Greater Manchester (3 datasets: sources SATURN-Voyager-GMVDM):
3 basic modes (Walk/Cycle, Public transport, Car trips) based on traffic counts
Over 3 million flows. 3 different zonal geographies: OD + time + trip purpose
Outcome: Census-style 500K lines (a->b MSOA level).
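One way to picture the step from modelled flows to "Census-style" lines is collapsing a long table (one row per origin, destination, mode and time period) into one row per origin-destination pair with a column per mode. This is a minimal sketch in base R, assuming that is what "Census-style" means here. The `flows_long` data, its column names and the mode labels are hypothetical, not the Transport for Greater Manchester datasets.

```r
# Hypothetical long-format flows: one row per OD pair, mode and period
flows_long <- data.frame(
  origin = c("A", "A", "A", "B", "B", "B"),
  dest   = c("B", "B", "B", "A", "A", "A"),
  mode   = c("car", "car", "pt", "car", "walk_cycle", "walk_cycle"),
  period = c("am", "pm", "am", "am", "am", "pm"),
  trips  = c(10, 5, 3, 7, 2, 1)
)

# Label each origin-destination pair (a -> b)
flows_long$od <- paste(flows_long$origin, flows_long$dest, sep = "->")

# Sum trips over time periods: one row per OD pair, one column per mode
tab <- xtabs(trips ~ od + mode, data = flows_long)
od_wide <- as.data.frame.matrix(tab)

od_wide
```

Summing over `period` discards the time dimension; keeping it would mean adding it to the right-hand side of the formula.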
Big data projects are inevitably complicated and involve multidisciplinary teams.
Lessons learned during the course of the Propensity to Cycle Tool project:
Premises:
Analogous database applicable to obesity research?
Source: My thesis (Lovelace 2014)
The data hierarchy concept applied to global travel data
Gilbert, Daniel T., Gary King, Stephen Pettigrew, and Timothy D. Wilson. 2016. “Comment on ‘Estimating the Reproducibility of Psychological Science’.” Science 351 (6277): 1037. doi:10.1126/science.aad7243.
Hollander, Yaron. 2015. “Who Will Save Us from the Misuse of Transport Models?” CTthink. http://www.ctthink.com/publications.html.
Lovelace, Robin, Mark Birkin, Philip Cross, and Martin Clarke. 2016. “From Big Noise to Big Data: Toward the Verification of Large Data Sets for Understanding Regional Retail Flows.” Geographical Analysis 48 (1): 59–81. doi:10.1111/gean.12081.
Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2016. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use, December. doi:10.5198/jtlu.2016.862.