2017-08-31

Contents

  • Introduction
  • Data and Methods
  • Policy Implications

Introduction

Context

  • Project title: Using Machine Learning and Big Data to Model Car Dependency: an Exploration Using Origin-Destination Data
  • Submitted to the Transport Technology Research Innovation Grant (T-TRIG) funding stream
  • Big data targetted call

Project outline:

  • This project will explore the potential for emerging methods in Big Data and Machine Learning to be applied to transport modelling, with a focus on car dependency at the origin-destination level.

Why Car Dependency?

  • Little government funded research into the issue (CfIT abolished 2010)
  • Yet high social, economic and environmental impact
  • Builds on DfT work: Propensity to Cycle Tool (PCT) (Lovelace et al. 2017)

(Source: CfIT)

A Person Level Definition of Car Dependency

If for some reason you could not longer use a car, would you find it… really inconvenient more or less every day never?

  • Source: (Anable 2005)

Evidence-Based Measures of Car Dependency

Simple measures:

  • Percent driving to work
  • Number of cars per household
  • Road traffic data
  • Road casualties

Complex measures:

  • Numbers who could have used another mode
  • People who could not get to work 'if the oil ran out' (Philips 2015)

Modal split

Source: datashine

Outputs

  • A demonstration of how official datasets can be augmented with geographical data from OpenStreetMap.

  • An application of Machine Learning algorithms to widely available transport datasets.

  • A tutorial on Big Data and Machine Learning targetted at transport planners, consultants and policy-makers relatively new to data science.

  • Write-up of the policy implications of the research.

  • An R package, mlCars, containing the source code, example data and reports to ensure reproducibility. The package can be accessed here: https://github.com/Robinlovelace/mlCars/

What is Big Data?

Big data is an umbrella term. We define it as:

"unconventional datasets that are difficult to analyze using established methods. Often this difficulty relates to size but the form, format, and complexity are equally important" (Lovelace et al. 2016).

What is Machine Learning?

By machine learning, we mean simply that the functional form of the model is not specified by the user.

There are two main types of machine learning (Hastie, Tibshirani, and Friedman 2016):

supervised or unsupervised. In supervised learning, the goal is to pre- dict the value of an outcome measure based on a number of input measures; in unsupervised learning, there is no outcome measure, and the goal is to describe the associations and patterns among a set of input measures.

Many transport problems can be framed as supervised Machine Learning problems.

Data and Methods

Official, Open access Data

  • Origin-Desination (OD) data
  • Geographically aggregated socio-demographic data from the 2011 Census

OSM Data

  • Spatial data from the OpenStreetMap API and the NAPTAN dataset

## Response Variable

Classical Statistical Methods

  • Descriptive Statistics
  • Visualisation
  • Linear Regression

Machine Learning

  • Python and R for Machine Learning
  • Random Forest and Boosted Regression Trees (xgboost)
  • Theano, Spark, TensorFlow

Directed Acyclic Graphs (DAGs)

Directed Acyclic Graphs (DAGs) were used to explore causality between the variables.

Tricky to create the DAG and implement - Rob Long.

Policy Implications

Methodological implications

Oportunties:

  • Can be used to uncover patterns in complex multi-dimensional data that previous methods cannot
  • Encourages transport researchers to use a range of new datasets
  • ML development is primarily open source: opportunity for citizen science

Risks:

  • Can be abused (e.g. to cherry-pick): enforce reproducibility
  • Easy to spend longer setting-up model than actually focussing on issue

Methodological Recommendations:

  • Not a question of whether to use it but when, how and by whom
  • Use Keras: "Being able to go from idea to result with the least possible delay is key to doing good research."
install.packages("keras")
  • Do not use machine learning on a dataset unless:
  • It is very well understood
  • It has been cross-validated against other sources
  • Value has been maximised by visualisation and analysis (imagine if the PCT had become a machine learning project!)

The Policy Implications of Car Dependency

  • Car dependency is the cause of a variety of social, environmental and economic issues in West Yorkshire.

  • Creating more sustainable regions by reducing car dependency and developing multi-modal mobility requires a multi-faceted approach that involves both ‘hard’ and ‘soft’ policy interventions.

Machine Learning and Big Data for the Transport Sector

  • Machine Learning and big data present new opportunities for the transport sector.

  • Big data and Machine Learning offer the promise of more efficient and cheaper transport models based on rich, finely-grained data across a wider range of indicators.

  • However, they should not supplant domain expertise or contextual knowledge in transport policy and planning. They are best used as aides to human decision-making rather than replacements.

  • Barriers to the effective and ethical uptake of innovative new technologies in transport include: access to skills, investment in R&D and organisational culture.

Contact

References

Anable, Jillian. 2005. “‘Complacent Car Addicts’ or ‘Aspiring Environmentalists’? Identifying Travel Behaviour Segments Using Attitude Theory.” Transport Policy 12 (1): 65–78. doi:10.1016/j.tranpol.2004.11.004.

Hastie, Trevor, Robert Tibshirani, and Jerome Friedman. 2016. The Elements of Statistical Learning: Data Mining, Inference, and Prediction, Second Edition. 2nd edition. New York, NY: Springer.

Lovelace, Robin, Mark Birkin, Philip Cross, and Martin Clarke. 2016. “From Big Noise to Big Data: Toward the Verification of Large Data Sets for Understanding Regional Retail Flows.” Geographical Analysis 48 (1): 59–81. doi:10.1111/gean.12081.

Lovelace, Robin, Anna Goodman, Rachel Aldred, Nikolai Berkoff, Ali Abbas, and James Woodcock. 2017. “The Propensity to Cycle Tool: An Open Source Online System for Sustainable Transport Planning.” Journal of Transport and Land Use 10 (1). doi:10.5198/jtlu.2016.862.