The EDIF project aims to identify use cases for Urban Observatory (UO) data. This allows us to identify ways in which the data architecture can be improved. We have focused mainly on traffic data.
The first case study investigates traffic flows using ANPR cameras. Additional studies investigate links between traffic and air pollution, and changes in travel patterns related to individual events.
Building on this work, we have then identified areas in which more detailed UO metadata could prove beneficial.
We have created an R function to download UO datasets. This function can be applied to any other datasets that are stored in a similar fashion. Parameters such as the dataset name, the desired time period and the base URL can be amended as required.
For the download function, see https://github.com/ITSLeeds/trafficsim/blob/main/R/download.R and for the generalised UO data import and cleaning script that uses it, see https://github.com/ITSLeeds/trafficsim/blob/main/code/data-import.R.
For code used in this section, see https://github.com/ITSLeeds/trafficsim/blob/main/code/censusdata.R.
Much analysis of traffic flow is based on 2011 Census data. This provides the most comprehensive survey of commute trips, including origins and destinations with high geographic resolution, but it is now over ten years out of date.
Automatic number plate recognition (ANPR) cameras provide vehicle flow data on a real-time basis, typically sending readings every 5 minutes. This allows us to account for long-term changes in traffic flows, such as changes related to the construction of new housing developments over the last decade, as well as short-term changes linked to events or periodic patterns. These cameras can also capture all trip types, rather than just commuting.
The 2011 Census commuting data was collated at MSOA level. We then used a ‘jittering’ process to disaggregate motor vehicle flows between MSOAs using randomly selected origin and destination points on local road networks (Figure @ref(fig:jitter)). The resulting desire lines were routed onto the road network using OpenTripPlanner, and these routes were then converted into a route network.
The top 1000 jittered desire lines for 2011 Census commute trips that start and/or end in Tyne and Wear
For the Urban Observatory sensors, we used data from 222 ANPR cameras across Tyne and Wear dating from February 2021, stored at https://archive.dev.urbanobservatory.ac.uk/. These record all vehicles with visible number plates. Each sensor ID represents a pair of camera locations. The dataset that we used is labelled “Plates In”, and it records the number of vehicles passing the first of the two locations in the sensor ID pair.
We assigned the camera locations to roads using OpenStreetMap road data, and conducted a series of data cleaning steps as described below. It was thus possible to compare the mean daily vehicle flow at each camera location with the vehicle flow at the same location according to the 2011 Census-based commute route network (Figure @ref(fig:rnet)).
Daily vehicle flows on the 2011 Census-based commute route network and at ANPR camera locations in February 2021
Sensors with names containing the string “DUMMY” or “TEST” were excluded, as were sensors missing readings (or with readings showing zero traffic) on the majority of days within a given time period. For traffic data, sensors with names containing the string “BUS” were also excluded, as these appear to represent bus lanes.
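The name-based exclusions can be sketched in base R as follows. This is a minimal illustration rather than the project's cleaning script: the data frame `sensors`, its `name` column and the example sensor names are all hypothetical.

```r
# Hypothetical sensor table; the names are invented for illustration.
sensors <- data.frame(
  name = c("CAM_A_1", "CAM_DUMMY_2", "TEST_CAM_3", "CAM_BUS_4", "CAM_B_5"),
  stringsAsFactors = FALSE
)

# Dummy and test sensors are dropped for every dataset.
is_placeholder <- grepl("DUMMY|TEST", sensors$name)

# For traffic data only, sensors that appear to be bus lanes are also dropped.
is_bus <- grepl("BUS", sensors$name)

traffic_sensors <- sensors[!is_placeholder & !is_bus, , drop = FALSE]
print(traffic_sensors$name)
```

Note that `grepl()` is case-sensitive by default, so this sketch only matches the upper-case strings quoted above.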
The readings were incomplete during some time periods, with sensors only giving occasional readings, perhaps due to weather conditions or other factors. If not excluded from the analysis, these periods would appear to suggest very low traffic counts on certain days. Therefore, for the traffic data, we only included sensor readings from days on which at least 100 sensors each recorded a total traffic volume of at least 20% of the maximum daily traffic volume for that sensor. For other data, we only included sensor readings from days on which at least half of all sensors each recorded a total number of readings of at least 20% of the maximum number of daily readings for that sensor.
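The day-level filter for traffic data can be sketched as below. This is a hedged example, not the code in the repository: the `daily` table and its column names are hypothetical, and `min_sensors` is set to 2 so that the toy data are small (the analysis described above uses 100).

```r
# Hypothetical daily totals: one row per sensor per day.
daily <- data.frame(
  sensor = rep(c("s1", "s2", "s3"), each = 3),
  day    = rep(as.Date("2021-02-01") + 0:2, times = 3),
  total  = c(1000, 900, 50,    # s1: third day is nearly empty
             400, 380, 20,     # s2: third day is nearly empty
             800, 100, 790)    # s3: second day is low
)

min_share   <- 0.2  # share of the sensor's own maximum daily total
min_sensors <- 2    # assumption for this toy example; 100 in the text

# Each sensor's maximum daily total, aligned with the rows of `daily`.
sensor_max <- ave(daily$total, daily$sensor, FUN = max)

# Flag sensor-days that reach the threshold share of that maximum.
daily$ok <- daily$total >= min_share * sensor_max

# Count qualifying sensors per day and keep days with enough of them.
n_ok      <- tapply(daily$ok, daily$day, sum)
good_days <- as.Date(names(n_ok)[n_ok >= min_sensors])

# All readings from the retained days are kept.
cleaned <- daily[daily$day %in% good_days, ]
```

In this toy data only the third day is dropped, because just one of the three sensors reaches 20% of its own maximum on that day.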
Sometimes the ANPR cameras do not send any readings for a few hours, and then send a single reading that contains the sum of all the vehicle plates recorded over the previous hours. To avoid problems related to this in time-sensitive analyses, we excluded time periods containing fewer than 10 readings (typically the cameras send readings every 5 minutes), and periods containing a single reading that is more than 6 standard deviations above the mean reading for the given sensor.
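A minimal sketch of these two exclusion rules is given below. It assumes that a "time period" is one clock hour, which is our assumption for illustration only, and it uses an invented set of 5-minute readings for a single hypothetical sensor: one hour is missing entirely, the next hour begins with a catch-up reading of 260, and the final hour contains only 5 readings.

```r
# Hypothetical 5-minute readings for one sensor; slots index 5-minute steps.
start <- as.POSIXct("2021-02-01 08:00:00", tz = "UTC")
slots <- c(0:23, 36:64)  # the 10:00 hour is missing; the 13:00 hour is partial
readings <- data.frame(
  sensor = "s1",
  time   = start + 300 * slots,
  count  = rep_len(c(18, 20, 22), length(slots))
)
# Catch-up reading at 11:00 holding the plates accumulated during the gap.
readings$count[readings$time == start + 300 * 36] <- 260

# Assumption: one period = one clock hour.
readings$period <- format(readings$time, "%Y-%m-%d %H", tz = "UTC")
key <- paste(readings$sensor, readings$period)

# Flag readings more than 6 standard deviations above the sensor mean.
sensor_mean <- ave(readings$count, readings$sensor, FUN = mean)
sensor_sd   <- ave(readings$count, readings$sensor, FUN = sd)
readings$spike <- readings$count > sensor_mean + 6 * sensor_sd

# Number of readings and presence of a spike within each sensor-period.
n_in_period     <- ave(readings$count, key, FUN = length)
spike_in_period <- ave(as.numeric(readings$spike), key, FUN = max) > 0

# Keep only periods with at least 10 readings and no spike.
cleaned <- readings[n_in_period >= 10 & !spike_in_period, ]
```

A single value can only exceed the mean by 6 standard deviations when the sensor has a reasonably long record (roughly 40 readings or more), so this rule only makes sense when applied to the full series for each sensor.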
There is a significant positive correlation between the traffic flows according to the UO sensor data and traffic flows in the route network derived from commute trips from the 2011 Census. This is despite the ten-year time gap and the fact that the census data relates solely to travel to work, while the UO sensors will pick up all trip types. The R-squared was 0.3 (Figure @ref(fig:graph)).
Mean daily vehicles according to the Census-based route network and the ANPR sensors, with trend line
The model residuals are shown in Figure @ref(fig:resid). Negative residuals indicate that the mean daily ANPR vehicle counts are lower than the fitted value from the model based on the Census route network, while positive residuals indicate that the ANPR counts are higher than the fitted value. The greatest residuals are often related to dual carriageways where the route network shows much greater traffic flows in one direction than the other. This is due to the way in which the route network was constructed, in which flows are assumed to be in one direction only. To improve accuracy, the route network should be amended to equalise bidirectional flows.
Residuals from a linear model of vehicle numbers from the ANPR plates data as a function of the vehicle numbers from the 2011 Census commute route network. Circles show residuals relating to mean daily vehicle numbers in February 2021
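The model and residuals described above can be reproduced in outline with base R. The sketch below uses invented flows at six hypothetical camera locations, so the resulting R-squared bears no relation to the value of 0.3 reported in this section; the column names are also our own.

```r
# Hypothetical paired flows at six camera locations (invented values).
flows <- data.frame(
  census_rnet     = c(200, 500, 800, 1200, 1500, 2000),
  anpr_mean_daily = c(3000, 2500, 6000, 5000, 9000, 8000)
)

# ANPR mean daily vehicles as a function of the Census-based route network flow.
fit <- lm(anpr_mean_daily ~ census_rnet, data = flows)
r_squared <- summary(fit)$r.squared

# Positive residual: ANPR count above the fitted value; negative: below it.
flows$residual <- residuals(fit)

# Significance test for the correlation between the two flow estimates.
ct <- cor.test(flows$census_rnet, flows$anpr_mean_daily)
print(r_squared)
print(ct$p.value)
```

For a simple linear regression with one predictor, the R-squared equals the square of the Pearson correlation coefficient, so the two summaries are directly comparable.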
To conduct an analysis comparing two UO sensor types, we investigated relations between vehicle flows and concentrations of PM10 particulate air pollution.
The PM10 values at some sensors are subject to extreme spikes, which appear to be related to particular events. Figure @ref(fig:outliers) shows the height of these spikes at a suburban Newcastle sensor, alongside a more regular pattern of PM10 levels at a city centre sensor. We are not aware of the nature of the events that caused these spikes, but they do not appear to be relevant for comparing air pollution to typical traffic flows. Therefore we assessed PM10 levels using the median value at each sensor to minimise the impact of these spikes, and compared these to median daily traffic volume in February 2021.
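The effect of using the median rather than the mean can be illustrated with a small invented example. The sensor labels and PM10 values below are hypothetical and are not taken from the UO archive.

```r
# Hypothetical PM10 readings (micrograms per cubic metre) for two sensors.
pm10 <- data.frame(
  sensor = rep(c("suburban", "city_centre"), each = 7),
  value  = c(12, 14, 11, 13, 15, 12, 400,   # one extreme spike
             20, 22, 19, 21, 23, 20, 22)    # regular pattern
)

# Per-sensor mean and median.
sensor_mean   <- tapply(pm10$value, pm10$sensor, mean)
sensor_median <- tapply(pm10$value, pm10$sensor, median)

print(sensor_mean)
print(sensor_median)
```

Here the single spike pulls the mean for the suburban sensor far above its typical readings, whereas its median stays at 13.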
The PM10 sensors and the ANPR cameras do not share the same locations. Therefore, we investigated whether traffic volumes could best predict PM10 concentrations on the basis of locality or road identity. For the locality effect, we assessed correlations between sensors within the same MSOA. For road identity, we assigned sensors to their nearest road and assessed correlations between sensors on roads with the same reference (e.g. “A167”). When assigning the sensors to roads, we did not allow the traffic sensors to be matched with residential, service or unclassified roads, as these appeared to be false matches. The air pollution sensors were allowed to be matched to residential roads, but not to service or unclassified roads.
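The road identity comparison can be sketched as follows. The nearest-road matching itself is a spatial operation and is not shown. The tables, column names and values below are hypothetical, the `highway` values follow OpenStreetMap tagging, and averaging across sensors that share a road reference is our assumption about how sensors might be paired.

```r
# Hypothetical candidate roads with OpenStreetMap-style highway classes.
roads <- data.frame(
  ref     = c("A167", NA, NA, "A1058"),
  highway = c("primary", "residential", "service", "trunk")
)

# Road classes that each sensor type may NOT be matched to.
traffic_candidates <- roads[!roads$highway %in%
                              c("residential", "service", "unclassified"), ]
pm10_candidates    <- roads[!roads$highway %in% c("service", "unclassified"), ]

# After nearest-road matching (not shown), each sensor carries a road reference.
traffic <- data.frame(
  road_ref = c("A167", "A167", "A1058", "A69", "A184"),
  median_daily_vehicles = c(9000, 11000, 15000, 7000, 12000)
)
pm10 <- data.frame(
  road_ref = c("A167", "A1058", "A1058", "A69", "B1318"),
  median_pm10 = c(14, 18, 16, 11, 9)
)

# Assumption: average sensors sharing a road reference, then pair by reference.
traffic_by_road <- aggregate(median_daily_vehicles ~ road_ref, data = traffic, FUN = mean)
pm10_by_road    <- aggregate(median_pm10 ~ road_ref, data = pm10, FUN = mean)
paired <- merge(traffic_by_road, pm10_by_road, by = "road_ref")

# Linear model of PM10 against traffic volume for the paired roads.
fit <- lm(median_pm10 ~ median_daily_vehicles, data = paired)
r_squared <- summary(fit)$r.squared
```

The same pattern applies to the locality comparison by replacing `road_ref` with an MSOA code for each sensor.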
We found no significant correlation between PM10 concentrations and traffic volumes using either of these methods, although matching sensors by road identity produced a slightly higher R-squared than matching by MSOA (Figure @ref(fig:pm10)). We also found no correlation among MSOAs between population density and PM10 concentrations.
Median daily sum of vehicles passing ANPR cameras in February 2021 (coloured circles); median PM10 concentrations (grey bubbles); and population density of the MSOAs that contain both sensor types
For code used in this section, see https://github.com/ITSLeeds/trafficsim/blob/main/code/air-pollution.R.
We investigated travel associated with home football matches at St. James’ Park stadium in Newcastle. During the first half of the 2021-22 season (August - December 2021), Newcastle United played 12 home matches, of which five were on Saturdays, all but one of these having a 3pm kick-off (the kick-off time for the final match is unknown but within an hour of 3pm). We compared road traffic levels on these five match days to traffic on the other 14 Saturdays during August - December 2021. There are ANPR cameras facing in both directions on both St James’ Boulevard and Barrack Road (Figure @ref(fig:map)); between them these four cameras will cover most routes to and from the stadium.
Location and mean daily vehicle flows for the four camera locations closest to St James’ Park stadium, Aug-Dec 2021
Using the four ANPR cameras closest to St. James’ Park stadium, we can see that on match days, traffic is greater than usual immediately before and after the matches, but drops sharply at the 3pm kick-off (Figure @ref(fig:matchday)).
Mean hourly traffic on match Saturdays (red) and non-match Saturdays (black), at the four camera locations closest to the St. James’ Park stadium, Aug-Dec 2021
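The comparison of match and non-match Saturdays can be sketched as below. All values are invented to mimic the pattern described above, and `match_dates` is a placeholder rather than the real fixture list; the table and column names are also hypothetical.

```r
# Four Saturdays; the match dates are placeholders, not real fixtures.
saturdays   <- as.Date(c("2021-08-07", "2021-08-14", "2021-08-21", "2021-08-28"))
match_dates <- as.Date(c("2021-08-14", "2021-08-28"))

# Hypothetical hourly vehicle counts summed over the four stadium cameras.
hours <- 13:17
hourly <- data.frame(
  date = rep(saturdays, each = length(hours)),
  hour = rep(hours, times = length(saturdays))
)
hourly$match_day <- hourly$date %in% match_dates

# Invented pattern: more traffic before and after the match, less at kick-off.
hourly$vehicles <- 500 +
  ifelse(hourly$match_day & hourly$hour %in% c(14, 17), 300, 0) -
  ifelse(hourly$match_day & hourly$hour == 15, 200, 0)

# Mean hourly traffic by hour (rows) and match-day status (columns).
profile <- with(hourly, tapply(vehicles, list(hour = hour, match_day = match_day), mean))

# Difference between match and non-match Saturdays for each hour.
difference <- profile[, "TRUE"] - profile[, "FALSE"]
print(difference)
```

With real data, the same table of hourly means can be plotted as two lines, one for match Saturdays and one for the remaining Saturdays.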
For code used in this section, see https://github.com/ITSLeeds/trafficsim/blob/main/code/stadiums.R.
It would be useful to make available metadata relating to the following issues:
The location data provided was sufficient to identify the road on which the sensors were located in almost all cases. However, there were often uncertainties regarding which traffic lanes the cameras were showing. A camera might cover all lanes in a given direction, or just one of them.
There were also uncertainties related to the way in which the ANPR camera locations were paired together to form sensor IDs. Each sensor ID related to a pair of camera locations, and it was possible for camera locations to be shared between multiple sensor pairs; for example the first camera location of one sensor pair could also be the first camera location of a different sensor pair. Yet in these cases, the traffic data was liable to differ between the two sensor IDs, with no clear reason why these differences existed. They may perhaps be due to different sensor IDs representing different traffic lanes at the same location.
Finally, it is known that ANPR cameras do not capture every single passing vehicle, due to factors such as weather conditions, vehicle speed and dirt covering number plates. It would be helpful to know what proportion of plates are being recorded and the level of confidence in this. This is an example of the ‘confidence interval’ issue listed in the previous subsection.
To provide clearer guidance on these factors relating to the ANPR cameras and other road-based sensors, the following metadata could prove helpful: