knitr::opts_chunk$set(echo = TRUE)

Hertfordshire PM₂.₅ Analysis Workflow

A transparent, reproducible data‑processing narrative

Introduction

This document provides a clear, reproducible account of how PM₂.₅ data for Hertfordshire and Bedfordshire was collected, cleaned, and analysed. It supports the findings presented in the Particulate Matter (PM₂.₅) in Ambient Air 2025 report and demonstrates the workflow used to generate the figures included there.

The focus here is on methodology rather than interpretation. By documenting each step — from data import to quality assurance and visualisation — this R Markdown file aims to make the analytical process open, repeatable, and accessible for others working in air‑quality analysis or reproducible research.

The analysis predominantly uses open‑source tools in R, so that each step can be independently reviewed and repeated. Where a step was carried out outside R, the reason is explained.

Data Preparation

Before any analysis could begin, the datasets needed to be standardised, validated, and combined into a format suitable for reproducible processing. The PM₂.₅ data used in this workflow comes from two distinct sources — Defra‑approved reference monitors and low‑cost multi‑pollutant sensors — each with different formats, levels of completeness, and quality considerations.

This section outlines the steps taken to import, clean, and harmonise these datasets so that they could be analysed consistently across the full 2024 calendar year. The emphasis is on transparency: every transformation is documented, and decisions such as excluding faulty data or applying data‑capture thresholds are explained clearly so that the workflow can be replicated or audited by others.

Packages installed

The analysis relies on the tidyverse and openair packages. The openair package was developed specifically for analysing air pollution data and, more generally, atmospheric composition data.

The workflow also uses the openairmaps and leaflet packages to produce interactive maps of monitoring locations. These tools allow the spatial distribution of reference monitors and AirScan sensors to be visualised directly within the R Markdown document, providing useful context for interpreting site‑level results and understanding the geographic coverage of the monitoring network.

library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.5.2

## Warning: package 'tibble' was built under R version 4.5.2

## Warning: package 'readr' was built under R version 4.5.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.2.0
## ✔ forcats   1.0.0     ✔ stringr   1.6.0
## ✔ ggplot2   3.5.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.4     ✔ tidyr     1.3.1
## ✔ purrr     1.2.0     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(openair)

## Warning: package 'openair' was built under R version 4.5.2

Data

Importing Reference Monitor Data

Reference‑grade PM₂.₅ data was downloaded directly from the Air Quality England database using the openair package. This ensures that the analysis is based on fully ratified data that meets Defra’s requirements for accuracy and comparability.

The importUKAQ() function was used to retrieve annual, daily and hourly data for all Hertfordshire and Bedfordshire reference sites. These data form the backbone of the analysis, particularly for daily means, annual means, and long‑term trend calculations.

# Site codes for the Hertfordshire and Bedfordshire reference monitors
hb_sites <- c("HB009", "HB018", "HB013", "HB012", "LA001",
              "HB007", "HB001", "BDMP", "HB017", "LUTR")

# Daily means, one object per year (2016 to 2024)
HB2024D <- importUKAQ(site = hb_sites, year = 2024, data_type = "daily")
HB2023D <- importUKAQ(site = hb_sites, year = 2023, data_type = "daily")
HB2022D <- importUKAQ(site = hb_sites, year = 2022, data_type = "daily")
HB2021D <- importUKAQ(site = hb_sites, year = 2021, data_type = "daily")
HB2020D <- importUKAQ(site = hb_sites, year = 2020, data_type = "daily")
HB2019D <- importUKAQ(site = hb_sites, year = 2019, data_type = "daily")
HB2018D <- importUKAQ(site = hb_sites, year = 2018, data_type = "daily")
HB2017D <- importUKAQ(site = hb_sites, year = 2017, data_type = "daily")
HB2016D <- importUKAQ(site = hb_sites, year = 2016, data_type = "daily")
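
The repeated yearly imports could also be generated in a loop. The following is a minimal sketch rather than the code used for the report: `import_daily_by_year()` and `fake_importer()` are hypothetical helpers, and the import function is passed in as an argument so that the logic can be checked offline without a network connection.

```r
# Hypothetical helper: calls `importer` once per year and returns a named list.
# `importer` is any function with `site`, `year` and `data_type` arguments,
# for example openair::importUKAQ.
import_daily_by_year <- function(sites, years, importer) {
  out <- lapply(years, function(y) {
    importer(site = sites, year = y, data_type = "daily")
  })
  names(out) <- paste0("HB", years, "D")  # e.g. "HB2024D"
  out
}

# Offline stand-in for the real import function (an assumption for testing):
# returns one row per site so the loop can run without a connection.
fake_importer <- function(site, year, data_type) {
  data.frame(code = site, year = year, data_type = data_type)
}

daily <- import_daily_by_year(c("HB009", "HB018"), 2016:2024, fake_importer)
names(daily)
```

With openair loaded, passing `importUKAQ` as the `importer` argument should give a list whose `HB2024D` element corresponds to the object of the same name above.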

Annual mean dataset

HBAQE <- importUKAQ(
  site = c("HB009", "HB018", "HB013", "HB012", "LA001", "HB007", "HB001", "BDMP", "HB017", "LUTR"), year = 2016:2024, data_type = "annual", pollutant = "pm2.5", meta = TRUE)

Importing AirScan Sensor Data

AirScan multi‑pollutant sensors do not provide an API connection, so their data was supplied as individual files. These were:
- downloaded manually
- consolidated using Excel Power Query
- imported into R using data.table::fread for efficiency

Only outdoor sensors were included, as the report focuses solely on ambient PM₂.₅. Because the sensors were deployed mid‑2024 and experienced intermittent outages, none met the 75% data‑capture threshold. They are therefore used for exploratory context only, not for compliance assessment or long‑term trend analysis.

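library(data.table)

# Read the consolidated sensor export
AirScan2024_PM2_5 <- fread("AirScan2024_PM2.5.csv")

# Drop the empty or unused columns V4 and V5
AirScan2024_PM2_5 <- AirScan2024_PM2_5 %>% select(-V4, -V5)

# Store dates as Date objects
AirScan2024_PM2_5$Date <- as.Date(AirScan2024_PM2_5$Date)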

Cleaning and Harmonising the Datasets

Once imported, the datasets required several steps to ensure consistency:
- Column names were standardised across reference and sensor datasets.
- Dates were parsed and converted to a uniform Date format.
- A complete 2024 date sequence was generated to ensure that missing values were explicit rather than implicit.
- A full grid of date × site was created, allowing daily means to be calculated consistently even when data was missing.
- Additional empty or unused columns were removed to keep the dataset tidy.

This harmonisation step ensures that all subsequent analyses — daily means, temporal variation, annual means — are based on a consistent structure.

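# Standardise column names to match the reference data
AirScan2024_PM2_5 <- AirScan2024_PM2_5 %>%
  rename(site = `AirScan Sensor`, date = Date, value = `PM2.5`)

# Every day of 2024
full_dates <- seq.Date(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")

# Every date × site combination; site is kept as character to match the sensor table
full_grid <- expand.grid(date = full_dates, site = unique(AirScan2024_PM2_5$site),
                         stringsAsFactors = FALSE)

# Days without data now appear as NA rather than being absent
AirScan2024_PM2_5_padded <- full_grid %>%
  left_join(AirScan2024_PM2_5, by = c("date", "site"))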
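
Once missing days are explicit, data capture can be read straight off the padded table. The sketch below uses synthetic data with made-up site names, not the AirScan file, and assumes data capture is defined as the share of days in the year with a valid daily mean.

```r
library(dplyr)

# Synthetic stand-in for the padded table: one row per site per day of 2024,
# with NA where the sensor reported nothing. Site names are made up.
dates <- seq.Date(as.Date("2024-01-01"), as.Date("2024-12-31"), by = "day")
padded <- expand.grid(date = dates, site = c("Sensor A", "Sensor B"),
                      stringsAsFactors = FALSE)
padded$value <- 8
# Sensor A only starts reporting on 1 July
padded$value[padded$site == "Sensor A" & padded$date < as.Date("2024-07-01")] <- NA

# Data capture = share of days in the year with a valid daily mean
capture_by_site <- padded |>
  group_by(site) |>
  summarise(
    days_valid = sum(!is.na(value)),
    days_total = n(),
    capture = days_valid / days_total,
    meets_75 = capture >= 0.75
  )
capture_by_site
```

In this synthetic case the sensor deployed mid‑year falls well short of the 75% threshold, which is the situation described for the AirScan sensors.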

Identifying and Excluding Faulty Data

During exploratory plotting, one AirScan sensor — Hockerill Street — was found to be producing implausibly high values, with daily means exceeding 200 µg/m³ on multiple occasions. These values were not corroborated by any nearby reference or sensor sites.

Given the regional nature of PM₂.₅ and its atmospheric lifetime, such isolated spikes strongly indicate instrument malfunction rather than genuine pollution events. The dataset from this sensor was therefore:
- removed from the main analysis,
- retained separately for transparency,
- and documented clearly as an outlier.

This step aligns with best practice for quality assurance in air‑quality monitoring.

AirScan2024_PM2_5_NOHOckerill <- AirScan2024_PM2_5_padded %>%
  filter(site != "Hockerill Street")

AirScan2024_PM2_5_HOckerillOnly <- AirScan2024_PM2_5_padded %>% 
  filter(site == "Hockerill Street")
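
The corroboration check described above was done by eye during exploratory plotting. One way it might be automated is sketched below on synthetic data: a day is flagged when a site exceeds a spike limit while the median of the other sites on the same day stays below it. The site names, the 200 µg/m³ limit as a cut‑off, and the median‑of‑others rule are all assumptions for illustration.

```r
library(dplyr)

# Synthetic daily means for three made-up sites; "Sensor C" has two spikes
daily <- expand.grid(
  date = seq.Date(as.Date("2024-08-01"), as.Date("2024-08-10"), by = "day"),
  site = c("Sensor A", "Sensor B", "Sensor C"),
  stringsAsFactors = FALSE
)
daily$value <- 8
spike_dates <- as.Date(c("2024-08-03", "2024-08-07"))
daily$value[daily$site == "Sensor C" & daily$date %in% spike_dates] <- 250

spike_limit <- 200  # µg/m³; assumed cut-off for an implausible daily mean

flagged <- daily |>
  group_by(date) |>
  mutate(
    # Median of the other sites on the same day
    others_median = vapply(
      seq_along(value),
      function(i) median(value[-i], na.rm = TRUE),
      numeric(1)
    ),
    uncorroborated = !is.na(value) & value > spike_limit &
      others_median < spike_limit
  ) |>
  ungroup()

# Number of uncorroborated spike days per site
spike_days <- flagged |>
  group_by(site) |>
  summarise(n_spike_days = sum(uncorroborated, na.rm = TRUE))
spike_days
```

A site with repeated uncorroborated spikes would then be a candidate for the kind of exclusion applied to Hockerill Street, subject to manual review.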

Location map

A location map of the reference monitors was created with the leaflet package, following the “Do It Yourself” Network Maps instructions in the openair book. Although it is not presented as an interactive map, the code produces a map that can be downloaded as an image.

library(openairmaps)

## Warning: package 'openairmaps' was built under R version 4.5.2

library(leaflet)

## Warning: package 'leaflet' was built under R version 4.5.2

map_data <- HBAQE |>
  buildPopup(
    latitude = "latitude",
    longitude = "longitude",
    columns = c(
      "Code" = "code",
      "Name" = "site",
      "Site Type" = "site_type"
    )
  ) |>
  distinct(site, .keep_all = TRUE)

leaflet(map_data) |>
  addTiles() |>
  addMarkers(popup = ~popup)

## Assuming "longitude" and "latitude" are longitude and latitude, respectively

Preparing for Analysis

With the datasets cleaned, harmonised, and validated, they were ready for:
- daily mean calculations
- annual mean comparisons with Defra background maps
- temporal variation analysis using timeVariation()
- long‑term trend analysis (2016–2024)
- DAQI calendar plotting

The next sections walk through each of these analytical components, with narrative explanations accompanying the relevant code.

Daily Mean PM₂.₅ Concentrations

The datasets used in this analysis were downloaded as daily mean values, so no additional aggregation from hourly data was required. This ensured that the workflow focused on quality assurance and interpretation rather than raw data transformation.

Daily mean PM₂.₅ concentrations were plotted for each monitoring site to illustrate short‑term variation across 2024. These charts were used in the main report to highlight the stability of PM₂.₅ across the region and to identify the single moderate pollution event observed in March 2024.

HB2024D$date <- as.Date(HB2024D$date)

png(filename = "plot_PM2.5_1col_facet.png", width = 2480, height = 3508, res = 300)

ggplot(HB2024D, aes(x = date, y = pm2.5, color = site, group = site)) +
  geom_line(linewidth = 1) +
  geom_hline(yintercept = 15, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 36, linetype = "dashed", color = "goldenrod") +
  scale_x_date(name = "Date") +
  facet_wrap(~ site, ncol = 1) +
  labs(x = "Date", y = "Daily PM2.5 Concentration") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )

## Warning: Removed 573 rows containing missing values or values outside the scale range
## (`geom_line()`).

dev.off()

## png 
##   2

png(filename = "AirScan Daily Means.png", width = 2480, height = 3508, res = 300)

ggplot(AirScan2024_PM2_5_NOHOckerill, aes(x = date, y = value, color = site, group = site)) +
  geom_line(linewidth = 1) +
  geom_hline(yintercept = 15, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 36, linetype = "dashed", color = "goldenrod") +
  scale_x_date(name = "Date") +
  facet_wrap(~ site, ncol = 1) +
  labs(x = "Date", y = "Daily PM2.5 Concentration") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"
  )

## Warning: Removed 2207 rows containing missing values or values outside the scale range
## (`geom_line()`).

dev.off()

## png 
##   2

# Reference lines match the other daily plots: 15 µg/m³ (WHO 2021 daily guideline)
# and 36 µg/m³ (start of the DAQI Moderate band)
ggplot(AirScan2024_PM2_5_HOckerillOnly, aes(x = date, y = value, color = site, group = site)) +
  geom_line(linewidth = 1) +
  geom_hline(yintercept = 15, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 36, linetype = "dashed", color = "goldenrod") +
  scale_x_date(name = "Date") +
  labs(x = "Date", y = "Daily PM2.5 Concentration",
       subtitle = "Red dashed = WHO daily guideline (15 µg/m³), Yellow dashed = DAQI Moderate band (from 36 µg/m³)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))

## Warning: Removed 138 rows containing missing values or values outside the scale range
## (`geom_line()`).

Annual Mean PM₂.₅ Concentrations

In addition to the daily‑mean datasets, a separate annual mean dataset with full metadata was downloaded. This dataset includes:
- annual mean PM₂.₅ concentrations
- data‑capture percentages
- site type and classification

Because these values are already calculated and quality‑assured by the data provider, no additional annual‑mean computation was required. Instead, the analysis focused on:
- filtering sites that met the ≥75% data‑capture threshold
- identifying sites requiring annualisation
- excluding sites with insufficient data
- comparing monitored annual means with Defra’s modelled background values

This approach ensures consistency with Defra’s LAQM TG22 guidance and aligns with the methodology used in the main report.
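
The three‑way split between valid, annualised and excluded sites can be expressed as a small classification step. This is a sketch under stated assumptions: `classify_capture()` is a hypothetical helper, data capture is taken to be a fraction between 0 and 1, and the 25% lower bound for annualisation is an assumption that should be checked against LAQM TG22.

```r
# Hypothetical helper: sort sites into the three groups described above.
# `capture` is a fraction between 0 and 1. The 25% lower bound for
# annualisation is an assumption; check it against LAQM TG22.
classify_capture <- function(capture, valid_min = 0.75, annualise_min = 0.25) {
  ifelse(is.na(capture), "exclude",
    ifelse(capture >= valid_min, "valid",
      ifelse(capture >= annualise_min, "annualise", "exclude")))
}

# Made-up sites and capture rates for illustration
sites <- data.frame(
  site = c("Site A", "Site B", "Site C", "Site D"),
  pm2.5_capture = c(0.98, 0.75, 0.60, 0.10)
)
sites$status <- classify_capture(sites$pm2.5_capture)
sites
```

Keeping the thresholds as arguments makes it easy to rerun the classification if the guidance changes.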

Comparison with Defra Background Maps

Using the imported annual‑mean dataset, each site’s monitored concentration was compared with the corresponding modelled background value from Defra’s 1 km grid. Background values were extracted manually from the published dataset and entered into a small lookup table (tribble) for reproducibility. The values come from Defra’s Background Mapping data for local authorities.

This comparison helps distinguish between:
- regional background contributions, which dominate in most locations
- localised influences, such as roadside emissions or nearby combustion sources

In the main report, this analysis showed that most monitored values were slightly below the modelled background, suggesting that diffuse regional sources remain the primary driver of PM₂.₅ in Hertfordshire and Bedfordshire.

pm_summary <- tribble(
  ~Site, ~`Monitored_2024`, ~`Background_2024`,
  "Borehamwood Meadow Park", 6.909872, 8.221312,
  "Dacorum Northchurch High Street", 7.924535, 7.32426,
  "East Herts Hertford Gascoyne Way", 8.007649, 8.051809,
  "Hertsmere Borehamwood Roadside", 6.95, 8.165562,
  "Luton Airport FutureLuToN", 7.556129, 7.631193,
  "Luton Dunstable Road East", 7.758886, 8.350014,
  "Stevenage St Georges Way South", 7.524266, 7.721153,
  "Welwyn Hatfield", 6.439454, 7.898216
)

pm_long <- pm_summary |> 
  pivot_longer(cols = c(`Monitored_2024`, `Background_2024`), 
               names_to = "Type", 
               values_to = "PM2.5")

ggplot(pm_long, aes(x = Site, y = PM2.5, fill = Type)) +
  geom_col(position = position_dodge(width = 0.8)) +
  scale_fill_manual(values = c("Monitored_2024" = "forestgreen", "Background_2024" = "blue")) +
  geom_hline(yintercept = 10, linetype = "dashed", color = "blue") +
  geom_hline(yintercept = 5, linetype = "dashed", color = "red") +
  annotate("text", x = 1, y = 10.2, label = "Annual Mean Concentration Target = 10 µg/m³ by 2040",
           hjust = 0, color = "blue", size = 3.5) +
  annotate("text", x = 1, y = 5.2, label = "WHO Annual Guideline = 5 µg/m³",
           hjust = 0, color = "red", size = 3.5) +
  labs(x = "Monitoring Site", y = "PM2.5 (µg/m³)", fill = "PM2.5 Concentration") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1))
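
The comparison shown in the chart can also be summarised numerically. The sketch below reuses the values from the lookup table and adds the difference between monitored and background concentrations; the column names `Difference` and `Below_background` are introduced here for illustration.

```r
# Values copied from the lookup table above (µg/m³)
pm_summary <- data.frame(
  Site = c("Borehamwood Meadow Park", "Dacorum Northchurch High Street",
           "East Herts Hertford Gascoyne Way", "Hertsmere Borehamwood Roadside",
           "Luton Airport FutureLuToN", "Luton Dunstable Road East",
           "Stevenage St Georges Way South", "Welwyn Hatfield"),
  Monitored_2024 = c(6.909872, 7.924535, 8.007649, 6.95,
                     7.556129, 7.758886, 7.524266, 6.439454),
  Background_2024 = c(8.221312, 7.32426, 8.051809, 8.165562,
                      7.631193, 8.350014, 7.721153, 7.898216)
)

# Negative difference = monitored value below the modelled background
pm_summary$Difference <- pm_summary$Monitored_2024 - pm_summary$Background_2024
pm_summary$Below_background <- pm_summary$Difference < 0

# Sites ordered from furthest below to furthest above background
pm_summary[order(pm_summary$Difference), c("Site", "Difference", "Below_background")]

# Number of sites below background
sum(pm_summary$Below_background)
```

On these figures, seven of the eight sites sit below their modelled background value, with Dacorum Northchurch High Street the only site above it.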

Temporal Variation in PM₂.₅ Concentrations

Although the main datasets used in this workflow were downloaded as daily mean values, the temporal‑variation analysis required hourly resolution. To support this, a separate hourly dataset was downloaded directly from the Air Quality England database using the openair package.

Hourly data is essential for producing the following plots:
- monthly variation
- weekday variation
- hourly variation
- combined hour–day plots

These plots reveal patterns that cannot be seen in daily averages — such as rush‑hour bumps, evening peaks linked to domestic burning, and seasonal shifts in pollutant behaviour.

The hourly dataset was used exclusively for these temporal analyses, while the daily‑mean and annual‑mean datasets were used for all other components of the workflow. This ensures that each analysis is based on the most appropriate level of temporal detail. The resulting plots were also used to compare local patterns with national AURN trends.

HB2024H <- importUKAQ(
  site = c("HB009", "HB018", "HB013", "HB012", "LA001", "HB007", "HB001", "BDMP", "HB017", "LUTR"), year = 2024, meta = TRUE)

timeVariation(HB2024H,
              pollutant = c("pm2.5", "no2"),
              normalise = TRUE
)
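
To make the idea of a normalised diurnal profile concrete, the sketch below computes one by hand on synthetic hourly data. It is not openair's implementation: it simply assumes that normalising means dividing each hour‑of‑day mean by the overall mean, so that a value of 1 represents the average.

```r
library(dplyr)

# Synthetic hourly series for one made-up site: a baseline of 6 µg/m³
# with a higher value of 10 µg/m³ between 18:00 and 21:59
hourly <- data.frame(
  date = seq(as.POSIXct("2024-01-01 00:00", tz = "UTC"),
             as.POSIXct("2024-01-07 23:00", tz = "UTC"), by = "hour")
)
hourly$hour <- as.integer(format(hourly$date, "%H"))
hourly$pm2.5 <- ifelse(hourly$hour %in% 18:21, 10, 6)

# Mean by hour of day, divided by the overall mean so that 1 = average
overall_mean <- mean(hourly$pm2.5, na.rm = TRUE)
diurnal <- hourly |>
  group_by(hour) |>
  summarise(mean_pm25 = mean(pm2.5, na.rm = TRUE)) |>
  mutate(normalised = mean_pm25 / overall_mean)
diurnal
```

Normalising in this way lets pollutants with very different absolute concentrations, such as PM₂.₅ and NO₂, be compared on the same axis.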

Long‑Term Trends (2016–2024)

Long‑term trends were assessed using the annual mean dataset, which contains published annual averages for each site across multiple years. Because these values already incorporate data‑capture thresholds and quality assurance, the analysis focused on:
- selecting sites with ≥75% data capture for each year
- calculating the regional average annual mean
- plotting site‑level deviations from the regional mean

This method ensures that the trend analysis is based on consistent, validated annual statistics rather than reconstructed values. In 2024, the spread of site‑level deviations narrowed noticeably — a sign of increasing consistency across the network.

# Keep site-years with at least 75% data capture and store the year as a Date
HBAQEPlus75 <- HBAQE %>%
  filter(pm2.5_capture >= 0.75) %>%
  mutate(date = as.Date(date))

# Regional average of the annual means for each year
annual_mean <- HBAQEPlus75 %>%
  group_by(date) %>%
  summarise(mean_pm25 = mean(pm2.5, na.rm = TRUE)) %>%
  rename(year = date)

# Difference between each site's annual mean and the regional average
df_dev <- HBAQEPlus75 %>%
  rename(year = date) %>%
  left_join(annual_mean, by = "year") %>%
  mutate(deviation = pm2.5 - mean_pm25)

ggplot(annual_mean, aes(x = year, y = mean_pm25)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  geom_hline(yintercept = 10, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 12, linetype = "dashed", color = "yellow") +
  labs(
    subtitle = "Red dashed = Annual Mean Concentration Target (10 µg/m³);\nYellow dashed = Interim Annual Mean Concentration Target (12 µg/m³)",
    y = "PM2.5 (µg/m³)",
    x = NULL
  ) +
  theme_minimal(base_size = 12)

ggplot(df_dev, aes(x = year, y = site, fill = deviation)) +
  geom_tile(colour = "white") +
  scale_fill_gradient2(
    low = "#2166ac",
    mid = "white",
    high = "#b2182b",
    midpoint = 0,
    name = "Deviation\nfrom annual mean"
  ) +
  labs(
    title = "Site-Level Deviations from Annual Mean PM2.5",
    x = "Year",
    y = NULL
  ) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.y = element_text(size = 9),
    panel.grid = element_blank()
  )
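
The spread of site‑level deviations can be put into numbers as well as shown on the heatmap. The sketch below uses synthetic annual means for made‑up sites and summarises, for each year, the range and standard deviation of the deviations; these two summary measures are a choice made here for illustration, not a method taken from the report.

```r
library(dplyr)

# Synthetic annual means for three made-up sites (µg/m³)
annual <- data.frame(
  year = rep(c(2022, 2023, 2024), each = 3),
  site = rep(c("Site A", "Site B", "Site C"), times = 3),
  pm2.5 = c(8, 10, 12, 8, 9, 10, 7.5, 8, 8.5)
)

# Deviation of each site from the regional mean, then its spread per year
spread <- annual |>
  group_by(year) |>
  mutate(deviation = pm2.5 - mean(pm2.5, na.rm = TRUE)) |>
  summarise(
    regional_mean = mean(pm2.5, na.rm = TRUE),
    deviation_range = max(deviation) - min(deviation),
    deviation_sd = sd(deviation)
  )
spread
```

A falling range or standard deviation from one year to the next would correspond to the narrowing spread described above.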

Days with Moderate or Higher Pollution (DAQI)

Daily means were used for DAQI classification, while the annual mean dataset provided useful metadata for interpreting the results, particularly data‑capture completeness and site classification. Because the data was already supplied as daily means, DAQI categories could be assigned directly without additional aggregation.

Calendar plots were produced for each year to visualise the frequency and timing of moderate‑or‑above pollution days.

labels <- c("1- Low", "2- Low", "3- Low", "4- Moderate", "5- Moderate",
            "6- Moderate", "7- High", "8- High", "9- High", "10- Very High")

# Upper bound of each DAQI band for daily mean PM2.5 (µg/m³):
# 0-11, 12-23, 24-35, 36-41, 42-47, 48-53, 54-58, 59-64, 65-70, 71 and above
pm25.breaks <- c(0, 11, 23, 35, 41, 47, 53, 58, 64, 70, 1000)

AQE_CalendarPLot <- calendarPlot(HB2024D, type = "site", year = 2024, pollutant = "pm2.5",
                                 labels = labels, breaks = pm25.breaks, statistic = "mean", cols = "daqi")

## Warning: ! Duplicate dates detected in mydata$date.
## ℹ Are there multiple sites in `mydata`? Use the type argument to condition them
##   separately.
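
Outside the calendar plot, the same banding can be applied to a plain vector of daily means, for example to count moderate‑or‑above days. The sketch below is illustrative: `daqi_pm25()` is a hypothetical helper, the band boundaries are the published DAQI bands for 24‑hour mean PM₂.₅ as understood at the time of writing, and rounding to the nearest whole number before banding is an assumption.

```r
# Lower bounds of the ten DAQI bands for 24-hour mean PM2.5 (µg/m³);
# verify against the current UK-AIR definition before reuse
daqi_lower <- c(0, 12, 24, 36, 42, 48, 54, 59, 65, 71)
daqi_label <- c(rep("Low", 3), rep("Moderate", 3), rep("High", 3), "Very High")

# Hypothetical helper: rounds to a whole number, then looks up the band
daqi_pm25 <- function(x) {
  index <- findInterval(round(x), daqi_lower)
  index[which(index == 0)] <- NA  # negative values have no band
  data.frame(value = x, index = index, band = daqi_label[index])
}

res <- daqi_pm25(c(5, 11.4, 12, 35, 36, 53, 54, 70, 71))
res

# Days at index 4 (Moderate) or above
sum(res$index >= 4, na.rm = TRUE)
```

Applied to a site's daily means, the final line gives the number of moderate‑or‑above days for that site.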

Compliance with the Annual Mean Concentration Target (AMCT) and Population Exposure Reduction Target (PERT)

For guidance only, the PERT for Hertfordshire was calculated in an Excel spreadsheet using the methodology outlined on the UK-AIR website. A method for calculating the PERT in R is currently being developed.

Public Health Outcomes Framework (PHOF)

The PHOF Indicator D.01, which represents the fraction of mortality attributable to long‑term exposure to PM₂.₅, and the Air pollution: fine particulate matter (new method – concentrations of total PM₂.₅) indicator were both accessed directly in R using the Fingertips API. The D.01 indicator is calculated independently by ONS and Ricardo E&E using the underlying fine particulate matter concentration indicator. Definitions and methodological details for both indicators are available on the Fingertips website.

The analysis focused on:
- extracting PHOF values for Hertfordshire and Bedfordshire
- comparing them with the England average
- linking them to monitored and modelled PM₂.₅ concentrations

The fingertipsR package provides an R interface to the Fingertips API and can be installed from rOpenSci or GitHub. In the code below, the indicator data is read directly from the API's CSV endpoint with read_csv().

library(fingertipsR)


fingertips_data_death <- read_csv("https://fingertips.phe.org.uk/api/all_data/csv/for_one_indicator?indicator_id=93861")

## Rows: 2781 Columns: 27
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): Indicator Name, Parent Code, Parent Name, Area Code, Area Name, Ar...
## dbl  (4): Indicator ID, Time period, Value, Time period Sortable
## lgl  (8): Lower CI 95.0 limit, Upper CI 95.0 limit, Lower CI 99.8 limit, Upp...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

fingertips_data_conc <- read_csv("https://fingertips.phe.org.uk/api/all_data/csv/for_one_indicator?indicator_id=93867")

## Rows: 2742 Columns: 27
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): Indicator Name, Parent Code, Parent Name, Area Code, Area Name, Ar...
## dbl  (4): Indicator ID, Time period, Value, Time period Sortable
## lgl  (8): Lower CI 95.0 limit, Upper CI 95.0 limit, Lower CI 99.8 limit, Upp...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Areas of interest; the same list is used for both indicators
hb_areas <- c("Hertfordshire", "Bedfordshire", "Central Bedfordshire", "Luton",
              "Broxbourne", "Dacorum", "East Hertfordshire", "Hertsmere",
              "North Hertfordshire", "St Albans", "Stevenage", "Three Rivers",
              "Watford", "Welwyn Hatfield", "England")

# PM2.5 concentration indicator
HB_conc <- fingertips_data_conc %>%
  filter(`Area Name` %in% hb_areas) %>%
  select(`Indicator Name`, `Area Name`, `Time period`, Value) %>%
  rename(Conc_pm2.5 = `Indicator Name`, Value_conc = Value)

# Fraction of mortality attributable to PM2.5
HB_deaths <- fingertips_data_death %>%
  filter(`Area Name` %in% hb_areas) %>%
  select(`Indicator Name`, `Area Name`, `Time period`, Value) %>%
  rename(Value_deaths = Value, Percentage_deaths = `Indicator Name`)

HB_conc_deaths <- left_join(HB_conc, HB_deaths, by = c("Area Name", "Time period"))

## Warning in left_join(HB_conc, HB_deaths, by = c("Area Name", "Time period")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 34 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.
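
The many‑to‑many warning above means that at least one area and time period appears more than once in both tables. A quick way to investigate is sketched below on synthetic data with made‑up area names. Whether the repeated rows in the Fingertips data are exact copies is an assumption that would need checking before applying distinct() to the real tables.

```r
library(dplyr)

# Synthetic stand-ins for the two indicator tables. The repeated rows mimic
# an area that appears more than once for the same period.
conc <- data.frame(
  area = c("Area A", "Area A", "Area B"),
  period = c(2023, 2023, 2023),
  value_conc = c(7.5, 7.5, 8.1)
)
deaths <- data.frame(
  area = c("Area A", "Area A", "Area B"),
  period = c(2023, 2023, 2023),
  value_deaths = c(5.2, 5.2, 5.6)
)

# Keys that occur more than once are what trigger the warning
dup_keys <- conc |>
  count(area, period) |>
  filter(n > 1)
dup_keys

# If the repeated rows are exact copies, distinct() restores one row per key
joined <- left_join(distinct(conc), distinct(deaths), by = c("area", "period"))
joined
```

In recent dplyr versions the `relationship` argument of left_join() can additionally be set to "one-to-one" so that any remaining duplicates raise an error instead of a warning.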

ggplot(HB_conc_deaths, aes(x = `Time period`)) +
  geom_line(aes(y = Value_conc, 
                color = "Annual concentration of PM2.5 adjusted to account for population exposure"), 
            linewidth = 1) +
  geom_line(aes(y = Value_deaths, 
                color = "Fraction of mortality attributable to PM2.5"), 
            linewidth = 1) +
  scale_y_continuous(
    name = "PM2.5 Concentration (µg/m³)",
    sec.axis = sec_axis(~ ., name = "Attributable Mortality (%)")
  ) +
  scale_color_manual(values = c(
    "Annual concentration of PM2.5 adjusted to account for population exposure" = "darkblue",
    "Fraction of mortality attributable to PM2.5" = "firebrick"
  )) +
  facet_wrap(~`Area Name`) +
  labs(
    title = "PHOF Fine Particulate Matter Indicators",
    x = "Year",
    color = "Indicator"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "bottom"
  )

Conclusion

This workflow demonstrates a transparent and reproducible approach to analysing PM₂.₅ data for Hertfordshire and Bedfordshire using R. By combining daily‑mean datasets, annual‑mean metadata, and separately downloaded hourly data, the analysis reflects the structure and limitations of the available evidence while ensuring that each analytical step uses the most appropriate level of temporal detail.

The workflow shows how reference‑grade and indicative datasets can be harmonised, validated, and explored using open‑source tools. It also documents key quality‑assurance decisions — such as excluding the faulty Hockerill Street sensor and applying Defra’s data‑capture thresholds — to ensure that the results are robust and defensible.

Although this document focuses on methodology rather than interpretation, the outputs support the findings presented in the Particulate Matter (PM₂.₅) in Ambient Air 2024 report: PM₂.₅ concentrations across the region remain low, long‑term trends continue to decline, and moderate pollution episodes are now rare. The workflow also provides a foundation for future analyses, including the Population Exposure Reduction Target (PERT) calculations that will be incorporated once the methodology is finalised.

By publishing this workflow on RPubs, the aim is to make the analytical process open, accessible, and reproducible — supporting good practice in environmental data analysis and demonstrating how R can be used to produce clear, defensible evidence for public‑health reporting.

Data Sources

Air Quality England (AQE) – Reference‑grade PM₂.₅ monitoring data (daily and hourly). (airqualityengland.co.uk)
Defra Modelled Background Maps (1 km grid) – Annual mean PM₂.₅ background concentrations. (uk-air.defra.gov.uk)
Public Health Outcomes Framework (PHOF) – Indicator D.01 (fraction of mortality attributable to PM₂.₅). (fingertips.phe.org.uk)
AirScan Multi‑Pollutant Sensors – Site‑level CSV files provided by Hertfordshire County Council.

Guidance and Methodology

Defra (2022). Local Air Quality Management Technical Guidance (LAQM TG22). Provides requirements for data capture thresholds, annualisation, and reporting standards.
Environmental Targets (Fine Particulate Matter) (England) Regulations 2023. Defines the Annual Mean Concentration Target (AMCT) and Population Exposure Reduction Target (PERT).

R Packages

openair – Tools for air‑quality data analysis and visualisation. Carslaw, D.C. & Ropkins, K. (2012). openair — An R package for air quality data analysis.
tidyverse – Data wrangling, plotting, and workflow support.
data.table – Efficient import and manipulation of large datasets.
fingertipsR – Access to Public Health England indicators.
leaflet – Interactive mapping for R, based on the Leaflet JavaScript library.
openairmaps – Mapping tools designed for air‑quality datasets, integrating seamlessly with openair.

Report Referenced

Particulate Matter (PM₂.₅) in Ambient Air 202 – Hertfordshire & Bedfordshire Air Quality Forum. (Cerys Williams, 2025)

Hertfordshire PM₂.₅ Analysis Workflow

Reproducible Methods for the 2024 Air Quality Report

Cerys Williams

19 March 2026
