Hertfordshire PM₂.₅ Analysis Workflow

A transparent, reproducible data‑processing narrative

Introduction

This document provides a clear, reproducible account of how PM₂.₅ data for Hertfordshire and Bedfordshire was collected, cleaned, and analysed. It supports the findings presented in the Particulate Matter (PM₂.₅) in Ambient Air 2026 report and demonstrates the workflow used to generate the figures included there.

The focus here is on methodology rather than interpretation. By documenting each step — from data import to quality assurance and visualisation — this R Markdown file aims to make the analytical process open, repeatable, and accessible for others working in air‑quality analysis or reproducible research.

The analysis predominantly uses open‑source tools in R, ensuring that all steps — from data import to visualisation — can be independently reviewed and repeated. Where this has not been done an explanation as to why is provided. As this report is for the benefit of Hertfordshire Public Health greater focucs will be made on the monitoring results of the PM2.5 data in Hertfordshire and the Bedfordshire monitoring data and data provided by Defra will be used to sense check.

Data Preparation

Before any analysis could begin, the datasets needed to be standardised, validated, and combined into a format suitable for reproducible processing. The PM₂.₅ data used in this workflow comes from two distinct sources — Defra‑approved reference monitors and low‑cost multi‑pollutant sensors — each with different formats, levels of completeness, and quality considerations.

This section outlines the steps taken to import, clean, and harmonise these datasets so that they could be analysed consistently across the full 2025 calendar year. The emphasis is on transparency: every transformation is documented, and decisions such as excluding faulty data or applying data‑capture thresholds are explained clearly so that the workflow can be replicated or audited by others.

Packages installed

The analysis carried out in the data relied on both the tidyverse and openair packages. The openair package was specifically developed to analyze air pollution and atmospheric composition data in general.

The workflow also uses the openairmaps and leaflet packages to produce interactive maps of monitoring locations. These tools allow the spatial distribution of reference monitors and multi - pollutant sensors to be visualised directly within the R Markdown document, providing useful context for interpreting site‑level results and understanding the geographic coverage of the monitoring network.

library(tidyverse)

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.1     ✔ readr     2.2.0
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.3     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(openair)

Data

Importing Reference Monitor Data

Reference‑grade PM₂.₅ data was downloaded directly from the Air Quality England database using the openair package. This ensures that the analysis is based on fully ratified data that meets Defra’s requirements for accuracy and comparability.

The function was used to retrieve annual, daily and hourly data for all Hertfordshire and Bedfordshire reference sites. These data form the backbone of the analysis, particularly for daily means, annual means, and long‑term trend calculations. It was not necessary to download the daily means for all reported years as this had been done for the 2025 report. However to enable better case/control visualisations the Hertfordshire and Bedfordshire data was downloaded as 2 data sets.

Herts2025D_PM2.5 <- importUKAQ(site = c("HB001","HB009", "HB013", "HB012", "HB017", "HB018","BDMP"), year = 2025, data_type = "daily", pollutant = "pm2.5", meta = TRUE)

## Importing AQ Data ■■■■■■■■■■■■■■ 43% | ETA: 1sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 86% | ETA: 0s

Beds2025D_PM2.5 <- importUKAQ(site = c("LA001", "HB007", "LUTR"), year = 2025, data_type = "daily", pollutant = "pm2.5", meta = TRUE)

Annual mean data set

AllAQE <- importUKAQ(
  site = c("HB001","HB009","HB013","HB012","HB017","HB018","BDMP",
           "LA001","HB007","LUTR"),
  year = 2016:2025,
  data_type = "annual",
  pollutant = "pm2.5",
  meta = TRUE
)

## Importing Statistics ■■■■ 10% | ETA: 12sImporting Statistics ■■■■■ 15% | ETA:
## 13sImporting Statistics ■■■■■■■ 20% | ETA: 11sImporting Statistics ■■■■■■■■■
## 25% | ETA: 9sImporting Statistics ■■■■■■■■■■ 30% | ETA: 9sImporting Statistics
## ■■■■■■■■■■■ 35% | ETA: 9sImporting Statistics ■■■■■■■■■■■■■ 40% | ETA:
## 9sImporting Statistics ■■■■■■■■■■■■■■■ 45% | ETA: 8sImporting Statistics
## ■■■■■■■■■■■■■■■■ 50% | ETA: 7sImporting Statistics ■■■■■■■■■■■■■■■■■ 55% | ETA:
## 6sImporting Statistics ■■■■■■■■■■■■■■■■■■■■■ 65% | ETA: 4sImporting Statistics
## ■■■■■■■■■■■■■■■■■■■■■■ 70% | ETA: 3sImporting Statistics
## ■■■■■■■■■■■■■■■■■■■■■■■ 75% | ETA: 3sImporting Statistics
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% | ETA: 2sImporting Statistics
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 85% | ETA: 1sImporting Statistics
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 1s

Hourly Data set

AllAQEH <- importUKAQ(
  site = c("HB001","HB009","HB013","HB012","HB017","HB018","BDMP","LA001","HB007","LUTR"),
  year = 2025,
  meta = TRUE
)

## Importing AQ Data ■■■■■■■■■■■■■ 40% | ETA: 2sImporting AQ Data ■■■■■■■■■■■■■■■■
## 50% | ETA: 2sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 60% | ETA: 1sImporting AQ
## Data ■■■■■■■■■■■■■■■■■■■■■■ 70% | ETA: 1sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% | ETA: 1sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 0s

Importing AirScan Sensor Data

AirScan multi‑pollutant sensors do not provide an API connection, so their data was supplied as individual files. These were: - downloaded manually - consolidated using Excel Power Query - imported into R using data.table:::fread for efficiency

Only outdoor sensors were included, as the report focuses solely on ambient PM₂.₅. Because the sensors were deployed mid‑2024 and experienced intermittent outages, none met the 75% data‑capture threshold. They are therefore used for exploratory context only, not for compliance assessment or long‑term trend analysis.

library(data.table)

## 
## Attaching package: 'data.table'

## The following objects are masked from 'package:lubridate':
## 
##     hour, isoweek, isoyear, mday, minute, month, quarter, second, wday,
##     week, yday, year

## The following objects are masked from 'package:dplyr':
## 
##     between, first, last

## The following object is masked from 'package:purrr':
## 
##     transpose

AirScan2025_PM2_5_Daily <- read.csv("AirScan_Data_PM2.5_Daily.csv")

AirScan2025_PM2_5_Daily$Date <- as.Date(AirScan2025_PM2_5_Daily$Date)

Cleaning and Harmonising the Datasets

Once imported, the data sets required several steps to ensure consistency: - Column names were standardised across reference and sensor data sets. - Timestamps were parsed and converted to a uniform POSIXct format. - A complete 2025 date sequence was generated to ensure that missing values were explicit rather than implicit. - A full grid of date × site was created, allowing daily means to be calculated consistently even when data was missing. - Additional empty or unused columns were removed to keep the dataset tidy.

This harmonisation step ensures that all subsequent analyses — daily means, temporal variation, annual means — are based on a consistent structure.

For the reference method monitors annual mean data set: - monitors with a data capture rate below 75% were filtered out as required by Defra - An additional column denoting which geographical county each monitor is located in was added to the data set.

AllAQE <- AllAQE %>%
  mutate(
    Geo_County = case_when(
      site %in% c("HB001","HB009","HB013","HB012","HB017","HB018","BDMP") ~ "Hertfordshire",
      site %in% c("LA001","HB007","LUTR") ~ "Bedfordshire",
      TRUE ~ "Unknown"
    )
  )

AllAQE75 <- AllAQE %>%
  filter(pm2.5_capture >= 0.75) %>%
  mutate(date = as.Date(date))

County_annual_mean <- AllAQE75 %>%
  group_by(Geo_County, date) %>%
  summarise(mean_pm25 = mean(pm2.5, na.rm = TRUE), .groups = "drop") %>%
  rename(year = date)

AllAQEH <- AllAQEH %>%
  mutate(
    Geo_County = case_when(
      site %in% c("HB001","HB009","HB013","HB012","HB017","HB018","BDMP") ~ "Hertfordshire",
      site %in% c("LA001","HB007","LUTR") ~ "Bedfordshire",
      TRUE ~ "Unknown"
    )
  )

AirScan2025_PM2_5 <- AirScan2025_PM2_5_Daily %>%
  rename(site = `Sensor`, date = Date, value = `PM2.5`)
  
full_dates <- seq.Date(as.Date("2025-01-01"), as.Date("2025-12-31"), by = "day")

full_grid <- expand.grid(date = full_dates, site = unique(AirScan2025_PM2_5$site))

AirScan2025_PM2_5_padded <- full_grid %>%
  left_join(AirScan2025_PM2_5, by = c("date", "site"))

Location map

A location map for each of the reference monitors was created following the instructions for the “Do It Yourself” Network Maps in the Openair book and the leaflet package. Although not an interactive map, the coding enabled the development of a map that could be downloaded as an image.

library(openairmaps)
library(leaflet)

map_data <- AllAQE |>
  buildPopup(
    latitude = "latitude",
    longitude = "longitude",
    columns = c(
      "Code" = "code",
      "Name" = "site",
      "Site Type" = "site_type"
    )
  ) |>
  distinct(site, .keep_all = TRUE)

leaflet(map_data) |>
  addTiles() |>
  addMarkers(
    popup = ~popup,
    )

## Assuming "longitude" and "latitude" are longitude and latitude, respectively

Preparing for Analysis

With the datasets cleaned, harmonised, and validated, they were ready for: - daily mean calculations - annual mean comparisons with Defra background maps - temporal variation analysis using - long‑term trend analysis (2016–2025) - DAQI calendar plotting

The next sections walk through each of these analytical components, with narrative explanations accompanying the relevant code.

Daily Mean PM₂.₅ Concentrations

The data sets used in this analysis were downloaded as daily mean values, so no additional aggregation from hourly data was required. This ensured that the workflow focused on quality assurance and interpretation rather than raw data transformation.

Daily mean PM₂.₅ concentrations were plotted for each monitoring site to illustrate short‑term variation across 2025. These charts were used in the main report to highlight the stability of PM₂.₅ across the region and to identify the single moderate pollution event observed in March 2024.

Herts2025D_PM2.5$date <- as.Date(Herts2025D_PM2.5$date)
Beds2025D_PM2.5$date <- as.Date(Beds2025D_PM2.5$date)


ggplot(Herts2025D_PM2.5, aes(x = date, y = pm2.5, color = site, group = site)) +
  geom_line(linewidth = 1) +
  geom_hline(yintercept = 36, linetype = "dashed", color = "goldenrod")+
  geom_hline(yintercept = 54, linetype = "dashed", color = "red")+
  geom_hline(yintercept = 71, linetype = "dashed", color = "purple")+
  scale_x_date(name = "Date") +
  facet_wrap(~ site, ncol = 1) +
  labs(subtitle = "Amber dashed = Moderate (36 - 53 µg/m³); 
Red dashed = High (54 - 70 µg/m³); Purple = Very High (> 71 µg/m³)", x = "Date", y = "Daily PM2.5 Concentration") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none")

ggplot(Beds2025D_PM2.5, aes(x = date, y = pm2.5, color = site, group = site)) +
  geom_line(linewidth = 1) +
  geom_hline(yintercept = 36, linetype = "dashed", color = "goldenrod")+
  geom_hline(yintercept = 54, linetype = "dashed", color = "red")+
  geom_hline(yintercept = 71, linetype = "dashed", color = "purple")+
  scale_x_date(name = "Date") +
  facet_wrap(~ site, ncol = 1) +
  labs(subtitle = "Amber dashed = Moderate (36 - 53 µg/m³); 
Red dashed = High (54 - 70 µg/m³); Purple = Very High (> 71 µg/m³)",
    x = "Date", y = "Daily PM2.5 Concentration") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none")

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_line()`).

ggplot(AirScan2025_PM2_5_padded, aes(x = date, y = value, color = site, group = site)) +
  geom_line(linewidth = 1) +
  geom_hline(yintercept = 36, linetype = "dashed", color = "goldenrod")+
  geom_hline(yintercept = 54, linetype = "dashed", color = "red")+
  geom_hline(yintercept = 71, linetype = "dashed", color = "purple")+
  scale_x_date(name = "Date") +
  facet_wrap(~ site, ncol = 1) +
  labs(subtitle = "Amber dashed = Moderate (36 - 53 µg/m³); 
Red dashed = High (54 - 70 µg/m³); Purple = Very High (> 71 µg/m³)", x = "Date", y = "Daily PM2.5 Concentration") +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "none"  
  )

## Warning: Removed 249 rows containing missing values or values outside the scale range
## (`geom_line()`).

### Annual Mean PM₂.₅ Concentrations

In addition to the daily‑mean datasets, a separate annual mean dataset with full metadata was downloaded. This dataset includes: - annual mean PM₂.₅ concentrations - data‑capture percentages - site type and classification

Because these values are already calculated and quality‑assured by the data provider, no additional annual‑mean computation was required. Instead, the analysis focused on: - filtering sites that met the ≥75% data‑capture threshold - identifying sites requiring annualisation - excluding sites with insufficient data - comparing monitored annual means with Defra’s modelled background values

This approach ensures consistency with Defra’s LAQM TG22 guidance and aligns with the methodology used in the main report.

Comparison with Defra Background Maps

Using the imported annual‑mean dataset, each site’s monitored concentration was compared with the corresponding modelled background value from Defra’s 1 km grid. Background values were extracted manually from the published dataset and entered into a small lookup table (tribble) for reproducibility.Background Mapping data for local authorities

This comparison helps distinguish between: - regional background contributions, which dominate in most locations - localised influences, such as roadside emissions or nearby combustion sources

In the main report, this analysis showed that most monitored values were slightly below the modelled background, suggesting that diffuse regional sources remain the primary driver of PM₂.₅ in Hertfordshire and Bedfordshire.

Background_2025 <- read_csv("Background_2025.csv")

## Rows: 8 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Site
## dbl (4): Monitored_2024, Background_2024, Monitored_2025, Background_2025
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Background_2025_Long <- Background_2025 |> 
   pivot_longer(cols = c(`Monitored_2024`, `Background_2024`, `Monitored_2025`, `Background_2025`), 
                names_to = "Type", 
                values_to = "PM2.5")

 ggplot(Background_2025_Long, aes(x = Site, y = PM2.5, fill = Type)) +
   geom_bar(stat = "identity", position = position_dodge(width = 0.8)) +
   scale_fill_manual(values = c("Monitored_2024" = "forestgreen", "Background_2024" = "blue", "Monitored_2025" = "forestgreen", "Background_2025" = "blue")) +
   geom_hline(yintercept = 10, linetype = "dashed", color = "blue") +
   geom_hline(yintercept = 5, linetype = "dashed", color = "red") +
   annotate("text", x = 1, y = 10.2, label = "UK Air Quality Objective = 10 µg/m³ by 2040",
            hjust = 0, color = "blue", size = 3.5) +
   annotate("text", x = 1, y = 5.2, label = "WHO Limit Value = 5 µg/m³",
            hjust = 0, color = "red", size = 3.5) +
   labs(x = "Monitoring Site", y = "PM2.5 (µg/m³)", fill = "PM 2.5 Concentration") +
   theme_minimal() +
   theme(axis.text.x = element_text(angle = 45, hjust = 1))

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_bar()`).

Temporal Variation in PM₂.₅ Concentrations

Although the main datasets used in this workflow were downloaded as daily mean values, the temporal‑variation analysis required hourly resolution. To support this, a separate hourly dataset was downloaded directly from the Air Quality England database using the openair package.

Hourly data is essential for producing the following plots: - monthly variation - weekday variation - hourly variation - combined hour–day plots

These plots reveal patterns that cannot be seen in daily averages — such as rush‑hour bumps, evening peaks linked to domestic burning, and seasonal shifts in pollutant behaviour.

The hourly dataset was used exclusively for these temporal analyses, while the daily‑mean and annual‑mean datasets were used for all other components of the workflow. This ensures that each analysis is based on the most appropriate level of temporal detail. These were used to compare local patterns with national AURN trends.

TimeVariationPlotHerts <- timeVariation(AllAQEH, pollutant = c("pm2.5", "no2"), plot = FALSE)


TimeVariationPlotHerts$data$subsets

## [1] "hour.weekday" "hour"         "month"        "weekday"

plot(TimeVariationPlotHerts, subset = "hour")

plot(TimeVariationPlotHerts, subset = "hour.weekday")

plot(TimeVariationPlotHerts, subset = "month")

plot(TimeVariationPlotHerts, subset = "weekday")

Long‑Term Trends (2016–2024)

Long‑term trends were assessed using the annual mean dataset, which contains published annual averages for each site across multiple years. Because these values already incorporate data‑capture thresholds and quality assurance, the analysis focused on: - selecting sites with ≥75% data capture for each year - calculating the regional average annual mean - plotting site‑level deviations from the regional mean

This method ensures that the trend analysis is based on consistent, validated annual statistics rather than reconstructed values. In 2024, the spread of site‑level deviations narrowed noticeably — a sign of increasing consistency across the network.

ggplot(County_annual_mean, aes(x = year, y = mean_pm25, colour = Geo_County)) +
  geom_line(linewidth = 1) +
  geom_point(size = 2) +
  geom_hline(yintercept = 10, linetype = "dashed", color = "red") +
  geom_hline(yintercept = 12, linetype = "dashed", color = "yellow") +
  labs(
    subtitle = "Red dashed = Annual Mean Concentration Target (10 µg/m³); 
Yellow dashed = Interim Annual Mean Concentration Target (12 µg/m³)",
    y = "PM2.5 (µg/m³)",
    x = NULL,
    colour = "County"
  ) +
  theme_minimal(base_size = 12)

Site_dev <- AllAQE75 %>%
  rename(year = date) %>%
  left_join(County_annual_mean, by = c("Geo_County", "year")) %>%
  mutate(deviation = pm2.5 - mean_pm25)

heatmap_plot <- ggplot(Site_dev, aes(x = year, y = site, fill = deviation)) +
  geom_tile(color = "white") +
  scale_fill_gradient2(
    low = "blue",
    mid = "white",
    high = "red",
    midpoint = 0,
    name = "Deviation\n(µg/m³)"
  ) +
  labs(
    x = NULL,
    y = "Site",
    title = "Deviation from County Annual Mean PM2.5"
  ) +
  theme_minimal(base_size = 12)

Theil–Sen method

Theil–Sen trend analysis was undertaken for annual mean PM₂.₅ concentrations across all monitoring sites from 2016 to 2025. Four sites demonstrated statistically significant downward trends: East Herts Hertford Gascoigne Way (−0.43 µg/m³/yr), Hertsmere Borehamwood Roadside (−0.36 µg/m³/yr), Luton Airport FutureLuToN (−0.55 µg/m³/yr), and Welwyn Hatfield (−0.52 µg/m³/yr). At these locations, the 95% confidence intervals did not cross zero, indicating robust evidence of year‑on‑year reductions in PM₂.₅.

The remaining sites showed no statistically significant trend, with confidence intervals spanning zero, suggesting stable concentrations over the analysis period. Two sites (Dacorum London Road and Luton A505 Roadside) exhibited very wide confidence intervals, indicating insufficient data or high variability, and therefore no reliable trend could be inferred.

At the county level, Hertfordshire shows a consistent pattern of improvement, with several sites exhibiting significant downward trends and none showing significant increases. Bedfordshire shows mixed results, with one significant downward trend and other sites remaining stable. Overall, the regional evidence indicates a general reduction in PM₂.₅ concentrations over the past decade.

HBAQE <-
  importUKAQ(
    site = c("HB001","HB009","HB013","HB012","HB017","HB018","BDMP","LA001","HB007","LUTR"),
    year = 2016:2025,
  )

## Importing AQ Data ■■■ 7% | ETA: 15sImporting AQ Data ■■■ 8% | ETA: 15sImporting
## AQ Data ■■■■ 9% | ETA: 16sImporting AQ Data ■■■■ 10% | ETA: 16sImporting AQ
## Data ■■■■ 11% | ETA: 19sImporting AQ Data ■■■■■ 12% | ETA: 22sImporting AQ Data
## ■■■■■ 13% | ETA: 24sImporting AQ Data ■■■■■ 14% | ETA: 25sImporting AQ Data
## ■■■■■ 15% | ETA: 25sImporting AQ Data ■■■■■■ 16% | ETA: 24sImporting AQ Data
## ■■■■■■ 17% | ETA: 25sImporting AQ Data ■■■■■■ 18% | ETA: 25sImporting AQ Data
## ■■■■■■■ 19% | ETA: 25sImporting AQ Data ■■■■■■■ 20% | ETA: 25sImporting AQ Data
## ■■■■■■■ 21% | ETA: 25sImporting AQ Data ■■■■■■■■ 22% | ETA: 24sImporting AQ
## Data ■■■■■■■■ 23% | ETA: 24sImporting AQ Data ■■■■■■■■ 24% | ETA: 24sImporting
## AQ Data ■■■■■■■■■ 25% | ETA: 24sImporting AQ Data ■■■■■■■■■ 26% | ETA:
## 24sImporting AQ Data ■■■■■■■■■ 27% | ETA: 24sImporting AQ Data ■■■■■■■■■ 28% |
## ETA: 23sImporting AQ Data ■■■■■■■■■■ 29% | ETA: 23sImporting AQ Data ■■■■■■■■■■
## 30% | ETA: 22sImporting AQ Data ■■■■■■■■■■ 31% | ETA: 22sImporting AQ Data
## ■■■■■■■■■■■ 32% | ETA: 22sImporting AQ Data ■■■■■■■■■■■ 33% | ETA: 22sImporting
## AQ Data ■■■■■■■■■■■ 34% | ETA: 21sImporting AQ Data ■■■■■■■■■■■ 35% | ETA:
## 21sImporting AQ Data ■■■■■■■■■■■■ 36% | ETA: 21sImporting AQ Data ■■■■■■■■■■■■
## 37% | ETA: 20sImporting AQ Data ■■■■■■■■■■■■ 38% | ETA: 20sImporting AQ Data
## ■■■■■■■■■■■■■ 39% | ETA: 20sImporting AQ Data ■■■■■■■■■■■■■ 40% | ETA:
## 19sImporting AQ Data ■■■■■■■■■■■■■ 41% | ETA: 19sImporting AQ Data
## ■■■■■■■■■■■■■■ 42% | ETA: 19sImporting AQ Data ■■■■■■■■■■■■■■ 43% | ETA:
## 18sImporting AQ Data ■■■■■■■■■■■■■■ 44% | ETA: 18sImporting AQ Data
## ■■■■■■■■■■■■■■■ 45% | ETA: 18sImporting AQ Data ■■■■■■■■■■■■■■■ 46% | ETA:
## 18sImporting AQ Data ■■■■■■■■■■■■■■■ 47% | ETA: 17sImporting AQ Data
## ■■■■■■■■■■■■■■■ 48% | ETA: 17sImporting AQ Data ■■■■■■■■■■■■■■■■ 49% | ETA:
## 17sImporting AQ Data ■■■■■■■■■■■■■■■■ 50% | ETA: 16sImporting AQ Data
## ■■■■■■■■■■■■■■■■ 51% | ETA: 16sImporting AQ Data ■■■■■■■■■■■■■■■■■ 52% | ETA:
## 16sImporting AQ Data ■■■■■■■■■■■■■■■■■ 53% | ETA: 15sImporting AQ Data
## ■■■■■■■■■■■■■■■■■ 54% | ETA: 15sImporting AQ Data ■■■■■■■■■■■■■■■■■ 55% | ETA:
## 15sImporting AQ Data ■■■■■■■■■■■■■■■■■■ 56% | ETA: 14sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■ 57% | ETA: 14sImporting AQ Data ■■■■■■■■■■■■■■■■■■ 58% |
## ETA: 14sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 59% | ETA: 13sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■ 60% | ETA: 13sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 61% |
## ETA: 13sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■ 62% | ETA: 12sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■ 63% | ETA: 12sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■ 64% |
## ETA: 12sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■ 65% | ETA: 11sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■ 66% | ETA: 11sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■ 67%
## | ETA: 11sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■ 68% | ETA: 10sImporting AQ
## Data ■■■■■■■■■■■■■■■■■■■■■■ 69% | ETA: 10sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■ 70% | ETA: 10sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■
## 71% | ETA: 9sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■ 72% | ETA: 9sImporting
## AQ Data ■■■■■■■■■■■■■■■■■■■■■■■ 73% | ETA: 9sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■ 74% | ETA: 8sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■
## 75% | ETA: 8sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■■ 76% | ETA: 8sImporting
## AQ Data ■■■■■■■■■■■■■■■■■■■■■■■■ 77% | ETA: 7sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■ 78% | ETA: 7sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 79% | ETA: 7sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% | ETA: 6sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 81% | ETA: 6sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 82% | ETA: 6sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 83% | ETA: 6sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 84% | ETA: 5sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 85% | ETA: 5sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 86% | ETA: 5sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 87% | ETA: 4sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 88% | ETA: 4sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 89% | ETA: 4sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 3sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 92% | ETA: 3sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 94% | ETA: 2sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 95% | ETA: 2sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 96% | ETA: 1sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 98% | ETA: 1sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 99% | ETA: 0s

TheilSen_Sites <- c(
  "Dacorum Northchurch High Street",
  "East Herts Hertford Gascoyne Way",
  "Hertsmere Borehamwood Roadside",   
  "Welwyn Hatfield"
)


# Run Theil–Sen only on those sites


TheilSen(
  HBAQE %>% filter(site %in% TheilSen_Sites),
  pollutant = "pm2.5",
  type = "site",
  ylab = "PM2.5 (ug/m3)",
  deseason = TRUE,
  date.format = "%Y"
)

## ! 3 missing monthly values found in 'pm2.5'.
## ℹ Gaps filled using a linear model fitted per calendar month before
##   deseasonalising.
## ! 2 missing monthly values found in 'pm2.5'.
## ℹ Gaps filled using a linear model fitted per calendar month before
##   deseasonalising.
## ! 22 missing monthly values found in 'pm2.5'.
## ℹ Gaps filled using a linear model fitted per calendar month before
##   deseasonalising.
## ! 1 missing monthly value found in 'pm2.5'.
## ℹ Gap filled using a linear model fitted per calendar month before
##   deseasonalising.

Days with Moderate or Higher Pollution (DAQI)

Daily means were used for DAQI classification, but the annual mean dataset provided useful metadata for interpreting these results — particularly data‑capture completeness and site classification. Because the daily‑mean dataset was already provided in daily format, DAQI categories could be assigned directly without additional aggregation.

Calendar plots were produced for each year to visualise the frequency and timing of moderate‑or‑above pollution days.

HB2025D <- importUKAQ(
  site = c("BDMP",  "HB001", "HB007", "HB009", "HB011", "HB012", "HB016", "HB017", "HB018", "LA001", "LUTR"), year = 2025, data_type = "daily", pollutant = "pm2.5", meta = TRUE)

## Importing AQ Data ■■■■■■■■■■■■ 36% | ETA: 2sImporting AQ Data ■■■■■■■■■■■■■■■
## 45% | ETA: 3sImporting AQ Data ■■■■■■■■■■■■■■■■■ 55% | ETA: 2sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■ 64% | ETA: 2sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■ 73%
## | ETA: 1sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■■■■ 82% | ETA: 1sImporting AQ
## Data ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 91% | ETA: 0s

## the labels - same for all species
labels <- c(
  "1 - Low",
  "2 - Low",
  "3 - Low",
  "4 - Moderate",
  "5 - Moderate",
  "6 - Moderate",
  "7 - High",
  "8 - High",
  "9 - High",
  "10 - Very High"
)

pm25.breaks <- c(0, 12, 24, 35, 42, 47, 53, 59, 65, 70, 1000)

# make calendar plot
calendarPlot(
  HB2025D,
  pollutant = "pm2.5",
  year = 2025,
  month = NULL,
    statistic = "median",
    cols = "daqi",
   labels = labels,
  breaks = pm25.breaks,
  )

## Warning: ! Duplicate dates detected in mydata$date.
## ℹ Are there multiple sites in `mydata`? Use the type argument to condition them
##   separately.

DAQI Summary Tables

herts_codes <- c("HB001","HB009","HB013","HB012","HB017","HB018","BDMP")

beds_codes  <- c("LA001","HB007","LUTR")

all_codes <- c("HB001","HB009","HB013","HB012","HB017","HB018","BDMP","LA001","HB007","LUTR" )

daqi_2025 <- importUKAQ(
  year = 2025,
  source = c("aqe", "aurn"),
  data_type = "daqi",
  meta = TRUE
)

## Importing DAQI ■■■■■■■■■■■■■■■■ 50% | ETA: 1s

HBDAQI <- daqi_2025 |> 
  dplyr::filter(code %in% all_codes)|>
  filter(pollutant == "pm2.5")

HBDAQI <- HBDAQI |>
  mutate(region = case_when(
    code %in% herts_codes ~ "Hertfordshire",
    code %in% beds_codes  ~ "Bedfordshire",
    TRUE ~ "Other"
  ))


daqi_site_summary <- HBDAQI |>
  group_by(region, site, band) |>
  summarise(days = n(), .groups = "drop_last") |>
  mutate(percent = round(100 * days / sum(days), 1)) |>
  ungroup()


daqi_site_summary <- daqi_site_summary |> mutate(region = factor(region, levels = c("Hertfordshire", "Bedfordshire"))) |>
  arrange(region, site)


daqi_site_summary_wide <- daqi_site_summary |>
  select(region, site, band, percent) |>
  pivot_wider(
    names_from = band,
    values_from = percent,
    values_fill = 0
  ) |>
  arrange(region)

daqi_site_summary_days <- HBDAQI |>
  group_by(region, site, band) |>
  summarise(days = n(), .groups = "drop_last") |>
  mutate(percent = round(100 * days / sum(days), 1)) |>
  ungroup()


daqi_site_summary_days <- daqi_site_summary_days |> mutate(region = factor(region, levels = c("Hertfordshire", "Bedfordshire"))) |>
  arrange(region, site)


daqi_site_summary_wide_days <- daqi_site_summary |>
  select(region, site, band, days) |>
  pivot_wider(
    names_from = band,
    values_from = days,
    values_fill = 0
  ) |>
  arrange(region)

Impact of removing all daily peaks above moderate

# Set the Moderate threshold for PM2.5
moderate_threshold <- 36

impact_results <- HB2025D %>%
  filter("pm2.5" > moderate_threshold) %>%
  group_by(site) %>%
  summarise(
    original_mean = mean(pm2.5, na.rm = TRUE),
    filtered_mean = mean(pm2.5[pm2.5 <= moderate_threshold], na.rm = TRUE),
    difference = original_mean - filtered_mean,
    percent_change = (difference / original_mean) * 100
  )

## horizontal bar chart showing percent change
ggplot(impact_results, aes(
  x = percent_change,
  y = reorder(site, percent_change)
)) +
  geom_col(fill = "steelblue") +
  geom_text(aes(label = sprintf("%.1f%%", percent_change)),
            hjust = -0.1, size = 3.5) +
  scale_x_continuous(expand = expansion(mult = c(0, 0.1))) +
  labs(
    x = "Percent reduction in annual mean after removing peaks >36 µg/m³",
    y = "",
    title = "Impact of Removing High PM2.5 Peaks on Annual Mean"
  ) +
  theme_minimal()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_col()`).

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_text()`).

## Before vs After paired dot‑plot

df_long <- impact_results |>
  pivot_longer(cols = c(original_mean, filtered_mean),
               names_to = "type", values_to = "value")

ggplot(df_long, aes(
  x = value,
  y = reorder(site, type),
  colour = type
)) +
  geom_point(size = 3) +
  geom_line(aes(group = site), colour = "grey60") +
  labs(
    x = "Annual mean PM2.5 (µg/m³)",
    y = "",
    title = "Before vs After Removing Peaks >36 µg/m³"
  ) +
  theme_minimal()

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA
## Warning in mean.default(X[[i]], ...): argument is not numeric or logical:
## returning NA

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_point()`).

## Warning: Removed 4 rows containing missing values or values outside the scale range
## (`geom_line()`).

## Difference bar chart (absolute µg/m³ change)

ggplot(impact_results, aes(
  x = difference,
  y = reorder(site, difference)
)) +
  geom_col(fill = "darkorange") +
  geom_text(aes(label = sprintf("%.2f", difference)),
            hjust = -0.1) +
  labs(
    x = "Reduction in annual mean (µg/m³)",
    y = "",
    title = "Absolute Reduction After Removing Peaks >36 µg/m³"
  ) +
  theme_minimal()

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_col()`).

## Warning: Removed 2 rows containing missing values or values outside the scale range
## (`geom_text()`).

##Meta Data Table

Herts_AllSites <- importUKAQ(site = c(
  "HB102","HB104","HB105","HB018","HB013","HB106","HB107","HB012",
  "HB108","HB109","HB110","HB111","HB016","HB112","HB008","HB014",
  "HB009","HB002","HB003","HB117","HB118","HB119","HB120","HB121", "BDMP"), 
  year = 2016:2025, pollutant = "pm2.5", meta = TRUE)

## Importing AQ Data ■ 2% | ETA: 1mImporting AQ Data ■■ 2% | ETA: 1mImporting AQ
## Data ■■ 2% | ETA: 1mImporting AQ Data ■■ 3% | ETA: 2mImporting AQ Data ■■ 3% |
## ETA: 2mImporting AQ Data ■■ 4% | ETA: 2mImporting AQ Data ■■ 4% | ETA:
## 2mImporting AQ Data ■■ 4% | ETA: 2mImporting AQ Data ■■ 5% | ETA: 2mImporting
## AQ Data ■■■ 5% | ETA: 2mImporting AQ Data ■■■ 6% | ETA: 2mImporting AQ Data ■■■
## 6% | ETA: 2mImporting AQ Data ■■■ 6% | ETA: 2mImporting AQ Data ■■■ 7% | ETA:
## 2mImporting AQ Data ■■■ 7% | ETA: 2mImporting AQ Data ■■■ 8% | ETA: 2mImporting
## AQ Data ■■■ 8% | ETA: 2mImporting AQ Data ■■■■ 8% | ETA: 2mImporting AQ Data
## ■■■■ 9% | ETA: 2mImporting AQ Data ■■■■ 9% | ETA: 2mImporting AQ Data ■■■■ 10%
## | ETA: 2mImporting AQ Data ■■■■ 10% | ETA: 2mImporting AQ Data ■■■■ 10% | ETA:
## 2mImporting AQ Data ■■■■ 11% | ETA: 2mImporting AQ Data ■■■■ 11% | ETA:
## 2mImporting AQ Data ■■■■ 12% | ETA: 2mImporting AQ Data ■■■■■ 12% | ETA:
## 2mImporting AQ Data ■■■■■ 12% | ETA: 2mImporting AQ Data ■■■■■ 13% | ETA:
## 2mImporting AQ Data ■■■■■ 13% | ETA: 2mImporting AQ Data ■■■■■ 14% | ETA:
## 2mImporting AQ Data ■■■■■ 14% | ETA: 2mImporting AQ Data ■■■■■ 14% | ETA:
## 2mImporting AQ Data ■■■■■ 15% | ETA: 2mImporting AQ Data ■■■■■■ 15% | ETA:
## 2mImporting AQ Data ■■■■■■ 16% | ETA: 2mImporting AQ Data ■■■■■■ 16% | ETA:
## 2mImporting AQ Data ■■■■■■ 16% | ETA: 2mImporting AQ Data ■■■■■■ 17% | ETA:
## 2mImporting AQ Data ■■■■■■ 17% | ETA: 2mImporting AQ Data ■■■■■■ 18% | ETA:
## 2mImporting AQ Data ■■■■■■ 18% | ETA: 2mImporting AQ Data ■■■■■■■ 18% | ETA:
## 2mImporting AQ Data ■■■■■■■ 19% | ETA: 2mImporting AQ Data ■■■■■■■ 19% | ETA:
## 2mImporting AQ Data ■■■■■■■ 20% | ETA: 2mImporting AQ Data ■■■■■■■ 20% | ETA:
## 2mImporting AQ Data ■■■■■■■ 20% | ETA: 2mImporting AQ Data ■■■■■■■ 21% | ETA:
## 2mImporting AQ Data ■■■■■■■ 21% | ETA: 2mImporting AQ Data ■■■■■■■ 22% | ETA:
## 2mImporting AQ Data ■■■■■■■■ 22% | ETA: 2mImporting AQ Data ■■■■■■■■ 22% | ETA:
## 2mImporting AQ Data ■■■■■■■■ 23% | ETA: 2mImporting AQ Data ■■■■■■■■ 23% | ETA:
## 1mImporting AQ Data ■■■■■■■■ 24% | ETA: 1mImporting AQ Data ■■■■■■■■ 24% | ETA:
## 1mImporting AQ Data ■■■■■■■■ 24% | ETA: 1mImporting AQ Data ■■■■■■■■ 25% | ETA:
## 1mImporting AQ Data ■■■■■■■■■ 25% | ETA: 1mImporting AQ Data ■■■■■■■■■ 26% |
## ETA: 1mImporting AQ Data ■■■■■■■■■ 26% | ETA: 1mImporting AQ Data ■■■■■■■■■ 26%
## | ETA: 1mImporting AQ Data ■■■■■■■■■ 27% | ETA: 1mImporting AQ Data ■■■■■■■■■
## 27% | ETA: 1mImporting AQ Data ■■■■■■■■■ 28% | ETA: 1mImporting AQ Data
## ■■■■■■■■■ 28% | ETA: 1mImporting AQ Data ■■■■■■■■■■ 28% | ETA: 1mImporting AQ
## Data ■■■■■■■■■■ 29% | ETA: 1mImporting AQ Data ■■■■■■■■■■ 29% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■ 30% | ETA: 1mImporting AQ Data ■■■■■■■■■■ 30% |
## ETA: 1mImporting AQ Data ■■■■■■■■■■ 30% | ETA: 1mImporting AQ Data ■■■■■■■■■■
## 31% | ETA: 1mImporting AQ Data ■■■■■■■■■■ 31% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■ 32% | ETA: 1mImporting AQ Data ■■■■■■■■■■■ 32% | ETA: 1mImporting AQ
## Data ■■■■■■■■■■■ 32% | ETA: 1mImporting AQ Data ■■■■■■■■■■■ 33% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■ 33% | ETA: 1mImporting AQ Data ■■■■■■■■■■■ 34%
## | ETA: 1mImporting AQ Data ■■■■■■■■■■■ 34% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■ 34% | ETA: 1mImporting AQ Data ■■■■■■■■■■■ 35% | ETA: 1mImporting
## AQ Data ■■■■■■■■■■■■ 35% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■ 36% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■ 36% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■
## 36% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■ 37% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■ 37% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■ 38% | ETA: 1mImporting
## AQ Data ■■■■■■■■■■■■ 38% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■ 38% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■ 39% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■
## 39% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■ 40% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■ 40% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■ 40% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■ 41% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■
## 41% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■ 42% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■ 42% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■ 42% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■ 43% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■ 43% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■ 44% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■ 44% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■ 44% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■ 45% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■ 45% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■■ 46% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■■ 46% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■ 46% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■■ 47% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■■ 47% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■ 48% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■■ 48% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■■■ 48% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■■ 49% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■■■ 49% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■■■ 50% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■■ 50% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■■■ 50% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■■■ 51% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■■ 51% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■■■ 52% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■■■■ 52% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■■■ 52% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■■■■ 53% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■■■■ 53% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■■■ 54% | ETA: 1mImporting AQ Data
## ■■■■■■■■■■■■■■■■■ 54% | ETA: 1mImporting AQ Data ■■■■■■■■■■■■■■■■■ 54% | ETA:
## 1mImporting AQ Data ■■■■■■■■■■■■■■■■■ 55% | ETA: 50sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■ 55% | ETA: 49sImporting AQ Data ■■■■■■■■■■■■■■■■■■ 56% |
## ETA: 49sImporting AQ Data ■■■■■■■■■■■■■■■■■■ 56% | ETA: 48sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■ 56% | ETA: 48sImporting AQ Data ■■■■■■■■■■■■■■■■■■ 57% |
## ETA: 47sImporting AQ Data ■■■■■■■■■■■■■■■■■■ 57% | ETA: 47sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■ 58% | ETA: 46sImporting AQ Data ■■■■■■■■■■■■■■■■■■ 58% |
## ETA: 45sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 58% | ETA: 45sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■ 59% | ETA: 44sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 59% |
## ETA: 44sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 60% | ETA: 43sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■ 60% | ETA: 43sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 60% |
## ETA: 42sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 61% | ETA: 42sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■ 61% | ETA: 42sImporting AQ Data ■■■■■■■■■■■■■■■■■■■ 62% |
## ETA: 41sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■ 62% | ETA: 41sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■ 62% | ETA: 40sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■ 63% |
## ETA: 40sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■ 63% | ETA: 39sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■ 64% | ETA: 38sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■ 64% |
## ETA: 38sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■ 64% | ETA: 38sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■ 65% | ETA: 37sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■ 65%
## | ETA: 37sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■ 66% | ETA: 36sImporting AQ
## Data ■■■■■■■■■■■■■■■■■■■■■ 66% | ETA: 36sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■ 66% | ETA: 35sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■ 67%
## | ETA: 35sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■ 67% | ETA: 34sImporting AQ
## Data ■■■■■■■■■■■■■■■■■■■■■ 68% | ETA: 34sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■ 68% | ETA: 33sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■
## 68% | ETA: 33sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■ 69% | ETA: 33sImporting
## AQ Data ■■■■■■■■■■■■■■■■■■■■■■ 69% | ETA: 32sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■ 70% | ETA: 32sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■
## 70% | ETA: 31sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■ 70% | ETA: 31sImporting
## AQ Data ■■■■■■■■■■■■■■■■■■■■■■ 71% | ETA: 30sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■ 71% | ETA: 30sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■
## 72% | ETA: 30sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■ 72% | ETA: 29sImporting
## AQ Data ■■■■■■■■■■■■■■■■■■■■■■■ 72% | ETA: 29sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■ 73% | ETA: 28sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■
## 73% | ETA: 28sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■ 74% | ETA: 27sImporting
## AQ Data ■■■■■■■■■■■■■■■■■■■■■■■ 74% | ETA: 27sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■ 74% | ETA: 26sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■
## 75% | ETA: 26sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■■ 75% | ETA:
## 25sImporting AQ Data ■■■■■■■■■■■■■■■■■■■■■■■■ 76% | ETA: 25sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■ 76% | ETA: 24sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■ 76% | ETA: 24sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■ 77% | ETA: 24sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■ 77% | ETA: 23sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■ 78% | ETA: 23sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■ 78% | ETA: 22sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 78% | ETA: 22sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 79% | ETA: 21sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 79% | ETA: 21sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% | ETA: 21sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% | ETA: 20sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 80% | ETA: 20sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 81% | ETA: 19sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 81% | ETA: 19sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■ 82% | ETA: 19sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 82% | ETA: 18sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 82% | ETA: 18sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 83% | ETA: 17sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 83% | ETA: 17sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 84% | ETA: 17sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 84% | ETA: 16sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 84% | ETA: 16sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■ 85% | ETA: 15sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 85% | ETA: 15sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 86% | ETA: 14sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 86% | ETA: 14sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 86% | ETA: 14sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 87% | ETA: 13sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 87% | ETA: 13sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 88% | ETA: 12sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■ 88% | ETA: 12sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 88% | ETA: 12sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 89% | ETA: 11sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 89% | ETA: 11sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 10sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 10sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 90% | ETA: 10sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 91% | ETA: 9sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 91% | ETA: 9sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 92% | ETA: 8sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 92% | ETA: 8sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 92% | ETA: 8sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 93% | ETA: 7sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 93% | ETA: 7sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 94% | ETA: 6sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 94% | ETA: 6sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 94% | ETA: 6sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 95% | ETA: 5sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 95% | ETA: 5sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 96% | ETA: 4sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 96% | ETA: 4sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 96% | ETA: 4sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 97% | ETA: 3sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 97% | ETA: 3sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 98% | ETA: 2sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 98% | ETA: 2sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 98% | ETA: 2sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 99% | ETA: 1sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 99% | ETA: 1sImporting AQ Data
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 100% | ETA: 0s

str(Herts_AllSites)

## tibble [552,424 × 11] (S3: tbl_df/tbl/data.frame)
##  $ source   : chr [1:552424] "aurn" "aurn" "aurn" "aurn" ...
##  $ date     : POSIXct[1:552424], format: "2017-10-01 00:00:00" "2017-10-01 01:00:00" ...
##  $ site     : chr [1:552424] "Borehamwood Meadow Park" "Borehamwood Meadow Park" "Borehamwood Meadow Park" "Borehamwood Meadow Park" ...
##  $ code     : chr [1:552424] "BDMP" "BDMP" "BDMP" "BDMP" ...
##  $ ws       : num [1:552424] NA NA NA NA NA NA NA NA NA NA ...
##  $ wd       : num [1:552424] NA NA NA NA NA NA NA NA NA NA ...
##  $ air_temp : num [1:552424] NA NA NA NA NA NA NA NA NA NA ...
##  $ pm2.5    : num [1:552424] NA NA NA NA NA NA NA NA NA NA ...
##  $ site_type: chr [1:552424] "Urban Background" "Urban Background" "Urban Background" "Urban Background" ...
##  $ latitude : num [1:552424] 51.7 51.7 51.7 51.7 51.7 ...
##  $ longitude: num [1:552424] -0.271 -0.271 -0.271 -0.271 -0.271 ...

HertsAllSitesAnnual <- importUKAQ(site = c(
  "HB102","HB104","HB105","HB018","HB013","HB106","HB107","HB012",
  "HB108","HB109","HB110","HB111","HB016","HB112","HB008","HB014",
  "HB009","HB002","HB003","HB117","HB118","HB119","HB120","HB121", "BDMP"), 
  year = 2016:2024, data_type = "annual", source = c("aqe", "aurn"), pollutant = "pm2.5", meta = TRUE)

## Importing Statistics ■■■■■■ 17% | ETA: 6sImporting Statistics ■■■■■■■■ 22% |
## ETA: 5sImporting Statistics ■■■■■■■■■ 28% | ETA: 5sImporting Statistics
## ■■■■■■■■■■■ 33% | ETA: 5sImporting Statistics ■■■■■■■■■■■■■ 39% | ETA:
## 4sImporting Statistics ■■■■■■■■■■■■■■ 44% | ETA: 4sImporting Statistics
## ■■■■■■■■■■■■■■■■ 50% | ETA: 3sImporting Statistics ■■■■■■■■■■■■■■■■■■ 56% |
## ETA: 3sImporting Statistics ■■■■■■■■■■■■■■■■■■■ 61% | ETA: 2sImporting
## Statistics ■■■■■■■■■■■■■■■■■■■■■ 67% | ETA: 2sImporting Statistics
## ■■■■■■■■■■■■■■■■■■■■■■■■ 78% | ETA: 1sImporting Statistics
## ■■■■■■■■■■■■■■■■■■■■■■■■■■■■ 89% | ETA: 1s

site_dates <- Herts_AllSites %>%
  group_by(code, site) %>%
  summarise(
    date_opened = min(date, na.rm = TRUE),
    date_closed = max(date, na.rm = TRUE),
    .groups = "drop"
  )
site_meta <- HertsAllSitesAnnual %>%
  select(code, site, site_type, latitude, longitude, pm2.5_capture) %>%
  distinct()
site_meta <- site_meta %>%
  rename(capture_pct = pm2.5_capture)
district_lookup <- tibble::tribble(
  ~code, ~District,
  "HB002", "North Herts",
  "HB003", "North Herts",
  "HB008", "Hertsmere",
  "HB009", "Hertsmere",
  "HB012", "Dacorum",
  "HB013", "Dacorum",
  "HB014", "Dacorum",
  "HB016", "East Herts",
  "HB018", "East Herts",
  "HB102", "Welwyn Hatfield",
  "HB104", "Welwyn Hatfield",
  "HB105", "Welwyn Hatfield",
  "HB106", "Welwyn Hatfield",
  "HB107", "Welwyn Hatfield",
  "HB108", "Welwyn Hatfield",
  "HB109", "Welwyn Hatfield",
  "HB110", "Welwyn Hatfield",
  "HB111", "Welwyn Hatfield",
  "HB112", "Welwyn Hatfield",
  "HB117", "Stevenage",
  "HB118", "Stevenage",
  "HB119", "Stevenage",
  "HB120", "Stevenage",
  "HB121", "Stevenage",
  "BDMP",  "Hertsmere"
)
network_lookup <- HertsAllSitesAnnual %>%
  select(code, source) %>%
  distinct() %>%
  mutate(Network = ifelse(source == "aurn", "AURN", "AQE")) %>%
  select(code, Network)

status <- site_dates %>%
  mutate(
    Status = ifelse(date_closed >= Sys.Date() - 30, "Active", "Closed")
  ) %>%
  select(code, Status)

## Warning: There was 1 warning in `mutate()`.
## ℹ In argument: `Status = ifelse(date_closed >= Sys.Date() - 30, "Active",
##   "Closed")`.
## Caused by warning in `check_tzones()`:
## ! 'tzone' attributes are inconsistent

Herts_Metadata_Enhanced <- site_meta %>%
  left_join(site_dates, by = c("code", "site")) %>%
  left_join(district_lookup, by = "code") %>%
  left_join(network_lookup, by = "code") %>%
  left_join(status, by = "code") %>%
  mutate(
    date_opened = as.Date(date_opened),
    date_closed = as.Date(date_closed),
    capture_pct = round(capture_pct * 100, 1)   # convert to %
  ) %>%
  arrange(District, site_type, site) %>%
  select(
    District,
    Code = code,
    Site = site,
    `Site Type` = site_type,
    Network,
    Latitude = latitude,
    Longitude = longitude,
    `Date Opened` = date_opened,
    `Date Closed` = date_closed,
    `Data Capture (%)` = capture_pct,
    Status
  )

Directional Analysis

Download met data from Luton airport.

library(worldmet)

luton_met <- import_ghcn_hourly("UKI0000EGGW", year = 2016:2025)

luton_met_2025 <- import_ghcn_hourly("UKI0000EGGW", year = 2025)

luton_met_small <- luton_met_2025 |>
  select(date, ws, wd, air_temp, rh)

Join to Hourly Air Quality Data

AllAQE_met <- left_join(
  select(AllAQEH, -ws, -wd, -air_temp),
  luton_met_small,
  by = "date"
)
AllAQE_met <- AllAQE_met %>%
  mutate(
    Geo_County = case_when(
      code %in% c("HB001","HB009","HB013","HB012","HB017","HB018","BDMP") ~ "Hertfordshire",
      code %in% c("LA001","HB007","LUTR") ~ "Bedfordshire",
      TRUE ~ "Unknown"
    )
  )

Create various plots

polarPlot(AllAQE_met,
  pollutant = "pm2.5",
  type = "site")

pollutionRose(AllAQE_met, pollutant = "pm2.5", type = "site")

polarPlot(
  AllAQE_met,
  pollutant = "pm2.5",
  type = "site",
  statistic = "mean",
  resolution = "fine"
)

windRose(AllAQE_met, type = "pm2.5", ncol = 4)

## Warning: ! Removing 3935 rows due to missing `pm2.5` data.

met_seasonal <- luton_met_small |>
    mutate(
   season = case_when(
            month(date) %in% c(12, 1, 2)  ~ "Winter",
             month(date) %in% c(3, 4, 5)   ~ "Spring",
             month(date) %in% c(6, 7, 8)   ~ "Summer",
             month(date) %in% c(9, 10, 11) ~ "Autumn"
           )
       )


AQE_seasonal <- AllAQE_met |>
  mutate(
    season = case_when(
      month(date) %in% c(12, 1, 2)  ~ "Winter",
      month(date) %in% c(3, 4, 5)   ~ "Spring",
      month(date) %in% c(6, 7, 8)   ~ "Summer",
      month(date) %in% c(9, 10, 11) ~ "Autumn"
    )
  )



AQE_seasonal <- AQE_seasonal |>
  mutate(
    season = factor(
      season,
      levels = c("Spring", "Summer", "Autumn", "Winter")
    )
  )



polarPlot(
  AQE_seasonal,
  pollutant = "pm2.5",
  type = c("site", "season")
)

Public Health Outcomes Framework (PHOF)

The PHOF Indicator D.01, which represents the fraction of mortality attributable to long‑term exposure to PM₂.₅, and the Air pollution: fine particulate matter (new method – concentrations of total PM₂.₅) indicator were both accessed directly in R using the Fingertips API. The D.01 indicator is calculated independently by ONS and Ricardo E&E using the underlying fine particulate matter concentration indicator. Definitions and methodological details for both indicators are available on the Fingertips Website

The analysis focused on:
- extracting PHOF values for Hertfordshire and Bedfordshire
- comparing them with the England average
- linking them to monitored and modelled PM₂.₅ concentrations

The data was accessed using the fingertipsR package, which provides an interface to the Fingertips API. This package can be installed from rOpenSci or GitHub.

library(fingertipsR)


fingertips_data_death <- read_csv("https://fingertips.phe.org.uk/api/all_data/csv/for_one_indicator?indicator_id=93861")

## Rows: 2781 Columns: 27
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): Indicator Name, Parent Code, Parent Name, Area Code, Area Name, Ar...
## dbl  (4): Indicator ID, Time period, Value, Time period Sortable
## lgl  (8): Lower CI 95.0 limit, Upper CI 95.0 limit, Lower CI 99.8 limit, Upp...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

fingertips_data_conc <- read_csv("https://fingertips.phe.org.uk/api/all_data/csv/for_one_indicator?indicator_id=93867")

## Rows: 2742 Columns: 27
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (15): Indicator Name, Parent Code, Parent Name, Area Code, Area Name, Ar...
## dbl  (4): Indicator ID, Time period, Value, Time period Sortable
## lgl  (8): Lower CI 95.0 limit, Upper CI 95.0 limit, Lower CI 99.8 limit, Upp...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

HB_conc <- fingertips_data_conc %>%
  filter(`Area Name` %in% c("Broxbourne", "Central Bedfordshire", "St Albans", "Dacorum", "England", "Hertsmere", "East Hertfordshire", "Welwyn Hatfield", "Hertfordshire", "Bedfordshire", "Luton", "Stevenage", "Three River", "Watford", "North Hertfordshire", "Broxbourne", "England")) %>%
  select(`Indicator Name`, `Area Name`, `Time period`, Value) %>%
  rename(Conc_pm2.5 = `Indicator Name`, Value_conc = Value)

HB_deaths <- fingertips_data_death %>%
  filter(`Area Name` %in% c("St Albans", "Dacorum", "Hertsmere", "East Hertfordshire", "Welwyn Hatfield", "Hertfordshire", "Bedfordshire", "Central Bedfordshire", "Luton", "Stevenage", "Three River", "Watford", "North Hertfordshire", "Broxbourne", "England")) %>%
  select(`Indicator Name`, `Area Name`, `Time period`, Value) %>%
  rename(Value_deaths = Value, Percentage_deaths = `Indicator Name`)

HB_conc_deaths <- left_join(HB_conc, HB_deaths, by = c("Area Name", "Time period"))

## Warning in left_join(HB_conc, HB_deaths, by = c("Area Name", "Time period")): Detected an unexpected many-to-many relationship between `x` and `y`.
## ℹ Row 1 of `x` matches multiple rows in `y`.
## ℹ Row 34 of `y` matches multiple rows in `x`.
## ℹ If a many-to-many relationship is expected, set `relationship =
##   "many-to-many"` to silence this warning.

ggplot(HB_conc_deaths, aes(x = `Time period`)) +
  geom_line(aes(y = Value_conc, 
                color = "Annual concentration of PM2.5 adjusted to account for population exposure"), 
            linewidth = 1) +
  geom_line(aes(y = Value_deaths, 
                color = "Estimated number of deaths attributable to PM2.5"), 
            linewidth = 1) +
  scale_y_continuous(
    name = "PM2.5 Concentration (µg/m³)",
    sec.axis = sec_axis(~ ., name = "Attributable Mortality (number of deaths)")
  ) +
  scale_color_manual(values = c(
    "Annual concentration of PM2.5 adjusted to account for population exposure" = "darkblue",
    "Estimated number of deaths attributable to PM2.5" = "firebrick"
  )) +
  facet_wrap(~`Area Name`) +
  labs(
    title = "PHOF Fine Particulate Matter Indicators",
    x = "Year",
    color = "Indicator"
  ) +
  theme_minimal() +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "bottom"
  )

Conclusion

This workflow demonstrates a transparent and reproducible approach to analysing PM₂.₅ data for Hertfordshire and Bedfordshire using R. By combining daily‑mean datasets, annual‑mean metadata, and separately downloaded hourly data, the analysis reflects the structure and limitations of the available evidence while ensuring that each analytical step uses the most appropriate level of temporal detail.

The workflow shows how reference‑grade and indicative datasets can be harmonised, validated, and explored using open‑source tools. It also documents key quality‑assurance decisions — such as excluding the faulty Hockerill Street sensor and applying Defra’s data‑capture thresholds — to ensure that the results are robust and defensible.

Although this document focuses on methodology rather than interpretation, the outputs support the findings presented in the Particulate Matter (PM₂.₅) in Ambient Air 2024 report: PM₂.₅ concentrations across the region remain low, long‑term trends continue to decline, and moderate pollution episodes are now rare. The workflow also provides a foundation for future analyses, including the Population Exposure Reduction Target (PERT) calculations that will be incorporated once the methodology is finalised.

By publishing this workflow on RPubs, the aim is to make the analytical process open, accessible, and reproducible — supporting good practice in environmental data analysis and demonstrating how R can be used to produce clear, defensible evidence for public‑health reporting.

Data Sources

Air Quality England (AQE) – Reference‑grade PM₂.₅ monitoring data (daily and hourly). (airqualityengland.co.uk)
Defra Modelled Background Maps (1 km grid) – Annual mean PM₂.₅ background concentrations. (uk-air.defra.gov.uk)
Public Health Outcomes Framework (PHOF) – Indicator D.01 (fraction of mortality attributable to PM₂.₅). (fingertips.phe.org.uk)
AirScan Multi‑Pollutant Sensors – Site‑level CSV files provided by Hertfordshire County Council.

Guidance and Methodology

Defra (2022). Local Air Quality Management Technical Guidance (LAQM TG22). Provides requirements for data capture thresholds, annualisation, and reporting standards.
Environmental Targets (Fine Particulate Matter) (England) Regulations 2023. Defines the Annual Mean Concentration Target (AMCT) and Population Exposure Reduction Target (PERT).

R Packages

openair – Tools for air‑quality data analysis and visualisation. Carslaw, D.C. & Ropkins, K. (2012). openair — An R package for air quality data analysis.
tidyverse – Data wrangling, plotting, and workflow support.
data.table – Efficient import and manipulation of large datasets.
fingertipsR – Access to Public Health England indicators.
leaflet – Interactive mapping for R, based on the Leaflet JavaScript library.
openairmaps – Mapping tools designed for air‑quality datasets, integrating seamlessly with openair

Report Referenced

Particulate Matter (PM₂.₅) in Ambient Air 202 – Hertfordshire & Bedfordshire Air Quality Forum. (Cerys Williams, 2025)

Herts PM2.5 2026 Report

Cerys Williams

2026-04-24