1 Introduction

The NHS Referral to Treatment (RTT) pathway measures patient waiting times for elective care. This project focuses on delays in referral to treatment and patterns within acute NHS trusts between April 2024 and February 2025, aiming to uncover key insights and trends.

Under the NHS Constitution, patients have the right to start treatment within 18 weeks, with a historical operational standard of 92% compliance. However, multiple challenges in recent years, including the COVID-19 pandemic backlog, staff shortages, and rising healthcare demand, have led to significant breaches of this standard.

Today, many NHS trusts operate with more realistic goals of 60–70% compliance, making breach management a critical operational priority.

This project specifically examines Incomplete Pathways and subsequent patterns across acute trusts.

For reference on data collection methodologies and system limitations, the latest NHS RTT monthly report (February 2025) is attached: England NHS RTT Statistics - February 2025 (https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2025/04/Feb25-RTT-SPN-Publication-PDF-386K-76538.pdf)

2 Data Sources and Methodology

This project draws upon two primary data sources:

The published NHS Referral to Treatment (RTT) monthly datasets, covering April 2024 to February 2025.
The ODS register of Acute NHS Trusts, manually cleaned to isolate relevant providers for analysis.

Data preprocessing involved cleaning and structuring the RTT data, targeting “Incomplete Pathways” and “Incomplete Pathways with Decision to Admit (DTA)” and additionally “Completed Pathways” for further analysis.

Several key metrics were developed:

Total backlog
% Over 18 Weeks
% Over 52 Weeks
Trust-level delay profiles

Analytical Approaches:

To explore underlying patterns in trust performance, two complementary approaches were applied. Hierarchical clustering was conducted to group trusts based on their breach rates, providing a similarity-based segmentation. Separately, a typology framework was developed, categorizing trusts by their overall treatment volumes and delay severity to create more interpretable and strategic groupings.

It is important to note several limitations. This analysis covers only an 11-month period, rather than a full multi-year view. The study is restricted to acute trusts, with community and specialist trusts excluded for consistency. Additionally, the share of patients with a decision to admit (DTA) within incomplete pathways was found to be relatively stable during the period and has been documented but not deeply modeled in this report.

3 Analytical Focus

3.1 Category-Wise Patient Volumes

Patients were analyzed across three RTT reporting categories:

Incomplete Pathways
Incomplete Pathways with Decision to Admit (DTA)
Completed Pathways (Admitted Patients)

file_paths <- list.files(
  here("data", "raw"),
  pattern = "\\.csv$",
  full.names = TRUE
)

if (length(file_paths) == 0) {
  stop("No CSV files found in 'data/raw/'. Please check the folder and try again.")
}


message(length(file_paths), " files found for processing.")

## 11 files found for processing.

get_month_from_filename <- function(path) {
  file <- basename(path)
  str_extract(file, "(?i)(January|February|March|April|May|June|July|August|September|October|November|December)-20\\d{2}") %>%
    parse_date(format = "%B-%Y")
}

message("Month extraction function created.")

## Month extraction function created.

rtt_data <- map_dfr(file_paths, ~ {
  read_csv(.x, show_col_types = FALSE) %>%
    clean_names() %>%
    mutate(period = get_month_from_filename(.x))
})


if (nrow(rtt_data) == 0) {
  stop("No data loaded. Please check if the CSV files are correctly formatted.")
}


message(nrow(rtt_data), " rows loaded into rtt_data.")

## 2037283 rows loaded into rtt_data.

if (!"rtt_part_description" %in% colnames(rtt_data)) {
  stop("Column `rtt_part_description` is missing in rtt_data. Check CSV structure.")
}

rtt_data <- rtt_data %>%
  mutate(rtt_part_description = str_to_lower(rtt_part_description))

message("Data preparation completed successfully.")

## Data preparation completed successfully.

file_list <- list.files(
  here("data", "raw"), 
  pattern = "\\.csv$", 
  full.names = TRUE
)


if (length(file_list) == 0) {
  stop("No CSV files found in 'data/raw/'. Please check the folder and files.")
}


rtt_all <- file_list %>%
  map_dfr(~ read_csv(.x, show_col_types = FALSE)) %>%
  clean_names()

if (nrow(rtt_all) == 0) {
  stop("Data loading failed — 'rtt_all' is empty after combining files.")
}


dir.create(here("data", "processed"), showWarnings = FALSE, recursive = TRUE)

write_csv(rtt_all, here("data", "processed", "rtt_all_months.csv"))

# Preview
#glimpse(rtt_all)

# Confirmation message
message("rtt_all_months.csv saved successfully into 'data/processed/'.")

## rtt_all_months.csv saved successfully into 'data/processed/'.

# Step 2: Load summary dataset
monthly <- read_csv(
  here("output", "monthly_summary.csv"),
  show_col_types = FALSE
)

# Safety check: Confirm data loaded
if (nrow(monthly) == 0) {
  stop("Loaded 'monthly_summary.csv' is empty. Please check the output folder and file contents.")
}

# Step 3: Convert period to proper date (if not already)
monthly <- monthly %>%
  mutate(period = as.Date(period))

# Step 4: Plot % Over 18 Weeks
plot_18 <- monthly %>%
  ggplot(aes(x = period, y = percent_over_18, color = category)) +
  geom_line(size = 0.8) +
  geom_point() +
  labs(
    title = "% of Patients Waiting Over 18 Weeks",
    x = "Month",
    y = "% Over 18 Weeks",
    color = "RTT Category"
  ) +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  theme_minimal(base_size = 10) +
  theme(
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

# Step 5: Plot % Over 52 Weeks (exclude completed)
plot_52 <- monthly %>%
  filter(!str_detect(category, "Completed")) %>%
  ggplot(aes(x = period, y = percent_over_52, color = category)) +
  geom_line(size = 0.8) +
  geom_point() +
  labs(
    title = "% of Patients Waiting Over 52 Weeks",
    x = "Month",
    y = "% Over 52 Weeks",
    color = "RTT Category"
  ) +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  theme_minimal(base_size = 10) +
  theme(
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

# Step 6: Plot total patients by category (line plot)
plot_total_trend <- ggplot(monthly, aes(x = period, y = total_patients, color = category)) +
  geom_line(size = 0.8) +
  geom_point() +
  scale_x_date(date_labels = "%b '%y", date_breaks = "1 month") +
  scale_y_continuous(labels = label_number(scale_cut = cut_si("M"))) +
  scale_color_brewer(palette = "Dark2") +
  labs(
    title = "Monthly Total Patients by RTT Category",
    x = "Month", y = "Total Patients", color = "Category"
  ) +
  theme_minimal(base_size = 10) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

# Step 7: Print the plot
print(plot_52)

3.2 Delay Stability Over Time

print(plot_18)

Shorter-term breaches (>18 weeks) have remained persistently high, fluctuating between 40% and 50% across all RTT categories.

print(plot_total_trend)

Total patient volumes across pathways have remained relatively stable, suggesting steady referral pressure.

3.3 Backlog-to-Treatment Ratio

# Step 2: Load data
monthly <- read_csv(
  here("output", "monthly_summary.csv"),
  show_col_types = FALSE
)

# Safety check
if (nrow(monthly) == 0) {
  stop("Loaded 'monthly_summary.csv' is empty. Please check the output folder and file contents.")
}

# Step 3: Prepare backlog (Incomplete + DTA)
backlog_df <- monthly %>%
  filter(category %in% c("Incomplete Pathways", "Incomplete Pathways with DTA")) %>%
  group_by(period) %>%
  summarise(
    backlog = sum(total_patients, na.rm = TRUE),
    .groups = "drop"
  )

# Step 4: Prepare completions (Admitted only)
treated_df <- monthly %>%
  filter(category == "Completed Pathways (Admitted)") %>%
  select(period, treated = total_patients)

# Step 5: Join & calculate ratio
bt_ratio <- backlog_df %>%
  left_join(treated_df, by = "period") %>%
  mutate(
    backlog_to_treatment_ratio = round(backlog / treated, 2)
  )

# Step 6: Save output
dir.create(here("output"), showWarnings = FALSE, recursive = TRUE)
write_csv(bt_ratio, here("output", "backlog_to_treatment.csv"))

# Step 7: Preview
#print(bt_ratio, n = Inf)

# Step 8: Plot backlog-to-treatment ratio over time
plot_bt_ratio <- ggplot(bt_ratio, aes(x = period, y = backlog_to_treatment_ratio)) +
  geom_line(color = "#1f77b4", size = 0.8) +
  geom_point(color = "#1f77b4", size = 2) +
  labs(
    title = "Backlog-to-Treatment Ratio Over Time",
    subtitle = "Total patients waiting (Incomplete + DTA) per 1 treated (Admitted)",
    x = "Month",
    y = "Backlog-to-Treatment Ratio"
  ) +
  scale_x_date(date_labels = "%b '%y", date_breaks = "1 month") +
  scale_y_continuous(labels = label_number(accuracy = 1)) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", hjust = 0.5)
  )

# Step 9: Show the plot
print(plot_bt_ratio)

The ratio of patients waiting (Incomplete + DTA) to those treated (Admitted) fluctuated between 26 and 32 over the period, highlighting persistent system pressures.

3.4 Severity of Delay

# Step 1: Load monthly summary
monthly <- read_csv(
  here("output", "monthly_summary.csv"),
  show_col_types = FALSE
) %>%
  clean_names()

# Safety check
if (nrow(monthly) == 0) {
  stop("Loaded 'monthly_summary.csv' is empty. Please check the output folder and file contents.")
}

# Step 2: Filter backlog categories and calculate severity
severity_df <- monthly %>%
  filter(category %in% c("Incomplete Pathways", "Incomplete Pathways with DTA")) %>%
  group_by(period) %>%
  summarise(
    patients_over_18 = sum(over_18_weeks, na.rm = TRUE),
    patients_over_52 = sum(over_52_weeks, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    severity_ratio = round(patients_over_52 / patients_over_18, 3)
  )

# Step 3: Save output
dir.create(here("output"), showWarnings = FALSE, recursive = TRUE)
write_csv(severity_df, here("output", "severity_ratio.csv"))

# Step 4: Preview
#print(severity_df, n = Inf)

# Step 5: Load severity data (reloading from saved file)
severity_df <- read_csv(
  here("output", "severity_ratio.csv"),
  show_col_types = FALSE
)

# Step 6: Plot severity ratio trend
plot_severity_ratio <- ggplot(severity_df, aes(x = period, y = severity_ratio)) +
  geom_line(color = "#9467bd", size = 0.8) +
  geom_point(color = "#9467bd", size = 2) +
  scale_x_date(date_labels = "%b '%y", date_breaks = "1 month") +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    breaks = seq(0, 0.14, by = 0.02),
    limits = c(0, 0.14)
  ) +
  labs(
    title = "Severity of Delay: % of Over-18-Week Patients Waiting Over 52 Weeks",
    x = "Month",
    y = "Severity Ratio (%)"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", hjust = 0.5)
  )

# Step 7: Print the plot
print(plot_severity_ratio)

The proportion of patients breaching 52 weeks among those over 18 weeks declined from approximately 11% to under 8%, indicating some improvement in managing extreme delays.

4 NHS Trust Clustering Analysis and Typology

4.1 Trust-Level Delay Patterns

# Step 1: Load clustering output
trust_clusters <- read_csv(
  here("output", "clustering", "trust_clusters.csv"),
  show_col_types = FALSE
)

# Step 2: Load combined RTT data
rtt_all_months <- read_csv(
  here("data", "processed", "rtt_all_months.csv"),
  show_col_types = FALSE
) %>%
  clean_names()

# Step 3: Filter to acute trusts only
rtt_filtered <- rtt_all_months %>%
  filter(provider_org_code %in% trust_clusters$provider_org_code)

# Step 4: Get week-related columns
week_cols_all <- names(rtt_filtered)[str_detect(names(rtt_filtered), "^gt_.*")]

# Step 5: Calculate total patients per trust
trust_totals <- rtt_filtered %>%
  group_by(provider_org_code) %>%
  summarise(
    total_patients = sum(across(all_of(week_cols_all)), na.rm = TRUE),
    .groups = "drop"
  )

# Step 6: Merge totals with clustering and build hover text
trust_plot_data <- trust_clusters %>%
  left_join(trust_totals, by = "provider_org_code") %>%
  mutate(
    trust_label = str_to_title(provider_org_name),
    hover_text = paste0(
      trust_label, "<br>",
      "Over 18w Delays: ", round(percent_over_18, 1), "%<br>",
      "Over 52w Delays: ", round(percent_over_52, 1), "%<br>",
      "Total Patients: ", scales::comma(total_patients)
    )
  )

# Step 7: Create interactive scatter plot
plot_ly(
  data = trust_plot_data,
  x = ~percent_over_18,
  y = ~percent_over_52,
  type = "scatter",
  mode = "markers",
  text = ~hover_text,
  hoverinfo = "text",
  color = ~as.factor(cluster),
  colors = "Set2",
  marker = list(size = 8, opacity = 0.8),
  showlegend = FALSE
) %>%
 plotly::layout(
    title = "NHS Trust Clustering by RTT Delays (April 2024 – Feb 2025)",
    xaxis = list(title = "% Over 18 Weeks", tickformat = ".0f"),
    yaxis = list(title = "% Over 52 Weeks", tickformat = ".0f"),
    margin = list(l = 60, r = 50, b = 60, t = 60)
  )

Figure 4.1: Clustering of Acute NHS Trusts by RTT Delays.

The resulting scatter plot categorizes trusts into five broad clusters:

Cluster 1: High overall delays and higher proportions of severe waits.
Cluster 2: Low backlog rates, often smaller or specialist organizations.
Cluster 3: Moderate delays and better management of long waits.
Cluster 4: Lower volumes and variable delay management.
Cluster 5: High-risk outliers with very high breach rates.

4.2 Volume-Delay Typology

Following clustering analysis, a secondary classification was created to categorize NHS trusts based on two key metrics: total patient volume and delay severity.

A 3x3 matrix approach was used:

Volume: High, Medium, or Low total patients handled.
Delay Severity: High, Medium, or Low percentage of patients breaching the 18-week standard.

Each trust was assigned a typology label based on its position in this grid:

Trust Typology Matrix
Volume / Delay	Low Delay	Medium Delay	High Delay
Low Volume	Top Performing Small Trusts	Small Trusts Managing Well	Specialist or Small Trusts Under Pressure
Medium Volume	High Performing Mid-Sized Trusts	Mid-Sized Steady Performers	Mid-Sized Trusts Facing Challenges
High Volume	Leading Large Performers	Large Trusts Managing Backlogs Steadily	Large Trusts Struggling to Manage Delays

Typology provides a structured lens to identify trusts excelling under different pressures versus those requiring operational improvements.

# Load required libraries
library(tidyverse)
library(here)
library(knitr)

# Step 1: Load the trust delay summary CSV
trust_delay <- read_csv(here("output", "trust_typology_base.csv"), show_col_types = FALSE)

# Step 2: Define volume categories
trust_delay <- trust_delay %>%
  mutate(
    volume_category = case_when(
      total_patients >= 1500000 ~ "High Volume",
      total_patients >= 700000 ~ "Medium Volume",
      TRUE ~ "Low Volume"
    )
  )

# Step 3: Define delay categories
trust_delay <- trust_delay %>%
  mutate(
    delay_category = case_when(
      percent_over_18 >= 50 ~ "High Delay",
      percent_over_18 >= 30 ~ "Medium Delay",
      TRUE ~ "Low Delay"
    )
  )

# Step 4: Assign typology labels
trust_delay <- trust_delay %>%
  mutate(
    typology_label = case_when(
      volume_category == "High Volume" & delay_category == "High Delay" ~ "Large Trusts Struggling to Manage Delays",
      volume_category == "High Volume" & delay_category == "Medium Delay" ~ "Large Trusts Managing Backlogs Steadily",
      volume_category == "High Volume" & delay_category == "Low Delay" ~ "Leading Large Performers",

      volume_category == "Medium Volume" & delay_category == "High Delay" ~ "Mid-Sized Trusts Facing Challenges",
      volume_category == "Medium Volume" & delay_category == "Medium Delay" ~ "Mid-Sized Steady Performers",
      volume_category == "Medium Volume" & delay_category == "Low Delay" ~ "High Performing Mid-Sized Trusts",

      volume_category == "Low Volume" & delay_category == "High Delay" ~ "Specialist or Small Trusts Under Pressure",
      volume_category == "Low Volume" & delay_category == "Medium Delay" ~ "Small Trusts Managing Well",
      volume_category == "Low Volume" & delay_category == "Low Delay" ~ "Top Performing Small Trusts",

      TRUE ~ "Unclassified"
    )
  )

# Step 5: Save final output to project-root output folder
write_csv(trust_delay, here("output", "trust_typology_final.csv"))

# Step 6: Display first 20 rows in a neat table
knitr::kable(head(trust_delay, 20), caption = "Sample of Trust Typology Classification")

Sample of Trust Typology Classification
provider_org_code	provider_org_name	total_patients	patients_over_18	percent_over_18	volume_category	delay_category	typology_label
R0A	MANCHESTER UNIVERSITY NHS FOUNDATION TRUST	5674038	2934604	51.7	High Volume	High Delay	Large Trusts Struggling to Manage Delays
R0B	SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST	1907618	565556	29.6	High Volume	Low Delay	Leading Large Performers
R0D	UNIVERSITY HOSPITALS DORSET NHS FOUNDATION TRUST	2097918	829544	39.5	High Volume	Medium Delay	Large Trusts Managing Backlogs Steadily
R1L	ESSEX PARTNERSHIP UNIVERSITY NHS FOUNDATION TRUST	26000	6086	23.4	Low Volume	Low Delay	Top Performing Small Trusts
RA2	ROYAL SURREY COUNTY HOSPITAL NHS FOUNDATION TRUST	1068140	486210	45.5	Medium Volume	Medium Delay	Mid-Sized Steady Performers
RA7	UNIVERSITY HOSPITALS BRISTOL AND WESTON NHS FOUNDATION TRUST	1785868	705630	39.5	High Volume	Medium Delay	Large Trusts Managing Backlogs Steadily
RA9	TORBAY AND SOUTH DEVON NHS FOUNDATION TRUST	1037662	402236	38.8	Medium Volume	Medium Delay	Mid-Sized Steady Performers
RAE	BRADFORD TEACHING HOSPITALS NHS FOUNDATION TRUST	1081246	420282	38.9	Medium Volume	Medium Delay	Mid-Sized Steady Performers
RAJ	MID AND SOUTH ESSEX NHS FOUNDATION TRUST	4603276	2195282	47.7	High Volume	Medium Delay	Large Trusts Managing Backlogs Steadily
RAL	ROYAL FREE LONDON NHS FOUNDATION TRUST	3151714	1396350	44.3	High Volume	Medium Delay	Large Trusts Managing Backlogs Steadily
RAS	THE HILLINGDON HOSPITALS NHS FOUNDATION TRUST	807996	374412	46.3	Medium Volume	Medium Delay	Mid-Sized Steady Performers
RAX	KINGSTON AND RICHMOND NHS FOUNDATION TRUST	435372	173382	39.8	Low Volume	Medium Delay	Small Trusts Managing Well
RAX	KINGSTON HOSPITAL NHS FOUNDATION TRUST	769784	294772	38.3	Medium Volume	Medium Delay	Mid-Sized Steady Performers
RBD	DORSET COUNTY HOSPITAL NHS FOUNDATION TRUST	694748	321912	46.3	Low Volume	Medium Delay	Small Trusts Managing Well
RBL	WIRRAL UNIVERSITY TEACHING HOSPITAL NHS FOUNDATION TRUST	1263958	539086	42.7	Medium Volume	Medium Delay	Mid-Sized Steady Performers
RBQ	LIVERPOOL HEART AND CHEST HOSPITAL NHS FOUNDATION TRUST	196812	65060	33.1	Low Volume	Medium Delay	Small Trusts Managing Well
RBS	ALDER HEY CHILDREN’S NHS FOUNDATION TRUST	674204	315184	46.7	Low Volume	Medium Delay	Small Trusts Managing Well
RBT	MID CHESHIRE HOSPITALS NHS FOUNDATION TRUST	1120446	503522	44.9	Medium Volume	Medium Delay	Mid-Sized Steady Performers
RBV	THE CHRISTIE NHS FOUNDATION TRUST	116836	3980	3.4	Low Volume	Low Delay	Top Performing Small Trusts
RC9	BEDFORDSHIRE HOSPITALS NHS FOUNDATION TRUST	2715622	1282306	47.2	High Volume	Medium Delay	Large Trusts Managing Backlogs Steadily

5 Limitations and Considerations

While the analysis offers valuable insights into RTT delays and patterns across NHS acute trusts, certain limitations should be noted:

Data Scope: The project focused on national-level RTT monthly datasets between April 2024 and February 2025. Trust-specific nuances, specialty-level variations, and short-term operational factors (e.g., seasonal surges) were outside the current scope.

Assumptions and Delay Calculations: Delay metrics were based on total patients and breach counts provided in the RTT datasets. Complexities like patient prioritization, emergency admissions, or multiple referrals could not be modeled.

Clustering Constraints: Hierarchical clustering was applied on a trust-level summary dataset (~112 trusts), not on individual patient records (~2 million rows), due to computational practicality and the need for interpretability. As a result, cluster boundaries are indicative and should be interpreted qualitatively rather than rigidly.

Typology Generalization: The typology categories (e.g., “Large Trusts Managing Backlogs Steadily”) were created to assist in interpretation and strategic thinking, but trusts within the same typology may still differ significantly based on local factors.

Data Reporting Variations: Some trusts had missing or inconsistent reporting in the monthly RTT data, meaning their inclusion in summary analyses reflects only available submissions.

These considerations are important when applying the findings in real-world NHS operational contexts.

6 Conclusion

This project explored patterns of RTT delay across NHS acute trusts through trend analysis, backlog characterization, clustering, and typology development. The findings reveal significant trust-level variations and highlight the need for tailored operational strategies rather than generalized policies.

Both clustering results and volume-delay typologies offer structured frameworks to identify areas needing attention, supporting targeted improvements and better backlog management across the NHS.

NHS Referral to Treatment (RTT) Delay Analysis Report

Abhishek Patil

April 2025