1 Introduction

The NHS Referral to Treatment (RTT) pathway measures patient waiting times for elective care. This project focuses on delays in referral to treatment and patterns within acute NHS trusts between April 2024 and February 2025, aiming to uncover key insights and trends.

Under the NHS Constitution, patients have the right to start treatment within 18 weeks, with a historical operational standard of 92% compliance. However, multiple challenges in recent years, including the COVID-19 pandemic backlog, staff shortages, and rising healthcare demand, have led to significant breaches of this standard.

Today, many NHS trusts operate with more realistic goals of 60–70% compliance, making breach management a critical operational priority.

This project specifically examines Incomplete Pathways and subsequent patterns across acute trusts.

For reference on data collection methodologies and system limitations, the latest NHS RTT monthly report (February 2025) is attached: England NHS RTT Statistics - February 2025 (https://www.england.nhs.uk/statistics/wp-content/uploads/sites/2/2025/04/Feb25-RTT-SPN-Publication-PDF-386K-76538.pdf)

2 Data Sources and Methodology

This project draws upon two primary data sources:

Data preprocessing involved cleaning and structuring the RTT data, targeting “Incomplete Pathways” and “Incomplete Pathways with Decision to Admit (DTA)” and additionally “Completed Pathways” for further analysis.

Several key metrics were developed:

Analytical Approaches:

To explore underlying patterns in trust performance, two complementary approaches were applied. Hierarchical clustering was conducted to group trusts based on their breach rates, providing a similarity-based segmentation. Separately, a typology framework was developed, categorizing trusts by their overall treatment volumes and delay severity to create more interpretable and strategic groupings.

It is important to note several limitations. This analysis covers only an 11-month period, rather than a full multi-year view. The study is restricted to acute trusts, with community and specialist trusts excluded for consistency. Additionally, the share of patients with a decision to admit (DTA) within incomplete pathways was found to be relatively stable during the period and has been documented but not deeply modeled in this report.

3 Analytical Focus

3.1 Category-Wise Patient Volumes

Patients were analyzed across three RTT reporting categories:

  • Incomplete Pathways
  • Incomplete Pathways with Decision to Admit (DTA)
  • Completed Pathways (Admitted Patients)
file_paths <- list.files(
  here("data", "raw"),
  pattern = "\\.csv$",
  full.names = TRUE
)

if (length(file_paths) == 0) {
  stop("No CSV files found in 'data/raw/'. Please check the folder and try again.")
}


message(length(file_paths), " files found for processing.")
## 11 files found for processing.
get_month_from_filename <- function(path) {
  file <- basename(path)
  str_extract(file, "(?i)(January|February|March|April|May|June|July|August|September|October|November|December)-20\\d{2}") %>%
    parse_date(format = "%B-%Y")
}

message("Month extraction function created.")
## Month extraction function created.
rtt_data <- map_dfr(file_paths, ~ {
  read_csv(.x, show_col_types = FALSE) %>%
    clean_names() %>%
    mutate(period = get_month_from_filename(.x))
})


if (nrow(rtt_data) == 0) {
  stop("No data loaded. Please check if the CSV files are correctly formatted.")
}


message(nrow(rtt_data), " rows loaded into rtt_data.")
## 2037283 rows loaded into rtt_data.
if (!"rtt_part_description" %in% colnames(rtt_data)) {
  stop("Column `rtt_part_description` is missing in rtt_data. Check CSV structure.")
}

rtt_data <- rtt_data %>%
  mutate(rtt_part_description = str_to_lower(rtt_part_description))

message("Data preparation completed successfully.")
## Data preparation completed successfully.
file_list <- list.files(
  here("data", "raw"), 
  pattern = "\\.csv$", 
  full.names = TRUE
)


if (length(file_list) == 0) {
  stop("No CSV files found in 'data/raw/'. Please check the folder and files.")
}


rtt_all <- file_list %>%
  map_dfr(~ read_csv(.x, show_col_types = FALSE)) %>%
  clean_names()

if (nrow(rtt_all) == 0) {
  stop("Data loading failed — 'rtt_all' is empty after combining files.")
}


dir.create(here("data", "processed"), showWarnings = FALSE, recursive = TRUE)

write_csv(rtt_all, here("data", "processed", "rtt_all_months.csv"))

# Preview
#glimpse(rtt_all)

# Confirmation message
message("rtt_all_months.csv saved successfully into 'data/processed/'.")
## rtt_all_months.csv saved successfully into 'data/processed/'.
# Step 2: Load summary dataset
monthly <- read_csv(
  here("output", "monthly_summary.csv"),
  show_col_types = FALSE
)

# Safety check: Confirm data loaded
if (nrow(monthly) == 0) {
  stop("Loaded 'monthly_summary.csv' is empty. Please check the output folder and file contents.")
}

# Step 3: Convert period to proper date (if not already)
monthly <- monthly %>%
  mutate(period = as.Date(period))

# Step 4: Plot % Over 18 Weeks
plot_18 <- monthly %>%
  ggplot(aes(x = period, y = percent_over_18, color = category)) +
  geom_line(size = 0.8) +
  geom_point() +
  labs(
    title = "% of Patients Waiting Over 18 Weeks",
    x = "Month",
    y = "% Over 18 Weeks",
    color = "RTT Category"
  ) +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  theme_minimal(base_size = 10) +
  theme(
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

# Step 5: Plot % Over 52 Weeks (exclude completed)
plot_52 <- monthly %>%
  filter(!str_detect(category, "Completed")) %>%
  ggplot(aes(x = period, y = percent_over_52, color = category)) +
  geom_line(size = 0.8) +
  geom_point() +
  labs(
    title = "% of Patients Waiting Over 52 Weeks",
    x = "Month",
    y = "% Over 52 Weeks",
    color = "RTT Category"
  ) +
  scale_y_continuous(labels = percent_format(scale = 1)) +
  theme_minimal(base_size = 10) +
  theme(
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

# Step 6: Plot total patients by category (line plot)
plot_total_trend <- ggplot(monthly, aes(x = period, y = total_patients, color = category)) +
  geom_line(size = 0.8) +
  geom_point() +
  scale_x_date(date_labels = "%b '%y", date_breaks = "1 month") +
  scale_y_continuous(labels = label_number(scale_cut = cut_si("M"))) +
  scale_color_brewer(palette = "Dark2") +
  labs(
    title = "Monthly Total Patients by RTT Category",
    x = "Month", y = "Total Patients", color = "Category"
  ) +
  theme_minimal(base_size = 10) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    legend.position = "bottom",
    legend.direction = "horizontal",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 9)
  )

# Step 7: Print the plot
print(plot_52)

3.2 Delay Stability Over Time

print(plot_18)

Shorter-term breaches (>18 weeks) have remained persistently high, fluctuating between 40% and 50% across all RTT categories.

print(plot_total_trend)

Total patient volumes across pathways have remained relatively stable, suggesting steady referral pressure.

3.3 Backlog-to-Treatment Ratio

# Step 2: Load data
monthly <- read_csv(
  here("output", "monthly_summary.csv"),
  show_col_types = FALSE
)

# Safety check
if (nrow(monthly) == 0) {
  stop("Loaded 'monthly_summary.csv' is empty. Please check the output folder and file contents.")
}

# Step 3: Prepare backlog (Incomplete + DTA)
backlog_df <- monthly %>%
  filter(category %in% c("Incomplete Pathways", "Incomplete Pathways with DTA")) %>%
  group_by(period) %>%
  summarise(
    backlog = sum(total_patients, na.rm = TRUE),
    .groups = "drop"
  )

# Step 4: Prepare completions (Admitted only)
treated_df <- monthly %>%
  filter(category == "Completed Pathways (Admitted)") %>%
  select(period, treated = total_patients)

# Step 5: Join & calculate ratio
bt_ratio <- backlog_df %>%
  left_join(treated_df, by = "period") %>%
  mutate(
    backlog_to_treatment_ratio = round(backlog / treated, 2)
  )

# Step 6: Save output
dir.create(here("output"), showWarnings = FALSE, recursive = TRUE)
write_csv(bt_ratio, here("output", "backlog_to_treatment.csv"))

# Step 7: Preview
#print(bt_ratio, n = Inf)

# Step 8: Plot backlog-to-treatment ratio over time
plot_bt_ratio <- ggplot(bt_ratio, aes(x = period, y = backlog_to_treatment_ratio)) +
  geom_line(color = "#1f77b4", size = 0.8) +
  geom_point(color = "#1f77b4", size = 2) +
  labs(
    title = "Backlog-to-Treatment Ratio Over Time",
    subtitle = "Total patients waiting (Incomplete + DTA) per 1 treated (Admitted)",
    x = "Month",
    y = "Backlog-to-Treatment Ratio"
  ) +
  scale_x_date(date_labels = "%b '%y", date_breaks = "1 month") +
  scale_y_continuous(labels = label_number(accuracy = 1)) +
  theme_minimal(base_size = 12) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", hjust = 0.5)
  )

# Step 9: Show the plot
print(plot_bt_ratio)

The ratio of patients waiting (Incomplete + DTA) to those treated (Admitted) fluctuated between 26 and 32 over the period, highlighting persistent system pressures.

3.4 Severity of Delay

# Step 1: Load monthly summary
monthly <- read_csv(
  here("output", "monthly_summary.csv"),
  show_col_types = FALSE
) %>%
  clean_names()

# Safety check
if (nrow(monthly) == 0) {
  stop("Loaded 'monthly_summary.csv' is empty. Please check the output folder and file contents.")
}

# Step 2: Filter backlog categories and calculate severity
severity_df <- monthly %>%
  filter(category %in% c("Incomplete Pathways", "Incomplete Pathways with DTA")) %>%
  group_by(period) %>%
  summarise(
    patients_over_18 = sum(over_18_weeks, na.rm = TRUE),
    patients_over_52 = sum(over_52_weeks, na.rm = TRUE),
    .groups = "drop"
  ) %>%
  mutate(
    severity_ratio = round(patients_over_52 / patients_over_18, 3)
  )

# Step 3: Save output
dir.create(here("output"), showWarnings = FALSE, recursive = TRUE)
write_csv(severity_df, here("output", "severity_ratio.csv"))

# Step 4: Preview
#print(severity_df, n = Inf)

# Step 5: Load severity data (reloading from saved file)
severity_df <- read_csv(
  here("output", "severity_ratio.csv"),
  show_col_types = FALSE
)

# Step 6: Plot severity ratio trend
plot_severity_ratio <- ggplot(severity_df, aes(x = period, y = severity_ratio)) +
  geom_line(color = "#9467bd", size = 0.8) +
  geom_point(color = "#9467bd", size = 2) +
  scale_x_date(date_labels = "%b '%y", date_breaks = "1 month") +
  scale_y_continuous(
    labels = scales::percent_format(accuracy = 1),
    breaks = seq(0, 0.14, by = 0.02),
    limits = c(0, 0.14)
  ) +
  labs(
    title = "Severity of Delay: % of Over-18-Week Patients Waiting Over 52 Weeks",
    x = "Month",
    y = "Severity Ratio (%)"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    axis.text.x = element_text(angle = 45, hjust = 1),
    plot.title = element_text(face = "bold", hjust = 0.5)
  )

# Step 7: Print the plot
print(plot_severity_ratio)

The proportion of patients breaching 52 weeks among those over 18 weeks declined from approximately 11% to under 8%, indicating some improvement in managing extreme delays.

4 NHS Trust Clustering Analysis and Typology

4.1 Trust-Level Delay Patterns

# Step 1: Load clustering output
trust_clusters <- read_csv(
  here("output", "clustering", "trust_clusters.csv"),
  show_col_types = FALSE
)

# Step 2: Load combined RTT data
rtt_all_months <- read_csv(
  here("data", "processed", "rtt_all_months.csv"),
  show_col_types = FALSE
) %>%
  clean_names()

# Step 3: Filter to acute trusts only
rtt_filtered <- rtt_all_months %>%
  filter(provider_org_code %in% trust_clusters$provider_org_code)

# Step 4: Get week-related columns
week_cols_all <- names(rtt_filtered)[str_detect(names(rtt_filtered), "^gt_.*")]

# Step 5: Calculate total patients per trust
trust_totals <- rtt_filtered %>%
  group_by(provider_org_code) %>%
  summarise(
    total_patients = sum(across(all_of(week_cols_all)), na.rm = TRUE),
    .groups = "drop"
  )

# Step 6: Merge totals with clustering and build hover text
trust_plot_data <- trust_clusters %>%
  left_join(trust_totals, by = "provider_org_code") %>%
  mutate(
    trust_label = str_to_title(provider_org_name),
    hover_text = paste0(
      trust_label, "<br>",
      "Over 18w Delays: ", round(percent_over_18, 1), "%<br>",
      "Over 52w Delays: ", round(percent_over_52, 1), "%<br>",
      "Total Patients: ", scales::comma(total_patients)
    )
  )

# Step 7: Create interactive scatter plot
plot_ly(
  data = trust_plot_data,
  x = ~percent_over_18,
  y = ~percent_over_52,
  type = "scatter",
  mode = "markers",
  text = ~hover_text,
  hoverinfo = "text",
  color = ~as.factor(cluster),
  colors = "Set2",
  marker = list(size = 8, opacity = 0.8),
  showlegend = FALSE
) %>%
 plotly::layout(
    title = "NHS Trust Clustering by RTT Delays (April 2024 – Feb 2025)",
    xaxis = list(title = "% Over 18 Weeks", tickformat = ".0f"),
    yaxis = list(title = "% Over 52 Weeks", tickformat = ".0f"),
    margin = list(l = 60, r = 50, b = 60, t = 60)
  )

Figure 4.1: Clustering of Acute NHS Trusts by RTT Delays.

The resulting scatter plot categorizes trusts into five broad clusters:

  • Cluster 1: High overall delays and higher proportions of severe waits.
  • Cluster 2: Low backlog rates, often smaller or specialist organizations.
  • Cluster 3: Moderate delays and better management of long waits.
  • Cluster 4: Lower volumes and variable delay management.
  • Cluster 5: High-risk outliers with very high breach rates.

4.2 Volume-Delay Typology

Following clustering analysis, a secondary classification was created to categorize NHS trusts based on two key metrics: total patient volume and delay severity.

A 3x3 matrix approach was used:

  • Volume: High, Medium, or Low total patients handled.

  • Delay Severity: High, Medium, or Low percentage of patients breaching the 18-week standard.

Each trust was assigned a typology label based on its position in this grid:

Trust Typology Matrix
Volume / Delay Low Delay Medium Delay High Delay
Low Volume Top Performing Small Trusts Small Trusts Managing Well Specialist or Small Trusts Under Pressure
Medium Volume High Performing Mid-Sized Trusts Mid-Sized Steady Performers Mid-Sized Trusts Facing Challenges
High Volume Leading Large Performers Large Trusts Managing Backlogs Steadily Large Trusts Struggling to Manage Delays

Typology provides a structured lens to identify trusts excelling under different pressures versus those requiring operational improvements.

# Load required libraries
library(tidyverse)
library(here)
library(knitr)

# Step 1: Load the trust delay summary CSV
trust_delay <- read_csv(here("output", "trust_typology_base.csv"), show_col_types = FALSE)

# Step 2: Define volume categories
trust_delay <- trust_delay %>%
  mutate(
    volume_category = case_when(
      total_patients >= 1500000 ~ "High Volume",
      total_patients >= 700000 ~ "Medium Volume",
      TRUE ~ "Low Volume"
    )
  )

# Step 3: Define delay categories
trust_delay <- trust_delay %>%
  mutate(
    delay_category = case_when(
      percent_over_18 >= 50 ~ "High Delay",
      percent_over_18 >= 30 ~ "Medium Delay",
      TRUE ~ "Low Delay"
    )
  )

# Step 4: Assign typology labels
trust_delay <- trust_delay %>%
  mutate(
    typology_label = case_when(
      volume_category == "High Volume" & delay_category == "High Delay" ~ "Large Trusts Struggling to Manage Delays",
      volume_category == "High Volume" & delay_category == "Medium Delay" ~ "Large Trusts Managing Backlogs Steadily",
      volume_category == "High Volume" & delay_category == "Low Delay" ~ "Leading Large Performers",

      volume_category == "Medium Volume" & delay_category == "High Delay" ~ "Mid-Sized Trusts Facing Challenges",
      volume_category == "Medium Volume" & delay_category == "Medium Delay" ~ "Mid-Sized Steady Performers",
      volume_category == "Medium Volume" & delay_category == "Low Delay" ~ "High Performing Mid-Sized Trusts",

      volume_category == "Low Volume" & delay_category == "High Delay" ~ "Specialist or Small Trusts Under Pressure",
      volume_category == "Low Volume" & delay_category == "Medium Delay" ~ "Small Trusts Managing Well",
      volume_category == "Low Volume" & delay_category == "Low Delay" ~ "Top Performing Small Trusts",

      TRUE ~ "Unclassified"
    )
  )

# Step 5: Save final output to project-root output folder
write_csv(trust_delay, here("output", "trust_typology_final.csv"))

# Step 6: Display first 20 rows in a neat table
knitr::kable(head(trust_delay, 20), caption = "Sample of Trust Typology Classification")
Sample of Trust Typology Classification
provider_org_code provider_org_name total_patients patients_over_18 percent_over_18 volume_category delay_category typology_label
R0A MANCHESTER UNIVERSITY NHS FOUNDATION TRUST 5674038 2934604 51.7 High Volume High Delay Large Trusts Struggling to Manage Delays
R0B SOUTH TYNESIDE AND SUNDERLAND NHS FOUNDATION TRUST 1907618 565556 29.6 High Volume Low Delay Leading Large Performers
R0D UNIVERSITY HOSPITALS DORSET NHS FOUNDATION TRUST 2097918 829544 39.5 High Volume Medium Delay Large Trusts Managing Backlogs Steadily
R1L ESSEX PARTNERSHIP UNIVERSITY NHS FOUNDATION TRUST 26000 6086 23.4 Low Volume Low Delay Top Performing Small Trusts
RA2 ROYAL SURREY COUNTY HOSPITAL NHS FOUNDATION TRUST 1068140 486210 45.5 Medium Volume Medium Delay Mid-Sized Steady Performers
RA7 UNIVERSITY HOSPITALS BRISTOL AND WESTON NHS FOUNDATION TRUST 1785868 705630 39.5 High Volume Medium Delay Large Trusts Managing Backlogs Steadily
RA9 TORBAY AND SOUTH DEVON NHS FOUNDATION TRUST 1037662 402236 38.8 Medium Volume Medium Delay Mid-Sized Steady Performers
RAE BRADFORD TEACHING HOSPITALS NHS FOUNDATION TRUST 1081246 420282 38.9 Medium Volume Medium Delay Mid-Sized Steady Performers
RAJ MID AND SOUTH ESSEX NHS FOUNDATION TRUST 4603276 2195282 47.7 High Volume Medium Delay Large Trusts Managing Backlogs Steadily
RAL ROYAL FREE LONDON NHS FOUNDATION TRUST 3151714 1396350 44.3 High Volume Medium Delay Large Trusts Managing Backlogs Steadily
RAS THE HILLINGDON HOSPITALS NHS FOUNDATION TRUST 807996 374412 46.3 Medium Volume Medium Delay Mid-Sized Steady Performers
RAX KINGSTON AND RICHMOND NHS FOUNDATION TRUST 435372 173382 39.8 Low Volume Medium Delay Small Trusts Managing Well
RAX KINGSTON HOSPITAL NHS FOUNDATION TRUST 769784 294772 38.3 Medium Volume Medium Delay Mid-Sized Steady Performers
RBD DORSET COUNTY HOSPITAL NHS FOUNDATION TRUST 694748 321912 46.3 Low Volume Medium Delay Small Trusts Managing Well
RBL WIRRAL UNIVERSITY TEACHING HOSPITAL NHS FOUNDATION TRUST 1263958 539086 42.7 Medium Volume Medium Delay Mid-Sized Steady Performers
RBQ LIVERPOOL HEART AND CHEST HOSPITAL NHS FOUNDATION TRUST 196812 65060 33.1 Low Volume Medium Delay Small Trusts Managing Well
RBS ALDER HEY CHILDREN’S NHS FOUNDATION TRUST 674204 315184 46.7 Low Volume Medium Delay Small Trusts Managing Well
RBT MID CHESHIRE HOSPITALS NHS FOUNDATION TRUST 1120446 503522 44.9 Medium Volume Medium Delay Mid-Sized Steady Performers
RBV THE CHRISTIE NHS FOUNDATION TRUST 116836 3980 3.4 Low Volume Low Delay Top Performing Small Trusts
RC9 BEDFORDSHIRE HOSPITALS NHS FOUNDATION TRUST 2715622 1282306 47.2 High Volume Medium Delay Large Trusts Managing Backlogs Steadily

5 Limitations and Considerations

While the analysis offers valuable insights into RTT delays and patterns across NHS acute trusts, certain limitations should be noted:

Data Scope: The project focused on national-level RTT monthly datasets between April 2024 and February 2025. Trust-specific nuances, specialty-level variations, and short-term operational factors (e.g., seasonal surges) were outside the current scope.

Assumptions and Delay Calculations: Delay metrics were based on total patients and breach counts provided in the RTT datasets. Complexities like patient prioritization, emergency admissions, or multiple referrals could not be modeled.

Clustering Constraints: Hierarchical clustering was applied on a trust-level summary dataset (~112 trusts), not on individual patient records (~2 million rows), due to computational practicality and the need for interpretability. As a result, cluster boundaries are indicative and should be interpreted qualitatively rather than rigidly.

Typology Generalization: The typology categories (e.g., “Large Trusts Managing Backlogs Steadily”) were created to assist in interpretation and strategic thinking, but trusts within the same typology may still differ significantly based on local factors.

Data Reporting Variations: Some trusts had missing or inconsistent reporting in the monthly RTT data, meaning their inclusion in summary analyses reflects only available submissions.

These considerations are important when applying the findings in real-world NHS operational contexts.

6 Conclusion

This project explored patterns of RTT delay across NHS acute trusts through trend analysis, backlog characterization, clustering, and typology development. The findings reveal significant trust-level variations and highlight the need for tailored operational strategies rather than generalized policies.

Both clustering results and volume-delay typologies offer structured frameworks to identify areas needing attention, supporting targeted improvements and better backlog management across the NHS.