Health system efficiency analysis

In the context of Health System Performance Assessment (HSPA), efficiency analysis focuses primarily on technical efficiency, that is, how well available inputs are converted into service outputs and health outcomes, while accounting for differences in population need and contextual conditions. Efficiency analysis supports comparison within countries and also help in making international comparisons. Identifying relative performance patterns, highlight positive deviants and flag underperformers.

Key analytical approaches

Descriptive benchmarking

Provides initial overview of efficiency patterns. Scatter plots of service access or coverage (or effective coverage) against key health system inputs (e.g. health workforce per capita).

This approach does not account for contextual differences.

Data Envelopment Analysis (DEA)

Non-parametric method comparing health system units to best performing peers in the dataset. Measures technical efficiency as the ability to maximize outputs given available inputs.

Regression based relative efficiency

Parametric method estimating expected performance while adjusting for contextual factors such as poverty, rurality, geography

Use in health system performance assessment

This code summarizes key health system efficiency analyses that can be used in Health system performance assessment (HSPA).

Analysis

  1. Load necessary packages
library(tidyverse)
library(ggplot2)
library(ggrepel)
library(readxl)
library(Benchmarking)
library(frontier)
library(here)
library(sf)
library(dplyr)
library(stringr)

Descriptive benchmarking

dat <- read_excel(here ("data/data_TZ.xlsx"))

Macro-benchmarking:

  1. Plotting composite coverage index (CCI) against Health workforce density
  • Create simple linear model equation, then plotting

    lin_model <- lm(cci ~ hwf_density, data = dat)
    
    summary(lin_model)
    ## 
    ## Call:
    ## lm(formula = cci ~ hwf_density, data = dat)
    ## 
    ## Residuals:
    ##     Min      1Q  Median      3Q     Max 
    ## -8.7942 -4.0944 -0.3026  3.9512 10.6296 
    ## 
    ## Coefficients:
    ##             Estimate Std. Error t value Pr(>|t|)    
    ## (Intercept)  67.0470     2.7228  24.624   <2e-16 ***
    ## hwf_density   0.6489     0.2988   2.172     0.04 *  
    ## ---
    ## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
    ## 
    ## Residual standard error: 5.384 on 24 degrees of freedom
    ## Multiple R-squared:  0.1642, Adjusted R-squared:  0.1294 
    ## F-statistic: 4.717 on 1 and 24 DF,  p-value: 0.03998
    # Extract coefficients
    b0 <- round(coef(lin_model)[1], 2)
    b1 <- round(coef(lin_model)[2], 2)
    
    # Extract R-squared
    r2 <- round(summary(lin_model)$r.squared, 2)
    
    # Create label
    eqn_text <- paste0(
      "CCI = ", b0, " + ", b1, " × HWF density",
      "; R² = ", r2
    )
##Plotting
ggplot(dat, aes(x = hwf_density, y = cci)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  geom_text_repel(aes(label = adminlevel_1), size = 3) +
  annotate(
    "text",
    x = Inf,
    y = -Inf,
    label = eqn_text,
    hjust = 1.1, vjust = -0.5,
    size = 4
  ) +
  labs(
    title = "Composite Coverage Index vs Health workforce density",
    x = "Health workforce density",
    y = "Composite Coverage Index"
  ) +
  theme_bw()
## `geom_smooth()` using formula = 'y ~ x'

Interpretation

  • Scatter plot shows a positive association between composite coverage index (CCI) for RMNCAH and health workforce density for key health workers (doctors, nurses & midwives)

  • Regions above the regression line indicate regions achieving higher coverage than expected given their workforce density

  • Regions below the regression line indicate underperformers given their inputs

DEA analysis

Given resources available in each region, how efficiently are they converted into health services and coverage?

Unit of analysis

Region (adminlevel_1)

Model choice (DEA specification)

Orientation: Output oriented

  • How much service coverage could be achieved given the current resources?

Returns to scale (how an increase in all inputs affects output) : Variable returns to scale (VRS) - (allows for increasing, decreasing and constant return to scale)

  • Allowing for heterogeneity across regions: Regions differ in size, population and system maturity

Inputs and outputs used

Inputs (resources)

  • Total Health expenditure per capita (total_health_exp_pc)

  • Health workforce density (hwf_density)

  • Facility density (fac_density)

Outputs

  • Composite coverage index for RMNCAH (cci)

Analysis

## Define input and output matrix
x <- as.matrix(dat[, c("total_health_exp_pc", "hwf_density", "fac_density")])
y <- as.matrix(dat[, c("cci")])

Run DEA

  • Output-Oriented DEA Scores : In an output-oriented Data Envelopment Analysis (DEA) model, the raw efficiency score (output expansion factor) is often formulated to measure the potential for output expansion with the same level of inputs.
  • In this case, a score of 1 indicates efficiency, while a score greater than 1 indicates inefficiency (the unit could produce more output). It is common practice to express this as 1/raw eff score (Technical efficiency) so that the score falls within the standard 0 to 1 range, where 1 is efficient.
## Knowing more about dea(), please run: ??dea()
dea_res <- dea(
  X = x,
  Y = y,
  RTS = "vrs",
  ORIENTATION = "out"
)

#Output oriented DEA produce: Output expansion factor (values >1)
#Converting to efficiency scores (0-1), inverse
dat$dea_eff <- dat$dea_eff <- 1/dea_res$eff
summary(dat$dea_eff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7934  0.9004  0.9782  0.9472  1.0000  1.0000

Presenting findings

Convert efficiency scores into percentage for easier interpretation (e.g.):

  • 100% → region is efficient

  • 70% → region could increase outputs by ~30% with current resources

Group DEA efficiency (%)
High efficiency ≥90%
Medium efficiency 75-89%
Low efficiency <75%
dat$dea_eff_pct <- dat$dea_eff *100

dat$dea_eff_cat <- cut(
  dat$dea_eff_pct,
  breaks = c(0, 75, 90, 100.01),
  labels = c("Low efficiency", "Medium efficiency", "High efficiency"),
  right = FALSE
)

Findings can be presented in a map

tz_shp <- st_read (here ("Mapping/Tanzania/gadm41_TZA_1.shp"))
## Reading layer `gadm41_TZA_1' from data source 
##   `D:\Sophy\MSC_EAB\2021_2022\Countdownto2030\UManitoba_consultancy\HSPA_WHOGeneva\Draft Module 5\Analysis\Mapping\Tanzania\gadm41_TZA_1.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 26 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 29.32717 ymin: -11.7457 xmax: 40.44514 ymax: -0.9857875
## Geodetic CRS:  WGS 84
tz_shp <- tz_shp %>%
  mutate(
    region_clean = str_replace(NAME_1, "\\s+Region$", "")
  )

setdiff(dat$adminlevel_1, tz_shp$region_clean)
## character(0)
tz_map <- tz_shp %>% 
  left_join(dat, by = c("region_clean" = "adminlevel_1"))

Mapping regions by different groups of efficiency scores

tz_map <- tz_shp %>% 
  left_join(dat, by = c("region_clean" = "adminlevel_1"))

ggplot(tz_map) +
  geom_sf(aes(fill = dea_eff_cat), color = "white", size = 0.2) +
  geom_sf_text(
    aes(label = region_clean),
    size = 1.5,
    fontface = "bold"
  ) +
  scale_fill_manual(
    values = c(
      "Low efficiency" = "#d73027",
      "Medium efficiency" = "#fee08b",
      "High efficiency" = "#1a9850"
    ),
    name = "Technical efficiency"
  ) +
  labs(
    title = "Regional health system efficiency",
    subtitle = "Data Envelopment Analysis (output-oriented)"
  ) +
  theme_minimal()+
  theme(
    axis.title = element_blank()
  )

Regression based relative efficiency analysis

Why Regression based relative efficiency?

  • Allows assessment of technical efficiency while adjusting for contextual factors
  1. Data preparation
dat <- dat %>%
  mutate(
    cci_prop = cci / 100
  )

# Avoid 0 and 1 for logit
eps <- 0.001

dat <- dat %>%
  mutate(
    cci_adj = pmin(pmax(cci_prop, eps), 1 - eps),
    ln_cci = log(cci_adj / (1 - cci_adj))
  )
  1. Log transform inputs
dat <- dat %>%
  mutate(
    ln_hwf_density       = log(hwf_density),
    ln_fac_density       = log(fac_density),
    ln_total_hexp_pc     = log(total_health_exp_pc)
  )
  1. Model run
eff_model <- lm(
  ln_cci ~ ln_hwf_density + ln_total_hexp_pc +
    ln_fac_density +
    tfr + poverty_rate,
  data = dat
)

summary(eff_model)
## 
## Call:
## lm(formula = ln_cci ~ ln_hwf_density + ln_total_hexp_pc + ln_fac_density + 
##     tfr + poverty_rate, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46432 -0.19805 -0.02152  0.18238  0.55715 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)       0.0220254  2.7682606   0.008    0.994
## ln_hwf_density    0.1825800  0.2455696   0.743    0.466
## ln_total_hexp_pc  0.0700270  0.2131328   0.329    0.746
## ln_fac_density    0.0711342  0.1238934   0.574    0.572
## tfr              -0.0430787  0.1170782  -0.368    0.717
## poverty_rate      0.0003733  0.0085083   0.044    0.965
## 
## Residual standard error: 0.2949 on 20 degrees of freedom
## Multiple R-squared:  0.2057, Adjusted R-squared:  0.007171 
## F-statistic: 1.036 on 5 and 20 DF,  p-value: 0.4237
  1. Extract residuals

    Residual Plots: One can visualize the difference between the observed and expected performance. Typically displays:

    • Residuals (y-axis)

    • Predicted level of performance (x-axis)

Interpretation:

  • Positive residuals (above zero) - higher-than-expected performers

  • Negative residuals (below zero) - lower-than-expected performers

dat <- dat %>%
  mutate(
    expected_coverage = predict(eff_model),
    efficiency_resid  = residuals(eff_model)
  )

plot(eff_model$fitted.values, residuals(eff_model))
abline(h = 0, col = "red")

  1. Normalized efficiency scores

To support clearer comparison and communication, residuals can be normalized to create relative efficiency scores.

Normalisation rescales the residuals so that performance is expressed on a common scale, typically ranging from 0 to 1, where higher values indicate relatively better performance within the comparison group. This transformation does not change the relative ordering of units; rather, it improves interpretability and presentation.

dat <- dat %>%
  mutate(
    efficiency_index =
      (efficiency_resid - min(efficiency_resid)) /
      (max(efficiency_resid) - min(efficiency_resid))
  )

summary(dat$efficiency_index)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2607  0.4335  0.4546  0.6331  1.0000
  1. Categorization of relative efficiency scores

Normalised efficiency scores provide a continuous measure of relative performance. For analytical interpretation and policy communication, it is often useful to group scores into categories rather than present precise numerical values. Categorization supports clearer messaging, facilitates comparison across units, and avoids over-interpretation of small numerical differences.

Thresholds for categorisation can be defined using distribution-based cut-offs, such as tertiles or quartiles, or using analytically defined ranges (for example, ≥0.67, 0.34–0.66, ≤0.33). The choice of thresholds should remain consistent across analyses and be clearly documented.

dat <- dat %>%
  mutate(
    efficiency_cat = case_when(
      efficiency_index < 0.33 ~ "Low efficiency",
      efficiency_index < 0.66 ~ "Medium efficiency",
      TRUE ~ "High efficiency"
    )
  )
  1. Findings can be presented in a map
tz_map <- tz_shp %>% 
  left_join(dat, by = c("region_clean" = "adminlevel_1"))


ggplot(tz_map) +
  geom_sf(aes(fill = efficiency_cat), color = "white", size = 0.2) +
  geom_sf_text(
    aes(label = region_clean),
    size = 1.5,
    fontface = "bold"
  ) +
  scale_fill_manual(
    values = c(
      "Low efficiency" = "#d73027",
      "Medium efficiency" = "#fee08b",
      "High efficiency" = "#1a9850"
    ),
    name = "Technical efficiency"
  ) +
  labs(
    title = "Regional health system efficiency",
    subtitle = "Regression-based Relative efficiency"
  ) +
  theme_minimal()+
  theme(
    axis.title = element_blank()
  )
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data