Health system efficiency analysis

Health system efficiency analysis assesses how well health systems convert resources (financing, workforce, infrastructure) into health system outputs (health services) and outcomes.

Key analytical methods

Descriptive benchmarking

Data Envelopment Analysis (DEA)

Regression based relative efficiency

Extension to panel data

Use in health system performance assessment

This code summarizes key health system efficiency analyses that can be used in Health system performance assessment (HSPA).

library(tidyverse)
library(ggplot2)
library(ggrepel)
library(readxl)
library(Benchmarking)
library(frontier)
library(here)
library(sf)
library(dplyr)
library(stringr)

Descriptive benchmarking

dat <- read_excel(here ("data/data_TZ.xlsx"))

Macro-benchmarking:

  1. CCI vs Health workforce density
##Plotting
ggplot(dat, aes(x = hwf_density, y = cci)) +
  geom_point(size = 3) +
  geom_smooth(method = "lm", se = FALSE, color = "black") +
  geom_text_repel(aes(label = adminlevel_1), size = 3) +
  annotate(
    "text",
    x = Inf,
    y = -Inf,
    label = eqn_text,
    hjust = 1.1, vjust = -0.5,
    size = 4
  ) +
  labs(
    title = "Composite Coverage Index vs Health workforce density",
    x = "Health workforce density",
    y = "Composite Coverage Index"
  ) +
  theme_bw()
## `geom_smooth()` using formula = 'y ~ x'

Interpretation

DEA analysis

Given resources available in each region, how efficiently are they converted into health services and coverage?

Unit of analysis

Region (adminlevel_1)

Model choice (DEA specification)

Orientation: Output oriented

  • How much service coverage could be achieved given the current resources?

Returns to scale (how an increase in all inputs affects output) : Variable returns to scale (VRS) - (allows for increasing, decreasing and constant return to scale)

  • Allowing for heterogeneity across regions: Regions differ in size, population and system maturity

Inputs and outputs used

Inputs (resources)

  • Total Health expenditure per capita (total_health_exp_pc)

  • Health workforce density (hwf_density)

  • Facility density (fac_density)

Outputs

  • Composite coverage index for RMNCAH (cci)

Analysis

## Define input and output matrix
x <- as.matrix(dat[, c("total_health_exp_pc", "hwf_density", "fac_density")])
y <- as.matrix(dat[, c("cci")])

Run DEA

  • Output-Oriented DEA Scores : In an output-oriented Data Envelopment Analysis (DEA) model, the raw efficiency score (output expansion factor) is often formulated to measure the potential for output expansion with the same level of inputs.
  • In this case, a score of 1 indicates efficiency, while a score greater than 1 indicates inefficiency (the unit could produce more output). It is common practice to express this as 1/raw eff score (Technical efficiency) so that the score falls within the standard 0 to 1 range, where 1 is efficient.
## Knowing more about dea(), please run: ??dea()
dea_res <- dea(
  X = x,
  Y = y,
  RTS = "vrs",
  ORIENTATION = "out"
)

#Output oriented DEA produce: Output expansion factor (values >1)
#Converting to efficiency scores (0-1), inverse
dat$dea_eff <- dat$dea_eff <- 1/dea_res$eff
summary(dat$dea_eff)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.7934  0.9004  0.9782  0.9472  1.0000  1.0000

Presenting findings

Convert efficiency scores into percentage for easier interpretation (e.g.):

  • 100% → region is efficient

  • 70% → region could increase outputs by ~30% with current resources

Group DEA efficiency (%)
High efficiency ≥90%
Medium efficiency 75-89%
Low efficiency <75%
dat$dea_eff_pct <- dat$dea_eff *100

dat$dea_eff_cat <- cut(
  dat$dea_eff_pct,
  breaks = c(0, 75, 90, 100.01),
  labels = c("Low efficiency", "Medium efficiency", "High efficiency"),
  right = FALSE
)

Maps

tz_shp <- st_read (here ("Mapping/Tanzania/gadm41_TZA_1.shp"))
## Reading layer `gadm41_TZA_1' from data source 
##   `D:\Sophy\MSC_EAB\2021_2022\Countdownto2030\UManitoba_consultancy\HSPA_WHOGeneva\Draft Module 5\Analysis\Mapping\Tanzania\gadm41_TZA_1.shp' 
##   using driver `ESRI Shapefile'
## Simple feature collection with 26 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension:     XY
## Bounding box:  xmin: 29.32717 ymin: -11.7457 xmax: 40.44514 ymax: -0.9857875
## Geodetic CRS:  WGS 84
tz_shp <- tz_shp %>%
  mutate(
    region_clean = str_replace(NAME_1, "\\s+Region$", "")
  )

setdiff(dat$adminlevel_1, tz_shp$region_clean)
## character(0)
tz_map <- tz_shp %>% 
  left_join(dat, by = c("region_clean" = "adminlevel_1"))

Mapping regions by different groups of efficiency scores

tz_map <- tz_shp %>% 
  left_join(dat, by = c("region_clean" = "adminlevel_1"))

ggplot(tz_map) +
  geom_sf(aes(fill = dea_eff_cat), color = "white", size = 0.2) +
  geom_sf_text(
    aes(label = region_clean),
    size = 1.5,
    fontface = "bold"
  ) +
  scale_fill_manual(
    values = c(
      "Low efficiency" = "#d73027",
      "Medium efficiency" = "#fee08b",
      "High efficiency" = "#1a9850"
    ),
    name = "Technical efficiency"
  ) +
  labs(
    title = "Regional health system efficiency",
    subtitle = "Data Envelopment Analysis (output-oriented)"
  ) +
  theme_minimal()+
  theme(
    axis.title = element_blank()
  )

Regression based relative efficiency analysis

Why Regression based relative efficiency?

Data prep

dat <- dat %>%
  mutate(
    cci_prop = cci / 100
  )

# Avoid 0 and 1 for logit
eps <- 0.001

dat <- dat %>%
  mutate(
    cci_adj = pmin(pmax(cci_prop, eps), 1 - eps),
    ln_cci = log(cci_adj / (1 - cci_adj))
  )

Log transform inputs

dat <- dat %>%
  mutate(
    ln_hwf_density       = log(hwf_density),
    ln_fac_density       = log(fac_density),
    ln_total_hexp_pc     = log(total_health_exp_pc)
  )

Model run

eff_model <- lm(
  ln_cci ~ ln_hwf_density + ln_total_hexp_pc +
    ln_fac_density +
    tfr + poverty_rate,
  data = dat
)

summary(eff_model)
## 
## Call:
## lm(formula = ln_cci ~ ln_hwf_density + ln_total_hexp_pc + ln_fac_density + 
##     tfr + poverty_rate, data = dat)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -0.46432 -0.19805 -0.02152  0.18238  0.55715 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)
## (Intercept)       0.0220254  2.7682606   0.008    0.994
## ln_hwf_density    0.1825800  0.2455696   0.743    0.466
## ln_total_hexp_pc  0.0700270  0.2131328   0.329    0.746
## ln_fac_density    0.0711342  0.1238934   0.574    0.572
## tfr              -0.0430787  0.1170782  -0.368    0.717
## poverty_rate      0.0003733  0.0085083   0.044    0.965
## 
## Residual standard error: 0.2949 on 20 degrees of freedom
## Multiple R-squared:  0.2057, Adjusted R-squared:  0.007171 
## F-statistic: 1.036 on 5 and 20 DF,  p-value: 0.4237

Extract residuals

dat <- dat %>%
  mutate(
    expected_coverage = predict(eff_model),
    efficiency_resid  = residuals(eff_model)
  )

plot(eff_model$fitted.values, residuals(eff_model))
abline(h = 0, col = "red")

Normalize

dat <- dat %>%
  mutate(
    efficiency_index =
      (efficiency_resid - min(efficiency_resid)) /
      (max(efficiency_resid) - min(efficiency_resid))
  )

summary(dat$efficiency_index)
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.0000  0.2607  0.4335  0.4546  0.6331  1.0000

Categorize

dat <- dat %>%
  mutate(
    efficiency_cat = case_when(
      efficiency_index < 0.33 ~ "Low efficiency",
      efficiency_index < 0.66 ~ "Medium efficiency",
      TRUE ~ "High efficiency"
    )
  )

Plot

tz_map <- tz_shp %>% 
  left_join(dat, by = c("region_clean" = "adminlevel_1"))


ggplot(tz_map) +
  geom_sf(aes(fill = efficiency_cat), color = "white", size = 0.2) +
  geom_sf_text(
    aes(label = region_clean),
    size = 1.5,
    fontface = "bold"
  ) +
  scale_fill_manual(
    values = c(
      "Low efficiency" = "#d73027",
      "Medium efficiency" = "#fee08b",
      "High efficiency" = "#1a9850"
    ),
    name = "Technical efficiency"
  ) +
  labs(
    title = "Regional health system efficiency",
    subtitle = "Regression-based Relative efficiency"
  ) +
  theme_minimal()+
  theme(
    axis.title = element_blank()
  )
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data

SFA

sfa_model <- sfa(
  ln_cci ~ ln_total_hexp_pc + ln_hwf_density  |
    tfr + poverty_rate,
  data = dat
)
## Warning in sfa(ln_cci ~ ln_total_hexp_pc + ln_hwf_density | tfr + poverty_rate,
## : the residuals of the OLS estimates are right-skewed; this might indicate that
## there is no inefficiency or that the model is misspecified
## Warning in sfa(ln_cci ~ ln_total_hexp_pc + ln_hwf_density | tfr + poverty_rate,
## : the parameter 'gamma' is close to the boundary of the parameter space [0,1]:
## this can cause convergence problems and can negatively affect the validity and
## reliability of statistical tests and might be caused by model misspecification
## Warning in sfa(ln_cci ~ ln_total_hexp_pc + ln_hwf_density | tfr + poverty_rate,
## : the covariance matrix of the maximum likelihood estimates is not positive
## semidefinite
summary(sfa_model)
## Efficiency Effects Frontier (see Battese & Coelli 1995)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 36 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
## 
## final maximum likelihood estimates
##                     Estimate  Std. Error    z value  Pr(>|z|)    
## (Intercept)      -1.2635e+00  6.0302e-01    -2.0952 0.0361509 *  
## ln_total_hexp_pc  2.3064e-01  6.4192e-02     3.5930 0.0003269 ***
## ln_hwf_density    1.3964e-01  2.1357e-01     0.6538 0.5132176    
## Z_(Intercept)     4.3874e-01  4.9010e-01     0.8952 0.3706827    
## Z_tfr             2.5514e-02  9.8094e-02     0.2601 0.7947849    
## Z_poverty_rate   -3.5588e-03  6.6930e-03    -0.5317 0.5949214    
## sigmaSq           6.7266e-02  1.8770e-02     3.5837 0.0003387 ***
## gamma             1.0000e+00  2.2792e-05 43874.3998 < 2.2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -1.406148 
## 
## cross-sectional data
## total number of observations = 26 
## 
## mean efficiency: 0.6392684