Health system efficiency analysis assesses how well health systems convert resources (financing, workforce, infrastructure) into health system outputs (health services) and outcomes.
Key analytical methods
Descriptive benchmarking
Scatter plots of service access or coverage (or effective coverage) against key inputs (e.g. health workforce per capita) - Help identify positive deviants and underperformers.
A simple regression line shows the average relationship. A linear regression line shows the average association between inputs and outcomes
Regions above the regression line have higher output for a given input and can be considered positive outliers (relatively more efficient)
Regions below the regression line are underperformers (relatively less efficient)
Data Envelopment Analysis (DEA)
Regression based relative efficiency
Extension to panel data
Use in health system performance assessment
This code summarizes key health system efficiency analyses that can be used in Health system performance assessment (HSPA).
library(tidyverse)
library(ggplot2)
library(ggrepel)
library(readxl)
library(Benchmarking)
library(frontier)
library(here)
library(sf)
library(dplyr)
library(stringr)
dat <- read_excel(here ("data/data_TZ.xlsx"))
Macro-benchmarking:
Create simple linear model equation, then plot
lin_model <- lm(cci ~ hwf_density, data = dat)
summary(lin_model)
##
## Call:
## lm(formula = cci ~ hwf_density, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -8.7942 -4.0944 -0.3026 3.9512 10.6296
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 67.0470 2.7228 24.624 <2e-16 ***
## hwf_density 0.6489 0.2988 2.172 0.04 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 5.384 on 24 degrees of freedom
## Multiple R-squared: 0.1642, Adjusted R-squared: 0.1294
## F-statistic: 4.717 on 1 and 24 DF, p-value: 0.03998
# Extract coefficients
b0 <- round(coef(lin_model)[1], 2)
b1 <- round(coef(lin_model)[2], 2)
# Extract R-squared
r2 <- round(summary(lin_model)$r.squared, 2)
# Create label
eqn_text <- paste0(
"CCI = ", b0, " + ", b1, " × HWF density",
"; R² = ", r2
)##Plotting
ggplot(dat, aes(x = hwf_density, y = cci)) +
geom_point(size = 3) +
geom_smooth(method = "lm", se = FALSE, color = "black") +
geom_text_repel(aes(label = adminlevel_1), size = 3) +
annotate(
"text",
x = Inf,
y = -Inf,
label = eqn_text,
hjust = 1.1, vjust = -0.5,
size = 4
) +
labs(
title = "Composite Coverage Index vs Health workforce density",
x = "Health workforce density",
y = "Composite Coverage Index"
) +
theme_bw()
## `geom_smooth()` using formula = 'y ~ x'
Interpretation
Scatter plot shows a positive association between composite coverage index (CCI) for RMNCAH and health workforce density for key health workers (doctors, nurses & midwives)
Regions above the regression line indicate regions achieving higher coverage than expected given their workforce density
Regions below the regression line indicate underperformers given their inputs
Given resources available in each region, how efficiently are they converted into health services and coverage?
Region (adminlevel_1)
Orientation: Output oriented
Returns to scale (how an increase in all inputs affects output) : Variable returns to scale (VRS) - (allows for increasing, decreasing and constant return to scale)
Inputs (resources)
Total Health expenditure per capita
(total_health_exp_pc)
Health workforce density (hwf_density)
Facility density (fac_density)
Outputs
cci)## Define input and output matrix
x <- as.matrix(dat[, c("total_health_exp_pc", "hwf_density", "fac_density")])
y <- as.matrix(dat[, c("cci")])
## Knowing more about dea(), please run: ??dea()
dea_res <- dea(
X = x,
Y = y,
RTS = "vrs",
ORIENTATION = "out"
)
#Output oriented DEA produce: Output expansion factor (values >1)
#Converting to efficiency scores (0-1), inverse
dat$dea_eff <- dat$dea_eff <- 1/dea_res$eff
summary(dat$dea_eff)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.7934 0.9004 0.9782 0.9472 1.0000 1.0000
Convert efficiency scores into percentage for easier interpretation (e.g.):
100% → region is efficient
70% → region could increase outputs by ~30% with current resources
| Group | DEA efficiency (%) |
|---|---|
| High efficiency | ≥90% |
| Medium efficiency | 75-89% |
| Low efficiency | <75% |
dat$dea_eff_pct <- dat$dea_eff *100
dat$dea_eff_cat <- cut(
dat$dea_eff_pct,
breaks = c(0, 75, 90, 100.01),
labels = c("Low efficiency", "Medium efficiency", "High efficiency"),
right = FALSE
)
Maps
tz_shp <- st_read (here ("Mapping/Tanzania/gadm41_TZA_1.shp"))
## Reading layer `gadm41_TZA_1' from data source
## `D:\Sophy\MSC_EAB\2021_2022\Countdownto2030\UManitoba_consultancy\HSPA_WHOGeneva\Draft Module 5\Analysis\Mapping\Tanzania\gadm41_TZA_1.shp'
## using driver `ESRI Shapefile'
## Simple feature collection with 26 features and 11 fields
## Geometry type: MULTIPOLYGON
## Dimension: XY
## Bounding box: xmin: 29.32717 ymin: -11.7457 xmax: 40.44514 ymax: -0.9857875
## Geodetic CRS: WGS 84
tz_shp <- tz_shp %>%
mutate(
region_clean = str_replace(NAME_1, "\\s+Region$", "")
)
setdiff(dat$adminlevel_1, tz_shp$region_clean)
## character(0)
tz_map <- tz_shp %>%
left_join(dat, by = c("region_clean" = "adminlevel_1"))
Mapping regions by different groups of efficiency scores
tz_map <- tz_shp %>%
left_join(dat, by = c("region_clean" = "adminlevel_1"))
ggplot(tz_map) +
geom_sf(aes(fill = dea_eff_cat), color = "white", size = 0.2) +
geom_sf_text(
aes(label = region_clean),
size = 1.5,
fontface = "bold"
) +
scale_fill_manual(
values = c(
"Low efficiency" = "#d73027",
"Medium efficiency" = "#fee08b",
"High efficiency" = "#1a9850"
),
name = "Technical efficiency"
) +
labs(
title = "Regional health system efficiency",
subtitle = "Data Envelopment Analysis (output-oriented)"
) +
theme_minimal()+
theme(
axis.title = element_blank()
)
Data prep
dat <- dat %>%
mutate(
cci_prop = cci / 100
)
# Avoid 0 and 1 for logit
eps <- 0.001
dat <- dat %>%
mutate(
cci_adj = pmin(pmax(cci_prop, eps), 1 - eps),
ln_cci = log(cci_adj / (1 - cci_adj))
)
Log transform inputs
dat <- dat %>%
mutate(
ln_hwf_density = log(hwf_density),
ln_fac_density = log(fac_density),
ln_total_hexp_pc = log(total_health_exp_pc)
)
Model run
eff_model <- lm(
ln_cci ~ ln_hwf_density + ln_total_hexp_pc +
ln_fac_density +
tfr + poverty_rate,
data = dat
)
summary(eff_model)
##
## Call:
## lm(formula = ln_cci ~ ln_hwf_density + ln_total_hexp_pc + ln_fac_density +
## tfr + poverty_rate, data = dat)
##
## Residuals:
## Min 1Q Median 3Q Max
## -0.46432 -0.19805 -0.02152 0.18238 0.55715
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.0220254 2.7682606 0.008 0.994
## ln_hwf_density 0.1825800 0.2455696 0.743 0.466
## ln_total_hexp_pc 0.0700270 0.2131328 0.329 0.746
## ln_fac_density 0.0711342 0.1238934 0.574 0.572
## tfr -0.0430787 0.1170782 -0.368 0.717
## poverty_rate 0.0003733 0.0085083 0.044 0.965
##
## Residual standard error: 0.2949 on 20 degrees of freedom
## Multiple R-squared: 0.2057, Adjusted R-squared: 0.007171
## F-statistic: 1.036 on 5 and 20 DF, p-value: 0.4237
Extract residuals
dat <- dat %>%
mutate(
expected_coverage = predict(eff_model),
efficiency_resid = residuals(eff_model)
)
plot(eff_model$fitted.values, residuals(eff_model))
abline(h = 0, col = "red")
Normalize
dat <- dat %>%
mutate(
efficiency_index =
(efficiency_resid - min(efficiency_resid)) /
(max(efficiency_resid) - min(efficiency_resid))
)
summary(dat$efficiency_index)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 0.0000 0.2607 0.4335 0.4546 0.6331 1.0000
Categorize
dat <- dat %>%
mutate(
efficiency_cat = case_when(
efficiency_index < 0.33 ~ "Low efficiency",
efficiency_index < 0.66 ~ "Medium efficiency",
TRUE ~ "High efficiency"
)
)
Plot
tz_map <- tz_shp %>%
left_join(dat, by = c("region_clean" = "adminlevel_1"))
ggplot(tz_map) +
geom_sf(aes(fill = efficiency_cat), color = "white", size = 0.2) +
geom_sf_text(
aes(label = region_clean),
size = 1.5,
fontface = "bold"
) +
scale_fill_manual(
values = c(
"Low efficiency" = "#d73027",
"Medium efficiency" = "#fee08b",
"High efficiency" = "#1a9850"
),
name = "Technical efficiency"
) +
labs(
title = "Regional health system efficiency",
subtitle = "Regression-based Relative efficiency"
) +
theme_minimal()+
theme(
axis.title = element_blank()
)
## Warning in st_point_on_surface.sfc(sf::st_zm(x)): st_point_on_surface may not
## give correct results for longitude/latitude data
sfa_model <- sfa(
ln_cci ~ ln_total_hexp_pc + ln_hwf_density |
tfr + poverty_rate,
data = dat
)
## Warning in sfa(ln_cci ~ ln_total_hexp_pc + ln_hwf_density | tfr + poverty_rate,
## : the residuals of the OLS estimates are right-skewed; this might indicate that
## there is no inefficiency or that the model is misspecified
## Warning in sfa(ln_cci ~ ln_total_hexp_pc + ln_hwf_density | tfr + poverty_rate,
## : the parameter 'gamma' is close to the boundary of the parameter space [0,1]:
## this can cause convergence problems and can negatively affect the validity and
## reliability of statistical tests and might be caused by model misspecification
## Warning in sfa(ln_cci ~ ln_total_hexp_pc + ln_hwf_density | tfr + poverty_rate,
## : the covariance matrix of the maximum likelihood estimates is not positive
## semidefinite
summary(sfa_model)
## Efficiency Effects Frontier (see Battese & Coelli 1995)
## Inefficiency decreases the endogenous variable (as in a production function)
## The dependent variable is logged
## Iterative ML estimation terminated after 36 iterations:
## log likelihood values and parameters of two successive iterations
## are within the tolerance limit
##
## final maximum likelihood estimates
## Estimate Std. Error z value Pr(>|z|)
## (Intercept) -1.2635e+00 6.0302e-01 -2.0952 0.0361509 *
## ln_total_hexp_pc 2.3064e-01 6.4192e-02 3.5930 0.0003269 ***
## ln_hwf_density 1.3964e-01 2.1357e-01 0.6538 0.5132176
## Z_(Intercept) 4.3874e-01 4.9010e-01 0.8952 0.3706827
## Z_tfr 2.5514e-02 9.8094e-02 0.2601 0.7947849
## Z_poverty_rate -3.5588e-03 6.6930e-03 -0.5317 0.5949214
## sigmaSq 6.7266e-02 1.8770e-02 3.5837 0.0003387 ***
## gamma 1.0000e+00 2.2792e-05 43874.3998 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## log likelihood value: -1.406148
##
## cross-sectional data
## total number of observations = 26
##
## mean efficiency: 0.6392684