Report on Incentivisation Survey

Findings and Analytical Notes

Authors

Affiliation

Ris Heskiel Najogi Sitinjak

SPHERES Project, OUCRU Indonesia

Dilla Rosa

Ainun Permata Shanie

Risyad Abiyyu Siregar

Published

January 21, 2026

1 Overview

Perceptions of salary and remuneration among health workers are shaped not only by individual characteristics, but also by shared norms regarding fairness, merit, and appropriate criteria for differentiation. Understanding these perceptions therefore requires moving beyond simple descriptive comparisons and towards an examination of the belief systems that underpin judgements about pay. In this analysis, beliefs about salary are conceptualised as comprising both distributive judgements (who should earn more) and justificatory principles (which factors should matter in determining remuneration), as well as the relative priority and strength assigned to these considerations.

1.1 Histogram of work duration

Figure 5.1. below presents the distribution of respondents’ work duration, measured in years and stratified by location. In both Lombok Barat and Purbalingga, work duration exhibits a right-skewed distribution. The mean and median in Lombok Barat (mean = 13.53; median = 13) are slightly higher than those in Purbalingga (mean = 12.54; median = 11). Overall, the distributions indicate substantial heterogeneity in length of service, which may shape perceptions of remuneration and fairness.

Show / hide code

# Create summary table

workdur_stats <- raw_combined %>%
  filter(!is.na(work_dur_yr)) %>%
  group_by(loc_label) %>%
  summarise(
    n = n(),
    mean = mean(work_dur_yr),
    median = median(work_dur_yr),
    q1 = quantile(work_dur_yr, 0.25),
    q3 = quantile(work_dur_yr, 0.75),
    min = min(work_dur_yr),
    max = max(work_dur_yr),
    .groups = "drop"
  )

# Construct histogram

ggplot(raw_combined, aes(x = work_dur_yr)) +
  geom_histogram(
    binwidth = 2,
    boundary = 0,
    colour = "white",
    fill = "steelblue"
  ) +
  facet_wrap(~ loc_label) +
  geom_vline(
    data = workdur_stats,
    aes(xintercept = mean),
    linetype = "dashed"
  ) +
  geom_vline(
    data = workdur_stats,
    aes(xintercept = median),
    linetype = "solid"
  ) +
  
# Create summary box per location
  
  geom_label(
    data = workdur_stats,
    aes(
      x = Inf,
      y = Inf,
      label = paste0(
        "N = ", n,
        "\nMean = ", round(mean, 2),
        "\nMedian = ", round(median, 1),
        "\nIQR = ", round(q1, 1), "–", round(q3, 1),
        "\nMin–Max = ", round(min, 1), "–", round(max, 1)
      )
    ),
    hjust = 1.05,
    vjust = 1.1,
    size = 3.2,
    label.size = 0.2,
    fill = "white"
  ) +
  labs(
    title = "Distribution of work duration (years) by location",
    subtitle = "Solid line = median, Dashed line = mean",
    x = "Work duration (years)",
    y = "Number of respondents"
  ) +
  theme_minimal()

1.2 Descriptive table

Table 5.2 below summarises the characteristics of respondents by location. Respondents in both Lombok Barat and Purbalingga were distributed across a wide range of work durations, with the largest proportion in each site having 10–19 years of service (39.8% in Lombok Barat and 30.3% in Purbalingga). Shorter tenure (0–4 years) was more common in Purbalingga (24.7%) than in Lombok Barat (16.4%), while the proportion with at least 20 years of service was comparable across locations (23.9% and 24.7%, respectively). No statistically significant difference in work duration was observed between locations (p = 0.186).

The majority of respondents in both settings were patient-facing health workers, accounting for 78.1% of respondents in Lombok Barat and 82.8% in Purbalingga, with no significant difference in worker type by location (p = 0.203). Respondents were drawn from a diverse set of Puskesmas and SPHERES Working Group within each district, reflecting broad facility coverage in both study sites.

Show / hide code

#--------------------------------------------------
# 1. Prepare work duration groups
#--------------------------------------------------

raw_combined <- raw_combined %>%
  mutate(
    work_dur_grp = cut(
      work_dur_yr,
      breaks = c(0, 5, 10, 20, 65),
      right = FALSE,
      labels = c("0–4 years", "5–9 years", "10–19 years", "≥20 years")
    )
  )

#--------------------------------------------------
# 2. Compute p-values
#--------------------------------------------------

p_workdur <- wilcox.test(work_dur_yr ~ loc_label, data = raw_combined)$p.value
p_worker  <- chisq.test(table(raw_combined$worker_type, raw_combined$loc_label))$p.value

#--------------------------------------------------
# 3. Stratified variable builder
#--------------------------------------------------

make_strat_table <- function(data, var, section_label) {
  data %>%
    filter(!is.na(.data[[var]])) %>%
    count(loc_label, !!sym(var)) %>%
    group_by(loc_label) %>%
    mutate(pct = round(100 * n / sum(n), 1)) %>%
    ungroup() %>%
    mutate(value = paste0(n, " (", pct, "%)")) %>%
    select(
      Characteristic = !!sym(var),
      loc_label,
      value
    ) %>%
    pivot_wider(
      names_from = loc_label,
      values_from = value,
      values_fill = "–"
    ) %>%
    mutate(Section = section_label)
}

#--------------------------------------------------
# 4. Build each section
#--------------------------------------------------

tbl_workdur <- make_strat_table(
  raw_combined,
  "work_dur_grp",
  "Work duration (years)"
)

tbl_worker <- make_strat_table(
  raw_combined,
  "worker_type",
  "Worker type"
)

tbl_puskesmas <- raw_combined %>%
  filter(!is.na(puskesmas)) %>%
  count(loc_label, puskesmas) %>%
  group_by(loc_label) %>%
  mutate(pct = round(100 * n / sum(n), 1)) %>%
  ungroup() %>%
  mutate(value = paste0(n, " (", pct, "%)")) %>%
  select(
    Characteristic = puskesmas,
    loc_label,
    value
  ) %>%
  pivot_wider(
    names_from = loc_label,
    values_from = value,
    values_fill = "–"
  ) %>%
  mutate(Section = "Puskesmas")

#--------------------------------------------------
# 5. Add p-values ONLY to section headers
#--------------------------------------------------

tbl_workdur$p_value <- ""
tbl_workdur$p_value[1] <- sprintf("%.3f", p_workdur)

tbl_worker$p_value <- ""
tbl_worker$p_value[1] <- sprintf("%.3f", p_worker)

tbl_puskesmas$p_value <- ""

#--------------------------------------------------
# 6. Combine final table
#--------------------------------------------------

table1_final <- bind_rows(
  tbl_workdur,
  tbl_worker,
  tbl_puskesmas
) %>%
  select(
    Section,
    Characteristic,
    `Lombok Barat` = `Lombok Barat`,
    `Purbalingga`  = `Purbalingga`,
    p_value
  )

table1_final <- table1_final %>%
  mutate(
    Section = as.character(Section)
  )

table1_final <- table1_final %>%
  mutate(
    Section = gsub("\n", " ", Section)
  )

#--------------------------------------------------
# 7. Render HTML table
#--------------------------------------------------

table1_final %>%
  kable(
    format = "html",
    col.names = c(
      "Section",
      "Characteristic",
      "Lombok Barat<br>n (%)",
      "Purbalingga<br>n (%)",
      "p-value"
    ),
    caption = "<strong>Characteristics of respondents by location</strong>",
    escape = FALSE   # IMPORTANT: allows <br> to render
  ) %>%
  kable_styling(
    bootstrap_options = c("striped", "condensed"),
    full_width = FALSE
  ) %>%
  collapse_rows(
    columns = 1,
    valign = "top"
  )

**Characteristics of respondents by location**
Section	Characteristic	Lombok Barat n (%)	Purbalingga n (%)	p-value
Work duration (years)	0–4 years	33 (16.4%)	92 (24.7%)	0.186
	5–9 years	40 (19.9%)	76 (20.4%)
	10–19 years	80 (39.8%)	113 (30.3%)
	≥20 years	48 (23.9%)	92 (24.7%)
Worker type	Non patient-facing	44 (21.9%)	64 (17.2%)	0.203
Worker type	Patient-facing	157 (78.1%)	309 (82.8%)
Puskesmas	Puskesmas Banyumulek	1 (0.5%)	–
	Puskesmas Dasan Tapen	5 (2.5%)	–
	Puskesmas Eyat Mayang	7 (3.5%)	–
	Puskesmas Gerung	20 (10%)	–
	Puskesmas Gunungsari	17 (8.5%)	–
	Puskesmas Jembatan Kembar	31 (15.4%)	–
	Puskesmas Kediri	58 (28.9%)	–
	Puskesmas Labuapi	11 (5.5%)	–
	Puskesmas Lingsar	18 (9%)	–
	Puskesmas Narmada	15 (7.5%)	–
	Puskesmas Pelangan	1 (0.5%)	–
	Puskesmas Penimbung	13 (6.5%)	–
	Puskesmas Sedau	1 (0.5%)	–
	Puskesmas Sesela	1 (0.5%)	–
	Puskesmas Suranadi	1 (0.5%)	–
	Working Group	1 (0.5%)	9 (2.4%)
	Puskesmas Bobotsari	–	1 (0.3%)
	Puskesmas Bojong	–	32 (8.6%)
	Puskesmas Bojongsari	–	55 (14.7%)
	Puskesmas Kaligondang	–	48 (12.9%)
	Puskesmas Kalikajar	–	13 (3.5%)
	Puskesmas Kalimanah	–	9 (2.4%)
	Puskesmas Karanganyar	–	12 (3.2%)
	Puskesmas Karangjambu	–	18 (4.8%)
	Puskesmas Karangtengah	–	8 (2.1%)
	Puskesmas Kejobong	–	58 (15.5%)
	Puskesmas Kutasari	–	8 (2.1%)
	Puskesmas Kutawis	–	6 (1.6%)
	Puskesmas Mrebet	–	13 (3.5%)
	Puskesmas Padamara	–	10 (2.7%)
	Puskesmas Pengadegan	–	24 (6.4%)
	Puskesmas Purbalingga	–	4 (1.1%)
	Puskesmas Rembang	–	24 (6.4%)
	Puskesmas Serayu Larangan	–	21 (5.6%)

1.3 Q3: Importance of Factors in Determining Incentives

Q3: In your opinion, how important are the following factors in determining incentives for Puskesmas staff?

Figure 5.4 summarises respondents’ assessments of the importance of various factors in determining incentives for Puskesmas staff. Across both Lombok Barat and Purbalingga, performance, service quality, skills, and attendance are most frequently rated as “important” or “very important,” indicating strong support for incentive schemes that reward observable work behaviour and service delivery. In contrast, structural characteristics such as position, work location, and additional tasks receive comparatively lower importance ratings. Education and experience occupy an intermediate position in both locations. Overall, the patterns are visually similar across sites.

Show / hide code

# Likert levels and colours (Q3)

likert_levels_q3 <- c(
  "Unimportant",
  "Less important",
  "Important",
  "Very important"
)

likert_colors_q3 <- c(
  "Unimportant"     = "#deebf7",
  "Less important"  = "#9ecae1",
  "Important"       = "#6baed6",
  "Very important"  = "#2171b5"
)

# Prepare data for plotting

plot_q3 <- raw_combined %>%
  select(loc_label, all_of(q3_vars)) %>%
  pivot_longer(
    cols = all_of(q3_vars),
    names_to = "item",
    values_to = "response"
  ) %>%
  filter(!is.na(response)) %>%
  mutate(
    response = factor(response, levels = likert_levels_q3),
    item = factor(item, levels = q3_vars)
  ) %>%
  group_by(loc_label, item, response) %>%
  summarise(n = n(), .groups = "drop_last") %>%
  mutate(pct = n / sum(n)) %>%
  ungroup()

# Order items by overall importance (% rated "Very imporant")

item_order <- plot_q3 %>%
  filter(response == "Very important") %>%
  group_by(item) %>%
  summarise(
    pct_very_important = sum(pct),
    .groups = "drop"
  ) %>%
  arrange(desc(pct_very_important)) %>%
  pull(item)

item_labels <- c(
  q3_attendance = "Attendance",
  q3_service    = "Service quality",
  q3_exp        = "Experience",
  q3_edu        = "Education",
  q3_skill      = "Skills",
  q3_loc        = "Work location",
  q3_addtask    = "Additional tasks",
  q3_position   = "Position",
  q3_perf       = "Performance"
)

plot_q3 <- plot_q3 %>%
  mutate(
    item = factor(item, levels = rev(item_order))
  )

plot_q3$item <- recode(plot_q3$item, !!!item_labels)

# Construct plot

ggplot(plot_q3, aes(x = pct, y = item, fill = response)) +
  geom_col(width = 0.7) +
  facet_wrap(~ loc_label) +
  scale_x_continuous(
    labels = percent_format(accuracy = 1),
    breaks = seq(0, 1, by = 0.25),
    expand = c(0, 0)
  ) +
  scale_fill_manual(
    values = likert_colors_q3,
    drop = FALSE,
    na.translate = FALSE,
    guide = guide_legend(reverse = TRUE)
  ) +
  labs(
    title = "Importance of factors in determining incentives",
    x = "Percentage of respondents",
    y = NULL,
    fill = "Importance"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    panel.grid.major.y = element_blank(),
    panel.grid.minor = element_blank(),
    panel.spacing.x = unit(2, "lines"),
    legend.position = "bottom",
    plot.margin = margin(10, 20, 10, 10),
    strip.text = element_text(face = "bold")
  )

1.4 Q4 & Q5: Most and Least Important Criteria for Incentive Allocation

Q4: From the following options, which factor is the MOST important for determining incentives for Puskesmas staff?

Q5: From the following options, which factor is the LEAST important for determining incentives for Puskesmas staff?

Figure 5.5 compares the factors identified as most and least important for determining incentives for Puskesmas staff across Lombok Barat and Purbalingga. In both locations, performance emerges as the most frequently selected factor as “most important,” substantially exceeding other criteria. Service quality and attendance are also commonly prioritised, though to a lesser extent. In contrast, positional and structural attributes such as position, work location, and additional tasks are more frequently identified as “least important.”

Show / hide code

# --------------------------------------------------
# Factor levels
# --------------------------------------------------

q45_levels <- c(
  "Experience",
  "Performance",
  "Attendance",
  "Service quality",
  "Skills",
  "Position",
  "Additional tasks",
  "Work location",
  "Education"
)

# --------------------------------------------------
# Prepare data
# --------------------------------------------------

plot_q45 <- raw_combined %>%
  select(loc_label, q4, q5) %>%
  pivot_longer(
    cols = c(q4, q5),
    names_to = "question",
    values_to = "factor"
  ) %>%
  filter(!is.na(factor)) %>%
  mutate(
    question = recode(
      question,
      q4 = "Most important",
      q5 = "Least important"
    ),
    factor = factor(factor, levels = q45_levels)
  ) %>%
  count(loc_label, question, factor) %>%
  group_by(loc_label, question) %>%
  mutate(
    pct = n / sum(n),
    pct = if_else(question == "Least important", -pct, pct)
  ) %>%
  ungroup()

# --------------------------------------------------
# Order factors by importance (based on Q4 across both locations)
# --------------------------------------------------

factor_order <- plot_q45 %>%
  filter(question == "Most important") %>%
  group_by(factor) %>%
  summarise(pct = sum(abs(pct)), .groups = "drop") %>%
  arrange(pct) %>%
  pull(factor)

plot_q45$factor <- factor(plot_q45$factor, levels = factor_order)

# --------------------------------------------------
# Construct mirrored horizontal bar chart
# --------------------------------------------------

ggplot(plot_q45, aes(x = pct, y = factor, fill = question)) +
  geom_col(width = 0.7) +
  facet_wrap(~ loc_label) +
  scale_x_continuous(
    limits = c(-0.3, 0.5),
    breaks = c(-0.3, -0.1, 0, 0.1, 0.3, 0.5),
    minor_breaks = seq(-0.3, 0.5, by = 0.1), # gridlines every 10%
    labels = function(x) paste0(abs(x * 100), "%"),
    expand = c(0, 0)
  ) +
  scale_fill_manual(
    values = c(
      "Most important"  = "#2166ac",
      "Least important" = "#b2182b"
    )
  ) +
  labs(
    title = "Most and least important criteria for incentive allocation",
    subtitle = "Mirrored horizontal bar chart comparing Q4 and Q5",
    x = "Percentage of respondents",
    y = NULL,
    fill = NULL
  ) +
  theme_minimal(base_size = 11) +
  theme(
    # spacing between facets
    panel.spacing.x = unit(4, "lines"),

    # gridlines
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
    panel.grid.major.x = element_line(
      colour = "grey80",
      linewidth = 0.5
    ),
    panel.grid.minor.x = element_line(
      colour = "grey90",
      linewidth = 0.3
    ),

    # axes
    axis.text.y = element_text(size = 10),
    axis.text.x = element_text(size = 9),
    axis.ticks.x = element_blank(),
    axis.line.x  = element_blank(),

    # facet titles
    strip.text = element_text(face = "bold"),

    # legend
    legend.position = "bottom",
    
    # margin
    plot.margin = margin(10, 20, 10, 10)
  )

1.5 Q6: Preferred Rules in Work Contract

Q6: Which rules do you think are appropriate to apply in the work contract for Puskesmas staff?

Figure 5.6 summarises respondents’ preferred rules for determining Puskesmas staff income, allowing multiple selections per respondent. Across both Lombok Barat and Purbalingga, performance-based bonuses are the most frequently selected option, with approximately four in five respondents in each location endorsing this rule. Sanctions in the form of pay reductions for unjustified absence are also commonly supported, particularly in Purbalingga. In contrast, attendance-based bonuses and fixed pay rates are selected by a substantially smaller proportion of respondents. Overall, these patterns indicate strong support for income systems that explicitly reward performance and penalise poor attendance, rather than fixed or unconditional remuneration structures.

Show / hide code

# --------------------------------------------------
# Prepare denominators (unique respondents per location)
# --------------------------------------------------

denom <- q6_long %>%
  distinct(resp_id, loc_label) %>%
  count(loc_label, name = "n_resp")

# --------------------------------------------------
# Prepare data for plotting
# --------------------------------------------------

plot_q6 <- q6_long %>%
  count(loc_label, q6_option) %>%
  left_join(denom, by = "loc_label") %>%
  mutate(
    pct = n / n_resp,
    loc_label = factor(
      loc_label,
      levels = c("Lombok Barat", "Purbalingga")
    )
  )

# Order options by overall selection frequency
order_q6 <- plot_q6 %>%
  group_by(q6_option) %>%
  summarise(pct_overall = mean(pct), .groups = "drop") %>%
  arrange(pct_overall) %>%
  pull(q6_option)

plot_q6 <- plot_q6 %>%
  mutate(q6_option = factor(q6_option, levels = order_q6))

# --------------------------------------------------
# Construct plot (Lombok Barat on top)
# --------------------------------------------------

ggplot(plot_q6, aes(x = pct, y = q6_option, fill = loc_label)) +
  geom_col(
    position = position_dodge2(width = 0.7, reverse = TRUE),
    width = 0.6
  ) +
  scale_x_continuous(
    limits = c(0, 1),
    labels = percent_format(accuracy = 1),
    breaks = seq(0, 1, by = 0.2),
    expand = c(0, 0)
  ) +
  labs(
    title = "Preferred rules in work contracts for determining Puskesmas staff income",
    x = "Percentage of respondents",
    y = NULL,
    fill = "Location"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    panel.grid.major.y = element_blank(),
    legend.position = "bottom",
    plot.margin = margin(10, 20, 10, 10),
    plot.title = element_text(size = 12)
  )

2 Inferential analysis

2.1 Location Differences in the Perceived Importance of Incentive Criteria

Show / hide code

library(MASS)

# --------------------------------------------------
# Q3 belief structure: ordinal logistic regression
# Final table only
# --------------------------------------------------

# Ordered levels for Q3 items
q3_levels <- c(
  "Unimportant",
  "Less important",
  "Important",
  "Very important"
)

# Human-readable labels
q3_labels <- c(
  q3_attendance = "Attendance",
  q3_service    = "Service quality",
  q3_exp        = "Experience",
  q3_edu        = "Education",
  q3_skill      = "Skills",
  q3_loc        = "Work location",
  q3_addtask    = "Additional tasks",
  q3_position   = "Position",
  q3_perf       = "Performance"
)

# --------------------------------------------------
# Fit models and extract results
# --------------------------------------------------

q3_summary <- lapply(q3_vars, function(var) {

  dat <- raw_combined %>%
    filter(!is.na(.data[[var]])) %>%
    mutate(
      value = factor(.data[[var]], levels = q3_levels, ordered = TRUE)
    )

  model <- polr(value ~ loc_label, data = dat, Hess = TRUE)

  beta <- coef(model)
  se   <- sqrt(diag(vcov(model)))[names(beta)]
  z    <- beta / se
  p    <- 2 * pnorm(abs(z), lower.tail = FALSE)

  tibble(
    variable_label = recode(var, !!!q3_labels),
    OR = exp(beta),
    CI_low = exp(beta - 1.96 * se),
    CI_high = exp(beta + 1.96 * se),
    p_value = p
  )
}) %>%
  bind_rows() %>%
  mutate(
    p_value = format.pval(p_value, digits = 3, eps = 0.001)
  )

q3_summary

Show / hide code

detach("package:MASS", unload = TRUE)

# --------------------------------------------------
# Forest plot: Odds ratios for Q3 items
# --------------------------------------------------

ggplot(
  q3_summary,
  aes(
    x = OR,
    y = reorder(variable_label, OR),
    xmin = CI_low,
    xmax = CI_high
  )
) +
  geom_point(size = 2) +
  geom_errorbarh(height = 0.25) +
  geom_vline(
    xintercept = 1,
    linetype = "dashed",
    colour = "grey40",
    linewidth = 0.6
  ) +

  scale_x_log10(
    limits = c(0.25, 2),
    breaks = c(0.25, 0.5, 0.75, 1, 1.25, 1.5, 1.75, 2),
    minor_breaks = seq(0.25, 2, by = 0.25),
    labels = c("0.25", "0.50", "0.75", "1,00", "1.25", "1.50", "1.75", "2.00")
  ) +

  labs(
    title = "Location Differences in Perceived Importance of Incentive Criteria",
    subtitle = "Ordinal logistic regression (reference category: Lombok Barat)",
    x = "Odds ratio (Purbalingga vs Lombok Barat, log scale)",
    y = NULL
  ) +

  theme_minimal(base_size = 11) +
  theme(
    panel.grid.major.x = element_line(colour = "grey80", linewidth = 0.5),
    panel.grid.minor.x = element_line(colour = "grey90", linewidth = 0.3),
    panel.grid.major.y = element_blank(),
    panel.grid.minor.y = element_blank(),
    plot.margin = margin(10, 20, 10, 10)
  )

To examine whether the perceived importance of different criteria for incentive allocation varies by location, ordinal logistic regression models were estimated for each item in the Q3 battery, with location as the sole predictor. Odds ratios below one indicate that respondents in Purbalingga were less likely than those in Lombok Barat to rate a given factor as more important.

Several Q3 items show statistically significant differences between locations. Respondents in Purbalingga were substantially less likely to rate education as important compared with respondents in Lombok Barat (OR = , 95% CI –, p < 0.001). Similar patterns are observed for experience (OR = , 95% CI –, p = ), service quality (OR = , 95% CI –, p = ), and skills (OR = , 95% CI –, p = ).

In contrast, no statistically meaningful differences are observed for performance, additional tasks, position, or work location, where odds ratios are close to unity and confidence intervals include one. Overall, these results suggest that respondents in Lombok Barat place greater emphasis on formal attributes such as education, experience, and service quality when evaluating incentive criteria, while performance-related co

2.2 LCA of Q3

Show / hide code

# --------------------------------------------------
# Latent Class Analysis (LCA) of Q3 battery
# --------------------------------------------------

library(dplyr)
library(tidyr)
library(ggplot2)
library(poLCA)

# --------------------------------------------------
# 1. Prepare Q3 data (ordinal → categorical numeric)
# --------------------------------------------------

q3_items <- c(
  "q3_attendance",
  "q3_service",
  "q3_exp",
  "q3_edu",
  "q3_skill",
  "q3_loc",
  "q3_addtask",
  "q3_position",
  "q3_perf"
)

q3_levels <- c(
  "Unimportant",
  "Less important",
  "Important",
  "Very important"
)

q3_data <- raw_combined %>%
  dplyr::select(all_of(q3_items)) %>%
  na.omit() %>%
  mutate(
    across(
      everything(),
      ~ as.numeric(factor(.x, levels = q3_levels))
    )
  )

# --------------------------------------------------
# 2. Fit LCA models (2–6 classes)
# --------------------------------------------------

lca_formula <- as.formula(
  paste("cbind(", paste(q3_items, collapse = ","), ") ~ 1")
)

set.seed(123)

lca_models <- lapply(2:6, function(k) {
  poLCA(
    lca_formula,
    data = q3_data,
    nclass = k,
    nrep = 30,
    maxiter = 5000,
    verbose = FALSE
  )
})

# --------------------------------------------------
# 3. Model selection: BIC
# --------------------------------------------------

fit_stats <- tibble(
  classes = 2:6,
  BIC = sapply(lca_models, \(m) m$bic)
)

ggplot(fit_stats, aes(x = classes, y = BIC)) +
  geom_line() +
  geom_point(size = 2) +
  scale_x_continuous(breaks = 2:6) +
  labs(
    title = "Model selection for latent class analysis",
    subtitle = "Bayesian Information Criterion (lower is better)",
    x = "Number of latent classes",
    y = "BIC"
  ) +
  theme_minimal(base_size = 11)

Show / hide code

# --------------------------------------------------
# 4. Select best model (lowest BIC)
# --------------------------------------------------

best_lca <- lca_models[[which.min(fit_stats$BIC)]]

# Class sizes (for facet labels only)
class_sizes <- as.data.frame(table(best_lca$predclass)) |>
  rename(class = Var1, n = Freq) |>
  mutate(
    class = as.integer(as.character(class)),
    pct = n / sum(n)
  ) |>
  arrange(class)

# --------------------------------------------------
# 5. Prepare item-response probabilities for plotting
# --------------------------------------------------

# Human-readable labels for Q3 items
q3_labels <- c(
  q3_attendance = "Attendance",
  q3_service    = "Service quality",
  q3_exp        = "Experience",
  q3_edu        = "Education",
  q3_skill      = "Skills",
  q3_loc        = "Work location",
  q3_addtask    = "Additional tasks",
  q3_position   = "Position",
  q3_perf       = "Performance"
)

# Human-readable labels for response categories
resp_labels <- c(
  "Pr(1)" = "Unimportant",
  "Pr(2)" = "Less important",
  "Pr(3)" = "Important",
  "Pr(4)" = "Very important"
)

prob_long <- lapply(seq_along(best_lca$probs), function(i) {

  as.data.frame(best_lca$probs[[i]]) %>%
    mutate(
      item = names(best_lca$probs)[i],
      class = row_number()
    )

}) %>%
  bind_rows() %>%
  pivot_longer(
    cols = starts_with("Pr"),
    names_to = "response",
    values_to = "prob"
  ) %>%
  mutate(
    item = recode(item, !!!q3_labels),
    response = factor(
      recode(response, !!!resp_labels),
      levels = c(
        "Unimportant",
        "Less important",
        "Important",
        "Very important"
      )
    )
  ) %>%
  left_join(class_sizes, by = "class") %>%
  mutate(
    class_label = paste0("Class ", class, " (n = ", n, ")")
  )

# --------------------------------------------------
# 6. Heatmap of latent class profiles (final output)
# --------------------------------------------------

ggplot(prob_long, aes(x = response, y = item, fill = prob)) +
  geom_tile() +
  facet_wrap(~ class_label) +
  scale_y_discrete(expand = expansion(mult = c(0.05, 0.05))) +
  scale_fill_gradient(
    low = "#e0ecf4",
    high = "#8856a7",
    name = "Probability"
  ) +
  labs(
    title = "Latent class profiles of beliefs about incentive criteria",
    subtitle = "Item–response probabilities for Q3 battery",
    x = "Response category",
    y = "Incentive criterion"
  ) +
  theme_minimal(base_size = 11) +
  theme(
    panel.grid = element_blank(),
    strip.text = element_text(face = "bold"),
    axis.text.x = element_text(
      size = 10,
      angle = 20,
      hjust = 1
    )
  )

Show / hide code

detach("package:poLCA", unload = TRUE)

To identify distinct patterns in how respondents prioritise criteria for incentive allocation, a latent class analysis (LCA) was conducted using the nine Q3 items. Models with two to six classes were estimated and compared using the Bayesian Information Criterion (BIC), with lower values indicating better model fit.

Model comparison indicates that a 4-class solution provides the best balance between model fit and parsimony (BIC = 6900.1). In the selected model, Class 1 comprises 8.3% of respondents (n = 46), Class 2 represents 23.6% (n = 130), Class 3 accounts for 50.8% (n = 280), and Class 4 comprises 17.2% of respondents (n = 95).

Inspection of the item–response probabilities suggests that the largest class places consistently high importance on performance and service quality, while assigning lower importance to formal attributes such as position and work location. In contrast, another class shows a more balanced profile, attaching comparable importance to education, experience, and performance-related criteria. These patterns indicate meaningful heterogeneity in underlying belief structures that is not captured by single-item analyses alone.