Introduction

Background and problem statement

Vietnam’s rapid digital-government rollout makes service portals the primary state–citizen interface. Ensuring digital accessibility (WCAG 2.1 A/AA) is therefore essential for equitable access. Despite policy intent, compliance varies across portals and tasks—especially in contrast, keyboard support, focus visibility, and error handling.

Objectives and contributions

We (i) benchmark accessibility and design-system conformance across major portals; (ii) identify critical barriers along representative user journeys; and (iii) translate findings into procurement/acceptance checklists and governance processes for accessible-by-default delivery.

Structure of the paper

Section 2 reviews the framework; Section 3 details methods; Section 4 reports results and policy implications. Appendices provide the checklist, scorecard schema, evidence register template, and reliability SOP.

Literature Review and Theoretical Framework

Digital accessibility and WCAG 2.1

WCAG 2.1 groups its Level A and AA success criteria under four principles (POUR): perceivable, operable, understandable, and robust. Our priority checks cover text and non-text contrast, keyboard access, focus order and visibility, headings and landmarks, form labels and error handling, and ARIA semantics.
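As an illustration of how the contrast checks can be made reproducible, the sketch below computes the WCAG 2.1 contrast ratio from two hex colours using the relative-luminance definition in the specification. The function names and the colour pair are hypothetical and are not part of the audit tooling; the 4.5:1 threshold applies to normal-size text (SC 1.4.3) and 3:1 to non-text elements (SC 1.4.11).

```r
# Relative luminance of an sRGB colour given as a hex string (WCAG 2.1 definition)
relative_luminance <- function(hex) {
  rgb <- as.numeric(grDevices::col2rgb(hex)) / 255   # channels scaled to 0..1
  lin <- ifelse(rgb <= 0.03928, rgb / 12.92, ((rgb + 0.055) / 1.055)^2.4)
  sum(c(0.2126, 0.7152, 0.0722) * lin)
}

# Contrast ratio: (lighter + 0.05) / (darker + 0.05)
contrast_ratio <- function(fg, bg) {
  l <- sort(c(relative_luminance(fg), relative_luminance(bg)), decreasing = TRUE)
  (l[1] + 0.05) / (l[2] + 0.05)
}

# Hypothetical colour pair: mid-grey text on a white background
ratio <- contrast_ratio("#767676", "#FFFFFF")
passes_text_aa  <- ratio >= 4.5   # SC 1.4.3, normal-size text
passes_non_text <- ratio >= 3     # SC 1.4.11
```

A ratio computed this way could be recorded alongside the 0–2 score so that a contrast judgement can be rechecked later.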

Design systems in the public sector

Design systems operationalize standards via reusable, tested components (e.g., error summary, form fields, focus tokens) that reduce variability and prevent regressions across multi-vendor delivery.

Heuristics and accessibility-by-default

Usability heuristics complement formal conformance to capture real task barriers. Accessibility must be embedded across procurement→design→development→testing→operations.

Methodology

Sampling and scope

We conducted an expert audit of 8 portals × 5 tasks (procedure search, multi-step form, status lookup, authentication/eID/OTP, payment). The unit of analysis is the portal×task pair, giving 40 units in total.

Audit framework and scoring

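Twenty-two priority WCAG 2.1 A/AA success criteria were each scored 0–2 (0 = fail, 1 = partial, 2 = pass), and each issue was rated for severity on a 1–4 scale. These scores were complemented with public-sector design-system and heuristic checks. Two raters (Tran Cong Trung, Vo Thanh Thien) audited independently and then reconciled their scores.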

Analysis

We report portal/task indices, severe-issue counts, group comparisons (t-test/ANOVA), and visualizations. (Inter-rater κ not computed here because this report uses reconciled, aggregate results.)
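The group comparisons can be sketched in base R as follows. The data frame `demo`, its column names, and the shortened task labels are hypothetical stand-ins generated at random; they are not audit results. The sketch treats the 40 portal×task units as independent, which is a simplifying assumption.

```r
set.seed(1)  # hypothetical demo data, not audit results
demo <- expand.grid(
  portal = paste0("P", 1:8),
  task = c("Search", "Form", "Status", "Auth", "Payment"),
  stringsAsFactors = FALSE
)
demo$wcag_index <- round(runif(nrow(demo), min = 40, max = 90), 1)

# One-way ANOVA: does the mean index differ across the five tasks?
fit <- aov(wcag_index ~ task, data = demo)
anova_p <- summary(fit)[[1]][["Pr(>F)"]][1]

# Welch two-sample t-test comparing two tasks
two_tasks <- subset(demo, task %in% c("Form", "Search"))
tt <- t.test(wcag_index ~ task, data = two_tasks)
tt$p.value
```

Because every portal contributes one observation to each task, a repeated-measures or mixed-model analysis may be a better fit for the real data; the one-way layout is shown only because it matches the tests named above.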

Ethics

Public interfaces only; no personal data; results reported for constructive improvement.

# Results

## Packages

    pkgs <- c("tidyverse", "janitor", "readr", "gt", "knitr", "psych", "pheatmap", "reshape2")
    # Install anything that is missing, then attach all packages
    to_install <- pkgs[!pkgs %in% installed.packages()[, "Package"]]
    if (length(to_install)) install.packages(to_install)
    invisible(lapply(pkgs, library, character.only = TRUE))

## Inputs & Criteria

    # Data paths (change when using your real CSVs)
    path_scores_a <- "data/scores_A.csv"
    path_scores_b <- "data/scores_B.csv"
    path_evidence <- "data/evidence.csv"

    # Priority 22 WCAG 2.1 A/AA criteria
    crit <- c("1.1.1", "1.3.1", "1.4.3", "1.4.10", "1.4.11", "1.4.12", "1.4.13",
              "2.1.1", "2.1.2", "2.4.1", "2.4.3", "2.4.4", "2.4.7", "2.5.3",
              "3.1.1", "3.1.2", "3.2.3", "3.3.1", "3.3.2", "3.3.3", "4.1.2", "4.1.3")

    # Conformance level, POUR group, and short name for each criterion
    crit_meta <- tibble::tribble(
      ~code,    ~level, ~group,           ~criterion,
      "1.1.1",  "A",    "Perceivable",    "Non-text content (Alt text)",
      "1.3.1",  "A",    "Perceivable",    "Info & relationships (semantic structure)",
      "1.4.3",  "AA",   "Perceivable",    "Contrast (text)",
      "1.4.10", "AA",   "Perceivable",    "Reflow (320px)",
      "1.4.11", "AA",   "Perceivable",    "Non-text contrast",
      "1.4.12", "AA",   "Perceivable",    "Text spacing",
      "1.4.13", "AA",   "Perceivable",    "Content on hover/focus",
      "2.1.1",  "A",    "Operable",       "Keyboard accessible",
      "2.1.2",  "A",    "Operable",       "No keyboard trap",
      "2.4.1",  "A",    "Operable",       "Bypass blocks (Skip link)",
      "2.4.3",  "A",    "Operable",       "Focus order",
      "2.4.4",  "A",    "Operable",       "Link purpose (in context)",
      "2.4.7",  "AA",   "Operable",       "Focus visible",
      "2.5.3",  "A",    "Operable",       "Label in name",
      "3.1.1",  "A",    "Understandable", "Language of page",
      "3.1.2",  "AA",   "Understandable", "Language of parts",
      "3.2.3",  "AA",   "Understandable", "Consistent navigation",
      "3.3.1",  "A",    "Understandable", "Error identification",
      "3.3.2",  "A",    "Understandable", "Labels or instructions",
      "3.3.3",  "AA",   "Understandable", "Error suggestion",
      "4.1.2",  "A",    "Robust",         "Name, role, value",
      "4.1.3",  "AA",   "Robust",         "Status messages"
    )
    max_points <- length(crit) * 2  # 22 criteria × 2 points = 44

## Load or Simulate Data

    have_files <- file.exists(path_scores_a) && file.exists(path_scores_b)

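    if (have_files) {
      # clean_names() turns a header such as "1.1.1" into "x1_1_1"; restore the
      # criterion codes so the column check below can find them (this assumes
      # the CSV headers are the criterion codes listed in `crit`)
      read_scores <- function(path) {
        readr::read_csv(path, show_col_types = FALSE) %>%
          janitor::clean_names() %>%
          rename_with(~ gsub("_", ".", sub("^x", "", .x)),
                      matches("^x[0-9]+_[0-9]+_[0-9]+$"))
      }
      A <- read_scores(path_scores_a)
      B <- read_scores(path_scores_b)
    } else {
      message("CSV not found. Generating demo data...")
      set.seed(2025)
      portals <- c("National", "MOJ", "VSS", "Hanoi", "HCMC", "Da Nang", "Quang Ninh", "Thua Thien Hue")
      tasks <- c("Procedure search", "Multi-step form", "Status lookup", "Authentication / eID / OTP", "Payment")
      grid <- expand.grid(portal = portals, task = tasks, stringsAsFactors = FALSE)

      # Random 0/1/2 scores, weighted towards passes
      rnd_scores <- function(n) sample(c(0, 1, 2), n, replace = TRUE, prob = c(0.2, 0.35, 0.45))
      # One simulated rater: a score per criterion plus a severe-issue count per portal × task
      make_scores <- function() {
        m <- matrix(rnd_scores(nrow(grid) * length(crit)), ncol = length(crit),
                    dimnames = list(NULL, crit))
        bind_cols(grid, as_tibble(m), sev_count = sample(1:7, nrow(grid), replace = TRUE))
      }
      A <- make_scores()
      B <- make_scores()
    }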

    needed <- c("portal", "task", crit, "sev_count")
    stopifnot(all(needed %in% names(A)), all(needed %in% names(B)))
    A <- A %>% select(all_of(needed))
    B <- B %>% select(all_of(needed))

## Reconcile A/B, Compute Indices & Kappa

    keys <- c("portal", "task")

    # Suffix each rater's criterion columns (_a / _b) and join on portal × task
    AB <- A %>%
      rename_with(~ paste0(.x, "_a"), all_of(crit)) %>%
      rename(sev_a = sev_count) %>%
      inner_join(
        B %>%
          rename_with(~ paste0(.x, "_b"), all_of(crit)) %>%
          rename(sev_b = sev_count),
        by = keys
      )

    # Conservative reconciliation: min(A, B) per criterion.
    # The unsuffixed criterion columns no longer exist in AB after the rename,
    # so the reconciled columns are built from the _a/_b pairs directly.
    reconciled <- AB
    for (cc in crit) {
      reconciled[[paste0(cc, "_recon")]] <-
        pmin(AB[[paste0(cc, "_a")]], AB[[paste0(cc, "_b")]], na.rm = TRUE)
    }

    reconciled <- reconciled %>%
      mutate(
        wcag_total_recon = rowSums(across(ends_with("_recon")), na.rm = TRUE),
        wcag_index_recon = 100 * wcag_total_recon / max_points,
        severe_recon = pmin(sev_a, sev_b)  # lower of the two raters' severe-issue counts
      )
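To make the conservative rule concrete, here is a toy case for a single portal×task unit. The two score vectors are hypothetical and cover only three criteria, so the denominator is 6 rather than 44.

```r
# Hypothetical scores from two raters for three criteria on one portal × task unit
rater_a <- c("1.4.3" = 2, "2.1.1" = 1, "2.4.7" = 2)
rater_b <- c("1.4.3" = 1, "2.1.1" = 1, "2.4.7" = 0)

# Conservative reconciliation keeps the lower of the two scores per criterion
recon <- pmin(rater_a, rater_b)

# Index on the same 0-100 scale, using 2 points per criterion as the maximum
toy_index <- 100 * sum(recon) / (2 * length(recon))
```

Taking the minimum means a criterion counts as a pass only when both raters scored it 2, so any disagreement lowers the index.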

    # Cohen's kappa per portal×task (pass = 2 vs fail < 2)
    # Row i of AB supplies 22 paired pass/fail ratings, one pair per criterion
    kappa_row <- function(i) {
      a_pass <- as.integer(unlist(AB[i, paste0(crit, "_a")]) == 2)
      b_pass <- as.integer(unlist(AB[i, paste0(crit, "_b")]) == 2)
      psych::cohen.kappa(cbind(a_pass, b_pass))$kappa
    }

    kappa_df <- AB %>%
      mutate(kappa = vapply(seq_len(n()), kappa_row, numeric(1))) %>%
      select(all_of(keys), kappa)
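For readers who want to see what the statistic measures, the sketch below computes Cohen's kappa for binary pass/fail ratings by hand, as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance. The function name and the two rating vectors are hypothetical.

```r
# Cohen's kappa for two raters' binary (0/1) ratings of the same items
cohen_kappa_binary <- function(a, b) {
  stopifnot(length(a) == length(b), all(c(a, b) %in% c(0, 1)))
  po <- mean(a == b)                                        # observed agreement
  pe <- mean(a) * mean(b) + (1 - mean(a)) * (1 - mean(b))   # chance agreement
  if (pe == 1) return(NA_real_)                             # undefined when chance agreement is total
  (po - pe) / (1 - pe)
}

# Hypothetical pass (1) / fail (0) ratings on eight criteria
a <- c(1, 1, 1, 0, 0, 1, 0, 1)
b <- c(1, 1, 0, 0, 0, 1, 1, 1)
k <- cohen_kappa_binary(a, b)
```

Here the raters agree on 6 of 8 items (p_o = 0.75) and chance agreement is p_e = 0.53125, so kappa comes to 7/15, roughly 0.47. With only 22 criteria per unit, per-unit kappa values will be noisy, which is one reason to summarize them at portal level.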

    recon_full <- reconciled %>%
      select(all_of(keys), wcag_total_recon, wcag_index_recon, severe_recon) %>%
      left_join(kappa_df, by = keys)

## Portal & Task Summaries

    # Portal-level mean index with a normal-approximation 95% CI
    summary_portal <- recon_full %>%
      group_by(portal) %>%
      summarise(
        mean_wcag = mean(wcag_index_recon),
        ci_low = mean_wcag - 1.96 * sd(wcag_index_recon) / sqrt(n()),
        ci_high = mean_wcag + 1.96 * sd(wcag_index_recon) / sqrt(n()),
        severe_median = median(severe_recon),
        mean_kappa = mean(kappa, na.rm = TRUE),
        n_task = n(),
        .groups = "drop"
      ) %>%
      arrange(desc(mean_wcag))

    summary_portal %>%
      gt::gt() %>%
      gt::fmt_number(columns = c(mean_wcag, ci_low, ci_high, mean_kappa), decimals = 1) %>%
      gt::tab_header(title = "Portal-level WCAG Index (Recon), CI, Severe median, and κ")

    # Task-level mean and spread of the index
    summary_task <- recon_full %>%
      group_by(task) %>%
      summarise(
        mean_wcag = mean(wcag_index_recon),
        sd_wcag = sd(wcag_index_recon),
        severe_median = median(severe_recon),
        n = n(),
        .groups = "drop"
      ) %>%
      arrange(desc(mean_wcag))

    summary_task %>%
      gt::gt() %>%
      gt::fmt_number(columns = c(mean_wcag, sd_wcag), decimals = 1) %>%
      gt::tab_header(title = "Task-level WCAG Index (Recon) and Severe median")

Heatmap of Criterion Scores
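The plotting code for this heatmap is not included in the text. One way it could be produced is sketched below, assuming a long table with one reconciled score per portal, task, and criterion. The table `long`, the portal labels, and the four-criterion subset are hypothetical demo data, not audit results.

```r
set.seed(42)  # hypothetical demo scores, not audit results
long <- expand.grid(
  portal = c("Portal A", "Portal B", "Portal C"),
  task = c("Form", "Payment"),
  code = c("1.4.3", "2.1.1", "2.4.7", "3.3.1"),
  stringsAsFactors = FALSE
)
long$score <- sample(0:2, nrow(long), replace = TRUE)

# Mean score per portal (rows) and criterion (columns), averaged over tasks
mat <- tapply(long$score, list(long$portal, long$code), mean)

# Draw the heatmap only if pheatmap is installed; clustering is switched off
# so that rows and columns keep their original order
if (requireNamespace("pheatmap", quietly = TRUE)) {
  pheatmap::pheatmap(
    mat,
    cluster_rows = FALSE, cluster_cols = FALSE,
    display_numbers = TRUE,
    main = "Mean score by portal and criterion (demo data)"
  )
}
```

Keeping criteria in code order groups the columns by POUR principle, which makes weak areas such as contrast or focus visibility easier to spot across portals.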

Appendix

Appendix A — WCAG 2.1 A/AA Checklist (Priority 22)

Table A1. Priority WCAG 2.1 A/AA Checklist

| Code | Level | POUR Group | Criterion |
|---|---|---|---|
| 1.1.1 | A | Perceivable | Non-text content (Alt text) |
| 1.3.1 | A | Perceivable | Info & relationships (semantic structure) |
| 1.4.3 | AA | Perceivable | Contrast (text) |
| 1.4.10 | AA | Perceivable | Reflow (320px) |
| 1.4.11 | AA | Perceivable | Non-text contrast |
| 1.4.12 | AA | Perceivable | Text spacing |
| 1.4.13 | AA | Perceivable | Content on hover/focus |
| 2.1.1 | A | Operable | Keyboard accessible |
| 2.1.2 | A | Operable | No keyboard trap |
| 2.4.1 | A | Operable | Bypass blocks (Skip link) |
| 2.4.3 | A | Operable | Focus order |
| 2.4.4 | A | Operable | Link purpose (in context) |
| 2.4.7 | AA | Operable | Focus visible |
| 2.5.3 | A | Operable | Label in name |
| 3.1.1 | A | Understandable | Language of page |
| 3.1.2 | AA | Understandable | Language of parts |
| 3.2.3 | AA | Understandable | Consistent navigation |
| 3.3.1 | A | Understandable | Error identification |
| 3.3.2 | A | Understandable | Labels or instructions |
| 3.3.3 | AA | Understandable | Error suggestion |
| 4.1.2 | A | Robust | Name, role, value |
| 4.1.3 | AA | Robust | Status messages |

Rubric: Pass (2) = fully conforms; Partial (1) = minor deviation; Fail (0) = substantial violation.

Appendix B — Scorecards (example rows)

Table B1. Example Scorecard Rows (WCAG & Severe)

| Portal | Task | WCAG index | Severe issues |
|---|---|---|---|
| Da Nang | Authentication / eID / OTP | 70 | 2 |
| Da Nang | Multi-step form | 68 | 3 |
| Da Nang | Payment | 70 | 2 |
| Da Nang | Procedure search | 82 | 1 |
| Da Nang | Status lookup | 80 | 2 |
| HCMC | Authentication / eID / OTP | 56 | 6 |
| HCMC | Multi-step form | 50 | 6 |
| HCMC | Payment | 46 | 7 |
| HCMC | Procedure search | 70 | 3 |
| HCMC | Status lookup | 68 | 3 |

Appendix C — Evidence Register (template)

Table C1. Evidence Register (template)

| evidence_id | portal | task | url_full | timestamp | description | file_name |
|---|---|---|---|---|---|---|
| HCMC_AUTH_007 | HCMC | Authentication / eID / OTP | https://…/login | 2025-10-20 10:05 | Focus trapped in OTP modal | HCMC_AUTH_007.png |

Appendix D — Reliability & SOP (summary)

Auditors: Tran Cong Trung; Vo Thanh Thien.
Reporting: Reconciled scores are reported in this paper; inter-rater κ is not computed in this version.

SOP (operational steps):
1) Define the portal×task matrix;
2) Two independent audits: keyboard-only pass, screen-reader smoke test (NVDA/VoiceOver), contrast & semantics checks;
3) Capture evidence (full URL, timestamp, screenshots);
4) Reconcile disagreements; finalize Recon (0–2) and Severity (1–4);
5) Compute indices and severe counts;
6) Compile tables/figures; archive evidence.

Index formula: WCAG Index = (Σ Recon scores / 44) × 100, where 44 is the maximum total (22 criteria × 2 points).
Severe issues: count of issues rated severity 3–4; blockers: issues rated severity 4.
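
The index and the two issue counts can be checked by hand with a few lines of R. This is a minimal sketch: the function name `wcag_index` and the example score and severity vectors are hypothetical, chosen only to show the arithmetic.

```r
# WCAG Index for one portal × task unit from its reconciled 0-2 scores
wcag_index <- function(recon_scores, max_points = 44) {
  stopifnot(all(recon_scores %in% 0:2))
  100 * sum(recon_scores) / max_points
}

# Hypothetical unit: 12 passes, 6 partials, 4 fails across the 22 criteria
scores <- c(rep(2, 12), rep(1, 6), rep(0, 4))
idx <- wcag_index(scores)             # (24 + 6) / 44 × 100

# Hypothetical severity ratings (1-4) for the issues logged on that unit
severity <- c(1, 3, 4, 2, 3, 4, 4)
severe_count  <- sum(severity >= 3)   # issues at level 3 or 4
blocker_count <- sum(severity == 4)   # issues at level 4 only
```

In this example the unit earns 30 of 44 points, for an index of about 68.2, with five severe issues of which three are blockers.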