1 Introduction

This report examines provincial competitiveness and business environment performance in Vietnam using 2024 data from the Provincial Competitiveness Index (PCI). The analysis focuses on three questions: how PCI performance differs across provinces and regions, which PCI subindices most clearly distinguish stronger from weaker performers, and how well the overall PCI score can be predicted from structured provincial indicators.

From a business and policy perspective, the value of this analysis lies in moving beyond a single headline PCI score. Instead, it identifies where provincial competitiveness is strong, where governance bottlenecks remain, and which institutional dimensions appear most relevant for improving the business environment.

2 Problem framing

The practical question for decision-makers is not simply which provinces rank highest or lowest, but why those differences arise and how weaker provinces can close the gap. Provincial competitiveness matters because it shapes the investment climate, administrative efficiency, business confidence, and the broader operating environment for firms.

This report therefore treats PCI as a multidimensional governance outcome. The analysis combines descriptive visualization with predictive modeling to show how provincial performance varies across regions and subindices, and to assess how systematically those subindices structure the overall PCI score.

3 Data and methods

3.1 Data preparation

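# ft_table() is not a flextable function; it is assumed to be a helper defined elsewhere in this report.
library(magrittr)  # provides the %>% pipe

# Build the variable-structure table and fit it to the page width.
ft_pci_structure <- ft_table(
  pci_structure_table,
  font_size = 9.5,
  fit_width = TRUE,
  page_width = 6.8
) %>%
  # Left-align every body cell.
  flextable::align(
    j = c("Variable", "Type", "Role", "Description"),
    align = "left",
    part = "body"
  ) %>%
  # Column widths (inches by default in flextable) sum to 6.8, matching page_width.
  flextable::width(j = "Variable", width = 2.7) %>%
  flextable::width(j = "Type", width = 0.9) %>%
  flextable::width(j = "Role", width = 1.1) %>%
  flextable::width(j = "Description", width = 2.1)

# Print the formatted table.
ft_pci_structure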

Variable | Type | Role | Description
--- | --- | --- | ---
Province | Categorical | Identifier | Province name
Region | Categorical | Grouping variable | Region of the province
PCI_Overall | Numeric | Outcome | Overall PCI score
CSTP_1_Gia_nhap_thi_truong | Numeric | Predictor | Market entry
CSTP_2_Tiep_can_dat_dai | Numeric | Predictor | Land access
CSTP_3_Tinh_Minh_bach | Numeric | Predictor | Transparency
CSTP_4_Chi_phi_thoi_gian | Numeric | Predictor | Time costs
CSTP_5_Chi_phi_khong_chinh_thuc | Numeric | Predictor | Informal charges
CSTP_6_Canh_tranh_binh_dang | Numeric | Predictor | Fair competition
CSTP_7_Tinh_nang_dong_va_tien_phong_cua_chinh_quyen | Numeric | Predictor | Government proactivity
CSTP_8_Chinh_sach_ho_tro_doanh_nghiep | Numeric | Predictor | Business support policy
CSTP_9_Dao_tao_lao_dong | Numeric | Predictor | Labor training
CSTP_10_Thiet_che_phap_ly_An_ninh_trat_tu | Numeric | Predictor | Legal institutions and security

The data structure indicates a clean province-level analytical design, with each row representing one provincial observation. PCI_Overall is appropriately defined as the main numeric outcome, while the ten PCI subindices serve as structured numeric predictors capturing different dimensions of provincial competitiveness. In addition, Province functions as the identifier and Region provides a higher-level grouping variable for comparative regional analysis.

This structure is well suited to the analytical goals of the report. Because the dependent variable is continuous and the predictor set is mostly numeric, the dataset is appropriate for descriptive analysis, heatmaps, regional comparison, and regression-based predictive modeling. At the same time, the inclusion of clearly separated PCI component scores allows the analysis to move beyond overall rankings and examine which institutional dimensions contribute most to stronger or weaker provincial performance.
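
The structural properties described above (one row per province, a numeric outcome, numeric subindex predictors) can be checked programmatically. The following is a minimal sketch in base R; the data frame `pci` and its values are made-up stand-ins, and only two of the ten subindex columns are included for brevity.

```r
# Hypothetical stand-in for the report's data (values are invented for illustration).
pci <- data.frame(
  Province = c("A", "B", "C"),
  Region = c("North", "Central", "South"),
  PCI_Overall = c(70.1, 68.9, 72.3),
  CSTP_1_Gia_nhap_thi_truong = c(7.2, 6.8, 7.9),
  CSTP_3_Tinh_Minh_bach = c(6.1, 5.9, 6.7),
  stringsAsFactors = FALSE
)

# One row per province: the identifier must not repeat.
one_row_per_province <- !anyDuplicated(pci$Province)

# Subindex predictors are identified by their shared "CSTP_" prefix.
subindex_cols <- grep("^CSTP_", names(pci), value = TRUE)

# The outcome and every predictor should be numeric.
model_cols <- c("PCI_Overall", subindex_cols)
all_numeric <- all(vapply(pci[model_cols], is.numeric, logical(1)))

# Number of rows with no missing values, i.e. the usable modeling sample.
complete_rows <- sum(complete.cases(pci[model_cols]))
```

Running checks like these before modeling makes the "clean province-level design" claim verifiable rather than assumed.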

4 Exploratory data analysis

4.1 Overview of PCI performance

No. of provinces | Mean PCI | SD | Median | Minimum | Maximum
--- | --- | --- | --- | --- | ---
30 | 69.70 | 1.67 | 69.12 | 67.87 | 74.84

The summary statistics show that PCI scores across the 30 provinces are tightly clustered rather than polarized. The mean is 69.70 and the median is slightly lower at 69.12, which points to a mild right skew: a few higher-scoring provinces pull the mean upward. The standard deviation of 1.67 is small relative to the mean, so most provinces sit within a narrow band. The range is asymmetric, however. The maximum of 74.84 lies about 3.1 standard deviations above the mean, whereas the minimum of 67.87 lies only about 1.1 below it, so the largest differences in competitiveness are concentrated at the top of the distribution.
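
A summary of this kind can be reproduced with a few base R calls. The sketch below uses a short vector of made-up scores (not the report's data) and also expresses the extremes in standard-deviation units, which is one way to quantify how asymmetric a range is.

```r
# Made-up PCI scores for illustration only.
pci_scores <- c(67.9, 68.4, 69.1, 69.3, 70.2, 74.8)

# Same six statistics as the overview table.
summary_tbl <- data.frame(
  n = length(pci_scores),
  mean = mean(pci_scores),
  sd = sd(pci_scores),
  median = median(pci_scores),
  min = min(pci_scores),
  max = max(pci_scores)
)

# Distance of each extreme from the mean, in standard deviations.
upper_z <- (summary_tbl$max - summary_tbl$mean) / summary_tbl$sd
lower_z <- (summary_tbl$mean - summary_tbl$min) / summary_tbl$sd

print(round(summary_tbl, 2))
```

A mean above the median together with `upper_z` exceeding `lower_z` is the pattern a right-skewed distribution produces.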

4.2 Map of overall PCI performance

The map shows that the 30 provinces in the sample are distributed across all three major parts of Vietnam: the North, the Central region, and the South. However, the sample is not spatially uniform. A clear concentration of observations appears in the Red River Delta and nearby northern provinces, while a second cluster is visible in the southern economic zone around Ho Chi Minh City and the Mekong Delta. In contrast, central Vietnam is represented by fewer provinces, spaced more widely apart.

This spatial pattern suggests that the dataset captures a broad national spread but remains denser in the country’s major economic and administrative corridors. From an analytical perspective, this is useful because it allows comparison across different territorial contexts, while also indicating that some regional clusters may exert stronger influence on overall descriptive patterns. Therefore, the provincial PCI results should be interpreted as geographically broad but somewhat uneven in distribution.

4.3 Top and bottom provinces

The chart shows that PCI differences across provinces are real but not extreme. Hai Phong, Quang Ninh, and Long An form the strongest-performing group, each scoring above 72, while Thai Binh, Ho Chi Minh City, and Ca Mau are the weakest performers in the sample. However, the overall spread remains relatively narrow, which suggests that provincial competitiveness varies within a moderate band rather than across sharply divided tiers. This means that even small score gaps may reflect meaningful differences in governance quality and business conditions.
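
The ranking behind a top-and-bottom chart reduces to sorting by the overall score and taking both ends. A minimal sketch, using hypothetical province labels and scores:

```r
# Hypothetical provinces and scores, for illustration only.
pci <- data.frame(
  Province = paste0("P", 1:8),
  PCI_Overall = c(69.4, 72.5, 68.1, 70.0, 67.9, 73.2, 69.0, 68.5),
  stringsAsFactors = FALSE
)

# Sort from highest to lowest overall score.
ranked <- pci[order(-pci$PCI_Overall), ]

# Three strongest and three weakest performers.
top3 <- head(ranked, 3)
bottom3 <- tail(ranked, 3)

# Spread between the best and worst score in the sample.
score_gap <- max(pci$PCI_Overall) - min(pci$PCI_Overall)
```

Reporting `score_gap` alongside the ranking keeps the chart honest about how narrow or wide the band actually is.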

4.4 PCI subindex heatmap for top and bottom performers
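
A subindex heatmap is usually drawn from column-standardized scores so that subindices on different scales share one color range. The report does not state how its heatmap was prepared, so the following is only a sketch of that common approach, with invented province labels and values.

```r
# Invented subindex scores for two top and two bottom performers.
sub <- matrix(
  c(7.5, 6.9, 6.0, 5.8,   # Transparency (filled column by column)
    6.8, 6.5, 5.2, 5.0),  # Labor_training
  nrow = 4,
  dimnames = list(
    c("Top1", "Top2", "Bottom1", "Bottom2"),
    c("Transparency", "Labor_training")
  )
)

# Standardize each subindex to mean 0 and standard deviation 1.
z <- scale(sub)

# Long format (one row per province-subindex cell), convenient for plotting.
heat_long <- as.data.frame(as.table(z))
names(heat_long) <- c("Province", "Subindex", "Z")
```

Positive `Z` values mark provinces above the sample average on a subindex and negative values mark those below it.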

4.5 Regional PCI distribution using ridgeline density

4.6 Key EDA insights

First, PCI performance varies across provinces within a fairly narrow band, yet the gap between top and bottom performers is still visible on both the ranking plot and the map. Those differences are not purely regional: provinces within the same region can show noticeably different competitiveness levels.

Second, the heatmap suggests that stronger-performing provinces are not defined by one outstanding subindex alone. Instead, they tend to maintain relatively solid performance across multiple PCI dimensions. In contrast, weaker-performing provinces more often display broader weaknesses across several governance-related areas.

Third, the ridgeline distribution shows that regional PCI profiles overlap considerably. This indicates that regional context matters, but it does not fully determine provincial competitiveness. Provincial institutional performance remains an important source of variation within regions.
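
One way to put a number on how much regional context matters is the share of PCI variance explained by region alone, which equals the R-squared of a regression of the overall score on the region factor. This is a sketch on made-up data, not a result from the report.

```r
# Made-up scores for three regions with four provinces each.
pci <- data.frame(
  Region = rep(c("North", "Central", "South"), each = 4),
  PCI_Overall = c(69.1, 70.4, 72.8, 68.3,
                  68.0, 69.5, 70.1, 68.8,
                  67.9, 69.9, 72.1, 69.2)
)

# R-squared of a region-only model = between-region share of total variance.
region_r2 <- summary(lm(PCI_Overall ~ Region, data = pci))$r.squared

# Same quantity computed by hand from sums of squares.
grand_mean <- mean(pci$PCI_Overall)
region_mean <- ave(pci$PCI_Overall, pci$Region)
ss_between <- sum((region_mean - grand_mean)^2)
ss_total <- sum((pci$PCI_Overall - grand_mean)^2)
manual_share <- ss_between / ss_total
```

A share well below 1 means most variation sits within regions, which is the pattern that overlapping ridgeline densities suggest visually.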

5 Model development

5.1 Modeling strategy

The predictive objective is to estimate overall PCI score from structured provincial predictors. Because PCI overall score is continuous, Multiple Linear Regression is used as the baseline model. Ridge and Lasso are added as regularized alternatives to address collinearity among PCI subindices. A naive mean benchmark is included for comparison.
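
To illustrate why regularization helps with collinear predictors, the sketch below fits ordinary least squares and Ridge on two nearly identical simulated predictors using the closed-form Ridge solution. The data, the penalty value, and the helper `ridge_coef` are assumptions for illustration; the report does not state which implementation it used, and Lasso has no closed form, so it is omitted here.

```r
set.seed(1)
n <- 30
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.1)           # nearly collinear with x1
y <- 1 + x1 + x2 + rnorm(n, sd = 0.5)

X <- scale(cbind(x1, x2))               # standardize predictors
yc <- y - mean(y)                       # center outcome (no intercept to penalize)

# Closed-form Ridge estimate: (X'X + lambda * I)^(-1) X'y.
# lambda = 0 reduces to ordinary least squares.
ridge_coef <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}

beta_ols <- ridge_coef(X, yc, lambda = 0)
beta_ridge <- ridge_coef(X, yc, lambda = 5)   # penalty chosen arbitrarily

predictor_cor <- cor(x1, x2)
```

The Ridge coefficient vector is always shorter than the least-squares one for a positive penalty, which stabilizes estimates when predictors overlap heavily.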

5.2 Train/test split and model fitting
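
The report's split proportion and seed are not stated, so the following is only a sketch of a typical workflow under assumed settings (an 80/20 split and simulated data with two hypothetical subindices `s1` and `s2`). It fits the linear baseline on the training rows and compares it with a naive mean benchmark on the held-out rows.

```r
set.seed(42)
n <- 30

# Simulated stand-in data: two hypothetical subindices and a noisy outcome.
pci <- data.frame(s1 = runif(n, 5, 9), s2 = runif(n, 4, 8))
pci$PCI_Overall <- 40 + 2.5 * pci$s1 + 2 * pci$s2 + rnorm(n, sd = 0.3)

# Assumed 80/20 random split.
train_idx <- sample(seq_len(n), size = floor(0.8 * n))
train <- pci[train_idx, ]
test <- pci[-train_idx, ]

# Linear baseline fitted on training rows only.
fit <- lm(PCI_Overall ~ s1 + s2, data = train)
pred_lm <- predict(fit, newdata = test)

# Naive benchmark: predict the training mean for every test province.
pred_naive <- rep(mean(train$PCI_Overall), nrow(test))

rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))
rmse_lm <- rmse(test$PCI_Overall, pred_lm)
rmse_naive <- rmse(test$PCI_Overall, pred_naive)
```

With only 30 provinces the test set is very small, so a single split gives a noisy estimate; repeated splits or cross-validation would be a reasonable alternative.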

6 Model evaluation

6.1 Performance table

Model | MAE | RMSE | MAPE (%) | ME
--- | --- | --- | --- | ---
Multiple Linear Regression | 0.0000 | 0.0000 | 0.0000 | -0.0000
Lasso | 0.0798 | 0.0938 | 0.1142 | 0.0084
Ridge | 0.1789 | 0.1887 | 0.2555 | 0.0085
Naive mean benchmark | 1.2286 | 1.6092 | 1.7343 | 0.4587
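
For reference, the four metrics in the table can be written as short R functions. The sign convention for ME (actual minus predicted) and the expression of MAPE in percent are assumptions here, since the report does not define them explicitly.

```r
# Mean absolute error.
mae <- function(actual, pred) mean(abs(actual - pred))

# Root mean squared error.
rmse <- function(actual, pred) sqrt(mean((actual - pred)^2))

# Mean absolute percentage error, in percent (assumed unit).
mape <- function(actual, pred) 100 * mean(abs((actual - pred) / actual))

# Mean error; positive means under-prediction (assumed sign convention).
me <- function(actual, pred) mean(actual - pred)

# Tiny worked example with made-up values.
actual <- c(70, 68, 72)
pred <- c(69, 69, 70)
metrics <- c(
  MAE = mae(actual, pred),
  RMSE = rmse(actual, pred),
  MAPE = mape(actual, pred),
  ME = me(actual, pred)
)
```

MAE and RMSE are in PCI points, MAPE is scale-free, and ME indicates systematic bias rather than accuracy.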

6.2 Model comparison plot

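The predictive results show that the linear model, Ridge, and Lasso all substantially outperform the naive mean benchmark (RMSE of 0.19 or lower, against 1.61). Multiple Linear Regression reproduces the overall score with errors that round to zero at four decimal places, which is what would be expected if PCI_Overall is computed as a linear combination of the subindices. Ridge and Lasso show small nonzero errors because regularization shrinks the coefficients away from that exact solution. This indicates that overall PCI score is highly structured by the underlying PCI dimensions and associated regional context. At the same time, this result should be interpreted carefully: PCI overall score is itself closely tied to those component subindices, so the predictive exercise mainly confirms internal consistency of the PCI framework rather than identifying independent causal drivers.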
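
The internal-consistency point can be demonstrated directly: when an outcome is an exact weighted sum of its predictors, a linear regression recovers the weights and leaves essentially no residual. The sketch below uses simulated subindices and hypothetical weights, not the official PCI weighting.

```r
set.seed(7)
n <- 30

# Three simulated subindices (names are placeholders).
sub <- matrix(
  runif(n * 3, 4, 9),
  ncol = 3,
  dimnames = list(NULL, c("s1", "s2", "s3"))
)

# Hypothetical weights, NOT the official PCI weights.
w <- c(0.5, 0.3, 0.2)

# Composite built as an exact linear combination of the subindices.
composite <- as.vector(sub %*% w) * 10

dat <- data.frame(sub, composite = composite)
fit <- lm(composite ~ s1 + s2 + s3, data = dat)

# Residuals are zero up to floating-point error.
max_abs_resid <- max(abs(residuals(fit)))
recovered_weights <- unname(coef(fit)[-1])
```

A near-zero `max_abs_resid` in this setting says nothing about causal drivers; it only shows that the regression has rediscovered the formula used to build the composite.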

7 Business recommendations

First, provincial authorities should treat PCI as a multidimensional governance dashboard rather than focusing only on overall rank. Improvement efforts should prioritize the specific subindices where weak-performing provinces repeatedly lag behind, especially transparency, fair competition, government proactivity, and labor training.

Second, provinces aiming to improve competitiveness should avoid narrow, isolated reforms. The descriptive patterns suggest that stronger performers tend to be consistently solid across several institutional dimensions at once. This implies that coordinated reform across administrative efficiency, business support, and legal-institutional quality is likely to be more effective than improving a single metric in isolation.

Third, decision-makers should use comparative provincial benchmarking more actively. The top-bottom ranking and heatmap together show that stronger provinces may serve as practical reference cases for weaker provinces. Benchmarking against peers with similar regional context could make reform priorities more actionable and credible.
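
Peer benchmarking of this kind could be operationalized as the gap between each province and the best performer in its own region on every subindex. The following sketch uses invented provinces, regions, and scores, and the column names are shortened placeholders.

```r
# Invented data: two regions, two provinces each, two subindices.
pci <- data.frame(
  Province = c("A", "B", "C", "D"),
  Region = c("North", "North", "South", "South"),
  Transparency = c(6.9, 5.8, 6.2, 6.5),
  Labor_training = c(7.1, 6.3, 5.5, 6.8),
  stringsAsFactors = FALSE
)
subindex_cols <- c("Transparency", "Labor_training")

# Gap to the regional best on each subindex (0 for the regional leader).
gaps <- pci
for (col in subindex_cols) {
  regional_best <- ave(pci[[col]], pci$Region, FUN = max)
  gaps[[col]] <- regional_best - pci[[col]]
}

# Subindex with the largest gap per province; not meaningful when all gaps are 0.
gaps$Priority <- subindex_cols[
  max.col(as.matrix(gaps[subindex_cols]), ties.method = "first")
]
```

Ranking gaps within a region rather than nationally keeps the reference provinces comparable in context, which is the rationale given above for peer-based benchmarking.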

8 Limitations

This analysis has several limitations. First, the predictive model uses PCI component subindices to predict PCI overall score, so the strong predictive performance mainly reflects the internal structure of the PCI system rather than external causal explanation. Second, the modeling sample is limited to provinces with complete observations. Third, the analysis is cross-sectional and does not capture changes over time.

Future work could extend the dataset across multiple PCI years, incorporate external economic outcomes such as firm entry, FDI, or local growth, and test whether PCI dimensions predict broader business performance beyond the composite PCI score itself.

9 Conclusion

Overall, the analysis suggests that provincial competitiveness in Vietnam is not driven by isolated strengths, but by broader patterns of institutional performance across multiple PCI dimensions. Stronger-performing provinces tend to maintain consistently better scores across several governance-related subindices, whereas weaker-performing provinces more often exhibit simultaneous weaknesses across multiple areas.

The predictive modeling results further confirm that overall PCI is closely and systematically tied to its component dimensions, indicating strong internal consistency within the PCI framework. Therefore, the main implication of this analysis is that provincial competitiveness should be understood as a multidimensional governance outcome. Policy efforts aimed at improving PCI should focus on coordinated institutional reform across several key subindices rather than narrow attempts to improve overall rankings alone.