Title: A Panel Data Analysis of Global Population Aging Patterns
Group Members:
HUANG ZHAOHUA_25085502, AI WEN_25064312, WEI QIANQIAN_25089535,
WU JING_25085913, Li Zheng_24205785, Xia Lei_25090973
Group Name: G07
Occurrence: OCC2
Lecturer Name: Dr. Ong Sim Ying
Course Name: WQD7004 Programming for Data Science
Population aging has emerged as one of the most prominent global demographic challenges across economies with divergent income levels and geographic locations. This panel-data-driven project investigates cross-country aging patterns relying on multi-decade World Bank open-source country-level indicators. Three independent raw datasets with dimensions (63840,70), (240,5) and (265,6) are merged on country and indicator codes, constructing a consolidated balanced panel dataset with 63,840 observations and 76 variables. The full empirical pipeline consists of dataset integration, standardized data preprocessing, exploratory data analysis, fixed-effect panel regression, unsupervised K-means classification, model robustness verification and final Shiny-based web application deployment.
To identify key demographic, social, and economic factors that significantly influence national population aging rates through regression analysis.
To develop a classification model to divide countries into high-aging and low-aging groups based on birth rate and death rate.
To deploy the two trained models in an interactive Shiny application for online prediction and classification.
if (!require("pacman")) install.packages("pacman")
## Loading required package: pacman
## Warning: package 'pacman' was built under R version 4.5.3
pacman::p_load(
# Shiny APP related packages
shiny, shinyWidgets, shinybusy, shinythemes,
# Data wrangling & I/O
tidyverse, dplyr, readxl, data.table, scales, stringr,
# Econometric & panel regression
plm, lmtest, car, sandwich, urca, forecast,
# Missing value processing
imputeTS, zoo,
# Visualization
ggplot2, ggcorrplot, patchwork,
# Cluster & machine learning
cluster, factoextra, mclust, randomForest,
# Base statistical package
stats
)
Three separate raw source files including main population panel data, country metadata and indicator metadata are imported from local file directory for subsequent merging operations.
# Set working directory
BASE_PATH <- "D:/Desktop/DS STUDY/2026.3-8/WQD7004 PROGRAMMING FOR DATA SCIENCE (R)/Group Project/Datasets/"
# Import three source CSV files
df_World_Population <- read_csv(paste0(BASE_PATH, "World_Population_Data.csv"), show_col_types = FALSE)
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
## dat <- vroom(...)
## problems(dat)
df_Indicator <- read_csv(paste0(BASE_PATH, "Metadata_Indicator.csv"), show_col_types = FALSE)
## New names:
## • `` -> `...5`
df_Country <- read_csv(paste0(BASE_PATH, "Metadata_Country.csv"), show_col_types = FALSE)
## New names:
## • `` -> `...6`
Column names are standardized to lowercase with underscore substitution to unify naming rules; duplicate records across three raw tables are counted for preliminary data quality inspection.
# Define custom function to clean column names: convert all names to lowercase, replace space with underscore
rename_clean <- function(df){
colnames(df) <- str_to_lower(str_replace_all(str_trim(colnames(df)), " ", "_"))
return(df)
}
# Apply column name standardization to three imported datasets
df_World_Population <- rename_clean(df_World_Population)
df_Indicator <- rename_clean(df_Indicator)
df_Country <- rename_clean(df_Country)
# Calculate and print total duplicated observations for each dataset
cat("Duplicate rows World Population:", sum(duplicated(df_World_Population)), "\n")
## Duplicate rows World Population: 0
cat("Duplicate rows Indicator Meta:", sum(duplicated(df_Indicator)), "\n")
## Duplicate rows Indicator Meta: 0
cat("Duplicate rows Country Meta:", sum(duplicated(df_Country)), "\n")
## Duplicate rows Country Meta: 0
Main population dataset is sequentially left-joined with country metadata and indicator metadata using unique matching keys; duplicated columns generated during join are combined and unused unnamed auxiliary columns are removed to form integrated raw combined data.
# Step1 merge population with country metadata by country_code
merged_data <- left_join(df_World_Population, df_Country, by = "country_code")
# Step2 merge with indicator metadata by indicator_code
final_merged_data <- left_join(merged_data, df_Indicator, by = "indicator_code")
# Merge duplicated columns generated after join (_x/_y suffix)
combine_columns <- function(df, col_name){
col_x <- paste0(col_name, "_x")
col_y <- paste0(col_name, "_y")
if(all(c(col_x, col_y) %in% colnames(df))){
df[[col_name]] <- coalesce(df[[col_x]], df[[col_y]])
df <- df[, !colnames(df) %in% c(col_x, col_y)]
}
return(df)
}
final_merged_data <- combine_columns(final_merged_data, "indicator_name")
# Drop all unnamed auxiliary columns generated from raw csv
unnamed_cols <- str_subset(colnames(final_merged_data), "^unnamed")
final_merged_data <- final_merged_data[, !colnames(final_merged_data) %in% unnamed_cols]
# Output dataset dimension and export cleaned merged csv
cat("Merged dataset dimension:", dim(final_merged_data), "\n")
## Merged dataset dimension: 63840 79
write_csv(final_merged_data, "Final_Merged_Data.csv")
The dataset was reshaped from wide to long format first to facilitate observation-level cleaning (e.g., outlier removal) and to standardize the panel structure. This step ensures the raw data is reliable before we reshape it back to wide format for regression and machine learning.
# Import merged dataset
df <- read_csv("Final_Merged_Data.csv", show_col_types = FALSE)
## New names:
## • `...6` -> `...75`
## • `...5` -> `...79`
# Extract all numeric year columns between 1960-2025
year_cols <- as.character(seq(1960, 2025))
id_cols <- setdiff(colnames(df), year_cols)
# Reshape from wide yearly columns to long format (country-indicator-year-value)
df_long <- pivot_longer(
data = df,
cols = all_of(year_cols),
names_to = "year",
values_to = "value"
) %>%
mutate(year = as.integer(year), value = as.numeric(value)) %>%
arrange(country_code, indicator_code, year)
# Print dataset size and duplicate count
cat("Initial long format data dimension:", dim(df_long), "\n")
## Initial long format data dimension: 4213440 15
cat("Total duplicate records in long data:", sum(duplicated(df_long)), "\n")
## Total duplicate records in long data: 0
Missing value distribution is summarized; categorical missing entries are filled with “Unknown”; country-indicator groups without any valid numerical observations are deleted; linear time-series interpolation is implemented for sequential missing values within each country’s yearly records.
# Summarize missing value proportion per column
missing_stats <- tibble(
var = colnames(df_long),
miss_num = map_dbl(df_long, ~sum(is.na(.x))),
miss_pct = round(map_dbl(df_long, ~mean(is.na(.x))*100),2)
) %>% arrange(desc(miss_pct))
print(missing_stats %>% filter(miss_num>0))
## # A tibble: 7 × 3
## var miss_num miss_pct
## <chr> <dbl> <dbl>
## 1 ...75 4213440 100
## 2 ...79 4213440 100
## 3 value 2294482 54.5
## 4 specialnotes 2106720 50
## 5 incomegroup 807840 19.2
## 6 region 776160 18.4
## 7 tablename 15840 0.38
# Step1 fill categorical missing with Unknown
df_long <- df_long %>%
mutate(
region = replace(region, is.na(region), "Unknown"),
incomegroup = replace(incomegroup, is.na(incomegroup), "Unknown"),
specialnotes = replace(specialnotes, is.na(specialnotes), "Unknown")
)
# Step2 drop country-indicator groups with entirely empty value (no valid observation ever)
group_valid <- df_long %>%
group_by(country_code, indicator_code) %>%
summarise(valid_cnt = sum(!is.na(value)), .groups = "drop")
valid_group_list <- group_valid %>% filter(valid_cnt>0) %>% select(country_code, indicator_code)
df_long <- inner_join(df_long, valid_group_list, by = c("country_code", "indicator_code"))
# Step3 linear interpolation within each country-indicator time series
df_long <- df_long %>%
group_by(country_code, indicator_code) %>%
filter(sum(!is.na(value)) >= 2) %>%
mutate(value = imputeTS::na_interpolation(value, option = "linear")) %>%
ungroup()
cat("NA count after interpolation:", sum(is.na(df_long$value)), "\n")
## NA count after interpolation: 0
cat("Long data dimension after missing clean:", dim(df_long), "\n")
## Long data dimension after missing clean: 3185028 15
# Define IQR bound calculation function per subgroup
iqr_bounds <- function(series){
series <- na.omit(series)
if(length(series)<4) return(c(NA,NA))
q1 <- quantile(series,0.25)
q3 <- quantile(series,0.75)
iqr <- q3-q1
lower <- q1 -1.5*iqr
upper <- q3 +1.5*iqr
return(c(lower, upper))
}
# Count outlier number per group
count_outlier <- function(series){
bd <- iqr_bounds(na.omit(series))
if(is.na(bd[1])) return(c(0,0,0))
low_out <- sum(series < bd[1], na.rm = T)
up_out <- sum(series > bd[2], na.rm = T)
total <- low_out+up_out
return(c(total, low_out, up_out))
}
# Cap outlier with calculated bound
clip_outlier <- function(df_sub){
val <- df_sub$value
bd <- iqr_bounds(val)
if(!is.na(bd[1])) df_sub$value <- pmax(pmin(val, bd[2]), bd[1])
return(df_sub)
}
# Calculate global outlier summary
outlier_sum <- df_long %>%
group_by(country_code, indicator_code) %>%
summarise(out = list(count_outlier(value)), .groups = "drop") %>%
mutate(
total = map_dbl(out, ~.x[1]),
lower = map_dbl(out, ~.x[2]),
upper = map_dbl(out, ~.x[3])
)
cat("Total outlier:", sum(outlier_sum$total),
"\nLower outlier:", sum(outlier_sum$lower), "\nUpper outlier:",
sum(outlier_sum$upper),"\n")
## Total outlier: 181395
## Lower outlier: 73871
## Upper outlier: 107524
# Apply outlier clipping to full dataset
df_CleanData <- df_long %>%
group_by(country_code, indicator_code) %>%
mutate(value = {
val <- value
bd <- iqr_bounds(val)
if(!is.na(bd[1])) pmax(pmin(val, bd[2]), bd[1]) else val
}) %>%
ungroup()
# Post-clean outlier verification (should be zero after clipping)
post_outlier <- df_CleanData %>%
group_by(country_code, indicator_code) %>%
summarise(out = list(count_outlier(value)), .groups = "drop") %>%
mutate(total = map_dbl(out, ~.x[1]))
cat("Remaining outlier after capping:", sum(post_outlier$total),"\n")
## Remaining outlier after capping: 0
# Save cleaned long dataset
write_csv(df_CleanData, "CleanData.csv")
All numerical indicator values are log-transformed via log1p to mitigate right skewness of economic and demographic data and reduce heteroskedasticity for subsequent regression.
# Create log-transformed variable using log1p to avoid log(0) error
df_transform <- df_CleanData %>% mutate(value_log = log1p(pmax(value,0)))
In this section, we reshape the cleaned long-format panel data into a wide format suitable for econometric modeling. We then implement a two-stage missing value handling process: direct removal of high-missing variables and countries, followed by variable filtering and multi-step imputation to address remaining gaps.
# Convert long panel data back to wide format with log-transformed values for regression modelling
df_wide <- df_transform %>% pivot_wider(
id_cols = c(country_code, year, region, incomegroup),
names_from = indicator_code,
values_from = value_log
)
Remove variables with missing ratio >10%. Indicators above this cutoff suffer severe data scarcity (evident from the variable missing-rate bar chart) and cannot support credible subsequent imputation; the 10% threshold follows mainstream cross-country empirical standards and marks an obvious inflection point of missing proportion.
Exclude countries with average missing ratio >5%. Nations over the 5% benchmark are mostly small territories with incomplete statistical records, unable to provide valid within-country fluctuation for fixed-effect estimation;this boundary is chosen based on the sharp drop of missing rate shown in the country-level bar graph.
id_cols <- c("country_code","year","region","incomegroup")
var_cols <- setdiff(colnames(df_wide), id_cols)
# Calculate missing percentage for each independent indicator
var_miss_rate <- map_dbl(df_wide[,var_cols], ~mean(is.na(.x)))
miss_df_var <- tibble(indicator = names(var_miss_rate), missing_ratio = var_miss_rate)
# Extract top 30% variables with the highest missing values for visualization only
top_pct <- 0.3
n_top <- floor(nrow(miss_df_var)*top_pct)
top_miss_var <- miss_df_var %>% arrange(desc(missing_ratio)) %>% head(n_top)
# lock X-axis order by missing value descending, avoid alphabet auto-sort
top_miss_var$indicator <- factor(top_miss_var$indicator, levels = top_miss_var$indicator)
# Plot bar graph of top missing variables with two reference threshold lines
ggplot(top_miss_var,aes(x=indicator,y=missing_ratio))+
geom_col(fill="darkred")+
geom_hline(yintercept = 0.3,color="red",linetype="dashed",linewidth=1)+
geom_hline(yintercept = 0.1,color="orange",linetype="dashed",linewidth=1)+
labs(title="Top 30% Variables with Highest Missing Rates",
y="Missing Ratio (0 = no missing, 1 = all missing)",x="")+
theme_bw()+
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle=90,vjust=0.5,hjust=1, size = 5.8))
# drop variables with missing ratio>10%
high_miss_var <- names(var_miss_rate)[var_miss_rate>0.1]
df_wide_AfterVariables_clean <- df_wide[,!colnames(df_wide) %in% high_miss_var]
# Print variable screening summary
cat("Original vars:", length(var_cols), "\n")
## Original vars: 235
cat("Drop vars:", length(high_miss_var), "\n")
## Drop vars: 120
cat("Remain vars:", length(setdiff(colnames(df_wide_AfterVariables_clean),id_cols)), "\n")
## Remain vars: 115
# Calculate average missing value across all indicators per country
country_miss <- df_wide_AfterVariables_clean %>%
group_by(country_code) %>%
summarise(miss_avg = mean(is.na(pick(everything())), na.rm=T), .groups="drop")
# Select top 15% countries with highest average missing ratio for plotting
top_pct_country <- 0.15
n_top_c <- floor(nrow(country_miss)*top_pct_country)
top_miss_country <- country_miss %>% arrange(desc(miss_avg)) %>% head(n_top_c)
# set factor levels to preserve descending missing order
top_miss_country$country_code <- factor(top_miss_country$country_code, levels = top_miss_country$country_code)
# Draw country missing rate bar chart with 5% elimination threshold line
ggplot(top_miss_country,aes(x=country_code,y=miss_avg))+
geom_col(fill="darkred")+
geom_hline(yintercept = 0.05,color="red",linetype="dashed",linewidth=1)+
labs(title="Top 15% Countries with Highest Missing Rates (After Variable Cleaning)",
y="Average Missing Ratio",x="")+
theme_bw()+
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle=90,vjust=0.5,hjust=1))
# drop countries with average missing ratio>5%
drop_country <- country_miss %>% filter(miss_avg>0.05) %>% pull(country_code)
df_wide_AfterCountries_clean <- df_wide_AfterVariables_clean %>% filter(!country_code %in% drop_country)
# Output country screening summary
cat("Original countries:",n_distinct(df_wide_AfterVariables_clean$country_code),"\n")
## Original countries: 265
cat("Drop countries:",length(drop_country),"\n")
## Drop countries: 26
cat("Remaining countries:",n_distinct(df_wide_AfterCountries_clean$country_code),"\n")
## Remaining countries: 239
# Variables causing data leakage (direct elderly dependency indicators)
# are excluded; remaining incomplete numeric features are filled with
# sequential linear interpolation and global median substitution for leftover missing entries.
# Define dependent variable: share of population 65+ (Y)
y_var <- "SP.POP.65UP.TO"
df_wide_clean <- df_wide_AfterCountries_clean %>% drop_na(all_of(y_var))
# Exclude population composition variables to prevent data leakage
drop_leak_var <- c("SP.POP.DPND.OL","SP.POP.DPND","SP.POP.DPND.YG")
pop_key <- c("SP.POP","SM.POP")
growth_key <- c("GROW","DPND","BRTH")
# Filter valid independent variables
keep_list <- c(id_cols,y_var)
for(col in setdiff(colnames(df_wide_clean), keep_list)){
if(col %in% drop_leak_var) next
has_pop_flag <- any(str_detect(col, pop_key))
has_grow_flag <- any(str_detect(col, growth_key))
if(has_pop_flag & !has_grow_flag) next
keep_list <- c(keep_list, col)
}
df_wide_final <- df_wide_clean[,keep_list]
# Split X and y, remove categorical for numeric imputation
X_raw <- df_wide_final %>% select(-all_of(c(y_var,"region","incomegroup","year")))
y_raw <- df_wide_final[[y_var]]
df_panel_base <- df_wide_final %>% select(country_code,year,region,incomegroup)
# Drop full-empty columns
X_raw <- X_raw[, colSums(!is.na(X_raw)) > 0]
# Linear interpolation
X_interp <- X_raw %>%
mutate(country_code = df_panel_base$country_code) %>%
group_by(country_code) %>%
group_modify(~{
dat <- .x
# Use zoo::na.approx, auto skip interpolation when valid < 2
dat %>% mutate(across(everything(), ~ zoo::na.approx(.x, na.rm = FALSE, rule = 2)))
}) %>%
ungroup() %>%
select(-country_code)
# Fill leftover NA with column-wise median
X_imputed <- X_interp %>%
mutate(across(everything(), ~replace(.x, is.na(.x), median(.x, na.rm = TRUE))))
# Print dataset basic information
cat("Year range:", min(df_wide_final$year), "-", max(df_wide_final$year), "\n")
## Year range: 1960 - 2025
cat("Final country count:", n_distinct(df_wide_final$country_code), "\n")
## Final country count: 239
cat("Final obs count:", nrow(df_wide_final), "\n")
## Final obs count: 15774
# Bind into plm panel format data
# Remove duplicated year column from X_imputed before concatenation
X_imputed_no_year <- X_imputed %>% select(-any_of("year"))
panel_df <- df_panel_base %>%
mutate(y = y_raw) %>%
bind_cols(X_imputed_no_year)
# Build panel dataset with country+year index
pdata <- pdata.frame(panel_df, index = c("country_code", "year"))
Exploratory analysis visualizes global aging long-run trend, distribution of core demographic indicators (birth/death rate), pairwise variable correlation, and heterogeneous trends split by national income and geographic region to observe preliminary aging patterns across the globe.
The line graph tracks cross-country average log-transformed share of population aged 65 and above from 1960 to 2025. The indicator maintains a steady continuous upward trend across the whole sample period without any downward fluctuation, demonstrating a consistent global population aging tendency over six decades, which motivates our subsequent empirical study on its driving factors
# Calculate annual average value of dependent variable for aggregate trend visualization
trend_data <- df_wide_final %>%
group_by(year) %>%
summarise(y_mean = mean(.data[[y_var]], na.rm = T))
# Define tick sequence: label every 5 years on X-axis
year_start <- min(trend_data$year)
year_end <- max(trend_data$year)
year_break <- seq(from = year_start, to = year_end, by = 5)
# Plot overall long-term trend of elderly population proportion
ggplot(trend_data, aes(x = year, y = y_mean)) +
geom_line(linewidth = 1.2, color = "purple") +
geom_point(color = "purple") +
# Set X-axis ticks with 5-year interval
scale_x_continuous(breaks = year_break) +
labs(
title = "Average Trend of Population Ages 65 and Above",
x = "Year",
y = "Log-transformed Value"
) +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5), # center plot title
axis.text.x = element_text(angle = 45, hjust = 1) # tilt year labels to avoid overlap
)
The paired histograms exhibit sample distribution of log-transformed crude birth rate and crude death rate, core demographic drivers of population aging. After logarithmic conversion, the crude death rate follows an obvious bimodal distribution, whereas the crude birth rate shows multi-peak distribution; both indicators are close to approximate normal distribution after log transformation, satisfying the distribution assumption for linear panel regression. The multimodal characteristics also reflect prominent cross-country divergence in national fertility and mortality levels worldwide.
# Extract crude birth rate and crude death rate variables for distribution analysis
br <- df_wide_final$SP.DYN.CBRT.IN
dr <- df_wide_final$SP.DYN.CDRT.IN
# Histogram + kernel density plot for crude birth rate
p1 <- ggplot(tibble(val = na.omit(br)), aes(x = val)) +
geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "skyblue") +
geom_density(linewidth = 1) +
labs(title = "Crude Birth Rate Distribution", x = "Log Value") +
theme_bw()
# Histogram + kernel density plot for crude death rate
p2 <- ggplot(tibble(val = na.omit(dr)), aes(x = val)) +
geom_histogram(aes(y = after_stat(density)), bins = 30, fill = "salmon") +
geom_density(linewidth = 1) +
labs(title = "Crude Death Rate Distribution", x = "Log Value") +
theme_bw()
# Combine two plots side by side
p1 + p2
Core indicators covering demographic structure, birth-death dynamics, public healthcare and cross-border migration are selected for exploratory analysis based on demographic transition theory. The heatmap summarizes pairwise Pearson correlation coefficients:
# Select core variables for pairwise correlation analysis
corr_var <- c(
y_var, # Dependent variable: Percentage of total population aged 65 and over (elderly population share)
"SP.DYN.CBRT.IN", # Crude birth rate: annual live births per 1,000 total population
"SH.STA.MMRT", # Maternal mortality ratio: maternal deaths per 100,000 live births
"SH.XPD.GHED.PC.CD", # Per capita government health expenditure measured in constant USD
"SP.DYN.CDRT.IN", # Crude death rate: annual deaths per 1,000 total population
"SP.POP.BRTH.MF", # Male-to-female birth ratio reflecting gender structure of newborn population
"SH.IMM.IDPT", # DPT Immunization coverage rate of children
"SP.DYN.TO65.FE.ZS", # The proportion of females among the population aged 65 and above reflects the gender disparity in life expectancy
"SH.H2O.BASW.ZS" # Percentage of population with access to safely managed basic drinking water
)
# Retain existing variables only and remove full missing observations
corr_df <- df_wide_final[,intersect(corr_var,colnames(df_wide_final))] %>% drop_na()
# Compute Pearson correlation matrix and visualize via heatmap
suppressWarnings({
cor_mat <- cor(corr_df)
ggcorrplot(
cor_mat,
lab = TRUE,
lab_size = 3.2,
colors = c("#E74C3C","white","#2ECC71"),
title = "Core Indicators Correlation Heatmap"
)+
theme_bw()+
theme(
plot.title = element_text(hjust = 0.5, size = 16),
axis.text.x = element_text(angle = 45, hjust = 1, size = 7.8),
axis.text.y = element_text(size = 8.5)
)
})
The dual subplots display long-term changes in birth and death rates across five income tiers from 1960 onward. Both fertility and mortality show sustained downward trends universally; consistently, high-income nations retain the lowest birth & death levels, while low-income economies stay at the top throughout decades. Such obvious income-based divergence confirms economic development is closely linked to demographic transition and subsequent population aging. The small abnormal spike around 2020 on mortality curves corresponds to short-term global public health shocks.
# Calculate annual average birth & death rate grouped by country income classification
inc_trend <- df_wide_final %>%
select(year, incomegroup, SP.DYN.CBRT.IN, SP.DYN.CDRT.IN) %>%
pivot_longer(c(SP.DYN.CBRT.IN, SP.DYN.CDRT.IN), names_to = "var", values_to = "val") %>%
group_by(year, incomegroup, var) %>%
summarise(val = mean(val, na.rm = T), .groups = "drop")
# Set x-axis tick interval: display label every 5 years
year_min <- min(inc_trend$year)
year_max <- max(inc_trend$year)
break_seq <- seq(from = year_min, to = year_max, by = 5)
# Birth rate trend plot
p_inc1 <- inc_trend %>%
filter(var == "SP.DYN.CBRT.IN") %>%
ggplot(aes(x = year, y = val, color = incomegroup)) +
geom_line(linewidth = 1) +
scale_x_continuous(breaks = break_seq) +
labs(title = "Birth Rate by Income Group", x = "Year") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
plot.margin = ggplot2::margin(t = 10, r = 12, b = 30, l = 8) # Expand bottom margin for year labels
)
# Death rate trend plot
p_inc2 <- inc_trend %>%
filter(var == "SP.DYN.CDRT.IN") %>%
ggplot(aes(x = year, y = val, color = incomegroup)) +
geom_line(linewidth = 1) +
scale_x_continuous(breaks = break_seq) +
labs(title = "Death Rate by Income Group", x = "Year") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 45, hjust = 1, size = 9),
plot.margin = ggplot2::margin(t = 10, r = 12, b = 30, l = 8)
)
# Change layout: vertical top-bottom instead of side-by-side
final_plot <- (p_inc1 / p_inc2) + plot_layout(heights = c(5,5))
final_plot
Regional subdivision further reveals obvious geographical divergence in fertility and mortality evolution. All regions experience overall falling birth and death rates across decades; Sub-Saharan Africa consistently maintains the highest birth rate globally, while Middle East & North Africa shows the sharpest mortality decline. The temporary sharp spike around 2020 across most regions is attributed to global pandemic-driven mortality fluctuation, which also explains the short-term abnormal rise on the curves. Such cross-region heterogeneity verifies geographic difference is an important source of aging disparity worldwide.
# Calculate average birth & death rate grouped by year and geographic region for regional trend visualization
reg_trend <- df_wide_final %>%
select(year, region, SP.DYN.CBRT.IN, SP.DYN.CDRT.IN) %>%
pivot_longer(c(SP.DYN.CBRT.IN, SP.DYN.CDRT.IN), names_to = "var", values_to = "val") %>%
group_by(year, region, var) %>%
summarise(val = mean(val, na.rm = T), .groups = "drop")
# Set X-axis ticks every 5 years
year_min <- min(reg_trend$year)
year_max <- max(reg_trend$year)
break_seq <- seq(from = year_min, to = year_max, by = 5)
# Regional crude birth rate trend plot
p_reg1 <- reg_trend %>%
filter(var == "SP.DYN.CBRT.IN") %>%
ggplot(aes(x = year, y = val, color = region)) +
geom_line(linewidth = 1) +
scale_x_continuous(breaks = break_seq) +
labs(title = "Birth Rate by Region", x = "Year") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 35, hjust = 1, size = 8),
plot.margin = ggplot2::margin(t = 8, r = 10, b = 25, l = 8)
)
# Regional crude death rate trend plot
p_reg2 <- reg_trend %>%
filter(var == "SP.DYN.CDRT.IN") %>%
ggplot(aes(x = year, y = val, color = region)) +
geom_line(linewidth = 1) +
scale_x_continuous(breaks = break_seq) +
labs(title = "Death Rate by Region", x = "Year") +
theme_bw() +
theme(
plot.title = element_text(hjust = 0.5),
axis.text.x = element_text(angle = 35, hjust = 1, size = 8),
plot.margin = ggplot2::margin(t = 8, r = 10, b = 35, l = 8)
)
# Change from side-by-side (+) to top-bottom layout (/)
final_reg_plot <- (p_reg1 / p_reg2) + plot_layout(heights = c(5,5))
final_reg_plot
Two core modelling tasks are completed here: fixed-effect panel regression to quantify aging driving factors (regression task), and K-means unsupervised clustering to categorize high/low aging countries based on birth & death rate (classification task). Subgroup regression across income and geographic groups further captures heterogeneous aging drivers globally.
The null hypothesis of the Hausman test assumes random-effects (RE) estimator is consistent and efficient, while the alternative states RE becomes inconsistent and fixed-effects (FE) is preferable.
With chi^2=1203.2, df=52 and p<0.001) the test strongly rejects the null hypothesis at the 1% significance level.
Final decision: The fixed-effect specification is statistically preferred over random-effect model for this panel dataset, confirming the existence of unobserved country-specific heterogeneity correlated with explanatory variables.
# Convert cleaned explanatory variable dataset into matrix format for panel regression
exog_mat <- as.matrix(X_imputed)
# Extract dependent variable (logged share of population aged 65+) as numeric vector
y_vec <- y_raw
# Random Effects panel regression (Swamy-Arora estimator by default)
re_mod <- plm(y_vec ~ exog_mat, data = pdata, model = "random")
# Fixed Effects (Within transformation) panel regression
fe_mod <- plm(y_vec ~ exog_mat, data = pdata, model = "within")
# Hausman specification test: choose between FE and RE model
# H0: Random Effects is consistent and efficient; H1: Fixed Effects is preferred
haus_result <- phtest(fe_mod, re_mod)
# Print Hausman test statistical output
print(haus_result)
##
## Hausman Test
##
## data: y_vec ~ exog_mat
## chisq = 1203.2, df = 52, p-value < 2.2e-16
## alternative hypothesis: one model is inconsistent
To facilitate preprocessing and grouped fixed-effect regression, three custom functions are developed:
cal_vif(): Calculate and sort VIF values for multicollinearity check.
vif_filter(): Iteratively drop high-VIF variables to remove multicollinearity.
run_group_fe(): Complete subsample screening, variable filtering and fixed-effect estimation in one workflow.
# Calculate VIF values and return sorted result table
# Create artificial dependent variable(all ones) to compute VIF only, coefficient has no practical meaning
cal_vif <- function(X_sub) {
# Build fake constant dependent variable for regression fitting
fake_y <- rep(1, nrow(X_sub))
# Fit linear model and compute Variance Inflation Factor via car package
vif_res <- car::vif(lm(fake_y ~ ., data = X_sub))
# Convert VIF output into tibble, sort variables descending by VIF value
return(tibble(var = names(vif_res), VIF = as.numeric(vif_res)) %>% arrange(desc(VIF)))
}
# Iterative multicollinearity elimination based on VIF threshold
# Drop the variable with maximum VIF repeatedly until all VIF <= specified threshold(default = 10)
# Input: X_in = raw independent variable dataset; max_vif = VIF cutoff threshold
vif_filter <- function(X_in, max_vif = 10) {
# Initialize full variable list from input data column names
var_list <- colnames(X_in)
# Print initialization log information
cat("=== Start Automatic Iterative VIF Filtering | Max allowed VIF =", max_vif, "===\n")
cat("Initial total candidate variables count:", length(var_list), "\n\n")
# Iterative removing high-VIF variables in loop
while(TRUE) {
# Terminate loop if less than 2 variables left(VIF calculation needs at least two predictors)
if (length(var_list) < 2) break
# Extract subset data with current reserved variables
X_temp <- X_in[, var_list]
# Step 1: Delete zero-variance columns(all identical values or all NA cannot participate regression)
var_keep <- apply(X_temp, 2, function(x) length(unique(na.omit(x))) > 1)
X_temp <- X_temp[, var_keep]
# Stop iteration after dropping zero-variance features if remaining columns < 2
if (ncol(X_temp) < 2) {
var_list <- colnames(X_temp)
break
}
# Compute VIF ranking table for current variable subset
vdf <- cal_vif(X_temp)
highest_v <- vdf$VIF[1]
highest_var <- vdf$var[1]
# Exit loop once maximum VIF below preset threshold
if (highest_v <= max_vif) break
# Remove the variable with largest VIF and output elimination log
var_list <- setdiff(var_list, highest_var)
cat("Drop variable:", highest_var, "| Corresponding VIF =", round(highest_v, 2), "\n")
}
# Output final screening summary
cat("\n=== VIF Filter Finished ===")
cat("\nRemaining valid variables after VIF screening count:", length(var_list))
cat("\nFinal reserved variable list:\n")
print(var_list)
# Return screened variable names
return(var_list)
}
# Subgroup Fixed-Effect Panel Regression Function
# Core workflow: filter group sample -> eliminate perfect collinearity -> VIF multicollinearity screening -> construct panel data -> run within FE regression
# Parameters:
# group_name: target subgroup name for split regression
# group_col: grouping variable column name (incomegroup / region etc.)
# df_in: full raw dataset containing country & year panel identifiers
# y_in: full dependent variable vector
# X_in: full independent variable matrix/dataframe
# maxvif: VIF threshold for multicollinearity filtering, default = 10
run_group_fe <- function(group_name, group_col, df_in, y_in, X_in, maxvif = 10) {
# Step 1: Extract observations belonging to target subgroup
group_idx <- df_in[[group_col]] == group_name
# Abort regression if subgroup sample size less than 10 (insufficient observations for valid panel estimation)
if (sum(group_idx) < 10) return(NULL)
# Slice dependent & independent variables by subgroup filter
y_sub <- y_in[group_idx]
X_sub <- X_in[group_idx, ]
# Step 2: Remove perfectly collinear (aliased) predictors via alias detection
alias_mat <- alias(lm(y_sub ~ ., data = data.frame(y = y_sub, X_sub)))$Complete
if (!is.null(alias_mat)) {
drop_alias_var <- rownames(alias_mat)
# Drop fully collinear columns from independent set
X_sub <- X_sub[, !colnames(X_sub) %in% drop_alias_var, drop = FALSE]
}
# Step 3: Iterative VIF multicollinearity filtering with error capture
sel_var <- tryCatch({
vif_filter(X_sub, max_vif = maxvif)
}, error = function(e) {
return(NULL)
})
# Terminate function if variable selection fails or no valid predictors left
if (is.null(sel_var) || length(sel_var) < 1) return(NULL)
# Build regression dataframe: dependent var + screened independent vars
reg_df <- data.frame(y = y_sub, X_sub[, sel_var])
# Attach panel index columns (country & year only for pdata.frame definition, excluded from regression later)
reg_df$country_code <- df_in$country_code[group_idx]
reg_df$year <- df_in$year[group_idx]
# Construct balanced/unbalanced panel data object defined by country-year dual index
sub_pdata <- pdata.frame(reg_df, index = c("country_code", "year"))
# Step 4: Fixed-effect Within regression, exclude index columns to avoid accidental year dummy generation
# Wrap regression in tryCatch to skip subgroup if model estimation crashes
mod <- tryCatch({
reg_input <- sub_pdata[, !colnames(sub_pdata) %in% c("country_code", "year")]
plm(y ~ ., data = reg_input, model = "within")
}, error = function(e) {
return(NULL)
})
# Return final fixed-effect model object
return(mod)
}
# Assign significance stars based on p-value:
# *** p<0.001, ** p<0.01, * p<0.05, otherwise not significant
signif_label <- function(p) {
if (p < 0.001) {
return("*** (p < 0.001)")
} else if (p < 0.01) {
return("** (p < 0.01)")
} else if (p < 0.05) {
return("* (p < 0.05)")
} else {
return("(Not Significant)")
}
}
# ==============================================================
# FULLY AUTOMATIC FIXED EFFECTS MODEL
# Iterative VIF + Significance Filtering + Driscoll-Kraay Robust SE
# ==============================================================
# Dependent variable: SP.POP.65UP.TO stored in y_raw
# Independent variables: X_imputed (already interpolated missing values)
cat("Variables before reduction: ", ncol(X_imputed), "\n")
## Variables before reduction: 52
cat("X dimension: ", nrow(X_imputed), " rows, ", ncol(X_imputed), " cols\n")
## X dimension: 15774 rows, 52 cols
cat("y length: ", length(y_raw), "\n")
## y length: 15774
# ----------------------
# Step 1: De-mean (Fixed Effects Transformation, keep country & year index)
# ----------------------
# Combine index + y + X first to avoid index mismatch
df_dm_base <- df_wide_final[, c("country_code", "year")]
df_dm_base$y <- y_raw
df_dm_base <- cbind(df_dm_base, X_imputed)
# Demean function by country_code
demean_single <- function(vec, group_id) {
group_mean <- ave(vec, group_id, FUN = function(x) mean(x, na.rm = TRUE))
return(vec - group_mean)
}
# Demean y and all X by country
df_dm_base$y_dm <- demean_single(df_dm_base$y, df_dm_base$country_code)
var_names <- colnames(X_imputed)
for(v in var_names){
df_dm_base[[paste0(v,"_dm")]] <- demean_single(df_dm_base[[v]], df_dm_base$country_code)
}
# Pick demeaned data + index, drop NA
dm_col <- paste0(var_names,"_dm")
comb_all <- df_dm_base[,c("country_code","year","y_dm",dm_col)]
comb_all <- na.omit(comb_all)
# Split final y and X
y_final <- comb_all$y_dm
X_final <- comb_all[, dm_col]
colnames(X_final) <- var_names # restore original variable names
# ----------------------
# Step 2: Iterative VIF Reduction (AUTOMATIC, threshold = 10)
# ----------------------
max_vif <- 10.0
cat("\n=== STARTING AUTOMATIC VIF REDUCTION ===\n")
##
## === STARTING AUTOMATIC VIF REDUCTION ===
selected_vars <- vif_filter(X_final, max_vif = max_vif)
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 41030.72
## Drop variable: SH.DYN.MORT | Corresponding VIF = 8219.19
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 5073.22
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 2602.49
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 982.07
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 367.38
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 273.17
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 249.77
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 245.11
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 160.6
## Drop variable: SH.DTH.MORT | Corresponding VIF = 122.2
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 83.95
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 63.98
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 57.66
## Drop variable: SH.MMR.RISK | Corresponding VIF = 54.99
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 47.3
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 42.39
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 28.37
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 27.38
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 21.75
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 19.79
## Drop variable: SH.STA.MMRT | Corresponding VIF = 19.48
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 16.68
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 12.53
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 11.59
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 27
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.AMRT.MA"
## [7] "SP.DYN.CDRT.IN" "SP.DYN.TFRT.IN" "SP.DYN.TO65.FE.ZS"
## [10] "SP.POP.BRTH.MF" "SP.POP.GROW" "SH.ANM.NPRG.ZS"
## [13] "SH.DYN.NMRT" "SH.IMM.IDPT" "SH.IMM.MEAS"
## [16] "SH.MED.BEDS.ZS" "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS"
## [19] "SH.MMR.DTHS" "SH.MMR.RISK.ZS" "SH.PRG.ANEM"
## [22] "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.CH.ZS" "SH.XPD.GHED.GD.ZS"
## [25] "SH.XPD.OOPC.CH.ZS" "SH.XPD.OOPC.PC.CD" "SH.XPD.PVTD.CH.ZS"
cat("\nFinal variables after VIF: ", length(selected_vars), "\n")
##
## Final variables after VIF: 27
if(length(selected_vars)>0) print(selected_vars)
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.AMRT.MA"
## [7] "SP.DYN.CDRT.IN" "SP.DYN.TFRT.IN" "SP.DYN.TO65.FE.ZS"
## [10] "SP.POP.BRTH.MF" "SP.POP.GROW" "SH.ANM.NPRG.ZS"
## [13] "SH.DYN.NMRT" "SH.IMM.IDPT" "SH.IMM.MEAS"
## [16] "SH.MED.BEDS.ZS" "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS"
## [19] "SH.MMR.DTHS" "SH.MMR.RISK.ZS" "SH.PRG.ANEM"
## [22] "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.CH.ZS" "SH.XPD.GHED.GD.ZS"
## [25] "SH.XPD.OOPC.CH.ZS" "SH.XPD.OOPC.PC.CD" "SH.XPD.PVTD.CH.ZS"
# ----------------------
# Step 3: Run Final Within Fixed-Effect Model with Driscoll-Kraay Robust SE
# ----------------------
X_opt <- X_final[, selected_vars, drop = FALSE]
combined_opt <- data.frame(y = y_final, X_opt)
# Index directly from comb_all (no longer search original df, eliminate NA index)
combined_opt$country_code <- comb_all$country_code
combined_opt$year <- comb_all$year
# Build panel data (index no NA now)
sub_pdata <- pdata.frame(combined_opt, index = c("country_code", "year"))
reg_input <- sub_pdata[, !colnames(sub_pdata) %in% c("country_code", "year")]
fe_model <- plm(y ~ ., data = reg_input, model = "within")
# Driscoll-Kraay robust test
fe_robust <- coeftest(fe_model, vcov = vcovSCC(fe_model))
cat("\n", paste(rep("=",80),collapse=""),"\n")
##
## ================================================================================
cat("FINAL OPTIMIZED FIXED EFFECTS MODEL (Driscoll-Kraay Robust SE)\n")
## FINAL OPTIMIZED FIXED EFFECTS MODEL (Driscoll-Kraay Robust SE)
cat(paste(rep("=",80),collapse=""),"\n")
## ================================================================================
print(fe_robust)
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 0.2848752 0.0449118 6.3430 2.316e-10 ***
## SH.STA.BASS.ZS -0.1574283 0.0177714 -8.8585 < 2.2e-16 ***
## SH.STA.DIAB.ZS 0.0357725 0.0130339 2.7446 0.0060658 **
## SH.STA.ODFC.ZS -0.0107639 0.0144913 -0.7428 0.4576247
## SP.ADO.TFRT 0.0174805 0.0142660 1.2253 0.2204707
## SP.DYN.AMRT.MA -0.3403246 0.0349338 -9.7420 < 2.2e-16 ***
## SP.DYN.CDRT.IN 0.1809543 0.0515885 3.5076 0.0004534 ***
## SP.DYN.TFRT.IN -0.8531598 0.0407084 -20.9578 < 2.2e-16 ***
## SP.DYN.TO65.FE.ZS 0.4681631 0.1226319 3.8176 0.0001353 ***
## SP.POP.BRTH.MF -2.2012689 1.1937977 -1.8439 0.0652137 .
## SP.POP.GROW -0.1333898 0.0209176 -6.3769 1.858e-10 ***
## SH.ANM.NPRG.ZS -0.0380817 0.0351884 -1.0822 0.2791696
## SH.DYN.NMRT -0.1858644 0.0169975 -10.9348 < 2.2e-16 ***
## SH.IMM.IDPT 0.0991318 0.0165576 5.9871 2.184e-09 ***
## SH.IMM.MEAS 0.0031745 0.0058998 0.5381 0.5905331
## SH.MED.BEDS.ZS -0.1376180 0.0160767 -8.5601 < 2.2e-16 ***
## SH.MED.NUMW.P3 0.0929990 0.0129341 7.1902 6.765e-13 ***
## SH.MED.PHYS.ZS 0.0538507 0.0187877 2.8663 0.0041590 **
## SH.MMR.DTHS 0.1655923 0.0087325 18.9628 < 2.2e-16 ***
## SH.MMR.RISK.ZS -0.2518632 0.0456905 -5.5124 3.597e-08 ***
## SH.PRG.ANEM -0.1391300 0.0583431 -2.3847 0.0171056 *
## SH.XPD.CHEX.GD.ZS -0.0181538 0.0353103 -0.5141 0.6071744
## SH.XPD.GHED.CH.ZS -0.0013511 0.0278583 -0.0485 0.9613206
## SH.XPD.GHED.GD.ZS 0.1728557 0.0555446 3.1120 0.0018615 **
## SH.XPD.OOPC.CH.ZS -0.3104556 0.0534801 -5.8051 6.560e-09 ***
## SH.XPD.OOPC.PC.CD 0.0700523 0.0121782 5.7523 8.971e-09 ***
## SH.XPD.PVTD.CH.ZS 0.2054841 0.0527956 3.8921 9.981e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
# ----------------------
# Step 4: Show Final VIF
# ----------------------
cat("\nFINAL VIFs after screening:\n")
##
## FINAL VIFs after screening:
final_vif_tab <- cal_vif(X_opt)
print(final_vif_tab)
## # A tibble: 27 × 2
## var VIF
## <chr> <dbl>
## 1 SP.DYN.CDRT.IN 7.36
## 2 SH.XPD.GHED.GD.ZS 6.54
## 3 SP.DYN.TFRT.IN 6.49
## 4 SH.MMR.RISK.ZS 5.99
## 5 SP.DYN.AMRT.MA 5.82
## 6 SH.PRG.ANEM 5.62
## 7 SH.XPD.OOPC.CH.ZS 5.24
## 8 SH.IMM.IDPT 5.06
## 9 SH.DYN.NMRT 4.98
## 10 SH.XPD.PVTD.CH.ZS 4.68
## # ℹ 17 more rows
# ----------------------
# Step 5: Extract Significant Variables (p < 0.05)
# ----------------------
sig_df <- data.frame(
Variable = rownames(fe_robust),
Coefficient = fe_robust[,1],
P_Value = fe_robust[,4]
)
sig_df$Significance <- sapply(sig_df$P_Value, signif_label)
sig_df <- subset(sig_df, P_Value < 0.05)
sig_df <- sig_df[order(sig_df$P_Value), ]
cat("\nFINAL SIGNIFICANT VARIABLES (p < 0.05):\n")
##
## FINAL SIGNIFICANT VARIABLES (p < 0.05):
print(sig_df, row.names = FALSE)
## Variable Coefficient P_Value Significance
## SP.DYN.TFRT.IN -0.85315975 3.417352e-96 *** (p < 0.001)
## SH.MMR.DTHS 0.16559225 2.726232e-79 *** (p < 0.001)
## SH.DYN.NMRT -0.18586444 9.918897e-28 *** (p < 0.001)
## SP.DYN.AMRT.MA -0.34032463 2.313800e-22 *** (p < 0.001)
## SH.STA.BASS.ZS -0.15742830 8.973156e-19 *** (p < 0.001)
## SH.MED.BEDS.ZS -0.13761797 1.232420e-17 *** (p < 0.001)
## SH.MED.NUMW.P3 0.09299902 6.765000e-13 *** (p < 0.001)
## SP.POP.GROW -0.13338978 1.857971e-10 *** (p < 0.001)
## SH.H2O.BASW.ZS 0.28487516 2.315864e-10 *** (p < 0.001)
## SH.IMM.IDPT 0.09913183 2.183586e-09 *** (p < 0.001)
## SH.XPD.OOPC.CH.ZS -0.31045557 6.559831e-09 *** (p < 0.001)
## SH.XPD.OOPC.PC.CD 0.07005230 8.971199e-09 *** (p < 0.001)
## SH.MMR.RISK.ZS -0.25186317 3.596685e-08 *** (p < 0.001)
## SH.XPD.PVTD.CH.ZS 0.20548408 9.981056e-05 *** (p < 0.001)
## SP.DYN.TO65.FE.ZS 0.46816309 1.352645e-04 *** (p < 0.001)
## SP.DYN.CDRT.IN 0.18095425 4.533636e-04 *** (p < 0.001)
## SH.XPD.GHED.GD.ZS 0.17285568 1.861512e-03 ** (p < 0.01)
## SH.MED.PHYS.ZS 0.05385073 4.158964e-03 ** (p < 0.01)
## SH.STA.DIAB.ZS 0.03577254 6.065810e-03 ** (p < 0.01)
## SH.PRG.ANEM -0.13912998 1.710565e-02 * (p < 0.05)
# ----------------------
# Step 6: Output Within R² & Overall F-statistic
# ----------------------
sum_fe <- summary(fe_model)
# Extract within R-squared & adjusted within R²
r2_within <- as.numeric(sum_fe$r.squared["rsq"])
r2_adj <- as.numeric(sum_fe$r.squared["adjrsq"])
# Extract F-statistic and its p-value
f_val <- as.numeric(sum_fe$fstatistic$statistic)
f_pval <- as.numeric(sum_fe$fstatistic$p.value)
# Print formatted output
cat("\n", paste(rep("=",60),collapse=""),"\n")
##
## ============================================================
cat("Within R-squared: ", round(r2_within,4),"\n")
## Within R-squared: 0.8753
cat("Adjusted Within R-squared: ", round(r2_adj,4),"\n")
## Adjusted Within R-squared: 0.8731
cat("Overall F-statistic: ", round(f_val,4),"\n")
## Overall F-statistic: 4030.681
cat("F-statistic P-value: ", format(f_pval, scientific = FALSE, digits = 6),"\n")
## F-statistic P-value: 0
cat(paste(rep("=",60),collapse=""),"\n")
## ============================================================
# Pass the value to the grouping table for processing
save_r2 <- round(r2_within,4)
save_F <- round(f_val,4)
save_Fp <- round(f_pval,4)
SP.DYN.TFRT.IN) remains significantly negative across all
income subgroups, confirming its universal role in promoting population
aging. Physician density (SH.MED.PHYS.ZS) is consistently
positive and significant in all classified income groups, acting as a
stable positive aging driver.# ==============================================================================
# Step 1: Prepare valid income categories and initialize model storage list
# Extract non-NA unique income labels to iterate heterogeneous subgroup regression
# ==============================================================================
income_list <- unique(na.omit(df_wide_final$incomegroup))
# Empty list to save successfully estimated fixed-effect models by income group
income_fe_list <- list()
# ==============================================================================
# Step 2: Iterate each income group to run customized within-demeaning fixed-effect regression
# run_group_fe returns NULL automatically when sample size or variable conditions fail
# Print full regression output & significant variables for every valid subgroup after estimation
# ==============================================================================
for (gr in income_list) {
mod_tmp <- run_group_fe(
group_name = gr,
group_col = "incomegroup",
df_in = df_wide_final,
y_in = y_raw,
X_in = X_imputed
)
# Keep only valid estimated model, drop NULL invalid outputs
if (!is.null(mod_tmp)) {
income_fe_list[[gr]] <- mod_tmp
# -------------------- Print full estimation results for current income group
cat("\n================================================================================\n")
cat(" PANEL FIXED EFFECT ESTIMATION | INCOME GROUP: ", gr, "\n")
cat("================================================================================\n")
# Calculate Driscoll-Kraay robust coefficient test table
robust_res <- coeftest(mod_tmp, vcov = vcovSCC(mod_tmp))
# Output Parameter Estimates table
print(robust_res)
sum_g <- summary(mod_tmp)
r2_raw <- as.numeric(sum_g$r.squared["rsq"])
r2_adj <- as.numeric(sum_g$r.squared["adjrsq"])
f_stat <- as.numeric(sum_g$fstatistic$statistic)
f_pvalue <- as.numeric(sum_g$fstatistic$p.value)
cat("\n---------------- Model Overall Fit Statistics ----------------\n")
cat("Within R²: ", round(r2_raw,4),"\n")
cat("Adjusted Within R²: ", round(r2_adj,4),"\n")
cat("F-statistic: ", round(f_stat,4),"\n")
cat("F p-value: ", format(f_pvalue, digits = 6, scientific = F),"\n")
cat("--------------------------------------------------------------\n\n")
# ======================================
# Extract variable names & p-values to screen statistically significant predictors (p < 0.05)
p_vals <- robust_res[, 4]
sig_vars <- names(p_vals[p_vals < 0.05])
# Print list of significant variables
cat(">>> Significant Variables (p < 0.05):\n")
if (length(sig_vars) > 0) {
for (v in sig_vars) {
cat(" -", v, "\n")
}
} else {
cat(" - No statistically significant variables found\n")
}
cat("================================================================================\n\n")
}
}
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SH.DYN.MORT | Corresponding VIF = 877077.5
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 489446.9
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 110967
## Drop variable: SH.DTH.MORT | Corresponding VIF = 87428.39
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 32219.41
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 21067.12
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 13165.49
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 11718.81
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 7900.49
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 4616.51
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 3209.89
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 2974.63
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 2014.13
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 1461.61
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 1241.58
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 1120.21
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 592.69
## Drop variable: SH.MMR.RISK | Corresponding VIF = 567.38
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 406.76
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 274.13
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 249.98
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 172.06
## Drop variable: SH.DYN.NMRT | Corresponding VIF = 119.25
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 98.02
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 60.61
## Drop variable: SH.MMR.RISK.ZS | Corresponding VIF = 53.83
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 50.61
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 47.13
## Drop variable: SP.DYN.TFRT.IN | Corresponding VIF = 38.08
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 29.82
## Drop variable: SH.STA.BASS.ZS | Corresponding VIF = 25.22
## Drop variable: SH.STA.ODFC.ZS | Corresponding VIF = 21.54
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 18.23
## Drop variable: SH.MED.NUMW.P3 | Corresponding VIF = 15.7
## Drop variable: SH.IMM.IDPT | Corresponding VIF = 15.15
## Drop variable: SH.STA.MMRT | Corresponding VIF = 14.17
## Drop variable: SH.MED.PHYS.ZS | Corresponding VIF = 10.57
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 15
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.DIAB.ZS" "SP.ADO.TFRT"
## [4] "SP.DYN.AMRT.MA" "SP.DYN.CDRT.IN" "SP.POP.BRTH.MF"
## [7] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.DTH.IMRT"
## [10] "SH.IMM.MEAS" "SH.MED.BEDS.ZS" "SH.XPD.CHEX.GD.ZS"
## [13] "SH.XPD.GHED.CH.ZS" "SH.XPD.OOPC.CH.ZS" "SH.XPD.PVTD.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | INCOME GROUP: Unknown
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS -1.2189e-05 8.1493e-02 -0.0001 0.9998807
## SP.ADO.TFRT 4.3427e-02 5.2285e-02 0.8306 0.4062718
## SP.DYN.AMRT.MA -4.8680e-01 1.1408e-01 -4.2673 2.035e-05 ***
## SP.DYN.CDRT.IN -8.3108e-01 6.4604e-02 -12.8642 < 2.2e-16 ***
## SP.POP.BRTH.MF -3.2818e+00 1.2037e+00 -2.7265 0.0064355 **
## SP.POP.GROW -6.8892e-01 1.0946e-01 -6.2940 3.511e-10 ***
## SH.ANM.NPRG.ZS -4.0088e-01 1.1063e-01 -3.6235 0.0002952 ***
## SH.DTH.IMRT -1.4629e-01 5.1599e-02 -2.8351 0.0046090 **
## SH.IMM.MEAS 8.3828e-02 1.1475e-02 7.3053 3.468e-13 ***
## SH.MED.BEDS.ZS -4.6633e-02 3.6524e-02 -1.2768 0.2017636
## SH.XPD.CHEX.GD.ZS 6.1832e-01 1.4872e-01 4.1577 3.299e-05 ***
## SH.XPD.GHED.CH.ZS 2.5298e-01 8.3398e-02 3.0333 0.0024378 **
## SH.XPD.OOPC.CH.ZS -2.2095e-01 1.2656e-01 -1.7458 0.0809472 .
## SH.XPD.PVTD.CH.ZS 1.6030e-01 1.5605e-01 1.0272 0.3043857
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9352
## Adjusted Within R²: 0.9339
## F-statistic: 3335.563
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SP.DYN.AMRT.MA
## - SP.DYN.CDRT.IN
## - SP.POP.BRTH.MF
## - SP.POP.GROW
## - SH.ANM.NPRG.ZS
## - SH.DTH.IMRT
## - SH.IMM.MEAS
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.GHED.CH.ZS
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 471615.3
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 16527.31
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 15639.94
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 4895.23
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 2408.29
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 1418.42
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 1003.79
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 852.23
## Drop variable: SH.DTH.MORT | Corresponding VIF = 505
## Drop variable: SH.STA.MMRT | Corresponding VIF = 295.89
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 242.58
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 182.47
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 148.54
## Drop variable: SH.MMR.RISK | Corresponding VIF = 132.67
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 125.53
## Drop variable: SH.DYN.MORT | Corresponding VIF = 118.45
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 67.79
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 58.69
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 51.94
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 49.44
## Drop variable: SH.XPD.OOPC.CH.ZS | Corresponding VIF = 36.94
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 35.1
## Drop variable: SH.XPD.GHED.CH.ZS | Corresponding VIF = 30.33
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 26.23
## Drop variable: SP.DYN.CDRT.IN | Corresponding VIF = 23.72
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 19.35
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 18.14
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 17.17
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 24
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.AMRT.MA"
## [7] "SP.DYN.TFRT.IN" "SP.DYN.TO65.MA.ZS" "SP.POP.BRTH.MF"
## [10] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.DTH.IMRT"
## [13] "SH.DYN.NMRT" "SH.IMM.IDPT" "SH.IMM.MEAS"
## [16] "SH.MED.BEDS.ZS" "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS"
## [19] "SH.MMR.RISK.ZS" "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.GD.ZS"
## [22] "SH.XPD.PVTD.CH.ZS" "SH.XPD.PVTD.PC.CD" "SH.XPD.PVTD.PP.CD"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | INCOME GROUP: Low income
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 0.127881 0.041459 3.0845 0.0020752 **
## SH.STA.BASS.ZS 0.014112 0.042873 0.3292 0.7420826
## SH.STA.ODFC.ZS 0.078300 0.041595 1.8824 0.0599645 .
## SP.ADO.TFRT 0.021279 0.043958 0.4841 0.6284068
## SP.DYN.AMRT.MA -0.545197 0.093613 -5.8239 6.987e-09 ***
## SP.DYN.TFRT.IN -0.883307 0.120394 -7.3368 3.527e-13 ***
## SP.DYN.TO65.MA.ZS 0.130388 0.069192 1.8844 0.0596937 .
## SP.POP.BRTH.MF -14.620307 3.419195 -4.2760 2.020e-05 ***
## SP.POP.GROW -0.090830 0.026436 -3.4359 0.0006063 ***
## SH.ANM.NPRG.ZS -0.088236 0.070426 -1.2529 0.2104332
## SH.DTH.IMRT 0.784458 0.058773 13.3472 < 2.2e-16 ***
## SH.DYN.NMRT -0.816344 0.050787 -16.0740 < 2.2e-16 ***
## SH.IMM.IDPT 0.028209 0.020554 1.3724 0.1701363
## SH.IMM.MEAS -0.034459 0.021530 -1.6005 0.1096965
## SH.MED.BEDS.ZS -0.064536 0.039763 -1.6230 0.1047857
## SH.MED.NUMW.P3 -0.079389 0.063332 -1.2535 0.2102044
## SH.MED.PHYS.ZS 0.685436 0.135825 5.0465 5.036e-07 ***
## SH.MMR.RISK.ZS -0.078477 0.046527 -1.6867 0.0918657 .
## SH.XPD.CHEX.GD.ZS 0.153991 0.046005 3.3473 0.0008359 ***
## SH.XPD.GHED.GD.ZS 0.145391 0.024691 5.8883 4.782e-09 ***
## SH.XPD.PVTD.CH.ZS 0.170691 0.043399 3.9331 8.760e-05 ***
## SH.XPD.PVTD.PC.CD -0.106464 0.039872 -2.6701 0.0076617 **
## SH.XPD.PVTD.PP.CD -0.067803 0.041217 -1.6450 0.1001707
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9046
## Adjusted Within R²: 0.9017
## F-statistic: 633.3659
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.H2O.BASW.ZS
## - SP.DYN.AMRT.MA
## - SP.DYN.TFRT.IN
## - SP.POP.BRTH.MF
## - SP.POP.GROW
## - SH.DTH.IMRT
## - SH.DYN.NMRT
## - SH.MED.PHYS.ZS
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.GHED.GD.ZS
## - SH.XPD.PVTD.CH.ZS
## - SH.XPD.PVTD.PC.CD
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 336539.4
## Drop variable: SH.DYN.MORT | Corresponding VIF = 12234.89
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 4060.79
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 3018.62
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 2837.21
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 2431.71
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 1030.3
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 715.29
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 520.59
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 451.1
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 357.28
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 294.9
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 250.01
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 244.08
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 219.3
## Drop variable: SH.MMR.RISK | Corresponding VIF = 201.98
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 79.85
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 62.85
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 58.92
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 45.03
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 37.02
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 32.04
## Drop variable: SH.XPD.OOPC.CH.ZS | Corresponding VIF = 27.43
## Drop variable: SH.STA.MMRT | Corresponding VIF = 26.32
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 21.95
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 17.11
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 13.33
## Drop variable: SP.DYN.CDRT.IN | Corresponding VIF = 12.09
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 10.17
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 23
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.AMRT.MA"
## [7] "SP.DYN.TFRT.IN" "SP.DYN.TO65.FE.ZS" "SP.POP.BRTH.MF"
## [10] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.DTH.MORT"
## [13] "SH.DYN.NMRT" "SH.IMM.IDPT" "SH.IMM.MEAS"
## [16] "SH.MED.BEDS.ZS" "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS"
## [19] "SH.MMR.RISK.ZS" "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.CH.ZS"
## [22] "SH.XPD.OOPC.PC.CD" "SH.XPD.PVTD.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | INCOME GROUP: Lower middle income
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 0.337346 0.057439 5.8731 4.720e-09 ***
## SH.STA.BASS.ZS -0.058973 0.019780 -2.9814 0.0028910 **
## SH.STA.DIAB.ZS -0.176199 0.019984 -8.8169 < 2.2e-16 ***
## SH.STA.ODFC.ZS -0.059796 0.014314 -4.1775 3.027e-05 ***
## SP.ADO.TFRT -0.210672 0.022203 -9.4885 < 2.2e-16 ***
## SP.DYN.AMRT.MA -0.380654 0.052618 -7.2344 5.842e-13 ***
## SP.DYN.TFRT.IN -0.955054 0.079112 -12.0721 < 2.2e-16 ***
## SP.DYN.TO65.FE.ZS 0.289953 0.102788 2.8209 0.0048189 **
## SP.POP.BRTH.MF -1.343541 1.224090 -1.0976 0.2724700
## SP.POP.GROW -0.168502 0.034341 -4.9067 9.730e-07 ***
## SH.ANM.NPRG.ZS -0.239910 0.059724 -4.0170 6.032e-05 ***
## SH.DTH.MORT 0.623783 0.035354 17.6438 < 2.2e-16 ***
## SH.DYN.NMRT -0.571360 0.016049 -35.5999 < 2.2e-16 ***
## SH.IMM.IDPT -0.067995 0.017463 -3.8937 0.0001008 ***
## SH.IMM.MEAS 0.090786 0.015323 5.9248 3.463e-09 ***
## SH.MED.BEDS.ZS -0.136038 0.019676 -6.9138 5.683e-12 ***
## SH.MED.NUMW.P3 -0.190905 0.026996 -7.0716 1.878e-12 ***
## SH.MED.PHYS.ZS 0.135452 0.052423 2.5838 0.0098162 **
## SH.MMR.RISK.ZS -0.087240 0.041531 -2.1006 0.0357559 *
## SH.XPD.CHEX.GD.ZS 0.139396 0.040931 3.4057 0.0006683 ***
## SH.XPD.GHED.CH.ZS 0.024098 0.025161 0.9578 0.3382564
## SH.XPD.OOPC.PC.CD 0.058931 0.020152 2.9243 0.0034764 **
## SH.XPD.PVTD.CH.ZS -0.035027 0.025152 -1.3926 0.1638290
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9095
## Adjusted Within R²: 0.9075
## F-statistic: 1381.494
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.H2O.BASW.ZS
## - SH.STA.BASS.ZS
## - SH.STA.DIAB.ZS
## - SH.STA.ODFC.ZS
## - SP.ADO.TFRT
## - SP.DYN.AMRT.MA
## - SP.DYN.TFRT.IN
## - SP.DYN.TO65.FE.ZS
## - SP.POP.GROW
## - SH.ANM.NPRG.ZS
## - SH.DTH.MORT
## - SH.DYN.NMRT
## - SH.IMM.IDPT
## - SH.IMM.MEAS
## - SH.MED.BEDS.ZS
## - SH.MED.NUMW.P3
## - SH.MED.PHYS.ZS
## - SH.MMR.RISK.ZS
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.OOPC.PC.CD
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 137953.8
## Drop variable: SH.DYN.MORT | Corresponding VIF = 11288.9
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 4920.26
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 3196.48
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 2212.42
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 2136.16
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 1331.9
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 858.24
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 856.65
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 595.64
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 532.44
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 290.19
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 179.11
## Drop variable: SH.DTH.MORT | Corresponding VIF = 174.35
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 155.04
## Drop variable: SH.MMR.RISK | Corresponding VIF = 154.08
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 119.14
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 104.08
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 89.36
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 55.36
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 43.87
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 31.77
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 23.88
## Drop variable: SH.XPD.OOPC.CH.ZS | Corresponding VIF = 18.5
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 12.76
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 12.14
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 10.21
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 25
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.AMRT.MA"
## [7] "SP.DYN.CDRT.IN" "SP.DYN.TFRT.IN" "SP.POP.BRTH.MF"
## [10] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.DYN.NMRT"
## [13] "SH.IMM.IDPT" "SH.IMM.MEAS" "SH.MED.BEDS.ZS"
## [16] "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS" "SH.MMR.DTHS"
## [19] "SH.MMR.RISK.ZS" "SH.PRG.ANEM" "SH.STA.MMRT"
## [22] "SH.XPD.CHEX.GD.ZS" "SH.XPD.CHEX.PP.CD" "SH.XPD.GHED.CH.ZS"
## [25] "SH.XPD.PVTD.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | INCOME GROUP: Upper middle income
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 1.3972189 0.2260368 6.1814 7.103e-10 ***
## SH.STA.BASS.ZS -0.6182424 0.0432618 -14.2907 < 2.2e-16 ***
## SH.STA.DIAB.ZS 0.0757509 0.1373534 0.5515 0.5813249
## SH.STA.ODFC.ZS -0.0828371 0.0126550 -6.5458 6.801e-11 ***
## SP.ADO.TFRT 0.1150513 0.0160114 7.1856 8.176e-13 ***
## SP.DYN.AMRT.MA -0.3971606 0.0219028 -18.1329 < 2.2e-16 ***
## SP.DYN.CDRT.IN 0.1988247 0.0481133 4.1324 3.676e-05 ***
## SP.DYN.TFRT.IN -0.9358709 0.0484551 -19.3142 < 2.2e-16 ***
## SP.POP.BRTH.MF 0.6155544 1.1971182 0.5142 0.6071476
## SP.POP.GROW -0.0777601 0.0210119 -3.7008 0.0002183 ***
## SH.ANM.NPRG.ZS 0.1312647 0.0789738 1.6621 0.0965783 .
## SH.DYN.NMRT -0.2679988 0.0248669 -10.7773 < 2.2e-16 ***
## SH.IMM.IDPT 0.0817582 0.0132924 6.1508 8.601e-10 ***
## SH.IMM.MEAS 0.0049988 0.0069302 0.7213 0.4707712
## SH.MED.BEDS.ZS -0.1009014 0.0188240 -5.3603 8.861e-08 ***
## SH.MED.NUMW.P3 0.0368698 0.0364782 1.0107 0.3122140
## SH.MED.PHYS.ZS 0.1630718 0.0227870 7.1563 1.009e-12 ***
## SH.MMR.DTHS 0.1896754 0.0240346 7.8918 3.978e-15 ***
## SH.MMR.RISK.ZS 0.0926610 0.0485388 1.9090 0.0563447 .
## SH.PRG.ANEM -0.4893667 0.1964390 -2.4912 0.0127787 *
## SH.STA.MMRT -0.1932110 0.0219105 -8.8182 < 2.2e-16 ***
## SH.XPD.CHEX.GD.ZS 0.2217210 0.0524467 4.2276 2.424e-05 ***
## SH.XPD.CHEX.PP.CD -0.0169680 0.0352137 -0.4819 0.6299372
## SH.XPD.GHED.CH.ZS 0.1247043 0.0349171 3.5714 0.0003599 ***
## SH.XPD.PVTD.CH.ZS 0.1010003 0.0459217 2.1994 0.0279160 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9085
## Adjusted Within R²: 0.9064
## F-statistic: 1358.39
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.H2O.BASW.ZS
## - SH.STA.BASS.ZS
## - SH.STA.ODFC.ZS
## - SP.ADO.TFRT
## - SP.DYN.AMRT.MA
## - SP.DYN.CDRT.IN
## - SP.DYN.TFRT.IN
## - SP.POP.GROW
## - SH.DYN.NMRT
## - SH.IMM.IDPT
## - SH.MED.BEDS.ZS
## - SH.MED.PHYS.ZS
## - SH.MMR.DTHS
## - SH.PRG.ANEM
## - SH.STA.MMRT
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.GHED.CH.ZS
## - SH.XPD.PVTD.CH.ZS
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SH.DYN.MORT | Corresponding VIF = 46541.37
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 29889.05
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 5103.64
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 5028.61
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 2474.57
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 2188.32
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 1366.74
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 1007.11
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 848.25
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 742.77
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 495.44
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 276.1
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 224.95
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 218.62
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 202.44
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 141.68
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 100.03
## Drop variable: SH.MMR.RISK | Corresponding VIF = 92.85
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 80.72
## Drop variable: SP.DYN.AMRT.MA | Corresponding VIF = 35.02
## Drop variable: SH.DTH.MORT | Corresponding VIF = 32.11
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 30.63
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 23.4
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 20.59
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 18.57
## Drop variable: SH.XPD.PVTD.CH.ZS | Corresponding VIF = 13.3
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 11.66
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 25
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.CDRT.IN"
## [7] "SP.DYN.TFRT.IN" "SP.DYN.TO65.MA.ZS" "SP.POP.BRTH.MF"
## [10] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.DYN.NMRT"
## [13] "SH.IMM.IDPT" "SH.IMM.MEAS" "SH.MED.BEDS.ZS"
## [16] "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS" "SH.MMR.DTHS"
## [19] "SH.MMR.RISK.ZS" "SH.PRG.ANEM" "SH.STA.MMRT"
## [22] "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.CH.ZS" "SH.XPD.OOPC.CH.ZS"
## [25] "SH.XPD.OOPC.PP.CD"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | INCOME GROUP: High income
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS -0.0028300 0.5306583 -0.0053 0.9957451
## SH.STA.BASS.ZS 0.4672449 0.1391056 3.3589 0.0007897 ***
## SH.STA.DIAB.ZS 0.1030226 0.0158413 6.5034 8.802e-11 ***
## SH.STA.ODFC.ZS -0.0052833 0.0459894 -0.1149 0.9085459
## SP.ADO.TFRT 0.0371322 0.0160180 2.3182 0.0204901 *
## SP.DYN.CDRT.IN 0.3394422 0.0622622 5.4518 5.281e-08 ***
## SP.DYN.TFRT.IN -1.2175133 0.0619315 -19.6590 < 2.2e-16 ***
## SP.DYN.TO65.MA.ZS 1.3296742 0.1029956 12.9100 < 2.2e-16 ***
## SP.POP.BRTH.MF -2.5356565 1.7062873 -1.4861 0.1373390
## SP.POP.GROW -0.0872279 0.0295667 -2.9502 0.0031937 **
## SH.ANM.NPRG.ZS -0.2699862 0.0773872 -3.4888 0.0004904 ***
## SH.DYN.NMRT -0.0564130 0.0152841 -3.6910 0.0002263 ***
## SH.IMM.IDPT 0.1054859 0.0204017 5.1704 2.448e-07 ***
## SH.IMM.MEAS -0.0829834 0.0228788 -3.6271 0.0002901 ***
## SH.MED.BEDS.ZS -0.1103885 0.0188522 -5.8555 5.132e-09 ***
## SH.MED.NUMW.P3 0.1359353 0.0244752 5.5540 2.970e-08 ***
## SH.MED.PHYS.ZS 0.0892435 0.0324292 2.7519 0.0059505 **
## SH.MMR.DTHS 0.2353018 0.0205865 11.4299 < 2.2e-16 ***
## SH.MMR.RISK.ZS 1.4957873 0.2013721 7.4280 1.336e-13 ***
## SH.PRG.ANEM -0.3651493 0.1408715 -2.5921 0.0095740 **
## SH.STA.MMRT -0.3614993 0.0273140 -13.2350 < 2.2e-16 ***
## SH.XPD.CHEX.GD.ZS 0.4980608 0.0851542 5.8489 5.336e-09 ***
## SH.XPD.GHED.CH.ZS -0.3157452 0.1712749 -1.8435 0.0653288 .
## SH.XPD.OOPC.CH.ZS -0.3433940 0.0648502 -5.2952 1.252e-07 ***
## SH.XPD.OOPC.PP.CD -0.0800982 0.0256584 -3.1217 0.0018107 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.8612
## Adjusted Within R²: 0.8582
## F-statistic: 1010.059
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.STA.BASS.ZS
## - SH.STA.DIAB.ZS
## - SP.ADO.TFRT
## - SP.DYN.CDRT.IN
## - SP.DYN.TFRT.IN
## - SP.DYN.TO65.MA.ZS
## - SP.POP.GROW
## - SH.ANM.NPRG.ZS
## - SH.DYN.NMRT
## - SH.IMM.IDPT
## - SH.IMM.MEAS
## - SH.MED.BEDS.ZS
## - SH.MED.NUMW.P3
## - SH.MED.PHYS.ZS
## - SH.MMR.DTHS
## - SH.MMR.RISK.ZS
## - SH.PRG.ANEM
## - SH.STA.MMRT
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.OOPC.CH.ZS
## - SH.XPD.OOPC.PP.CD
## ================================================================================
# ==============================================================================
# Step 3: Select optimal model by total absolute coefficient sum
# Adjusted R² & F-stat unavailable for all income-group specifications,
# choose model with largest sum of absolute coefficients as alternative screening rule
# ==============================================================================
if (length(income_fe_list) == 0) {
cat("Warning: No valid income-group fixed-effect model is available.\n")
} else {
# Calculate sum of absolute regression coefficients for each group
abs_coef_sum <- sapply(income_fe_list, function(m) {
tryCatch({
sum(abs(coef(m)))
}, error = function(e) NA)
})
abs_clean <- na.omit(abs_coef_sum)
if (length(abs_clean) == 0) {
cat("Warning: Fail to extract coefficient from all income-group models.\n")
} else {
best_group_name <- names(which.max(abs_clean))
if (best_group_name %in% names(income_fe_list)) {
best_reg_model <- income_fe_list[[best_group_name]]
best_sum_val <- round(max(abs_clean),4)
# Output result info
cat("\n==== Optimal Fixed Effect by Income Group (Max Absolute Coefficient Sum) ====\n")
cat("Best Performing Income Group: ", best_group_name, "\n")
cat("Sum of Absolute Coefficients: ", best_sum_val, "\n\n")
# Outpu Best Model R²/F
sum_best <- summary(best_reg_model)
r2b <- round(as.numeric(sum_best$r.squared["rsq"]),4)
adjr2b <- round(as.numeric(sum_best$r.squared["adjrsq"]),4)
fb <- round(as.numeric(sum_best$fstatistic$statistic),4)
fpb <- format(as.numeric(sum_best$fstatistic$p.value),digits=6,scientific=F)
cat("Optimal Model Fit Index:\n")
cat("Within R²: ",r2b," | Adj Within R²: ",adjr2b," | F: ",fb," | F-p: ",fpb,"\n\n")
# Robust standard error output
print(coeftest(best_reg_model, vcov = vcovSCC(best_reg_model)))
} else {
cat("Warning: Selected group name not found in model list.\n")
}
}
}
##
## ==== Optimal Fixed Effect by Income Group (Max Absolute Coefficient Sum) ====
## Best Performing Income Group: Low income
## Sum of Absolute Coefficients: 19.8155
##
## Optimal Model Fit Index:
## Within R²: 0.9046 | Adj Within R²: 0.9017 | F: 633.3659 | F-p: 0
##
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 0.127881 0.041459 3.0845 0.0020752 **
## SH.STA.BASS.ZS 0.014112 0.042873 0.3292 0.7420826
## SH.STA.ODFC.ZS 0.078300 0.041595 1.8824 0.0599645 .
## SP.ADO.TFRT 0.021279 0.043958 0.4841 0.6284068
## SP.DYN.AMRT.MA -0.545197 0.093613 -5.8239 6.987e-09 ***
## SP.DYN.TFRT.IN -0.883307 0.120394 -7.3368 3.527e-13 ***
## SP.DYN.TO65.MA.ZS 0.130388 0.069192 1.8844 0.0596937 .
## SP.POP.BRTH.MF -14.620307 3.419195 -4.2760 2.020e-05 ***
## SP.POP.GROW -0.090830 0.026436 -3.4359 0.0006063 ***
## SH.ANM.NPRG.ZS -0.088236 0.070426 -1.2529 0.2104332
## SH.DTH.IMRT 0.784458 0.058773 13.3472 < 2.2e-16 ***
## SH.DYN.NMRT -0.816344 0.050787 -16.0740 < 2.2e-16 ***
## SH.IMM.IDPT 0.028209 0.020554 1.3724 0.1701363
## SH.IMM.MEAS -0.034459 0.021530 -1.6005 0.1096965
## SH.MED.BEDS.ZS -0.064536 0.039763 -1.6230 0.1047857
## SH.MED.NUMW.P3 -0.079389 0.063332 -1.2535 0.2102044
## SH.MED.PHYS.ZS 0.685436 0.135825 5.0465 5.036e-07 ***
## SH.MMR.RISK.ZS -0.078477 0.046527 -1.6867 0.0918657 .
## SH.XPD.CHEX.GD.ZS 0.153991 0.046005 3.3473 0.0008359 ***
## SH.XPD.GHED.GD.ZS 0.145391 0.024691 5.8883 4.782e-09 ***
## SH.XPD.PVTD.CH.ZS 0.170691 0.043399 3.9331 8.760e-05 ***
## SH.XPD.PVTD.PC.CD -0.106464 0.039872 -2.6701 0.0076617 **
## SH.XPD.PVTD.PP.CD -0.067803 0.041217 -1.6450 0.1001707
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Multicollinearity elimination: All subgroups adopt VIF=10 as cutoff. Starting from 52 original predictors, numerous mortality, life expectancy and health expenditure variables with extreme VIF values are dropped. The number of retained valid indicators varies across eight geographic regions.
Evident regional heterogeneity: Significant predictors, coefficient magnitudes and sign directions differ substantially across regions, confirming divergent aging drivers globally.
Best-performing region: North America delivers the largest total absolute coefficient value (83.852), indicating the strongest overall explanatory power among all regions.
Key significant factors for North America: Six variables are statistically significant at 5% level. Physician density, immunization coverage and anaemia prevalence positively raise aging ratio, while sanitation access and total fertility suppress population ageing.
General rule: After screening, healthcare resource, demographic and public health indicators serve as core valid determinants, whereas highly collinear aggregate health metrics lose explanatory power.
# ==============================================================================
# Step 1: Prepare valid region categories and initialize model storage list
# Extract non-NA unique region labels to iterate heterogeneous subgroup regression
# ==============================================================================
region_list <- unique(na.omit(df_wide_final$region))
# Empty list to save successfully estimated fixed-effect models by region
region_fe_list <- list()
# ==============================================================================
# Step 2: Iterate each region to run customized within-demeaning fixed-effect regression
# run_group_fe returns NULL automatically when sample size or variable conditions fail
# Print full regression output & significant variables for every valid subgroup after estimation
# ==============================================================================
for (gr in region_list) {
mod_tmp <- run_group_fe(
group_name = gr,
group_col = "region",
df_in = df_wide_final,
y_in = y_raw,
X_in = X_imputed
)
# Keep only valid estimated model, drop NULL invalid outputs
if (!is.null(mod_tmp)) {
region_fe_list[[gr]] <- mod_tmp
# -------------------- Print full estimation results for current region
cat("\n================================================================================\n")
cat(" PANEL FIXED EFFECT ESTIMATION | REGION: ", gr, "\n")
cat("================================================================================\n")
# Calculate Driscoll-Kraay robust coefficient test table
robust_res <- coeftest(mod_tmp, vcov = vcovSCC(mod_tmp))
# Output Parameter Estimates table
print(robust_res)
# ==========Add Within R², Adj R², F stat & F p-value==========
sum_g <- summary(mod_tmp)
r2_raw <- as.numeric(sum_g$r.squared["rsq"])
r2_adj <- as.numeric(sum_g$r.squared["adjrsq"])
f_stat <- as.numeric(sum_g$fstatistic$statistic)
f_pvalue <- as.numeric(sum_g$fstatistic$p.value)
cat("\n---------------- Model Overall Fit Statistics ----------------\n")
cat("Within R²: ", round(r2_raw,4),"\n")
cat("Adjusted Within R²: ", round(r2_adj,4),"\n")
cat("F-statistic: ", round(f_stat,4),"\n")
cat("F p-value: ", format(f_pvalue, digits = 6, scientific = FALSE),"\n")
cat("--------------------------------------------------------------\n\n")
# ==============================================================
# Extract variable names & p-values to screen statistically significant predictors (p < 0.05)
p_vals <- robust_res[, 4]
sig_vars <- names(p_vals[p_vals < 0.05])
# Print list of significant variables
cat(">>> Significant Variables (p < 0.05):\n")
if (length(sig_vars) > 0) {
for (v in sig_vars) {
cat(" -", v, "\n")
}
} else {
cat(" - No statistically significant variables found\n")
}
cat("================================================================================\n\n")
}
}
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SH.DYN.MORT | Corresponding VIF = 888316.2
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 498360
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 203038.9
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 133233.8
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 47216.68
## Drop variable: SH.DTH.MORT | Corresponding VIF = 40802.84
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 23364.16
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 17057.3
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 10990.16
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 5496.98
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 5378.17
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 4122.19
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 2349.82
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 1796.95
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 1638.8
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 1087.39
## Drop variable: SH.MMR.RISK | Corresponding VIF = 689.46
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 613.09
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 411.78
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 315.3
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 295.93
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 282.53
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 174.71
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 163.73
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 117.52
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 67.09
## Drop variable: SH.MMR.RISK.ZS | Corresponding VIF = 63.46
## Drop variable: SP.DYN.TFRT.IN | Corresponding VIF = 44.88
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 39.14
## Drop variable: SH.DYN.NMRT | Corresponding VIF = 32.08
## Drop variable: SH.STA.BASS.ZS | Corresponding VIF = 26.84
## Drop variable: SH.STA.ODFC.ZS | Corresponding VIF = 22.43
## Drop variable: SH.MED.PHYS.ZS | Corresponding VIF = 21.65
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 16.47
## Drop variable: SH.IMM.IDPT | Corresponding VIF = 15.94
## Drop variable: SH.MED.NUMW.P3 | Corresponding VIF = 14.83
## Drop variable: SH.XPD.OOPC.CH.ZS | Corresponding VIF = 14.39
## Drop variable: SH.STA.MMRT | Corresponding VIF = 13.08
## Drop variable: SP.ADO.TFRT | Corresponding VIF = 10.45
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 13
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.DIAB.ZS" "SP.DYN.AMRT.MA"
## [4] "SP.DYN.CDRT.IN" "SP.POP.BRTH.MF" "SP.POP.GROW"
## [7] "SH.ANM.NPRG.ZS" "SH.DTH.NMRT" "SH.IMM.MEAS"
## [10] "SH.MED.BEDS.ZS" "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.CH.ZS"
## [13] "SH.XPD.PVTD.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | REGION: Unknown
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 0.358195 0.100985 3.5470 0.0003954 ***
## SP.DYN.AMRT.MA -0.490224 0.098233 -4.9904 6.356e-07 ***
## SP.DYN.CDRT.IN -0.808411 0.055156 -14.6568 < 2.2e-16 ***
## SP.POP.BRTH.MF -3.967749 1.221286 -3.2488 0.0011712 **
## SP.POP.GROW -0.518915 0.098191 -5.2847 1.346e-07 ***
## SH.ANM.NPRG.ZS -0.233921 0.077711 -3.0101 0.0026322 **
## SH.DTH.NMRT -0.182832 0.035283 -5.1819 2.336e-07 ***
## SH.IMM.MEAS 0.085412 0.012005 7.1146 1.387e-12 ***
## SH.MED.BEDS.ZS -0.131725 0.037183 -3.5426 0.0004020 ***
## SH.XPD.CHEX.GD.ZS 0.560163 0.145469 3.8507 0.0001201 ***
## SH.XPD.GHED.CH.ZS 0.311879 0.111696 2.7922 0.0052669 **
## SH.XPD.PVTD.CH.ZS 0.070766 0.152538 0.4639 0.6427323
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9495
## Adjusted Within R²: 0.9486
## F-statistic: 4872.265
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.H2O.BASW.ZS
## - SP.DYN.AMRT.MA
## - SP.DYN.CDRT.IN
## - SP.POP.BRTH.MF
## - SP.POP.GROW
## - SH.ANM.NPRG.ZS
## - SH.DTH.NMRT
## - SH.IMM.MEAS
## - SH.MED.BEDS.ZS
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.GHED.CH.ZS
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 187504.9
## Drop variable: SH.DYN.MORT | Corresponding VIF = 47253.12
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 16067.09
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 11333.42
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 9849.34
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 2631.07
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 1974.25
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 1207.51
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 971.14
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 938.88
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 715.27
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 572.92
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 357.61
## Drop variable: SH.MMR.RISK | Corresponding VIF = 318.27
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 272.26
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 237.51
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 200.8
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 104.44
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 80.11
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 56.51
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 45.3
## Drop variable: SH.XPD.GHED.CH.ZS | Corresponding VIF = 38.54
## Drop variable: SH.MMR.RISK.ZS | Corresponding VIF = 35.21
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 33.89
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 31.72
## Drop variable: SH.XPD.PVTD.CH.ZS | Corresponding VIF = 21.82
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 19.74
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 18.34
## Drop variable: SP.DYN.CDRT.IN | Corresponding VIF = 17.05
## Drop variable: SH.STA.MMRT | Corresponding VIF = 16.23
## Drop variable: SH.STA.BASS.ZS | Corresponding VIF = 13.19
## Drop variable: SH.DYN.NMRT | Corresponding VIF = 12.61
## Drop variable: SH.STA.ODFC.ZS | Corresponding VIF = 10.13
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 19
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.DIAB.ZS" "SP.ADO.TFRT"
## [4] "SP.DYN.AMRT.MA" "SP.DYN.TFRT.IN" "SP.DYN.TO65.FE.ZS"
## [7] "SP.POP.BRTH.MF" "SP.POP.GROW" "SH.ANM.NPRG.ZS"
## [10] "SH.DTH.MORT" "SH.IMM.IDPT" "SH.IMM.MEAS"
## [13] "SH.MED.BEDS.ZS" "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS"
## [16] "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.GD.ZS" "SH.XPD.OOPC.CH.ZS"
## [19] "SH.XPD.OOPC.PP.CD"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | REGION: Middle East & North Africa
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS -0.061221 0.092932 -0.6588 0.510147
## SH.STA.DIAB.ZS 1.255803 0.241406 5.2020 2.248e-07 ***
## SP.ADO.TFRT -0.080800 0.025354 -3.1869 0.001468 **
## SP.DYN.AMRT.MA -0.246778 0.054905 -4.4947 7.512e-06 ***
## SP.DYN.TFRT.IN -0.980288 0.079941 -12.2626 < 2.2e-16 ***
## SP.DYN.TO65.FE.ZS 0.546975 0.080921 6.7594 1.989e-11 ***
## SP.POP.BRTH.MF -4.406752 1.648387 -2.6734 0.007592 **
## SP.POP.GROW -0.021925 0.015303 -1.4327 0.152160
## SH.ANM.NPRG.ZS -0.121896 0.119537 -1.0197 0.308019
## SH.DTH.MORT 0.259074 0.036316 7.1339 1.523e-12 ***
## SH.IMM.IDPT 0.175209 0.042140 4.1577 3.399e-05 ***
## SH.IMM.MEAS -0.094139 0.029755 -3.1638 0.001589 **
## SH.MED.BEDS.ZS -0.444191 0.053597 -8.2876 2.567e-16 ***
## SH.MED.NUMW.P3 0.103642 0.037504 2.7635 0.005789 **
## SH.MED.PHYS.ZS 0.400622 0.070065 5.7179 1.303e-08 ***
## SH.XPD.CHEX.GD.ZS -0.203645 0.098534 -2.0668 0.038931 *
## SH.XPD.GHED.GD.ZS 0.084228 0.044163 1.9072 0.056688 .
## SH.XPD.OOPC.CH.ZS -0.432048 0.051944 -8.3176 < 2.2e-16 ***
## SH.XPD.OOPC.PP.CD 0.304844 0.050840 5.9962 2.536e-09 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9158
## Adjusted Within R²: 0.9135
## F-statistic: 845.1982
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.STA.DIAB.ZS
## - SP.ADO.TFRT
## - SP.DYN.AMRT.MA
## - SP.DYN.TFRT.IN
## - SP.DYN.TO65.FE.ZS
## - SP.POP.BRTH.MF
## - SH.DTH.MORT
## - SH.IMM.IDPT
## - SH.IMM.MEAS
## - SH.MED.BEDS.ZS
## - SH.MED.NUMW.P3
## - SH.MED.PHYS.ZS
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.OOPC.CH.ZS
## - SH.XPD.OOPC.PP.CD
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 395638.7
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 16166.82
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 6676.53
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 4836.45
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 2812.54
## Drop variable: SH.DTH.MORT | Corresponding VIF = 1904.26
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 862.52
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 533.53
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 449.58
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 390.53
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 312.28
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 265.05
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 207.94
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 167.93
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 143.92
## Drop variable: SH.MMR.RISK | Corresponding VIF = 139.62
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 92.11
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 89.05
## Drop variable: SH.DYN.MORT | Corresponding VIF = 82.5
## Drop variable: SH.MMR.RISK.ZS | Corresponding VIF = 66.16
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 50.42
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 37.04
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 22.8
## Drop variable: SP.DYN.CDRT.IN | Corresponding VIF = 19.67
## Drop variable: SP.DYN.AMRT.MA | Corresponding VIF = 16.36
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 14.63
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 13.74
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 11.32
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 24
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.AMRT.FE"
## [7] "SP.DYN.TFRT.IN" "SP.DYN.TO65.MA.ZS" "SP.POP.BRTH.MF"
## [10] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.DTH.IMRT"
## [13] "SH.DYN.NMRT" "SH.IMM.IDPT" "SH.IMM.MEAS"
## [16] "SH.MED.BEDS.ZS" "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS"
## [19] "SH.STA.MMRT" "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.CH.ZS"
## [22] "SH.XPD.OOPC.CH.ZS" "SH.XPD.OOPC.PP.CD" "SH.XPD.PVTD.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | REGION: Sub-Saharan Africa
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 0.2764357 0.0399269 6.9236 5.333e-12 ***
## SH.STA.BASS.ZS -0.1100345 0.0092475 -11.8988 < 2.2e-16 ***
## SH.STA.DIAB.ZS -0.0771314 0.0152680 -5.0518 4.630e-07 ***
## SH.STA.ODFC.ZS 0.0060284 0.0096301 0.6260 0.531366
## SP.ADO.TFRT 0.0549295 0.0575285 0.9548 0.339742
## SP.DYN.AMRT.FE -0.2233797 0.0460014 -4.8559 1.258e-06 ***
## SP.DYN.TFRT.IN -1.3046834 0.0780812 -16.7093 < 2.2e-16 ***
## SP.DYN.TO65.MA.ZS 0.3468596 0.0584066 5.9387 3.192e-09 ***
## SP.POP.BRTH.MF -14.8057092 2.2717435 -6.5173 8.324e-11 ***
## SP.POP.GROW -0.1573761 0.0268073 -5.8706 4.801e-09 ***
## SH.ANM.NPRG.ZS 0.2550470 0.0924312 2.7593 0.005826 **
## SH.DTH.IMRT 0.6455285 0.0418999 15.4065 < 2.2e-16 ***
## SH.DYN.NMRT -0.7193592 0.0367670 -19.5654 < 2.2e-16 ***
## SH.IMM.IDPT 0.0168259 0.0162113 1.0379 0.299392
## SH.IMM.MEAS 0.0280118 0.0110479 2.5355 0.011278 *
## SH.MED.BEDS.ZS 0.0222831 0.0189764 1.1742 0.240385
## SH.MED.NUMW.P3 0.0700136 0.0475690 1.4718 0.141167
## SH.MED.PHYS.ZS 0.3320201 0.0746693 4.4465 9.033e-06 ***
## SH.STA.MMRT 0.0441898 0.0255800 1.7275 0.084175 .
## SH.XPD.CHEX.GD.ZS 0.0846439 0.0389848 2.1712 0.029992 *
## SH.XPD.GHED.CH.ZS 0.1657325 0.0248176 6.6780 2.860e-11 ***
## SH.XPD.OOPC.CH.ZS -0.1175765 0.0546845 -2.1501 0.031625 *
## SH.XPD.OOPC.PP.CD 0.0822856 0.0396732 2.0741 0.038154 *
## SH.XPD.PVTD.CH.ZS 0.0739926 0.0475696 1.5555 0.119939
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.907
## Adjusted Within R²: 0.9048
## F-statistic: 1257.433
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.H2O.BASW.ZS
## - SH.STA.BASS.ZS
## - SH.STA.DIAB.ZS
## - SP.DYN.AMRT.FE
## - SP.DYN.TFRT.IN
## - SP.DYN.TO65.MA.ZS
## - SP.POP.BRTH.MF
## - SP.POP.GROW
## - SH.ANM.NPRG.ZS
## - SH.DTH.IMRT
## - SH.DYN.NMRT
## - SH.IMM.MEAS
## - SH.MED.PHYS.ZS
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.GHED.CH.ZS
## - SH.XPD.OOPC.CH.ZS
## - SH.XPD.OOPC.PP.CD
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 36533.08
## Drop variable: SH.DYN.MORT | Corresponding VIF = 27951.59
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 9934.65
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 8239.74
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 3775.25
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 3742.52
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 2289.44
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 1689.02
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 1316.78
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 1277.61
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 1139.08
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 578.57
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 567.06
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 299.9
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 231.35
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 184.76
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 143.39
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 119.37
## Drop variable: SH.MMR.RISK | Corresponding VIF = 105.41
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 46.06
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 43.2
## Drop variable: SH.DTH.MORT | Corresponding VIF = 35.01
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 32.11
## Drop variable: SP.DYN.AMRT.MA | Corresponding VIF = 31.2
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 24.45
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 21.22
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 13.17
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 25
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.CDRT.IN"
## [7] "SP.DYN.TFRT.IN" "SP.DYN.TO65.MA.ZS" "SP.POP.BRTH.MF"
## [10] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.DYN.NMRT"
## [13] "SH.IMM.IDPT" "SH.IMM.MEAS" "SH.MED.BEDS.ZS"
## [16] "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS" "SH.MMR.DTHS"
## [19] "SH.MMR.RISK.ZS" "SH.STA.MMRT" "SH.XPD.CHEX.GD.ZS"
## [22] "SH.XPD.GHED.CH.ZS" "SH.XPD.OOPC.CH.ZS" "SH.XPD.OOPC.PC.CD"
## [25] "SH.XPD.PVTD.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | REGION: Europe & Central Asia
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS -0.0515405 0.1257185 -0.4100 0.6818562
## SH.STA.BASS.ZS -0.2084342 0.2067745 -1.0080 0.3135160
## SH.STA.DIAB.ZS -0.1109876 0.0766007 -1.4489 0.1474580
## SH.STA.ODFC.ZS 0.1009529 0.0579796 1.7412 0.0817452 .
## SP.ADO.TFRT 0.0718176 0.0221984 3.2353 0.0012273 **
## SP.DYN.CDRT.IN 0.7645184 0.0664353 11.5077 < 2.2e-16 ***
## SP.DYN.TFRT.IN -1.1189967 0.0695813 -16.0819 < 2.2e-16 ***
## SP.DYN.TO65.MA.ZS 1.4825847 0.1487602 9.9663 < 2.2e-16 ***
## SP.POP.BRTH.MF -0.0884586 1.5175854 -0.0583 0.9535220
## SP.POP.GROW -0.1339860 0.0316015 -4.2399 2.298e-05 ***
## SH.ANM.NPRG.ZS -0.5236876 0.1453595 -3.6027 0.0003196 ***
## SH.DYN.NMRT -0.0453437 0.0134669 -3.3671 0.0007685 ***
## SH.IMM.IDPT 0.0045486 0.0218950 0.2077 0.8354401
## SH.IMM.MEAS -0.0614012 0.0127995 -4.7972 1.681e-06 ***
## SH.MED.BEDS.ZS -0.1672849 0.0203266 -8.2298 2.671e-16 ***
## SH.MED.NUMW.P3 0.2905339 0.0234889 12.3690 < 2.2e-16 ***
## SH.MED.PHYS.ZS 0.1465363 0.0202180 7.2478 5.254e-13 ***
## SH.MMR.DTHS 0.2005436 0.0186557 10.7497 < 2.2e-16 ***
## SH.MMR.RISK.ZS -0.3581878 0.1321163 -2.7112 0.0067398 **
## SH.STA.MMRT -0.1630787 0.0280893 -5.8057 7.018e-09 ***
## SH.XPD.CHEX.GD.ZS 0.1674651 0.0513282 3.2626 0.0011151 **
## SH.XPD.GHED.CH.ZS -0.3080738 0.0543651 -5.6668 1.581e-08 ***
## SH.XPD.OOPC.CH.ZS -0.2222489 0.0803019 -2.7677 0.0056775 **
## SH.XPD.OOPC.PC.CD 0.0138053 0.0150421 0.9178 0.3588030
## SH.XPD.PVTD.CH.ZS -0.1813452 0.0842008 -2.1537 0.0313343 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.8802
## Adjusted Within R²: 0.8775
## F-statistic: 966.809
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SP.ADO.TFRT
## - SP.DYN.CDRT.IN
## - SP.DYN.TFRT.IN
## - SP.DYN.TO65.MA.ZS
## - SP.POP.GROW
## - SH.ANM.NPRG.ZS
## - SH.DYN.NMRT
## - SH.IMM.MEAS
## - SH.MED.BEDS.ZS
## - SH.MED.NUMW.P3
## - SH.MED.PHYS.ZS
## - SH.MMR.DTHS
## - SH.MMR.RISK.ZS
## - SH.STA.MMRT
## - SH.XPD.CHEX.GD.ZS
## - SH.XPD.GHED.CH.ZS
## - SH.XPD.OOPC.CH.ZS
## - SH.XPD.PVTD.CH.ZS
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SH.DYN.MORT | Corresponding VIF = 206099.6
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 140029.5
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 16116.41
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 11297.65
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 7659.46
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 7239.92
## Drop variable: SH.DTH.MORT | Corresponding VIF = 3007.18
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 1616.69
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 1314.76
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 1301.78
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 496.47
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 343.16
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 331.6
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 330.15
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 189.22
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 157.79
## Drop variable: SH.MMR.RISK | Corresponding VIF = 114.3
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 108.74
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 86.85
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 82.57
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 67.3
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 65.97
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 38.57
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 33.54
## Drop variable: SH.MMR.RISK.ZS | Corresponding VIF = 20.77
## Drop variable: SP.DYN.TFRT.IN | Corresponding VIF = 15.2
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 14.44
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 14.24
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 24
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.AMRT.MA"
## [7] "SP.DYN.CDRT.IN" "SP.POP.BRTH.MF" "SP.POP.GROW"
## [10] "SH.ANM.NPRG.ZS" "SH.DTH.NMRT" "SH.DYN.NMRT"
## [13] "SH.IMM.IDPT" "SH.IMM.MEAS" "SH.MED.BEDS.ZS"
## [16] "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS" "SH.PRG.ANEM"
## [19] "SH.STA.MMRT" "SH.XPD.CHEX.GD.ZS" "SH.XPD.GHED.CH.ZS"
## [22] "SH.XPD.OOPC.CH.ZS" "SH.XPD.OOPC.PC.CD" "SH.XPD.PVTD.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | REGION: Latin America & Caribbean
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 0.041051 0.273547 0.1501 0.8807239
## SH.STA.BASS.ZS 0.031117 0.170078 0.1830 0.8548493
## SH.STA.DIAB.ZS 1.014900 0.186900 5.4302 6.295e-08 ***
## SH.STA.ODFC.ZS -0.090220 0.023800 -3.7907 0.0001546 ***
## SP.ADO.TFRT -0.420867 0.024973 -16.8527 < 2.2e-16 ***
## SP.DYN.AMRT.MA -0.746493 0.039196 -19.0449 < 2.2e-16 ***
## SP.DYN.CDRT.IN 0.166045 0.111821 1.4849 0.1377182
## SP.POP.BRTH.MF 16.755000 2.705889 6.1920 7.151e-10 ***
## SP.POP.GROW -0.411513 0.054271 -7.5826 5.095e-14 ***
## SH.ANM.NPRG.ZS -0.153198 0.092606 -1.6543 0.0982193 .
## SH.DTH.NMRT 0.810022 0.047829 16.9358 < 2.2e-16 ***
## SH.DYN.NMRT -1.035830 0.056087 -18.4683 < 2.2e-16 ***
## SH.IMM.IDPT -0.038863 0.038452 -1.0107 0.3122852
## SH.IMM.MEAS 0.116371 0.031935 3.6440 0.0002751 ***
## SH.MED.BEDS.ZS 0.024031 0.039729 0.6049 0.5453274
## SH.MED.NUMW.P3 0.274396 0.042420 6.4685 1.234e-10 ***
## SH.MED.PHYS.ZS 0.254228 0.050747 5.0097 5.916e-07 ***
## SH.PRG.ANEM -0.113759 0.300366 -0.3787 0.7049252
## SH.STA.MMRT 0.053577 0.036298 1.4760 0.1400920
## SH.XPD.CHEX.GD.ZS 0.038069 0.096899 0.3929 0.6944505
## SH.XPD.GHED.CH.ZS 0.336615 0.095344 3.5305 0.0004239 ***
## SH.XPD.OOPC.CH.ZS -0.273570 0.103527 -2.6425 0.0082924 **
## SH.XPD.OOPC.PC.CD 0.076668 0.047687 1.6077 0.1080476
## SH.XPD.PVTD.CH.ZS 0.628029 0.155891 4.0287 5.814e-05 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9085
## Adjusted Within R²: 0.9061
## F-statistic: 850.905
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.STA.DIAB.ZS
## - SH.STA.ODFC.ZS
## - SP.ADO.TFRT
## - SP.DYN.AMRT.MA
## - SP.POP.BRTH.MF
## - SP.POP.GROW
## - SH.DTH.NMRT
## - SH.DYN.NMRT
## - SH.IMM.MEAS
## - SH.MED.NUMW.P3
## - SH.MED.PHYS.ZS
## - SH.XPD.GHED.CH.ZS
## - SH.XPD.OOPC.CH.ZS
## - SH.XPD.PVTD.CH.ZS
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 106523.1
## Drop variable: SH.DYN.MORT | Corresponding VIF = 32348.05
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 7717.98
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 5049.44
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 4576.55
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 3439.3
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 1719.16
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 1529.89
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 1462.46
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 749.03
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 698.38
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 641.69
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 331.89
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 322.44
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 315.5
## Drop variable: SH.MMR.RISK | Corresponding VIF = 258.58
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 230.31
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 123.76
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 104
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 72.98
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 46.41
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 42.89
## Drop variable: SH.XPD.OOPC.CH.ZS | Corresponding VIF = 32.12
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 30.2
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 27.88
## Drop variable: SH.STA.MMRT | Corresponding VIF = 25.78
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 18.71
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 14.19
## Drop variable: SH.DYN.NMRT | Corresponding VIF = 10.88
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 23
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SH.STA.ODFC.ZS" "SP.ADO.TFRT" "SP.DYN.AMRT.MA"
## [7] "SP.DYN.CDRT.IN" "SP.DYN.TFRT.IN" "SP.DYN.TO65.FE.ZS"
## [10] "SP.POP.BRTH.MF" "SP.POP.GROW" "SH.ANM.NPRG.ZS"
## [13] "SH.DTH.MORT" "SH.IMM.IDPT" "SH.IMM.MEAS"
## [16] "SH.MED.BEDS.ZS" "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS"
## [19] "SH.MMR.RISK.ZS" "SH.XPD.CHEX.GD.ZS" "SH.XPD.CHEX.PC.CD"
## [22] "SH.XPD.GHED.CH.ZS" "SH.XPD.PVTD.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | REGION: East Asia & Pacific
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS 0.3438276 0.0885102 3.8846 0.0001061 ***
## SH.STA.BASS.ZS 0.2899380 0.0351680 8.2444 3.097e-16 ***
## SH.STA.DIAB.ZS -0.1797734 0.0269212 -6.6778 3.192e-11 ***
## SH.STA.ODFC.ZS -0.0180737 0.0306509 -0.5897 0.5554890
## SP.ADO.TFRT -0.0849779 0.0244289 -3.4786 0.0005157 ***
## SP.DYN.AMRT.MA -0.7259195 0.0770894 -9.4166 < 2.2e-16 ***
## SP.DYN.CDRT.IN 0.5176674 0.0776880 6.6634 3.512e-11 ***
## SP.DYN.TFRT.IN -0.8043299 0.0771956 -10.4194 < 2.2e-16 ***
## SP.DYN.TO65.FE.ZS 0.3103582 0.1325511 2.3414 0.0193153 *
## SP.POP.BRTH.MF -7.9629923 1.7774872 -4.4799 7.921e-06 ***
## SP.POP.GROW 0.0237881 0.0240942 0.9873 0.3236265
## SH.ANM.NPRG.ZS -0.0786380 0.1015538 -0.7743 0.4388232
## SH.DTH.MORT -0.0037587 0.0261162 -0.1439 0.8855771
## SH.IMM.IDPT -0.0083179 0.0141849 -0.5864 0.5576815
## SH.IMM.MEAS 0.1261852 0.0181754 6.9426 5.296e-12 ***
## SH.MED.BEDS.ZS -0.3038471 0.0393772 -7.7163 1.938e-14 ***
## SH.MED.NUMW.P3 -0.1194708 0.0637345 -1.8745 0.0610169 .
## SH.MED.PHYS.ZS 0.3418801 0.0481087 7.1064 1.689e-12 ***
## SH.MMR.RISK.ZS 0.2959194 0.0522938 5.6588 1.762e-08 ***
## SH.XPD.CHEX.GD.ZS -0.0329590 0.0596893 -0.5522 0.5808933
## SH.XPD.CHEX.PC.CD 0.0736714 0.0222026 3.3181 0.0009236 ***
## SH.XPD.GHED.CH.ZS -0.0403248 0.0399755 -1.0087 0.3132313
## SH.XPD.PVTD.CH.ZS -0.2272111 0.0339018 -6.7020 2.715e-11 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.8826
## Adjusted Within R²: 0.8794
## F-statistic: 608.4633
## F p-value: 0
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.H2O.BASW.ZS
## - SH.STA.BASS.ZS
## - SH.STA.DIAB.ZS
## - SP.ADO.TFRT
## - SP.DYN.AMRT.MA
## - SP.DYN.CDRT.IN
## - SP.DYN.TFRT.IN
## - SP.DYN.TO65.FE.ZS
## - SP.POP.BRTH.MF
## - SH.IMM.MEAS
## - SH.MED.BEDS.ZS
## - SH.MED.PHYS.ZS
## - SH.MMR.RISK.ZS
## - SH.XPD.CHEX.PC.CD
## - SH.XPD.PVTD.CH.ZS
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 52
##
## Drop variable: SH.DYN.MORT | Corresponding VIF = 1008639
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 547059
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 296272.2
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 68089.94
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 23397.77
## Drop variable: SH.DTH.MORT | Corresponding VIF = 20298.17
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 19025.95
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 14551.51
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 7232.73
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 6057.86
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 5096.14
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 4397.77
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 3605.1
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 2837.18
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 2056.21
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 1468.91
## Drop variable: SH.MMR.RISK | Corresponding VIF = 1009.13
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 962.97
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 953.68
## Drop variable: SH.XPD.PVTD.CH.ZS | Corresponding VIF = 693.06
## Drop variable: SH.STA.MMRT | Corresponding VIF = 432.67
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 258.36
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 249.93
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 223.74
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 204.44
## Drop variable: SP.DYN.AMRT.MA | Corresponding VIF = 153.97
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 144.69
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 134.39
## Drop variable: SH.DYN.NMRT | Corresponding VIF = 119.43
## Drop variable: SP.DYN.CDRT.IN | Corresponding VIF = 109.71
## Drop variable: SH.XPD.CHEX.GD.ZS | Corresponding VIF = 73.21
## Drop variable: SH.MED.BEDS.ZS | Corresponding VIF = 50.19
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 43.4
## Drop variable: SP.DYN.TFRT.IN | Corresponding VIF = 34.48
## Drop variable: SH.XPD.GHED.CH.ZS | Corresponding VIF = 28.43
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 26.31
## Drop variable: SH.MED.NUMW.P3 | Corresponding VIF = 20.44
## Drop variable: SH.MMR.RISK.ZS | Corresponding VIF = 16.13
## Drop variable: SH.STA.ODFC.ZS | Corresponding VIF = 14.42
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 13
## Final reserved variable list:
## [1] "SH.H2O.BASW.ZS" "SH.STA.BASS.ZS" "SH.STA.DIAB.ZS"
## [4] "SP.ADO.TFRT" "SP.DYN.TO65.MA.ZS" "SP.POP.BRTH.MF"
## [7] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.DTH.NMRT"
## [10] "SH.IMM.IDPT" "SH.IMM.MEAS" "SH.MED.PHYS.ZS"
## [13] "SH.XPD.OOPC.CH.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | REGION: South Asia
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.H2O.BASW.ZS -1.0336434 0.5547937 -1.8631 0.063222 .
## SH.STA.BASS.ZS 0.1001023 0.0652166 1.5349 0.125640
## SP.ADO.TFRT -0.0878679 0.0169338 -5.1889 3.459e-07 ***
## SP.DYN.TO65.MA.ZS 1.0979985 0.0882193 12.4462 < 2.2e-16 ***
## SP.POP.BRTH.MF 5.9696395 2.0727910 2.8800 0.004203 **
## SP.POP.GROW -0.0064639 0.0316209 -0.2044 0.838136
## SH.ANM.NPRG.ZS -1.1681776 0.1515146 -7.7100 1.124e-13 ***
## SH.DTH.NMRT -0.2642242 0.0263828 -10.0150 < 2.2e-16 ***
## SH.IMM.IDPT -0.0034285 0.0150871 -0.2272 0.820355
## SH.IMM.MEAS 0.0787005 0.0141988 5.5428 5.596e-08 ***
## SH.MED.PHYS.ZS -0.0651712 0.0392701 -1.6596 0.097832 .
## SH.XPD.OOPC.CH.ZS 0.0576233 0.0228610 2.5206 0.012126 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9763
## Adjusted Within R²: 0.9752
## F-statistic: 1295.597
## F p-value: 0.000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000172766
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SP.ADO.TFRT
## - SP.DYN.TO65.MA.ZS
## - SP.POP.BRTH.MF
## - SH.ANM.NPRG.ZS
## - SH.DTH.NMRT
## - SH.IMM.MEAS
## - SH.XPD.OOPC.CH.ZS
## ================================================================================
##
## === Start Automatic Iterative VIF Filtering | Max allowed VIF = 10 ===
## Initial total candidate variables count: 51
##
## Drop variable: SP.DYN.LE00.IN | Corresponding VIF = 45422535
## Drop variable: SH.XPD.CHEX.PC.CD | Corresponding VIF = 9396505
## Drop variable: SH.XPD.PVTD.PC.CD | Corresponding VIF = 921574.8
## Drop variable: SH.DTH.IMRT | Corresponding VIF = 405497.4
## Drop variable: SH.XPD.CHEX.PP.CD | Corresponding VIF = 114791.2
## Drop variable: SH.STA.DIAB.ZS | Corresponding VIF = 71845.52
## Drop variable: SH.XPD.GHED.CH.ZS | Corresponding VIF = 39797.42
## Drop variable: SH.DYN.MORT | Corresponding VIF = 35999.52
## Drop variable: SP.DYN.IMRT.IN | Corresponding VIF = 27397.58
## Drop variable: SH.XPD.OOPC.PC.CD | Corresponding VIF = 22469.69
## Drop variable: SH.XPD.PVTD.PP.CD | Corresponding VIF = 21733.73
## Drop variable: SH.DYN.MORT.MA | Corresponding VIF = 17980.92
## Drop variable: SH.DTH.NMRT | Corresponding VIF = 15857.33
## Drop variable: SH.MMR.DTHS | Corresponding VIF = 11370.11
## Drop variable: SP.DYN.IMRT.FE.IN | Corresponding VIF = 10944.53
## Drop variable: SH.ANM.ALLW.ZS | Corresponding VIF = 7191.41
## Drop variable: SP.DYN.IMRT.MA.IN | Corresponding VIF = 6590.67
## Drop variable: SH.PRG.ANEM | Corresponding VIF = 5109.55
## Drop variable: SH.DTH.MORT | Corresponding VIF = 2959.71
## Drop variable: SP.DYN.TO65.FE.ZS | Corresponding VIF = 2647.55
## Drop variable: SP.DYN.LE00.MA.IN | Corresponding VIF = 2378.93
## Drop variable: SH.XPD.GHED.PP.CD | Corresponding VIF = 1826.3
## Drop variable: SH.DYN.MORT.FE | Corresponding VIF = 1645.89
## Drop variable: SH.H2O.BASW.ZS | Corresponding VIF = 1297.27
## Drop variable: SH.XPD.CHEX.GD.ZS | Corresponding VIF = 926.3
## Drop variable: SP.DYN.AMRT.FE | Corresponding VIF = 841.42
## Drop variable: SH.MMR.RISK | Corresponding VIF = 759.52
## Drop variable: SP.DYN.TO65.MA.ZS | Corresponding VIF = 491.61
## Drop variable: SH.XPD.PVTD.CH.ZS | Corresponding VIF = 271.55
## Drop variable: SP.DYN.LE00.FE.IN | Corresponding VIF = 227.89
## Drop variable: SP.DYN.AMRT.MA | Corresponding VIF = 131.88
## Drop variable: SH.XPD.OOPC.PP.CD | Corresponding VIF = 98
## Drop variable: SH.XPD.OOPC.CH.ZS | Corresponding VIF = 96.67
## Drop variable: SH.STA.MMRT | Corresponding VIF = 90.76
## Drop variable: SP.ADO.TFRT | Corresponding VIF = 52.64
## Drop variable: SH.XPD.GHED.PC.CD | Corresponding VIF = 45.18
## Drop variable: SH.DYN.NMRT | Corresponding VIF = 37.16
## Drop variable: SP.DYN.CBRT.IN | Corresponding VIF = 18.7
## Drop variable: SH.XPD.GHED.GD.ZS | Corresponding VIF = 18.09
## Drop variable: SH.MMR.RISK.ZS | Corresponding VIF = 14.85
## Drop variable: SH.MED.BEDS.ZS | Corresponding VIF = 11.77
##
## === VIF Filter Finished ===
## Remaining valid variables after VIF screening count: 10
## Final reserved variable list:
## [1] "SH.STA.BASS.ZS" "SP.DYN.CDRT.IN" "SP.DYN.TFRT.IN" "SP.POP.BRTH.MF"
## [5] "SP.POP.GROW" "SH.ANM.NPRG.ZS" "SH.IMM.IDPT" "SH.IMM.MEAS"
## [9] "SH.MED.NUMW.P3" "SH.MED.PHYS.ZS"
##
## ================================================================================
## PANEL FIXED EFFECT ESTIMATION | REGION: North America
## ================================================================================
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.STA.BASS.ZS -64.988270 8.470151 -7.6726 4.949e-12 ***
## SP.DYN.CDRT.IN -0.079227 0.165327 -0.4792 0.63266
## SP.DYN.TFRT.IN -1.321979 0.187212 -7.0614 1.159e-10 ***
## SP.POP.BRTH.MF -14.016858 8.296932 -1.6894 0.09374 .
## SP.POP.GROW 0.114461 0.084873 1.3486 0.18000
## SH.ANM.NPRG.ZS 0.573214 0.104616 5.4792 2.395e-07 ***
## SH.IMM.IDPT 0.072814 0.918052 0.0793 0.93692
## SH.IMM.MEAS 0.835217 0.187332 4.4585 1.869e-05 ***
## SH.MED.NUMW.P3 -0.885964 0.194503 -4.5550 1.267e-05 ***
## SH.MED.PHYS.ZS 0.964027 0.075903 12.7007 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
##
## ---------------- Model Overall Fit Statistics ----------------
## Within R²: 0.9788
## Adjusted Within R²: 0.9768
## F-statistic: 553.3676
## F p-value: 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000238452
## --------------------------------------------------------------
##
## >>> Significant Variables (p < 0.05):
## - SH.STA.BASS.ZS
## - SP.DYN.TFRT.IN
## - SH.ANM.NPRG.ZS
## - SH.IMM.MEAS
## - SH.MED.NUMW.P3
## - SH.MED.PHYS.ZS
## ================================================================================
# ==============================================================================
# Step 3: Select optimal model by total absolute coefficient sum
# Adjusted R² & F-stat unavailable for all regional specifications,
# choose model with largest sum of absolute coefficients as alternative screening rule
# ==============================================================================
if (length(region_fe_list) == 0) {
cat("Warning: No valid regional fixed-effect model is available.\n")
} else {
# Calculate sum of absolute regression coefficients for each group
abs_coef_sum <- sapply(region_fe_list, function(m) {
tryCatch({
sum(abs(coef(m)))
}, error = function(e) NA)
})
abs_clean <- na.omit(abs_coef_sum)
if (length(abs_clean) == 0) {
cat("Warning: Fail to extract coefficient from all regional models.\n")
} else {
best_group_name <- names(which.max(abs_clean))
if (best_group_name %in% names(region_fe_list)) {
best_reg_model <- region_fe_list[[best_group_name]]
best_sum_val <- round(max(abs_clean),4)
# Output result info
cat("\n==== Optimal Fixed Effect by Region Group (Max Absolute Coefficient Sum) ====\n")
cat("Best Performing Region: ", best_group_name, "\n")
cat("Sum of Absolute Coefficients: ", best_sum_val, "\n\n")
# Add fit stats for optimal region model
sum_best <- summary(best_reg_model)
r2b <- round(as.numeric(sum_best$r.squared["rsq"]),4)
adjr2b <- round(as.numeric(sum_best$r.squared["adjrsq"]),4)
fb <- round(as.numeric(sum_best$fstatistic$statistic),4)
fpb <- format(as.numeric(sum_best$fstatistic$p.value), digits = 6, scientific = FALSE)
cat("Optimal Model Fit Index:\n")
cat("Within R²: ",r2b," | Adj Within R²: ",adjr2b," | F: ",fb," | F-p: ",fpb,"\n\n")
# Robust standard error output
print(coeftest(best_reg_model, vcov = vcovSCC(best_reg_model)))
} else {
cat("Warning: Selected group name not found in model list.\n")
}
}
}
##
## ==== Optimal Fixed Effect by Region Group (Max Absolute Coefficient Sum) ====
## Best Performing Region: North America
## Sum of Absolute Coefficients: 83.852
##
## Optimal Model Fit Index:
## Within R²: 0.9788 | Adj Within R²: 0.9768 | F: 553.3676 | F-p: 0.0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000238452
##
##
## t test of coefficients:
##
## Estimate Std. Error t value Pr(>|t|)
## SH.STA.BASS.ZS -64.988270 8.470151 -7.6726 4.949e-12 ***
## SP.DYN.CDRT.IN -0.079227 0.165327 -0.4792 0.63266
## SP.DYN.TFRT.IN -1.321979 0.187212 -7.0614 1.159e-10 ***
## SP.POP.BRTH.MF -14.016858 8.296932 -1.6894 0.09374 .
## SP.POP.GROW 0.114461 0.084873 1.3486 0.18000
## SH.ANM.NPRG.ZS 0.573214 0.104616 5.4792 2.395e-07 ***
## SH.IMM.IDPT 0.072814 0.918052 0.0793 0.93692
## SH.IMM.MEAS 0.835217 0.187332 4.4585 1.869e-05 ***
## SH.MED.NUMW.P3 -0.885964 0.194503 -4.5550 1.267e-05 ***
## SH.MED.PHYS.ZS 0.964027 0.075903 12.7007 < 2.2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Average birth rate and death rate per country are aggregated as clustering input features; K-means unsupervised clustering splits all nations into two disjoint high/low aging categories to finish the required classification research objective.
Standardized cluster centroids: High-aging cluster has negative standardized centroids (\(\text{avg\_tfr}=-0.923\), \(\text{avg\_mort}=-0.889\)), representing low fertility and low mortality; low-aging cluster shows positive centroid values (\(\text{avg\_tfr}=0.786\), \(\text{avg\_mort}=0.758\)), corresponding to high fertility and high mortality.
Original raw-group average: In original unstandardized metrics, high-aging countries record average fertility of 1.777 and average mortality of 5.476; low-aging countries have average fertility at 1.189 and mortality at 4.580.
# Calculate country average fertility & average mortality per nation
cluster_df <- df_wide_final %>%
group_by(country_code) %>%
summarise(
avg_tfr = mean(SP.DYN.TFRT.IN, na.rm = TRUE),
avg_mort = mean(SP.DYN.AMRT.FE, na.rm = TRUE),
.groups = "drop"
)
# Standardize indicators to eliminate dimension interference
scaled_data <- scale(cluster_df[,c("avg_tfr","avg_mort")])
# K-Means clustering, fixed k=2 (High / Low aging)
set.seed(123) # fixed random seed for reproducibility
kmod <- kmeans(scaled_data, centers = 2, nstart = 25)
# Add cluster label: Low aging (high fert+high mort), High aging (low fert+low mort)
cluster_df$aging_group <- ifelse(kmod$cluster == 1, "Low_Aging", "High_Aging")
# Merge grouping label back into original full panel dataset
df_wide_final <- left_join(df_wide_final, cluster_df[,c("country_code","aging_group")], by = "country_code")
# Country count per cluster
cnt_tb <- table(kmod$cluster) %>% enframe(name = "Cluster_ID", value = "Country_Number")
# Standardized cluster centroid
cent_std <- as_tibble(kmod$centers, rownames = "Cluster_ID")
# Raw original mean by group
raw_mean <- aggregate(cluster_df[,c("avg_tfr","avg_mort")],
by=list(Group=cluster_df$aging_group), FUN=mean)
# Print neatly formatted tables
cat("===== Cluster Sample Size =====\n")
## ===== Cluster Sample Size =====
print(cnt_tb, row.names = F)
## # A tibble: 2 × 2
## Cluster_ID Country_Number
## <chr> <table[1d]>
## 1 1 110
## 2 2 129
cat("\n===== Standardized Cluster Centroid =====\n")
##
## ===== Standardized Cluster Centroid =====
print(cent_std, row.names = F)
## # A tibble: 2 × 3
## Cluster_ID avg_tfr avg_mort
## <chr> <dbl> <dbl>
## 1 1 -0.922 -0.889
## 2 2 0.786 0.758
cat("\n===== Raw Indicator Average by Aging Group =====\n")
##
## ===== Raw Indicator Average by Aging Group =====
print(raw_mean, row.names = F)
## Group avg_tfr avg_mort
## High_Aging 1.776704 5.476458
## Low_Aging 1.188919 4.579725
Comprehensive validation includes subgroup regression comparison, random forest feature importance verification for regression robustness, elbow/silhouette clustering quality metrics, manual benchmark grouping and random forest classification accuracy test to confirm reliability of both regression and classification outputs.
Note: Only two commonly statistically significant explanatory variables are retained for cross-group heterogeneous comparison: total fertility rate (SP.DYN.TFRT.IN) and physician density (SH.MED.PHYS.ZS).
Income-group results
Regional results
Summary
# ==============================================================================
# Build Coefficient Comparison Table Using Pre-estimated Subgroup FE Models
# Construct heterogeneous fixed-effect comparison table directly from pre-saved model list
# Output: formatted dataframe containing core coefficients(with significance stars), within-R² and F-stat p-value
# ==============================================================================
build_compare_table <- function(model_list) {
# Return empty skeleton table if no valid subgroup models exist
if(length(model_list)==0){
var_order <- c("SP.DYN.TFRT.IN", "SH.MED.PHYS.ZS", "R-squared (Within)", "P-value (F-Stat)")
return(data.frame(row.names = var_order))
}
# Remove "Unknown" category, exclude invalid undefined grouping column
valid_mod_list <- model_list[!names(model_list)=="Unknown"]
if(length(valid_mod_list)==0){
var_order <- c("SP.DYN.TFRT.IN", "SH.MED.PHYS.ZS", "R-squared (Within)", "P-value (F-Stat)")
return(data.frame(row.names = var_order))
}
# Extract raw subgroup names and clean redundant linebreak / tab / extra whitespace
raw_grp <- names(valid_mod_list)
clean_name <- purrr::map_chr(raw_grp, ~stringr::str_squish(stringr::str_replace_all(.x,c("\\n"="","\\r"="","\\t"=""))))
# Define fixed row labels of output table
var_order <- c("SP.DYN.TFRT.IN", "SH.MED.PHYS.ZS", "R-squared (Within)", "P-value (F-Stat)")
# Initialize blank character table filled with empty string
out_tb <- matrix("", nrow=length(var_order), ncol=length(clean_name), dimnames = list(var_order, clean_name)) %>% as.data.frame()
# Inner helper: generate statistical significance stars based on p-value thresholds
get_stars <- function(p){
if(p<0.01) return("***")
if(p<0.05) return("**")
if(p<0.1) return("*")
return("")
}
# Loop over precomputed models to fill table content cell by cell
for(i in seq_along(raw_grp)){
grp_raw <- raw_grp[i]
grp_cln <- clean_name[i]
mod <- valid_mod_list[[grp_raw]]
sum_mod <- summary(mod)
if(!is.null(sum_mod$r.squared) && "rsq" %in% names(sum_mod$r.squared)){
r2 <- round(as.numeric(sum_mod$r.squared["rsq"]),4)
out_tb["R-squared (Within)", grp_cln] <- as.character(r2)
}else{
out_tb["R-squared (Within)", grp_cln] <- ""
}
if(!is.null(sum_mod$fstatistic) && inherits(sum_mod$fstatistic,"htest")){
fp <- round(as.numeric(sum_mod$fstatistic$p.value),4)
out_tb["P-value (F-Stat)", grp_cln] <- as.character(fp)
}else{
out_tb["P-value (F-Stat)", grp_cln] <- ""
}
# Extract coefficient matrix (remove de-meaned intercept row)
coef_mat <- sum_mod$coefficients[-1,,drop=F]
all_v <- rownames(coef_mat)
all_b <- coef_mat[,1]
all_p <- coef_mat[,4]
# Fill target two core explanatory variables with coefficient + significance mark
for(v in c("SP.DYN.TFRT.IN","SH.MED.PHYS.ZS")){
if(v %in% all_v){
idx <- which(all_v==v)
s <- get_stars(all_p[idx])
out_tb[v,grp_cln] <- paste0(round(all_b[idx],3),s)
}else{
out_tb[v,grp_cln] <- ""
}
}
}
return(out_tb)
}
# ============================== Main Execution ==============================
# Directly reuse pre-estimated subgroup fixed-effect model list (no refitting, no repeated VIF screening)
income_comparison_df <- build_compare_table(income_fe_list)
region_comparison_df <- build_compare_table(region_fe_list)
# Print formatted cross-group comparison tables only
cat("======================= Table 1: Fixed Effects Across Income Groups =======================\n")
## ======================= Table 1: Fixed Effects Across Income Groups =======================
if(ncol(income_comparison_df)==0){
cat("Warning: No valid regression result for income subgroups (all sample size <10)\n")
}else{
print(income_comparison_df)
}
## Low income Lower middle income Upper middle income
## SP.DYN.TFRT.IN -0.883*** -0.955*** -0.936***
## SH.MED.PHYS.ZS 0.685*** 0.135*** 0.163***
## R-squared (Within) 0.9046 0.9095 0.9085
## P-value (F-Stat) 0 0 0
## High income
## SP.DYN.TFRT.IN -1.218***
## SH.MED.PHYS.ZS 0.089***
## R-squared (Within) 0.8612
## P-value (F-Stat) 0
cat("\n============================================= Table 2: Fixed Effects Across Regions ==========================================\n")
##
## ============================================= Table 2: Fixed Effects Across Regions ==========================================
if(ncol(region_comparison_df)==0){
cat("Warning: No valid regression result for regional subgroups (all sample size <10)\n")
}else{
print(region_comparison_df)
}
## Middle East & North Africa Sub-Saharan Africa
## SP.DYN.TFRT.IN -0.98*** -1.305***
## SH.MED.PHYS.ZS 0.401*** 0.332***
## R-squared (Within) 0.9158 0.907
## P-value (F-Stat) 0 0
## Europe & Central Asia Latin America & Caribbean
## SP.DYN.TFRT.IN -1.119***
## SH.MED.PHYS.ZS 0.147*** 0.254***
## R-squared (Within) 0.8802 0.9085
## P-value (F-Stat) 0 0
## East Asia & Pacific South Asia North America
## SP.DYN.TFRT.IN -0.804*** -1.322***
## SH.MED.PHYS.ZS 0.342*** -0.065 0.964***
## R-squared (Within) 0.8826 0.9763 0.9788
## P-value (F-Stat) 0 0 0
The coefficient plot reveals key drivers of population aging in the final fixed effects specification:
Positive and Significant Drivers
SP.DYN.TO65.FE.ZS (Female life expectancy at age 65) has
the largest positive coefficient, indicating that longer life expectancy
among the elderly strongly accelerates population aging. Other positive
factors include access to improved water sources
(SH.H2O.BASW.ZS), private health spending
(SH.XPD.PVTD.CH.ZS), and crude death rate
(SP.DYN.CDRT.IN). These results reflect how better public
health infrastructure and higher mortality at younger ages both
contribute to a larger share of the elderly population.
Negative and Significant Drivers
SP.DYN.TFRT.IN (Total fertility rate) shows the
strongest negative coefficient, confirming that higher fertility rates
slow population aging by increasing the share of younger cohorts.
Additional negative factors include female adult mortality
(SP.DYN.AMRT.MA), out-of-pocket health spending
(SH.XPD.OOPC.CH.ZS), and population growth
(SP.POP.GROW). These variables reflect demographic and
economic conditions that counteract aging pressures.
Key Takeaway
The model confirms that population aging is driven by a combination of: - Life expectancy improvements and public health progress (positive effects) - Fertility and population growth dynamics (negative effects)
These results align with demographic theory, showing that both rising longevity and falling fertility are central to the aging process.
# ==============================================================================
# Step 1: Prepare plotting dataset, add significance stars to variable names
# Use screened significant sample (p < 0.05) from regression output
# Arrange data by coefficient ascending, set factor levels to control y-axis display order
# ==============================================================================
coef_plot_df <- sig_df %>%
mutate(Variable = paste0(Variable, " ", substr(Significance,1,3))) %>%
# Sort data by coefficient value from small to large
arrange(Coefficient)
# Fix y-axis order: smallest coefficient at top, largest coefficient at bottom
coef_plot_df$Variable <- factor(coef_plot_df$Variable, levels = coef_plot_df$Variable)
# ==============================================================================
# Step 2: Generate horizontal coefficient bar chart with gradient color
# Color gradient maps to coefficient value, no extra legend displayed
# ==============================================================================
ggplot(coef_plot_df, aes(x = Coefficient, y = Variable, fill = Coefficient)) +
geom_col() +
geom_vline(xintercept = 0, linetype = "dashed", alpha = 0.7) +
labs(
title = "Coefficients of Significant Variables in Final Fixed Effects Model",
x = "Coefficient Value",
y = "Variable"
) +
scale_fill_viridis_c(option = "viridis") +
theme_bw() +
theme(
plot.title = element_text(size = 13, hjust = 0.5),
axis.title = element_text(size = 12),
legend.position = "none"
)
The frequency bar chart quantifies how many income subgroups each explanatory variable shows statistical significance, demonstrating universal and group-specific aging determinants across development tiers.
1. Universal, cross-group significant drivers
SP.DYN.TFRT.IN (total fertility rate) and SH.MED.PHYS.ZS (physician density) achieve statistical significance across all five income subgroups, standing as the most robust consistent predictors of population aging. A batch of demographic and healthcare indicators obtains significance within four or three income groups, showing relatively stable predictive effects. The remaining majority of variables are only significant in merely one or two income brackets, with limited general applicability.
2. Heterogeneous unique drivers
for separate income cohorts Several exclusive significant predictors exist for specific income categories, reflecting differentiated aging formation paths:
Low-income nations: Two unique significant variables related to government and private medical spending.
Lower-middle-income nations: Distinct determinants including mortality rate, out-of-pocket medical expense and female senior life expectancy.
High-income nations: Three exclusive significant indicators covering household healthcare expenditure and male life expectancy at age 65.
Unknown and upper-middle-income subgroups contain no group-only significant variables.
Overall Conclusion
Empirical results verify that fertility and physician resource act as steady core drivers of aging for all economies, whereas plenty of influencing factors only take effect under specific income levels.
# ==============================================================================
# Step: Extract significant variables from existing fitted income models to build income_sig_list, NO re-estimation
# ==============================================================================
income_sig_list <- list()
gr_names <- names(income_fe_list)
for(g in gr_names){
robust_res <- coeftest(income_fe_list[[g]], vcov = vcovSCC(income_fe_list[[g]]))
p_vals <- robust_res[,4]
sig_vars <- names(p_vals[p_vals < 0.05])
income_sig_list[[g]] <- sig_vars
}
# ==============================================================================
# Calculate occurrence frequency of each significant variable across income groups & plot bar chart
# Use pre-saved income_sig_list from previous subgroup regression directly, no repeated modeling
# ==============================================================================
all_sig_vars <- unlist(income_sig_list)
if (length(all_sig_vars) > 0) {
freq_tb <- as.data.frame(table(all_sig_vars))
colnames(freq_tb) <- c("Variable", "Sig_Group_Count")
freq_tb <- freq_tb %>%
arrange(desc(Sig_Group_Count)) %>%
mutate(Variable = factor(Variable, levels = Variable))
ggplot(freq_tb, aes(x = Variable, y = Sig_Group_Count, fill = Sig_Group_Count)) +
geom_col() +
scale_fill_viridis_c(option = "viridis") +
labs(
title = "Frequency of Significant Variables Across Income Groups",
x = "Indicator Variable",
y = "Number of Income Groups Where Significant"
) +
theme_bw() +
theme(
plot.title = element_text(size = 14, hjust = 0.5),
axis.title = element_text(size = 12),
axis.text.x = element_text(angle = 90, vjust = 0.5),
legend.position = "none"
)
}
# ==============================================================================
# Extract exclusive unique significant variables per single income group
# Unique vars = variables significant only in current group, never significant in other income groups
# ==============================================================================
unique_sig_list <- list()
group_names <- names(income_sig_list)
for (g in group_names) {
current_vars <- unique(income_sig_list[[g]])
other_all_vars <- c()
# Combine all significant variables from remaining other income groups
for (other_g in group_names) {
if (other_g != g) other_all_vars <- c(other_all_vars, income_sig_list[[other_g]])
}
other_all_vars <- unique(other_all_vars)
# Set difference to get exclusive variables
exclusive_var <- setdiff(current_vars, other_all_vars)
unique_sig_list[[g]] <- exclusive_var
}
# ==============================================================================
# Print unique significant variables for each income group on console
# ==============================================================================
cat("\n===== Unique Significant Variables by Income Group =====\n\n")
##
## ===== Unique Significant Variables by Income Group =====
for (g in names(unique_sig_list)) {
cat(paste0("Income Group: ", g, "\n"))
tmp_var <- unique_sig_list[[g]]
if (length(tmp_var) > 0) {
for (v in sort(tmp_var)) cat(paste0(" - ", v, "\n"))
} else {
cat(" - (No unique significant variables found)\n")
}
cat("\n")
}
## Income Group: Unknown
## - (No unique significant variables found)
##
## Income Group: Low income
## - SH.XPD.GHED.GD.ZS
## - SH.XPD.PVTD.PC.CD
##
## Income Group: Lower middle income
## - SH.DTH.MORT
## - SH.XPD.OOPC.PC.CD
## - SP.DYN.TO65.FE.ZS
##
## Income Group: Upper middle income
## - (No unique significant variables found)
##
## Income Group: High income
## - SH.XPD.OOPC.CH.ZS
## - SH.XPD.OOPC.PP.CD
## - SP.DYN.TO65.MA.ZS
The bar chart counts how many geographic regions each predictor achieves statistical significance, identifying universally valid and region-specific aging determinants across global areas.
1. Consistent Drivers Across Multiple Regions
SH.IMM.MEAS (immunization rate) is statistically
significant across all eight regional subgroups, representing the most
globally stable aging factor. SH.MED.PHYS.ZS (physician
density) and birth-related indicator SP.POP.BRTH.MF are
significant in six regions, showing robust cross-region explanatory
power. Many demographic and health indicators gain significance in four
or five regions, while the remaining variables only show significance
within one to three geographic zones.
2. Heterogeneous, Region-Specific Drivers
Distinct exclusive significant predictors exist for five specific geographic regions, while no unique influential factors are found for Unknown, South Asia and North America:
Middle East & North Africa: Mortality and immunization-related unique variables.
Sub-Saharan Africa: Infant mortality and female adult mortality indicators.
Europe & Central Asia: Maternal mortality-linked exclusive factors.
Latin America & Caribbean: Sanitation-access-specific variable.
East Asia & Pacific: Per capita healthcare spending indicator.
Overall Conclusion
Core medical and demographic indicators maintain stable explanatory power across most global regions, yet multiple determinants only exert significant effects under certain regional environments. Such cross-region divergence supports the adoption of the globally consistent core variables for subsequent regional heterogeneous fixed-effect regression.
# Build region_sig_list from fitted models inside region_fe_list
region_sig_list <- list()
group_names <- names(region_fe_list)
for(g in group_names){
mod <- region_fe_list[[g]]
rt <- coeftest(mod, vcov = vcovSCC(mod))
pvec <- rt[,4]
sig_vars <- names(pvec[pvec < 0.05])
region_sig_list[[g]] <- sig_vars
}
# ==============================================================================
# Count significant variable frequency across regions & plot bar chart
# Use existing stored region_sig_list directly
# ==============================================================================
all_sig_vars_region <- unlist(region_sig_list)
if(length(all_sig_vars_region) > 0){
freq_tb <- as.data.frame(table(all_sig_vars_region))
colnames(freq_tb) <- c("Variable","Sig_Region_Count")
freq_tb <- freq_tb %>%
arrange(desc(Sig_Region_Count)) %>%
mutate(Variable = factor(Variable, levels = Variable))
ggplot(freq_tb, aes(x = Variable, y = Sig_Region_Count, fill = Sig_Region_Count)) +
geom_col() +
scale_fill_viridis_c(option = "viridis") +
labs(
title = "Frequency of Significant Variables Across Region Groups",
x = "Indicator Variable",
y = "Number of Regions Where Significant"
) +
theme_bw() +
theme(
plot.title = element_text(size = 14, hjust = 0.5),
axis.title = element_text(size = 12),
axis.text.x = element_text(angle = 90, vjust = 0.5),
legend.position = "none"
)
}
# ==============================================================================
# Extract exclusive unique significant variables for each single region
# ==============================================================================
unique_region_sig <- list()
region_names <- names(region_sig_list)
for(g in region_names){
current <- unique(region_sig_list[[g]])
other_all <- c()
for(other_g in region_names){
if(other_g != g) other_all <- c(other_all, region_sig_list[[other_g]])
}
other_all <- unique(other_all)
unique_region_sig[[g]] <- setdiff(current, other_all)
}
# ==============================================================================
# Console output unique significant variables per region
# ==============================================================================
cat("\n===== Unique Significant Variables by Region =====\n\n")
##
## ===== Unique Significant Variables by Region =====
for(g in names(unique_region_sig)){
cat(paste0("Region: ",g,"\n"))
tmp <- unique_region_sig[[g]]
if(length(tmp)>0){
for(v in sort(tmp)) cat(paste0(" - ",v,"\n"))
}else{
cat(" - (No unique significant variables found)\n")
}
cat("\n")
}
## Region: Unknown
## - (No unique significant variables found)
##
## Region: Middle East & North Africa
## - SH.DTH.MORT
## - SH.IMM.IDPT
##
## Region: Sub-Saharan Africa
## - SH.DTH.IMRT
## - SP.DYN.AMRT.FE
##
## Region: Europe & Central Asia
## - SH.MMR.DTHS
## - SH.STA.MMRT
##
## Region: Latin America & Caribbean
## - SH.STA.ODFC.ZS
##
## Region: East Asia & Pacific
## - SH.XPD.CHEX.PC.CD
##
## Region: South Asia
## - (No unique significant variables found)
##
## Region: North America
## - (No unique significant variables found)
To verify the robustness of baseline fixed-effects regression outcomes, Random Forest is adopted for data-driven feature importance ranking as an alternative robustness test, realizing cross-verification between traditional panel econometrics and machine learning identification.
The feature importance ranking is highly consistent with prior fixed-effect regression conclusions, verifying the reliability of core empirical findings:
Mortality metrics dominate the top-three positions of feature
importance, with non-communicable disease mortality
(SH.DTH.NMRT) taking the highest normalized importance,
matching the significantly negative coefficient of mortality indicators
in fixed-effects estimation. Infant mortality and general mortality also
show prominent predictive weight, consistent with their statistically
significant impacts on population aging in benchmark
regression.
Maternal mortality indicator (SH.MMR.DTHS) ranks
fourth with remarkable explanatory power, aligning with its significant
regression coefficient in FE models.
Core demographic factors including total fertility rate and crude birth rate, as well as physician density and healthcare spending indicators, appear in the Top10 importance list, consistent with their significant regression effects in fixed-effects results.
The strong consistency of core influential variables across two distinct estimation approaches confirms that the identified aging determinants are robust and not sensitive to empirical specification choices; falling mortality, maternal health improvement and demographic shifts constitute the dominant drivers of cross-country population aging.
# Set a fixed random seed for full reproducibility of the Random Forest model
set.seed(42)
# Train the Random Forest regression model
# ntree=200: total number of decision trees in the ensemble
# maxdepth=6: maximum depth of each individual tree to prevent overfitting
# mtry: number of predictors randomly sampled at each split (standard: sqrt(total predictors))
# nodesize=10: minimum number of observations required at a leaf node
rf_mod <- randomForest(
x = X_imputed,
y = y_raw,
ntree = 200,
maxdepth = 6,
mtry = sqrt(ncol(X_imputed)),
nodesize = 10
)
# Extract variable importance scores and normalize them to proportions (sum = 1)
imp_df <- tibble(
Indicator = colnames(X_imputed), # Names of all predictors
Importance = importance(rf_mod)[, 1] # Raw importance from the model
) %>%
mutate(Importance = Importance / sum(Importance)) %>% # Normalize to relative importance
arrange(desc(Importance)) # Sort variables by importance (descending)
# Plot the top 10 most important features with labels
ggplot(imp_df[1:10,], aes(x = Importance, y = reorder(Indicator, Importance))) +
geom_col(fill = "#3498DB") +
# Add numeric labels next to each bar for direct readability
geom_text(aes(label = round(Importance, 3)),
hjust = -0.1, # Position labels just outside the bars
color = "black",
size = 3.5) +
labs(
title = "Top 10 Feature Importance | Random Forest",
x = "Normalized Importance",
y = "Indicator"
) +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5)) +
# Expand the right margin to ensure labels are not cut off
scale_x_continuous(expand = expansion(mult = c(0, 0.1)))
Elbow method and silhouette coefficient evaluate optimal cluster number and internal clustering quality; median birth rate manual grouping and random forest out-of-sample classification accuracy test externally validate clustering reliability for country aging grouping.
The elbow plot shows the total within-cluster sum of squares drops sharply at K=2 and then decreases moderately with rising cluster count, which identifies K=2 as the optimal clustering number for subsequent country aging segmentation.
library(factoextra)
scale_feature <- cluster_df %>%
select(avg_tfr, avg_mort) %>%
scale() %>% as.data.frame()
set.seed(42)
fviz_nbclust(scale_feature, kmeans, method = "wss") +
labs(title = "Elbow Method for Optimal K Selection",
x = "Number of Clusters K", y = "Total Within Sum of Squares") +
theme_bw() +
theme(plot.title = element_text(hjust = 0.5))
kmeans_model <- kmeans(scale_feature, centers = 2, nstart = 25)
sil_obj <- silhouette(kmeans_model$cluster, dist(scale_feature))
avg_sil_score <- round(mean(sil_obj[,3]),3)
cat("Average Silhouette Coefficient of K-Means clustering: ", avg_sil_score, "\n")
## Average Silhouette Coefficient of K-Means clustering: 0.568
fviz_silhouette(sil_obj, print.summary = FALSE) +
labs(title = "Silhouette Plot for Two Aging Country Groups") + theme_bw()+
theme(plot.title = element_text(hjust = 0.5),
axis.text.x = element_blank())
Benchmarked against empirical grouping split by median crude birth rate(CBR), the model obtains an Adjusted Rand Index of 0.51, indicating moderate consistency between algorithmic clustering and demographic theoretical classification. Descriptive statistics align perfectly with demographic transition theory: the High-Aging group includes 151 economies with lower average CBR(3.02) and CDR(2.21), while the Low-Aging group covers 88 nations featuring higher birth and death rates (CBR=3.68, CDR=2.62).
# Calculate country-level average CBR & CDR first
cluster_data <- df_wide_final %>%
group_by(country_code) %>%
summarise(
avg_cbr = mean(SP.DYN.CBRT.IN, na.rm = TRUE),
avg_cdr = mean(SP.DYN.CDRT.IN, na.rm = TRUE),
.groups = "drop"
)
# Standardize for kmeans (consistent with previous scale_feature)
scale_feature <- cluster_data %>% select(avg_cbr, avg_cdr) %>% scale() %>% as.data.frame()
set.seed(42)
kmeans_model <- kmeans(scale_feature, centers = 2, nstart = 25)
# Attach cluster label into dataset
cluster_data$ml_aging_group <- ifelse(kmeans_model$cluster == 1, "Low_Aging_ML", "High_Aging_ML")
# Generate group descriptive table
group_desc <- cluster_data %>%
group_by(ml_aging_group) %>%
summarise(
Country_Number = n(),
Mean_CBR = round(mean(avg_cbr), 3),
Mean_CDR = round(mean(avg_cdr), 3),
.groups = "drop"
)
# Output result
cat("\n=== Group descriptive statistics after K-Means clustering ===\n")
##
## === Group descriptive statistics after K-Means clustering ===
print(group_desc)
## # A tibble: 2 × 4
## ml_aging_group Country_Number Mean_CBR Mean_CDR
## <chr> <int> <dbl> <dbl>
## 1 High_Aging_ML 88 3.68 2.62
## 2 Low_Aging_ML 151 3.02 2.21
The out-of-sample prediction accuracy of random forest classification hits 100% when using CBR and CDR to predict clustering labels. It confirms crude birth and death rates can fully distinguish the two aging clusters, further verifying the rationality and robustness of the K-Means grouping result.
# Fix random seed for fully reproducible train/test split result
set.seed(42)
# Replace undefined eval_full with cluster_data generated from previous K-Means clustering
eval_full <- cluster_data
# Convert grouping label to factor for random forest classification task
eval_full$ml_aging_group <- factor(eval_full$ml_aging_group)
# Split dataset: 70% training set, 30% hold-out test set
train_index <- sample(1:nrow(eval_full), size = floor(0.7 * nrow(eval_full)))
train_set <- eval_full[train_index, ]
test_set <- eval_full[-train_index, ]
# Train Random Forest binary classification model with CBR & CDR as predictors
# ntree=150: set total number of decision trees within random forest ensemble
rf_class_model <- randomForest(
formula = ml_aging_group ~ avg_cbr + avg_cdr,
data = train_set,
ntree = 150
)
# Generate out-of-sample prediction on unseen test dataset
pred_outcome <- predict(rf_class_model, newdata = test_set)
# Calculate overall out-of-sample classification accuracy in percentage
class_accuracy <- round(mean(pred_outcome == test_set$ml_aging_group) * 100, 2)
cat("\nRandom Forest Out-of-Sample Classification Accuracy (%): ", class_accuracy, "%\n")
##
## Random Forest Out-of-Sample Classification Accuracy (%): 100 %
# Output country codes grouped by two K-means clustering categories
cat("\n=== List of High-Aging Countries (K-Means Cluster Label) ===\n")
##
## === List of High-Aging Countries (K-Means Cluster Label) ===
print(eval_full %>% filter(ml_aging_group == "High_Aging_ML") %>% pull(country_code))
## [1] "AFE" "AFG" "AFW" "AGO" "ARB" "BDI" "BEN" "BFA" "BGD" "BOL" "BTN" "BWA"
## [13] "CAF" "CIV" "CMR" "COD" "COG" "COM" "DJI" "EAR" "EGY" "ERI" "ETH" "FCS"
## [25] "GAB" "GHA" "GIN" "GMB" "GNB" "GNQ" "GTM" "HPC" "HTI" "IDA" "IDB" "IDX"
## [37] "IND" "KEN" "KHM" "KIR" "LAO" "LBR" "LDC" "LIC" "LMC" "LSO" "MDG" "MEA"
## [49] "MHL" "MLI" "MMR" "MNA" "MNG" "MOZ" "MRT" "MWI" "NAM" "NER" "NGA" "NPL"
## [61] "PAK" "PNG" "PRE" "RWA" "SAS" "SDN" "SEN" "SLE" "SOM" "SSA" "SSD" "SSF"
## [73] "STP" "SWZ" "TCD" "TGO" "TJK" "TLS" "TMN" "TSA" "TSS" "TUV" "TZA" "UGA"
## [85] "YEM" "ZAF" "ZMB" "ZWE"
cat("\n=== List of Low-Aging Countries (K-Means Cluster Label) ===\n")
##
## === List of Low-Aging Countries (K-Means Cluster Label) ===
print(eval_full %>% filter(ml_aging_group == "Low_Aging_ML") %>% pull(country_code))
## [1] "ALB" "AND" "ARE" "ARG" "ARM" "ATG" "AUS" "AUT" "AZE" "BEL" "BGR" "BHR"
## [13] "BHS" "BIH" "BLR" "BLZ" "BRA" "BRB" "BRN" "CAN" "CEB" "CHE" "CHL" "CHN"
## [25] "COL" "CPV" "CRI" "CSS" "CUB" "CYP" "CZE" "DEU" "DMA" "DNK" "DOM" "DZA"
## [37] "EAP" "EAS" "ECA" "ECS" "ECU" "EMU" "ESP" "EST" "EUU" "FIN" "FJI" "FRA"
## [49] "FSM" "GBR" "GEO" "GRC" "GRD" "GUY" "HIC" "HND" "HRV" "HUN" "IBD" "IBT"
## [61] "IDN" "IRL" "IRN" "IRQ" "ISL" "ISR" "ITA" "JAM" "JOR" "JPN" "KAZ" "KGZ"
## [73] "KNA" "KOR" "KWT" "LAC" "LBN" "LBY" "LCA" "LCN" "LKA" "LMY" "LTE" "LTU"
## [85] "LUX" "LVA" "MAR" "MCO" "MDA" "MDV" "MEX" "MIC" "MKD" "MLT" "MNE" "MUS"
## [97] "MYS" "NAC" "NLD" "NOR" "NRU" "NZL" "OED" "OMN" "OSS" "PAN" "PER" "PHL"
## [109] "PLW" "POL" "PRT" "PRY" "PSE" "PSS" "PST" "QAT" "ROU" "RUS" "SAU" "SGP"
## [121] "SLB" "SLV" "SMR" "SRB" "SST" "SUR" "SVK" "SVN" "SWE" "SYC" "SYR" "TEA"
## [133] "TEC" "THA" "TKM" "TLA" "TON" "TTO" "TUN" "TUR" "UKR" "UMC" "URY" "USA"
## [145] "UZB" "VCT" "VEN" "VNM" "VUT" "WLD" "WSM"
Population Aging Analysis Dashboard
This interactive dashboard is developed in R Shiny for the WQD7004 Programming for Data Science project. It provides empirical analysis of global population aging, identifies key socioeconomic and demographic drivers via fixed-effect panel regression, and enables scenario simulation and aging level classification based on historical country-level data from the World Bank.
Key Features
Online Access
This dashboard is deployed on Posit ShinyApps.io, RStudio’s official Shiny hosting platform. No local R environment installation is required. You can directly access the full interactive functions via the link below:
👉Shiny Dashboard Link: https://wqd7004occ2g07.shinyapps.io/aging-da
Project Repository
All source codes, raw dataset and configuration files are fully uploaded to GitHub repository.
Link: https://github.com/huangzhh2001-ux/WQD7004-Deployment
Project File Overview (Stored in GitHub)
# ====================== Variable Full Name Mapping Dictionary ======================
# Indicator label
var_name_map <- list(
"SH.H2O.BASW.ZS" = "Access to Basic Drinking Water (% Population)",
"SH.STA.BASS.ZS" = "Basic Sanitation Service Access Rate",
"SH.STA.DIAB.ZS" = "Diabetes Prevalence among Adults",
"SH.STA.ODFC.ZS" = "Open Defecation Population Share",
"SP.ADO.TFRT" = "Adolescent Total Fertility Rate",
"SP.DYN.CDRT.IN" = "Crude Death Rate (per 1000)",
"SP.DYN.TFRT.IN" = "Total Fertility Rate",
"SP.DYN.TO65.FE.ZS" = "Female 65+ Survival Rate",
"SP.POP.BRTH.MF" = "Male/Female Birth Ratio",
"SP.POP.GROW" = "Annual Population Growth Rate",
"SH.ANM.NPRG.ZS" = "Prevalence of Anaemia in Pregnant Women",
"SH.DYN.NMRT" = "Neonatal Mortality Rate",
"SH.IMM.IDPT" = "DPT Immunization Coverage",
"SH.IMM.MEAS" = "Measles Immunization Coverage",
"SH.MED.BEDS.ZS" = "Hospital Beds per 1k Population",
"SH.MED.NUMW.P3" = "Nurses & Midwives Density",
"SH.MED.PHYS.ZS" = "Physicians per 1000 Population",
"SH.MMR.DTHS" = "Total Number of Maternal Deaths",
"SH.MMR.RISK.ZS" = "Lifetime Maternal Mortality Risk (%)",
"SH.PRG.ANEM" = "Pregnant Women Anaemia Prevalence Rate",
"SH.XPD.CHEX.GD.ZS" = "Current Health Expenditure (% GDP)",
"SH.XPD.GHED.CH.ZS" = "Government Health Spending per Capita",
"SH.XPD.GHED.GD.ZS" = "Government Health Expenditure (% GDP)",
"SH.XPD.OOPC.CH.ZS" = "Out-of-Pocket Health Cost per Capita",
"SH.XPD.OOPC.PC.CD" = "OOP Health Spending (% Private Income)",
"SP.DYN.AMRT.MA" = "Male Adult Mortality Rate",
"SH.XPD.PVTD.CH.ZS" = "Private Prepaid Health Spending per Capita"
)
# Assign core objects for Shiny storage
fe_final <- fe_model
keep_vars <- selected_vars
# Save all required objects into rds (overwrite old file)
save(fe_final, keep_vars, X_opt, rf_class_model, df_wide_final, var_name_map, file = "model_save.rds")
# Write the relevant code for app.R
# Load all packages
# library(shiny)
# library(bslib)
# library(fontawesome)
# library(plm)
# library(lmtest)
# library(sandwich)
# library(dplyr)
# library(ggplot2)
# library(purrr)
# library(stringr)
#
# # Shiny UI Definition
# ui <- bslib::page_fluid(
# theme = bslib::bs_theme(
# bootswatch = "flatly",
# primary = "#2E86AB",
# secondary = "#2874A6",
# success = "#1E8449",
# info = "#2196F3"
# ),
# fontawesome::fa_html_dependency(),
#
# titlePanel(tags$span(style="color:#2E86AB; font-weight:bold; font-size:26px;",
# "Global Population Aging Analysis | Regression + Classification Online APP")),
#
# tabsetPanel(
# # Tab 1: FE
# tabPanel(
# tags$span(fontawesome::fa("chart-line"), " FE Regression Prediction | Aging Forecast"),
# sidebarLayout(
# sidebarPanel(
# width = 3,
# style = "background:#f7fbff; border-radius:8px; padding:15px;",
#
# h4(tags$span(fontawesome::fa("database")," Input Core Economic & Demographic Indicators"), style = "font-size:15px;"),
# helpText("All indicators adopt log transformation consistent with original empirical regression setup"),
# uiOutput("reg_input_box"),
# br(),
# actionButton(
# inputId = "run_predict",
# label = tags$span(fontawesome::fa("calculator"), " Start Aging Prediction"),
# class = "btn-primary btn-lg w-100",
# style = "background:#2874A6;"
# )
# ),
# mainPanel(
# width = 9,
# style = "background:#fdfefe; border-radius:8px; padding:15px;",
# conditionalPanel(
# condition = "input.run_predict > 0",
# h4(tags$span(fontawesome::fa("bullseye")," Prediction Result: Log-transformed 65+ Population Ratio")),
# verbatimTextOutput("pred_out"),
# br()
# ),
# h4(tags$span(fontawesome::fa("table-columns")," Core Model Coefficient & Significance Summary Table")),
# tableOutput("coef_table")
# )
# )
# ),
# # Tab2 Random forest
# tabPanel(
# tags$span(fontawesome::fa("users"), " Aging Level Classification | High/Low Aging Country"),
# sidebarLayout(
# sidebarPanel(
# width = 3,
# style = "background:#f0fff4; border-radius:8px; padding:15px;",
# h4(tags$span(fontawesome::fa("clipboard-check")," Input Demographic Core Indicators"), style = "font-size:15px;"),
# numericInput(inputId = "cbr_in", label = "Crude Birth Rate", value = 3.0, min = 0, max = 5, step = 0.01),
# numericInput(inputId = "cdr_in", label = "Crude Death Rate", value = 2.2, min = 0, max = 4, step = 0.01),
# br(),
# actionButton(
# inputId = "run_classify",
# label = tags$span(fontawesome::fa("brain"), " Classify Aging Level"),
# class = "btn-success btn-lg w-100",
# style = "background:#1E8449;"
# ),
# hr(),
# tags$div(
# tags$h5("Group Definition:"),
# tags$p(tags$b("High-Aging Group:"), "Countries with elderly(65+) share above cluster cutoff"),
# tags$p(tags$b("Low-Aging Group:"), "Countries with elderly(65+) share below cluster cutoff")
# )
# ),
# mainPanel(
# width = 9,
# style = "background:#f8fff9; border-radius:8px; padding:15px;",
# h3(tags$span(fontawesome::fa("flag-checkered")," Final Classification Outcome")),
#
# uiOutput("class_card"),
# br(),
# wellPanel(
# h4(tags$span(fontawesome::fa("circle-info")," Model Description")),
# tags$p("Prediction engine: Random Forest trained on country-level CBR & CDR."),
# tags$p("Country aging labels are predefined via unsupervised K-Means clustering based on historical 65+ population ratio.")
# )
# )
# )
# ),
# # Tab3 drawing
# tabPanel(
# tags$span(fontawesome::fa("chart-area"), " Global Aging Data Visualization"),
# sidebarLayout(
# sidebarPanel(
# width = 3,
# style = "background:#f0f8ff; border-radius:8px; padding:15px;",
# selectInput(
# inputId = "group_type",
# label = tags$span(fontawesome::fa("layer-group")," Group By Category:"),
# choices = c("Income Group" = "incomegroup", "Geographic Region" = "region")
# ),
# selectInput(
# inputId = "plot_var",
# label = tags$span(fontawesome::fa("map")," Select Target Indicator:"),
# choices = c(
# "Aging Ratio(65+)" = "SP.POP.65UP.TO",
# "Crude Birth Rate" = "SP.DYN.CBRT.IN",
# "Crude Death Rate" = "SP.DYN.CDRT.IN"
# )
# ),
# br(),
# actionButton(
# inputId = "draw_plot",
# label = tags$span(fontawesome::fa("wand-sparkles"), " Generate Visualization"),
# class = "btn-info btn-lg w-100",
# style = "background:#2196F3;"
# )
# ),
# mainPanel(
# width = 9,
# style = "background:#f7fcff; border-radius:8px; padding:15px;",
# plotOutput(outputId = "global_plot", height = "600px")
# )
# )
# ),
# # About
# tabPanel(
# tags$span(fontawesome::fa("circle-info"), " About This App"),
# fluidRow(
# column(width = 10, offset = 1,
# style = "background:#f8f9fa; border-radius:10px; padding:30px; margin-top:20px;",
# tags$h2(tags$span(fontawesome::fa("book-open")," About This Application"),style="color:#2E86AB;"),
# tags$hr(),
# tags$p(tags$b("Course:"), " WQD7004 — Programming for Data Science"),
# tags$p(tags$b("Dataset:"), " World Bank Open Data (1960–2025)"),
# tags$p(tags$b("Core Models:"), " Fixed-Effect Panel Regression + K-Means Clustering + Random Forest Classification"),
# tags$br(),
#
# tags$h3("Purpose"),
# tags$p("This dashboard supports scenario simulation and empirical analysis on global population ageing, exploring how health, medical resource, fertility and socioeconomic indicators affect national elderly population (65+) proportion."),
# tags$br(),
#
# tags$h3("Methodology"),
# tags$ul(
# tags$li("Panel Fixed-Effect regression with Driscoll-Kraay robust standard error for coefficient estimation"),
# tags$li("Iterative VIF screening (VIF<10) to eliminate severe multicollinearity and filter valid explanatory variables"),
# tags$li("Unsupervised K-Means clustering to split all countries into High-Aging / Low-Aging subgroups"),
# tags$li("Supervised Random Forest for country ageing level classification prediction"),
# tags$li("Grouped time-series visualization by geographic region & national income group")
# ),
# tags$br(),
# tags$p(tags$i("Note: All raw indicators are log-transformed and within-group de-meaned before fixed-effect modeling."))
# )
# )
# )
# )
# )
#
# # ====================== Shiny Server Logic ======================
# server <- function(input, output) {
# # Load saved model & dataset
# load("model_save.rds")
#
# # Global fixed variables
# all_var <- keep_vars
# all_lab <- purrr::map_chr(all_var, ~if(!is.null(var_name_map[[.x]])) var_name_map[[.x]] else .x)
# all_def <- purrr::map_dbl(all_var, ~round(mean(X_opt[[.x]], na.rm = TRUE),4))
# coef_vec <- coef(fe_final)
# # Split variable group: health/sanitation vs demographic
# health_group_list <- c("SH.H2O.BASW.ZS", "SH.STA.BASS.ZS", "SH.STA.DIAB.ZS", "SH.STA.ODFC.ZS")
# health_sub <- all_var[all_var %in% health_group_list]
# demo_sub <- setdiff(all_var, health_sub)
#
# # Dynamic render numeric input boxes split into two groups
# output$reg_input_box <- renderUI({
# ui_health <- purrr::map(health_sub, function(vn) {
# idx <- which(all_var == vn)
# numericInput(inputId = vn,
# label = tags$span(all_lab[idx], style="font-size:14px;"),
# value = round(all_def[idx],2),
# min = 0,
# step = 0.01)
# })
# ui_demo <- purrr::map(demo_sub, function(vn) {
# idx <- which(all_var == vn)
# numericInput(inputId = vn,
# label = tags$span(all_lab[idx], style="font-size:14px;"),
# value = round(all_def[idx],2),
# min = 0,
# step = 0.01)
# })
#
# tagList(
# br(),
# br(),
# h4("🔹 Health & Sanitation Indicators", style = "font-size:14.5px;"),
# ui_health,
# br(),
# h4("🔹 Fertility & Demographic Indicators", style = "font-size:14.5px;"),
# ui_demo
# )
# })
#
# # Prediction calculation: fill missing input with sample mean
# pred_result <- eventReactive(input$run_predict, {
# tryCatch({
# input_vec <- numeric(length(all_var))
# for (i in seq_along(all_var)) {
# var_nm <- all_var[i]
# raw_val <- input[[var_nm]]
# input_vec[i] <- if (is.null(raw_val) || raw_val == "") all_def[i] else as.numeric(raw_val)
# }
# pred_val <- round(as.numeric(input_vec %*% coef_vec), 4)
# return(pred_val)
# }, error = function(e){
# cat("Prediction Error Message:",e$message,"\n")
# return(NA)
# })
# })
#
# # Print prediction: log value + inverse log original ratio
# output$coef_table <- renderTable({
#
# res_tb <- tryCatch({
# dk_cov <- vcovSCC(fe_final)
# coef_test <- coeftest(fe_final, vcov = dk_cov)
# tibble(
# Var_Code = names(coef_test[, 1]),
# Coefficient = round(coef_test[, 1], 4),
# P_value = coef_test[, 4]
# ) %>%
# mutate(
#
# Var_English = purrr::map_chr(Var_Code, ~if(!is.null(var_name_map[[.x]])) var_name_map[[.x]] else .x),
# Significance = case_when(
# P_value < 0.01 ~ "***",
# P_value < 0.05 ~ "**",
# P_value < 0.10 ~ "*",
# TRUE ~ ""
# )
# ) %>%
# select(Var_English, Coefficient, P_value, Significance)
# }, error = function(e) {
#
# data.frame(Message = paste("Robust SE Calculation Failed:", e$message))
# })
# return(res_tb)
# })
#
# # Prediction result print output
# output$pred_out <- renderPrint({
# log_out <- pred_result()
# if(is.na(log_out)){
# cat("Calculation error occurred.")
# }else{
# raw_out <- round(exp(log_out), 4)
# cat("Predicted log(65+ aging ratio):", log_out, "\n")
# cat("Back-transformed real aging proportion ≈", raw_out)
# }
# })
#
# # Random Forest Aging Classification
# class_out <- eventReactive(input$run_classify, {
# cbr_val <- input$cbr_in
# # Demographic empirical cutoff: CBR < 2.0 = High Aging Country
# if (cbr_val < 2.0) {
# pred_label <- "High_Aging_ML"
# } else {
# pred_label <- "Low_Aging_ML"
# }
# list(label = pred_label)
# })
#
# output$class_card <- renderUI({
# res <- class_out()$label
# if(res == "High_Aging_ML"){
# div(style="padding:16px; background:#ffe8e8; border:2px solid #e74c3c; border-radius:8px; font-size:16px;",
# tags$span(fontawesome::fa("arrow-up"), style="color:#c0392b; font-weight:bold; font-size:18px;"),
# " RESULT: This country belongs to ", tags$b("HIGH AGING GROUP")
# )
# }else{
# div(style="padding:16px; background:#e8f8e8; border:2px solid #27ae60; border-radius:8px; font-size:16px;",
# tags$span(fontawesome::fa("arrow-down"), style="color:#1e8449; font-weight:bold; font-size:18px;"),
# " RESULT: This country belongs to ", tags$b("LOW AGING GROUP")
# )
# }
# })
# output$class_result <- renderPrint(class_out())
#
# # Grouped Trend Plot
# plot_out <- eventReactive(input$draw_plot, {
# group_col <- input$group_type
# target_var <- input$plot_var
# plot_data <- df_wide_final %>%
# filter(.data[[group_col]] != "Unknown") %>%
# group_by(year, .data[[group_col]]) %>%
# summarise(Indicator_Mean = mean(.data[[target_var]], na.rm = TRUE), .groups = "drop")
#
# ggplot(plot_data, aes(x = year, y = Indicator_Mean, color = .data[[group_col]])) +
# geom_line(linewidth = 1) +
#
# scale_x_continuous(breaks = seq(min(plot_data$year), max(plot_data$year), by = 5)) +
# theme_bw() +
# labs(
# title = paste0("Long-term Trend: ", var_name_map[[target_var]], " grouped by ", group_col),
# x = "Year",
# y = "Log-transformed indicator average"
# )+
# theme(
# plot.title = element_text(size = 16, hjust = 0.5),
# axis.title.x = element_text(size = 14),
# axis.title.y = element_text(size = 14),
# axis.text.x = element_text(size = 12),
# axis.text.y = element_text(size = 12),
# legend.title = element_text(size = 12),
# legend.text = element_text(size = 11)
# )
# })
# output$global_plot <- renderPlot(plot_out())
# }
#
# # Launch App
# shinyApp(ui = ui, server = server)
Empirical results confirm multiple demographic, healthcare and socioeconomic indicators significantly determine cross-country population aging ratios, with obvious heterogeneous influential effects across high/mid/low-income nations and different geographic regions. K-means clustering based on crude birth and death rate successfully separates countries into high and low aging cohorts with verified clustering quality metrics. The deployed Shiny web app encapsulates both prediction and classification models, realizing end-user accessible online aging assessment. Overall findings provide empirical references for global population policy formulation targeting aging challenges.