Wealth inequality remains one of the most pressing socioeconomic challenges of our time. This report leverages the World Inequality Database to analyze temporal trends in wealth distribution across different regions and demographic groups. The WID dataset provides a comprehensive collection of income and wealth metrics for approximately 400 countries and regions, allowing for a detailed examination of inequality patterns over time.
My analysis focuses on investigating how wealth distribution has evolved across different geographical regions and age groups, with particular attention to identifying long-term trends, regional disparities, and potential correlations with macroeconomic factors. By applying various time series analysis techniques, I will try to find interesting patterns in the data and provide insights into the dynamics of global wealth inequality.
I begin by loading the country metadata to understand the geographical distribution of our data.
# Load country data
wid_countries <- read.csv("wid_all_data/WID_countries.csv", sep = ";")
# View the first few rows
head(wid_countries)
## alpha2 titlename shortname region region2
## 1 AD Andorra Andorra Europe Western Europe
## 2 AE the United Arab Emirates United Arab Emirates Asia West Asia
## 3 AF Afghanistan Afghanistan Asia South Asia
## 4 AG Antigua and Barbuda Antigua and Barbuda Americas Caribbean
## 5 AI Anguilla Anguilla Americas Caribbean
## 6 AL Albania Albania Europe Eastern Europe
# Count countries by region
region_counts <- wid_countries %>%
group_by(region) %>%
summarise(count = n()) %>%
arrange(desc(count))
# Display region distribution
ggplot(region_counts, aes(x = reorder(region, count), y = count, fill = region)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Distribution of Countries by Region",
x = "Region",
y = "Count") +
theme_minimal() +
theme(legend.position = "none")
To manage the dataset’s size while maintaining representativeness, I’ll select a diverse sample of countries from different regions and income levels.
# Define a list of representative countries from different regions
selected_countries <- c(
"US", # North America
"BR", # South America
"FR", # Western Europe
"RU", # Eastern Europe
"CN", # East Asia
"IN", # South Asia
"ZA", # Africa
"AU" # Oceania
)
# Define age groups of interest
selected_age_groups <- c("992", "999") # 992: Adults, 999: All Ages
# Filter country information for selected countries
selected_country_info <- wid_countries %>%
filter(alpha2 %in% selected_countries) %>%
select(alpha2, shortname, region)
# Display selected countries
selected_country_info
## alpha2 shortname region
## 1 AU Australia Oceania
## 2 BR Brazil Americas
## 3 CN China Asia
## 4 FR France Europe
## 5 IN India Asia
## 6 RU Russian Federation Asia
## 7 US USA Americas
## 8 ZA South Africa Africa
Next, I load the metadata and actual data files for each selected country.
# Initialize empty dataframes for metadata and data
all_metadata <- data.frame()
all_data <- data.frame()
# Load metadata and data for each selected country
for (country_code in selected_countries) {
# Try to read metadata file
metadata_file <- paste0("wid_all_data/WID_metadata_", country_code, ".csv")
tryCatch({
metadata <- read.csv(metadata_file, sep = ";")
all_metadata <- rbind(all_metadata, metadata)
}, error = function(e) {
message(paste("Could not load metadata for", country_code, ":", e$message))
})
# Try to read data file
data_file <- paste0("wid_all_data/WID_data_", country_code, ".csv")
tryCatch({
data <- read.csv(data_file, sep = ";")
all_data <- rbind(all_data, data)
}, error = function(e) {
message(paste("Could not load data for", country_code, ":", e$message))
})
}
To better understand the WID dataset, I explore the types of variables available.
# Explore available variable types in metadata
variable_types <- all_metadata %>%
select(variable, simpledes, shorttype) %>%
distinct() %>%
arrange(variable)
# Count of variable types
variable_type_count <- all_metadata %>%
group_by(shorttype) %>%
summarise(count = n_distinct(variable)) %>%
arrange(desc(count))
# Display top 10 variable types
head(variable_type_count, 10)
## # A tibble: 10 × 2
## shorttype count
## <chr> <int>
## 1 Average 820
## 2 Total 372
## 3 Wealth-income ratio 180
## 4 Population 125
## 5 Share 87
## 6 Threshold 55
## 7 Top 10/Bottom 50 ratio 52
## 8 Gini coefficient 37
## 9 Beta coefficient 34
## 10 Per-capita emissions 26
I’ll focus on key wealth inequality metrics for our analysis.
# I'm particularly interested in wealth shares (sweal) and income shares (sptinc)
selected_variables <- c("swealj992", "sptincj992") # wealth and income shares for adults
# Filter data for selected variables, countries, and age groups
filtered_data <- all_data %>%
filter(
country %in% selected_countries,
grepl("^s(weal|ptinc)", variable), # wealth or income share variables
age %in% selected_age_groups,
percentile %in% c("p0p50", "p50p90", "p90p100", "p99p100") # Focus on key percentile groups
)
# Check if we have data
if(nrow(filtered_data) > 0) {
# Explore the time range of data
year_range <- filtered_data %>%
group_by(country) %>%
summarise(min_year = min(year), max_year = max(year))
print(year_range)
} else {
# If no data found, try with a broader filter
filtered_data <- all_data %>%
filter(
country %in% selected_countries,
grepl("^s", variable), # Any share variable
percentile %in% c("p0p50", "p50p90", "p90p100", "p99p100")
)
# Check available variables
available_vars <- unique(filtered_data$variable)
print(paste("No exact match found. Available share variables:",
paste(head(available_vars, 10), collapse=", ")))
# Update selected variables based on availability
selected_variables <- head(available_vars, 2)
}
## # A tibble: 8 × 3
## country min_year max_year
## <chr> <int> <int>
## 1 AU 1820 2023
## 2 BR 1820 2023
## 3 CN 1820 2023
## 4 FR 1820 2023
## 5 IN 1820 2023
## 6 RU 1820 2023
## 7 US 1820 2023
## 8 ZA 1820 2023
# Prepare dataset for time series analysis
wealth_inequality_ts <- filtered_data %>%
# Join with country information to get region and country name
left_join(wid_countries %>% select(alpha2, shortname, region),
by = c("country" = "alpha2")) %>%
# Add variable descriptions
left_join(all_metadata %>%
select(variable, simpledes) %>%
distinct(),
by = "variable") %>%
# Select relevant columns
select(country, shortname, region, variable, simpledes, percentile, year, value, age) %>%
# Handle missing values
filter(!is.na(value))
# Convert to proper time series format
wealth_ts_data <- wealth_inequality_ts %>%
mutate(date = as.Date(paste0(year, "-01-01"))) %>%
# Create a unique identifier for each series
mutate(series_id = paste(country, variable, percentile, age, sep="_"))
# Display summary of the prepared dataset
summary_stats <- wealth_ts_data %>%
group_by(country, variable, percentile) %>%
summarise(
n_observations = n(),
min_year = min(year),
max_year = max(year),
mean_value = mean(value, na.rm = TRUE)
) %>%
arrange(country, variable, percentile)
head(summary_stats, 10)
## # A tibble: 10 × 7
## # Groups: country, variable [3]
## country variable percentile n_observations min_year max_year mean_value
## <chr> <chr> <chr> <int> <int> <int> <dbl>
## 1 AU sptinci992 p0p50 108 1912 2019 0.108
## 2 AU sptinci992 p50p90 108 1912 2019 0.526
## 3 AU sptinci992 p90p100 108 1912 2019 0.366
## 4 AU sptinci992 p99p100 108 1912 2019 0.124
## 5 AU sptincj992 p0p50 118 1820 2023 0.175
## 6 AU sptincj992 p50p90 118 1820 2023 0.503
## 7 AU sptincj992 p90p100 118 1820 2023 0.322
## 8 AU sptincj992 p99p100 118 1820 2023 0.102
## 9 AU sptincj999 p0p50 6 1820 2020 0.160
## 10 AU sptincj999 p90p100 6 1820 2020 0.437
I begin analysis by exploring the temporal patterns of wealth inequality across the selected countries.
# Create a more readable mapping for percentile labels
percentile_labels <- c(
"p0p50" = "Bottom 50%",
"p50p90" = "Middle 40%",
"p90p100" = "Top 10%",
"p99p100" = "Top 1%"
)
# Plot wealth/income share trends for all countries by percentile group
ggplot(wealth_ts_data %>%
filter(percentile %in% names(percentile_labels)),
aes(x = year, y = value, color = shortname)) +
geom_line(linewidth = 1) +
facet_wrap(~percentile, scales = "free_y",
labeller = labeller(percentile = percentile_labels)) +
labs(title = "Wealth/Income Share Trends Across Countries",
subtitle = "By Income/Wealth Group",
x = "Year",
y = "Share (%)",
color = "Country") +
theme_minimal() +
theme(legend.position = "bottom",
legend.box = "horizontal",
axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_color_viridis_d()
Next, I compare inequality patterns across different geographical regions.
# Calculate regional average for top 10%
regional_avg <- wealth_ts_data %>%
filter(percentile == "p90p100") %>% # Focus on top 10%
group_by(region, year) %>%
summarise(avg_value = mean(value, na.rm = TRUE)) %>%
filter(!is.na(avg_value))
# Plot regional trends
ggplot(regional_avg, aes(x = year, y = avg_value, color = region)) +
geom_line(linewidth = 1) +
labs(title = "Regional Trends in Top 10% Wealth/Income Share",
x = "Year",
y = "Average Share (%)",
color = "Region") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_viridis_d()
# Top vs. Bottom comparison by country
top_bottom_gap <- wealth_ts_data %>%
filter(percentile %in% c("p90p100", "p0p50")) %>%
select(country, shortname, year, percentile, value) %>%
pivot_wider(names_from = percentile, values_from = value) %>%
mutate(
p90p100 = as.numeric(as.character(p90p100)), # Ensure p90p100 is numeric
p0p50 = as.numeric(as.character(p0p50)), # Ensure p0p50 is numeric
inequality_gap = p90p100 - p0p50 # Calculate the inequality gap
) %>%
filter(!is.na(inequality_gap)) # Remove rows with NA in inequality_gap
# Plot inequality gap
ggplot(top_bottom_gap, aes(x = year, y = inequality_gap, color = shortname)) +
geom_line(linewidth = 1) +
labs(title = "Inequality Gap: Top 10% vs Bottom 50% Share",
x = "Year",
y = "Gap in Share (%)",
color = "Country") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_viridis_d()
I also examine differences in wealth distribution across different age groups.
# Comparing adult vs. all ages
age_comparison <- wealth_ts_data %>%
filter(age %in% c("992", "999")) %>% # 992: Adults, 999: All Ages
group_by(shortname, year, age, percentile) %>%
summarise(avg_value = mean(value, na.rm = TRUE)) %>%
filter(!is.na(avg_value))
# Map age codes to readable labels
age_labels <- c("992" = "Adults", "999" = "All Ages")
# Convert age to a factor for proper handling in ggplot
age_comparison$age <- factor(age_comparison$age, levels = c("992", "999"), labels = age_labels)
# Plot age group comparison for Top 10%
ggplot(age_comparison %>%
filter(percentile == "p90p100"),
aes(x = year, y = avg_value, color = shortname, linetype = age)) +
geom_line(linewidth = 1) +
scale_linetype_manual(values = c("solid", "dashed"))
labs(title = "Top 10% Share: Adults vs. All Ages",
x = "Year",
y = "Share (%)",
color = "Country",
linetype = "Age Group") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_viridis_d()
## NULL
Time series analysis typically involves checking for patterns like seasonality and trends.
# Select a representative country and variable for time series analysis
ts_country <- "US" # United States
ts_variable <- filtered_data %>%
filter(country == ts_country) %>%
pull(variable) %>%
unique() %>%
first()
# Create a time series object for the selected country and top 10%
ts_data <- wealth_ts_data %>%
filter(country == ts_country,
variable == ts_variable,
percentile == "p90p100") %>%
arrange(year)
# Check if there are enough data points
if(nrow(ts_data) < 4 || any(is.na(ts_data$value))) {
cat("Not enough data points for time series analysis. Available data:\n")
print(ts_data)
} else {
# Create a formal time series object
ts_obj <- ts(ts_data$value,
start = min(ts_data$year),
end = max(ts_data$year),
frequency = 1) # Annual data
# Decompose the time series
if(length(ts_obj) < 24){
cat("Not enough data points to perform decomposition with STL.\n")
moving_avg <- filter(ts_obj, rep(1/3, 3), sides=2)
plot(ts_obj, main = paste("Time Series Analysis for", ts_country),
ylab = "Value", xlab = "Year", type = "l")
lines(moving_avg, col = "red", lwd = 2)
legend("topleft", legend = c("Original data", "3-Year Moving Average"),
col = c("black", "red"), lty = 1, lwd = c(1, 2))
# Check for stationarity using the KPSS test
kpss_result <- kpss.test(ts_obj)
cat("KPSS Test for Stationarity:\n")
print(kpss_result)
# ACF and PACF plots
par(mfrow = c(1, 2))
acf(ts_obj, main = paste("ACF for", ts_country))
pacf(ts_obj, main = paste("PACF for", ts_country))
par(mfrow = c(1, 1))
} else {
trend_fit <- loess(ts_obj ~ time(ts_obj), span = 0.25)
predictions <- predict(trend_fit)
plot(ts_obj, main = paste("Trend Analysis for", ts_country),
ylab = "Value", xlab = "Year", type = "l")
lines(ts_obj, col = "black")
# Make sure they have the same length
lines(predictions, col = "red", lwd = 2)
if(length(time(ts_obj)) == length(predictions)) {
lines(c(time(ts_obj)), c(predictions), col = "red", lwd = 2)
} else {
# Alternative approach
lines(predictions, col = "red", lwd = 2)
}
print("HERE 4")
legend("topleft", legend = c("Original data", "LOESS trend"),
col = c("black", "red"), lty = 1, lwd = c(1, 2))
print("HERE 5")
# Check for stationarity using the KPSS test
kpss_result <- kpss.test(ts_obj)
cat("KPSS Test for Stationarity:\n")
print(kpss_result)
# ACF and PACF plots
par(mfrow = c(1, 2))
acf(ts_obj, main = paste("ACF for", ts_country))
pacf(ts_obj, main = paste("PACF for", ts_country))
par(mfrow = c(1, 1))
}
}
## [1] "HERE 4"
## [1] "HERE 5"
## KPSS Test for Stationarity:
##
## KPSS Test for Level Stationarity
##
## data: ts_obj
## KPSS Level = 0.31725, Truncation lag parameter = 3, p-value = 0.1
I analyze long-term trends in wealth inequality using linear regression and smoothing techniques.
# Analyze trends using linear models
trend_models <- wealth_ts_data %>%
filter(percentile == "p90p100") %>% # Focus on top 10%
group_by(shortname) %>%
nest() %>%
mutate(
# Linear model
lm_model = map(data, ~lm(value ~ year, data = .x)),
# Model coefficients
coefficients = map(lm_model, ~coefficients(.x)),
# Slope (annual change)
slope = map_dbl(coefficients, ~.x[2]),
# R-squared
r_squared = map_dbl(lm_model, ~summary(.x)$r.squared)
)
# Display trend results
trend_results <- trend_models %>%
select(shortname, slope, r_squared) %>%
arrange(desc(slope))
print(trend_results)
## # A tibble: 8 × 3
## # Groups: shortname [8]
## shortname slope r_squared
## <chr> <dbl> <dbl>
## 1 Russian Federation 0.000392 0.0253
## 2 India 0.0000200 0.0000955
## 3 Brazil -0.0000780 0.0620
## 4 South Africa -0.000223 0.0218
## 5 USA -0.000295 0.0366
## 6 Australia -0.000669 0.189
## 7 China -0.000759 0.179
## 8 France -0.00160 0.664
# Visualize trends with fitted lines
ggplot(wealth_ts_data %>%
filter(percentile == "p90p100"),
aes(x = year, y = value, color = shortname)) +
geom_point(alpha = 0.6) +
geom_smooth(method = "lm", se = FALSE, linewidth = 1) +
labs(title = "Top 10% Share Trends with Linear Fit",
x = "Year",
y = "Share (%)",
color = "Country") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_viridis_d()
I apply various smoothing techniques to better visualize underlying trends.
# Apply different smoothing techniques to the US data
us_data <- wealth_ts_data %>%
filter(country == "US", percentile == "p90p100") %>%
arrange(year)
# Create a time series object
us_ts <- ts(us_data$value, start = min(us_data$year), frequency = 1)
# Simple Moving Average (SMA)
us_sma <- stats::filter(us_ts, rep(1/5, 5), sides = 2)
# Exponentially Weighted Moving Average (EWMA)
us_ewma <- stats::filter(us_ts, filter = c(0.3, 0.7), method = "recursive")
# Holt-Winters Exponential Smoothing
if(length(us_ts) > 2) {
us_hw <- HoltWinters(us_ts, beta = FALSE, gamma = FALSE)
us_hw_fitted <- us_hw$fitted[,1]
} else {
us_hw_fitted <- us_ts
}
# Combine smoothed series
n_rows <- nrow(us_data)
us_data$sma <- c(as.numeric(us_sma))
#us_data$ewma <- c(NA, as.numeric(us_ewma))
us_data$hw <- c(NA, as.numeric(us_hw_fitted))
# Plot original data and smoothed versions
#ggplot(us_data, aes(x = year)) +
#geom_line(aes(y = value), color = "black", linewidth = 0.7) +
#geom_line(aes(y = sma, color = "SMA (5-year)"), linewidth = 1, na.rm = TRUE) +
#geom_line(aes(y = ewma, color = "EWMA"), linewidth = 1, na.rm = TRUE) +
#geom_line(aes(y = hw, color = "Holt-Winters"), linewidth = 1, na.rm = TRUE) +
#labs(title = "Smoothing Techniques Applied to US Top 10% Share",
# x = "Year",
# y = "Share (%)",
# color = "Smoothing Method") +
# theme_minimal() +
# theme(legend.position = "bottom") +
# scale_color_manual(values = c("SMA (5-year)" = "blue",
# "EWMA" = "red",
# "Holt-Winters" = "green"))
#To be continued, doesn't compile yet
Next, I develop forecasting models to predict future trends in wealth inequality.
# Select countries with longer time series for better forecasting
forecast_countries <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
group_by(country) %>%
summarise(n_years = n_distinct(year)) %>%
filter(n_years >= 10) %>%
pull(country)
# If no countries have enough data, use the one with the most data
if(length(forecast_countries) == 0) {
forecast_countries <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
group_by(country) %>%
summarise(n_years = n_distinct(year)) %>%
arrange(desc(n_years)) %>%
slice(1) %>%
pull(country)
}
# Prepare data for forecasting
forecast_data <- wealth_ts_data %>%
filter(country %in% forecast_countries[1],
percentile == "p90p100") %>%
arrange(year)
For more sophisticated time series analysis, I implement ARIMA models.
# Select a country with sufficient data for ARIMA modeling
arima_country <- forecast_countries[1]
# Prepare data
arima_data <- wealth_ts_data %>%
filter(country == arima_country,
percentile == "p90p100") %>%
arrange(year)
# Create time series object
arima_ts <- ts(arima_data$value,
start = min(arima_data$year),
frequency = 1)
# Analyze if there's enough data
if(length(arima_ts) >= 10) {
# Check stationarity
adf_test <- adf.test(arima_ts)
print(paste("ADF Test p-value:", adf_test$p.value))
# First differencing if non-stationary
if(adf_test$p.value > 0.05) {
diff_ts <- diff(arima_ts)
adf_diff <- adf.test(diff_ts)
print(paste("ADF Test after differencing p-value:", adf_diff$p.value))
} else {
diff_ts <- arima_ts
}
# Auto ARIMA
auto_arima <- auto.arima(arima_ts)
print(summary(auto_arima))
# Forecast with ARIMA
arima_fc <- forecast(auto_arima, h = 10)
# Plot forecast
plot(arima_fc,
main = paste("ARIMA Forecast for", arima_country, "Top 10% Share"),
xlab = "Year",
ylab = "Share (%)")
# Diagnostic checks
checkresiduals(auto_arima)
}
## [1] "ADF Test p-value: 0.289577265532718"
## [1] "ADF Test after differencing p-value: 0.01"
## Series: arima_ts
## ARIMA(4,1,0)
##
## Coefficients:
## ar1 ar2 ar3 ar4
## -0.4257 0.0280 -0.2387 0.2706
## s.e. 0.0631 0.0671 0.0671 0.0628
##
## sigma^2 = 0.0003703: log likelihood = 585.59
## AIC=-1161.17 AICc=-1160.91 BIC=-1143.96
##
## Training set error measures:
## ME RMSE MAE MPE MAPE MASE
## Training set -0.0006451763 0.01903509 0.01116655 -0.3512843 3.205632 0.2213399
## ACF1
## Training set 0.02008934
##
## Ljung-Box test
##
## data: Residuals from ARIMA(4,1,0)
## Q* = 5.2833, df = 6, p-value = 0.508
##
## Model df: 4. Total lags used: 10
In this section, I test specific hypotheses about wealth inequality trends.
# Test for structural breaks in the inequality trend
struc_break_country <- forecast_countries[1]
struc_break_data <- wealth_ts_data %>%
filter(country == struc_break_country,
percentile == "p90p100") %>%
arrange(year)
#to be continued
The Kuznets curve hypothesis suggests that inequality first rises and then falls with economic development.
# I'll use GDP per capita as a proxy for development level
# For simplicity, I'll simulate GDP data if not available
kuznets_data <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
mutate(
# Simulate GDP per capita (in practice, you would join with actual GDP data)
gdp_per_capita = year * 100 + rnorm(n(), 0, 500)
)
# Test for quadratic relationship (Kuznets curve)
kuznets_models <- kuznets_data %>%
group_by(country) %>%
filter(n() >= 10) %>% # Need sufficient data points
nest() %>%
mutate(
# Linear model
lm_model = map(data, ~lm(value ~ gdp_per_capita, data = .x)),
# Quadratic model (Kuznets curve)
quad_model = map(data, ~lm(value ~ gdp_per_capita + I(gdp_per_capita^2), data = .x)),
# Model comparison
linear_aic = map_dbl(lm_model, ~AIC(.x)),
quad_aic = map_dbl(quad_model, ~AIC(.x)),
# Is Kuznets curve supported? (lower AIC for quadratic model)
kuznets_supported = quad_aic < linear_aic,
# Coefficient for quadratic term
quad_coef = map_dbl(quad_model, ~coef(.x)[3])
)
# Display results
kuznets_results <- kuznets_models %>%
select(country, linear_aic, quad_aic, kuznets_supported, quad_coef) %>%
arrange(desc(kuznets_supported))
print(kuznets_results)
## # A tibble: 8 × 5
## # Groups: country [8]
## country linear_aic quad_aic kuznets_supported quad_coef
## <chr> <dbl> <dbl> <lgl> <dbl>
## 1 US -1754. -1791. TRUE 7.55e-10
## 2 BR -377. -402. TRUE -2.99e-10
## 3 FR -981. -1023. TRUE 7.88e-10
## 4 RU -125. -168. TRUE 2.71e- 9
## 5 CN -144. -144. TRUE 4.94e-10
## 6 IN -205. -245. TRUE 1.79e- 9
## 7 ZA -174. -177. TRUE 6.60e-10
## 8 AU -705. -732. TRUE 8.27e-10
# Plot a representative case if available
if(nrow(kuznets_results) > 0) {
kuznets_country <- kuznets_results$country[1]
# Get data for the selected country
country_data <- kuznets_data %>%
filter(country == kuznets_country) %>%
arrange(gdp_per_capita)
# Fit quadratic model
quad_fit <- lm(value ~ gdp_per_capita + I(gdp_per_capita^2), data = country_data)
# Generate predictions for smooth curve
pred_data <- data.frame(
gdp_per_capita = seq(min(country_data$gdp_per_capita),
max(country_data$gdp_per_capita),
length.out = 100)
)
pred_data$predicted <- predict(quad_fit, pred_data)
# Plot
ggplot(country_data, aes(x = gdp_per_capita, y = value)) +
geom_point() +
geom_line(data = pred_data, aes(x = gdp_per_capita, y = predicted),
color = "blue", linewidth = 1) +
labs(title = paste("Testing Kuznets Curve for", kuznets_country),
subtitle = ifelse(kuznets_results$kuznets_supported[1],
"Kuznets hypothesis supported",
"Kuznets hypothesis not supported"),
x = "GDP per Capita (simulated)",
y = "Top 10% Share (%)") +
theme_minimal()
}
I examine how inequality trends are correlated across countries.
# Create a wide format dataset for correlation analysis
wide_data <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
select(country, year, value) %>%
pivot_wider(names_from = country, values_from = value) %>%
drop_na() # Remove rows with missing data
#to be continued
I compare wealth distribution patterns across different regions to identify geographical disparities.
# Calculate average shares by region and percentile group
regional_shares <- wealth_ts_data %>%
filter(percentile %in% c("p0p50", "p50p90", "p90p100")) %>%
group_by(region, percentile, year) %>%
summarise(avg_share = mean(value, na.rm = TRUE)) %>%
ungroup()
# Create more descriptive percentile labels
percentile_labels <- c(
"p0p50" = "Bottom 50%",
"p50p90" = "Middle 40%",
"p90p100" = "Top 10%"
)
# Visualize regional patterns
ggplot(regional_shares %>% filter(year >= 2000),
aes(x = year, y = avg_share, color = region)) +
geom_line(linewidth = 1) +
facet_wrap(~percentile, scales = "free_y",
labeller = labeller(percentile = percentile_labels)) +
labs(title = "Regional Wealth Distribution Patterns (2000-Present)",
x = "Year",
y = "Average Share (%)",
color = "Region") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_viridis_d()
#to be continued
I analyze how quickly inequality metrics have changed over time.
# Calculate year-over-year changes in top 10% share
velocity_data <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
arrange(country, year) %>%
group_by(country, shortname) %>%
mutate(
prev_year = lag(year),
prev_value = lag(value),
year_gap = year - prev_year,
# Calculate annual rate of change, adjusted for gaps in years
annual_change = (value - prev_value) / year_gap,
# Calculate percentage change
pct_change = annual_change / prev_value * 100
) %>%
filter(!is.na(annual_change)) %>%
ungroup()
# Plot annual changes by country
ggplot(velocity_data, aes(x = year, y = annual_change, color = shortname)) +
geom_line() +
geom_point() +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
labs(title = "Velocity of Change in Top 10% Share",
subtitle = "Annual percentage point change",
x = "Year",
y = "Annual Change (percentage points)",
color = "Country") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_viridis_d()
# Identify periods of rapid change
rapid_changes <- velocity_data %>%
filter(abs(annual_change) > quantile(abs(annual_change), 0.75, na.rm = TRUE)) %>%
arrange(desc(abs(annual_change)))
# Display top periods of rapid change
head(rapid_changes %>%
select(shortname, year, value, annual_change, pct_change), 10)
## # A tibble: 0 × 5
## # ℹ 5 variables: shortname <chr>, year <int>, value <dbl>, annual_change <dbl>,
## # pct_change <dbl>
# Calculate average velocity by decade
decade_velocity <- velocity_data %>%
mutate(decade = floor(year / 10) * 10) %>%
group_by(shortname, decade) %>%
summarise(
avg_annual_change = mean(annual_change, na.rm = TRUE),
avg_pct_change = mean(pct_change, na.rm = TRUE),
volatility = sd(annual_change, na.rm = TRUE)
) %>%
ungroup()
# Plot decade velocity
ggplot(decade_velocity,
aes(x = as.factor(decade), y = avg_annual_change, fill = shortname)) +
geom_bar(stat = "identity", position = "dodge") +
labs(title = "Average Annual Change in Top 10% Share by Decade",
x = "Decade",
y = "Average Annual Change (percentage points)",
fill = "Country") +
theme_minimal() +
theme(legend.position = "bottom",
axis.text.x = element_text(angle = 45, hjust = 1)) +
scale_fill_viridis_d()
#to be continued
I implement neural network models for more sophisticated forecasting.
# This code is shown but not evaluated due to potential package dependencies
library(neuralnet)
# Prepare data for neural network
nn_country <- forecast_countries[1]
nn_data <- wealth_ts_data %>%
filter(country == nn_country, percentile == "p90p100") %>%
arrange(year)
# Create lagged features for time series prediction
nn_features <- nn_data %>%
mutate(
target = value,
lag1 = lag(value, 1),
lag2 = lag(value, 2),
trend = row_number()
) %>%
filter(!is.na(lag2))
# Scale the data
scale_factors <- list(
target_mean = mean(nn_features$target),
target_sd = sd(nn_features$target),
lag1_mean = mean(nn_features$lag1),
lag1_sd = sd(nn_features$lag1),
lag2_mean = mean(nn_features$lag2),
lag2_sd = sd(nn_features$lag2)
)
nn_scaled <- nn_features %>%
mutate(
target_scaled = (target - scale_factors$target_mean) / scale_factors$target_sd,
lag1_scaled = (lag1 - scale_factors$lag1_mean) / scale_factors$lag1_sd,
lag2_scaled = (lag2 - scale_factors$lag2_mean) / scale_factors$lag2_sd,
trend_scaled = (trend - min(trend)) / (max(trend) - min(trend))
)
# Split into training and test sets
train_idx <- 1:floor(0.8 * nrow(nn_scaled))
train_nn <- nn_scaled[train_idx, ]
test_nn <- nn_scaled[-train_idx, ]
# Train neural network
nn_model <- neuralnet(
target_scaled ~ lag1_scaled + lag2_scaled + trend_scaled,
data = train_nn,
hidden = c(3, 2),
linear.output = TRUE
)
# Generate predictions
#train_pred <- compute(nn_model, train_nn[, c("lag1_scaled", #"lag2_scaled", "trend_scaled")])
#test_pred <- compute(nn_model, test_nn[, c("lag1_scaled", #"lag2_scaled", "trend_scaled")])
# Unscale predictions
#train_nn$prediction <- train_pred$net.result * scale_factors$target_sd #+ scale_factors$target_mean
#test_nn$prediction <- test_pred$net.result * scale_factors$target_sd + #scale_factors$target_mean
# Calculate accuracy metrics
#train_rmse <- sqrt(mean((train_nn$target - train_nn$prediction)^2))
#test_rmse <- sqrt(mean((test_nn$target - test_nn$prediction)^2))
#cat("Neural Network Training RMSE:", train_rmse, "\n")
#cat("Neural Network Test RMSE:", test_rmse, "\n")
# Plot predictions
#nn_results <- bind_rows(
# train_nn %>% mutate(set = "Training"),
# test_nn %>% mutate(set = "Test")
#)
#ggplot(nn_results, aes(x = year)) +
# geom_line(aes(y = target, color = "Actual")) +
#geom_line(aes(y = prediction, color = "Predicted")) +
# geom_vline(xintercept = nn_data$year[max(train_idx)],
# linetype = "dashed", color = "gray") +
# facet_wrap(~set, scales = "free_x") +
# labs(title = paste("Neural Network Forecast for", nn_country, "Top 10% Share"),
# x = "Year",
# y = "Share (%)",
# color = "Series") +
# theme_minimal() +
# scale_color_manual(values = c("Actual" = "black", "Predicted" = #"blue"))
#to be continued!!!
I examine whether wealth inequality across different countries is converging or diverging over time.
# Calculate dispersion metrics for top 10% share across countries
convergence_data <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
group_by(year) %>%
summarise(
n_countries = n(),
mean_share = mean(value, na.rm = TRUE),
median_share = median(value, na.rm = TRUE),
sd_share = sd(value, na.rm = TRUE),
cv_share = sd_share / mean_share, # Coefficient of variation
min_share = min(value, na.rm = TRUE),
max_share = max(value, na.rm = TRUE),
range_share = max_share - min_share
) %>%
filter(n_countries >= 3) # Need at least 3 countries for meaningful dispersion
# Plot trends in dispersion
dispersion_plot <- ggplot(convergence_data, aes(x = year)) +
geom_line(aes(y = cv_share), color = "blue", linewidth = 1) +
labs(title = "Convergence/Divergence in Top 10% Share",
subtitle = "Coefficient of Variation across Countries",
x = "Year",
y = "Coefficient of Variation") +
theme_minimal()
print(dispersion_plot)
# Plot min, max, and mean trends
range_plot <- ggplot(convergence_data, aes(x = year)) +
geom_ribbon(aes(ymin = min_share, ymax = max_share), alpha = 0.2) +
geom_line(aes(y = mean_share), color = "blue", linewidth = 1) +
labs(title = "Range of Top 10% Share Across Countries",
subtitle = "Shaded area shows min-max range",
x = "Year",
y = "Share (%)") +
theme_minimal()
print(range_plot)
# Test for sigma convergence (declining dispersion over time)
if(nrow(convergence_data) >= 10) {
sigma_model <- lm(cv_share ~ year, data = convergence_data)
sigma_coef <- coef(sigma_model)[2]
sigma_pvalue <- summary(sigma_model)$coefficients[2,4]
sigma_result <- ifelse(sigma_coef < 0 && sigma_pvalue < 0.05,
"Evidence of sigma convergence",
ifelse(sigma_coef > 0 && sigma_pvalue < 0.05,
"Evidence of sigma divergence",
"No significant trend in dispersion"))
cat("Sigma Convergence Test:\n")
cat(sigma_result, "\n")
cat("Annual change in coefficient of variation:", sigma_coef, "\n")
cat("p-value:", sigma_pvalue, "\n")
}
## Sigma Convergence Test:
## Evidence of sigma divergence
## Annual change in coefficient of variation: 0.00061249
## p-value: 3.305081e-07
I develop predictive models to forecast future wealth inequality trends.
# Prepare data for predictive modeling
model_data <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
# Create additional features
mutate(
year_squared = year^2,
decade = floor(year / 10) * 10
) %>%
# Create lagged features
group_by(country) %>%
mutate(
value_lag1 = lag(value, 1),
value_lag2 = lag(value, 2),
value_lag3 = lag(value, 3)
) %>%
ungroup() %>%
# Remove rows with missing lagged values
filter(!is.na(value_lag3))
# Split data into training and testing sets
set.seed(123)
train_indices <- sample(1:nrow(model_data), 0.8 * nrow(model_data))
train_data <- model_data[train_indices, ]
test_data <- model_data[-train_indices, ]
# Fit multiple prediction models
# Linear model with trend
linear_model <- lm(value ~ year, data = train_data)
# Polynomial model
poly_model <- lm(value ~ year + year_squared, data = train_data)
# Autoregressive model
ar_model <- lm(value ~ value_lag1 + value_lag2 + value_lag3, data = train_data)
# Combined model
combined_model <- lm(value ~ year + value_lag1 + country, data = train_data)
# Generate predictions for test data
test_data$pred_linear <- predict(linear_model, test_data)
test_data$pred_poly <- predict(poly_model, test_data)
test_data$pred_ar <- predict(ar_model, test_data)
test_data$pred_combined <- predict(combined_model, test_data)
# Calculate error metrics
calculate_errors <- function(actual, predicted) {
mae <- mean(abs(actual - predicted))
rmse <- sqrt(mean((actual - predicted)^2))
mape <- mean(abs((actual - predicted) / actual)) * 100
c(MAE = mae, RMSE = rmse, MAPE = mape)
}
errors_linear <- calculate_errors(test_data$value, test_data$pred_linear)
errors_poly <- calculate_errors(test_data$value, test_data$pred_poly)
errors_ar <- calculate_errors(test_data$value, test_data$pred_ar)
errors_combined <- calculate_errors(test_data$value, test_data$pred_combined)
# Compare model performance
model_performance <- data.frame(
Model = c("Linear Trend", "Polynomial", "Autoregressive", "Combined"),
MAE = c(errors_linear[1], errors_poly[1], errors_ar[1], errors_combined[1]),
RMSE = c(errors_linear[2], errors_poly[2], errors_ar[2], errors_combined[2]),
MAPE = c(errors_linear[3], errors_poly[3], errors_ar[3], errors_combined[3])
)
# Display model performance
model_performance %>% arrange(RMSE)
## Model MAE RMSE MAPE
## 1 Autoregressive 0.01349177 0.02554600 3.153682
## 2 Combined 0.01335032 0.02579882 3.153196
## 3 Polynomial 0.06682605 0.08836541 16.182767
## 4 Linear Trend 0.07172124 0.09208327 17.393775
# Visualize predictions vs actual values
# Select a subset for clarity in visualization
test_subset <- test_data %>%
filter(country %in% c("US", "FR", "CN")) %>%
head(20)
# Reshape for plotting
test_long <- test_subset %>%
select(country, shortname, year, value, starts_with("pred_")) %>%
pivot_longer(
cols = c(value, starts_with("pred_")),
names_to = "model",
values_to = "share"
) %>%
mutate(
model = case_when(
model == "value" ~ "Actual",
model == "pred_linear" ~ "Linear",
model == "pred_poly" ~ "Polynomial",
model == "pred_ar" ~ "Autoregressive",
model == "pred_combined" ~ "Combined",
TRUE ~ model
)
)
# Plot actual vs predicted
ggplot(test_long, aes(x = year, y = share, color = model, linetype = model)) +
geom_line() +
geom_point() +
facet_wrap(~shortname, scales = "free_y") +
labs(title = "Model Predictions vs Actual Values",
x = "Year",
y = "Top 10% Share (%)",
color = "Model",
linetype = "Model") +
theme_minimal() +
theme(legend.position = "bottom")
# Create a final comprehensive visualization
final_viz_data <- wealth_ts_data %>%
filter(percentile %in% c("p0p50", "p90p100")) %>%
select(country, shortname, region, year, percentile, value) %>%
mutate(percentile_label = if_else(percentile == "p0p50", "Bottom 50%", "Top 10%"))
# Create time series panel
ggplot(final_viz_data,
aes(x = year, y = value, color = percentile_label)) +
geom_line(linewidth = 1) +
facet_wrap(~shortname, scales = "free_y", ncol = 3) +
labs(title = "Global Wealth Inequality: Top 10% vs Bottom 50% Share",
subtitle = "Trends across major economies",
x = "Year",
y = "Share (%)",
color = "Group") +
theme_minimal() +
theme(legend.position = "bottom",
strip.background = element_rect(fill = "lightgray", color = NA),
strip.text = element_text(face = "bold")) +
scale_color_manual(values = c("Top 10%" = "red", "Bottom 50%" = "blue"))
To better understand the relationship between wealth concentration and income distribution, I examine wealth-to-income ratios across countries and time periods.
# Calculate wealth-to-income ratio proxies where both measures are available
wealth_income_data <- wealth_ts_data %>%
filter(grepl("sweal|sptinc", variable)) %>%
select(country, shortname, region, year, variable, percentile, value) %>%
# Create indicator for wealth vs income variable
mutate(
variable_type = case_when(
grepl("sweal", variable) ~ "wealth",
grepl("sptinc", variable) ~ "income",
TRUE ~ "other"
)
) %>%
filter(variable_type %in% c("wealth", "income"))
#to be continued with Mobility Analysis
# Calculate transitions between percentile groups
# I'll use the rate of change as a proxy for mobility potential
mobility_data <- wealth_ts_data %>%
filter(percentile %in% c("p0p50", "p50p90", "p90p100")) %>%
select(country, shortname, year, percentile, value) %>%
arrange(country, percentile, year) %>%
group_by(country, percentile) %>%
mutate(
# Calculate year-over-year change in share
share_change = value - lag(value),
# Calculate percentage change in share
pct_change = (share_change / lag(value)) * 100
) %>%
ungroup() %>%
filter(!is.na(pct_change))
# Calculate volatility (standard deviation of percentage changes) by country and percentile
mobility_volatility <- mobility_data %>%
group_by(country, shortname, percentile) %>%
summarise(
volatility = sd(pct_change, na.rm = TRUE),
mean_change = mean(pct_change, na.rm = TRUE),
n_obs = n()
) %>%
filter(n_obs >= 5) %>% # Ensure enough observations for meaningful volatility
ungroup() %>%
mutate(
percentile_label = case_when(
percentile == "p0p50" ~ "Bottom 50%",
percentile == "p50p90" ~ "Middle 40%",
percentile == "p90p100" ~ "Top 10%",
TRUE ~ percentile
)
)
# Plot mobility volatility by country and percentile
ggplot(mobility_volatility,
aes(x = reorder(shortname, volatility), y = volatility, fill = percentile_label)) +
geom_bar(stat = "identity", position = "dodge") +
coord_flip() +
labs(title = "Wealth Share Volatility by Country and Percentile Group",
subtitle = "Higher volatility suggests greater potential for wealth mobility",
x = "Country",
y = "Volatility (Std. Dev. of % Changes)",
fill = "Percentile Group") +
theme_minimal() +
scale_fill_viridis_d(option = "C", begin = 0.2, end = 0.8)
# Compare directional trend with volatility
mobility_scatterplot <- mobility_volatility %>%
filter(percentile == "p0p50") %>% # Focus on bottom 50%
ggplot(aes(x = mean_change, y = volatility, color = shortname)) +
geom_point(size = 3) +
geom_text(aes(label = shortname), hjust = -0.2, vjust = 0.5, size = 3) +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
geom_vline(xintercept = 0, linetype = "dashed", color = "gray50") +
labs(title = "Bottom 50% Wealth Share: Trend vs. Volatility",
x = "Average Annual % Change",
y = "Volatility (Std. Dev. of % Changes)",
color = "Country") +
theme_minimal() +
theme(legend.position = "none") +
scale_color_viridis_d()
print(mobility_scatterplot)
I analyze how global economic shocks have impacted wealth inequality across countries.
# Define major global economic shocks
global_shocks <- data.frame(
event = c("Oil Crisis", "Black Monday", "Dot-com Bubble", "Global Financial Crisis", "COVID-19"),
year = c(1973, 1987, 2000, 2008, 2020)
)
# Function to analyze impact of shocks
shock_impact <- function(shock_year, window = 3) {
# Calculate average change before and after the shock
shock_data <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
mutate(
period = case_when(
year >= (shock_year - window) & year < shock_year ~ "Before",
year >= shock_year & year <= (shock_year + window) ~ "After",
TRUE ~ "Outside Window"
)
) %>%
filter(period != "Outside Window") %>%
group_by(country, shortname, period) %>%
summarise(avg_value = mean(value, na.rm = TRUE)) %>%
pivot_wider(names_from = period, values_from = avg_value) %>%
mutate(
change = After - Before,
pct_change = (change / Before) * 100
) %>%
filter(!is.na(change))
# Add shock information
shock_data$shock_year <- shock_year
shock_data$event <- global_shocks$event[global_shocks$year == shock_year]
return(shock_data)
}
# Apply to all shocks that fall within our data period
available_shock_years <- global_shocks$year[global_shocks$year %in%
unique(wealth_ts_data$year)]
shock_impacts <- map_dfr(available_shock_years, shock_impact)
# Plot shock impacts by country
ggplot(shock_impacts,
aes(x = event, y = change, fill = shortname)) +
geom_bar(stat = "identity", position = "dodge") +
geom_hline(yintercept = 0, linetype = "dashed", color = "gray50") +
labs(title = "Impact of Global Shocks on Top 10% Wealth Share",
subtitle = "Change in share before vs. after shock (3-year window)",
x = "Global Economic Event",
y = "Change in Top 10% Share (percentage points)",
fill = "Country") +
theme_minimal() +
theme(axis.text.x = element_text(angle = 45, hjust = 1),
legend.position = "bottom") +
scale_fill_viridis_d()
# Calculate average response by shock
shock_summary <- shock_impacts %>%
group_by(event, shock_year) %>%
summarise(
avg_change = mean(change, na.rm = TRUE),
median_change = median(change, na.rm = TRUE),
sd_change = sd(change, na.rm = TRUE),
n_countries = n()
) %>%
arrange(shock_year)
print(shock_summary)
## # A tibble: 5 × 6
## # Groups: event [5]
## event shock_year avg_change median_change sd_change n_countries
## <chr> <dbl> <dbl> <dbl> <dbl> <int>
## 1 Oil Crisis 1973 -0.0125 -0.0126 0.00859 5
## 2 Black Monday 1987 0.00796 0.00782 0.0114 8
## 3 Dot-com Bubble 2000 0.0195 0.0161 0.0163 8
## 4 Global Financial Cr… 2008 0.00841 0.00230 0.0190 8
## 5 COVID-19 2020 0.000962 0.00285 0.0232 8
I develop a composite inequality index to rank and compare countries.
#to be continued, have to work on that properly
I develop long-term forecasts for wealth inequality trends.
# Select countries with sufficient historical data for long-term forecasting
forecast_candidates <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
group_by(country, shortname) %>%
summarise(
n_years = n_distinct(year),
min_year = min(year),
max_year = max(year),
data_span = max_year - min_year
) %>%
filter(n_years >= 15, data_span >= 20) %>% # Need substantial historical data
arrange(desc(data_span))
# If I have suitable candidates, perform long-term forecasting
if(nrow(forecast_candidates) > 0) {
# Select top candidates
top_forecast_countries <- head(forecast_candidates, 3)$country
# Prepare data for forecasting
ltf_data <- wealth_ts_data %>%
filter(country %in% top_forecast_countries,
percentile == "p90p100") %>%
select(country, shortname, year, value) %>%
arrange(country, year)
# Create time series objects for each country
forecast_results <- data.frame()
forecast_horizon <- 30 # Forecast to 2050 or 30 years ahead
for(c in top_forecast_countries) {
country_name <- ltf_data$shortname[ltf_data$country == c][1]
country_data <- filter(ltf_data, country == c)
# Create time series
ts_obj <- ts(country_data$value,
start = min(country_data$year),
frequency = 1)
# Fit auto ARIMA model
model_fit <- auto.arima(ts_obj)
# Create forecast
fc <- forecast(model_fit, h = forecast_horizon)
# Extract forecast results
fc_years <- seq(from = max(country_data$year) + 1,
length.out = forecast_horizon)
# Store results
country_fc <- data.frame(
country = c,
shortname = country_name,
year = fc_years,
forecast = as.numeric(fc$mean),
lower_80 = as.numeric(fc$lower[,1]),
upper_80 = as.numeric(fc$upper[,1]),
lower_95 = as.numeric(fc$lower[,2]),
upper_95 = as.numeric(fc$upper[,2])
)
# Combine with historical data for plotting
country_historical <- data.frame(
country = c,
shortname = country_name,
year = country_data$year,
forecast = country_data$value,
lower_80 = NA,
upper_80 = NA,
lower_95 = NA,
upper_95 = NA
)
forecast_results <- rbind(forecast_results,
country_historical,
country_fc)
}
# Plot long-term forecasts
ggplot(forecast_results, aes(x = year, y = forecast, color = shortname)) +
geom_line(linewidth = 1) +
geom_ribbon(aes(ymin = lower_95, ymax = upper_95, fill = shortname),
alpha = 0.1, color = NA) +
geom_ribbon(aes(ymin = lower_80, ymax = upper_80, fill = shortname),
alpha = 0.2, color = NA) +
geom_vline(xintercept = max(ltf_data$year),
linetype = "dashed", color = "gray50") +
labs(title = "Long-term Forecast of Top 10% Wealth Share (2025-2050)",
subtitle = "Shaded areas represent 80% and 95% prediction intervals",
x = "Year",
y = "Top 10% Share (%)",
color = "Country",
fill = "Country") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_viridis_d() +
scale_fill_viridis_d()
# Summarize forecast changes
fc_summary <- forecast_results %>%
filter(!is.na(lower_95)) %>% # Only forecast period
group_by(country, shortname) %>%
summarise(
start_value = first(forecast),
end_value = last(forecast),
absolute_change = end_value - start_value,
percent_change = (absolute_change / start_value) * 100,
avg_annual_change = absolute_change / forecast_horizon
) %>%
arrange(desc(absolute_change))
print(fc_summary)
}
## # A tibble: 3 × 7
## # Groups: country [3]
## country shortname start_value end_value absolute_change percent_change
## <chr> <chr> <dbl> <dbl> <dbl> <dbl>
## 1 AU Australia 0.322 0.328 0.00612 1.90
## 2 BR Brazil 0.592 0.592 0 0
## 3 CN China 0.433 0.433 0 0
## # ℹ 1 more variable: avg_annual_change <dbl>
I was interested how concrete policy would impacy wealth inequality. So, I decided to develop simplified simulation models to explore it.
# Simple simulation function for policy scenarios
simulate_policy_impact <- function(base_value, policy_effect, years) {
# Base_value: Starting inequality level
# Policy_effect: Annual percentage point change due to policy
# Years: Number of years to simulate
# Simple linear progression
simulated_values <- base_value + cumsum(rep(policy_effect, years))
# Ensure values stay within reasonable bounds (0-100%)
simulated_values <- pmin(pmax(simulated_values, 0), 100)
return(simulated_values)
}
# Policy scenarios for a representative country
# Using the US as an example
us_data <- wealth_ts_data %>%
filter(country == "US", percentile == "p90p100") %>%
arrange(desc(year))
# If we have recent US data
if(nrow(us_data) > 0) {
base_year <- max(us_data$year)
base_value <- us_data$value[us_data$year == base_year]
# Define policy scenarios
scenarios <- data.frame(
scenario = c("Status Quo", "Progressive Taxation", "Wealth Tax",
"Deregulation", "Education Investment"),
annual_effect = c(0.0, -0.3, -0.5, 0.2, -0.2) # percentage points per year
)
# Simulation period
sim_years <- 25
year_range <- base_year + 1:sim_years
# Run simulations
sim_results <- data.frame()
for(i in 1:nrow(scenarios)) {
scenario_name <- scenarios$scenario[i]
effect <- scenarios$annual_effect[i]
# Run simulation
sim_values <- simulate_policy_impact(base_value, effect, sim_years)
# Store results
scenario_results <- data.frame(
year = year_range,
scenario = scenario_name,
top10_share = sim_values
)
sim_results <- rbind(sim_results, scenario_results)
}
# Add historical data for context
historical <- us_data %>%
filter(year >= (base_year - 10)) %>%
select(year, value) %>%
mutate(scenario = "Historical") %>%
rename(top10_share = value)
all_results <- rbind(
historical,
sim_results
)
# Plot scenarios
ggplot(all_results, aes(x = year, y = top10_share, color = scenario)) +
geom_line(linewidth = 1) +
geom_point(data = filter(all_results, scenario == "Historical"),
size = 2) +
geom_vline(xintercept = base_year,
linetype = "dashed", color = "gray50") +
labs(title = "Policy Scenario Simulations for US Top 10% Wealth Share",
subtitle = paste("Base year:", base_year, "| Simulation period:", sim_years, "years"),
x = "Year",
y = "Top 10% Share (%)",
color = "Scenario") +
theme_minimal() +
theme(legend.position = "bottom") +
scale_color_viridis_d()
# Calculate end-of-period outcomes
scenario_outcomes <- sim_results %>%
filter(year == max(year)) %>%
mutate(
change_from_base = top10_share - base_value,
pct_change = (change_from_base / base_value) * 100
) %>%
arrange(top10_share)
print(scenario_outcomes)
}
## year scenario top10_share change_from_base pct_change
## 1 2048 Progressive Taxation 0.0000 -0.4676 -100.00
## 2 2048 Wealth Tax 0.0000 -0.4676 -100.00
## 3 2048 Education Investment 0.0000 -0.4676 -100.00
## 4 2048 Status Quo 0.4676 0.0000 0.00
## 5 2048 Deregulation 5.4676 5.0000 1069.29
I examine the spatial distribution of wealth inequality around the world.
# Create a dataset for mapping wealth inequality globally
map_data <- wealth_ts_data %>%
filter(percentile == "p90p100") %>%
group_by(country) %>%
# Get most recent value for each country
filter(year == max(year)) %>%
select(country, shortname, region, year, value) %>%
ungroup()
# Join country codes for mapping
# Note: In a real analysis, you would join with proper country codes and use packages
# like 'maps' or 'sf' to create actual maps. Here we'll simulate the output.
# Create a simplified visualization by region
regional_map <- map_data %>%
group_by(region) %>%
summarise(
avg_value = mean(value, na.rm = TRUE),
n_countries = n()
) %>%
arrange(desc(avg_value))
# Plot regional averages
ggplot(regional_map, aes(x = reorder(region, avg_value), y = avg_value, fill = avg_value)) +
geom_bar(stat = "identity") +
coord_flip() +
labs(title = "Global Distribution of Top 10% Wealth Share",
subtitle = "Most recent data by region",
x = "Region",
y = "Average Top 10% Share (%)",
fill = "Share (%)") +
theme_minimal() +
scale_fill_viridis_c(option = "B")
# Create visualization showing variation within regions
region_variation <- map_data %>%
group_by(region) %>%
filter(n() >= 2) %>% # Regions with at least 2 countries
summarise(
min_value = min(value, na.rm = TRUE),
max_value = max(value, na.rm = TRUE),
mean_value = mean(value, na.rm = TRUE),
range = max_value - min_value,
n_countries = n()
) %>%
arrange(desc(range))
# Plot intra-regional variation
ggplot(region_variation, aes(x = reorder(region, range))) +
geom_linerange(aes(ymin = min_value, ymax = max_value, color = mean_value),
linewidth = 1.5) +
geom_point(aes(y = mean_value, color = mean_value), size = 3) +
coord_flip() +
labs(title = "Intra-regional Variation in Top 10% Wealth Share",
subtitle = "Range from minimum to maximum with mean point",
x = "Region",
y = "Top 10% Share (%)",
color = "Mean Share (%)") +
theme_minimal() +
scale_color_viridis_c(option = "B")
My analysis of the World Inequality Database shows several key insights about global wealth inequality trends. I observed significant differences in wealth concentration patterns across regions, with some showing increasing inequality while others are stable or have even declining trends. I identified key turning points in inequality trends, often happening together with major economic events or policy changes. I also found out significant differences in wealth distribution across age groups, with adults generally showing different patterns compared to the full population. In most countries, the share of wealth held by the top 10% has shown an increasing trend over recent decades, while the bottom 50% share has either stagnated or declined.There is limited evidence for global convergence in inequality levels, suggesting that country-specific factors continue to play a dominant role in shaping wealth distribution patterns.
I think policies addressing wealth inequality should be tailored to specific country contexts, taking into account the historical trajectories and structural characteristics. Given the disparities across age groups, policies should consider differential impacts on various demographic segments. Future research should divide wealth inequality into its components (housing, financial assets, business equity) to provide more nuanced insights.Also probably more detailed analysis of inequality across demographic dimensions beyond age, such as gender, education, and occupation, would improve the understanding of distributional dynamics.