World Development Indicators provides measures of social progress, quality of life, economic development, physical infrastructure, environment and government performance.
World Development Indicators (WDI) is the primary World Bank collection of development indicators, compiled from officially recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates.
Note: Even though Global Development Finance (GDF) is no longer listed in the WDI database name, all external debt and financial flows data continue to be included in WDI. The GDF publication has been renamed International Debt Statistics (IDS), and has its own separate database, as well.
Last updated: 12/16/2024. Dataset source: https://databank.worldbank.org/source/world-development-indicators/Type/TABLE/preview/on
This dataset is a subset of the World Development Indicators covering multiple indicators, including population growth, GNI, GDP and inflation.
This report focuses on data for the year 2023. Values were missing for some indicators and some countries. All countries with missing values were removed, as were indicators for which no figures were published.
Imputing the missing values with estimates (mean, median, mode or regression methods) would not represent the true values for these countries, because their economic conditions differ widely. Interpreting such artificial data could be misleading. Hence, the countries and indicators with empty values were removed.
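The removal step described above is not shown in the report. A minimal sketch of how it might look follows, using a made-up toy data frame; the `..` placeholder for missing cells is an assumption about how the raw export marks gaps, and the column names are illustrative only.

```r
# Toy data frame standing in for the raw export (all values are made up).
# Assumption: missing cells are marked with ".." in the raw file.
raw <- data.frame(
  Country    = c("A", "B", "C", "D"),
  GDP_Growth = c("2.1", "..", "3.4", "1.0"),
  Inflation  = c("5.0", "7.2", "..", "2.5"),
  stringsAsFactors = FALSE
)

# Convert the placeholder to NA, then coerce the indicator columns to numeric
raw[raw == ".."] <- NA
raw[, -1] <- lapply(raw[, -1], as.numeric)

# Keep only countries that have a value for every indicator
complete_rows <- raw[complete.cases(raw), ]

# Record which countries were dropped, for transparency
removed <- setdiff(raw$Country, complete_rows$Country)
print(complete_rows)
print(removed)
```

Keeping the list of removed countries makes it easier to judge whether the remaining sample is still representative.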
The aim of this report is to find out which indicators best represent the development of countries.
The initial approach is to use PCA to determine the number of dimensions needed to capture most of the variance.
The indicators may be related or similar to each other. We will apply MDS to find indicators with similarities. (We may use only one of a group of similar indicators to represent world development.)
Hence, we apply PCA to the dataset, interpret the results and assess how well PCA fits this dataset.
options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("caret")
install.packages("factoextra")
install.packages("clusterSim")
install.packages("corrplot")
install.packages("psych")
install.packages("smacof")
install.packages("ClusterR")
install.packages("gridExtra")
install.packages("e1071") # skewness() used below is provided by e1071
install.packages("reshape2")
install.packages("GGally")
install.packages("ggfortify")
library(ggfortify)
library(MASS)
library(e1071)
library(caret)
library(factoextra)
library(clusterSim)
library(corrplot)
library(psych)
library(smacof)
library(ClusterR)
library(gridExtra)
library(ggplot2)
library(reshape2)
library(GGally)
# Load the dataset
dev_indicator <- read.csv("developement_indicator.csv", sep=",", dec=".", header=TRUE, fileEncoding = "Latin1")
# Shorten column names
colnames(dev_indicator)[3:15] <- c(
"Pop_Total",
"Pop_Growth",
"GNI_Atlas",
"GNI_PC_Atlas",
"GNI_PPP",
"GNI_PC_PPP",
"Immunization_Measles",
"GDP_USD",
"GDP_Growth",
"Inflation_GDP",
"Trade_GDP",
"Net_Migration",
"FDI_Net_Inflow"
)
# Checking the new column names
colnames(dev_indicator)
## [1] "Country.Name" "Country.Code" "Pop_Total"
## [4] "Pop_Growth" "GNI_Atlas" "GNI_PC_Atlas"
## [7] "GNI_PPP" "GNI_PC_PPP" "Immunization_Measles"
## [10] "GDP_USD" "GDP_Growth" "Inflation_GDP"
## [13] "Trade_GDP" "Net_Migration" "FDI_Net_Inflow"
# Checking the dataset dimensions: 165 countries with 13 indicators
dim(dev_indicator)
## [1] 165 15
# Clearing the labels, keep the data only. 1st row: world development indicators; 1st and 2nd columns: countries names and codes
country_list<-dev_indicator[,1] # first column: all the countries
indicator_list<-colnames(dev_indicator[3:15]) # first row: all indicators
dev_indicator_data<-as.matrix(dev_indicator[,3:15]) # data only
head(dev_indicator_data)
## Pop_Total Pop_Growth GNI_Atlas GNI_PC_Atlas GNI_PPP GNI_PC_PPP
## [1,] 2745972 -1.1484176 21085728669 7680 5.943826e+10 21650
## [2,] 46164219 1.4989760 228481173658 4950 7.654448e+11 16580
## [3,] 36749906 3.0806553 78048984687 2120 2.673950e+11 7280
## [4,] 93316 0.5114002 1884713523 20200 2.862820e+09 30680
## [5,] 45538401 0.2869761 586900838957 12890 1.341603e+12 29460
## [6,] 2990900 0.7281789 20271184637 6780 6.210631e+10 20770
## Immunization_Measles GDP_USD GDP_Growth Inflation_GDP Trade_GDP
## [1,] 83 23547179830 3.936625 6.066049 55.49709
## [2,] 99 247626161016 4.100000 0.854841 40.30188
## [3,] 50 84824654482 1.001289 17.619538 64.08514
## [4,] 94 2033085185 3.862012 4.805476 42.98885
## [5,] 80 646075277525 -1.611002 135.368876 21.74716
## [6,] 96 24085749592 8.300000 2.673004 86.03843
## Net_Migration FDI_Net_Inflow
## [1,] -25357 1620982551
## [2,] -25963 1215776627
## [3,] -995 -2119632186
## [4,] 0 300597732
## [5,] 4133 23866141440
## [6,] 75000 580365079
# Extracting the data (excluding country names and indicator names)
dev_indicator_data <- as.matrix(dev_indicator[, 3:15])
# Calculate the correlation matrix to check for multicollinearity
cor_matrix <- cor(dev_indicator_data, use = "complete.obs")
# Visualize the correlation matrix with a heatmap
corr_melt <- melt(cor_matrix)
ggplot(corr_melt, aes(Var1, Var2, fill = value)) +
geom_tile() +
scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0, limit = c(-1, 1)) +
theme(axis.text.x = element_text(angle = 45, hjust = 1)) +
labs(title = "Correlation Matrix")
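# Pearson correlation matrix (named cor_pearson to avoid masking the cor() function)
cor_pearson <- cor(dev_indicator_data, method="pearson")
corrplot(cor_pearson)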
# Visualize the relationships between variables using scatter plots
ggpairs(as.data.frame(dev_indicator_data)) # Scatter plots between all pairs of variables (converted to a data frame first)
# We can observe some strong correlations in these visualisations
# Dataset skewness check before performing PCA
# Store the result under a new name so the skewness() function is not masked
initial_skewness <- skewness(dev_indicator_data)
initial_skewness
## [1] 16.4182
# The pooled skewness of the dataset is 16.4182, which indicates a strong right skew
# Hence, we apply a Box-Cox transformation to bring the data closer to normality
# Shift data to ensure all values are positive
shifted_data <- dev_indicator_data + abs(min(dev_indicator_data)) + 1
# Apply Box-Cox transformation
boxcox_result <- boxcox(lm(shifted_data ~ 1), lambda = seq(-2, 2, 0.1))
# Find optimal lambda
optimal_lambda <- boxcox_result$x[which.max(boxcox_result$y)]
print(paste("Optimal Lambda:", optimal_lambda))
## [1] "Optimal Lambda: 0.101010101010101"
# Transform data using optimal lambda
if (optimal_lambda == 0) {
boxcox_transformed <- log(shifted_data)
} else {
boxcox_transformed <- (shifted_data^optimal_lambda - 1) / optimal_lambda
}
# Recalculate skewness
new_skewness <- skewness(boxcox_transformed)
print(paste("Skewness after Box-Cox:", new_skewness))
## [1] "Skewness after Box-Cox: 2.61540035739465"
dev_indicator_scaled <- boxcox_transformed
# Skewness after Box-Cox is 2.6154, a significant improvement over the value of 16.4182 before the transformation
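The skewness values above are single numbers computed over the whole matrix, so they mix indicators measured on very different scales. A per-indicator check could be sketched as follows. The helper `col_skewness` is hypothetical and uses the plain moment estimator, so its values may differ slightly from those of `e1071::skewness`; the toy matrix is made up.

```r
# Hypothetical helper: moment-based skewness of a numeric vector
col_skewness <- function(x) {
  x <- x[!is.na(x)]
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Toy matrix: one right-skewed column and one symmetric column (made-up values)
toy <- cbind(
  skewed    = c(1, 1, 2, 2, 3, 50),
  symmetric = c(-3, -2, -1, 1, 2, 3)
)

# Skewness of each column separately
per_column <- apply(toy, 2, col_skewness)

# Skewness of all values pooled into one vector, for comparison
pooled <- col_skewness(as.vector(toy))

print(per_column)
print(pooled)
```

Looking at each column separately shows which indicators actually drive the skew and would benefit most from a transformation.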
# After the transformation, we standardise the dataset for later visualisation and analysis, since the indicators are measured in different units
preproc1 <- preProcess(dev_indicator_scaled, method=c("center", "scale"))
dev_indicator_scaled <- predict(preproc1, dev_indicator_scaled)
summary(dev_indicator_scaled)
## Pop_Total Pop_Growth GNI_Atlas GNI_PC_Atlas
## Min. :-0.28717 Min. :-6.319170 Min. :-0.6281 Min. :-0.7676
## 1st Qu.:-0.27102 1st Qu.:-0.548228 1st Qu.:-0.5758 1st Qu.:-0.6659
## Median :-0.22459 Median :-0.024720 Median :-0.4418 Median :-0.4696
## Mean : 0.00000 Mean : 0.000097 Mean : 0.0000 Mean : 0.0000
## 3rd Qu.:-0.07827 3rd Qu.: 0.594478 3rd Qu.: 0.2631 3rd Qu.: 0.2440
## Max. : 8.59786 Max. : 3.504670 Max. : 5.7166 Max. : 3.8762
## GNI_PPP GNI_PC_PPP Immunization_Measles GDP_USD
## Min. :-0.7718 Min. :-1.0317 Min. :-4.849921 Min. :-0.6369
## 1st Qu.:-0.6617 1st Qu.:-0.8065 1st Qu.:-0.359419 1st Qu.:-0.5818
## Median :-0.4118 Median :-0.3409 Median : 0.337387 Median :-0.4172
## Mean : 0.0000 Mean : 0.0000 Mean : 0.000004 Mean : 0.0000
## 3rd Qu.: 0.3101 3rd Qu.: 0.5855 3rd Qu.: 0.724522 3rd Qu.: 0.2476
## Max. : 4.7507 Max. : 3.6179 Max. : 0.956727 Max. : 5.7752
## GDP_Growth Inflation_GDP Trade_GDP Net_Migration
## Min. :-5.390811 Min. :-0.45950 Min. :-1.457919 Min. :-7.374068
## 1st Qu.:-0.473880 1st Qu.:-0.14395 1st Qu.:-0.679260 1st Qu.:-0.043785
## Median :-0.047603 Median :-0.10157 Median :-0.259708 Median :-0.000925
## Mean :-0.000004 Mean : 0.00000 Mean : 0.000007 Mean : 0.000000
## 3rd Qu.: 0.471884 3rd Qu.:-0.05267 3rd Qu.: 0.467417 3rd Qu.: 0.048233
## Max. : 7.749156 Max. :12.52453 Max. : 3.724078 Max. : 6.025483
## FDI_Net_Inflow
## Min. :-12.65044
## 1st Qu.: 0.05736
## Median : 0.06047
## Mean : 0.00000
## 3rd Qu.: 0.07815
## Max. : 1.13976
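For reference, centring and scaling can also be sketched with base R's `scale()`, which subtracts each column's mean and divides by its standard deviation. The toy matrix below is made up; the assumption is that this matches what the `center` and `scale` methods do in the preprocessing step above.

```r
# Toy matrix with two columns on very different scales (made-up values)
toy <- cbind(
  a = c(1, 2, 3, 4, 10),
  b = c(100, 150, 130, 170, 90)
)

# Subtract the column mean and divide by the column standard deviation
z <- scale(toy)

# After standardisation each column has mean 0 and standard deviation 1
col_means <- colMeans(z)
col_sds <- apply(z, 2, sd)
print(round(col_means, 10))
print(col_sds)
```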
# eigenvalues on the basis of covariance
dev_indicator_cov<-cov(dev_indicator_scaled)
dev_indicator_eigen<-eigen(dev_indicator_cov)
dev_indicator_eigen$values
## [1] 4.0621987414 2.2740935541 1.1736705728 1.0376341701 0.9990853113
## [6] 0.9639206020 0.8866427185 0.7040669648 0.6069645077 0.2150877663
## [11] 0.0567933219 0.0189592896 0.0008824796
dev_indicator_eigen$vectors
## [,1] [,2] [,3] [,4] [,5] [,6]
## [1,] -0.25389073 0.46527157 -0.14160592 0.16640653 0.012282123 0.008331713
## [2,] 0.07581656 -0.07341531 0.54865682 0.35136885 -0.092836775 0.260278585
## [3,] -0.47030478 0.15666269 0.08666272 0.05649129 -0.005327823 0.074809943
## [4,] -0.32642961 -0.40834183 0.04319235 -0.04182367 -0.028807079 -0.024246871
## [5,] -0.44727734 0.23397424 0.01787922 0.05485852 -0.006667847 0.041531229
## [6,] -0.32781002 -0.42493098 -0.07063588 -0.06332984 0.014358986 -0.008700748
## [7,] -0.19936169 -0.17030843 -0.37912980 -0.43087279 0.092802687 0.093571413
## [8,] -0.47102174 0.15071286 0.08637868 0.05692553 -0.007674747 0.075638564
## [9,] 0.11991335 0.20363672 -0.18582484 0.04215446 0.286165826 0.773781336
## [10,] 0.03931900 0.08148468 0.02037995 -0.42952088 -0.797450591 0.352286313
## [11,] 0.02949844 -0.32504455 -0.42893813 0.27365541 0.097295351 0.299375002
## [12,] -0.11886659 -0.34818886 0.44271764 -0.03988126 0.109253949 0.310826100
## [13,] 0.04473088 0.15754235 0.31321659 -0.62068778 0.492167890 0.025345518
## [,7] [,8] [,9] [,10] [,11]
## [1,] -0.2509382208 -0.120563003 -0.03591862 0.76610459 -0.069727422
## [2,] -0.6507050323 0.007282050 0.22584134 -0.10987977 -0.029200830
## [3,] 0.1123030309 -0.054392187 0.05270728 -0.17786465 0.259419304
## [4,] -0.1960949633 0.110560965 -0.43845077 0.12708081 0.537733025
## [5,] 0.0700560560 -0.076587927 0.10826588 -0.33856586 -0.396568189
## [6,] -0.1836184235 0.007909346 -0.37913521 0.02554177 -0.635904190
## [7,] -0.3290772968 0.338802469 0.59782987 0.03676355 0.050900199
## [8,] 0.1142404969 -0.058929892 0.05128495 -0.18859231 0.230608817
## [9,] 0.0334378799 0.337659065 -0.32186238 -0.09196090 -0.004619413
## [10,] -0.0001495262 -0.197721748 -0.08071531 0.03347601 -0.011637699
## [11,] -0.0182714060 -0.704246599 0.16410900 -0.03772935 0.096998761
## [12,] 0.5158405938 0.039513872 0.29204441 0.43482374 -0.101180822
## [13,] -0.1871842765 -0.442050937 -0.12190426 -0.03089248 0.024807233
## [,12] [,13]
## [1,] 0.021882776 0.0067551508
## [2,] -0.024714291 -0.0016233071
## [3,] -0.333263951 -0.7138599281
## [4,] 0.415831087 0.0230268086
## [5,] 0.667536061 0.0103915516
## [6,] -0.345682455 -0.0254479831
## [7,] -0.007925706 0.0025505616
## [8,] -0.376810921 0.6993243446
## [9,] 0.003234677 0.0009112711
## [10,] 0.003165786 -0.0016333179
## [11,] 0.038042069 -0.0007455510
## [12,] 0.078997770 0.0019600998
## [13,] 0.008462948 0.0009886980
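The eigen-decomposition above and the `prcomp` results that follow describe the same components: the eigenvalues are the squared component standard deviations, and the eigenvectors equal the rotation matrix up to the sign of each column. A small sketch on the built-in `USArrests` data (used here only for illustration, not part of this report) shows the relationship.

```r
# Built-in dataset used purely for illustration
X <- scale(USArrests)        # centre and scale each column

eig <- eigen(cov(X))         # eigen-decomposition of the covariance matrix
pca <- prcomp(X)             # PCA computed via singular value decomposition

# Eigenvalues equal the squared standard deviations of the components
same_values <- all.equal(eig$values, pca$sdev^2)

# Eigenvectors match the rotation matrix up to the sign of each column
same_vectors <- all.equal(abs(eig$vectors), abs(unname(pca$rotation)))

print(same_values)
print(same_vectors)
```

The sign of a component is arbitrary, which is why some columns appear with flipped signs between the two outputs in this report.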
# PCA loadings, rotation
xxx <- dev_indicator_scaled
xxx.pca1 <- prcomp(xxx, center=TRUE, scale.=TRUE)
summary(xxx.pca1)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.0155 1.5080 1.08336 1.01864 0.99954 0.98179 0.9416
## Proportion of Variance 0.3125 0.1749 0.09028 0.07982 0.07685 0.07415 0.0682
## Cumulative Proportion 0.3125 0.4874 0.57769 0.65751 0.73436 0.80851 0.8767
## PC8 PC9 PC10 PC11 PC12 PC13
## Standard deviation 0.83909 0.77908 0.46378 0.23831 0.13769 0.02971
## Proportion of Variance 0.05416 0.04669 0.01655 0.00437 0.00146 0.00007
## Cumulative Proportion 0.93087 0.97756 0.99410 0.99847 0.99993 1.00000
xxx.pca1$rotation
## PC1 PC2 PC3 PC4
## Pop_Total 0.25389073 0.46527157 -0.14160592 -0.16640653
## Pop_Growth -0.07581656 -0.07341531 0.54865682 -0.35136885
## GNI_Atlas 0.47030478 0.15666269 0.08666272 -0.05649129
## GNI_PC_Atlas 0.32642961 -0.40834183 0.04319235 0.04182367
## GNI_PPP 0.44727734 0.23397424 0.01787922 -0.05485852
## GNI_PC_PPP 0.32781002 -0.42493098 -0.07063588 0.06332984
## Immunization_Measles 0.19936169 -0.17030843 -0.37912980 0.43087279
## GDP_USD 0.47102174 0.15071286 0.08637868 -0.05692553
## GDP_Growth -0.11991335 0.20363672 -0.18582484 -0.04215446
## Inflation_GDP -0.03931900 0.08148468 0.02037995 0.42952088
## Trade_GDP -0.02949844 -0.32504455 -0.42893813 -0.27365541
## Net_Migration 0.11886659 -0.34818886 0.44271764 0.03988126
## FDI_Net_Inflow -0.04473088 0.15754235 0.31321659 0.62068778
## PC5 PC6 PC7 PC8
## Pop_Total -0.012282123 -0.008331713 0.2509382208 -0.120563003
## Pop_Growth 0.092836775 -0.260278585 0.6507050323 0.007282050
## GNI_Atlas 0.005327823 -0.074809943 -0.1123030309 -0.054392187
## GNI_PC_Atlas 0.028807079 0.024246871 0.1960949633 0.110560965
## GNI_PPP 0.006667847 -0.041531229 -0.0700560560 -0.076587927
## GNI_PC_PPP -0.014358986 0.008700748 0.1836184235 0.007909346
## Immunization_Measles -0.092802687 -0.093571413 0.3290772968 0.338802469
## GDP_USD 0.007674747 -0.075638564 -0.1142404969 -0.058929892
## GDP_Growth -0.286165826 -0.773781336 -0.0334378799 0.337659065
## Inflation_GDP 0.797450591 -0.352286313 0.0001495262 -0.197721748
## Trade_GDP -0.097295351 -0.299375002 0.0182714060 -0.704246599
## Net_Migration -0.109253949 -0.310826100 -0.5158405938 0.039513872
## FDI_Net_Inflow -0.492167890 -0.025345518 0.1871842765 -0.442050937
## PC9 PC10 PC11 PC12
## Pop_Total -0.03591862 0.76610459 -0.069727422 -0.021882776
## Pop_Growth 0.22584134 -0.10987977 -0.029200830 0.024714291
## GNI_Atlas 0.05270728 -0.17786465 0.259419304 0.333263951
## GNI_PC_Atlas -0.43845077 0.12708081 0.537733025 -0.415831087
## GNI_PPP 0.10826588 -0.33856586 -0.396568189 -0.667536061
## GNI_PC_PPP -0.37913521 0.02554177 -0.635904190 0.345682455
## Immunization_Measles 0.59782987 0.03676355 0.050900199 0.007925706
## GDP_USD 0.05128495 -0.18859231 0.230608817 0.376810921
## GDP_Growth -0.32186238 -0.09196090 -0.004619413 -0.003234677
## Inflation_GDP -0.08071531 0.03347601 -0.011637699 -0.003165786
## Trade_GDP 0.16410900 -0.03772935 0.096998761 -0.038042069
## Net_Migration 0.29204441 0.43482374 -0.101180822 -0.078997770
## FDI_Net_Inflow -0.12190426 -0.03089248 0.024807233 -0.008462948
## PC13
## Pop_Total -0.0067551508
## Pop_Growth 0.0016233071
## GNI_Atlas 0.7138599281
## GNI_PC_Atlas -0.0230268086
## GNI_PPP -0.0103915516
## GNI_PC_PPP 0.0254479831
## Immunization_Measles -0.0025505616
## GDP_USD -0.6993243446
## GDP_Growth -0.0009112711
## Inflation_GDP 0.0016333179
## Trade_GDP 0.0007455510
## Net_Migration -0.0019600998
## FDI_Net_Inflow -0.0009886980
xxx.pca2 <- princomp(xxx)
loadings(xxx.pca2)
##
## Loadings:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## Pop_Total 0.254 0.465 0.142 0.166 0.251 0.121
## Pop_Growth -0.549 0.351 0.260 0.651
## GNI_Atlas 0.470 0.157 -0.112
## GNI_PC_Atlas 0.326 -0.408 0.196 -0.111
## GNI_PPP 0.447 0.234
## GNI_PC_PPP 0.328 -0.425 0.184
## Immunization_Measles 0.199 -0.170 0.379 -0.431 0.329 -0.339
## GDP_USD 0.471 0.151 -0.114
## GDP_Growth -0.120 0.204 0.186 0.286 0.774 -0.338
## Inflation_GDP -0.430 -0.797 0.352 0.198
## Trade_GDP -0.325 0.429 0.274 0.299 0.704
## Net_Migration 0.119 -0.348 -0.443 0.109 0.311 -0.516
## FDI_Net_Inflow 0.158 -0.313 -0.621 0.492 0.187 0.442
## Comp.9 Comp.10 Comp.11 Comp.12 Comp.13
## Pop_Total 0.766
## Pop_Growth -0.226 -0.110
## GNI_Atlas -0.178 -0.259 -0.333 -0.714
## GNI_PC_Atlas 0.438 0.127 -0.538 0.416
## GNI_PPP -0.108 -0.339 0.397 0.668
## GNI_PC_PPP 0.379 0.636 -0.346
## Immunization_Measles -0.598
## GDP_USD -0.189 -0.231 -0.377 0.699
## GDP_Growth 0.322
## Inflation_GDP
## Trade_GDP -0.164
## Net_Migration -0.292 0.435 0.101
## FDI_Net_Inflow 0.122
##
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000 1.000
## Proportion Var 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077 0.077
## Cumulative Var 0.077 0.154 0.231 0.308 0.385 0.462 0.538 0.615 0.692
## Comp.10 Comp.11 Comp.12 Comp.13
## SS loadings 1.000 1.000 1.000 1.000
## Proportion Var 0.077 0.077 0.077 0.077
## Cumulative Var 0.769 0.846 0.923 1.000
summary(xxx.pca2)
## Importance of components:
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## Standard deviation 2.0093729 1.5034331 1.08007288 1.01555181 0.99650903
## Proportion of Variance 0.3124768 0.1749303 0.09028235 0.07981801 0.07685272
## Cumulative Proportion 0.3124768 0.4874071 0.57768945 0.65750746 0.73436018
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## Standard deviation 0.97881493 0.93875935 0.8365404 0.77671484 0.46236804
## Proportion of Variance 0.07414774 0.06820329 0.0541590 0.04668958 0.01654521
## Cumulative Proportion 0.80850792 0.87671121 0.9308702 0.97755978 0.99410499
## Comp.11 Comp.12 Comp.13
## Standard deviation 0.237590235 0.137274851 2.961640e-02
## Proportion of Variance 0.004368717 0.001458407 6.788304e-05
## Cumulative Proportion 0.998473710 0.999932117 1.000000e+00
# We aim to explain over 90% of the variance. Based on the results, retaining 8 components explains 93.087% of the variance
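The choice of 8 components can be derived programmatically from the eigenvalues printed earlier. The sketch below copies those eigenvalues by hand; the 90% threshold is the target stated in this report, and the variable names are illustrative.

```r
# Eigenvalues copied from the covariance eigen-decomposition output above
eigenvalues <- c(4.0621987414, 2.2740935541, 1.1736705728, 1.0376341701,
                 0.9990853113, 0.9639206020, 0.8866427185, 0.7040669648,
                 0.6069645077, 0.2150877663, 0.0567933219, 0.0189592896,
                 0.0008824796)

# Cumulative share of the total variance explained by the first k components
cum_prop <- cumsum(eigenvalues) / sum(eigenvalues)

# Smallest number of components whose cumulative share reaches the threshold
threshold <- 0.90
n_components <- which(cum_prop >= threshold)[1]

print(round(cum_prop, 4))
print(n_components)
```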
# Visualisation
# PCA variance visualisation and variable relations
plot(xxx.pca2)# xxx.pca1 has the same result
fviz_pca_var(xxx.pca1, col.var="steelblue")# Corr plot, for xxx.pca2 looks similar
autoplot(xxx.pca1, loadings=TRUE, loadings.colour='blue', loadings.label=TRUE, loadings.label.size=3)
# The graph indicates that GNI_Atlas and GDP_USD are closely related: the angle between their arrows is almost 0
# GNI_PC_Atlas and GNI_PC_PPP are closely related: the angle is almost 0
# GDP_Growth and Inflation_GDP point in similar directions: the angle is almost 0
# GDP_Growth and Inflation_GDP point almost opposite to GNI_PC_Atlas and GNI_PC_PPP, which suggests a negative relationship
# Visualisation of the eigenvalues
fviz_eig(xxx.pca1, choice='eigenvalue') # eigenvalues on y-axis
fviz_eig(xxx.pca1) # percentage of explained variance on y-axis
eig.val<-get_eigenvalue(xxx.pca1)
eig.val
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 4.0621987414 31.247682626 31.24768
## Dim.2 2.2740935541 17.493027339 48.74071
## Dim.3 1.1736705728 9.028235175 57.76895
## Dim.4 1.0376341701 7.981801308 65.75075
## Dim.5 0.9990853113 7.685271625 73.43602
## Dim.6 0.9639206020 7.414773862 80.85079
## Dim.7 0.8866427185 6.820328604 87.67112
## Dim.8 0.7040669648 5.415899729 93.08702
## Dim.9 0.6069645077 4.668957751 97.75598
## Dim.10 0.2150877663 1.654521279 99.41050
## Dim.11 0.0567933219 0.436871707 99.84737
## Dim.12 0.0189592896 0.145840689 99.99321
## Dim.13 0.0008824796 0.006788304 100.00000
# cumulative variance
a<-summary(xxx.pca1)
plot(a$importance[3,],type="l")
# contributions of individual variables to PCA
pca_loadings <- xxx.pca1
cat("PCA Loadings:\n")
## PCA Loadings:
pca_loadings
## Standard deviations (1, .., p=13):
## [1] 2.01548970 1.50800980 1.08336078 1.01864330 0.99954255 0.98179458
## [7] 0.94161708 0.83908698 0.77907927 0.46377556 0.23831349 0.13769274
## [13] 0.02970656
##
## Rotation (n x k) = (13 x 13):
## PC1 PC2 PC3 PC4
## Pop_Total 0.25389073 0.46527157 -0.14160592 -0.16640653
## Pop_Growth -0.07581656 -0.07341531 0.54865682 -0.35136885
## GNI_Atlas 0.47030478 0.15666269 0.08666272 -0.05649129
## GNI_PC_Atlas 0.32642961 -0.40834183 0.04319235 0.04182367
## GNI_PPP 0.44727734 0.23397424 0.01787922 -0.05485852
## GNI_PC_PPP 0.32781002 -0.42493098 -0.07063588 0.06332984
## Immunization_Measles 0.19936169 -0.17030843 -0.37912980 0.43087279
## GDP_USD 0.47102174 0.15071286 0.08637868 -0.05692553
## GDP_Growth -0.11991335 0.20363672 -0.18582484 -0.04215446
## Inflation_GDP -0.03931900 0.08148468 0.02037995 0.42952088
## Trade_GDP -0.02949844 -0.32504455 -0.42893813 -0.27365541
## Net_Migration 0.11886659 -0.34818886 0.44271764 0.03988126
## FDI_Net_Inflow -0.04473088 0.15754235 0.31321659 0.62068778
## PC5 PC6 PC7 PC8
## Pop_Total -0.012282123 -0.008331713 0.2509382208 -0.120563003
## Pop_Growth 0.092836775 -0.260278585 0.6507050323 0.007282050
## GNI_Atlas 0.005327823 -0.074809943 -0.1123030309 -0.054392187
## GNI_PC_Atlas 0.028807079 0.024246871 0.1960949633 0.110560965
## GNI_PPP 0.006667847 -0.041531229 -0.0700560560 -0.076587927
## GNI_PC_PPP -0.014358986 0.008700748 0.1836184235 0.007909346
## Immunization_Measles -0.092802687 -0.093571413 0.3290772968 0.338802469
## GDP_USD 0.007674747 -0.075638564 -0.1142404969 -0.058929892
## GDP_Growth -0.286165826 -0.773781336 -0.0334378799 0.337659065
## Inflation_GDP 0.797450591 -0.352286313 0.0001495262 -0.197721748
## Trade_GDP -0.097295351 -0.299375002 0.0182714060 -0.704246599
## Net_Migration -0.109253949 -0.310826100 -0.5158405938 0.039513872
## FDI_Net_Inflow -0.492167890 -0.025345518 0.1871842765 -0.442050937
## PC9 PC10 PC11 PC12
## Pop_Total -0.03591862 0.76610459 -0.069727422 -0.021882776
## Pop_Growth 0.22584134 -0.10987977 -0.029200830 0.024714291
## GNI_Atlas 0.05270728 -0.17786465 0.259419304 0.333263951
## GNI_PC_Atlas -0.43845077 0.12708081 0.537733025 -0.415831087
## GNI_PPP 0.10826588 -0.33856586 -0.396568189 -0.667536061
## GNI_PC_PPP -0.37913521 0.02554177 -0.635904190 0.345682455
## Immunization_Measles 0.59782987 0.03676355 0.050900199 0.007925706
## GDP_USD 0.05128495 -0.18859231 0.230608817 0.376810921
## GDP_Growth -0.32186238 -0.09196090 -0.004619413 -0.003234677
## Inflation_GDP -0.08071531 0.03347601 -0.011637699 -0.003165786
## Trade_GDP 0.16410900 -0.03772935 0.096998761 -0.038042069
## Net_Migration 0.29204441 0.43482374 -0.101180822 -0.078997770
## FDI_Net_Inflow -0.12190426 -0.03089248 0.024807233 -0.008462948
## PC13
## Pop_Total -0.0067551508
## Pop_Growth 0.0016233071
## GNI_Atlas 0.7138599281
## GNI_PC_Atlas -0.0230268086
## GNI_PPP -0.0103915516
## GNI_PC_PPP 0.0254479831
## Immunization_Measles -0.0025505616
## GDP_USD -0.6993243446
## GDP_Growth -0.0009112711
## Inflation_GDP 0.0016333179
## Trade_GDP 0.0007455510
## Net_Migration -0.0019600998
## FDI_Net_Inflow -0.0009886980
# Loading visualisation
var<-get_pca_var(xxx.pca1)
a<-fviz_contrib(xxx.pca1, "var", axes=1, xtickslab.rt=90) # default angle=45°
b<-fviz_contrib(xxx.pca1, "var", axes=2, xtickslab.rt=90)
c<-fviz_contrib(xxx.pca1, "var", axes=3, xtickslab.rt=90)
d<-fviz_contrib(xxx.pca1, "var", axes=4, xtickslab.rt=90)
e<-fviz_contrib(xxx.pca1, "var", axes=5, xtickslab.rt=90)
f<-fviz_contrib(xxx.pca1, "var", axes=6, xtickslab.rt=90)
g<-fviz_contrib(xxx.pca1, "var", axes=7, xtickslab.rt=90)
h<-fviz_contrib(xxx.pca1, "var", axes=8, xtickslab.rt=90)
grid.arrange(a,b,top='Contribution to the first 8 Principal Components')
grid.arrange(c,d,top='Contribution to the first 8 Principal Components')
grid.arrange(e,f,top='Contribution to the first 8 Principal Components')
grid.arrange(g,h,top='Contribution to the first 8 Principal Components')
Based on the PCA results, we can interpret the contributions of the indicators to each principal component (PC). The table below shows which indicators are most influential for each principal component. This helps us understand the key drivers of development as represented by each PC.
Principal Component Contribution Breakdown:
| PC Number | Key Indicators Contributing Most |
|---|---|
| PC1 | GDP_USD, GNI_Atlas, GNI_PPP, GNI_PC_PPP, GNI_PC_Atlas |
| PC2 | Pop_Total, GNI_PC_PPP, GNI_PC_Atlas, Net_Migration, Trade_GDP |
| PC3 | Pop_Growth, Net_Migration, Trade_GDP, Immunization_Measles, FDI_Net_Inflow |
| PC4 | FDI_Net_Inflow, Immunization_Measles, Inflation_GDP, Pop_Growth |
| PC5 | Inflation_GDP, FDI_Net_Inflow, GDP_Growth |
| PC6 | GDP_Growth, Inflation_GDP, Net_Migration, Trade_GDP |
| PC7 | Pop_Growth, Net_Migration, Immunization_Measles |
| PC8 | Trade_GDP, FDI_Net_Inflow, Immunization_Measles, GDP_Growth |
These interpretations help to summarize the most impactful development factors as captured by PCA, revealing the complex relationship between economic, demographic, and social factors across countries.
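The ranking in the table above can be reproduced from the loadings. The sketch below assumes that a variable's contribution to a component is its squared loading expressed as a percentage of the component's total, and that the reference level for an "important" variable is the uniform share of 100/13 percent. The PC1 loadings are copied from the `rotation` output earlier in this report.

```r
# PC1 loadings copied from the prcomp rotation output above
pc1 <- c(Pop_Total = 0.25389073, Pop_Growth = -0.07581656,
         GNI_Atlas = 0.47030478, GNI_PC_Atlas = 0.32642961,
         GNI_PPP = 0.44727734, GNI_PC_PPP = 0.32781002,
         Immunization_Measles = 0.19936169, GDP_USD = 0.47102174,
         GDP_Growth = -0.11991335, Inflation_GDP = -0.03931900,
         Trade_GDP = -0.02949844, Net_Migration = 0.11886659,
         FDI_Net_Inflow = -0.04473088)

# Contribution of each variable to PC1, in percent (assumed definition)
contrib <- 100 * pc1^2 / sum(pc1^2)

# Uniform share: the contribution each variable would have if all were equal
expected_avg <- 100 / length(pc1)

# Variables ranked by contribution, and those above the uniform share
top5 <- names(sort(contrib, decreasing = TRUE))[1:5]
above_avg <- names(contrib)[contrib > expected_avg]

print(round(sort(contrib, decreasing = TRUE), 2))
print(top5)
```

Under these assumptions the five variables above the uniform share are the same five listed for PC1 in the table.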
# RMSR (root mean square of the residuals) for 8 components: 0.04. This indicates a good fit of our model
xxx.pca4<-principal(xxx, nfactors=8, rotate="varimax")
xxx.pca4
## Principal Components Analysis
## Call: principal(r = xxx, nfactors = 8, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
## RC1 RC3 RC2 RC6 RC7 RC8 RC4 RC5 h2
## Pop_Total 0.76 -0.10 -0.52 0.12 0.04 -0.05 0.04 -0.02 0.87
## Pop_Growth -0.10 -0.04 0.04 0.02 0.98 -0.02 0.01 0.01 0.97
## GNI_Atlas 0.96 0.20 0.14 -0.07 -0.05 -0.05 -0.02 -0.01 0.99
## GNI_PC_Atlas 0.26 0.74 0.32 -0.29 0.15 0.11 -0.13 -0.07 0.86
## GNI_PPP 0.96 0.15 0.00 -0.04 -0.08 -0.06 0.00 0.00 0.95
## GNI_PC_PPP 0.26 0.77 0.28 -0.28 0.07 0.23 -0.10 -0.08 0.89
## Immunization_Measles 0.08 0.83 -0.14 0.15 -0.19 -0.04 0.07 0.05 0.78
## GDP_USD 0.96 0.20 0.14 -0.07 -0.05 -0.05 -0.02 -0.01 0.98
## GDP_Growth -0.04 -0.10 -0.02 0.96 0.02 0.03 0.02 0.01 0.94
## Inflation_GDP -0.02 -0.03 -0.02 0.01 0.01 -0.03 0.00 1.00 1.00
## Trade_GDP -0.14 0.11 0.05 0.03 -0.03 0.97 -0.09 -0.03 0.98
## Net_Migration 0.07 0.10 0.94 0.00 0.04 0.04 0.01 -0.02 0.91
## FDI_Net_Inflow -0.01 -0.04 0.00 0.03 0.01 -0.08 0.99 0.00 0.99
## u2 com
## Pop_Total 0.1273 1.9
## Pop_Growth 0.0336 1.0
## GNI_Atlas 0.0149 1.2
## GNI_PC_Atlas 0.1399 2.3
## GNI_PPP 0.0491 1.1
## GNI_PC_PPP 0.1126 2.1
## Immunization_Measles 0.2174 1.3
## GDP_USD 0.0154 1.2
## GDP_Growth 0.0647 1.0
## Inflation_GDP 0.0042 1.0
## Trade_GDP 0.0172 1.1
## Net_Migration 0.0931 1.0
## FDI_Net_Inflow 0.0093 1.0
##
## RC1 RC3 RC2 RC6 RC7 RC8 RC4 RC5
## SS loadings 3.50 1.98 1.40 1.13 1.03 1.03 1.03 1.01
## Proportion Var 0.27 0.15 0.11 0.09 0.08 0.08 0.08 0.08
## Cumulative Var 0.27 0.42 0.53 0.62 0.70 0.77 0.85 0.93
## Proportion Explained 0.29 0.16 0.12 0.09 0.09 0.08 0.08 0.08
## Cumulative Proportion 0.29 0.45 0.57 0.66 0.75 0.83 0.92 1.00
##
## Mean item complexity = 1.3
## Test of the hypothesis that 8 components are sufficient.
##
## The root mean square of the residuals (RMSR) is 0.04
## with the empirical chi square 50.37 with prob < 1.2e-11
##
## Fit based upon off diagonal values = 0.98
summary(xxx.pca4)
##
## Factor analysis with Call: principal(r = xxx, nfactors = 8, rotate = "varimax")
##
## Test of the hypothesis that 8 factors are sufficient.
## The degrees of freedom for the model is 2 and the objective function was 5
## The number of observations was 165 with Chi Square = 766.99 with prob < 2.8e-167
##
## The root mean square of the residuals (RMSA) is 0.04
# printing only the significant loadings
print(loadings(xxx.pca4), digits=3, cutoff=0.4, sort=TRUE)
##
## Loadings:
## RC1 RC3 RC2 RC6 RC7 RC8 RC4 RC5
## Pop_Total 0.757 -0.518
## GNI_Atlas 0.957
## GNI_PPP 0.958
## GDP_USD 0.956
## GNI_PC_Atlas 0.738
## GNI_PC_PPP 0.768
## Immunization_Measles 0.832
## Net_Migration 0.942
## GDP_Growth 0.960
## Pop_Growth 0.976
## Trade_GDP 0.969
## FDI_Net_Inflow 0.991
## Inflation_GDP 0.997
##
## RC1 RC3 RC2 RC6 RC7 RC8 RC4 RC5
## SS loadings 3.500 1.975 1.401 1.132 1.031 1.026 1.025 1.010
## Proportion Var 0.269 0.152 0.108 0.087 0.079 0.079 0.079 0.078
## Cumulative Var 0.269 0.421 0.529 0.616 0.695 0.774 0.853 0.931
# PCA reconstruction error: 0.06871083. This indicates a good fit of our model
reconstructed_pca <- xxx.pca1$x[,1:8] %*% t(xxx.pca1$rotation[,1:8])
pca_reconstruction_error <- mean((dev_indicator_scaled - reconstructed_pca)^2)
pca_reconstruction_error
## [1] 0.06871083
# Print Comparison Results
explained_variance <- summary(xxx.pca1)$importance[3, 8]
explained_variance <- paste0(round(explained_variance * 100, 2), "%")
cat("PCA Explained Variance (8 PCs):", explained_variance, "\n")
## PCA Explained Variance (8 PCs): 93.09%
cat("PCA Reconstruction Error:", pca_reconstruction_error, "\n")
## PCA Reconstruction Error: 0.06871083
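The reconstruction error is tied directly to the eigenvalues of the discarded components. For centred data, the total squared residual after keeping k components is (n - 1) times the sum of the discarded eigenvalues, so dividing by the n × p matrix entries gives the mean squared error. The sketch below checks this with the eigenvalues printed earlier, using n = 165 countries, p = 13 indicators and k = 8 components from this report.

```r
# Eigenvalues copied from the covariance eigen-decomposition output above
eigenvalues <- c(4.0621987414, 2.2740935541, 1.1736705728, 1.0376341701,
                 0.9990853113, 0.9639206020, 0.8866427185, 0.7040669648,
                 0.6069645077, 0.2150877663, 0.0567933219, 0.0189592896,
                 0.0008824796)

n <- 165  # number of countries
p <- 13   # number of indicators
k <- 8    # number of retained components

# Variance carried by the components that were dropped
discarded <- sum(eigenvalues[(k + 1):p])

# Mean squared reconstruction error implied by the discarded eigenvalues
mse <- discarded * (n - 1) / (n * p)
print(mse)
```

The result agrees with the reconstruction error computed from the data above, so the error is simply the unexplained variance of about 7% spread over all matrix entries.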
# Compute the distance matrix based on the scaled data
dist_matrix <- dist(dev_indicator_scaled)
# Apply MDS using cmdscale (Classical MDS)
mds_result <- cmdscale(dist_matrix, k=2, eig=TRUE, x.ret=TRUE)
# View MDS result
mds_result$eig # Eigenvalues (used to assess how much of the structure each dimension captures)
## [1] 6.662006e+02 3.729513e+02 1.924820e+02 1.701720e+02 1.638500e+02
## [6] 1.580830e+02 1.454094e+02 1.154670e+02 9.954218e+01 3.527439e+01
## [11] 9.314105e+00 3.109323e+00 1.447266e-01 7.885353e-13 1.499981e-13
## [16] 7.010117e-14 3.095465e-14 2.611694e-14 2.577339e-14 2.134075e-14
## [21] 1.889772e-14 1.521166e-14 1.510201e-14 1.401122e-14 1.245441e-14
## [26] 1.212355e-14 1.189712e-14 1.046888e-14 1.042632e-14 8.834358e-15
## [31] 7.846946e-15 7.200908e-15 6.842846e-15 6.687045e-15 5.982195e-15
## [36] 5.708233e-15 5.503582e-15 5.487744e-15 5.420435e-15 5.041465e-15
## [41] 4.863760e-15 4.861311e-15 4.493075e-15 4.402926e-15 4.197475e-15
## [46] 4.010273e-15 3.999657e-15 3.988785e-15 3.599212e-15 3.400358e-15
## [51] 3.360455e-15 3.244477e-15 2.884666e-15 2.869712e-15 2.841133e-15
## [56] 2.818166e-15 2.687535e-15 2.671438e-15 2.505267e-15 2.496544e-15
## [61] 2.463576e-15 2.176349e-15 1.919128e-15 1.789885e-15 1.775034e-15
## [66] 1.770791e-15 1.723108e-15 1.715451e-15 1.593330e-15 1.572962e-15
## [71] 1.428685e-15 1.365792e-15 1.292620e-15 1.039485e-15 9.848296e-16
## [76] 9.456518e-16 9.418397e-16 9.289444e-16 8.792492e-16 8.688674e-16
## [81] 8.296682e-16 7.813462e-16 5.239830e-16 4.610904e-16 4.263342e-16
## [86] 3.826263e-16 3.289476e-16 3.069959e-16 2.442877e-16 1.844797e-16
## [91] 1.311333e-16 5.037430e-17 -1.356786e-17 -7.989673e-17 -1.874778e-16
## [96] -1.907325e-16 -2.607337e-16 -2.711674e-16 -2.847218e-16 -2.957561e-16
## [101] -3.485657e-16 -3.515611e-16 -3.992663e-16 -5.822304e-16 -6.419330e-16
## [106] -7.676274e-16 -7.692225e-16 -9.119656e-16 -9.299824e-16 -9.865198e-16
## [111] -1.140675e-15 -1.164634e-15 -1.193398e-15 -1.197124e-15 -1.217231e-15
## [116] -1.284052e-15 -1.299112e-15 -1.352693e-15 -1.483887e-15 -1.906473e-15
## [121] -1.976361e-15 -1.990628e-15 -2.066404e-15 -2.102117e-15 -2.195415e-15
## [126] -2.256268e-15 -2.271737e-15 -2.316848e-15 -2.433527e-15 -2.700578e-15
## [131] -2.720232e-15 -2.796867e-15 -2.827160e-15 -2.887901e-15 -2.981860e-15
## [136] -3.219183e-15 -3.254504e-15 -3.391862e-15 -3.970319e-15 -4.052037e-15
## [141] -4.052990e-15 -4.359873e-15 -4.635374e-15 -5.064067e-15 -5.430382e-15
## [146] -5.765813e-15 -6.495861e-15 -7.141185e-15 -7.329281e-15 -7.721392e-15
## [151] -1.065468e-14 -1.087529e-14 -1.136229e-14 -1.459384e-14 -1.541271e-14
## [156] -1.669482e-14 -1.751121e-14 -2.210431e-14 -2.228052e-14 -2.325372e-14
## [161] -2.836779e-14 -3.329816e-14 -4.621098e-14 -1.562906e-13 -8.016612e-13
head(mds_result$points) # Coordinates in the reduced 2D space
## [,1] [,2]
## [1,] 1.05503298 -0.1974985
## [2,] -0.05226467 -0.6729309
## [3,] 1.52338067 -0.7613413
## [4,] 0.78585930 0.4068552
## [5,] -1.02740612 -0.9157622
## [6,] 1.06883764 0.2327346
# --- Calculate Stress Manually ---
# Calculate the original distance matrix (before MDS)
original_dist_matrix <- dist(dev_indicator_scaled)
# Calculate the distances between the MDS points
mds_dist_matrix <- dist(mds_result$points)
# Calculate the stress
stress <- sqrt(sum((original_dist_matrix - mds_dist_matrix)^2) / sum(original_dist_matrix^2))
# Print the stress value
cat("Stress of the MDS configuration:", stress, "\n")
## Stress of the MDS configuration: 0.4475294
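Classical MDS on Euclidean distances is closely related to PCA, which offers a second way to judge the two-dimensional configuration. The sketch below uses the 13 non-negligible MDS eigenvalues printed above (the rest are numerically zero) to compute the share captured by two dimensions, and rescales them by n - 1 = 164 to compare with the PCA eigenvalues.

```r
# Non-negligible eigenvalues copied from the mds_result$eig output above
mds_eig <- c(666.2006, 372.9513, 192.4820, 170.1720, 163.8500, 158.0830,
             145.4094, 115.4670, 99.54218, 35.27439, 9.314105, 3.109323,
             0.1447266)

# Share of the total captured by the first two MDS dimensions
gof_2d <- sum(mds_eig[1:2]) / sum(mds_eig)

# Dividing by (n - 1) recovers the PCA eigenvalues for n = 165 countries
pca_eig <- mds_eig / (165 - 1)

print(round(gof_2d, 4))
print(round(pca_eig[1:3], 4))
```

Two dimensions capture about 49% of the total, the same as the cumulative variance of the first two principal components, which is consistent with the high stress value reported above.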
# Visualize MDS results using a 2D scatter plot
mds_data <- data.frame(mds_result$points) # MDS coordinates
colnames(mds_data) <- c("Dimension1", "Dimension2") # Name the dimensions for clarity
ggplot(mds_data, aes(x=Dimension1, y=Dimension2)) +
geom_point(size=3, color="blue") +
labs(title="MDS Visualization of Development Indicators",
x="Dimension 1", y="Dimension 2") +
theme_minimal()
# Compute the correlation matrix
cor_matrix <- cor(dev_indicator_scaled)
# Create the correlation plot with circular markers and color gradients
corrplot(cor_matrix,
method = "circle", # Use circle as the method
type = "full", # Show the full correlation matrix
order = "hclust", # Cluster the variables based on similarity
col = colorRampPalette(c("blue", "white", "red"))(200), # Color gradient
tl.col = "red", # Set text label color to red
tl.cex = 0.8, # Adjust text label size
addCoef.col = "black", # Add correlation coefficient text in black
number.cex = 0.7, # Adjust the size of correlation coefficient text
mar = c(0, 0, 1, 0) # Adjust margins for better visualization
)
# Based on the graph, Pop_Total, GNI_PPP, GNI_Atlas and GDP_USD are strongly correlated, as are GNI_PC_Atlas and GNI_PC_PPP. Further analysis of these indicators could show whether the set of World Development Indicators can be reduced. However, this is not the main focus of this report, so we do not analyse these indicators further
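Strongly correlated pairs can also be listed programmatically rather than read off the plot. The sketch below uses a made-up toy matrix and a cutoff of 0.9; both the data and the threshold are assumptions for illustration.

```r
# Toy data: x2 closely tracks x1, x3 is unrelated (made-up values)
toy <- cbind(
  x1 = 1:10,
  x2 = c(1.1, 2.0, 2.9, 4.2, 5.1, 5.9, 7.0, 8.1, 9.0, 9.9),
  x3 = c(5, 1, 4, 2, 6, 3, 5, 1, 4, 2)
)

r <- cor(toy)

# Blank out the diagonal and upper triangle so each pair is listed once
r[upper.tri(r, diag = TRUE)] <- NA

# Positions of correlations whose absolute value exceeds the cutoff
idx <- which(abs(r) > 0.9, arr.ind = TRUE)

high_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)
print(high_pairs)
```

Applied to the correlation matrix of the indicators, a listing like this would give a shortlist of candidates for removal or merging.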
This report focuses on dimensionality reduction of the World Development Indicators 2023, to identify the indicators that best represent the level of development.
| code_name | Key Indicators |
|---|---|
| Pop_Total | Population, total [SP.POP.TOTL] |
| Pop_Growth | Population growth (annual %) [SP.POP.GROW] |
| GNI_Atlas | GNI, Atlas method (current US$) [NY.GNP.ATLS.CD] |
| GNI_PC_Atlas | GNI per capita, Atlas method |
| GNI_PPP | GNI, PPP (current international $) [NY.GNP.MKTP.PP.CD] |
| GDP_USD | GDP (current US$) [NY.GDP.MKTP.CD] |
| GDP_Growth | GDP growth (annual %) [NY.GDP.MKTP.KD.ZG] |
| Inflation_GDP | Inflation, GDP deflator (annual %) [NY.GDP.DEFL.KD.ZG] |
| Trade_GDP | Merchandise trade (% of GDP) [TG.VAL.TOTL.GD.ZS] |
| Net_Migration | Net migration [SM.POP.NETM] |
| FDI_Net_Inflow | Foreign direct investment, net inflows (BoP, current US$) [BX.KLT.DINV.CD.WD] |
| GNI_PC_PPP | GNI per capita, PPP (current international $) [NY.GNP.PCAP.PP.CD] |
| Immunization_Measles | Immunization, measles (% of children ages 12-23 months) [SH.IMM.MEAS] |