1 Dataset description

1.1 World Development Indicators

World Development Indicators provides measures of social progress, quality of life, economic development, physical infrastructure, environment and government performance.

World Development Indicators (WDI) is the primary World Bank collection of development indicators, compiled from officially recognized international sources. It presents the most current and accurate global development data available, and includes national, regional and global estimates.

Note: Even though Global Development Finance (GDF) is no longer listed in the WDI database name, all external debt and financial flows data continue to be included in WDI. The GDF publication has been renamed International Debt Statistics (IDS), and has its own separate database, as well.

Last updated: 12/16/2024

Dataset source: https://databank.worldbank.org/source/world-development-indicators/Type/TABLE/preview/on

1.2 Dataset used in this report

The dataset used here is a subset of the World Development Indicators covering multiple indicators, including population growth, GNI, GDP, and inflation.

This report focuses on data for the year 2023. Some indicators had missing values for some countries. All countries with missing values were removed, and indicators with no published values were removed.

Filling missing values with estimates (mean, median, mode, or regression imputation) would not represent the true values for these countries, because the countries have very different economic conditions. Using such artificial data would not give us an accurate interpretation. Hence, the countries and indicators with empty values were removed.
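The removal step described above can be sketched in base R as follows. This is a minimal illustration on a made-up data frame (`toy` and its values are hypothetical, not the actual WDI export), and it assumes the order "drop empty indicators first, then drop incomplete countries".

```r
# Toy data frame standing in for the raw WDI export (values are made up).
toy <- data.frame(
  Country    = c("A", "B", "C", "D"),
  Pop_Growth = c(1.2, NA, 0.4, 2.1),
  GDP_Growth = c(3.0, 2.5, NA, 1.1),
  Trade_GDP  = c(NA, NA, NA, NA),   # indicator with no published values
  stringsAsFactors = FALSE
)

# 1. Drop indicators (columns) that have no published value at all.
all_missing <- vapply(toy, function(col) all(is.na(col)), logical(1))
toy <- toy[, !all_missing, drop = FALSE]

# 2. Drop countries (rows) that still have at least one missing value.
toy_complete <- toy[complete.cases(toy), , drop = FALSE]

print(toy_complete)
```

The order matters in this sketch: dropping incomplete rows before the empty column would remove every country, because each row has a missing value in the empty indicator.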

1.3 Aim and methods

The aim of this report is to identify the main indicators that represent the development of countries.

Our initial approach is to use PCA to determine the smallest number of dimensions that captures most of the variance.

Some of the indicators may be related or similar to each other. We will apply MDS to find indicators with similarities. (We may then use only one of each group of similar indicators to represent world development.)

Hence, we apply PCA to the dataset, interpret the results, and assess how well PCA fits this dataset.
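To illustrate the MDS idea mentioned above, the sketch below places indicators (not countries) on a map so that strongly correlated indicators sit close together. It uses base R's classical MDS (`cmdscale`) rather than the `smacof` package loaded later, and both the simulated data and the dissimilarity `1 - |r|` are assumptions made for illustration.

```r
set.seed(1)
n <- 100
size   <- rnorm(n)   # hypothetical latent "economy size"
income <- rnorm(n)   # hypothetical latent "income level"
toy <- cbind(
  GDP       = size   + rnorm(n, sd = 0.1),
  GNI       = size   + rnorm(n, sd = 0.1),
  GNI_PC    = income + rnorm(n, sd = 0.1),
  Inflation = rnorm(n)
)

# Dissimilarity between indicators: near 0 when strongly correlated,
# near 1 when uncorrelated.
d <- as.dist(1 - abs(cor(toy)))

# Classical (metric) MDS into two dimensions; one point per indicator.
coords <- cmdscale(d, k = 2)
rownames(coords) <- colnames(toy)
print(round(coords, 2))
```

In this toy setup GDP and GNI should land almost on top of each other, which is the kind of pattern that would justify keeping only one of two near-duplicate indicators.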

2 Load packages and libraries

options(repos = c(CRAN = "https://cloud.r-project.org"))
install.packages("caret")
install.packages("factoextra")
install.packages("clusterSim")
install.packages("corrplot")
install.packages("psych")
install.packages("smacof")
install.packages("ClusterR")
install.packages("gridExtra")
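install.packages("e1071")    # provides skewness(), loaded below via library(e1071)
install.packages("reshape2") # provides melt()
install.packages("GGally")   # provides ggpairs()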
install.packages("ggfortify")
library(ggfortify)
library(MASS)
library(e1071)
library(caret)
library(factoextra)
library(clusterSim)
library(corrplot)
library(psych)
library(smacof)
library(ClusterR)
library(gridExtra)
library(ggplot2)
library(reshape2)
library(GGally)

3 Pre-Processing/Pre-Checking the dataset

# Load the dataset
dev_indicator <- read.csv("developement_indicator.csv", sep=",", dec=".", header=TRUE, fileEncoding = "Latin1")

# Shorten column names
colnames(dev_indicator)[3:15] <- c(
  "Pop_Total",
  "Pop_Growth",
  "GNI_Atlas", 
  "GNI_PC_Atlas",
  "GNI_PPP",
  "GNI_PC_PPP",
  "Immunization_Measles",
  "GDP_USD",
  "GDP_Growth",
  "Inflation_GDP",
  "Trade_GDP",
  "Net_Migration",
  "FDI_Net_Inflow"
)

# Checking the new column names
colnames(dev_indicator)
##  [1] "Country.Name"         "Country.Code"         "Pop_Total"           
##  [4] "Pop_Growth"           "GNI_Atlas"            "GNI_PC_Atlas"        
##  [7] "GNI_PPP"              "GNI_PC_PPP"           "Immunization_Measles"
## [10] "GDP_USD"              "GDP_Growth"           "Inflation_GDP"       
## [13] "Trade_GDP"            "Net_Migration"        "FDI_Net_Inflow"
# Checking the dataset dimensions: 165 countries and 15 columns (2 label columns + 13 indicators)
dim(dev_indicator)
## [1] 165  15
# Separate the labels from the data: columns 1 and 2 hold the country names and codes, columns 3 to 15 hold the indicators
country_list<-dev_indicator[,1] # first column: all the countries
indicator_list<-colnames(dev_indicator[3:15]) # column headers: all indicator names
dev_indicator_data<-as.matrix(dev_indicator[,3:15]) # data only
head(dev_indicator_data)
##      Pop_Total Pop_Growth    GNI_Atlas GNI_PC_Atlas      GNI_PPP GNI_PC_PPP
## [1,]   2745972 -1.1484176  21085728669         7680 5.943826e+10      21650
## [2,]  46164219  1.4989760 228481173658         4950 7.654448e+11      16580
## [3,]  36749906  3.0806553  78048984687         2120 2.673950e+11       7280
## [4,]     93316  0.5114002   1884713523        20200 2.862820e+09      30680
## [5,]  45538401  0.2869761 586900838957        12890 1.341603e+12      29460
## [6,]   2990900  0.7281789  20271184637         6780 6.210631e+10      20770
##      Immunization_Measles      GDP_USD GDP_Growth Inflation_GDP Trade_GDP
## [1,]                   83  23547179830   3.936625      6.066049  55.49709
## [2,]                   99 247626161016   4.100000      0.854841  40.30188
## [3,]                   50  84824654482   1.001289     17.619538  64.08514
## [4,]                   94   2033085185   3.862012      4.805476  42.98885
## [5,]                   80 646075277525  -1.611002    135.368876  21.74716
## [6,]                   96  24085749592   8.300000      2.673004  86.03843
##      Net_Migration FDI_Net_Inflow
## [1,]        -25357     1620982551
## [2,]        -25963     1215776627
## [3,]          -995    -2119632186
## [4,]             0      300597732
## [5,]          4133    23866141440
## [6,]         75000      580365079
# Extracting the data (excluding country names and indicator names)
dev_indicator_data <- as.matrix(dev_indicator[, 3:15])
# Calculate the correlation matrix to check for multicollinearity
cor_matrix <- cor(dev_indicator_data, use = "complete.obs")

4 Data Correlation Check

# Visualize the correlation matrix with a heatmap
corr_melt <- melt(cor_matrix)
ggplot(corr_melt, aes(Var1, Var2, fill = value)) +
  geom_tile() + 
  scale_fill_gradient2(low = "blue", high = "red", mid = "white", midpoint = 0, limit = c(-1, 1)) + 
  theme(axis.text.x = element_text(angle = 45, hjust = 1)) + 
  labs(title = "Correlation Matrix")

# Draw the same Pearson correlation matrix with corrplot
# (reusing cor_matrix avoids masking the base function cor())
corrplot(cor_matrix)

# Visualize the relationships between variables using scatter plots
ggpairs(dev_indicator_data)  # This will show scatter plots between all pairs of variables

# We can observe some strong correlations in these visualisations
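Strong correlations can also be listed numerically instead of being read off the plots. The following is a minimal sketch on simulated data; the variable names and the 0.8 threshold are assumptions for illustration and are not taken from the report.

```r
set.seed(42)
n <- 200
x <- rnorm(n)
toy <- cbind(
  GDP   = x,
  GNI   = x + rnorm(n, sd = 0.05),  # nearly a copy of GDP
  Trade = rnorm(n),
  FDI   = rnorm(n)
)

r <- cor(toy)

# Keep each pair once (upper triangle) and flag |r| above a chosen threshold.
threshold <- 0.8   # assumed cut-off, not taken from the report
idx <- which(abs(r) > threshold & upper.tri(r), arr.ind = TRUE)
strong_pairs <- data.frame(
  var1 = rownames(r)[idx[, "row"]],
  var2 = colnames(r)[idx[, "col"]],
  r    = r[idx]
)
print(strong_pairs)
```

Applied to `cor_matrix`, the same pattern would give a short table of indicator pairs that are candidates for being redundant.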

5 PCA pre-check

# Dataset skewness check before performing PCA
# Note: skewness() on a matrix pools all values across indicators into a single number
overall_skewness <- skewness(dev_indicator_data)
overall_skewness
## [1] 16.4182
# A pooled skewness of 16.4182 indicates that the dataset is strongly right-skewed
# Hence, we apply a Box-Cox transformation to reduce the skew
# Shift the data so that all values are strictly positive (Box-Cox requires positive input)
shifted_data <- dev_indicator_data + abs(min(dev_indicator_data)) + 1
# Apply Box-Cox transformation
boxcox_result <- boxcox(lm(shifted_data ~ 1), lambda = seq(-2, 2, 0.1))

# Find optimal lambda
optimal_lambda <- boxcox_result$x[which.max(boxcox_result$y)]
print(paste("Optimal Lambda:", optimal_lambda))
## [1] "Optimal Lambda: 0.101010101010101"
# Transform data using optimal lambda
if (optimal_lambda == 0) {
  boxcox_transformed <- log(shifted_data)
} else {
  boxcox_transformed <- (shifted_data^optimal_lambda - 1) / optimal_lambda
}
# Recalculate skewness
new_skewness <- skewness(boxcox_transformed)
print(paste("Skewness after Box-Cox:", new_skewness))
## [1] "Skewness after Box-Cox: 2.61540035739465"
dev_indicator_scaled <- boxcox_transformed
# Skewness after Box-Cox: 2.61540035739465, a substantial improvement over the value of 16.4182 before the transformation

# After the transformation, we standardise the dataset for later visualisation and analysis, since the indicators are measured in different units and on different scales
preproc1 <- preProcess(dev_indicator_scaled, method=c("center", "scale"))
dev_indicator_scaled <- predict(preproc1, dev_indicator_scaled)
summary(dev_indicator_scaled)
##    Pop_Total          Pop_Growth          GNI_Atlas        GNI_PC_Atlas    
##  Min.   :-0.28717   Min.   :-6.319170   Min.   :-0.6281   Min.   :-0.7676  
##  1st Qu.:-0.27102   1st Qu.:-0.548228   1st Qu.:-0.5758   1st Qu.:-0.6659  
##  Median :-0.22459   Median :-0.024720   Median :-0.4418   Median :-0.4696  
##  Mean   : 0.00000   Mean   : 0.000097   Mean   : 0.0000   Mean   : 0.0000  
##  3rd Qu.:-0.07827   3rd Qu.: 0.594478   3rd Qu.: 0.2631   3rd Qu.: 0.2440  
##  Max.   : 8.59786   Max.   : 3.504670   Max.   : 5.7166   Max.   : 3.8762  
##     GNI_PPP          GNI_PC_PPP      Immunization_Measles    GDP_USD       
##  Min.   :-0.7718   Min.   :-1.0317   Min.   :-4.849921    Min.   :-0.6369  
##  1st Qu.:-0.6617   1st Qu.:-0.8065   1st Qu.:-0.359419    1st Qu.:-0.5818  
##  Median :-0.4118   Median :-0.3409   Median : 0.337387    Median :-0.4172  
##  Mean   : 0.0000   Mean   : 0.0000   Mean   : 0.000004    Mean   : 0.0000  
##  3rd Qu.: 0.3101   3rd Qu.: 0.5855   3rd Qu.: 0.724522    3rd Qu.: 0.2476  
##  Max.   : 4.7507   Max.   : 3.6179   Max.   : 0.956727    Max.   : 5.7752  
##    GDP_Growth        Inflation_GDP        Trade_GDP         Net_Migration      
##  Min.   :-5.390811   Min.   :-0.45950   Min.   :-1.457919   Min.   :-7.374068  
##  1st Qu.:-0.473880   1st Qu.:-0.14395   1st Qu.:-0.679260   1st Qu.:-0.043785  
##  Median :-0.047603   Median :-0.10157   Median :-0.259708   Median :-0.000925  
##  Mean   :-0.000004   Mean   : 0.00000   Mean   : 0.000007   Mean   : 0.000000  
##  3rd Qu.: 0.471884   3rd Qu.:-0.05267   3rd Qu.: 0.467417   3rd Qu.: 0.048233  
##  Max.   : 7.749156   Max.   :12.52453   Max.   : 3.724078   Max.   : 6.025483  
##  FDI_Net_Inflow     
##  Min.   :-12.65044  
##  1st Qu.:  0.05736  
##  Median :  0.06047  
##  Mean   :  0.00000  
##  3rd Qu.:  0.07815  
##  Max.   :  1.13976
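The skewness figures above pool every value of the matrix into one number. As a complementary check, skewness can also be inspected per indicator. The sketch below is an illustration under stated assumptions: the data are simulated, `col_skewness` is a hypothetical helper using a plain moment-based definition (packages may apply small-sample corrections), and a log transform stands in for Box-Cox with lambda near 0.

```r
# Moment-based sample skewness (one common definition; packages may apply
# slightly different small-sample corrections).
col_skewness <- function(x) {
  z <- x - mean(x)
  mean(z^3) / mean(z^2)^1.5
}

set.seed(7)
toy <- cbind(
  Pop_Total  = rlnorm(300, meanlog = 10, sdlog = 2),  # strongly right-skewed
  GDP_Growth = rnorm(300, mean = 3, sd = 2)           # roughly symmetric
)

skew_before <- apply(toy, 2, col_skewness)

# Transform only the skewed, strictly positive column, then standardise all.
toy_t <- toy
toy_t[, "Pop_Total"] <- log(toy[, "Pop_Total"])
toy_scaled <- scale(toy_t)   # centre to mean 0, scale to sd 1

skew_after <- apply(toy_scaled, 2, col_skewness)
print(round(rbind(skew_before, skew_after), 2))
```

Standardising does not change skewness, so any improvement in the second row comes from the transformation alone.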

6 PCA Analysis

# Eigen-decomposition of the covariance matrix (equal to the correlation matrix here, because the data are standardised)
dev_indicator_cov<-cov(dev_indicator_scaled)
dev_indicator_eigen<-eigen(dev_indicator_cov)
dev_indicator_eigen$values
##  [1] 4.0621987414 2.2740935541 1.1736705728 1.0376341701 0.9990853113
##  [6] 0.9639206020 0.8866427185 0.7040669648 0.6069645077 0.2150877663
## [11] 0.0567933219 0.0189592896 0.0008824796
dev_indicator_eigen$vectors
##              [,1]        [,2]        [,3]        [,4]         [,5]         [,6]
##  [1,] -0.25389073  0.46527157 -0.14160592  0.16640653  0.012282123  0.008331713
##  [2,]  0.07581656 -0.07341531  0.54865682  0.35136885 -0.092836775  0.260278585
##  [3,] -0.47030478  0.15666269  0.08666272  0.05649129 -0.005327823  0.074809943
##  [4,] -0.32642961 -0.40834183  0.04319235 -0.04182367 -0.028807079 -0.024246871
##  [5,] -0.44727734  0.23397424  0.01787922  0.05485852 -0.006667847  0.041531229
##  [6,] -0.32781002 -0.42493098 -0.07063588 -0.06332984  0.014358986 -0.008700748
##  [7,] -0.19936169 -0.17030843 -0.37912980 -0.43087279  0.092802687  0.093571413
##  [8,] -0.47102174  0.15071286  0.08637868  0.05692553 -0.007674747  0.075638564
##  [9,]  0.11991335  0.20363672 -0.18582484  0.04215446  0.286165826  0.773781336
## [10,]  0.03931900  0.08148468  0.02037995 -0.42952088 -0.797450591  0.352286313
## [11,]  0.02949844 -0.32504455 -0.42893813  0.27365541  0.097295351  0.299375002
## [12,] -0.11886659 -0.34818886  0.44271764 -0.03988126  0.109253949  0.310826100
## [13,]  0.04473088  0.15754235  0.31321659 -0.62068778  0.492167890  0.025345518
##                [,7]         [,8]        [,9]       [,10]        [,11]
##  [1,] -0.2509382208 -0.120563003 -0.03591862  0.76610459 -0.069727422
##  [2,] -0.6507050323  0.007282050  0.22584134 -0.10987977 -0.029200830
##  [3,]  0.1123030309 -0.054392187  0.05270728 -0.17786465  0.259419304
##  [4,] -0.1960949633  0.110560965 -0.43845077  0.12708081  0.537733025
##  [5,]  0.0700560560 -0.076587927  0.10826588 -0.33856586 -0.396568189
##  [6,] -0.1836184235  0.007909346 -0.37913521  0.02554177 -0.635904190
##  [7,] -0.3290772968  0.338802469  0.59782987  0.03676355  0.050900199
##  [8,]  0.1142404969 -0.058929892  0.05128495 -0.18859231  0.230608817
##  [9,]  0.0334378799  0.337659065 -0.32186238 -0.09196090 -0.004619413
## [10,] -0.0001495262 -0.197721748 -0.08071531  0.03347601 -0.011637699
## [11,] -0.0182714060 -0.704246599  0.16410900 -0.03772935  0.096998761
## [12,]  0.5158405938  0.039513872  0.29204441  0.43482374 -0.101180822
## [13,] -0.1871842765 -0.442050937 -0.12190426 -0.03089248  0.024807233
##              [,12]         [,13]
##  [1,]  0.021882776  0.0067551508
##  [2,] -0.024714291 -0.0016233071
##  [3,] -0.333263951 -0.7138599281
##  [4,]  0.415831087  0.0230268086
##  [5,]  0.667536061  0.0103915516
##  [6,] -0.345682455 -0.0254479831
##  [7,] -0.007925706  0.0025505616
##  [8,] -0.376810921  0.6993243446
##  [9,]  0.003234677  0.0009112711
## [10,]  0.003165786 -0.0016333179
## [11,]  0.038042069 -0.0007455510
## [12,]  0.078997770  0.0019600998
## [13,]  0.008462948  0.0009886980
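The eigenvectors above and the `prcomp()` rotation computed next carry the same information: eigenvalues equal the squared component standard deviations, and the vectors agree up to a sign flip per column. A minimal sketch on simulated data (the matrix `toy` is hypothetical) checks this relationship.

```r
set.seed(3)
toy <- matrix(rnorm(200 * 4), ncol = 4)
toy[, 2] <- toy[, 1] + 0.3 * toy[, 2]   # induce some correlation
toy <- scale(toy)

eig <- eigen(cov(toy))
pca <- prcomp(toy, center = TRUE, scale. = TRUE)

# Eigenvalues equal the squared standard deviations of the components.
print(all.equal(eig$values, pca$sdev^2))

# Eigenvectors match the rotation matrix up to a sign flip per column.
print(all.equal(abs(eig$vectors), abs(unname(pca$rotation))))
```

The sign of a component is arbitrary, which is why some columns in the two outputs of this report appear with opposite signs.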
# PCA loadings, rotation
xxx <- dev_indicator_scaled
xxx.pca1 <- prcomp(xxx, center=TRUE, scale.=TRUE)
summary(xxx.pca1)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6    PC7
## Standard deviation     2.0155 1.5080 1.08336 1.01864 0.99954 0.98179 0.9416
## Proportion of Variance 0.3125 0.1749 0.09028 0.07982 0.07685 0.07415 0.0682
## Cumulative Proportion  0.3125 0.4874 0.57769 0.65751 0.73436 0.80851 0.8767
##                            PC8     PC9    PC10    PC11    PC12    PC13
## Standard deviation     0.83909 0.77908 0.46378 0.23831 0.13769 0.02971
## Proportion of Variance 0.05416 0.04669 0.01655 0.00437 0.00146 0.00007
## Cumulative Proportion  0.93087 0.97756 0.99410 0.99847 0.99993 1.00000
xxx.pca1$rotation
##                              PC1         PC2         PC3         PC4
## Pop_Total             0.25389073  0.46527157 -0.14160592 -0.16640653
## Pop_Growth           -0.07581656 -0.07341531  0.54865682 -0.35136885
## GNI_Atlas             0.47030478  0.15666269  0.08666272 -0.05649129
## GNI_PC_Atlas          0.32642961 -0.40834183  0.04319235  0.04182367
## GNI_PPP               0.44727734  0.23397424  0.01787922 -0.05485852
## GNI_PC_PPP            0.32781002 -0.42493098 -0.07063588  0.06332984
## Immunization_Measles  0.19936169 -0.17030843 -0.37912980  0.43087279
## GDP_USD               0.47102174  0.15071286  0.08637868 -0.05692553
## GDP_Growth           -0.11991335  0.20363672 -0.18582484 -0.04215446
## Inflation_GDP        -0.03931900  0.08148468  0.02037995  0.42952088
## Trade_GDP            -0.02949844 -0.32504455 -0.42893813 -0.27365541
## Net_Migration         0.11886659 -0.34818886  0.44271764  0.03988126
## FDI_Net_Inflow       -0.04473088  0.15754235  0.31321659  0.62068778
##                               PC5          PC6           PC7          PC8
## Pop_Total            -0.012282123 -0.008331713  0.2509382208 -0.120563003
## Pop_Growth            0.092836775 -0.260278585  0.6507050323  0.007282050
## GNI_Atlas             0.005327823 -0.074809943 -0.1123030309 -0.054392187
## GNI_PC_Atlas          0.028807079  0.024246871  0.1960949633  0.110560965
## GNI_PPP               0.006667847 -0.041531229 -0.0700560560 -0.076587927
## GNI_PC_PPP           -0.014358986  0.008700748  0.1836184235  0.007909346
## Immunization_Measles -0.092802687 -0.093571413  0.3290772968  0.338802469
## GDP_USD               0.007674747 -0.075638564 -0.1142404969 -0.058929892
## GDP_Growth           -0.286165826 -0.773781336 -0.0334378799  0.337659065
## Inflation_GDP         0.797450591 -0.352286313  0.0001495262 -0.197721748
## Trade_GDP            -0.097295351 -0.299375002  0.0182714060 -0.704246599
## Net_Migration        -0.109253949 -0.310826100 -0.5158405938  0.039513872
## FDI_Net_Inflow       -0.492167890 -0.025345518  0.1871842765 -0.442050937
##                              PC9        PC10         PC11         PC12
## Pop_Total            -0.03591862  0.76610459 -0.069727422 -0.021882776
## Pop_Growth            0.22584134 -0.10987977 -0.029200830  0.024714291
## GNI_Atlas             0.05270728 -0.17786465  0.259419304  0.333263951
## GNI_PC_Atlas         -0.43845077  0.12708081  0.537733025 -0.415831087
## GNI_PPP               0.10826588 -0.33856586 -0.396568189 -0.667536061
## GNI_PC_PPP           -0.37913521  0.02554177 -0.635904190  0.345682455
## Immunization_Measles  0.59782987  0.03676355  0.050900199  0.007925706
## GDP_USD               0.05128495 -0.18859231  0.230608817  0.376810921
## GDP_Growth           -0.32186238 -0.09196090 -0.004619413 -0.003234677
## Inflation_GDP        -0.08071531  0.03347601 -0.011637699 -0.003165786
## Trade_GDP             0.16410900 -0.03772935  0.096998761 -0.038042069
## Net_Migration         0.29204441  0.43482374 -0.101180822 -0.078997770
## FDI_Net_Inflow       -0.12190426 -0.03089248  0.024807233 -0.008462948
##                               PC13
## Pop_Total            -0.0067551508
## Pop_Growth            0.0016233071
## GNI_Atlas             0.7138599281
## GNI_PC_Atlas         -0.0230268086
## GNI_PPP              -0.0103915516
## GNI_PC_PPP            0.0254479831
## Immunization_Measles -0.0025505616
## GDP_USD              -0.6993243446
## GDP_Growth           -0.0009112711
## Inflation_GDP         0.0016333179
## Trade_GDP             0.0007455510
## Net_Migration        -0.0019600998
## FDI_Net_Inflow       -0.0009886980
xxx.pca2 <- princomp(xxx)
loadings(xxx.pca2)
## 
## Loadings:
##                      Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8
## Pop_Total             0.254  0.465  0.142  0.166                0.251  0.121
## Pop_Growth                         -0.549  0.351         0.260  0.651       
## GNI_Atlas             0.470  0.157                             -0.112       
## GNI_PC_Atlas          0.326 -0.408                              0.196 -0.111
## GNI_PPP               0.447  0.234                                          
## GNI_PC_PPP            0.328 -0.425                              0.184       
## Immunization_Measles  0.199 -0.170  0.379 -0.431                0.329 -0.339
## GDP_USD               0.471  0.151                             -0.114       
## GDP_Growth           -0.120  0.204  0.186         0.286  0.774        -0.338
## Inflation_GDP                             -0.430 -0.797  0.352         0.198
## Trade_GDP                   -0.325  0.429  0.274         0.299         0.704
## Net_Migration         0.119 -0.348 -0.443         0.109  0.311 -0.516       
## FDI_Net_Inflow               0.158 -0.313 -0.621  0.492         0.187  0.442
##                      Comp.9 Comp.10 Comp.11 Comp.12 Comp.13
## Pop_Total                    0.766                         
## Pop_Growth           -0.226 -0.110                         
## GNI_Atlas                   -0.178  -0.259  -0.333  -0.714 
## GNI_PC_Atlas          0.438  0.127  -0.538   0.416         
## GNI_PPP              -0.108 -0.339   0.397   0.668         
## GNI_PC_PPP            0.379          0.636  -0.346         
## Immunization_Measles -0.598                                
## GDP_USD                     -0.189  -0.231  -0.377   0.699 
## GDP_Growth            0.322                                
## Inflation_GDP                                              
## Trade_GDP            -0.164                                
## Net_Migration        -0.292  0.435   0.101                 
## FDI_Net_Inflow        0.122                                
## 
##                Comp.1 Comp.2 Comp.3 Comp.4 Comp.5 Comp.6 Comp.7 Comp.8 Comp.9
## SS loadings     1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000  1.000
## Proportion Var  0.077  0.077  0.077  0.077  0.077  0.077  0.077  0.077  0.077
## Cumulative Var  0.077  0.154  0.231  0.308  0.385  0.462  0.538  0.615  0.692
##                Comp.10 Comp.11 Comp.12 Comp.13
## SS loadings      1.000   1.000   1.000   1.000
## Proportion Var   0.077   0.077   0.077   0.077
## Cumulative Var   0.769   0.846   0.923   1.000
summary(xxx.pca2)
## Importance of components:
##                           Comp.1    Comp.2     Comp.3     Comp.4     Comp.5
## Standard deviation     2.0093729 1.5034331 1.08007288 1.01555181 0.99650903
## Proportion of Variance 0.3124768 0.1749303 0.09028235 0.07981801 0.07685272
## Cumulative Proportion  0.3124768 0.4874071 0.57768945 0.65750746 0.73436018
##                            Comp.6     Comp.7    Comp.8     Comp.9    Comp.10
## Standard deviation     0.97881493 0.93875935 0.8365404 0.77671484 0.46236804
## Proportion of Variance 0.07414774 0.06820329 0.0541590 0.04668958 0.01654521
## Cumulative Proportion  0.80850792 0.87671121 0.9308702 0.97755978 0.99410499
##                            Comp.11     Comp.12      Comp.13
## Standard deviation     0.237590235 0.137274851 2.961640e-02
## Proportion of Variance 0.004368717 0.001458407 6.788304e-05
## Cumulative Proportion  0.998473710 0.999932117 1.000000e+00
# We aim to explain over 90% of the variance. Based on the results, we can retain 8 components, which together explain 93.087% of the variance
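The choice of 8 components can be reproduced directly from the eigenvalues printed earlier. The sketch below hard-codes those values; the comparison with the eigenvalue-greater-than-one (Kaiser) rule is an addition for context, not part of the original analysis.

```r
# Eigenvalues copied from the output above (dev_indicator_eigen$values).
eigenvalues <- c(4.0621987414, 2.2740935541, 1.1736705728, 1.0376341701,
                 0.9990853113, 0.9639206020, 0.8866427185, 0.7040669648,
                 0.6069645077, 0.2150877663, 0.0567933219, 0.0189592896,
                 0.0008824796)

cum_var <- cumsum(eigenvalues) / sum(eigenvalues)

# Smallest number of components whose cumulative share exceeds 90%.
k_90 <- which(cum_var > 0.90)[1]

# Kaiser rule for comparison: keep components with eigenvalue > 1.
k_kaiser <- sum(eigenvalues > 1)

print(k_90)      # 8
print(k_kaiser)  # 4
```

The two rules disagree here because components 5 to 8 each have an eigenvalue just below 1 yet together account for a sizeable share of the variance.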

# Visualisation
# PCA variance visualisation and variable relations
plot(xxx.pca2) # scree plot of component variances; xxx.pca1 gives essentially the same picture

fviz_pca_var(xxx.pca1, col.var="steelblue")# Corr plot, for xxx.pca2 looks similar

autoplot(xxx.pca1, loadings=TRUE, loadings.colour='blue', loadings.label=TRUE, loadings.label.size=3)

# In the PC1-PC2 plane, the arrows for GNI_Atlas and GDP_USD almost coincide (angle close to 0), so these indicators behave very similarly
# GNI_PC_Atlas and GNI_PC_PPP are likewise very similar (angle close to 0)
# GDP_Growth and Inflation_GDP also point in nearly the same direction (angle close to 0)
# GDP_Growth and Inflation_GDP point almost opposite to GNI_PC_Atlas and GNI_PC_PPP, which suggests a negative relationship

# Visualisation of the eigenvalues
fviz_eig(xxx.pca1, choice='eigenvalue') # eigenvalues on y-axis

fviz_eig(xxx.pca1) # percentage of explained variance on y-axis

eig.val<-get_eigenvalue(xxx.pca1)
eig.val
##          eigenvalue variance.percent cumulative.variance.percent
## Dim.1  4.0621987414     31.247682626                    31.24768
## Dim.2  2.2740935541     17.493027339                    48.74071
## Dim.3  1.1736705728      9.028235175                    57.76895
## Dim.4  1.0376341701      7.981801308                    65.75075
## Dim.5  0.9990853113      7.685271625                    73.43602
## Dim.6  0.9639206020      7.414773862                    80.85079
## Dim.7  0.8866427185      6.820328604                    87.67112
## Dim.8  0.7040669648      5.415899729                    93.08702
## Dim.9  0.6069645077      4.668957751                    97.75598
## Dim.10 0.2150877663      1.654521279                    99.41050
## Dim.11 0.0567933219      0.436871707                    99.84737
## Dim.12 0.0189592896      0.145840689                    99.99321
## Dim.13 0.0008824796      0.006788304                   100.00000
# cumulative variance
a<-summary(xxx.pca1)
plot(a$importance[3,],type="l") 

# contributions of individual variables to PCA
pca_loadings <- xxx.pca1
cat("PCA Loadings:\n")
## PCA Loadings:
pca_loadings
## Standard deviations (1, .., p=13):
##  [1] 2.01548970 1.50800980 1.08336078 1.01864330 0.99954255 0.98179458
##  [7] 0.94161708 0.83908698 0.77907927 0.46377556 0.23831349 0.13769274
## [13] 0.02970656
## 
## Rotation (n x k) = (13 x 13):
##                              PC1         PC2         PC3         PC4
## Pop_Total             0.25389073  0.46527157 -0.14160592 -0.16640653
## Pop_Growth           -0.07581656 -0.07341531  0.54865682 -0.35136885
## GNI_Atlas             0.47030478  0.15666269  0.08666272 -0.05649129
## GNI_PC_Atlas          0.32642961 -0.40834183  0.04319235  0.04182367
## GNI_PPP               0.44727734  0.23397424  0.01787922 -0.05485852
## GNI_PC_PPP            0.32781002 -0.42493098 -0.07063588  0.06332984
## Immunization_Measles  0.19936169 -0.17030843 -0.37912980  0.43087279
## GDP_USD               0.47102174  0.15071286  0.08637868 -0.05692553
## GDP_Growth           -0.11991335  0.20363672 -0.18582484 -0.04215446
## Inflation_GDP        -0.03931900  0.08148468  0.02037995  0.42952088
## Trade_GDP            -0.02949844 -0.32504455 -0.42893813 -0.27365541
## Net_Migration         0.11886659 -0.34818886  0.44271764  0.03988126
## FDI_Net_Inflow       -0.04473088  0.15754235  0.31321659  0.62068778
##                               PC5          PC6           PC7          PC8
## Pop_Total            -0.012282123 -0.008331713  0.2509382208 -0.120563003
## Pop_Growth            0.092836775 -0.260278585  0.6507050323  0.007282050
## GNI_Atlas             0.005327823 -0.074809943 -0.1123030309 -0.054392187
## GNI_PC_Atlas          0.028807079  0.024246871  0.1960949633  0.110560965
## GNI_PPP               0.006667847 -0.041531229 -0.0700560560 -0.076587927
## GNI_PC_PPP           -0.014358986  0.008700748  0.1836184235  0.007909346
## Immunization_Measles -0.092802687 -0.093571413  0.3290772968  0.338802469
## GDP_USD               0.007674747 -0.075638564 -0.1142404969 -0.058929892
## GDP_Growth           -0.286165826 -0.773781336 -0.0334378799  0.337659065
## Inflation_GDP         0.797450591 -0.352286313  0.0001495262 -0.197721748
## Trade_GDP            -0.097295351 -0.299375002  0.0182714060 -0.704246599
## Net_Migration        -0.109253949 -0.310826100 -0.5158405938  0.039513872
## FDI_Net_Inflow       -0.492167890 -0.025345518  0.1871842765 -0.442050937
##                              PC9        PC10         PC11         PC12
## Pop_Total            -0.03591862  0.76610459 -0.069727422 -0.021882776
## Pop_Growth            0.22584134 -0.10987977 -0.029200830  0.024714291
## GNI_Atlas             0.05270728 -0.17786465  0.259419304  0.333263951
## GNI_PC_Atlas         -0.43845077  0.12708081  0.537733025 -0.415831087
## GNI_PPP               0.10826588 -0.33856586 -0.396568189 -0.667536061
## GNI_PC_PPP           -0.37913521  0.02554177 -0.635904190  0.345682455
## Immunization_Measles  0.59782987  0.03676355  0.050900199  0.007925706
## GDP_USD               0.05128495 -0.18859231  0.230608817  0.376810921
## GDP_Growth           -0.32186238 -0.09196090 -0.004619413 -0.003234677
## Inflation_GDP        -0.08071531  0.03347601 -0.011637699 -0.003165786
## Trade_GDP             0.16410900 -0.03772935  0.096998761 -0.038042069
## Net_Migration         0.29204441  0.43482374 -0.101180822 -0.078997770
## FDI_Net_Inflow       -0.12190426 -0.03089248  0.024807233 -0.008462948
##                               PC13
## Pop_Total            -0.0067551508
## Pop_Growth            0.0016233071
## GNI_Atlas             0.7138599281
## GNI_PC_Atlas         -0.0230268086
## GNI_PPP              -0.0103915516
## GNI_PC_PPP            0.0254479831
## Immunization_Measles -0.0025505616
## GDP_USD              -0.6993243446
## GDP_Growth           -0.0009112711
## Inflation_GDP         0.0016333179
## Trade_GDP             0.0007455510
## Net_Migration        -0.0019600998
## FDI_Net_Inflow       -0.0009886980
# Loading visualisation
var<-get_pca_var(xxx.pca1)
a<-fviz_contrib(xxx.pca1, "var", axes=1, xtickslab.rt=90) # default angle=45°
b<-fviz_contrib(xxx.pca1, "var", axes=2, xtickslab.rt=90)
c<-fviz_contrib(xxx.pca1, "var", axes=3, xtickslab.rt=90)
d<-fviz_contrib(xxx.pca1, "var", axes=4, xtickslab.rt=90)
e<-fviz_contrib(xxx.pca1, "var", axes=5, xtickslab.rt=90)
f<-fviz_contrib(xxx.pca1, "var", axes=6, xtickslab.rt=90)
g<-fviz_contrib(xxx.pca1, "var", axes=7, xtickslab.rt=90)
h<-fviz_contrib(xxx.pca1, "var", axes=8, xtickslab.rt=90)
grid.arrange(a,b,top='Contribution to the first 8 Principal Components')

grid.arrange(c,d,top='Contribution to the first 8 Principal Components')

grid.arrange(e,f,top='Contribution to the first 8 Principal Components')

grid.arrange(g,h,top='Contribution to the first 8 Principal Components')

Based on the PCA results, we can interpret the contributions of the indicators to each principal component (PC). The table below shows which indicators are most influential for each principal component. This helps us understand the key drivers of development as represented by each PC.

Principal Component Contribution Breakdown:

PC Number    Key Indicators Contributing Most
PC1          GDP_USD, GNI_Atlas, GNI_PPP, GNI_PC_PPP, GNI_PC_Atlas
PC2          Pop_Total, GNI_PC_PPP, GNI_PC_Atlas, Net_Migration, Trade_GDP
PC3          Pop_Growth, Net_Migration, Trade_GDP, Immunization_Measles, FDI_Net_Inflow
PC4          FDI_Net_Inflow, Immunization_Measles, Inflation_GDP, Pop_Growth
PC5          Inflation_GDP, FDI_Net_Inflow, GDP_Growth
PC6          GDP_Growth, Inflation_GDP, Net_Migration, Trade_GDP
PC7          Pop_Growth, Net_Migration, Immunization_Measles
PC8          Trade_GDP, FDI_Net_Inflow, Immunization_Measles, GDP_Growth
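The PC1 row of this breakdown can be recomputed from the loadings alone. Because each column of the rotation matrix has unit length, a squared loading can be read as that variable's share of the component. The sketch below uses the PC1 loadings printed earlier; using the uniform share (100 / 13 percent) as the cut-off is an assumption of this sketch.

```r
# PC1 loadings copied from xxx.pca1$rotation above.
pc1 <- c(Pop_Total = 0.25389073, Pop_Growth = -0.07581656,
         GNI_Atlas = 0.47030478, GNI_PC_Atlas = 0.32642961,
         GNI_PPP = 0.44727734, GNI_PC_PPP = 0.32781002,
         Immunization_Measles = 0.19936169, GDP_USD = 0.47102174,
         GDP_Growth = -0.11991335, Inflation_GDP = -0.03931900,
         Trade_GDP = -0.02949844, Net_Migration = 0.11886659,
         FDI_Net_Inflow = -0.04473088)

# Squared loadings sum to 1, so they can be expressed as percentages.
contrib_pc1 <- 100 * pc1^2 / sum(pc1^2)

# Variables contributing more than the uniform share (100 / 13 percent).
top_pc1 <- sort(contrib_pc1[contrib_pc1 > 100 / length(pc1)], decreasing = TRUE)
print(round(top_pc1, 1))
```

The five variables that pass the cut-off are GDP_USD, GNI_Atlas, GNI_PPP, GNI_PC_PPP and GNI_PC_Atlas, in that order, matching the PC1 row.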

7 Observations

  1. PC1: The first principal component is heavily influenced by economic indicators such as GDP, GNI, and their per capita variants. This suggests that economic size and income levels are key factors in development.
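  2. PC2: The second component contrasts total population (positive loading) with per capita income, net migration, and trade openness (negative loadings). Population growth contributes very little here. This could reflect the difference between populous countries and smaller, richer, more open economies.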
  3. PC3: In PC3, population growth, migration, trade, immunization, and FDI inflows contribute significantly. These factors might represent more dynamic or growth-related aspects of development.
  4. PC4: PC4’s key contributors are FDI inflows, immunization rates, inflation, and population growth. These could be related to economic stability, healthcare access, and demographic trends.
  5. PC5: Economic growth, inflation, and FDI inflows are prominent here, suggesting that economic stability and investment flows are key drivers of development.
  6. PC6: This component is heavily influenced by GDP growth, inflation, migration, and trade, pointing to the interconnectedness of economic performance and globalization factors.
  7. PC7: PC7 highlights the relationship between population growth, migration, and immunization rates, reflecting social and demographic aspects of development.
  8. PC8: The last retained component sees significant contributions from trade, FDI, immunization, and GDP growth, indicating the importance of trade and investment for long-term economic health.

These interpretations help to summarize the most impactful development factors as captured by PCA, revealing the complex relationship between economic, demographic, and social factors across countries.

8 PCA fitting test

# RMSR (root mean square of the residuals) for 8 components: 0.04 (printed as RMSA by summary() below). This indicates a good fit of our model
xxx.pca4<-principal(xxx, nfactors=8, rotate="varimax")
xxx.pca4
## Principal Components Analysis
## Call: principal(r = xxx, nfactors = 8, rotate = "varimax")
## Standardized loadings (pattern matrix) based upon correlation matrix
##                        RC1   RC3   RC2   RC6   RC7   RC8   RC4   RC5   h2
## Pop_Total             0.76 -0.10 -0.52  0.12  0.04 -0.05  0.04 -0.02 0.87
## Pop_Growth           -0.10 -0.04  0.04  0.02  0.98 -0.02  0.01  0.01 0.97
## GNI_Atlas             0.96  0.20  0.14 -0.07 -0.05 -0.05 -0.02 -0.01 0.99
## GNI_PC_Atlas          0.26  0.74  0.32 -0.29  0.15  0.11 -0.13 -0.07 0.86
## GNI_PPP               0.96  0.15  0.00 -0.04 -0.08 -0.06  0.00  0.00 0.95
## GNI_PC_PPP            0.26  0.77  0.28 -0.28  0.07  0.23 -0.10 -0.08 0.89
## Immunization_Measles  0.08  0.83 -0.14  0.15 -0.19 -0.04  0.07  0.05 0.78
## GDP_USD               0.96  0.20  0.14 -0.07 -0.05 -0.05 -0.02 -0.01 0.98
## GDP_Growth           -0.04 -0.10 -0.02  0.96  0.02  0.03  0.02  0.01 0.94
## Inflation_GDP        -0.02 -0.03 -0.02  0.01  0.01 -0.03  0.00  1.00 1.00
## Trade_GDP            -0.14  0.11  0.05  0.03 -0.03  0.97 -0.09 -0.03 0.98
## Net_Migration         0.07  0.10  0.94  0.00  0.04  0.04  0.01 -0.02 0.91
## FDI_Net_Inflow       -0.01 -0.04  0.00  0.03  0.01 -0.08  0.99  0.00 0.99
##                          u2 com
## Pop_Total            0.1273 1.9
## Pop_Growth           0.0336 1.0
## GNI_Atlas            0.0149 1.2
## GNI_PC_Atlas         0.1399 2.3
## GNI_PPP              0.0491 1.1
## GNI_PC_PPP           0.1126 2.1
## Immunization_Measles 0.2174 1.3
## GDP_USD              0.0154 1.2
## GDP_Growth           0.0647 1.0
## Inflation_GDP        0.0042 1.0
## Trade_GDP            0.0172 1.1
## Net_Migration        0.0931 1.0
## FDI_Net_Inflow       0.0093 1.0
## 
##                        RC1  RC3  RC2  RC6  RC7  RC8  RC4  RC5
## SS loadings           3.50 1.98 1.40 1.13 1.03 1.03 1.03 1.01
## Proportion Var        0.27 0.15 0.11 0.09 0.08 0.08 0.08 0.08
## Cumulative Var        0.27 0.42 0.53 0.62 0.70 0.77 0.85 0.93
## Proportion Explained  0.29 0.16 0.12 0.09 0.09 0.08 0.08 0.08
## Cumulative Proportion 0.29 0.45 0.57 0.66 0.75 0.83 0.92 1.00
## 
## Mean item complexity =  1.3
## Test of the hypothesis that 8 components are sufficient.
## 
## The root mean square of the residuals (RMSR) is  0.04 
##  with the empirical chi square  50.37  with prob <  1.2e-11 
## 
## Fit based upon off diagonal values = 0.98
summary(xxx.pca4)
## 
## Factor analysis with Call: principal(r = xxx, nfactors = 8, rotate = "varimax")
## 
## Test of the hypothesis that 8 factors are sufficient.
## The degrees of freedom for the model is 2  and the objective function was  5 
## The number of observations was  165  with Chi Square =  766.99  with prob <  2.8e-167 
## 
## The root mean square of the residuals (RMSA) is  0.04
# printing only the significant loadings
print(loadings(xxx.pca4), digits=3, cutoff=0.4, sort=TRUE)
## 
## Loadings:
##                      RC1    RC3    RC2    RC6    RC7    RC8    RC4    RC5   
## Pop_Total             0.757        -0.518                                   
## GNI_Atlas             0.957                                                 
## GNI_PPP               0.958                                                 
## GDP_USD               0.956                                                 
## GNI_PC_Atlas                 0.738                                          
## GNI_PC_PPP                   0.768                                          
## Immunization_Measles         0.832                                          
## Net_Migration                       0.942                                   
## GDP_Growth                                 0.960                            
## Pop_Growth                                        0.976                     
## Trade_GDP                                                0.969              
## FDI_Net_Inflow                                                  0.991       
## Inflation_GDP                                                          0.997
## 
##                  RC1   RC3   RC2   RC6   RC7   RC8   RC4   RC5
## SS loadings    3.500 1.975 1.401 1.132 1.031 1.026 1.025 1.010
## Proportion Var 0.269 0.152 0.108 0.087 0.079 0.079 0.079 0.078
## Cumulative Var 0.269 0.421 0.529 0.616 0.695 0.774 0.853 0.931
# PCA reconstruction error (mean squared error on the scaled data): 0.06871083. This indicates a good fit of our model
reconstructed_pca <- xxx.pca1$x[,1:8] %*% t(xxx.pca1$rotation[,1:8])
pca_reconstruction_error <- mean((dev_indicator_scaled - reconstructed_pca)^2)
pca_reconstruction_error
## [1] 0.06871083
# Print Comparison Results
explained_variance <- summary(xxx.pca1)$importance[3, 8]
explained_variance <- paste0(round(explained_variance * 100, 2), "%")
cat("PCA Explained Variance (8 PCs):", explained_variance, "\n")
## PCA Explained Variance (8 PCs): 93.09%
cat("PCA Reconstruction Error:", pca_reconstruction_error, "\n")
## PCA Reconstruction Error: 0.06871083
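
The reconstruction error and the explained variance are two views of the same quantity. For data standardized with `scale()`, the mean squared reconstruction error from the first k components equals the unexplained share of variance multiplied by (n - 1)/n, where n is the number of rows. Assuming `xxx.pca1` was fitted with `prcomp` on `dev_indicator_scaled`, the values above give (1 - 0.9309) × 164/165 ≈ 0.0687, which agrees with the reported error. The sketch below checks this identity on the built-in `mtcars` data, used here only as a stand-in. All `_demo` names and the choice of 3 components are hypothetical.

```r
# Illustrative only: mtcars stands in for dev_indicator_scaled.
X_demo <- scale(mtcars)
pca_demo <- prcomp(X_demo)
k_demo <- 3                     # arbitrary number of retained components
n_demo <- nrow(X_demo)

# Reconstruct the scaled data from the first k_demo components
recon_demo <- pca_demo$x[, 1:k_demo] %*% t(pca_demo$rotation[, 1:k_demo])
mse_demo <- mean((X_demo - recon_demo)^2)

# Share of variance NOT captured by the first k_demo components
unexplained_demo <- 1 - sum(pca_demo$sdev[1:k_demo]^2) / sum(pca_demo$sdev^2)

# The two quantities differ only by the factor (n - 1) / n
identity_holds <- isTRUE(all.equal(mse_demo,
                                   unexplained_demo * (n_demo - 1) / n_demo))
print(identity_holds)
```

Because of this identity, the reconstruction error adds no information beyond the cumulative explained variance. It is best read as a consistency check rather than a second, independent measure of fit.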

9 MDS Analysis

# Compute the distance matrix based on the scaled data
dist_matrix <- dist(dev_indicator_scaled)

# Apply MDS using cmdscale (Classical MDS)
mds_result <- cmdscale(dist_matrix, k=2, eig=TRUE, x.ret=TRUE)

# View MDS result
mds_result$eig  # Eigenvalues (show how much variation each dimension captures)
##   [1]  6.662006e+02  3.729513e+02  1.924820e+02  1.701720e+02  1.638500e+02
##   [6]  1.580830e+02  1.454094e+02  1.154670e+02  9.954218e+01  3.527439e+01
##  [11]  9.314105e+00  3.109323e+00  1.447266e-01  7.885353e-13  1.499981e-13
##  [16]  7.010117e-14  3.095465e-14  2.611694e-14  2.577339e-14  2.134075e-14
##  [21]  1.889772e-14  1.521166e-14  1.510201e-14  1.401122e-14  1.245441e-14
##  [26]  1.212355e-14  1.189712e-14  1.046888e-14  1.042632e-14  8.834358e-15
##  [31]  7.846946e-15  7.200908e-15  6.842846e-15  6.687045e-15  5.982195e-15
##  [36]  5.708233e-15  5.503582e-15  5.487744e-15  5.420435e-15  5.041465e-15
##  [41]  4.863760e-15  4.861311e-15  4.493075e-15  4.402926e-15  4.197475e-15
##  [46]  4.010273e-15  3.999657e-15  3.988785e-15  3.599212e-15  3.400358e-15
##  [51]  3.360455e-15  3.244477e-15  2.884666e-15  2.869712e-15  2.841133e-15
##  [56]  2.818166e-15  2.687535e-15  2.671438e-15  2.505267e-15  2.496544e-15
##  [61]  2.463576e-15  2.176349e-15  1.919128e-15  1.789885e-15  1.775034e-15
##  [66]  1.770791e-15  1.723108e-15  1.715451e-15  1.593330e-15  1.572962e-15
##  [71]  1.428685e-15  1.365792e-15  1.292620e-15  1.039485e-15  9.848296e-16
##  [76]  9.456518e-16  9.418397e-16  9.289444e-16  8.792492e-16  8.688674e-16
##  [81]  8.296682e-16  7.813462e-16  5.239830e-16  4.610904e-16  4.263342e-16
##  [86]  3.826263e-16  3.289476e-16  3.069959e-16  2.442877e-16  1.844797e-16
##  [91]  1.311333e-16  5.037430e-17 -1.356786e-17 -7.989673e-17 -1.874778e-16
##  [96] -1.907325e-16 -2.607337e-16 -2.711674e-16 -2.847218e-16 -2.957561e-16
## [101] -3.485657e-16 -3.515611e-16 -3.992663e-16 -5.822304e-16 -6.419330e-16
## [106] -7.676274e-16 -7.692225e-16 -9.119656e-16 -9.299824e-16 -9.865198e-16
## [111] -1.140675e-15 -1.164634e-15 -1.193398e-15 -1.197124e-15 -1.217231e-15
## [116] -1.284052e-15 -1.299112e-15 -1.352693e-15 -1.483887e-15 -1.906473e-15
## [121] -1.976361e-15 -1.990628e-15 -2.066404e-15 -2.102117e-15 -2.195415e-15
## [126] -2.256268e-15 -2.271737e-15 -2.316848e-15 -2.433527e-15 -2.700578e-15
## [131] -2.720232e-15 -2.796867e-15 -2.827160e-15 -2.887901e-15 -2.981860e-15
## [136] -3.219183e-15 -3.254504e-15 -3.391862e-15 -3.970319e-15 -4.052037e-15
## [141] -4.052990e-15 -4.359873e-15 -4.635374e-15 -5.064067e-15 -5.430382e-15
## [146] -5.765813e-15 -6.495861e-15 -7.141185e-15 -7.329281e-15 -7.721392e-15
## [151] -1.065468e-14 -1.087529e-14 -1.136229e-14 -1.459384e-14 -1.541271e-14
## [156] -1.669482e-14 -1.751121e-14 -2.210431e-14 -2.228052e-14 -2.325372e-14
## [161] -2.836779e-14 -3.329816e-14 -4.621098e-14 -1.562906e-13 -8.016612e-13
head(mds_result$points) # Coordinates in the reduced 2D space
##             [,1]       [,2]
## [1,]  1.05503298 -0.1974985
## [2,] -0.05226467 -0.6729309
## [3,]  1.52338067 -0.7613413
## [4,]  0.78585930  0.4068552
## [5,] -1.02740612 -0.9157622
## [6,]  1.06883764  0.2327346
# --- Calculate Stress Manually ---
# Calculate the original distance matrix (before MDS)
original_dist_matrix <- dist(dev_indicator_scaled)

# Calculate the distances between the MDS points
mds_dist_matrix <- dist(mds_result$points)

# Calculate the stress
stress <- sqrt(sum((original_dist_matrix - mds_dist_matrix)^2) / sum(original_dist_matrix^2))

# Print the stress value
cat("Stress of the MDS configuration:", stress, "\n")
## Stress of the MDS configuration: 0.4475294
# Visualize MDS results using a 2D scatter plot
mds_data <- data.frame(mds_result$points)  # MDS coordinates
colnames(mds_data) <- c("Dimension1", "Dimension2")  # Name the dimensions for clarity
ggplot(mds_data, aes(x=Dimension1, y=Dimension2)) +
  geom_point(size=3, color="blue") +
  labs(title="MDS Visualization of Development Indicators",
       x="Dimension 1", y="Dimension 2") +
  theme_minimal()
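
The stress above refers to a 2-dimensional configuration only. For the eigenvalues printed earlier, the first two account for about 49% of the sum of the positive eigenvalues (roughly 1039 of 2132), so a high 2-D stress is not surprising. As a sketch of how the fit depends on the number of dimensions, the code below applies the same stress formula for several values of `k` and computes the eigenvalue share for two dimensions. It uses the built-in `mtcars` data as a stand-in. The helper `stress_for_k` and all `_demo` names are hypothetical.

```r
# Illustrative only: mtcars stands in for dev_indicator_scaled.
X_demo <- scale(mtcars)
d_demo <- dist(X_demo)

# Stress of a classical MDS configuration with k dimensions,
# using the same formula as in the report.
stress_for_k <- function(k) {
  pts <- cmdscale(d_demo, k = k)
  sqrt(sum((d_demo - dist(pts))^2) / sum(d_demo^2))
}

k_values <- 1:5
stress_demo <- sapply(k_values, stress_for_k)
print(round(stress_demo, 3))

# Share of the positive eigenvalues captured by the first two dimensions
eig_demo <- cmdscale(d_demo, k = 2, eig = TRUE)$eig
gof_2d <- sum(eig_demo[1:2]) / sum(eig_demo[eig_demo > 0])
print(round(gof_2d, 3))
```

With Euclidean distances, classical MDS projects the data onto its leading principal axes, so the stress can only decrease as `k` grows.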

10 Correlation Plot (corrplot)

# Compute the correlation matrix
cor_matrix <- cor(dev_indicator_scaled)

# Create the correlation plot with circular markers and color gradients
corrplot(cor_matrix, 
         method = "circle",  # Use circle as the method
         type = "full",  # Show the full correlation matrix
         order = "hclust",  # Cluster the variables based on similarity
         col = colorRampPalette(c("blue", "white", "red"))(200),  # Color gradient
         tl.col = "red",  # Set text label color to red
         tl.cex = 0.8,  # Adjust text label size
         addCoef.col = "black",  # Add correlation coefficient text in black
         number.cex = 0.7,  # Adjust the size of correlation coefficient text
         mar = c(0, 0, 1, 0)  # Adjust margins for better visualization
)

# Based on the plot, Pop_Total, GNI_PPP, GNI_Atlas and GDP_USD are strongly correlated, as are GNI_PC_Atlas and GNI_PC_PPP. Further analysis of these indicators could show whether the set of World Development Indicators can be reduced. However, this is not the main focus of this report, so we do not analyse these indicators further
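
The strongly correlated pairs can also be listed programmatically rather than read off the plot. The sketch below is illustrative: it uses the built-in `mtcars` data in place of the indicator matrix, and the 0.8 cutoff is an assumed threshold, not one taken from this report. The names `cor_demo`, `cutoff_demo` and `strong_pairs` are hypothetical.

```r
# Illustrative only: mtcars stands in for the indicator matrix.
cor_demo <- cor(mtcars)
cutoff_demo <- 0.8  # assumed threshold for a "strong" correlation

# Keep each pair once (upper triangle) and drop the diagonal
idx <- which(abs(cor_demo) > cutoff_demo & upper.tri(cor_demo), arr.ind = TRUE)

strong_pairs <- data.frame(
  var1 = rownames(cor_demo)[idx[, "row"]],
  var2 = colnames(cor_demo)[idx[, "col"]],
  r    = cor_demo[idx]
)

# Strongest correlations first
strong_pairs <- strong_pairs[order(-abs(strong_pairs$r)), ]
print(strong_pairs)
```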

11 Conclusion

This report focuses on dimensionality reduction of the World Development Indicators for 2023, to identify the most impactful indicators for representing the development of countries

  1. We applied PCA as the dimensionality reduction method, based on the nature of the dataset and the main goal of this analysis
    • Reducing the dataset to 8 components explains 93.09% of its variance
    • With 8 components we obtain an RMSR (root mean square of the residuals) of 0.04, which indicates that the components reproduce the correlation structure of the original dataset with low error
    • The PCA reconstruction error is 0.06871083 (mean squared error on the scaled data). This also supports the PCA result: little information is lost by retaining 8 components
    • Based on the 8 components, the most impactful indicators are GNI_Atlas, GDP_USD and GNI_PC_PPP, which have the largest loadings on PC1 (GNI_PC_PPP also loads highly on PC2)
  2. From the initial correlation plot and the MDS analysis, we see strong correlations among some indicators
    • The stress of the 2-dimensional MDS configuration is 0.4475294, which is high. It is not directly comparable to the PCA reconstruction error of 0.06871083, because the two are measured on different scales and the MDS solution keeps only 2 dimensions against 8 for PCA
    • This suggests that the set of indicators used to evaluate world development could be reduced. However, this requires further analysis, and we might need data covering a longer period to support this argument

12 Appendix

code_name Key Indicators
Pop_Total Population, total [SP.POP.TOTL]
Pop_Growth Population growth (annual %) [SP.POP.GROW]
GNI_Atlas GNI, Atlas method (current US$) [NY.GNP.ATLS.CD]
GNI_PC_Atlas GNI per capita, Atlas method
GNI_PPP GNI, PPP (current international $) [NY.GNP.MKTP.PP.CD]
GDP_USD GDP (current US$) [NY.GDP.MKTP.CD]
GDP_Growth GDP growth (annual %) [NY.GDP.MKTP.KD.ZG]
Inflation_GDP Inflation, GDP deflator (annual %) [NY.GDP.DEFL.KD.ZG]
Trade_GDP Merchandise trade (% of GDP) [TG.VAL.TOTL.GD.ZS]
Net_Migration Net migration [SM.POP.NETM]
FDI_Net_Inflow Foreign direct investment, net inflows (BoP, current US$) [BX.KLT.DINV.CD.WD]
GNI_PC_PPP GNI per capita, PPP (current international $) [NY.GNP.PCAP.PP.CD]
Immunization_Measles Immunization, measles (% of children ages 12-23 months) [SH.IMM.MEAS]