Dataset Acquisition

In this module, we use the World Development Indicators (WDI) dataset from the World Bank, which can be downloaded from this link. We use WDICSV, the main data file in the zip archive.

If you are curious and check the original data, you may notice that it looks odd. Handling this was the main challenge of our task, which we eventually overcame by:

1. Filtering out region-based aggregates and keeping only individual countries, 217 in total

2. Keeping only the most recently updated year, 2024

3. Keeping only the variables we found most interesting: 11 indicators, or 12 columns including the country name

If you are interested in the process, you can check our Colab notebook, in which we used R. We chose not to publish it on RPubs because it would look very bad in R Markdown. For readability, we also renamed the indicator columns to X1, X2, and so on. The mapping is in the report, but you can also check it here if you do not have the report open.
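A minimal base-R sketch of the three filtering steps and the X1, X2 renaming is below. It is not our notebook code: it assumes the WDI layout of one row per country and indicator with columns named `Country Name`, `Country Code`, `Indicator Code` and one column per year, and the toy data, the country code list and the indicator codes (`IND.A`, `IND.B`) are hypothetical placeholders. In practice the country list would hold the codes of the 217 individual economies.

```r
# Toy stand-in for WDICSV.csv (hypothetical values): one row per
# country x indicator, one column per year. "World" is a regional aggregate.
wdi <- data.frame(
  `Country Name`   = c("Albania", "Albania", "World", "World"),
  `Country Code`   = c("ALB", "ALB", "WLD", "WLD"),
  `Indicator Code` = c("IND.A", "IND.B", "IND.A", "IND.B"),
  `2023` = c(1, 2, 3, 4),
  `2024` = c(5, 6, 7, 8),
  check.names = FALSE
)

# Hypothetical helper: keep listed countries and indicators for one year,
# then spread indicators into columns X1, X2, ... in the given order.
prepare_wdi <- function(wdi, country_codes, indicator_codes, year = "2024") {
  keep <- wdi[wdi[["Country Code"]] %in% country_codes &
                wdi[["Indicator Code"]] %in% indicator_codes, ]
  countries <- unique(keep[["Country Name"]])
  out <- data.frame(`Country Name` = countries, check.names = FALSE)
  for (i in seq_along(indicator_codes)) {
    rows <- keep[keep[["Indicator Code"]] == indicator_codes[i], ]
    # match() leaves NA where a country has no value for this indicator
    out[[paste0("X", i)]] <- rows[[year]][match(countries, rows[["Country Name"]])]
  }
  out
}

toy_df <- prepare_wdi(wdi, country_codes = "ALB",
                      indicator_codes = c("IND.B", "IND.A"))
print(toy_df)
```

The aggregate row ("World") is dropped because its code is not in the country list, and the 2023 column is ignored.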

Here is an overview of the dataset's structure:

## 'data.frame':    217 obs. of  12 variables:
##  $ Country Name: chr  "Afghanistan" "Albania" "Algeria" "American Samoa" ...
##  $ X1          : num  14.99 7.05 4.35 NA 7.7 ...
##  $ X2          : num  16.9 36.3 19.9 47 NA ...
##  $ X3          : num  414 11378 5753 18017 49304 ...
##  $ X4          : num  21.2 12.4 18.1 NA NA ...
##  $ X5          : num  50.7 43.2 20.3 77.7 NA ...
##  $ X6          : num  13.4 22.4 36.2 NA 12.8 ...
##  $ X7          : num  -6.6 2.22 4.05 NA NA ...
##  $ X8          : num  66 79.6 76.3 72.9 84 ...
##  $ X9          : num  10.9 82 54.4 19.6 67.7 ...
##  $ X10         : num  13.7 10.7 11.7 NA NA ...
##  $ X11         : num  25.7 58.5 75.3 80.9 88.9 ...

Next, a basic summary of each variable:

##  Country Name             X1               X2                 X3          
##  Length:217         Min.   : 1.327   Min.   :  0.7209   Min.   :   219.4  
##  Class :character   1st Qu.: 4.351   1st Qu.: 22.4897   1st Qu.:  2703.0  
##  Mode  :character   Median : 6.369   Median : 36.6774   Median :  8385.0  
##                     Mean   : 6.829   Mean   : 43.9780   Mean   : 22790.6  
##                     3rd Qu.: 8.822   3rd Qu.: 55.2482   3rd Qu.: 29669.8  
##                     Max.   :27.090   Max.   :191.5330   Max.   :288001.4  
##                     NA's   :24       NA's   :25         NA's   :3         
##        X4               X5                X6               X7         
##  Min.   : 2.559   Min.   :  1.274   Min.   : 2.086   Min.   :-12.297  
##  1st Qu.:12.229   1st Qu.: 28.633   1st Qu.:16.775   1st Qu.:  1.730  
##  Median :16.660   Median : 44.882   Median :23.304   Median :  3.075  
##  Mean   :17.632   Mean   : 50.281   Mean   :24.559   Mean   :  9.649  
##  3rd Qu.:20.651   3rd Qu.: 66.322   3rd Qu.:31.018   3rd Qu.:  5.412  
##  Max.   :73.832   Max.   :177.715   Max.   :76.034   Max.   :254.948  
##  NA's   :31       NA's   :25        NA's   :9        NA's   :24       
##        X8              X9               X10              X11        
##  Min.   :54.46   Min.   :  1.051   Min.   : 0.130   Min.   : 14.66  
##  1st Qu.:68.48   1st Qu.: 14.300   1st Qu.: 3.278   1st Qu.: 43.81  
##  Median :74.55   Median : 37.685   Median : 5.199   Median : 63.89  
##  Mean   :73.80   Mean   : 43.411   Mean   : 6.890   Mean   : 62.87  
##  3rd Qu.:79.59   3rd Qu.: 70.282   3rd Qu.: 8.759   3rd Qu.: 80.94  
##  Max.   :86.37   Max.   :165.113   Max.   :34.643   Max.   :100.00  
##                  NA's   :14        NA's   :30

Based on the output above, we can conclude that:

1. The distributions are poor: most variables are right-skewed (mean above median). X3 and X7 are extreme cases that can be spotted from the mean, median, and maximum alone: X3's mean (22,790.6) is nearly three times its median (8,385.0), and X7's mean (9.649) is about three times its median (3.075), with a maximum of 254.948.

2. The gap between the 3rd quartile and the maximum also points to outliers in several variables, e.g. X2, X3, X5, and X7.

3. There is a substantial amount of missing data (NA): the variable with the most missing values, X4, is missing 31 of 217 values (14.3%). With this much missing data, appropriate handling is needed.

4. The scales vary widely: X1 ranges from about 1.3 to 27.1, while X3 ranges from about 219 to 288,001. Thus, rescaling is needed.
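The four observations above can also be checked programmatically. The sketch below uses a hypothetical helper, `describe_issues()`, which is not part of our notebook: it reports the share of missing values, the mean-to-median ratio as a rough skew signal, and the range of each numeric column. On toy data with 31 missing values out of 217 it reproduces the 14.3% figure quoted for X4.

```r
# Hypothetical helper: missingness, skew signal and range per numeric column.
# A mean/median ratio above 1 hints at right skew (only meaningful if median > 0).
describe_issues <- function(df) {
  num <- df[vapply(df, is.numeric, logical(1))]
  data.frame(
    na_pct      = round(100 * colMeans(is.na(num)), 1),
    mean_median = round(vapply(num, function(x)
      mean(x, na.rm = TRUE) / median(x, na.rm = TRUE), numeric(1)), 2),
    min_value   = vapply(num, min, numeric(1), na.rm = TRUE),
    max_value   = vapply(num, max, numeric(1), na.rm = TRUE)
  )
}

# Toy data (not the WDI values): 217 rows, X4 has 31 NAs, X1 has one extreme value
toy <- data.frame(
  name = paste("Country", 1:217),
  X4   = c(rep(NA, 31), 1:186),
  X1   = c(1:216, 1000)
)
issues <- describe_issues(toy)
print(issues)
```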

Data Exploration

This histogram and density plot visualize the distribution of all 11 variables. Most variables exhibit right-skewed distributions, particularly GDP per capita (X3), Exports (X2), and Imports (X5), indicating the presence of extreme high values among certain countries. This confirms the need for robust scaling before multivariate analysis.

The boxplot reveals substantial outliers across multiple indicators, particularly GDP per capita (X3), Inflation (X7), and School Enrollment (X9). The presence of extreme values further justifies the decision to apply median imputation and robust scaling.
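The boxplot flags points beyond Tukey's fences, 1.5 × IQR outside the hinges. Assuming that default rule, the number of flagged points per variable can be counted with base R's `boxplot.stats()`; `count_tukey_outliers()` is a hypothetical helper name and the toy data are illustrative only.

```r
# Count points beyond Tukey's fences (coef x IQR outside the hinges),
# the same rule boxplot() uses to draw individual outlier points.
count_tukey_outliers <- function(df, coef = 1.5) {
  num <- df[vapply(df, is.numeric, logical(1))]
  vapply(num, function(x) length(boxplot.stats(x[!is.na(x)], coef = coef)$out),
         integer(1))
}

toy <- data.frame(a = c(1:20, 100), b = 1:21)
out_counts <- count_tukey_outliers(toy)
print(out_counts)
```

For column `a`, the hinges are 6 and 16, so the upper fence is 31 and only the value 100 is flagged.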

Missing Data Handling

Here we want a method that is not thrown off by outliers. Deletion is ruled out because of the massive data loss it would cause. KNN imputation would, in our opinion, be the most suitable, but it is sensitive to skew, and transforming the data is not an option for us. That leaves median imputation, which is robust to outliers and skew.

library(dplyr)
# Replace each numeric column's NAs with that column's median
df_imputed <- df |>
  mutate(across(where(is.numeric), ~ ifelse(is.na(.x), median(.x, na.rm = TRUE), .x)))

Now, if we check the missing data again, there should be none left:

colMeans(is.na(df_imputed))

## Country Name           X1           X2           X3           X4           X5 
##            0            0            0            0            0            0 
##           X6           X7           X8           X9          X10          X11 
##            0            0            0            0            0            0

This step is complete.

Assumption Tests

Correlation

Based on the material, we set the following thresholds for this test:

1. Based on visual inspection, at least 24% of the correlations (as in the example in the material) are greater than 0.3 in absolute value

2. More than 30% of the correlations are significant at the 0.01 level

	X1	X2	X3	X4	X5	X6	X7	X8	X9	X10	X11
X1		-0.004	0.110	0.372***	0.136*	-0.278***	-0.079	0.156*	0.288***	0.055	0.215**
X2	-0.004		0.371***	0.008	0.791***	0.091	-0.195**	0.403***	0.287***	-0.021	0.326***
X3	0.110	0.371***		0.039	0.191**	-0.102	-0.124	0.605***	0.374***	-0.149*	0.392***
X4	0.372***	0.008	0.039		0.271***	-0.113	-0.105	0.033	0.053	0.073	0.024
X5	0.136*	0.791***	0.191**	0.271***		-0.138*	-0.241***	0.216**	0.117	0.045	0.163*
X6	-0.278***	0.091	-0.102	-0.113	-0.138*		0.093	-0.047	-0.004	0.037	0.026
X7	-0.079	-0.195**	-0.124	-0.105	-0.241***	0.093		-0.152*	0.056	0.049	0.003
X8	0.156*	0.403***	0.605***	0.033	0.216**	-0.047	-0.152*		0.678***	-0.166*	0.567***
X9	0.288***	0.287***	0.374***	0.053	0.117	-0.004	0.056	0.678***		-0.131	0.502***
X10	0.055	-0.021	-0.149*	0.073	0.045	0.037	0.049	-0.166*	-0.131		0.039
X11	0.215**	0.326***	0.392***	0.024	0.163*	0.026	0.003	0.567***	0.502***	0.039
Correlations were computed with the Pearson method and listwise deletion.

The correlation heatmap provides a visual overview of inter-variable relationships. Several moderate-to-strong correlations are observed, particularly between GDP per capita, Life Expectancy, and School Enrollment. This pattern supports the suitability of dimension reduction techniques such as PCA and FA.

After checking the table for the 8 variables retained after the MSA step (X2, X3, X5, X7, X8, X9, X10, X11), which form 28 unique pairs, we conclude that:

1. 10 of the 28 pairs have a correlation greater than 0.3 in absolute value. 10 out of 28 is 35.7%, so this criterion is met

2. 15 of the 28 pairs (53.6%) are significant at the 0.01 level, so this criterion is also met

Conclusion: This dataset passes the Correlation Among Variables test
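The two correlation criteria can also be counted in code instead of by eye. This sketch uses a hypothetical helper, `corr_criteria()`: it loops over all unique pairs (k(k - 1)/2 of them for k variables, e.g. 28 for k = 8 and 55 for k = 11), runs `cor.test()` on each, and reports the percentage of pairs with |r| > 0.3 and with p < 0.01. It assumes an all-numeric data frame without missing values, such as the imputed data; the toy data are simulated.

```r
# Hypothetical helper: share of unique variable pairs with |r| above a cut-off
# and with a Pearson test p-value below alpha. Expects only numeric columns.
corr_criteria <- function(df, r_cut = 0.3, alpha = 0.01) {
  pairs <- combn(names(df), 2)                 # k * (k - 1) / 2 unique pairs
  res <- apply(pairs, 2, function(p) {
    ct <- cor.test(df[[p[1]]], df[[p[2]]], method = "pearson")
    c(r = unname(ct$estimate), p = ct$p.value)
  })
  c(n_pairs = ncol(pairs),
    pct_r   = 100 * mean(abs(res["r", ]) > r_cut),
    pct_sig = 100 * mean(res["p", ] < alpha))
}

set.seed(1)
a    <- rnorm(100)
toy  <- data.frame(a = a, b = a + rnorm(100, sd = 0.5),
                   c = rnorm(100), d = rnorm(100))
crit <- corr_criteria(toy)
print(crit)
```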

Measure of Sampling Adequacy (MSA) or Kaiser-Meyer-Olkin (KMO) Factor Adequacy

In the material, the MSA check comes after the other two tests. Since dropping variables at that point would force us to redo the other two tests, we ran the MSA check first instead. Based on the material, the overall MSA must exceed 0.5. If the overall value fails, we check each variable individually and drop those below 0.5. However, the worked example in the material also checks the individual values even when the overall value passes, so we treat the individual check as mandatory. Variables are dropped one at a time, as the log below shows.

## Dropping: X6 
## MSA of dropped variable: 0.2918671 
## Overall MSA before drop: 0.6069779 
## 
## Dropping: X4 
## MSA of dropped variable: 0.4256974 
## Overall MSA before drop: 0.6372648 
## 
## Dropping: X1 
## MSA of dropped variable: 0.3732015 
## Overall MSA before drop: 0.6467721

## FINAL RESULT

## Overall MSA: 0.6899996

##        X2        X3        X5        X7        X8        X9       X10       X11 
## 0.6307514 0.7939052 0.5514637 0.5699852 0.7110973 0.7289431 0.5536045 0.8376733

Conclusion: This dataset passes the MSA test (overall MSA 0.690, with every remaining variable above 0.5)
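For reference, the MSA of variable j is the sum of its squared correlations with the other variables divided by that sum plus the sum of its squared partial (anti-image) correlations, which come from the inverse of the correlation matrix; `psych::KMO()` reports the same quantities. The base-R sketch below also mimics the drop-one-at-a-time loop from the log above. The helper names and the toy data are ours, and dropping the lowest-MSA variable first on each pass is our assumption about the procedure.

```r
# Kaiser-Meyer-Olkin MSA from the correlation matrix R and the
# anti-image (partial) correlations derived from its inverse.
kmo_msa <- function(df) {
  R  <- cor(df, use = "complete.obs")
  Ri <- solve(R)
  Q  <- -Ri / sqrt(outer(diag(Ri), diag(Ri)))  # partial correlations
  diag(R) <- 0
  diag(Q) <- 0
  r2 <- R^2
  q2 <- Q^2
  list(overall = sum(r2) / (sum(r2) + sum(q2)),
       msa     = colSums(r2) / (colSums(r2) + colSums(q2)))
}

# Drop the variable with the lowest MSA, one per pass, until all are >= cut.
drop_low_msa <- function(df, cut = 0.5) {
  repeat {
    k <- kmo_msa(df)
    worst <- which.min(k$msa)
    if (k$msa[worst] >= cut || ncol(df) <= 2) return(list(data = df, kmo = k))
    message("Dropping: ", names(df)[worst], " (MSA = ", round(k$msa[worst], 3), ")")
    df <- df[, -worst, drop = FALSE]
  }
}

# Toy data: a, b, c share a common factor; d and e are pure noise
set.seed(42)
f   <- rnorm(200)
toy <- data.frame(a = f + rnorm(200, sd = 0.5), b = f + rnorm(200, sd = 0.5),
                  c = f + rnorm(200, sd = 0.5), d = rnorm(200), e = rnorm(200))
res <- drop_low_msa(toy)
print(round(res$kmo$msa, 3))
```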

Bartlett Test

According to the note in the material, the p-value of this test should be below 0.05.

## $chisq
## [1] 798.8351
## 
## $p.value
## [1] 1.525512e-132
## 
## $df
## [1] 55

As shown above, our p-value (about 1.53e-132) is far below that threshold.

Conclusion: This dataset passes the Bartlett Test
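Bartlett's statistic is chi-square = -(n - 1 - (2p + 5)/6) * ln|R| with p(p - 1)/2 degrees of freedom, where n is the number of observations, p the number of variables and R their correlation matrix. By that formula, the df of 55 printed above corresponds to p = 11 variables. A base-R sketch follows; the helper name and the simulated data are hypothetical.

```r
# Bartlett's test of sphericity: H0 says the correlation matrix is the identity.
bartlett_sphericity <- function(df) {
  df <- na.omit(df)
  n  <- nrow(df)
  p  <- ncol(df)
  R  <- cor(df)
  chisq <- -(n - 1 - (2 * p + 5) / 6) * log(det(R))
  dof   <- p * (p - 1) / 2
  list(chisq = chisq, p.value = pchisq(chisq, dof, lower.tail = FALSE), df = dof)
}

# Simulated 11-variable data; the first three columns share a component
set.seed(7)
m <- matrix(rnorm(300 * 11), nrow = 300, ncol = 11)
m[, 1:3] <- m[, 1:3] + 2 * rnorm(300)
res <- bartlett_sphericity(as.data.frame(m))
print(res)
```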

Grand Conclusion: All assumptions are met for this dataset.

Data Scaling

Now that all the assumptions are met, we can proceed with data scaling. Because several variables are heavily skewed, we use robust scaling to be safe: each variable is centered on its median and divided by its interquartile range (IQR).

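# Robust scaling: center on the median, divide by the interquartile range
robust_scale <- function(x) {
  (x - median(x, na.rm = TRUE)) / IQR(x, na.rm = TRUE)
}
# Apply it to every column of the filtered numeric data (dplyr's mutate/across)
df_scaled <- numeric_df_filtered |>
  mutate(across(everything(), robust_scale))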

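Now, if we check the median and MAD, every median should be exactly 0. Dividing by the IQR makes each IQR equal to 1, not each MAD: R's mad() rescales by 1.4826, so for a roughly normal variable the MAD comes out near 1/1.349 ≈ 0.74, and X3, the most skewed variable, comes out lower (0.40).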

rbind(median = apply(df_scaled, 2, median, na.rm = TRUE),
      MAD    = apply(df_scaled, 2, mad, na.rm = TRUE))

##               X2        X3        X5        X7        X8        X9       X10
## median 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000 0.0000000
## MAD    0.7324812 0.3992661 0.7293251 0.7708857 0.7569133 0.6798651 0.6062594
##              X11
## median 0.0000000
## MAD    0.7425708

This step is complete.
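The expected median, IQR and MAD after robust scaling are easy to confirm on simulated data. For a normal sample, scaling by the IQR gives a median of 0 and an IQR of 1, while `mad()` lands near 1/1.349 ≈ 0.74, the same range as most MADs printed above. The data here are simulated, not ours.

```r
# Robust scaling: center on the median, divide by the IQR
robust_scale <- function(x) {
  (x - median(x, na.rm = TRUE)) / IQR(x, na.rm = TRUE)
}

set.seed(123)
z <- robust_scale(rnorm(1e5, mean = 50, sd = 10))
# mad() multiplies the raw median absolute deviation by 1.4826
round(c(median = median(z), IQR = IQR(z), MAD = mad(z)), 3)
```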

PCA Analysis

Now that we know the dataset is suitable for PCA, we proceed with the analysis, which the material breaks down into two stages:

1. Deriving factors and assessing overall fit

2. Interpreting the factors

Stage 1

This stage focuses on deriving factors and assessing the overall fit of the PCA model. It is essential for determining the optimal number of principal components to retain for further analysis.

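library(factoextra)  # provides get_eigenvalue()

# PCA on standardized variables (equivalent to PCA on the correlation matrix)
pca_result <- prcomp(numeric_df_filtered, scale. = TRUE)

eig.val <- get_eigenvalue(pca_result)
print(eig.val)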

##       eigenvalue variance.percent cumulative.variance.percent
## Dim.1  3.0778950        38.473687                    38.47369
## Dim.2  1.4967564        18.709455                    57.18314
## Dim.3  1.0757303        13.446628                    70.62977
## Dim.4  0.8508010        10.635012                    81.26478
## Dim.5  0.6036754         7.545942                    88.81073
## Dim.6  0.4802381         6.002976                    94.81370
## Dim.7  0.2382707         2.978383                    97.79208
## Dim.8  0.1766332         2.207915                   100.00000

Under the Kaiser rule (eigenvalue > 1), we would retain only three components, which together cover 70.6% of the total variance. Under the percentage-of-variance criterion, four components would be retained (81.3%), with three as an alternative. To help decide, we also use the scree criterion. However, since visual judgement of a scree plot alone is not very dependable in our opinion, we complement it with parallel analysis.

## Parallel analysis suggests that the number of factors =  NA  and the number of components =  2
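Parallel analysis keeps the components whose observed eigenvalues exceed those obtained from random, uncorrelated data of the same size. A minimal sketch of Horn's method in base R follows. The message above matches the output format of the psych package's `fa.parallel()`, which offers more options, so results from this sketch may differ slightly. The helper name and the toy data (two blocks of three correlated variables) are hypothetical.

```r
# Horn's parallel analysis (sketch): compare observed eigenvalues with the
# mean eigenvalues of random normal data with the same n and p.
parallel_analysis <- function(X, n_iter = 100, seed = 1) {
  set.seed(seed)
  X   <- as.matrix(na.omit(X))
  n   <- nrow(X)
  p   <- ncol(X)
  obs <- eigen(cor(X), symmetric = TRUE, only.values = TRUE)$values
  sim <- replicate(n_iter, {
    Z <- matrix(rnorm(n * p), nrow = n, ncol = p)
    eigen(cor(Z), symmetric = TRUE, only.values = TRUE)$values
  })
  ref <- rowMeans(sim)
  k <- which(obs <= ref)[1] - 1    # stop at the first component that fails
  if (is.na(k)) k <- p
  list(observed = obs, random = ref, n_components = k)
}

set.seed(2)
f1  <- rnorm(300)
f2  <- rnorm(300)
toy <- data.frame(a = f1 + rnorm(300, sd = 0.5), b = f1 + rnorm(300, sd = 0.5),
                  c = f1 + rnorm(300, sd = 0.5), d = f2 + rnorm(300, sd = 0.5),
                  e = f2 + rnorm(300, sd = 0.5), g = f2 + rnorm(300, sd = 0.5))
pa <- parallel_analysis(toy)
pa$n_components
```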

Looking at the plot, the elbow starts at component 2.

This leaves three options: retain 2, 3, or even 4 components.

However, retaining only two components explains just 57.2% of the variance, which is low. Retaining three explains 70.6%, which is good, and retaining four explains 81.3%, which is better still.

That said, the percentage-of-variance criterion is flexible, and 60% is still considered acceptable. Thus, for further analysis, we retain three components, which also agrees with the Kaiser rule.
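The Kaiser and percentage-of-variance counts can be reproduced from the eigenvalues printed in Stage 1. The 60% target is the acceptable level mentioned above; the 80% target is added only for comparison.

```r
# Eigenvalues printed by get_eigenvalue() above (8 variables, so they sum to about 8)
eig <- c(3.0778950, 1.4967564, 1.0757303, 0.8508010,
         0.6036754, 0.4802381, 0.2382707, 0.1766332)
cum_pct <- 100 * cumsum(eig) / sum(eig)

kaiser_k   <- sum(eig > 1)              # Kaiser rule: eigenvalue > 1
variance_k <- which(cum_pct >= 60)[1]   # smallest k reaching 60% of variance
round(cum_pct[1:4], 2)
```

Both rules point to three components; a stricter 80% target would require four.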

Stage 2

This stage focuses on interpreting the derived components, although it is mainly more useful for FA.

library(psych)

# Unrotated principal components, keeping 3 components
pc_unrotated <- principal(numeric_df_filtered, nfactors = 3, rotate = "none")

# Loadings on PC1-PC3 and communality (h2) for each variable
unrotated_table <- data.frame(
  PC1 = round(pc_unrotated$loadings[, 1], 3),
  PC2 = round(pc_unrotated$loadings[, 2], 3),
  PC3 = round(pc_unrotated$loadings[, 3], 3),
  h2  = round(pc_unrotated$communality, 3)
)

# Variance accounted for by each component
fit_table <- data.frame(
  PC1 = round(pc_unrotated$Vaccounted[1:5, 1], 3),
  PC2 = round(pc_unrotated$Vaccounted[1:5, 2], 3),
  PC3 = round(pc_unrotated$Vaccounted[1:5, 3], 3),
  row.names = c("SS loadings", "Proportion Var", "Cumulative Var", 
                "Proportion Explained", "Cumulative Proportion")
)

##        PC1    PC2    PC3    h2
## X2   0.725 -0.548  0.090 0.833
## X3   0.706  0.166 -0.164 0.554
## X5   0.545 -0.735  0.097 0.846
## X7  -0.219  0.469  0.490 0.507
## X8   0.848  0.291 -0.070 0.808
## X9   0.704  0.434  0.099 0.693
## X10 -0.151 -0.237  0.822 0.754
## X11  0.687  0.284  0.318 0.654

##                         PC1   PC2   PC3
## SS loadings           3.078 1.497 1.076
## Proportion Var        0.385 0.187 0.134
## Cumulative Var        0.385 0.572 0.706
## Proportion Explained  0.545 0.265 0.190
## Cumulative Proportion 0.545 0.810 1.000

##        PC1    PC2    PC3    h2
## X2   0.713  0.612 -0.132 0.900
## X3   0.707 -0.183 -0.100 0.544
## X5   0.526  0.785 -0.143 0.913
## X8   0.851 -0.298  0.028 0.813
## X9   0.723 -0.371  0.115 0.674
## X10 -0.149  0.341  0.895 0.939
## X11  0.701 -0.209  0.381 0.680

##                         PC1   PC2   PC3
## SS loadings           3.046 1.410 1.007
## Proportion Var        0.435 0.201 0.144
## Cumulative Var        0.435 0.637 0.780
## Proportion Explained  0.557 0.258 0.184
## Cumulative Proportion 0.557 0.816 1.000

Now, all h2 values in the second table (which no longer includes X7) are above 0.50, ranging from 0.544 (X3) to 0.939 (X10). This means all variables share sufficient variance with the retained components.

Additionally, PC1 dominates: it has the highest SS loadings (3.046) and loads above 0.5 on every variable except X10. On its own, PC1 explains 43.5% of the variance, out of the 78.0% explained by the three components together.
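Communalities (h2) and SS loadings follow directly from the loadings: h2 is the row sum of squared loadings, and SS loadings is the column sum. Recomputing them from the rounded loadings in the second unrotated table reproduces the printed values up to rounding.

```r
# Unrotated loadings from the second table above (7 variables, 3 components)
L <- matrix(c( 0.713,  0.612, -0.132,
               0.707, -0.183, -0.100,
               0.526,  0.785, -0.143,
               0.851, -0.298,  0.028,
               0.723, -0.371,  0.115,
              -0.149,  0.341,  0.895,
               0.701, -0.209,  0.381),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("X2", "X3", "X5", "X8", "X9", "X10", "X11"),
                            c("PC1", "PC2", "PC3")))

h2       <- rowSums(L^2)       # communality: row sum of squared loadings
ss       <- colSums(L^2)       # SS loadings: column sum of squared loadings
prop_var <- ss / nrow(L)       # share of total variance (7 standardized variables)
round(h2, 3)
round(rbind(ss, prop_var), 3)
```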

If we were only doing PCA, this would be the end of the chapter. FA, however, requires a further step: Varimax rotation.

The PCA biplot simultaneously displays variable loadings and country observations. Variables pointing in similar directions are positively correlated, while opposite directions indicate negative relationships. The first principal component appears to capture development-related indicators.

pc_rotated <- principal(numeric_df_filtered2, nfactors = 3, rotate = "varimax")

rotated_table <- data.frame(
  RC1 = round(pc_rotated$loadings[, 1], 3),
  RC2 = round(pc_rotated$loadings[, 2], 3),
  RC3 = round(pc_rotated$loadings[, 3], 3),
  h2  = round(pc_rotated$communality, 3)
)


fit_table_rotated <- data.frame(
  RC1 = round(pc_rotated$Vaccounted[1:5, 1], 3),
  RC2 = round(pc_rotated$Vaccounted[1:5, 2], 3),
  RC3 = round(pc_rotated$Vaccounted[1:5, 3], 3),
  row.names = c("SS loadings", "Proportion Var", "Cumulative Var",
                "Proportion Explained", "Cumulative Proportion")
)

##        RC1   RC2    RC3    h2
## X2   0.294 0.902 -0.017 0.900
## X3   0.666 0.229 -0.219 0.544
## X5   0.050 0.953  0.041 0.913
## X8   0.872 0.180 -0.144 0.813
## X9   0.817 0.036 -0.071 0.674
## X10 -0.079 0.027  0.965 0.939
## X11  0.784 0.106  0.232 0.680

##                         RC1   RC2   RC3
## SS loadings           2.582 1.820 1.061
## Proportion Var        0.369 0.260 0.152
## Cumulative Var        0.369 0.629 0.780
## Proportion Explained  0.473 0.333 0.194
## Cumulative Proportion 0.473 0.806 1.000

Comparing the unrotated and rotated results: in the unrotated solution, most variables load on several components at once, so no component has clear standout variables. In the rotated solution, each component is instead explained by a few closely related variables:

RC1. Human Development: X3 GDP per capita (0.666), X8 Life expectancy (0.872), X9 School enrollment (0.817), and X11 Urban population (0.784). This component captures the fact that wealthier countries tend to have better health, education, and urbanization.
RC2. Trade: X2 Exports (0.902) and X5 Imports (0.953). This component captures a country’s openness to and integration in global trade.
RC3. Unemployment: X10 Unemployment (0.965). This component captures the labor market condition of a country, where higher unemployment reflects economic vulnerability regardless of development level.

FA Analysis

Most of our analysis was already covered in the PCA section. However, the material includes one more stage that is exclusive to Factor Analysis.

Factor Scores

We will use the categorization from RC1-RC3 above, which is:

F1: Human Development
F2: Trade
F3: Unemployment

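# Varimax-rotated components, keeping the component scores
fa_result <- principal(numeric_df_filtered2, nfactors = 3, rotate = "varimax", scores = TRUE)

factor_scores <- as.data.frame(fa_result$scores)
colnames(factor_scores) <- c("F1", "F2", "F3")

# Attach the country names (assumes df_imputed rows are in the same order)
factor_scores <- cbind(Country = df_imputed$`Country Name`, factor_scores)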

##     Country.X2   Country.X3  Country.X5  Country.X7  Country.X8  Country.X9
## 1  -0.70404452 -0.296046239  0.18396250 -3.56449451 -0.76660666 -0.50823316
## 2  -0.01393879  0.111149559 -0.05384224 -0.31641904  0.45454545  0.83938507
## 3  -0.59751787 -0.097751009 -0.77358516  0.35781487  0.15382538  0.31714729
## 4   0.36508687  0.357742399  1.03286689  0.00000000 -0.15301530 -0.34219173
## 5   0.00000000  1.519688754  0.00000000  0.00000000  0.85409541  0.56809970
## 6  -0.18445266 -0.212404287 -0.80262572  9.27066743 -0.89423942 -0.52527064
## 7   0.64089524  0.562936544  0.56912017  1.15085710  0.27416742 -0.23809549
## 8  -0.76097445  0.207414551 -1.01264998 79.86928621  0.25589559  1.32854396
## 9   1.31216111  0.006358652  0.99238278 -1.03343023  0.26227273  0.28582604
## 10  1.68033697  1.155535965  0.99028905  0.43567200  0.16210621 -0.44740683
## 11 -0.42814790  2.087932240 -0.70309164  0.03197769  0.76500630  1.26841811
## 12  0.67479398  1.852650868  0.25992589 -0.05042937  0.62911431  1.02280452
## 13  0.32762095 -0.040896021 -0.25455036 -0.31778281 -0.01107111  0.06925346
## 14  0.03928636  1.153933777 -0.10764470 -0.98198504  0.00000000 -0.43268810
## 15  1.81111665  0.789899091  0.79439733 -0.79393428  0.60594059  0.39176722
##      Country.X10 Country.X11         F1          F2           F3
## 1   1.8105802048 -1.02844127 -1.3596485 -0.13763121  0.942440330
## 2   1.1710750853 -0.14382633  0.6493207 -0.43168829  0.665506567
## 3   1.3771331058  0.30683345  0.5479956 -1.13667352  1.098111191
## 4   0.0000000000  0.45941715 -0.2288151  0.69216945  0.003978788
## 5   0.0000000000  0.67326935  1.3646262 -0.43003166 -0.128235328
## 6   1.8816126280  0.18229349 -0.5816064 -0.71537584  1.551394041
## 7   0.0000000000 -1.06372420 -0.6936553  0.66609393 -0.789076443
## 8   0.4161689420  0.76454982  1.5116658 -1.63862318  0.489193276
## 9   1.5371160410  0.05183862  0.1142121  0.99647201  1.069531947
## 10  0.0000000000 -0.05644165 -0.2979815  1.36866806 -0.394421751
## 11 -0.2681313993  0.63871898  1.9319964 -1.22726479 -0.416444938
## 12  0.0002133106  0.15026212  1.2118864  0.06229729 -0.396970218
## 13  0.0957764505 -0.14308730 -0.1034711 -0.22433423 -0.174604279
## 14  0.8212457338  0.46965770  0.1936616 -0.21339983  0.564559703
## 15 -0.8777730375  0.97262418  0.9349198  0.95526456 -0.624765910

From the results above, we successfully reduced the 7 variables used in the rotated model to only 3 factors. We can also conclude that:

Afghanistan scores very low on Human Development (-1.36) but high on Unemployment (0.94), indicating a poor country with a struggling labor market
Argentina scores high on Human Development (1.51) but very low on Trade (-1.64), meaning it is a relatively developed but less trade-open economy
Angola scores low on both Human Development (-0.58) and Trade (-0.72) but high on Unemployment (1.55), suggesting an underdeveloped economy with limited trade integration and a severely strained labor market
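For reading the scores, it helps to know how regression-method component scores are built: scores = Z R^-1 L, where Z is the standardized data, R its correlation matrix and L the (rotated) loadings. Scores built this way have mean 0 and variance 1, so, assuming psych standardizes its scores the same way, a value like Afghanistan's -1.36 on F1 means about 1.36 standard deviations below the average country. The sketch below uses simulated data and a hypothetical helper; we have not verified that it reproduces `psych::principal()` exactly.

```r
# Regression-method component scores (sketch): weights W = R^-1 L, scores = Z W
component_scores <- function(X, k) {
  Z <- scale(as.matrix(X))                      # standardize: mean 0, sd 1
  R <- cor(Z)
  e <- eigen(R, symmetric = TRUE)
  # Unrotated loadings: eigenvectors times square roots of eigenvalues
  L <- e$vectors[, 1:k, drop = FALSE] %*% diag(sqrt(e$values[1:k]), nrow = k)
  L <- unclass(varimax(L)$loadings)             # orthogonal varimax rotation
  W <- solve(R, L)                              # score weights R^-1 L
  Z %*% W
}

set.seed(3)
X <- matrix(rnorm(200 * 5), nrow = 200)
X[, 2] <- X[, 1] + rnorm(200, sd = 0.5)
scores <- component_scores(X, k = 2)
round(cov(scores), 6)                           # identity: unit variance, uncorrelated
```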

The rotated factor loading plot shows clear clustering of variables: human development indicators cluster on Factor 1, trade-related variables on Factor 2, and the labor-market variable (X10) on Factor 3. Varimax rotation improves interpretability by maximizing the variance of the squared loadings within each factor, which pushes each loading toward either a high or a near-zero value.
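The rotation step can be sketched with base R's `stats::varimax()` applied to the unrotated loadings from the second unrotated table. Because varimax is an orthogonal rotation, the communalities stay the same. The rotated loadings should resemble the RC table, but column order, signs and small numerical details may differ from psych's output, so we do not claim an exact match.

```r
# Unrotated loadings for X2, X3, X5, X8, X9, X10, X11 (second unrotated table)
L <- matrix(c( 0.713,  0.612, -0.132,
               0.707, -0.183, -0.100,
               0.526,  0.785, -0.143,
               0.851, -0.298,  0.028,
               0.723, -0.371,  0.115,
              -0.149,  0.341,  0.895,
               0.701, -0.209,  0.381),
            ncol = 3, byrow = TRUE,
            dimnames = list(c("X2", "X3", "X5", "X8", "X9", "X10", "X11"),
                            c("PC1", "PC2", "PC3")))

v     <- varimax(L)                 # Kaiser-normalized varimax (stats package)
L_rot <- unclass(v$loadings)

# Orthogonal rotation: row sums of squared loadings (h2) are unchanged
h2_before <- rowSums(L^2)
h2_after  <- rowSums(L_rot^2)
round(L_rot, 3)
```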

Grand Table

Before wrapping up, here is the full table of factor scores for all 217 countries, for interested readers.

Country	F1	F2	F3
Afghanistan	-1.360	-0.138	0.942
Albania	0.649	-0.432	0.666
Algeria	0.548	-1.137	1.098
American Samoa	-0.229	0.692	0.004
Andorra	1.365	-0.430	-0.128
Angola	-0.582	-0.715	1.551
Antigua and Barbuda	-0.694	0.666	-0.789
Argentina	1.512	-1.639	0.489
Armenia	0.114	0.996	1.070
Aruba	-0.298	1.369	-0.394
Australia	1.932	-1.227	-0.416
Austria	1.212	0.062	-0.397
Azerbaijan	-0.103	-0.224	-0.175
Bahamas, The	0.194	-0.213	0.565
Bahrain	0.935	0.955	-0.625
Bangladesh	-0.588	-1.096	-0.825
Barbados	0.238	-0.248	-0.112
Belarus	0.323	0.558	-0.296
Belgium	1.252	0.988	-0.088
Belize	-0.653	0.421	0.149
Benin	-1.168	-0.781	-0.804
Bermuda	1.635	-0.393	-0.428
Bhutan	-0.848	0.012	-0.754
Bolivia	-0.190	-0.872	-0.355
Bosnia and Herzegovina	-0.016	0.054	0.600
Botswana	-0.280	-0.321	3.182
Brazil	0.787	-1.284	0.391
British Virgin Islands	-0.159	-0.166	-0.303
Brunei Darussalam	0.163	0.687	-0.188
Bulgaria	0.574	0.115	-0.283
Burkina Faso	-1.575	-0.318	-0.798
Burundi	-1.657	0.109	-1.307
Cabo Verde	0.022	0.073	1.150
Cambodia	-1.075	1.115	-1.302
Cameroon	-0.872	-0.952	-0.447
Canada	1.365	-0.755	-0.006
Cayman Islands	1.286	0.130	-0.205
Central African Republic	-1.551	-0.565	-0.062
Chad	-1.900	-0.533	-1.189
Channel Islands	0.215	-0.075	-0.791
Chile	1.531	-0.953	0.670
China	0.735	-1.251	-0.306
Colombia	0.717	-1.221	0.763
Comoros	-1.110	-0.657	-0.703
Congo, Dem. Rep.	-1.446	0.393	-0.431
Congo, Rep.	-0.703	0.102	2.481
Costa Rica	0.790	-0.585	0.214
Cote d’Ivoire	-1.087	-0.604	-0.693
Croatia	0.525	0.075	-0.400
Cuba	0.114	0.607	-0.646
Curacao	0.099	1.201	0.075
Cyprus	0.861	1.576	-0.409
Czechia	0.634	0.556	-0.689
Denmark	1.437	0.461	-0.196
Djibouti	-1.020	2.905	3.637
Dominica	-0.432	-0.105	-0.048
Dominican Republic	0.341	-0.881	-0.051
Ecuador	0.449	-0.830	-0.511
Egypt, Arab Rep.	-0.433	-0.923	-0.112
El Salvador	-0.160	-0.133	-0.344
Equatorial Guinea	-0.768	-0.484	0.542
Eritrea	-1.146	-0.588	-0.339
Estonia	0.517	0.950	0.177
Eswatini	-1.318	0.556	4.605
Ethiopia	-1.261	-1.132	-0.920
Faroe Islands	0.352	0.323	-0.878
Fiji	-0.107	-0.297	-0.134
Finland	1.547	-0.503	0.257
France	1.247	-0.677	0.153
French Polynesia	0.108	-0.384	0.772
Gabon	-0.085	-0.078	2.808
Gambia, The	-0.717	-0.786	0.166
Georgia	0.316	0.058	0.900
Germany	1.275	-0.497	-0.534
Ghana	-0.787	-0.329	-0.572
Gibraltar	0.897	-0.388	0.195
Greece	2.030	-0.630	0.749
Greenland	0.448	-0.034	-0.076
Grenada	0.241	-0.351	-0.535
Guam	0.661	-0.372	0.061
Guatemala	-0.365	-0.763	-0.704
Guinea	-1.556	0.383	-0.377
Guinea-Bissau	-1.205	-0.754	-0.730
Guyana	-1.178	2.195	0.444
Haiti	-0.869	-1.105	1.473
Honduras	-0.484	0.082	-0.258
Hong Kong SAR, China	1.325	4.587	-0.446
Hungary	0.245	0.902	-0.322
Iceland	1.713	-0.375	-0.498
India	-0.600	-0.800	-0.675
Indonesia	-0.146	-0.967	-0.534
Iran, Islamic Rep.	0.603	-0.927	0.496
Iraq	-0.204	-0.378	1.684
Ireland	1.043	2.699	-0.926
Isle of Man	0.594	-0.131	-0.752
Israel	1.360	-0.925	-0.395
Italy	1.179	-0.781	-0.101
Jamaica	-0.462	-0.129	-0.579
Japan	1.370	-1.131	-0.483
Jordan	0.509	0.025	2.191
Kazakhstan	0.197	-0.755	-0.295
Kenya	-1.306	-0.727	-0.416
Kiribati	-1.173	0.668	-0.070
Korea, Dem. People’s Rep.	-0.269	-0.160	-0.470
Korea, Rep.	1.553	-0.530	-0.583
Kosovo	-0.286	0.486	-0.374
Kuwait	1.187	-0.207	-0.461
Kyrgyz Republic	-0.746	0.800	-0.793
Lao PDR	-1.043	-0.128	-1.126
Latvia	0.542	0.545	0.096
Lebanon	0.540	0.119	1.185
Lesotho	-1.765	1.307	1.709
Liberia	-1.183	-0.012	-0.555
Libya	0.281	0.487	2.598
Liechtenstein	1.113	0.055	-1.796
Lithuania	0.534	0.764	0.045
Luxembourg	0.757	4.843	-0.314
Macao SAR, China	2.358	0.302	-0.616
Madagascar	-1.433	-0.425	-0.832
Malawi	-1.489	-0.590	-0.698
Malaysia	0.084	0.763	-0.306
Maldives	-0.185	1.203	-0.716
Mali	-1.584	-0.505	-0.841
Malta	1.149	2.030	-0.415
Marshall Islands	-0.434	0.659	0.084
Mauritania	-0.819	0.185	0.736
Mauritius	-0.614	1.081	-0.451
Mexico	0.341	-0.449	-0.456
Micronesia, Fed. Sts.	-1.504	0.479	-0.615
Moldova	-0.252	-0.080	-1.054
Monaco	3.396	-0.309	-1.088
Mongolia	0.074	0.723	-0.078
Montenegro	0.269	0.275	1.202
Morocco	0.024	-0.053	0.501
Mozambique	-1.455	0.332	-0.152
Myanmar	-1.229	0.016	-0.919
Namibia	-0.786	0.489	2.179
Nauru	-0.469	1.452	0.371
Nepal	-0.329	-0.882	0.863
Netherlands	1.496	0.831	-0.387
New Caledonia	0.490	-0.994	0.765
New Zealand	1.441	-1.033	-0.261
Nicaragua	-0.334	0.175	-0.259
Niger	-1.773	-0.477	-1.470
Nigeria	-1.272	-1.047	-0.363
North Macedonia	0.012	0.775	1.027
Northern Mariana Islands	0.530	-0.002	0.086
Norway	1.832	-0.532	-0.576
Oman	0.493	0.265	-0.456
Pakistan	-0.964	-1.015	-0.353
Palau	-0.198	0.347	0.045
Panama	0.493	-0.290	0.318
Papua New Guinea	-1.802	1.037	-1.181
Paraguay	-0.012	-0.332	-0.010
Peru	0.889	-1.037	0.064
Philippines	-0.358	-0.475	-0.747
Poland	0.536	-0.022	-0.764
Portugal	0.841	-0.248	-0.147
Puerto Rico (US)	1.378	-0.278	0.074
Qatar	1.258	-0.059	-1.031
Romania	0.128	-0.323	-0.364
Russian Federation	0.452	-1.152	-0.540
Rwanda	-1.199	-0.171	0.550
Samoa	-1.227	0.045	-0.794
San Marino	0.788	4.421	-0.151
Sao Tome and Principe	-0.516	-0.111	0.615
Saudi Arabia	1.218	-0.998	-0.366
Senegal	-0.798	-0.293	-0.633
Serbia	0.332	0.220	0.119
Seychelles	-0.905	2.009	-0.484
Sierra Leone	-1.317	-0.725	-0.644
Singapore	1.393	3.938	-0.652
Sint Maarten (Dutch part)	0.446	-0.204	0.100
Slovak Republic	-0.056	1.452	-0.427
Slovenia	0.574	1.016	-0.742
Solomon Islands	-1.011	0.549	-1.193
Somalia, Fed. Rep.	-1.580	1.103	2.301
South Africa	-0.318	-0.529	4.647
South Sudan	-1.446	-0.316	0.713
Spain	1.543	-0.765	0.919
Sri Lanka	-0.698	-0.764	-0.914
St. Kitts and Nevis	0.038	-0.282	-0.626
St. Lucia	-0.870	0.016	0.237
St. Martin (French part)	0.828	-0.358	0.164
St. Vincent and the Grenadines	-0.501	-0.084	1.889
Sudan	-0.926	-1.515	-0.033
Suriname	-0.434	0.038	0.258
Sweden	1.512	-0.037	0.398
Switzerland	1.623	0.562	-0.587
Syrian Arab Republic	0.134	-1.101	1.424
Tajikistan	-0.847	-0.305	-0.268
Tanzania	-1.191	-0.719	-1.081
Thailand	-0.149	0.784	-1.010
Timor-Leste	-1.300	0.401	-1.153
Togo	-1.240	-0.360	-0.873
Tonga	-0.895	-0.002	-1.243
Trinidad and Tobago	-0.524	-0.057	-0.684
Tunisia	0.106	0.212	1.676
Turkiye	1.614	-1.190	0.737
Turkmenistan	-0.573	-1.098	-0.561
Turks and Caicos Islands	0.603	-0.263	0.034
Tuvalu	-0.650	-0.102	-0.103
Uganda	-1.207	-0.672	-0.942
Ukraine	0.486	-0.458	0.743
United Arab Emirates	0.871	1.786	-0.727
United Kingdom	1.380	-0.813	-0.338
United States	1.509	-1.461	-0.547
Uruguay	1.295	-1.079	0.653
Uzbekistan	-0.171	-0.606	-0.426
Vanuatu	-1.291	-0.174	-0.685
Venezuela, RB	0.835	-1.542	0.207
Viet Nam	-0.797	1.603	-1.144
Virgin Islands (U.S.)	0.540	2.056	1.073
West Bank and Gaza	-0.027	-0.214	3.613
Yemen, Rep.	-0.996	-0.397	1.634
Zambia	-1.086	-0.374	-0.204
Zimbabwe	-1.218	-0.719	0.383

Grand Conclusion

This study applied PCA and FA to 217 countries across 11 economic and social indicators, with preprocessing involving median imputation and robust scaling. Assumption tests confirmed suitability: 10 of the pairwise correlations among the retained variables exceeded |0.30| and 15 were significant at the 0.01 level, Bartlett's test was significant (p ≈ 1.53e-132), and the overall KMO MSA reached 0.690 after removing three variables (X6, X4, and X1). Three components were retained, explaining 70.6% of the variance of the 8 retained variables (78.0% in the final 7-variable model), and following Varimax rotation three factors emerged: F1 (Human Development), F2 (Trade), and F3 (Unemployment). Factor scores for all 217 countries enable meaningful cross-country comparisons across these latent dimensions.

Module 1 (PCA & FA)

FIO ULAA’ OCTRIYANTI (24031554030) & ALFIN JAYADI (24031554082)

2026-02-22