library("tidyverse")
library("FactoMineR")
library("ggthemes")
library("readxl")
library("ggthemes")
library("dplyr")
library("ggplot2")
library("formatR")

The Dataset

comp_index <- read_xlsx(path = "Competitiveness Index.xlsx")
colnames(comp_index) <- c("city_muncplty","actv_locEstab", "cap_healthSer", "cap_schoolSer","cap_genLocResource", "cost_living", "gen_employment","giv_bsPermit","grwth_locEconomy","size_locEconomy","peace_order")

The dataset contains data on 10 indicators from 94 cities/municipalities. The 10 indicators were already normalized such that a higher value means a better state (e.g., a higher score in cost of living means a better or cheaper cost of living), while the cities and municipalities included are LGUs in Metro Manila or capitals of a provinces. The overall competitiveness index is calculated by simply adding the normalized values of 40 indicators. In this activity, use Principal Component Analysis to identify possible indices of competitiveness based on the 10 selected indicators only.

head(comp_index)

Identifying Numbers of PC

Perform PCA and identify how many principal components should be kept if we wish to keep as much information as possible within fewer dimensions.

We first obtain the numerical values on our dataset which are the 10 indicators.

indicators <- comp_index %>%
    select(actv_locEstab, cap_healthSer, cap_schoolSer, cap_genLocResource,
        cost_living, gen_employment, giv_bsPermit, grwth_locEconomy,
        size_locEconomy, peace_order)

head(indicators)

Note that there is no need to standardized the values because the variables are already commensurable. The values are already normalized such that they are unitless or comparable-unit wise. As described the variables are already scaled from 0 to 2.5. Given this criteria, we use the original values of the dataset for PCA.

We performed PCA on the values and the results is shown below.

pca1 <- PCA(X=indicators, scale.unit=FALSE, ncp=10)

summary(pca1)
## 
## Call:
## PCA(X = indicators, scale.unit = FALSE, ncp = 10) 
## 
## 
## Eigenvalues
##                        Dim.1   Dim.2   Dim.3   Dim.4   Dim.5   Dim.6   Dim.7
## Variance               1.510   0.501   0.277   0.183   0.132   0.107   0.082
## % of var.             51.780  17.173   9.516   6.262   4.528   3.668   2.825
## Cumulative % of var.  51.780  68.953  78.468  84.731  89.259  92.926  95.752
##                        Dim.8   Dim.9  Dim.10
## Variance               0.067   0.033   0.024
## % of var.              2.290   1.128   0.830
## Cumulative % of var.  98.041  99.170 100.000
## 
## Individuals (the 10 first)
##                        Dist    Dim.1    ctr   cos2    Dim.2    ctr   cos2  
## 1                  |  1.749 | -1.640  1.895  0.879 |  0.493  0.516  0.079 |
## 2                  |  2.282 |  1.063  0.797  0.217 |  1.481  4.659  0.421 |
## 3                  |  1.654 | -1.296  1.184  0.615 | -0.946  1.901  0.327 |
## 4                  |  1.278 |  0.893  0.562  0.488 |  0.244  0.126  0.036 |
## 5                  |  1.564 | -1.387  1.355  0.786 | -0.586  0.729  0.140 |
## 6                  |  1.542 | -1.460  1.502  0.897 |  0.381  0.308  0.061 |
## 7                  |  1.533 | -1.475  1.533  0.926 | -0.390  0.324  0.065 |
## 8                  |  2.116 |  1.383  1.348  0.427 |  1.111  2.622  0.276 |
## 9                  |  1.413 | -1.345  1.275  0.907 | -0.101  0.022  0.005 |
## 10                 |  1.463 | -1.427  1.435  0.952 |  0.142  0.043  0.009 |
##                     Dim.3    ctr   cos2  
## 1                  -0.178  0.122  0.010 |
## 2                   1.088  4.537  0.227 |
## 3                   0.220  0.186  0.018 |
## 4                  -0.059  0.013  0.002 |
## 5                   0.193  0.143  0.015 |
## 6                   0.238  0.217  0.024 |
## 7                  -0.028  0.003  0.000 |
## 8                   0.706  1.911  0.111 |
## 9                   0.388  0.577  0.075 |
## 10                  0.229  0.201  0.025 |
## 
## Variables
##                       Dim.1    ctr   cos2    Dim.2    ctr   cos2    Dim.3
## actv_locEstab      |  0.269  4.789  0.204 |  0.357 25.417  0.360 |  0.367
## cap_healthSer      |  0.367  8.915  0.538 |  0.046  0.416  0.008 |  0.028
## cap_schoolSer      |  0.318  6.695  0.580 | -0.158  4.966  0.143 | -0.012
## cap_genLocResource |  0.374  9.281  0.586 |  0.080  1.270  0.027 | -0.042
## cost_living        | -0.318  6.687  0.214 |  0.553 61.058  0.649 | -0.215
## gen_employment     |  0.008  0.004  0.001 |  0.051  0.512  0.040 |  0.125
## giv_bsPermit       |  0.923 56.452  0.917 |  0.125  3.132  0.017 | -0.176
## grwth_locEconomy   |  0.199  2.635  0.338 | -0.122  2.955  0.126 | -0.006
## size_locEconomy    |  0.148  1.445  0.172 | -0.003  0.002  0.000 |  0.189
## peace_order        |  0.216  3.096  0.247 | -0.037  0.271  0.007 | -0.109
##                       ctr   cos2  
## actv_locEstab      48.420  0.380 |
## cap_healthSer       0.289  0.003 |
## cap_schoolSer       0.054  0.001 |
## cap_genLocResource  0.627  0.007 |
## cost_living        16.724  0.098 |
## gen_employment      5.611  0.244 |
## giv_bsPermit       11.105  0.033 |
## grwth_locEconomy    0.014  0.000 |
## size_locEconomy    12.892  0.281 |
## peace_order         4.265  0.063 |

Percentage of Variance Explained

We obtain proportion of total variance explained by each PC and their standard deviation as shown below. PC1, PC2, and PC3 has a standard deviation greater than 0.50.

In addition, the cumulative total variance if we keep PC1, PC2, and PC3 is at least 78%, which is close to 80%.

pca1comp <- prcomp(x = indicators, scale. = FALSE)
summary(pca1comp)
## Importance of components:
##                           PC1    PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     1.2353 0.7114 0.52957 0.42961 0.36532 0.32878 0.28855
## Proportion of Variance 0.5178 0.1717 0.09516 0.06262 0.04528 0.03668 0.02825
## Cumulative Proportion  0.5178 0.6895 0.78468 0.84731 0.89259 0.92926 0.95752
##                           PC8     PC9   PC10
## Standard deviation     0.2598 0.18236 0.1564
## Proportion of Variance 0.0229 0.01128 0.0083
## Cumulative Proportion  0.9804 0.99170 1.0000
pca1$eig
##         eigenvalue percentage of variance cumulative percentage of variance
## comp 1  1.50984577             51.7798239                          51.77982
## comp 2  0.50074182             17.1728291                          68.95265
## comp 3  0.27746555              9.5156192                          78.46827
## comp 4  0.18260147              6.2622766                          84.73055
## comp 5  0.13203930              4.5282585                          89.25881
## comp 6  0.10694494              3.6676529                          92.92646
## comp 7  0.08237759              2.8251212                          95.75158
## comp 8  0.06676857              2.2898133                          98.04139
## comp 9  0.03289963              1.1282855                          99.16968
## comp 10 0.02421126              0.8303199                         100.00000

Scree Plot

Considering the scree plot, PC4 or PC5 seems to be the elbow point of the scree plot.

tibble(eigenvalues = pca1$eig[,1], PC = 1:10) %>%
  ggplot(aes(y = eigenvalues, x = PC)) +
  geom_point() +
  geom_line()

Since our goal is to reduce the dimension and retain as much information as possible, we choose to retain PC1, PC2, and PC3. Keeping the first three components, we retain a cumulative total variance of at least 78%.

PC Loadings

pca1$var$coord
##                           Dim.1        Dim.2        Dim.3        Dim.4
## actv_locEstab       0.268909916  0.356756841  0.366536422  0.016809918
## cap_healthSer       0.366878914  0.045638730  0.028308508  0.166634603
## cap_schoolSer       0.317932091 -0.157687641 -0.012211467  0.072180336
## cap_genLocResource  0.374345109  0.079758042 -0.041698864 -0.153358276
## cost_living        -0.317738520  0.552939565 -0.215414253  0.087566143
## gen_employment      0.007799476  0.050629321  0.124771483  0.004564541
## giv_bsPermit        0.923225509  0.125236705 -0.175538145 -0.110015000
## grwth_locEconomy    0.199459381 -0.121648615 -0.006196442  0.112932772
## size_locEconomy     0.147723678 -0.003381702  0.189132253  0.081012418
## peace_order         0.216212258 -0.036840189 -0.108779814  0.294472837
##                          Dim.5        Dim.6        Dim.7        Dim.8
## actv_locEstab      -0.01859674 -0.125408369  0.018589174 -0.050037148
## cap_healthSer      -0.22654491  0.154844066  0.083740200  0.043515974
## cap_schoolSer       0.01537232  0.009652385  0.084923987 -0.146914421
## cap_genLocResource  0.13744419  0.011685055  0.197672150  0.096163257
## cost_living         0.05666104  0.069688916  0.011566502 -0.043511910
## gen_employment      0.02248055  0.079410452 -0.091886265  0.127707139
## giv_bsPermit       -0.02199766 -0.009076530 -0.132537288 -0.015613477
## grwth_locEconomy    0.14998762  0.073484396 -0.015774288 -0.066034943
## size_locEconomy     0.17361723  0.146004337 -0.048364802  0.005372029
## peace_order         0.06636779 -0.170438011  0.001505739  0.093297073
##                           Dim.9       Dim.10
## actv_locEstab      -0.021369397  0.012955302
## cap_healthSer      -0.026982612 -0.008850503
## cap_schoolSer       0.110954172  0.040173218
## cap_genLocResource -0.004358313  0.006345021
## cost_living         0.017846516  0.009762189
## gen_employment      0.065505889  0.098242805
## giv_bsPermit        0.001615107 -0.011446831
## grwth_locEconomy   -0.115172092  0.066332055
## size_locEconomy     0.035522751 -0.088823006
## peace_order         0.015699962 -0.011979110

Getting PC Scores

head(pca1$ind$coord)
##        Dim.1      Dim.2       Dim.3        Dim.4        Dim.5       Dim.6
## 1 -1.6397600  0.4929375 -0.17809728  0.188694107  0.107071151  0.14806891
## 2  1.0633708  1.4808959  1.08777404 -0.577267999  0.275681855 -0.47168701
## 3 -1.2964072 -0.9458672  0.22036848 -0.162879585 -0.148480736 -0.11264790
## 4  0.8930121  0.2437175 -0.05873066  0.003879864 -0.819645886  0.26238598
## 5 -1.3867456 -0.5856241  0.19297986 -0.053030783 -0.008774525  0.11901744
## 6 -1.4601889  0.3807499  0.23774802  0.137141850  0.056965111 -0.04183857
##          Dim.7       Dim.8        Dim.9      Dim.10
## 1  0.036099147 -0.11986566  0.071009578  0.07456634
## 2  0.150400516  0.04218484 -0.046044735  0.20876298
## 3 -0.006235194  0.15489990 -0.095356787 -0.12612694
## 4 -0.077772498 -0.08968668 -0.046180825 -0.12613648
## 5 -0.138686895  0.28936053  0.114823760  0.10154968
## 6  0.034001715 -0.12351983  0.002233034  0.05607322

Loadings of the Selected PCs

Below are the PC loadings computed after applying PCA on the columns of the competitive index data set.

pc_loadings <- as.data.frame(pca1$var$coord)
select(pc_loadings, c("Dim.1", "Dim.2", "Dim.3"))

Interpretation

Providing an interpretation and description of the 1st and 2nd principal components.

Shown below are the loadings of PC1.

First, the largest coefficient is 0.92. Thus, we shall not highlight employment generation [gen_employment], with a coefficient of 0.007, in the interpretation of PC1 since it is less than 0.23 (quarter of the largest).

Second, only the coefficient of cost of living [cost_living] is negative, thus PC1 is inversely proportional to the cost of living in the city. Since the variable is normalized such that a higher value means a better state, the higher the value for cost of living, the cheaper/better the cost of living.

The other variables, has positive coefficient. Thus, PC1 is directly proportional to the other variables which are related to the accessibility of better services from the city and better quality of living; which are: active local establishments [actv_locEstab], capacity of health services [cap_healthSer], capacity of school services [cap_schoolSer], capacity of general local resource [cap_genLocResource], giving business permit [giv_bsPermit], local economy growth and size [grwth_locEconomy, size_locEconomy], and peace and order [peace_order].

From this, we can have the “top quality city factor” dimension. The first principal component retains 52% of variability in the data.

The score for the first principal component tends to be high when cost of living in the city is low (or expensive cost of living); in return, this also means that the people in the city has access on better services from the city and has better quality of living.

On the other hand, the score tends to be low when the cost of living in the city is high (or cheaper cost of living in the city); however services from the city and quality of living is poor.

select(pc_loadings, c("Dim.1"))

Shown below are the loadings of PC2.

First, the largest coefficient is 0.55. Thus we shall not highlight capacity of health services, capacity to generate local resource, employment generation, getting business permits, size and growth of local economy, and peace and order, with coefficients of 0.04, 0.08, 0.05, 0.12, 0.003, and 0.04 respectively, in the interpretation of PC2 since it is less than 0.138 (quarter of the largest).

Second, the coefficient of capacity of school services are negative, thus PC2 is inversely proportional to this variable. Since the variable is normalized such that a higher value means a better state, the better capacity of school services in the city, the lower the score of PC2. However, we have to take note that the coefficient for capacity of school services is close to 0.138, which has the value of 0.158.

The other variables, has positive coefficient. Thus, PC2 is directly proportional to the other variables which are related to characteristics of an modernized city when the value is high; which are: active local establishments and cost of living.

From this, we can have the “country-sde factor” dimension. The second principal component retains 17% of variability in the data.

The score for the second principal component tends to be high when the capacity of school services is low; in return, this also means that the city has a lot of local establishment and cheaper cost of living. These are the characteristics of rural areas.

On the other hand, the score tends to be low when the capacity of school services is high. Modernized area has better capacity for school services. Also, it tends to be low when the cost of living is expensive and there are minimal local establishment. Thus, we can say that PC2 tends to be low when the cost of living become more expensive, which are also characteristics of an modernized cities.

select(pc_loadings, c("Dim.2"))

Plotting the cities and municipalities with respect to the first two PCs

The following graph is used in answering question no.3 and 4.

ncr <-  c("Manila","Quezon (MM)","Caloocan","Las Pinas","Makati",
          "Malabon","Mandaluyong","Marikina","Muntinlupa","Navotas",
          "Paranaque","Pasay", "Pasig","San Juan (MM)","Taguig",
          "Valenzuela","Pateros")
comp_index$region <- ifelse(is.element(comp_index$city_muncplty,ncr),"NCR","Outside")
pcscores <- as.data.frame(pca1$ind$coord[,c(1,2)])
comp_index_scores <- bind_cols(comp_index, pcscores)
comp_index_scores %>%
  mutate(id = row_number()) %>%
  ggplot(aes(x = Dim.1, y = Dim.2,color = region, shape = region)) +
  geom_text(aes(label = id), col = "dark green") +
  geom_point()+
  xlab("PC 1 (51.78%)") + 
  ylab("PC 2 (17.17%)") 

Identifying the Performance of Cities in NCR

# Best performing cities in NCR
tail(comp_index_scores %>% select(c(city_muncplty,Dim.1,Dim.2, region))%>%
       filter(region == "NCR")%>%
       arrange(Dim.1,Dim.2))

Generally, cities in the National Capital Region (NCR) are performing comparably better than other LGUs. Notice that in the plot, most of the cities in the NCR are leaning towards the right side, which can be interpreted as the more that you are close on the upper right side the more your city is considered “top quality and mordernized” and has better quality of life. Also, high cost of living is found in modernized cities since more establishments are able to produce goods and services. We also note that NCR cities are commonly found on the lower right side of the plot, which is explained by the fact that these cities have low numbers of establishments that are able to produce goods and services. Connected to that, these cities are highly populated, which in turn results to having low school capacity such as Quezon City (73) and Manila City(53). One interesting peculiarity that we have is Pasay (66) , which has a high PC1 and low PC2. Unlike the two cities stated before, Pasay (66) has a low number of establishments that can produce goods and services, but have relatively high school capacity.

Identifying Poorly Performing LGUs

On the other side of the coin, Pateros(68), despite being part of the NCR, is one of the LGUs that has poor performance. Compared to other LGUs in NCR, there is a notable difference between them where Pateros (68) has lower PC1 grades. Among cities in the NCR region, Pateros is the only municipalities. It only has 68, 580 population. Considering our description of PC1, Pateros is the only place in NCR that is not considered “top quality”.

Shown below are the cities and municipalities that has low score in PC1. To round it up, Pili City (69), Alabel City (1), President Roxas (CZ) City (70), Mamburao City (51), Jordan City (34), and Kabugao City (35) can be thought of as the opposites of the Big 3 from NCR. These cities have low PC1 grades, meaning the cost of living is high but the services of the city and the quality of living is poor.

# Poor performing LGUs
lowpc1 <- comp_index_scores %>% select (c(city_muncplty,Dim.1))%>%
       arrange(desc(Dim.1))
tail(lowpc1)

Shown below are the cities and municipalities that has high score in PC2. These cities are considered contry-side cities because the expenses of living is cheaper.

highpc2 <- comp_index_scores %>% select (c(city_muncplty,Dim.2))%>%
       arrange(Dim.2)
tail(highpc2)

Shown below are the cities and municipalities that has low score in PC2. These cities are considered modernized cities because the expenses of living is expensive and it has a lot of local establishment while it has a poor capacity of education.

head(highpc2)

Base on our graph, our front runner, La Trinidad (38) dominates our PC2. The La Trinidad (38) City contains high amounts of establishments to produce goods and services, and also boasts cheap cost of living, but on the flipside, it has no school capacity. Trailing behind is Kabugao (35) with low amounts of active establishments to produce goods and services, and also having no schol capacity. The only good thing here is that it also has a cheap cost of living. Finally, we have Siquijor (81) that has low amounts of active establish emnts to produce goods and services, low school capacity, and has expensive cost of living.

Other cities and municipalities, aside from Pateros, are far from modernized region of the Philippines.They are mostly far-flung areas. Thus, there is a need to increase opportunities and accessibility of services in these areas to improve the quality of living and economic activity in the city.

Looking back again in the graph, there seems to be a clustering of poor quality cities against top quality cities. This is shown when we look on x-axis of the graph. It is important also to consider not just the countries that are in the extremes but those that are in the clusters of “poor quality cities”.