Dimension reduction process using MDS and PCA - example of Bitcoin market data





1. Introduction


Many analytical problems require handling large datasets that contain not only many observations but potentially also a very large number of candidate variables. High-dimensional data spaces, however, are difficult to navigate, especially during clustering, visualization or prediction, so some transformation of the data is usually required. Two of the most widely used dimension reduction techniques are multidimensional scaling (MDS) and principal component analysis (PCA). They take different approaches, but their main goal is the same: to transform the variable space into a lower-dimensional one while preserving as much of the information in the data as possible.


The aim of this analysis is to show how both MDS and PCA can be applied to reduce dimensionality, using Bitcoin market data as an example. Cryptocurrencies exist in digital spaces, so data about their processes, transactions and market characteristics is very well documented. At the same time, this digitization means that there are enormous amounts of information about individual currencies, including Bitcoin, and many analyses cannot realistically consider all of the available data. Moreover, it would be exceptionally difficult to determine manually which characteristics carry the most valuable information about the general processes in the market. As a result, dimension reduction can be essential.




2. Dataset


The dimension reduction process will be demonstrated on daily characteristics of the Bitcoin market: 1000 observations covering the period from 2019-10-28 to 2022-07-23, derived from the dataset “https://www.kaggle.com/datasets/sushilkumarinfo/bitcoin-transactional-data/data”. The digital character of the Bitcoin market gives access to over 700 different daily variables; for computational reasons, only 106 of them were selected. The goal is to describe each day on the Bitcoin market with far fewer dimensions while retaining as much as possible of the information that the original variables carry about the relationships in the data.
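The 1000 observations follow directly from the date range: the window from 2019-10-28 to 2022-07-23 contains exactly 1000 calendar days. A minimal sketch of how such a window and a numeric-only variable set could be selected is shown below. The table `raw`, its `date` column and the `note` column are hypothetical stand-ins, since the exact structure of the Kaggle file is not shown here.

```r
# Hypothetical stand-in for the raw Kaggle table: a date column,
# one numeric indicator and one non-numeric column.
set.seed(1)
dates <- seq(as.Date("2019-01-01"), as.Date("2022-12-31"), by = "day")
raw <- data.frame(date = dates,
                  priceUSD = rnorm(length(dates)),
                  note = "placeholder")

# Keep only the analysed period (both end points included).
in.window <- raw$date >= as.Date("2019-10-28") & raw$date <= as.Date("2022-07-23")

# Keep only numeric columns; the Date and character columns are dropped.
data <- raw[in.window, sapply(raw, is.numeric), drop = FALSE]

nrow(data)   # 1000 days
names(data)  # "priceUSD"
```

In the real analysis the column filter would keep the 106 selected indicators instead of a single placeholder column.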




3. Necessity for dimension reduction


Analyzing a very large dataset can cause several difficulties. Some of the features may be redundant in the context of the entire dataset, contribute nothing significant, and only add noise. High-dimensional data also hinders visualization and analysis for other reasons: significant relationships become harder to find and computations become more demanding. Dimension reduction methods help identify the variables that carry the most information about the meaningful patterns in the data. The considered dataset contains 106 variables, which amounts to a very large number of dimensions, and such data is hard to use for most analytical purposes. To convey how little meaningful information can be read from the Bitcoin transactional characteristics in their current form, the correlation plots for all of the variables, in both raw and scaled form, are visualized below. Note that Pearson correlation is unaffected by standardization, so both plots show the same correlation matrix.


library(corrplot)
library(clusterSim)
library(qgraph)
# raw (unscaled) values
data.cor<-cor(data, method="pearson") 
corrplot(data.cor, order ="alphabet", tl.cex=0.6)

# scaled values (mean 0, sd 1); Pearson correlations stay the same
data.s<-scale(data)
data.s.cor<-cor(data.s, method="pearson") 
corrplot(data.s.cor, order ="alphabet", tl.cex=0.6)
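The fact that the two plots look identical can be checked directly: standardizing a column subtracts a constant and divides by a positive constant, which leaves Pearson correlations unchanged. A small sketch on synthetic data with columns on very different scales (the data is made up purely for illustration):

```r
set.seed(3)
# Synthetic columns on very different scales (hypothetical data).
X <- matrix(rnorm(200 * 4), ncol = 4)
X[, 2] <- 100 * X[, 1] + rnorm(200)    # strongly related to column 1
X[, 4] <- 1e4 * X[, 4] + 5e5           # large offset and scale

# Correlations of the raw and of the standardized data.
r.raw <- cor(X, method = "pearson")
r.std <- cor(scale(X), method = "pearson")

all.equal(r.raw, r.std)  # TRUE: standardization does not change correlations
```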

It can be noticed that some variables are strongly correlated, but the number of dimensions does not allow much more information to be extracted. Similarly, the relationships present in the data can be visualized as a correlation network with the qgraph call below, but again nothing of significant value can be obtained from the data in its current state. Dimension reduction is therefore necessary, and the following parts of the analysis showcase two such methods.


qgraph(cor(data), shape="rectangle", posCol="darkgreen", negCol="darkmagenta")




4. Multidimensional Scaling, MDS


MDS (Multidimensional Scaling) reduces dimensions by projecting the objects into a low-dimensional space, here a two-dimensional plane, based on the similarities or dissimilarities between them. The desired result preserves the pairwise distances, so that the Euclidean distances between objects in the new space resemble the original higher-dimensional dissimilarities as closely as possible. The quality of this match is measured by the stress function, which quantifies the discrepancy between the original dissimilarities and the distances in the reduced space; the MDS configuration is found by minimizing it.
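For reference, the usual textbook form of Kruskal's Stress-1 is given below. Here $d_{ij}(X)$ are the Euclidean distances between objects $i$ and $j$ in the low-dimensional configuration $X$, and $\hat{d}_{ij}$ are the disparities, i.e. transformed original dissimilarities $\delta_{ij}$ (for `type = "ratio"`, $\hat{d}_{ij} = b\,\delta_{ij}$ with $b > 0$). The smacof package works with a normalized version of stress, so its reported value should be read as closely related to this formula rather than as a guaranteed term-by-term match.

```latex
\sigma_1(X) = \sqrt{\frac{\sum_{i<j}\left(\hat{d}_{ij} - d_{ij}(X)\right)^2}{\sum_{i<j} d_{ij}^2(X)}}
```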


library(smacof)
library(labdsv)
library(vegan)
library(MASS)
library(ape)
library(ggfortify)
library(pls)
library(ClusterR)




4.1 Applying MDS solution


The MDS method can be applied to the selected variables with the mds() function from the smacof package, which takes a dissimilarity matrix as input. Here the matrix contains Euclidean distances between the variables computed on the scaled data (the data is transposed with t() so that the variables, rather than the days, are the objects being compared). The resulting two-dimensional configuration can first be plotted to inspect the results quickly and spot any visible outliers.


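data.s<-scale(data)
dist.var<-dist(t(data.s))                      # Euclidean distances between the 106 variables
fit.data<-mds(dist.var, ndim=2, type="ratio")  # ratio MDS in 2 dimensions
# point size reflects each variable's stress per point
plot(fit.data, pch=21, cex=as.numeric(fit.data$spp), bg="blue", main="MDS for all variables")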

The plot of the two-dimensional configuration shows that the considered characteristics form fairly compact groups and no evident outliers can be noticed. The same function can also be applied to the observations, which in this case are the individual days of the analyzed period, to reveal potentially diverging points.
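The only difference between the two MDS runs is which dimension of the scaled data is treated as the set of objects: dist(t(data.s)) compares variables, while dist(data.s) compares days. A small sketch on hypothetical data with 50 rows and 6 columns shows how this changes the number of objects and pairwise distances. For the Bitcoin data the same arithmetic gives 106 * 105 / 2 = 5565 distances between variables versus 1000 * 999 / 2 = 499500 distances between days, which makes the observation-level MDS considerably more demanding to compute.

```r
set.seed(7)
Xs <- scale(matrix(rnorm(50 * 6), nrow = 50))   # hypothetical 50 days x 6 variables

d.var <- dist(t(Xs))   # objects = columns (variables)
d.obs <- dist(Xs)      # objects = rows (days)

attr(d.var, "Size")    # 6 objects
attr(d.obs, "Size")    # 50 objects
length(d.var)          # 6 * 5 / 2 = 15 pairwise distances
length(d.obs)          # 50 * 49 / 2 = 1225 pairwise distances
```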


dist.obs<-dist(data.s)                         # Euclidean distances between the 1000 days
fit.data<-mds(dist.obs, ndim=2, type="ratio")
plot(fit.data, pch=21, cex=as.numeric(fit.data$spp), bg="yellow", main="MDS for observations")

Even though some variables and observations stray away from the main groups, the dataset does not appear to contain any very evident outliers, so the next step can focus on the quality of fit of the MDS solution for the variables. Apart from the basic statistics of the result, a stress decomposition chart can be plotted for the variables.


fit.data<-mds(dist.var, ndim=2,  type="ratio")
fit.data            # see quality of fit
## 
## Call:
## mds(delta = dist.var, ndim = 2, type = "ratio")
## 
## Model: Symmetric SMACOF 
## Number of objects: 106 
## Stress-1 value: 0.277 
## Number of iterations: 92
plot(fit.data, plot.type = "stressplot") 

The first thing that stands out in the stress decomposition chart is that every variable has an individual stress-per-point value below 2%. The most important figure, however, is the overall stress of the solution, which in this case settled at a relatively high 0.277. By Kruskal's commonly cited rule of thumb, stress values above 0.2 indicate a poor fit. The dataset does, however, contain a very large number of variables, so the MDS solution may still be statistically significant and useful for reducing the dimensionality; this is tested in the next step.
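The quantities discussed above are stored in the fitted smacof object. The self-contained sketch below repeats the workflow on hypothetical data: ten variables are generated from two latent factors, so their distances should be almost perfectly representable in two dimensions and the stress should be small. It reads the configuration (`$conf`), the Stress-1 value (`$stress`) and the stress per point (`$spp`) from the result.

```r
library(smacof)

set.seed(2024)
n <- 300                                   # hypothetical number of days
f <- matrix(rnorm(n * 2), ncol = 2)        # two latent factors
theta <- seq(0, pi, length.out = 10)       # loading angles of 10 variables
A <- rbind(cos(theta), sin(theta))         # 2 x 10 loading matrix
X <- f %*% A + matrix(rnorm(n * 10, sd = 0.05), ncol = 10)
colnames(X) <- paste0("v", 1:10)

Xs <- scale(X)
delta <- dist(t(Xs))                       # distances between the variables
fit <- mds(delta, ndim = 2, type = "ratio")

fit$stress            # Stress-1; small here because the data is close to 2-D
head(fit$conf)        # coordinates of the variables on D1 and D2
round(fit$spp, 2)     # stress per point, in %
```

Contrasting this near-zero stress with the 0.277 obtained for the Bitcoin variables shows how much harder it is to compress 106 real indicators into two dimensions.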




4.2 Statistical significance


The stress value of the MDS solution indicates a rather poor fit, although with this many variables that may be unavoidable. The next step is therefore to test whether the solution is statistically significant. For this, the permtest() function is applied: it compares the observed stress with the stress of configurations fitted to randomly permuted dissimilarities, to check whether the tested configuration is any better than a random one.
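The idea behind the permutation test can be sketched by hand: shuffle the dissimilarities, refit the MDS solution, and count how often the shuffled data fits at least as well as the real data. The sketch below is a simplified illustration on hypothetical two-factor data with 30 permutations; it is not a reproduction of smacof's internal permtest() code.

```r
library(smacof)

set.seed(123)
n <- 300
f <- matrix(rnorm(n * 2), ncol = 2)                  # two latent factors
theta <- seq(0, pi, length.out = 10)
X <- f %*% rbind(cos(theta), sin(theta)) +
  matrix(rnorm(n * 10, sd = 0.05), ncol = 10)        # 10 hypothetical variables

delta <- dist(t(scale(X)))
obs.stress <- mds(delta, ndim = 2, type = "ratio")$stress

# Refit MDS on randomly permuted dissimilarities.
perm.stress <- replicate(30, {
  d <- delta
  d[] <- sample(as.vector(delta))   # shuffle the values, keep the dist structure
  mds(d, ndim = 2, type = "ratio")$stress
})

# Permutation p-value: share of permutations that fit at least as well.
p.value <- (sum(perm.stress <= obs.stress) + 1) / (length(perm.stress) + 1)
obs.stress
summary(perm.stress)
p.value
```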


fit.data<-mds(dist.var, ndim=2,  type="ratio")
set.seed(123) 
permFit<-permtest(fit.data, nrep = 100, verbose=FALSE) 
permFit
## 
## Call: permtest.smacof(object = fit.data, nrep = 100, verbose = FALSE)
## 
## SMACOF Permutation Test
## Number of objects: 106 
## Number of replications (permutations): 100 
## 
## Observed stress value: 0.277 
## p-value: <0.001

The p-value is below 0.001, so the null hypothesis that the configuration is no better than one obtained from randomly permuted dissimilarities can be rejected. It can therefore be concluded that even though the stress value is high, the MDS solution is statistically significant. As a second check, the analysis can compute a significance threshold from the stress of random configurations with the same number of objects and compare it with the observed stress value.


# upper significance threshold
stress.vec<-randomstress(n=106, ndim=2, nrep = 100)
upper<-mean(stress.vec) - 2 * sd(stress.vec)
upper
## [1] 0.5306757
# empirical stress function (Kruskal’s stress)
fit.data$stress
## [1] 0.2770874

The significance threshold (the mean stress of random configurations minus two standard deviations) was calculated as 0.5306757, while the observed stress value amounted to 0.2770874. The observed value is well below the threshold, which confirms that the MDS solution is statistically significant.




4.3 Similarity vs dissimilarity measures in MDS


All of the analysis so far has used a distance matrix based on the dissimilarities between variables, but MDS can also be based on a similarity matrix, such as a correlation matrix, after converting the similarities into dissimilarities. As part of examining the MDS solution, it is worth checking whether basing the process on the similarity or on the dissimilarity matrix gives the same results. For this purpose the Mantel test can be used: it tests whether the entries of two matrices are associated, under the null hypothesis that they are unrelated. Rejecting the null hypothesis therefore indicates that the two matrices carry related information.
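The logic of the Mantel test can be sketched in base R: correlate the lower-triangle entries of the two matrices, then repeat with the rows and columns of one matrix permuted jointly to build a null distribution. The helper `mantel.perm` is hypothetical and uses a Pearson correlation as the test statistic, whereas ape::mantel.test() uses a sum of cross-products (its z-statistic). The data is made up, and the dissimilarity matrix is built as 1 - correlation.

```r
# Hand-rolled Mantel test (hypothetical helper, not ape's implementation).
mantel.perm <- function(m1, m2, nperm = 999) {
  lt <- lower.tri(m1)
  obs <- cor(m1[lt], m2[lt])
  perm <- replicate(nperm, {
    idx <- sample(nrow(m2))                 # permute objects, not single cells
    cor(m1[lt], m2[idx, idx][lt])
  })
  p <- (sum(abs(perm) >= abs(obs)) + 1) / (nperm + 1)  # two-sided
  list(r = obs, p = p)
}

set.seed(5)
X <- matrix(rnorm(100 * 8), ncol = 8)
X[, 2] <- X[, 1] + rnorm(100, sd = 0.5)     # add some structure
X[, 3] <- -X[, 1] + rnorm(100, sd = 0.5)

sim  <- cor(X)      # similarity matrix
dis2 <- 1 - sim     # dissimilarity derived from it

res <- mantel.perm(sim, dis2)
res$r   # -1: a decreasing linear transform is perfectly (negatively) correlated
res$p
```

Because `dis2` is a direct transformation of `sim`, a strong association is guaranteed by construction; the test confirms it rather than discovering it.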


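# similarity matrix: correlations between the variables
sim<-cor((data.s))
# dissimilarity matrices: distances between observations (dis) and between variables (dis.t)
dis<-dist(data.s)
dis.t<-dist(t(data.s))
sim[1:5, 1:5]
as.matrix(dis.t)[1:5, 1:5]   # variable-level distances, comparable with sim
library(smacof)
# convert similarities to dissimilarities (numeric method = 1: 1 - similarity)
dis2<-sim2diss(sim, method=1, to.dist = TRUE)
as.matrix(dis2)[1:5, 1:5]
library(ape)
# Mantel test between the similarity matrix and the derived dissimilarity matrix
mantel.test(as.matrix(sim), as.matrix(dis2)) 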
## $z.stat
## [1] -124.8857
## 
## $p
## [1] 0.001
## 
## $alternative
## [1] "two.sided"

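The p-value equals 0.001, the smallest value attainable with the default 999 permutations of ape::mantel.test(), so the null hypothesis of no association is rejected: the two matrices are strongly related. This is expected by construction, because dis2 was derived directly from sim (with method = 1, as 1 - r), so the entries of the two matrices are perfectly negatively correlated. Consequently, MDS should produce similar results with both the similarity and the dissimilarity measure, which can indeed be seen in the visualizations below.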


# MDS based on similarity (correlation matrix)
sim<-cor(data.s)
dis2<-sim2diss(sim, method=1, to.dist=TRUE)
fit.data<-mds(dis2, ndim=2,  type="ratio") # from smacof::
plot(fit.data, pch=21, cex=as.numeric(fit.data$spp), bg="blue", main="MDS based on similarity")

# MDS based on dissimilarity (distance matrix)
dist<-dist(t(data.s))
fit.data<-mds(dist, ndim=2,  type="ratio") # from smacof::
plot(fit.data, pch=21, cex=as.numeric(fit.data$spp), bg="purple", main="MDS based on dissimilarity")
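The similarity of the two plots is not a coincidence. For standardized variables, the squared Euclidean distance between columns i and j equals 2(n - 1)(1 - r_ij), where n is the number of observations and r_ij is their Pearson correlation. The distance matrix is therefore a monotone (square-root) transformation of 1 - r_ij, the similarity-to-dissimilarity conversion used above (assuming method = 1 subtracts the similarity from 1). A quick numerical check on hypothetical data:

```r
set.seed(11)
n <- 120
X <- matrix(rnorm(n * 5), ncol = 5)
X[, 2] <- X[, 1] + rnorm(n, sd = 0.4)   # hypothetical correlated pair

Z <- scale(X)                            # columns with mean 0 and sd 1
d2 <- as.matrix(dist(t(Z)))^2            # squared distances between variables
rhs <- 2 * (n - 1) * (1 - cor(X))        # 2(n - 1)(1 - r)

all.equal(d2, rhs, check.attributes = FALSE)  # TRUE
```

Since ratio MDS is fitted to the dissimilarity values themselves, the square root means the two solutions are similar but not identical.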




4.4 MDS process results


After comparing the correlation-based and distance-based solutions and confirming the statistical significance of the process, the final step is to explore the summary of the MDS solution in more detail.


fit.data
## 
## Call:
## mds(delta = dist, ndim = 2, type = "ratio")
## 
## Model: Symmetric SMACOF 
## Number of objects: 106 
## Stress-1 value: 0.277 
## Number of iterations: 92
summary(fit.data)
## 
## Configurations:
##                                D1      D2
## priceUSD                   0.4342  0.7545
## transactions              -0.2052 -0.5048
## size                      -0.4071  0.0630
## sentbyaddress              0.0034 -0.3459
## difficulty                 0.5529  0.6566
## hashrate                   0.6043  0.6065
## mining_profitability       0.3608  0.7715
## sentinusdUSD               0.6355  0.4863
## transactionfeesUSD        -0.5095  0.5423
## median_transaction_feeUSD -0.5240  0.4974
## confirmationtime          -0.0735  0.8609
## transactionvalueUSD        0.7074  0.5436
## mediantransactionvalueUSD -0.1114  0.7568
## activeaddresses           -0.1241  0.2655
## top100cap                  0.9137  0.3863
## fee_to_rewardUSD          -0.5479  0.3495
## transactions3sma          -0.3694 -0.5240
## transactions7sma          -0.4756 -0.4792
## transactions14sma         -0.5124 -0.4310
## transactions30sma         -0.5510 -0.3639
## transactions90sma         -0.6424 -0.2487
## transactions3ema          -0.3379 -0.4988
## transactions7ema          -0.4290 -0.4723
## transactions14ema         -0.4838 -0.4350
## transactions30ema         -0.5362 -0.3837
## transactions90ema         -0.6148 -0.2889
## transactions3wma          -0.3210 -0.5108
## transactions7wma          -0.4309 -0.4897
## transactions14wma         -0.4745 -0.4546
## transactions30wma         -0.5227 -0.4012
## transactions90wma         -0.5969 -0.3090
## transactions3trx           0.2695 -0.5476
## transactions7trx           0.2305 -0.7969
## transactions14trx          0.1022 -0.8183
## transactions30trx         -0.0691 -0.7620
## transactions90trx         -0.2609 -0.3242
## transactions3mom           0.4100 -0.3221
## transactions7mom           0.5054 -0.6264
## transactions14mom          0.3643 -0.7225
## transactions30mom          0.4658 -0.4205
## transactions90mom          0.0752 -0.6593
## transactions3std           0.1494  0.7917
## transactions7std           0.0647  0.6768
## transactions14std         -0.1434  0.6475
## transactions30std         -0.3475  0.6522
## transactions90std         -0.3378  0.7557
## transactions3var           0.1708  0.7627
## transactions7var           0.0276  0.6753
## transactions14var         -0.1654  0.6458
## transactions30var         -0.3631  0.6526
## transactions90var         -0.3305  0.7708
## transactions3rsi           0.3432 -0.3839
## transactions7rsi           0.3363 -0.4386
## transactions14rsi          0.3294 -0.4742
## transactions30rsi          0.2886 -0.4958
## transactions90rsi          0.1691 -0.5118
## transactions3roc           0.4476 -0.2993
## transactions7roc           0.5088 -0.6148
## transactions14roc          0.3712 -0.7205
## transactions30roc          0.5069 -0.3920
## transactions90roc          0.0897 -0.6648
## size3sma                  -0.5550  0.0626
## size7sma                  -0.6584  0.0541
## size14sma                 -0.6990  0.0684
## size30sma                 -0.7188  0.1228
## size90sma                 -0.7122  0.2516
## size3ema                  -0.5275  0.0649
## size7ema                  -0.6152  0.0643
## size14ema                 -0.6645  0.0788
## size30ema                 -0.6967  0.1164
## size90ema                 -0.7085  0.2119
## size3wma                  -0.5113  0.0616
## size7wma                  -0.6096  0.0697
## size14wma                 -0.6595  0.0679
## size30wma                 -0.7020  0.0910
## size90wma                 -0.7120  0.1920
## size3trx                   0.3360  0.0606
## size7trx                   0.0701  0.3708
## size14trx                 -0.2044  0.3548
## size30trx                 -0.7440 -0.3054
## size90trx                 -0.6774  0.3053
## size3mom                   0.4213 -0.0352
## size7mom                   0.3671  0.3161
## size14mom                  0.2235  0.3737
## size30mom                  0.1591  0.0936
## size90mom                 -0.1175  0.0090
## size3std                   0.9019  0.2216
## size7std                   0.9229  0.0055
## size14std                  0.9504 -0.0272
## size30std                  0.9467 -0.1229
## size90std                  0.8972 -0.2785
## size3var                   0.8492  0.2865
## size7var                   0.8994  0.0890
## size14var                  0.9456  0.0402
## size30var                  0.9547 -0.0894
## size90var                  0.9149 -0.2511
## size3rsi                   0.3460 -0.0632
## size7rsi                   0.2693 -0.0354
## size14rsi                  0.1943 -0.0363
## size30rsi                  0.0959 -0.0652
## size90rsi                 -0.1209 -0.1048
## size3roc                   0.4743 -0.0303
## size7roc                   0.4076  0.3243
## size14roc                  0.2659  0.4017
## size30roc                  0.2448  0.1252
## size90roc                 -0.0625  0.0542
## 
## 
## Stress per point (in %):
##                  priceUSD              transactions                      size 
##                      1.23                      0.44                      0.48 
##             sentbyaddress                difficulty                  hashrate 
##                      1.03                      1.27                      1.42 
##      mining_profitability              sentinusdUSD        transactionfeesUSD 
##                      1.47                      1.22                      0.94 
## median_transaction_feeUSD          confirmationtime       transactionvalueUSD 
##                      0.87                      1.38                      1.06 
## mediantransactionvalueUSD           activeaddresses                 top100cap 
##                      1.22                      1.31                      1.04 
##          fee_to_rewardUSD          transactions3sma          transactions7sma 
##                      0.62                      0.44                      0.49 
##         transactions14sma         transactions30sma         transactions90sma 
##                      0.48                      0.44                      0.50 
##          transactions3ema          transactions7ema         transactions14ema 
##                      0.38                      0.40                      0.42 
##         transactions30ema         transactions90ema          transactions3wma 
##                      0.42                      0.46                      0.39 
##          transactions7wma         transactions14wma         transactions30wma 
##                      0.43                      0.44                      0.44 
##         transactions90wma          transactions3trx          transactions7trx 
##                      0.45                      1.05                      1.32 
##         transactions14trx         transactions30trx         transactions90trx 
##                      1.44                      1.44                      1.58 
##          transactions3mom          transactions7mom         transactions14mom 
##                      1.12                      1.45                      1.29 
##         transactions30mom         transactions90mom          transactions3std 
##                      0.99                      0.99                      1.59 
##          transactions7std         transactions14std         transactions30std 
##                      1.40                      1.20                      1.14 
##         transactions90std          transactions3var          transactions7var 
##                      1.24                      1.53                      1.37 
##         transactions14var         transactions30var         transactions90var 
##                      1.18                      1.16                      1.26 
##          transactions3rsi          transactions7rsi         transactions14rsi 
##                      1.03                      0.89                      0.79 
##         transactions30rsi         transactions90rsi          transactions3roc 
##                      0.71                      0.65                      1.17 
##          transactions7roc         transactions14roc         transactions30roc 
##                      1.49                      1.33                      1.04 
##         transactions90roc                  size3sma                  size7sma 
##                      1.06                      0.40                      0.36 
##                 size14sma                 size30sma                 size90sma 
##                      0.37                      0.45                      0.64 
##                  size3ema                  size7ema                 size14ema 
##                      0.35                      0.32                      0.33 
##                 size30ema                 size90ema                  size3wma 
##                      0.39                      0.55                      0.38 
##                  size7wma                 size14wma                 size30wma 
##                      0.34                      0.33                      0.38 
##                 size90wma                  size3trx                  size7trx 
##                      0.52                      1.12                      1.49 
##                 size14trx                 size30trx                 size90trx 
##                      1.67                      1.50                      1.00 
##                  size3mom                  size7mom                 size14mom 
##                      1.14                      1.35                      1.31 
##                 size30mom                 size90mom                  size3std 
##                      1.16                      1.18                      1.29 
##                  size7std                 size14std                 size30std 
##                      0.95                      0.90                      0.91 
##                 size90std                  size3var                  size7var 
##                      0.93                      1.44                      1.06 
##                 size14var                 size30var                 size90var 
##                      0.95                      0.93                      0.94 
##                  size3rsi                  size7rsi                 size14rsi 
##                      1.08                      1.00                      0.97 
##                 size30rsi                 size90rsi                  size3roc 
##                      0.94                      0.80                      1.13 
##                  size7roc                 size14roc                 size30roc 
##                      1.35                      1.33                      1.23 
##                 size90roc 
##                      1.31

The summary shows the exact coordinates of each Bitcoin characteristic on the two created dimensions. It also lists the stress-per-point values, which show how much each variable contributes to the total stress, i.e. which variables are represented least accurately by the two-dimensional configuration. Within the analyzed dataset the largest value belongs to “size14trx” (1.67%), followed by “transactions3std” (1.59%) and “transactions90trx” (1.58%). In conclusion, the MDS method made it possible to represent 106 different characteristics of the Bitcoin market in a two-dimensional space, with a statistically significant though imperfect fit (stress 0.277), and to identify which variables are captured least well by that representation.
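Ranking the stress-per-point values takes one line; with the fitted object it would be sort(fit.data$spp, decreasing = TRUE). So that the snippet runs on its own, it uses a handful of values copied from the printed summary above:

```r
# Excerpt of stress-per-point values (in %) from the summary above.
spp <- c(transactions30std = 1.14, transactions3std = 1.59,
         transactions90trx = 1.58, size14trx = 1.67, size7sma = 0.36)

sort(spp, decreasing = TRUE)   # worst-represented variables first
names(which.max(spp))          # "size14trx"
```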




5. Principal Component Analysis, PCA


PCA (Principal Component Analysis) is another method of reducing the dimensions of a dataset. It approaches the problem by transforming the (usually standardized) data into new variables, the principal components, which are linear combinations of the original variables. The coefficients of these combinations are the eigenvectors of the covariance matrix (for standardized data, the correlation matrix), so the components are mutually orthogonal and uncorrelated. The goal of the transformation is to find the directions of maximum variability in the dataset: the first principal component captures the largest possible share of the variance, and each subsequent component captures the largest possible share of the remaining variance while staying orthogonal to all previous components.
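These properties can be checked numerically. The sketch below, on hypothetical data, computes PCA in two ways, from the eigen-decomposition of the correlation matrix and with prcomp() on standardized data. It then verifies that the component variances (eigenvalues) agree, appear in decreasing order and sum to the number of variables, and that the loading vectors are orthonormal. prcomp() is used here only for the comparison; the analysis itself uses princomp().

```r
set.seed(42)
n <- 300
f <- rnorm(n)                                   # one hypothetical common driver
X <- cbind(a = f + rnorm(n, sd = 0.3),
           b = f + rnorm(n, sd = 0.3),
           c = rnorm(n),
           d = -f + rnorm(n, sd = 0.5))

e  <- eigen(cor(X), symmetric = TRUE)           # eigenvalues and eigenvectors
pc <- prcomp(scale(X))                          # PCA on standardized data

all.equal(e$values, pc$sdev^2, check.attributes = FALSE)  # same component variances
e$values / sum(e$values)                        # share of variance per component
cumsum(e$values) / sum(e$values)                # cumulative share
round(crossprod(pc$rotation), 10)               # identity: orthonormal loadings
```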




5.1 General application and statistics


PCA can be applied to the scaled (standardized) data with the function princomp(), after which the basic statistics of the result can be viewed as shown below. In PCA, the loadings can be understood as weights that link the original variables to the created composite variables (components). In princomp() they are the eigenvector coefficients, which indicate how strongly each variable contributes to a component; for a correlation-based PCA, multiplying a loading by the component's standard deviation gives the correlation between the variable and the component. Loadings can be negative when a variable moves in the opposite direction to the component, and the overall sign of each component is arbitrary.
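The relationship between loadings and correlations can be illustrated with a short sketch on hypothetical data. It uses princomp() with cor = TRUE, i.e. a PCA of the correlation matrix, for which the correlation between each variable and each component score equals the loading multiplied by the component's standard deviation. The loading columns themselves are unit-length and mutually orthogonal.

```r
set.seed(8)
n <- 500
f <- rnorm(n)
X <- cbind(u = f + rnorm(n, sd = 0.5),
           v = -f + rnorm(n, sd = 0.5),          # moves against u
           w = rnorm(n))

pc <- princomp(X, cor = TRUE)
L  <- unclass(pc$loadings)                       # eigenvectors (p x p matrix)

round(crossprod(L), 10)                          # identity matrix
# variable-component correlations = loading * component sd
corr.from.loadings <- sweep(L, 2, pc$sdev, "*")
all.equal(cor(X, pc$scores), corr.from.loadings, check.attributes = FALSE)
```

For princomp(data.s) without cor = TRUE, princomp() uses the divisor n for the covariance matrix, so the same product matches the correlations only approximately (the difference is negligible for 1000 observations).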


data.pca<-princomp(data.s) 
head(data.pca$loadings)
##                    Comp.1       Comp.2      Comp.3      Comp.4      Comp.5
## priceUSD       0.05111680  0.026097999  0.17636776  0.07872926  0.20613726
## transactions  -0.11709504 -0.158752933 -0.06179520  0.09036462 -0.01924573
## size          -0.15137660 -0.034088993  0.10045304 -0.08262183 -0.07067707
## sentbyaddress -0.07457389 -0.130687510  0.11075640  0.11953986  0.11534905
## difficulty     0.05286409  0.002263429  0.11604920 -0.05815843  0.16545500
## hashrate       0.05832434 -0.019629111  0.06982211  0.01492703  0.19371231
##                     Comp.6      Comp.7       Comp.8       Comp.9      Comp.10
## priceUSD       0.134015056  0.08599056  0.080877870  0.015154138  0.047233839
## transactions  -0.001135062 -0.05034896  0.032390049  0.008083582 -0.010453221
## size          -0.034094070  0.01151702 -0.006469017  0.018167575  0.002462194
## sentbyaddress  0.096317638 -0.07308126 -0.055335150 -0.008109474  0.056978644
## difficulty     0.265373931 -0.10562832 -0.164091736  0.111943108  0.078509884
## hashrate       0.273426020 -0.08770796 -0.158466614  0.129434649  0.088794137
##                     Comp.11     Comp.12      Comp.13     Comp.14     Comp.15
## priceUSD       0.1962473145  0.10578141  0.022172202  0.05618963  0.08506191
## transactions   0.0004119554  0.04130323  0.025297947 -0.02858266 -0.01588057
## size           0.0562486204 -0.04388101 -0.005564006  0.01188730 -0.01331406
## sentbyaddress  0.0151574791 -0.15662040 -0.022807595 -0.10655223 -0.09299314
## difficulty    -0.0959263733 -0.15455030  0.034505218 -0.09593637 -0.01480545
## hashrate      -0.0556242440 -0.16430743 -0.007795980 -0.14834640  0.01837544
##                    Comp.16      Comp.17      Comp.18      Comp.19     Comp.20
## priceUSD       0.033559455  0.006640207  0.001771252  0.137388668  0.08192377
## transactions   0.056236296  0.018429616 -0.025179774  0.020334148 -0.03718080
## size          -0.008603603 -0.021587059  0.040449506 -0.029116626  0.02440028
## sentbyaddress -0.015596631  0.047973972  0.042782586  0.063948447  0.03179815
## difficulty    -0.015333871  0.011366831  0.019783403  0.007938643  0.04141178
## hashrate      -0.052845753 -0.038589977 -0.009491789  0.029825200  0.12080006
##                   Comp.21      Comp.22       Comp.23     Comp.24     Comp.25
## priceUSD       0.10934552  0.014789862  5.533314e-05  0.06339260  0.00840398
## transactions  -0.02948391  0.004689639  1.162335e-02  0.03899402  0.02658182
## size          -0.04869833  0.045200203 -2.945314e-02  0.01181154  0.03390168
## sentbyaddress  0.06793023  0.065561306  4.940057e-02  0.03861093 -0.00299140
## difficulty     0.05726463 -0.016520407 -1.180958e-02 -0.06400761 -0.12674196
## hashrate       0.01426733  0.019984417  5.002637e-02 -0.08072282 -0.03255380
##                    Comp.26     Comp.27     Comp.28     Comp.29      Comp.30
## priceUSD       0.073026571  0.01028046  0.06307572  0.01439356  0.008086406
## transactions  -0.002547555 -0.03694008  0.01772148 -0.04741350 -0.031023421
## size           0.054909254 -0.02236732  0.02048729 -0.05260563 -0.008026352
## sentbyaddress -0.049963090 -0.07870684  0.09755140  0.05545494 -0.092409716
## difficulty     0.025351201  0.09036203 -0.02399749 -0.01623073 -0.069129331
## hashrate       0.002983003  0.11640819 -0.02811749 -0.01872212  0.063788266
##                    Comp.31     Comp.32      Comp.33      Comp.34      Comp.35
## priceUSD       0.045342260  0.04182771  0.133175649  0.061003025  0.174290972
## transactions   0.055855089  0.04478540 -0.031400459  0.012508195  0.001442123
## size           0.054227811 -0.02136636 -0.006420975  0.002746118  0.056191730
## sentbyaddress  0.129829647 -0.08010907  0.072163843  0.076106363  0.081867965
## difficulty     0.064164952  0.15000210  0.018982043 -0.043608750 -0.082787327
## hashrate      -0.002952678  0.13288725 -0.012094168  0.003514193 -0.021934017
##                    Comp.36     Comp.37      Comp.38      Comp.39     Comp.40
## priceUSD       0.007646896  0.03346079  0.125358730  0.008232662  0.16417381
## transactions  -0.015254608 -0.02697608 -0.032458370 -0.015509273 -0.05180346
## size           0.025682691  0.04893039 -0.001273718 -0.020204384  0.02346076
## sentbyaddress -0.099803016 -0.23044364 -0.005011301 -0.294251858 -0.19688455
## difficulty     0.010046461 -0.04865949  0.095387287  0.142938457  0.07739211
## hashrate      -0.016842236 -0.12541033  0.174531361  0.154363327  0.05587428
##                   Comp.41     Comp.42      Comp.43     Comp.44     Comp.45
## priceUSD       0.07675725  0.27412781  0.006730634  0.02539000  0.09397635
## transactions  -0.03383250  0.04713079 -0.061174732  0.05929670 -0.05299128
## size          -0.08345323 -0.09076253 -0.034504463  0.10129646  0.06389249
## sentbyaddress -0.31746705  0.01620469 -0.413875633 -0.28131571 -0.19479198
## difficulty     0.03996527  0.03289331  0.045399246  0.12019176 -0.02173706
## hashrate       0.14131226 -0.18906385  0.189687835  0.04020788 -0.17160913
##                   Comp.46      Comp.47     Comp.48     Comp.49     Comp.50
## priceUSD       0.04612887  0.004850835  0.17327258  0.17849847  0.09524084
## transactions  -0.05206526 -0.005870482 -0.19990603  0.13247282  0.01346469
## size           0.05948502  0.101463928 -0.01570080  0.31634795 -0.11487712
## sentbyaddress  0.33109707 -0.146819776  0.03551422 -0.07872893 -0.01058765
## difficulty    -0.08158621 -0.168304505 -0.18294824  0.07228358 -0.10357034
## hashrate       0.10695154 -0.176831442  0.16946951  0.34020899  0.09120167
##                    Comp.51     Comp.52     Comp.53     Comp.54     Comp.55
## priceUSD       0.006949002  0.18653738  0.13688666  0.12650392  0.04611830
## transactions  -0.102336097 -0.09603269  0.07036774  0.10016298 -0.01667652
## size          -0.163575184  0.05707289  0.24674082 -0.54286818  0.12099012
## sentbyaddress  0.062223525 -0.15187023  0.03241078 -0.04107910  0.02743741
## difficulty    -0.018050019 -0.37488259 -0.38299218 -0.05065773 -0.14313806
## hashrate       0.014950976  0.06308067  0.31973000  0.04381560 -0.01407097
##                   Comp.56      Comp.57     Comp.58     Comp.59      Comp.60
## priceUSD       0.06947233  0.066252124  0.01584674  0.07941346  0.121058994
## transactions   0.09307189 -0.014809696  0.01091873 -0.05135883  0.066201475
## size          -0.13847587  0.076905031 -0.11026553 -0.19889625 -0.009037145
## sentbyaddress  0.06311640  0.008591438  0.09178966 -0.03515779  0.017866764
## difficulty    -0.27561164 -0.065563940 -0.09112468 -0.12623877  0.240499794
## hashrate       0.19120138  0.019609755  0.14440639  0.05639713 -0.163762539
##                   Comp.61       Comp.62     Comp.63     Comp.64     Comp.65
## priceUSD       0.04823538  0.0908001882  0.06272563  0.22702078  0.10647241
## transactions  -0.04342646 -0.0918548832 -0.08556036  0.02546046  0.32295911
## size          -0.01698407 -0.0518201229  0.16099998 -0.06587435 -0.12846823
## sentbyaddress  0.01707151  0.0307163708  0.00358533  0.01542193 -0.07977589
## difficulty     0.05876263 -0.1213070749  0.01207312  0.07242276 -0.02840847
## hashrate      -0.06187226  0.0003229005 -0.02590655 -0.16435984 -0.02936401
##                   Comp.66     Comp.67     Comp.68     Comp.69      Comp.70
## priceUSD       0.28289664  0.11457920  0.04492184  0.27701510  0.216317210
## transactions   0.04146886 -0.09227915 -0.23912251  0.22454065  0.056725856
## size           0.05634155  0.11134696 -0.13316975  0.07841827  0.012361741
## sentbyaddress  0.03008876  0.03523165  0.03787399  0.01503181  0.030816554
## difficulty     0.06806736  0.01041814  0.04106257  0.13310304 -0.007406141
## hashrate      -0.15451486 -0.05407727 -0.04741709 -0.11856581 -0.072176462
##                   Comp.71     Comp.72     Comp.73     Comp.74       Comp.75
## priceUSD       0.19853947  0.03427775  0.14558956  0.10688243  0.0264718415
## transactions  -0.04689586  0.08180985  0.07822094 -0.13452243  0.0003952265
## size           0.02072237 -0.02022414  0.18752789 -0.24110468 -0.1332785354
## sentbyaddress  0.01568997 -0.01341173 -0.01642704  0.01922028  0.0068663902
## difficulty     0.08079410 -0.06123598  0.07555449 -0.01445676  0.0047536363
## hashrate      -0.09705242 -0.02969681 -0.09867863  0.01963325  0.0266351899
##                    Comp.76     Comp.77      Comp.78      Comp.79     Comp.80
## priceUSD       0.015010436  0.04684014  0.054358994  0.046190799  0.13243531
## transactions   0.127518233 -0.24586627  0.263812546 -0.296081331 -0.24561648
## size          -0.005214799  0.08546870 -0.132264548 -0.008779473 -0.03765026
## sentbyaddress -0.008086521  0.03474203 -0.008044177  0.029181707  0.01777672
## difficulty     0.034408985  0.11458024  0.016751426 -0.003515343  0.07283478
## hashrate      -0.002691720 -0.07767162  0.007001689 -0.030624093 -0.03582800
##                    Comp.81      Comp.82       Comp.83      Comp.84     Comp.85
## priceUSD       0.033129917  0.017130831  0.0211251176  0.032475718 0.029848336
## transactions  -0.348218403 -0.001916504 -0.0442611712 -0.079949610 0.015828159
## size          -0.040351025 -0.018709370  0.0029555417  0.062199704 0.041440397
## sentbyaddress  0.012394810 -0.004423517 -0.0002851159  0.006078708 0.006252547
## difficulty     0.009263981  0.018432983  0.0389894441  0.025941956 0.001176009
## hashrate      -0.030182290  0.027923097 -0.0152538625 -0.020954564 0.002093543
##                    Comp.86      Comp.87      Comp.88      Comp.89      Comp.90
## priceUSD       0.021399764  0.016137599  0.030872798  0.024573644  0.004017256
## transactions   0.062852317 -0.153628649 -0.126283637 -0.077916290 -0.027885402
## size          -0.115666204  0.110305884 -0.052326807  0.036851359 -0.036316705
## sentbyaddress -0.015834155 -0.007106276 -0.001792852  0.002334451 -0.007518546
## difficulty     0.014669761  0.029620864 -0.004030727  0.003458794  0.030264052
## hashrate      -0.004980217 -0.011612117 -0.001337955 -0.004982411 -0.001145273
##                     Comp.91      Comp.92      Comp.93      Comp.94      Comp.95
## priceUSD       0.0072441499  0.001963360  0.033263393  0.019194856  0.032970022
## transactions   0.0280543936  0.023248950  0.067080980  0.079201321  0.047674991
## size          -0.0065484394 -0.005574613 -0.004083877 -0.002406366 -0.005026523
## sentbyaddress -0.0077169609 -0.003557902 -0.005430556 -0.004854498 -0.006539158
## difficulty     0.0006156113  0.001195779  0.006449102  0.017768025 -0.015520780
## hashrate      -0.0065703277 -0.008745925 -0.017043714  0.003277960 -0.008478999
##                     Comp.96      Comp.97       Comp.98      Comp.99
## priceUSD       0.0011161395  0.008057712  0.0057509624  0.001264983
## transactions  -0.0139583955  0.043407558  0.0687568321 -0.008322029
## size           0.0037286538 -0.007975685  0.0020609561 -0.015558737
## sentbyaddress  0.0004307203 -0.003023487 -0.0001682611 -0.000413360
## difficulty    -0.0184464924 -0.012884747  0.0073539525  0.003258814
## hashrate      -0.0032627219 -0.005563868 -0.0027467221  0.001889612
##                    Comp.100      Comp.101      Comp.102      Comp.103
## priceUSD       0.0058984326  5.486655e-04  0.0033262416  4.044426e-05
## transactions  -0.0149766744 -3.309526e-02  0.0442165645  8.503120e-02
## size           0.0029951357 -3.605871e-02 -0.0241691015  7.237447e-02
## sentbyaddress  0.0006325967  8.850871e-05  0.0011164614  7.414042e-05
## difficulty    -0.0028442534 -1.887503e-03 -0.0003803873 -7.506978e-04
## hashrate      -0.0047862276  4.769622e-04 -0.0020148114  4.629469e-04
##                    Comp.104      Comp.105      Comp.106
## priceUSD       1.199717e-03  0.0001149336  7.383428e-05
## transactions  -7.095461e-02  0.0179135075  5.519364e-02
## size           6.763177e-02  0.0588684698 -4.886255e-02
## sentbyaddress  2.899010e-04  0.0001672690  1.613470e-04
## difficulty     3.366370e-06  0.0002915409 -1.138659e-04
## hashrate      -8.528443e-04 -0.0001975415  1.514862e-04

After applying the PCA method, it can be useful to visualize how much of the overall variance each principal component explains. It is also possible to plot how the variables relate to the first two principal components (the variable correlation plot produced by "fviz_pca_var"). With 106 variables, however, the relationships between individual characteristics are difficult to read in this plot.


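library(factoextra)
# bar chart of the variances of the principal components
plot(data.pca)

# variables plotted against the first two principal components
fviz_pca_var(data.pca, col.var="purple")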
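With this many variables the correlation plot becomes crowded. One option is the `select.var` argument of `fviz_pca_var()`, which draws only the variables with the highest contributions to the plotted dimensions. The sketch below uses the built-in `USArrests` data as a stand-in, not the Bitcoin data. The cut-off of two variables is an arbitrary illustrative choice. For `data.pca`, a larger value such as 15 might be more appropriate, but that is also an assumption.

```r
library(factoextra)

# Stand-in data: the built-in USArrests set, standardized by prcomp()
pca.demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Draw only the 2 variables that contribute most to Dim.1 and Dim.2
# (the cut-off of 2 is an arbitrary choice for this small example)
p <- fviz_pca_var(pca.demo, select.var = list(contrib = 2), col.var = "purple")
print(p)
```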

As shown in the bar chart produced by plot(data.pca), the first principal component explains a little more than 30% of the variance in the dataset. Beyond this visualization, the eigenvalues of all principal components can be displayed together with the share of variance each of them explains. This describes the components in more detail.


# eigenvalues with the individual and cumulative percentage of explained variance
eig.val<-get_eigenvalue(data.pca)
eig.val
##           eigenvalue variance.percent cumulative.variance.percent
## Dim.1   3.262344e+01     3.080764e+01                    30.80764
## Dim.2   1.630408e+01     1.539661e+01                    46.20425
## Dim.3   1.106560e+01     1.044970e+01                    56.65395
## Dim.4   7.761252e+00     7.329266e+00                    63.98321
## Dim.5   5.354647e+00     5.056610e+00                    69.03982
## Dim.6   4.588589e+00     4.333191e+00                    73.37301
## Dim.7   3.300731e+00     3.117015e+00                    76.49003
## Dim.8   2.852151e+00     2.693402e+00                    79.18343
## Dim.9   2.354893e+00     2.223821e+00                    81.40725
## Dim.10  2.259631e+00     2.133861e+00                    83.54111
## Dim.11  1.632669e+00     1.541795e+00                    85.08291
## Dim.12  1.508827e+00     1.424847e+00                    86.50775
## Dim.13  1.351329e+00     1.276114e+00                    87.78387
## Dim.14  1.097609e+00     1.036516e+00                    88.82038
## Dim.15  9.625008e-01     9.089285e-01                    89.72931
## Dim.16  9.113836e-01     8.606565e-01                    90.58997
## Dim.17  8.477256e-01     8.005417e-01                    91.39051
## Dim.18  7.886294e-01     7.447348e-01                    92.13525
## Dim.19  7.533692e-01     7.114371e-01                    92.84668
## Dim.20  6.708984e-01     6.335566e-01                    93.48024
## Dim.21  6.581412e-01     6.215094e-01                    94.10175
## Dim.22  5.986940e-01     5.653711e-01                    94.66712
## Dim.23  5.796669e-01     5.474030e-01                    95.21452
## Dim.24  5.611426e-01     5.299097e-01                    95.74443
## Dim.25  4.308537e-01     4.068726e-01                    96.15130
## Dim.26  3.807151e-01     3.595247e-01                    96.51083
## Dim.27  3.586030e-01     3.386434e-01                    96.84947
## Dim.28  3.432126e-01     3.241096e-01                    97.17358
## Dim.29  3.187399e-01     3.009990e-01                    97.47458
## Dim.30  3.101841e-01     2.929194e-01                    97.76750
## Dim.31  2.706232e-01     2.555605e-01                    98.02306
## Dim.32  2.090865e-01     1.974489e-01                    98.22051
## Dim.33  1.999273e-01     1.887995e-01                    98.40931
## Dim.34  1.672666e-01     1.579566e-01                    98.56727
## Dim.35  1.555848e-01     1.469251e-01                    98.71419
## Dim.36  1.447504e-01     1.366937e-01                    98.85089
## Dim.37  1.315291e-01     1.242082e-01                    98.97509
## Dim.38  1.247936e-01     1.178476e-01                    99.09294
## Dim.39  1.065216e-01     1.005927e-01                    99.19353
## Dim.40  9.872152e-02     9.322673e-02                    99.28676
## Dim.41  7.393082e-02     6.981587e-02                    99.35658
## Dim.42  7.087815e-02     6.693311e-02                    99.42351
## Dim.43  6.045968e-02     5.709453e-02                    99.48060
## Dim.44  5.093847e-02     4.810327e-02                    99.52871
## Dim.45  4.726190e-02     4.463133e-02                    99.57334
## Dim.46  4.110173e-02     3.881403e-02                    99.61215
## Dim.47  3.961794e-02     3.741283e-02                    99.64957
## Dim.48  3.378478e-02     3.190434e-02                    99.68147
## Dim.49  3.117525e-02     2.944005e-02                    99.71091
## Dim.50  2.469747e-02     2.332282e-02                    99.73423
## Dim.51  2.285924e-02     2.158691e-02                    99.75582
## Dim.52  2.075052e-02     1.959556e-02                    99.77542
## Dim.53  1.885381e-02     1.780441e-02                    99.79322
## Dim.54  1.653295e-02     1.561274e-02                    99.80883
## Dim.55  1.575504e-02     1.487813e-02                    99.82371
## Dim.56  1.539612e-02     1.453918e-02                    99.83825
## Dim.57  1.431699e-02     1.352012e-02                    99.85177
## Dim.58  1.349621e-02     1.274502e-02                    99.86451
## Dim.59  1.271989e-02     1.201191e-02                    99.87653
## Dim.60  1.148745e-02     1.084807e-02                    99.88737
## Dim.61  1.107981e-02     1.046311e-02                    99.89784
## Dim.62  1.030201e-02     9.728602e-03                    99.90757
## Dim.63  9.496860e-03     8.968270e-03                    99.91653
## Dim.64  8.499052e-03     8.025999e-03                    99.92456
## Dim.65  7.804225e-03     7.369846e-03                    99.93193
## Dim.66  7.527123e-03     7.108168e-03                    99.93904
## Dim.67  7.307953e-03     6.901196e-03                    99.94594
## Dim.68  6.414212e-03     6.057201e-03                    99.95200
## Dim.69  5.489877e-03     5.184313e-03                    99.95718
## Dim.70  5.131146e-03     4.845550e-03                    99.96203
## Dim.71  4.519456e-03     4.267906e-03                    99.96629
## Dim.72  4.111992e-03     3.883121e-03                    99.97018
## Dim.73  3.740470e-03     3.532278e-03                    99.97371
## Dim.74  3.467160e-03     3.274180e-03                    99.97698
## Dim.75  2.847916e-03     2.689403e-03                    99.97967
## Dim.76  2.675027e-03     2.526136e-03                    99.98220
## Dim.77  2.490277e-03     2.351670e-03                    99.98455
## Dim.78  2.374990e-03     2.242800e-03                    99.98679
## Dim.79  2.257126e-03     2.131496e-03                    99.98893
## Dim.80  1.594975e-03     1.506200e-03                    99.99043
## Dim.81  1.500816e-03     1.417282e-03                    99.99185
## Dim.82  1.379507e-03     1.302724e-03                    99.99315
## Dim.83  1.078399e-03     1.018375e-03                    99.99417
## Dim.84  1.022391e-03     9.654856e-04                    99.99514
## Dim.85  8.233927e-04     7.775631e-04                    99.99591
## Dim.86  7.831736e-04     7.395826e-04                    99.99665
## Dim.87  6.193632e-04     5.848898e-04                    99.99724
## Dim.88  5.665247e-04     5.349922e-04                    99.99777
## Dim.89  5.333665e-04     5.036797e-04                    99.99828
## Dim.90  3.901322e-04     3.684176e-04                    99.99865
## Dim.91  3.522696e-04     3.326625e-04                    99.99898
## Dim.92  2.633405e-04     2.486831e-04                    99.99923
## Dim.93  2.111102e-04     1.993600e-04                    99.99943
## Dim.94  1.825045e-04     1.723464e-04                    99.99960
## Dim.95  1.427290e-04     1.347848e-04                    99.99973
## Dim.96  1.007373e-04     9.513029e-05                    99.99983
## Dim.97  8.161323e-05     7.707069e-05                    99.99991
## Dim.98  5.753841e-05     5.433585e-05                    99.99996
## Dim.99  1.628254e-05     1.537626e-05                    99.99997
## Dim.100 1.312356e-05     1.239311e-05                    99.99999
## Dim.101 5.608622e-06     5.296449e-06                    99.99999
## Dim.102 4.132084e-06     3.902095e-06                   100.00000
## Dim.103 1.734943e-06     1.638377e-06                   100.00000
## Dim.104 1.193522e-06     1.127091e-06                   100.00000
## Dim.105 4.624676e-07     4.367269e-07                   100.00000
## Dim.106 3.239679e-07     3.059361e-07                   100.00000

The table shows that the first principal component explains 30.81% of the variance, and the first 10 components together explain 83.54%. Both the eigenvalues and the percentage of variance explained by each component can also be plotted with "fviz_eig", which shows the first 10 components by default.
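The `variance.percent` column is each eigenvalue divided by the sum of all eigenvalues, multiplied by 100. The cumulative column is the running total of those percentages. The minimal base-R sketch below reproduces these columns and then picks the smallest number of components whose cumulative share reaches a threshold. The built-in `USArrests` data stands in for the Bitcoin set. The 80% threshold is an illustrative assumption, not a rule taken from this analysis.

```r
# Stand-in data: the built-in USArrests set, not the Bitcoin data
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues are the variances (squared standard deviations) of the components
eigenvalue <- pca$sdev^2
variance.percent <- eigenvalue / sum(eigenvalue) * 100
cumulative.variance.percent <- cumsum(variance.percent)

eig.tab <- data.frame(eigenvalue, variance.percent, cumulative.variance.percent)
print(eig.tab)

# Smallest number of components whose cumulative share reaches the threshold
threshold <- 80  # illustrative choice
n.comp <- which(cumulative.variance.percent >= threshold)[1]
cat("Components needed for", threshold, "% of variance:", n.comp, "\n")
```

Applying the same rule to the eig.val table above would keep 9 components for an 80% threshold (81.41%) and 16 components for a 90% threshold (90.59%).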


# eigenvalues
fviz_eig(data.pca, choice='eigenvalue')

# percentage of explained variance
fviz_eig(data.pca)

Additionally, the cumulative variance can be plotted against the number of principal components included. This makes it easier to judge how much information is retained for a given number of dimensions.


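# refit with prcomp(); data.s is assumed to hold the already standardized variables
data.pca<-prcomp(data.s, center=FALSE, scale.=FALSE)
sum<-summary(data.pca)
plot(sum$importance["Cumulative Proportion",], type="l", main="Cumulative variance",
     xlab="Number of principal components", ylab="Cumulative proportion of variance")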
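The third row of `summary(...)$importance`, named "Cumulative Proportion", is the running sum of the component variances divided by their total, rounded to five decimals. The sketch below checks this on the built-in `USArrests` data, which stands in for the Bitcoin set. It also adds a dashed reference line at 90%, an illustrative level rather than one used in this analysis.

```r
# Stand-in data: the built-in USArrests set
pca.demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)
imp <- summary(pca.demo)$importance

# Cumulative share of variance computed directly from the component variances
vars <- pca.demo$sdev^2
cum.manual <- cumsum(vars) / sum(vars)

# The same quantity as reported by summary()
cum.summary <- imp["Cumulative Proportion", ]

plot(cum.summary, type = "b", main = "Cumulative variance",
     xlab = "Number of principal components", ylab = "Cumulative proportion of variance")
abline(h = 0.9, lty = 2)  # illustrative 90% reference line
```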

Finally, it is worth examining which variables have the largest loadings on each principal component separately. This helps to understand what drives each component.


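# loadings of all variables on PC1
loading_scores_PC_1<-data.pca$rotation[,1]
# rank the variables by the absolute size of their loadings
fac_scores_PC_1<-abs(loading_scores_PC_1)
fac_scores_PC_1_ranked<-names(sort(fac_scores_PC_1, decreasing=TRUE))
# signed loadings, ordered from the strongest to the weakest
data.pca$rotation[fac_scores_PC_1_ranked, 1]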
##                  size7ema                 size14ema                  size7wma 
##              0.1645689260              0.1629269116              0.1629246317 
##                  size3ema                 size14wma                  size3wma 
##              0.1628308664              0.1623796300              0.1605348354 
##                  size7sma                  size3sma                 size30wma 
##              0.1602670721              0.1591954091              0.1589241052 
##                 size14sma                 size30ema                 size30sma 
##              0.1588579007              0.1580706325              0.1537059328 
##                      size         transactions90ema         transactions90wma 
##              0.1513766009              0.1495033726              0.1480431083 
##         transactions90sma                 size90wma         transactions30sma 
##              0.1475113327              0.1464743069              0.1441335042 
##         transactions30ema                 size90ema         transactions30wma 
##              0.1440719810              0.1438701215              0.1410718869 
##         transactions14ema                 size90sma         transactions14sma 
##              0.1385521372              0.1358706673              0.1357401592 
##         transactions14wma          transactions7ema                 size30var 
##              0.1349389611              0.1344969454             -0.1318808769 
##          transactions7wma                 size14std                 size30std 
##              0.1314552536             -0.1308497039             -0.1307202893 
##          transactions7sma          fee_to_rewardUSD          transactions3ema 
##              0.1303771560              0.1302623738              0.1297148334 
##                 top100cap          transactions3wma                 size14var 
##             -0.1294492006              0.1271253887             -0.1267641306 
##          transactions3sma                 size90var                  size7std 
##              0.1264183495             -0.1249641899             -0.1229628288 
##                 size90std              transactions                 size90trx 
##             -0.1183991278              0.1170950418              0.1156275287 
##                  size7var                  size3std                 size90rsi 
##             -0.1102529488             -0.1039026108              0.1008269040 
## median_transaction_feeUSD       transactionvalueUSD                  size3var 
##              0.0977879316             -0.0951290422             -0.0854654016 
##        transactionfeesUSD                 size30trx                 size90mom 
##              0.0847160180              0.0771490442              0.0763621391 
##           activeaddresses             sentbyaddress         transactions90trx 
##              0.0749731386              0.0745738851              0.0699416612 
##                 size30rsi                 size90roc                  hashrate 
##              0.0638001702              0.0626974122             -0.0583243384 
##              sentinusdUSD                 size14trx                difficulty 
##             -0.0578267759              0.0565768732             -0.0528640947 
##                  priceUSD         transactions30std         transactions30var 
##             -0.0511168037              0.0493454200              0.0492705775 
##                 size30mom                 size14rsi                  size7trx 
##              0.0487407154              0.0481550745              0.0415746372 
##         transactions14var                  size7rsi         transactions14std 
##              0.0399929099              0.0380160107              0.0372612334 
##                  size3trx                 size30roc      mining_profitability 
##              0.0354784180              0.0353650603             -0.0344252810 
##         transactions90rsi         transactions90std                 size14mom 
##              0.0342999449              0.0320991652              0.0320321395 
## mediantransactionvalueUSD                  size3rsi         transactions90var 
##              0.0291217934              0.0284176431              0.0283837299 
##                  size3mom                 size14roc                  size7mom 
##              0.0275561064              0.0237120167              0.0229050249 
##          transactions3rsi          transactions3trx          transactions3mom 
##              0.0217470806              0.0210672801              0.0206059111 
##          transactions7var          transactions7rsi         transactions30rsi 
##              0.0205449800              0.0203788142              0.0192550664 
##         transactions90mom         transactions14rsi          transactions3roc 
##              0.0188443215              0.0182015822              0.0173642142 
##                  size3roc         transactions90roc          confirmationtime 
##              0.0160387985              0.0157089033              0.0152855550 
##                  size7roc          transactions7std         transactions30trx 
##              0.0149431504              0.0140690830              0.0130605243 
##         transactions14trx         transactions30mom          transactions7trx 
##             -0.0100785096              0.0098965192             -0.0090900606 
##         transactions14mom         transactions30roc         transactions14roc 
##             -0.0062846467              0.0052013703             -0.0046428040 
##          transactions3std          transactions3var          transactions7mom 
##              0.0041816541              0.0041386013             -0.0021291575 
##          transactions7roc 
##              0.0005921151

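For the first principal component (PC1), the variables with the largest loadings are the moving averages of the size variable. The strongest are its EMA (Exponential Moving Average) and WMA (Weighted Moving Average) over 3- to 30-day windows, followed closely by the SMA (Simple Moving Average) variants. Next come the raw size variable and the moving averages of transactions. All of these loadings are similar and fairly small (roughly 0.14 to 0.16), so PC1 combines many related size and transaction measures rather than being dominated by a single variable.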
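The ranking above covers only PC1. The same steps can be wrapped in a small function and reused for any component. `top_loadings()` below is a hypothetical helper, not part of factoextra, and the built-in `USArrests` data serves as a stand-in. With the fitted model, it could be called as `top_loadings(data.pca, comp = 2, n = 10)`.

```r
# Hypothetical helper: rank variables by absolute loading on one component
top_loadings <- function(pca, comp = 1, n = 10) {
  loadings <- pca$rotation[, comp]
  ranked <- names(sort(abs(loadings), decreasing = TRUE))
  loadings[head(ranked, n)]  # keep the sign of each original loading
}

# Stand-in data: the built-in USArrests set
pca.demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)
for (k in 1:2) {
  cat("PC", k, "\n")
  print(round(top_loadings(pca.demo, comp = k, n = 2), 3))
}
```

For a graphical version of the same idea, factoextra also provides `fviz_contrib(data.pca, choice = "var", axes = 1, top = 20)`. That function ranks variables by their contribution rather than by the raw loading.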




5.2 Individual results


Aside from the summary indicators for the principal components, more detailed results for the individual observations (here, the individual days) can be examined with the function "get_pca_ind".


library(factoextra)
ind<-get_pca_ind(data.pca)  

Two of these measures are worth showcasing: the coordinates of the individual days on the principal components, and the contribution of each day to the principal components.
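To make the contributions concrete, here is a minimal sketch, assuming the common definition used for prcomp fits. Under that definition, the contribution of individual i to component k is 100·coord_ik² / (n·λ_k), where λ_k = sdev_k² is the component's variance. This sketch reproduces that formula in base R and is not factoextra's source code. The built-in `USArrests` data (50 states) stands in for the 1000 days.

```r
# Stand-in data: the built-in USArrests set
pca.demo <- prcomp(USArrests, center = TRUE, scale. = TRUE)

coord <- pca.demo$x          # coordinates of the individuals (states)
lambda <- pca.demo$sdev^2    # variance of each component
n <- nrow(coord)

# Contribution (%) of individual i to component k: 100 * coord^2 / (n * lambda_k)
contrib <- sweep(coord^2, 2, n * lambda, "/") * 100

head(round(contrib, 3))
```

prcomp estimates variances with the n - 1 divisor. As a result, each column of `contrib` sums to 100·(n - 1)/n rather than exactly 100. That is 98 for the 50 states, and it would be 99.9 for 1000 days.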


print(ind)
## Principal Component Analysis Results for individuals
##  ===================================================
##   Name       Description                       
## 1 "$coord"   "Coordinates for the individuals" 
## 2 "$cos2"    "Cos2 for the individuals"        
## 3 "$contrib" "contributions of the individuals"
# coordinates of individuals (days)
head(ind$coord)
##      Dim.1     Dim.2    Dim.3     Dim.4     Dim.5      Dim.6      Dim.7
## 1 3.110182 -2.115125 1.996596 -5.341517 0.4114232 -1.9810495  0.5949921
## 2 3.046220 -1.279052 2.393504 -4.942883 0.7048107 -1.2786222  0.1198232
## 3 3.018103 -1.514443 2.972079 -3.272689 1.9574775  0.2589512  0.4380423
## 4 2.443978  1.099390 3.927878 -3.761725 1.0670355  0.4138992  0.7706130
## 5 3.972577 -3.207101 1.570670 -5.686634 1.8723561 -0.1995235  0.4626377
## 6 1.247672  2.306453 5.466992 -2.459489 0.7260533  0.1781364 -0.4589488
##        Dim.8      Dim.9    Dim.10     Dim.11    Dim.12    Dim.13     Dim.14
## 1 -1.9253815  0.7098793 1.2471655 -0.4664735 -1.161641 0.8935077  0.2023248
## 2 -1.8581880  0.4389236 1.9820983 -0.9952177 -1.368670 1.2217388  0.1701462
## 3 -1.7596777 -1.6775371 0.9532125 -1.1913068 -2.193198 1.5783667  1.1058310
## 4 -0.8994188 -1.5429243 1.2566485 -1.3288615 -2.308481 1.1992347  0.5621118
## 5 -0.5378921 -0.8697609 1.5107494  0.2876124 -2.150945 1.0256856  0.2788326
## 6 -0.1350763 -1.7049140 1.8235696  0.5987034 -1.825420 1.7616414 -0.3411941
##        Dim.15      Dim.16     Dim.17     Dim.18      Dim.19     Dim.20
## 1  0.81830817 -0.99598415 -1.3361990 -1.1351167  0.23617133  0.5368643
## 2  0.33245966 -0.39026018 -1.1652220 -2.1308776  1.34049137  0.1201212
## 3 -0.08286611  0.49226495 -0.7016716 -1.6639702  1.23215607  1.3068707
## 4  0.22193386  0.05410055 -0.3935210 -0.5819853  1.16630157 -0.2840510
## 5  0.82273917 -0.78102033  1.1559272 -0.5326324 -0.02939048 -0.7221905
## 6  0.96427292  0.17563643  0.5160374 -0.9406989 -0.68995497 -0.5113339
##       Dim.21       Dim.22      Dim.23    Dim.24     Dim.25    Dim.26     Dim.27
## 1 -0.5763006  1.712489151 -0.58298671 0.9269925 0.45260097 1.4331882  0.5078085
## 2 -0.2755938  0.326562487 -0.35470242 0.8944426 0.15282470 0.6095208  0.3697847
## 3  0.8350777 -0.009043401 -0.54205381 0.9918937 0.79563419 0.8711833 -0.3229900
## 4 -0.3043625 -0.475535281 -0.26003923 0.7015818 0.33284132 0.9527877  0.4827439
## 5 -0.6106870 -0.049690817 -0.07088650 1.0853385 0.07328544 0.4573706  0.4862175
## 6 -0.3167080 -0.636291427 -0.05991018 0.7098797 0.72819017 0.7886199  0.6069105
##       Dim.28     Dim.29       Dim.30     Dim.31      Dim.32      Dim.33
## 1 -0.2764405 -0.5167003 -0.008195915  0.3927332  0.67565208 -0.01478236
## 2  0.0966099 -0.4138701  0.167349928  0.1586064  0.02736994  0.08178511
## 3 -0.5224968 -0.2245880  0.536478199 -0.2060138 -0.17813112  0.61380674
## 4 -0.2948006 -0.2585394  0.348798225 -0.2892611 -0.27013295  0.33078512
## 5 -0.2218936 -0.4933053  0.921692114 -0.8580277 -0.45829701  0.30803700
## 6 -0.6343732 -0.5628218  0.170973636 -0.1466100 -0.35775427  0.22710844
##       Dim.34     Dim.35     Dim.36     Dim.37    Dim.38        Dim.39
## 1 0.03154552 0.43648503 -0.6068635 -0.1968980 0.1425365  0.2692500723
## 2 0.19878746 0.35891980  0.0654158  0.1188744 0.1840157  0.0236590266
## 3 0.53283776 0.06131692  0.1224146 -0.2175115 0.3109167  0.1077589797
## 4 0.42046347 0.02275661  0.5490888  0.1994958 0.1613902 -0.0481327649
## 5 0.59471038 0.30251646  0.1148599  0.2741160 0.1406704  0.0005625872
## 6 0.20883461 0.20130096  0.2146794  0.3631749 0.1852843 -0.0926328893
##        Dim.40      Dim.41       Dim.42      Dim.43      Dim.44       Dim.45
## 1  0.05317964  0.00320484  0.057976042  0.09341185  0.02019876 -0.157259745
## 2  0.01530264 -0.11864393  0.043865934  0.08024837  0.02959030 -0.231344366
## 3  0.21799308 -0.06371825 -0.003800305 -0.05540291 -0.19097931  0.202806272
## 4 -0.09343039 -0.12769108 -0.188817984  0.11374199 -0.28484041 -0.114517677
## 5  0.28217444 -0.04021435 -0.232488371  0.25730168 -0.16741969  0.003718343
## 6  0.27235344  0.17872418 -0.179710045  0.20635130 -0.03138347 -0.008653634
##        Dim.46     Dim.47       Dim.48      Dim.49     Dim.50      Dim.51
## 1 -0.09140335 0.09545111  0.054690650  0.18687741 0.07320165 -0.04322019
## 2 -0.04494536 0.20889327  0.016293145  0.17185904 0.06130035 -0.04942264
## 3 -0.23515573 0.23660540 -0.174446531  0.13855632 0.17633099  0.10068796
## 4  0.07541472 0.10882991 -0.182636612  0.07282895 0.16354599 -0.11306805
## 5  0.11256109 0.14476982 -0.003990558  0.14460988 0.10488537  0.16006093
## 6  0.19686523 0.06705063  0.090260813 -0.03146318 0.12257169  0.08298218
##         Dim.52       Dim.53       Dim.54      Dim.55       Dim.56      Dim.57
## 1 -0.213158667  0.167021447  0.029248437 -0.03602267 -0.028942279  0.13275802
## 2 -0.033581701  0.276210774 -0.037774165 -0.02641210  0.048112924  0.10271896
## 3 -0.052662632  0.282059615  0.152942018 -0.11723364  0.004653775 -0.05248682
## 4  0.003906629  0.325140065  0.118560026 -0.21184400 -0.056527231 -0.10403943
## 5 -0.069423599 -0.005580431  0.006015602 -0.15352835 -0.030684072  0.07096521
## 6 -0.104418521 -0.029187474 -0.075249906 -0.05155839 -0.153318791  0.17601262
##        Dim.58      Dim.59       Dim.60      Dim.61      Dim.62      Dim.63
## 1  0.07565446 -0.04226184  0.000485747 -0.04875638 -0.09368409 -0.06154087
## 2  0.06934813  0.04493857  0.058947079  0.05482487 -0.07948826  0.04072994
## 3  0.15186378  0.07339651  0.081146956  0.08085223 -0.06499902 -0.05914954
## 4  0.07422748  0.23856619  0.034602747  0.11688741  0.02847900  0.08234961
## 5 -0.00885657  0.02488315  0.006944500  0.09238875  0.02970363 -0.08466706
## 6 -0.01503800 -0.02118165 -0.040335291 -0.13682040  0.07768572  0.06269353
##         Dim.64       Dim.65     Dim.66      Dim.67      Dim.68     Dim.69
## 1 -0.032214243 -0.009387191 0.03370427 -0.13342130  0.09699895 0.01690295
## 2 -0.043540082 -0.201719212 0.10020688 -0.04004851  0.02388404 0.07005445
## 3  0.007075675  0.009022764 0.02638833  0.10460149  0.01615774 0.09938640
## 4 -0.062513662 -0.080573937 0.07405205  0.07743866 -0.01472699 0.10502983
## 5 -0.044513382  0.058709807 0.01909733  0.11255932  0.03124236 0.10526506
## 6 -0.048238360 -0.081293227 0.02171592 -0.02096412 -0.09243918 0.04505514
##         Dim.70       Dim.71      Dim.72       Dim.73       Dim.74       Dim.75
## 1  0.025366385  0.037228600 -0.00433998 -0.008675643 -0.044796285  0.023084541
## 2 -0.002588932  0.051675471 -0.02941749 -0.038080312 -0.068866096  0.030465840
## 3 -0.081070567 -0.011990231 -0.06160425  0.044635696 -0.048299862  0.053720638
## 4 -0.118369619  0.054720664 -0.04157752  0.035237162  0.019104754 -0.019510668
## 5 -0.123902276  0.004128187 -0.08623306  0.019481748 -0.003304265  0.028104857
## 6 -0.097642515 -0.004868667 -0.12300637  0.049559680  0.058340159  0.002655803
##         Dim.76      Dim.77      Dim.78      Dim.79      Dim.80       Dim.81
## 1  0.003650269 -0.04578579  0.01581004 0.030167279  0.01868854  0.045631063
## 2  0.007615971 -0.04278198 -0.02183660 0.006496716  0.02732224 -0.033339939
## 3 -0.001984994 -0.01118750 -0.04502446 0.025320532  0.02132527 -0.029958854
## 4  0.032858614 -0.01358574 -0.06722917 0.053217157  0.03823546  0.027080528
## 5 -0.005226603 -0.01189815 -0.03118259 0.049314516 -0.01791007 -0.006754942
## 6  0.048080454  0.02312411 -0.08286899 0.021663349 -0.01420106 -0.003753235
##       Dim.82     Dim.83     Dim.84        Dim.85       Dim.86       Dim.87
## 1 0.01891128 0.02254287 0.02474040  4.460161e-02  0.009096565 -0.050007017
## 2 0.04431966 0.01951589 0.02926036  4.909250e-02 -0.004684553 -0.007389649
## 3 0.04021769 0.01998814 0.02566855  2.413401e-02 -0.012490292  0.006821574
## 4 0.02441829 0.03932462 0.07104711  5.261024e-02  0.005123208  0.029749465
## 5 0.02348882 0.04098130 0.06031194 -9.869125e-05 -0.026263392 -0.018234124
## 6 0.02869658 0.03083394 0.04522095  2.286492e-02  0.007731091 -0.002938292
##          Dim.88      Dim.89     Dim.90       Dim.91      Dim.92       Dim.93
## 1  0.0008745544 0.008442963 0.05853373 -0.028460265 -0.02133826 -0.033754001
## 2 -0.0117972721 0.003430789 0.05791235  0.001777109 -0.02233977 -0.025736248
## 3 -0.0373611659 0.025598234 0.05132453 -0.005757939 -0.01926119 -0.012058277
## 4 -0.0036510809 0.021070426 0.05131036  0.009135699 -0.02704393 -0.010653811
## 5  0.0109087151 0.012160899 0.05731507 -0.015192868 -0.02528357 -0.005321053
## 6  0.0087689964 0.016938911 0.05425207 -0.049333167 -0.01013424 -0.001816369
##        Dim.94     Dim.95     Dim.96       Dim.97       Dim.98       Dim.99
## 1 -0.03132429 0.01889590 0.02515069 0.0180061107 -0.013935496 -0.009015511
## 2 -0.02776477 0.01925609 0.01899356 0.0229815380 -0.011567653 -0.004332362
## 3 -0.02264392 0.01503741 0.02158640 0.0133417932 -0.011410139 -0.010989707
## 4 -0.03458096 0.01251773 0.01988936 0.0124739306 -0.005338152 -0.010464584
## 5 -0.01790417 0.01556958 0.02824810 0.0017043146 -0.008797041 -0.005092330
## 6 -0.01900111 0.01376501 0.03218332 0.0004517494 -0.010316294 -0.008388026
##         Dim.100     Dim.101       Dim.102       Dim.103       Dim.104
## 1 -0.0006795002 0.003388928  2.783292e-03  0.0024663243  0.0006253679
## 2 -0.0016713722 0.008644409  8.391441e-05  0.0006260123  0.0025185725
## 3  0.0021443804 0.004458362  1.498714e-03  0.0021757596  0.0004056451
## 4  0.0016547121 0.002678760 -5.303867e-04  0.0020453424  0.0003901029
## 5  0.0014596359 0.003360269  1.704527e-03 -0.0007668016  0.0008047144
## 6  0.0022029326 0.001024555  5.783175e-04  0.0012007151 -0.0013209207
##         Dim.105       Dim.106
## 1 -0.0015424297  0.0009720011
## 2 -0.0001629268  0.0001308702
## 3 -0.0003446367  0.0005145677
## 4 -0.0004053317  0.0003588643
## 5 -0.0006079400 -0.0010977419
## 6  0.0003508210 -0.0005238898
# contributions of individuals (days) to the principal components
head(ind$contrib)
##         Dim.1       Dim.2      Dim.3     Dim.4       Dim.5        Dim.6
## 1 0.029621521 0.027412022 0.03598910 0.3672510 0.003158000 0.0854431012
## 2 0.028415700 0.010024108 0.05172001 0.3144810 0.009267863 0.0355935119
## 3 0.027893549 0.014053198 0.07974641 0.1378616 0.071487190 0.0014598971
## 4 0.018290695 0.007405818 0.13928561 0.1821411 0.021241855 0.0037297144
## 5 0.048325941 0.063022334 0.02227206 0.4162404 0.065405095 0.0008667116
## 6 0.004766908 0.032595553 0.26982815 0.0778616 0.009834939 0.0006908627
##          Dim.7        Dim.8       Dim.9     Dim.10      Dim.11     Dim.12
## 1 0.0107146433 0.1298453815 0.021377815 0.06876637 0.013314392 0.08934489
## 2 0.0004345471 0.1209406279 0.008172823 0.17369140 0.060604319 0.12402901
## 3 0.0058074758 0.1084574015 0.119381938 0.04017051 0.086838958 0.31847959
## 4 0.0179733057 0.0283345826 0.100991212 0.06981610 0.108050508 0.35284046
## 5 0.0064779473 0.0101340537 0.032091805 0.10090502 0.005061539 0.30632644
## 6 0.0063750520 0.0006390738 0.123310277 0.14701871 0.021932633 0.22062344
##       Dim.13      Dim.14       Dim.15       Dim.16     Dim.17     Dim.18
## 1 0.05902025 0.003725772 0.0695021389 0.1087349469 0.21040327 0.16322005
## 2 0.11034717 0.002634891 0.0114720838 0.0166944757 0.16000278 0.57518758
## 3 0.18417061 0.111300082 0.0007127189 0.0265620822 0.05802003 0.35073863
## 4 0.10631948 0.028758313 0.0051122432 0.0003208246 0.01824930 0.04290586
## 5 0.07777374 0.007076283 0.0702568616 0.0668634790 0.15746032 0.03593749
## 6 0.22942434 0.010595489 0.0965082266 0.0033813764 0.03138141 0.11209695
##         Dim.19      Dim.20     Dim.21       Dim.22       Dim.23     Dim.24
## 1 0.0073962567 0.042917822 0.05041324 4.893462e-01 0.0585739183 0.15298354
## 2 0.2382789527 0.002148564 0.01152883 1.779480e-02 0.0216827956 0.14242859
## 3 0.2013209997 0.254316140 0.10585226 1.364659e-05 0.0506374431 0.17515501
## 4 0.1803762509 0.012014377 0.01406140 3.773341e-02 0.0116537234 0.08762920
## 5 0.0001145437 0.077662647 0.05660878 4.120148e-04 0.0008659924 0.20971170
## 6 0.0631246710 0.038933007 0.01522525 6.755736e-02 0.0006185690 0.08971431
##        Dim.25     Dim.26     Dim.27      Dim.28     Dim.29       Dim.30
## 1 0.047497053 0.53897905 0.07183754 0.022243618 0.08367707 0.0000216342
## 2 0.005415303 0.09748605 0.03809337 0.002716724 0.05368551 0.0090198016
## 3 0.146778547 0.19915193 0.02906229 0.079463823 0.01580891 0.0926936713
## 4 0.025686810 0.23820878 0.06492098 0.025296407 0.02094993 0.0391827090
## 5 0.001245292 0.05489111 0.06585862 0.014331507 0.07627121 0.2736009936
## 6 0.122949093 0.16319277 0.10261263 0.117136411 0.09928206 0.0094146509
##        Dim.31       Dim.32       Dim.33       Dim.34       Dim.35      Dim.36
## 1 0.056937138 0.2181150589 0.0001091895 0.0005943353 0.1223311238 0.254172029
## 2 0.009286282 0.0003579209 0.0033422726 0.0236012078 0.0827166759 0.002953323
## 3 0.015667266 0.0151606910 0.1882593961 0.1695689196 0.0024141206 0.010342188
## 4 0.030887347 0.0348653902 0.0546745603 0.1055875695 0.0003325166 0.208080264
## 5 0.271770932 0.1003537172 0.0474131838 0.2112356884 0.0587619596 0.009105051
## 6 0.007934648 0.0611517754 0.0257726983 0.0260472095 0.0260189618 0.031807270
##       Dim.37     Dim.38       Dim.39       Dim.40       Dim.41       Dim.42
## 1 0.02944599 0.01626393 6.798910e-02 0.0028618340 1.387883e-05 4.737511e-03
## 2 0.01073297 0.02710711 5.249541e-04 0.0002369661 1.902090e-02 2.712114e-03
## 3 0.03593423 0.07738584 1.089017e-02 0.0480882618 5.486149e-03 2.035589e-05
## 4 0.03022812 0.02085104 2.172747e-03 0.0088334429 2.203236e-02 5.025044e-02
## 5 0.05707061 0.01584086 2.968297e-07 0.0805729007 2.185254e-03 7.618257e-02
## 6 0.10017872 0.02748214 8.047445e-03 0.0750618705 4.316250e-02 4.551954e-02
##        Dim.43       Dim.44       Dim.45      Dim.46     Dim.47       Dim.48
## 1 0.014417951 0.0008001454 0.0522744468 0.020306245 0.02297394 8.844444e-03
## 2 0.010640746 0.0017171897 0.1131285350 0.004909927 0.11003288 7.849721e-04
## 3 0.005071832 0.0715306546 0.0869394873 0.134405330 0.14116365 8.998477e-02
## 4 0.021376731 0.1591192727 0.0277203919 0.013823485 0.02986552 9.863250e-02
## 5 0.109391832 0.0549708710 0.0000292249 0.030795120 0.05284813 4.708816e-05
## 6 0.070358094 0.0019316193 0.0001582892 0.094198372 0.01133651 2.409033e-02
##        Dim.49     Dim.50      Dim.51       Dim.52       Dim.53      Dim.54
## 1 0.111910064 0.02167478 0.008163512 2.187472e-01 0.1478124173 0.005169165
## 2 0.094645568 0.01519984 0.010674697 5.429274e-03 0.4042478004 0.008621937
## 3 0.061518853 0.12576803 0.044305608 1.335185e-02 0.4215491863 0.141341180
## 4 0.016996662 0.10819141 0.055870618 7.347518e-05 0.5601539636 0.084935970
## 5 0.067011826 0.04449824 0.111962942 2.320335e-02 0.0001650068 0.000218662
## 6 0.003172202 0.06077058 0.030093544 5.249180e-02 0.0045139782 0.034215824
##        Dim.55       Dim.56     Dim.57       Dim.58      Dim.59       Dim.60
## 1 0.008228064 0.0054352513 0.12298019 0.0423665069 0.014027453 2.051928e-06
## 2 0.004423355 0.0150202682 0.07362323 0.0355978005 0.015860631 3.021805e-02
## 3 0.087146589 0.0001405287 0.01922269 0.1707112107 0.042309004 5.726461e-02
## 4 0.284562867 0.0207333558 0.07552827 0.0407833635 0.446992007 1.041269e-02
## 5 0.149459342 0.0061091414 0.03514023 0.0005806103 0.004862871 4.193956e-04
## 6 0.016855615 0.1525263769 0.21617289 0.0016739176 0.003523722 1.414856e-02
##       Dim.61      Dim.62     Dim.63       Dim.64      Dim.65      Dim.66
## 1 0.02143365 0.085108985 0.03983939 0.0121980634 0.001127995 0.015076701
## 2 0.02710120 0.061270257 0.01745071 0.0222829920 0.520871098 0.133269746
## 3 0.05894096 0.040969184 0.03680342 0.0005884787 0.001042113 0.009241876
## 4 0.12318807 0.007864897 0.07133596 0.0459351254 0.083104569 0.072779770
## 5 0.07696112 0.008555840 0.07540747 0.0232903606 0.044122188 0.004840405
## 6 0.16878539 0.058522936 0.04134576 0.0273514332 0.084594954 0.006258830
##        Dim.67      Dim.68      Dim.69       Dim.70       Dim.71       Dim.72
## 1 0.243343694 0.146540030 0.005199095 0.0125276102 0.0306360459 0.0004576029
## 2 0.021925148 0.008884597 0.089304707 0.0001304945 0.0590266577 0.0210244359
## 3 0.149570345 0.004066150 0.179745009 0.1279609637 0.0031778576 0.0922007914
## 4 0.081975758 0.003377926 0.200737383 0.2727919641 0.0661884217 0.0419981635
## 5 0.173193928 0.015202318 0.201637537 0.2988888039 0.0003767021 0.1806594895
## 6 0.006007904 0.133086596 0.036939552 0.1856218050 0.0005239616 0.3675939910
##        Dim.73       Dim.74       Dim.75       Dim.76      Dim.77     Dim.78
## 1 0.002010216 0.0578196630 0.0186930754 0.0004976076 0.084096758 0.01051403
## 2 0.038729357 0.1366477482 0.0325585132 0.0021661473 0.073424272 0.02005736
## 3 0.053211306 0.0672176529 0.1012326462 0.0001471485 0.005020925 0.08527088
## 4 0.033162032 0.0105165782 0.0133531135 0.0403214236 0.007404308 0.19011620
## 5 0.010136666 0.0003145874 0.0277077346 0.0010201791 0.005679064 0.04090043
## 6 0.065598858 0.0980678786 0.0002474172 0.0863325336 0.021451021 0.28886022
##        Dim.79     Dim.80       Dim.81     Dim.82     Dim.83     Dim.84
## 1 0.040279299 0.02187572 0.1385987019 0.02589904 0.04707656 0.05980834
## 2 0.001868088 0.04675673 0.0739890727 0.14224419 0.03528278 0.08365804
## 3 0.028376268 0.02848398 0.0597431882 0.11713207 0.03701102 0.06438000
## 4 0.125346713 0.09156809 0.0488148828 0.04317894 0.14325679 0.49322053
## 5 0.107636399 0.02009122 0.0030372546 0.03995437 0.15558139 0.35543073
## 6 0.020771163 0.01263144 0.0009376688 0.05963509 0.08807328 0.19981485
##         Dim.85      Dim.86      Dim.87       Dim.88     Dim.89    Dim.90
## 1 2.413568e-01 0.010555100 0.403349919 0.0001348716 0.01335148 0.8773365
## 2 2.924077e-01 0.002799263 0.008807805 0.0245419953 0.00220459 0.8588080
## 3 7.066711e-02 0.019899980 0.007505667 0.2461430160 0.12273254 0.6745337
## 4 3.358141e-01 0.003348047 0.142750746 0.0023506587 0.08315461 0.6741614
## 5 1.181723e-06 0.087985096 0.053627784 0.0209842704 0.02769945 0.8411848
## 6 6.343046e-02 0.007624107 0.001392548 0.0135595869 0.05374161 0.7536789
##         Dim.91     Dim.92      Dim.93    Dim.94    Dim.95    Dim.96
## 1 0.2297037911 0.17272928 0.539146370 0.5370990 0.2499127 0.6272997
## 2 0.0008956091 0.18932385 0.313434382 0.4219686 0.2595312 0.3577570
## 3 0.0094020909 0.14073889 0.068806064 0.2806695 0.1582702 0.4620998
## 4 0.0236686728 0.27745181 0.053711364 0.6545848 0.1096742 0.3922987
## 5 0.0654590695 0.24250713 0.013398349 0.1754689 0.1696708 0.7913230
## 6 0.6901894149 0.03896099 0.001561221 0.1976286 0.1326192 1.0271572
##         Dim.97     Dim.98    Dim.99     Dim.100    Dim.101      Dim.102
## 1 0.3968667740 0.33717280 0.4986823 0.003514738 0.20456625 0.1872896759
## 2 0.6464918989 0.23232617 0.1151577 0.021264740 1.33100583 0.0001702431
## 3 0.2178880021 0.22604220 0.7409954 0.035003976 0.35404627 0.0543042537
## 4 0.1904634101 0.04947542 0.6718730 0.020842922 0.12781354 0.0068011368
## 5 0.0035555307 0.13436337 0.1591023 0.016218205 0.20112105 0.0702431352
## 6 0.0002498044 0.18478006 0.4316809 0.036941637 0.01869736 0.0080859120
##      Dim.103    Dim.104     Dim.105     Dim.106
## 1 0.35025199 0.03273453 0.513919315 0.291337865
## 2 0.02256556 0.53093809 0.005734149 0.005281353
## 3 0.27258509 0.01377296 0.025657080 0.081648542
## 4 0.24088641 0.01273777 0.035489944 0.039712196
## 5 0.03385683 0.05420240 0.079837249 0.371589953
## 6 0.08301570 0.14604560 0.026586142 0.084633695

The contributions of the individual variables can also be visualized in a more easily interpretable way, as shown below for the first 2 principal components.


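# contributions of individual variables to the first two PCs
library(factoextra)   # get_pca_var(), fviz_contrib()
library(gridExtra)    # grid.arrange()
pca.var<-get_pca_var(data.pca)   # coordinates, cos2 and contributions of the variables (avoids masking base var())
a<-fviz_contrib(data.pca, "var", axes=1, xtickslab.rt=90)   # contributions to PC1
b<-fviz_contrib(data.pca, "var", axes=2, xtickslab.rt=90)   # contributions to PC2
grid.arrange(a, b, top='Contribution to the first two Principal Components')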

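The plots look very "busy" because of the large number of variables. Even so, they are especially helpful for understanding the composition of the created PCs. The dashed red line in each panel marks the expected contribution if all variables contributed equally. Variables whose bars exceed that line contribute more than average to the given component.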
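The contributions plotted by fviz_contrib can also be reproduced by hand, which makes their meaning explicit. The sketch below rests on a few assumptions:

- The PCA is fitted on standardized data with base R's prcomp.
- The built-in USArrests data serves as a small stand-in for the Bitcoin dataset.
- All object names are illustrative.

The contribution of a variable to a component is its squared coordinate divided by the sum over all variables, in percent. For unit-length eigenvectors this reduces to 100 times the squared loading.

```r
# Sketch: variable contributions to principal components, computed by hand.
# Assumption: USArrests (built into R) stands in for the Bitcoin data set.
pca <- prcomp(USArrests, scale. = TRUE)

# Variable coordinates = loadings scaled by the component standard deviations
coord <- sweep(pca$rotation, 2, pca$sdev, `*`)
cos2 <- coord^2

# Contribution (%) of each variable to each component: its share of the column's cos2
contrib <- sweep(cos2, 2, colSums(cos2), `/`) * 100

# Expected contribution if every variable contributed equally
# (should correspond to the dashed reference line for a single axis in fviz_contrib)
expected <- 100 / ncol(USArrests)

round(contrib[, 1:2], 2)
# Variables contributing more than the uniform share to PC1
names(which(contrib[, 1] > expected))
```

Applied to the Bitcoin PCA, the uniform reference value would be 100/106, which is roughly 0.94% per variable.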




5.3 Rotated PCA


When PCA is used for dimension reduction, its results can also be transformed to obtain what is often called a "rotated PCA". The process rotates the original loadings (factors) so that each variable loads strongly on as few components as possible. This gives a simpler structure that is easier to interpret. Rotated results make it easier to see which variables drive each component, and those conclusions can later be used to build "synthetic" variables for further analysis. Below, the rotation uses the "varimax" approach. Varimax is an orthogonal rotation that maximizes the variance of the squared loadings within each component while keeping the rotated axes orthogonal. The summary statistics of the resulting three-component solution are shown below.


library(psych)
# PCA with 3 components, followed by a varimax (orthogonal) rotation
data.pca.rotated<-principal(data.s, nfactors=3, rotate="varimax")
summary(data.pca.rotated)
## 
## Factor analysis with Call: principal(r = data.s, nfactors = 3, rotate = "varimax")
## 
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 5250  and the objective function was  316.06 
## The number of observations was  1000  with Chi Square =  303681.9  with prob <  0 
## 
## The root mean square of the residuals (RMSA) is  0.11
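The rotation step itself can be sketched in base R with stats::varimax. This makes it visible what the rotation preserves and what it redistributes. The sketch makes the following assumptions:

- USArrests stands in for the Bitcoin data.
- Two components are kept.
- Loadings are taken as eigenvectors scaled by the component standard deviations.

It is not claimed to reproduce psych::principal exactly; signs and component order may differ.

```r
# Sketch: varimax rotation of PCA loadings in base R (stats::varimax).
# Assumptions: USArrests stands in for the Bitcoin data; k = 2 components are kept.
pca <- prcomp(USArrests, scale. = TRUE)
k <- 2

# Unrotated loadings on the correlation scale: eigenvectors * sqrt(eigenvalues)
L <- sweep(pca$rotation[, 1:k, drop = FALSE], 2, pca$sdev[1:k], `*`)

rot <- varimax(L)                 # orthogonal rotation maximizing the varimax criterion
L.rot <- unclass(rot$loadings)    # rotated loadings as a plain matrix

# Communalities (row sums of squared loadings) are unchanged by the rotation ...
communality.before <- rowSums(L^2)
communality.after  <- rowSums(L.rot^2)

# ... while the explained variance is redistributed between the components
colSums(L^2)      # variance per unrotated component
colSums(L.rot^2)  # variance per rotated component
```

Because the rotation matrix is orthogonal, the total variance explained by the retained components stays the same. Only its split between the components changes.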


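To make the rotated loadings easier to read, it can also help to display only the substantial ones. Below, loadings with an absolute value below 0.4 are hidden (cutoff=0.4), and each variable is sorted under the component it loads on most strongly. Variables whose largest loading is below 0.5 are listed at the end.

# show only loadings with |value| >= 0.4, sorted by component
print(loadings(data.pca.rotated), digits=3, cutoff=0.4, sort=TRUE)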
## 
## Loadings:
##                           RC1    RC3    RC2   
## priceUSD                  -0.619              
## transactions               0.727         0.610
## transactionvalueUSD       -0.650              
## top100cap                 -0.578 -0.459       
## transactions3sma           0.883              
## transactions7sma           0.946              
## transactions14sma          0.946              
## transactions30sma          0.937              
## transactions90sma          0.873              
## transactions3ema           0.878         0.421
## transactions7ema           0.942              
## transactions14ema          0.958              
## transactions30ema          0.956              
## transactions90ema          0.918              
## transactions3wma           0.855         0.439
## transactions7wma           0.943              
## transactions14wma          0.954              
## transactions30wma          0.952              
## transactions90wma          0.921              
## size                       0.434  0.698  0.451
## transactionfeesUSD                0.768       
## median_transaction_feeUSD         0.783       
## mediantransactionvalueUSD -0.424  0.673       
## fee_to_rewardUSD                  0.806       
## transactions30std                 0.543       
## transactions90std                 0.615       
## transactions30var                 0.532       
## transactions90var                 0.605       
## size3sma                   0.540  0.737       
## size7sma                   0.572  0.770       
## size14sma                  0.572  0.775       
## size30sma                  0.544  0.772       
## size90sma                  0.410  0.777       
## size3ema                   0.528  0.756       
## size7ema                   0.570  0.779       
## size14ema                  0.577  0.787       
## size30ema                  0.553  0.792       
## size90ema                  0.453  0.794       
## size3wma                   0.517  0.740       
## size7wma                   0.569  0.769       
## size14wma                  0.579  0.780       
## size30wma                  0.569  0.781       
## size90wma                  0.480  0.787       
## size90trx                  0.476  0.510       
## size3std                         -0.594       
## size7std                         -0.752       
## size14std                        -0.769       
## size30std                        -0.763       
## size90std                        -0.815       
## size7var                         -0.631       
## size14var                        -0.696       
## size30var                        -0.705       
## size90var                        -0.756       
## sentbyaddress                            0.714
## activeaddresses                   0.479  0.540
## transactions3trx                         0.714
## transactions3mom                         0.711
## transactions30mom                        0.713
## transactions90mom                        0.570
## transactions3rsi                         0.765
## transactions7rsi                         0.807
## transactions14rsi                        0.831
## transactions30rsi                        0.851
## transactions90rsi                        0.846
## transactions3roc                         0.689
## transactions30roc                        0.687
## transactions90roc                        0.556
## size3trx                                 0.719
## size3mom                                 0.713
## size30mom                                0.653
## size3rsi                                 0.742
## size7rsi                                 0.779
## size14rsi                                0.780
## size30rsi                                0.754
## size90rsi                                0.641
## size3roc                                 0.683
## size30roc                                0.626
## difficulty                -0.479              
## hashrate                                      
## mining_profitability      -0.478              
## sentinusdUSD              -0.477              
## confirmationtime                  0.418       
## transactions7trx                              
## transactions14trx                -0.414       
## transactions30trx                             
## transactions90trx          0.431              
## transactions7mom                              
## transactions14mom                             
## transactions3std                              
## transactions7std                              
## transactions14std                 0.421       
## transactions3var                              
## transactions7var                              
## transactions14var                 0.433       
## transactions7roc                              
## transactions14roc                             
## size7trx                                      
## size14trx                                     
## size30trx                  0.454              
## size7mom                                 0.417
## size14mom                                0.414
## size90mom                                0.435
## size3var                         -0.456       
## size7roc                                      
## size14roc                                     
## size90roc                                0.424
## 
##                   RC1    RC3    RC2
## SS loadings    22.775 21.248 16.031
## Proportion Var  0.215  0.200  0.151
## Cumulative Var  0.215  0.415  0.567

With the rotated PCA, it is now possible to analyze which variables make up each of the obtained rotated components (RCs). The components appear in the order RC1, RC3, RC2, apparently sorted by the variance they explain after rotation. Together they account for about 56.7% of the total variance (Cumulative Var = 0.567). Some variables (like "hashrate", "transactions30trx" or "size14trx") have no loading of at least 0.4 on any of the 3 RCs. This means their variation is poorly captured by this three-component solution.
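The blank rows in the printed loadings can also be listed programmatically rather than read off the table. The helper below, `weakly_loaded`, is a hypothetical name introduced only for this sketch. The small `toy` loadings matrix is made up purely for illustration and is not taken from the Bitcoin results.

```r
# Sketch: find variables whose absolute loading is below a cutoff on every
# retained component, i.e. the rows left blank by print(loadings, cutoff = 0.4).
# `weakly_loaded` and the `toy` matrix are hypothetical, for illustration only.
weakly_loaded <- function(L, cutoff = 0.4) {
  L <- unclass(L)                              # accept "loadings" objects as well
  rownames(L)[apply(abs(L), 1, max) < cutoff]
}

toy <- matrix(c(0.90, 0.10, 0.05,
                0.20, 0.75, 0.30,
                0.35, 0.15, 0.38),
              nrow = 3, byrow = TRUE,
              dimnames = list(c("varA", "varB", "varC"), c("RC1", "RC2", "RC3")))
weakly_loaded(toy)   # only varC stays below 0.4 on every component
```

On the real results, `weakly_loaded(loadings(data.pca.rotated))` should return the variables printed without any value above, such as hashrate, transactions30trx and size14trx. Very small differences can arise from rounding to 3 digits in the printed table.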




5.4 PCA quality measures


After applying PCA, it is not enough to draw conclusions about the principal components and the variables that shape them. It is also important to assess the quality of the solution, which can be done with 2 additional measures: complexity and uniqueness. The complexity of a variable (Hoffman's index) describes how its loadings are spread across the components. It equals 1 when the variable loads on a single component and approaches the number of retained components (here 3) when its loadings are spread evenly. High complexity is therefore undesirable: such variables cannot be clearly assigned to one component, which makes the components harder to interpret. For this reason, it is worth examining the calculated values for each variable as well as a plot of their spread.


data.pca.rotated$complexity
##                  priceUSD              transactions                      size 
##                  1.300937                  1.941516                  2.458451 
##             sentbyaddress                difficulty                  hashrate 
##                  1.347281                  1.101114                  1.314606 
##      mining_profitability              sentinusdUSD        transactionfeesUSD 
##                  1.439063                  1.593108                  1.003374 
## median_transaction_feeUSD          confirmationtime       transactionvalueUSD 
##                  1.016573                  1.665037                  1.287575 
## mediantransactionvalueUSD           activeaddresses                 top100cap 
##                  1.881133                  1.972638                  1.959982 
##          fee_to_rewardUSD          transactions3sma          transactions7sma 
##                  1.262669                  1.266287                  1.050004 
##         transactions14sma         transactions30sma         transactions90sma 
##                  1.047037                  1.094414                  1.289239 
##          transactions3ema          transactions7ema         transactions14ema 
##                  1.436888                  1.149558                  1.077558 
##         transactions30ema         transactions90ema          transactions3wma 
##                  1.075277                  1.193590                  1.492876 
##          transactions7wma         transactions14wma         transactions30wma 
##                  1.117641                  1.070121                  1.066333 
##         transactions90wma          transactions3trx          transactions7trx 
##                  1.161239                  1.022429                  2.344668 
##         transactions14trx         transactions30trx         transactions90trx 
##                  2.243331                  2.936866                  1.210142 
##          transactions3mom          transactions7mom         transactions14mom 
##                  1.009473                  1.545844                  2.031853 
##         transactions30mom         transactions90mom          transactions3std 
##                  1.110905                  2.007374                  2.100434 
##          transactions7std         transactions14std         transactions30std 
##                  1.945391                  1.214663                  1.056201 
##         transactions90std          transactions3var          transactions7var 
##                  1.408223                  2.135820                  1.528041 
##         transactions14var         transactions30var         transactions90var 
##                  1.144435                  1.051178                  1.476603 
##          transactions3rsi          transactions7rsi         transactions14rsi 
##                  1.007430                  1.016799                  1.059940 
##         transactions30rsi         transactions90rsi          transactions3roc 
##                  1.140254                  1.281869                  1.025492 
##          transactions7roc         transactions14roc         transactions30roc 
##                  1.409408                  2.017190                  1.111015 
##         transactions90roc                  size3sma                  size7sma 
##                  1.877202                  1.967403                  1.846589 
##                 size14sma                 size30sma                 size90sma 
##                  1.853742                  1.829000                  1.575368 
##                  size3ema                  size7ema                 size14ema 
##                  2.055588                  1.862957                  1.833296 
##                 size30ema                 size90ema                  size3wma 
##                  1.806564                  1.636280                  2.115278 
##                  size7wma                 size14wma                 size30wma 
##                  1.866535                  1.845220                  1.845540 
##                 size90wma                  size3trx                  size7trx 
##                  1.704087                  1.039676                  1.661457 
##                 size14trx                 size30trx                 size90trx 
##                  1.891562                  1.271004                  2.047885 
##                  size3mom                  size7mom                 size14mom 
##                  1.062148                  1.090727                  1.180622 
##                 size30mom                 size90mom                  size3std 
##                  1.118843                  2.208161                  1.524624 
##                  size7std                 size14std                 size30std 
##                  1.296188                  1.367637                  1.385863 
##                 size90std                  size3var                  size7var 
##                  1.170908                  1.840990                  1.420463 
##                 size14var                 size30var                 size90var 
##                  1.513936                  1.581235                  1.385783 
##                  size3rsi                  size7rsi                 size14rsi 
##                  1.039317                  1.037515                  1.057863 
##                 size30rsi                 size90rsi                  size3roc 
##                  1.157272                  1.979183                  1.062259 
##                  size7roc                 size14roc                 size30roc 
##                  1.065166                  1.110975                  1.039777 
##                 size90roc 
##                  1.845790
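# default index plot of the complexity values
plot(data.pca.rotated$complexity)

# the same values labelled with variable names (pch="." hides the point symbols)
plot(data.pca.rotated$complexity, pch=".", xlim=c(-20, 110), main="Complexity of variables", xlab=" ", ylab="complexity")
text(data.pca.rotated$complexity, labels=names(data.pca.rotated$complexity), cex=0.8)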
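To make the complexity values less of a black box, Hoffman's index can be computed directly from a loadings matrix as $c_i = (\sum_k l_{ik}^2)^2 / \sum_k l_{ik}^4$. The helper name `hoffman_complexity` and the `toy` rows are hypothetical and serve only to show the two extreme cases and one mixed case. We assume, but have not verified here, that this is the formula behind `data.pca.rotated$complexity`.

```r
# Sketch: Hoffman's complexity index for each variable (row) of a loadings matrix.
# `hoffman_complexity` and `toy` are hypothetical, for illustration only.
hoffman_complexity <- function(L) {
  L <- unclass(L)                       # accept "loadings" objects as well
  rowSums(L^2)^2 / rowSums(L^4)
}

toy <- rbind(single = c(0.8, 0.0, 0.0),   # loads on one component only
             even   = c(0.5, 0.5, 0.5),   # loads equally on all three components
             mixed  = c(0.7, 0.4, 0.0))   # one dominant and one secondary loading
hoffman_complexity(toy)   # single = 1, even = 3, mixed is about 1.59
```

If the assumption about psych holds, `hoffman_complexity(loadings(data.pca.rotated))` should match the complexity values printed above.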

In the next step, the same examination can be done for uniqueness. Uniqueness is the proportion of each variable's variance that is not explained by the retained components (1 minus its communality). As with complexity, high uniqueness is undesirable: it means the variable carries information that the principal components were unable to capture.


data.pca.rotated$uniqueness
##                  priceUSD              transactions                      size 
##                0.55900940                0.09863132                0.12095161 
##             sentbyaddress                difficulty                  hashrate 
##                0.40377351                0.75948097                0.82862431 
##      mining_profitability              sentinusdUSD        transactionfeesUSD 
##                0.72112520                0.69975579                0.40897090 
## median_transaction_feeUSD          confirmationtime       transactionvalueUSD 
##                0.38252822                0.76530710                0.51771423 
## mediantransactionvalueUSD           activeaddresses                 top100cap 
##                0.33015346                0.47881333                0.44671774 
##          fee_to_rewardUSD          transactions3sma          transactions7sma 
##                0.26490135                0.11409302                0.08353428 
##         transactions14sma         transactions30sma         transactions90sma 
##                0.08343551                0.08094767                0.12522770 
##          transactions3ema          transactions7ema         transactions14ema 
##                0.05219094                0.04516760                0.04658005 
##         transactions30ema         transactions90ema          transactions3wma 
##                0.05120829                0.07421813                0.07660139 
##          transactions7wma         transactions14wma         transactions30wma 
##                0.05864940                0.05808460                0.06375678 
##         transactions90wma          transactions3trx          transactions7trx 
##                0.08195136                0.48404258                0.76885008 
##         transactions14trx         transactions30trx         transactions90trx 
##                0.71590339                0.72615861                0.79486751 
##          transactions3mom          transactions7mom         transactions14mom 
##                0.49203628                0.87666750                0.80469455 
##         transactions30mom         transactions90mom          transactions3std 
##                0.46286881                0.50925069                0.96137520 
##          transactions7std         transactions14std         transactions30std 
##                0.91807045                0.80413384                0.69738164 
##         transactions90std          transactions3var          transactions7var 
##                0.54246942                0.96768776                0.90696171 
##         transactions14var         transactions30var         transactions90var 
##                0.79873246                0.70995063                0.54339654 
##          transactions3rsi          transactions7rsi         transactions14rsi 
##                0.41232468                0.34392117                0.28949473 
##         transactions30rsi         transactions90rsi          transactions3roc 
##                0.22474294                0.18493573                0.51948048 
##          transactions7roc         transactions14roc         transactions30roc 
##                0.88688261                0.82313740                0.50175606 
##         transactions90roc                  size3sma                  size7sma 
##                0.55448999                0.13427534                0.08018607 
##                 size14sma                 size30sma                 size90sma 
##                0.06840191                0.10117688                0.21383984 
##                  size3ema                  size7ema                 size14ema 
##                0.08354019                0.06023494                0.04740556 
##                 size30ema                 size90ema                  size3wma 
##                0.06189627                0.15216207                0.10568626 
##                  size7wma                 size14wma                 size30wma 
##                0.07827041                0.05748333                0.06198925 
##                 size90wma                  size3trx                  size7trx 
##                0.13758350                0.47210972                0.87323775 
##                 size14trx                 size30trx                 size90trx 
##                0.88512912                0.76525480                0.50639630 
##                  size3mom                  size7mom                 size14mom 
##                0.47665409                0.81787951                0.81326655 
##                 size30mom                 size90mom                  size3std 
##                0.54789308                0.68378794                0.55659667 
##                  size7std                 size14std                 size30std 
##                0.34963913                0.29550381                0.30021115 
##                 size90std                  size3var                  size7var 
##                0.27895855                0.70556269                0.51372931 
##                 size14var                 size30var                 size90var 
##                0.38153223                0.34348092                0.31592572 
##                  size3rsi                  size7rsi                 size14rsi 
##                0.43876031                0.38133084                0.37381345 
##                 size30rsi                 size90rsi                  size3roc 
##                0.38743310                0.38543647                0.51839600 
##                  size7roc                 size14roc                 size30roc 
##                0.84059111                0.84276662                0.59976530 
##                 size90roc 
##                0.74287337
# uniqueness values labelled with variable names
plot(data.pca.rotated$uniqueness, pch=".", xlim=c(-20, 110), main="Uniqueness of variables",
     sub="Proportion of variance not explained by the retained components.\nThe higher the number, the higher the (undesired) uniqueness",
     xlab=" ", ylab="uniqueness")
text(data.pca.rotated$uniqueness, labels=names(data.pca.rotated$uniqueness), cex=0.8)
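Uniqueness follows directly from the loadings: it is 1 minus the communality, the sum of a variable's squared loadings on the retained components. The sketch below makes three assumptions:

- USArrests stands in for the Bitcoin data.
- Two components are kept.
- The data are standardized, so every variable has unit variance.

It also checks that keeping all components leaves nothing unexplained.

```r
# Sketch: uniqueness as 1 - communality for a PCA that keeps k components.
# Assumptions: USArrests stands in for the Bitcoin data; k = 2 components are kept.
pca <- prcomp(USArrests, scale. = TRUE)
k <- 2
L <- sweep(pca$rotation[, 1:k, drop = FALSE], 2, pca$sdev[1:k], `*`)

communality <- rowSums(L^2)      # share of each standardized variable's variance retained
uniqueness  <- 1 - communality   # share left unexplained by the k components
round(uniqueness, 3)

# With all components kept, the uniqueness of every variable drops to (numerically) zero
L.all <- sweep(pca$rotation, 2, pca$sdev, `*`)
round(1 - rowSums(L.all^2), 10)
```

Because an orthogonal rotation leaves communalities unchanged, the same values are expected before and after a varimax rotation.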


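Additionally, both measures can be analyzed and visualized together in a single scatter plot. In the labelled version, dotted red lines mark the reference thresholds chosen for this analysis.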

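# complexity (x axis) against uniqueness (y axis)
plot(data.pca.rotated$complexity, data.pca.rotated$uniqueness, xlab="complexity", ylab="uniqueness")

# labelled version with dotted red reference lines
plot(data.pca.rotated$complexity, data.pca.rotated$uniqueness, xlim=c(0, 4), xlab="complexity", ylab="uniqueness")
text(data.pca.rotated$complexity, data.pca.rotated$uniqueness, labels=names(data.pca.rotated$uniqueness), cex=0.8)
abline(h=c(0.38, 0.75), lty=3, col=2)   # uniqueness reference levels
abline(v=c(1.8), lty=3, col=2)          # complexity reference level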

As a result, this examination makes it possible to identify the variables with both the highest complexity and the highest uniqueness. These are the variables represented worst by the rotated components. Below, they are selected with the thresholds complexity > 1.8 and uniqueness > 0.78.


# variables with the highest complexity and uniqueness
set<-data.frame(complex=data.pca.rotated$complexity, unique=data.pca.rotated$uniqueness)
set.worst<-set[set$complex>1.8 & set$unique>0.78,]
set.worst
##                    complex    unique
## transactions14mom 2.031853 0.8046945
## transactions3std  2.100434 0.9613752
## transactions7std  1.945391 0.9180704
## transactions3var  2.135820 0.9676878
## transactions14roc 2.017190 0.8231374
## size14trx         1.891562 0.8851291

Including the variables listed above lowers the quality of the PCA solution, which is why it is important to analyze their character and their importance for the analysis. If they are not essential, it might be worth removing them from the dataset and re-running the PCA to achieve higher-quality results.


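# drop the six poorly represented variables and re-run the PCA
data.new<-subset(data.s, select = -c(transactions14mom, transactions3std, transactions7std, transactions3var, transactions14roc, size14trx))
data.pca.new<-princomp(data.new)
plot(data.pca.new)   # variances of the components (scree-type bar plot)

fviz_eig(data.pca.new, choice='eigenvalue')   # scree plot of eigenvalues

fviz_eig(data.pca.new)   # scree plot of the percentage of explained variance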
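One simple way to check whether dropping variables helped is to compare the share of total variance explained by the first k components before and after the removal. The sketch below makes these assumptions:

- USArrests stands in for the Bitcoin data.
- An added column of random noise plays the role of a poorly represented variable.
- `share_explained` is a hypothetical helper.
- Variance is measured on the correlation scale.

```r
# Sketch: share of total variance explained by the first k components,
# computed from the eigenvalues of the correlation matrix.
# Assumptions: USArrests stands in for the Bitcoin data, a random noise column
# mimics a poorly represented variable; `share_explained` is hypothetical.
share_explained <- function(x, k) {
  ev <- eigen(cor(x), symmetric = TRUE, only.values = TRUE)$values
  sum(ev[seq_len(k)]) / sum(ev)
}

set.seed(1)
with.noise    <- cbind(USArrests, noise = rnorm(nrow(USArrests)))
without.noise <- subset(with.noise, select = -noise)

share_explained(with.noise, k = 2)      # noise adds variance the first PCs barely capture
share_explained(without.noise, k = 2)   # expected to be higher once the noise is removed
```

In this toy case the share should rise after the noise column is removed. For real data, removing variables does not guarantee such an improvement, so the comparison is worth repeating on `data.s` and `data.new`.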




5.5 PCA visualizations


Finally, the obtained PCA results can be visualized in many different ways, in order to deepen the understanding of the variables and the relationships featured in the dataset.


# labeled observations in two dimensions
fviz_pca_ind(data.pca, col.ind="#00AFBB", repel=TRUE)

# unlabeled observations in two dimensions, coloured by quality of representation (cos2)
fviz_pca_ind(data.pca, col.ind="cos2", geom="point", gradient.cols=c("white", "#2E9FDF", "#FC4E07" ))

# correlation circle of the variables
fviz_pca_var(data.pca, col.var = "steelblue")

# alternative ggplot2-based plotting with autoplot() from ggfortify
library(ggfortify)
autoplot(data.pca)

# the same plot with labelled loading arrows
autoplot(data.pca, loadings=TRUE, loadings.colour='blue', loadings.label=TRUE, loadings.label.size=3)
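The "cos2" colouring of the individuals can be reproduced by hand. For each observation, the squared score on a component is divided by the observation's total squared distance from the centre. The sum of these values over the two plotted components shows how faithfully the 2D map represents that observation. The sketch assumes a prcomp fit on standardized data and uses USArrests as a stand-in for the Bitcoin data. Its correspondence to factoextra's internal computation is an assumption.

```r
# Sketch: quality of representation (cos2) of observations on the first two components.
# Assumption: USArrests stands in for the Bitcoin data; PCA fitted with prcomp.
pca <- prcomp(USArrests, scale. = TRUE)

coord <- pca$x                          # scores of every observation on all components
cos2  <- coord^2 / rowSums(coord^2)     # squared cosine of each observation per component
quality.2d <- rowSums(cos2[, 1:2])      # share of each observation's squared distance kept in 2D

head(sort(quality.2d), 5)               # observations represented worst on the 2D map
```

Observations with low values lie mostly along components that are not shown, so their positions on the two-dimensional plot should be interpreted with care.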




6. Conclusion


High-dimensional datasets can be problematic when it comes to their interpretability and analytical potential, which is why dimension reduction techniques are often necessary. The MDS method can effectively present the information and dependencies in a dataset in a two-dimensional space. It can also show which of the variables influence that representation most strongly. PCA, on the other hand, reduces the number of dimensions through principal components constructed to explain as much variance as possible, preserving the information and relationships present in the dataset. Both methods help to handle the problems typical of high-dimensional data. Their results can also be used later for clustering or as inputs to predictive machine learning models.