Many analytical problems involve large datasets that contain not only many observations but also a very large number of candidate variables. High-dimensional data spaces are difficult to navigate, especially for clustering, visualization or prediction, so some form of transformation is often required. Two of the most popular dimension reduction techniques are MDS and PCA. They take different approaches but share the same goal: to map the variable space onto a lower-dimensional one while preserving as much of the information in the data as possible.
The aim of this analysis is to show how to apply both MDS and PCA to reduce dimensionality, using Bitcoin market data as an example. Cryptocurrencies exist in digital space, so data about their processes, transactions and market characteristics is very well documented. At the same time, this means that an enormous amount of information is available for each currency, including Bitcoin, and most analyses cannot consider all of it. It would also be exceptionally challenging to decide manually which characteristics carry the most valuable information about the market. As a result, dimension reduction may be imperative.
The dimension reduction process is demonstrated on daily characteristics of the Bitcoin market, using 1000 observations from the period 2019-10-28 to 2022-07-23 taken from the dataset “https://www.kaggle.com/datasets/sushilkumarinfo/bitcoin-transactional-data/data”. The dataset provides over 700 daily variables; for computational reasons only 106 of them were selected. The goal is to characterize individual days on the Bitcoin market with fewer dimensions while retaining the most important information carried by the original variables.
Working with a very large dataset can cause a number of difficulties. Many of the features may be redundant in the context of the whole dataset, contributing little beyond noise. High-dimensional data hinders visualization and analysis not only because of noise, but also because meaningful relationships become hard to find and computations become expensive. Reduction methods help to identify the structure that is most informative about the patterns present in the data. The dataset considered here contains 106 variables, which is a very large number of dimensions, and data in this form is hard to use for most analytical purposes. To convey how little can be read from the Bitcoin transactional characteristics in their current state, correlation plots for all variables in both nominal and scaled form are shown below.
library(corrplot)
# nominal values
data.cor<-cor(data, method="pearson")
corrplot(data.cor, order="alphabet", tl.cex=0.6)
# scaled values
data.s<-scale(data)
data.s.cor<-cor(data.s, method="pearson")
corrplot(data.s.cor, order="alphabet", tl.cex=0.6)
Some variables are clearly strongly correlated, but the number of dimensions makes it impossible to read much more than that from the plots. Because the Pearson correlation is invariant to centering and rescaling, the nominal and scaled versions show the same pattern. Similarly, various relationships present in the data can be visualized by the function visible below, but again nothing of real value can be obtained from the data in its current state. Dimension reduction is therefore necessary, and two such methods are showcased in the following parts of the analysis.
MDS (Multidimensional Scaling) reduces dimensionality by projecting the data onto a low-dimensional space, here a two-dimensional plane, based on the similarities or dissimilarities between the inputs. The aim is to preserve pairwise distances, so that the Euclidean distances between objects in the new configuration resemble the original higher-dimensional ones as closely as possible. To achieve this, the stress function, which measures the discrepancy between the original distances and the reduced-dimensional distances, is minimized.
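To make the notion of stress more concrete, the following is a minimal sketch that computes a Kruskal-type Stress-1 value by hand. It rests on several assumptions that are not part of the original analysis: it uses the built-in `USArrests` data instead of the Bitcoin data, classical MDS (`cmdscale` from base R) instead of the SMACOF algorithm, and raw distances without the optimal rescaling that ratio MDS applies. The helper name `stress1` is purely illustrative, and its values are not directly comparable with the Stress-1 reported by smacof.

```r
# Toy illustration of Stress-1 using base R only (classical MDS via cmdscale,
# not the SMACOF algorithm used in the analysis).
x <- scale(USArrests)                 # built-in dataset, standardized
d.orig <- dist(x)                     # original pairwise Euclidean distances

# Kruskal-type Stress-1 for a k-dimensional configuration (illustrative helper)
stress1 <- function(d, k) {
  conf <- cmdscale(d, k = k)          # k-dimensional configuration
  d.fit <- dist(conf)                 # distances in the reduced space
  sqrt(sum((d - d.fit)^2) / sum(d.fit^2))
}

s1 <- stress1(d.orig, 1)
s2 <- stress1(d.orig, 2)
s4 <- stress1(d.orig, 4)              # 4 = number of original variables
round(c(k1 = s1, k2 = s2, k4 = s4), 4)
```

Stress falls as dimensions are added and vanishes (up to rounding error) once the configuration has as many dimensions as the original data, which is why a two-dimensional picture of 106 variables can be expected to carry noticeable stress.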
library(smacof)
library(labdsv)
library(vegan)
library(MASS)
library(ape)
library(ggfortify)
library(pls)
library(ClusterR)
The MDS method can be applied to the selected variables with the “mds” function from the smacof package, which takes a distance matrix as input; in this case the matrix was calculated from the scaled data. The resulting two-dimensional configuration can first be visualized to quickly inspect the results and spot potential outliers.
data.s<-scale(data)
dist.var<-dist(t(data.s))
fit.data<-mds(dist.var, ndim=2, type="ratio")
plot(fit.data, pch=21, cex=as.numeric(fit.data$spp), bg="blue", main="MDS for all variables")
The visual representation of the two-dimensional space shows that the considered characteristics are grouped similarly and no evident outliers can be noticed. The same function can also be applied to the observations, which in this case are the individual days of the analyzed period, to reveal potentially diverging points.
dist.obs<-dist(data.s)
fit.data<-mds(dist.obs, ndim=2, type="ratio")
plot(fit.data, pch=21, cex=as.numeric(fit.data$spp), bg="yellow", main="MDS for observations")
Even though some variables and observations stray from the main groups, the dataset does not seem to contain any evident outliers, so it is possible to move on to the quality of fit of the MDS solution for the variables. Apart from the basic statistics of the result, a stress decomposition chart can also be plotted for the variables.
##
## Call:
## mds(delta = dist.var, ndim = 2, type = "ratio")
##
## Model: Symmetric SMACOF
## Number of objects: 106
## Stress-1 value: 0.277
## Number of iterations: 92
The first thing that stands out in the stress decomposition chart is that every variable used in the MDS has a stress-per-point share below 2%. The most important figure, however, is the overall stress value, which here is relatively high at 0.277. Stress values above 0.2 usually indicate a rather poor fit, but the dataset contains a very large number of variables, and the MDS solution can still be statistically significant and useful for reducing dimensionality.
Such a stress value may be unavoidable with this many variables, so the next step is to test the statistical significance of the solution. The “permtest” function is applied to check whether the obtained configuration fits better than configurations obtained from random permutations of the dissimilarities.
fit.data<-mds(dist.var, ndim=2, type="ratio")
set.seed(123)
permFit<-permtest(fit.data, nrep = 100, verbose=FALSE)
permFit
##
## Call: permtest.smacof(object = fit.data, nrep = 100, verbose = FALSE)
##
## SMACOF Permutation Test
## Number of objects: 106
## Number of replications (permutations): 100
##
## Observed stress value: 0.277
## p-value: <0.001
The p-value is smaller than 0.001, so the null hypothesis can be rejected: the tested configuration is not a random fit. It can therefore be concluded that, even though the stress value is high, the MDS solution is statistically significant. A second way of checking this is to derive a significance threshold from random configurations with the same number of objects and compare it with the stress value actually obtained.
# upper significance threshold
stress.vec<-randomstress(n=106, ndim=2, nrep = 100)
upper<-mean(stress.vec) - 2 * sd(stress.vec)
upper
## [1] 0.5306757
## [1] 0.2770874
The upper significance threshold was calculated to be 0.5306757, while the actual stress value amounted to 0.2770874. The empirical value lies well below the threshold, which again indicates that the MDS solution fits significantly better than a random one.
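The logic of the random-stress comparison can be sketched in base R as follows. This is only an illustration under stated assumptions: classical MDS (`cmdscale`) stands in for SMACOF, the built-in `USArrests` data stands in for the Bitcoin data, and the helpers `stress1` and `random.stress` are hypothetical names introduced here. Only the threshold rule (mean minus two standard deviations of the random stress values) is taken from the analysis above.

```r
# Sketch of the "random stress" idea using base R only. Classical MDS stands in
# for SMACOF, so the numbers are not comparable with smacof::randomstress().
stress1 <- function(d, k = 2) {
  d.fit <- dist(cmdscale(d, k = k))              # distances in the k-dim configuration
  sqrt(sum((d - d.fit)^2) / sum(d.fit^2))
}

# Stress values obtained from structureless (uniform random) dissimilarities
random.stress <- function(n, k = 2, nrep = 50) {
  replicate(nrep, {
    m <- matrix(0, n, n)
    m[lower.tri(m)] <- runif(n * (n - 1) / 2)
    stress1(as.dist(m), k)
  })
}

set.seed(123)
x <- scale(USArrests)                            # built-in data, 50 objects
observed <- stress1(dist(x), k = 2)
baseline <- random.stress(n = nrow(x), k = 2, nrep = 50)
threshold <- mean(baseline) - 2 * sd(baseline)   # same rule as in the text

c(observed = observed, threshold = threshold)
observed < threshold   # TRUE: the fit is clearly better than for random input
```

The design choice worth noting is that the baseline depends on the number of objects and dimensions, so the threshold has to be recomputed for every configuration size rather than reused.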
So far the analysis has relied on a distance matrix based on the dissimilarities between variables. The MDS process can also start from a similarity matrix, which is a somewhat different measure. As part of examining the MDS solution, it may be beneficial to check whether the similarity-based and dissimilarity-based matrices lead to the same or different results. For this purpose the Mantel test can be used: its null hypothesis is that the two matrices are unrelated, so failing to reject it would suggest that they describe different structures.
# similarity matrix
sim<-cor((data.s))
# dissimilarity matrix
dis<-dist(data.s)
dis.t<-dist(t(data.s))
sim[1:5, 1:5]
as.matrix(dis)[1:5, 1:5]
library(smacof)
dis2<-sim2diss(sim, method=1, to.dist = TRUE)
as.matrix(dis2)[1:5, 1:5]
library(ape)
## $z.stat
## [1] -124.8857
##
## $p
## [1] 0.001
##
## $alternative
## [1] "two.sided"
The p-value equals 0.001, so the null hypothesis of no association is rejected and the two matrices are significantly related. The MDS process should therefore produce similar results with both the similarity and the dissimilarity measure, which can indeed be seen in the visualizations below.
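The idea behind a Mantel-type test can be sketched with a few lines of base R. The function `mantel.sketch` is a hypothetical helper written for illustration: it uses the Pearson correlation of the off-diagonal entries as its statistic, which differs from the z-statistic printed by `ape::mantel.test`, and it is run on the built-in `mtcars` data rather than the Bitcoin data. The conversion `1 - r` is assumed to correspond to what `sim2diss(sim, method=1)` does.

```r
# Minimal Mantel-type permutation test in base R (illustrative sketch only).
mantel.sketch <- function(m1, m2, nperm = 999) {
  lt <- lower.tri(m1)
  obs <- cor(m1[lt], m2[lt])                     # observed association
  perm <- replicate(nperm, {
    idx <- sample(nrow(m1))                      # relabel the objects
    cor(m1[idx, idx][lt], m2[lt])                # permute rows and columns together
  })
  p <- (sum(abs(perm) >= abs(obs)) + 1) / (nperm + 1)   # two-sided p-value
  list(statistic = obs, p.value = p)
}

set.seed(123)
x <- scale(mtcars)                               # built-in data, 11 variables
m.dist <- as.matrix(dist(t(x)))                  # Euclidean distances between variables
m.cor  <- 1 - cor(x)                             # correlation-based dissimilarity
res <- mantel.sketch(m.dist, m.cor)
res
```

For standardized variables the Euclidean distance between two columns is a monotone function of `1 - r`, which is why the two matrices are expected to be strongly associated and why both routes lead to similar MDS configurations.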
# MDS based on similarity (correlation matrix)
sim<-cor(data.s)
dis2<-sim2diss(sim, method=1, to.dist=TRUE)
fit.data<-mds(dis2, ndim=2, type="ratio") # from smacof::
plot(fit.data, pch=21, cex=as.numeric(fit.data$spp), bg="blue", main="MDS based on similarity")
# MDS based on dissimilarity (distance matrix)
dist<-dist(t(data.s))
fit.data<-mds(dist, ndim=2, type="ratio") # from smacof::
plot(fit.data, pch=21, cex=as.numeric(fit.data$spp), bg="purple", main="MDS based on dissimilarity")
After exploring the results for both the correlation-based and the distance-based solutions and confirming the statistical significance of the process, the final step is to examine the summary of the MDS solution in more detail.
##
## Call:
## mds(delta = dist, ndim = 2, type = "ratio")
##
## Model: Symmetric SMACOF
## Number of objects: 106
## Stress-1 value: 0.277
## Number of iterations: 92
##
## Configurations:
## D1 D2
## priceUSD 0.4342 0.7545
## transactions -0.2052 -0.5048
## size -0.4071 0.0630
## sentbyaddress 0.0034 -0.3459
## difficulty 0.5529 0.6566
## hashrate 0.6043 0.6065
## mining_profitability 0.3608 0.7715
## sentinusdUSD 0.6355 0.4863
## transactionfeesUSD -0.5095 0.5423
## median_transaction_feeUSD -0.5240 0.4974
## confirmationtime -0.0735 0.8609
## transactionvalueUSD 0.7074 0.5436
## mediantransactionvalueUSD -0.1114 0.7568
## activeaddresses -0.1241 0.2655
## top100cap 0.9137 0.3863
## fee_to_rewardUSD -0.5479 0.3495
## transactions3sma -0.3694 -0.5240
## transactions7sma -0.4756 -0.4792
## transactions14sma -0.5124 -0.4310
## transactions30sma -0.5510 -0.3639
## transactions90sma -0.6424 -0.2487
## transactions3ema -0.3379 -0.4988
## transactions7ema -0.4290 -0.4723
## transactions14ema -0.4838 -0.4350
## transactions30ema -0.5362 -0.3837
## transactions90ema -0.6148 -0.2889
## transactions3wma -0.3210 -0.5108
## transactions7wma -0.4309 -0.4897
## transactions14wma -0.4745 -0.4546
## transactions30wma -0.5227 -0.4012
## transactions90wma -0.5969 -0.3090
## transactions3trx 0.2695 -0.5476
## transactions7trx 0.2305 -0.7969
## transactions14trx 0.1022 -0.8183
## transactions30trx -0.0691 -0.7620
## transactions90trx -0.2609 -0.3242
## transactions3mom 0.4100 -0.3221
## transactions7mom 0.5054 -0.6264
## transactions14mom 0.3643 -0.7225
## transactions30mom 0.4658 -0.4205
## transactions90mom 0.0752 -0.6593
## transactions3std 0.1494 0.7917
## transactions7std 0.0647 0.6768
## transactions14std -0.1434 0.6475
## transactions30std -0.3475 0.6522
## transactions90std -0.3378 0.7557
## transactions3var 0.1708 0.7627
## transactions7var 0.0276 0.6753
## transactions14var -0.1654 0.6458
## transactions30var -0.3631 0.6526
## transactions90var -0.3305 0.7708
## transactions3rsi 0.3432 -0.3839
## transactions7rsi 0.3363 -0.4386
## transactions14rsi 0.3294 -0.4742
## transactions30rsi 0.2886 -0.4958
## transactions90rsi 0.1691 -0.5118
## transactions3roc 0.4476 -0.2993
## transactions7roc 0.5088 -0.6148
## transactions14roc 0.3712 -0.7205
## transactions30roc 0.5069 -0.3920
## transactions90roc 0.0897 -0.6648
## size3sma -0.5550 0.0626
## size7sma -0.6584 0.0541
## size14sma -0.6990 0.0684
## size30sma -0.7188 0.1228
## size90sma -0.7122 0.2516
## size3ema -0.5275 0.0649
## size7ema -0.6152 0.0643
## size14ema -0.6645 0.0788
## size30ema -0.6967 0.1164
## size90ema -0.7085 0.2119
## size3wma -0.5113 0.0616
## size7wma -0.6096 0.0697
## size14wma -0.6595 0.0679
## size30wma -0.7020 0.0910
## size90wma -0.7120 0.1920
## size3trx 0.3360 0.0606
## size7trx 0.0701 0.3708
## size14trx -0.2044 0.3548
## size30trx -0.7440 -0.3054
## size90trx -0.6774 0.3053
## size3mom 0.4213 -0.0352
## size7mom 0.3671 0.3161
## size14mom 0.2235 0.3737
## size30mom 0.1591 0.0936
## size90mom -0.1175 0.0090
## size3std 0.9019 0.2216
## size7std 0.9229 0.0055
## size14std 0.9504 -0.0272
## size30std 0.9467 -0.1229
## size90std 0.8972 -0.2785
## size3var 0.8492 0.2865
## size7var 0.8994 0.0890
## size14var 0.9456 0.0402
## size30var 0.9547 -0.0894
## size90var 0.9149 -0.2511
## size3rsi 0.3460 -0.0632
## size7rsi 0.2693 -0.0354
## size14rsi 0.1943 -0.0363
## size30rsi 0.0959 -0.0652
## size90rsi -0.1209 -0.1048
## size3roc 0.4743 -0.0303
## size7roc 0.4076 0.3243
## size14roc 0.2659 0.4017
## size30roc 0.2448 0.1252
## size90roc -0.0625 0.0542
##
##
## Stress per point (in %):
## priceUSD transactions size
## 1.23 0.44 0.48
## sentbyaddress difficulty hashrate
## 1.03 1.27 1.42
## mining_profitability sentinusdUSD transactionfeesUSD
## 1.47 1.22 0.94
## median_transaction_feeUSD confirmationtime transactionvalueUSD
## 0.87 1.38 1.06
## mediantransactionvalueUSD activeaddresses top100cap
## 1.22 1.31 1.04
## fee_to_rewardUSD transactions3sma transactions7sma
## 0.62 0.44 0.49
## transactions14sma transactions30sma transactions90sma
## 0.48 0.44 0.50
## transactions3ema transactions7ema transactions14ema
## 0.38 0.40 0.42
## transactions30ema transactions90ema transactions3wma
## 0.42 0.46 0.39
## transactions7wma transactions14wma transactions30wma
## 0.43 0.44 0.44
## transactions90wma transactions3trx transactions7trx
## 0.45 1.05 1.32
## transactions14trx transactions30trx transactions90trx
## 1.44 1.44 1.58
## transactions3mom transactions7mom transactions14mom
## 1.12 1.45 1.29
## transactions30mom transactions90mom transactions3std
## 0.99 0.99 1.59
## transactions7std transactions14std transactions30std
## 1.40 1.20 1.14
## transactions90std transactions3var transactions7var
## 1.24 1.53 1.37
## transactions14var transactions30var transactions90var
## 1.18 1.16 1.26
## transactions3rsi transactions7rsi transactions14rsi
## 1.03 0.89 0.79
## transactions30rsi transactions90rsi transactions3roc
## 0.71 0.65 1.17
## transactions7roc transactions14roc transactions30roc
## 1.49 1.33 1.04
## transactions90roc size3sma size7sma
## 1.06 0.40 0.36
## size14sma size30sma size90sma
## 0.37 0.45 0.64
## size3ema size7ema size14ema
## 0.35 0.32 0.33
## size30ema size90ema size3wma
## 0.39 0.55 0.38
## size7wma size14wma size30wma
## 0.34 0.33 0.38
## size90wma size3trx size7trx
## 0.52 1.12 1.49
## size14trx size30trx size90trx
## 1.67 1.50 1.00
## size3mom size7mom size14mom
## 1.14 1.35 1.31
## size30mom size90mom size3std
## 1.16 1.18 1.29
## size7std size14std size30std
## 0.95 0.90 0.91
## size90std size3var size7var
## 0.93 1.44 1.06
## size14var size30var size90var
## 0.95 0.93 0.94
## size3rsi size7rsi size14rsi
## 1.08 1.00 0.97
## size30rsi size90rsi size3roc
## 0.94 0.80 1.13
## size7roc size14roc size30roc
## 1.35 1.33 1.23
## size90roc
## 1.31
The summary makes it possible to inspect the exact coordinates of each Bitcoin characteristic in the two created dimensions. It also lists the stress-per-point values, which show how much each variable contributes to the overall stress; in the output above the largest share (1.67%) belongs to “size14trx”. In conclusion, the MDS method made it possible to represent 106 characteristics of the Bitcoin market in a two-dimensional space and to see which of them affect the fit most strongly.
PCA (Principal Component Analysis) is another dimension reduction method. It standardizes the data and transforms it into new variables, the principal components, which are linear combinations of the original variables defined by eigenvectors and which are uncorrelated and orthogonal to each other. The goal of the transformation is to find the directions of maximum variability in the dataset: the first principal component captures the largest possible share of the variance, and each subsequent component captures the largest possible share of the remaining variance while staying orthogonal to all previous ones.
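The definition above can be checked on a small example by carrying out PCA "by hand" through an eigen-decomposition of the correlation matrix. This is a sketch on the built-in `USArrests` data, not on the Bitcoin data, and the variable names (`vectors`, `scores`, `share`) are chosen here for illustration only.

```r
# PCA via eigen-decomposition on a small built-in dataset (illustrative sketch).
x <- scale(USArrests)                    # standardized variables
e <- eigen(cor(x))                       # eigen-decomposition of the correlation matrix

vectors <- e$vectors                     # columns = principal directions
scores  <- x %*% vectors                 # data expressed in the new coordinates

# Share of total variance captured by each component (non-increasing)
share <- e$values / sum(e$values)
round(share, 3)

# Directions are orthonormal, and the resulting components are uncorrelated
round(crossprod(vectors), 10)            # identity matrix
round(cor(scores), 10)                   # identity matrix
```

Both checks return the identity matrix up to rounding, which is the numerical counterpart of the statement that the components are orthogonal and uncorrelated.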
PCA can be applied to the scaled (standardized) data using the function “princomp”, and the basic statistics of the result can then be viewed as shown below. The loadings can be understood as weights that link the original variables to the newly created composite variables (components). They express the importance of each variable within a component as well as the sign and strength of the relationship between the two, which is why loading values can be negative.
## Comp.1 Comp.2 Comp.3 Comp.4 Comp.5
## priceUSD 0.05111680 0.026097999 0.17636776 0.07872926 0.20613726
## transactions -0.11709504 -0.158752933 -0.06179520 0.09036462 -0.01924573
## size -0.15137660 -0.034088993 0.10045304 -0.08262183 -0.07067707
## sentbyaddress -0.07457389 -0.130687510 0.11075640 0.11953986 0.11534905
## difficulty 0.05286409 0.002263429 0.11604920 -0.05815843 0.16545500
## hashrate 0.05832434 -0.019629111 0.06982211 0.01492703 0.19371231
## Comp.6 Comp.7 Comp.8 Comp.9 Comp.10
## priceUSD 0.134015056 0.08599056 0.080877870 0.015154138 0.047233839
## transactions -0.001135062 -0.05034896 0.032390049 0.008083582 -0.010453221
## size -0.034094070 0.01151702 -0.006469017 0.018167575 0.002462194
## sentbyaddress 0.096317638 -0.07308126 -0.055335150 -0.008109474 0.056978644
## difficulty 0.265373931 -0.10562832 -0.164091736 0.111943108 0.078509884
## hashrate 0.273426020 -0.08770796 -0.158466614 0.129434649 0.088794137
## Comp.11 Comp.12 Comp.13 Comp.14 Comp.15
## priceUSD 0.1962473145 0.10578141 0.022172202 0.05618963 0.08506191
## transactions 0.0004119554 0.04130323 0.025297947 -0.02858266 -0.01588057
## size 0.0562486204 -0.04388101 -0.005564006 0.01188730 -0.01331406
## sentbyaddress 0.0151574791 -0.15662040 -0.022807595 -0.10655223 -0.09299314
## difficulty -0.0959263733 -0.15455030 0.034505218 -0.09593637 -0.01480545
## hashrate -0.0556242440 -0.16430743 -0.007795980 -0.14834640 0.01837544
## Comp.16 Comp.17 Comp.18 Comp.19 Comp.20
## priceUSD 0.033559455 0.006640207 0.001771252 0.137388668 0.08192377
## transactions 0.056236296 0.018429616 -0.025179774 0.020334148 -0.03718080
## size -0.008603603 -0.021587059 0.040449506 -0.029116626 0.02440028
## sentbyaddress -0.015596631 0.047973972 0.042782586 0.063948447 0.03179815
## difficulty -0.015333871 0.011366831 0.019783403 0.007938643 0.04141178
## hashrate -0.052845753 -0.038589977 -0.009491789 0.029825200 0.12080006
## Comp.21 Comp.22 Comp.23 Comp.24 Comp.25
## priceUSD 0.10934552 0.014789862 5.533314e-05 0.06339260 0.00840398
## transactions -0.02948391 0.004689639 1.162335e-02 0.03899402 0.02658182
## size -0.04869833 0.045200203 -2.945314e-02 0.01181154 0.03390168
## sentbyaddress 0.06793023 0.065561306 4.940057e-02 0.03861093 -0.00299140
## difficulty 0.05726463 -0.016520407 -1.180958e-02 -0.06400761 -0.12674196
## hashrate 0.01426733 0.019984417 5.002637e-02 -0.08072282 -0.03255380
## Comp.26 Comp.27 Comp.28 Comp.29 Comp.30
## priceUSD 0.073026571 0.01028046 0.06307572 0.01439356 0.008086406
## transactions -0.002547555 -0.03694008 0.01772148 -0.04741350 -0.031023421
## size 0.054909254 -0.02236732 0.02048729 -0.05260563 -0.008026352
## sentbyaddress -0.049963090 -0.07870684 0.09755140 0.05545494 -0.092409716
## difficulty 0.025351201 0.09036203 -0.02399749 -0.01623073 -0.069129331
## hashrate 0.002983003 0.11640819 -0.02811749 -0.01872212 0.063788266
## Comp.31 Comp.32 Comp.33 Comp.34 Comp.35
## priceUSD 0.045342260 0.04182771 0.133175649 0.061003025 0.174290972
## transactions 0.055855089 0.04478540 -0.031400459 0.012508195 0.001442123
## size 0.054227811 -0.02136636 -0.006420975 0.002746118 0.056191730
## sentbyaddress 0.129829647 -0.08010907 0.072163843 0.076106363 0.081867965
## difficulty 0.064164952 0.15000210 0.018982043 -0.043608750 -0.082787327
## hashrate -0.002952678 0.13288725 -0.012094168 0.003514193 -0.021934017
## Comp.36 Comp.37 Comp.38 Comp.39 Comp.40
## priceUSD 0.007646896 0.03346079 0.125358730 0.008232662 0.16417381
## transactions -0.015254608 -0.02697608 -0.032458370 -0.015509273 -0.05180346
## size 0.025682691 0.04893039 -0.001273718 -0.020204384 0.02346076
## sentbyaddress -0.099803016 -0.23044364 -0.005011301 -0.294251858 -0.19688455
## difficulty 0.010046461 -0.04865949 0.095387287 0.142938457 0.07739211
## hashrate -0.016842236 -0.12541033 0.174531361 0.154363327 0.05587428
## Comp.41 Comp.42 Comp.43 Comp.44 Comp.45
## priceUSD 0.07675725 0.27412781 0.006730634 0.02539000 0.09397635
## transactions -0.03383250 0.04713079 -0.061174732 0.05929670 -0.05299128
## size -0.08345323 -0.09076253 -0.034504463 0.10129646 0.06389249
## sentbyaddress -0.31746705 0.01620469 -0.413875633 -0.28131571 -0.19479198
## difficulty 0.03996527 0.03289331 0.045399246 0.12019176 -0.02173706
## hashrate 0.14131226 -0.18906385 0.189687835 0.04020788 -0.17160913
## Comp.46 Comp.47 Comp.48 Comp.49 Comp.50
## priceUSD 0.04612887 0.004850835 0.17327258 0.17849847 0.09524084
## transactions -0.05206526 -0.005870482 -0.19990603 0.13247282 0.01346469
## size 0.05948502 0.101463928 -0.01570080 0.31634795 -0.11487712
## sentbyaddress 0.33109707 -0.146819776 0.03551422 -0.07872893 -0.01058765
## difficulty -0.08158621 -0.168304505 -0.18294824 0.07228358 -0.10357034
## hashrate 0.10695154 -0.176831442 0.16946951 0.34020899 0.09120167
## Comp.51 Comp.52 Comp.53 Comp.54 Comp.55
## priceUSD 0.006949002 0.18653738 0.13688666 0.12650392 0.04611830
## transactions -0.102336097 -0.09603269 0.07036774 0.10016298 -0.01667652
## size -0.163575184 0.05707289 0.24674082 -0.54286818 0.12099012
## sentbyaddress 0.062223525 -0.15187023 0.03241078 -0.04107910 0.02743741
## difficulty -0.018050019 -0.37488259 -0.38299218 -0.05065773 -0.14313806
## hashrate 0.014950976 0.06308067 0.31973000 0.04381560 -0.01407097
## Comp.56 Comp.57 Comp.58 Comp.59 Comp.60
## priceUSD 0.06947233 0.066252124 0.01584674 0.07941346 0.121058994
## transactions 0.09307189 -0.014809696 0.01091873 -0.05135883 0.066201475
## size -0.13847587 0.076905031 -0.11026553 -0.19889625 -0.009037145
## sentbyaddress 0.06311640 0.008591438 0.09178966 -0.03515779 0.017866764
## difficulty -0.27561164 -0.065563940 -0.09112468 -0.12623877 0.240499794
## hashrate 0.19120138 0.019609755 0.14440639 0.05639713 -0.163762539
## Comp.61 Comp.62 Comp.63 Comp.64 Comp.65
## priceUSD 0.04823538 0.0908001882 0.06272563 0.22702078 0.10647241
## transactions -0.04342646 -0.0918548832 -0.08556036 0.02546046 0.32295911
## size -0.01698407 -0.0518201229 0.16099998 -0.06587435 -0.12846823
## sentbyaddress 0.01707151 0.0307163708 0.00358533 0.01542193 -0.07977589
## difficulty 0.05876263 -0.1213070749 0.01207312 0.07242276 -0.02840847
## hashrate -0.06187226 0.0003229005 -0.02590655 -0.16435984 -0.02936401
## Comp.66 Comp.67 Comp.68 Comp.69 Comp.70
## priceUSD 0.28289664 0.11457920 0.04492184 0.27701510 0.216317210
## transactions 0.04146886 -0.09227915 -0.23912251 0.22454065 0.056725856
## size 0.05634155 0.11134696 -0.13316975 0.07841827 0.012361741
## sentbyaddress 0.03008876 0.03523165 0.03787399 0.01503181 0.030816554
## difficulty 0.06806736 0.01041814 0.04106257 0.13310304 -0.007406141
## hashrate -0.15451486 -0.05407727 -0.04741709 -0.11856581 -0.072176462
## Comp.71 Comp.72 Comp.73 Comp.74 Comp.75
## priceUSD 0.19853947 0.03427775 0.14558956 0.10688243 0.0264718415
## transactions -0.04689586 0.08180985 0.07822094 -0.13452243 0.0003952265
## size 0.02072237 -0.02022414 0.18752789 -0.24110468 -0.1332785354
## sentbyaddress 0.01568997 -0.01341173 -0.01642704 0.01922028 0.0068663902
## difficulty 0.08079410 -0.06123598 0.07555449 -0.01445676 0.0047536363
## hashrate -0.09705242 -0.02969681 -0.09867863 0.01963325 0.0266351899
## Comp.76 Comp.77 Comp.78 Comp.79 Comp.80
## priceUSD 0.015010436 0.04684014 0.054358994 0.046190799 0.13243531
## transactions 0.127518233 -0.24586627 0.263812546 -0.296081331 -0.24561648
## size -0.005214799 0.08546870 -0.132264548 -0.008779473 -0.03765026
## sentbyaddress -0.008086521 0.03474203 -0.008044177 0.029181707 0.01777672
## difficulty 0.034408985 0.11458024 0.016751426 -0.003515343 0.07283478
## hashrate -0.002691720 -0.07767162 0.007001689 -0.030624093 -0.03582800
## Comp.81 Comp.82 Comp.83 Comp.84 Comp.85
## priceUSD 0.033129917 0.017130831 0.0211251176 0.032475718 0.029848336
## transactions -0.348218403 -0.001916504 -0.0442611712 -0.079949610 0.015828159
## size -0.040351025 -0.018709370 0.0029555417 0.062199704 0.041440397
## sentbyaddress 0.012394810 -0.004423517 -0.0002851159 0.006078708 0.006252547
## difficulty 0.009263981 0.018432983 0.0389894441 0.025941956 0.001176009
## hashrate -0.030182290 0.027923097 -0.0152538625 -0.020954564 0.002093543
## Comp.86 Comp.87 Comp.88 Comp.89 Comp.90
## priceUSD 0.021399764 0.016137599 0.030872798 0.024573644 0.004017256
## transactions 0.062852317 -0.153628649 -0.126283637 -0.077916290 -0.027885402
## size -0.115666204 0.110305884 -0.052326807 0.036851359 -0.036316705
## sentbyaddress -0.015834155 -0.007106276 -0.001792852 0.002334451 -0.007518546
## difficulty 0.014669761 0.029620864 -0.004030727 0.003458794 0.030264052
## hashrate -0.004980217 -0.011612117 -0.001337955 -0.004982411 -0.001145273
## Comp.91 Comp.92 Comp.93 Comp.94 Comp.95
## priceUSD 0.0072441499 0.001963360 0.033263393 0.019194856 0.032970022
## transactions 0.0280543936 0.023248950 0.067080980 0.079201321 0.047674991
## size -0.0065484394 -0.005574613 -0.004083877 -0.002406366 -0.005026523
## sentbyaddress -0.0077169609 -0.003557902 -0.005430556 -0.004854498 -0.006539158
## difficulty 0.0006156113 0.001195779 0.006449102 0.017768025 -0.015520780
## hashrate -0.0065703277 -0.008745925 -0.017043714 0.003277960 -0.008478999
## Comp.96 Comp.97 Comp.98 Comp.99
## priceUSD 0.0011161395 0.008057712 0.0057509624 0.001264983
## transactions -0.0139583955 0.043407558 0.0687568321 -0.008322029
## size 0.0037286538 -0.007975685 0.0020609561 -0.015558737
## sentbyaddress 0.0004307203 -0.003023487 -0.0001682611 -0.000413360
## difficulty -0.0184464924 -0.012884747 0.0073539525 0.003258814
## hashrate -0.0032627219 -0.005563868 -0.0027467221 0.001889612
## Comp.100 Comp.101 Comp.102 Comp.103
## priceUSD 0.0058984326 5.486655e-04 0.0033262416 4.044426e-05
## transactions -0.0149766744 -3.309526e-02 0.0442165645 8.503120e-02
## size 0.0029951357 -3.605871e-02 -0.0241691015 7.237447e-02
## sentbyaddress 0.0006325967 8.850871e-05 0.0011164614 7.414042e-05
## difficulty -0.0028442534 -1.887503e-03 -0.0003803873 -7.506978e-04
## hashrate -0.0047862276 4.769622e-04 -0.0020148114 4.629469e-04
## Comp.104 Comp.105 Comp.106
## priceUSD 1.199717e-03 0.0001149336 7.383428e-05
## transactions -7.095461e-02 0.0179135075 5.519364e-02
## size 6.763177e-02 0.0588684698 -4.886255e-02
## sentbyaddress 2.899010e-04 0.0001672690 1.613470e-04
## difficulty 3.366370e-06 0.0002915409 -1.138659e-04
## hashrate -8.528443e-04 -0.0001975415 1.514862e-04
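How loadings relate to correlations can be illustrated with a short sketch. It assumes `princomp` is called with `cor = TRUE` on the built-in `USArrests` data, which is not necessarily the exact call used for the Bitcoin data; the object names `L`, `r.direct` and `r.scaled` are introduced here for illustration.

```r
# Relation between princomp() loadings and variable-component correlations,
# shown on a built-in dataset (illustrative sketch, not the Bitcoin data).
pc <- princomp(USArrests, cor = TRUE)    # PCA on standardized variables

L <- unclass(pc$loadings)                # plain matrix of loadings
round(crossprod(L), 10)                  # identity: loading vectors are orthonormal

# Correlation of each original variable with each component score
r.direct <- cor(USArrests, pc$scores)
# The same quantity obtained by rescaling the loadings by the component sd
r.scaled <- sweep(L, 2, pc$sdev, "*")
max(abs(r.direct - r.scaled))            # numerically zero

# Share of variance per component, from the component standard deviations
round(pc$sdev^2 / sum(pc$sdev^2), 3)
```

In this setting a loading becomes a correlation only after multiplication by the standard deviation of its component, so small loadings on late components (as in the output above) correspond to even smaller correlations.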
After applying PCA, it is useful to visualize how much of the overall variance is explained by each principal component. It is also possible to see how the variables are grouped along the eigenvectors, although in this case the relationships between the characteristics are difficult to read because there are so many of them.
As shown in the column chart above, the first principal component explains a little more than 30% of the variance in the dataset. Beyond the visualization, more details about all of the eigenvectors can be displayed to better understand the principal components.
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 3.262344e+01 3.080764e+01 30.80764
## Dim.2 1.630408e+01 1.539661e+01 46.20425
## Dim.3 1.106560e+01 1.044970e+01 56.65395
## Dim.4 7.761252e+00 7.329266e+00 63.98321
## Dim.5 5.354647e+00 5.056610e+00 69.03982
## Dim.6 4.588589e+00 4.333191e+00 73.37301
## Dim.7 3.300731e+00 3.117015e+00 76.49003
## Dim.8 2.852151e+00 2.693402e+00 79.18343
## Dim.9 2.354893e+00 2.223821e+00 81.40725
## Dim.10 2.259631e+00 2.133861e+00 83.54111
## Dim.11 1.632669e+00 1.541795e+00 85.08291
## Dim.12 1.508827e+00 1.424847e+00 86.50775
## Dim.13 1.351329e+00 1.276114e+00 87.78387
## Dim.14 1.097609e+00 1.036516e+00 88.82038
## Dim.15 9.625008e-01 9.089285e-01 89.72931
## Dim.16 9.113836e-01 8.606565e-01 90.58997
## Dim.17 8.477256e-01 8.005417e-01 91.39051
## Dim.18 7.886294e-01 7.447348e-01 92.13525
## Dim.19 7.533692e-01 7.114371e-01 92.84668
## Dim.20 6.708984e-01 6.335566e-01 93.48024
## Dim.21 6.581412e-01 6.215094e-01 94.10175
## Dim.22 5.986940e-01 5.653711e-01 94.66712
## Dim.23 5.796669e-01 5.474030e-01 95.21452
## Dim.24 5.611426e-01 5.299097e-01 95.74443
## Dim.25 4.308537e-01 4.068726e-01 96.15130
## Dim.26 3.807151e-01 3.595247e-01 96.51083
## Dim.27 3.586030e-01 3.386434e-01 96.84947
## Dim.28 3.432126e-01 3.241096e-01 97.17358
## Dim.29 3.187399e-01 3.009990e-01 97.47458
## Dim.30 3.101841e-01 2.929194e-01 97.76750
## Dim.31 2.706232e-01 2.555605e-01 98.02306
## Dim.32 2.090865e-01 1.974489e-01 98.22051
## Dim.33 1.999273e-01 1.887995e-01 98.40931
## Dim.34 1.672666e-01 1.579566e-01 98.56727
## Dim.35 1.555848e-01 1.469251e-01 98.71419
## Dim.36 1.447504e-01 1.366937e-01 98.85089
## Dim.37 1.315291e-01 1.242082e-01 98.97509
## Dim.38 1.247936e-01 1.178476e-01 99.09294
## Dim.39 1.065216e-01 1.005927e-01 99.19353
## Dim.40 9.872152e-02 9.322673e-02 99.28676
## Dim.41 7.393082e-02 6.981587e-02 99.35658
## Dim.42 7.087815e-02 6.693311e-02 99.42351
## Dim.43 6.045968e-02 5.709453e-02 99.48060
## Dim.44 5.093847e-02 4.810327e-02 99.52871
## Dim.45 4.726190e-02 4.463133e-02 99.57334
## Dim.46 4.110173e-02 3.881403e-02 99.61215
## Dim.47 3.961794e-02 3.741283e-02 99.64957
## Dim.48 3.378478e-02 3.190434e-02 99.68147
## Dim.49 3.117525e-02 2.944005e-02 99.71091
## Dim.50 2.469747e-02 2.332282e-02 99.73423
## Dim.51 2.285924e-02 2.158691e-02 99.75582
## Dim.52 2.075052e-02 1.959556e-02 99.77542
## Dim.53 1.885381e-02 1.780441e-02 99.79322
## Dim.54 1.653295e-02 1.561274e-02 99.80883
## Dim.55 1.575504e-02 1.487813e-02 99.82371
## Dim.56 1.539612e-02 1.453918e-02 99.83825
## Dim.57 1.431699e-02 1.352012e-02 99.85177
## Dim.58 1.349621e-02 1.274502e-02 99.86451
## Dim.59 1.271989e-02 1.201191e-02 99.87653
## Dim.60 1.148745e-02 1.084807e-02 99.88737
## Dim.61 1.107981e-02 1.046311e-02 99.89784
## Dim.62 1.030201e-02 9.728602e-03 99.90757
## Dim.63 9.496860e-03 8.968270e-03 99.91653
## Dim.64 8.499052e-03 8.025999e-03 99.92456
## Dim.65 7.804225e-03 7.369846e-03 99.93193
## Dim.66 7.527123e-03 7.108168e-03 99.93904
## Dim.67 7.307953e-03 6.901196e-03 99.94594
## Dim.68 6.414212e-03 6.057201e-03 99.95200
## Dim.69 5.489877e-03 5.184313e-03 99.95718
## Dim.70 5.131146e-03 4.845550e-03 99.96203
## Dim.71 4.519456e-03 4.267906e-03 99.96629
## Dim.72 4.111992e-03 3.883121e-03 99.97018
## Dim.73 3.740470e-03 3.532278e-03 99.97371
## Dim.74 3.467160e-03 3.274180e-03 99.97698
## Dim.75 2.847916e-03 2.689403e-03 99.97967
## Dim.76 2.675027e-03 2.526136e-03 99.98220
## Dim.77 2.490277e-03 2.351670e-03 99.98455
## Dim.78 2.374990e-03 2.242800e-03 99.98679
## Dim.79 2.257126e-03 2.131496e-03 99.98893
## Dim.80 1.594975e-03 1.506200e-03 99.99043
## Dim.81 1.500816e-03 1.417282e-03 99.99185
## Dim.82 1.379507e-03 1.302724e-03 99.99315
## Dim.83 1.078399e-03 1.018375e-03 99.99417
## Dim.84 1.022391e-03 9.654856e-04 99.99514
## Dim.85 8.233927e-04 7.775631e-04 99.99591
## Dim.86 7.831736e-04 7.395826e-04 99.99665
## Dim.87 6.193632e-04 5.848898e-04 99.99724
## Dim.88 5.665247e-04 5.349922e-04 99.99777
## Dim.89 5.333665e-04 5.036797e-04 99.99828
## Dim.90 3.901322e-04 3.684176e-04 99.99865
## Dim.91 3.522696e-04 3.326625e-04 99.99898
## Dim.92 2.633405e-04 2.486831e-04 99.99923
## Dim.93 2.111102e-04 1.993600e-04 99.99943
## Dim.94 1.825045e-04 1.723464e-04 99.99960
## Dim.95 1.427290e-04 1.347848e-04 99.99973
## Dim.96 1.007373e-04 9.513029e-05 99.99983
## Dim.97 8.161323e-05 7.707069e-05 99.99991
## Dim.98 5.753841e-05 5.433585e-05 99.99996
## Dim.99 1.628254e-05 1.537626e-05 99.99997
## Dim.100 1.312356e-05 1.239311e-05 99.99999
## Dim.101 5.608622e-06 5.296449e-06 99.99999
## Dim.102 4.132084e-06 3.902095e-06 100.00000
## Dim.103 1.734943e-06 1.638377e-06 100.00000
## Dim.104 1.193522e-06 1.127091e-06 100.00000
## Dim.105 4.624676e-07 4.367269e-07 100.00000
## Dim.106 3.239679e-07 3.059361e-07 100.00000
In this way, it is possible to conclude that the first principal component describes 30.80764% of the variance, and that the first 10 components together explain 83.54111% of it. Both visualizations, of the eigenvalues and of the percentage of variance explained by each component, can also be displayed, this time in a more precise way.
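The percentages in the table follow directly from the eigenvalues: each eigenvalue is divided by the total, and the running sum gives the cumulative share. The sketch below reproduces this arithmetic on the built-in `USArrests` data, which is used only so that the example runs on its own and is not the Bitcoin dataset. The 80% threshold is a hypothetical cut-off chosen for illustration.

```r
# Illustrative data only: USArrests ships with base R.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Eigenvalues are the squared standard deviations of the components.
eigenvalues <- pca$sdev^2

# Share of variance per component and its running total, in percent.
variance_percent <- 100 * eigenvalues / sum(eigenvalues)
cumulative_percent <- cumsum(variance_percent)

# Smallest number of components whose cumulative share reaches the threshold.
threshold <- 80  # hypothetical cut-off
n_components <- which(cumulative_percent >= threshold)[1]

print(round(data.frame(eigenvalues, variance_percent, cumulative_percent), 3))
print(n_components)
```

Applied to the table above, the same rule with an 80% threshold would keep 9 components, since the cumulative share is 79.18% at Dim.8 and 81.41% at Dim.9.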
Additionally, the changes in the cumulative variance based on the number of principal components included can be plotted for easier interpretation of the process.
data.pca<-prcomp(data.s, center=FALSE, scale.=FALSE)
sum<-summary(data.pca)
plot(sum$importance[3,],type="l", main="Cumulative variance")
Finally, in terms of the basic statistics for the whole process, it might also be beneficial to take a more detailed look at the most significant variables for each principal component separately, in order to understand more deeply what drives the construction of each eigenvector.
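# loadings of all variables on the first principal component
loading_scores_PC_1<-data.pca$rotation[,1]
# rank the variables by the absolute value of their loading
fac_scores_PC_1<-abs(loading_scores_PC_1)
fac_scores_PC_1_ranked<-names(sort(fac_scores_PC_1, decreasing=TRUE))
data.pca$rotation[fac_scores_PC_1_ranked, 1]
## size7ema size14ema size7wma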
## 0.1645689260 0.1629269116 0.1629246317
## size3ema size14wma size3wma
## 0.1628308664 0.1623796300 0.1605348354
## size7sma size3sma size30wma
## 0.1602670721 0.1591954091 0.1589241052
## size14sma size30ema size30sma
## 0.1588579007 0.1580706325 0.1537059328
## size transactions90ema transactions90wma
## 0.1513766009 0.1495033726 0.1480431083
## transactions90sma size90wma transactions30sma
## 0.1475113327 0.1464743069 0.1441335042
## transactions30ema size90ema transactions30wma
## 0.1440719810 0.1438701215 0.1410718869
## transactions14ema size90sma transactions14sma
## 0.1385521372 0.1358706673 0.1357401592
## transactions14wma transactions7ema size30var
## 0.1349389611 0.1344969454 -0.1318808769
## transactions7wma size14std size30std
## 0.1314552536 -0.1308497039 -0.1307202893
## transactions7sma fee_to_rewardUSD transactions3ema
## 0.1303771560 0.1302623738 0.1297148334
## top100cap transactions3wma size14var
## -0.1294492006 0.1271253887 -0.1267641306
## transactions3sma size90var size7std
## 0.1264183495 -0.1249641899 -0.1229628288
## size90std transactions size90trx
## -0.1183991278 0.1170950418 0.1156275287
## size7var size3std size90rsi
## -0.1102529488 -0.1039026108 0.1008269040
## median_transaction_feeUSD transactionvalueUSD size3var
## 0.0977879316 -0.0951290422 -0.0854654016
## transactionfeesUSD size30trx size90mom
## 0.0847160180 0.0771490442 0.0763621391
## activeaddresses sentbyaddress transactions90trx
## 0.0749731386 0.0745738851 0.0699416612
## size30rsi size90roc hashrate
## 0.0638001702 0.0626974122 -0.0583243384
## sentinusdUSD size14trx difficulty
## -0.0578267759 0.0565768732 -0.0528640947
## priceUSD transactions30std transactions30var
## -0.0511168037 0.0493454200 0.0492705775
## size30mom size14rsi size7trx
## 0.0487407154 0.0481550745 0.0415746372
## transactions14var size7rsi transactions14std
## 0.0399929099 0.0380160107 0.0372612334
## size3trx size30roc mining_profitability
## 0.0354784180 0.0353650603 -0.0344252810
## transactions90rsi transactions90std size14mom
## 0.0342999449 0.0320991652 0.0320321395
## mediantransactionvalueUSD size3rsi transactions90var
## 0.0291217934 0.0284176431 0.0283837299
## size3mom size14roc size7mom
## 0.0275561064 0.0237120167 0.0229050249
## transactions3rsi transactions3trx transactions3mom
## 0.0217470806 0.0210672801 0.0206059111
## transactions7var transactions7rsi transactions30rsi
## 0.0205449800 0.0203788142 0.0192550664
## transactions90mom transactions14rsi transactions3roc
## 0.0188443215 0.0182015822 0.0173642142
## size3roc transactions90roc confirmationtime
## 0.0160387985 0.0157089033 0.0152855550
## size7roc transactions7std transactions30trx
## 0.0149431504 0.0140690830 0.0130605243
## transactions14trx transactions30mom transactions7trx
## -0.0100785096 0.0098965192 -0.0090900606
## transactions14mom transactions30roc transactions14roc
## -0.0062846467 0.0052013703 -0.0046428040
## transactions3std transactions3var transactions7mom
## 0.0041816541 0.0041386013 -0.0021291575
## transactions7roc
## 0.0005921151
When it comes to the first principal component (PC1) and the variables that shaped it, the largest loadings belong to the moving averages of the size variable, in particular its EMA (Exponential Moving Average) and WMA (Weighted Moving Average) over different window lengths, followed by the corresponding moving averages of transactions.
Aside from the basic indicators for the created principal components, it is also possible to examine more detailed statistics for the individual observations (here, days) with the help of the function “get_pca_ind”.
Namely, 2 different measures might be worth showcasing: the coordinates of the observations on the principal components and the contributions of individual observations to those components. The output below shows the first six observations.
## Principal Component Analysis Results for individuals
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the individuals"
## 2 "$cos2" "Cos2 for the individuals"
## 3 "$contrib" "contributions of the individuals"
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## 1 3.110182 -2.115125 1.996596 -5.341517 0.4114232 -1.9810495 0.5949921
## 2 3.046220 -1.279052 2.393504 -4.942883 0.7048107 -1.2786222 0.1198232
## 3 3.018103 -1.514443 2.972079 -3.272689 1.9574775 0.2589512 0.4380423
## 4 2.443978 1.099390 3.927878 -3.761725 1.0670355 0.4138992 0.7706130
## 5 3.972577 -3.207101 1.570670 -5.686634 1.8723561 -0.1995235 0.4626377
## 6 1.247672 2.306453 5.466992 -2.459489 0.7260533 0.1781364 -0.4589488
## Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## 1 -1.9253815 0.7098793 1.2471655 -0.4664735 -1.161641 0.8935077 0.2023248
## 2 -1.8581880 0.4389236 1.9820983 -0.9952177 -1.368670 1.2217388 0.1701462
## 3 -1.7596777 -1.6775371 0.9532125 -1.1913068 -2.193198 1.5783667 1.1058310
## 4 -0.8994188 -1.5429243 1.2566485 -1.3288615 -2.308481 1.1992347 0.5621118
## 5 -0.5378921 -0.8697609 1.5107494 0.2876124 -2.150945 1.0256856 0.2788326
## 6 -0.1350763 -1.7049140 1.8235696 0.5987034 -1.825420 1.7616414 -0.3411941
## Dim.15 Dim.16 Dim.17 Dim.18 Dim.19 Dim.20
## 1 0.81830817 -0.99598415 -1.3361990 -1.1351167 0.23617133 0.5368643
## 2 0.33245966 -0.39026018 -1.1652220 -2.1308776 1.34049137 0.1201212
## 3 -0.08286611 0.49226495 -0.7016716 -1.6639702 1.23215607 1.3068707
## 4 0.22193386 0.05410055 -0.3935210 -0.5819853 1.16630157 -0.2840510
## 5 0.82273917 -0.78102033 1.1559272 -0.5326324 -0.02939048 -0.7221905
## 6 0.96427292 0.17563643 0.5160374 -0.9406989 -0.68995497 -0.5113339
## Dim.21 Dim.22 Dim.23 Dim.24 Dim.25 Dim.26 Dim.27
## 1 -0.5763006 1.712489151 -0.58298671 0.9269925 0.45260097 1.4331882 0.5078085
## 2 -0.2755938 0.326562487 -0.35470242 0.8944426 0.15282470 0.6095208 0.3697847
## 3 0.8350777 -0.009043401 -0.54205381 0.9918937 0.79563419 0.8711833 -0.3229900
## 4 -0.3043625 -0.475535281 -0.26003923 0.7015818 0.33284132 0.9527877 0.4827439
## 5 -0.6106870 -0.049690817 -0.07088650 1.0853385 0.07328544 0.4573706 0.4862175
## 6 -0.3167080 -0.636291427 -0.05991018 0.7098797 0.72819017 0.7886199 0.6069105
## Dim.28 Dim.29 Dim.30 Dim.31 Dim.32 Dim.33
## 1 -0.2764405 -0.5167003 -0.008195915 0.3927332 0.67565208 -0.01478236
## 2 0.0966099 -0.4138701 0.167349928 0.1586064 0.02736994 0.08178511
## 3 -0.5224968 -0.2245880 0.536478199 -0.2060138 -0.17813112 0.61380674
## 4 -0.2948006 -0.2585394 0.348798225 -0.2892611 -0.27013295 0.33078512
## 5 -0.2218936 -0.4933053 0.921692114 -0.8580277 -0.45829701 0.30803700
## 6 -0.6343732 -0.5628218 0.170973636 -0.1466100 -0.35775427 0.22710844
## Dim.34 Dim.35 Dim.36 Dim.37 Dim.38 Dim.39
## 1 0.03154552 0.43648503 -0.6068635 -0.1968980 0.1425365 0.2692500723
## 2 0.19878746 0.35891980 0.0654158 0.1188744 0.1840157 0.0236590266
## 3 0.53283776 0.06131692 0.1224146 -0.2175115 0.3109167 0.1077589797
## 4 0.42046347 0.02275661 0.5490888 0.1994958 0.1613902 -0.0481327649
## 5 0.59471038 0.30251646 0.1148599 0.2741160 0.1406704 0.0005625872
## 6 0.20883461 0.20130096 0.2146794 0.3631749 0.1852843 -0.0926328893
## Dim.40 Dim.41 Dim.42 Dim.43 Dim.44 Dim.45
## 1 0.05317964 0.00320484 0.057976042 0.09341185 0.02019876 -0.157259745
## 2 0.01530264 -0.11864393 0.043865934 0.08024837 0.02959030 -0.231344366
## 3 0.21799308 -0.06371825 -0.003800305 -0.05540291 -0.19097931 0.202806272
## 4 -0.09343039 -0.12769108 -0.188817984 0.11374199 -0.28484041 -0.114517677
## 5 0.28217444 -0.04021435 -0.232488371 0.25730168 -0.16741969 0.003718343
## 6 0.27235344 0.17872418 -0.179710045 0.20635130 -0.03138347 -0.008653634
## Dim.46 Dim.47 Dim.48 Dim.49 Dim.50 Dim.51
## 1 -0.09140335 0.09545111 0.054690650 0.18687741 0.07320165 -0.04322019
## 2 -0.04494536 0.20889327 0.016293145 0.17185904 0.06130035 -0.04942264
## 3 -0.23515573 0.23660540 -0.174446531 0.13855632 0.17633099 0.10068796
## 4 0.07541472 0.10882991 -0.182636612 0.07282895 0.16354599 -0.11306805
## 5 0.11256109 0.14476982 -0.003990558 0.14460988 0.10488537 0.16006093
## 6 0.19686523 0.06705063 0.090260813 -0.03146318 0.12257169 0.08298218
## Dim.52 Dim.53 Dim.54 Dim.55 Dim.56 Dim.57
## 1 -0.213158667 0.167021447 0.029248437 -0.03602267 -0.028942279 0.13275802
## 2 -0.033581701 0.276210774 -0.037774165 -0.02641210 0.048112924 0.10271896
## 3 -0.052662632 0.282059615 0.152942018 -0.11723364 0.004653775 -0.05248682
## 4 0.003906629 0.325140065 0.118560026 -0.21184400 -0.056527231 -0.10403943
## 5 -0.069423599 -0.005580431 0.006015602 -0.15352835 -0.030684072 0.07096521
## 6 -0.104418521 -0.029187474 -0.075249906 -0.05155839 -0.153318791 0.17601262
## Dim.58 Dim.59 Dim.60 Dim.61 Dim.62 Dim.63
## 1 0.07565446 -0.04226184 0.000485747 -0.04875638 -0.09368409 -0.06154087
## 2 0.06934813 0.04493857 0.058947079 0.05482487 -0.07948826 0.04072994
## 3 0.15186378 0.07339651 0.081146956 0.08085223 -0.06499902 -0.05914954
## 4 0.07422748 0.23856619 0.034602747 0.11688741 0.02847900 0.08234961
## 5 -0.00885657 0.02488315 0.006944500 0.09238875 0.02970363 -0.08466706
## 6 -0.01503800 -0.02118165 -0.040335291 -0.13682040 0.07768572 0.06269353
## Dim.64 Dim.65 Dim.66 Dim.67 Dim.68 Dim.69
## 1 -0.032214243 -0.009387191 0.03370427 -0.13342130 0.09699895 0.01690295
## 2 -0.043540082 -0.201719212 0.10020688 -0.04004851 0.02388404 0.07005445
## 3 0.007075675 0.009022764 0.02638833 0.10460149 0.01615774 0.09938640
## 4 -0.062513662 -0.080573937 0.07405205 0.07743866 -0.01472699 0.10502983
## 5 -0.044513382 0.058709807 0.01909733 0.11255932 0.03124236 0.10526506
## 6 -0.048238360 -0.081293227 0.02171592 -0.02096412 -0.09243918 0.04505514
## Dim.70 Dim.71 Dim.72 Dim.73 Dim.74 Dim.75
## 1 0.025366385 0.037228600 -0.00433998 -0.008675643 -0.044796285 0.023084541
## 2 -0.002588932 0.051675471 -0.02941749 -0.038080312 -0.068866096 0.030465840
## 3 -0.081070567 -0.011990231 -0.06160425 0.044635696 -0.048299862 0.053720638
## 4 -0.118369619 0.054720664 -0.04157752 0.035237162 0.019104754 -0.019510668
## 5 -0.123902276 0.004128187 -0.08623306 0.019481748 -0.003304265 0.028104857
## 6 -0.097642515 -0.004868667 -0.12300637 0.049559680 0.058340159 0.002655803
## Dim.76 Dim.77 Dim.78 Dim.79 Dim.80 Dim.81
## 1 0.003650269 -0.04578579 0.01581004 0.030167279 0.01868854 0.045631063
## 2 0.007615971 -0.04278198 -0.02183660 0.006496716 0.02732224 -0.033339939
## 3 -0.001984994 -0.01118750 -0.04502446 0.025320532 0.02132527 -0.029958854
## 4 0.032858614 -0.01358574 -0.06722917 0.053217157 0.03823546 0.027080528
## 5 -0.005226603 -0.01189815 -0.03118259 0.049314516 -0.01791007 -0.006754942
## 6 0.048080454 0.02312411 -0.08286899 0.021663349 -0.01420106 -0.003753235
## Dim.82 Dim.83 Dim.84 Dim.85 Dim.86 Dim.87
## 1 0.01891128 0.02254287 0.02474040 4.460161e-02 0.009096565 -0.050007017
## 2 0.04431966 0.01951589 0.02926036 4.909250e-02 -0.004684553 -0.007389649
## 3 0.04021769 0.01998814 0.02566855 2.413401e-02 -0.012490292 0.006821574
## 4 0.02441829 0.03932462 0.07104711 5.261024e-02 0.005123208 0.029749465
## 5 0.02348882 0.04098130 0.06031194 -9.869125e-05 -0.026263392 -0.018234124
## 6 0.02869658 0.03083394 0.04522095 2.286492e-02 0.007731091 -0.002938292
## Dim.88 Dim.89 Dim.90 Dim.91 Dim.92 Dim.93
## 1 0.0008745544 0.008442963 0.05853373 -0.028460265 -0.02133826 -0.033754001
## 2 -0.0117972721 0.003430789 0.05791235 0.001777109 -0.02233977 -0.025736248
## 3 -0.0373611659 0.025598234 0.05132453 -0.005757939 -0.01926119 -0.012058277
## 4 -0.0036510809 0.021070426 0.05131036 0.009135699 -0.02704393 -0.010653811
## 5 0.0109087151 0.012160899 0.05731507 -0.015192868 -0.02528357 -0.005321053
## 6 0.0087689964 0.016938911 0.05425207 -0.049333167 -0.01013424 -0.001816369
## Dim.94 Dim.95 Dim.96 Dim.97 Dim.98 Dim.99
## 1 -0.03132429 0.01889590 0.02515069 0.0180061107 -0.013935496 -0.009015511
## 2 -0.02776477 0.01925609 0.01899356 0.0229815380 -0.011567653 -0.004332362
## 3 -0.02264392 0.01503741 0.02158640 0.0133417932 -0.011410139 -0.010989707
## 4 -0.03458096 0.01251773 0.01988936 0.0124739306 -0.005338152 -0.010464584
## 5 -0.01790417 0.01556958 0.02824810 0.0017043146 -0.008797041 -0.005092330
## 6 -0.01900111 0.01376501 0.03218332 0.0004517494 -0.010316294 -0.008388026
## Dim.100 Dim.101 Dim.102 Dim.103 Dim.104
## 1 -0.0006795002 0.003388928 2.783292e-03 0.0024663243 0.0006253679
## 2 -0.0016713722 0.008644409 8.391441e-05 0.0006260123 0.0025185725
## 3 0.0021443804 0.004458362 1.498714e-03 0.0021757596 0.0004056451
## 4 0.0016547121 0.002678760 -5.303867e-04 0.0020453424 0.0003901029
## 5 0.0014596359 0.003360269 1.704527e-03 -0.0007668016 0.0008047144
## 6 0.0022029326 0.001024555 5.783175e-04 0.0012007151 -0.0013209207
## Dim.105 Dim.106
## 1 -0.0015424297 0.0009720011
## 2 -0.0001629268 0.0001308702
## 3 -0.0003446367 0.0005145677
## 4 -0.0004053317 0.0003588643
## 5 -0.0006079400 -0.0010977419
## 6 0.0003508210 -0.0005238898
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6
## 1 0.029621521 0.027412022 0.03598910 0.3672510 0.003158000 0.0854431012
## 2 0.028415700 0.010024108 0.05172001 0.3144810 0.009267863 0.0355935119
## 3 0.027893549 0.014053198 0.07974641 0.1378616 0.071487190 0.0014598971
## 4 0.018290695 0.007405818 0.13928561 0.1821411 0.021241855 0.0037297144
## 5 0.048325941 0.063022334 0.02227206 0.4162404 0.065405095 0.0008667116
## 6 0.004766908 0.032595553 0.26982815 0.0778616 0.009834939 0.0006908627
## Dim.7 Dim.8 Dim.9 Dim.10 Dim.11 Dim.12
## 1 0.0107146433 0.1298453815 0.021377815 0.06876637 0.013314392 0.08934489
## 2 0.0004345471 0.1209406279 0.008172823 0.17369140 0.060604319 0.12402901
## 3 0.0058074758 0.1084574015 0.119381938 0.04017051 0.086838958 0.31847959
## 4 0.0179733057 0.0283345826 0.100991212 0.06981610 0.108050508 0.35284046
## 5 0.0064779473 0.0101340537 0.032091805 0.10090502 0.005061539 0.30632644
## 6 0.0063750520 0.0006390738 0.123310277 0.14701871 0.021932633 0.22062344
## Dim.13 Dim.14 Dim.15 Dim.16 Dim.17 Dim.18
## 1 0.05902025 0.003725772 0.0695021389 0.1087349469 0.21040327 0.16322005
## 2 0.11034717 0.002634891 0.0114720838 0.0166944757 0.16000278 0.57518758
## 3 0.18417061 0.111300082 0.0007127189 0.0265620822 0.05802003 0.35073863
## 4 0.10631948 0.028758313 0.0051122432 0.0003208246 0.01824930 0.04290586
## 5 0.07777374 0.007076283 0.0702568616 0.0668634790 0.15746032 0.03593749
## 6 0.22942434 0.010595489 0.0965082266 0.0033813764 0.03138141 0.11209695
## Dim.19 Dim.20 Dim.21 Dim.22 Dim.23 Dim.24
## 1 0.0073962567 0.042917822 0.05041324 4.893462e-01 0.0585739183 0.15298354
## 2 0.2382789527 0.002148564 0.01152883 1.779480e-02 0.0216827956 0.14242859
## 3 0.2013209997 0.254316140 0.10585226 1.364659e-05 0.0506374431 0.17515501
## 4 0.1803762509 0.012014377 0.01406140 3.773341e-02 0.0116537234 0.08762920
## 5 0.0001145437 0.077662647 0.05660878 4.120148e-04 0.0008659924 0.20971170
## 6 0.0631246710 0.038933007 0.01522525 6.755736e-02 0.0006185690 0.08971431
## Dim.25 Dim.26 Dim.27 Dim.28 Dim.29 Dim.30
## 1 0.047497053 0.53897905 0.07183754 0.022243618 0.08367707 0.0000216342
## 2 0.005415303 0.09748605 0.03809337 0.002716724 0.05368551 0.0090198016
## 3 0.146778547 0.19915193 0.02906229 0.079463823 0.01580891 0.0926936713
## 4 0.025686810 0.23820878 0.06492098 0.025296407 0.02094993 0.0391827090
## 5 0.001245292 0.05489111 0.06585862 0.014331507 0.07627121 0.2736009936
## 6 0.122949093 0.16319277 0.10261263 0.117136411 0.09928206 0.0094146509
## Dim.31 Dim.32 Dim.33 Dim.34 Dim.35 Dim.36
## 1 0.056937138 0.2181150589 0.0001091895 0.0005943353 0.1223311238 0.254172029
## 2 0.009286282 0.0003579209 0.0033422726 0.0236012078 0.0827166759 0.002953323
## 3 0.015667266 0.0151606910 0.1882593961 0.1695689196 0.0024141206 0.010342188
## 4 0.030887347 0.0348653902 0.0546745603 0.1055875695 0.0003325166 0.208080264
## 5 0.271770932 0.1003537172 0.0474131838 0.2112356884 0.0587619596 0.009105051
## 6 0.007934648 0.0611517754 0.0257726983 0.0260472095 0.0260189618 0.031807270
## Dim.37 Dim.38 Dim.39 Dim.40 Dim.41 Dim.42
## 1 0.02944599 0.01626393 6.798910e-02 0.0028618340 1.387883e-05 4.737511e-03
## 2 0.01073297 0.02710711 5.249541e-04 0.0002369661 1.902090e-02 2.712114e-03
## 3 0.03593423 0.07738584 1.089017e-02 0.0480882618 5.486149e-03 2.035589e-05
## 4 0.03022812 0.02085104 2.172747e-03 0.0088334429 2.203236e-02 5.025044e-02
## 5 0.05707061 0.01584086 2.968297e-07 0.0805729007 2.185254e-03 7.618257e-02
## 6 0.10017872 0.02748214 8.047445e-03 0.0750618705 4.316250e-02 4.551954e-02
## Dim.43 Dim.44 Dim.45 Dim.46 Dim.47 Dim.48
## 1 0.014417951 0.0008001454 0.0522744468 0.020306245 0.02297394 8.844444e-03
## 2 0.010640746 0.0017171897 0.1131285350 0.004909927 0.11003288 7.849721e-04
## 3 0.005071832 0.0715306546 0.0869394873 0.134405330 0.14116365 8.998477e-02
## 4 0.021376731 0.1591192727 0.0277203919 0.013823485 0.02986552 9.863250e-02
## 5 0.109391832 0.0549708710 0.0000292249 0.030795120 0.05284813 4.708816e-05
## 6 0.070358094 0.0019316193 0.0001582892 0.094198372 0.01133651 2.409033e-02
## Dim.49 Dim.50 Dim.51 Dim.52 Dim.53 Dim.54
## 1 0.111910064 0.02167478 0.008163512 2.187472e-01 0.1478124173 0.005169165
## 2 0.094645568 0.01519984 0.010674697 5.429274e-03 0.4042478004 0.008621937
## 3 0.061518853 0.12576803 0.044305608 1.335185e-02 0.4215491863 0.141341180
## 4 0.016996662 0.10819141 0.055870618 7.347518e-05 0.5601539636 0.084935970
## 5 0.067011826 0.04449824 0.111962942 2.320335e-02 0.0001650068 0.000218662
## 6 0.003172202 0.06077058 0.030093544 5.249180e-02 0.0045139782 0.034215824
## Dim.55 Dim.56 Dim.57 Dim.58 Dim.59 Dim.60
## 1 0.008228064 0.0054352513 0.12298019 0.0423665069 0.014027453 2.051928e-06
## 2 0.004423355 0.0150202682 0.07362323 0.0355978005 0.015860631 3.021805e-02
## 3 0.087146589 0.0001405287 0.01922269 0.1707112107 0.042309004 5.726461e-02
## 4 0.284562867 0.0207333558 0.07552827 0.0407833635 0.446992007 1.041269e-02
## 5 0.149459342 0.0061091414 0.03514023 0.0005806103 0.004862871 4.193956e-04
## 6 0.016855615 0.1525263769 0.21617289 0.0016739176 0.003523722 1.414856e-02
## Dim.61 Dim.62 Dim.63 Dim.64 Dim.65 Dim.66
## 1 0.02143365 0.085108985 0.03983939 0.0121980634 0.001127995 0.015076701
## 2 0.02710120 0.061270257 0.01745071 0.0222829920 0.520871098 0.133269746
## 3 0.05894096 0.040969184 0.03680342 0.0005884787 0.001042113 0.009241876
## 4 0.12318807 0.007864897 0.07133596 0.0459351254 0.083104569 0.072779770
## 5 0.07696112 0.008555840 0.07540747 0.0232903606 0.044122188 0.004840405
## 6 0.16878539 0.058522936 0.04134576 0.0273514332 0.084594954 0.006258830
## Dim.67 Dim.68 Dim.69 Dim.70 Dim.71 Dim.72
## 1 0.243343694 0.146540030 0.005199095 0.0125276102 0.0306360459 0.0004576029
## 2 0.021925148 0.008884597 0.089304707 0.0001304945 0.0590266577 0.0210244359
## 3 0.149570345 0.004066150 0.179745009 0.1279609637 0.0031778576 0.0922007914
## 4 0.081975758 0.003377926 0.200737383 0.2727919641 0.0661884217 0.0419981635
## 5 0.173193928 0.015202318 0.201637537 0.2988888039 0.0003767021 0.1806594895
## 6 0.006007904 0.133086596 0.036939552 0.1856218050 0.0005239616 0.3675939910
## Dim.73 Dim.74 Dim.75 Dim.76 Dim.77 Dim.78
## 1 0.002010216 0.0578196630 0.0186930754 0.0004976076 0.084096758 0.01051403
## 2 0.038729357 0.1366477482 0.0325585132 0.0021661473 0.073424272 0.02005736
## 3 0.053211306 0.0672176529 0.1012326462 0.0001471485 0.005020925 0.08527088
## 4 0.033162032 0.0105165782 0.0133531135 0.0403214236 0.007404308 0.19011620
## 5 0.010136666 0.0003145874 0.0277077346 0.0010201791 0.005679064 0.04090043
## 6 0.065598858 0.0980678786 0.0002474172 0.0863325336 0.021451021 0.28886022
## Dim.79 Dim.80 Dim.81 Dim.82 Dim.83 Dim.84
## 1 0.040279299 0.02187572 0.1385987019 0.02589904 0.04707656 0.05980834
## 2 0.001868088 0.04675673 0.0739890727 0.14224419 0.03528278 0.08365804
## 3 0.028376268 0.02848398 0.0597431882 0.11713207 0.03701102 0.06438000
## 4 0.125346713 0.09156809 0.0488148828 0.04317894 0.14325679 0.49322053
## 5 0.107636399 0.02009122 0.0030372546 0.03995437 0.15558139 0.35543073
## 6 0.020771163 0.01263144 0.0009376688 0.05963509 0.08807328 0.19981485
## Dim.85 Dim.86 Dim.87 Dim.88 Dim.89 Dim.90
## 1 2.413568e-01 0.010555100 0.403349919 0.0001348716 0.01335148 0.8773365
## 2 2.924077e-01 0.002799263 0.008807805 0.0245419953 0.00220459 0.8588080
## 3 7.066711e-02 0.019899980 0.007505667 0.2461430160 0.12273254 0.6745337
## 4 3.358141e-01 0.003348047 0.142750746 0.0023506587 0.08315461 0.6741614
## 5 1.181723e-06 0.087985096 0.053627784 0.0209842704 0.02769945 0.8411848
## 6 6.343046e-02 0.007624107 0.001392548 0.0135595869 0.05374161 0.7536789
## Dim.91 Dim.92 Dim.93 Dim.94 Dim.95 Dim.96
## 1 0.2297037911 0.17272928 0.539146370 0.5370990 0.2499127 0.6272997
## 2 0.0008956091 0.18932385 0.313434382 0.4219686 0.2595312 0.3577570
## 3 0.0094020909 0.14073889 0.068806064 0.2806695 0.1582702 0.4620998
## 4 0.0236686728 0.27745181 0.053711364 0.6545848 0.1096742 0.3922987
## 5 0.0654590695 0.24250713 0.013398349 0.1754689 0.1696708 0.7913230
## 6 0.6901894149 0.03896099 0.001561221 0.1976286 0.1326192 1.0271572
## Dim.97 Dim.98 Dim.99 Dim.100 Dim.101 Dim.102
## 1 0.3968667740 0.33717280 0.4986823 0.003514738 0.20456625 0.1872896759
## 2 0.6464918989 0.23232617 0.1151577 0.021264740 1.33100583 0.0001702431
## 3 0.2178880021 0.22604220 0.7409954 0.035003976 0.35404627 0.0543042537
## 4 0.1904634101 0.04947542 0.6718730 0.020842922 0.12781354 0.0068011368
## 5 0.0035555307 0.13436337 0.1591023 0.016218205 0.20112105 0.0702431352
## 6 0.0002498044 0.18478006 0.4316809 0.036941637 0.01869736 0.0080859120
## Dim.103 Dim.104 Dim.105 Dim.106
## 1 0.35025199 0.03273453 0.513919315 0.291337865
## 2 0.02256556 0.53093809 0.005734149 0.005281353
## 3 0.27258509 0.01377296 0.025657080 0.081648542
## 4 0.24088641 0.01273777 0.035489944 0.039712196
## 5 0.03385683 0.05420240 0.079837249 0.371589953
## 6 0.08301570 0.14604560 0.026586142 0.084633695
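To make the two tables above easier to read, the sketch below computes the same kinds of quantities by hand with base R. It again uses the built-in `USArrests` data as a stand-in, and it follows the usual textbook definitions; a package may normalize slightly differently (for example by n versus n-1), so the numbers are not guaranteed to match its output to the last digit.

```r
# Illustrative data only: USArrests ships with base R.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Coordinates of the observations on the principal components.
coord <- pca$x

# Contribution (in %) of each observation to each component:
# its squared coordinate relative to the column total.
contrib <- 100 * sweep(coord^2, 2, colSums(coord^2), "/")

# Quality of representation (cos2): squared coordinate relative to the
# observation's squared distance from the origin, using all components.
cos2 <- coord^2 / rowSums(coord^2)

print(head(round(contrib, 3)))
print(head(round(cos2, 3)))
```

Under this definition the contributions in each column add up to 100, so with 1000 observations an evenly spread contribution would be about 0.1% per observation. Values well above that level point to days that weigh heavily on a given component.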
Additionally, the contributions of the individual variables can also be visualized in a more easily interpretable way, as shown below for the first 2 principal components.
# contributions of individual variables to PC
library(gridExtra)
var<-get_pca_var(data.pca)
a<-fviz_contrib(data.pca, "var", axes=1, xtickslab.rt=90)
b<-fviz_contrib(data.pca, "var", axes=2, xtickslab.rt=90)
grid.arrange(a,b,top='Contribution to the first two Principal Components')
Although the visualizations feel very “busy” due to the large number of individual variables, this function can be especially helpful in understanding the composition of the created PCs.
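The link between loadings and variable contributions can be sketched in a few lines of base R. Because each column of loadings returned by `prcomp` has unit length, the squared loadings of a component sum to 1, and multiplying them by 100 gives a percentage contribution per variable. The example uses the built-in `USArrests` data as an assumption for illustration; the "uniform level" is the value every variable would have if all contributed equally, which is the kind of reference line contribution plots commonly draw.

```r
# Illustrative data only: USArrests ships with base R.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Loadings (eigenvectors): one column per component, each of unit length.
loadings <- pca$rotation

# Contribution (in %) of each variable to each component.
var_contrib <- 100 * loadings^2

# Level every variable would reach if all contributed equally.
uniform_level <- 100 / nrow(loadings)

# Variables contributing more than the uniform level to PC1, largest first.
pc1 <- sort(var_contrib[, 1], decreasing = TRUE)
print(round(pc1, 2))
print(names(pc1)[pc1 > uniform_level])
```

With 106 variables, as in this analysis, the uniform level would be 100/106, or roughly 0.94% per variable.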
While using PCA for dimension reduction, it is also possible to transform the results in order to obtain what is often called “rotated PCA”, which offers several benefits compared to the initial version. The process is based on the rotation of the original factors (loadings) in order to achieve a simpler structure that is easier to interpret. The rotated PCA results can simplify the exploration of the influence of each variable on the created factors, and those conclusions can later be used to build “synthetic” variables for future analysis. The statistics of the results for the rotated PCA based on the “varimax” (orthogonal) approach can be viewed below.
##
## Factor analysis with Call: principal(r = data.s, nfactors = 3, rotate = "varimax")
##
## Test of the hypothesis that 3 factors are sufficient.
## The degrees of freedom for the model is 5250 and the objective function was 316.06
## The number of observations was 1000 with Chi Square = 303681.9 with prob < 0
##
## The root mean square of the residuals (RMSA) is 0.11
Moreover, it might also be beneficial to showcase only the substantial loadings, leaving blank those too small to matter for interpretation.
##
## Loadings:
## RC1 RC3 RC2
## priceUSD -0.619
## transactions 0.727 0.610
## transactionvalueUSD -0.650
## top100cap -0.578 -0.459
## transactions3sma 0.883
## transactions7sma 0.946
## transactions14sma 0.946
## transactions30sma 0.937
## transactions90sma 0.873
## transactions3ema 0.878 0.421
## transactions7ema 0.942
## transactions14ema 0.958
## transactions30ema 0.956
## transactions90ema 0.918
## transactions3wma 0.855 0.439
## transactions7wma 0.943
## transactions14wma 0.954
## transactions30wma 0.952
## transactions90wma 0.921
## size 0.434 0.698 0.451
## transactionfeesUSD 0.768
## median_transaction_feeUSD 0.783
## mediantransactionvalueUSD -0.424 0.673
## fee_to_rewardUSD 0.806
## transactions30std 0.543
## transactions90std 0.615
## transactions30var 0.532
## transactions90var 0.605
## size3sma 0.540 0.737
## size7sma 0.572 0.770
## size14sma 0.572 0.775
## size30sma 0.544 0.772
## size90sma 0.410 0.777
## size3ema 0.528 0.756
## size7ema 0.570 0.779
## size14ema 0.577 0.787
## size30ema 0.553 0.792
## size90ema 0.453 0.794
## size3wma 0.517 0.740
## size7wma 0.569 0.769
## size14wma 0.579 0.780
## size30wma 0.569 0.781
## size90wma 0.480 0.787
## size90trx 0.476 0.510
## size3std -0.594
## size7std -0.752
## size14std -0.769
## size30std -0.763
## size90std -0.815
## size7var -0.631
## size14var -0.696
## size30var -0.705
## size90var -0.756
## sentbyaddress 0.714
## activeaddresses 0.479 0.540
## transactions3trx 0.714
## transactions3mom 0.711
## transactions30mom 0.713
## transactions90mom 0.570
## transactions3rsi 0.765
## transactions7rsi 0.807
## transactions14rsi 0.831
## transactions30rsi 0.851
## transactions90rsi 0.846
## transactions3roc 0.689
## transactions30roc 0.687
## transactions90roc 0.556
## size3trx 0.719
## size3mom 0.713
## size30mom 0.653
## size3rsi 0.742
## size7rsi 0.779
## size14rsi 0.780
## size30rsi 0.754
## size90rsi 0.641
## size3roc 0.683
## size30roc 0.626
## difficulty -0.479
## hashrate
## mining_profitability -0.478
## sentinusdUSD -0.477
## confirmationtime 0.418
## transactions7trx
## transactions14trx -0.414
## transactions30trx
## transactions90trx 0.431
## transactions7mom
## transactions14mom
## transactions3std
## transactions7std
## transactions14std 0.421
## transactions3var
## transactions7var
## transactions14var 0.433
## transactions7roc
## transactions14roc
## size7trx
## size14trx
## size30trx 0.454
## size7mom 0.417
## size14mom 0.414
## size90mom 0.435
## size3var -0.456
## size7roc
## size14roc
## size90roc 0.424
##
## RC1 RC3 RC2
## SS loadings 22.775 21.248 16.031
## Proportion Var 0.215 0.200 0.151
## Cumulative Var 0.215 0.415 0.567
Through the application of the rotated PCA algorithm, it is now possible to analyze which variables each of the obtained RCs consists of. Additionally, it is worth observing that some variables (like “hashrate”, “transactions30trx” or “size14trx”) show no loading in any of the 3 displayed RCs, meaning that their loadings are too small to be displayed and that they contribute little to explaining these components.
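The mechanics of an orthogonal rotation can be sketched with the `varimax` function from base R's `stats` package. This is a minimal illustration under stated assumptions, not the call used in this analysis: it uses the built-in `USArrests` data, retains a hypothetical 2 components, and scales the eigenvectors by the component standard deviations before rotating. The printing cutoff of 0.4 is likewise only an example value.

```r
# Illustrative data only: USArrests ships with base R.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

k <- 2  # hypothetical number of components to retain

# Component loadings: eigenvectors scaled by the component standard
# deviations, so entries are correlations between variables and components.
unrotated <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])
dimnames(unrotated) <- list(rownames(pca$rotation), paste0("PC", 1:k))

# Orthogonal varimax rotation from the stats package.
rotated <- varimax(unrotated)$loadings

# Rotation redistributes variance between components but leaves each
# variable's communality (row sum of squared loadings) unchanged.
communality_before <- rowSums(unrotated^2)
communality_after <- rowSums(unclass(rotated)^2)

# Loadings below the cutoff in absolute value are left blank when printing.
print(rotated, cutoff = 0.4)
```

A blank cell in such a printout therefore means "below the cutoff", not "exactly zero", which is how variables can appear without any loading in a table like the one above.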
After applying the PCA algorithm, apart from drawing conclusions about the principal components and the variables that affect their structure, it is also important to assess the quality of the solution, which can be done with 2 additional measures: complexity and uniqueness. The complexity of a variable describes how its loadings are spread across the components: a value close to 1 means that the variable loads on essentially one component, while higher values mean that it is shared between several. In effect, high complexity is undesirable because it makes the components created during the PCA process more difficult to interpret. Therefore, it is important to take a look at the calculated values for each of the variables as well as the visualization of their spread.
## priceUSD transactions size
## 1.300937 1.941516 2.458451
## sentbyaddress difficulty hashrate
## 1.347281 1.101114 1.314606
## mining_profitability sentinusdUSD transactionfeesUSD
## 1.439063 1.593108 1.003374
## median_transaction_feeUSD confirmationtime transactionvalueUSD
## 1.016573 1.665037 1.287575
## mediantransactionvalueUSD activeaddresses top100cap
## 1.881133 1.972638 1.959982
## fee_to_rewardUSD transactions3sma transactions7sma
## 1.262669 1.266287 1.050004
## transactions14sma transactions30sma transactions90sma
## 1.047037 1.094414 1.289239
## transactions3ema transactions7ema transactions14ema
## 1.436888 1.149558 1.077558
## transactions30ema transactions90ema transactions3wma
## 1.075277 1.193590 1.492876
## transactions7wma transactions14wma transactions30wma
## 1.117641 1.070121 1.066333
## transactions90wma transactions3trx transactions7trx
## 1.161239 1.022429 2.344668
## transactions14trx transactions30trx transactions90trx
## 2.243331 2.936866 1.210142
## transactions3mom transactions7mom transactions14mom
## 1.009473 1.545844 2.031853
## transactions30mom transactions90mom transactions3std
## 1.110905 2.007374 2.100434
## transactions7std transactions14std transactions30std
## 1.945391 1.214663 1.056201
## transactions90std transactions3var transactions7var
## 1.408223 2.135820 1.528041
## transactions14var transactions30var transactions90var
## 1.144435 1.051178 1.476603
## transactions3rsi transactions7rsi transactions14rsi
## 1.007430 1.016799 1.059940
## transactions30rsi transactions90rsi transactions3roc
## 1.140254 1.281869 1.025492
## transactions7roc transactions14roc transactions30roc
## 1.409408 2.017190 1.111015
## transactions90roc size3sma size7sma
## 1.877202 1.967403 1.846589
## size14sma size30sma size90sma
## 1.853742 1.829000 1.575368
## size3ema size7ema size14ema
## 2.055588 1.862957 1.833296
## size30ema size90ema size3wma
## 1.806564 1.636280 2.115278
## size7wma size14wma size30wma
## 1.866535 1.845220 1.845540
## size90wma size3trx size7trx
## 1.704087 1.039676 1.661457
## size14trx size30trx size90trx
## 1.891562 1.271004 2.047885
## size3mom size7mom size14mom
## 1.062148 1.090727 1.180622
## size30mom size90mom size3std
## 1.118843 2.208161 1.524624
## size7std size14std size30std
## 1.296188 1.367637 1.385863
## size90std size3var size7var
## 1.170908 1.840990 1.420463
## size14var size30var size90var
## 1.513936 1.581235 1.385783
## size3rsi size7rsi size14rsi
## 1.039317 1.037515 1.057863
## size30rsi size90rsi size3roc
## 1.157272 1.979183 1.062259
## size7roc size14roc size30roc
## 1.065166 1.110975 1.039777
## size90roc
## 1.845790
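For reference, the complexity values listed above can be reproduced by hand from a loadings matrix. The sketch below assumes that `data.pca.rotated` was created with `psych::principal`, which reports Hofmann's index, (Σ a²)² / Σ a⁴, for each variable. The toy loadings matrix and the helper name `hofmann_complexity` are hypothetical and serve only as an illustration.

```r
# Hofmann's complexity index for each variable (row) of a loadings matrix.
# A value of 1 means the variable loads on a single component; a value of k
# means its loadings are spread evenly over k components.
hofmann_complexity <- function(loadings) {
  a2 <- loadings^2
  rowSums(a2)^2 / rowSums(a2^2)
}

# Hypothetical loadings of three variables on three rotated components
toy.loadings <- matrix(
  c(0.9, 0.0, 0.0,   # loads on one component only
    0.6, 0.6, 0.0,   # split evenly over two components
    0.5, 0.5, 0.5),  # split evenly over all three components
  nrow = 3, byrow = TRUE,
  dimnames = list(c("var1", "var2", "var3"), c("RC1", "RC2", "RC3"))
)

toy.complexity <- hofmann_complexity(toy.loadings)
print(round(toy.complexity, 3))
```

The three toy variables yield complexities of 1, 2 and 3 respectively. Under the assumption stated above, applying the helper to `unclass(data.pca.rotated$loadings)` should give values close to `data.pca.rotated$complexity`.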
plot(data.pca.rotated$complexity, pch=".", xlim=c(-20, 110), main="Complexity of factors", xlab=" ", ylab="complexity")
text(data.pca.rotated$complexity, labels=names(data.pca.rotated$complexity), cex=0.8)
In the next step, the same examination can be done for uniqueness, which can be understood as the proportion of variance of each variable that is not explained by the created components. Similarly to complexity, high uniqueness is not desired, because it means that the variable contains information that the principal components were unable to capture efficiently.
## priceUSD transactions size
## 0.55900940 0.09863132 0.12095161
## sentbyaddress difficulty hashrate
## 0.40377351 0.75948097 0.82862431
## mining_profitability sentinusdUSD transactionfeesUSD
## 0.72112520 0.69975579 0.40897090
## median_transaction_feeUSD confirmationtime transactionvalueUSD
## 0.38252822 0.76530710 0.51771423
## mediantransactionvalueUSD activeaddresses top100cap
## 0.33015346 0.47881333 0.44671774
## fee_to_rewardUSD transactions3sma transactions7sma
## 0.26490135 0.11409302 0.08353428
## transactions14sma transactions30sma transactions90sma
## 0.08343551 0.08094767 0.12522770
## transactions3ema transactions7ema transactions14ema
## 0.05219094 0.04516760 0.04658005
## transactions30ema transactions90ema transactions3wma
## 0.05120829 0.07421813 0.07660139
## transactions7wma transactions14wma transactions30wma
## 0.05864940 0.05808460 0.06375678
## transactions90wma transactions3trx transactions7trx
## 0.08195136 0.48404258 0.76885008
## transactions14trx transactions30trx transactions90trx
## 0.71590339 0.72615861 0.79486751
## transactions3mom transactions7mom transactions14mom
## 0.49203628 0.87666750 0.80469455
## transactions30mom transactions90mom transactions3std
## 0.46286881 0.50925069 0.96137520
## transactions7std transactions14std transactions30std
## 0.91807045 0.80413384 0.69738164
## transactions90std transactions3var transactions7var
## 0.54246942 0.96768776 0.90696171
## transactions14var transactions30var transactions90var
## 0.79873246 0.70995063 0.54339654
## transactions3rsi transactions7rsi transactions14rsi
## 0.41232468 0.34392117 0.28949473
## transactions30rsi transactions90rsi transactions3roc
## 0.22474294 0.18493573 0.51948048
## transactions7roc transactions14roc transactions30roc
## 0.88688261 0.82313740 0.50175606
## transactions90roc size3sma size7sma
## 0.55448999 0.13427534 0.08018607
## size14sma size30sma size90sma
## 0.06840191 0.10117688 0.21383984
## size3ema size7ema size14ema
## 0.08354019 0.06023494 0.04740556
## size30ema size90ema size3wma
## 0.06189627 0.15216207 0.10568626
## size7wma size14wma size30wma
## 0.07827041 0.05748333 0.06198925
## size90wma size3trx size7trx
## 0.13758350 0.47210972 0.87323775
## size14trx size30trx size90trx
## 0.88512912 0.76525480 0.50639630
## size3mom size7mom size14mom
## 0.47665409 0.81787951 0.81326655
## size30mom size90mom size3std
## 0.54789308 0.68378794 0.55659667
## size7std size14std size30std
## 0.34963913 0.29550381 0.30021115
## size90std size3var size7var
## 0.27895855 0.70556269 0.51372931
## size14var size30var size90var
## 0.38153223 0.34348092 0.31592572
## size3rsi size7rsi size14rsi
## 0.43876031 0.38133084 0.37381345
## size30rsi size90rsi size3roc
## 0.38743310 0.38543647 0.51839600
## size7roc size14roc size30roc
## 0.84059111 0.84276662 0.59976530
## size90roc
## 0.74287337
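The uniqueness values listed above can be related to the loadings in a similar way. As a minimal sketch, assuming standardized variables (so that each variable has unit variance) and a hypothetical toy loadings matrix, uniqueness is one minus communality, where communality is the sum of a variable's squared loadings across the retained components.

```r
# Hypothetical loadings of three standardized variables on two components
toy.loadings <- matrix(
  c(0.9, 0.3,
    0.6, 0.6,
    0.2, 0.1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("var1", "var2", "var3"), c("RC1", "RC2"))
)

# Communality: share of each variable's variance captured by the components
toy.communality <- rowSums(toy.loadings^2)

# Uniqueness: share of each variable's variance left unexplained
toy.uniqueness <- 1 - toy.communality
print(round(toy.uniqueness, 2))
```

Here `var3` has small loadings on both components, so almost all of its variance (0.95) remains unexplained, which is the situation seen for variables such as `transactions3std` in the output above.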
plot(data.pca.rotated$uniqueness, pch=".", xlim=c(-20, 110), main="Uniqueness of factors", sub="Proportion of variance that is not shared with other variables.
The higher the number, the higher the (undesired) uniqueness", xlab=" ", ylab="uniqueness")
text(data.pca.rotated$uniqueness, labels=names(data.pca.rotated$uniqueness), cex=0.8)
Additionally, both of these measures can be analyzed and visualized together.
plot(data.pca.rotated$complexity, data.pca.rotated$uniqueness, xlim=c(0, 4))
text(data.pca.rotated$complexity, data.pca.rotated$uniqueness, labels=names(data.pca.rotated$uniqueness), cex=0.8)
abline(h=c(0.38, 0.75), lty=3, col=2)
abline(v=c(1.8), lty=3, col=2)
As a result, this examination makes it possible to find the variables with the highest values of both complexity and uniqueness, which therefore perform the worst among all the characteristics.
# variables with the highest complexity and uniqueness
set<-data.frame(complex=data.pca.rotated$complexity, unique=data.pca.rotated$uniqueness)
set.worst<-set[set$complex>1.8 & set$unique>0.78,]
set.worst
## complex unique
## transactions14mom 2.031853 0.8046945
## transactions3std 2.100434 0.9613752
## transactions7std 1.945391 0.9180704
## transactions3var 2.135820 0.9676878
## transactions14roc 2.017190 0.8231374
## size14trx 1.891562 0.8851291
Including the variables displayed above in the PCA reduced the quality of the entire process, which is why it is important to analyze the character of these features and their importance for the analysis. If they are not essential, it might be worth removing them from the dataset in order to achieve higher-quality PCA results.
data.new<-subset(data.s, select = -c(transactions14mom, transactions3std, transactions7std, transactions3var, transactions14roc, size14trx))
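Whether removing such variables actually helps can be checked by repeating the PCA on the reduced data and comparing the results. The following is only a minimal sketch of such a comparison on synthetic data, using base R `prcomp` rather than the rotated PCA applied in this analysis. All variable names and the helper `explained` are hypothetical.

```r
set.seed(123)  # toy data only; this is not the Bitcoin dataset

# Two hypothetical latent factors drive six variables; two more are pure noise
n <- 200
f1 <- rnorm(n)
f2 <- rnorm(n)
toy <- data.frame(
  a1 = f1 + rnorm(n, sd = 0.3), a2 = f1 + rnorm(n, sd = 0.3),
  a3 = f1 + rnorm(n, sd = 0.3), b1 = f2 + rnorm(n, sd = 0.3),
  b2 = f2 + rnorm(n, sd = 0.3), b3 = f2 + rnorm(n, sd = 0.3),
  noise1 = rnorm(n), noise2 = rnorm(n)
)

# Share of total variance captured by the first k principal components
explained <- function(df, k = 2) {
  pca <- prcomp(df, center = TRUE, scale. = TRUE)
  sum(pca$sdev[1:k]^2) / sum(pca$sdev^2)
}

before <- explained(toy)
after  <- explained(subset(toy, select = -c(noise1, noise2)))
print(c(before = before, after = after))
```

In this toy setting the two noise variables play the role of the high-uniqueness variables, and dropping them raises the share of variance captured by the first two components. With the real data, a similar comparison could be made between `data.s` and `data.new`, with `k` set to the number of retained components.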
Finally, the obtained PCA results can be visualized in many different ways, in order to deepen the understanding of the variables and the relationships featured in the dataset.
# unlabeled observations in two dimensions with coloured quality of representation
fviz_pca_ind(data.pca, col.ind="cos2", geom="point", gradient.cols=c("white", "#2E9FDF", "#FC4E07"))
# observations and variable loadings in one plot
autoplot(data.pca, loadings=TRUE, loadings.colour='blue', loadings.label=TRUE, loadings.label.size=3)
High-dimensional datasets can face many problems when it comes to their interpretability and analytical potential, which is why reduction techniques are often necessary. The MDS method can effectively showcase not only the information and dependencies from a dataset in a 2-dimensional space, but also which of the variables influence the resulting dimensions most strongly. PCA, on the other hand, can be used for effective reduction of dimensions with the help of principal components that are formed to explain the variance and maintain the information and the relationships present in the dataset. Both of the showcased methods not only help to handle the problems that are typical for high-dimensional data, but also produce results that can later be used for effective clustering, or even for predictive algorithms with the help of other machine learning techniques.