I used all 82 terms that Christopher sent to re-run the analysis. I processed the titles in Python and generated a cleaned-up table for the CA contingency table and the subsequent analysis.
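library(FactoMineR)  # CA(), dimdesc()
library(factoextra)  # get_eigenvalue(), get_ca_col(), get_ca_row(), fviz_*()
dt_wr <- read.csv("CA_Wratil.csv", stringsAsFactors = FALSE)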
head(dt_wr)
## procedure_ref1 key value
## 1 2009/0026(COD) f_action 1
## 2 2009/0054(COD) f_action 1
## 3 2009/0056(COD) f_action 1
## 4 2010/0044(COD) f_action 1
## 5 2010/0054(COD) f_action 1
## 6 2011/0339(COD) f_action 1
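names(dt_wr)[names(dt_wr) == "procedure_ref1"] <- "id"  # rename the procedure column
dt_wr_tbl <- table(dt_wr$id, dt_wr$key)                 # counts of (procedure, term) pairs: procedures in rows, terms in columns
dt_wr_tbl <- dt_wr_tbl[, colSums(dt_wr_tbl) > 0]        # drop terms that never occur (CA needs non-zero column totals)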
CA_WRT <- CA(dt_wr_tbl) ## first quick and not very pretty plot
summary(CA_WRT)
##
## Call:
## CA(X = dt_wr_tbl)
##
## The chi square of independence between the two variables is equal to 49846.81 (p-value = 1 ).
##
## Eigenvalues
## Dim.1 Dim.2 Dim.3 Dim.4 Dim.5 Dim.6 Dim.7
## Variance 0.421 0.400 0.380 0.372 0.346 0.302 0.287
## % of var. 3.342 3.181 3.022 2.954 2.748 2.397 2.282
## Cumulative % of var. 3.342 6.523 9.545 12.499 15.247 17.644 19.926
## Dim.8 Dim.9 Dim.10 Dim.11 Dim.12 Dim.13 Dim.14
## Variance 0.282 0.280 0.269 0.262 0.261 0.254 0.253
## % of var. 2.238 2.221 2.135 2.079 2.072 2.016 2.007
## Cumulative % of var. 22.164 24.385 26.521 28.600 30.672 32.688 34.694
## Dim.15 Dim.16 Dim.17 Dim.18 Dim.19 Dim.20 Dim.21
## Variance 0.249 0.247 0.237 0.235 0.233 0.224 0.219
## % of var. 1.976 1.962 1.883 1.871 1.850 1.778 1.743
## Cumulative % of var. 36.671 38.633 40.515 42.386 44.236 46.014 47.757
## Dim.22 Dim.23 Dim.24 Dim.25 Dim.26 Dim.27 Dim.28
## Variance 0.211 0.209 0.205 0.199 0.197 0.196 0.193
## % of var. 1.678 1.657 1.631 1.578 1.567 1.559 1.535
## Cumulative % of var. 49.436 51.093 52.724 54.301 55.868 57.427 58.962
## Dim.29 Dim.30 Dim.31 Dim.32 Dim.33 Dim.34 Dim.35
## Variance 0.189 0.186 0.182 0.177 0.174 0.170 0.163
## % of var. 1.504 1.474 1.442 1.406 1.382 1.347 1.299
## Cumulative % of var. 60.466 61.940 63.382 64.788 66.169 67.516 68.815
## Dim.36 Dim.37 Dim.38 Dim.39 Dim.40 Dim.41 Dim.42
## Variance 0.161 0.157 0.155 0.152 0.149 0.145 0.144
## % of var. 1.281 1.248 1.232 1.207 1.185 1.154 1.148
## Cumulative % of var. 70.096 71.344 72.576 73.783 74.968 76.122 77.269
## Dim.43 Dim.44 Dim.45 Dim.46 Dim.47 Dim.48 Dim.49
## Variance 0.138 0.137 0.134 0.132 0.127 0.125 0.123
## % of var. 1.098 1.090 1.066 1.051 1.005 0.990 0.979
## Cumulative % of var. 78.367 79.457 80.523 81.574 82.580 83.569 84.548
## Dim.50 Dim.51 Dim.52 Dim.53 Dim.54 Dim.55 Dim.56
## Variance 0.116 0.111 0.108 0.108 0.105 0.103 0.100
## % of var. 0.921 0.885 0.856 0.856 0.836 0.817 0.797
## Cumulative % of var. 85.469 86.354 87.210 88.066 88.902 89.719 90.516
## Dim.57 Dim.58 Dim.59 Dim.60 Dim.61 Dim.62 Dim.63
## Variance 0.098 0.095 0.091 0.089 0.085 0.084 0.079
## % of var. 0.779 0.755 0.724 0.705 0.676 0.666 0.625
## Cumulative % of var. 91.295 92.050 92.774 93.479 94.155 94.821 95.447
## Dim.64 Dim.65 Dim.66 Dim.67 Dim.68 Dim.69 Dim.70
## Variance 0.076 0.071 0.070 0.067 0.065 0.054 0.048
## % of var. 0.604 0.568 0.555 0.534 0.518 0.432 0.385
## Cumulative % of var. 96.050 96.618 97.174 97.708 98.226 98.657 99.043
## Dim.71 Dim.72 Dim.73 Dim.74 Dim.75
## Variance 0.039 0.031 0.025 0.016 0.010
## % of var. 0.309 0.246 0.198 0.125 0.079
## Cumulative % of var. 99.352 99.598 99.796 99.921 100.000
##
## Rows (the 10 first)
## Iner*1000 Dim.1 ctr cos2 Dim.2 ctr cos2
## 2005/0214(COD) | 31.400 | 0.248 0.018 0.002 | 1.825 1.050 0.134 |
## 2006/0084(COD) | 1.888 | -0.206 0.008 0.017 | -0.237 0.011 0.023 |
## 2006/0167(COD) | 57.732 | -0.249 0.019 0.001 | -0.582 0.107 0.007 |
## 2007/0112(COD) | 56.305 | -0.341 0.028 0.002 | -0.382 0.037 0.003 |
## 2007/0152(COD) | 66.879 | -0.391 0.028 0.002 | -0.430 0.035 0.002 |
## 2007/0229(COD) | 24.136 | 0.707 0.180 0.031 | 0.313 0.037 0.006 |
## 2007/0286(COD) | 41.960 | -1.558 0.583 0.058 | -1.438 0.522 0.050 |
## 2008/0009(COD) | 50.091 | 0.552 0.037 0.003 | 0.502 0.032 0.003 |
## 2008/0028(COD) | 10.291 | 0.006 0.000 0.000 | -0.129 0.007 0.003 |
## 2008/0062(COD) | 28.924 | 0.755 0.137 0.020 | 0.029 0.000 0.000 |
## Dim.3 ctr cos2
## 2005/0214(COD) 0.380 0.048 0.006 |
## 2006/0084(COD) -0.233 0.011 0.022 |
## 2006/0167(COD) -0.663 0.146 0.010 |
## 2007/0112(COD) -0.488 0.063 0.004 |
## 2007/0152(COD) -0.560 0.062 0.004 |
## 2007/0229(COD) 0.126 0.006 0.001 |
## 2007/0286(COD) 3.151 2.637 0.239 |
## 2008/0009(COD) -0.294 0.011 0.001 |
## 2008/0028(COD) -0.121 0.007 0.003 |
## 2008/0062(COD) 0.252 0.017 0.002 |
##
## Columns (the 10 first)
## Iner*1000 Dim.1 ctr cos2 Dim.2 ctr
## f_action | 173.665 | 0.254 0.105 0.003 | 0.024 0.001
## f_activ | 185.322 | 0.481 0.222 0.005 | -0.250 0.063
## f_adopt | 162.613 | 0.028 0.000 0.000 | 0.160 0.013
## f_agenc | 151.050 | -0.054 0.005 0.000 | -0.288 0.146
## f_amend | 120.812 | -0.247 1.507 0.052 | 0.037 0.036
## f_approxim | 139.454 | 3.065 1.692 0.051 | 1.652 0.517
## f_authoris | 155.916 | -0.325 0.070 0.002 | -0.261 0.047
## f_autonom | 215.152 | -1.209 0.526 0.010 | -1.524 0.879
## f_commiss | 149.654 | -0.169 0.060 0.002 | -0.056 0.007
## f_common | 151.157 | -0.039 0.004 0.000 | -0.435 0.465
## cos2 Dim.3 ctr cos2
## f_action 0.000 | -0.562 0.567 0.012 |
## f_activ 0.001 | -0.549 0.320 0.007 |
## f_adopt 0.000 | -0.414 0.091 0.002 |
## f_agenc 0.004 | -0.387 0.278 0.007 |
## f_amend 0.001 | -0.078 0.165 0.005 |
## f_approxim 0.015 | 1.403 0.392 0.011 |
## f_authoris 0.001 | -0.357 0.093 0.002 |
## f_autonom 0.016 | 5.463 11.886 0.210 |
## f_commiss 0.000 | -0.088 0.018 0.000 |
## f_common 0.012 | -0.288 0.215 0.005 |
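The output above suggests that using a substantially larger number of terms/tokens does not improve the quality of the analysis. The eigenvalues are low, and so is the share of the variance (inertia) that each dimension explains: 3.3% for the first dimension and 9.5% cumulatively for the first three. The chi-square test remains as problematic as in the earlier analyses: the statistic (49846.81) looks very large, but it is smaller than its degrees of freedom, which is why p = 1. The test therefore provides no evidence of an association between the procedures (titles) and the terms, and with a table this sparse the chi-square approximation is itself unreliable.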
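Why the p-value is 1 despite the large statistic can be checked directly: under independence the statistic is (asymptotically) chi-square distributed with (rows - 1) × (columns - 1) degrees of freedom, and the mean of that distribution equals its degrees of freedom. A minimal R sketch, with the statistic copied from `summary(CA_WRT)` and the table size (839 procedures × 76 terms) read off the printed results:

```r
# Sketch: why a chi-square of 49846.81 gives p = 1 here.
# chi2 is copied from summary(CA_WRT); the table size is read off the printed
# output (839 rows in the cos2 tibble, 76 terms in CA_WRT$col$contrib).
chi2   <- 49846.81
n_rows <- 839
n_cols <- 76

df <- (n_rows - 1) * (n_cols - 1)                  # degrees of freedom of the test
p_value <- pchisq(chi2, df = df, lower.tail = FALSE)

# Under independence the statistic is expected to be about df, so a value
# below df gives no evidence against independence.
cat("df =", df, "| chi2 / df =", round(chi2 / df, 3), "| p =", p_value, "\n")
```

Note that the table has 839 × 76 cells but only a few thousand counts, so nearly all expected counts are far below 5; the asymptotic approximation behind this p-value should not be over-interpreted in either direction.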
get_eigenvalue(CA_WRT) ## each dimension explains only a very small share of the total variance (inertia)
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 0.420698326 3.34217023 3.342170
## Dim.2 0.400393822 3.18086436 6.523035
## Dim.3 0.380436602 3.02231744 9.545352
## Dim.4 0.371833030 2.95396774 12.499320
## Dim.5 0.345867989 2.74769265 15.247012
## Dim.6 0.301729163 2.39703884 17.644051
## Dim.7 0.287240545 2.28193634 19.925988
## Dim.8 0.281691276 2.23785107 22.163839
## Dim.9 0.279606146 2.22128609 24.385125
## Dim.10 0.268806465 2.13548976 26.520615
## Dim.11 0.261683616 2.07890343 28.599518
## Dim.12 0.260843089 2.07222600 30.671744
## Dim.13 0.253765671 2.01600059 32.687745
## Dim.14 0.252570164 2.00650307 34.694248
## Dim.15 0.248785251 1.97643444 36.670682
## Dim.16 0.246967944 1.96199714 38.632679
## Dim.17 0.236976685 1.88262318 40.515302
## Dim.18 0.235460956 1.87058172 42.385884
## Dim.19 0.232897697 1.85021832 44.236102
## Dim.20 0.223815959 1.77806991 46.014172
## Dim.21 0.219406730 1.74304150 47.757214
## Dim.22 0.211258902 1.67831239 49.435526
## Dim.23 0.208628562 1.65741608 51.092942
## Dim.24 0.205262095 1.63067172 52.723614
## Dim.25 0.198583456 1.57761435 54.301228
## Dim.26 0.197213223 1.56672875 55.867957
## Dim.27 0.196243835 1.55902761 57.426985
## Dim.28 0.193254766 1.53528143 58.962266
## Dim.29 0.189313807 1.50397311 60.466239
## Dim.30 0.185531408 1.47392445 61.940164
## Dim.31 0.181530560 1.44214035 63.382304
## Dim.32 0.176920294 1.40551484 64.787819
## Dim.33 0.173898682 1.38151013 66.169329
## Dim.34 0.169550173 1.34696410 67.516293
## Dim.35 0.163454469 1.29853776 68.814831
## Dim.36 0.161305518 1.28146576 70.096297
## Dim.37 0.157110290 1.24813745 71.344434
## Dim.38 0.155033985 1.23164257 72.576077
## Dim.39 0.151883385 1.20661314 73.782690
## Dim.40 0.149174314 1.18509136 74.967781
## Dim.41 0.145243356 1.15386249 76.121644
## Dim.42 0.144454716 1.14759728 77.269241
## Dim.43 0.138225013 1.09810640 78.367347
## Dim.44 0.137199497 1.08995936 79.457307
## Dim.45 0.134181504 1.06598339 80.523290
## Dim.46 0.132302266 1.05105409 81.574344
## Dim.47 0.126531442 1.00520870 82.579553
## Dim.48 0.124610181 0.98994556 83.569498
## Dim.49 0.123189282 0.97865745 84.548156
## Dim.50 0.115909861 0.92082726 85.468983
## Dim.51 0.111364073 0.88471398 86.353697
## Dim.52 0.107803278 0.85642582 87.210123
## Dim.53 0.107765121 0.85612268 88.066246
## Dim.54 0.105199734 0.83574237 88.901988
## Dim.55 0.102848798 0.81706575 89.719054
## Dim.56 0.100373111 0.79739805 90.516452
## Dim.57 0.098009943 0.77862424 91.295076
## Dim.58 0.095068082 0.75525310 92.050329
## Dim.59 0.091071667 0.72350422 92.773833
## Dim.60 0.088716368 0.70479293 93.478626
## Dim.61 0.085137452 0.67636080 94.154987
## Dim.62 0.083867025 0.66626810 94.821255
## Dim.63 0.078716778 0.62535279 95.446608
## Dim.64 0.075984626 0.60364765 96.050256
## Dim.65 0.071480682 0.56786679 96.618122
## Dim.66 0.069919609 0.55546510 97.173588
## Dim.67 0.067250262 0.53425890 97.707846
## Dim.68 0.065186420 0.51786303 98.225709
## Dim.69 0.054347887 0.43175806 98.657468
## Dim.70 0.048479523 0.38513778 99.042605
## Dim.71 0.038942131 0.30936950 99.351975
## Dim.72 0.030968822 0.24602683 99.598002
## Dim.73 0.024920668 0.19797824 99.795980
## Dim.74 0.015749878 0.12512238 99.921102
## Dim.75 0.009931318 0.07889776 100.000000
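The eigenvalue table can be turned into two quick benchmarks: the total inertia (the sum of all eigenvalues, which equals the chi-square statistic divided by the grand total of the table), and the share each dimension would explain if the inertia were spread evenly over all 75 dimensions. A sketch in R with values copied from the output above:

```r
# Values copied from get_eigenvalue(CA_WRT) and summary(CA_WRT) above.
eig1   <- 0.420698326   # eigenvalue of Dim.1
share1 <- 3.34217023    # % of total inertia explained by Dim.1
chi2   <- 49846.81      # chi-square statistic of the table
n_dims <- 75            # number of dimensions extracted

total_inertia <- eig1 / (share1 / 100)  # sum of all 75 eigenvalues
n_total <- chi2 / total_inertia         # grand total of the table (inertia = chi2 / n)
uniform_share <- 100 / n_dims           # % per dimension if inertia were spread evenly
ratio <- share1 / uniform_share         # how far Dim.1 exceeds that even share

round(c(total_inertia = total_inertia, n_total = n_total,
        uniform_share = uniform_share, ratio = ratio), 3)
```

This puts the grand total at roughly 3,960 procedure-term counts, and Dim.1 at only about 2.5 times the even share. By the common rule of thumb of keeping dimensions whose share exceeds the average (about 1.33%), the first 34 dimensions would all qualify, which shows how flat the eigenvalue spectrum is.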
fviz_eig(CA_WRT, addlabels = TRUE, ylim = c(0, 4)) ## scree plot: share of the total variance (inertia) explained by each dimension; the largest share here is about 3.3%
fviz_ca_biplot(CA_WRT,
map ="rowprincipal", arrow = c(TRUE, TRUE),
repel = FALSE)
## the arrows for the column (red) and row (blue) points would normally help to assess the degree of association, but with this many observations the plot becomes very messy
# check how well each term is represented by the extracted dimensions (see cos2 for the columns), and whether the rows/procedures are well represented by them
col <- get_ca_col(CA_WRT)
col
## Correspondence Analysis - Results for columns
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the columns"
## 2 "$cos2" "Cos2 for the columns"
## 3 "$contrib" "contributions of the columns"
## 4 "$inertia" "Inertia of the columns"
fviz_ca_col(CA_WRT, col.col = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = TRUE)
fviz_cos2(CA_WRT, choice = "col", axes = 1:3, repel = TRUE) # shows which terms are well represented on the first three dimensions of the factor map
# keep in mind that cos2 lies between 0 and 1: the closer to 1, the better the representation
CA_WRT$col$contrib # contributions (in %) of the terms to each dimension (only the first 5 are kept by default, ncp = 5) ==> not great!
## Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
## f_action 1.045678e-01 0.001021118 5.669774e-01 2.411370e-01 5.835872e-01
## f_activ 2.218150e-01 0.063046637 3.199378e-01 9.372564e-02 1.135841e+00
## f_adopt 3.876493e-04 0.012953034 9.103039e-02 8.386947e-03 9.457375e-01
## f_agenc 4.878891e-03 0.146241395 2.782137e-01 9.672416e-06 5.121104e-04
## f_amend 1.507428e+00 0.036011727 1.648247e-01 8.405680e-02 1.154418e+00
## f_approxim 1.691553e+00 0.516513443 3.920853e-01 1.843902e-01 2.084624e-01
## f_authoris 6.993309e-02 0.047177027 9.327304e-02 1.348245e-01 4.617433e-01
## f_autonom 5.263411e-01 0.879239182 1.188597e+01 1.587829e+01 1.841490e+00
## f_commiss 6.005059e-02 0.007022089 1.818646e-02 4.226312e-01 1.285634e+00
## f_common 3.545386e-03 0.465119580 2.153973e-01 5.711882e-04 4.373296e-01
## f_commun 1.625399e-03 0.069113802 2.662919e-01 2.625880e-05 1.195429e+00
## f_control 3.606887e-01 0.376225758 6.862773e-01 1.672614e-03 1.167339e-01
## f_cooper 8.684288e-03 0.177478807 8.713662e-03 3.546487e-01 7.538220e-02
## f_derog 1.369796e-02 0.104212932 1.229090e-01 4.225081e-03 1.687458e-02
## f_develop 4.613040e-01 0.055739900 3.751674e-01 9.060600e-01 6.945935e+00
## f_enforce 1.150599e-01 0.001053861 9.734484e-04 2.930348e-01 3.971540e-01
## f_establish 1.678188e-01 1.336369965 2.397571e+00 2.761219e-02 1.225757e+00
## f_european 2.589329e-01 0.166275657 1.626857e-01 3.510048e-02 2.176201e-01
## f_exchang 2.851787e-01 0.034436121 2.406681e-02 9.444453e-01 1.485401e-01
## f_exempt 8.255591e+00 17.891486331 1.193781e-01 1.923611e-03 3.865884e-01
## f_extend 7.747374e-02 0.139154847 2.483327e-01 1.837792e-01 9.091971e-02
## f_extern 5.161590e+00 12.194284824 1.037108e-01 2.366104e-02 3.103284e-01
## f_framework 3.180102e-01 1.465615512 4.505940e-03 1.442694e+00 2.288472e-01
## f_fund 8.814656e-03 0.045973736 4.496555e-02 7.027633e-01 1.040704e+00
## f_govern 1.085052e-01 0.022129617 4.206580e-02 2.679812e-03 1.461814e-03
## f_grant 8.422860e-05 0.167038212 3.939461e-01 8.712904e-03 8.725265e-03
## f_guidelin 2.834860e-02 0.170060221 2.643674e-01 3.877682e-02 1.087507e-01
## f_harmonis 5.631944e+00 2.150668478 1.403130e+00 1.044937e+00 1.307080e+00
## f_implement 7.473637e-02 0.003851285 1.045982e-01 1.128684e-01 9.906052e-01
## f_includ 1.922374e-03 0.025665428 3.163786e-02 8.119682e-03 1.951298e-02
## f_inform 2.076658e-01 0.148376940 2.394220e-03 1.831953e+00 6.759087e-01
## f_instrument 9.189444e-03 0.073416541 3.515175e-02 4.075825e-01 2.997554e-01
## f_intern 7.850530e-03 0.197674842 1.101252e-01 8.781690e-02 1.361978e+00
## f_introduc 5.021417e-01 1.133744913 1.677997e+01 1.784453e+01 8.287552e-01
## f_introduct 2.844949e-01 0.825693560 1.206011e+01 1.017532e+01 8.018637e-01
## f_joint 2.851800e+00 0.307769240 1.206244e-02 1.262727e+00 1.023821e+01
## f_law 8.629153e+00 2.199736335 1.743510e+00 1.374628e+00 1.069136e+00
## f_lay 2.140028e-01 0.171718145 7.758426e-02 1.707257e-01 2.692949e-01
## f_limit 3.720184e-01 0.300972492 6.330215e-02 4.686438e-02 1.514664e-01
## f_list 8.211884e+00 17.852702284 9.482383e-02 4.302914e-03 4.502535e-01
## f_manag 5.160568e-02 0.004899866 3.665466e-01 1.406637e-01 6.313464e-02
## f_member 1.228611e+01 4.022833926 2.165463e+00 9.162897e-02 1.959189e-03
## f_minimum 3.057500e-01 1.514910233 2.226659e-02 3.603917e-03 6.297447e-02
## f_modifi 1.276094e-03 0.071330965 9.228425e-01 9.117820e-01 1.400994e-01
## f_network 6.079475e-03 0.164864816 4.770974e-01 2.007760e-02 2.207168e-01
## f_organis 7.775429e-02 0.302121255 2.379635e-01 6.827954e-01 1.907243e-02
## f_particip 2.668590e+00 0.199457880 2.029617e-01 2.414142e+00 1.098061e+01
## f_period 1.943767e-01 0.248442208 8.403841e-01 6.509303e-01 1.420733e+00
## f_power 3.027160e-03 0.236819185 2.967407e-02 1.500712e-01 2.281311e+00
## f_prevent 2.557510e+00 3.076911772 1.411328e+01 1.201001e+01 5.288607e+00
## f_procedur 3.572020e-02 0.037934775 4.380587e-04 5.129116e-02 5.179519e-01
## f_process 2.588691e-03 0.201848595 5.978460e-01 5.349657e-03 3.024078e-01
## f_programm 1.782263e-01 0.449366651 1.864236e+00 1.303434e+00 7.756407e+00
## f_promot 6.256627e-02 0.005668846 2.403323e-01 1.026780e-02 2.524445e-01
## f_provis 8.282514e-01 0.041804060 3.088455e-02 2.487687e-01 1.464982e-01
## f_purpos 3.673018e-01 0.772317803 3.073972e+00 5.521288e+00 5.427047e-01
## f_reduc 1.338868e-02 0.002665069 3.354725e-01 5.970929e-02 2.442506e+00
## f_regul 1.141973e+00 0.337565639 5.223595e-01 7.895065e-01 4.352098e-01
## f_relat 5.850355e+00 1.579850057 1.438847e+00 1.526493e+00 2.867209e+00
## f_repeal 2.612649e-01 1.110998390 8.975926e-01 2.825150e-02 5.458994e-04
## f_requir 5.416531e+00 13.867047717 4.955946e-02 1.940523e-03 3.018368e-01
## f_respect 6.632479e-01 0.445978718 9.559843e-02 1.291517e-02 6.963202e-02
## f_rev 3.413577e+00 1.850637944 1.418446e+01 1.189858e+01 5.543768e+00
## f_rule 1.071264e-03 0.263926814 9.110778e-02 6.925713e-03 4.314013e-01
## f_scheme 9.288509e-02 0.569019438 1.976352e-01 1.029886e-01 1.569915e-02
## f_second 7.047494e-01 0.369100273 1.576014e-02 2.138199e-01 1.866936e+00
## f_set 1.848284e-01 0.001262177 2.320085e-01 1.484148e-02 2.564594e-01
## f_standard 5.859831e-02 0.076049989 3.851833e-01 2.359935e-02 8.201443e-01
## f_state 1.213097e+01 4.162177639 2.224316e+00 1.211328e-01 5.585722e-04
## f_statist 4.619860e-02 0.054077005 3.356517e-01 1.106902e-02 7.513767e-01
## f_structur 1.485832e-01 0.394706701 1.143457e-01 1.367876e-02 2.825965e-01
## f_support 6.855596e-03 0.553941556 6.821261e-01 1.940129e-01 3.814743e+00
## f_system 3.633121e-02 0.448706370 4.553624e-02 1.922683e+00 1.663872e-02
## f_time 3.181748e-03 0.015970076 3.273492e-01 7.982995e-02 2.364039e-01
## f_undertak 2.898342e+00 0.493060230 3.176362e-02 1.007611e+00 8.458089e+00
## f_union 4.796212e-01 0.068739543 7.969117e-01 1.362092e+00 2.356285e+00
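To make `contrib` and `cos2` concrete, here is a minimal sketch of correspondence analysis computed from scratch with an SVD in base R, on a small hypothetical 4 × 4 table (`N` and `ca_sketch()` are illustrative names, not part of FactoMineR). It follows the textbook definitions: a row's contribution to a dimension is its mass times its squared coordinate divided by the eigenvalue (in %), and its cos2 is its squared coordinate divided by its squared chi-square distance to the average profile. It also shows why empty columns have to be dropped first: the column masses appear in denominators.

```r
# Sketch of correspondence analysis via SVD, using textbook definitions.
# `ca_sketch()` and the toy table `N` are hypothetical, for illustration only.
ca_sketch <- function(N) {
  P  <- N / sum(N)                  # correspondence matrix
  r  <- rowSums(P)                  # row masses
  cm <- colSums(P)                  # column masses (an empty column would give 1 / 0 below)
  # standardised residuals: D_r^(-1/2) (P - r c') D_c^(-1/2)
  S  <- diag(1 / sqrt(r)) %*% (P - outer(r, cm)) %*% diag(1 / sqrt(cm))
  k  <- min(dim(N)) - 1             # number of non-trivial dimensions
  s  <- svd(S, nu = k, nv = k)
  d  <- s$d[seq_len(k)]
  eig <- d^2                        # eigenvalues (principal inertias)

  # principal coordinates of the rows
  row_coord <- diag(1 / sqrt(r)) %*% s$u %*% diag(d, nrow = k)

  # squared chi-square distance of each row profile to the average profile
  dev <- sweep(P / r, 2, cm)
  d2  <- rowSums(sweep(dev^2, 2, cm, "/"))

  list(
    eig         = eig,
    row_contrib = 100 * sweep(r * row_coord^2, 2, eig, "/"),  # % of each dimension's inertia
    row_cos2    = row_coord^2 / d2                            # quality of representation
  )
}

N <- matrix(c(5, 1, 0, 2,
              1, 4, 2, 0,
              0, 2, 6, 1,
              3, 0, 1, 4),
            nrow = 4, byrow = TRUE,
            dimnames = list(paste0("proc", 1:4), paste0("f_", letters[1:4])))

res <- ca_sketch(N)
res$eig / sum(res$eig)    # share of inertia per dimension
round(res$row_cos2, 3)    # each row sums to 1 across all k dimensions
```

Each column of `row_contrib` sums to 100, which is why a few terms with double-digit contributions can dominate a dimension. If FactoMineR is installed, `CA(N, graph = FALSE)$row$contrib` and `$row$cos2` should agree with these values (coordinates may differ in sign, which affects neither quantity).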
fviz_contrib(CA_WRT, choice = "col", axes = 1:3, top = 20) ## looking at the first three dimensions only, as there is little point in going further, and at the top 20 terms only. The contributions barely change whether one considers one or three dimensions (very minor differences; see the next plot for dimension 1)
fviz_contrib(CA_WRT, choice = "col", axes = 1, top = 20) # first dimension, top 20 terms
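The dashed reference line in these contribution plots marks, as I understand factoextra's convention, the contribution each term would have if all terms contributed equally, i.e. 100 / (number of terms). A small R sketch with a subset of the Dim 1 values copied from `CA_WRT$col$contrib` above (the full table has 76 terms):

```r
# Subset of Dim 1 contributions (%) copied from CA_WRT$col$contrib above.
contrib_dim1 <- c(f_member = 12.28611, f_state = 12.13097, f_law = 8.629153,
                  f_exempt = 8.255591, f_list = 8.211884, f_relat = 5.850355,
                  f_harmonis = 5.631944, f_requir = 5.416531, f_extern = 5.161590,
                  f_amend = 1.507428, f_action = 0.1045678)
n_terms <- 76                      # number of term columns in dt_wr_tbl
threshold <- 100 / n_terms         # each term's share if contributions were uniform

above <- sort(contrib_dim1[contrib_dim1 > threshold], decreasing = TRUE)
above
```

On the full column, 16 of the 76 terms exceed this line on Dim 1, and the nine largest alone (f_member, f_state, f_law, f_exempt, f_list, f_relat, f_harmonis, f_requir, f_extern) account for roughly 72% of the dimension.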
cal <- dimdesc(CA_WRT, axes = c(1, 2))
cal[[1]]$col ## The terms that were used in the narrower analyses (e.g. CA with 5, 7 or 11 terms) have similar positions along the coordinates; for instance, 'harmonis' again has a large positive coordinate on the first dimension (the x axis). This is encouraging, but it should be treated with caution given the overall weakness of the analysis.
## coord
## f_exempt -3.53598753
## f_list -3.24401400
## f_extern -1.93357409
## f_rev -1.77746719
## f_prevent -1.63185530
## f_requir -1.54099706
## f_autonom -1.20890172
## f_minimum -0.75230673
## f_introduct -0.68844661
## f_purpos -0.68607697
## f_introduc -0.68172603
## f_control -0.56237043
## f_extend -0.50807196
## f_derog -0.47770630
## f_set -0.45307672
## f_structur -0.40623016
## f_period -0.39268605
## f_framework -0.36857106
## f_guidelin -0.34361239
## f_organis -0.32855232
## f_authoris -0.32544530
## f_promot -0.30782705
## f_standard -0.29790595
## f_scheme -0.29320389
## f_lay -0.27839652
## f_amend -0.24688953
## f_statist -0.18292938
## f_regul -0.18080038
## f_commiss -0.16906651
## f_repeal -0.14789279
## f_establish -0.14081332
## f_reduc -0.13633621
## f_system -0.11864206
## f_modifi -0.08418093
## f_network -0.08217134
## f_european -0.07170437
## f_process -0.06567092
## f_support -0.06503898
## f_time -0.06385490
## f_intern -0.05790955
## f_power -0.05614236
## f_agenc -0.05387838
## f_common -0.03891638
## f_commun -0.02142333
## f_rule -0.01870665
## f_grant -0.01248652
## f_adopt 0.02841237
## f_fund 0.05314152
## f_instrument 0.07143597
## f_cooper 0.07188215
## f_includ 0.10332165
## f_procedur 0.14578435
## f_manag 0.16928603
## f_implement 0.18101193
## f_union 0.24062604
## f_action 0.25400962
## f_programm 0.25977209
## f_inform 0.27727403
## f_enforce 0.30212414
## f_govern 0.44816407
## f_activ 0.48058300
## f_develop 0.55444263
## f_provis 0.56647382
## f_exchang 0.62921766
## f_respect 0.69311775
## f_limit 0.78725456
## f_relat 1.12506792
## f_state 1.56990688
## f_member 1.59954008
## f_undertak 1.85713634
## f_law 1.89577624
## f_particip 2.22255572
## f_harmonis 2.28310498
## f_second 2.42289927
## f_joint 2.60521471
## f_approxim 3.06489151
head(CA_WRT$row$contrib)
## Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
## 2005/0214(COD) 0.018466274 1.04995905 0.047843016 0.001507762 0.015595719
## 2006/0084(COD) 0.007624735 0.01060780 0.010804060 0.000882991 0.001683487
## 2006/0167(COD) 0.018678534 0.10675882 0.146046256 0.024773594 0.143970878
## 2007/0112(COD) 0.027910845 0.03679312 0.063120613 0.018308481 0.010594502
## 2007/0152(COD) 0.027512372 0.03498206 0.062342535 0.049514333 0.008050663
## 2007/0229(COD) 0.180036979 0.03711942 0.006368583 0.002410096 0.070025689
tail(CA_WRT$row$contrib)
## Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
## 2020/0069(COD) 7.956638e-03 2.481937e-02 0.0268685356 0.0023492564 0.09185302
## 2020/0071(COD) 2.406152e-02 1.072409e-02 0.0479650458 0.0347237807 0.03518268
## 2020/0075(COD) 4.427837e-05 2.580831e-02 0.0437260566 0.0903548272 0.43312971
## 2020/0099(COD) 1.057420e-03 7.666496e-04 0.0549145972 0.0089783489 0.01100579
## 2020/0113(COD) 1.605031e-04 3.501055e-06 0.0009037658 0.0002295717 0.01731057
## 2020/0128(COD) 3.947967e-01 5.130587e-01 2.6911142935 2.6379561615 0.86837456
row1 <- get_ca_row(CA_WRT)
row1
## Correspondence Analysis - Results for rows
## ===================================================
## Name Description
## 1 "$coord" "Coordinates for the rows"
## 2 "$cos2" "Cos2 for the rows"
## 3 "$contrib" "contributions of the rows"
## 4 "$inertia" "Inertia of the rows"
# plot the rows/procedures, coloured by their contribution
fviz_ca_row(CA_WRT, col.row = "contrib",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel = FALSE)
fviz_ca_row(CA_WRT, col.row = "cos2",
gradient.cols = c("#00AFBB", "#E7B800", "#FC4E07"),
repel =FALSE)
head(CA_WRT$row$cos2, 20) # this could be plotted as well, but with this many observations the plot becomes unreadable
## Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
## 2005/0214(COD) 2.474106e-03 1.338838e-01 0.0057965424 1.785456e-04 0.0017178449
## 2006/0084(COD) 1.699449e-02 2.250221e-02 0.0217761867 1.739469e-03 0.0030848396
## 2006/0167(COD) 1.361126e-03 7.404160e-03 0.0096240399 1.595591e-03 0.0086252107
## 2007/0112(COD) 2.085438e-03 2.616421e-03 0.0042648829 1.209077e-03 0.0006507955
## 2007/0152(COD) 1.730646e-03 2.094315e-03 0.0035463056 2.752886e-03 0.0004163432
## 2007/0229(COD) 3.138058e-02 6.157679e-03 0.0010038148 3.712879e-04 0.0100345109
## 2007/0286(COD) 5.847025e-02 4.977031e-02 0.2390459806 1.648894e-01 0.0638742745
## 2008/0009(COD) 3.067789e-03 2.536475e-03 0.0008724071 5.603665e-04 0.0011421051
## 2008/0028(COD) 6.253042e-06 2.875963e-03 0.0025050330 1.624094e-02 0.0303144688
## 2008/0062(COD) 1.992833e-02 2.853791e-05 0.0022199959 3.286652e-02 0.0169541586
## 2008/0098(COD) 1.957680e-02 3.734944e-03 0.0020465616 9.270256e-03 0.0190547017
## 2008/0142(COD) 3.285403e-03 2.109746e-03 0.0020641958 4.453634e-04 0.0027612159
## 2008/0147(COD) 4.636681e-03 4.097779e-03 0.0021611000 1.311987e-05 0.0006842555
## 2008/0157(COD) 2.723746e-02 1.338503e-02 0.0073688074 1.005454e-02 0.0478934805
## 2008/0183(COD) 4.704701e-03 3.060092e-03 0.0229016757 2.882696e-02 0.0013758793
## 2008/0192(COD) 6.153331e-04 3.457522e-03 0.0074503173 7.605986e-04 0.0095567646
## 2008/0196(COD) 2.050087e-02 9.906483e-03 0.0160379513 1.225571e-04 0.0071261991
## 2008/0198(COD) 7.560825e-03 4.403027e-03 0.0032251929 1.387799e-04 0.0064314471
## 2008/0211(COD) 4.470334e-03 8.620708e-03 0.0289360009 5.399474e-02 0.0045943458
## 2008/0222(COD) 5.193406e-03 1.055831e-05 0.0004164456 1.186366e-02 0.0296619754
tail(CA_WRT$row$cos2, 20 )
## Dim 1 Dim 2 Dim 3 Dim 4 Dim 5
## 2019/0107(COD) 1.119704e-04 6.635492e-06 1.940041e-03 0.0048923597 4.085620e-02
## 2019/0108(COD) 9.299831e-06 2.192519e-07 1.560442e-03 0.0053435847 3.611096e-02
## 2019/0179(COD) 9.765435e-03 3.869573e-03 1.424066e-02 0.0111596416 5.479587e-03
## 2019/0180(COD) 5.823129e-03 1.669762e-03 5.791222e-04 0.0147210221 9.901825e-04
## 2019/0192(COD) 3.285403e-03 2.109746e-03 2.064196e-03 0.0004453634 2.761216e-03
## 2020/0043(COD) 8.840948e-02 2.296310e-02 2.007650e-02 0.0115717890 1.082061e-03
## 2020/0054(COD) 5.136736e-03 5.738882e-03 1.746620e-03 0.0024295074 3.915235e-03
## 2020/0058(COD) 1.357262e-02 2.080720e-02 2.587185e-01 0.2750537647 2.283554e-02
## 2020/0059(COD) 4.884862e-02 2.708800e-03 1.331776e-02 0.0108197851 3.031428e-02
## 2020/0060(COD) 1.100056e-04 2.283742e-06 5.601430e-04 0.0001390679 9.753993e-03
## 2020/0065(COD) 3.285403e-03 2.109746e-03 2.064196e-03 0.0004453634 2.761216e-03
## 2020/0066(COD) 4.884862e-02 2.708800e-03 1.331776e-02 0.0108197851 3.031428e-02
## 2020/0067(COD) 8.024978e-06 2.982809e-04 4.057852e-05 0.0003003962 7.203513e-05
## 2020/0068(COD) 5.969770e-03 4.769191e-03 9.064545e-03 0.0001452081 2.672095e-04
## 2020/0069(COD) 3.436439e-03 1.020203e-02 1.049384e-02 0.0008967816 3.261461e-02
## 2020/0071(COD) 6.201916e-03 2.630752e-03 1.117993e-02 0.0079105544 7.455406e-03
## 2020/0075(COD) 1.208107e-05 6.701776e-03 1.078861e-02 0.0217892472 9.715636e-02
## 2020/0099(COD) 1.139139e-04 7.860367e-05 5.349690e-03 0.0008548757 9.747432e-04
## 2020/0113(COD) 1.100056e-04 2.283742e-06 5.601430e-04 0.0001390679 9.753993e-03
## 2020/0128(COD) 1.885457e-02 2.331990e-02 1.162215e-01 0.1113493693 3.409492e-02
fviz_contrib(CA_WRT, choice = "row", axes = 1, top = 20, sort.val = 'desc') ## the 20 procedures contributing most to dimension 1
fviz_cos2(CA_WRT, choice = "row", axes = 1, top = 45, sort.val = 'desc') # the 45 procedures best represented by dimension 1 ==> this assumes the dimension itself is meaningful
library(dplyr)    # %>%, select(), arrange(), desc()
library(tibble)   # rownames_to_column(), as_tibble()
library(stringr)  # str_replace_all()
dims <- as.data.frame(CA_WRT$row$cos2) %>%
  rownames_to_column("rname") %>%   # keep the procedure IDs as a column
  as_tibble()
names(dims) <- str_replace_all(names(dims), c(" " = "_", "," = ""))  # "Dim 1" -> "Dim_1"
dims2 <- dims %>%
  select(rname, Dim_1) %>%
  arrange(desc(Dim_1))              # best-represented procedures on dimension 1 first
print(dims2, n = 40)
## # A tibble: 839 x 2
## rname Dim_1
## <chr> <dbl>
## 1 2011/0349(COD) 0.384
## 2 2011/0350(COD) 0.384
## 3 2011/0351(COD) 0.384
## 4 2011/0354(COD) 0.384
## 5 2011/0358(COD) 0.384
## 6 2013/0221(COD) 0.384
## 7 2012/0283(COD) 0.365
## 8 2016/0142(COD) 0.334
## 9 2011/0356(COD) 0.316
## 10 2011/0352(COD) 0.307
## 11 2011/0353(COD) 0.307
## 12 2010/0137(COD) 0.301
## 13 2010/0192(COD) 0.301
## 14 2011/0138(COD) 0.301
## 15 2012/0309(COD) 0.301
## 16 2013/0415(COD) 0.301
## 17 2016/0075(COD) 0.301
## 18 2016/0125(COD) 0.301
## 19 2018/0066(COD) 0.286
## 20 2018/0390(COD) 0.283
## 21 2013/0162(COD) 0.272
## 22 2017/0354(COD) 0.269
## 23 2011/0357(COD) 0.245
## 24 2010/0204(COD) 0.217
## 25 2009/0005(COD) 0.194
## 26 2009/0056(COD) 0.189
## 27 2009/0169(COD) 0.176
## 28 2016/0151(COD) 0.176
## 29 2016/0325(COD) 0.174
## 30 2013/0242(COD) 0.174
## 31 2013/0233(COD) 0.168
## 32 2010/0197(COD) 0.163
## 33 2013/0232(COD) 0.161
## 34 2015/0283(COD) 0.141
## 35 2011/0209(COD) 0.134
## 36 2011/0211(COD) 0.134
## 37 2012/0366(COD) 0.131
## 38 2013/0089(COD) 0.130
## 39 2014/0206(COD) 0.129
## 40 2011/0283(COD) 0.129
## # ... with 799 more rows
print(tail(dims2, n=40), n=40)
## # A tibble: 40 x 2
## rname Dim_1
## <chr> <dbl>
## 1 2012/0169(COD) 1.61e- 5
## 2 2018/0140(COD) 1.61e- 5
## 3 2010/0195(COD) 1.57e- 5
## 4 2010/0301(COD) 1.57e- 5
## 5 2016/0186(COD) 1.53e- 5
## 6 2013/0092(COD) 1.47e- 5
## 7 2017/0225(COD) 1.38e- 5
## 8 2020/0075(COD) 1.21e- 5
## 9 2012/0179(COD) 1.17e- 5
## 10 2010/0110(COD) 1.07e- 5
## 11 2012/0217(COD) 1.07e- 5
## 12 2012/0278(COD) 1.07e- 5
## 13 2013/0438(COD) 1.07e- 5
## 14 2013/0439(COD) 1.07e- 5
## 15 2014/0165(COD) 1.07e- 5
## 16 2014/0272(COD) 1.07e- 5
## 17 2014/0276(COD) 1.07e- 5
## 18 2015/0906(COD) 1.07e- 5
## 19 2016/0345(COD) 1.07e- 5
## 20 2019/0040(COD) 1.07e- 5
## 21 2019/0108(COD) 9.30e- 6
## 22 2011/0051(COD) 9.23e- 6
## 23 2013/0279(COD) 9.01e- 6
## 24 2020/0067(COD) 8.02e- 6
## 25 2014/0256(COD) 7.02e- 6
## 26 2011/0428(COD) 6.87e- 6
## 27 2008/0028(COD) 6.25e- 6
## 28 2009/0165(COD) 3.86e- 6
## 29 2010/0004(COD) 3.41e- 6
## 30 2016/0257(COD) 2.16e- 6
## 31 2011/0203(COD) 1.78e- 6
## 32 2010/0275(COD) 1.62e- 6
## 33 2011/0294(COD) 1.62e- 6
## 34 2009/0070(COD) 1.36e- 6
## 35 2012/0364(COD) 1.13e- 6
## 36 2010/0041(COD) 8.13e- 7
## 37 2010/0101(COD) 3.63e- 7
## 38 2013/0265(COD) 3.61e- 8
## 39 2011/0093(COD) 1.49e- 9
## 40 2013/0202(COD) 5.29e-10
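Several procedures share exactly the same cos2 value (six at 0.384 at the top of the list, and eleven at 1.07e-5 near the bottom). In CA, rows with identical profiles (the same relative term frequencies) get identical coordinates and cos2, so these ties plausibly come from titles that map to the same set of terms; this is an assumption worth checking. A minimal base-R sketch that groups the rows of a contingency table by their profile, shown on a hypothetical toy table (`tbl` stands in for `dt_wr_tbl`):

```r
# Hypothetical toy table standing in for dt_wr_tbl (procedures x terms).
tbl <- matrix(c(1, 1, 0,
                1, 1, 0,
                0, 1, 1,
                2, 2, 0),
              nrow = 4, byrow = TRUE,
              dimnames = list(c("p1", "p2", "p3", "p4"), c("f_a", "f_b", "f_c")))

profiles <- prop.table(tbl, margin = 1)                        # each row rescaled to sum to 1
key <- apply(round(profiles, 10), 1, paste, collapse = "|")    # one string per profile
groups <- split(rownames(tbl), key)                            # procedures sharing a profile
dups <- groups[lengths(groups) > 1]
dups
```

Applied to `dt_wr_tbl`, the same lines would list which procedures are indistinguishable to the CA. Such procedures can still differ in contribution when their totals (masses) differ, as p1 and p4 do here.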