University of Warsaw
Despite the guarantee of various freedoms, our lives are subject to certain pre-imposed limitations, assumptions and rules that we are highly unlikely to influence. One such limitation is the law, which surrounds us every day: it sets moral boundaries, orders, prohibits and advises. The situation changes when the law is broken. The “favorable” roles it plays for us then give way to punishments and sanctions for disobeying it. A difficult and troubling question is why such situations occur at all. An unambiguous and truthful answer would undoubtedly turn the world upside down: just try to imagine a world free of crime. Such a world is ruled out by evolutionary models (e.g. the evolution of trust), which show how little it takes to introduce mutual dishonesty into a society. Still, it is certainly possible to reduce disorderly behavior, as the differences in crime rates between countries and continents show.
What, then, could be the reason for breaking the law? To answer this question, we can look at the other side of the issue: why do people obey the laws that apply to them? On the one hand, they are averse to the risk of being punished for their actions. On the other hand, and more importantly for this analysis, they perceive the legal system as an authority and consequently respect and trust it. The relationship between trust in the legal system and behavior in accordance with its rules is especially noticeable today, given the situation in Ukraine and Russia. According to the article Trust in Justice: Topline Results from Round 5 of the European Social Survey, existing evidence suggests that perceptions of legality may be a stronger predictor of behavior in accordance with the law than perceptions of deterrence risk.
With that in mind, this paper investigates whether there are any relationships between trust in the legal system and other factors included in the European Social Survey data (2016 round).
library(readr)
library(knitr)
library(clusterSim)
library(psych)
library(factoextra)
library(gridExtra)
library(corrplot)
library(FactoMineR)
library(fpc)
library(stringr)
library(hopkins)
The data were gathered by the European Social Survey (ESS) in 2016 from randomly chosen respondents in 23 countries. The raw dataset consists of 534 columns and 44 387 observations. For the purpose of this analysis, only numerical variables with valid responses were taken into consideration. During cleaning, many observations were removed and some variables were transformed:
essdta <- read.csv("ESS8e02.1_F1.csv", header=TRUE, sep=";")
essdtanew <- data.frame(essdta[,c("trstlgl","ctzcntr","dscrgrp","hinctnta","gndr","eduyrs","trstplc","trstprl","trstplt","agea","psppsgva","actrolga","cptppola","psppipla","nwspol","netustm","cntry","ppltrst","pplfair","pplhlp","polintr","trstprt","trstep","trstun","vote","contplt","wrkprty","wrkorg","badge","sgnptit","pbldmn","bctprd","pstplonl","clsprty","prtdgcl","lrscale","stflife","stfeco","stfgov", "stfdem","stfedu","stfhlth","gincdif")])
countries_lbls <- essdtanew[,17]
rm(essdta)
essdtanew <- na.omit(essdtanew)
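# Drop rows containing ESS non-response codes (refusal, don't know, no answer, not applicable)
for (i in 1:ncol(essdtanew)){
  # 0-10 scales, income, years of education, age: codes 77, 88, 99, 999
  if(i %in% c(1,4,6,7,8,9,10,18,19,20,22:24,36:42)) {essdtanew <- essdtanew[essdtanew[,i]!="99"&essdtanew[,i]!="88"&essdtanew[,i]!="77"&essdtanew[,i]!="999",]}
  # short categorical scales: codes 7, 8, 9
  if(i %in% c(2,3,5,11:14,21,25:34,43)) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9",]}
  # vote (column 25): additionally drop code 3 (not eligible to vote)
  if(i == 25) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9"&essdtanew[,i]!="3",]}
  # variables measured in minutes (and the country column): codes 6666, 7777, 8888, 9999
  if(i %in% c(15,16,17)) {essdtanew <- essdtanew[essdtanew[,i]!="7777"&essdtanew[,i]!="8888"&essdtanew[,i]!="9999"&essdtanew[,i]!="6666",]}
  # prtdgcl (column 35): additionally drop code 6 (not applicable)
  if(i == 35) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9"&essdtanew[,i]!="6",]}
}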
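# Keep the original years of education, then recode eduyrs as a binary indicator (1 = more than 13 years)
essdtanew$eduyrs2 <- essdtanew$eduyrs
essdtanew$eduyrs <- ifelse(essdtanew$eduyrs>13,1,0)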
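The row filter above can also be written without positional column indices. The following is a minimal sketch, not the code used for this report: the helper `drop_codes()` and the toy data frame are hypothetical, and the codes passed in are only those already listed in the loop above.

```r
# Hypothetical helper: keep only rows where none of the given columns
# contains one of the listed non-response codes.
drop_codes <- function(df, cols, codes) {
  bad <- Reduce(`|`, lapply(df[cols], function(x) x %in% codes))
  df[!bad, , drop = FALSE]
}

# Toy data (made up for illustration; these are not ESS records).
toy <- data.frame(trstlgl = c(7, 88, 5, 10),
                  vote    = c(1, 2, 3, 1),
                  nwspol  = c(30, 60, 90, 7777))

clean <- drop_codes(toy, "trstlgl", c(77, 88, 99))
clean <- drop_codes(clean, "vote", c(3, 7, 8, 9))
clean <- drop_codes(clean, "nwspol", c(6666, 7777, 8888, 9999))
clean   # only the first row survives all three filters
```

Referring to columns by name rather than by position keeps such a filter valid when columns are later added, removed or reordered.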
The final dataset has 41 columns and 9367 rows.
Ordinal data on scale 0-10 (type = discrete, format = numeric):
Dependent variable:
trstlgl - trust in the legal system (0: complete distrust, 10: complete trust)
Analogously:
trstplc (trust in police),
trstprl (trust in parliament),
trstplt (trust in politicians),
ppltrst (trust in people),
pplfair (people are fair),
pplhlp (people help),
trstprt (trust in political parties),
trstep (trust in European Parliament),
trstun (trust in United Nations),
lrscale (political views: 0 = left, 10 = right),
stflife (0 = not satisfied with life, 10 = satisfied),
stfeco (satisfied with economy),
stfgov (satisfied with government),
stfdem (satisfied with democracy),
stfedu (satisfied with the state of education in the country),
stfhlth (satisfied with the state of health services in the country)
Other variables:
ctzcntr (citizen of country),
dscrgrp (have been discriminated),
hinctnta (income amount),
gndr (gender),
eduyrs (education level, recoded from years of education: 1 = more than 13 years, 0 = otherwise),
agea (age),
psppsgva (Political system allows people to have a say),
actrolga (can take active role in politics),
cptppola (Confident in own ability to participate in politics),
psppipla (Political system allows people to have influence on politics),
nwspol (News about politics and current affairs: watching, reading or listening, in minutes),
netustm (Internet use, how much time on a typical day, in minutes),
cntry (country),
polintr (how interested in politics: 1 = very interested, 4 = not at all interested),
vote (voted),
contplt (Contacted politician or government official last 12 months),
wrkprty (worked for a political party),
wrkorg (worked for another organisation),
badge (Worn or displayed campaign badge/sticker last 12 months),
sgnptit (signed a petition last 12 months),
pbldmn (took part in a public demonstration last 12 months),
bctprd (Boycotted certain products last 12 months),
pstplonl (posted or shared anything about politics online last 12 months),
clsprty (Feel closer to a particular party than all other parties),
prtdgcl (How close to party),
gincdif (Gov should reduce differences in income levels)
head(essdtanew,3)
## trstlgl ctzcntr dscrgrp hinctnta gndr eduyrs trstplc trstprl trstplt agea
## 5 7 2 2 2 2 0 10 7 6 20
## 10 8 1 2 4 2 1 7 7 4 41
## 12 5 1 2 3 1 0 7 0 0 61
## psppsgva actrolga cptppola psppipla nwspol netustm ppltrst pplfair pplhlp
## 5 3 3 1 4 30 180 5 5 7
## 10 3 2 2 3 60 120 5 3 4
## 12 2 3 3 2 90 90 3 3 2
## polintr trstprt trstep trstun vote contplt wrkprty wrkorg badge sgnptit
## 5 3 7 9 10 2 2 2 2 2 2
## 10 2 4 1 0 1 2 2 2 2 1
## 12 1 0 0 0 1 2 2 2 2 1
## pbldmn bctprd pstplonl prtdgcl lrscale stflife stfeco stfgov stfdem stfedu
## 5 2 2 2 1 5 10 7 5 9 10
## 10 1 2 1 2 5 7 5 3 7 8
## 12 2 2 1 2 8 7 5 0 3 4
## stfhlth gincdif
## 5 10 1
## 10 8 1
## 12 5 4
corr_all = cor(essdtanew, method='pearson')
corrplot(corr_all)
As we can see, many variables are only weakly correlated with the rest. They would most likely complicate both the dimension reduction and the interpretation of the results, so it is reasonable to exclude them from the analysis. The excluded variables are ctzcntr to gndr, agea, nwspol, netustm, vote to lrscale, and gincdif. The remaining 22 variables were normalized to form a new dataset for the rest of the analysis.
essdtanew.n<-data.Normalization(essdtanew[,c(1,6:9,11:14,17:23,35:40)], type="n1",normalization="column")
essdtanew.n.cor<-cor(essdtanew.n, method="pearson")
corrplot(essdtanew.n.cor, order ="alphabet", tl.cex=0.6)
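In clusterSim, normalization type "n1" is documented as standardization, (x - mean) / sd, applied here column by column. A minimal base R sketch of the same transformation, using a made-up vector `x` (an assumption for illustration, not ESS data):

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # toy values, not ESS data
z <- (x - mean(x)) / sd(x)       # standardization: zero mean, unit standard deviation

c(mean = mean(z), sd = sd(z))

# Base R's scale() produces the same values for a single column
all.equal(z, as.numeric(scale(x)))
```

Because every standardized column has unit variance, the eigenvalues of the subsequent PCA sum to the number of variables (22 here), which is what gives an eigenvalue threshold of 1 its meaning.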
We clearly see substantial correlations among all variables associated with trust (in both political institutions and people) and satisfaction. Interest in politics is negatively correlated with the variables concerning taking an active role in politics. Note, however, that in the ESS polintr is coded from 1 (very interested) to 4 (not at all interested), so the negative sign means that greater interest goes together with a stronger sense of being able to take an active role. Surprisingly, the trust variables are only weakly correlated with the variables on taking an active role in politics.
Now, I will use PCA to find the factors that explain as much of the variation in the dataset as possible. Choosing an optimal number of factors is a very important part of the analysis. For this purpose, I will use the Kaiser rule. Under this rule, a factor with an eigenvalue of 1 explains no more variance than a single standardized variable. I will therefore only include factors that explain more than a single variable (in other words, those with eigenvalue > 1).
pca_ess <- prcomp(essdtanew.n, center=FALSE, scale.=FALSE)
p1_ess <- fviz_eig(pca_ess, choice='eigenvalue')
p2_ess <- fviz_eig(pca_ess)
grid.arrange(p1_ess, p2_ess, nrow=1)
eig.val<-get_eigenvalue(pca_ess)
eig.val
## eigenvalue variance.percent cumulative.variance.percent
## Dim.1 7.5496475 34.3165794 34.31658
## Dim.2 2.1821270 9.9187590 44.23534
## Dim.3 1.5172566 6.8966207 51.13196
## Dim.4 1.2915356 5.8706164 57.00258
## Dim.5 1.0202465 4.6374841 61.64006
## Dim.6 0.9177990 4.1718138 65.81187
## Dim.7 0.8202560 3.7284363 69.54031
## Dim.8 0.7734869 3.5158495 73.05616
## Dim.9 0.7462702 3.3921372 76.44830
## Dim.10 0.6397988 2.9081762 79.35647
## Dim.11 0.5603821 2.5471915 81.90366
## Dim.12 0.5443526 2.4743301 84.37799
## Dim.13 0.4904481 2.2293093 86.60730
## Dim.14 0.4322204 1.9646382 88.57194
## Dim.15 0.4034640 1.8339271 90.40587
## Dim.16 0.3901396 1.7733618 92.17923
## Dim.17 0.3453808 1.5699126 93.74914
## Dim.18 0.3344177 1.5200803 95.26922
## Dim.19 0.3185890 1.4481320 96.71736
## Dim.20 0.3056784 1.3894475 98.10680
## Dim.21 0.2754122 1.2518737 99.35868
## Dim.22 0.1410911 0.6413231 100.00000
Strictly applied, the Kaiser rule would keep five factors, but the eigenvalue of the fifth one (1.02) barely exceeds 1, so 4 factors are chosen to represent the dataset. Jointly they explain 57% of the original dataset’s variance, which is a decent and acceptable result.
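As a quick check of the numbers involved, the eigenvalues printed above can be fed through the rule directly. This is only a sketch that re-uses the printed values; the variable names `eig`, `n_kaiser` and `cum_var` are chosen here for illustration.

```r
# Eigenvalues copied from the get_eigenvalue() output above
eig <- c(7.5496475, 2.1821270, 1.5172566, 1.2915356, 1.0202465, 0.9177990,
         0.8202560, 0.7734869, 0.7462702, 0.6397988, 0.5603821, 0.5443526,
         0.4904481, 0.4322204, 0.4034640, 0.3901396, 0.3453808, 0.3344177,
         0.3185890, 0.3056784, 0.2754122, 0.1410911)

n_kaiser <- sum(eig > 1)             # components with eigenvalue above 1
cum_var  <- cumsum(eig) / sum(eig)   # cumulative share of variance

n_kaiser                 # 5
round(cum_var[4:5], 4)   # 0.5700 0.6164
```

A strict eigenvalue > 1 cut-off keeps five components, while the first four account for about 57% of the variance and a fifth would add roughly 4.6 percentage points.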
summary(pca_ess)
## Importance of components:
## PC1 PC2 PC3 PC4 PC5 PC6 PC7
## Standard deviation 2.7477 1.47720 1.23177 1.13646 1.01007 0.95802 0.90568
## Proportion of Variance 0.3432 0.09919 0.06897 0.05871 0.04637 0.04172 0.03728
## Cumulative Proportion 0.3432 0.44235 0.51132 0.57003 0.61640 0.65812 0.69540
## PC8 PC9 PC10 PC11 PC12 PC13 PC14
## Standard deviation 0.87948 0.86387 0.79987 0.74859 0.73780 0.70032 0.65743
## Proportion of Variance 0.03516 0.03392 0.02908 0.02547 0.02474 0.02229 0.01965
## Cumulative Proportion 0.73056 0.76448 0.79356 0.81904 0.84378 0.86607 0.88572
## PC15 PC16 PC17 PC18 PC19 PC20 PC21
## Standard deviation 0.63519 0.62461 0.5877 0.5783 0.56444 0.55288 0.52480
## Proportion of Variance 0.01834 0.01773 0.0157 0.0152 0.01448 0.01389 0.01252
## Cumulative Proportion 0.90406 0.92179 0.9375 0.9527 0.96717 0.98107 0.99359
## PC22
## Standard deviation 0.37562
## Proportion of Variance 0.00641
## Cumulative Proportion 1.00000
As we can see, consecutive factors have diminishing standard deviations, so each of them explains less additional variance. For a more detailed analysis, the two plots generated below will be inspected, as they give more granular information about specific variables and individual observations.
fviz_pca_var(pca_ess, col.var="steelblue")
The first plot shows the importance of the variables and the direction of their influence. Almost all variables associated with trust and satisfaction lie in the same quadrant, which indicates that they are positively correlated. The trust variables also share this quadrant with the satisfaction variables and the variables describing attitudes towards other people. The most striking variable is political interest (polintr): it is the only variable negatively correlated with the variables on the opportunity to take an active role in political affairs, as the correlation plot also showed, and its importance is relatively high. Among the negatively correlated variables, eduyrs has the lowest impact, as it has the lowest quality of representation on the factor map.
fviz_pca_ind(pca_ess, col.ind = "cos2", geom = "point", gradient.cols = c("green", "blue"))
Looking at the projection of the individual observations onto the 2D plot, we can see that they are spread roughly evenly around the origin (0,0). This indicates a balanced number of observations in each of the groups delimited by the x and y axes.
It is also important to check which variables contribute significantly to which factor. For that purpose we will use the plots below and a rotated PCA.
c1 <- fviz_contrib(pca_ess, "var", axes=1)
c2 <- fviz_contrib(pca_ess, "var", axes=2)
c3 <- fviz_contrib(pca_ess, "var", axes=3)
c4 <- fviz_contrib(pca_ess, "var", axes=4)
grid.arrange(c1, c2, c3, c4, top='Contribution to the Principal Components')
The results show that the trust and satisfaction variables are the main contributors to the 1st factor. The second factor consists mainly of variables associated with politics and taking an active role in it. The third factor is made up solely of variables describing the respondents' subjective perception of other people. The last component is more complex, as it consists of both trust in international (and national) political organisations and satisfaction with life and health. However, this is easily explained by the correlations between those variables.
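For a PCA fitted with prcomp, the contribution of a variable to a component is, as I understand the factoextra definition, its share of that component's squared coordinates expressed in percent. This reduces to the squared entry of the rotation matrix times 100. A minimal sketch on R's built-in USArrests data (chosen only for illustration; it is unrelated to the ESS data):

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Contribution (%) of each variable to each component
contrib <- 100 * pca$rotation^2
round(contrib[, 1:2], 1)

colSums(contrib)        # contributions to each component sum to 100

# If all variables contributed equally, each would contribute this much
100 / ncol(USArrests)
```

The last value corresponds to the uniform-contribution benchmark: variables above it contribute more than average to the component in question.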
rpca_ess <- principal(essdtanew.n, nfactors=4, rotate="varimax")
print(loadings(rpca_ess), digits=4, cutoff=0.4, sort=TRUE)
##
## Loadings:
## RC4 RC1 RC2 RC3
## trstlgl 0.6037
## trstplc 0.5089
## trstprl 0.6326 0.4954
## trstplt 0.6679 0.5053
## trstprt 0.6635 0.4719
## trstep 0.8297
## trstun 0.7916
## stfeco 0.7564
## stfgov 0.7303
## stfdem 0.6902
## stfedu 0.5563
## stfhlth 0.5712
## actrolga 0.8125
## cptppola 0.8141
## psppipla 0.5507
## polintr -0.6646
## ppltrst 0.7598
## pplfair 0.7837
## pplhlp 0.7032
## eduyrs
## psppsgva 0.4311 0.4804
## stflife 0.4881
##
## RC4 RC1 RC2 RC3
## SS loadings 3.8779 3.8764 2.5965 2.1897
## Proportion Var 0.1763 0.1762 0.1180 0.0995
## Cumulative Var 0.1763 0.3525 0.4705 0.5700
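In the output above, the rotated components explain 17.6%, 17.6%, 11.8% and 10.0% of the variance, which still adds up to 57%: an orthogonal rotation redistributes variance among the retained components without changing the total. The sketch below illustrates this with base R's `varimax()` on the built-in USArrests data (an illustration only; the report itself uses `psych::principal`).

```r
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Loadings of the first two components (eigenvectors scaled by standard deviations)
L <- pca$rotation[, 1:2] %*% diag(pca$sdev[1:2])

rot <- varimax(L)              # stats::varimax, an orthogonal rotation
R   <- unclass(rot$loadings)   # rotated loadings as a plain matrix

colSums(L^2)   # variance explained per component before rotation
colSums(R^2)   # variance explained per component after rotation
c(before = sum(L^2), after = sum(R^2))   # the total is unchanged
```

Each variable's communality (the row sum of squared loadings) is preserved as well, so rotation changes interpretation but not how well the variables are represented.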
To sum up the results of the analysis, we can give ‘umbrella names’ to the components:
RC1: Sense of well-being of a state (variables: trstprl (parliament), trstplt (politicians), trstprt (party), stfeco (economy), stfgov (government), stfdem (democracy), stfedu (education), stfhlth (health))
RC2: Activity in political affairs (variables: actrolga (able to take active role), cptppola (own ability to participate in politics), psppipla (can have influence on politics), polintr (interest in politics))
RC3: Social interactions (variables: ppltrst (trust in people), pplfair (people are fair), pplhlp (people help each other))
RC4: Trust in political organisations (variables: trstprl (parliament), trstplt (politicians), trstprt (party), trstlgl (legal system), trstplc (police), trstep (European Parliament), trstun (United Nations))
rpca_ess$complexity
## trstlgl eduyrs trstplc trstprl trstplt psppsgva actrolga cptppola
## 2.146952 2.892015 2.024198 2.216447 2.116359 2.656308 1.018798 1.006605
## psppipla ppltrst pplfair pplhlp polintr trstprt trstep trstun
## 2.482055 1.325151 1.198277 1.291313 1.016282 2.035495 1.074493 1.086586
## stflife stfeco stfgov stfdem stfedu stfhlth
## 1.916940 1.193953 1.383906 1.669913 1.778678 1.369962
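The complexity index equals 1 when a variable loads on a single component and grows as its loadings spread over more components. Unfortunately, the complexity measures are high here because of the complex structure of the factors: for 11 of the 22 variables the index exceeds 1.5, which is quite a poor result.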
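The psych documentation describes this measure as Hofmann's index, (sum of a_i^2)^2 / (sum of a_i^4), computed over a variable's loadings a_i. A minimal sketch with made-up loading vectors (the function name `hofmann_complexity` is hypothetical, not part of psych):

```r
# Hofmann's complexity index for one variable's vector of loadings
hofmann_complexity <- function(a) sum(a^2)^2 / sum(a^4)

hofmann_complexity(c(0.8, 0, 0, 0))         # 1: loads on a single component
hofmann_complexity(c(0.6, 0.6, 0, 0))       # 2: split evenly between two
hofmann_complexity(c(0.5, 0.5, 0.5, 0.5))   # 4: spread evenly over all four
```

A value near 2, as for most trust variables above, therefore means the variable is shared roughly between two components.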
rpca_ess$uniqueness
## trstlgl eduyrs trstplc trstprl trstplt psppsgva actrolga cptppola
## 0.4184188 0.7995494 0.6070140 0.2981703 0.2504386 0.4971353 0.3337035 0.3350575
## psppipla ppltrst pplfair pplhlp polintr trstprt trstep trstun
## 0.4409314 0.3321489 0.3263187 0.4352387 0.5547667 0.2946699 0.2861799 0.3463303
## stflife stfeco stfgov stfdem stfedu stfhlth
## 0.6204741 0.3738201 0.3607750 0.3577169 0.5759380 0.6146375
The uniqueness measure (the share of a variable's variance not explained by the four retained components) is more satisfactory than complexity, as only 6 of the 22 variables have uniqueness above 0.5. However, it could still be lower, which would mean that the components capture more of each variable's variance. Let’s see whether any variable should be excluded from the analysis.
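Uniqueness is one minus the communality, that is, one minus the sum of a variable's squared loadings on the retained components. A minimal sketch with a made-up loading matrix (the numbers are illustrative and not taken from the output above):

```r
# Hypothetical loadings of three variables on two components
L <- matrix(c(0.8, 0.1,
              0.5, 0.5,
              0.2, 0.3),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("v1", "v2", "v3"), c("C1", "C2")))

communality <- rowSums(L^2)      # variance captured by the components
uniqueness  <- 1 - communality   # variance left unexplained
round(uniqueness, 2)
```

Here `v3` is poorly represented by the two components, which is the situation eduyrs (uniqueness 0.80) is in above.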
set <- data.frame(complex = rpca_ess$complexity, uniqu=rpca_ess$uniqueness)
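# Flag variables that are both complex (> 1.8) and poorly explained (uniqueness > 0.8).
# The column is named "uniqu"; set$unique does not match it and returns NULL.
worst <- set[set$complex>1.8&set$uniqu>0.8,]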
worst
## [1] complex uniqu
## <0 rows> (or 0-length row.names)
As we can see, despite the not very satisfactory measures of complexity and uniqueness, no variable exceeds both thresholds, so none needs to be removed from the dataset. Note that eduyrs comes very close (complexity 2.89, uniqueness 0.7995).
In the course of the analysis described above, 4 factors that explain 57% of the variance of the original dataset were found and investigated. The original dataset was ‘translated’ into new factors which carry the information contained in the variables assigned to them. I found that the variables associated with trust carry a lot of information about the respondents. Satisfaction with life in general and with the state’s government also turned out to be important in terms of the information it carries. Other variables taken into consideration were interest in politics, taking an active role in it, and attitude towards other people. However, these variables were less important than trust and satisfaction. All in all, the dimension reduction was conducted successfully and the resulting dataset may be used for further analysis.
Possible directions for improving the analysis:
Lowering uniqueness.
Lowering complexity.
Using more advanced techniques of dimension reduction.
Further manipulation of the dataset.
Outliers detection.