University of Warsaw

Introduction

Despite the various freedoms we are guaranteed, our lives are subject to pre-imposed limitations, assumptions and rules that we can hardly influence. One such limitation is the law, which surrounds us every day: it sets moral boundaries, orders, prohibits and advises. The situation changes when the law is broken. Its “favorable” roles then give way to punishments and sanctions for disobedience. A difficult and troubling question follows: why do such situations occur at all? An unambiguous and truthful answer would undoubtedly turn the world upside down. Just try to imagine a world free of crime. Such a world is ruled out by evolutionary algorithms (e.g. the evolution of trust), which show how little it takes to push a society into mutual dishonesty. Still, it is certainly possible to reduce disorderly behavior, as the differences in crime rates between countries and continents show.

So what could be the reason for breaking the law? To answer this, we can look at the other side of the issue: why do people obey the laws that apply to them? One reason is aversion to the risk of being punished. More important for this analysis, people also obey the law because they perceive the legal system as an authority and therefore respect and trust it. The link between trust in the legal system and law-abiding behavior is especially noticeable today, given the situation in Ukraine and Russia. According to the article Trust in Justice: Topline Results from Round 5 of the European Social Survey, existing evidence suggests that perceptions of legality may be a stronger predictor of behavior in accordance with the law than perceptions of deterrence risk.

With that in mind, this paper examines whether there are relationships between trust in the legal system and other factors included in the European Social Survey data (2016 round).

Libraries used:
library(readr)
library(knitr)
library(clusterSim)
library(psych)
library(factoextra)
library(gridExtra)
library(corrplot)
library(FactoMineR)
library(fpc)
library(stringr)
library(hopkins)

Dataset info

The data were gathered by the ESS in 2016 from randomly selected respondents in 23 countries. The raw file contains 534 columns and 44 387 observations. Only numerical variables with valid responses were used in this analysis, so many observations were removed or transformed during cleaning:

essdta <- read.csv("ESS8e02.1_F1.csv", header=TRUE, sep=";")
essdtanew <- data.frame(essdta[,c("trstlgl","ctzcntr","dscrgrp","hinctnta","gndr","eduyrs","trstplc","trstprl","trstplt","agea","psppsgva","actrolga","cptppola","psppipla","nwspol","netustm","cntry","ppltrst","pplfair","pplhlp","polintr","trstprt","trstep","trstun","vote","contplt","wrkprty","wrkorg","badge","sgnptit","pbldmn","bctprd","pstplonl","clsprty","prtdgcl","lrscale","stflife","stfeco","stfgov", "stfdem","stfedu","stfhlth","gincdif")])
countries_lbls <- essdtanew[,17]
rm(essdta)

essdtanew <- na.omit(essdtanew)

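# Drop rows that hold ESS missing-value codes; the codes depend on the width of each variable's scale.
for (i in 1:ncol(essdtanew)){ 
  # two- and three-digit variables (0-10 scales, income, years of education, age)
  if(i %in% c(1,4,6,7,8,9,10,18,19,20,22:24,36:42)) {essdtanew <- essdtanew[essdtanew[,i]!="99"&essdtanew[,i]!="88"&essdtanew[,i]!="77"&essdtanew[,i]!="999",]}
  # one-digit categorical variables
  if(i %in% c(2,3,5,11:14,21,25:34,43)) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9",]}
  # vote (column 25): additionally drop code 3
  if(i == 25) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9"&essdtanew[,i]!="3",]}
  # four-digit codes (minutes variables and country)
  if(i %in% c(15,16,17)) {essdtanew <- essdtanew[essdtanew[,i]!="7777"&essdtanew[,i]!="8888"&essdtanew[,i]!="9999"&essdtanew[,i]!="6666",]}
  # prtdgcl (column 35): additionally drop code 6
  if(i == 35) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9"&essdtanew[,i]!="6",]}
}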

essdtanew$eduyrs2 <- essdtanew$eduyrs
essdtanew$eduyrs <- ifelse(essdtanew$eduyrs>13,1,0)

The final dataset has 41 columns and 9367 rows.
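The filtering step can also be expressed with a small helper, which may be easier to read than index-based conditions. The following is only a minimal sketch on made-up data: `drop_missing_codes` and the `toy` data frame are hypothetical names that are not part of the original analysis, and the codes are the ones used in the loop.

```r
# Hypothetical helper (not part of the original analysis): drop every row in
# which any of the given columns holds one of the listed missing-value codes.
drop_missing_codes <- function(df, cols, codes) {
  bad <- Reduce(`|`, lapply(df[cols], function(x) x %in% codes))
  df[!bad, , drop = FALSE]
}

# Toy data (made up): a 0-10 scale and a one-digit categorical variable.
toy <- data.frame(
  trstlgl = c(7, 88, 5, 10),
  gndr    = c(1, 2, 9, 2)
)

clean <- drop_missing_codes(toy, "trstlgl", c(77, 88, 99))
clean <- drop_missing_codes(clean, "gndr", c(7, 8, 9))
print(clean)
```

Selecting columns by name rather than by position keeps the filter valid if the column order changes.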

Variables description:

Ordinal data on scale 0-10 (type = discrete, format = numeric):

Dependent variable:

trstlgl - trust in the legal system (0: complete distrust, 10:complete trust)

Analogously:

trstplc (trust in police),

trstprl (trust in parliament),

trstplt (trust in politicians),

ppltrst (trust in people),

pplfair (people are fair),

pplhlp (people help),

trstprt (trust in political parties),

trstep (trust in European Parliament),

trstun (trust in United Nations),

lrscale (political views; 0: left, 10: right),

stflife (satisfaction with life; 0: not satisfied, 10: satisfied),

stfeco (satisfied with economy),

stfgov (satisfied with government),

stfdem (satisfied with democracy),

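stfedu (satisfied with the state of education),

stfhlth (satisfied with the state of health services)

Remaining variables (other scales):

ctzcntr (citizen of country),

dscrgrp (member of a group discriminated against),

hinctnta (household net income decile),

gndr (gender),

eduyrs (years of education; recoded to 1 if more than 13 years, otherwise 0),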

agea (age),

psppsgva (Political system allows people to have a say),

actrolga (can take active role in politics),

cptppola (Confident in own ability to participate in politics),

psppipla (Political system allows people to have influence on politics),

nwspol (News about politics and current affairs: watching, reading or listening, in minutes),

netustm (Internet use, how much time on a typical day, in minutes),

cntry (country),

polintr (how interested in politics),

vote (voted),

contplt (Contacted politician or government official last 12 months),

wrkprty (worked for a political party),

wrkorg (worked for another organisation),

badge (Worn or displayed campaign badge/sticker last 12 months),

sgnptit (signed a petition last 12 months),

pbldmn (took part in a public demonstration last 12 months),

bctprd (Boycotted certain products last 12 months),

pstplonl (posted or shared anything about politics online last 12 months),

clsprty (Feel closer to a particular party than all other parties),

prtdgcl (How close to party),

gincdif (Government should reduce differences in income levels)

head(essdtanew,3)
##    trstlgl ctzcntr dscrgrp hinctnta gndr eduyrs trstplc trstprl trstplt agea
## 5        7       2       2        2    2      0      10       7       6   20
## 10       8       1       2        4    2      1       7       7       4   41
## 12       5       1       2        3    1      0       7       0       0   61
##    psppsgva actrolga cptppola psppipla nwspol netustm ppltrst pplfair pplhlp
## 5         3        3        1        4     30     180       5       5      7
## 10        3        2        2        3     60     120       5       3      4
## 12        2        3        3        2     90      90       3       3      2
##    polintr trstprt trstep trstun vote contplt wrkprty wrkorg badge sgnptit
## 5        3       7      9     10    2       2       2      2     2       2
## 10       2       4      1      0    1       2       2      2     2       1
## 12       1       0      0      0    1       2       2      2     2       1
##    pbldmn bctprd pstplonl prtdgcl lrscale stflife stfeco stfgov stfdem stfedu
## 5       2      2        2       1       5      10      7      5      9     10
## 10      1      2        1       2       5       7      5      3      7      8
## 12      2      2        1       2       8       7      5      0      3      4
##    stfhlth gincdif
## 5       10       1
## 10       8       1
## 12       5       4

Correlation between variables:

corr_all = cor(essdtanew, method='pearson')
corrplot(corr_all)

As we can see, many variables are only weakly correlated with the rest, and they would most likely cause confusion during dimension reduction or simply when interpreting the results. It is therefore reasonable to exclude them from the analysis. This applies to the variables ctzcntr through gndr, agea, nwspol, netustm, vote through lrscale, and gincdif. A new, normalized dataset was created for the rest of the analysis.

essdtanew.n<-data.Normalization(essdtanew[,c(1,6:9,11:14,17:23,35:40)], type="n1",normalization="column")
essdtanew.n.cor<-cor(essdtanew.n, method="pearson") 
corrplot(essdtanew.n.cor, order ="alphabet", tl.cex=0.6)

We clearly see strong correlations among all variables associated with trust (in both political institutions and people) and satisfaction. Interestingly, interest in politics is negatively correlated with the variables concerning taking an active role in politics. This most likely reflects the coding of polintr, which runs from 1 (very interested) to 4 (not at all interested), so higher values mean less interest. Surprisingly, the trust variables are only weakly correlated with the variables on taking an active role in politics.
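The normalization used above (type "n1" in clusterSim, i.e. standardization) can be sketched in base R. The example below uses a made-up matrix `x` and the base function `scale()` as a stand-in; it is an illustration under that assumption, not the original call.

```r
# Toy matrix standing in for the survey columns (values are made up).
x <- cbind(a = c(2, 4, 6, 8), b = c(1, 3, 2, 10))

# Column-wise standardization: subtract the mean, divide by the standard deviation.
z <- scale(x, center = TRUE, scale = TRUE)

colMeans(z)      # approximately 0 for every column
apply(z, 2, sd)  # 1 for every column

# Pearson correlation is unchanged by standardization.
all.equal(cor(x), cor(z), check.attributes = FALSE)
```

Because the standardized columns already have mean 0 and standard deviation 1, a PCA on them does not need any further centering or scaling.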

Principal Component Analysis

Now I will use PCA to find factors that explain the variation of the dataset as thoroughly as possible. Choosing an optimal number of factors is a very important part of the analysis. For this purpose, I will use the Kaiser rule. Under this rule, a factor with an eigenvalue of 1 is no better than a single variable, because each standardized variable contributes a variance of exactly 1. I will therefore only include factors that explain more than a single variable (in other words, have an eigenvalue > 1).

pca_ess <- prcomp(essdtanew.n, center=FALSE, scale.=FALSE)
p1_ess <- fviz_eig(pca_ess, choice='eigenvalue')
p2_ess <- fviz_eig(pca_ess)
grid.arrange(p1_ess, p2_ess, nrow=1)

eig.val<-get_eigenvalue(pca_ess)
eig.val
##        eigenvalue variance.percent cumulative.variance.percent
## Dim.1   7.5496475       34.3165794                    34.31658
## Dim.2   2.1821270        9.9187590                    44.23534
## Dim.3   1.5172566        6.8966207                    51.13196
## Dim.4   1.2915356        5.8706164                    57.00258
## Dim.5   1.0202465        4.6374841                    61.64006
## Dim.6   0.9177990        4.1718138                    65.81187
## Dim.7   0.8202560        3.7284363                    69.54031
## Dim.8   0.7734869        3.5158495                    73.05616
## Dim.9   0.7462702        3.3921372                    76.44830
## Dim.10  0.6397988        2.9081762                    79.35647
## Dim.11  0.5603821        2.5471915                    81.90366
## Dim.12  0.5443526        2.4743301                    84.37799
## Dim.13  0.4904481        2.2293093                    86.60730
## Dim.14  0.4322204        1.9646382                    88.57194
## Dim.15  0.4034640        1.8339271                    90.40587
## Dim.16  0.3901396        1.7733618                    92.17923
## Dim.17  0.3453808        1.5699126                    93.74914
## Dim.18  0.3344177        1.5200803                    95.26922
## Dim.19  0.3185890        1.4481320                    96.71736
## Dim.20  0.3056784        1.3894475                    98.10680
## Dim.21  0.2754122        1.2518737                    99.35868
## Dim.22  0.1410911        0.6413231                   100.00000

Strictly applied, the Kaiser rule would retain 5 factors, but the eigenvalue of the fifth one (1.02) barely exceeds 1, so 4 factors are chosen to represent the dataset. Jointly they explain 57% of the original dataset’s variance, which is a decent and acceptable result.
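The choice of the number of factors can be re-checked directly from the eigenvalue table. The sketch below copies the eigenvalues from the output above; the variable names `eig`, `n_kaiser` and `cum_pct` are introduced here for illustration only.

```r
# Eigenvalues copied from the get_eigenvalue() output above.
eig <- c(7.5496475, 2.1821270, 1.5172566, 1.2915356, 1.0202465,
         0.9177990, 0.8202560, 0.7734869, 0.7462702, 0.6397988,
         0.5603821, 0.5443526, 0.4904481, 0.4322204, 0.4034640,
         0.3901396, 0.3453808, 0.3344177, 0.3185890, 0.3056784,
         0.2754122, 0.1410911)

# Number of factors with eigenvalue > 1 (strict Kaiser rule).
n_kaiser <- sum(eig > 1)

# Cumulative share of variance; the eigenvalues sum to the number of
# standardized variables (22).
cum_pct <- 100 * cumsum(eig) / sum(eig)

n_kaiser
round(cum_pct[4:5], 2)
```

The fourth and fifth entries of `cum_pct` show how much explained variance is given up by stopping at four factors instead of five.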

summary(pca_ess)
## Importance of components:
##                           PC1     PC2     PC3     PC4     PC5     PC6     PC7
## Standard deviation     2.7477 1.47720 1.23177 1.13646 1.01007 0.95802 0.90568
## Proportion of Variance 0.3432 0.09919 0.06897 0.05871 0.04637 0.04172 0.03728
## Cumulative Proportion  0.3432 0.44235 0.51132 0.57003 0.61640 0.65812 0.69540
##                            PC8     PC9    PC10    PC11    PC12    PC13    PC14
## Standard deviation     0.87948 0.86387 0.79987 0.74859 0.73780 0.70032 0.65743
## Proportion of Variance 0.03516 0.03392 0.02908 0.02547 0.02474 0.02229 0.01965
## Cumulative Proportion  0.73056 0.76448 0.79356 0.81904 0.84378 0.86607 0.88572
##                           PC15    PC16   PC17   PC18    PC19    PC20    PC21
## Standard deviation     0.63519 0.62461 0.5877 0.5783 0.56444 0.55288 0.52480
## Proportion of Variance 0.01834 0.01773 0.0157 0.0152 0.01448 0.01389 0.01252
## Cumulative Proportion  0.90406 0.92179 0.9375 0.9527 0.96717 0.98107 0.99359
##                           PC22
## Standard deviation     0.37562
## Proportion of Variance 0.00641
## Cumulative Proportion  1.00000

As we can see, consecutive factors have diminishing standard deviations, so each explains less additional variance than the previous one. For a more detailed analysis, the two plots generated below present more granular information about specific variables and individual observations.
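The link between the standard deviations in this summary and the eigenvalues reported earlier is that each eigenvalue is the squared standard deviation of its component. A minimal sketch of this relationship, using the built-in `USArrests` data as a stand-in for the survey data (an assumption made only to keep the example self-contained):

```r
# PCA on standardized data; USArrests ships with base R.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)

# Squared standard deviations of the components ...
eig_from_sdev <- pca$sdev^2

# ... equal the eigenvalues of the correlation matrix.
eig_from_cor <- eigen(cor(USArrests))$values
all.equal(eig_from_sdev, eig_from_cor)

# Proportion of variance explained by each component.
prop_var <- eig_from_sdev / sum(eig_from_sdev)
round(prop_var, 4)
```

The same arithmetic applies to the output above: the first standard deviation, 2.7477, squared gives approximately the first eigenvalue, 7.55.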

fviz_pca_var(pca_ess, col.var="steelblue") 

The first plot shows the importance of the variables and the direction of their impact. Almost all variables associated with trust and satisfaction lie in the same quadrant, which indicates that they are positively correlated. Additionally, the trust variables share the quadrant with the satisfaction and people variables. The variable that attracts the most attention is political interest (polintr): it is the only variable negatively correlated with the variables describing the opportunity to take an active role in political affairs, as the correlation plot also showed. Since higher polintr values mean less interest, the negative sign indicates that more interested respondents feel more able to take part in politics. Its importance is relatively high. Among the negatively correlated variables, eduyrs has the lowest impact, as it has the lowest quality of representation on the factor map.

fviz_pca_ind(pca_ess, col.ind = "cos2", geom = "point", gradient.cols = c("green", "blue")) 

Looking at the projection of the individual observations onto the 2D plot, we can see that they are spread roughly evenly around the origin (0,0). This indicates that the observations are balanced across the four quadrants defined by the x and y axes.

Closer look at factors

It is also crucial to take into consideration what variables contribute to which factor in a significant way. For that purpose we will use plots below and the Rotated PCA.

c1 <- fviz_contrib(pca_ess, "var", axes=1)
c2 <- fviz_contrib(pca_ess, "var", axes=2)
c3 <- fviz_contrib(pca_ess, "var", axes=3)
c4 <- fviz_contrib(pca_ess, "var", axes=4)

grid.arrange(c1, c2, c3, c4, top='Contribution to the Principal Components')

The results show that the trust and satisfaction variables are the main contributors to the first factor. The second factor is driven mainly by variables associated with politics and taking an active role in it. The third factor consists solely of variables describing the respondents’ subjective perception of other people. The last factor is more complex, as it combines trust in international (and national) political organisations with satisfaction with life and health. However, this is easily explained by the correlations between those variables.

rpca_ess <- principal(essdtanew.n, nfactors=4, rotate="varimax")
print(loadings(rpca_ess), digits=4, cutoff=0.4, sort=TRUE)
## 
## Loadings:
##          RC4     RC1     RC2     RC3    
## trstlgl   0.6037                        
## trstplc   0.5089                        
## trstprl   0.6326  0.4954                
## trstplt   0.6679  0.5053                
## trstprt   0.6635  0.4719                
## trstep    0.8297                        
## trstun    0.7916                        
## stfeco            0.7564                
## stfgov            0.7303                
## stfdem            0.6902                
## stfedu            0.5563                
## stfhlth           0.5712                
## actrolga                  0.8125        
## cptppola                  0.8141        
## psppipla                  0.5507        
## polintr                  -0.6646        
## ppltrst                           0.7598
## pplfair                           0.7837
## pplhlp                            0.7032
## eduyrs                                  
## psppsgva          0.4311  0.4804        
## stflife           0.4881                
## 
##                   RC4    RC1    RC2    RC3
## SS loadings    3.8779 3.8764 2.5965 2.1897
## Proportion Var 0.1763 0.1762 0.1180 0.0995
## Cumulative Var 0.1763 0.3525 0.4705 0.5700

In order to sum up the results of analysis, we can give ‘umbrella names’ to the components:

  1. Sense of well-being of a state (variables: trstprl (parliament), trstplt (politicians), trstprt (party), stfeco (economy), stfgov (government), stfdem (democracy), stfedu (education), stfhlth (health))

  2. Activity in political affairs (variables: actrolga (able to take an active role), cptppola (own ability to participate in politics), psppipla (can have influence on politics), polintr (interest in politics))

  3. Social interactions (variables: ppltrst (trust in people), pplfair (people are fair), pplhlp (people help each other))

  4. Trust in political organisations (variables: trstprl (parliament), trstplt (politicians), trstprt (party), trstlgl (legal system), trstplc (police), trstep (European Parliament), trstun (United Nations))
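Rotation redistributes the explained variance among the retained factors without changing how much of each variable is explained in total. The sketch below illustrates this with base R's `varimax()` on the built-in `USArrests` data; it is a stand-in under these assumptions and does not reproduce the `principal()` call above.

```r
# PCA on standardized stand-in data.
pca <- prcomp(USArrests, center = TRUE, scale. = TRUE)
k <- 2

# Unrotated loadings: eigenvectors scaled by the component standard deviations.
raw_loadings <- pca$rotation[, 1:k] %*% diag(pca$sdev[1:k])

# Varimax rotation (orthogonal).
rot <- varimax(raw_loadings)
rotated <- unclass(rot$loadings)

# Communalities (row sums of squared loadings) before and after rotation.
h2_raw <- rowSums(raw_loadings^2)
h2_rot <- rowSums(rotated^2)
all.equal(h2_raw, h2_rot, check.attributes = FALSE)
```

The same invariance is visible in the output above: the four "SS loadings" values add up to about 12.54, which matches the sum of the first four eigenvalues and the 57% of explained variance.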

Measures of fit

rpca_ess$complexity
##  trstlgl   eduyrs  trstplc  trstprl  trstplt psppsgva actrolga cptppola 
## 2.146952 2.892015 2.024198 2.216447 2.116359 2.656308 1.018798 1.006605 
## psppipla  ppltrst  pplfair   pplhlp  polintr  trstprt   trstep   trstun 
## 2.482055 1.325151 1.198277 1.291313 1.016282 2.035495 1.074493 1.086586 
##  stflife   stfeco   stfgov   stfdem   stfedu  stfhlth 
## 1.916940 1.193953 1.383906 1.669913 1.778678 1.369962

Unfortunately, the complexity measures are high because of the complex structure of the factors. A complexity of 1 means that a variable loads on a single factor; here 11 of the 22 variables exceed 1.5, which is quite a poor result.

rpca_ess$uniqueness
##   trstlgl    eduyrs   trstplc   trstprl   trstplt  psppsgva  actrolga  cptppola 
## 0.4184188 0.7995494 0.6070140 0.2981703 0.2504386 0.4971353 0.3337035 0.3350575 
##  psppipla   ppltrst   pplfair    pplhlp   polintr   trstprt    trstep    trstun 
## 0.4409314 0.3321489 0.3263187 0.4352387 0.5547667 0.2946699 0.2861799 0.3463303 
##   stflife    stfeco    stfgov    stfdem    stfedu   stfhlth 
## 0.6204741 0.3738201 0.3607750 0.3577169 0.5759380 0.6146375

The uniqueness results are more satisfactory than the complexity results: only 6 of the 22 variables have a uniqueness above 0.5. Uniqueness is the share of a variable’s variance that the retained factors do not explain, so lower values are better, and there is still room for improvement. Let’s see whether any variable needs to be excluded from the analysis.
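Both fit measures can be written out by hand, which makes the thresholds easier to interpret. The sketch below assumes that complexity is Hofmann's index, (sum of squared loadings)^2 divided by the sum of the fourth powers of the loadings, and that uniqueness is 1 minus the communality; the loading vectors are hypothetical, not taken from the analysis.

```r
# Hofmann's index of complexity for one variable's loadings.
hofmann_complexity <- function(a) sum(a^2)^2 / sum(a^4)

# Uniqueness: variance not explained by the retained factors.
uniqueness <- function(a) 1 - sum(a^2)

# Hypothetical loadings on four factors.
one_factor  <- c(0.8, 0.0, 0.0, 0.0)  # loads on a single factor
two_factors <- c(0.5, 0.5, 0.0, 0.0)  # loads equally on two factors

hofmann_complexity(one_factor)
hofmann_complexity(two_factors)
uniqueness(one_factor)
uniqueness(two_factors)
```

A variable that loads on exactly one factor has complexity 1, while equal loadings on two factors give complexity 2, so values above 2 point to variables spread over several factors.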

set <- data.frame(complex = rpca_ess$complexity, uniqu=rpca_ess$uniqueness)
# The column is named `uniqu`; `set$unique` would return NULL and always select zero rows.
worst <- set[set$complex>1.8&set$uniqu>0.8,]
worst
## [1] complex uniqu  
## <0 rows> (or 0-length row.names)

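As we can see, despite the unsatisfactory complexity and uniqueness measures, no variable exceeds both thresholds, so none needs to be removed from the dataset. The closest case is eduyrs, with a complexity of 2.89 and a uniqueness just under 0.8.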

Findings

In the course of the analysis described above, 4 factors that explain 57% of the variance of the original dataset were found and investigated. The original dataset was ‘translated’ into new factors that carry the information contained in the variables assigned to them. The variables associated with trust carry a lot of information about the surveyed people. Satisfaction with life in general and with the state’s government also turned out to be important in terms of the information it carries. Other variables taken into consideration were interest in politics and taking an active role in it, and attitude towards other people. However, these variables were less important than trust and satisfaction. All in all, the dimension reduction was conducted successfully, and the resulting dataset may be used for further analysis.

Possible directions for improving the analysis:

  1. Lowering uniqueness.

  2. Lowering complexity.

  3. Using more advanced techniques of dimension reduction.

  4. Further manipulation of the dataset.

  5. Outliers detection.

Association rules on the European Social Survey data (Part II)

Introduction

In this part, I will focus on answering the question posed at the beginning. As mentioned earlier, existing evidence suggests that perceptions of legality may be a stronger predictor of behavior in accordance with the law than perceptions of deterrence risk. With that in mind, let’s try to get to the heart of the issue by finding out what characterises a person who trusts the legal system.

Libraries used:
library("arules")
library("arulesViz")
library(kableExtra)

Dataset transformation

As the main goal is to obtain easily understandable rules, all of the columns have been encoded as labels. I split the values based on the detailed descriptions available on the ESS webpage.

essdta <- read.csv("ESS8e02.1_F1.csv", header=TRUE, sep=";")
essdtanew <- data.frame(essdta[,c("trstlgl","ctzcntr","dscrgrp","hinctnta",
                                  "gndr","eduyrs","trstplc","trstprl",
                                  "trstplt","agea","psppsgva","actrolga",
                                  "cptppola","psppipla","nwspol","netustm",
                                  "cntry","ppltrst","pplfair","pplhlp",
                                  "polintr","trstprt","trstep","trstun",
                                  "vote","contplt","wrkprty",
                                  "wrkorg","badge","sgnptit","pbldmn",
                                  "bctprd","pstplonl","clsprty","prtdgcl",
                                  "lrscale","stflife","stfeco","stfgov",
                                  "stfdem","stfedu","stfhlth","gincdif")])

countries_lbls <- essdtanew[,17]
rm(essdta)

essdtanew <- na.omit(essdtanew)

for (i in 1:ncol(essdtanew)){ 
  if(i %in% c(1,4,6,7,8,9,10,18,19,20,22:24,36:42)) {essdtanew <- essdtanew[essdtanew[,i]!="99"&essdtanew[,i]!="88"&essdtanew[,i]!="77"&essdtanew[,i]!="999",]}
  if(i %in% c(2,3,5,11:14,21,25:34,43)) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9",]}
  if(i == 25) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9"&essdtanew[,i]!="3",]}
  if(i %in% c(15,16,17)) {essdtanew <- essdtanew[essdtanew[,i]!="7777"&essdtanew[,i]!="8888"&essdtanew[,i]!="9999"&essdtanew[,i]!="6666",]}
  if(i == 35) {essdtanew <- essdtanew[essdtanew[,i]!="7"&essdtanew[,i]!="8"&essdtanew[,i]!="9"&essdtanew[,i]!="6",]}
}

essdtanew$eduyrs2 <- essdtanew$eduyrs
essdtanew$eduyrs <- ifelse(essdtanew$eduyrs>13,1,0)

essdtanew <- essdtanew[,c(1:43)]

essdtanewcba <- essdtanew

essdtanew$hinctnta <- ifelse(essdtanew$hinctnta>5,"Rich","Poor")
essdtanew$eduyrs <- ifelse(essdtanew$eduyrs==1,"University education","No university education")
essdtanew$gndr <- ifelse(essdtanew$gndr==2,"Female","Male")
essdtanew$psppsgva <- ifelse(essdtanew$psppsgva>3,"Plt system allows ppl to have a say","Plt system don't allow ppl to have a say")
essdtanew$agea <- ifelse(essdtanew$agea>48,"Older than 48","Not older than 48")

k <- c(1,7,8,9,18,22:24)

for (i in 1:length(k)) {
  if(k[i]==18){
    essdtanew[,k[i]] <- ifelse(essdtanew[,k[i]]>5,paste("Trust",substr(colnames(essdtanew)[k[i]],1,3)),paste("Distrust",substr(colnames(essdtanew)[k[i]],1,3)))
  }else{
  essdtanew[,k[i]] <- ifelse(essdtanew[,k[i]]>5,paste("Trust",substr(colnames(essdtanew)[k[i]],5,7)),paste("Distrust",substr(colnames(essdtanew)[k[i]],5,7)))
  }
}

l <- c(19,20)

for (i in 1:length(l)) {
    essdtanew[,l[i]] <- ifelse(essdtanew[,l[i]]>5,paste("People",substr(colnames(essdtanew)[l[i]],4,7)),paste("People no",substr(colnames(essdtanew)[l[i]],4,7)))
}

m <- c(37:42)

for (i in 1:length(m)) {
  essdtanew[,m[i]] <- ifelse(essdtanew[,m[i]]>5,paste("Satisfied with",substr(colnames(essdtanew)[m[i]],4,8)),paste("Dissatisfied with",substr(colnames(essdtanew)[m[i]],4,8)))
}

essdtafin <- essdtanew[,-c(2,3,12:16,21,25:36,43)]
essdtafin$cntry <- ifelse(essdtafin$cntry %in% c("NO","SE","FI","AT","BE","CH","DE","GB","FR","NL","IE","IS"),"Developed","Not developed")
write.csv(essdtafin, file="essmba.csv", row.names=FALSE)  # row names would otherwise be written as extra items
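The encoding above repeatedly applies the same pattern: a numeric answer is compared with a threshold and replaced by one of two labels. A minimal sketch of that pattern follows; `encode_binary` and the sample vector are hypothetical and only illustrate the idea.

```r
# Hypothetical helper: map a numeric scale to two labels around a threshold.
encode_binary <- function(x, threshold, above, below) {
  ifelse(x > threshold, above, below)
}

# Made-up answers on the 0-10 trust scale.
trust <- c(0, 5, 6, 10)
labels <- encode_binary(trust, 5, "Trust lgl", "Distrust lgl")
labels
```

Note that a value equal to the threshold falls into the lower group, so an answer of 5 on a 0-10 scale is encoded as distrust.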

Finally, the dataset used for the purpose of further analysis consisted of 22 variables described previously and 1000 randomly sampled observations. Let’s read it in an appropriate format in order to obtain association rules:

trans1<-read.transactions("essmba.csv", format="basket", sep=",", skip=1)  # skip the header line written by write.csv
trans1 <- sample(trans1,1000,replace = FALSE)
itemFrequencyPlot(
  trans1,
  topN = 10,
  type = "absolute",
  main = "Item frequency",
  cex.names = 0.85
)

The plot above presents the 10 most frequent characteristics of respondents. It supports both positive and negative conclusions. The good news is that the vast majority of respondents are satisfied with life, health and democracy, trust the police and the legal system, think people are fair, and live in a developed country. On the other hand, more than 600 of the 1000 sampled respondents state that the political system does not allow people to have a say in politics, and distrust parliament and political parties.

Association rules

Now, let’s try to obtain some rules using the apriori function.

trans1<-trans1[, itemFrequency(trans1)>0.05]

rules.trustlgl<-apriori(data=trans1, parameter=list(supp=0.12, conf=0.99), appearance=list(default="lhs", rhs="Trust lgl"), control=list(verbose=F)) 
summary(rules.trustlgl)
## set of 31 rules
## 
## rule length distribution (lhs + rhs):sizes
##  6  7  8  9 
##  2  9 14  6 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   6.000   7.000   8.000   7.774   8.000   9.000 
## 
## summary of quality measures:
##     support         confidence        coverage           lift      
##  Min.   :0.1200   Min.   :0.9917   Min.   :0.1210   Min.   :1.498  
##  1st Qu.:0.1220   1st Qu.:0.9919   1st Qu.:0.1230   1st Qu.:1.498  
##  Median :0.1240   Median :0.9920   Median :0.1250   Median :1.498  
##  Mean   :0.1251   Mean   :0.9926   Mean   :0.1260   Mean   :1.499  
##  3rd Qu.:0.1285   3rd Qu.:0.9923   3rd Qu.:0.1295   3rd Qu.:1.499  
##  Max.   :0.1360   Max.   :1.0000   Max.   :0.1370   Max.   :1.511  
##      count      
##  Min.   :120.0  
##  1st Qu.:122.0  
##  Median :124.0  
##  Mean   :125.1  
##  3rd Qu.:128.5  
##  Max.   :136.0  
## 
## mining info:
##    data ntransactions support confidence
##  trans1          1000    0.12       0.99
##                                                                                                                                                    call
##  apriori(data = trans1, parameter = list(supp = 0.12, conf = 0.99), appearance = list(default = "lhs", rhs = "Trust lgl"), control = list(verbose = F))
rules.byconf<-sort(rules.trustlgl, by="confidence", decreasing=TRUE)

31 rules were obtained even though a very high confidence threshold (0.99) was set; with a lower threshold there would be so many rules that it would be very difficult to inspect any of them closely. The minimum support was set to 0.12, which is also quite a conservative restriction. Before the rules are investigated, their significance and validity need to be assessed.

rules.clean<-rules.trustlgl[!is.redundant(rules.trustlgl)] 
rules.clean<-rules.clean[is.significant(rules.clean, trans1)] 
rules.clean<-rules.clean[is.maximal(rules.clean)] 
rules.clean.table = inspect(rules.clean, linebreak = FALSE)
rules.clean.table %>%
  kable() %>%
  kable_styling()
lhs rhs support confidence coverage lift count
[1] {Male, Trust plc, Trust plt, Trust prl, Trust un} => {Trust lgl} 0.124 0.9920000 0.125 1.498489 124
[2] {Rich, Satisfied with eco, Satisfied with life, Trust plt, University education} => {Trust lgl} 0.122 0.9918699 0.123 1.498293 122
[3] {Developed, Male, Satisfied with life, Trust plc, Trust plt, Trust prl} => {Trust lgl} 0.129 0.9923077 0.130 1.498954 129
[4] {Rich, Satisfied with eco, Satisfied with life, Trust ep, Trust ppl, Trust prl} => {Trust lgl} 0.123 0.9919355 0.124 1.498392 123
[5] {Rich, Satisfied with dem, Satisfied with eco, Trust prl, Trust un, University education} => {Trust lgl} 0.122 0.9918699 0.123 1.498293 122
[6] {Rich, Satisfied with eco, Trust plc, Trust prl, Trust un, University education} => {Trust lgl} 0.124 0.9920000 0.125 1.498489 124
[7] {Rich, Satisfied with eco, Trust plc, Trust ppl, Trust prl, Trust un} => {Trust lgl} 0.136 0.9927007 0.137 1.499548 136
[8] {Rich, Satisfied with eco, Satisfied with edu, Satisfied with hlth, Trust prl, University education} => {Trust lgl} 0.120 0.9917355 0.121 1.498090 120
[9] {People hlp, Rich, Satisfied with eco, Trust plc, Trust ppl, Trust un} => {Trust lgl} 0.122 0.9918699 0.123 1.498293 122
[10] {Rich, Satisfied with dem, Satisfied with eco, Trust ppl, Trust un, University education} => {Trust lgl} 0.123 1.0000000 0.123 1.510574 123
[11] {People fair, Rich, Satisfied with eco, Satisfied with life, Trust ppl, Trust prl, University education} => {Trust lgl} 0.128 0.9922481 0.129 1.498864 128
[12] {Rich, Satisfied with dem, Satisfied with eco, Satisfied with hlth, Trust plc, Trust prl, University education} => {Trust lgl} 0.124 0.9920000 0.125 1.498489 124
[13] {Rich, Satisfied with dem, Satisfied with eco, Satisfied with hlth, Satisfied with life, Trust prl, University education} => {Trust lgl} 0.130 0.9923664 0.131 1.499043 130
[14] {Rich, Satisfied with dem, Satisfied with eco, Satisfied with edu, Trust plc, Trust ppl, Trust un} => {Trust lgl} 0.130 0.9923664 0.131 1.499043 130
[15] {Rich, Satisfied with dem, Satisfied with eco, Satisfied with hlth, Trust plc, Trust ppl, Trust un} => {Trust lgl} 0.128 0.9922481 0.129 1.498864 128

Fifteen rules turned out to be credible, so we can state that there really is a relationship between the occurrence of some specific sets of items and the occurrence of trust in the legal system. As the thresholds were set really high, we can claim that:

  1. Each rule as a whole (the ‘if’ items together with trust in the legal system) occurs in at least 12% of the sampled transactions.

  2. Trust in the legal system occurs almost always (in at least 99% of cases) when the ‘if’ items occur.
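The quality measures behind these two statements can be computed by hand. The sketch below uses a small made-up set of baskets and base R only; the item names and values are hypothetical and serve purely to show how support, coverage, confidence and lift relate to each other.

```r
# Toy baskets (made up): TRUE means the item is present in the transaction.
baskets <- data.frame(
  rich      = c(TRUE, TRUE, TRUE, FALSE, FALSE, TRUE, FALSE, TRUE),
  trust_plc = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, TRUE, TRUE),
  trust_lgl = c(TRUE, TRUE, FALSE, TRUE, FALSE, TRUE, FALSE, FALSE)
)

# Rule: {rich, trust_plc} => {trust_lgl}
lhs <- baskets$rich & baskets$trust_plc
rhs <- baskets$trust_lgl

support    <- mean(lhs & rhs)       # share of baskets containing the whole rule
coverage   <- mean(lhs)             # share of baskets containing the 'if' items
confidence <- support / coverage    # share of 'if' baskets that also contain rhs
lift       <- confidence / mean(rhs)  # confidence relative to the base rate of rhs

c(support = support, coverage = coverage, confidence = confidence, lift = lift)
```

Support is measured against all transactions, while confidence is measured only against the transactions that contain the ‘if’ items.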

The confidence, support and lift values are presented on the plot below.

plot(rules.clean, engine="plotly")

This visual is a bit unfortunate at first sight. However, if we look at the scales of the axes, we can see that the values for all of the rules are very high. We can also use an interactive plot to investigate the rules further.

plot(rules.clean, method="graph", engine="htmlwidget")

The easiest way to analyse it is to go through a sample rule. For this purpose I’ll pick number 11:

The respondent who:

Thinks people are fair, is rich, is satisfied with the economy and with life, trusts people, trusts parliament and is university educated => trusts the legal system.

Earlier, I wrote about how trust in the legal system may affect compliance with the law, and judging by this description I would say that such a person is not in conflict with the law. A lift of about 1.5 means that respondents matching the “if” values above trust the legal system about 1.5 times as often as respondents in the sample overall.
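Since lift is confidence divided by the overall frequency of the right-hand side, the share of sampled respondents who trust the legal system can be recovered from the rules table. A short sketch using the values of rules 10 and 11 copied from that table:

```r
# Confidence and lift copied from the rules table (rules 10 and 11).
confidence <- c(rule10 = 1.0000000, rule11 = 0.9922481)
lift       <- c(rule10 = 1.510574,  rule11 = 1.498864)

# lift = confidence / P(rhs), so P(rhs) = confidence / lift.
base_rate <- confidence / lift
round(base_rate, 3)
```

Both rules imply the same base rate, roughly two thirds of the sample, which is why a confidence close to 1 translates into a lift of only about 1.5.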

Additionally, it is worth looking for items that are included in many rules. Interestingly, “Rich” appears in 13 of the 15 rules, while “Developed” appears in only one.

Logistic regression

Even though association rules give a good overview of the if-then scheme, they tell us little about which variables have a stronger influence, or about the significance of a variable as a whole rather than of a single value. For this purpose, I’ll use a logistic regression model, with trstlgl encoded as a binary variable. Based on the rules, all of the variables except gender should have a positive influence on the dependent variable. In other words, an increase in any of them should increase the probability that the binary trstlgl equals 1. Even though there are 15 rules, they are in general not very different from each other: many differ by just 1 or 2 items. I will therefore simply build a regression model with most of the variables that appear in them.

essdtanewcba$trstlgl <- ifelse(essdtanewcba$trstlgl>5,1,0)
essdtanewcba$cntry <- ifelse(essdtanewcba$cntry %in% c("NO","SE","FI","AT","BE","CH","DE","GB","FR","NL","IE","IS"),1,0)
glm.fit <- glm(trstlgl ~ gndr + trstplc + trstplt + trstprl + trstun + hinctnta + stfeco + stflife + eduyrs + trstep + stfdem + ppltrst + cntry, data = essdtanewcba, family = binomial)
summary(glm.fit)
## 
## Call:
## glm(formula = trstlgl ~ gndr + trstplc + trstplt + trstprl + 
##     trstun + hinctnta + stfeco + stflife + eduyrs + trstep + 
##     stfdem + ppltrst + cntry, family = binomial, data = essdtanewcba)
## 
## Deviance Residuals: 
##     Min       1Q   Median       3Q      Max  
## -2.9152  -0.5915   0.3245   0.6357   3.1157  
## 
## Coefficients:
##             Estimate Std. Error z value Pr(>|z|)    
## (Intercept) -6.27745    0.20162 -31.135  < 2e-16 ***
## gndr        -0.21217    0.05738  -3.697 0.000218 ***
## trstplc      0.38330    0.01633  23.477  < 2e-16 ***
## trstplt      0.15264    0.02001   7.629 2.37e-14 ***
## trstprl      0.21818    0.01867  11.686  < 2e-16 ***
## trstun       0.06865    0.01606   4.275 1.91e-05 ***
## hinctnta     0.02198    0.01136   1.934 0.053065 .  
## stfeco       0.07550    0.01651   4.573 4.82e-06 ***
## stflife      0.02577    0.01785   1.444 0.148747    
## eduyrs       0.32539    0.05881   5.533 3.14e-08 ***
## trstep       0.09271    0.01786   5.190 2.10e-07 ***
## stfdem       0.07175    0.01586   4.525 6.05e-06 ***
## ppltrst      0.07961    0.01442   5.521 3.37e-08 ***
## cntry        0.46182    0.06161   7.495 6.61e-14 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## (Dispersion parameter for binomial family taken to be 1)
## 
##     Null deviance: 12226.0  on 9366  degrees of freedom
## Residual deviance:  7847.9  on 9353  degrees of freedom
## AIC: 7875.9
## 
## Number of Fisher Scoring iterations: 5

In general, the model also indicates that the relationships we obtained are valid: every coefficient has the expected sign. However, stflife is not statistically significant (p ≈ 0.149), and hinctnta is only marginally significant (p ≈ 0.053).
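Logistic regression coefficients are on the log-odds scale, so exponentiating them gives odds ratios that are easier to read. The sketch below copies a few estimates and the two deviances from the summary above; `odds_ratios` and `mcfadden_r2` are names introduced here, and the pseudo R-squared is an extra descriptive measure that the original analysis does not report.

```r
# Estimates copied from the summary(glm.fit) output above.
coefs <- c(gndr = -0.21217, trstplc = 0.38330, trstplt = 0.15264,
           trstprl = 0.21818, eduyrs = 0.32539, cntry = 0.46182)

# Odds ratio per one-unit increase in each predictor.
odds_ratios <- exp(coefs)
round(odds_ratios, 3)

# McFadden's pseudo R-squared from the null and residual deviance.
mcfadden_r2 <- 1 - 7847.9 / 12226.0
round(mcfadden_r2, 3)
```

An odds ratio above 1 means that a one-unit increase in the predictor raises the odds of trusting the legal system, holding the other predictors fixed; a value below 1 means it lowers them.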

Findings & discussion

All in all, many significant relationships were captured based on the ESS data. The question raised at the beginning of the article is justified, and the thesis of the cited article’s authors seems to be correct. It is really riveting and, above all, possibly life-saving research, which, in my opinion, is worth the time and effort of further investigation. In general, there are many factors whose occurrence increases the chance that a person trusts the legal system. The vast majority of them are associated with trust, especially trust in political bodies, the government and the police. Apart from that, satisfaction with life, education and the economy, as well as income and country, are also important, and last but not least, social interactions. That being said, if trust in the legal system is really associated with obeying the law, that should be a message to all of us, and especially to politicians: they should start changing the world with their own behavior and not with endless amendments.

Possible directions for improving the analysis:

  1. Adding data about criminal records of the respondents to the database.

  2. Adding variables and/or changing the way the questions are asked so that they are less subjective and more facts oriented.

  3. Using other methods.

  4. Taking into consideration more datasets from various ESS rounds and adding time factor to the analysis.