Trust in European Parliament: Association rules analysis

1. Introduction

In recent years Europe has faced a series of challenges, including economic recessions, refugee crises and growing Eurosceptic sentiment. These developments have had a negative impact on almost all European institutions, reducing both trust in them as international organizations and support for further European integration. The importance of trust cannot be overstated, especially for organizations like the European Parliament, whose policies and decisions rely heavily on it. In the case of European institutions, trust and the resulting credibility can be defined as the extent to which a given society is ready to respect the institutions, believe that they serve the common good, and comply with their decisions and rules. Lack of trust in organizations like the European Parliament can severely damage their credibility and, consequently, their ability to function properly. Therefore, it is important to analyze factors that, at the individual level, might be significantly linked to high trust in the EU institutions.

One possible way to analyze meaningful characteristics of individuals who display such high trust is association rule mining, which aims to find significant relationships between categorical traits. This project conducts an association rules analysis of individual-level characteristics of EU citizens that might be linked to high trust in the European Parliament.

2. Dataset

The dataset for the analysis is derived from the European Social Survey, which regularly interviews EU citizens about both their political views and personal values. In detail, the data comes from the survey’s 10th edition and contains the answers of 30559 people who were interviewed during the 2020-2022 period. Based on existing literature, apart from trust in the European Parliament (EP), the 11 other questions judged most relevant to the subject were chosen and will serve as factors analyzed in association with the dependent variable. In effect, the final dataset contains 30559 rows, each one representing one participant’s full set of answers across 12 different questions.

2.1 Variable descriptions

The survey questions used a numerical scale on which the participants indicated how much they agreed with each statement connected to their political views and overall values. The analysis will center on one variable - the reported trust in the European Parliament, given on a scale of 0 to 10, where 10 represents the highest level - and, following the literature, other explanatory characteristics will be added. Some of the included variables refer to the participant’s opinion about their home country - trust in the country’s parliament and legal system, as well as satisfaction with the country’s economy, democracy and education. Additionally, viewpoints concerning issues important for the EU will be added - variables referring to attitudes towards the influence of immigrants on the economy and general life conditions, as well as the level of worry about climate change. The last set of added variables includes overall trust in other people, the level of interest in politics and whether the participant supports the idea of further European unification. The analysis will therefore focus on the association between those characteristics and trust in the EP, in order to examine the most relevant connections.

2.2 Data categorization

All of the answers to the survey questions were given on a numerical scale, so they need to be manually put into descriptive categories before the analytical process. Most of the variables, including trust in the EP, were given on a 0-10 scale and will be separated into 3 categories - answers in the range of 0-3 will be described as low values, 4-6 as medium values and 7-10 as high values. The only exceptions in the dataset are the participant’s interest in politics, which was given on a 1-4 scale, and their level of worry about climate change, which was given on a 1-5 scale. Both of these variables were also categorized into 3 descriptive groups, but the cut points were adjusted to their maximal values. The categories and descriptions of all of the used variables can be viewed below.

# Other people can be trusted
data$ppltrst<-ifelse(data[,1]<4, "no trust in ppl", ifelse(data[,1]<7, "medium trust in ppl", ifelse(data[,1]<11, "high trust in ppl", NA)))
# Interest in politics
data$polintr<-ifelse(data[,2]<2, "no politics intr", ifelse(data[,2]<4, "medium politics intr", ifelse(data[,2]<5, "high politics intr", NA)))
# Trust in country parliament
data$trstprl<-ifelse(data[,3]<4, "no trust in c_prl", ifelse(data[,3]<7, "medium trust in c_prl", ifelse(data[,3]<11, "high trust in c_prl", NA)))
# Trust in country legal system
data$trstlgl<-ifelse(data[,4]<4, "no trust in c_lgl", ifelse(data[,4]<7, "medium trust in c_lgl", ifelse(data[,4]<11, "high trust in c_lgl", NA)))
# Trust in European Parliament
data$trstep<-ifelse(data[,5]<4, "no trust in EP", ifelse(data[,5]<7, "medium trust in EP", ifelse(data[,5]<11, "high trust in EP", NA)))
# Satisfaction with country economy
data$stfeco<-ifelse(data[,6]<4, "weak satisf with econ", ifelse(data[,6]<7, "medium satisf with econ", ifelse(data[,6]<11, "high satisf with econ", NA)))
# Satisfaction with country democracy
data$stfdem<-ifelse(data[,7]<4, "weak satisf with dem", ifelse(data[,7]<7, "medium satisf with dem", ifelse(data[,7]<11, "high satisf with dem", NA)))
# Satisfaction with country education
data$stfedu<-ifelse(data[,8]<4, "weak satisf with educ", ifelse(data[,8]<7, "medium satisf with educ", ifelse(data[,8]<11, "high satisf with educ", NA)))
# Further european unification
data$euftf<-ifelse(data[,9]<4, "e_unif no further", ifelse(data[,9]<7, "e_unif maybe further", ifelse(data[,9]<11, "e_unif go further", NA)))
# Immigrants good for country economy
data$imbgeco<-ifelse(data[,10]<4, "immig bad for econ", ifelse(data[,10]<7, "immig neutral for econ", ifelse(data[,10]<11, "immig good for econ", NA)))
# Immigrants make country a better place to live
data$imwbcnt<-ifelse(data[,11]<4, "immig bad for life_cond", ifelse(data[,11]<7, "immig neutral for life_cond", ifelse(data[,11]<11, "immig good for life_cond", NA)))
# Worried about climate change
data$wrclmch<-ifelse(data[,12]<3, "not worried ab climate", ifelse(data[,12]<4, "somewhat worried ab climate", ifelse(data[,12]<6, "very worried ab climate", NA)))
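
The nested ifelse() calls above all follow the same pattern, which can also be expressed with base R's cut(). The following is only a minimal sketch of that idea: the helper name bin_scale and the toy vector x are hypothetical and are not part of the original script. It assumes, as in the code above, that any answer above the top of the scale should become NA.

```r
# Hypothetical helper (not in the original script): bin a 0-10 answer into
# three labelled groups, 0-3 / 4-6 / 7-10. Values above 10 and missing
# values become NA, as in the nested ifelse() version.
bin_scale <- function(x, labels, breaks = c(-Inf, 3, 6, 10)) {
  as.character(cut(x, breaks = breaks, labels = labels, right = TRUE))
}

# Toy input, including an out-of-range code (77) and a missing value
x <- c(0, 3, 4, 6, 7, 10, 77, NA)
trstep_cat <- bin_scale(x, c("no trust in EP", "medium trust in EP", "high trust in EP"))
print(trstep_cat)
```

For the 1-4 and 1-5 scales the same helper could be called with a different breaks argument, for example breaks = c(-Inf, 2, 3, 5) for the climate variable.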

3. Association rules

Within machine learning, association rule mining is a method of analyzing a dataset with the aim of finding significant patterns and co-occurrences between data elements. It can identify not only co-occurrence relationships but also if-then associations, in which the occurrence of the elements on the left-hand side of the rule implies the occurrence of another element on its right-hand side. In more detail, the analysis searches for all the possible co-occurrence patterns between data items and then measures the frequency and quality of the discovered if-then rules. Several measures can be used to evaluate the strength and significance of the relationships found. The count is simply the number of observations that contain all of the elements included in a rule, whereas the support gives the share of all observations for which that is the case. The confidence is the probability of observing the “then” element given that the “if” elements of the rule are present within an observation. The lift, on the other hand, measures how much more often the “if” and “then” items appear together than would be expected if they were independent. The following analysis will therefore focus on finding associations between trust in the European Parliament and all of the categories of the variables in the dataset, and then creating and evaluating significant rules that imply the occurrence of high trust in the EP.
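
The measures described above can be written compactly. The notation below is introduced here only for illustration and is not used elsewhere in this report: N is the number of responses, X is the set of “if” items, Y is the “then” item, and n(S) is the number of responses containing every item of a set S.

```latex
\begin{align*}
\mathrm{count}(X \Rightarrow Y)      &= n(X \cup Y) \\
\mathrm{support}(X \Rightarrow Y)    &= \frac{n(X \cup Y)}{N} \\
\mathrm{coverage}(X \Rightarrow Y)   &= \frac{n(X)}{N} \\
\mathrm{confidence}(X \Rightarrow Y) &= \frac{n(X \cup Y)}{n(X)}
  = \frac{\mathrm{support}(X \Rightarrow Y)}{\mathrm{coverage}(X \Rightarrow Y)} \\
\mathrm{lift}(X \Rightarrow Y)       &= \frac{\mathrm{confidence}(X \Rightarrow Y)}{n(Y)/N}
\end{align*}
```

A lift of 1 corresponds to the “if” and “then” items being independent, while values above 1 indicate that they occur together more often than independence would predict.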

3.1 Basic statistics

Before beginning the actual analytical process, it is useful to look at the basic statistics of all the included variables and their descriptive categories to see the main trends in the survey participants’ answers. After reading the dataset as “transactions” and removing rare categories (those present in 5% of the responses or fewer), it is possible to obtain the frequency of all of the included categories.

library(arules)
library(arulesViz)
library(shinythemes)

data<-read.transactions("data.csv", format="basket", sep=",", skip=0)
data<-data[, itemFrequency(data)>0.05]

sort(itemFrequency(data, type="relative"))

##            no politics intr      not worried ab climate 
##                   0.1008835                   0.1812500 
##       weak satisf with educ          high politics intr 
##                   0.1912631                   0.2020288 
##           e_unif no further     immig bad for life_cond 
##                   0.2196335                   0.2268652 
##          immig bad for econ        weak satisf with dem 
##                   0.2401832                   0.2474149 
##             no trust in ppl         high trust in c_prl 
##                   0.2677683                   0.2704843 
##           no trust in c_lgl       high satisf with econ 
##                   0.2708115                   0.2836060 
##            high trust in EP    immig good for life_cond 
##                   0.2877945                   0.2888743 
##              no trust in EP       weak satisf with econ 
##                   0.2910668                   0.3015707 
##       medium trust in c_lgl           high trust in ppl 
##                   0.3349476                   0.3356348 
##         immig good for econ           no trust in c_prl 
##                   0.3481675                   0.3543194 
##           e_unif go further      medium satisf with dem 
##                   0.3553010                   0.3749346 
##       medium trust in c_prl        high satisf with dem 
##                   0.3751636                   0.3776178 
##     very worried ab climate     medium satisf with educ 
##                   0.3776832                   0.3792866 
##         high trust in c_lgl         medium trust in ppl 
##                   0.3942081                   0.3965641 
##      immig neutral for econ     medium satisf with econ 
##                   0.4116165                   0.4147906 
##          medium trust in EP        e_unif maybe further 
##                   0.4211060                   0.4250327 
##       high satisf with educ somewhat worried ab climate 
##                   0.4294175                   0.4410340 
## immig neutral for life_cond        medium politics intr 
##                   0.4842277                   0.6970550

Moreover, the categories with the highest frequencies can be shown visually.

itemFrequencyPlot(data, topN=10, type="absolute", main="Absolute frequency")

itemFrequencyPlot(data, topN=10, type="relative", main="Relative frequency")

Examining the highest-frequency categories, it can be noticed that most of them represent the “medium” range of answers, which could be anticipated. The details of all of the variables show that in the case of the main interest of this analysis - trust in the European Parliament - only 28.78% of respondents declared a high level of trust.

Additionally, it is possible to calculate the Jaccard index of similarity in order to find which characteristics tend to occur together, and to visualize the basic relationships between them in the form of a dendrogram.

#Jaccard dissimilarity between characteristics (the default method of dissimilarity())
data<-data[, itemFrequency(data)>0.05]
d.jac.i<-dissimilarity(data, which="items")
#dendrogram for characteristics
plot(hclust(d.jac.i, method="ward.D2"), main="Dendrogram for characteristics")
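
To make the distance behind the dendrogram more concrete, here is a minimal base-R sketch of the Jaccard index for two characteristics. The vectors a and b are invented toy data (six imaginary respondents), not values from the survey; each entry states whether a respondent has the given characteristic.

```r
# Toy presence/absence vectors for two characteristics (invented data)
a <- c(TRUE, TRUE, FALSE, TRUE, FALSE, FALSE)
b <- c(TRUE, FALSE, FALSE, TRUE, TRUE, FALSE)

# Jaccard similarity: respondents having both / respondents having at least one
jaccard_sim <- sum(a & b) / sum(a | b)

# The dissimilarity used for clustering is one minus the similarity
jaccard_dist <- 1 - jaccard_sim

print(c(similarity = jaccard_sim, dissimilarity = jaccard_dist))
```

Two characteristics that always appear together would have a dissimilarity of 0 and would be merged first in the hierarchical clustering.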

Having examined the basic characteristics of the used variables and their categories, the next step of the analysis is to find meaningful associations between trust in the European Parliament and all the other traits, in order to describe the people who displayed either high or low trust in the EP.

3.2 Association analysis for high trust

The association rules process will begin by looking for rules that best explain which categories are associated with high trust in the European Parliament, as that is the most desired outcome for the dependent variable. Therefore, the rules will be formed by putting the “high trust in EP” category on the right-hand side of the rule and finding the significant co-occurrences with various elements on the left-hand side.

rules<-apriori(data=data, parameter=list(supp=0.1, conf=0.5), appearance=list(default="lhs", rhs="high trust in EP"), control=list(verbose=F)) 
summary(rules)

## set of 20 rules
## 
## rule length distribution (lhs + rhs):sizes
##  2  3  4 
##  1 12  7 
## 
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##     2.0     3.0     3.0     3.3     4.0     4.0 
## 
## summary of quality measures:
##     support         confidence        coverage           lift      
##  Min.   :0.1005   Min.   :0.5002   Min.   :0.1581   Min.   :1.738  
##  1st Qu.:0.1034   1st Qu.:0.5345   1st Qu.:0.1869   1st Qu.:1.857  
##  Median :0.1097   Median :0.5684   Median :0.1956   Median :1.975  
##  Mean   :0.1162   Mean   :0.5775   Mean   :0.2026   Mean   :2.007  
##  3rd Qu.:0.1279   3rd Qu.:0.6274   3rd Qu.:0.2103   3rd Qu.:2.180  
##  Max.   :0.1605   Max.   :0.6499   Max.   :0.2705   Max.   :2.258  
##      count     
##  Min.   :3072  
##  1st Qu.:3160  
##  Median :3354  
##  Mean   :3552  
##  3rd Qu.:3908  
##  Max.   :4904  
## 
## mining info:
##  data ntransactions support confidence
##  data         30560     0.1        0.5
##                                                                                                                                                       call
##  apriori(data = data, parameter = list(supp = 0.1, conf = 0.5), appearance = list(default = "lhs", rhs = "high trust in EP"), control = list(verbose = F))

With the minimum thresholds set quite high, at 0.1 for support and 0.5 for confidence, the function was able to find 20 relevant association rules. The main characteristics of the discovered rules can be displayed visually according to their support, confidence and lift.

plot(rules, measure=c("support","lift"), shading="confidence")

However, in order to get a better idea of which elements the rules actually contain, it is important to inspect the associations in more detail. The most significant rules can be displayed based on one of the quality measures; to start, they were sorted according to their confidence levels.

rules.byconf<-sort(rules, by="confidence", decreasing=TRUE)
inspect(head(rules.byconf))

##     lhs                         rhs                  support confidence  coverage     lift count
## [1] {high satisf with dem,                                                                      
##      high trust in c_lgl,                                                                       
##      high trust in c_prl}    => {high trust in EP} 0.1110602  0.6499426 0.1708770 2.258356  3394
## [2] {high satisf with educ,                                                                     
##      high trust in c_lgl,                                                                       
##      high trust in c_prl}    => {high trust in EP} 0.1047448  0.6471896 0.1618455 2.248791  3201
## [3] {high satisf with dem,                                                                      
##      high satisf with educ,                                                                     
##      high trust in c_prl}    => {high trust in EP} 0.1022579  0.6467301 0.1581152 2.247194  3125
## [4] {high trust in c_lgl,                                                                       
##      high trust in c_prl,                                                                       
##      medium politics intr}   => {high trust in EP} 0.1022906  0.6310052 0.1621073 2.192555  3126
## [5] {high trust in c_lgl,                                                                       
##      high trust in c_prl}    => {high trust in EP} 0.1359293  0.6296802 0.2158704 2.187951  4154
## [6] {high satisf with dem,                                                                      
##      high trust in c_prl}    => {high trust in EP} 0.1265707  0.6267012 0.2019634 2.177600  3868

The rule with the highest confidence of 0.6499 (rule 1) contains characteristics like high satisfaction with democracy and high trust in the country’s parliament and legal system, which means that the estimated conditional probability that a participant who gave those answers also responded with high trust in the European Parliament is approximately 65%. The support of 0.1111 means that about 11% of all responses contained all 4 of these characteristics together, and the lift value of 2.2584 implies that these elements co-occurred about 2.26 times more often than would be expected if they were independent. In general, the rules with the highest confidence imply that participants who reported high trust in the EP can be characterized as people with high satisfaction with democracy and education in their country, high trust in both the country’s parliament and legal system, and a medium interest in politics. The main rules can also be displayed according to the other statistics, to see whether the lists of the highest quality rules differ heavily depending on the quality measure used.

rules.bylift<-sort(rules, by="lift", decreasing=TRUE)
inspect(head(rules.bylift))

##     lhs                         rhs                  support confidence  coverage     lift count
## [1] {high satisf with dem,                                                                      
##      high trust in c_lgl,                                                                       
##      high trust in c_prl}    => {high trust in EP} 0.1110602  0.6499426 0.1708770 2.258356  3394
## [2] {high satisf with educ,                                                                     
##      high trust in c_lgl,                                                                       
##      high trust in c_prl}    => {high trust in EP} 0.1047448  0.6471896 0.1618455 2.248791  3201
## [3] {high satisf with dem,                                                                      
##      high satisf with educ,                                                                     
##      high trust in c_prl}    => {high trust in EP} 0.1022579  0.6467301 0.1581152 2.247194  3125
## [4] {high trust in c_lgl,                                                                       
##      high trust in c_prl,                                                                       
##      medium politics intr}   => {high trust in EP} 0.1022906  0.6310052 0.1621073 2.192555  3126
## [5] {high trust in c_lgl,                                                                       
##      high trust in c_prl}    => {high trust in EP} 0.1359293  0.6296802 0.2158704 2.187951  4154
## [6] {high satisf with dem,                                                                      
##      high trust in c_prl}    => {high trust in EP} 0.1265707  0.6267012 0.2019634 2.177600  3868

rules.bycount<-sort(rules, by="count", decreasing=TRUE)
inspect(head(rules.bycount))

##     lhs                         rhs                  support confidence  coverage     lift count
## [1] {high trust in c_prl}    => {high trust in EP} 0.1604712  0.5932737 0.2704843 2.061449  4904
## [2] {high satisf with dem,                                                                      
##      high trust in c_lgl}    => {high trust in EP} 0.1374346  0.5509642 0.2494437 1.914436  4200
## [3] {high trust in c_lgl,                                                                       
##      high trust in c_prl}    => {high trust in EP} 0.1359293  0.6296802 0.2158704 2.187951  4154
## [4] {high satisf with educ,                                                                     
##      high trust in c_lgl}    => {high trust in EP} 0.1329188  0.5327912 0.2494764 1.851290  4062
## [5] {high satisf with dem,                                                                      
##      high satisf with educ}  => {high trust in EP} 0.1318717  0.5019930 0.2626963 1.744276  4030
## [6] {high satisf with dem,                                                                      
##      high trust in c_prl}    => {high trust in EP} 0.1265707  0.6267012 0.2019634 2.177600  3868

rules.bysupp<-sort(rules, by="support", decreasing=TRUE)
inspect(head(rules.bysupp))

##     lhs                         rhs                  support confidence  coverage     lift count
## [1] {high trust in c_prl}    => {high trust in EP} 0.1604712  0.5932737 0.2704843 2.061449  4904
## [2] {high satisf with dem,                                                                      
##      high trust in c_lgl}    => {high trust in EP} 0.1374346  0.5509642 0.2494437 1.914436  4200
## [3] {high trust in c_lgl,                                                                       
##      high trust in c_prl}    => {high trust in EP} 0.1359293  0.6296802 0.2158704 2.187951  4154
## [4] {high satisf with educ,                                                                     
##      high trust in c_lgl}    => {high trust in EP} 0.1329188  0.5327912 0.2494764 1.851290  4062
## [5] {high satisf with dem,                                                                      
##      high satisf with educ}  => {high trust in EP} 0.1318717  0.5019930 0.2626963 1.744276  4030
## [6] {high satisf with dem,                                                                      
##      high trust in c_prl}    => {high trust in EP} 0.1265707  0.6267012 0.2019634 2.177600  3868
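
As a quick arithmetic check of how the reported quality measures relate to each other, the values printed for the top rule by confidence ({high satisf with dem, high trust in c_lgl, high trust in c_prl} => {high trust in EP}) can be recombined by hand. This is only an illustrative sketch; the variable names are chosen here and do not come from the original script, while the numbers are copied from the outputs above.

```r
# Values copied from the printed outputs above
support  <- 0.1110602   # share of responses containing lhs and rhs together
coverage <- 0.1708770   # share of responses containing the lhs
freq_rhs <- 0.2877945   # relative frequency of "high trust in EP"
n_trans  <- 30560       # number of transactions reported by summary(rules)

# confidence = support / coverage
confidence <- support / coverage

# lift = confidence / relative frequency of the rhs
lift <- confidence / freq_rhs

# count = support * number of transactions
count <- round(support * n_trans)

print(round(c(confidence = confidence, lift = lift, count = count), 4))
```

The recomputed values agree with the confidence, lift and count columns printed for this rule, which also explains why sorting by count and sorting by support produce the same ordering.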

Through this method, 20 general rules with quite good characteristics have been created. However, before moving on to the full interpretation of the results, it is worth cleaning the rules with respect to redundancy, statistical significance and maximality in order to draw conclusions from the statistically best rules. A rule can be defined as redundant when it does not add any new information. In more detail, redundancy occurs when a more general rule (one with fewer items on the left-hand side) with the same or higher confidence can be found. The significance of the rules is checked with Fisher’s exact test, which aims to ensure that the rules did not arise by chance. Additionally, only maximal rules are kept, that is, rules whose item sets are not contained in the larger item set of another rule, which helps to focus on the most informative rules.

rules.clean<-rules[!is.redundant(rules)]
rules.clean<-rules.clean[is.significant(rules.clean, data)]
rules.clean<-rules.clean[is.maximal(rules.clean)]
inspectDT(rules.clean)

After ensuring the quality of the entries, 12 out of the initial 20 rules have been kept and can be characterized as maximal, significant and non-redundant. The strongest and most informative association rules can now be visualized in order to better understand the characteristics linked with high trust in the EP.

# interactive graphics for rules
# (the htmlwidget engine does not accept a "main" title, so it is omitted)
plot(rules.clean, method="graph", engine="htmlwidget")

#network illustration of the rules
# ("type" and "main" are not control parameters of the default graph engine, so they are omitted)
plot(rules.clean, method="graph")

#complexity of rules containing "high trust in EP"
plot(rules.clean, method="paracoord", control=list(reorder=TRUE), main="Complexity of rules for high trust in EP")

The best of the created association rules link high trust in the European Parliament with 8 other characteristics. Based on the analysis, it can be concluded that a person who displays high trust in the EP tends to be a generally trusting person, reporting high trust in other people. Such an EU citizen also tends to have high trust in their home country’s parliament and legal system, which suggests a general belief in the political system and the institutions that are meant to represent them. Such a respondent also displays high satisfaction with the economic, democratic and educational situation in the country where they live and shows a medium interest in politics, which may indicate that they follow politics fairly regularly and are aware of current political affairs.

3.3 Association analysis for medium trust

High trust in institutions like the European Parliament is incredibly important; however, for some people it might not be achievable, especially given the current instability and Eurosceptic attitudes in Europe. If high trust cannot be achieved, it might be important to steer away from low trust and at least maintain medium levels so that the EP can function reasonably well. Therefore, an analysis similar to the one done for the high category can be repeated for medium trust in the EP to see what meaningful association rules can be found in this case.

rules<-apriori(data=data, parameter=list(supp=0.05, conf=0.05), appearance=list(default="lhs", rhs="medium trust in EP"), control=list(verbose=F))
rules.clean<-rules[!is.redundant(rules)] 
rules.clean<-rules.clean[is.significant(rules.clean, data)] 
rules.clean<-rules.clean[is.maximal(rules.clean)]
rules.byconf<-sort(rules.clean, by="confidence", decreasing=TRUE)
inspect(head(rules.byconf))

##     lhs                               rhs                     support confidence   coverage     lift count
## [1] {e_unif maybe further,                                                                                
##      medium trust in c_lgl,                                                                               
##      medium trust in c_prl}        => {medium trust in EP} 0.06142016  0.6910898 0.08887435 1.641130  1877
## [2] {e_unif maybe further,                                                                                
##      medium satisf with dem,                                                                              
##      medium trust in c_prl}        => {medium trust in EP} 0.06475785  0.6871528 0.09424084 1.631781  1979
## [3] {e_unif maybe further,                                                                                
##      immig neutral for econ,                                                                              
##      immig neutral for life_cond,                                                                         
##      medium trust in c_prl}        => {medium trust in EP} 0.05085079  0.6797900 0.07480366 1.614297  1554
## [4] {e_unif maybe further,                                                                                
##      medium satisf with educ,                                                                             
##      medium trust in c_prl}        => {medium trust in EP} 0.05742801  0.6763006 0.08491492 1.606010  1755
## [5] {e_unif maybe further,                                                                                
##      medium trust in c_prl,                                                                               
##      somewhat worried ab climate}  => {medium trust in EP} 0.05752618  0.6748560 0.08524215 1.602580  1758
## [6] {medium politics intr,                                                                                
##      medium satisf with dem,                                                                              
##      medium trust in c_lgl,                                                                               
##      medium trust in c_prl}        => {medium trust in EP} 0.05271597  0.6746231 0.07814136 1.602027  1611

The discovered rules have once again been assessed, cleaned and displayed according to their confidence levels. The associations found show that people who display medium trust in the EP can often also be characterized by traits like medium trust in their country’s parliament and legal system, medium satisfaction with democracy and education, the belief that immigrants have a neutral effect on the economy and general life conditions, and the view that EU integration should perhaps go a little further.

3.4 Automatic categorization method

Apart from the classic association rules analysis shown above, there are also other available methods that are worth examining, for example processes which involve automatic categorization. In order to apply an automatic categorization method using the package “arulesCBA”, it is first necessary to load the original dataset, which contains numerical variables that have not been categorized yet. Then, to discretize the other variables, one dependent variable needs to be chosen - in this case it will naturally be trust in the European Parliament, whose exact levels and the number of corresponding responses can be seen below.

library(arules)
library(arulesViz)
library(arulesCBA)

original<-read.csv("original.csv", header=TRUE, sep=",")
table(original$trstep)

## 
##    0    1    2    3    4    5    6    7    8    9   10 
## 2549 1334 2219 2793 2877 5793 4199 4266 2905  985  639

Subsequently, all of the variables can be discretized, this time using a completely automatic approach. The created categories and their corresponding intervals are shown below.

original$trstep<-as.factor(original$trstep)
data.disc<-discretizeDF.supervised(trstep ~ ., data=original[,2:13], method="mdlp") # supervised (MDLP) discretization of all predictors with respect to trstep
summary(data.disc)

##        ppltrst            polintr            trstprl           trstlgl    
##  [-Inf,0.5): 2035   [-Inf,2.5):13415   [4.5,5.5) :5079   [4.5,5.5) :4316  
##  [0.5,1.5) : 1028   [2.5,3.5) :10970   [6.5,7.5) :3837   [7.5,8.5) :4298  
##  [1.5,2.5) : 2038   [3.5, Inf]: 6174   [5.5,6.5) :3592   [6.5,7.5) :4256  
##  [2.5,5.5) :11579                      [2.5,3.5) :3353   [5.5,6.5) :3437  
##  [5.5,7.5) : 8461                      [-Inf,0.5):3259   [2.5,3.5) :2499  
##  [7.5,9.5) : 4799                      [7.5,8.5) :2862   [-Inf,0.5):2483  
##  [9.5, Inf]:  619                      (Other)   :8577   (Other)   :9270  
##      trstep            stfeco           stfdem            stfedu    
##  5      :5793   [3.5,5.5) :8420   [7.5,9.5):5951   [-Inf,0.5):1060  
##  7      :4266   [6.5,8.5) :7206   [6.5,7.5):4731   [0.5,1.5) : 685  
##  6      :4199   [5.5,6.5) :4256   [1.5,3.5):4713   [1.5,3.5) :4100  
##  8      :2905   [2.5,3.5) :3491   [4.5,5.5):4653   [3.5,5.5) :7249  
##  4      :2877   [1.5,2.5) :2545   [5.5,6.5):3944   [5.5,7.5) :9799  
##  3      :2793   [-Inf,0.5):2030   [3.5,4.5):2861   [7.5,9.5) :6625  
##  (Other):7726   (Other)   :2611   (Other)  :3706   [9.5, Inf]:1041  
##         euftf            imbgeco           imwbcnt            wrclmch     
##  [-Inf,0.5):1678   [-Inf,0.5):2018   [-Inf,0.5): 1700   [-Inf,1.5): 1182  
##  [0.5,2.5) :2528   [0.5,1.5) :1003   [0.5,3.5) : 5233   [1.5,2.5) : 4357  
##  [2.5,4.5) :4937   [1.5,3.5) :4319   [3.5,5.5) :11330   [2.5,3.5) :13478  
##  [4.5,5.5) :7179   [3.5,5.5) :8964   [5.5,6.5) : 3468   [3.5, Inf]:11542  
##  [5.5,7.5) :7274   [5.5,7.5) :8044   [6.5,9.5) : 7616                     
##  [7.5,9.5) :4619   [7.5,9.5) :4825   [9.5, Inf]: 1212                     
##  [9.5, Inf]:2344   [9.5, Inf]:1386

The first thing to notice in these results is that the automatic approach creates very different categories for the numeric responses than the manual process. For example, for the variable wrclmch (how worried an individual is about climate change), the manual categorization split the values 1-5 into the groups 1-2, 3 and 4-5, corresponding to the descriptive labels not worried, somewhat worried and very worried. The automatic approach instead divided the answers into 4 categories, with cut points at 1.5, 2.5 and 3.5. Since the responses are whole numbers, these intervals in effect correspond to the values 1, 2, 3 and 4-5, so the automatic categorization may not be the most intuitive approach here. Nevertheless, the discretized data can be taken a step further to examine what kind of rules this automatic process finds.
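
The two groupings of wrclmch described above can be compared with a minimal base R sketch. The response vector is a made-up illustration rather than survey data, and the object names (responses, manual, automatic) are hypothetical; only the cut points and labels are taken from the text.

```r
# Illustrative responses on the 1-5 wrclmch scale (not real survey data)
responses <- c(1, 2, 3, 4, 5, 3, 4)

# Manual grouping described in the text: 1-2, 3, 4-5
manual <- cut(responses,
              breaks = c(0, 2, 3, 5),
              labels = c("not worried", "somewhat worried", "very worried"))

# Automatic grouping: cut points 1.5, 2.5 and 3.5, left-closed intervals
automatic <- cut(responses,
                 breaks = c(-Inf, 1.5, 2.5, 3.5, Inf),
                 right = FALSE)

# Cross-tabulate to see how the two categorizations overlap
print(table(manual, automatic))
```

The cross-table makes the difference visible: the manual scheme merges 1 with 2, while the automatic one keeps them apart and only merges 4 with 5.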

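# convert the discretized data frame into transactions
data.trans<-transactions(data.disc)
# keep only items that occur in more than 5% of transactions
trans2<-data.trans[, itemFrequency(data.trans)>0.05]
# mine class association rules with trstep on the right-hand side
data.ass<-mineCARs(trstep ~ ., transactions=trans2, support=0.05, confidence=0.3)
summary(data.ass)
# drop redundant rules
data.ass.clean<-data.ass[!is.redundant(data.ass)]

# sort by confidence and show the top rules
rules.byconf<-sort(data.ass.clean, by="confidence", decreasing=TRUE)
inspect(head(rules.byconf))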

##     lhs                                         rhs        support   
## [1] {trstprl=[-Inf,0.5), trstlgl=[-Inf,0.5)} => {trstep=0} 0.03069472
## [2] {stfdem=[-Inf,0.5)}                      => {trstep=0} 0.02781505
## [3] {trstlgl=[-Inf,0.5)}                     => {trstep=0} 0.03697765
## [4] {trstprl=[-Inf,0.5)}                     => {trstep=0} 0.04787460
## [5] {euftf=[-Inf,0.5)}                       => {trstep=0} 0.02241565
## [6] {stfeco=[-Inf,0.5)}                      => {trstep=0} 0.02585163
##     confidence coverage   lift     count
## [1] 0.5616766  0.05464839 6.733730  938 
## [2] 0.4569892  0.06086587 5.478672  850 
## [3] 0.4550946  0.08125266 5.455958 1130 
## [4] 0.4489107  0.10664616 5.381821 1463 
## [5] 0.4082241  0.05491017 4.894045  685 
## [6] 0.3891626  0.06642888 4.665523  790

The best association rules discovered in the automatically categorized data, sorted by confidence, all link the very lowest values of variables like trust in the country’s parliament and legal system, satisfaction with democracy and the economy, and support for further EU unification to a complete lack of trust in the European Parliament. All of those rules involve responses of 0, and therefore indicate that people who have no trust in their country’s parliament and legal system, or who do not believe that EU integration should go any further, are much more likely than average to have no trust in the European Parliament either. However, mining with the “mineCARs” function on the automatically categorized data did not offer much insight into the overall links in the dataset, as the strongest rules concern only the lowest response values.
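
The quality measures reported for rule [1] follow from simple ratios, which can be checked with a short base R sketch. The figures are copied from the output above and from the trstep frequency table; the variable names are hypothetical, and the results should agree with the reported values only up to rounding.

```r
# Figures taken from the report's output
n          <- 30559        # respondents in the dataset
count_rule <- 938          # respondents matching both sides of rule [1]
coverage   <- 0.05464839   # share of respondents matching the left-hand side
count_rhs  <- 2549         # respondents with trstep = 0

# support: share of respondents matching the whole rule
support <- count_rule / n

# confidence: share of left-hand-side matches that also have trstep = 0
confidence <- support / coverage

# lift: confidence relative to the baseline share of trstep = 0
lift <- confidence / (count_rhs / n)

print(round(c(support = support, confidence = confidence, lift = lift), 4))
```

A lift well above 1 means that zero trust in the EP is several times more common among respondents matching the left-hand side than in the sample as a whole.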

4. Conclusions

The association rules analysis was performed on sets of responses from EU citizens concerning their political views and values. The rules mined for significant links between participants’ characteristics and high trust in the European Parliament showed that such trust was usually found among people with high trust in their home country’s parliament and legal system, high general trust in others, high satisfaction with the economy, democracy and education, and a medium interest in politics. Interestingly, the most significant rules for high trust in the EP did not include opinions about immigrants’ influence on the economy and on general living conditions, or the level of worry about climate change. The discovered links, and the association rules analysis as a whole, therefore offer better insight into the patterns of characteristics connected to high trust in the European Parliament, which is very important and highly desirable, especially during the current unstable political period.

Association rules project

Weronika Wyrwas

2025-02-06