Introduction

Everyone who’s living in Poland has already noticed how politically divided our society is. Since the beginning of the 2000s, Poles clustered themselves mainly between right-wing conservatives Law and Justice electors and liberals who vote for the Civic Platform. This divide still exists nowadays and has even intensified as years have passed.

In this study, by using opinion poll based data, we try to identify interesting discriminating factors of those two electorates. By using contrast set mining, we try to find conjunctions of attributes and values that differ meaningfully in their distributions across groups.

Methodology

Association Rules and Contrast Sets

Association rule mining is commonly used in market basket analysis. We tried to adapt it to a socio-political framework analysis. Here, each voter is treated as a “transaction” containing their various opinions as “items.” The Apriori algorithm identifies rules of the form:

\[\{Attitude_1, Attitude_2, ...\} \Rightarrow Vote_{Party}\]

To compare electorates, we generate rules separately for each party and measure how differently each attitude pattern predicts voting behavior. This approach reveals not just what each group believes, but which belief combinations most strongly separate them.

Data

library(arules)
library(arulesViz)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)

The data comes from the Polish Panel Survey (POLPAN), a longitudinal study conducted by the Polish Academy of Sciences. We use the 2018 wave whose data was collected between April and June 2018, selecting variables measuring:

Sources of conflict: Whether respondents perceive various social divisions as strong or weak sources of conflict.

-the rich and the poor

-manual and non-manual workers

-managers and their employees

-residents of cities and rural areas

-religious and non-religious

-those in power and the masses

-capitalists and labor

-more and less educated people

-supporters of different political platforms (parties and groups)

-younger and older generations

Predictors of success: Whether respondents believe factors are important for success.

-Ambition

-Knowing the right people

-Hard work

-Political influence

-Coming from a rich family

-A good education

-Luck

-Innate ability and talent

Party voted: Our target variable (PiS/Law&Justice or PO/Civic Platform). I decided to put Modern party in the same category as PO as both electorate are really similar and both parties would united in 2019.

In this chunck, we load our dataset filter the voters between PiS and PO and then create the “transactions”.

df_arule <- read.csv("polpan_encoded_for_arules3 2.csv")

df_arule <- df_arule %>%
  filter(Vote_LawAndJustice == 1 | Vote_CivicPlatform == 1) %>%
  select(-Vote_Kukiz15, -Vote_Other, -Vote_PeasantsParty)

trans <- as(as.matrix(df_arule) == 1, "transactions")

Dataset Overview

cat("Total respondents:", length(trans), "\n")

## Total respondents: 738

cat("Total items:", ncol(trans), "\n")

## Total items: 38

The electoral split between our two parties of interest:

cat("Vote_LJ:", round(itemFrequency(trans)["Vote_LawAndJustice"] * 100, 1), "%\n")

## Vote_LJ: 58 %

cat("Vote_CP:", round(itemFrequency(trans)["Vote_CivicPlatform"] * 100, 1), "%")

## Vote_CP: 42 %

This gives us a sufficient representation of both electorates for meaningful comparison.

Analysis

Algorithm Parameters

MIN_SUPPORT <- 0.05    # Rule must apply to at least 5% of respondents
MIN_CONFIDENCE <- 0.35 # At least 35% must vote for the party
MIN_LEN <- 3           # Minimum 3 attitudes combined (with vote)
MAX_LEN <- 5           # Maximum 5 attitudes combined (with vote)

Although association rule mining is supposed to be unsupervised, we still need to apply some parameters that will determine how our algorithms will work. The support threshold of 5% ensures each rule is based on at least 37 respondents.

Generate rules for both electorate

We run the Apriori algorithm twice, once for each party, then remove redundant rules (those adding conditions without improving predictive power).

Law and Justice Rules

rules_LJ <- apriori(trans,
                    parameter = list(supp = MIN_SUPPORT, 
                                     conf = MIN_CONFIDENCE, 
                                     minlen = MIN_LEN, 
                                     maxlen = MAX_LEN),
                    appearance = list(default = "lhs", rhs = "Vote_LawAndJustice"),
                    control = list(verbose = FALSE))

rules_LJ <- rules_LJ[!is.redundant(rules_LJ)]
cat("Law and Justice rules (after pruning):", length(rules_LJ), "\n")

## Law and Justice rules (after pruning): 1625

Civic Platform Rules

rules_CP <- apriori(trans,
                    parameter = list(supp = MIN_SUPPORT, 
                                     conf = MIN_CONFIDENCE, 
                                     minlen = MIN_LEN, 
                                     maxlen = MAX_LEN),
                    appearance = list(default = "lhs", rhs = "Vote_CivicPlatform"),
                    control = list(verbose = FALSE))

rules_CP <- rules_CP[!is.redundant(rules_CP)]
cat("Civic Platform rules (after pruning):", length(rules_CP), "\n")

## Civic Platform rules (after pruning): 1535

Comparing Electorates

Here are all the rules found for both our electorates.

# Extract rules to dataframes
extract_rules <- function(rules, party_name) {
  if (length(rules) == 0) return(data.frame())
  df <- as(rules, "data.frame")
  df$party <- party_name
  df$lhs_text <- sapply(LIST(lhs(rules)), function(x) paste(sort(x), collapse = " + "))
  df$lhs_count <- sapply(LIST(lhs(rules)), length)
  df[, c("lhs_text", "lhs_count", "support", "confidence", "lift", "count", "party")]}

df_LJ <- extract_rules(rules_LJ, "LawAndJustice")
df_CP <- extract_rules(rules_CP, "CivicPlatform")

# Merge
comparison <- merge(df_LJ, df_CP, by = "lhs_text", suffixes = c("_LJ", "_CP"))

# Calculate lift difference
comparison <- comparison %>%
  mutate(
    lift_diff = lift_LJ - lift_CP,
    abs_lift_diff = abs(lift_diff)
  ) %>%
  arrange(desc(abs_lift_diff))

cat("Rules existing for both parties:", nrow(comparison), "\n")

## Rules existing for both parties: 312

Results

Truly Discriminating Rules

These are rules that exist for both parties with opposite tendencies (lift > 1 for one party and lift < 1 for the other).

opposite_rules <- comparison %>%
  filter(
    (lift_LJ > 1 & lift_CP < 1) | (lift_LJ < 1 & lift_CP > 1)
  ) %>%
  arrange(desc(abs_lift_diff)) %>%
  select(lhs_text, lift_LJ, lift_CP, lift_diff, confidence_LJ, confidence_CP)

cat("Total truly discriminating rules:", nrow(opposite_rules), "\n")

## Total truly discriminating rules: 312

Rules Favouring x party

Lift tells how much more likely this attitude combination is to predict voting for a party compared to random chance.

Confidence tells what percentage of people with this attitude combination actually vote for the party.

_imp = important
_NotImp = not important
C_ = conflits
S_ = success

Rules Favoring Law and Justice

pis_rules <- opposite_rules %>%
  filter(lift_diff > 0) %>%
  head(10)

if(nrow(pis_rules) > 0) {
  kable(pis_rules,
        col.names = c("Attitude(s)", "Lift (PiS)", "Lift (PO)", "Difference", "Conf (PiS)", "Conf (PO)"),
        digits = 3, align = "lccccc") %>%
    kable_styling(bootstrap_options = "striped", full_width = TRUE) %>%
    column_spec(1, width = "40%")
} else {
  cat("No rules found favoring Law and Justice.")}

Attitude(s)	Lift (PiS)	Lift (PO)	Difference	Conf (PiS)	Conf (PO)
S_HardWork_Imp + S_RichFamily_Imp	1.120	0.835	0.285	0.649	0.351
S_Luck_Imp + S_PoliticalInfluence_Imp	1.119	0.836	0.283	0.649	0.351
C_ManualNonManual_Strong_C + C_ReligiousNonReligious_Weak_C	1.117	0.838	0.279	0.648	0.352
C_ManualNonManual_Strong_C + C_YoungerOlder_Strong_C	1.113	0.844	0.270	0.646	0.354
S_Education_Imp + S_RichFamily_Imp	1.111	0.847	0.263	0.644	0.356
C_ManagersEmployees_Weak_C + C_ReligiousNonReligious_Weak_C	1.106	0.854	0.252	0.641	0.359
C_ReligiousNonReligious_Weak_C + S_Talent_Imp	1.105	0.854	0.251	0.641	0.359
C_YoungerOlder_Strong_C + S_RightPeople_Imp	1.103	0.858	0.244	0.639	0.361
C_ManualNonManual_Strong_C + S_Talent_Imp	1.102	0.859	0.243	0.639	0.361
C_YoungerOlder_Weak_C + S_RichFamily_Imp	1.102	0.860	0.242	0.639	0.361

Rules Favoring Civic Platform

po_rules <- opposite_rules %>%
  filter(lift_diff < 0) %>%
  head(10)

if(nrow(po_rules) > 0) {
  kable(po_rules,
        col.names = c("Attitude(s)", "Lift (PiS)", "Lift (PO)", "Difference", "Conf (PiS)", "Conf (PO)"),
        digits = 3, align = "lccccc") %>%
    kable_styling(bootstrap_options = "striped", full_width = TRUE) %>%
    column_spec(1, width = "40%")
} else {
  cat("No rules found favoring Civic Platform.")}

Attitude(s)	Lift (PiS)	Lift (PO)	Difference	Conf (PiS)	Conf (PO)
C_PowerMasses_Strong_C + S_RichFamily_NotImp	0.761	1.330	-0.569	0.441	0.559
C_ReligiousNonReligious_Strong_C + S_RichFamily_NotImp	0.784	1.299	-0.515	0.455	0.545
C_CapitalistsLabor_Weak_C + C_PowerMasses_Strong_C	0.792	1.287	-0.495	0.459	0.541
C_ManualNonManual_Weak_C + C_ReligiousNonReligious_Strong_C	0.807	1.267	-0.460	0.468	0.532
C_EducatedLessEducated_Strong_C + S_RichFamily_NotImp	0.811	1.260	-0.449	0.471	0.529
C_PowerMasses_Strong_C + S_PoliticalInfluence_NotImp	0.813	1.258	-0.445	0.472	0.528
C_CapitalistsLabor_Strong_C + S_RichFamily_NotImp	0.840	1.221	-0.381	0.487	0.513
C_PowerMasses_Strong_C + C_UrbanRural_Weak_C	0.840	1.221	-0.381	0.487	0.513
C_ManagersEmployees_Strong_C + S_RichFamily_NotImp	0.840	1.220	-0.380	0.487	0.513
C_ManualNonManual_Weak_C + C_RichPoor_Strong_C	0.846	1.213	-0.366	0.491	0.509

Plots

Here we have a visualisation of the previous dataframe.

Law and Justice

if(nrow(pis_rules) > 0) {
  pis_viz <- pis_rules %>%
    mutate(
      lhs_short = gsub("_", " ", lhs_text),
      lhs_short = substr(lhs_short, 1, 55)
    )
  
  ggplot(pis_viz, aes(x = reorder(lhs_short, lift_diff), y = lift_diff)) +
    geom_col(fill = "#1e3a5f") +
    coord_flip() +
    labs(
      title = "Rules Favoring Law and Justice",
      subtitle = "Attitude combinations that increase PiS vote likelihood",
      x = NULL, 
      y = "Lift Difference"
    ) +
    theme_minimal(base_size = 12) +
    theme(
      axis.text.y = element_text(size = 9),
      plot.title = element_text(face = "bold"))}

Civic Platform

if(nrow(po_rules) > 0) {
  po_viz <- po_rules %>%
    mutate(
      lhs_short = gsub("_", " ", lhs_text),
      lhs_short = substr(lhs_short, 1, 55)
    )
  
  ggplot(po_viz, aes(x = reorder(lhs_short, -lift_diff), y = lift_diff)) +
    geom_col(fill = "#f97316") +
    coord_flip() +
    labs(
      title = "Rules Favoring Civic Platform",
      subtitle = "Attitude combinations that increase PO vote likelihood",
      x = NULL, 
      y = "Lift Difference"
    ) +
    theme_minimal(base_size = 12) +
    theme(
      axis.text.y = element_text(size = 9),
      plot.title = element_text(face = "bold"))}

Conclusion

Our methodology brought us some very interesting results. Some rules that we get corroborate what we know about Polish politics. Many of those rules can serve as a base for further research because obviously, this paper has to be treated as an exploratory analysis. We get displayed only rules that had truly an opposed effect of chances of voting for both electorate, so consider that these rules mark a true opposition and not simply a difference in opinion. Other variables could be added and more individuals as well such that we get stronger rules. Naturally we didn’t include demographic variables as well, which could be the ones responsible for given values, and we know that older generations for example tend to vote more for Law and Justice.

Contrast Set Mining: Identifying Attitudinal Differences Between Law And Justice And Civic Platform Electorates Through Association Rule Learning

Luka de Caters