Everyone who’s living in Poland has already noticed how politically divided our society is. Since the beginning of the 2000s, Poles clustered themselves mainly between right-wing conservatives Law and Justice electors and liberals who vote for the Civic Platform. This divide still exists nowadays and has even intensified as years have passed.
In this study, by using opinion poll based data, we try to identify interesting discriminating factors of those two electorates. By using contrast set mining, we try to find conjunctions of attributes and values that differ meaningfully in their distributions across groups.
Association rule mining is commonly used in market basket analysis. We tried to adapt it to a socio-political framework analysis. Here, each voter is treated as a “transaction” containing their various opinions as “items.” The Apriori algorithm identifies rules of the form:
\[\{Attitude_1, Attitude_2, ...\} \Rightarrow Vote_{Party}\]
To compare electorates, we generate rules separately for each party and measure how differently each attitude pattern predicts voting behavior. This approach reveals not just what each group believes, but which belief combinations most strongly separate them.
library(arules)
library(arulesViz)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
The data comes from the Polish Panel Survey (POLPAN), a longitudinal study conducted by the Polish Academy of Sciences. We use the 2018 wave whose data was collected between April and June 2018, selecting variables measuring:
Sources of conflict: Whether respondents perceive various social divisions as strong or weak sources of conflict.
-the rich and the poor
-manual and non-manual workers
-managers and their employees
-residents of cities and rural areas
-religious and non-religious
-those in power and the masses
-capitalists and labor
-more and less educated people
-supporters of different political platforms (parties and groups)
-younger and older generations
Predictors of success: Whether respondents believe factors are important for success.
-Ambition
-Knowing the right people
-Hard work
-Political influence
-Coming from a rich family
-A good education
-Luck
-Innate ability and talent
Party voted: Our target variable (PiS/Law&Justice or PO/Civic Platform). I decided to put Modern party in the same category as PO as both electorate are really similar and both parties would united in 2019.
In this chunck, we load our dataset filter the voters between PiS and PO and then create the “transactions”.
df_arule <- read.csv("polpan_encoded_for_arules3 2.csv")
df_arule <- df_arule %>%
filter(Vote_LawAndJustice == 1 | Vote_CivicPlatform == 1) %>%
select(-Vote_Kukiz15, -Vote_Other, -Vote_PeasantsParty)
trans <- as(as.matrix(df_arule) == 1, "transactions")
cat("Total respondents:", length(trans), "\n")
## Total respondents: 738
cat("Total items:", ncol(trans), "\n")
## Total items: 38
The electoral split between our two parties of interest:
cat("Vote_LJ:", round(itemFrequency(trans)["Vote_LawAndJustice"] * 100, 1), "%\n")
## Vote_LJ: 58 %
cat("Vote_CP:", round(itemFrequency(trans)["Vote_CivicPlatform"] * 100, 1), "%")
## Vote_CP: 42 %
This gives us a sufficient representation of both electorates for meaningful comparison.
MIN_SUPPORT <- 0.05 # Rule must apply to at least 5% of respondents
MIN_CONFIDENCE <- 0.35 # At least 35% must vote for the party
MIN_LEN <- 3 # Minimum 3 attitudes combined (with vote)
MAX_LEN <- 5 # Maximum 5 attitudes combined (with vote)
Although association rule mining is supposed to be unsupervised, we still need to apply some parameters that will determine how our algorithms will work. The support threshold of 5% ensures each rule is based on at least 37 respondents.
We run the Apriori algorithm twice, once for each party, then remove redundant rules (those adding conditions without improving predictive power).
rules_LJ <- apriori(trans,
parameter = list(supp = MIN_SUPPORT,
conf = MIN_CONFIDENCE,
minlen = MIN_LEN,
maxlen = MAX_LEN),
appearance = list(default = "lhs", rhs = "Vote_LawAndJustice"),
control = list(verbose = FALSE))
rules_LJ <- rules_LJ[!is.redundant(rules_LJ)]
cat("Law and Justice rules (after pruning):", length(rules_LJ), "\n")
## Law and Justice rules (after pruning): 1625
rules_CP <- apriori(trans,
parameter = list(supp = MIN_SUPPORT,
conf = MIN_CONFIDENCE,
minlen = MIN_LEN,
maxlen = MAX_LEN),
appearance = list(default = "lhs", rhs = "Vote_CivicPlatform"),
control = list(verbose = FALSE))
rules_CP <- rules_CP[!is.redundant(rules_CP)]
cat("Civic Platform rules (after pruning):", length(rules_CP), "\n")
## Civic Platform rules (after pruning): 1535
Here are all the rules found for both our electorates.
# Extract rules to dataframes
extract_rules <- function(rules, party_name) {
if (length(rules) == 0) return(data.frame())
df <- as(rules, "data.frame")
df$party <- party_name
df$lhs_text <- sapply(LIST(lhs(rules)), function(x) paste(sort(x), collapse = " + "))
df$lhs_count <- sapply(LIST(lhs(rules)), length)
df[, c("lhs_text", "lhs_count", "support", "confidence", "lift", "count", "party")]}
df_LJ <- extract_rules(rules_LJ, "LawAndJustice")
df_CP <- extract_rules(rules_CP, "CivicPlatform")
# Merge
comparison <- merge(df_LJ, df_CP, by = "lhs_text", suffixes = c("_LJ", "_CP"))
# Calculate lift difference
comparison <- comparison %>%
mutate(
lift_diff = lift_LJ - lift_CP,
abs_lift_diff = abs(lift_diff)
) %>%
arrange(desc(abs_lift_diff))
cat("Rules existing for both parties:", nrow(comparison), "\n")
## Rules existing for both parties: 312
These are rules that exist for both parties with opposite tendencies (lift > 1 for one party and lift < 1 for the other).
opposite_rules <- comparison %>%
filter(
(lift_LJ > 1 & lift_CP < 1) | (lift_LJ < 1 & lift_CP > 1)
) %>%
arrange(desc(abs_lift_diff)) %>%
select(lhs_text, lift_LJ, lift_CP, lift_diff, confidence_LJ, confidence_CP)
cat("Total truly discriminating rules:", nrow(opposite_rules), "\n")
## Total truly discriminating rules: 312
Lift tells how much more likely this attitude combination is to predict voting for a party compared to random chance.
Confidence tells what percentage of people with this attitude combination actually vote for the party.
pis_rules <- opposite_rules %>%
filter(lift_diff > 0) %>%
head(10)
if(nrow(pis_rules) > 0) {
kable(pis_rules,
col.names = c("Attitude(s)", "Lift (PiS)", "Lift (PO)", "Difference", "Conf (PiS)", "Conf (PO)"),
digits = 3, align = "lccccc") %>%
kable_styling(bootstrap_options = "striped", full_width = TRUE) %>%
column_spec(1, width = "40%")
} else {
cat("No rules found favoring Law and Justice.")}
| Attitude(s) | Lift (PiS) | Lift (PO) | Difference | Conf (PiS) | Conf (PO) |
|---|---|---|---|---|---|
| S_HardWork_Imp + S_RichFamily_Imp | 1.120 | 0.835 | 0.285 | 0.649 | 0.351 |
| S_Luck_Imp + S_PoliticalInfluence_Imp | 1.119 | 0.836 | 0.283 | 0.649 | 0.351 |
| C_ManualNonManual_Strong_C + C_ReligiousNonReligious_Weak_C | 1.117 | 0.838 | 0.279 | 0.648 | 0.352 |
| C_ManualNonManual_Strong_C + C_YoungerOlder_Strong_C | 1.113 | 0.844 | 0.270 | 0.646 | 0.354 |
| S_Education_Imp + S_RichFamily_Imp | 1.111 | 0.847 | 0.263 | 0.644 | 0.356 |
| C_ManagersEmployees_Weak_C + C_ReligiousNonReligious_Weak_C | 1.106 | 0.854 | 0.252 | 0.641 | 0.359 |
| C_ReligiousNonReligious_Weak_C + S_Talent_Imp | 1.105 | 0.854 | 0.251 | 0.641 | 0.359 |
| C_YoungerOlder_Strong_C + S_RightPeople_Imp | 1.103 | 0.858 | 0.244 | 0.639 | 0.361 |
| C_ManualNonManual_Strong_C + S_Talent_Imp | 1.102 | 0.859 | 0.243 | 0.639 | 0.361 |
| C_YoungerOlder_Weak_C + S_RichFamily_Imp | 1.102 | 0.860 | 0.242 | 0.639 | 0.361 |
po_rules <- opposite_rules %>%
filter(lift_diff < 0) %>%
head(10)
if(nrow(po_rules) > 0) {
kable(po_rules,
col.names = c("Attitude(s)", "Lift (PiS)", "Lift (PO)", "Difference", "Conf (PiS)", "Conf (PO)"),
digits = 3, align = "lccccc") %>%
kable_styling(bootstrap_options = "striped", full_width = TRUE) %>%
column_spec(1, width = "40%")
} else {
cat("No rules found favoring Civic Platform.")}
| Attitude(s) | Lift (PiS) | Lift (PO) | Difference | Conf (PiS) | Conf (PO) |
|---|---|---|---|---|---|
| C_PowerMasses_Strong_C + S_RichFamily_NotImp | 0.761 | 1.330 | -0.569 | 0.441 | 0.559 |
| C_ReligiousNonReligious_Strong_C + S_RichFamily_NotImp | 0.784 | 1.299 | -0.515 | 0.455 | 0.545 |
| C_CapitalistsLabor_Weak_C + C_PowerMasses_Strong_C | 0.792 | 1.287 | -0.495 | 0.459 | 0.541 |
| C_ManualNonManual_Weak_C + C_ReligiousNonReligious_Strong_C | 0.807 | 1.267 | -0.460 | 0.468 | 0.532 |
| C_EducatedLessEducated_Strong_C + S_RichFamily_NotImp | 0.811 | 1.260 | -0.449 | 0.471 | 0.529 |
| C_PowerMasses_Strong_C + S_PoliticalInfluence_NotImp | 0.813 | 1.258 | -0.445 | 0.472 | 0.528 |
| C_CapitalistsLabor_Strong_C + S_RichFamily_NotImp | 0.840 | 1.221 | -0.381 | 0.487 | 0.513 |
| C_PowerMasses_Strong_C + C_UrbanRural_Weak_C | 0.840 | 1.221 | -0.381 | 0.487 | 0.513 |
| C_ManagersEmployees_Strong_C + S_RichFamily_NotImp | 0.840 | 1.220 | -0.380 | 0.487 | 0.513 |
| C_ManualNonManual_Weak_C + C_RichPoor_Strong_C | 0.846 | 1.213 | -0.366 | 0.491 | 0.509 |
Here we have a visualisation of the previous dataframe.
if(nrow(pis_rules) > 0) {
pis_viz <- pis_rules %>%
mutate(
lhs_short = gsub("_", " ", lhs_text),
lhs_short = substr(lhs_short, 1, 55)
)
ggplot(pis_viz, aes(x = reorder(lhs_short, lift_diff), y = lift_diff)) +
geom_col(fill = "#1e3a5f") +
coord_flip() +
labs(
title = "Rules Favoring Law and Justice",
subtitle = "Attitude combinations that increase PiS vote likelihood",
x = NULL,
y = "Lift Difference"
) +
theme_minimal(base_size = 12) +
theme(
axis.text.y = element_text(size = 9),
plot.title = element_text(face = "bold"))}
if(nrow(po_rules) > 0) {
po_viz <- po_rules %>%
mutate(
lhs_short = gsub("_", " ", lhs_text),
lhs_short = substr(lhs_short, 1, 55)
)
ggplot(po_viz, aes(x = reorder(lhs_short, -lift_diff), y = lift_diff)) +
geom_col(fill = "#f97316") +
coord_flip() +
labs(
title = "Rules Favoring Civic Platform",
subtitle = "Attitude combinations that increase PO vote likelihood",
x = NULL,
y = "Lift Difference"
) +
theme_minimal(base_size = 12) +
theme(
axis.text.y = element_text(size = 9),
plot.title = element_text(face = "bold"))}
Our methodology brought us some very interesting results. Some rules that we get corroborate what we know about Polish politics. Many of those rules can serve as a base for further research because obviously, this paper has to be treated as an exploratory analysis. We get displayed only rules that had truly an opposed effect of chances of voting for both electorate, so consider that these rules mark a true opposition and not simply a difference in opinion. Other variables could be added and more individuals as well such that we get stronger rules. Naturally we didn’t include demographic variables as well, which could be the ones responsible for given values, and we know that older generations for example tend to vote more for Law and Justice.