Team members and contribution

Project 1

Miya Vakrusheva — Checking if variable types are identified correctly and R data types are matching them, Providing a histogram, Providing a barplot, Interpretation

Artyom Shobyrev — Providing a table with descriptive statistics (mean, median, etc.), Providing a scatterplot, Interpretation, Knitting

Ivan Piatykh — Providing a boxplot, Interpretation

Svetlana Mokrozub — Providing a stacked boxplot, Interpretation

Dana Levkovskaya — Research question

All of us — Theoretical framework

Project 2

Dana Levkovskaya — Pearson’s chi-square test

Svetlana Mokrozub — ANOVA, knitting html file

Ivan Piatykh — T-tests

Artyom Shobyrev — ANOVA

Miya Vakrusheva — T-tests

All of us — theoretical framework

Project 3

Dana Levkovskaya — Theoretical framework, Search of literature, Project discussion

Svetlana Mokrozub — Correlations interpretations, Boxplots, Choice of variables, Draft of correlations and regression models

Ivan Piatykh — Theoretical framework, Search of literature

Artyom Shobyrev — Correlations, Regressions and interpretations, Project outlines discussion, Knitting

Miya Vakrusheva — Descriptive statistics, Project outlines discussion

Project 4 (final)

Dana Levkovskaya — project 4 theory, variables in project 4

Svetlana Mokrozub — project 4 theory, project 1 correction

Ivan Piatykh — correction of the Project 3, project 4 theory

Artyom Shobyrev — project 4 regression models, interpretation, knitting

Miya Vakrusheva — project 4 theory and hypothesis, project 2 correction

library(foreign)
library(ggplot2)
library(dplyr)
library(DescTools)
library(formattable)
library(knitr)

setwd("F:/ARRR")
ESS <- read.spss("ESS11.sav", use.value.labels = T, to.data.frame = T)

serbia <- ESS %>%
  filter(cntry == "Serbia") %>% 
  select(agea, sgnptit, pbldmna, polintr, lrscale, hinctnta, edlvdrs, gndr)

Project 1

Introduction

Our project covers political participation in Serbia. We examine several factors and their potential influence on an individual’s interest and activism in politics.

Existing studies on this topic gave us solid ground for our own research. Among the more recent works, the study by Stanojević, Vukelić & Tomašević (2023) particularly stands out. The researchers examined the factors influencing young Serbians’ political activism, in both conventional and unconventional forms, with special emphasis on human values and institutional trust. In addition to data analysis, Stanojević et al. provide the necessary economic and social context of Serbia. This overview of the local context was especially helpful, as we were not familiar with the country’s specifics.

Another study, by Petrović & Stanojević (2020), explores the relationship between age and various forms of political participation.

The analysis revealed that traditional forms of political activism (e.g. membership of political parties, making direct contact with politicians) are more popular among older people, whereas newer ones (such as signing petitions) are mostly practised by younger people. The ESS dataset contains several forms of political participation, so, following Petrović and Stanojević, we include them in our research as well.

Our research question builds on the findings described above; however, we include several variables that were not examined in the existing studies.

Thus, the research question we have formulated for this small study is, in its broad form: how are individuals’ characteristics related to their political participation? Specifically, we look at how gender, level of education, income, and age are connected with political participation. We understand political participation broadly: both as active involvement in politics, such as signing petitions and taking part in demonstrations, and as the presence of a strong political opinion and self-placement on the political scale.

The hypotheses we have drawn from the readings are the following:

Those who engage in unconventional political participation practices like signing petitions and taking part in demonstrations are young people.
Those with a higher level of education will have a high interest in politics.

The last graphs we have included cover political participation in the broader sense mentioned earlier. We explore how age and income are connected with one’s placement on the political scale. We have not found specific literature on this topic and have not formulated hypotheses about it, but we still decided to explore it freely.

Types of variables

In this project we have chosen and will further explore the variables which concern the following factors: age, petition signing, taking part in public demonstrations, interest in politics, placement on political scale, household income, and level of education.

Let us check the types of the variables. R stores all of them as factors, so we recode age, income, and left-right placement as numeric.

str(serbia)

## 'data.frame':    1563 obs. of  8 variables:
##  $ agea    : Factor w/ 76 levels "15","16","17",..: 33 48 32 7 46 36 39 5 21 50 ...
##  $ sgnptit : Factor w/ 2 levels "Yes","No": 1 1 2 2 2 2 1 2 2 2 ...
##  $ pbldmna : Factor w/ 2 levels "Yes","No": 2 2 2 NA 2 2 1 2 2 2 ...
##  $ polintr : Factor w/ 4 levels "Very interested",..: 2 3 3 3 4 4 2 3 4 4 ...
##  $ lrscale : Factor w/ 11 levels "Left","1","2",..: NA 5 6 NA NA 6 7 6 NA 6 ...
##  $ hinctnta: Factor w/ 10 levels "J - 1st decile",..: NA 6 NA NA NA NA 10 2 NA NA ...
##  $ edlvdrs : Factor w/ 19 levels "Nikada nije išao/la u škola, nedovršena osnovna škola, manje od  4 razreda",..: 7 7 12 9 7 7 12 7 9 16 ...
##  $ gndr    : Factor w/ 2 levels "Male","Female": 1 1 2 1 2 2 1 2 2 1 ...
##  - attr(*, "variable.labels")= Named chr [1:640] "Title of dataset" "ESS round" "Edition" "Production date" ...
##   ..- attr(*, "names")= chr [1:640] "name" "essround" "edition" "proddate" ...
##  - attr(*, "codepage")= int 65001

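# agea levels are the ages themselves, so convert via character to keep the values
serbia$agea <- as.numeric(as.character(serbia$agea))
# as.numeric() on a factor returns level codes: hinctnta becomes 1-10, lrscale becomes 1-11
serbia$hinctnta <- as.numeric(serbia$hinctnta)
serbia$lrscale <- as.numeric(serbia$lrscale)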

str(serbia)

## 'data.frame':    1563 obs. of  8 variables:
##  $ agea    : num  47 62 46 21 60 50 53 19 35 64 ...
##  $ sgnptit : Factor w/ 2 levels "Yes","No": 1 1 2 2 2 2 1 2 2 2 ...
##  $ pbldmna : Factor w/ 2 levels "Yes","No": 2 2 2 NA 2 2 1 2 2 2 ...
##  $ polintr : Factor w/ 4 levels "Very interested",..: 2 3 3 3 4 4 2 3 4 4 ...
##  $ lrscale : num  NA 5 6 NA NA 6 7 6 NA 6 ...
##  $ hinctnta: num  NA 6 NA NA NA NA 10 2 NA NA ...
##  $ edlvdrs : Factor w/ 19 levels "Nikada nije išao/la u škola, nedovršena osnovna škola, manje od  4 razreda",..: 7 7 12 9 7 7 12 7 9 16 ...
##  $ gndr    : Factor w/ 2 levels "Male","Female": 1 1 2 1 2 2 1 2 2 1 ...
##  - attr(*, "variable.labels")= Named chr [1:640] "Title of dataset" "ESS round" "Edition" "Production date" ...
##   ..- attr(*, "names")= chr [1:640] "name" "essround" "edition" "proddate" ...
##  - attr(*, "codepage")= int 65001

serbia <- na.omit(serbia)

agea - Age — ratio (R stores as numeric)
sgnptit - Signed petition last 12 months — categorical nominal (yes / no) (R stores as factor)
pbldmna - Taken part in public demonstration last 12 months — categorical nominal (yes / no) (R stores as factor)
polintr - How interested in politics — categorical ordinal (4-point scale, 1 - very interested, 4 - not at all interested) (R stores as factor)
lrscale - Placement on left right scale — quasi-interval (11-point scale, 0 - left, 10 - right in the ESS; after the conversion above R stores it as numeric level codes 1-11, 1 - left, 11 - right)
hinctnta - Household’s total net income, all sources — quasi-interval (10-scale, 1 - 1-st decile, 10 - 10-th decile) (R now stores as numeric)
edlvdrs - Highest level of education, Serbia — categorical ordinal (18 levels, 1 - lower, 18 - higher) (R stores as factor)
gndr - Gender of the respondent — categorical nominal (Male/Female) (R stores as factor)
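
One detail of the recoding above is worth spelling out: `as.numeric()` applied directly to a factor returns the internal level codes, not the labels. The sketch below uses toy factors (the values are invented for illustration, not taken from ESS) to show the difference, and one possible way to recover a 0-10 scale, assuming the levels are ordered "Left", "1", ..., "9", "Right" as the `str()` output suggests.

```r
# Toy data only: these values are illustrative, not ESS records
age_f <- factor(c("15", "47", "62"))
lr_f  <- factor(c("Left", "5", "9", "Right", NA),
                levels = c("Left", as.character(1:9), "Right"))

# as.numeric() on a factor returns level codes (positions in levels())
age_codes <- as.numeric(age_f)                # level codes, not the ages
age_num   <- as.numeric(as.character(age_f))  # the ages themselves

# For a factor shaped like lrscale the codes run 1-11;
# subtracting 1 maps them back to the 0-10 scale (assumed level order)
lr_codes <- as.numeric(lr_f)
lr_0_10  <- lr_codes - 1

print(data.frame(label = as.character(lr_f),
                 code = lr_codes,
                 scale_0_10 = lr_0_10))
```

The statistics reported in this project were computed on the level codes, so they are left as they are here.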

Descriptive statistics

Petitions

For petition signing and the other categorical variables, the mode is the only measure of central tendency we report.

mode_pet <- Mode(serbia$sgnptit)

stats_pet <- data.frame(
  Statistic = c("Mode"),
  Value = c(mode_pet)
)

kable(stats_pet,
  caption = "Descriptive Statistics for Petition Signing in Serbia",
  digits = 2,
  align = "lc"
)

Descriptive Statistics for Petition Signing in Serbia
Statistic	Value
Mode	No

The most common answer is No: most Serbian respondents did not sign a petition during the 12 months prior to the survey.

ggplot(serbia) +
  geom_bar(aes(x = sgnptit), na.rm = TRUE) +
  scale_x_discrete(limits = c('Yes', 'No')) +
  labs(x = "Signed petition last 12 months", y = NULL, 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

As the bar plot shows, almost 3/4 of respondents in Serbia did not sign a petition in the last 12 months, while approximately 1/4 did.
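
The shares read off the bar plot can also be computed directly. This is a minimal sketch on toy responses (invented for illustration, so the numbers are not the ESS results); it uses only base R `table()` and `prop.table()`.

```r
# Toy responses (invented for illustration; not the ESS data)
signed <- factor(c("No", "No", "Yes", "No", "No", "Yes", "No", "No"),
                 levels = c("Yes", "No"))

counts <- table(signed)        # absolute frequencies per category
shares <- prop.table(counts)   # relative frequencies, summing to 1

# The mode is the label of the most frequent category
mode_label <- names(counts)[which.max(counts)]

print(round(100 * shares, 1))  # percentages per category
print(mode_label)
```

Applied to `serbia$sgnptit`, the same two calls would give the exact percentages behind the "3/4 versus 1/4" reading.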

Gender

mode_gnd <- Mode(serbia$gndr)

stats_gnd <- data.frame(
  Statistic = c("Mode"),
  Value = c(mode_gnd)
)

kable(stats_gnd,
  caption = "Descriptive Statistics for Gender in Serbia",
  digits = 2,
  align = "lc"
)

Descriptive Statistics for Gender in Serbia
Statistic	Value
Mode	Male

We see that men are the larger group among the respondents who took the survey in Serbia.

ggplot(serbia) +
  geom_bar(aes(x = gndr)) +
  labs(x = NULL, y = NULL, 
       title = "Gender of respondents",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

The difference between the number of males and females among the respondents is approximately 50.

Demonstrations

mode_dem <- Mode(serbia$pbldmna)

stats_dem <- data.frame(
  Statistic = c("Mode"),
  Value = c(mode_dem)
)

kable(stats_dem,
  caption = "Descriptive Statistics for Demonstrations in Serbia",
  digits = 2,
  align = "lc"
)

Descriptive Statistics for Demonstrations in Serbia
Statistic	Value
Mode	No

Most of the respondents in Serbia did not participate in demonstrations during the 12 months prior to the survey.

serbiana <- serbia[!is.na(serbia$pbldmna), ]
ggplot(serbiana) +
  geom_bar(aes(x = pbldmna), na.rm = TRUE) +  # na.rm belongs to geom_bar(), not aes()
  labs(x = NULL, y = NULL, 
       title = "Have you participated in demonstrations?",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

The difference is quite large, almost 550 people. It is larger than the one in the petitions graph, which may suggest that these two forms of political participation are perceived and performed differently.

Interest in politics

mode_int <- Mode(serbia$polintr)

stats_int <- data.frame(
  Statistic = c("Mode"),
  Value = c(mode_int)
)

kable(stats_int,
  caption = "Descriptive Statistics for Interest in Politics in Serbia",
  digits = 2,
  align = "lc"
)

Descriptive Statistics for Interest in Politics in Serbia
Statistic	Value
Mode	Hardly interested

Most of the respondents are hardly interested in politics.

serbiana <- serbia[!is.na(serbia$polintr), ]
ggplot(serbiana) +
  geom_bar(aes(x = polintr), na.rm = TRUE) +  # na.rm belongs to geom_bar(), not aes()
  labs(x = NULL, y = NULL, 
       title = "Interest in politics",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

As the bar plot shows, most people in Serbia are hardly interested in politics or not interested at all. Fewer are quite interested, and fewer still are very interested.

Age

mean_age <- mean(serbia$agea)
median_age <- median(serbia$agea)
mode_age <- Mode(serbia$agea)
variance_age <- var(serbia$agea)
std_dev_age <- sd(serbia$agea)

stats_age <- data.frame(
  Statistic = c("Mean", "Median", "Mode", "Variance", "Standard Deviation"),
  Value = c(mean_age, median_age, mode_age, variance_age, std_dev_age)
)

kable(stats_age,
  caption = "Descriptive Statistics for Age in Serbia",
  digits = 2,
  align = "lc"
)

Descriptive Statistics for Age in Serbia
Statistic	Value
Mean	53.16
Median	54.50
Mode	71.00
Variance	315.47
Standard Deviation	17.76

The mean and median ages of the Serbian respondents are 53.2 and 54.5 respectively, very close to one another. The mode is far from them: the most frequent age is 71. The variance is 315 and the standard deviation is 18, which reflects a large amount of variation in the respondents’ ages: the data are not concentrated around the mean.
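
The same five statistics are computed for age, left-right placement, and income, so the repeated code could be wrapped in a small function. The sketch below is one possible way to do it; `describe_numeric` is a hypothetical helper name of our own, the mode is computed with base R rather than `DescTools::Mode`, and the input is toy data.

```r
# Hypothetical helper (name and layout are our own choice for this sketch)
describe_numeric <- function(x) {
  x <- x[!is.na(x)]
  freq <- table(x)
  # Most frequent value; ties are resolved by taking the smallest value
  mode_value <- as.numeric(names(freq)[which.max(freq)])
  data.frame(
    Statistic = c("Mean", "Median", "Mode", "Variance", "Standard Deviation"),
    Value = c(mean(x), median(x), mode_value, var(x), sd(x))
  )
}

# Toy ages, invented for illustration
toy_age <- c(20, 35, 35, 50, 60, NA)
stats_toy <- describe_numeric(toy_age)
print(stats_toy)
```

The returned data frame has the same two columns as `stats_age`, so it could be passed to `kable()` in the same way.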

ggplot(serbia, aes(x = agea)) +
  geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
  geom_vline(aes(xintercept = mean(serbia$agea, na.rm = TRUE), color = "Mean"), lwd = 1) +
  geom_vline(aes(xintercept = median(serbia$agea, na.rm = TRUE), color = "Median"), lwd = 1) +
  geom_vline(aes(xintercept = Mode(serbia$agea), color = "Mode"), lwd = 1) +
  scale_color_manual(name = "Statistics", values = c("Mean" = "blue", "Median" = "red", "Mode" = "purple")) +
  theme_bw() +
  labs(x = "Age", y = "Frequency", 
       title = "Age distribution in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  xlim(0, 100)

The age distribution has almost the same mean (blue line, 53.2 years) and median (red line, 54.5 years), while the most frequent age (purple line) is 71 years. The high frequencies at older ages on the right are balanced by an elongated left tail, which is why the mean and median sit close to each other near the center while the mode lies far to their right.

The distribution is not bell-shaped. There are certain “gaps” in the distribution of ages; as one of our teammates suggested, they could be explained by the war in Yugoslavia.
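
A visual reading of the tails can be cross-checked with a skewness coefficient. The following is a minimal sketch, assuming the simple moment-based definition (third central moment divided by the 1.5 power of the second); `skewness_moment` is a hypothetical name and the vectors are toy data, not ESS ages.

```r
# Moment-based sample skewness: negative values indicate a longer left tail,
# positive values a longer right tail, and 0 a symmetric sample
skewness_moment <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / (mean((x - m)^2))^(3 / 2)
}

toy_left_tail <- c(1, 6, 7, 8, 8, 9, 9, 9)  # one low value stretches the left tail
toy_symmetric <- c(1, 2, 3, 4, 5)

skew_left <- skewness_moment(toy_left_tail)
skew_sym  <- skewness_moment(toy_symmetric)
print(c(left_tail = skew_left, symmetric = skew_sym))
```

Calling `skewness_moment(serbia$agea)` would give a single number to set against the histogram; this coefficient was not computed for the report itself.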

Left - Right Political Orientation

mean_lr <- mean(serbia$lrscale)
median_lr <- median(serbia$lrscale)
mode_lr <- Mode(serbia$lrscale)
variance_lr <- var(serbia$lrscale)
std_dev_lr <- sd(serbia$lrscale)

stats_lr <- data.frame(
  Statistic = c("Mean", "Median", "Mode", "Variance", "Standard Deviation"),
  Value = c(mean_lr, median_lr, mode_lr, variance_lr, std_dev_lr)
)

kable(stats_lr,
  caption = "Descriptive Statistics for Left-Right Scale in Serbia",
  digits = 2,
  align = "lc"
)

Descriptive Statistics for Left-Right Scale in Serbia
Statistic	Value
Mean	5.63
Median	6.00
Mode	6.00
Variance	6.85
Standard Deviation	2.62

The mean position on the left-right scale is 5.6. Because lrscale is stored as level codes from 1 (left) to 11 (right), the midpoint is 6, so the mean is close to the center and slightly to its left, while the median and mode (both 6) sit exactly at the center. The variance is 6.9 and the standard deviation is 2.6, almost half of the median, which shows that the data are scattered quite far from the mean.

ggplot(serbia, aes(x = lrscale)) +
  geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
  geom_vline(aes(xintercept = mean(serbia$lrscale, na.rm = TRUE), color = "Mean"), lwd = 1) +
  geom_vline(aes(xintercept = median(serbia$lrscale, na.rm = TRUE), color = "Median"), lwd = 1) +
  geom_vline(aes(xintercept = Mode(serbia$lrscale), color = "Mode"), lwd = 1) +
  scale_color_manual(name = "Statistics", values = c("Mean" = "blue", "Median" = "red", "Mode" = "purple")) +
  theme_bw() +
  labs(x = "Left-right scale", y = "Frequency", 
       title = "Positions on right-left scale in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  xlim(0, 10)

We see, again, that most people identify themselves as centrists. However, there are also hard leftists, visible at the left end of the distribution. This may be important for our further analysis, although such a high number of centrists suggests that Serbians have no pronounced tendency in political identification.

Net-Income

mean_n <- mean(serbia$hinctnta)
median_n <- median(serbia$hinctnta)
mode_n <- Mode(serbia$hinctnta)
variance_n <- var(serbia$hinctnta)
std_dev_n <- sd(serbia$hinctnta)

stats_n <- data.frame(
  Statistic = c("Mean", "Median", "Mode", "Variance", "Standard Deviation"),
  Value = c(mean_n, median_n, mode_n, variance_n, std_dev_n)
)

kable(stats_n,
  caption = "Descriptive Statistics for Net-Income in Serbia",
  digits = 2,
  align = "lc"
)

Descriptive Statistics for Net-Income in Serbia
Statistic	Value
Mean	5.96
Median	6.00
Mode	5.00
Variance	6.66
Standard Deviation	2.58

Now we can look at net income. The structure of the data is quite close to that of the left-right scale. The mean, median, and mode are close, ranging from 5 to 6. The variance is 6.7 and the standard deviation is 2.6, almost half of the mean, so the income data deviate quite far from the mean.

ggplot(serbia, aes(x = hinctnta)) +
  geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
  geom_vline(aes(xintercept = mean(serbia$hinctnta, na.rm = TRUE), color = "Mean"), lwd = 1) +
  geom_vline(aes(xintercept = median(serbia$hinctnta, na.rm = TRUE), color = "Median"), lwd = 1) +
  geom_vline(aes(xintercept = Mode(serbia$hinctnta), color = "Mode"), lwd = 1) +
  scale_color_manual(name = "Statistics", values = c("Mean" = "blue", "Median" = "red", "Mode" = "purple")) +
  theme_bw() +
  labs(x = "Household income decile", y = "Frequency", 
       title = "Net-income in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  xlim(0, 10)

As the plot of net income shows, most Serbians are located in the 7th and 6th deciles, which is moderate but closer to a low level of income. These are followed in frequency by the 5th and 4th deciles, which are moderate but closer to a high level of income. There are also sizeable numbers of respondents in the 8th, 9th and 10th deciles, which constitute the poor and the poorest population of Serbia.

Boxplot for age and petitions

serbiana <- serbia[!is.na(serbia$sgnptit), ]
ggplot(serbiana) +
  geom_boxplot(aes(x = as.factor(serbiana$sgnptit), y = as.numeric(serbiana$agea)), na.rm = T) +
  labs(x = NULL, y = 'Age', 
       title = "Have you signed a petition in the last 12 months?",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

An apparent observation is that people who sign petitions are mostly younger. A direct comparison reveals a 10-year gap between the median ages of ‘signers’ and ‘non-signers’. Moreover, the right box’s median is close to the upper border of the left box’s interquartile range, signalling a meaningful difference between the two groups. As for the boxes themselves, the left one is positively skewed, whereas the right one is almost symmetric. Although the skew is not large, we can still observe the tendency towards younger age among the ‘signers’.
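
The medians and interquartile ranges discussed above can be read from the boxplot, but they can also be tabulated. This is a minimal sketch on toy data (ages invented for illustration); it relies only on base R `split()`, `sapply()` and `quantile()`.

```r
# Toy data, invented for illustration (not ESS records)
toy <- data.frame(
  signed = factor(c("Yes", "Yes", "Yes", "No", "No", "No", "No"),
                  levels = c("Yes", "No")),
  age = c(25, 40, 55, 45, 55, 65, 75)
)

# Quartiles of age within each group:
# rows are groups, columns are Q1, median and Q3
quartiles <- t(sapply(split(toy$age, toy$signed),
                      quantile, probs = c(0.25, 0.5, 0.75)))
print(quartiles)

# Gap between the group medians ("No" minus "Yes")
median_gap <- quartiles["No", "50%"] - quartiles["Yes", "50%"]
print(median_gap)
```

With `serbiana$agea` and `serbiana$sgnptit` in place of the toy columns, the same calls would produce the figures behind the boxes.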

Boxplot for age and demonstrations

serbiana <- serbia[!is.na(serbia$pbldmna), ]
ggplot(serbiana) +
  geom_boxplot(aes(x = as.factor(serbiana$pbldmna), y = as.numeric(serbiana$agea)), na.rm = T) +
  labs(x = NULL, y = 'Age', 
       title = "Have you participated in demonstrations in the last 12 months?",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  theme(plot.title = element_text(size = 15))

This graph generally follows the same pattern as the previous one. The same comments apply, with the addition that the tendency towards younger age is even stronger among demonstration attendees. Unlike with petitions, the median age for the left box is lower than 30. Reflecting on the two graphs, we conclude that the tendency for younger people to participate in new forms of political participation, described in our first hypothesis, is present in our data.

Stacked bar plot

ESS_copy <- ESS %>% 
  filter(cntry == "Serbia")

ESS_copy$edlvdrs <- as.factor(ESS_copy$edlvdrs)

levels(ESS_copy$edlvdrs) = c("Primary", "Primary", "Primary", "Secondary", "Secondary", "Secondary", "Secondary", "Secondary", "Secondary", "Secondary", "Higher", "Higher", "Higher", "Higher", "Higher", "Higher", "Higher", "Higher", "Other", "Other", "Other")

ESS_copy$icgndra <- as.factor(ESS_copy$icgndra)
levels(ESS_copy$icgndra) = c("Male", "Female", "Other")

#table(ESS_copy$polintr)
ESS_copy$polintr <- dplyr::recode(ESS_copy$polintr,
                          "Not at all interested" = 1,
                          "Hardly interested" = 2,
                          "Quite interested" = 3,
                          "Very interested" = 4)

ESS_copy %>% 
  select(cntry, edlvdrs, polintr) %>% 
  filter(!(edlvdrs=="Other"), !(polintr=="7"), !(polintr=="8"), !(polintr=="9")) %>% 
  ggplot(aes(fill=as.factor(edlvdrs), x=polintr), na.rm = T)+
  geom_bar()+ 
  theme_bw() +
  labs(x = "Political interest", y = "frequency", 
       title = "Political interest and level of education in Serbia", 
       caption = "Source: ESS11") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))+
  guides(fill=guide_legend(title="Level of education"))

ESS_copy %>% 
  filter(!(edlvdrs == "Other"), !(polintr %in% c(7,8,9))) %>% 
  ggplot(aes(x = as.factor(polintr), fill = edlvdrs)) +
  geom_bar(position = "fill", na.rm = TRUE) +  # position = fill scales bars to proportions
  scale_y_continuous(labels = scales::percent_format()) +  # y-axis in percent
  theme_bw() +
  labs(x = "Political interest", y = "Proportion", 
       title = "Political interest and level of education in Serbia", 
       caption = "Source: ESS11") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  guides(fill = guide_legend(title = "Level of education"))

ESS_copy %>% 
  filter(!(edlvdrs == "Other"), !(polintr %in% c(7,8,9))) %>% 
  ggplot(aes(x = edlvdrs, fill = as.factor(polintr))) +
  geom_bar(position = "fill", na.rm = TRUE) +
  scale_y_continuous(labels = scales::percent_format()) +
  theme_bw() +
  labs(x = "Level of education", y = "Proportion", 
       title = "Political interest and level of education in Serbia", 
       caption = "Source: ESS11") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  guides(fill = guide_legend(title = "Political interest"))

Firstly, we used the original levels of the variable of the highest level of a person’s education (edlvdrs), but then grouped the 18 levels into three simpler ones, based on the structure of Serbian system of education (Scholaro, n.d.).
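
Grouping many factor levels into a few can also be written with a named list, which makes the mapping explicit instead of relying on the position of each label. This is a sketch under stated assumptions: the level names `lvl1` to `lvl5` are hypothetical placeholders, not the actual `edlvdrs` labels, and the assignment of levels to groups is illustrative only.

```r
# Toy factor with hypothetical level names (placeholders, not edlvdrs labels)
edu <- factor(c("lvl1", "lvl2", "lvl3", "lvl4", "lvl5", "lvl2"))

# A named list maps each new level to the old levels it absorbs
levels(edu) <- list(
  Primary   = c("lvl1"),
  Secondary = c("lvl2", "lvl3"),
  Higher    = c("lvl4", "lvl5")
)

print(table(edu))
```

Old levels that are not mentioned in the list become `NA`, which is one way to drop an "Other" category during the same step.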

From the first graph we can see that most of the respondents are not interested in politics: the most popular answer is “Hardly interested” (2 on the recoded scale), and “Not at all interested” (1) is also very common. Among those not interested (levels 1 and 2) there are more people with higher education. Among those interested in politics the distribution of levels of education is quite even, with Primary slightly less represented.

We also added two plots showing (1) the proportion of education levels within each political interest group and (2) the proportion of political interest groups within each education group. Plot (1) shows that there are more low-educated people in the group with the lowest political interest, while the proportion of highly educated people is highest in the group most interested in politics. Plot (2) shows that the share of people actively interested in politics increases with each educational step, while the shares of the “Hardly interested” and “Not at all interested” groups decrease.
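
The two proportion plots correspond to the two ways of normalising a cross-table: by rows or by columns. The sketch below illustrates this on toy data (invented for illustration, with interest reduced to Yes/No for brevity) using base R `table()` and `prop.table()`.

```r
# Toy data, invented for illustration (not ESS records)
toy <- data.frame(
  education = factor(c("Primary", "Primary", "Secondary", "Secondary",
                       "Higher", "Higher", "Higher", "Primary"),
                     levels = c("Primary", "Secondary", "Higher")),
  interested = factor(c("No", "No", "No", "Yes", "Yes", "Yes", "No", "No"),
                      levels = c("No", "Yes"))
)

tab <- table(toy$education, toy$interested)

# margin = 1: shares of interest groups within each education level (rows sum to 1)
within_education <- prop.table(tab, margin = 1)
# margin = 2: shares of education levels within each interest group (columns sum to 1)
within_interest <- prop.table(tab, margin = 2)

print(round(within_education, 2))
print(round(within_interest, 2))
```

Which margin is used determines which of the two questions the table answers, so it is worth stating it next to each plot.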

ESS_copy %>% 
  select(cntry, icgndra, edlvdrs, polintr) %>% 
  filter(!(edlvdrs=="Other"), !(polintr=="7"), !(polintr=="8"), !(polintr=="9"), !(icgndra=="Other")) %>% 
  ggplot(aes(fill=as.factor(edlvdrs), x=polintr), na.rm = T)+
  geom_bar()+ 
  theme_bw() +
  labs(x = "Political interest", y = "frequency", 
       title = "Political interest and level of education in Serbia", 
       caption = "Source: ESS11") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))+
  guides(fill=guide_legend(title="Level of education"))+
  facet_wrap(~icgndra)

The second graph, which includes gender, shows that men are more interested in politics than women: the bars for the two interested groups (“Quite interested” and “Very interested”, levels 3 and 4 on the recoded scale) are noticeably higher for male respondents. The number of people indifferent to politics is almost the same for men and women, around 150 respondents.

Scatterplot for two continuous variables

The following three graphs analyse the relationship between political self-determination and three different variables: income, age and education. As Trošt and Marinšek (2022) write, the political orientation of Serbs on the right-left scale depends more on age and education, while income has almost no effect.

Although the preliminary analysis of the political scale variable did not reveal any particular tendencies in our data, we decided to include this graph to at least look at the radical leftists and see whether they share a particular income status, level of education, or age. The same applies to the slightly elevated right tail.

ggplot(serbia) +
  geom_jitter(aes(x = as.numeric(serbia$lrscale), y = as.numeric(serbia$hinctnta)), na.rm = T) +
  theme_bw() +
  geom_density_2d_filled(aes(x = as.numeric(serbia$lrscale), y = as.numeric(serbia$hinctnta), alpha = 0.6), na.rm = T, show.legend = F) +
  theme_bw() +
  labs(x = "Left - Right Scale", y = "Net-Income", 
       title = "Relationship between Political Views and Income", 
       caption = "Yellow - High density, Purple - Low density
       Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = NULL))

In this graph, the X-axis is the left-right scale (where 1 is far left and 11 is far right) and the Y-axis is the net-income scale (1 is the highest income, 10 is the lowest). As we can see, most of the answers are in the upper middle, which means that the majority of the low-income population in Serbia does not have strong political preferences. However, we can pay attention to the left side, where middle-income people have clearly extreme left-wing political preferences. We can also assume the presence of extreme right-wing views among lower-income people if we look at the right side of the graph, but this is a less concentrated group. In summary, we can say that people who define themselves as far left have mostly middle income, but also include people with lower, and higher incomes, while far right people have only middle and low income.

ggplot(serbia) +
  geom_jitter(aes(x = as.numeric(serbia$lrscale), y = as.numeric(serbia$edlvdrs)), na.rm = T) +
  theme_bw() +
  geom_density_2d_filled(aes(x = as.numeric(serbia$lrscale), y = as.numeric(serbia$edlvdrs), alpha = 0.7), na.rm = T, show.legend = F) +
  theme_bw() +
  labs(x = "Left - Right Scale", y = "Levels of Education", 
       title = "Relationship between Political Views and Levels of education", 
       caption = "Yellow - High density, Purple - Low density
       Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 13, vjust = 0.5), 
        panel.grid.minor = element_line(color = NULL))

This graph shows the relationship between educational attainment and political preferences. As can be seen, Serbian people do not generally define themselves as right or left. However, we can see two tails at the average level of education (10) that stretch towards the extreme right and the extreme left, which may suggest that people with average education are spread across the whole political scale. Higher education is more interesting because it has a left tail and no right tail. Thus we can say that people with higher education are either centrist or have left-wing views; right-wing views are rare among them.

ggplot(serbia) +
  geom_jitter(aes(x = as.numeric(serbia$lrscale), y = as.numeric(serbia$agea)), na.rm = T) +
  theme_bw() +
  geom_density_2d_filled(aes(x = as.numeric(serbia$lrscale), y = as.numeric(serbia$agea), alpha = 0.6), na.rm = T, show.legend = F) +
  theme_bw() +
  labs(x = "Left - Right Scale", y = "Age", 
       title = "Relationship between Political Views and Age", 
       caption = "Yellow - High density, Purple - Low density
       Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = NULL))

This graph shows the relationship between age and political preferences. As in the previous graphs, we can see that Serbs mostly do not have strong political preferences. At the same time, there is a group of responses concentrated at the top left: these are people around 60 years old with extreme left-wing views. This may be explained by Serbia’s communist past: we can assume that these people have retained extreme left-wing views from the time of Yugoslavia.
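
The scatterplots give a visual impression only. For scales of this kind, a rank correlation is one way to put a number on the relationship. The sketch below is an illustration on toy values (invented, not ESS records) and was not part of the analysis in this project; it uses base R `cor()` with `method = "spearman"`.

```r
# Toy data, invented for illustration: left-right codes and ages
toy <- data.frame(
  lrscale = c(1, 3, 6, 6, 8, 11, NA),
  agea    = c(70, 62, 45, 50, 38, 30, 55)
)

# Spearman's rho works on ranks, so it suits ordinal or quasi-interval scales;
# use = "complete.obs" drops rows where either value is missing
rho <- cor(toy$lrscale, toy$agea, method = "spearman", use = "complete.obs")
print(rho)
```

In this toy example older respondents sit further to the left, so the coefficient is negative; the sign and size in the real data would have to be computed from `serbia`.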

Conclusion

In this project we have examined the distributions of the variables in our focus, which are connected with political participation. We have built plots depicting the relationships between them in order to investigate the hypotheses we set based on the existing literature.

Our hypotheses were:

Those who engage in unconventional political participation practices like signing petitions and taking part in demonstrations are young people.
Those with a higher level of education will have a high interest in politics.

The first hypothesis is supported by our data: those who participate in unconventional political participation practices like signing petitions and participating in demonstrations are mostly young people, those under 50 years old.

The second hypothesis was also backed by our data. While most of the respondents in Serbia are not interested in politics, the proportion of people actively interested in politics increases with each educational step, from Primary through Secondary to Higher education, while the proportion of those not interested decreases as the educational level rises.

Some other findings were:

Serbs do not have strong political preferences - most of them situate themselves in the middle of the right-left scale.
There are more low-educated people among those of the lowest political interest. Moreover, the highest proportion of high-educated people is in the most interested in politics group.
Serbian men are more interested in politics than women are.
People who define themselves as far left have mostly middle income, but also include people with lower and higher incomes, while far right people have only middle and low income.
People with higher education are either centrist or have left-wing views; right-wing views are rare among people with higher education.
There is a group of people, identifying themselves as radical left, who are mostly old, around 60 years old.

References:

Petrović, J., & Stanojević, D. (2020). Political Activism in Serbia. Südosteuropa, 68, 365–385.
Stanojević, D., Vukelić, J., & Tomašević, A. (2023). Political Participation of Young People in Serbia: Activities, Values, and Capability. In I. Rivers & C. L. Lovin (Eds.), Young People Shaping Democratic Politics: Interrogating Inclusion, Mobilising Education (pp. 31–53). Springer International Publishing.
Trošt, T., & Marinšek, D. (2022). Social Class and Ethnocentric Worldviews: Assessing the Effect of Socioeconomic Status on Attitudes in Serbia and Croatia. Communist and Post-Communist Studies, 55(2), 39–61.
Scholaro database (n.d.). https://www.scholaro.com/db/countries/Serbia/Education-System

Project 2

library(foreign)
setwd("F:/ARRR")
ESS <- read.spss("ESS11.sav", use.value.labels = T, to.data.frame = T)
library(ggplot2)
library(dplyr)
library(DescTools)
library(formattable)
library(psych)
library(ggpubr)
library(car)
library(sjstats)
library(magrittr)
library(knitr)
## install.packages("kableExtra")
library(kableExtra)
library(ggstatsplot)
library(corrplot)
library(sjPlot)
library(rstatix)

serbia <- ESS %>%
  filter(cntry == "Serbia")
serbia$agea <- as.numeric(as.character(serbia$agea))
serbianapol <- serbia[!is.na(serbia$polintr), ]
serbianage <- serbia[!is.na(serbia$agea), ]
serbianalr <- serbia[!is.na(serbia$lrscale), ]
serbiana1 <- serbianage[!is.na(serbianage$polintr), ]
# Filter on serbianage itself so that the logical index matches its rows
serbiana2 <- serbianage[!is.na(serbianage$lrscale), ]

Introduction

In this project we explore political participation, tackling our research question: Which factors are connected with people’s political participation in Serbia? We use ESS round 11 data to do so. We will examine five hypotheses connected with political participation and the factors which may influence it:

1. We expect those interested in politics to sign petitions more than those who are not.
2. We suggest that people participating in an unconventional political practice of signing petitions are those of a younger age.
3. Serbians who are confident in their ability to participate in politics have spent more time in education than those who are not confident.
4. We expect that young Serbians are more interested in politics than older people.
5. We assume that age and position on the right-left scale are connected, specifically that those on the left are older than others.

Four of them (1-4) are based on the work of Petrović & Stanojević (2020), which analyses “characteristics and factors shaping political activism in Serbia”. The authors distinguish traditional (old) and unconventional (new) types of political activism. Unconventional practices include signing petitions, occupying public spaces, and participating in protests; traditional ones include, for example, membership of political parties and direct contact with politicians.

The last hypothesis is inspired by one of the graphs in the first project, which depicts the age distribution of people with different positions on the right-left political scale.

Chi-square test

serbiachisq <- serbia %>% select(polintr, sgnptit, gndr)

Interest in politics and signing petitions.

Theory: Zorkaya (1999) describes how a lack of interest in politics and a general apolitical attitude affect political participation. People who describe their interest in politics as “high” or “rather high” are more likely to participate in polls and to vote in elections. According to Petrović & Stanojević (2020), interest in signing petitions has risen in Serbia in recent years, presumably because of the spread of online petitions and their high accessibility to citizens. More generally, Serbians are more engaged in so-called unconventional, that is new, forms of political participation, which include signing petitions. These conclusions inspired us to analyse the relationship between political interest and signing petitions.

Our research hypothesis is that those interested in politics sign petitions more often than those who are not.

H0: The statistical null hypothesis in our case is that there will be no relationship between the variables “Interest in politics” and “Petition signing”.

We can use a chi-square test on these variables for several reasons:

1. Both variables are categorical: “political interest” has 4 levels and “petition signing” has 2 levels, which gives 8 groups (cells) in total. The cells contain counts of people, not percentages.
2. No cell contains fewer than 5 people. Strictly, the requirement concerns the expected counts, and the smallest expected count here is about 27.
3. Since the survey questions did not allow multiple answers, the categories are mutually exclusive.
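The expected-count condition can be checked directly. The sketch below is only an illustration: it rebuilds the 4 x 2 table from the observed counts reported for this test (the object names `obs` and `res` are ours, not part of the original analysis) and lets `chisq.test()` return the expected counts.

```r
# Observed counts copied from the cross-tabulation reported for this test
# (rows: interest in politics, columns: signed a petition).
obs <- matrix(
  c( 47,  82,
     89, 178,
    125, 489,
     69, 471),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    polintr = c("Very interested", "Quite interested",
                "Hardly interested", "Not at all interested"),
    sgnptit = c("Yes", "No")
  )
)

res <- chisq.test(obs)

# Expected counts under independence:
# row total * column total / grand total
res$expected

# The chi-square approximation is usually considered safe when
# every expected count is at least 5
all(res$expected >= 5)
```

Working from the table of counts rather than the raw vectors makes the check easy to repeat without access to the ESS file.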

chisq.test(serbiachisq$polintr, serbiachisq$sgnptit)

## 
##  Pearson's Chi-squared test
## 
## data:  serbiachisq$polintr and serbiachisq$sgnptit
## X-squared = 64.432, df = 3, p-value = 6.636e-14

polintr_sgnptit <- chisq.test(serbiachisq$polintr, serbiachisq$sgnptit)

As we can see, the p-value is well below the 0.05 threshold, so there is a statistically significant relationship between the variables. We therefore reject the null hypothesis: interest in politics and petition signing are related. Let’s take a closer look at the expected and observed counts.

polintr_sgnptit$expected

##                        serbiachisq$sgnptit
## serbiachisq$polintr           Yes       No
##   Very interested        27.46452 101.5355
##   Quite interested       56.84516 210.1548
##   Hardly interested     130.72258 483.2774
##   Not at all interested 114.96774 425.0323

polintr_sgnptit$observed

##                        serbiachisq$sgnptit
## serbiachisq$polintr     Yes  No
##   Very interested        47  82
##   Quite interested       89 178
##   Hardly interested     125 489
##   Not at all interested  69 471

polintr_sgnptit$stdres

##                        serbiachisq$sgnptit
## serbiachisq$polintr            Yes         No
##   Very interested        4.3882660 -4.3882660
##   Quite interested       5.2836986 -5.2836986
##   Hardly interested     -0.7259897  0.7259897
##   Not at all interested -5.9862695  5.9862695

corrplot(chisq.test(serbiachisq$polintr, serbiachisq$sgnptit)$stdres, is.cor = F, method = 'num')

If the differences between expected and observed counts were not pronounced, the standardized residuals would lie between -2 and 2. In our data we see the following:

1. Among people interested in politics (“Very interested” and “Quite interested”), many more people sign petitions than expected (standardized residuals of 4.39 and 5.28, respectively).
2. In the “Hardly interested” category, the split between signers and non-signers almost coincides with what is expected (residuals of about ±0.73).
3. People who are not at all interested in politics sign petitions much less often than expected (residual of -5.99).
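To make clear what a standardized residual measures, here is a minimal sketch that recomputes it by hand from the observed counts. The variable names are ours; the formula is the standard one, (observed - expected) / sqrt(expected * (1 - row share) * (1 - column share)).

```r
# Observed counts (rows: interest in politics, columns: signed a petition)
obs <- matrix(
  c(47, 82, 89, 178, 125, 489, 69, 471),
  ncol = 2, byrow = TRUE,
  dimnames = list(
    c("Very interested", "Quite interested",
      "Hardly interested", "Not at all interested"),
    c("Yes", "No")
  )
)

n        <- sum(obs)
row_p    <- rowSums(obs) / n          # share of each interest level
col_p    <- colSums(obs) / n          # share of Yes / No
expected <- outer(rowSums(obs), colSums(obs)) / n

# Standardized (adjusted) residuals
stdres_manual <- (obs - expected) /
  sqrt(expected * outer(1 - row_p, 1 - col_p))

round(stdres_manual, 2)

# Cells that deviate noticeably from independence
abs(stdres_manual) > 2
```

The values should agree with the `stdres` component returned by `chisq.test()`.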

plot_xtab(serbiachisq$polintr, serbiachisq$sgnptit, margin = 'row', bar.pos = 'stack', show.summary = T)

From the graph, we can see that people who are less interested in politics are less likely to sign petitions: the lower the interest, the smaller the share of petition signers. The null hypothesis of no relationship is rejected.

T-test. Signing petitions and Age

In our previous project, based on the study by Petrović & Stanojević (2020), we suggested that people participating in unconventional political practices (such as signing petitions and participating in demonstrations) are of a younger age. Stating this as a research hypothesis, let’s check whether it holds for signing petitions.

As it is seen from the bar plot below, generally, there are more people who do not sign petitions among the respondents from Serbia. We will use signing petitions as a factor variable in the analysis.

ggplot(serbia) +
  geom_bar(aes(x = sgnptit), na.rm = TRUE) +  # na.rm belongs to the layer, not to aes()
  scale_x_discrete(limits = c('Yes', 'No')) +
  labs(title = "Signing Petitions among Serbians", 
       x = "Signed petition last 12 months", y = NULL, 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

As for the age distribution, the mean and median are almost the same (about 52-53 years), but the mode is further away from them, so the distribution does not look normal. There are noticeable dips in the data, as well as a somewhat longer left tail: respondents are concentrated at older ages, with fewer young respondents, which also points to non-normality. The most frequent age among respondents is 57. We will use age as a continuous variable in the analysis.

ggplot(serbia, aes(x = agea)) +
  geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
  geom_vline(xintercept = mean(serbia$agea, na.rm = TRUE), col = 'blue', lwd = 1) +
  geom_vline(xintercept = median(serbia$agea, na.rm = TRUE), col = 'red', lwd = 1) +
  geom_vline(xintercept = Mode(serbia$agea, na.rm = TRUE), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Age", y = "Frequency", 
       title = "Age distribution in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  xlim(0,100)

Now we need to check whether people who sign petitions differ significantly in age from those who don’t. We will use a t-test to compare the two samples of respondents on a continuous variable, age. In this case it is a two-sample t-test, which compares the mean ages of those who sign petitions and those who don’t.

First, we check for several assumptions t-test requires data to correspond to:

Observations are independent
Data in both samples should be normally distributed
Equality of variances in both samples

First, observations are assumed to be independent, as they are apparently collected from different respondents.

In order to check whether the data are distributed normally, we use the Shapiro-Wilk test and also look at skew and kurtosis. As a rule of thumb, for a roughly normal (Gaussian) distribution skew is within ±0.5 of 0 (symmetric) and kurtosis is within ±1 of 0.

Looking at the skew and kurtosis of the age distribution, we see that it is slightly positively skewed in the first sample (0.22): a bit more data is concentrated in the left part of the graph. In the second sample it is slightly negatively skewed (-0.35), with more data concentrated in the right part. Kurtosis is negative in both samples and close to the threshold of -1 (-0.93 and -0.85). Still, based on these parameters we can tentatively assume normality in both samples.

Next we run the Shapiro-Wilk test to confirm or reject the normality of the data. The p-values in both tests are very low, which means the data are not normally distributed.
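For readers who want to see what the two shape statistics are, the following sketch computes moment-based skewness and excess kurtosis in base R. The function name `skew_kurt` and the example ages are hypothetical. As far as we know, `psych::describe()` applies a slightly different small-sample scaling by default, so its values can differ a little from these.

```r
# Moment-based skewness and excess kurtosis.
# Excess kurtosis subtracts 3, so a normal distribution scores 0.
skew_kurt <- function(x) {
  x  <- x[!is.na(x)]
  d  <- x - mean(x)
  m2 <- mean(d^2)   # second central moment (variance with divisor n)
  m3 <- mean(d^3)   # third central moment
  m4 <- mean(d^4)   # fourth central moment
  c(skew = m3 / m2^1.5, kurtosis = m4 / m2^2 - 3)
}

# Hypothetical ages, used only to show the call
ages <- c(18, 22, 25, 31, 40, 44, 52, 57, 57, 63, 70, 81)
sk <- skew_kurt(ages)
sk

# Rule of thumb from the text: |skew| <= 0.5 and |kurtosis| <= 1
abs(sk["skew"]) <= 0.5 && abs(sk["kurtosis"]) <= 1
```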

describeBy(serbia$agea, group = serbia$sgnptit)

## 
##  Descriptive statistics by group 
## group: Yes
##    vars   n  mean    sd median trimmed   mad min max range skew kurtosis   se
## X1    1 325 46.33 16.67     45   45.92 19.27  17  85    68 0.22    -0.93 0.92
## ------------------------------------------------------------ 
## group: No
##    vars    n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## X1    1 1209 54.11 18.13     56   54.99 20.76  15  90    75 -0.35    -0.85 0.52

shapiro.test(serbia$agea[serbia$sgnptit == "Yes"])

## 
##  Shapiro-Wilk normality test
## 
## data:  serbia$agea[serbia$sgnptit == "Yes"]
## W = 0.97096, p-value = 4.059e-06

shapiro.test(serbia$agea[serbia$sgnptit == "No"])

## 
##  Shapiro-Wilk normality test
## 
## data:  serbia$agea[serbia$sgnptit == "No"]
## W = 0.96388, p-value < 2.2e-16

Let’s check normality one more time to be sure about the conclusion. We will use a Q-Q plot. The points should lie on (or near) the reference line for us to call the data normal. On the plots for both samples we clearly see that many points deviate from the lines.

ggqqplot(serbia, "agea", facet.by = "sgnptit", main = "Q-Q plot. Age of Serbians: Those signing petitions and not")

Now we can confirm the non-normality of the data. However, we can still conduct the t-test: with many observations (> 100 per group) it is robust to departures from normality. In our case we have 325 respondents with known age who signed a petition and 1209 who did not.

Now let’s check the second assumption, the equality of variances, in both (“Yes” and “No”) samples. To do this we can conduct Levene’s test. Here the p-value is slightly below 0.05 (0.047), thus the hypothesis of equal variances can be rejected. This implies that we will run the Welch t-test, which is the default in R.

leveneTest(serbia$agea ~ serbia$sgnptit)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)  
## group    1   3.956 0.04688 *
##       1532                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
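The median-centred Levene test shown above is, in essence, a one-way ANOVA on the absolute deviations of each observation from its group median. A minimal base R sketch under that assumption follows; the function `levene_median` and the toy data are ours and are not part of the `car` package.

```r
# Median-centred Levene test as an ANOVA on |x - group median|
levene_median <- function(x, g) {
  keep <- !is.na(x) & !is.na(g)
  x <- x[keep]
  g <- droplevels(factor(g[keep]))
  z <- abs(x - ave(x, g, FUN = median))  # absolute deviations from group medians
  anova(lm(z ~ g))
}

# Toy data: group B is twice as spread out as group A
x <- c(1, 2, 3, 4, 5, 2, 4, 6, 8, 10)
g <- rep(c("A", "B"), each = 5)

lev <- levene_median(x, g)
lev
```

A small p-value in the first row of the table would suggest unequal variances; with only five points per group the toy example is not expected to reach significance.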

Non-equality of variances is also seen on boxplots. Although the boxes do not differ in size much, they are a little bit different in spread.

boxplot(serbia$agea ~ serbia$sgnptit, main = "Age by signing petitions")

Now we can move to the t-test itself. As it was already mentioned, we will run Welch t-test, because variances of two samples are not equal.

As usual, our H0 is that there is no difference in means of two samples - those who sign petitions and those who don’t. Whereas H1 is that there is difference in means of two samples.

First, conducting a two-sided test, we find the p-value to be very low (< 0.05); thus, there is a significant difference between the mean ages of those who sign petitions and those who don’t.

Then we can check the directional hypothesis: that the mean age of those who sign petitions is lower than that of those who don’t. In this one-sided t-test the p-value is also very small (< 0.05), so the data support this alternative hypothesis.

We can also see the means of the two samples: the mean age is about 46.3 years for those who sign petitions and about 54.1 years for those who don’t.

t.test(serbia$agea ~ serbia$sgnptit)

## 
##  Welch Two Sample t-test
## 
## data:  serbia$agea by serbia$sgnptit
## t = -7.3301, df = 548.13, p-value = 8.295e-13
## alternative hypothesis: true difference in means between group Yes and group No is not equal to 0
## 95 percent confidence interval:
##  -9.865554 -5.695538
## sample estimates:
## mean in group Yes  mean in group No 
##          46.32615          54.10670

t.test(serbia$agea ~ serbia$sgnptit,
       alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  serbia$agea by serbia$sgnptit
## t = -7.3301, df = 548.13, p-value = 4.148e-13
## alternative hypothesis: true difference in means between group Yes and group No is less than 0
## 95 percent confidence interval:
##      -Inf -6.03166
## sample estimates:
## mean in group Yes  mean in group No 
##          46.32615          54.10670
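The Welch statistic and its degrees of freedom can be reproduced from the group summaries alone. The sketch below is an illustration under stated assumptions: the helper `welch_from_summary` is our own, and the inputs are the means from the t-test output and the rounded standard deviations and group sizes from the descriptive table, so small rounding differences from the printed output are expected.

```r
# Welch two-sample t-test from summary statistics
welch_from_summary <- function(m1, s1, n1, m2, s2, n2) {
  v1 <- s1^2 / n1                     # squared standard error, group 1
  v2 <- s2^2 / n2                     # squared standard error, group 2
  t_stat <- (m1 - m2) / sqrt(v1 + v2)
  # Welch-Satterthwaite approximation of the degrees of freedom
  df <- (v1 + v2)^2 / (v1^2 / (n1 - 1) + v2^2 / (n2 - 1))
  c(t = t_stat,
    df = df,
    p_two_sided = 2 * pt(-abs(t_stat), df),
    p_less = pt(t_stat, df))
}

# Group "Yes" (signed a petition) vs group "No"
w <- welch_from_summary(m1 = 46.32615, s1 = 16.67, n1 = 325,
                        m2 = 54.10670, s2 = 18.13, n2 = 1209)
round(w[c("t", "df")], 2)
w[c("p_two_sided", "p_less")]
```

Because the degrees of freedom are estimated from the two variances, they are not an integer, which is why the output reports a fractional df.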

Now we will double check the result of the t-test using the non-parametric Mann-Whitney-Wilcoxon test, which is usually applied to non-normally distributed data when the number of observations is below 100, or when the data are ordinal rather than continuous. We use the Mann-Whitney (rank sum) version because our samples are independent; the Wilcoxon signed-rank test is meant for paired samples.

Running the one-sided Mann-Whitney test, we see a low p-value (< 0.05), which again confirms that those who sign petitions are significantly younger than those who don’t.

wilcox.test(serbia$agea ~ serbia$sgnptit,
       alternative = "less", paired = F)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  serbia$agea by serbia$sgnptit
## W = 145968, p-value = 5.265e-13
## alternative hypothesis: true location shift is less than 0

To see how big the difference between the mean ages of the two samples is, let’s check the effect size, Cohen’s d.

Cohen’s d is 0.45 in absolute value, which is a small (close to medium) effect size. The sign is negative because petition signers are the younger group.

cohens_d(serbia, agea ~ sgnptit)

## # A tibble: 1 × 7
##   .y.   group1 group2 effsize    n1    n2 magnitude
## * <chr> <chr>  <chr>    <dbl> <int> <int> <ord>    
## 1 agea  Yes    No      -0.447   325  1209 small
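Cohen’s d is the difference in means divided by a standard deviation, and there is more than one convention for that denominator. The sketch below computes two common versions from the rounded group summaries. It is our reading, not a confirmed fact, that the average-variance form is what `rstatix::cohens_d()` uses by default (`var.equal = FALSE`); we say so only because it reproduces the reported value here.

```r
# Group summaries taken from the descriptive table and t-test output
m1 <- 46.32615; s1 <- 16.67; n1 <- 325    # signed a petition
m2 <- 54.10670; s2 <- 18.13; n2 <- 1209   # did not sign

# Classical version: pooled standard deviation (assumes equal variances)
sd_pooled <- sqrt(((n1 - 1) * s1^2 + (n2 - 1) * s2^2) / (n1 + n2 - 2))
d_pooled  <- (m1 - m2) / sd_pooled

# Unequal-variance version: square root of the average of the two variances
d_avg <- (m1 - m2) / sqrt((s1^2 + s2^2) / 2)

round(c(pooled = d_pooled, average_variance = d_avg), 3)
```

Both versions lead to the same verbal conclusion here: a small effect, approaching medium.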

Using code from https://rpsychologist.com/short-r-script-to-plot-effect-sizes-cohens-d-and-shade-overlapping-area site, we can visualize the Cohen’s d effect.

The data is standardized and the more data overlap, the less is the effect size.

ES <- 0.45
mean1 <- ES*1 + 1
x <- seq(1 - 3*1, mean1 + 3*1, .01)
y1 <- dnorm(x, 1, 1)
df1 <- data.frame("x" = x, "y" = y1)
y2 <- dnorm(x, mean1, 1)
df2 <- data.frame("x" = x, "y" = y2)
y.poly <- pmin(y1,y2)
poly <- data.frame("x" = x, "y" = y.poly)

u3 <- 1 - pnorm(1, mean1,1)
u3 <- round(u3,3)
 

ggplot(df1, aes(x,y, color="Those who sign petitions")) +
  geom_line(size=1) + 
  geom_line(data=df2, aes(color="Those who don't sign petitions"),size=1) +
  geom_polygon(aes(color=NULL), data=poly, fill="#BCEE68", alpha=I(4/10),
               show.legend=FALSE) +  # show_guide is no longer supported in ggplot2
  geom_vline(xintercept = 1, linetype="dotted") + 
  geom_vline(xintercept = mean1, linetype="dotted") + 
  labs(title=paste("Age and Petitions - Effect Size Visualisation
      (Cohen's d = ",ES,"; U3 = ",u3,")", sep="")) +
  scale_color_manual("Group", 
           values= c("Those who sign petitions" = "dodgerblue2","Those who don't sign petitions" = "#BCEE68")) +
  theme(plot.title = element_text(face = "bold", size = 18, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))+
  ylab(NULL) + xlab(NULL)

To sum up, our data support the initial hypothesis that those participating in unconventional political practices such as signing petitions are younger. The effect size is small, but close to medium.

ANOVA Tests

For our analysis of variances we will take two pairs of variables:

age and political interest
age and position on left-right scale

Based on Petrović & Stanojević (2020), we hypothesise that young Serbians are more interested in politics than older people, and we want to test this.

Also, in our previous project we observed a cluster of older people among the far left, so we want to test whether age differs significantly by political position.

First of all we need to check the assumptions for our test.

ANOVA has the following assumptions:

Independence of observations. Each respondent contributes a single observation, and the groups being compared do not overlap; this holds for both pairs of variables.
Variable types. ANOVA compares a continuous outcome across the groups of a categorical variable. Each pair has one of each: age is numeric, while political interest (4 categories) and the left-right scale (11 categories, from 0 to 10) are categorical (quasi-interval, but we will treat them as categorical).
Equality of variances.
Normality of distribution.

The last two assumptions need to be tested.

First of all we will test normality of distribution of age.

ggplot(serbia, aes(x = agea)) +
  geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
  geom_vline(xintercept = mean(serbia$agea, na.rm = TRUE), col = 'blue', lwd = 1) +
  geom_vline(xintercept = median(serbia$agea, na.rm = TRUE), col = 'red', lwd = 1) +
  geom_vline(xintercept = Mode(serbia$agea, na.rm = TRUE), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Age", y = "Frequency", 
       title = "Age distribution in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  xlim(0,100)

Here we can see that the mean (blue) and the median (red) are very close to each other, but the mode (purple) is further away from them, which suggests that the data may not be normally distributed. We will also check the skew and kurtosis of our data:

describe(serbia$agea)

##    vars    n  mean    sd median trimmed   mad min max range  skew kurtosis   se
## X1    1 1543 52.47 18.11     53   53.03 22.24  15  90    75 -0.21    -0.97 0.46

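Here we can see that skewness is slightly negative (-0.21), which means that the distribution has a longer left tail and the bulk of the data sits on the right (older) side, as the histogram also shows. Kurtosis is also negative, close to -1 (-0.97). The psych package reports excess kurtosis, for which 0 corresponds to a normal distribution, so this value means the distribution is somewhat flatter than normal with light tails; we probably have very few outliers, and the value stays within the ±1 rule of thumb.

Now we can test normality using the Shapiro-Wilk test. Here our H0 is that the data are normally distributed; the alternative is that they are not.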

shapiro.test(x = serbianage$agea)

## 
##  Shapiro-Wilk normality test
## 
## data:  serbianage$agea
## W = 0.96936, p-value < 2.2e-16

ggqqplot(serbia, "agea")

The p-value is very low, which means that our data are not normally distributed. Strictly speaking, this means we should not rely on ANOVA, because it is a parametric test; the Kruskal-Wallis test is the appropriate non-parametric alternative.

1st Test. Interest in Politics and Age

ggplot(serbianapol) +
  geom_bar(aes(x = polintr), na.rm = TRUE) +  # na.rm belongs to the layer, not to aes()
  labs(x = "1 - Very Interested, 4 - Not at all interested", y = NULL, 
       title = "Interest in politics",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

Here we can see that most people are hardly or not at all interested in politics.

Now we will check the normality of the age distribution within each group.

ggqqplot(serbianapol, "agea", facet.by = "polintr")

On the Q-Q plots we can see that most points lie on the reference line, which means the distributions are close to normal. However, there are deviations at the beginning and at the end of the line, which indicate non-normality in the tails.

To be more accurate we can check the normality with comparing mean and median, and look at skewness and kurtosis.

describeBy(serbianapol$agea, serbianapol$polintr, mat = TRUE) %>% 
  select(Interest = group1, N = n, Mean = mean, SD = sd, Median = median, Min = min, Max = max, Skew = skew, Kurtosis = kurtosis, st.error = se) %>% 
  kable(align = c("lrrrrrrrrr"), digits = 2, row.names = FALSE,
        caption = "Age by Interest in Politics")

Age by Interest in Politics
Interest	N	Mean	SD	Median	Min	Max	Skew	Kurtosis	st.error
Very interested	127	57.74	15.76	58	17	84	-0.54	-0.54	1.40
Quite interested	263	56.60	17.35	58	17	90	-0.39	-0.83	1.07
Hardly interested	611	52.52	17.57	54	17	89	-0.22	-1.01	0.71
Not at all interested	539	49.22	18.90	49	15	90	0.00	-0.99	0.81

Here we can see that the differences between the means and medians of all groups are small. Skewness and kurtosis are also mostly within the rule-of-thumb limits, although the “Very interested” group (skew -0.54) and the “Hardly interested” group (kurtosis -1.01) fall just outside them. Since the data overall are not normally distributed, we cannot assume normality within every group.

ggplot(serbiana1) +
  geom_boxplot(aes(x = polintr, y = agea), na.rm = TRUE) +  # refer to columns by name inside aes()
  labs(x = "1 - Very Interested, 4 - Not at all interested", y = 'Age', 
       title = "Interest in politics by age",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

On the boxplot we can see that the median age of people who are very or quite interested in politics is higher (58 in both groups) than that of people who are hardly or not at all interested (54 and 49). This is interesting, because it contradicts our hypothesis. We need to check whether this difference is significant.

We will use both ANOVA and Kruskal-Wallis tests, but Kruskal-Wallis is the more appropriate one here.

First we will do ANOVA. We need to check whether the variances are equal. If they are, we will run the classical ANOVA; if not, Welch’s one-way test, which does not assume equal variances.

leveneTest(serbiana1$agea ~ serbiana1$polintr)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value  Pr(>F)  
## group    3   3.577 0.01348 *
##       1536                  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The p-value (Pr) here is small (0.013), which means that the variances are not equal. So we will run Welch’s one-way test.

oneway.test(serbiana1$agea ~ serbiana1$polintr, var.equal = F)

## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  serbiana1$agea and serbiana1$polintr
## F = 14.803, num df = 3.00, denom df = 481.09, p-value = 3.112e-09

Our H0 was that all group means are equal, so the Ha is that at least one mean is different.

The test gives a very small p-value, so at least one group mean differs from the others. This result cannot be considered final, because our data are not normally distributed.

We need to check whether the residuals are normally distributed and run post-hoc tests to explore which groups differ.

one.way.anova <- stats::aov(serbiana1$agea ~ serbiana1$polintr)
                       
summary(one.way.anova)

##                     Df Sum Sq Mean Sq F value  Pr(>F)    
## serbiana1$polintr    3  13693    4564   14.28 3.5e-09 ***
## Residuals         1536 490844     320                    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we see that the difference between groups is significant (p-value < 0.05) even when equal variances are assumed, which is what aov() does by default.

Now we can plot the residuals.

plot(one.way.anova, 2)

layout(matrix(1:4, 2, 2))
plot(one.way.anova)

Here we see the diagnostic plots for the residuals. The top-left plot (Residuals vs Fitted) shows the residuals and their spread around zero. The red line should lie flat along the zero line, but on our plot it is not straight. The same holds for the top-right plot (Scale-Location): the red line should be straight, but it is not.

Finally, the Q-Q plot: the residuals leave the reference line at both ends.

This means that the residuals are not normally distributed, so the ANOVA results should be treated with caution.

We can explore it further.

anova.res <- residuals(object = one.way.anova)
describe(anova.res)

##    vars    n mean    sd median trimmed   mad    min   max range  skew kurtosis
## X1    1 1540    0 17.86   0.78    0.46 21.58 -40.74 40.78 81.52 -0.18    -0.92
##      se
## X1 0.46

Here we can see that the mean (0) and the median (0.78) differ, which also points to non-normality.

shapiro.test(x = anova.res)

## 
##  Shapiro-Wilk normality test
## 
## data:  anova.res
## W = 0.97436, p-value = 6.115e-16

The p-value is very small, so we conclude that the residuals are not normally distributed. We can also see this on the histogram.

hist(anova.res)

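Now we can run a post-hoc test to find out which groups differ. We use pairwise t-tests with Holm’s correction for multiple comparisons, which is the default of pairwise.t.test() and the method reported in the output below. Note that the pooled SD assumes equal variances, so, given the Levene result, these p-values should be read with some caution.

# The correction method is set through p.adjust.method
pairwise.t.test(serbiana1$agea, serbiana1$polintr, 
                p.adjust.method = "holm", pool.sd = TRUE)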

## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  serbiana1$agea and serbiana1$polintr 
## 
##                       Very interested Quite interested Hardly interested
## Quite interested      0.5554          -                -                
## Hardly interested     0.0074          0.0074           -                
## Not at all interested 7.5e-06         2.9e-07          0.0074           
## 
## P value adjustment method: holm

Here we can see that the statistically significant pairs are 1-3, 1-4, 2-3, 2-4 and 3-4; only the pair 1-2 (“Very interested” vs “Quite interested”) does not differ significantly.
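Since the output above names the adjustment method, it may help to see how Holm and Bonferroni corrections differ. The sketch below uses base R’s `p.adjust()` on hypothetical unadjusted p-values (they are not taken from our data).

```r
# Hypothetical unadjusted p-values for six pairwise comparisons
raw_p <- c(0.001, 0.004, 0.010, 0.020, 0.030, 0.400)

adj <- cbind(
  raw        = raw_p,
  holm       = p.adjust(raw_p, method = "holm"),
  bonferroni = p.adjust(raw_p, method = "bonferroni")
)
adj

# Holm is never more conservative than Bonferroni
all(adj[, "holm"] <= adj[, "bonferroni"] + 1e-12)
```

Holm multiplies the smallest p-value by the number of tests, the next by one less, and so on, then forces the sequence to be non-decreasing. That last step is why several adjusted p-values can come out identical, as in the table above.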

Now we can measure the effect size.

library(sjstats)
## install.packages("pwr") # may require for this package
anova_stats(one.way.anova) # the name of your ANOVA resulting object

## etasq | partial.etasq | omegasq | partial.omegasq | epsilonsq | cohens.f
## ------------------------------------------------------------------------
## 0.027 |         0.027 |   0.025 |           0.025 |     0.025 |    0.167
##       |               |         |                 |           |         
## 
## etasq |              term |     sumsq |   df |   meansq | statistic | p.value | power
## -------------------------------------------------------------------------------------
## 0.027 | serbiana1$polintr | 13693.060 |    3 | 4564.353 |    14.283 |  < .001 |     1
##       |         Residuals | 4.908e+05 | 1536 |  319.560 |           |         |

Omega-squared is 0.025, which indicates a small effect.

Now we can run the non-parametric Kruskal-Wallis test and compare the results of both tests.
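These effect sizes follow directly from the sums of squares in the ANOVA table. As a minimal sketch using the figures printed above (variable names are ours):

```r
# Values taken from the ANOVA table above
ss_between <- 13693.060; df_between <- 3
ss_within  <- 490844;    df_within  <- 1536

ms_within <- ss_within / df_within
ss_total  <- ss_between + ss_within

# Eta-squared: share of total variance explained by the grouping
eta_sq <- ss_between / ss_total

# Omega-squared: a less biased estimate of the same share
omega_sq <- (ss_between - df_between * ms_within) / (ss_total + ms_within)

# Cohen's f derived from eta-squared
cohens_f <- sqrt(eta_sq / (1 - eta_sq))

round(c(eta_sq = eta_sq, omega_sq = omega_sq, cohens_f = cohens_f), 3)
```

In words, interest in politics accounts for under 3% of the variance in age.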

kruskal.test(agea ~ polintr, data = serbiana1)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  agea by polintr
## Kruskal-Wallis chi-squared = 42.323, df = 3, p-value = 3.426e-09

As we can see, the p-value is very low, which means that the groups differ and confirms the result of the previous test.
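The Kruskal-Wallis statistic works on ranks instead of raw values. The sketch below computes H by hand for a toy data set without ties and compares it with `kruskal.test()`. The function `kw_h` and the data are hypothetical, and the tie correction that `kruskal.test()` applies is deliberately left out.

```r
# Kruskal-Wallis H without tie correction:
# H = 12 / (N (N + 1)) * sum(R_j^2 / n_j) - 3 (N + 1)
kw_h <- function(x, g) {
  r         <- rank(x)                 # ranks over the pooled sample
  n         <- length(x)
  rank_sums <- tapply(r, g, sum)       # R_j: sum of ranks per group
  group_n   <- tapply(r, g, length)    # n_j: group sizes
  12 / (n * (n + 1)) * sum(rank_sums^2 / group_n) - 3 * (n + 1)
}

# Toy ages in three groups, all values distinct
x <- c(23, 31, 45, 28, 52, 60, 47, 66, 71)
g <- factor(rep(c("A", "B", "C"), each = 3))

h_manual <- kw_h(x, g)
h_manual

# Should match the built-in test when there are no ties
kruskal.test(x, g)$statistic
```

Under H0 the statistic approximately follows a chi-squared distribution with (number of groups - 1) degrees of freedom, which is where the reported df = 3 comes from.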

Now we can do post-hoc test.

DunnTest(agea ~ polintr, data = serbiana1)

## 
##  Dunn's test of multiple comparisons using rank sums : holm  
## 
##                                         mean.rank.diff    pval    
## Quite interested-Very interested             -28.92521  0.5472    
## Hardly interested-Very interested           -132.55911  0.0067 ** 
## Not at all interested-Very interested       -210.96284 7.5e-06 ***
## Hardly interested-Quite interested          -103.63390  0.0063 ** 
## Not at all interested-Quite interested      -182.03763 3.1e-07 ***
## Not at all interested-Hardly interested      -78.40373  0.0067 ** 
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we can see that five pairs differ significantly, which agrees with the results of the pairwise t-tests above.

Now we can visualize our results and interpret them.

## install.packages("ggstatsplot")
library(ggstatsplot)
ggbetweenstats(data = serbiana1,  y = agea, x = polintr, var.equal = F)

As we can see, and in line with the tests above, people who are interested in politics are older than those who are not. In more detail, people who chose 3 (hardly interested) or 4 (not at all interested) are younger than people who chose 1 (very interested) or 2 (quite interested). There are also significant differences between 2 and 3 and between 3 and 4: age decreases as interest decreases.

Our hypothesis that young Serbians are more interested in politics is rejected. The relationship is reversed: the higher the age, the higher the interest in politics.

2nd Test. Position on the left-right scale and Age

In this block we will use ANOVA to check if there are age differences among people of different positions on the right-left scale.

The research hypothesis is the following: we assume that age and position on right-left scale are connected, specifically those on the left being older than others.

First we decided to group positions on left-right political scale into bigger categories: left, central, and right. It would make the tests easier to conduct and interpret.

serbiana2_gr <- serbiana2 %>% select(lrscale, agea)

serbiana2_gr <- serbiana2_gr[!is.na(serbiana2_gr$lrscale), ]

levels(serbiana2$lrscale)

##  [1] "Left"  "1"     "2"     "3"     "4"     "5"     "6"     "7"     "8"    
## [10] "9"     "Right"

levels(serbiana2_gr$lrscale) <- c("Left", "Left", "Left", "Left", "Central", "Central", "Central", "Right", "Right", "Right", "Right")

levels(serbiana2_gr$lrscale)

## [1] "Left"    "Central" "Right"
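The recoding above relies on a base R behaviour that is easy to miss: assigning the same label to several levels merges them. A small self-contained sketch with a hypothetical factor `lr` (not the ESS variable itself):

```r
# A toy left-right factor with the same 11 levels as lrscale
lr <- factor(c("Left", "1", "5", "9", "Right", "4"),
             levels = c("Left", as.character(1:9), "Right"))

# One new label per old level, in the order of levels(lr);
# repeated labels collapse the corresponding levels into one
levels(lr) <- c(rep("Left", 4), rep("Central", 3), rep("Right", 4))

levels(lr)
table(lr)
```

The new labels are matched to the old levels by position, so it is worth printing the levels first, as done above, before overwriting them.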

Now we can examine the variable, starting with its visualisation.

ggplot(serbia, aes(x = agea)) +
  geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
  geom_vline(xintercept = mean(serbia$agea, na.rm = TRUE), col = 'blue', lwd = 1) +
  geom_vline(xintercept = median(serbia$agea, na.rm = TRUE), col = 'red', lwd = 1) +
  geom_vline(xintercept = Mode(serbia$agea, na.rm = TRUE), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Age", y = "Frequency", 
       title = "Age distribution in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  xlim(0,100)

Distributions and medians are very similar, but we can see that median in the Left group is slightly bigger. We can run a statistical test to check if the difference is statistically significant.

We want to use the ANOVA test. As mentioned above, there are four conditions we should check before conducting the test: 1) Observations should be independent. 2) The observations should be normally distributed within groups. 3) Variances should be approximately equal. 4) One variable is continuous and the other is categorical.

The observations are indeed independent: they come from separate, non-overlapping respondents. The types of variables are also suitable for the test.

Let’s now check the equality of variances using Levene’s test.

leveneTest(serbiana2_gr$agea ~ serbiana2_gr$lrscale)

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value Pr(>F)
## group   2   1.761 0.1726
##       698

The null hypothesis of this test is that the variances are equal. We cannot reject it, as the p-value is 0.17, which is above the 0.05 significance level. Thus, we assume equal variances and can use a regular ANOVA.
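To make the logic of the test more transparent: the median-centred Levene's test is a one-way ANOVA on the absolute deviations of each observation from its group median. The sketch below reproduces that idea in base R on simulated data (the data frame `toy` is hypothetical; our report itself uses `leveneTest()`):

```r
set.seed(2)
# Simulated data: three groups with equal spread (not real ESS data)
toy <- data.frame(
  y = rnorm(150, mean = 50, sd = 18),
  g = factor(rep(c("Left", "Central", "Right"), each = 50))
)

# Absolute deviation of each observation from its own group median
toy$abs_dev <- abs(toy$y - ave(toy$y, toy$g, FUN = median))

# One-way ANOVA on those deviations is the median-centred Levene test
levene_tab <- anova(lm(abs_dev ~ g, data = toy))
levene_tab

p_levene <- levene_tab[["Pr(>F)"]][1]
p_levene > 0.05  # TRUE would mean: no evidence against equal variances
```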

The null hypothesis is H0: Means of left, central, and right groups are equal.

The alternative hypothesis is H1: At least one group differs from the others in terms of the mean.

test_lrage <- aov(agea ~ lrscale, data =  serbiana2_gr)

summary(test_lrage)

##              Df Sum Sq Mean Sq F value Pr(>F)
## lrscale       2    937   468.7   1.451  0.235
## Residuals   698 225496   323.1

The p-value is 0.235, which is not statistically significant (> 0.05). At this stage we cannot reject the null hypothesis that the means are equal. As the test shows no sign of differences between the groups, we will not conduct post-hoc tests. However, we can check the residuals to examine the normality assumption.
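As a check on how the ANOVA table is built, the F value and the p-value can be recomputed from the sums of squares and degrees of freedom printed above. The variable names below are ours, and small deviations from the table are expected because the printed sums of squares are rounded:

```r
# Values copied from the summary(test_lrage) output above
ss_between <- 937;    df_between <- 2
ss_within  <- 225496; df_within  <- 698

ms_between <- ss_between / df_between  # mean square between groups
ms_within  <- ss_within / df_within    # mean square within groups (residual)
f_value    <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(F = f_value, p = p_value), 3)
```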

plot(test_lrage, 2)

layout(matrix(1:4, 2, 2))
plot(test_lrage)

anova.res.ess2 <- residuals(object = test_lrage)

shapiro.test(x = anova.res.ess2)

## 
##  Shapiro-Wilk normality test
## 
## data:  anova.res.ess2
## W = 0.97063, p-value = 1.195e-10

hist(anova.res.ess2)

The residuals are not normally distributed: the Q-Q plot shows that they deviate from the diagonal line, the lines in the two upper graphs are not straight, the Shapiro-Wilk test gives a p-value smaller than 0.05, and the histogram of residuals also shows the shift. Thus, the normality assumption is violated. We can conduct a non-parametric test to test the hypothesis again. One suitable test for our case is the Kruskal-Wallis test.
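The same workflow (fit the ANOVA, test the residuals, fall back to a rank-based test) can be sketched on simulated data. The outcome below is deliberately right-skewed, and all object names and values are hypothetical:

```r
set.seed(3)
# Simulated, deliberately right-skewed outcome in three groups (not ESS data)
toy <- data.frame(
  y = rexp(300, rate = 1 / 20),
  g = factor(rep(c("Left", "Central", "Right"), times = 100))
)

fit <- aov(y ~ g, data = toy)
res <- residuals(fit)

# Shapiro-Wilk: H0 = residuals come from a normal distribution
sw <- shapiro.test(res)
sw$p.value < 0.05  # a small p-value signals non-normal residuals

# Rank-based alternative that does not assume normal residuals
kw <- kruskal.test(y ~ g, data = toy)
kw$p.value
```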

H0: the medians of all groups are equal.

H1: at least one population median of one group is different from the population median of at least one other group.

kruskal.test(agea ~ lrscale, data = serbiana2_gr)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  agea by lrscale
## Kruskal-Wallis chi-squared = 3.4269, df = 2, p-value = 0.1802

This test gives a p-value of 0.18, which is consistent with the ANOVA result: we cannot reject the null hypothesis.

ggbetweenstats(data = serbiana2_gr,  y = agea, x = lrscale, var.equal = F)

The p-value is too large and the effect size is too small. Thus, we cannot reject the null hypothesis, and we cannot argue that the age of people occupying distinct positions on the left-right scale differs in any way. Consequently, the slightly higher age of those on the left can be explained by random variation in the data.
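One common effect size for a one-way ANOVA is eta squared, the share of the total sum of squares that lies between groups. It can be computed from the ANOVA table printed earlier; note that this is our own illustrative calculation, and `ggbetweenstats()` may report a different effect-size measure:

```r
# Sums of squares copied from the summary(test_lrage) output earlier in this section
ss_between <- 937
ss_within  <- 225496

# Eta squared: share of total variation in age attributable to group membership
eta_sq <- ss_between / (ss_between + ss_within)
round(eta_sq, 4)
```

A value this close to zero means that group membership accounts for well under 1% of the variation in age.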

Conclusion

Initially we had 5 hypotheses:

We expect those interested in politics to sign petitions more than those who are not.
We suggest that people participating in an unconventional political practice of signing petitions are those of a younger age.
We expect that young Serbians are more interested in politics than older people.
We assume that age and position on right-left scale are connected, specifically those on the left being older than others.

Tests of all of them, except the last one, show statistically significant results. This means that, using new data, we confirm the previous findings of the researchers whose paper we mentioned.

Our last test was based on an assumption we formed after making a graph in the first project. We used a scatter plot with geom_jitter, which moved points sharing the same position away from each other and thus misrepresented the data. It was interesting to use statistics to check this hypothesis and correct the flaw.

List of references

Petrović, J. & Stanojević, D. (2020). Political Activism in Serbia. Comparative Southeast European Studies, 68(3), 365-385. https://doi.org/10.1515/soeu-2020-0027
Zorkaya, N. (1999). Interes k politike kak forma politicheskogo uchastiya [Interest in politics as a form of political participation]. Monitoring obshchestvennogo mneniya: ekonomicheskie i sotsial'nye peremeny, (4), 13-20.

Project 3

library(foreign)
setwd("F:/ARRR")
ESS <- read.spss("ESS11.sav", use.value.labels = F, to.data.frame = T)
library(ggplot2)
library(dplyr)
library(DescTools)
library(formattable)
library(psych)
library(ggpubr)
library(car)
library(sjstats)
library(magrittr)
library(knitr)
library(kableExtra)
library(ggstatsplot)
library(corrplot)
library(sjPlot)
library(rstatix)
library(ggcorrplot)
library(GGally)

serbia <- ESS %>%
  filter(cntry == "RS")

serbia1 <- select(serbia, gndr, polintr, agea, hinctnta, stfgov, edlvdrs, stfhlth)


serbia1 <- serbia1 %>%
  filter(edlvdrs < 19) # keep the 18 substantive education categories

serbia1$edlvdrs1 <- as.factor(ifelse(serbia1$edlvdrs %in% c(1, 2, 3), "Primary",
                         ifelse(serbia1$edlvdrs %in% c(4, 5, 6, 7, 8, 9, 10), "Secondary", "Higher")))

serbia1$edlvdrs1 <-  factor(serbia1$edlvdrs1, levels = c("Primary",
                                          "Secondary",
                                          "Higher"))

serbia1$gndr <- as.factor(serbia1$gndr)
serbia1$gndr <- dplyr::recode(serbia1$gndr,
                                 "1" = "Male",
                                 "2" = "Female")

serbia1$polintr <- as.factor(serbia1$polintr)
serbia1$polintr <- dplyr::recode(serbia1$polintr,
                                 "1" = "Very",
                                 "2" = "Quite",
                                 "3" = "Hardly",
                                 "4" = "Not at all")
serbia1$polintr <- factor(serbia1$polintr, levels = c("Not at all","Hardly","Quite","Very"))
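The education grouping above uses nested `ifelse()` calls. An equivalent and somewhat more compact way to bin numeric codes is `cut()`. The sketch below uses a handful of made-up codes (the vector `edu_code` is hypothetical) and the same cut points as our recoding:

```r
# Hypothetical education codes on the 1-18 scale (not real respondents)
edu_code <- c(1, 3, 4, 10, 11, 18, 7)

# Intervals are (0,3], (3,10], (10,18]; the labels mirror edlvdrs1
edu_group <- cut(edu_code,
                 breaks = c(0, 3, 10, 18),
                 labels = c("Primary", "Secondary", "Higher"))

data.frame(edu_code, edu_group)
```

Because `cut()` keeps the labels in the order given, the resulting factor already has Primary as its first level, so no separate releveling step is needed.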

Introduction

In this project we will explore the following research question:

Which factors can explain one’s satisfaction with government in Serbia?

This topic is connected with our previous one via the general theme of politics. The variables we use as potential factors influencing government satisfaction (stfgov) are:

agea - age of respondent, calculated
hinctnta - household’s total net income, all sources
stfhlth - satisfaction with the Serbian health system
edlvdrs - level of education (we have grouped it into three categories of Primary, Secondary, and Higher education, according to the Serbian system of education)
gndr - gender
polintr - interest in politics

The first three variables and the outcome (satisfaction with government) are treated as numeric in this project: age is continuous, while income, satisfaction with the health system and satisfaction with government are quasi-interval and are treated as interval for the sake of the project. There are not many continuous variables in ESS11 that fit our topic, so this was our solution.

Our main reference this time is a study by Tom Christensen & Per Lægreid on the main factors affecting trust in government in Norway. Although we study satisfaction with government, we treat the two notions as interchangeable, since their meanings are close. In the authors’ terms, trust in government is understood as trust in parliament, the cabinet, the civil service, local councils, political parties, and politicians, and it is indicated via people’s satisfaction with specific public services. Their research provides a solid justification for some of our variables, namely: agea (older people generally have more trust in government), stfhlth (those approving of a country’s healthcare tend to approve of the government) and edlvdrs (more highly educated people are more critical of the government, resulting in lower satisfaction with it).

The study mentions that there is no relationship between gender and trust in government, but we want to check whether there is any association in our data without presuming its direction. Interest in politics was added because we were inspired by our previous project. We want to check whether it has any association with our outcome variable, as we did not find papers on this pair of variables. We presume an inverse relationship: the more one is interested in politics, the less satisfied they are.

The second source we use is a dynamic study of family income and satisfaction (both political and general) in China over several years by Zheng Yu et al. Their final conclusion is that average family income had a positive impact on satisfaction with government at the county level. Following their intuition, we include hinctnta in our own model as well.

Overall, our research hypotheses are as follows:

Older people are more satisfied with government than younger people;
Citizens satisfied with healthcare system are more satisfied with governmental system than those who are not;
Higher educated citizens are less satisfied with the governmental system than those with lower educational level;
Citizens with higher household net income are more satisfied with government than those with lower income;
Those more interested in politics have a lower satisfaction with government than those who are less interested.

Variables

The age distribution does not look normal: the left tail is heavier than the right one, that is, there are more very young respondents than very old ones. The most frequent age is 71 (the mode, purple line). The mean and median, shown by the blue and red lines on the plot, are almost identical: 52.5 and 53, respectively.

serbia$agea <- as.numeric(as.character(serbia$agea))

ggplot(serbia, aes(x = agea)) +
  geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
  geom_vline(xintercept = mean(serbia$agea, na.rm = TRUE), col = 'blue', lwd = 1) +
  geom_vline(xintercept = median(serbia$agea, na.rm = TRUE), col = 'red', lwd = 1) +
  geom_vline(xintercept = Mode(serbia$agea, na.rm = TRUE), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Age", y = "Frequency", 
       title = "Age distribution in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey")) +
  xlim(0,100)

median(as.numeric(serbia$agea), na.rm = T)

## [1] 53

mean(as.numeric(serbia$agea), na.rm = T)

## [1] 52.46598

Mode(as.numeric(serbia$agea), na.rm = T)

## [1] 71
## attr(,"freq")
## [1] 49

sd(as.numeric(serbia$agea), na.rm = T)

## [1] 18.10798

tabll_1 <- data.frame(mean = round(mean(as.numeric(serbia1$agea), na.rm = T), 2), median = median(as.numeric(serbia1$agea), na.rm = T), mode = Mode(as.numeric(serbia1$agea), na.rm = T), SD = round(sd(as.numeric(serbia$agea), na.rm = T), 2))
formattable(tabll_1)

mean	median	mode	SD
52.55	53.5	71	18.11

The next three variables are quasi-interval, and we treat them as numeric.

The most frequent answer is 0, complete dissatisfaction with the national government: this is the mode. There is also a prominent bin in the middle of the scale (answer “5” in the questionnaire), probably chosen by undecided respondents. The mean and median are fairly close to each other: 4.4 and 5, respectively.

ggplot(serbia1) +
  geom_histogram(aes(x = as.numeric(stfgov)), na.rm = TRUE, binwidth = 1, color = "black") +
  geom_vline(aes(xintercept = mean(as.numeric(serbia1$stfgov),na.rm = T)), col = 'blue', lwd = 1) +
  geom_vline(aes(xintercept = median(as.numeric(serbia1$stfgov), na.rm = T)), col = 'red', lwd = 1) +
  geom_vline(aes(xintercept = Mode(as.numeric(serbia1$stfgov), na.rm = T)), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Amount of Satisfaction", y = "frequency", 
       title = "Satisfaction with National Government in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_2 <- data.frame(mean = round(mean(as.numeric(serbia1$stfgov), na.rm = T), 2), median = median(as.numeric(serbia1$stfgov), na.rm = T), mode = Mode(as.numeric(serbia1$stfgov), na.rm = T), SD = round(sd(as.numeric(serbia$stfgov), na.rm = T), 2))
formattable(tabll_2)

mean	median	mode	SD
4.44	5	0	3.2

median(as.numeric(serbia$stfgov), na.rm = T)

## [1] 5

mean(as.numeric(serbia$stfgov), na.rm = T)

## [1] 4.424221

Mode(as.numeric(serbia$stfgov), na.rm = T)

## [1] 0
## attr(,"freq")
## [1] 252

sd(as.numeric(serbia$stfgov), na.rm = T)

## [1] 3.201773

As for satisfaction with the health system, the left part of the distribution is considerably heavier, that is, more respondents are rather dissatisfied with the health system in Serbia. However, the most frequent answer is in the middle of the scale (5): as in the previous plot, many respondents are presumably undecided. The mean level of satisfaction is 4.2 and the median is 4.

ggplot(serbia1) +
  geom_histogram(aes(x = as.numeric(stfhlth)), na.rm = TRUE, binwidth = 1, color = "black") +
  geom_vline(aes(xintercept = mean(as.numeric(serbia1$stfhlth), na.rm = T)), col = 'blue', lwd = 1) +
  geom_vline(aes(xintercept = median(as.numeric(serbia1$stfhlth), na.rm = T)), col = 'red', lwd = 1) +
  geom_vline(aes(xintercept = Mode(as.numeric(serbia1$stfhlth), na.rm = T)), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Amount of Satisfaction", y = "frequency", 
       title = "Satisfaction with Health System in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_3 <- data.frame(mean = round(mean(as.numeric(serbia1$stfhlth), na.rm = T), 2), median = median(as.numeric(serbia1$stfhlth), na.rm = T), mode = Mode(as.numeric(serbia1$stfhlth), na.rm = T), SD = round(sd(as.numeric(serbia$stfhlth), na.rm = T), 2))
formattable(tabll_3)

mean	median	mode	SD
4.25	4	5	2.85

median(as.numeric(serbia$stfhlth), na.rm = T)

## [1] 4

mean(as.numeric(serbia$stfhlth), na.rm = T)

## [1] 4.24722

Mode(as.numeric(serbia$stfhlth), na.rm = T)

## [1] 5
## attr(,"freq")
## [1] 241

sd(as.numeric(serbia$stfhlth), na.rm = T)

## [1] 2.851076

As the histogram of net income shows, the most common deciles are the 7th and 6th, which correspond to a moderate income closer to the high end. They are followed in frequency by the 5th and 4th deciles, a moderate income closer to the low end. There are also sizeable numbers of respondents in the 8th, 9th and 10th deciles, which constitute the rich and the richest population of Serbia. The mean net income is almost 6 (5.77), and the median is 6.

ggplot(serbia1) +
  geom_histogram(aes(x = as.numeric(hinctnta)), na.rm = TRUE, binwidth = 1, color = "black") +
  geom_vline(aes(xintercept = mean(as.numeric(serbia1$hinctnta), na.rm = T)), col = 'blue', lwd = 1) +
  geom_vline(aes(xintercept = median(as.numeric(serbia1$hinctnta), na.rm = T)), col = 'red', lwd = 1) +
  geom_vline(aes(xintercept = Mode(as.numeric(serbia1$hinctnta), na.rm = T)), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Income", y = "frequency", 
       title = "Net-Income Distribution in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_4 <- data.frame(mean = round(mean(as.numeric(serbia1$hinctnta), na.rm = T), 2), median = median(as.numeric(serbia1$hinctnta), na.rm = T), mode = Mode(as.numeric(serbia1$hinctnta), na.rm = T), SD = round(sd(as.numeric(serbia$hinctnta), na.rm = T), 2))
formattable(tabll_4)

mean	median	mode	SD
5.77	6	7	2.63

median(as.numeric(serbia$hinctnta), na.rm = T)

## [1] 6

mean(as.numeric(serbia$hinctnta), na.rm = T)

## [1] 5.771272

Mode(as.numeric(serbia$hinctnta), na.rm = T)

## [1] 7
## attr(,"freq")
## [1] 141

sd(as.numeric(serbia$hinctnta), na.rm = T)

## [1] 2.631316

Among the respondents from Serbia there are more females than males, which is quite common in surveys.

ggplot(serbia1) +
  geom_bar(aes(x = gndr), na.rm = TRUE) +
  scale_x_discrete(limits = c('Male', 'Female')) +
  labs(title = "Gender of Respondents from Serbia", 
       x = NULL, y = NULL, 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

Mode(as.factor(serbia1$gndr), na.rm = T)

## [1] Female
## attr(,"freq")
## [1] 824
## Levels: Male Female

Following the Serbian system of education, we constructed three categories out of the initial 18: Primary, Secondary and Higher education. The bar plot of education levels shows that most respondents have secondary education as the last completed stage (the mode, 884 respondents), while primary and higher education are less common.

ggplot(serbia1) +
  geom_bar(aes(x = edlvdrs1), na.rm = TRUE) +
  scale_x_discrete(limits = c('Primary', 'Secondary', 'Higher')) +
  labs(title = "Education Levels of Respondents from Serbia", 
       x = NULL, y = NULL, 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

Mode(as.factor(serbia1$edlvdrs1), na.rm = T)

## [1] Secondary
## attr(,"freq")
## [1] 884
## Levels: Primary Secondary Higher

As the bar plot of interest in politics shows, most respondents are hardly interested in politics or not interested at all. Fewer are quite interested, and fewer still are very interested.

ggplot(serbia1) +
  geom_bar(aes(x = polintr), na.rm = TRUE) +
  labs(x = NULL, y = NULL, 
       title = "Interest in Politics",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

Mode(as.factor(serbia1$polintr), na.rm = T)

## [1] Hardly
## attr(,"freq")
## [1] 616
## Levels: Not at all Hardly Quite Very

Correlation & Boxplot

Correlations

We constructed a correlation table to check for possible collinearity and to decide which numeric variables to include in our model. We use Spearman’s correlation because our data are not normally distributed, so a parametric correlation would be inappropriate.

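num_vars <- c("agea", "hinctnta", "stfgov", "stfhlth") # columns 3, 4, 5 and 7 of serbia1
corr <- cor(serbia1[, num_vars], method = "spearman", use = "complete.obs") # Compute a correlation matrix
p.mat <- cor_pmat(serbia1[, num_vars], method = "spearman") # p-values from the same (Spearman) method as corr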
ggcorrplot(corr, hc.order = TRUE, type = "lower", p.mat = p.mat,
           insig = "blank",
           lab = TRUE,
   ggtheme = ggplot2::theme_gray,
   colors = c("#6D9EC1", "white", "#E46726"))

We can see that both Satisfaction with health system and Age have a moderate positive correlation with our outcome variable, Satisfaction with government.

A correlation with Household income is also present, but it is a weak one: < 0.3. Moreover, there is a moderate correlation between Household income and Age, which can probably be explained by capital accumulation. This possible collinearity and the weak correlation with the outcome variable led to our decision not to include Household income as a factor in our models.
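For readers less familiar with Spearman's coefficient: it is simply Pearson's correlation computed on the ranks of the data, which is why it tolerates non-normal and ordinal scales. A small sketch on simulated 0-10 ratings (the vectors `health` and `gov` are hypothetical stand-ins, not our ESS variables):

```r
set.seed(4)
# Simulated 0-10 ratings standing in for stfhlth and stfgov (not ESS data)
health <- sample(0:10, 200, replace = TRUE)
gov    <- pmin(10, pmax(0, health + sample(-4:4, 200, replace = TRUE)))

rho_spearman <- cor(health, gov, method = "spearman")
rho_on_ranks <- cor(rank(health), rank(gov))  # Pearson on average ranks

all.equal(rho_spearman, rho_on_ranks)  # both routes give the same coefficient
```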

Now we can visualise the correlations.

ggpairs(serbia1[, c(5, 3)],
        columnLabels = c("Government Satisfaction", "Age"),
        upper = list(continuous = wrap("cor", size = 8, col = "black")))

ggplot(serbia1, aes(agea, stfgov)) +
  geom_jitter() +
  geom_smooth() +
  labs(x = "Age", y = "Government Satisfaction", 
       title = "Government Satisfaction and Age", 
       caption = "Source: ESS 11 Round")

ggpairs(serbia1[, c(5, 7)],
        columnLabels = c("Government Satisfaction", "Health Service Satisfaction"),
        upper = list(continuous = wrap("cor", size = 8, col = "black")))

serbia1_counts <- serbia1 %>%
  count(stfhlth, stfgov)

ggscatter(serbia1_counts, x = "stfhlth", y = "stfgov", 
          size = "n", 
          add = "reg.line",
          cor.coef = TRUE, 
          cor.method = "spearman",
          ylim = c(0, 10),
          alpha = 0.6,
          xlab = "Health system satisfaction", 
          ylab = "Government Satisfaction", 
          title = "Government Satisfaction and Health system satisfaction", 
          caption = "Source: ESS 11 Round") +
  scale_size_continuous(range = c(1, 10)) + 
  theme_minimal()

ggpairs(serbia1[, c(5, 4)],
        columnLabels = c("Government Satisfaction", "Household income"),
        upper = list(continuous = wrap("cor", size = 8, col = "black")))

ggplot(serbia1, aes(hinctnta, stfgov)) +
  geom_jitter(alpha = 0.6) +
  geom_smooth() + 
  labs(x = "Household income", y = "Government Satisfaction", 
       title = "Government Satisfaction and Household income", 
       caption = "Source: ESS 11 Round")

Boxplots

Now we can look at the categorical variables and their connection with Satisfaction with government. We have chosen Level of education (which we grouped into three categories: Primary, Secondary, and Higher), Gender, and Political interest.

boxplot(serbia1$stfgov ~ serbia1$edlvdrs1, main = "Satisfaction with government by level of education")

boxplot(serbia1$stfgov ~ serbia1$gndr, main = "Satisfaction with government by gender")

boxplot(serbia1$stfgov ~ serbia1$polintr, main = "Satisfaction with government by interest in politics")

At first glance, the plots show a noticeable difference in satisfaction with government between people with different levels of education. The medians of the three boxes clearly differ: the third quartile of the “Secondary” box is at the same level as the “Primary” median, and the “Secondary” median is at the same level as the third quartile of the “Higher” box.

The plot for Gender does not suggest any difference. Both the medians and the interquartile ranges are the same, hinting that the two groups do not differ in terms of satisfaction with government. This could serve as a solid reason for excluding the variable from the model.

The plot of interest in politics and government satisfaction reveals that the four interest levels have approximately the same median satisfaction, but their interquartile ranges differ. The “Very” (interested) level stands out with an IQR noticeably wider than the others. This suggests that people deeply interested in politics hold more diverse attitudes towards the government than those less immersed in this sphere.

Despite some of the conclusions we drew from the plots, we decided to include all of these variables in the models to explore their potential explanatory power.

Regression

We decided to include the following continuous variables as predictors in our models: Age (agea) and Satisfaction with Health Services (stfhlth), because they had moderate correlations with our outcome variable, Satisfaction with Government (stfgov). We also decided to include the categorical variables Gender (gndr), Level of Education (edlvdrs1) and Political Interest (polintr).

Model 1

serbia1 <- serbia1[!is.na(serbia1$polintr), ]
m1 <- lm(stfgov ~ agea + stfhlth, data = serbia1)
summary(m1)

## 
## Call:
## lm(formula = stfgov ~ agea + stfhlth, data = serbia1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.9527 -2.0166 -0.0729  1.9193  8.9768 
## 
## Coefficients:
##              Estimate Std. Error t value Pr(>|t|)    
## (Intercept) -0.168819   0.241065   -0.70    0.484    
## agea         0.049667   0.004111   12.08   <2e-16 ***
## stfhlth      0.468017   0.025675   18.23   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.717 on 1402 degrees of freedom
##   (144 observations deleted due to missingness)
## Multiple R-squared:  0.2802, Adjusted R-squared:  0.2792 
## F-statistic: 272.9 on 2 and 1402 DF,  p-value: < 2.2e-16

The first model has two predictors, age and satisfaction with health services. It explains about 28% of the variance in government satisfaction (R^2 = 0.2802), and it is statistically significant, because the p-value is below the 0.05 threshold. The higher the age, the higher the satisfaction with government: it rises by 0.05 points per year of age. The same holds for health service satisfaction: government satisfaction rises by 0.47 points per 1 point of the scale.
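To see what these coefficients mean in practice, we can plug values into the fitted equation by hand. The respondent profiles below (ages 50 and 60, health satisfaction of 5) are hypothetical and chosen only for illustration; the coefficients are copied from the output above:

```r
# Coefficients copied from the summary(m1) output above
b0        <- -0.168819
b_agea    <-  0.049667
b_stfhlth <-  0.468017

# Hypothetical respondent: 50 years old, health-system satisfaction of 5
pred_50 <- b0 + b_agea * 50 + b_stfhlth * 5

# Same respondent ten years older: the prediction shifts by 10 * b_agea
pred_60 <- b0 + b_agea * 60 + b_stfhlth * 5

c(pred_50 = pred_50, pred_60 = pred_60, difference = pred_60 - pred_50)
```

With the fitted object available, `predict(m1, newdata = ...)` would give the same numbers without copying coefficients.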

Model 2

m2 <- lm(stfgov ~ agea + edlvdrs1 + stfhlth, data = serbia1)
summary(m2)

## 
## Call:
## lm(formula = stfgov ~ agea + edlvdrs1 + stfhlth, data = serbia1)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -6.620 -1.905 -0.026  1.856  8.376 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.319923   0.332468   3.970 7.55e-05 ***
## agea               0.043443   0.004127  10.526  < 2e-16 ***
## edlvdrs1Secondary -0.955330   0.212611  -4.493 7.59e-06 ***
## edlvdrs1Higher    -1.769324   0.238851  -7.408 2.21e-13 ***
## stfhlth            0.439818   0.025467  17.270  < 2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.666 on 1400 degrees of freedom
##   (144 observations deleted due to missingness)
## Multiple R-squared:  0.3083, Adjusted R-squared:  0.3063 
## F-statistic:   156 on 4 and 1400 DF,  p-value: < 2.2e-16

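The second model adds the level of education, which is also a significant predictor: compared with respondents with primary education, government satisfaction is 0.96 points lower for those with secondary education and 1.77 points lower for those with higher education, that is, it decreases by roughly 1 point per level (Primary, Secondary, Higher). R^2 also increases and is now 0.3083.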
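The reason the output shows two education coefficients rather than one is dummy coding: `lm()` turns a three-level factor into two indicator columns and treats the first level as the reference. A toy illustration (the factor `edu` is made up, but uses the same level order as edlvdrs1):

```r
# Toy factor with the same level order as edlvdrs1 (values are made up)
edu <- factor(c("Primary", "Secondary", "Higher", "Secondary"),
              levels = c("Primary", "Secondary", "Higher"))

# Design matrix that lm() would build: intercept plus one dummy per non-reference level
X <- model.matrix(~ edu)
colnames(X)
X
```

A respondent with primary education has zeros in both dummy columns, so each education coefficient is a difference from the Primary group.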

Model 3

m3 <- lm(stfgov ~ agea + stfhlth + edlvdrs1 + gndr, data = serbia1)
summary(m3)

## 
## Call:
## lm(formula = stfgov ~ agea + stfhlth + edlvdrs1 + gndr, data = serbia1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.7350 -1.9320  0.0043  1.8503  8.2855 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        1.167205   0.350008   3.335 0.000876 ***
## agea               0.043490   0.004126  10.541  < 2e-16 ***
## stfhlth            0.442726   0.025545  17.332  < 2e-16 ***
## edlvdrs1Secondary -0.914347   0.214571  -4.261 2.17e-05 ***
## edlvdrs1Higher    -1.734965   0.240044  -7.228 8.05e-13 ***
## gndrFemale         0.200466   0.144058   1.392 0.164275    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.665 on 1399 degrees of freedom
##   (144 observations deleted due to missingness)
## Multiple R-squared:  0.3093, Adjusted R-squared:  0.3068 
## F-statistic: 125.3 on 5 and 1399 DF,  p-value: < 2.2e-16

In the third model we added gender as a predictor, but as we can see it is not significant, which was previously suggested by the boxplot.

Model 4

m4 <- lm(stfgov ~ agea + edlvdrs1 + stfhlth + gndr + polintr, data = serbia1)
summary(m4)

## 
## Call:
## lm(formula = stfgov ~ agea + edlvdrs1 + stfhlth + gndr + polintr, 
##     data = serbia1)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.5717 -2.0072 -0.0075  1.8321  8.4989 
## 
## Coefficients:
##                    Estimate Std. Error t value Pr(>|t|)    
## (Intercept)        0.987762   0.356571   2.770  0.00568 ** 
## agea               0.041809   0.004189   9.980  < 2e-16 ***
## edlvdrs1Secondary -0.964273   0.214903  -4.487 7.82e-06 ***
## edlvdrs1Higher    -1.827657   0.242736  -7.529 9.11e-14 ***
## stfhlth            0.438764   0.025536  17.182  < 2e-16 ***
## gndrFemale         0.265115   0.146021   1.816  0.06965 .  
## polintrHardly      0.424833   0.168628   2.519  0.01187 *  
## polintrQuite       0.525443   0.215768   2.435  0.01501 *  
## polintrVery        0.417138   0.280969   1.485  0.13786    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.659 on 1396 degrees of freedom
##   (144 observations deleted due to missingness)
## Multiple R-squared:  0.3135, Adjusted R-squared:  0.3095 
## F-statistic: 79.68 on 8 and 1396 DF,  p-value: < 2.2e-16

In the fourth model we added Political Interest, but it is only weakly significant: the “Hardly” and “Quite” levels are significant at the 0.05 level, while “Very” is not. The connection is not linear: people who are quite interested in politics are more satisfied than people who are very interested. The only strongly significant variables are age, satisfaction with health services and level of education. Their coefficients have changed only slightly compared with the previous model. R^2 increases to 0.3135, which means that this model explains about one-third of the variance in government satisfaction.

Comparison of models

tab_model(m1, m2, m3, m4, show.ci = F)

	stfgov		stfgov		stfgov		stfgov
Predictors	Estimates	p	Estimates	p	Estimates	p	Estimates	p
(Intercept)	-0.17	0.484	1.32	<0.001	1.17	0.001	0.99	0.006
agea	0.05	<0.001	0.04	<0.001	0.04	<0.001	0.04	<0.001
stfhlth	0.47	<0.001	0.44	<0.001	0.44	<0.001	0.44	<0.001
edlvdrs1 [Secondary]			-0.96	<0.001	-0.91	<0.001	-0.96	<0.001
edlvdrs1 [Higher]			-1.77	<0.001	-1.73	<0.001	-1.83	<0.001
gndr [Female]					0.20	0.164	0.27	0.070
polintr [Hardly]							0.42	0.012
polintr [Quite]							0.53	0.015
polintr [Very]							0.42	0.138
Observations	1405		1405		1405		1405
R² / R² adjusted	0.280 / 0.279		0.308 / 0.306		0.309 / 0.307		0.313 / 0.310

We can see that R^2 and adjusted R^2 increase as new variables are added, which means that the explanatory power of the model grows too. However, this effect could be caused simply by the growing number of variables, so to find the best model we need to compare the models with ANOVA.
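Adjusted R^2 already applies a penalty for the number of predictors. The helper below (`adj_r2` is our own name, not a library function) recomputes it from the R^2 values in the summaries; tiny differences from the table come from rounding of the inputs:

```r
# Adjusted R^2 penalises R^2 for the number of estimated slopes p
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 1405  # observations used in every model (from the comparison table)

# R^2 values and numbers of slope coefficients read from the summaries above
round(c(m1 = adj_r2(0.2802, n, 2),
        m2 = adj_r2(0.3083, n, 4),
        m3 = adj_r2(0.3093, n, 5),
        m4 = adj_r2(0.3135, n, 8)), 3)
```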

plot_models(m1, m2, m3, m4) + scale_y_continuous(limits = c(-1.9, 1.9))

The plot above visualises the estimates across the different models.

Now we can find out which model is better. The number of observations must be equal across the models, which is why we deleted the NAs in the Political Interest variable (polintr): because of them, the number of observations in the last model differed from that in the other models by 3.

anova(m1, m2, m3, m4)

## Analysis of Variance Table
## 
## Model 1: stfgov ~ agea + stfhlth
## Model 2: stfgov ~ agea + edlvdrs1 + stfhlth
## Model 3: stfgov ~ agea + stfhlth + edlvdrs1 + gndr
## Model 4: stfgov ~ agea + edlvdrs1 + stfhlth + gndr + polintr
##   Res.Df     RSS Df Sum of Sq       F    Pr(>F)    
## 1   1402 10351.5                                   
## 2   1400  9947.7  2    403.82 28.5470 7.063e-13 ***
## 3   1399  9933.9  1     13.75  1.9441   0.16345    
## 4   1396  9873.6  3     60.27  2.8403   0.03677 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here we can see that Model 4 has the smallest RSS, and the p-value shows that it fits significantly better than Model 3 (p = 0.037). We therefore consider our fourth model the best and analyse it further.
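The last row of the ANOVA table can be reproduced by hand from the residual sums of squares, which makes it clear what the comparison tests: whether the drop in RSS is large relative to the number of parameters added. The values are copied from the table above, and the variable names are ours:

```r
# Residual sums of squares and residual df copied from the anova() table above
rss3 <- 9933.9; df3 <- 1399  # Model 3
rss4 <- 9873.6; df4 <- 1396  # Model 4

# F = (drop in RSS per added parameter) / (residual variance of the larger model)
f_34 <- ((rss3 - rss4) / (df3 - df4)) / (rss4 / df4)
p_34 <- pf(f_34, df3 - df4, df4, lower.tail = FALSE)

round(c(F = f_34, p = p_34), 3)
```

Each row of the table compares a model only with the one directly before it, so this p-value concerns Model 4 against Model 3.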

Model analysis

Now when we’ve chosen the model to continue to work with, it’d be useful to describe it in detail.

The fourth model explains 31.4% of the variance in government satisfaction, and the intercept is 0.988.

Regarding the predictors: age, education and healthcare satisfaction are clearly significant with p < 0.001, while political interest has mixed effects, with only two of its levels (“Quite” and “Hardly”) being statistically significant.

Let’s look through the coefficients of the significant estimates to get a deeper understanding of the model. If a variable has several levels that are not homogeneous in terms of statistical significance, we include only those levels that are significant according to the model.

Age (agea): +0.042
– Each additional year of age increases satisfaction by 0.042 points.

Education (edlvdrs1): Secondary -0.964, Higher -1.828
– People with secondary education are 0.96 points less satisfied than those with primary education.
– People with higher education are 1.83 points less satisfied than those with primary education.

Healthcare Satisfaction (stfhlth): +0.439
– A 1-point increase in healthcare satisfaction is associated with a 0.44-point increase in government satisfaction.

Gender (gndrFemale): +0.265
– Women are 0.27 points more satisfied than men, but the difference is only borderline significant (p = 0.07).

Political Interest (polintr): “Hardly” interested +0.425, “Quite” interested +0.525
– People who are “Hardly” politically interested are 0.42 points more satisfied than those “Not at all” interested (the reference group).
– People who are “Quite” politically interested are 0.53 points more satisfied than the reference group.

Visualisation

plot_model(m4, type = "pred")

## $agea

## 
## $edlvdrs1

## 
## $stfhlth

## 
## $gndr

## 
## $polintr

We decided to graph our model to observe the effects of the variables visually. These plots reveal some useful tendencies:

– The higher the age, the higher the satisfaction with government;
– The higher the education, the less one is satisfied with government;
– The higher the satisfaction with the healthcare system, the higher the satisfaction with government;
– Females are generally more satisfied with government than males, though the difference is not significant;
– Those who are not at all interested in politics are less satisfied with government than those with higher levels of interest. The 3 other levels are positioned at approximately the same level of satisfaction, but those “quite” interested in politics are generally the most satisfied.

Regression equation

\[ \text{stfgov} = 0.99 + 0.04 \cdot \text{agea} + 0.44 \cdot \text{stfhlth} - 0.96 \cdot \text{edlvdrs1[Secondary]} - 1.83 \cdot \text{edlvdrs1[Higher]} + 0.42 \cdot \text{polintr[Hardly]} + 0.53 \cdot \text{polintr[Quite]} \]

Regression table

tab_model(m4, show.ci = F)

	stfgov
Predictors	Estimates	p
(Intercept)	0.99	0.006
agea	0.04	<0.001
edlvdrs1 [Secondary]	-0.96	<0.001
edlvdrs1 [Higher]	-1.83	<0.001
stfhlth	0.44	<0.001
gndr [Female]	0.27	0.070
polintr [Hardly]	0.42	0.012
polintr [Quite]	0.53	0.015
polintr [Very]	0.42	0.138
Observations	1405
R² / R² adjusted	0.313 / 0.310

Our model explains about a third of the variance in the outcome (R² = 0.3135) and links satisfaction with government (stfgov) to four significant predictors: age (agea), level of education (edlvdrs1), satisfaction with health services (stfhlth) and interest in politics (polintr, levels [Hardly] and [Quite]). Except for gndr [Female] and polintr [Very], all p-values are below the 0.05 threshold.
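
The adjusted R² in the table can be reproduced by hand. The sketch below is a rough check, assuming R² = 0.3135 as quoted in the text, n = 1405 observations, and p = 8 estimated slopes (the eight non-intercept rows of the table); the variable names are illustrative.

```r
r2 <- 0.3135   # R-squared as quoted in the text
n  <- 1405     # observations from the regression table
p  <- 8        # slopes in the table (intercept excluded)

# Adjusted R-squared penalises R-squared for the number of predictors
adj_r2 <- 1 - (1 - r2) * (n - 1) / (n - p - 1)
round(adj_r2, 3)
```

With so many observations relative to the number of predictors, the penalty is small, which is why the two values in the table are almost identical.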

Level of education has the largest coefficients in absolute value: edlvdrs1 [Secondary] -0.96 and edlvdrs1 [Higher] -1.83. Because both estimates are negative, this confirms once more that people with higher levels of education tend to be less satisfied with government than less educated people. Note that these are unstandardised coefficients, so their size is not directly comparable with that of predictors measured on other scales, such as age.

Conclusion

We had the following research question:

Which factors can explain one’s satisfaction with the government in Serbia?

According to our regression models, the significant factors of influence are: age, level of education, satisfaction with the health services in the country and interest in politics.

We also had five hypotheses:
1) Older people are more satisfied with government than younger people;
2) Citizens satisfied with the healthcare system are more satisfied with the government than those who are not;
3) Citizens with higher education are less satisfied with the government than those with a lower educational level;
4) Citizens with higher household net income are more satisfied with the government than those with lower income;
5) Those more interested in politics are less satisfied with government than those who are less interested.

Hypotheses 1, 2, and 3 were supported by our findings. Hypothesis 4 was not confirmed, as household income does not appear among the predictors of the final model. Hypothesis 5 was not supported: respondents who are “hardly” or “quite” interested in politics are more satisfied than those who are not at all interested.

References:

Christensen, T., & Lægreid, P. (2005). Trust in Government: The Relative Importance of Service Satisfaction, Political Factors, and Demography. Public Performance & Management Review, 28(4), 487–511. http://www.jstor.org/stable/3381308

Yu, Z., Bo, W., & Shu, L. (2011). The dynamic relationship between satisfaction with local government, family income, and life satisfaction in China: A 6-year perspective. 2011 International Conference on Management Science & Engineering 18th Annual Conference Proceedings, Rome, Italy, 1207–1214.

Project 4

library(foreign)
# Read the ESS round 11 file; variables with value labels are imported as factors
ESS <- read.spss("F:/ARRR/ESS11.sav", use.value.labels = TRUE, to.data.frame = TRUE)
library(ggplot2)
library(dplyr)
library(DescTools)
library(formattable)
library(psych)
library(ggpubr)
library(car)
library(sjstats)
library(magrittr)
library(knitr)
library(kableExtra)
library(ggstatsplot)
library(corrplot)
library(sjPlot)
library(rstatix)
library(ggcorrplot)
library(GGally)

serbia <- ESS %>%
  filter(cntry == "Serbia")

serbia1 <- select(serbia, gndr, agea, stfgov, stfhlth, edlvdrs, rlgdgr, lrscale, trstprl)

serbia1$agea <- as.numeric(serbia1$agea)
serbia1$stfgov <- as.numeric(serbia1$stfgov)
serbia1$edlvdrs <- as.numeric(serbia1$edlvdrs)
serbia1$rlgdgr <- as.numeric(serbia1$rlgdgr)
serbia1$stfhlth <- as.numeric(serbia1$stfhlth)
serbia1$lrscale <- as.numeric(serbia1$lrscale)
serbia1$trstprl <- as.numeric(serbia1$trstprl)
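
One caveat about this conversion, under the assumption that `read.spss(..., use.value.labels = TRUE)` imported some of these columns as factors: `as.numeric()` applied to a factor returns the internal level codes, not the values printed as labels. The toy vector below (not taken from the ESS data) illustrates the difference; whether it matters for a given column depends on how that column was imported, which can be checked with `str(serbia)`.

```r
# Toy factor that mimics an age column imported with value labels
age_f <- factor(c("15", "40", "57", "40"))

codes  <- as.numeric(age_f)                # level codes: 1 2 3 2
values <- as.numeric(as.character(age_f))  # original values: 15 40 57 40

data.frame(label = age_f, codes = codes, values = values)
```

If a column turns out to be a factor whose levels are plain numbers, `as.numeric(as.character(x))` recovers the values; for scales with labelled end points the codes are shifted relative to the questionnaire values, which should be kept in mind when reading the descriptive statistics.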

Introduction

For this project we will construct a regression model with satisfaction with government as the outcome variable and the following predictors:

age,
level of education,
satisfaction with the health system,
religiosity.

We will also include gender and its interaction with religiosity, as the existing literature suggests a connection between these factors and satisfaction with government.

Variables

Age

ggplot(serbia1) +
  geom_histogram(aes(x = agea), na.rm = TRUE) +
  # vertical lines: mean (blue), median (red), mode (purple)
  geom_vline(aes(xintercept = mean(agea, na.rm = TRUE)), col = 'blue', lwd = 1) +
  geom_vline(aes(xintercept = median(agea, na.rm = TRUE)), col = 'red', lwd = 1) +
  geom_vline(aes(xintercept = Mode(agea, na.rm = TRUE)), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Age", y = "Frequency", 
       title = "Age Distribution in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_1 <- data.frame(
  mean   = round(mean(serbia1$agea, na.rm = TRUE), 2),
  median = median(serbia1$agea, na.rm = TRUE),
  mode   = Mode(serbia1$agea, na.rm = TRUE),
  sd     = round(sd(serbia1$agea, na.rm = TRUE), 2))
formattable(tabll_1)

mean	median	mode	sd
38.47	39	57	18.11

The median age of respondents in the Serbian sample is 39, while the most frequent age is 57.
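
The same four statistics are computed for every numeric variable in this section. As a possible way to avoid repeating the code, the sketch below wraps them in a small helper. The name `describe_num` is hypothetical, and the mode is computed with base R rather than `DescTools::Mode`, so ties are resolved by taking the first most frequent value.

```r
# Hypothetical helper: mean, median, mode and sd of a numeric vector
describe_num <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  # most frequent value; the first one is taken if there are ties
  mode_value <- as.numeric(names(counts)[which.max(counts)])
  data.frame(mean   = round(mean(x), 2),
             median = median(x),
             mode   = mode_value,
             sd     = round(sd(x), 2))
}

# Toy input, not ESS data
res <- describe_num(c(1, 2, 2, 3, 10, NA))
res
```

In the report this could be called as `describe_num(serbia1$agea)` and passed to `formattable()` in the same way as the tables in this section.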

Satisfaction with Government

ggplot(serbia1) +
  geom_histogram(aes(x = stfgov), na.rm = TRUE, binwidth = 1, color = "black") +
  # vertical lines: mean (blue), median (red), mode (purple)
  geom_vline(aes(xintercept = mean(stfgov, na.rm = TRUE)), col = 'blue', lwd = 1) +
  geom_vline(aes(xintercept = median(stfgov, na.rm = TRUE)), col = 'red', lwd = 1) +
  geom_vline(aes(xintercept = Mode(stfgov, na.rm = TRUE)), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Amount of Satisfaction", y = "Frequency", 
       title = "Satisfaction with National Government in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_2 <- data.frame(
  mean   = round(mean(serbia1$stfgov, na.rm = TRUE), 2),
  median = median(serbia1$stfgov, na.rm = TRUE),
  mode   = Mode(serbia1$stfgov, na.rm = TRUE),
  sd     = round(sd(serbia1$stfgov, na.rm = TRUE), 2))
formattable(tabll_2)

mean	median	mode	sd
5.42	6	1	3.2

Satisfaction with government in Serbia is moderate: the median is 6. Interestingly, the mode is 1, which suggests that a large group of respondents is strongly dissatisfied with the government.

Satisfaction with health system

ggplot(serbia1) +
  geom_histogram(aes(x = stfhlth), na.rm = TRUE, binwidth = 1, color = "black") +
  # vertical lines: mean (blue), median (red), mode (purple)
  geom_vline(aes(xintercept = mean(stfhlth, na.rm = TRUE)), col = 'blue', lwd = 1) +
  geom_vline(aes(xintercept = median(stfhlth, na.rm = TRUE)), col = 'red', lwd = 1) +
  geom_vline(aes(xintercept = Mode(stfhlth, na.rm = TRUE)), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Amount of Satisfaction", y = "Frequency", 
       title = "Satisfaction with Health System in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_3 <- data.frame(
  mean   = round(mean(serbia1$stfhlth, na.rm = TRUE), 2),
  median = median(serbia1$stfhlth, na.rm = TRUE),
  mode   = Mode(serbia1$stfhlth, na.rm = TRUE),
  sd     = round(sd(serbia1$stfhlth, na.rm = TRUE), 2))
formattable(tabll_3)

mean	median	mode	sd
5.25	5	6	2.85

People are not very satisfied with the health system: the median (5) is below the midpoint of the scale (6).

Religiosity

ggplot(serbia1) +
  geom_histogram(aes(x = rlgdgr), na.rm = TRUE, binwidth = 1, color = "black") +
  # vertical lines: mean (blue), median (red), mode (purple)
  geom_vline(aes(xintercept = mean(rlgdgr, na.rm = TRUE)), col = 'blue', lwd = 1) +
  geom_vline(aes(xintercept = median(rlgdgr, na.rm = TRUE)), col = 'red', lwd = 1) +
  geom_vline(aes(xintercept = Mode(rlgdgr, na.rm = TRUE)), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Religiosity", y = "Frequency", 
       title = "Religiosity of people in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_5 <- data.frame(
  mean   = round(mean(serbia1$rlgdgr, na.rm = TRUE), 2),
  median = median(serbia1$rlgdgr, na.rm = TRUE),
  mode   = Mode(serbia1$rlgdgr, na.rm = TRUE),
  sd     = round(sd(serbia1$rlgdgr, na.rm = TRUE), 2))
formattable(tabll_5)

mean	median	mode	sd
7.23	8	6	2.85

The median respondent is rather religious (median 8), but there is also a peak at the lower end of the scale, formed by non-religious respondents.

Trust in parliament

ggplot(serbia1) +
  geom_histogram(aes(x = trstprl), na.rm = TRUE, binwidth = 1, color = "black") +
  # vertical lines: mean (blue), median (red), mode (purple)
  geom_vline(aes(xintercept = mean(trstprl, na.rm = TRUE)), col = 'blue', lwd = 1) +
  geom_vline(aes(xintercept = median(trstprl, na.rm = TRUE)), col = 'red', lwd = 1) +
  geom_vline(aes(xintercept = Mode(trstprl, na.rm = TRUE)), col = 'purple', lwd = 1) +
  theme_bw() +
  labs(x = "Trust in parliament", y = "Frequency", 
       title = "Trust in Parliament in Serbia", 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_6 <- data.frame(
  mean   = round(mean(serbia1$trstprl, na.rm = TRUE), 2),
  median = median(serbia1$trstprl, na.rm = TRUE),
  mode   = Mode(serbia1$trstprl, na.rm = TRUE),
  sd     = round(sd(serbia1$trstprl, na.rm = TRUE), 2))
formattable(tabll_6)

mean	median	mode	sd
5.2	6	1	3.1

Most respondents are close to the middle of the scale, neither clearly trusting nor distrusting parliament (median 6). There is also a peak at the lowest value; we suggest that these are people who dislike the current Serbian government and support the opposition.

Gender

ggplot(serbia1) +
  # na.rm is an argument of geom_bar(), not an aesthetic
  geom_bar(aes(x = gndr), na.rm = TRUE) +
  scale_x_discrete(limits = c('Male', 'Female')) +
  labs(title = "Gender of Respondents from Serbia", 
       x = NULL, y = NULL, 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tab1 <- data.frame(Mode = Mode(serbia1$gndr, na.rm = TRUE))
formattable(tab1)

Mode
Female

There are more female than male respondents in the Serbian sample.

Education levels

# Collapse the detailed education codes into three levels;
# missing codes stay NA instead of falling through to "Higher"
serbia1$edlvdrs1 <- factor(
  ifelse(is.na(serbia1$edlvdrs), NA_character_,
         ifelse(serbia1$edlvdrs %in% 1:3, "Primary",
                ifelse(serbia1$edlvdrs %in% 4:10, "Secondary", "Higher"))),
  levels = c("Primary", "Secondary", "Higher"))

ggplot(serbia1) +
  geom_bar(aes(x = edlvdrs1), na.rm = TRUE) +
  scale_x_discrete(limits = c('Primary', 'Secondary', 'Higher')) +
  labs(title = "Education Levels of Respondents from Serbia", 
       x = NULL, y = NULL, 
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tab2 <- data.frame(Mode = Mode(serbia1$edlvdrs1, na.rm = TRUE))
formattable(tab2)

Mode
Secondary

As we can see, more than half of the respondents have secondary or higher education.
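
The three-level recoding can also be written with `cut()`. The sketch below is one possible alternative on a toy vector, assuming the same cut points as in the code above (codes 1 to 3 primary, 4 to 10 secondary) and an upper bound of 18 for higher education; that bound is an assumption taken from the `edlvdrs < 19` filter used later in this project. With this approach, missing and out-of-range codes become NA rather than being assigned to a level.

```r
# Toy education codes, including a missing and an out-of-range value
codes <- c(1, 3, 4, 10, 11, 18, NA, 55)

# Intervals are (0, 3], (3, 10], (10, 18]; anything else becomes NA
edu <- cut(codes, breaks = c(0, 3, 10, 18),
           labels = c("Primary", "Secondary", "Higher"))

table(edu, useNA = "ifany")
```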

Political orientation on left-right scale

ggplot(serbia1) +
  # na.rm is an argument of geom_bar(), not an aesthetic
  geom_bar(aes(x = lrscale), na.rm = TRUE) +
  scale_x_discrete(limits = as.character(1:11)) +
  labs(x = "1 - Far Left, 11 - Far Right", y = NULL, 
       title = "Political Orientation Scale",
       caption = "Source: ESS 11 Round") + 
  theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5), 
        panel.grid.minor = element_line(color = "grey"))

tabll_4 <- data.frame(
  mean   = round(mean(serbia1$lrscale, na.rm = TRUE), 2),
  median = median(serbia1$lrscale, na.rm = TRUE),
  mode   = Mode(serbia1$lrscale, na.rm = TRUE),
  sd     = round(sd(serbia1$lrscale, na.rm = TRUE), 2))
formattable(tabll_4)

mean	median	mode	sd
5.66	6	6	2.56

Most respondents are centrists who prefer not to choose between left and right. There are also some radicals at both the left and the right end of the scale.

Interaction model

For our interaction model we want to test the relationship between satisfaction with government and religiosity, moderated by gender.

Chloe Vaughn (2022) states that political trust is correlated with religious involvement: the more a person is inclined to participate in religious activities, the higher their level of political trust.

Instead of political trust, we use satisfaction with government as the dependent variable in our model. Intuitively the two concepts are close, but for the sake of accuracy let us check whether they are sufficiently correlated.

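# Keep respondents with a valid education code (codes 19 and above are dropped)
serbia1 <- serbia1 %>% 
  filter(edlvdrs < 19)
# Spearman correlations for the numeric columns (agea to trstprl)
corr <- cor(serbia1[, 2:8], method = "spearman", use = "complete.obs") 
# p-values from the same method as the coefficients (the default is Pearson)
p.mat <- ggcorrplot::cor_pmat(serbia1[, 2:8], method = "spearman", exact = FALSE)
ggcorrplot(corr, hc.order = TRUE, type = "lower", p.mat = p.mat,
           insig = "blank",
           lab = TRUE,
           ggtheme = ggplot2::theme_gray,
           colors = c("#6D9EC1", "white", "#E46726"))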

In the ESS dataset the closest variable to political trust is trust in parliament (trstprl). As the correlation plot above shows, the correlation coefficient between trust in parliament and satisfaction with government is 0.75, which is very high. The two variables therefore capture closely related attitudes, and we can apply the above-mentioned theory to satisfaction with government.
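
For readers unfamiliar with the Spearman coefficient used in the correlation matrix: it is the Pearson correlation computed on ranks, which makes it suitable for ordinal survey scales. The toy vectors below are invented for illustration and are not ESS data.

```r
# Toy data with an extreme value in each vector
x <- c(10, 20, 30, 40, 500)
y <- c(3, 1, 8, 5, 90)

rho_spearman <- cor(x, y, method = "spearman")
rho_on_ranks <- cor(rank(x), rank(y))   # Pearson correlation of the ranks

c(spearman = rho_spearman, pearson_on_ranks = rho_on_ranks)
```

Both calls give the same value, and the extreme observations do not dominate the result because only their rank order enters the calculation.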

The study by Julia Zinkina, Marina Butovskaya, Sergey Shulgin and Andrey Korotayev (2024) provides evidence for a link between religiosity and gender. Using data from the World Values Survey, the researchers find that women attach a higher value to religiosity than men in almost all regions (except the Middle East and North Africa).

Therefore, our hypothesis, based on previous findings, is that religiosity correlates positively with satisfaction with government and that the effect is stronger for females, as they are in general more religious.

serbia1 <- serbia1 %>% 
  filter(edlvdrs < 19)

# Three-level education factor with "Primary" as the reference category
serbia1$edlvdrs1 <- factor(
  ifelse(serbia1$edlvdrs %in% 1:3, "Primary",
         ifelse(serbia1$edlvdrs %in% 4:10, "Secondary", "Higher")),
  levels = c("Primary", "Secondary", "Higher"))

# Collapse the 11-point left-right scale into three groups
serbia1$lrscale <- as.factor(serbia1$lrscale)
levels(serbia1$lrscale) <- c("Left", "Left", "Left", "Left", "Central", "Central", "Central", "Right", "Right", "Right", "Right")

# m1: main effects only; m2: adds the religiosity-by-gender interaction
m1 <- lm(stfgov ~ agea + edlvdrs1 + stfhlth + rlgdgr + gndr, data = serbia1)
m2 <- lm(stfgov ~ agea + edlvdrs1 + stfhlth + rlgdgr * gndr, data = serbia1)
tab_model(m1, m2, show.ci = FALSE)

	stfgov		stfgov
Predictors	Estimates	p	Estimates	p
(Intercept)	1.10	0.003	1.46	<0.001
agea	0.04	<0.001	0.04	<0.001
edlvdrs1 [Secondary]	-0.83	<0.001	-0.81	<0.001
edlvdrs1 [Higher]	-1.56	<0.001	-1.54	<0.001
stfhlth	0.43	<0.001	0.43	<0.001
rlgdgr	0.17	<0.001	0.12	0.001
gndr [Female]	0.09	0.520	-0.62	0.107
rlgdgr × gndr [Female]			0.10	0.046
Observations	1393		1393
R² / R² adjusted	0.328 / 0.325		0.330 / 0.327

Both models contain significant predictors: the higher the age, the satisfaction with the health system and the religiosity, the higher the satisfaction with government. The higher the education, the less satisfied people are with their government; this holds in both models. In the second model, which includes the interaction, the coefficient of religiosity (0.12, p = 0.001) describes the effect for males, the reference group, and remains significant. The coefficient of gender becomes negative (-0.62) but is not significant (p = 0.107); it describes the difference between females and males at a religiosity value of zero. The models explain about a third of the variance (R² = 0.328 and 0.330 for models 1 and 2 respectively). The interaction between religiosity and gender in the second model is significant (p = 0.046); we explore it below.

anova(m1, m2)

## Analysis of Variance Table
## 
## Model 1: stfgov ~ agea + edlvdrs1 + stfhlth + rlgdgr + gndr
## Model 2: stfgov ~ agea + edlvdrs1 + stfhlth + rlgdgr * gndr
##   Res.Df    RSS Df Sum of Sq      F  Pr(>F)  
## 1   1386 9547.0                              
## 2   1385 9519.7  1    27.294 3.9709 0.04649 *
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

According to the ANOVA, the second model fits significantly better than the first: its RSS is lower and the p-value (0.046) is below the 0.05 threshold. The R² of the second model is also slightly higher.
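
The F statistic in this table can be recomputed from the residual sums of squares. The sketch below is only a check of the arithmetic, using the rounded values printed in the ANOVA output, so the result agrees with the table up to rounding; the variable names are illustrative.

```r
rss1 <- 9547.0; df1 <- 1386   # model 1 (no interaction)
rss2 <- 9519.7; df2 <- 1385   # model 2 (with interaction)

# F = (drop in RSS per added parameter) / (residual variance of model 2)
f_stat  <- ((rss1 - rss2) / (df1 - df2)) / (rss2 / df2)
p_value <- pf(f_stat, df1 - df2, df2, lower.tail = FALSE)

round(c(F = f_stat, p = p_value), 4)
```

Because only one parameter is added, this F test is equivalent to the t test of the interaction coefficient, which is why both report p = 0.046.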

Then we can explore significant interaction between gender and religiosity.

plot_model(m2, type = "pred", terms = c("rlgdgr", "gndr"), show.data = F)

As we can see, the effect of religiosity on satisfaction with government depends on gender: non-religious females are less satisfied with government than non-religious males, whereas highly religious females are more satisfied than highly religious males. Satisfaction grows with religiosity for both genders, but the growth is noticeably steeper for females.
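
The pattern in the plot follows directly from the coefficients of the second model. The sketch below uses the rounded estimates from the regression table (0.12 for rlgdgr, -0.62 for gndr [Female], 0.10 for the interaction), so the numbers are approximate; the variable names are illustrative.

```r
b_rlgdgr      <- 0.12    # slope of religiosity for males (reference group)
b_female      <- -0.62   # female-male gap when religiosity is 0
b_interaction <- 0.10    # extra slope of religiosity for females

slope_male   <- b_rlgdgr
slope_female <- b_rlgdgr + b_interaction

# Religiosity value at which the female-male gap changes sign:
# b_female + b_interaction * rlgdgr = 0
crossover <- -b_female / b_interaction

c(slope_male = slope_male, slope_female = slope_female, crossover = crossover)
```

Under these rounded estimates the slope for females is almost twice the slope for males, and females become more satisfied than males above a religiosity value of about 6 on the scale as coded in the model.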

Conclusion

We find that religiosity is a good predictor of satisfaction with government. Our hypothesis that religiosity correlates positively with satisfaction with government, and that the effect is stronger for females, was supported.

References

Zinkina, J., Butovskaya, M., Shulgin, S., & Korotayev, A. (2024). Global Evolutionary Perspectives on Gender Differences in Religiosity, Family, Politics and Pro-Social Values Based on the Data from the World Values Survey. Social Evolution & History, 23(1), 76–105.

Vaughn, C. (2022). Faith and Trust: Religion’s Impact on Political Trust. Aletheia, 7(2).

List of all references

Petrović, J., & Stanojević, D. (2020). Political Activism in Serbia. Südosteuropa, 68, 365–385.
Stanojević, D., Vukelić, J., & Tomašević, A. (2023). Political Participation of Young People in Serbia: Activities, Values, and Capability. In I. Rivers & C. L. Lovin (Eds.), Young People Shaping Democratic Politics: Interrogating Inclusion, Mobilising Education (pp. 31–53). Springer International Publishing.
Trošt, T., & Marinšek, D. (2022). Social Class and Ethnocentric Worldviews: Assessing the Effect of Socioeconomic Status on Attitudes in Serbia and Croatia. Communist and Post-Communist Studies, 55(2), 39–61.
Scholaro database, https://www.scholaro.com/db/countries/Serbia/Education-System
Christensen, T., & Lægreid, P. (2005). Trust in Government: The Relative Importance of Service Satisfaction, Political Factors, and Demography. Public Performance & Management Review, 28(4), 487–511. http://www.jstor.org/stable/3381308
Yu, Z., Bo, W., & Shu, L. (2011). The dynamic relationship between satisfaction with local government, family income, and life satisfaction in China: A 6-year perspective. 2011 International Conference on Management Science & Engineering 18th Annual Conference Proceedings, Rome, Italy, 1207–1214.
Zorkaya, N. (1999). Interest in politics as a form of political participation (in Russian). Monitoring of Public Opinion: Economic and Social Changes, (4), 13–20.

Final Project

Research Team

10 06 2025
