Miya Vakrusheva — Checking if variable types are identified correctly and R data types are matching them, Providing a histogram, Providing a barplot, Interpretation
Artyom Shobyrev — Providing a table with descriptive statistics (mean, median, etc.), Providing a scatterplot, Interpretation, Knitting
Ivan Piatykh — Providing a boxplot, Interpretation
Svetlana Mokrozub — Providing a stacked boxplot, Interpretation
Dana Levkovskaya — Research question
All of us — Theoretical framework
Dana Levkovskaya — Pearson’s chi-square test
Svetlana Mokrozub — ANOVA, knitting html file
Ivan Piatykh — T-tests
Artyom Shobyrev — ANOVA
Miya Vakrusheva — T-tests
All of us — theoretical framework
Dana Levkovskaya — Theoretical framework, Search of literature, Project discussion
Svetlana Mokrozub — Correlations interpretations, Boxplots, Choice of variables, Draft of correlations and regression models
Ivan Piatykh — Theoretical framework, Search of literature
Artyom Shobyrev — Correlations, Regressions and interpretations, Project outlines discussion, Knitting
Miya Vakrusheva — Descriptive statistics, Project outlines discussion
Dana Levkovskaya — project 4 theory, variables in project 4
Svetlana Mokrozub — project 4 theory, project 1 correction
Ivan Piatykh — correction of the Project 3, project 4 theory
Artyom Shobyrev — project 4 regression models, interpretation, knitting
Miya Vakrusheva — project 4 theory and hypothesis, project 2 correction
library(foreign)
library(ggplot2)
library(dplyr)
library(DescTools)
library(formattable)
library(knitr)
setwd("F:/ARRR")
ESS <- read.spss("ESS11.sav", use.value.labels = TRUE, to.data.frame = TRUE)
serbia <- ESS %>%
filter(cntry == "Serbia") %>%
select(agea, sgnptit, pbldmna, polintr, lrscale, hinctnta, edlvdrs, gndr)
Our project covers political participation in Serbia. We examine several factors and their potential influence on an individual’s interest and activism in politics.
Existing studies on the same topic gave us solid ground for our own research. Among the more recent works, the study by Stanojević, Vukelić & Tomašević (2023) particularly stands out. The researchers examined the factors influencing the level of political activism among young Serbians, in both conventional and unconventional forms, with special emphasis on human values and institutional trust. In addition to data analysis, Stanojević et al. provide the necessary economic and social context of Serbia. This overview of the local context was especially useful, as we were not familiar with the country’s specifics.
Another study, by Petrović & Stanojević (2020), explores the correlation between age and various forms of political participation.
The analysis revealed that traditional forms of political activism (e.g. membership of political parties, making direct contact with politicians) are more popular among older people, whereas newer ones (such as signing petitions) are mostly practised by younger people. The ESS dataset covers several forms of political participation, so, following Petrović and Stanojević, we include them in our research as well.
Our research question is based on the findings described above; however, we include several variables that were not examined in the existing studies.
Thus, the research question we have formulated for this small study, in its broad form, is: How are individuals’ characteristics related to their political participation? Specifically, we look closer at how one’s gender, level of education, income and age are connected with political participation. We understand political participation broadly: both as active involvement in politics (signing petitions, taking part in demonstrations) and as having a strong political opinion and placing oneself on the political scale.
Hypotheses we have gathered from the readings are the following:
Those who engage in unconventional political participation practices like signing petitions and taking part in demonstrations are young people.
Those with a higher level of education will have a high interest in politics.
The last graphs we include cover political participation in the broader sense mentioned earlier. We explore how age and income are connected with one’s placement on the political scale. We have found no specific literature on this topic and have formulated no hypotheses about it, but we still decided to explore it freely.
In this project we have chosen and will further explore the variables which concern the following factors: age, petition signing, taking part in public demonstrations, interest in politics, placement on political scale, household income, and level of education.
First we check the variable types. R stores all of them as factors, so we recode the numeric ones.
str(serbia)
## 'data.frame': 1563 obs. of 8 variables:
## $ agea : Factor w/ 76 levels "15","16","17",..: 33 48 32 7 46 36 39 5 21 50 ...
## $ sgnptit : Factor w/ 2 levels "Yes","No": 1 1 2 2 2 2 1 2 2 2 ...
## $ pbldmna : Factor w/ 2 levels "Yes","No": 2 2 2 NA 2 2 1 2 2 2 ...
## $ polintr : Factor w/ 4 levels "Very interested",..: 2 3 3 3 4 4 2 3 4 4 ...
## $ lrscale : Factor w/ 11 levels "Left","1","2",..: NA 5 6 NA NA 6 7 6 NA 6 ...
## $ hinctnta: Factor w/ 10 levels "J - 1st decile",..: NA 6 NA NA NA NA 10 2 NA NA ...
## $ edlvdrs : Factor w/ 19 levels "Nikada nije išao/la u škola, nedovršena osnovna škola, manje od 4 razreda",..: 7 7 12 9 7 7 12 7 9 16 ...
## $ gndr : Factor w/ 2 levels "Male","Female": 1 1 2 1 2 2 1 2 2 1 ...
## - attr(*, "variable.labels")= Named chr [1:640] "Title of dataset" "ESS round" "Edition" "Production date" ...
## ..- attr(*, "names")= chr [1:640] "name" "essround" "edition" "proddate" ...
## - attr(*, "codepage")= int 65001
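# agea: the level labels are the ages themselves, so convert via character
serbia$agea <- as.numeric(as.character(serbia$agea))
# hinctnta: as.numeric() on a factor returns level codes, here 1-10 for the 1st-10th decile
serbia$hinctnta <- as.numeric(serbia$hinctnta)
# lrscale: level codes run 1-11 ("Left" = 1, ..., "Right" = 11), i.e. the 0-10 scale shifted by one
serbia$lrscale <- as.numeric(serbia$lrscale)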
str(serbia)
## 'data.frame': 1563 obs. of 8 variables:
## $ agea : num 47 62 46 21 60 50 53 19 35 64 ...
## $ sgnptit : Factor w/ 2 levels "Yes","No": 1 1 2 2 2 2 1 2 2 2 ...
## $ pbldmna : Factor w/ 2 levels "Yes","No": 2 2 2 NA 2 2 1 2 2 2 ...
## $ polintr : Factor w/ 4 levels "Very interested",..: 2 3 3 3 4 4 2 3 4 4 ...
## $ lrscale : num NA 5 6 NA NA 6 7 6 NA 6 ...
## $ hinctnta: num NA 6 NA NA NA NA 10 2 NA NA ...
## $ edlvdrs : Factor w/ 19 levels "Nikada nije išao/la u škola, nedovršena osnovna škola, manje od 4 razreda",..: 7 7 12 9 7 7 12 7 9 16 ...
## $ gndr : Factor w/ 2 levels "Male","Female": 1 1 2 1 2 2 1 2 2 1 ...
## - attr(*, "variable.labels")= Named chr [1:640] "Title of dataset" "ESS round" "Edition" "Production date" ...
## ..- attr(*, "names")= chr [1:640] "name" "essround" "edition" "proddate" ...
## - attr(*, "codepage")= int 65001
serbia <- na.omit(serbia)
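The recoding above hinges on how R converts factors to numbers. A minimal sketch on made-up data (the vectors `age_f` and `lr_f` are hypothetical, not taken from ESS) shows the difference between converting the labels and converting the underlying level codes:

```r
# Hypothetical toy factors; the level sets mimic agea and lrscale
age_f <- factor(c("47", "62", "21"))
lr_f  <- factor(c("Left", "5", "Right"),
                levels = c("Left", as.character(1:9), "Right"))

# Labels are numbers: go through character to recover the real values
age_num <- as.numeric(as.character(age_f))   # 47 62 21

# as.numeric() on a factor returns the position of each level (1..11 here)
lr_codes <- as.numeric(lr_f)                 # 1 6 11

# Assumption: "Left" stands for 0 and "Right" for 10 on the 0-10 scale,
# so subtracting 1 from the codes restores the original values
lr_values <- lr_codes - 1                    # 0 5 10
```

With level codes, a scale midpoint of 5 shows up as 6, which matters when reading the means and histograms of the left-right scale.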
For petition signing and the other categorical variables, the mode is the only measure of central tendency that can be computed.
mode_pet <- Mode(serbia$sgnptit)
stats_pet <- data.frame(
Statistic = c("Mode"),
Value = c(mode_pet)
)
kable(stats_pet,
caption = "Descriptive Statistics for Petition Signing in Serbia",
digits = 2,
align = "lc"
)
Statistic | Value |
---|---|
Mode | No |
The most common answer is No: most respondents in Serbia had not signed a petition during the 12 months prior to the survey.
ggplot(serbia) +
geom_bar(aes(x = sgnptit), na.rm = TRUE) +
scale_x_discrete(limits = c('Yes', 'No')) +
labs(x = "Signed petition last 12 months", y = NULL,
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
As the bar plot shows, almost three quarters of respondents in Serbia did not sign a petition in the last 12 months, while approximately one quarter did.
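A reading such as "about one quarter" can be checked numerically with a frequency table. The sketch below uses a made-up vector `signed` (hypothetical, not the ESS data) and base R only:

```r
# Hypothetical answers: 25 "Yes" and 75 "No"
signed <- factor(rep(c("Yes", "No"), times = c(25, 75)),
                 levels = c("Yes", "No"))

counts <- table(signed)          # absolute frequencies
shares <- prop.table(counts)     # relative frequencies, sum to 1

# Base-R mode: the label of the most frequent category
mode_signed <- names(counts)[which.max(counts)]

print(round(shares, 2))
print(mode_signed)
```

Applied to `serbia$sgnptit`, the same two calls would give the exact shares behind the bars, assuming the column is still a factor.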
mode_gnd <- Mode(serbia$gndr)
stats_gnd <- data.frame(
Statistic = c("Mode"),
Value = c(mode_gnd)
)
kable(stats_gnd,
caption = "Descriptive Statistics for Gender in Serbia",
digits = 2,
align = "lc"
)
Statistic | Value |
---|---|
Mode | Male |
We see that most of the respondents who took the survey in Serbia are men.
ggplot(serbia) +
geom_bar(aes(x = gndr)) +
labs(x = NULL, y = NULL,
title = "Gender of respondents",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
The difference between the number of males and females among the respondents is approximately 50.
mode_dem <- Mode(serbia$pbldmna)
stats_dem <- data.frame(
Statistic = c("Mode"),
Value = c(mode_dem)
)
kable(stats_dem,
caption = "Descriptive Statistics for Demonstrations in Serbia",
digits = 2,
align = "lc"
)
Statistic | Value |
---|---|
Mode | No |
Most of the respondents in Serbia have not participated in demonstrations during the suggested time.
serbiana <- serbia[!is.na(serbia$pbldmna), ]
ggplot(serbiana) +
geom_bar(aes(x = pbldmna), na.rm = TRUE) +
labs(x = NULL, y = NULL,
title = "Have you participated in demonstrations?",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
The difference is quite large: almost 550 people. It is larger than the one in the petitions graph, which may suggest that these two forms of political participation are perceived and performed differently.
mode_int <- Mode(serbia$polintr)
stats_int <- data.frame(
Statistic = c("Mode"),
Value = c(mode_int)
)
kable(stats_int,
caption = "Descriptive Statistics for Interest in Politics in Serbia",
digits = 2,
align = "lc"
)
Statistic | Value |
---|---|
Mode | Hardly interested |
The most common answer is “Hardly interested”.
serbiana <- serbia[!is.na(serbia$polintr), ]
ggplot(serbiana) +
geom_bar(aes(x = polintr), na.rm = TRUE) +
labs(x = NULL, y = NULL,
title = "Interest in politics",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
As the bar plot on interest in politics shows, most people in Serbia are hardly interested in politics or not interested at all. Fewer are quite interested, and fewer still are very interested.
mean_age <- mean(serbia$agea)
median_age <- median(serbia$agea)
mode_age <- Mode(serbia$agea)
variance_age <- var(serbia$agea)
std_dev_age <- sd(serbia$agea)
stats_age <- data.frame(
Statistic = c("Mean", "Median", "Mode", "Variance", "Standard Deviation"),
Value = c(mean_age, median_age, mode_age, variance_age, std_dev_age)
)
kable(stats_age,
caption = "Descriptive Statistics for Age in Serbia",
digits = 2,
align = "lc"
)
Statistic | Value |
---|---|
Mean | 53.16 |
Median | 54.50 |
Mode | 71.00 |
Variance | 315.47 |
Standard Deviation | 17.76 |
The mean and median ages of respondents in Serbia are 53 and 54.5 respectively, very close to one another. The mode is far from them: the most frequent respondent age is 71. The variance is 315 and the standard deviation is 18, which reflects a large amount of variation in respondents’ ages: the data are not concentrated around the mean.
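The same five statistics are computed for age, the left-right scale and income, so the repeated block could be wrapped in a small helper. This is only a sketch: `stat_mode()` and `describe_numeric()` are hypothetical names, the input vector is invented, and the mode is computed with base R rather than `DescTools::Mode()`.

```r
# Most frequent value(s) of a numeric vector (may return several on ties)
stat_mode <- function(x) {
  counts <- table(x)
  as.numeric(names(counts)[counts == max(counts)])
}

# One row per statistic, same layout as the tables in this report
describe_numeric <- function(x) {
  x <- x[!is.na(x)]
  data.frame(
    Statistic = c("Mean", "Median", "Mode", "Variance", "Standard Deviation"),
    Value = c(mean(x), median(x), stat_mode(x)[1], var(x), sd(x))
  )
}

# Hypothetical ages, including one missing value
ages <- c(21, 35, 47, 47, 62, 71, NA)
stats_demo <- describe_numeric(ages)
print(stats_demo)
```

A call such as `knitr::kable(describe_numeric(serbia$agea), digits = 2)` would then produce a table in the same format, assuming `serbia$agea` has already been converted to numeric.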
ggplot(serbia, aes(x = agea)) +
geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
geom_vline(aes(xintercept = mean(serbia$agea, na.rm = TRUE), color = "Mean"), lwd = 1) +
geom_vline(aes(xintercept = median(serbia$agea, na.rm = TRUE), color = "Median"), lwd = 1) +
geom_vline(aes(xintercept = Mode(serbia$agea), color = "Mode"), lwd = 1) +
scale_color_manual(name = "Statistics", values = c("Mean" = "blue", "Median" = "red", "Mode" = "purple")) +
theme_bw() +
labs(x = "Age", y = "Frequency",
title = "Age distribution in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
xlim(0, 100)
The age distribution has a very similar mean (blue line, 53.2 years) and median (red line, 54.5 years). The most frequent age is 71 years (purple line). The many older respondents on the right are balanced by an elongated left tail of younger respondents, which pulls the mean slightly below the median and keeps both near the center. This explains why the mean and median lie very close to each other while the mode is quite far from them.
The distribution is not bell-shaped. There are certain “gaps” in the distribution of ages; as one of our teammates suggested, they could be explained by the wars in Yugoslavia.
mean_lr <- mean(serbia$lrscale)
median_lr <- median(serbia$lrscale)
mode_lr <- Mode(serbia$lrscale)
variance_lr <- var(serbia$lrscale)
std_dev_lr <- sd(serbia$lrscale)
stats_lr <- data.frame(
Statistic = c("Mean", "Median", "Mode", "Variance", "Standard Deviation"),
Value = c(mean_lr, median_lr, mode_lr, variance_lr, std_dev_lr)
)
kable(stats_lr,
caption = "Descriptive Statistics for Left-Right Scale in Serbia",
digits = 2,
align = "lc"
)
Statistic | Value |
---|---|
Mean | 5.63 |
Median | 6.00 |
Mode | 6.00 |
Variance | 6.85 |
Standard Deviation | 2.62 |
The mean position on the left-right scale is 5.6. Because the variable is stored as level codes 1-11 (the original 0-10 scale shifted by one), the midpoint is 6, so the mean lies close to the center but slightly to its left. The median and mode both equal 6, that is, they sit exactly at the center. The variance is 6.9 and the standard deviation is 2.6, almost half of the median, which shows that the data are scattered quite far from the mean.
ggplot(serbia, aes(x = lrscale)) +
geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
geom_vline(aes(xintercept = mean(serbia$lrscale, na.rm = TRUE), color = "Mean"), lwd = 1) +
geom_vline(aes(xintercept = median(serbia$lrscale, na.rm = TRUE), color = "Median"), lwd = 1) +
geom_vline(aes(xintercept = Mode(serbia$lrscale), color = "Mode"), lwd = 1) +
scale_color_manual(name = "Statistics", values = c("Mean" = "blue", "Median" = "red", "Mode" = "purple")) +
theme_bw() +
labs(x = "Left-right scale", y = "Frequency",
title = "Positions on right-left scale in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
scale_x_continuous(breaks = 1:11) # lrscale holds level codes 1-11; xlim(0, 10) would drop code 11
We see, again, that most people identify themselves as centrists. However, there is a visible group of far-left respondents at the left end of the distribution. This may be important for our further analysis, although such a high number of centrists suggests that there is no clear tendency in political identification among Serbians.
mean_n <- mean(serbia$hinctnta)
median_n <- median(serbia$hinctnta)
mode_n <- Mode(serbia$hinctnta)
variance_n <- var(serbia$hinctnta)
std_dev_n <- sd(serbia$hinctnta)
stats_n <- data.frame(
Statistic = c("Mean", "Median", "Mode", "Variance", "Standard Deviation"),
Value = c(mean_n, median_n, mode_n, variance_n, std_dev_n)
)
kable(stats_n,
caption = "Descriptive Statistics for Net-Income in Serbia",
digits = 2,
align = "lc"
)
Statistic | Value |
---|---|
Mean | 5.96 |
Median | 6.00 |
Mode | 5.00 |
Variance | 6.66 |
Standard Deviation | 2.58 |
Now we can look at net income. The structure of the data is quite close to that of the left-right scale. The mean, median, and mode are close, ranging from 5 to 6. The variance is 6.7 and the standard deviation is 2.6, almost half of the mean, so the income data deviate quite far from the mean.
ggplot(serbia, aes(x = hinctnta)) +
geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
geom_vline(aes(xintercept = mean(serbia$hinctnta, na.rm = TRUE), color = "Mean"), lwd = 1) +
geom_vline(aes(xintercept = median(serbia$hinctnta, na.rm = TRUE), color = "Median"), lwd = 1) +
geom_vline(aes(xintercept = Mode(serbia$hinctnta), color = "Mode"), lwd = 1) +
scale_color_manual(name = "Statistics", values = c("Mean" = "blue", "Median" = "red", "Mode" = "purple")) +
theme_bw() +
labs(x = "Net-income decile", y = "Frequency",
title = "Net-income in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
scale_x_continuous(breaks = 1:10) # xlim(0, 10) can cut off the bar for the 10th decile
As the plot of net income shows, most respondents fall in the 7th and 6th deciles, which is moderate but closer to the high end of the income scale (in the ESS coding, the 1st decile is the lowest income and the 10th the highest). These are followed in frequency by the 5th and 4th deciles, which are moderate but closer to the low end. There are also sizeable numbers of respondents in the 8th, 9th and 10th deciles, which correspond to the better-off and the richest households in Serbia.
serbiana <- serbia[!is.na(serbia$sgnptit), ]
ggplot(serbiana) +
geom_boxplot(aes(x = sgnptit, y = agea), na.rm = TRUE) +
labs(x = NULL, y = 'Age',
title = "Have you signed a petition in the last 12 months?",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
An apparent observation is that people who sign petitions tend to be younger. A direct comparison reveals a 10-year gap between the median ages of ‘signers’ and ‘non-signers’. Moreover, the right box’s median is close to the upper border of the left box’s interquartile range, signalling a meaningful difference between the two groups. As for the boxes themselves, the left one is positively skewed, whereas the right one is almost symmetric. Although the skew is not large, we can still observe a tendency towards younger age among the ‘signers’.
serbiana <- serbia[!is.na(serbia$pbldmna), ]
ggplot(serbiana) +
geom_boxplot(aes(x = pbldmna, y = agea), na.rm = TRUE) +
labs(x = NULL, y = 'Age',
title = "Have you participated in demonstrations in the last 12 months?",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 15, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
The following graph generally follows the same pattern as the previous one. The same comments apply, and the tendency towards younger age is even stronger among those who attended demonstrations. Unlike with petitions, the median age is lower than 30 for the left box. Reflecting on the two graphs, we conclude that the tendency for younger people to take part in new forms of political participation, described in our first hypothesis, is present in our data.
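The median gap read off the boxplots can also be computed directly. A minimal sketch with made-up data (the data frame `demo` is hypothetical and does not reproduce the ESS values):

```r
# Hypothetical respondents: age and whether they signed a petition
demo <- data.frame(
  age    = c(22, 28, 35, 41, 30, 45, 52, 60, 67, 58),
  signed = factor(rep(c("Yes", "No"), each = 5), levels = c("Yes", "No"))
)

# Median and interquartile range of age within each group
medians <- tapply(demo$age, demo$signed, median)
iqrs    <- tapply(demo$age, demo$signed, IQR)

# Difference between the group medians (non-signers minus signers)
median_gap <- unname(medians["No"] - medians["Yes"])

print(medians)
print(iqrs)
print(median_gap)
```

Comparing the gap with the two interquartile ranges gives a rough numerical counterpart to the visual comparison of the boxes.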
ESS_copy <- ESS %>%
filter(cntry == "Serbia")
ESS_copy$edlvdrs <- as.factor(ESS_copy$edlvdrs)
levels(ESS_copy$edlvdrs) = c("Primary", "Primary", "Primary", "Secondary", "Secondary", "Secondary", "Secondary", "Secondary", "Secondary", "Secondary", "Higher", "Higher", "Higher", "Higher", "Higher", "Higher", "Higher", "Higher", "Other", "Other", "Other")
ESS_copy$icgndra <- as.factor(ESS_copy$icgndra)
levels(ESS_copy$icgndra) = c("Male", "Female", "Other")
#table(ESS_copy$polintr)
ESS_copy$polintr <- dplyr::recode(ESS_copy$polintr,
"Not at all interested" = 1,
"Hardly interested" = 2,
"Quite interested" = 3,
"Very interested" = 4)
ESS_copy %>%
select(cntry, edlvdrs, polintr) %>%
filter(!(edlvdrs=="Other"), !(polintr=="7"), !(polintr=="8"), !(polintr=="9")) %>%
ggplot(aes(fill=as.factor(edlvdrs), x=polintr), na.rm = T)+
geom_bar()+
theme_bw() +
labs(x = "Political interest", y = "frequency",
title = "Political interest and level of education in Serbia",
caption = "Source: ESS11") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))+
guides(fill=guide_legend(title="Level of education"))
ESS_copy %>%
filter(!(edlvdrs == "Other"), !(polintr %in% c(7,8,9))) %>%
ggplot(aes(x = as.factor(polintr), fill = edlvdrs)) +
geom_bar(position = "fill", na.rm = TRUE) + # position = fill scales bars to proportions
scale_y_continuous(labels = scales::percent_format()) + # y-axis in percent
theme_bw() +
labs(x = "Political interest", y = "Proportion",
title = "Political interest and level of education in Serbia",
caption = "Source: ESS11") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
guides(fill = guide_legend(title = "Level of education"))
ESS_copy %>%
filter(!(edlvdrs == "Other"), !(polintr %in% c(7,8,9))) %>%
ggplot(aes(x = edlvdrs, fill = as.factor(polintr))) +
geom_bar(position = "fill", na.rm = TRUE) +
scale_y_continuous(labels = scales::percent_format()) +
theme_bw() +
labs(x = "Level of education", y = "Proportion",
title = "Political interest and level of education in Serbia",
caption = "Source: ESS11") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
guides(fill = guide_legend(title = "Political interest"))
At first we used the original levels of the variable for the highest level of education (edlvdrs), but then grouped the 18 levels into three simpler ones, based on the structure of the Serbian education system (Scholaro, n.d.).
From the first graph we can see that most of the respondents are not interested in politics: the most common answer is “Hardly interested” (coded 2 after the recoding above), and “Not at all interested” (coded 1) is also very popular. Among those not interested (codes 1 and 2) there are more people with higher education by count. Among those interested in politics the education levels are represented fairly equally, with Primary slightly less represented.
We also added two plots showing (1) the proportions of education levels within each political interest group and (2) the proportions of political interest groups within each education level. Plot (1) shows that the share of low-educated people is largest in the group with the lowest political interest, while the share of highly educated people is largest in the group most interested in politics. Plot (2) shows that the share of people actively interested in politics increases with each educational step, while the shares of the Hardly interested and Not at all interested groups decrease.
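The two proportion plots correspond to normalising the same cross-table in two directions. The sketch below illustrates this with invented counts (the matrix `tab` is hypothetical and does not reproduce the ESS figures):

```r
# Hypothetical cross-table: rows = education, columns = political interest
tab <- matrix(c(40, 10,
                60, 40,
                20, 30),
              nrow = 3, byrow = TRUE,
              dimnames = list(education = c("Primary", "Secondary", "Higher"),
                              interest  = c("Not interested", "Interested")))

# Plot (1): shares of education levels within each interest group (columns sum to 1)
by_interest <- prop.table(tab, margin = 2)

# Plot (2): shares of interest groups within each education level (rows sum to 1)
by_education <- prop.table(tab, margin = 1)

print(round(by_interest, 2))
print(round(by_education, 2))
```

In this toy table the share of interested respondents rises from 20% to 60% across education levels, which is the kind of pattern plot (2) is meant to reveal.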
ESS_copy %>%
select(cntry, icgndra, edlvdrs, polintr) %>%
filter(!(edlvdrs=="Other"), !(polintr=="7"), !(polintr=="8"), !(polintr=="9"), !(icgndra=="Other")) %>%
ggplot(aes(fill=as.factor(edlvdrs), x=polintr), na.rm = T)+
geom_bar()+
theme_bw() +
labs(x = "Political interest", y = "frequency",
title = "Political interest and level of education in Serbia",
caption = "Source: ESS11") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))+
guides(fill=guide_legend(title="Level of education"))+
facet_wrap(~icgndra)
The second graph, which includes gender, shows that men are more interested in politics than women: the “Very interested” and “Quite interested” bars (levels 1 and 2 in the original ESS coding) are noticeably higher for male respondents. The number of people indifferent to politics is almost the same for men and women, around 150 respondents.
The following three graphs analyse the relationship between political self-placement and three different variables: income, education and age. As Trošt and Marinšek (2022) write, the political orientation of Serbs on the right-left scale depends more on age and education, while income has almost no effect.
Although the preliminary analysis of the political scale variable did not reveal any particular tendencies in our data, we decided to include these graphs to at least look at the far-left respondents and see whether they share a particular income status, level of education, or age. The same applies to the slightly elevated right tail.
ggplot(serbia) +
geom_jitter(aes(x = lrscale, y = hinctnta), na.rm = TRUE) +
geom_density_2d_filled(aes(x = lrscale, y = hinctnta), alpha = 0.6, na.rm = TRUE, show.legend = FALSE) +
theme_bw() +
labs(x = "Left - Right Scale", y = "Net-Income",
title = "Relationship between Political Views and Income",
caption = "Yellow - High density, Purple - Low density
Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = NULL))
In this graph, the X-axis is the left-right scale (where 1 is far left and 11 is far right) and the Y-axis is the net-income decile (1 is the lowest income, 10 is the highest). As we can see, most of the answers are in the upper middle, which means that most respondents with middle-to-higher incomes do not have strong political preferences. However, on the left side there is a group of middle-income people with clearly far-left political preferences. We can also assume the presence of far-right views among higher-income people if we look at the right side of the graph, but this is a less concentrated group. In summary, people who define themselves as far left mostly have a middle income but also include people with lower and higher incomes, while far-right people have only middle and higher incomes.
ggplot(serbia) +
geom_jitter(aes(x = lrscale, y = as.numeric(edlvdrs)), na.rm = TRUE) +
geom_density_2d_filled(aes(x = lrscale, y = as.numeric(edlvdrs)), alpha = 0.7, na.rm = TRUE, show.legend = FALSE) +
theme_bw() +
labs(x = "Left - Right Scale", y = "Levels of Education",
title = "Relationship between Political Views and Levels of education",
caption = "Yellow - High density, Purple - Low density
Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 13, vjust = 0.5),
panel.grid.minor = element_line(color = NULL))
This graph shows the relationship between educational attainment and political preferences. As can be seen, Serbians generally do not define themselves as right or left. However, at the middle education level (code 10) we can see two tails that stretch towards the far right and the far left, which may suggest that people with an average level of education are spread roughly symmetrically across the political scale. Higher education is more interesting, because it has a left tail and no right tail. Thus we can say that people with higher education are either centrist or left-wing; right-wing views are rare among them.
ggplot(serbia) +
geom_jitter(aes(x = lrscale, y = agea), na.rm = TRUE) +
geom_density_2d_filled(aes(x = lrscale, y = agea), alpha = 0.6, na.rm = TRUE, show.legend = FALSE) +
theme_bw() +
labs(x = "Left - Right Scale", y = "Age",
title = "Relationship between Political Views and Age",
caption = "Yellow - High density, Purple - Low density
Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = NULL))
This graph shows the relationship between age and political preferences. As in the previous graphs, we can see that most Serbians do not have strong political preferences. At the same time, there is a group of responses concentrated at the top left: people around 60 years old with far-left views. This may be explained by Serbia’s communist past: we can assume that these people have retained far-left views from the time of Yugoslavia.
In this project we have examined the distributions of the variables in our focus, which are connected with political participation. We have built plots depicting the relationships between them in order to investigate the hypotheses we set based on the existing literature.
Our hypotheses were:
Those who engage in unconventional political participation practices like signing petitions and taking part in demonstrations are young people.
Those with a higher level of education will have a high interest in politics.
The first hypothesis is supported by our data: those who take part in unconventional political participation practices like signing petitions and participating in demonstrations are mostly younger people, under 50 years old.
The second hypothesis was also supported by our data. While most of the respondents in Serbia are not interested in politics, the proportion of people actively interested in politics increases with each educational step, from Primary through Secondary to Higher education, while the share of those not interested in politics decreases as the educational level rises.
Some other findings were:
Serbs do not have strong political preferences - most of them situate themselves in the middle of the right-left scale.
There are more low-educated people among those of the lowest political interest. Moreover, the highest proportion of high-educated people is in the most interested in politics group.
Serbian men are more interested in politics than women are.
People who define themselves as far left mostly have a middle income but also include people with lower and higher incomes, while far-right people have only middle and higher incomes (income deciles run from 1, the lowest, to 10, the highest).
People with higher education are either centrist or left-wing; right-wing views are rare among them.
There is a group of people, identifying themselves as radical left, who are mostly old, around 60 years old.
Petrović, J., & Stanojević, D. (2020). Political Activism in Serbia. Südosteuropa, 68, 365–385.
Stanojević, D., Vukelić, J., & Tomašević, A. (2023). Political Participation of Young People in Serbia: Activities, Values, and Capability. In I. Rivers & C. L. Lovin (Eds.), Young People Shaping Democratic Politics: Interrogating Inclusion, Mobilising Education (pp. 31–53). Springer International Publishing.
Trošt, T., & Marinšek, D. (2022). Social Class and Ethnocentric Worldviews: Assessing the Effect of Socioeconomic Status on Attitudes in Serbia and Croatia. Communist and Post-Communist Studies, 55(2), 39–61.
Scholaro. (n.d.). Scholaro database: Serbia, education system. https://www.scholaro.com/db/countries/Serbia/Education-System
library(foreign)
setwd("F:/ARRR")
ESS <- read.spss("ESS11.sav", use.value.labels = TRUE, to.data.frame = TRUE)
library(ggplot2)
library(dplyr)
library(DescTools)
library(formattable)
library(psych)
library(ggpubr)
library(car)
library(sjstats)
library(magrittr)
library(knitr)
## install.packages("kableExtra")
library(kableExtra)
library(ggstatsplot)
library(corrplot)
library(sjPlot)
library(rstatix)
serbia <- ESS %>%
filter(cntry == "Serbia")
serbia$agea <- as.numeric(as.character(serbia$agea))
serbianapol <- serbia[!is.na(serbia$polintr), ]
serbianage <- serbia[!is.na(serbia$agea), ]
serbianalr <- serbia[!is.na(serbia$lrscale), ]
serbiana1 <- serbianage[!is.na(serbianage$polintr), ]
serbiana2 <- serbianage[!is.na(serbianage$lrscale), ] # index built from serbianage itself, not from serbia
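When several variables must be non-missing at once, the logical index has to be built from the data frame that is being subset. A small sketch with a made-up data frame (`d` is hypothetical) shows two equivalent base-R ways to keep the rows where both columns are present:

```r
# Hypothetical data with missing values in both columns
d <- data.frame(
  agea    = c(25, NA, 40, 63, 51),
  lrscale = c(5, 3, NA, 8, NA)
)

# Step by step: each index is computed on the object it is applied to
d_age  <- d[!is.na(d$agea), ]
d_both <- d_age[!is.na(d_age$lrscale), ]

# In one step with complete.cases()
d_both2 <- d[complete.cases(d[, c("agea", "lrscale")]), ]

print(d_both)
```

Taking the index from a different data frame can silently misalign rows once the two objects differ in length, which is why the index and the subset object are kept the same here.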
In this project we explore political participation, tackling our research question: Which factors are connected with people’s political participation in Serbia? We use ESS 11 data to do so. We will examine four hypotheses connected with political participation and factors which may influence it:
Three of them (1-3) are based on the work of Petrović & Stanojević (2020), which analyses “characteristics and factors shaping political activism in Serbia”. They distinguish traditional and unconventional (or old and new) types of political activism: unconventional practices include signing petitions, occupying public spaces, and participating in protests, while traditional ones include, for example, membership of political parties and making direct contact with politicians.
The last hypothesis is inspired by one of the graphs in the first project, which depicts the age distribution among those with different positions on the right-left political scale.
serbiachisq <- serbia %>% select(polintr, sgnptit, gndr)
Interest in politics and signing petitions.
Theory: Zorkaya (1999) describes how a lack of interest in politics and a generally apolitical attitude affect political participation. People who rate their interest in politics as “high” or “rather high” are more likely to participate in polls and to vote in elections. According to Petrović & Stanojević (2020), interest in signing petitions has risen in Serbia in recent years, presumably because of the spread of online petitions and their high accessibility to citizens. More generally, Serbians are more engaged in so-called unconventional, that is new, forms of political participation, including signing petitions. These conclusions inspired us to analyse the relationship between political interest and signing petitions.
We expect those interested in politics to sign petitions more often than those who are not; this is our research hypothesis.
H0: The statistical null hypothesis is that there is no relationship between the variables “Interest in politics” and “Petition signing”.
We can use a chi-square test on these variables for several reasons:
1. The variables are categorical: “political interest” has 4 levels and “petition signing” has 2. Together they form 8 groups of respondents, and each cell contains a count of people, not a percentage.
2. None of these groups contains fewer than 5 people.
3. Since the survey did not allow multiple answers, the categories are mutually exclusive.
chisq.test(serbiachisq$polintr, serbiachisq$sgnptit)
##
## Pearson's Chi-squared test
##
## data: serbiachisq$polintr and serbiachisq$sgnptit
## X-squared = 64.432, df = 3, p-value = 6.636e-14
polintr_sgnptit <- chisq.test(serbiachisq$polintr, serbiachisq$sgnptit)
As we can see, the p-value is well below the 0.05 threshold, so there is a statistically significant relationship between the variables. We therefore reject the null hypothesis: the variables are related. Let’s take a closer look at the expected and observed counts.
polintr_sgnptit$expected
## serbiachisq$sgnptit
## serbiachisq$polintr Yes No
## Very interested 27.46452 101.5355
## Quite interested 56.84516 210.1548
## Hardly interested 130.72258 483.2774
## Not at all interested 114.96774 425.0323
polintr_sgnptit$observed
## serbiachisq$sgnptit
## serbiachisq$polintr Yes No
## Very interested 47 82
## Quite interested 89 178
## Hardly interested 125 489
## Not at all interested 69 471
polintr_sgnptit$stdres
## serbiachisq$sgnptit
## serbiachisq$polintr Yes No
## Very interested 4.3882660 -4.3882660
## Quite interested 5.2836986 -5.2836986
## Hardly interested -0.7259897 0.7259897
## Not at all interested -5.9862695 5.9862695
corrplot(chisq.test(serbiachisq$polintr, serbiachisq$sgnptit)$stdres, is.cor = F, method = 'num')
For the difference between expected and observed counts to count as not pronounced, the standardized residuals would need to lie between -2 and 2. In our data we see the following:
1. Among people interested in politics (“Very interested” and “Quite interested”), many more people sign petitions than expected (standardized residuals of 4.4 and 5.3, respectively).
2. In the “Hardly interested” category, the split between signers and non-signers almost coincides with what is expected.
3. People who are not at all interested in politics sign petitions much less often than expected (standardized residual of -6.0).
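To make the origin of these numbers explicit, the following is a minimal sketch that rebuilds the contingency table from the observed counts printed above and recomputes the expected counts and standardized residuals by hand. The object names (obs, expected_manual, stdres_manual) are ours and purely illustrative, and the sketch assumes that the printed counts are the complete table.

```r
# Observed counts copied from the polintr_sgnptit$observed output above.
obs <- matrix(c(47, 82,
                89, 178,
                125, 489,
                69, 471),
              ncol = 2, byrow = TRUE,
              dimnames = list(polintr = c("Very interested", "Quite interested",
                                          "Hardly interested", "Not at all interested"),
                              sgnptit = c("Yes", "No")))

n <- sum(obs)
row_share <- rowSums(obs) / n
col_share <- colSums(obs) / n

# Expected count under independence: row total * column total / n.
expected_manual <- outer(rowSums(obs), colSums(obs)) / n

# Standardized residual:
# (observed - expected) / sqrt(expected * (1 - row share) * (1 - column share)).
stdres_manual <- (obs - expected_manual) /
  sqrt(expected_manual * outer(1 - row_share, 1 - col_share))

# The built-in test on the same table, for comparison.
res <- chisq.test(obs)

round(stdres_manual, 2)
min(expected_manual) >= 5   # rule-of-thumb check on cell sizes
```

Cells with an absolute standardized residual above 2 are the ones that drive the significant result.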
plot_xtab(serbiachisq$polintr, serbiachisq$sgnptit, margin = 'row', bar.pos = 'stack', show.summary = T)
From the graph, we can see that people who are less interested in politics are less likely to sign petitions: the lower the interest, the smaller the share of petition signers. The null hypothesis is rejected.
In our previous project, drawing on the study of (Petrović & Stanojević, 2020), we suggested that people who take part in unconventional political practices (such as signing petitions and participating in demonstrations) tend to be younger. We now state this as a research hypothesis and check whether it holds.
As the bar plot below shows, most respondents from Serbia did not sign a petition. We will use signing petitions as a factor variable in the analysis.
ggplot(serbia) +
geom_bar(aes(x = as.factor(sgnptit)), na.rm = TRUE) +
scale_x_discrete(limits = c('Yes', 'No')) +
labs(title = "Signing Petitions among Serbians",
x = "Signed petition last 12 months", y = NULL,
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
As for the age distribution, the mean and median are almost the same (about 52.5 and 53 years), but the mode lies away from them (the most frequent age among respondents is 57), so the distribution does not look normal. The histogram shows sharp drops and a heavy left tail (there are more very young respondents than very old ones), which also points to non-normality. We will use age as a continuous variable in the analysis.
ggplot(serbia, aes(x = agea)) +
geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
geom_vline(xintercept = mean(serbia$agea, na.rm = TRUE), col = 'blue', lwd = 1) +
geom_vline(xintercept = median(serbia$agea, na.rm = TRUE), col = 'red', lwd = 1) +
geom_vline(xintercept = Mode(serbia$agea, na.rm = TRUE), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Age", y = "Frequency",
title = "Age distribution in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
xlim(0,100)
Now we need to check whether people who sign petitions differ in age from those who don’t. We will use a two-sample t-test, which compares the mean ages of those who sign petitions and those who don’t.
First, we check the assumptions that the t-test requires:
Observations are assumed to be independent, as they were collected from different respondents.
To check whether the data is normally distributed, we use the Shapiro-Wilk test and also look at skew and kurtosis. As a rule of thumb, for a normal (Gaussian) distribution skew is within ±0.5 of 0 (symmetric) and kurtosis is within ±1 of 0.
Looking at the skew and kurtosis of age, we see that the distribution is slightly positively skewed in the first sample (a bit more data is concentrated in the left part of the graph) and slightly negatively skewed in the second sample. Kurtosis is negative in both samples and close to the threshold of -1. Judging by these parameters alone, we could still assume normality in both samples.
We then run the Shapiro-Wilk test to confirm or reject normality. The p-values in both tests are very low, which means the data is not normally distributed.
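As a rough illustration of the rule of thumb above, the sketch below computes moment-based skew and excess kurtosis for a numeric vector and checks them against the ±0.5 and ±1 limits. The helper names and the toy vector are hypothetical, and the formulas are one common variant (third and fourth central moments divided by powers of the sample standard deviation); the values reported by describeBy may differ slightly depending on the variant it uses.

```r
# Moment-based skew: mean cubed deviation divided by sd^3.
skew_simple <- function(x) {
  x <- x[!is.na(x)]
  sum((x - mean(x))^3) / (length(x) * sd(x)^3)
}

# Excess kurtosis: mean fourth-power deviation divided by sd^4, minus 3.
kurtosis_simple <- function(x) {
  x <- x[!is.na(x)]
  sum((x - mean(x))^4) / (length(x) * sd(x)^4) - 3
}

# Rule of thumb from the text: |skew| <= 0.5 and |kurtosis| <= 1.
looks_roughly_normal <- function(x) {
  abs(skew_simple(x)) <= 0.5 && abs(kurtosis_simple(x)) <= 1
}

toy_age <- c(18, 25, 31, 40, 47, 52, 58, 63, 70, 85, NA)  # made-up ages
skew_simple(toy_age)
kurtosis_simple(toy_age)
looks_roughly_normal(toy_age)
```

Passing this check does not prove normality, which is why a formal test and a Q-Q plot are still worth looking at.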
describeBy(serbia$agea, group = serbia$sgnptit)
##
## Descriptive statistics by group
## group: Yes
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 325 46.33 16.67 45 45.92 19.27 17 85 68 0.22 -0.93 0.92
## ------------------------------------------------------------
## group: No
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1209 54.11 18.13 56 54.99 20.76 15 90 75 -0.35 -0.85 0.52
shapiro.test(serbia$agea[serbia$sgnptit == "Yes"])
##
## Shapiro-Wilk normality test
##
## data: serbia$agea[serbia$sgnptit == "Yes"]
## W = 0.97096, p-value = 4.059e-06
shapiro.test(serbia$agea[serbia$sgnptit == "No"])
##
## Shapiro-Wilk normality test
##
## data: serbia$agea[serbia$sgnptit == "No"]
## W = 0.96388, p-value < 2.2e-16
Let’s check normality once more to be sure of the conclusion, this time with a Q-Q plot. The points should lie on or near the reference line for us to call the data normal. On the plots for both samples we clearly see many points deviating from the line.
ggqqplot(serbia, "agea", facet.by = "sgnptit", main = "Q-Q plot. Age of Serbians: Those signing petitions and not")
Now we can confirm that the data is not normally distributed. However, we can still conduct the t-test, because with many observations (> 100) it is robust to non-normality. In our case the samples contain 325 respondents who signed petitions and 1209 who did not.
Now let’s check the second assumption, the equality of variances in the two samples (“Yes” and “No”), with Levene’s test. Here the p-value is less than 0.05, so the hypothesis of equal variances is rejected. We will therefore run the Welch t-test, which is the default in R.
leveneTest(serbia$agea ~ serbia$sgnptit)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 1 3.956 0.04688 *
## 1532
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The inequality of variances can also be seen on the boxplots: although the boxes do not differ much in size, their spread is slightly different.
boxplot(serbia$agea ~ serbia$sgnptit, main = "Age by signing petitions")
Now we can move to the t-test itself. As already mentioned, we will run the Welch t-test, because the variances of the two samples are not equal.
As usual, our H0 is that there is no difference between the mean ages of the two samples (those who sign petitions and those who don’t), whereas H1 is that the means differ.
First, in the two-sided test the p-value is very low (< 0.05), so the mean ages of those who sign petitions and those who don’t differ significantly.
Then we check the directional hypothesis that the mean age of those who sign petitions is lower than that of those who don’t. In this one-sided t-test the p-value is also very small (< 0.05), so the data support this alternative hypothesis.
We can also see the means of the two samples: the mean age is 46.3 years for those who sign petitions and 54.1 years for those who don’t.
t.test(serbia$agea ~ serbia$sgnptit)
##
## Welch Two Sample t-test
##
## data: serbia$agea by serbia$sgnptit
## t = -7.3301, df = 548.13, p-value = 8.295e-13
## alternative hypothesis: true difference in means between group Yes and group No is not equal to 0
## 95 percent confidence interval:
## -9.865554 -5.695538
## sample estimates:
## mean in group Yes mean in group No
## 46.32615 54.10670
t.test(serbia$agea ~ serbia$sgnptit,
alternative = "less")
##
## Welch Two Sample t-test
##
## data: serbia$agea by serbia$sgnptit
## t = -7.3301, df = 548.13, p-value = 4.148e-13
## alternative hypothesis: true difference in means between group Yes and group No is less than 0
## 95 percent confidence interval:
## -Inf -6.03166
## sample estimates:
## mean in group Yes mean in group No
## 46.32615 54.10670
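For reference, the Welch statistic and its degrees of freedom can be reproduced from the group sizes, means and standard deviations printed in the describeBy output above. This is only a sketch: the inputs are rounded to two decimals, so the results match the t.test output approximately, and the variable names are ours.

```r
# Group summaries copied from the describeBy output (rounded values).
n_yes <- 325;  mean_yes <- 46.33; sd_yes <- 16.67
n_no  <- 1209; mean_no  <- 54.11; sd_no  <- 18.13

# Squared standard errors of the two group means.
v_yes <- sd_yes^2 / n_yes
v_no  <- sd_no^2 / n_no

# Welch t statistic: difference in means over the unpooled standard error.
t_welch <- (mean_yes - mean_no) / sqrt(v_yes + v_no)

# Welch-Satterthwaite approximation of the degrees of freedom.
df_welch <- (v_yes + v_no)^2 /
  (v_yes^2 / (n_yes - 1) + v_no^2 / (n_no - 1))

# Two-sided and one-sided ("less") p-values.
p_two_sided <- 2 * pt(-abs(t_welch), df_welch)
p_less <- pt(t_welch, df_welch)

c(t = t_welch, df = df_welch, p_two_sided = p_two_sided, p_less = p_less)
```

Because the statistic is negative, the one-sided p-value for the "less" alternative is half of the two-sided one, which is the pattern seen in the two outputs above.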
Now we will double-check the result of the t-test with the non-parametric Mann-Whitney-Wilcoxon test, which is usually applied to non-normally distributed data when the number of observations is below 100 or when the data is ordinal rather than continuous. We use the Mann-Whitney (rank sum) version because our samples are independent; the Wilcoxon signed-rank test is meant for paired samples.
The one-sided Mann-Whitney test gives a low p-value (< 0.05), which again confirms that those who sign petitions are significantly younger than those who don’t.
wilcox.test(serbia$agea ~ serbia$sgnptit,
alternative = "less", paired = F)
##
## Wilcoxon rank sum test with continuity correction
##
## data: serbia$agea by serbia$sgnptit
## W = 145968, p-value = 5.265e-13
## alternative hypothesis: true location shift is less than 0
To see how large the difference between the mean ages of the two samples is, let’s check the effect size, Cohen’s d.
In absolute value Cohen’s d = 0.45, which is a small (close to medium) effect size.
cohens_d(serbia, agea ~ sgnptit)
## # A tibble: 1 × 7
## .y. group1 group2 effsize n1 n2 magnitude
## * <chr> <chr> <chr> <dbl> <int> <int> <ord>
## 1 agea Yes No -0.447 325 1209 small
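The reported effect size can be approximated from the same group summaries. The sketch below assumes the variant of Cohen’s d that divides the mean difference by the square root of the average of the two group variances; with the rounded describeBy values this gives about -0.45, in line with the output above. The pooled-standard-deviation variant is shown for comparison. Variable names are ours.

```r
# Group summaries copied from the describeBy output (rounded values).
mean_yes <- 46.33; sd_yes <- 16.67; n_yes <- 325
mean_no  <- 54.11; sd_no  <- 18.13; n_no  <- 1209

# Variant 1: average of the two variances (no equal-variance assumption).
d_avg_var <- (mean_yes - mean_no) / sqrt((sd_yes^2 + sd_no^2) / 2)

# Variant 2: pooled standard deviation weighted by degrees of freedom.
sd_pooled <- sqrt(((n_yes - 1) * sd_yes^2 + (n_no - 1) * sd_no^2) /
                    (n_yes + n_no - 2))
d_pooled <- (mean_yes - mean_no) / sd_pooled

round(c(d_avg_var = d_avg_var, d_pooled = d_pooled), 3)
```

The two variants differ only slightly here, and both fall in the small range by the usual 0.2 / 0.5 / 0.8 benchmarks.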
Using code from https://rpsychologist.com/short-r-script-to-plot-effect-sizes-cohens-d-and-shade-overlapping-area site, we can visualize the Cohen’s d effect.
The data is standardized and the more data overlap, the less is the effect size.
ES <- 0.45
mean1 <- ES*1 + 1
x <- seq(1 - 3*1, mean1 + 3*1, .01)
y1 <- dnorm(x, 1, 1)
df1 <- data.frame("x" = x, "y" = y1)
y2 <- dnorm(x, mean1, 1)
df2 <- data.frame("x" = x, "y" = y2)
y.poly <- pmin(y1,y2)
poly <- data.frame("x" = x, "y" = y.poly)
u3 <- 1 - pnorm(1, mean1,1)
u3 <- round(u3,3)
ggplot(df1, aes(x,y, color="Those who sign petitions")) +
geom_line(size=1) +
geom_line(data=df2, aes(color="Those who don't sign petitions"),size=1) +
geom_polygon(aes(color=NULL), data=poly, fill="#BCEE68", alpha=I(4/10),
show.legend=FALSE) +
geom_vline(xintercept = 1, linetype="dotted") +
geom_vline(xintercept = mean1, linetype="dotted") +
labs(title=paste("Age and Petitions - Effect Size Visualisation
(Cohen's d = ",ES,"; U3 = ",u3,")", sep="")) +
scale_color_manual("Group",
values= c("Those who sign petitions" = "dodgerblue2","Those who don't sign petitions" = "#BCEE68")) +
theme(plot.title = element_text(face = "bold", size = 18, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))+
ylab(NULL) + xlab(NULL)
To sum up, the tests support our initial hypothesis that those participating in unconventional political practices such as signing petitions are younger. The effect size is small, but close to medium.
For our analysis of variances we will take two pairs of variables:
Our first hypothesis, based on Petrović, J., & Stanojević, D. (2020), is that young Serbians are more interested in politics than older people, and we want to test it.
Second, in our previous work we observed a cluster of older people among the far left, so we want to test whether age differs significantly depending on political position.
First of all we need to check the assumptions for our test.
ANOVA has the following:
We need to test the last two assumptions.
First of all, we will test the normality of the age distribution.
ggplot(serbia, aes(x = agea)) +
geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
geom_vline(xintercept = mean(serbia$agea, na.rm = TRUE), col = 'blue', lwd = 1) +
geom_vline(xintercept = median(serbia$agea, na.rm = TRUE), col = 'red', lwd = 1) +
geom_vline(xintercept = Mode(serbia$agea, na.rm = TRUE), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Age", y = "Frequency",
title = "Age distribution in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
xlim(0,100)
Here we can see that the mean (blue) and the median (red) are very close to each other, but the mode (purple) is far from them, which suggests that the data is not normally distributed. We will also check the skew and kurtosis of our data:
describe(serbia$agea)
## vars n mean sd median trimmed mad min max range skew kurtosis se
## X1 1 1543 52.47 18.11 53 53.03 22.24 15 90 75 -0.21 -0.97 0.46
Here we can see that skewness is slightly negative, which means that the distribution is left-skewed (its bulk sits to the right), as the histogram also shows. Kurtosis is negative as well, close to -1, which suggests lighter tails than a normal distribution and therefore few outliers; it still lies within the ±1 range around 0 that we treat as acceptable.
Now we can test normality using the Shapiro-Wilk test. Here our H0 is that the data is normally distributed; the alternative is that it is not.
shapiro.test(x = serbianage$agea)
##
## Shapiro-Wilk normality test
##
## data: serbianage$agea
## W = 0.96936, p-value < 2.2e-16
ggqqplot(serbia, "agea")
The p-value is very low, which means that our data is not normally distributed. Strictly speaking, we should therefore not rely on ANOVA, because it is a parametric test, and should use the Kruskal-Wallis test instead.
ggplot(serbianapol) +
geom_bar(aes(x = as.factor(polintr)), na.rm = TRUE) +
labs(x = "1 - Very Interested, 4 - Not at all interested", y = NULL,
title = "Interest in politics",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
Here we can see that most people are hardly or not at all interested in politics.
Now we will check the normality of the age distribution within each group.
ggqqplot(serbianapol, "agea", facet.by = "polintr")
On the Q-Q plot we can see that most points lie on the reference line, which means that the distribution is close to normal, but there are deviations at the beginning and at the end of the line, which point to non-normality.
To be more accurate, we can compare the mean and median and look at skewness and kurtosis.
describeBy(serbianapol$agea, serbianapol$polintr, mat = TRUE) %>%
select(Interest = group1, N = n, Mean = mean, SD = sd, Median = median, Min = min, Max = max, Skew = skew, Kurtosis = kurtosis, st.error = se) %>%
kable(align = c("lrrrrrrrrr"), digits = 2, row.names = FALSE,
caption = "Age by Interest in Politics")
Interest | N | Mean | SD | Median | Min | Max | Skew | Kurtosis | st.error |
---|---|---|---|---|---|---|---|---|---|
Very interested | 127 | 57.74 | 15.76 | 58 | 17 | 84 | -0.54 | -0.54 | 1.40 |
Quite interested | 263 | 56.60 | 17.35 | 58 | 17 | 90 | -0.39 | -0.83 | 1.07 |
Hardly interested | 611 | 52.52 | 17.57 | 54 | 17 | 89 | -0.22 | -1.01 | 0.71 |
Not at all interested | 539 | 49.22 | 18.90 | 49 | 15 | 90 | 0.00 | -0.99 | 0.81 |
Here we can see that the differences between the means and medians are very small in all groups, and the skewness and kurtosis are small as well. However, we know that the data as a whole is not normally distributed, so we cannot claim normality for every group.
ggplot(serbiana1) +
geom_boxplot(aes(x = as.factor(polintr), y = agea), na.rm = TRUE) +
labs(x = "1 - Very Interested, 4 - Not at all interested", y = 'Age',
title = "Interest in politics by age",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
On the boxplot we can see that the median age of people who are interested in politics (about 58) is higher than that of people who are not at all interested (about 49). This is interesting, because it contradicts our hypothesis. We need to check whether this difference is significant.
We will use both ANOVA and the Kruskal-Wallis test, but Kruskal-Wallis is the more appropriate one here.
First we will do ANOVA. We need to check whether the variances are equal. If they are, we will run classical ANOVA; if not, Welch’s one-way test.
leveneTest(serbiana1$agea ~ serbiana1$polintr)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 3 3.577 0.01348 *
## 1536
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
The p-value (Pr) here is small, which means that the variances are not equal. So we will run Welch’s one-way test.
oneway.test(serbiana1$agea ~ serbiana1$polintr, var.equal = F)
##
## One-way analysis of means (not assuming equal variances)
##
## data: serbiana1$agea and serbiana1$polintr
## F = 14.803, num df = 3.00, denom df = 481.09, p-value = 3.112e-09
Our H0 is that all group means are equal, so Ha is that at least one mean is different.
The test gives a very small p-value, which means that at least one group mean is different. This result cannot be considered final and fully accurate, because our data is not normal.
We need to check the residuals and run post-hoc tests to find out which groups differ.
one.way.anova <- stats::aov(serbiana1$agea ~ serbiana1$polintr)
summary(one.way.anova)
## Df Sum Sq Mean Sq F value Pr(>F)
## serbiana1$polintr 3 13693 4564 14.28 3.5e-09 ***
## Residuals 1536 490844 320
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we see that the difference between groups is significant (p-value < 0.05) even when equal variances are assumed, which is what the aov() function does by default.
Now we can plot the residuals.
plot(one.way.anova, 2)
layout(matrix(1:4, 2, 2))
plot(one.way.anova)
Here we see the diagnostic plots for the residuals. The top-left plot shows the residuals against the fitted values; the red line should run along the zero axis, but on our plot it is not straight. The same holds for the top-right plot: the red line should be straight, but it is not.
Finally, the Q-Q plot: the residuals leave the reference line at both ends.
This means that the residuals are not normally distributed and the results are not fully reliable.
We can explore it further.
anova.res <- residuals(object = one.way.anova)
describe(anova.res)
## vars n mean sd median trimmed mad min max range skew kurtosis
## X1 1 1540 0 17.86 0.78 0.46 21.58 -40.74 40.78 81.52 -0.18 -0.92
## se
## X1 0.46
Here we can see that the mean and median are not the same, which also points to non-normality.
shapiro.test(x = anova.res)
##
## Shapiro-Wilk normality test
##
## data: anova.res
## W = 0.97436, p-value = 6.115e-16
The p-value is very small, so we conclude that the residuals are not normally distributed. We can also see it on the histogram.
hist(anova.res)
Now we can run a post-hoc test to find out which groups differ. We will use pairwise t-tests with a p-value adjustment for multiple comparisons. The adjustment is selected through the p.adjust.method argument; we use Holm’s method, which is what the output below reports.
pairwise.t.test(serbiana1$agea, serbiana1$polintr,
p.adjust.method = "holm", pool.sd = TRUE)
##
## Pairwise comparisons using t tests with pooled SD
##
## data: serbiana1$agea and serbiana1$polintr
##
## Very interested Quite interested Hardly interested
## Quite interested 0.5554 - -
## Hardly interested 0.0074 0.0074 -
## Not at all interested 7.5e-06 2.9e-07 0.0074
##
## P value adjustment method: holm
Here we can see that statistically significant pairs are 1-3, 1-4, 2-3, 2-4, 3-4.
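The footer of the output above shows that the p-values were adjusted with Holm’s method. To illustrate how that differs from a Bonferroni adjustment, here is a small sketch using base R’s p.adjust on three made-up raw p-values (they are not taken from our data).

```r
# Hypothetical raw p-values from three pairwise comparisons.
raw_p <- c(0.01, 0.02, 0.03)

# Bonferroni multiplies every p-value by the number of comparisons.
p_bonferroni <- p.adjust(raw_p, method = "bonferroni")

# Holm multiplies the smallest p-value by m, the next by m - 1, and so on,
# then enforces that adjusted values never decrease along the ordering.
p_holm <- p.adjust(raw_p, method = "holm")

rbind(raw = raw_p, bonferroni = p_bonferroni, holm = p_holm)
```

Holm’s adjusted values are never larger than Bonferroni’s, so it is the less conservative of the two while still controlling the family-wise error rate.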
Now we can measure the effect size.
library(sjstats)
## install.packages("pwr") # may require for this package
anova_stats(one.way.anova) # the name of your ANOVA resulting object
## etasq | partial.etasq | omegasq | partial.omegasq | epsilonsq | cohens.f
## ------------------------------------------------------------------------
## 0.027 | 0.027 | 0.025 | 0.025 | 0.025 | 0.167
## | | | | |
##
## etasq | term | sumsq | df | meansq | statistic | p.value | power
## -------------------------------------------------------------------------------------
## 0.027 | serbiana1$polintr | 13693.060 | 3 | 4564.353 | 14.283 | < .001 | 1
## | Residuals | 4.908e+05 | 1536 | 319.560 | | |
The omega-squared is 0.025, which means that the effect is small.
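The effect sizes in the table above can be traced back to the sums of squares in the ANOVA summary. The following is a minimal sketch using the standard textbook formulas for eta-squared, omega-squared and Cohen’s f; the inputs are the rounded values printed by summary(one.way.anova), and the variable names are ours.

```r
# Sums of squares and degrees of freedom from summary(one.way.anova) above.
ss_between <- 13693
ss_within  <- 490844
df_between <- 3
df_within  <- 1536

ms_within <- ss_within / df_within
ss_total  <- ss_between + ss_within

# Share of total variation explained by the grouping variable.
eta_sq <- ss_between / ss_total

# Less biased version that corrects for the number of groups.
omega_sq <- (ss_between - df_between * ms_within) / (ss_total + ms_within)

# Cohen's f derived from eta-squared.
cohens_f <- sqrt(eta_sq / (1 - eta_sq))

round(c(eta_sq = eta_sq, omega_sq = omega_sq, cohens_f = cohens_f), 3)
```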
Now we can run the non-parametric Kruskal-Wallis test and compare the results of both tests.
kruskal.test(agea ~ polintr, data = serbiana1)
##
## Kruskal-Wallis rank sum test
##
## data: agea by polintr
## Kruskal-Wallis chi-squared = 42.323, df = 3, p-value = 3.426e-09
As we can see, the p-value is very low, which means that the groups differ and confirms the results of the previous test.
Now we can do post-hoc test.
DunnTest(agea ~ polintr, data = serbiana1)
##
## Dunn's test of multiple comparisons using rank sums : holm
##
## mean.rank.diff pval
## Quite interested-Very interested -28.92521 0.5472
## Hardly interested-Very interested -132.55911 0.0067 **
## Not at all interested-Very interested -210.96284 7.5e-06 ***
## Hardly interested-Quite interested -103.63390 0.0063 **
## Not at all interested-Quite interested -182.03763 3.1e-07 ***
## Not at all interested-Hardly interested -78.40373 0.0067 **
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we can see that 5 pairs differ significantly, which agrees with the results of the pairwise t-tests.
Now we can visualize our results and interpret it.
## install.packages("ggstatsplot")
library(ggstatsplot)
ggbetweenstats(data = serbiana1, y = agea, x = polintr, var.equal = F)
As we can see here, based on our tests, the age of people who are interested in politics is higher than the age of people who are not. In more detail, people who chose 3 (hardly interested) or 4 (not at all interested) are younger than people who chose 1 (very interested) or 2 (quite interested). There are also significant differences between 2 and 3 and between 3 and 4: age decreases as interest decreases.
Our first hypothesis about age is rejected. The results are the opposite: the higher the age, the higher the interest in politics.
In this block we will use ANOVA to check if there are age differences among people of different positions on the right-left scale.
The research hypothesis is the following: we assume that age and position on right-left scale are connected, specifically those on the left being older than others.
First we decided to group positions on left-right political scale into bigger categories: left, central, and right. It would make the tests easier to conduct and interpret.
serbiana2_gr <- serbiana2 %>% select(lrscale, agea)
serbiana2_gr <- serbiana2_gr[!is.na(serbiana2_gr$lrscale), ]
levels(serbiana2$lrscale)
## [1] "Left" "1" "2" "3" "4" "5" "6" "7" "8"
## [10] "9" "Right"
levels(serbiana2_gr$lrscale) <- c("Left", "Left", "Left", "Left", "Central", "Central", "Central", "Right", "Right", "Right", "Right")
levels(serbiana2_gr$lrscale)
## [1] "Left" "Central" "Right"
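The regrouping above relies on the fact that assigning the same label to several factor levels merges them. A minimal sketch on a toy factor (the vector lr and its values are made up for illustration) shows the mechanism:

```r
# A toy factor with the same eleven levels as lrscale.
lr <- factor(c("Left", "1", "5", "9", "Right"),
             levels = c("Left", as.character(1:9), "Right"))

# Giving several levels the same label merges them into one category:
# Left, 1, 2, 3 -> Left; 4, 5, 6 -> Central; 7, 8, 9, Right -> Right.
levels(lr) <- c(rep("Left", 4), rep("Central", 3), rep("Right", 4))

levels(lr)
table(lr)
```

The new labels must be listed in the same order as the original levels, so it is worth printing levels() before and after, as is done above.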
Now we can examine the variable, starting with its visualisation.
ggplot(serbia, aes(x = agea)) +
geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
geom_vline(xintercept = mean(serbia$agea, na.rm = TRUE), col = 'blue', lwd = 1) +
geom_vline(xintercept = median(serbia$agea, na.rm = TRUE), col = 'red', lwd = 1) +
geom_vline(xintercept = Mode(serbia$agea, na.rm = TRUE), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Age", y = "Frequency",
title = "Age distribution in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
xlim(0,100)
Distributions and medians are very similar, but we can see that the median in the Left group is slightly higher. We can run a statistical test to check whether the difference is statistically significant.
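A grouped comparison of this kind is usually read off a boxplot of age by political position. The following is a minimal sketch on made-up data (the toy data frame and its values are hypothetical, not our ESS sample), using base R’s tapply and boxplot.

```r
# Made-up ages for three position groups; not the ESS data.
toy <- data.frame(
  lrscale = factor(rep(c("Left", "Central", "Right"), each = 5),
                   levels = c("Left", "Central", "Right")),
  agea = c(45, 52, 58, 63, 70,
           40, 48, 55, 60, 68,
           38, 47, 54, 61, 69)
)

# Median age per group.
group_medians <- tapply(toy$agea, toy$lrscale, median)
group_medians

# Boxplot of age by group.
boxplot(agea ~ lrscale, data = toy,
        main = "Age by position on the left-right scale (toy data)",
        xlab = "Position", ylab = "Age")
```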
We want to use the ANOVA test. As mentioned, there are four conditions we should check before conducting the test:
1) Observations should be independent.
2) The observations should be normally distributed within groups.
3) Variances should be approximately equal.
4) The outcome variable is continuous and the grouping variable is categorical.
The observations are indeed independent: they are separate and do not overlap. The types of variables are also suitable for the test.
Let’s now check the equality of variances using Levene’s test.
leveneTest(serbiana2_gr$agea ~ serbiana2_gr$lrscale)
## Levene's Test for Homogeneity of Variance (center = median)
## Df F value Pr(>F)
## group 2 1.761 0.1726
## 698
The null hypothesis of this test is that the variances are equal. We cannot reject it, as the p-value is 0.17, which is above the significance level (0.05). Thus, we assume equal variances and can use regular ANOVA.
The null hypothesis is H0: Means of left, central, and right groups are equal.
The alternative hypothesis is H1: There is at least one group that differs from the other in terms of the mean.
test_lrage <- aov(agea ~ lrscale, data = serbiana2_gr)
summary(test_lrage)
## Df Sum Sq Mean Sq F value Pr(>F)
## lrscale 2 937 468.7 1.451 0.235
## Residuals 698 225496 323.1
The p-value is 0.235, which is not statistically significant (> 0.05), so we cannot reject the null hypothesis that the means are equal. As the test shows no sign of differences between the groups, we will not conduct post-hoc tests. But we can check the residuals to examine the normality assumption.
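The F value and p-value in the table above follow directly from the mean squares and degrees of freedom. A minimal sketch, using the rounded values printed by summary(test_lrage) and variable names of our own:

```r
# Mean squares and degrees of freedom from summary(test_lrage) above.
ms_between <- 468.7
ms_within  <- 323.1
df_between <- 2
df_within  <- 698

# F is the ratio of between-group to within-group mean squares.
f_value <- ms_between / ms_within

# Upper-tail probability of the F distribution.
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(F = f_value, p = p_value), 3)
```

An F value close to 1 means that the variation between groups is about as large as the variation within them, which is why the test does not reject H0.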
plot(test_lrage, 2)
layout(matrix(1:4, 2, 2))
plot(test_lrage)
anova.res.ess2 <- residuals(object = test_lrage)
shapiro.test(x = anova.res.ess2)
##
## Shapiro-Wilk normality test
##
## data: anova.res.ess2
## W = 0.97063, p-value = 1.195e-10
hist(anova.res.ess2)
The residuals are not normally distributed: the Q-Q plot shows them deviating from the diagonal line, the lines on the two upper graphs are not straight, the Shapiro-Wilk test gives a p-value below 0.05, and the histogram of residuals is also shifted. Since the normality assumption is violated, we can test the hypothesis again with a non-parametric test. One suitable test for our case is the Kruskal-Wallis test.
H0: the medians of all groups are equal.
H1: at least one population median of one group is different from the population median of at least one other group.
kruskal.test(agea ~ lrscale, data = serbiana2_gr)
##
## Kruskal-Wallis rank sum test
##
## data: agea by lrscale
## Kruskal-Wallis chi-squared = 3.4269, df = 2, p-value = 0.1802
This test gives us a p-value of 0.18, which confirms our previous finding.
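The Kruskal-Wallis p-value is the upper-tail probability of a chi-squared distribution with (number of groups - 1) degrees of freedom. As a sketch, the p-values of both Kruskal-Wallis tests reported in this document can be recovered from their printed statistics (variable names are ours):

```r
# Test statistics and degrees of freedom copied from the Kruskal-Wallis outputs.
kw_polintr <- c(statistic = 42.323, df = 3)  # age by interest in politics
kw_lrscale <- c(statistic = 3.4269, df = 2)  # age by left-right position

# Upper-tail probability of the chi-squared distribution.
p_polintr <- unname(pchisq(kw_polintr["statistic"], kw_polintr["df"],
                           lower.tail = FALSE))
p_lrscale <- unname(pchisq(kw_lrscale["statistic"], kw_lrscale["df"],
                           lower.tail = FALSE))

signif(c(p_polintr = p_polintr, p_lrscale = p_lrscale), 4)
```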
ggbetweenstats(data = serbiana2_gr, y = agea, x = lrscale, var.equal = F)
The p-value is too big and the effect size is too small. Thus, we cannot reject the null hypothesis or argue that the age of people who occupy different positions on the left-right scale differs in any way. Consequently, the apparent age difference of those on the left can be explained by random variation in the data.
Initially we had 5 hypotheses:
Tests on all of them, except the last one, show statistically significant results, although for age and interest in politics the direction was the opposite of what we hypothesised. In the other cases we confirm, using new data, the findings of the researchers whose paper we mentioned.
Our last test was based on an assumption we formed after making a graph in the first project. We used a scatter plot with geom_jitter, which moves points that share the same position away from each other, and this misrepresented the data. It was interesting to use statistics to check this hypothesis and correct the flaw.
library(foreign)
setwd("F:/ARRR")
ESS <- read.spss("ESS11.sav", use.value.labels = F, to.data.frame = T)
library(ggplot2)
library(dplyr)
library(DescTools)
library(formattable)
library(psych)
library(ggpubr)
library(car)
library(sjstats)
library(magrittr)
library(knitr)
library(kableExtra)
library(ggstatsplot)
library(corrplot)
library(sjPlot)
library(rstatix)
library(ggcorrplot)
library(GGally)
serbia <- ESS %>%
filter(cntry == "RS")
serbia1 <- select(serbia, gndr, polintr, agea, hinctnta, stfgov, edlvdrs, stfhlth)
serbia1 <- serbia1 %>%
filter(edlvdrs < 19)
serbia1$edlvdrs1 <- as.factor(ifelse(serbia1$edlvdrs %in% c(1, 2, 3), "Primary",
ifelse(serbia1$edlvdrs %in% c(4, 5, 6, 7, 8, 9, 10), "Secondary", "Higher")))
serbia1$edlvdrs1 <- factor(serbia1$edlvdrs1, levels = c("Primary",
"Secondary",
"Higher"))
serbia1$gndr <- as.factor(serbia1$gndr)
serbia1$gndr <- dplyr::recode(serbia1$gndr,
"1" = "Male",
"2" = "Female")
serbia1$polintr <- as.factor(serbia1$polintr)
serbia1$polintr <- dplyr::recode(serbia1$polintr,
"1" = "Very",
"2" = "Quite",
"3" = "Hardly",
"4" = "Not at all")
serbia1$polintr <- factor(serbia1$polintr, levels = c("Not at all","Hardly","Quite","Very"))
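The nested ifelse() calls used for the education recode can also be written with cut(), which maps numeric ranges to labels in one step. The following is a hedged sketch on a made-up vector of codes (not our data); it assumes, as in the code above, that codes 1-3 are Primary, 4-10 Secondary and 11-18 Higher.

```r
edu_codes <- c(1, 3, 4, 10, 11, 18)  # hypothetical edlvdrs codes

# Intervals are right-closed by default: (0, 3], (3, 10], (10, 18].
edu_group <- cut(edu_codes,
                 breaks = c(0, 3, 10, 18),
                 labels = c("Primary", "Secondary", "Higher"))

table(edu_group)
levels(edu_group)
```

Unlike the ifelse() version, any code outside the listed ranges becomes NA here instead of falling into the last category, which makes unexpected values easier to spot.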
In this project we will explore the following research question:
Which factors can explain one’s satisfaction with government in Serbia?
This topic is connected with our previous one via a general theme of politics. Variables we use as potential factors of influence on government satisfaction (stfgov) are: - agea - age of respondent, calculated - hinctnta - household’s total net income, all sources - stfhlth - satisfaction with Serbian health system - edlvdrs - level education (we have grouped it into three categories of Primary, Secondary, and Higher education, according to Sebian system of education) - gndr - gender - polintr - interest in politics
The first three variables and the outcome one (satisfaction with government) are used as numeric in this project: age is initially continuous, income, satisfaction with health system and satisfaction with government are quasi-interval, which were made interval for the sake of the project. There are not a lot of continuous variables in ESS11 fitting our topic, so this was our solution.
Our main reference this time is a study discovering the main factors affecting trust in government in Norway by Tom Christensen & Per Lægreid.Although we study satisfaction with government, we thought that these notions can be used interchangeably, as the meanings are close. In the authors’ terms, trust in government is understood as the trust in parliament, the cabinet, the civil service, local councils, political parties, and politicians, whereas trust is indicated via people’s satisfaction with specific public services. Their research provides a solid justification for some of our variables, namely: agea (older people generally have more trust in government), hinctnta (those approving of a country’s healthcare tend to approve the government) and edlvdrs (higher educated people are more critical of the government, resulting in less satisfaction level of this).
One study reports no relationship between gender and trust in government, but we want to check whether there is any association in our data without presuming its direction. Interest in politics was added because we were inspired by our previous project. We want to check whether it has any association with our outcome variable, as we found no papers on this pair of variables. We presume an inverse relationship: the more one is interested in politics, the less satisfied one is with the government.
The second source that we use is a multi-year dynamic study of family income and satisfaction (both political and general) in China by Zheng Yu et al. (2011). Their final conclusion is that average family income had a positive impact on satisfaction with government at the county level. Following their intuition, we consider hinctnta in our own analysis as well.
Overall, our research hypotheses are as follows:
The age distribution does not look normal: it has a heavier left tail and a lighter right tail, that is, there are more very young respondents than very old ones. The most frequent age is 71 (purple line on the plot). The mean (blue line) and the median (red line) are almost identical, at 52.5 and 53 respectively.
serbia$agea <- as.numeric(as.character(serbia$agea))
ggplot(serbia, aes(x = agea)) +
geom_histogram(binwidth = 1, fill = "grey80", color = "black") +
geom_vline(xintercept = mean(serbia$agea, na.rm = TRUE), col = 'blue', lwd = 1) +
geom_vline(xintercept = median(serbia$agea, na.rm = TRUE), col = 'red', lwd = 1) +
geom_vline(xintercept = Mode(serbia$agea, na.rm = TRUE), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Age", y = "Frequency",
title = "Age distribution in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey")) +
xlim(0,100)
median(as.numeric(serbia$agea), na.rm = T)
## [1] 53
mean(as.numeric(serbia$agea), na.rm = T)
## [1] 52.46598
Mode(as.numeric(serbia$agea), na.rm = T)
## [1] 71
## attr(,"freq")
## [1] 49
sd(as.numeric(serbia$agea), na.rm = T)
## [1] 18.10798
tabll_1 <- data.frame(mean = round(mean(as.numeric(serbia1$agea), na.rm = T), 2), median = median(as.numeric(serbia1$agea), na.rm = T), mode = Mode(as.numeric(serbia1$agea), na.rm = T), SD = round(sd(as.numeric(serbia$agea), na.rm = T), 2))
formattable(tabll_1)
mean | median | mode | SD |
---|---|---|---|
52.55 | 53.5 | 71 | 18.11 |
The next three variables are quasi-interval; we treat them as numeric.
The mode is 0: the largest single group of respondents is completely dissatisfied with the national government in Serbia. There is also a prominent bin in the middle of the scale (answer “5” in the questionnaire), probably chosen by the undecided. The mean and the median are close to each other, at 4.4 and 5 respectively.
ggplot(serbia1) +
geom_histogram(aes(x = as.numeric(serbia1$stfgov)), na.rm = T, binwidth = 1, color = "black") +
geom_vline(aes(xintercept = mean(as.numeric(serbia1$stfgov),na.rm = T)), col = 'blue', lwd = 1) +
geom_vline(aes(xintercept = median(as.numeric(serbia1$stfgov), na.rm = T)), col = 'red', lwd = 1) +
geom_vline(aes(xintercept = Mode(as.numeric(serbia1$stfgov), na.rm = T)), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Amount of Satisfaction", y = "frequency",
title = "Satisfaction with National Government in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tabll_2 <- data.frame(mean = round(mean(as.numeric(serbia1$stfgov), na.rm = T), 2), median = median(as.numeric(serbia1$stfgov), na.rm = T), mode = Mode(as.numeric(serbia1$stfgov), na.rm = T), SD = round(sd(as.numeric(serbia$stfgov), na.rm = T), 2))
formattable(tabll_2)
mean | median | mode | SD |
---|---|---|---|
4.44 | 5 | 0 | 3.2 |
median(as.numeric(serbia$stfgov), na.rm = T)
## [1] 5
mean(as.numeric(serbia$stfgov), na.rm = T)
## [1] 4.424221
Mode(as.numeric(serbia$stfgov), na.rm = T)
## [1] 0
## attr(,"freq")
## [1] 252
sd(as.numeric(serbia$stfgov), na.rm = T)
## [1] 3.201773
For satisfaction with the health system, the left part of the distribution is considerably heavier, that is, more respondents are dissatisfied than satisfied with the health system in Serbia. However, the most frequent answer is in the middle of the scale (5): as in the previous plot, many respondents are presumably undecided. The mean level of satisfaction is 4.2 and the median is 4.
ggplot(serbia1) +
geom_histogram(aes(x = as.numeric(serbia1$stfhlth)), na.rm = T, binwidth = 1, color = "black") +
geom_vline(aes(xintercept = mean(as.numeric(serbia1$stfhlth),na.rm = T)), col = 'blue', lwd = 1) +
geom_vline(aes(xintercept = median(as.numeric(serbia1$stfhlth), na.rm = T)), col = 'red', lwd = 1) +
geom_vline(aes(xintercept = Mode(as.numeric(serbia1$stfhlth), na.rm = T)), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Amount of Satisfaction", y = "frequency",
title = "Satisfaction with Health System in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tabll_3 <- data.frame(mean = round(mean(as.numeric(serbia1$stfhlth), na.rm = T), 2), median = median(as.numeric(serbia1$stfhlth), na.rm = T), mode = Mode(as.numeric(serbia1$stfhlth), na.rm = T), SD = round(sd(as.numeric(serbia$stfhlth), na.rm = T), 2))
formattable(tabll_3)
mean | median | mode | SD |
---|---|---|---|
4.25 | 4 | 5 | 2.85 |
median(as.numeric(serbia$stfhlth), na.rm = T)
## [1] 4
mean(as.numeric(serbia$stfhlth), na.rm = T)
## [1] 4.24722
Mode(as.numeric(serbia$stfhlth), na.rm = T)
## [1] 5
## attr(,"freq")
## [1] 241
sd(as.numeric(serbia$stfhlth), na.rm = T)
## [1] 2.851076
As the histogram of net income shows, the most frequent deciles among Serbian respondents are the 7th and the 6th, a moderate level of income leaning towards high. Next by frequency are the 5th and 4th deciles, a moderate level leaning towards low. There are also sizeable numbers of respondents in the 8th, 9th and 10th deciles, which constitute the rich and the richest part of the population of Serbia. Both the mean (5.8) and the median (6) of net income are close to 6.
ggplot(serbia1) +
geom_histogram(aes(x = as.numeric(serbia1$hinctnta)), na.rm = T, binwidth = 1, color = "black") +
geom_vline(aes(xintercept = mean(as.numeric(serbia1$hinctnta), na.rm = T)), col = 'blue', lwd = 1) +
geom_vline(aes(xintercept = median(as.numeric(serbia1$hinctnta), na.rm = T)), col = 'red', lwd = 1) +
geom_vline(aes(xintercept = Mode(as.numeric(serbia1$hinctnta), na.rm = T)), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Income", y = "frequency",
title = "Net-Income Distribution in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tabll_4 <- data.frame(mean = round(mean(as.numeric(serbia1$hinctnta), na.rm = T), 2), median = median(as.numeric(serbia1$hinctnta), na.rm = T), mode = Mode(as.numeric(serbia1$hinctnta), na.rm = T), SD = round(sd(as.numeric(serbia$hinctnta), na.rm = T), 2))
formattable(tabll_4)
mean | median | mode | SD |
---|---|---|---|
5.77 | 6 | 7 | 2.63 |
median(as.numeric(serbia$hinctnta), na.rm = T)
## [1] 6
mean(as.numeric(serbia$hinctnta), na.rm = T)
## [1] 5.771272
Mode(as.numeric(serbia$hinctnta), na.rm = T)
## [1] 7
## attr(,"freq")
## [1] 141
sd(as.numeric(serbia$hinctnta), na.rm = T)
## [1] 2.631316
Among the respondents from Serbia there are more females than males, which is quite common in surveys.
ggplot(serbia1) +
geom_bar(aes(x = gndr), na.rm = TRUE) +
scale_x_discrete(limits = c('Male', 'Female')) +
labs(title = "Gender of Respondents from Serbia",
x = NULL, y = NULL,
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
Mode(as.factor(serbia1$gndr), na.rm = T)
## [1] Female
## attr(,"freq")
## [1] 824
## Levels: Male Female
Following the Serbian system of education, we collapsed the initial 18 categories into three: Primary, Secondary and Higher education. The bar plot of education levels shows that secondary education is the most common last completed stage (884 respondents), with the remaining respondents split between primary and higher education.
ggplot(serbia1) +
geom_bar(aes(x = edlvdrs1), na.rm = TRUE) +
scale_x_discrete(limits = c('Primary', 'Secondary', 'Higher')) +
labs(title = "Education Levels of Respondents from Serbia",
x = NULL, y = NULL,
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
Mode(as.factor(serbia1$edlvdrs1), na.rm = T)
## [1] Secondary
## attr(,"freq")
## [1] 884
## Levels: Primary Secondary Higher
As the bar plot of interest in politics shows, most Serbian respondents are hardly interested in politics or not interested at all; “Hardly” is the most frequent answer (616 respondents). Fewer respondents are quite interested, and fewer still are very interested in politics.
ggplot(serbia1) +
geom_bar(aes(x = polintr), na.rm = TRUE) +
labs(x = NULL, y = NULL,
title = "Interest in Politics",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
Mode(as.factor(serbia1$polintr), na.rm = T)
## [1] Hardly
## attr(,"freq")
## [1] 616
## Levels: Not at all Hardly Quite Very
We constructed a correlation table to spot possible collinearity and to decide which numeric variables to include in our model. We use Spearman’s correlation because our data deviate too far from normality for a parametric correlation.
corr <- cor(serbia1[, c(3, 4, 5, 7)], method = "spearman", use = "complete.obs") # Compute a correlation matrix
p.mat <- cor_pmat(serbia1[, c(3, 4, 5, 7)], method = "spearman") # Compute a matrix of p-values with the same method as the correlations
ggcorrplot(corr, hc.order = TRUE, type = "lower", p.mat = p.mat,
insig = "blank",
lab = TRUE,
ggtheme = ggplot2::theme_gray,
colors = c("#6D9EC1", "white", "#E46726"))
We can see that both Satisfaction with the health system and Age have a moderate positive correlation with our outcome variable, Satisfaction with government.
A correlation with Household income is also present, but it is weak (< 0.3). Moreover, there is a moderate correlation between Household income and Age, which can probably be explained by capital accumulation. This possible collinearity and the weak correlation with the outcome variable led to our decision not to include Household income as a predictor in our models.
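To make the choice of Spearman’s correlation more concrete, here is a minimal sketch showing that Spearman’s rho is Pearson’s r computed on ranks, which is why it does not rely on normality. The data are synthetic and the variable names (`age`, `satisfaction`) are hypothetical; this is not the ESS sample.

```r
# Synthetic, hypothetical data: age in years and a 0-10 satisfaction item.
set.seed(1)
age <- sample(15:90, 200, replace = TRUE)
satisfaction <- pmin(10, pmax(0, round(age / 12 + rnorm(200, sd = 2))))

# Spearman's rho computed directly ...
rho_direct <- cor(age, satisfaction, method = "spearman")

# ... and as Pearson's r on the (average) ranks of both variables.
rho_ranks <- cor(rank(age), rank(satisfaction), method = "pearson")

print(c(direct = rho_direct, via_ranks = rho_ranks))
```

Because only the ordering of the answers is used, heavily skewed or bounded items such as 0-10 scales can be compared without assuming a particular distribution.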
Now we can visualise the correlations.
ggpairs(serbia1[, c(5, 3)],
columnLabels = c("Government Satisfaction", "Age"),
upper = list(continuous = wrap("cor", size = 8, col = "black")))
ggplot(serbia1, aes(agea, stfgov)) +
geom_jitter() +
geom_smooth() +
labs(x = "Age", y = "Government Satisfaction",
title = "Government Satisfaction and Age",
caption = "Source: ESS 11 Round")
ggpairs(serbia1[, c(5, 7)],
columnLabels = c("Government Satisfaction", "Health Service Satisfaction"),
upper = list(continuous = wrap("cor", size = 8, col = "black")))
serbia1_counts <- serbia1 %>%
count(stfhlth, stfgov)
ggscatter(serbia1_counts, x = "stfhlth", y = "stfgov",
size = "n",
add = "reg.line",
cor.coef = TRUE,
cor.method = "spearman",
ylim = c(0, 10),
alpha = 0.6,
xlab = "Health system satisfaction",
ylab = "Government Satisfaction",
title = "Government Satisfaction and Health system satisfaction",
caption = "Source: ESS 11 Round") +
scale_size_continuous(range = c(1, 10)) +
theme_minimal()
ggpairs(serbia1[, c(5, 4)],
columnLabels = c("Government Satisfaction", "Household income"),
upper = list(continuous = wrap("cor", size = 8, col = "black")))
ggplot(serbia1, aes(hinctnta, stfgov)) +
geom_jitter(alpha = 0.6) +
geom_smooth() +
labs(x = "Household income", y = "Government Satisfaction",
title = "Government Satisfaction and Household income",
caption = "Source: ESS 11 Round")
Now we can look at the categorical variables and their connection with Satisfaction with government. We have chosen Level of education (which we grouped into three categories: Primary, Secondary, and Higher), Gender, and Political interest.
boxplot(serbia1$stfgov ~ serbia1$edlvdrs1, main = "Satisfaction with government by level of education")
boxplot(serbia1$stfgov ~ serbia1$gndr, main = "Satisfaction with government by gender")
boxplot(serbia1$stfgov ~ serbia1$polintr, main = "Satisfaction with government by interest in politics")
At first glance, the plots show a substantial difference in satisfaction with government between people with different levels of education. The medians of the three boxes clearly differ: the third quartile of the “Secondary” box is at the same level as the “Primary” median, and the “Secondary” median is at the same level as the third quartile of the “Higher” box.
The plot of gender and satisfaction with government shows no visible differences. Both the medians and the interquartile ranges are the same, which suggests that the two groups do not differ in satisfaction with government. This could serve as a solid reason for excluding the variable from the model.
The plot of interest in politics and government satisfaction reveals that the four interest levels have approximately the same median satisfaction, but their interquartile ranges differ. The “Very” (interested) level stands out with a noticeably wider IQR than the others. This means that people deeply interested in politics have more diverse attitudes towards the government than those less immersed in this sphere.
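What the boxplots are read for here (equal medians, different spreads) can also be checked numerically. The sketch below uses small hypothetical toy data, not ESS values, to compute the median and the interquartile range per group.

```r
# Hypothetical toy data: four interest groups with the same median
# but different spread of satisfaction scores.
toy <- data.frame(
  interest = factor(rep(c("Not at all", "Hardly", "Quite", "Very"), each = 5),
                    levels = c("Not at all", "Hardly", "Quite", "Very")),
  satisfaction = c(4, 5, 5, 6, 5,
                   3, 5, 5, 6, 7,
                   4, 4, 5, 6, 6,
                   0, 2, 5, 8, 10)
)

# Median and IQR of satisfaction within each interest level.
group_median <- tapply(toy$satisfaction, toy$interest, median)
group_iqr <- tapply(toy$satisfaction, toy$interest, IQR)

print(rbind(median = group_median, IQR = group_iqr))
```

With the real data, the same two `tapply()` calls on `serbia1$stfgov` and `serbia1$polintr` (with `na.rm = TRUE` passed on) would give the numbers behind the boxes.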
Despite some of the conclusions we drew from the plots, we decided to include all of these variables in the models to explore their potential explanatory power.
We decided to include the following continuous variables as predictors in our models: Age (agea) and Satisfaction with Health Services (stfhlth), because they have moderate correlations with our outcome variable, Satisfaction with Government (stfgov). We also decided to include the categorical variables Gender (gndr), Level of Education (edlvdrs1) and Political Interest (polintr).
serbia1 <- serbia1[!is.na(serbia1$polintr), ]
m1 <- lm(stfgov ~ agea + stfhlth, data = serbia1)
summary(m1)
##
## Call:
## lm(formula = stfgov ~ agea + stfhlth, data = serbia1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.9527 -2.0166 -0.0729 1.9193 8.9768
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.168819 0.241065 -0.70 0.484
## agea 0.049667 0.004111 12.08 <2e-16 ***
## stfhlth 0.468017 0.025675 18.23 <2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.717 on 1402 degrees of freedom
## (144 observations deleted due to missingness)
## Multiple R-squared: 0.2802, Adjusted R-squared: 0.2792
## F-statistic: 272.9 on 2 and 1402 DF, p-value: < 2.2e-16
The first model has two predictors, age and satisfaction with health services. It explains 28% of the variance in satisfaction with government (R^2 = 0.2802), and it is significant because the p-value is below the 0.05 threshold. Satisfaction with government increases with age, by 0.05 points per year, and with satisfaction with health services, by 0.47 points per 1 point of the scale.
m2 <- lm(stfgov ~ agea + edlvdrs1 + stfhlth, data = serbia1)
summary(m2)
##
## Call:
## lm(formula = stfgov ~ agea + edlvdrs1 + stfhlth, data = serbia1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.620 -1.905 -0.026 1.856 8.376
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.319923 0.332468 3.970 7.55e-05 ***
## agea 0.043443 0.004127 10.526 < 2e-16 ***
## edlvdrs1Secondary -0.955330 0.212611 -4.493 7.59e-06 ***
## edlvdrs1Higher -1.769324 0.238851 -7.408 2.21e-13 ***
## stfhlth 0.439818 0.025467 17.270 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.666 on 1400 degrees of freedom
## (144 observations deleted due to missingness)
## Multiple R-squared: 0.3083, Adjusted R-squared: 0.3063
## F-statistic: 156 on 4 and 1400 DF, p-value: < 2.2e-16
The second model adds the level of education, which is also a significant predictor: compared with respondents with primary education, satisfaction with government is about 0.96 points lower for those with secondary education and about 1.77 points lower for those with higher education. R^2 also increases, to 0.3083.
m3 <- lm(stfgov ~ agea + stfhlth + edlvdrs1 + gndr, data = serbia1)
summary(m3)
##
## Call:
## lm(formula = stfgov ~ agea + stfhlth + edlvdrs1 + gndr, data = serbia1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.7350 -1.9320 0.0043 1.8503 8.2855
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 1.167205 0.350008 3.335 0.000876 ***
## agea 0.043490 0.004126 10.541 < 2e-16 ***
## stfhlth 0.442726 0.025545 17.332 < 2e-16 ***
## edlvdrs1Secondary -0.914347 0.214571 -4.261 2.17e-05 ***
## edlvdrs1Higher -1.734965 0.240044 -7.228 8.05e-13 ***
## gndrFemale 0.200466 0.144058 1.392 0.164275
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.665 on 1399 degrees of freedom
## (144 observations deleted due to missingness)
## Multiple R-squared: 0.3093, Adjusted R-squared: 0.3068
## F-statistic: 125.3 on 5 and 1399 DF, p-value: < 2.2e-16
In the third model we added gender as a predictor, but it is not significant (p = 0.164), as the boxplot already suggested.
m4 <- lm(stfgov ~ agea + edlvdrs1 + stfhlth + gndr + polintr, data = serbia1)
summary(m4)
##
## Call:
## lm(formula = stfgov ~ agea + edlvdrs1 + stfhlth + gndr + polintr,
## data = serbia1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -6.5717 -2.0072 -0.0075 1.8321 8.4989
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.987762 0.356571 2.770 0.00568 **
## agea 0.041809 0.004189 9.980 < 2e-16 ***
## edlvdrs1Secondary -0.964273 0.214903 -4.487 7.82e-06 ***
## edlvdrs1Higher -1.827657 0.242736 -7.529 9.11e-14 ***
## stfhlth 0.438764 0.025536 17.182 < 2e-16 ***
## gndrFemale 0.265115 0.146021 1.816 0.06965 .
## polintrHardly 0.424833 0.168628 2.519 0.01187 *
## polintrQuite 0.525443 0.215768 2.435 0.01501 *
## polintrVery 0.417138 0.280969 1.485 0.13786
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 2.659 on 1396 degrees of freedom
## (144 observations deleted due to missingness)
## Multiple R-squared: 0.3135, Adjusted R-squared: 0.3095
## F-statistic: 79.68 on 8 and 1396 DF, p-value: < 2.2e-16
In the fourth model we added Political Interest, which is only weakly significant: the “Hardly” and “Quite” levels are significant at the 0.05 level, while “Very” is not. The relationship is not linear: people who are quite interested in politics are more satisfied than people who are very interested. The only strongly significant variables are age, satisfaction with health services and level of education, and their coefficients changed only slightly from the previous model. R^2 increases to 0.3135, which means that this model explains about one-third of the variance in our outcome.
tab_model(m1, m2, m3, m4, show.ci = F)
 | stfgov | | stfgov | | stfgov | | stfgov | |
---|---|---|---|---|---|---|---|---|
Predictors | Estimates | p | Estimates | p | Estimates | p | Estimates | p |
(Intercept) | -0.17 | 0.484 | 1.32 | <0.001 | 1.17 | 0.001 | 0.99 | 0.006 |
agea | 0.05 | <0.001 | 0.04 | <0.001 | 0.04 | <0.001 | 0.04 | <0.001 |
stfhlth | 0.47 | <0.001 | 0.44 | <0.001 | 0.44 | <0.001 | 0.44 | <0.001 |
edlvdrs1 [Secondary] | | | -0.96 | <0.001 | -0.91 | <0.001 | -0.96 | <0.001 |
edlvdrs1 [Higher] | | | -1.77 | <0.001 | -1.73 | <0.001 | -1.83 | <0.001 |
gndr [Female] | | | | | 0.20 | 0.164 | 0.27 | 0.070 |
polintr [Hardly] | | | | | | | 0.42 | 0.012 |
polintr [Quite] | | | | | | | 0.53 | 0.015 |
polintr [Very] | | | | | | | 0.42 | 0.138 |
Observations | 1405 | | 1405 | | 1405 | | 1405 | |
R2 / R2 adjusted | 0.280 / 0.279 | | 0.308 / 0.306 | | 0.309 / 0.307 | | 0.313 / 0.310 | |
We can see that R^2 and adjusted R^2 increase as new variables are added, which means that the explanatory power of the model grows too. But this effect could simply be caused by the growing number of variables, so to find the best model we need to compare the models with ANOVA.
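The concern that R^2 grows just because predictors are added is what adjusted R^2 corrects for. As a rough check, the sketch below recomputes adjusted R^2 from the R^2 values printed in the model summaries, using the standard formula 1 - (1 - R^2)(n - 1)/(n - p - 1); the helper name `adj_r2` is our own and not part of any package.

```r
# Adjusted R^2 from R^2, sample size n and number of estimated slopes p.
adj_r2 <- function(r2, n, p) 1 - (1 - r2) * (n - 1) / (n - p - 1)

n <- 1405  # observations used in every model

# R^2 values as printed in the summaries; p counts the dummy-coded slopes.
models <- data.frame(
  model = c("m1", "m2", "m3", "m4"),
  r2    = c(0.2802, 0.3083, 0.3093, 0.3135),
  p     = c(2, 4, 5, 8)
)
models$adj_r2 <- adj_r2(models$r2, n, models$p)

print(models)
```

The recomputed values agree with the reported adjusted R^2 up to rounding, and they rise far more slowly from m2 to m4 than from m1 to m2.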
plot_models(m1, m2, m3, m4) + scale_y_continuous(limits = c(-1.9, 1.9))
The plot above simply visualises the comparison of the estimates across the models.
Now we can find out which model is better. The number of observations must be equal across the models, which is why we deleted the NAs in the Political Interest variable (polintr): because of them, the number of observations in the last model differed by 3 from the others.
anova(m1, m2, m3, m4)
## Analysis of Variance Table
##
## Model 1: stfgov ~ agea + stfhlth
## Model 2: stfgov ~ agea + edlvdrs1 + stfhlth
## Model 3: stfgov ~ agea + stfhlth + edlvdrs1 + gndr
## Model 4: stfgov ~ agea + edlvdrs1 + stfhlth + gndr + polintr
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1402 10351.5
## 2 1400 9947.7 2 403.82 28.5470 7.063e-13 ***
## 3 1399 9933.9 1 13.75 1.9441 0.16345
## 4 1396 9873.6 3 60.27 2.8403 0.03677 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Here we can see that model 4 has the smallest RSS, and its p-value (0.037) shows that the improvement over model 3 is statistically significant, so we consider the 4th model the best and analyse it further.
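To show where the F values in the ANOVA table come from, the sketch below recomputes them from the printed RSS and residual degrees of freedom. It assumes, as the printed numbers suggest, that each drop in RSS is scaled by the residual mean square of the largest model (model 4); small differences from the table come from rounding of the printed RSS.

```r
# RSS and residual degrees of freedom as printed in the ANOVA table.
rss    <- c(m1 = 10351.5, m2 = 9947.7, m3 = 9933.9, m4 = 9873.6)
res_df <- c(m1 = 1402, m2 = 1400, m3 = 1399, m4 = 1396)

# Residual mean square of the largest model, used as the common denominator.
scale <- rss[["m4"]] / res_df[["m4"]]

# Reduction in RSS and in residual df at each step (m1->m2, m2->m3, m3->m4).
ss_drop <- -diff(rss)
df_drop <- -diff(res_df)

# F statistic and upper-tail p-value for each added block of predictors.
f_stat <- (ss_drop / df_drop) / scale
p_val  <- pf(f_stat, df1 = df_drop, df2 = res_df[["m4"]], lower.tail = FALSE)

print(round(cbind(df = df_drop, F = f_stat, p = p_val), 4))
```

Each row tests only the predictors added at that step, so a significant last row says that political interest improves on model 3, not that model 4 is best in every respect.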
Now that we have chosen the model to work with, it is useful to describe it in detail.
The fourth model explains 31.4% of the variance in the outcome, and the intercept is 0.988.
Regarding the predictors: age, education and healthcare satisfaction are clearly significant with p < 0.001, while political interest has mixed effects, with only two of its levels (“Hardly” and “Quite”) being statistically significant.
Let’s look through the coefficients of the estimates to get a deeper understanding of the model. If a variable has several levels that are not homogeneous in terms of statistical significance, we include only those levels that are significant according to the model.
Age (agea): +0.042
– Each additional year of age increases satisfaction by 0.042 points.
Education (edlvdrs1)
Secondary: -0.964
Higher: -1.828
– People with secondary education are 0.96 points less satisfied than those with primary education.
– People with higher education are 1.83 points less satisfied than those with primary education.
Healthcare Satisfaction (stfhlth): +0.439
– A 1-point increase in healthcare satisfaction leads to a 0.44-point increase in government satisfaction.
Gender (gndrFemale): +0.265
– Women are 0.27 points more satisfied than men, but the difference is only borderline significant (p = 0.07).
Political Interest (polintr)
“Hardly” Interested: +0.425
“Quite” Interested: +0.525
– People who are “Hardly” politically interested are 0.43 points more satisfied than those “Not at all” interested.
– People who are “Quite” politically interested are 0.53 points more satisfied than the reference group.
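The phrase “reference group” follows from how R codes a factor in a regression: the first level gets no column of its own, and every coefficient is a difference from it. A minimal sketch with a hypothetical four-row factor that only reuses the level names of polintr:

```r
# One hypothetical respondent per level, levels in the same order as polintr.
polintr <- factor(c("Not at all", "Hardly", "Quite", "Very"),
                  levels = c("Not at all", "Hardly", "Quite", "Very"))

# Design matrix that lm() would build for this factor (treatment coding).
dummies <- model.matrix(~ polintr)

print(dummies)
```

The row for “Not at all” has zeros in all three dummy columns, so its expected satisfaction is carried by the intercept, and the “Hardly”, “Quite” and “Very” coefficients are shifts relative to it.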
plot_model(m4, type = "pred")
## $agea
##
## $edlvdrs1
##
## $stfhlth
##
## $gndr
##
## $polintr
We decided to plot our model to observe the effects of the variables visually. These plots reveal some useful tendencies:
– The higher the age, the higher the satisfaction with government;
– The higher the education, the lower the satisfaction with government;
– The higher the satisfaction with the healthcare system, the higher the satisfaction with government;
– Females are generally more satisfied with government than males, though the difference is not significant;
– Those who are not at all interested in politics are less satisfied with government than those with higher levels of interest. The other 3 levels sit at approximately the same level of satisfaction, but those “quite” interested in politics are generally the most satisfied.
\[ \widehat{stfgov} = 0.99 + 0.04 \cdot agea + 0.44 \cdot stfhlth - 0.96 \cdot edlvdrs1_{[Secondary]} - 1.83 \cdot edlvdrs1_{[Higher]} + 0.42 \cdot polintr_{[Hardly]} + 0.53 \cdot polintr_{[Quite]} \]
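As a worked example of the equation, the sketch below plugs a respondent profile into the coefficients printed by summary(m4). Unlike the equation, it keeps the non-significant terms (gender and the “Very” level), since all of them enter the fitted values. The function name `predict_stfgov` and the example profile are hypothetical.

```r
# Coefficients copied from the printed summary of the fourth model.
coefs <- c(intercept = 0.987762, agea = 0.041809, stfhlth = 0.438764,
           secondary = -0.964273, higher = -1.827657, female = 0.265115,
           hardly = 0.424833, quite = 0.525443, very = 0.417138)

# Predicted satisfaction for one respondent; reference levels contribute 0.
predict_stfgov <- function(age, stfhlth, education = "Primary",
                           gender = "Male", interest = "Not at all") {
  coefs[["intercept"]] +
    coefs[["agea"]] * age +
    coefs[["stfhlth"]] * stfhlth +
    switch(education, Primary = 0,
           Secondary = coefs[["secondary"]], Higher = coefs[["higher"]]) +
    switch(gender, Male = 0, Female = coefs[["female"]]) +
    switch(interest, "Not at all" = 0, Hardly = coefs[["hardly"]],
           Quite = coefs[["quite"]], Very = coefs[["very"]])
}

# Hypothetical profile: a 50-year-old woman with secondary education,
# health system satisfaction of 5, quite interested in politics.
example <- predict_stfgov(50, 5, "Secondary", "Female", "Quite")
print(round(example, 2))
```

With the fitted model object at hand, predict(m4, newdata = ...) would give the same kind of value without copying coefficients by hand.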
tab_model(m4, show.ci = F)
stfgov | ||
---|---|---|
Predictors | Estimates | p |
(Intercept) | 0.99 | 0.006 |
agea | 0.04 | <0.001 |
edlvdrs1 [Secondary] | -0.96 | <0.001 |
edlvdrs1 [Higher] | -1.83 | <0.001 |
stfhlth | 0.44 | <0.001 |
gndr [Female] | 0.27 | 0.070 |
polintr [Hardly] | 0.42 | 0.012 |
polintr [Quite] | 0.53 | 0.015 |
polintr [Very] | 0.42 | 0.138 |
Observations | 1405 | |
R2 / R2 adjusted | 0.313 / 0.310 |
Our model explains about a third (R^2 = 0.3135) of the variance in our outcome variable (stfgov) and finds a connection with 4 significant variables: Age (agea), Level of Education (edlvdrs1), Satisfaction with Health Services (stfhlth) and Interest in politics (polintr, levels [Hardly] and [Quite]). Except for polintr [Very], all of their p-values are below the 0.05 threshold.
Level of Education appears to be our best predictor, as its two levels have the largest estimates in absolute value in the model: edlvdrs1 [Secondary] -0.96, edlvdrs1 [Higher] -1.83. With both estimates being negative, we confirm once again that people with higher education levels tend to be less satisfied with the government than less educated people.
We had the following research question:
Which factors can explain one’s satisfaction with the government in Serbia?
According to our regression models, the significant factors of influence are: age, level of education, satisfaction with the health services in the country and interest in politics.
We also had five hypotheses:
1) Older people are more satisfied with government than younger people;
2) Citizens satisfied with the healthcare system are more satisfied with the government than those who are not;
3) More highly educated citizens are less satisfied with the government than those with a lower educational level;
4) Citizens with higher household net income are more satisfied with the government than those with lower income;
5) Those more interested in politics have lower satisfaction with government than those who are less interested.
Hypotheses 1, 2, and 3 were supported by our findings. Hypothesis 4 was not tested in the regression models, since household income was excluded after the correlation analysis. Hypothesis 5 was not supported: the significant coefficients for interest in politics are positive.
Christensen, T., & Lægreid, P. (2005). Trust in Government: The Relative Importance of Service Satisfaction, Political Factors, and Demography. Public Performance & Management Review, 28(4), 487–511. http://www.jstor.org/stable/3381308
Yu, Z., Bo, W., & Shu, L. (2011). The dynamic relationship between satisfaction with local government, family income, and life satisfaction in China: A 6-year perspective. 2011 International Conference on Management Science & Engineering 18th Annual Conference Proceedings, Rome, Italy, 1207–1214.
library(foreign)
ESS <- read.spss("F:/ARRR/ESS11.sav", use.value.labels = T, to.data.frame = T)
library(ggplot2)
library(dplyr)
library(DescTools)
library(formattable)
library(psych)
library(ggpubr)
library(car)
library(sjstats)
library(magrittr)
library(knitr)
library(kableExtra)
library(ggstatsplot)
library(corrplot)
library(sjPlot)
library(rstatix)
library(ggcorrplot)
library(GGally)
serbia <- ESS %>%
filter(cntry == "Serbia")
serbia1 <- select(serbia, gndr, agea, stfgov, stfhlth, edlvdrs, rlgdgr, lrscale, trstprl)
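# Caution: these columns are factors, so as.numeric() returns the level codes (1, 2, 3, ...),
# not the original values. The 0-10 scales are therefore stored as 1-11, and age is stored
# as a code that starts at 1 for the youngest age in the data.
serbia1$agea <- as.numeric(serbia1$agea)
serbia1$stfgov <- as.numeric(serbia1$stfgov)
serbia1$edlvdrs <- as.numeric(serbia1$edlvdrs)
serbia1$rlgdgr <- as.numeric(serbia1$rlgdgr)
serbia1$stfhlth <- as.numeric(serbia1$stfhlth)
serbia1$lrscale <- as.numeric(serbia1$lrscale)
serbia1$trstprl <- as.numeric(serbia1$trstprl)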
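Since the conversions above apply as.numeric() to factors, it may help to see what that does on a small example. The sketch below uses hypothetical factors with made-up end-point labels that only imitate a labelled 0-10 survey item and an age variable; it is not the ESS data.

```r
# Hypothetical 0-10 item stored as a factor with labelled end points.
lv  <- c("Extremely dissatisfied", as.character(1:9), "Extremely satisfied")
stf <- factor(c("Extremely dissatisfied", "5", "Extremely satisfied", "3"),
              levels = lv)

codes      <- as.numeric(stf)      # level codes on a 1-11 coding
scale_0_10 <- as.numeric(stf) - 1  # shifted back to the 0-10 scale

# Hypothetical age factor whose levels are the ages 15 to 90 as text.
age <- factor(c("15", "42", "71"), levels = as.character(15:90))

age_codes <- as.numeric(age)                # position of each level
age_years <- as.numeric(as.character(age))  # the ages themselves

print(rbind(codes = codes, scale_0_10 = scale_0_10))
print(rbind(age_codes = age_codes, age_years = age_years))
```

Going through as.character() works only when every level is a number written as text, which is why the labelled scale is shifted by one instead.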
For this project we will construct a regression model with political satisfaction as an outcome variable and use the following factors:
age,
level of education,
satisfaction with health,
religiosity.
We will also include an interaction between religiosity and gender, as the existing literature suggests a connection between these factors and satisfaction with government.
ggplot(serbia1) +
geom_histogram(aes(x = as.numeric(as.character(serbia1$agea))), na.rm = T) +
geom_vline(aes(xintercept = mean(as.numeric(as.character(serbia1$agea)),na.rm = T)), col = 'blue', lwd = 1) +
geom_vline(aes(xintercept = median(as.numeric(as.character(serbia1$agea)), na.rm = T)), col = 'red', lwd = 1) +
geom_vline(aes(xintercept = Mode(as.numeric(as.character(serbia1$agea)), na.rm = T)), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Age", y = "frequency",
title = "Age Distribution in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tabll_1 <- data.frame(mean = round(mean(as.numeric(serbia1$agea), na.rm = T), 2), median = median(as.numeric(serbia1$agea), na.rm = T), mode = Mode(as.numeric(serbia1$agea), na.rm = T), sd = round(sd(serbia1$agea, na.rm = T), 2))
formattable(tabll_1)
mean | median | mode | sd |
---|---|---|---|
38.47 | 39 | 57 | 18.11 |
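These values are factor level codes rather than years, because age was converted with as.numeric() from a factor whose first level is the youngest age in the sample, 15. In years, the median age of the respondents is 53 and the most frequent age is 71, in line with the summary statistics for age computed earlier in this document.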
ggplot(serbia1) +
geom_histogram(aes(x = as.numeric(serbia1$stfgov)), na.rm = T, binwidth = 1, color = "black") +
geom_vline(aes(xintercept = mean(as.numeric(serbia1$stfgov),na.rm = T)), col = 'blue', lwd = 1) +
geom_vline(aes(xintercept = median(as.numeric(serbia1$stfgov), na.rm = T)), col = 'red', lwd = 1) +
geom_vline(aes(xintercept = Mode(as.numeric(serbia1$stfgov), na.rm = T)), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Amount of Satisfaction", y = "frequency",
title = "Satisfaction with National Government in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tabll_2 <- data.frame(mean = round(mean(as.numeric(serbia1$stfgov), na.rm = T), 2), median = median(as.numeric(serbia1$stfgov), na.rm = T), mode = Mode(as.numeric(serbia1$stfgov), na.rm = T), sd = round(sd(serbia1$stfgov, na.rm = T), 2))
formattable(tabll_2)
mean | median | mode | sd |
---|---|---|---|
5.42 | 6 | 1 | 3.2 |
Satisfaction with government is moderate in Serbia: the median is 6 and the mode is 1 on the 1-11 coding used here, which corresponds to 5 and 0 on the original 0-10 scale. The mode signals that a large number of people are completely dissatisfied with the government.
ggplot(serbia1) +
geom_histogram(aes(x = as.numeric(serbia1$stfhlth)), na.rm = T, binwidth = 1, color = "black") +
geom_vline(aes(xintercept = mean(as.numeric(serbia1$stfhlth),na.rm = T)), col = 'blue', lwd = 1) +
geom_vline(aes(xintercept = median(as.numeric(serbia1$stfhlth), na.rm = T)), col = 'red', lwd = 1) +
geom_vline(aes(xintercept = Mode(as.numeric(serbia1$stfhlth), na.rm = T)), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Amount of Satisfaction", y = "frequency",
title = "Satisfaction with Health System in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tabll_3 <- data.frame(mean = round(mean(as.numeric(serbia1$stfhlth), na.rm = T), 2), median = median(as.numeric(serbia1$stfhlth), na.rm = T), mode = Mode(as.numeric(serbia1$stfhlth), na.rm = T), sd = round(sd(serbia1$stfhlth, na.rm = T), 2))
formattable(tabll_3)
mean | median | mode | sd |
---|---|---|---|
5.25 | 5 | 6 | 2.85 |
People are not very satisfied with the health system: the median (5 on the 1-11 coding used here, that is, 4 on the original 0-10 scale) is below the midpoint of the scale.
ggplot(serbia1) +
geom_histogram(aes(x = as.numeric(serbia1$rlgdgr)), na.rm = T, binwidth = 1, color = "black") +
geom_vline(aes(xintercept = mean(as.numeric(serbia1$rlgdgr),na.rm = T)), col = 'blue', lwd = 1) +
geom_vline(aes(xintercept = median(as.numeric(serbia1$rlgdgr), na.rm = T)), col = 'red', lwd = 1) +
geom_vline(aes(xintercept = Mode(as.numeric(serbia1$rlgdgr), na.rm = T)), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Religiosity", y = "frequency",
title = "Religiosity of people in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tabll_5 <- data.frame(mean = round(mean(as.numeric(serbia1$rlgdgr), na.rm = T), 2), median = median(as.numeric(serbia1$rlgdgr), na.rm = T), mode = Mode(as.numeric(serbia1$rlgdgr), na.rm = T), sd = round(sd(serbia1$rlgdgr, na.rm = T), 2))
formattable(tabll_5)
mean | median | mode | sd |
---|---|---|---|
7.23 | 8 | 6 | 2.85 |
The median Serbian respondent is fairly religious, but there is also a peak at the left end of the scale, formed by non-religious respondents.
ggplot(serbia1) +
geom_histogram(aes(x = as.numeric(trstprl)), na.rm = TRUE, binwidth = 1, color = "black") +
# vertical lines: mean (blue), median (red), mode (purple)
geom_vline(aes(xintercept = mean(as.numeric(trstprl), na.rm = TRUE)), col = 'blue', lwd = 1) +
geom_vline(aes(xintercept = median(as.numeric(trstprl), na.rm = TRUE)), col = 'red', lwd = 1) +
geom_vline(aes(xintercept = Mode(as.numeric(trstprl), na.rm = TRUE)), col = 'purple', lwd = 1) +
theme_bw() +
labs(x = "Trust in parliament", y = "frequency",
title = "Trust in Parliament in Serbia",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
# descriptive statistics for trust in parliament
tabll_6 <- data.frame(mean = round(mean(as.numeric(serbia1$trstprl), na.rm = TRUE), 2),
median = median(as.numeric(serbia1$trstprl), na.rm = TRUE),
mode = Mode(as.numeric(serbia1$trstprl), na.rm = TRUE),
sd = round(sd(as.numeric(serbia1$trstprl), na.rm = TRUE), 2))
formattable(tabll_6)
mean | median | mode | sd |
---|---|---|---|
5.2 | 6 | 1 | 3.1 |
Most respondents report neither clearly high nor clearly low trust, but there is a peak at the left end of the scale. We suggest that these are people who dislike the current Serbian government and support the opposition.
ggplot(serbia1) +
geom_bar(aes(x = gndr), na.rm = TRUE) +
scale_x_discrete(limits = c('Male', 'Female')) +
labs(title = "Gender of Respondents from Serbia",
x = NULL, y = NULL,
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tab1 <- data.frame(Mode = Mode(serbia1$gndr, na.rm = T))
formattable(tab1)
Mode |
---|
Female |
There are more females than males among the Serbian respondents.
serbia1$edlvdrs1 <- as.factor(ifelse(serbia1$edlvdrs %in% c(1, 2, 3), "Primary",
ifelse(serbia1$edlvdrs %in% c(4, 5, 6, 7, 8, 9, 10), "Secondary", "Higher")))
serbia1$edlvdrs1 <- factor(serbia1$edlvdrs1, levels = c("Primary",
"Secondary",
"Higher"))
ggplot(serbia1) +
geom_bar(aes(x = edlvdrs1), na.rm = TRUE) +
scale_x_discrete(limits = c('Primary', 'Secondary', 'Higher')) +
labs(title = "Education Levels of Respondents from Serbia",
x = NULL, y = NULL,
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tab2 <- data.frame(Mode = Mode(serbia1$edlvdrs1, na.rm = T))
formattable(tab2)
Mode |
---|
Secondary |
As we can see, more than half of the respondents have secondary or higher education.
ggplot(serbia1) +
geom_bar(aes(x = as.numeric(lrscale)), na.rm = TRUE) +
scale_x_discrete(limits = c('1', '2', '3', '4', '5', '6', '7', '8', '9', '10', '11')) +
labs(x = "1 - Far Left, 11 - Far Right", y = NULL,
title = "Political Orientation Scale",
caption = "Source: ESS 11 Round") +
theme(plot.title = element_text(face = "bold", size = 20, vjust = 0.5),
panel.grid.minor = element_line(color = "grey"))
tabll_4 <- data.frame(mean = round(mean(as.numeric(serbia1$lrscale), na.rm = T), 2), median = median(as.numeric(serbia1$lrscale), na.rm = T), mode = Mode(as.numeric(serbia1$lrscale), na.rm = T), sd = round(sd(as.numeric(serbia1$lrscale), na.rm = T), 2))
formattable(tabll_4)
mean | median | mode | sd |
---|---|---|---|
5.66 | 6 | 6 | 2.56 |
Most respondents are centrists who prefer not to choose between left and right. There are also some radicals at both the left and the right end of the scale.
For our interaction model we want to test the relationship between satisfaction with government and religiosity, moderated by gender.
Chloe Vaughn (2022) states that political trust is correlated with religious involvement: the more a person is apt to participate in religious activities, the higher their political trust.
Instead of political trust, we take government satisfaction as the dependent variable in our model. The two are intuitively about the same thing, but for the sake of scientific accuracy let us check whether they are sufficiently correlated.
serbia1 <- serbia1 %>%
filter(serbia1$edlvdrs < 19)
corr <- cor(serbia1[, c(2:8)], method = "spearman", use = "complete.obs")
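# p-values from the same Spearman test as the coefficients
# (cor_pmat forwards extra arguments to cor.test; the default would be Pearson)
p.mat <- cor_pmat(serbia1[, c(2:8)], method = "spearman", exact = FALSE)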
ggcorrplot(corr, hc.order = TRUE, type = "lower", p.mat = p.mat,
insig = "blank",
lab = TRUE,
ggtheme = ggplot2::theme_gray,
colors = c("#6D9EC1", "white", "#E46726"))
In the ESS dataset the closest variable to political trust is trust in parliament. As the correlation plot shows, the correlation coefficient between trust in parliament and government satisfaction is 0.75, which is very high. The two variables are close to collinear and measure largely the same thing, so we can apply the above-mentioned theory to government satisfaction.
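A single pair of items can also be checked directly with a rank correlation test. The sketch below uses simulated data, not the ESS variables: `trust` and `satisfaction` are made-up stand-ins for two 0-10 items, and the strength of their relationship is an assumption built into the simulation. It only illustrates the call to `cor.test()` with `method = "spearman"`.

```r
set.seed(1)

# Simulated stand-ins for two 0-10 survey items (not ESS data)
trust <- sample(0:10, 200, replace = TRUE)
noise <- sample(-2:2, 200, replace = TRUE)
satisfaction <- pmin(pmax(trust + noise, 0), 10)  # keep values inside 0-10

# Spearman rank correlation; exact = FALSE because survey items have many ties
res <- cor.test(trust, satisfaction, method = "spearman", exact = FALSE)

rho <- unname(res$estimate)  # Spearman's rho
p_value <- res$p.value       # test of H0: rho = 0
print(c(rho = rho, p = p_value))
```

With the real data the two vectors would be replaced by `as.numeric(serbia1$trstprl)` and `as.numeric(serbia1$stfgov)`.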
The study by Julia Zinkina, Marina Butovskaya, Sergey Shulgin, and Andrey Korotayev (2024) provides evidence for an interaction between religiosity and gender. Using data from the World Values Survey, the researchers find that women tend to attribute higher value to religiosity than men in almost all countries (except the Middle East and North Africa).
Therefore, our hypothesis, based on these findings, is that religiosity correlates positively with satisfaction with government and that the effect is stronger for females, as they are in general more religious.
serbia1 <- serbia1 %>%
filter(serbia1$edlvdrs < 19)
serbia1$edlvdrs1 <- as.factor(ifelse(serbia1$edlvdrs %in% c(1, 2, 3), "Primary",
ifelse(serbia1$edlvdrs %in% c(4, 5, 6, 7, 8, 9, 10), "Secondary", "Higher")))
serbia1$edlvdrs1 <- factor(serbia1$edlvdrs1, levels = c("Primary",
"Secondary",
"Higher"))
serbia1$lrscale <- as.factor(serbia1$lrscale)
levels(serbia1$lrscale) <- c("Left", "Left", "Left", "Left", "Central", "Central", "Central", "Right", "Right", "Right", "Right")
m1 <- lm(stfgov ~ agea + edlvdrs1 + stfhlth + rlgdgr + gndr, data = serbia1)
m2 <- lm(stfgov ~ agea + edlvdrs1 + stfhlth + rlgdgr * gndr, data = serbia1)
tab_model(m1, m2, show.ci = F)
Predictors | m1 (stfgov): Estimates | m1: p | m2 (stfgov): Estimates | m2: p |
---|---|---|---|---|
(Intercept) | 1.10 | 0.003 | 1.46 | <0.001 |
agea | 0.04 | <0.001 | 0.04 | <0.001 |
edlvdrs1 [Secondary] | -0.83 | <0.001 | -0.81 | <0.001 |
edlvdrs1 [Higher] | -1.56 | <0.001 | -1.54 | <0.001 |
stfhlth | 0.43 | <0.001 | 0.43 | <0.001 |
rlgdgr | 0.17 | <0.001 | 0.12 | 0.001 |
gndr [Female] | 0.09 | 0.520 | -0.62 | 0.107 |
rlgdgr × gndr [Female] | | | 0.10 | 0.046 |
Observations | 1393 | | 1393 | |
R2 / R2 adjusted | 0.328 / 0.325 | | 0.330 / 0.327 | |
Both models contain significant predictors: the higher the age, the satisfaction with the health system, and the religiosity, the higher the satisfaction with government. In the second model the main effect of religiosity remains significant but is smaller (0.12 instead of 0.17); because of the interaction term, it now describes the effect of religiosity for males, the reference category. The main effect of gender becomes negative but stays non-significant (p = 0.107); it describes the difference between females and males where the religiosity score equals zero. In both models, the higher the education, the less satisfied people are with their government. Each model explains about one third of the variance in satisfaction with government (R² of 0.328 and 0.330 for models 1 and 2, respectively). The second model also shows a significant interaction between religiosity and gender (p = 0.046), which we explore later.
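The adjusted R² values in the model table can be re-derived from the reported R², the number of observations and the number of estimated slopes. The sketch below is only a consistency check on the rounded figures from the table; `adj_r2` is a hypothetical helper, and the predictor counts (6 for the first model, 7 for the second, counting each education dummy separately) are read off the table rows.

```r
# Hypothetical helper: adjusted R^2 from R^2, sample size n and
# number of slope coefficients p (intercept not counted)
adj_r2 <- function(r2, n, p) {
  1 - (1 - r2) * (n - 1) / (n - p - 1)
}

# Rounded values taken from the regression table
adj_m1 <- adj_r2(0.328, n = 1393, p = 6)  # model without interaction
adj_m2 <- adj_r2(0.330, n = 1393, p = 7)  # model with rlgdgr x gndr

print(round(c(m1 = adj_m1, m2 = adj_m2), 3))
```

Both results agree with the table after rounding to three decimals, which shows that the small gain in R² survives the penalty for the extra interaction term.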
anova(m1, m2)
## Analysis of Variance Table
##
## Model 1: stfgov ~ agea + edlvdrs1 + stfhlth + rlgdgr + gndr
## Model 2: stfgov ~ agea + edlvdrs1 + stfhlth + rlgdgr * gndr
## Res.Df RSS Df Sum of Sq F Pr(>F)
## 1 1386 9547.0
## 2 1385 9519.7 1 27.294 3.9709 0.04649 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
According to the ANOVA, the second model fits significantly better than the first: its RSS is lower (9519.7 against 9547.0) and the p-value of the F-test (0.046) is below the 0.05 threshold. The R² of the second model is also slightly higher.
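The F statistic and p-value in the ANOVA table follow from the residual sums of squares and degrees of freedom. The sketch below recomputes them from the printed (rounded) numbers, so small deviations in the last digits are possible; the variable names are illustrative.

```r
# Values copied from the anova(m1, m2) output
rss2    <- 9519.7   # residual sum of squares, model with interaction
df1     <- 1386     # residual degrees of freedom, model 1
df2     <- 1385     # residual degrees of freedom, model 2
ss_diff <- 27.294   # reported "Sum of Sq" (reduction in RSS)

# F = (reduction in RSS per extra parameter) / (residual variance of model 2)
f_stat <- (ss_diff / (df1 - df2)) / (rss2 / df2)

# Upper-tail probability of the F distribution with (1, 1385) df
p_val <- pf(f_stat, df1 = df1 - df2, df2 = df2, lower.tail = FALSE)

print(round(c(F = f_stat, p = p_val), 4))  # F is about 3.97, p about 0.046
```

Because only one parameter is added, this F-test is equivalent to the t-test of the interaction coefficient in the second model, which is why both report p ≈ 0.046.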
Next, we explore the significant interaction between gender and religiosity.
plot_model(m2, type = "pred", terms = c("rlgdgr", "gndr"), show.data = F)
As we can see, the influence of religiosity on satisfaction with government depends on gender: non-religious females are less satisfied with government than non-religious males, whereas highly religious females are more satisfied than highly religious males. For males, satisfaction grows more slowly with increasing religiosity (0.12 per point in the second model) than for females (0.12 + 0.10 = 0.22 per point).
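The gender-specific slopes, and the religiosity score at which the two predicted lines cross, can be worked out from the coefficients of the interaction model. The sketch below uses the rounded estimates from the regression table, so the results are approximate; the variable names are illustrative and not taken from the original code.

```r
# Rounded estimates from the model with interaction
b_rlgdgr      <- 0.12    # slope of religiosity for males (reference category)
b_female      <- -0.62   # female-male gap where religiosity equals 0
b_interaction <- 0.10    # extra slope of religiosity for females

slope_male   <- b_rlgdgr
slope_female <- b_rlgdgr + b_interaction

# The lines cross where the female gap is zero:
# b_female + b_interaction * rlgdgr = 0
crossover <- -b_female / b_interaction

print(c(slope_male = slope_male,
        slope_female = slope_female,
        crossover = crossover))
```

Under these rounded values females are predicted to be less satisfied than males below a religiosity score of about 6.2 and more satisfied above it, which matches the crossing pattern described for the plot.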
We find that religiosity is a good predictor of satisfaction with government. Our hypothesis, that religiosity correlates positively with satisfaction with government and that the effect is stronger for females, who are in general more religious, is supported.
Zinkina, J., Butovskaya, M., Shulgin, S., & Korotayev, A. (2024). Global Evolutionary Perspectives on Gender Differences in Religiosity, Family, Politics and Pro-Social Values Based on the Data from the World Values Survey. Social Evolution & History, 23(1), 76–105.
Vaughn, C. (2022). Faith and Trust: Religion’s Impact on Political Trust. Aletheia, 7(2).
Petrović, J., & Stanojević, D. (2020). Political Activism in Serbia. Südosteuropa, 68, 365–385.
Stanojević, D., Vukelić, J., & Tomašević, A. (2023). Political Participation of Young People in Serbia: Activities, Values, and Capability. In I. Rivers & C. L. Lovin (Eds.), Young People Shaping Democratic Politics: Interrogating Inclusion, Mobilising Education (pp. 31–53). Springer International Publishing.
Trošt, T., & Marinšek, D. (2022). Social Class and Ethnocentric Worldviews: Assessing the Effect of Socioeconomic Status on Attitudes in Serbia and Croatia. Communist and Post-Communist Studies, 55(2), 39–61.
Scholaro database. https://www.scholaro.com/db/countries/Serbia/Education-System
Christensen, T., & Lægreid, P. (2005). Trust in Government: The Relative Importance of Service Satisfaction, Political Factors, and Demography. Public Performance & Management Review, 28(4), 487–511. http://www.jstor.org/stable/3381308
Yu, Z., Bo, W., & Shu, L. (2011). The dynamic relationship between satisfaction with local government, family income, and life satisfaction in China: A 6-year perspective. 2011 International Conference on Management Science & Engineering 18th Annual Conference Proceedings, Rome, Italy, pp. 1207–1214.
Зоркая, Н. [Zorkaya, N.] (1999). Интерес к политике как форма политического участия [Interest in politics as a form of political participation]. Мониторинг общественного мнения: экономические и социальные перемены [Monitoring of Public Opinion: Economic and Social Changes], (4), 13–20.