Hello. We are 2BK. The team members are Bakhareva Anastasia, Borisenko Iana, Kireeva Irina and Kuzmicheva Daria. Our topic is “Politics”. The country we have chosen to study is Ireland.
Our theme includes such aspects as public interest in politics, the level of trust in politicians, satisfaction with the results of the policies they introduce, participation in political actions and much more, all of which will be presented in our project. We have focused on the survey results for Ireland; the total number of respondents is 2757. We hope you will find it interesting.
Here are all the libraries you need to run the code.
library(readr)
library(dplyr)
library(ggplot2)
library(knitr)
library(kableExtra)
library(sjlabelled)
library(sjmisc)
library(sjstats)
library(ggeffects)
library(sjPlot)
library(psych)
library(tidyverse)
library(magrittr)
library(DescTools)
library(car)
library(cowplot)
library(gridExtra)
library(snakecase)
library(stargazer)
library(effects)
library(scales)
To reproduce the code successfully, you need to download the following data:
ESS = read_csv("D:/Documents/ESS1-8e01.csv")
Before analyzing our dataset, we decided to immerse ourselves in the topic of our project. To do this, we looked through the most frequently discussed issues within this theme and found them to be: trust in politicians, economic indicators within a particular country, the desire and willingness of the population to take part in political life, the LGBT community and its acceptance by the public and, more generally, satisfaction with the government's work. In our analysis, we have tried to touch on all of these issues and to demonstrate them clearly in the graphs.
For clarity, we have created several charts that reveal the main patterns in the data and support logical conclusions.
First of all, we would like to take a look at the variables we have selected for analysis.
politics1 = ESS %>%
  select(polintr, actrolga, cptppola, trstlgl, trstplt, sgnptit, bctprd, lrscale, stflife, stfeco, stfgov, freehms)
Label = c("`polintr`", "`actrolga`", "`cptppola`", "`trstlgl`", "`trstplt`", "`sgnptit`", "`bctprd`", "`lrscale`", "`stflife`", "`stfeco`", "`stfgov`", "`freehms`")
Meaning = c("How interested in politics", "Able to take active role in political group", "Confident in own ability to participate in politics", "Trust in the legal system", "Trust in politicians", "Signed petition last 12 months", "Boycotted certain products last 12 months", "Placement on left right scale", "How satisfied with life as a whole", "How satisfied with present state of economy in country", "How satisfied with the national government", "Gays and lesbians free to live life as they wish")
# Levels of measurement, listed in the same order as Label
Level_Of_Measurement <- c("Ordinal", "Ordinal", "Ordinal", "Interval", "Interval", "Nominal", "Nominal", "Interval", "Interval", "Interval", "Interval", "Ordinal")
df <- data.frame(Label, Meaning, Level_Of_Measurement, stringsAsFactors = FALSE)
kable(df) %>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| Label | Meaning | Level_Of_Measurement |
|---|---|---|
| `polintr` | How interested in politics | Ordinal |
| `actrolga` | Able to take active role in political group | Ordinal |
| `cptppola` | Confident in own ability to participate in politics | Ordinal |
| `trstlgl` | Trust in the legal system | Interval |
| `trstplt` | Trust in politicians | Interval |
| `sgnptit` | Signed petition last 12 months | Nominal |
| `bctprd` | Boycotted certain products last 12 months | Nominal |
| `lrscale` | Placement on left right scale | Interval |
| `stflife` | How satisfied with life as a whole | Interval |
| `stfeco` | How satisfied with present state of economy in country | Interval |
| `stfgov` | How satisfied with the national government | Interval |
| `freehms` | Gays and lesbians free to live life as they wish | Ordinal |
As can be seen, the dataset contains both categorical and continuous variables, so we will be able to go through it for deeper analysis.
For this part, only interval variables were taken. The results are presented in the table below.
Mode <- function(x) {
ux <- unique(x)
ux[which.max(tabulate(match(x, ux)))]
}
#1
politics1$trstplt = as.numeric(as.character(politics1$trstplt))
v.trstplt <- c(mean(politics1$trstplt), Mode(politics1$trstplt), median(politics1$trstplt))
names(v.trstplt) <- c("mean", "mode", "median")
#2
politics1$trstlgl = as.numeric(as.character(politics1$trstlgl))
v.trstlgl <- c(mean(politics1$trstlgl), Mode(politics1$trstlgl), median(politics1$trstlgl))
names(v.trstlgl) <- c("mean", "mode", "median")
#3
politics1$lrscale = as.numeric(as.character(politics1$lrscale))
v.lrscale <- c(mean(politics1$lrscale), Mode(politics1$lrscale), median(politics1$lrscale))
names(v.lrscale) <- c("mean", "mode", "median")
#4
politics1$stflife = as.numeric(as.character(politics1$stflife))
v.stflife <- c(mean(politics1$stflife), Mode(politics1$stflife), median(politics1$stflife))
names(v.stflife) <- c("mean", "mode", "median")
#5
politics1$stfgov = as.numeric(as.character(politics1$stfgov))
v.stfgov <- c(mean(politics1$stfgov), Mode(politics1$stfgov), median(politics1$stfgov))
names(v.stfgov) <- c("mean", "mode", "median")
tendencymeasures_overview = data.frame(v.trstplt, v.trstlgl, v.lrscale, v.stflife, v.stfgov, stringsAsFactors = FALSE)
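kable(tendencymeasures_overview) %>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| | v.trstplt | v.trstlgl | v.lrscale | v.stflife | v.stfgov |
|---|---|---|---|---|---|
| mean | 4.850925 | 7.678999 | 19.53827 | 7.393544 | 5.536815 |
| mode | 5.000000 | 7.000000 | 5.00000 | 8.000000 | 5.000000 |
| median | 4.000000 | 6.000000 | 5.00000 | 7.000000 | 5.000000 |
Note that the mean of `lrscale` (19.54) lies outside the 0-10 range of the scale. This indicates that the ESS missing-value codes (77, 88, 99) were still present when these statistics were computed, so the means in this table may be inflated and should be read with caution.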
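The effect of the missing-value codes on the measures of central tendency can be illustrated with a minimal sketch. It uses a small made-up vector rather than the real ESS data, and `clean_scale` is a hypothetical helper that is not part of the original analysis; it assumes that 77, 88 and 99 are the only special codes on a 0-10 scale.

```r
# Hypothetical helper: turn ESS missing-value codes into NA
clean_scale <- function(x, missing_codes = c(77, 88, 99)) {
  x <- as.numeric(x)
  x[x %in% missing_codes] <- NA
  x
}

# Same idea as the Mode() function in the report, but ignoring NA values
Mode <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Made-up answers on a 0-10 scale, including three missing codes
answers <- c(5, 5, 3, 7, 88, 5, 4, 77, 6, 99)
cleaned <- clean_scale(answers)

raw_mean   <- mean(answers)                 # inflated by 77/88/99
clean_mean <- mean(cleaned, na.rm = TRUE)   # stays within 0-10
clean_med  <- median(cleaned, na.rm = TRUE)
clean_mode <- Mode(cleaned)
```

In this toy example the raw mean is far above 10, while the cleaned mean, median and mode all stay on the scale. The same cleaning step could be applied to each interval variable before the table is built.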
Now it is time to see what the data tell us about politics in Ireland.
ggplot()+
geom_histogram(data = politics1, aes(x = trstplt), binwidth = 1, fill="#7ee0ff", col="#3a0d0d", alpha = 0.5) +
xlim(c(0, 10)) +
xlab("Trust in politicians") +
ylab("Number of people") +
geom_vline(aes(xintercept = mean(politics1$trstplt), color = 'mean'), linetype="solid", size=1) +
geom_vline(aes(xintercept = median(politics1$trstplt), color = 'median'), linetype="solid", size=1)+
geom_vline(aes(xintercept = Mode(politics1$trstplt), color = 'mode'), linetype="solid",size=1) +
scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd", mode = "#339666"))+
ggtitle("The level of trust towards politicians")
Conclusion 1. The distribution of trust in politicians is shifted towards the lower end of the scale, so people tend not to trust politicians. Although the most frequently reported value lies in the middle of the scale, half of the respondents rate their trust below the mean.
ggplot()+
geom_histogram(data = politics1, aes(x = stfeco), binwidth = 1, fill="#FFB273", col="#FF9640", alpha = 0.5) +
xlim(c(0, 10)) +
xlab("Satisfaction with the economy's state") +
ylab("Number of people") +
geom_vline(aes(xintercept = mean(politics1$stfeco), color = 'mean'), linetype="solid", size=1) +
geom_vline(aes(xintercept = median(politics1$stfeco), color = 'median'), linetype="solid", size=1)+
geom_vline(aes(xintercept = Mode(politics1$stfeco), color = 'mode'), linetype="longdash",size=1) +
scale_color_manual(name = "Measurement", values = c(median = "#008500", mean = "#A60000", mode = "white"))+
ggtitle("The level of satisfaction with the present state of the economy")
Conclusion 2. As can be seen from the graph, people are moderately satisfied with the state of the economy: both the most frequently reported value and the mean of all answers are about 6. However, the median is one point lower, so half of the respondents gave a lower rating.
politics1 = politics1 %>%
filter(cptppola != 8 )%>%
filter(cptppola != 9 )%>%
filter(cptppola != 7 )
politics1$cptppola <- factor(politics1$cptppola, labels = c("Not at all confident", "A little confident", "Quite confident", "Very confident", "Completely confident"), ordered= F)
ggplot() +
geom_bar(data = politics1, aes(x = cptppola), fill="#AD66D5", col="#5F2580", alpha = 0.5) +
xlab("Confident in own ability to participate in politics") +
ylab("Number of people") +
ggtitle("The level of people's confidence in their ability to participate in politics")
Conclusion 3. People in Ireland tend not to be confident in their ability to participate in politics.
politics1 = politics1 %>%
filter(freehms != 8 )%>%
filter(freehms != 9 )%>%
filter(freehms != 7 )
#bar2
politics1$freehms <- factor(politics1$freehms, labels = c("Agree strongly", "Agree", "Neither agree nor disagree", "Disagree", "Disagree strongly"), ordered= F)
ggplot() +
geom_bar(data = politics1, aes(x = freehms), fill="#FFE773", col="#A68900", alpha = 0.5) +
xlab("Homosexual people are free to live their lives as they wish ") +
ylab("Number of people") +
ggtitle("People's attitude towards homosexual relationships")
Conclusion 4. In Ireland, civil partnerships for same-sex couples were introduced in 2011, and same-sex marriage followed in 2015. We are interested in the level of homophobia five years after the first of these steps. As the graph shows, most residents agree that homosexual people are free to live their lives as they wish.
politics1 = politics1 %>%
filter(bctprd != 8 )%>%
filter(bctprd != 9 )%>%
filter(bctprd != 7 )
politics1$bctprd <- factor(politics1$bctprd, labels = c("Yes", "No"), ordered= F,exclude = NA)
ggplot() +
geom_boxplot(data = politics1, aes(x = bctprd, y = stflife), fill="#C9F76F", col="#679B00", alpha = 0.5) +
ylim(c(1,10)) +
xlab("Boycotted certain products last 12 months") +
ylab("Satisfaction with life") +
ggtitle("Life satisfaction by experience of boycotting")
Conclusion 5. It can be seen that there is a noticeable difference in the distribution of life satisfaction between the two groups. The range of answers is wider among people who boycotted products, and their median is also higher.
politics1 = politics1 %>%
filter(actrolga != 8 )%>%
filter(actrolga != 9 )%>%
filter(actrolga != 7 )
politics1$actrolga <- factor(politics1$actrolga, labels = c("Not at all able", "A little able", "Quite able", "Very able", "Completely able"), ordered= F,exclude = NA)
ggplot() +
geom_boxplot(data = politics1, aes(x = actrolga, y = trstlgl), fill="#E667AF", col="#85004B", alpha = 0.5) +
ylim(c(1,10)) +
xlab("Considering ability to be politically active") +
ylab("Trust in the legal system") +
ggtitle("Trust in the legal system by perceived ability to be politically active")
Conclusion 6. The graph shows that people who consider themselves completely able to be politically active have a higher level of trust in the legal system. In contrast, people who consider themselves not at all able to be politically active have a lower level of trust in the legal system. In the remaining groups the median level of trust is the same.
politics1 = politics1 %>%
filter(polintr != 7) %>%
filter(polintr != 8) %>%
filter(polintr != 9)
politics1$polintr <- factor(politics1$polintr, labels = c("Very interested", "Quite interested", "Hardly interested", "Not at all interested"), ordered= F)
politics1 = politics1 %>%
filter(sgnptit != 7) %>%
filter(sgnptit != 8) %>%
filter(sgnptit != 9)
politics1$signed_petitions <- factor(politics1$sgnptit, labels = c("Yes", "No"), ordered = F)
ggplot(data = politics1, aes(x = polintr, fill = signed_petitions)) +
geom_bar(position="fill")+
coord_flip()+
xlab("How interested in politics") +
ylab("Share of population") +
ggtitle("Participation in signing petitions by interest in politics")
Conclusion 7. The graph shows that, regardless of the degree of interest in politics, only a small share of people sign petitions. The only exception is the group of people who are very interested in politics: about a third of them signed at least one petition in the last year.
politics1 = politics1
politics1$lrscale = as.numeric(as.character(politics1$lrscale))
politics1$stfgov = as.numeric(as.character(politics1$stfgov))
politics1 = politics1 %>%
filter(lrscale != 77) %>%
filter(lrscale != 88) %>%
filter(lrscale != 99)
politics1 = politics1 %>%
filter(stfgov != 77) %>%
filter(stfgov != 88) %>%
filter(stfgov != 99)
ggplot(data = politics1) +
geom_point( aes(x = lrscale, y = stfgov))+
scale_color_gradient(low = "white", high = "black") +
xlab("Placement in left-right scale") +
ylab("Level of satisfaction with national government") +
ggtitle("The level of satisfaction with national government by placement on the left-right scale") +
theme_bw()
The graph shows that the correlation between the variables is positive and weak. Let's have a look at the correlation values:
politics1 = politics1 %>%
select(lrscale, stfgov)
cor1 = cor(politics1)
cordat = data.frame(cor1, stringsAsFactors = FALSE)
kable(cordat) %>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| | lrscale | stfgov |
|---|---|---|
| lrscale | 1.0000000 | 0.2587218 |
| stfgov | 0.2587218 | 1.0000000 |
Conclusion 8. As can be seen from the graph and the correlation table, there is a weak positive association (r of about 0.26) between placement on the left-right scale and satisfaction with the national government.
Thus, having built all these charts, we have shed light on how things stand in Ireland with the most frequently discussed political topics. Many results came as a surprise to us and revealed quite interesting phenomena, and in subsequent projects we will try to deepen our research in these areas.
politics2 <-ESS %>%
select(agea, lrscale, sgnptit, vote)
politics2 <- politics2[!is.na(politics2$agea),]
politics2 <- politics2[!is.na(politics2$lrscale),]
politics2 <- politics2[!is.na(politics2$sgnptit),]
politics2 <- politics2[!is.na(politics2$vote),]
politics2 <- politics2 %>%
filter(lrscale != 77) %>%
filter(lrscale != 88) %>%
filter(lrscale != 99 )
politics2 <- politics2 %>%
filter(sgnptit != 7) %>%
filter(sgnptit != 8) %>%
filter(sgnptit != 9)
politics2 <- politics2 %>%
filter(agea != 999)
First, we modify one of the variables to make it more convenient to work with. Then, we update our dataset.
politics2$lr <- ifelse(politics2$lrscale <= 3, "Left",
ifelse(politics2$lrscale >= 7, "Right", "Centre"))
politics2 <- politics2 %>%
select(-lrscale)
Now, let's look at the number of observations and the number of variables.
dim(politics2)
## [1] 2240 4
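The left/centre/right recoding used above could also be written with `cut()`, which keeps the thresholds in one place. This is only a sketch on made-up placements, not the original code; unlike the nested `ifelse()`, it returns a factor whose levels are ordered Left, Centre, Right rather than alphabetically.

```r
# Made-up left-right placements on the 0-10 scale
lrscale <- c(0, 3, 4, 6, 7, 10)

# (-Inf, 3] -> Left, (3, 6] -> Centre, (6, Inf] -> Right
lr <- cut(lrscale,
          breaks = c(-Inf, 3, 6, Inf),
          labels = c("Left", "Centre", "Right"))

# Counts per category
lr_counts <- table(lr)
```

Because the ESS placements are whole numbers, these intervals give the same grouping as the conditions `lrscale <= 3` and `lrscale >= 7`.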
The chosen variables are described below.
Label2 <- c("`sgnptit`", "`lr`", "`agea`", "`vote`" )
Meaning2 <- c("Signed petition last 12 months", "Placement on left right scale", "Age", "Voted last national election")
Level_Of_Measurement2 <- c("Nominal", "Nominal", "Ratio", "Nominal")
Test2 <- c("Chi-squared test", "Chi-squared test", "T-test for independent samples", "T-test for independent samples")
df2 <- data.frame(Label2, Meaning2, Level_Of_Measurement2,Test2, stringsAsFactors = FALSE)
kable(df2) %>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| Label2 | Meaning2 | Level_Of_Measurement2 | Test2 |
|---|---|---|---|
| `sgnptit` | Signed petition last 12 months | Nominal | Chi-squared test |
| `lr` | Placement on left right scale | Nominal | Chi-squared test |
| `agea` | Age | Ratio | T-test for independent samples |
| `vote` | Voted last national election | Nominal | T-test for independent samples |
First, we select the variables needed for the chi-square test. Then we present the contingency table.
politics_chi <- politics2 %>%
select(lr, sgnptit)
politics_chi$sgnptit <- factor(politics_chi$sgnptit, labels = c("Yes", "No"), ordered= F,exclude = NA)
ContigencyTable <- table(politics_chi$lr, politics_chi$sgnptit)
kable(ContigencyTable)%>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| | Yes | No |
|---|---|---|
| Centre | 281 | 1041 |
| Left | 141 | 188 |
| Right | 95 | 494 |
In order to check whether our categories are suitable for a chi-square test, we create stacked bar plots and analyze them.
sjp.xtab(politics_chi$lr, politics_chi$sgnptit, type = "bar", margin ="row",
bar.pos = "stack", title = "Participation in signing petitions due to party affiliation", title.wtd.suffix = NULL,
axis.titles = NULL, axis.labels = NULL, legend.title = NULL,
legend.labels = NULL, weight.by = NULL, rev.order = FALSE,
show.values = TRUE, show.n = TRUE, show.prc = TRUE, show.total = TRUE,
show.legend = TRUE, show.summary = TRUE, summary.pos = "r",
string.total = "Total", wrap.title = 50, wrap.labels = 15,
wrap.legend.title = 20, wrap.legend.labels = 20, geom.size = 0.7,
geom.spacing = 0.1, geom.colors = "Paired", dot.size = 3,
smooth.lines = FALSE, grid.breaks = 0.2, expand.grid = FALSE,
ylim = NULL, vjust = "bottom", hjust = "left", y.offset = NULL,
coord.flip = TRUE)
With the spread of the Internet, signing petitions has become possible online. A study of the largest online petition platform (Change.org) has shown that it is strongly biased toward liberal causes.
In Ireland, half of the political parties are right (conservatism) or centre (social democracy, liberal conservatism, populism) and the other half are left (socialism, republicanism), so citizens have a wide spectrum of views to choose from. Accordingly, in order to find out whether the distribution of petition signatories across political orientations is random, we decided to run a chi-square test.
The following hypotheses were formulated for this:
H0: placement on the left-right scale and signing petitions are independent.
H1: placement on the left-right scale and signing petitions are related.
So, let's run the chi-square test.
colnames(ContigencyTable) <- c("Petition +", "Petition -")
# table() orders the rows alphabetically: Centre, Left, Right
rownames(ContigencyTable) <- c("C", "L", "R")
chi.test <- chisq.test(ContigencyTable)
chi.test
##
## Pearson's Chi-squared test
##
## data: ContigencyTable
## X-squared = 90.992, df = 2, p-value < 2.2e-16
Next, we have to look at the residuals.
kable(chi.test$stdres)
| | Petition + | Petition - |
|---|---|---|
| C | -2.459609 | 2.459609 |
| L | 9.217379 | -9.217379 |
| R | -4.663756 | 4.663756 |
assocplot(t(ContigencyTable), main="Residuals and number of observations" )
The plot of residuals confirms our conclusion from the chi-square test: the differences in the number of petition signers between the political orientations are too large for the variables to be independent. Left-leaning respondents stand out most: the number of signers among them is much greater than expected under independence (standardized residual of about 9.2). Among right-leaning respondents the number of signers is lower than expected (about -4.7), and the same holds, to a lesser degree, for the centre (about -2.5).
Thus, since both the chi-square test and the residuals provide evidence against independence, we conclude that the political orientation of the respondents and whether they sign petitions are related.
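As a cross-check, the chi-square statistic can be recomputed by hand from the counts in the contingency table. The sketch below copies the observed counts from the table above and keeps the full row names (Centre, Left, Right) so that the residuals cannot be mislabelled; it is an illustration, not the original code.

```r
# Observed counts copied from the contingency table in the report
observed <- matrix(c(281, 1041,
                     141,  188,
                      95,  494),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(lr = c("Centre", "Left", "Right"),
                                   petition = c("Yes", "No")))

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)

# Pearson chi-square statistic computed by hand
chi_sq <- sum((observed - expected)^2 / expected)

# The built-in test should give the same statistic
res <- chisq.test(observed)
std_res <- res$stdres
```

Here `chi_sq` should agree with the reported X-squared of 90.992, and `std_res["Left", "Yes"]` is the large positive residual for left-leaning petition signers.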
We start by filtering out the values that are not useful for our test.
politics_ttest <- politics2 %>%
select(agea, vote) %>%
filter(vote != 3) %>%
filter(vote != 7) %>%
filter(vote != 8)
Next, let's compare the age distributions of the two groups with the help of a boxplot.
politics_ttest$vote <- factor(politics_ttest$vote, labels = c("Yes", "No"), ordered= F,exclude = NA)
ggplot() +
geom_boxplot(data = politics_ttest, aes(x = vote, y = agea), fill="#A44200", col="#A44200", alpha = 0.5) +
scale_y_continuous(limits = c(0,100)) +
xlab("Voted last national election") +
ylab("Age") +
ggtitle("Participation in the election by age")
The first way to check normality is to look at descriptive statistics.
describeBy(politics_ttest, politics_ttest$vote)
##
## Descriptive statistics by group
## group: Yes
## vars n mean sd median trimmed mad min max range skew
## agea 1 1689 54.93 16.41 55 54.98 19.27 19 92 73 -0.03
## vote* 2 1689 1.00 0.00 1 1.00 0.00 1 1 0 NaN
## kurtosis se
## agea -0.77 0.4
## vote* NaN 0.0
## --------------------------------------------------------
## group: No
## vars n mean sd median trimmed mad min max range skew
## agea 1 408 40.69 15.87 38 39.19 14.83 16 96 80 0.87
## vote* 2 408 2.00 0.00 2 2.00 0.00 2 2 0 NaN
## kurtosis se
## agea 0.44 0.79
## vote* NaN 0.00
Skewness is a measure of the symmetry of a distribution. The normal distribution is symmetrical, so its skewness equals 0. In the group of voters the skew is -0.03, and in the group of non-voters it is 0.87. The age distribution of voters is therefore almost symmetrical, while that of non-voters is moderately skewed to the right (positive skew), although the value stays below 1.
Kurtosis tells us whether the distribution is peaked or flat. The kurtosis of age is -0.77 in the voters group and 0.44 in the non-voters group. That means that the distribution of the first group (voters) is flatter than the normal distribution, while that of the second group is slightly more peaked.
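To make the two measures more concrete, here is a minimal sketch that computes skewness and excess kurtosis from central moments. It assumes the definitions in which the third and fourth central moments are divided by powers of the sample standard deviation (to our understanding this corresponds to the default in `psych::describe`, but that is an assumption). The ages are made up and `skew_kurt` is a hypothetical helper.

```r
# Sample skewness and excess kurtosis from central moments
skew_kurt <- function(x) {
  x <- x[!is.na(x)]
  n <- length(x)
  m <- mean(x)
  s <- sd(x)                    # sample standard deviation (n - 1)
  m3 <- sum((x - m)^3) / n      # third central moment
  m4 <- sum((x - m)^4) / n      # fourth central moment
  c(skew = m3 / s^3, kurtosis = m4 / s^4 - 3)
}

# Made-up ages: a symmetric sample and a right-skewed one
symmetric <- c(20, 30, 40, 50, 60, 70, 80)
right_skewed <- c(18, 19, 20, 22, 25, 30, 45, 70, 90)

sym_stats <- skew_kurt(symmetric)
skewed_stats <- skew_kurt(right_skewed)
```

The symmetric sample has a skew of zero, while the sample with a long tail of older ages has a positive skew, which is the pattern seen in the non-voters group.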
Next, we check normality with the help of histogram.
library(ggplot2)
ggplot(politics_ttest, aes(x = agea, fill = vote)) +
geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 3) +
geom_density(col = "yellow", fill = "white", alpha = 0.1) +
geom_vline(aes(xintercept = mean(politics_ttest$agea), color = 'mean'), linetype="dashed", size=1) +
geom_vline(aes(xintercept = median(politics_ttest$agea), color = 'median'), linetype="longdash", size=1) +
scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd")) +
xlab("Age") +
ylab("Density") +
ggtitle("Age distribution of voters and non-voters")
Finally, we check normality with the help of a Q-Q plot.
#creating subgroups based on voting / non-voting
voteplus <- subset(politics_ttest[politics_ttest$vote == "Yes",])
voteminus <- subset(politics_ttest[politics_ttest$vote == "No",])
par(mfrow = c(1,2))
# y is limited from 18 because it is age at which the Irish are allowed to vote
qqnorm(voteplus$agea, ylim = c(18, 100), main = "Normal Q-Q Plot for vote+"); qqline(voteplus$agea,ylim = c(18 ,100), col= 2)
qqnorm(voteminus$agea, ylim = c(18 ,100), main = "Normal Q-Q Plot for vote-"); qqline(voteminus$agea, col= 2, ylim = c(18 ,100))
Age is sometimes mentioned as one of the factors that can influence voting behaviour, but it seems that every country should be studied as a unique case.
As with political preferences and signed petitions, we would like to find out whether voting behaviour in Ireland is related to the respondent's age, so we conduct a t-test. The following hypotheses were formulated for this:
H0: the mean age of voters and non-voters is the same.
H1: the mean age of voters and non-voters is different.
Now we are going to run the t-test.
t.test(politics_ttest$agea ~ politics_ttest$vote)
##
## Welch Two Sample t-test
##
## data: politics_ttest$agea by politics_ttest$vote
## t = 16.147, df = 634.29, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 12.50096 15.96259
## sample estimates:
## mean in group Yes mean in group No
## 54.92540 40.69363
wilcox.test(agea ~ vote, data = politics_ttest)
##
## Wilcoxon rank sum test with continuity correction
##
## data: agea by vote
## W = 509930, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
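The Welch t statistic reported above can be reproduced from the group summaries alone. The sketch below copies the means from the `t.test()` output and the standard deviations and group sizes from the `describeBy()` output; because the standard deviations are rounded, the result matches the reported values only approximately.

```r
# Group summaries copied from the describeBy() and t.test() output
mean_yes <- 54.92540; sd_yes <- 16.41; n_yes <- 1689
mean_no  <- 40.69363; sd_no  <- 15.87; n_no  <- 408

# Squared standard errors of the two group means
se2_yes <- sd_yes^2 / n_yes
se2_no  <- sd_no^2 / n_no

# Welch t statistic
t_stat <- (mean_yes - mean_no) / sqrt(se2_yes + se2_no)

# Welch-Satterthwaite degrees of freedom
df_welch <- (se2_yes + se2_no)^2 /
  (se2_yes^2 / (n_yes - 1) + se2_no^2 / (n_no - 1))

# Two-sided p-value
p_value <- 2 * pt(-abs(t_stat), df = df_welch)
```

The values of `t_stat` and `df_welch` should be close to the reported t = 16.147 and df = 634.29, which shows that the test does not assume equal variances in the two groups.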
Thus, having worked with the data and conducted several statistical tests, we can state the following: placement on the left-right scale and signing petitions are related, and voters are on average older than non-voters (mean age 54.9 versus 40.7).
While discussing Ireland as the object of our research, we found out that it is quite a prosperous country. It ranks sixth on the Human Development Index, which is very high. However, this index, which is composed of life expectancy, education and per capita income indicators, does not include the political aspects of a country.
Since our research topic is politics, we were concerned with this fact and tried to explore how involvement in politics affects life satisfaction in Ireland.
Our expectation was that the people most interested in politics have the highest level of life satisfaction compared to people who are less interested in political processes.
Our research question is “Do Irish people who are interested in politics to different extents have the same level of life satisfaction?”
politics3 = ESS %>%
select( stflife, polintr)
politics3 = politics3 %>%
filter(stflife != 77) %>%
filter(stflife != 88) %>%
filter(stflife != 99)
politics3 = politics3 %>%
select( stflife, polintr) %>%
filter(polintr != 7) %>%
filter(polintr != 8) %>%
filter(polintr != 9)
The chosen variables are described below.
Label3 <- c("`polintr`", "`stflife`")
Meaning3 <- c("How interested in politics", "How satisfied with life as a whole")
Level_Of_Measurement3 <- c("Ordinal", "Interval")
Measurement3 <- c("Very - Quite - Hardly - Not at all", "0 - 10")
df3 <- data.frame(Label3, Meaning3, Level_Of_Measurement3, Measurement3, stringsAsFactors = FALSE)
kable(df3) %>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| Label3 | Meaning3 | Level_Of_Measurement3 | Measurement3 |
|---|---|---|---|
| `polintr` | How interested in politics | Ordinal | Very - Quite - Hardly - Not at all |
| `stflife` | How satisfied with life as a whole | Interval | 0 - 10 |
Let's filter our data and prepare it for further analysis.
politics3 = politics3 %>%
select(polintr, stflife)
politics3$polintr <- ifelse (politics3$polintr == 1, "Very interested",
ifelse(politics3$polintr == 2, "Quite interested",
ifelse(politics3$polintr == 3, "Hardly interested", "Not interested")))
politics3$stflife <- as.numeric(as.character(politics3$stflife))
politics3$polintr <- as.factor(politics3$polintr)
politics3$polintr <- factor(politics3$polintr, c("Not interested", "Hardly interested", "Quite interested", "Very interested" ))
politics3 = politics3%>%
  filter(politics3$stflife != 88)
politics3 = politics3 %>%
  filter(politics3$stflife != 77)
politics3 = politics3 %>%
  filter(politics3$stflife != 99)
describeBy(politics3$stflife, politics3$polintr, mat = TRUE) %>% #create dataframe
  select(polintr = group1, N=n, Mean=mean, SD=sd, Median=median, Min=min, Max=max,
         Skew=skew, Kurtosis=kurtosis, st.error = se) %>%
  kable(align=c("lrrrrrrrr"), digits=2, row.names = FALSE,
        caption="Satisfaction with life by interest in politics") %>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| polintr | N | Mean | SD | Median | Min | Max | Skew | Kurtosis | st.error |
|---|---|---|---|---|---|---|---|---|---|
| Not interested | 744 | 7.12 | 2.00 | 7 | 0 | 10 | -0.87 | 1.05 | 0.07 |
| Hardly interested | 733 | 7.23 | 1.90 | 8 | 0 | 10 | -0.97 | 1.42 | 0.07 |
| Quite interested | 964 | 7.29 | 1.80 | 7 | 0 | 10 | -1.02 | 2.00 | 0.06 |
| Very interested | 308 | 7.54 | 1.87 | 8 | 0 | 10 | -0.99 | 1.35 | 0.11 |
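Looking at this table, we can conclude that the sizes of our groups are roughly comparable, although the “Very interested” group (308 people) is noticeably smaller than the others.
Next, we look at the groups' shares in the sample to make sure that each of them is large enough.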
par(mar = c(3,10,0,3))
barplot(table(politics3$polintr)/nrow(politics3)*100, horiz = T, xlim = c(0,60), las = 2)
Now, by looking at the barplot, we can also conclude that the groups are of a comparable size.
ggplot()+
geom_boxplot(data = politics3, aes(x = polintr, y = stflife), fill="pink", col="purple", alpha = 0.5) +
ylim(c(0,10)) +
xlab("How interested in politics") +
ylab("Level of Life satisfaction") +
ggtitle("Life satisfaction by the level of interest in politics")
Conclusion: From the boxplot we can see that life satisfaction is not quite normally distributed, since in the “Quite interested” and “Hardly interested” groups the medians are far from the centre of the boxes. We proceed nevertheless. There are also several outliers. Moreover, according to the descriptive table, the “Hardly interested” and “Very interested” groups have a higher median life satisfaction (8 versus 7 in the other two groups).
The next step is to check the assumptions of the ANOVA test. First, let's look at the homogeneity of variances with the help of Levene's test.
leveneTest(politics3$stflife ~ politics3$polintr)
Conclusion: From the results of Levene's test it can be seen that the p-value is much higher than the significance level of 0.05. This means there is no evidence that the variances differ between groups. Therefore, we can assume homogeneity of variances across the groups of political interest.
oneway.test(politics3$stflife ~ politics3$polintr, var.equal = T)
##
## One-way analysis of means
##
## data: politics3$stflife and politics3$polintr
## F = 3.8028, num df = 3, denom df = 2745, p-value = 0.009808
aov.out <- aov(politics3$stflife ~ politics3$polintr)
summary(aov.out)
## Df Sum Sq Mean Sq F value Pr(>F)
## politics3$polintr 3 41 13.562 3.803 0.00981 **
## Residuals 2745 9790 3.566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion: F(3, 2745) = 3.8028 and p-value < .01. Based on these numbers we reject the null hypothesis. It means that the difference in the level of life satisfaction across the groups of political interest is statistically significant.
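The link between the ANOVA table and the reported F and p-value can be shown in a few lines. The sketch copies the mean squares and degrees of freedom from the output above; since those are rounded, the result is approximate.

```r
# Mean squares and degrees of freedom from the ANOVA table
ms_between <- 13.562; df_between <- 3
ms_within  <- 3.566;  df_within  <- 2745

# F is the ratio of between-group to within-group mean squares
f_value <- ms_between / ms_within

# Upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)
```

The resulting `f_value` is about 3.80 and `p_value` falls just below 0.01, in line with the table.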
layout(matrix(1:4, 2, 2))
plot(aov.out)
Conclusion: We can see that on the upper two graphs the red line is fairly straight. The line on the Q-Q plot is not as straight. So, on the basis of these graphs, we can conclude that the distribution of residuals is not quite normal.
anova.res <- residuals(object = aov.out)
describe(anova.res)
Conclusion: Skew and kurtosis are below 2 in absolute value, so by this criterion the distribution of residuals is close to normal.
shapiro.test(x = anova.res)
##
## Shapiro-Wilk normality test
##
## data: anova.res
## W = 0.93435, p-value < 2.2e-16
Conclusion: The p-value is extremely small, which indicates that the residuals are not normally distributed.
hist(anova.res, main = "Distribution of residuals", xlab = "Residuals", col = "pink", border = "#BC6B97")
Conclusion: Looking at the histogram, we can conclude that the residuals are not quite normally distributed, but rather skewed to the left.
Overall conclusion: All the checks except the skew and kurtosis analysis indicate that the distribution of residuals is not normal. So, the assumption of normality of residuals does not hold.
In the ANOVA test, a significant p-value indicates that the means of some groups differ, but it does not show which pairs of groups these are. To find this out, a post hoc test can be conducted to determine whether the mean differences between specific pairs of groups are statistically significant.
As the variances across groups are practically equal, we chose the Tukey test for that.
par(mar = c(5, 15, 3, 1))
Tukey <- TukeyHSD(aov.out)
plot(Tukey, las = 2, col = "red" )
Conclusion: The test results show that only the difference between the “Very interested” and “Not interested” groups is significant, since the confidence interval for the difference between the means of these two groups does not cross the “0” line.
As could be seen from the boxplot, there are some outliers. Therefore, we want to double-check our results using a non-parametric test.
H0: Mean ranks of the groups are not different.
kruskal.test(politics3$stflife ~ politics3$polintr, data = politics3)
##
## Kruskal-Wallis rank sum test
##
## data: politics3$stflife by politics3$polintr
## Kruskal-Wallis chi-squared = 12.764, df = 3, p-value = 0.005176
Conclusion: Based on KW chi-square (3) = 12.764 and p-value < .01, we reject the null hypothesis and conclude that the mean ranks of the groups are different. The test confirms the results of the ANOVA test.
Since the results of Kruskal-Wallis test are statistically significant, we now run Dunn’s test.
DunnTest(politics3$stflife ~ politics3$polintr, data = politics3)
##
## Dunn's test of multiple comparisons using rank sums : holm
##
## mean.rank.diff pval
## Hardly interested-Not interested 51.233292 0.4132
## Quite interested-Not interested 57.533139 0.3912
## Very interested-Not interested 188.485145 0.0022 **
## Quite interested-Hardly interested 6.299848 0.8690
## Very interested-Hardly interested 137.251854 0.0476 *
## Very interested-Quite interested 130.952006 0.0476 *
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
Conclusion: The results of Dunn's test show that, besides people who are very interested in politics and people who are not interested in politics at all, there are two more pairs of groups whose differences in mean ranks are statistically significant. These are:
Very interested in politics and Hardly interested in politics
Very interested in politics and Quite interested in politics
However, the differences between these pairs are statistically significant at the 5% significance level (p = 0.0476), while the difference between the Very interested in politics and Not interested in politics groups is significant at the 1% level (p = 0.0022).
The remaining pairs of groups with different levels of political interest do not differ significantly.
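The output header shows that the p-values are Holm-adjusted. As a small illustration of what that adjustment does, the sketch below applies `p.adjust()` to three hypothetical raw p-values (these numbers are made up for the example and are not from our data).

```r
# Hypothetical unadjusted p-values for three pairwise comparisons
raw_p <- c(0.01, 0.02, 0.03)

# Holm: multiply the smallest by 3, the next by 2, the largest by 1,
# then make the sequence non-decreasing
holm_p <- p.adjust(raw_p, method = "holm")
holm_p
```

The non-decreasing step explains why two comparisons can end up with the same adjusted p-value, as happens with the two 0.0476 entries in the output above.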
So, answering our research question, we can argue that some groups of Irish people with different levels of interest in politics have different average levels of life satisfaction. To be more precise, the following groups differ significantly:
Very interested in politics and Hardly interested in politics people
Very interested in politics and Quite interested in politics people
People who are very interested in politics and people who are not interested in politics at all show the most pronounced significant difference, meaning that these groups have the largest difference in life satisfaction level.
People who are quite interested in politics and people who are hardly interested in politics do not have statistically significant differences in life satisfaction level. The same goes for these pairs of groups:
Quite interested in politics and Not interested in politics people
Hardly interested in politics and Not interested in politics people
After all these tests and analyses, we can conclude that Irish people who are interested in politics to different extents indeed do not have the same level of life satisfaction. Moreover, our expectations are met: people with the highest political interest are the most satisfied with life.
However, what’s the reason for such a difference in life satisfaction among the groups who are very interested in politics and who are not interested in politics? The answer may be that people who are not satisfied with their lives do not even care about politics. They may be much more interested in and focused on their basic needs that they need to fulfill to become satisfied with life first. Also, there may be a moderator that causes such a difference.
Our long analysis of political aspects in Ireland originates from our very first simple analysis of this topic, in which we constructed colorful graphs. While doing that, we noticed some interesting patterns that we wanted to study in more detail. We then approached the most intriguing question: which of the selected variables in our large politics dataset best predicts satisfaction with democracy. To find out, we constructed several regression models and compared them. Now we are taking a new step: we are adding a moderator to our model. So, finally, we want to complete our long road by analysing a linear regression with an interaction effect. Previously, we determined some variables for the linear regression and checked which model is the best. Now we go even deeper in analysing this pattern.
Having explored the literature, we found articles that told us the following:
Within the set of liberal democracies, the Nordic countries tend to have the highest trust rates (Ireland is not a Nordic country, although it is located in Northern Europe), and people's confidence in the government is of a general nature: a high level of trust in one institution tends to spread to other institutions, such as trust in parliament and overall satisfaction with democracy.
One article states: “The evidence suggests that trust in government is a poor indicator of the level of social trust in each country, its contribution to overall life satisfaction is at best indirect, and it is a poor indicator of quality of governance. Further research is recommended to clarify the value of trust in government and its relationship to other key policy objectives”. The author explored the relation between the quality of governance and trust in government itself. Interestingly, the latter is a poor predictor of the former. The author claims that the area needs further research, and here we are! We would like to introduce a new, categorical variable.
Our research question is as follows:
* We would like to know which of these variables best predicts satisfaction with democracy. To find out, we are going to construct several regression models and compare them.
The null hypothesis is that there is no relationship between the independent variables (trust in parliament, trust in politicians, importance that government is strong and ensures safety) and the dependent variable (satisfaction with democracy).
The alternative hypothesis is that there is a relationship between the independent variables (trust in parliament, trust in politicians, importance that government is strong and ensures safety) and the dependent variable (satisfaction with democracy).
Our variables are:
Label4 <- c("`trstprl`", "`ipstrgv`", "`stfdem`", "`trstplt`" )
Meaning4 <- c("Trust to parliament", "Important that government is strong and ensures safety", "Satisfaction with democracy", "Trust to politicians")
Level_Of_Measurement4 <- c("Interval", "Ordinal", "Interval", "Interval")
Measurement4 <- c("0 - 10","Very much like me - Like me - Somewhat like me - A little like me - Not like me", "0 - 10", "0 - 10")
df4 <- data.frame(Label4, Meaning4, Level_Of_Measurement4, Measurement4, stringsAsFactors = FALSE)
kable(df4) %>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| Label4 | Meaning4 | Level_Of_Measurement4 | Measurement4 |
|---|---|---|---|
| trstprl | Trust to parliament | Interval | 0 - 10 |
| ipstrgv | Important that government is strong and ensures safety | Ordinal | Very much like me - Like me - Somewhat like me - A little like me - Not like me |
| stfdem | Satisfaction with democracy | Interval | 0 - 10 |
| trstplt | Trust to politicians | Interval | 0 - 10 |
es2 = ESS %>%
select(trstprl, stfdem, trstplt, ipstrgv)
es2$ipstrgv= as.factor(es2$ipstrgv)
es2$trstprl = as.numeric(as.character(es2$trstprl))
es2$trstplt = as.numeric(as.character(es2$trstplt))
es2$stfdem = as.numeric(as.character(es2$stfdem))
es2 = es2 %>%
filter(trstprl != 77) %>%
filter(trstprl != 88) %>%
filter(trstprl != 99)
es2 = es2 %>%
filter(ipstrgv != 7) %>%
filter(ipstrgv != 8) %>%
filter(ipstrgv != 9)
es2 = es2 %>%
filter(stfdem != 77) %>%
filter(stfdem != 88) %>%
filter(stfdem != 99)
es2 = es2 %>%
filter(trstplt != 77) %>%
filter(trstplt != 88) %>%
filter(trstplt != 99)
es2 <- es2[!is.na(es2$trstprl),]
es2 <- es2[!is.na(es2$ipstrgv),]
es2 <- es2[!is.na(es2$trstplt),]
es2 <- es2[!is.na(es2$stfdem),]
So, first of all, we should have a glance at the specifications of our dataset with the function summary.
summary(es2)
## trstprl stfdem trstplt ipstrgv
## Min. : 0.000 Min. : 0.000 Min. : 0.000 2 :998
## 1st Qu.: 3.000 1st Qu.: 4.000 1st Qu.: 2.000 1 :729
## Median : 5.000 Median : 6.000 Median : 4.000 3 :421
## Mean : 4.538 Mean : 5.423 Mean : 3.775 4 :233
## 3rd Qu.: 6.000 3rd Qu.: 7.000 3rd Qu.: 5.000 5 :125
## Max. :10.000 Max. :10.000 Max. :10.000 6 : 26
## (Other): 0
Seems legit, now we need to understand our variables from our dataset graphically.
For that we will need to create:
Box plot, to show the continuous variables and spot outliers.
Density plot, to check if the distribution of our continuous variables is close to normal.
Barplot, to see if the categorical variable is representative.
Scatter plot, to visualize the linear relationship between the variables.
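For the box plot step, it can help to locate the outliers numerically as well. The sketch below is a minimal illustration on a made-up vector (`trust_scores` is hypothetical): `boxplot.stats()$out` returns the outlying values, and `which()` converts them into row positions.

```r
# Hypothetical 0-10 style scores with one extreme value at position 10
trust_scores <- c(2, 3, 3, 4, 4, 4, 5, 5, 6, 20)

# Values lying beyond 1.5 times the hinge spread from the hinges
outlier_values <- boxplot.stats(trust_scores)$out

# Row positions of those values in the original vector
outlier_rows <- which(trust_scores %in% outlier_values)
outlier_rows
```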
par(mfrow=c(1, 3))
# boxplot.stats()$out returns the outlying values themselves, so they are collapsed into a single label
boxplot(es2$trstprl, main="Trust in country's parliament", sub=paste("Outlier values:", paste(boxplot.stats(es2$trstprl)$out, collapse = ", ")))
boxplot(es2$trstplt, main="Trust in politicians", sub=paste("Outlier values:", paste(boxplot.stats(es2$trstplt)$out, collapse = ", ")))
boxplot(es2$stfdem, main="Satisfaction with democracy", sub=paste("Outlier values:", paste(boxplot.stats(es2$stfdem)$out, collapse = ", ")))
There is an outlier in trust in politicians (it can be found on line 10 in our dataset). Moreover, it can be seen that trust in politicians has the lowest median.
par = ggplot(data = es2, aes(x = trstprl)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "orange") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in parliament")
dem = ggplot(data = es2, aes(x = stfdem)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "purple") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Satisfaction with democracy")
polit = ggplot(data = es2, aes(x = trstplt)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "grey") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in politicians")
plot_grid(par, polit, dem)
The distributions of trust in parliament and satisfaction with democracy are reasonably close to normal. The distribution of trust in politicians is not normal. However, we can still work with it.
levels(es2$ipstrgv)[1] <- "Very important"
levels(es2$ipstrgv)[2] <- "Important"
levels(es2$ipstrgv)[3] <- "Quite important"
levels(es2$ipstrgv)[4] <- "Little important"
levels(es2$ipstrgv)[5] <- "Not really important"
levels(es2$ipstrgv)[6] <- "Not important at all"
es2$ipstrgv <- factor(es2$ipstrgv,ordered= F,exclude = NA)
ggplot(data = es2, aes(x = ipstrgv)) + geom_bar(aes(y = (..count..)/sum(..count..)), fill = "pink") + scale_y_continuous(labels=scales::percent) + ylab("Relative frequencies") + ggtitle("important for government to be strong") + coord_flip()
We can see that the groups are not of comparable size, but we can still continue our work.
Most Irish people believe that the government needs to be strong and to provide safety.
w = ggplot(data = es2, aes(x = trstprl, y = stfdem)) + geom_point() + geom_smooth(method = lm, fill="blue", color="blue", se = FALSE) + ggtitle("Trust in parliament by satisfaction with democracy") + xlab("Trust in parliament") + ylab("Satisfaction with democracy")
we = ggplot(data = es2, aes(x = trstplt, y = stfdem)) + geom_point() + geom_smooth(method = lm, fill="blue", color="blue", se = FALSE) + ggtitle("Trust in politicians by satisfaction with democracy") + xlab("Trust in politicians") + ylab("Satisfaction with democracy")
Our scatterplots show that:
satisfaction with democracy and trust in parliament
satisfaction with democracy and trust in politicians
4.1. Looking at correlation coefficients
Last but not least, we would like to look at how our continuous variables are related. For that, let us have a look at this fine graph:
es3 = es2 %>%
select( - ipstrgv)
cor2 = cor(es3)
sjp.corr(es3, show.legend = TRUE)
Since we have seen the linear relationship both visually in the scatter plots and numerically in the correlations, it is time to build the models.
First, we look at the model with one predictor. Here we want to see how satisfaction with democracy can be predicted by trust in parliament. We construct a table and look at what it means:
model1 = lm( stfdem ~ trstprl, data = es2)
sjPlot::tab_model(model1)
Dependent variable: stfdem
| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 3.22 | 3.06 – 3.38 | <0.001 |
| trstprl | 0.49 | 0.45 – 0.52 | <0.001 |
| Observations | 2532 | | |
| R2 / adjusted R2 | 0.265 / 0.265 | | |
\[stfdem = 3.22 + 0.49 * trstprl \]
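To make the equation concrete, here is a small worked example that plugs a few trust values into it. The helper name `predict_model1` is ours (hypothetical) and simply hard-codes the rounded estimates from the table, so results are approximate.

```r
# Rounded coefficients from the model1 table: intercept 3.22, slope 0.49
predict_model1 <- function(trstprl) 3.22 + 0.49 * trstprl

# Predicted satisfaction with democracy for no, medium, and full trust in parliament
predict_model1(c(0, 5, 10))
```

A respondent with trust in parliament of 5 is therefore predicted to score about 5.67 on satisfaction with democracy.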
Now we add another predictor to our model. We add trust in politicians to see whether the additional variable helps us predict satisfaction with democracy better.
model2 = lm( stfdem ~ trstprl + trstplt , data = es2)
sjPlot::tab_model(model2)
Dependent variable: stfdem
| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 3.06 | 2.89 – 3.22 | <0.001 |
| trstprl | 0.37 | 0.33 – 0.41 | <0.001 |
| trstplt | 0.19 | 0.15 – 0.22 | <0.001 |
| Observations | 2532 | | |
| R2 / adjusted R2 | 0.290 / 0.290 | | |
\[stfdem = 3.06 + 0.37 * trstprl + 0.19 * trstplt \]
Finally, we add the variable ipstrgv (important that government is strong and ensures safety) to our model.
model3 = lm( stfdem ~ trstprl + trstplt + ipstrgv, data = es2)
sjPlot::tab_model(model3)
Dependent variable: stfdem
| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 3.22 | 3.03 – 3.41 | <0.001 |
| trstprl | 0.37 | 0.33 – 0.41 | <0.001 |
| trstplt | 0.19 | 0.15 – 0.23 | <0.001 |
| Important | -0.24 | -0.41 – -0.06 | 0.008 |
| Quite important | -0.35 | -0.57 – -0.13 | 0.002 |
| Little important | -0.28 | -0.55 – -0.01 | 0.043 |
| Not really important | -0.07 | -0.42 – 0.28 | 0.687 |
| Not important at all | 0.04 | -0.67 – 0.76 | 0.908 |
| Observations | 2532 | | |
| R2 / adjusted R2 | 0.294 / 0.292 | | |
Now, here we have lots of interesting stuff.
\[stfdem = 3.22 + 0.37 * trstprl + 0.19 * trstplt - 0.24 * important - 0.35 * quite.important - 0.28 * little.important - 0.07 * Not.really.important + 0.04 * Not.important.at.all\]
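The equation contains one term per non-reference level of the categorical predictor because `lm()` dummy-codes factors. A minimal sketch of that coding, using a hypothetical three-level factor `importance` rather than our real variable:

```r
# Hypothetical factor; the first level acts as the reference category
importance <- factor(
  c("Very important", "Important", "Quite important", "Important"),
  levels = c("Very important", "Important", "Quite important")
)

# model.matrix shows the 0/1 columns that lm() would build internally
dummies <- model.matrix(~ importance)
colnames(dummies)
dummies
```

The reference level ("Very important") has no column of its own: it corresponds to all dummies being 0, so every coefficient of the factor is a difference from that group.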
ANOVA helps us compare nested models, that is, models that are identical except that one of them includes one or more additional variables that the other does not.
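The nested-model comparison can be sketched on simulated data as follows. All names here (`sim`, `x1`, `x2`, `small`, `large`) are hypothetical, and the data are generated so that the extra predictor truly matters.

```r
# Simulated data in which y depends on both x1 and x2 (assumption for illustration)
set.seed(7)
n <- 200
sim <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
sim$y <- 1 + 2 * sim$x1 + 3 * sim$x2 + rnorm(n)

small <- lm(y ~ x1, data = sim)       # restricted model
large <- lm(y ~ x1 + x2, data = sim)  # same model plus one extra predictor

# F-test: does the extra predictor significantly reduce the residual sum of squares?
cmp <- anova(small, large)
cmp[["Pr(>F)"]][2]
```

A small p-value in the second row means the larger model fits significantly better than the restricted one.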
anova(model1, model2)
anova(model2, model3)
We've added an interaction to the best model (model3 with 3 predictors) according to ANOVA.
model4 = lm( stfdem ~ trstprl + trstplt + ipstrgv + trstprl * trstplt , data = es2)
sjPlot::tab_model(model4)
Dependent variable: stfdem
| Predictors | Estimates | CI | p |
|---|---|---|---|
| (Intercept) | 3.04 | 2.80 – 3.28 | <0.001 |
| trstprl | 0.42 | 0.36 – 0.47 | <0.001 |
| trstplt | 0.26 | 0.19 – 0.33 | <0.001 |
| Important | -0.24 | -0.42 – -0.07 | 0.007 |
| Quite important | -0.36 | -0.58 – -0.14 | 0.001 |
| Little important | -0.30 | -0.57 – -0.03 | 0.029 |
| Not really important | -0.07 | -0.41 – 0.28 | 0.709 |
| Not important at all | 0.07 | -0.65 – 0.79 | 0.847 |
| trstprl:trstplt | -0.01 | -0.03 – -0.00 | 0.015 |
| Observations | 2532 | | |
| R2 / adjusted R2 | 0.295 / 0.293 | | |
Now we have a huge table with lots of numbers, and they look scary! Let’s interpret it all step by step.
All predictors are statistically significant except the Not really important and Not important at all levels.
Let us now have a look at the estimates and interpret the significant ones:
The estimate for trust to parliament is 0.42. For a person who does not trust politicians at all (trstplt = 0), satisfaction with democracy rises by 0.42 points with each increase by 1 in trust in parliament, with ipstrgv held constant.
The estimate for trust to politicians is 0.26. For a person who does not trust parliament at all (trstprl = 0), satisfaction with democracy rises by 0.26 points with each increase by 1 in trust in politicians, with ipstrgv held constant.
The estimate for important is -0.24. Compared with a person who thinks it is very important for the government to be strong and provide security (the reference category) and who has the same levels of trust, a person who thinks it is important is 0.24 points less satisfied with democracy.
The estimate for quite important is -0.36. Compared with the same reference category, a person who thinks it is quite important is 0.36 points less satisfied with democracy.
The estimate for little important is -0.30. Compared with the same reference category, a person who thinks it is of little importance is 0.30 points less satisfied with democracy.
The estimate for the interaction of trust to parliament and trust to politicians is -0.01. Each increase by 1 in trust in politicians lowers the effect of trust in parliament on satisfaction with democracy by 0.01, and vice versa.
And here is an equation for this model:
\[stfdem = 3.04 + 0.42 * trstprl + 0.26 * trstplt - 0.24 * important - 0.36 * quite.important - 0.30 * little.important - 0.07 * Not.really.important + 0.07 * Not.important.at.all - 0.01 * trstprl * trstplt\]
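Because of the interaction term, the effect of trust in parliament is no longer a single number: it depends on the level of trust in politicians. A small worked example with the rounded estimates from the table (the helper name `slope_trstprl` is hypothetical):

```r
# Slope of trstprl at a given level of trstplt: main effect plus interaction * moderator
slope_trstprl <- function(trstplt) 0.42 - 0.01 * trstplt

# Slopes at the lowest, middle, and highest values of trust in politicians
slope_trstprl(c(0, 5, 10))
```

With these rounded coefficients the slope shrinks from 0.42 to about 0.32 across the full range of the moderator, so the lines in an interaction plot converge slowly rather than cross.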
anova(model3, model4)
Now let us construct the interaction plot to visualize our interaction.
plot_model(model4, type = "int", terms = "trstplt", mdrt.values = "minmax")
From the plot we can conclude that the higher the trust in parliament, the higher the satisfaction with democracy. As a moderator here we take the Trust to politicians variable, namely its highest and lowest values. We can see that the lines do not cross but come closer towards the right end of the graph. This means we have an interaction effect, but it is weak.
plot_model(model4, type = "int", terms = "trstplt", mdrt.values = "quart")
Maybe it will be better if we pick the quartiles? Well, the shaded areas overlap. So the interaction effect looks weak here as well!
Linear regression makes several assumptions about the data, which we check with the diagnostic plots:
par(mfrow = c(2, 2))
plot(model4)
Based on our long and stressful analysis, after having built a regression model and checked its assumptions, we can draw the following conclusions:
In very simple words: People are more satisfied with democracy if they trust parliament and politicians. People who think it is very important that the government is strong and provides security are somewhat more satisfied with democracy than those who think it is important, quite important, or of little importance.
The final formula is:
\[stfdem = 3.04 + 0.42 * trstprl + 0.26 * trstplt - 0.24 * important - 0.36 * quite.important - 0.30 * little.important - 0.07 * Not.really.important + 0.07 * Not.important.at.all - 0.01 * trstprl * trstplt\]
With these variables, our model explains about 30% of the variance in satisfaction with democracy (R2 = 0.295), so it gives a rough rather than an exact prediction for an individual Irish respondent.
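To see how uncertain a prediction for one individual can be, a prediction interval is more informative than the point estimate alone. The sketch below uses simulated data (the names `sim`, `trust`, and `satisfaction` are hypothetical and the numbers are not from our survey) and the base R `predict()` function.

```r
# Simulated survey-like data with substantial individual noise (assumption for illustration)
set.seed(3)
n <- 500
sim <- data.frame(trust = sample(0:10, n, replace = TRUE))
sim$satisfaction <- 3 + 0.5 * sim$trust + rnorm(n, sd = 2)

fit <- lm(satisfaction ~ trust, data = sim)

# Point prediction plus a 95% prediction interval for one new respondent
pred <- predict(fit, newdata = data.frame(trust = 5), interval = "prediction")
pred
```

The interval spans several points of the 0 to 10 scale even though the slope is estimated precisely, which is typical for models with a modest R2.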