Introduction

Hello. We are team 2BK. The team members are Bakhareva Anastasia, Borisenko Iana, Kireeva Irina, and Kuzmicheva Daria. Our topic is “Politics”, and the country we have chosen to study is Ireland.

Our topic covers public interest in politics, the level of trust in politicians, satisfaction with the results of the policies they introduce, participation in political actions, and more, all of which are presented in this project. We focus on the survey results for Ireland; the total number of respondents is 2757. We hope you will find it interesting.

Loading data, running libraries

Here are all the libraries you need to load.

library(readr)
library(dplyr)
library(ggplot2)
library(knitr) 
library(kableExtra)
library(sjlabelled)
library(sjmisc) 
library(sjstats) 
library(ggeffects)  
library(sjPlot)
library(psych)
library(tidyverse)
library(magrittr)
library(DescTools) 
library(car)
library(cowplot)
library(gridExtra)
library(snakecase)
library(stargazer)
library(effects)
library(scales)

To reproduce the code successfully, you should download the following data:

European Social Survey, Ireland, Round 8
Datasets: Politics, Gender, age and household composition, Human values scale.

ESS = read_csv("D:/Documents/ESS1-8e01.csv")

Project 1. Describing the data.

Before analyzing our dataset, we prepared by immersing ourselves in the topic of the project. We looked through the most frequently discussed issues within this theme and found them to be: trust in politicians, economic indicators within a particular country, the desire and willingness of the population to take part in political life, the acceptance of the LGBT community by the public and, more generally, satisfaction with the work of the government. In our analysis, we have tried to touch on all of these issues and to illustrate them with graphs.

To make the analysis clearer, we have created several charts that reveal the main patterns and support our conclusions.

Describing variables

First of all, we would like to take a look at variables we have selected for analysis.

politics1 = ESS %>%
  select(polintr, actrolga, cptppola, trstlgl,  trstplt, sgnptit, bctprd, lrscale, stflife, stfeco, stfgov, freehms)

Label = c("`polintr`", "`actrolga`", "`cptppola`", "`trstlgl`", "`trstplt`", "`sgnptit`", "`bctprd`", "`lrscale`", "`stflife`", "`stfeco`", "`stfgov`", "`freehms`") 
Meaning = c("How interested in politics", "Able to take active role in political group", "Confident in own ability to participate in politics", "Trust in the legal system", "Trust in politicians", "Signed petition last 12 months", "Boycotted certain products last 12 months", "Placement on left right scale", "How satisfied with life as a whole", "How satisfied with present state of economy in country", "How satisfied with the national government", "Gays and lesbians free to live life as they wish")
Level_Of_Measurement <- c("Ordinal", "Ordinal", "Ordinal", "Interval", "Interval",  "Nominal", "Nominal", "Interval", "Interval", "Interval", "Interval", "Ordinal")
df <- data.frame(Label, Meaning, Level_Of_Measurement, stringsAsFactors = FALSE)

kable(df) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label	Meaning	Level_Of_Measurement
`polintr`	How interested in politics	Ordinal
`actrolga`	Able to take active role in political group	Ordinal
`cptppola`	Confident in own ability to participate in politics	Ordinal
`trstlgl`	Trust in the legal system	Interval
`trstplt`	Trust in politicians	Interval
`sgnptit`	Signed petition last 12 months	Nominal
`bctprd`	Boycotted certain products last 12 months	Nominal
`lrscale`	Placement on left right scale	Interval
`stflife`	How satisfied with life as a whole	Interval
`stfeco`	How satisfied with present state of economy in country	Interval
`stfgov`	How satisfied with the national government	Interval
`freehms`	Gays and lesbians free to live life as they wish	Ordinal

As can be seen, the dataset contains both categorical and continuous variables, so we can go through it for a deeper analysis.

Calculating central tendency measures.

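Only the interval variables were used for this part. The results are shown in the table below. Note that the ESS missing-value codes (77, 88, 99) have not been removed at this point, so the means are inflated; the mean of `lrscale`, for instance, falls outside the 0-10 scale.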

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}  
#1
politics1$trstplt =  as.numeric(as.character(politics1$trstplt))
v.trstplt <- c(mean(politics1$trstplt), Mode(politics1$trstplt), median(politics1$trstplt))
names(v.trstplt) <- c("mean", "mode", "median")
#2
politics1$trstlgl =  as.numeric(as.character(politics1$trstlgl))
v.trstlgl <- c(mean(politics1$trstlgl), Mode(politics1$trstlgl), median(politics1$trstlgl))
names(v.trstlgl) <- c("mean", "mode", "median")
#3
politics1$lrscale = as.numeric(as.character(politics1$lrscale))
v.lrscale <- c(mean(politics1$lrscale), Mode(politics1$lrscale), median(politics1$lrscale))
names(v.lrscale) <- c("mean", "mode", "median")
#4
politics1$stflife = as.numeric(as.character(politics1$stflife))
v.stflife <- c(mean(politics1$stflife), Mode(politics1$stflife), median(politics1$stflife))
names(v.stflife) <- c("mean", "mode", "median")
#5
politics1$stfgov = as.numeric(as.character(politics1$stfgov))
v.stfgov <- c(mean(politics1$stfgov), Mode(politics1$stfgov), median(politics1$stfgov))
names(v.stfgov) <- c("mean", "mode", "median")

tendencymeasures_overview =  data.frame(v.trstplt, v.trstlgl, v.lrscale, v.stflife, v.stfgov, stringsAsFactors = FALSE)

kable(tendencymeasures_overview) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

	v.trstplt	v.trstlgl	v.lrscale	v.stflife	v.stfgov
mean	4.850925	7.678999	19.53827	7.393544	5.536815
mode	5.000000	7.000000	5.00000	8.000000	5.000000
median	4.000000	6.000000	5.00000	7.000000	5.000000
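
The means in the table are computed before the ESS missing-value codes are removed. A minimal sketch of how such codes could be turned into `NA` first is shown here on a small made-up vector rather than on the real data; the helper names `recode_missing` and `mode_na` and the toy values are assumptions for illustration only.

```r
# Hypothetical helper: replace ESS-style missing codes with NA
recode_missing <- function(x, codes = c(77, 88, 99)) {
  x <- as.numeric(as.character(x))
  x[x %in% codes] <- NA
  x
}

# Mode that ignores NA values (same idea as Mode() above)
mode_na <- function(x) {
  x <- x[!is.na(x)]
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Toy answers on a 0-10 scale with two missing codes (made-up values)
toy <- c(5, 5, 3, 7, 88, 4, 77, 5)
clean <- recode_missing(toy)

# Central tendency measures computed on valid answers only
c(mean = mean(clean, na.rm = TRUE),
  mode = mode_na(clean),
  median = median(clean, na.rm = TRUE))
```

Applied to the real variables, the same idea would keep codes such as 77 or 88 from being averaged together with genuine answers.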

Creating graphs

Now it is time to answer some questions about politics in Ireland.

1. Do people in Ireland trust politicians?

ggplot()+
  geom_histogram(data = politics1, aes(x = trstplt), binwidth = 1, fill="#7ee0ff", col="#3a0d0d", alpha = 0.5) +
  xlim(c(0, 10)) +
  xlab("Trust in politicians") + 
  ylab("Number of people") +
  geom_vline(aes(xintercept = mean(politics1$trstplt), color = 'mean'), linetype="solid", size=1) +
  geom_vline(aes(xintercept = median(politics1$trstplt), color = 'median'), linetype="solid", size=1)+
  geom_vline(aes(xintercept = Mode(politics1$trstplt), color = 'mode'), linetype="solid",size=1) +
  scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd", mode = "#339666"))+
  ggtitle("The level of trust towards politicians")

Conclusion 1. The distribution of trust in politicians is concentrated at the lower end of the scale, so people tend not to trust politicians. Although the most frequently reported value lies in the middle of the scale, half of the respondents rated their trust below the mean.

2. Are people in Ireland satisfied with the state of the economy in the country?

ggplot()+
  geom_histogram(data = politics1, aes(x = stfeco), binwidth = 1, fill="#FFB273", col="#FF9640", alpha = 0.5) +
  xlim(c(0, 10)) +
  xlab("Satisfaction with the state of the economy") + 
  ylab("Number of people") +
  geom_vline(aes(xintercept = mean(politics1$stfeco), color = 'mean'), linetype="solid", size=1) +
  geom_vline(aes(xintercept = median(politics1$stfeco), color = 'median'), linetype="solid", size=1)+
  geom_vline(aes(xintercept = Mode(politics1$stfeco), color = 'mode'), linetype="longdash",size=1) +
  scale_color_manual(name = "Measurement", values = c(median = "#008500", mean = "#A60000", mode = "white"))+
  ggtitle("The level of satisfaction with present state of economy ")

Conclusion 2. As can be seen from the graph, people are more or less satisfied with the state of the economy: both the most frequently reported value and the mean of all answers are equal to 6. However, the median is one point lower, so half of the respondents gave a rating of 5 or less.

3. Are the Irish confident in their ability to participate in politics?

politics1 = politics1 %>%
  filter(cptppola != 8 )%>%
  filter(cptppola != 9 )%>% 
  filter(cptppola != 7 )

politics1$cptppola <- factor(politics1$cptppola, labels = c("Not at all confident", "A little confident", "Quite confident", "Very confident", "Completely confident"), ordered= F)
ggplot() +
  geom_bar(data = politics1, aes(x = cptppola), fill="#AD66D5", col="#5F2580", alpha = 0.5) +
  xlab("Confident in own ability to participate in politics") + 
  ylab("Number of people") +
  ggtitle("The level of people's confidence in their ability to participate in politics")

Conclusion 3. People in Ireland tend not to be confident in their ability to participate in politics.

4. What is the attitude towards homosexual people in Ireland?

politics1 = politics1 %>%
  filter(freehms != 8 )%>%
  filter(freehms != 9 )%>% 
  filter(freehms != 7 )
#bar2
politics1$freehms <- factor(politics1$freehms, labels = c("Agree strongly", "Agree", "Neither agree nor disagree", "Disagree", "Disagree strongly"), ordered= F)
ggplot() +
  geom_bar(data = politics1, aes(x = freehms), fill="#FFE773", col="#A68900", alpha = 0.5) +
  xlab("Homosexual people are free to live their lives as they wish ") + 
  ylab("Number of people") +
  ggtitle("People's attitude towards homosexual relationships")

Conclusion 4. In Ireland, civil partnerships for same-sex couples became available in 2011, and same-sex marriage was legalized in 2015. We were interested in the level of homophobia about five years after the first of these changes. As the graph shows, most residents agree that homosexual people are free to live their lives as they wish.

5. Is there any difference in the level of life satisfaction between people who boycott certain products and those who do not? People who boycott something may be more satisfied, since they act on the things they do not like.

politics1 = politics1 %>%
  filter(bctprd != 8 )%>%
  filter(bctprd != 9 )%>% 
  filter(bctprd != 7 )


politics1$bctprd <- factor(politics1$bctprd, labels = c("Yes", "No"), ordered= F,exclude = NA)
ggplot() +
  geom_boxplot(data = politics1, aes(x = bctprd, y = stflife), fill="#C9F76F", col="#679B00", alpha = 0.5) +
  ylim(c(0,10)) +  # stflife runs from 0 to 10
  xlab("Boycotted certain products last 12 months") + 
  ylab("Satisfaction with life") +
  ggtitle("Life satisfaction by experience of boycotting")

Conclusion 5. There is a noticeable difference in the distribution of life satisfaction between the two groups. The range of answers is wider among people who boycotted products, and the median is also higher in this group.

6. How much do people trust the legal system, depending on how able they feel to be politically active?

politics1 = politics1 %>%
  filter(actrolga != 8 )%>%
  filter(actrolga != 9 )%>% 
  filter(actrolga != 7 )

politics1$actrolga <- factor(politics1$actrolga, labels = c("Not at all able", "A little able", "Quite able", "Very able", "Completely able"), ordered= F,exclude = NA)
ggplot() +
  geom_boxplot(data = politics1, aes(x = actrolga, y = trstlgl), fill="#E667AF", col="#85004B", alpha = 0.5) +
  ylim(c(0,10)) +  # trstlgl runs from 0 to 10
  xlab("Perceived ability to be politically active") + 
  ylab("Trust in the legal system") +
  ggtitle("Trust in the legal system by perceived ability to be politically active")

Conclusion 6. The graph shows that people who consider themselves completely able to be politically active have a higher level of trust in the legal system. In contrast, people who consider themselves not at all able to be politically active have a lower level of trust. In the remaining groups, the median level of trust is the same.

7. Are more politically interested people more likely to sign petitions?

politics1 = politics1 %>%
  filter(polintr != 7) %>%
  filter(polintr != 8) %>%
  filter(polintr != 9)
politics1$polintr <- factor(politics1$polintr, labels = c("Very interested", "Quite interested", "Hardly interested", "Not at all interested"), ordered= F)

politics1 = politics1 %>%
  filter(sgnptit != 7) %>%
  filter(sgnptit != 8) %>%
  filter(sgnptit != 9)
politics1$signed_petitions <- factor(politics1$sgnptit, labels = c("Yes", "No"), ordered = F)
ggplot(data = politics1, aes(x = polintr, fill = signed_petitions)) +
  geom_bar(position="fill")+
  coord_flip()+
  xlab("How interested in politics") + 
  ylab("Share of population") +
  ggtitle("Participation in signing petitions by interest in politics")

Conclusion 7. The graph suggests that, regardless of the degree of interest in politics, only a small share of people sign petitions. The only exception is the group of people who are very interested in politics: about a third of them signed at least one petition in the last year.

8. Is there an association between left/right views and satisfaction with the national government?

politics1 = politics1 

politics1$lrscale = as.numeric(as.character(politics1$lrscale))
politics1$stfgov = as.numeric(as.character(politics1$stfgov))

politics1 = politics1 %>% 
  filter(lrscale != 77) %>% 
  filter(lrscale != 88) %>% 
  filter(lrscale != 99) 

politics1 = politics1 %>% 
  filter(stfgov != 77) %>% 
  filter(stfgov != 88) %>% 
  filter(stfgov != 99)

ggplot(data = politics1) +
  geom_point( aes(x = lrscale, y = stfgov))+
  scale_color_gradient(low = "white", high = "black") +
  xlab("Placement in left-right scale") +
  ylab("Level of satisfaction with national government") +
  ggtitle("Satisfaction with national government by placement on the left-right scale") +
  theme_bw()

The graph suggests that the correlation between the variables is positive and weak. Let's have a look at the correlation values:

politics1 = politics1 %>% 
  select(lrscale, stfgov)
cor1 = cor(politics1)

cordat =  data.frame(cor1, stringsAsFactors = FALSE)

kable(cordat) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

	lrscale	stfgov
lrscale	1.0000000	0.2587218
stfgov	0.2587218	1.0000000

Conclusion 8. As can be seen from the correlation table, there is a weak positive association (r of about 0.26) between left/right placement and satisfaction with the national government.
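
A correlation coefficient on its own does not say whether the association could be due to chance. One possible way to check this is `cor.test()` from base R, sketched here on simulated answers; the vectors `x_toy` and `y_toy` are made up for illustration and are not the ESS data.

```r
set.seed(1)
# Simulated 0-10 answers standing in for lrscale and stfgov (not the ESS data)
x_toy <- sample(0:10, 200, replace = TRUE)
y_toy <- pmin(10, pmax(0, round(0.3 * x_toy + rnorm(200, mean = 3, sd = 2))))

# Pearson correlation with a test of H0: the true correlation is 0
res <- cor.test(x_toy, y_toy, method = "pearson")
res$estimate   # correlation coefficient
res$p.value    # p-value of the test
res$conf.int   # 95% confidence interval for the correlation
```

Since both variables are 0-10 ratings, `method = "spearman"` could be used instead as a rank-based alternative.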

Thus, having built all these charts, we have shed light on how Ireland fares on the most frequently discussed political topics. Several results surprised us, and in the subsequent projects we will try to explore these phenomena in more depth.

Project 2. Chi-squared and t-test

Preparing data for analysis

politics2 <-ESS %>% 
  select(agea, lrscale, sgnptit, vote)

politics2 <- politics2[!is.na(politics2$agea),]
politics2 <- politics2[!is.na(politics2$lrscale),]
politics2 <- politics2[!is.na(politics2$sgnptit),]
politics2 <- politics2[!is.na(politics2$vote),]

politics2 <- politics2 %>% 
  filter(lrscale != 77) %>% 
  filter(lrscale != 88) %>% 
  filter(lrscale != 99 ) 

politics2 <- politics2 %>%
  filter(sgnptit != 7) %>%
  filter(sgnptit != 8) %>%
  filter(sgnptit != 9) 

politics2 <- politics2 %>% 
  filter(agea != 999)

Describing variables

First, we modify one of the variables to make it comfortable for manipulations. Then, we update our dataset.

politics2$lr <- ifelse(politics2$lrscale <= 3, "Left",
                    ifelse(politics2$lrscale >= 7, "Right", "Centre"))
politics2 <- politics2 %>% 
  select(- lrscale)
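
The same three-way split can also be written with `cut()`, which makes the category boundaries explicit. This is only a sketch on made-up values; the names `lrscale_toy` and `lr_toy` are hypothetical.

```r
# Toy left-right placements on the 0-10 scale (made-up values)
lrscale_toy <- c(0, 3, 4, 6, 7, 10)

# Intervals are (-Inf, 3], (3, 6], (6, Inf], which matches the
# ifelse() rule above for whole-number answers
lr_toy <- cut(lrscale_toy,
              breaks = c(-Inf, 3, 6, Inf),
              labels = c("Left", "Centre", "Right"))
table(lr_toy)
```

Unlike the nested `ifelse()`, `cut()` returns a factor whose levels keep the order Left, Centre, Right.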

Now, let's look at the number of observations and the number of variables.

dim(politics2)

## [1] 2240    4

That is all we need for conducting the tests: 4 variables and a sufficient number of observations.

Below is a description of the chosen variables.

Label2 <- c("`sgnptit`", "`lr`", "`agea`", "`vote`" ) 
Meaning2 <- c("Signed petition last 12 months", "Placement on left right scale", "Age", "Voted last national election")
Level_Of_Measurement2 <- c("Nominal", "Nominal", "Ratio", "Nominal")
Test2 <- c("Chi-squared test", "Chi-squared test", "T-test for independent samples", "T-test for independent samples")
df2 <- data.frame(Label2, Meaning2, Level_Of_Measurement2,Test2, stringsAsFactors = FALSE)
kable(df2) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label2	Meaning2	Level_Of_Measurement2	Test2
`sgnptit`	Signed petition last 12 months	Nominal	Chi-squared test
`lr`	Placement on left right scale	Nominal	Chi-squared test
`agea`	Age	Ratio	T-test for independent samples
`vote`	Voted last national election	Nominal	T-test for independent samples

Exploring data for Chi-squared test

First, we select the variables needed for the chi-square test. The contingency table is presented below.

politics_chi <- politics2 %>% 
  select(lr, sgnptit)
politics_chi$sgnptit <- factor(politics_chi$sgnptit, labels = c("Yes", "No"), ordered= F,exclude = NA)
ContigencyTable <- table(politics_chi$lr, politics_chi$sgnptit)
kable(ContigencyTable)%>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

	Yes	No
Centre	281	1041
Left	141	188
Right	95	494

To check whether our categories are suitable for a chi-square test, we create a stacked bar plot and analyze it.

sjp.xtab(politics_chi$lr, politics_chi$sgnptit, type = "bar", margin ="row",
  bar.pos = "stack", title = "Participation in signing petitions by political orientation", title.wtd.suffix = NULL,
  axis.titles = NULL, axis.labels = NULL, legend.title = NULL,
  legend.labels = NULL, weight.by = NULL, rev.order = FALSE,
  show.values = TRUE, show.n = TRUE, show.prc = TRUE, show.total = TRUE,
  show.legend = TRUE, show.summary = TRUE, summary.pos = "r",
  string.total = "Total", wrap.title = 50, wrap.labels = 15,
  wrap.legend.title = 20, wrap.legend.labels = 20, geom.size = 0.7,
  geom.spacing = 0.1, geom.colors = "Paired", dot.size = 3,
  smooth.lines = FALSE, grid.breaks = 0.2, expand.grid = FALSE,
  ylim = NULL, vjust = "bottom", hjust = "left", y.offset = NULL,
  coord.flip = TRUE)

The plot shows that respondents in every category of political orientation signed petitions. However, signing petitions is not very common in Ireland. According to the contingency table above, the most active group is the left, where 42.9% signed a petition; among the “centre” and the “right” the shares are 21.3% and 16.1% respectively.
Each observation is independent of all the others (i.e., one observation per respondent), and no more than 20% of the expected counts are less than 5 (none of them, actually). Therefore, the data are appropriate for a reliable chi-square test.
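
The expected-count condition can be checked directly from the counts in the contingency table above. The following is a minimal sketch in base R; the object names `observed` and `expected` are chosen here for illustration.

```r
# Observed counts copied from the contingency table above
observed <- matrix(c(281, 1041,
                     141,  188,
                      95,  494),
                   nrow = 3, byrow = TRUE,
                   dimnames = list(c("Centre", "Left", "Right"), c("Yes", "No")))

# Expected count under independence = row total * column total / grand total
expected <- outer(rowSums(observed), colSums(observed)) / sum(observed)
round(expected, 1)

# Share of cells with an expected count below 5
mean(expected < 5)

# Row percentages: share of signers within each political orientation
round(100 * prop.table(observed, margin = 1), 1)
```

The same expected counts are also available as `chisq.test(observed)$expected`.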

Chi-square test

With the spread of the Internet, petitions can now be signed online. One study of the largest online petition platform (Change.org) has shown that it is strongly biased toward liberal causes.

In Ireland, half of the political parties are right (conservatism) or centre (social democracy, liberal conservatism, populism), and the other half are left (socialism, republicanism), so citizens have a wide spectrum of views to choose from. To find out whether the distribution of petition signatories across political orientations is random, we decided to run a chi-square test.

We formulated the following hypotheses:

H0 - there is no relation between the political preferences of respondents and whether they signed a petition
H1 - there is a relation between the political preferences of respondents and whether they signed a petition

So, let's run the chi-square test.

colnames(ContigencyTable) <- c("Petition +", "Petition -")
rownames(ContigencyTable) <- c("C", "L", "R")  # table() sorts rows alphabetically: Centre, Left, Right
chi.test <- chisq.test(ContigencyTable)
chi.test

## 
##  Pearson's Chi-squared test
## 
## data:  ContigencyTable
## X-squared = 90.992, df = 2, p-value < 2.2e-16

The p-value of the chi-square test is extremely small (< 2.2e-16), so the data provide strong evidence against the independence of the two variables. H0 is therefore rejected: the political preferences of respondents and whether they signed a petition are likely to be related.

Next, we look at the standardized residuals.

kable(chi.test$stdres)

	Petition +	Petition -
C	-2.459609	2.459609
L	9.217379	-9.217379
R	-4.663756	4.663756

assocplot(t(ContigencyTable), main="Residuals and number of observations" )

The plot of residuals confirms the conclusion of the chi-square test: the differences in the number of petition signers between political orientations are too large for the variables to be independent. The left stands out in particular: the number of signatories there is much higher than would be expected under independence (a value of about 10 on the plot), while among the right it is lower than expected (about 6 on the plot).
Thus, since both the chi-square test and the residuals speak against the independence of these variables, we conclude that the political preferences of respondents and their willingness to sign petitions are related.
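
A significant chi-square test says that a relation exists, but not how strong it is. One common measure of strength is Cramér's V, sketched here from the counts of the contingency table; this measure is an addition for illustration and is not part of the original analysis, and the names `observed`, `test` and `cramers_v` are hypothetical.

```r
# Observed counts copied from the contingency table (rows: Centre, Left, Right)
observed <- matrix(c(281, 1041, 141, 188, 95, 494), nrow = 3, byrow = TRUE,
                   dimnames = list(c("Centre", "Left", "Right"), c("Yes", "No")))

test <- chisq.test(observed)

# Cramer's V = sqrt(chi-squared / (n * (min(rows, columns) - 1)))
n <- sum(observed)
k <- min(nrow(observed), ncol(observed)) - 1
cramers_v <- sqrt(unname(test$statistic) / (n * k))
cramers_v
```

V ranges from 0 (no association) to 1 (perfect association), so it complements the p-value with a sense of effect size.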

Exploring data for T-test

Here we start with filtering data to delete values useless for our test.

politics_ttest <- politics2 %>% 
  select(agea, vote) %>% 
  filter(vote != 3) %>% 
  filter(vote != 7) %>% 
  filter(vote != 8)

Next, let's compare the age distributions of the two groups with the help of a boxplot.

politics_ttest$vote <- factor(politics_ttest$vote, labels = c("Yes", "No"), ordered= F,exclude = NA)
ggplot() +
  geom_boxplot(data = politics_ttest, aes(x = vote, y = agea), fill="#A44200", col="#A44200", alpha = 0.5) +
  scale_y_continuous(limits = c(0,100)) +
  xlab("Voted last national election") + 
  ylab("Age") +
  ggtitle("Participation in the election by age")

The median age of voters is higher than the median age of those who did not vote. The first box is taller than the second, so ages vary more among voters than among non-voters. The whiskers are roughly the same in both groups. However, the boxplot for non-voters shows some outliers.

Checking normality of distribution

The first way to check normality is to look at descriptive statistics.

describeBy(politics_ttest, politics_ttest$vote)

## 
##  Descriptive statistics by group 
## group: Yes
##       vars    n  mean    sd median trimmed   mad min max range  skew
## agea     1 1689 54.93 16.41     55   54.98 19.27  19  92    73 -0.03
## vote*    2 1689  1.00  0.00      1    1.00  0.00   1   1     0   NaN
##       kurtosis  se
## agea     -0.77 0.4
## vote*      NaN 0.0
## -------------------------------------------------------- 
## group: No
##       vars   n  mean    sd median trimmed   mad min max range skew
## agea     1 408 40.69 15.87     38   39.19 14.83  16  96    80 0.87
## vote*    2 408  2.00  0.00      2    2.00  0.00   2   2     0  NaN
##       kurtosis   se
## agea      0.44 0.79
## vote*      NaN 0.00

Skewness is a measure of the symmetry of a distribution. The normal distribution is symmetrical, so its skew equals 0. In the group of voters the skew equals -0.03, so the age distribution is almost perfectly symmetrical; in the group of non-voters the skew is 0.87, which indicates a moderate positive (right) skew. The distribution of age is therefore more symmetrical in the group of voters.
Kurtosis tells us whether the distribution is peaked or flat compared with the normal distribution, for which the value reported here is 0. The kurtosis of age equals -0.77 in the group of voters and 0.44 in the group of non-voters. This means that the distribution of the first group (voters) is flatter than that of the second group.
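
To make the two measures more concrete, here is a minimal sketch of the simple moment-based formulas on made-up ages. The function names `skew_simple` and `excess_kurtosis_simple` and the toy vectors are assumptions for illustration; `describeBy()` uses a slightly different estimator by default, so its values can differ a little from these.

```r
# Moment-based skewness: third central moment scaled by the variance^(3/2)
skew_simple <- function(x) {
  d <- x - mean(x)
  mean(d^3) / mean(d^2)^1.5
}

# Moment-based excess kurtosis: fourth central moment scaled by the
# squared variance, minus 3 so that a normal distribution gives 0
excess_kurtosis_simple <- function(x) {
  d <- x - mean(x)
  mean(d^4) / mean(d^2)^2 - 3
}

symmetric_toy  <- c(20, 30, 40, 50, 60)   # symmetric around 40
right_tail_toy <- c(20, 22, 25, 30, 80)   # one large value stretches the right tail

skew_simple(symmetric_toy)
skew_simple(right_tail_toy)
excess_kurtosis_simple(symmetric_toy)
```

A symmetric sample gives a skew of zero, while a long right tail gives a positive value, which is the pattern seen for non-voters.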

Next, we check normality with the help of a histogram.

library(ggplot2)
ggplot(politics_ttest, aes(x = agea, fill = vote)) +
      geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 3) +
  geom_density(col = "yellow", fill = "white", alpha = 0.1) +
  geom_vline(aes(xintercept = mean(politics_ttest$agea), color = 'mean'), linetype="dashed", size=1) +
  geom_vline(aes(xintercept = median(politics_ttest$agea), color = 'median'), linetype="longdash", size=1) +
  scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd")) +
  xlab("Age") + 
  ylab("Density") +
  ggtitle("Age distribution of voters and non-voters")

Now we can see the distributions: neither group is very close to the normal distribution, but the group of voters is slightly closer to it.

Finally, we check normality with the help of Q-Q Plot.

#creating subgroups based on voting / non-voting
voteplus <- subset(politics_ttest[politics_ttest$vote == "Yes",]) 
voteminus <- subset(politics_ttest[politics_ttest$vote == "No",])
par(mfrow = c(1,2))
# y is limited from 18 because it is age at which the Irish are allowed to vote
qqnorm(voteplus$agea, ylim = c(18, 100), main = "Normal Q-Q Plot for vote+"); qqline(voteplus$agea,ylim = c(18 ,100), col= 2)
qqnorm(voteminus$agea, ylim = c(18 ,100), main = "Normal Q-Q Plot for vote-"); qqline(voteminus$agea, col= 2, ylim = c(18 ,100))

Both Q-Q plots deviate from the normality line: some points lie above the line, which points to a right skew, and the shape of the curve differs from the line. However, the first plot shows a slightly flatter peak.
Thus, we can conclude that the first group (voters) is closer to a normal distribution than the second group (non-voters). Overall, however, these checks indicate that age is not normally distributed in our sample.

Conducting T-test

Age is sometimes mentioned as one of the factors that can influence voting behaviour, but every country should be studied as a separate case.

As with political preferences and signed petitions, we would like to find out whether voting behaviour in Ireland is related to the respondent's age, so we conduct a t-test. We formulated the following hypotheses:

H0: the mean age of people who voted and of people who did not vote is the same.
H1: the mean age differs between the two groups and, thus, there is a relation between age and voting behaviour.

Now we run the t-test.

t.test(politics_ttest$agea ~ politics_ttest$vote)

## 
##  Welch Two Sample t-test
## 
## data:  politics_ttest$agea by politics_ttest$vote
## t = 16.147, df = 634.29, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.50096 15.96259
## sample estimates:
## mean in group Yes  mean in group No 
##          54.92540          40.69363

Statistical conclusion: at the 5% significance level, the null hypothesis is rejected in favor of the alternative (p-value < 0.05).
Substantive conclusion: the average age differs significantly between those who voted and those who did not; voters are on average about 14 years older (95% confidence interval: 12.5 to 16.0 years).
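
The Welch statistic can be retraced by hand from the group summaries reported above, which also makes it easy to add a rough effect size. The sketch below uses the rounded standard deviations from the `describeBy()` output, so small rounding differences are expected; Cohen's d is an addition for illustration and not part of the original analysis.

```r
# Summary values copied from the describeBy() and t.test() output above
m_yes <- 54.92540; sd_yes <- 16.41; n_yes <- 1689
m_no  <- 40.69363; sd_no  <- 15.87; n_no  <- 408

# Squared standard errors of the two group means
v_yes <- sd_yes^2 / n_yes
v_no  <- sd_no^2 / n_no

# Welch t statistic and Welch-Satterthwaite degrees of freedom
t_welch  <- (m_yes - m_no) / sqrt(v_yes + v_no)
df_welch <- (v_yes + v_no)^2 / (v_yes^2 / (n_yes - 1) + v_no^2 / (n_no - 1))

# Cohen's d with a pooled standard deviation, as a rough effect size
sd_pooled <- sqrt(((n_yes - 1) * sd_yes^2 + (n_no - 1) * sd_no^2) /
                    (n_yes + n_no - 2))
cohens_d  <- (m_yes - m_no) / sd_pooled

c(t = t_welch, df = df_welch, d = cohens_d)
```

The resulting t and df should come out close to the values printed by `t.test()`, which confirms that R applied the Welch version of the test rather than the equal-variance one.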

Double-checking results with non-parametric test

H0: the two populations (voters and non-voters) have the same age distribution, with the same median age.
H1: the two populations (voters and non-voters) have different age distributions, with different median ages.

wilcox.test(agea ~ vote, data = politics_ttest)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  agea by vote
## W = 509930, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

Statistical conclusion: the p-value is extremely small, so there is strong evidence against H0, and it should be rejected.
Substantive conclusion: the Wilcoxon test also supports the conclusion that age differs significantly between those who voted and those who did not.

Overall conclusion

Thus, having processed the data and conducted several statistical tests, we can state the following:

Signing petitions is significantly related to the political preferences of Irish citizens. People of the left political orientation are the most likely to sign petitions, and people of the right are the least likely.
Age is related to voting behaviour in Ireland. Non-voters are on average considerably younger than voters (mean age 40.7 versus 54.9), so younger citizens tend not to take part in elections even though they have reached voting age.

Project 3. One-way Analysis of Variance

Describing background

While discussing Ireland as the object of our research, we found out that it is a rather prosperous country. It ranks sixth on the Human Development Index, which is very high. However, this index, which is composed of life expectancy, education, and per capita income indicators, does not include the political aspects of a country.

Since our research topic is politics, we wanted to explore how involvement in politics is related to life satisfaction in Ireland.

We expected that people who are most interested in politics have the highest level of life satisfaction, compared with people who are less interested in political processes.

Our research question is “Do Irish people who are interested in politics to different extents have the same level of life satisfaction?”

Preparing data for analysis

politics3 = ESS %>% 
  select( stflife, polintr)

politics3 = politics3 %>%
  filter(stflife != 77) %>%
  filter(stflife != 88) %>%
  filter(stflife != 99) 

politics3 = politics3 %>% 
  select( stflife, polintr) %>% 
  filter(polintr != 7) %>% 
  filter(polintr != 8) %>% 
  filter(polintr != 9 )

Manipulating & Describing variables

Below is a description of the chosen variables.

Label3 <- c("`polintr`", "`stflife`") 
Meaning3 <- c("How interested in politics", "How satisfied with life as a whole")
Level_Of_Measurement3 <- c("Ordinal", "Interval")
Measurement3 <- c("Very - Quite - Hardly - Not at all", "0 - 10")
df3 <- data.frame(Label3, Meaning3, Level_Of_Measurement3, Measurement3, stringsAsFactors = FALSE)
kable(df3) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label3	Meaning3	Level_Of_Measurement3	Measurement3
`polintr`	How interested in politics	Ordinal	Very - Quite - Hardly - Not at all
`stflife`	How satisfied with life as a whole	Interval	0 - 10

Let's filter our data and prepare it for further analysis.

politics3 = politics3 %>% 
  select(polintr, stflife)
politics3$polintr <- ifelse (politics3$polintr == 1, "Very interested",
                    ifelse(politics3$polintr == 2, "Quite interested", 
                    ifelse(politics3$polintr == 3, "Hardly interested", "Not interested")))
politics3$stflife <-  as.numeric(as.character(politics3$stflife))
politics3$polintr <- as.factor(politics3$polintr)

politics3$polintr <- factor(politics3$polintr, c("Not interested", "Hardly interested", "Quite interested", "Very interested" ))

politics3 = politics3%>% 
  filter(politics3$stflife != 88)

politics3 = politics3 %>% 
  filter(politics3$stflife != 77)

politics3 = politics3 %>% 
  filter(politics3$stflife != 99)

Values descriptives across the groups

describeBy(politics3$stflife, politics3$polintr, mat = TRUE) %>% #create dataframe
  select(polintr = group1, N=n, Mean=mean, SD=sd, Median=median, Min=min, Max=max, 
                Skew=skew, Kurtosis=kurtosis, st.error = se) %>% 
  kable(align=c("lrrrrrrrr"), digits=2, row.names = FALSE,
        caption="Satisfaction with life by interest in politics") %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Satisfaction with life by interest in politics
polintr	N	Mean	SD	Median	Max	Skew	Kurtosis	st.error
Not interested	744	7.12	2.00	7	10	-0.87	1.05	0.07
Hardly interested	733	7.23	1.90	8	10	-0.97	1.42	0.07
Quite interested	964	7.29	1.80	7	10	-1.02	2.00	0.06
Very interested	308	7.54	1.87	8	10	-0.99	1.35	0.11

By looking at this table we can conclude that three of our groups are of comparable size (733 to 964 respondents), while the “Very interested” group is noticeably smaller (308 respondents).

Looking at groups

Next, we look at the group sizes to be sure that every group is large enough for the analysis.

par(mar = c(3,10,0,3))
barplot(table(politics3$polintr)/nrow(politics3)*100, horiz = T, xlim = c(0,60), las = 2)

Now, by looking at the barplot, we can also see that three of the groups are of a comparable size and that the “Very interested” group is the smallest one.

Creating boxplot to check for outliers

ggplot()+
  geom_boxplot(data = politics3, aes(x = polintr, y = stflife), fill="pink", col="purple", alpha = 0.5) +
  ylim(c(0,10)) +
  xlab("How interested in politics") + 
  ylab("Level of Life satisfaction") +
  ggtitle("Life satisfaction by the level of interest in politics")

Conclusion: From the boxplot we can see that life satisfaction is not quite normally distributed, since in the “Quite interested” and “Hardly interested” groups the medians are far from the centre of the boxes. Hopefully, we can go with it. Also, there are several outliers. Moreover, in line with the descriptive table above, the median life satisfaction is 8 in the “Hardly interested” and “Very interested” groups and 7 in the other two groups.

Homogeneity of variances

The next step is to check the assumptions of the ANOVA test. First, let's look at the homogeneity of variances with the help of Levene's test.

leveneTest(politics3$stflife ~ politics3$polintr)

Conclusion: From the results of Levene's test it can be seen that the p-value is much higher than the significance level of 0.05. This means that there is no evidence that the variances differ across the groups. Therefore, we can assume homogeneity of variances in the different groups of political interest.
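
To make the idea behind Levene's test more concrete, here is a minimal sketch in base R. It assumes the median-centred variant (which, as far as we know, is the default of `car::leveneTest()`): the test is a one-way ANOVA on the absolute deviations of each observation from its group median. The data frame `toy` is simulated and purely hypothetical.

```r
set.seed(1)
# Hypothetical data: four groups with the same spread
toy <- data.frame(
  group = factor(rep(c("a", "b", "c", "d"), each = 50)),
  y     = rnorm(200, mean = 7, sd = 2)
)

# Absolute deviation of each observation from its group median
toy$abs_dev <- abs(toy$y - ave(toy$y, toy$group, FUN = median))

# One-way ANOVA on the absolute deviations
levene_tab <- anova(lm(abs_dev ~ group, data = toy))
levene_F <- levene_tab[["F value"]][1]
levene_p <- levene_tab[["Pr(>F)"]][1]
print(c(F = levene_F, p = levene_p))
```

A large p-value here means that the typical distance from the group median is similar in all groups, which is what homogeneity of variances requires.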

ANOVA test

Null hypothesis: the mean level of life satisfaction is the same across the groups with different levels of political interest.

oneway.test(politics3$stflife ~ politics3$polintr, var.equal = T)

## 
##  One-way analysis of means
## 
## data:  politics3$stflife and politics3$polintr
## F = 3.8028, num df = 3, denom df = 2745, p-value = 0.009808

aov.out <- aov(politics3$stflife ~ politics3$polintr)
summary(aov.out)

##                     Df Sum Sq Mean Sq F value  Pr(>F)   
## politics3$polintr    3     41  13.562   3.803 0.00981 **
## Residuals         2745   9790   3.566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: F(3, 2745) = 3.8028 and p-value < .01. Based on these numbers we should reject the null hypothesis. It means that the difference in the level of life satisfaction across the groups of political interest is statistically significant.
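
As a cross-check, the F statistic and its p-value can be recomputed from the mean squares printed in the `aov()` summary. The sketch below only uses numbers copied from that output; the eta squared at the end is an additional effect-size estimate that is not part of the original analysis and is based on rounded values.

```r
# Values copied from the aov() summary above
ms_between <- 13.562   # Mean Sq for polintr
ms_within  <- 3.566    # Mean Sq for residuals
df_between <- 3
df_within  <- 2745

f_value <- ms_between / ms_within
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

# Share of the total variation attributable to the grouping (eta squared)
ss_between <- ms_between * df_between
ss_within  <- ms_within * df_within
eta_sq <- ss_between / (ss_between + ss_within)

print(round(c(F = f_value, p = p_value, eta_sq = eta_sq), 4))
```

The recomputed F is about 3.80, in line with the output above. Eta squared is well below 0.01, so although the difference between groups is statistically significant, political interest accounts for only a very small share of the variation in life satisfaction.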

Normality of residuals

By plots

layout(matrix(1:4, 2, 2))
plot(aov.out)

Conclusion: We can see that on the upper two graphs the red line is pretty straight. The line on the Q-Q plot is not as straight. So, on the basis of these graphs, we can conclude that the distribution of residuals is not quite normal.

By skew and kurtosis

anova.res <- residuals(object = aov.out) 
describe(anova.res)

Conclusion: The absolute values of skew and kurtosis are below 2, so by this rule of thumb the distribution of residuals is approximately normal (!)
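
For readers who want to see what these two numbers measure, here is a minimal sketch of moment-based skewness and excess kurtosis in base R. The functions `skewness()` and `excess_kurtosis()` and the sample `x` are hypothetical; `psych::describe()` may apply a slightly different small-sample correction, so its values can differ a little.

```r
# Moment-based skewness and excess kurtosis (population formulas)
skewness <- function(x) {
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}
excess_kurtosis <- function(x) {
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Hypothetical left-skewed sample, loosely imitating a 0-10 satisfaction scale
x <- c(0, 3, 5, 6, 7, 7, 8, 8, 8, 9, 9, 10)
print(c(skew = skewness(x), kurtosis = excess_kurtosis(x)))
```

A negative skew, as in this toy sample, indicates a longer tail on the left side, which is the typical shape of satisfaction scores.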

By Shapiro test

shapiro.test(x = anova.res)

## 
##  Shapiro-Wilk normality test
## 
## data:  anova.res
## W = 0.93435, p-value < 2.2e-16

Conclusion: The p-value is extremely small, which indicates a non-normal distribution of residuals.

By histogram

hist(anova.res, main = "Distribution of residuals", xlab = "Residuals", col = "pink", border = "#BC6B97")

Conclusion: By looking at the histogram we can conclude that the residuals are not quite normally distributed, but rather skewed to the left.

Overall conclusion: All the tests except the skew and kurtosis analysis tell that the distribution of residuals is not normal. So, the assumption of the normality of residuals does not hold.

Post-hoc Tukey test

In the ANOVA test a significant p-value indicates that the means of some groups are different, though it doesn't show which pairs of groups these are. To find this out, a post hoc test can be conducted to determine whether the mean differences between specific pairs of groups are statistically significant.

As variances across groups are practically equal, we chose Tukey test for that.

par(mar = c(5, 15, 3, 1)) 
Tukey <- TukeyHSD(aov.out) 
plot(Tukey, las = 2, col = "red" )

Conclusion: The test results show that only the difference between the “Very interested” and “Not interested” groups is significant, since the confidence interval for the difference between the means of these two groups is the only one that does not cross the “0” line.

Non-parametric test (Kruskal-Wallis)

As could be seen from the boxplot, there are some outliers, and the residuals are not normally distributed. Therefore we want to double-check our results using a non-parametric test.

H0: Mean ranks of the groups are not different.

kruskal.test(politics3$stflife ~ politics3$polintr, data = politics3)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  politics3$stflife by politics3$polintr
## Kruskal-Wallis chi-squared = 12.764, df = 3, p-value = 0.005176

Conclusion: Based on KW chi-square (3) = 12.764 and p-value < .01, we reject the null hypothesis and assume that the mean ranks of the chosen groups are different. The test confirms the results of the ANOVA test.

Dunn’s test

Since the results of Kruskal-Wallis test are statistically significant, we now run Dunn’s test.

DunnTest(politics3$stflife ~ politics3$polintr, data = politics3)

## 
##  Dunn's test of multiple comparisons using rank sums : holm  
## 
##                                    mean.rank.diff   pval    
## Hardly interested-Not interested        51.233292 0.4132    
## Quite interested-Not interested         57.533139 0.3912    
## Very interested-Not interested         188.485145 0.0022 ** 
## Quite interested-Hardly interested       6.299848 0.8690    
## Very interested-Hardly interested      137.251854 0.0476 *  
## Very interested-Quite interested       130.952006 0.0476 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: The results of Dunn's test show that, besides the pair of people who are very interested in politics and people who are not interested in politics at all, there are two more pairs of groups whose differences in mean ranks are statistically significant. These are:

Very interested in politics and Hardly interested in politics
Very interested in politics and Quite interested in politics

However, the differences between these pairs are statistically significant at the 5% significance level (adjusted p = 0.0476), while the difference between the Very interested in politics and Not interested in politics groups is significant at the 1% level (adjusted p = 0.0022).

The remaining pairs of groups with different levels of political interest do not differ significantly.
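
The p-values in the Dunn's test output are Holm-adjusted, as its header indicates. A minimal sketch of what this adjustment does, using made-up unadjusted p-values (`raw_p` is hypothetical and is not the raw output of our test):

```r
# Hypothetical unadjusted p-values for six pairwise comparisons
raw_p <- c(0.001, 0.012, 0.020, 0.150, 0.300, 0.870)

# Holm: sort ascending, multiply the i-th smallest by (m - i + 1),
# then enforce monotonicity with a running maximum and cap at 1
m <- length(raw_p)
ord <- order(raw_p)
holm_manual <- numeric(m)
holm_manual[ord] <- pmin(1, cummax((m - seq_len(m) + 1) * raw_p[ord]))

# Built-in equivalent
holm_builtin <- p.adjust(raw_p, method = "holm")
print(round(holm_builtin, 4))
```

The smallest p-value is penalised most heavily (multiplied by the number of comparisons), which keeps the overall chance of a false positive under control across all six pairs.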

Total conclusion

So, answering our research question, we can argue that some groups of Irish people who are differently interested in politics have a different average level of life satisfaction. To be more precise, the following groups differ significantly:

Very interested in politics and Hardly interested in politics people
Very interested in politics and Quite interested in politics people

People who are very interested in politics and people who are not interested in politics at all show the most pronounced difference in life satisfaction, meaning that these two groups are the furthest apart.

People who are quite interested in politics and people who are hardly interested in politics do not differ significantly in life satisfaction. The same goes for these pairs of groups:

Quite interested in politics and Not interested in politics people
Hardly interested in politics and Not interested in politics people

After all these tests and analyses we can conclude that Irish people who are interested in politics to different extents indeed do not have the same level of life satisfaction. Moreover, our expectations are met: people with the highest political interest are the most satisfied with life.

However, what’s the reason for such a difference in life satisfaction among the groups who are very interested in politics and who are not interested in politics? The answer may be that people who are not satisfied with their lives do not even care about politics. They may be much more interested in and focused on their basic needs that they need to fulfill to become satisfied with life first. Also, there may be a moderator that causes such a difference.

Project 4 + 5. Linear Regression: the Ultimate Genealogy

Contribution

Anastasia Bakhareva: exploratory analysis, building linear regression models, checking linear regression assumptions
Iana Borisenko: analysis of background, interpretation of linear regression models, running and interpreting ANOVA for interactive model
Irina Kireeva: running ANOVA for additive models and their interpretation, building interaction graphs and their interpretation
Daria Kuzmicheva: creating hypotheses, interpretation of linear regression models, running ANOVA for additive models and their interpretation, writing conclusion

Analysis of Background

All our long work on the analysis of political aspects in Ireland originates from our very first simple analysis of this topic with the construction of colorful graphs. While doing that we noticed some interesting patterns that we wanted to study and analyze in more detail. After all the works presented by us, we approached the most intriguing topic: which of the selected variables in our large politics dataset predicts satisfaction with democracy best. To find out, we constructed several mathematical models and compared them. Now we are taking a new step: we are adding a moderator to our model. So, finally, we want to complete our long road by adding the analysis of linear regression with an interaction effect. Previously, we determined some variables for the linear regression and checked which model is the best. Now we are going even deeper in analysing this pattern.

Having explored the literature, we came across articles that told us the following:

Within the set of liberal democracies, the Nordic countries tend to have the highest trust rates (Ireland is not a Nordic country, but it is a Northern European liberal democracy), and the confidence of people in the government is of a general nature: a high level of trust in one institution tends to spread to other institutions, such as trust in parliament and overall satisfaction with democracy.
There was a citation: “The evidence suggests that trust in government is a poor indicator of the level of social trust in each country, its contribution to overall life satisfaction is at best indirect, and it is a poor indicator of quality of governance. Further research is recommended to clarify the value of trust in government and its relationship to other key policy objectives”. The author explored the relation between the quality of governance and the trust in government itself. It is interesting that the latter is a bad predictor of the first one. The author claims that the area needs further research, and here we are! We would like to introduce our new variable, categorical one!

Our research questions are as follows:

We would like to know which of these variables predicts satisfaction with democracy best. To find out, we are going to construct several mathematical models and compare them to come to a conclusion.
We would also like to explore whether the importance people attach to a strong government that ensures safety (we consider this variable as close to the quality of governance) has an effect on satisfaction with democracy as well.

The null hypothesis is that there is no relationship between the independent variables (trust in parliament, trust in politicians, importance of a strong government that ensures safety) and the dependent variable (satisfaction with democracy).

The alternative hypothesis is that there is a relationship between the independent variables (trust in parliament, trust in politicians, importance of a strong government that ensures safety) and the dependent variable (satisfaction with democracy).

Manipulating the Data

Our variables are:

Label4 <- c("`trstprl`", "`ipstrgv`", "`stfdem`", "`trstplt`" ) 
Meaning4 <- c("Trust to parliament", "Important that government is strong and ensures safety", "Satisfaction with democracy", "Trust to politicians")
Level_Of_Measurement4 <- c("Interval", "Ordinal", "Interval", "Interval")
Measurement4 <- c("0 - 10","Very much like me - Like me - Somewhat like me  - A little like me - Not like me - Not like me at all", "0 - 10", "0 - 10")
df4 <- data.frame(Label4, Meaning4, Level_Of_Measurement4, Measurement4, stringsAsFactors = FALSE)
kable(df4) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label4	Meaning4	Level_Of_Measurement4	Measurement4
`trstprl`	Trust to parliament	Interval	0 - 10
`ipstrgv`	Important that government is strong and ensures safety	Ordinal	Very much like me - Like me - Somewhat like me - A little like me - Not like me - Not like me at all
`stfdem`	Satisfaction with democracy	Interval	0 - 10
`trstplt`	Trust to politicians	Interval	0 - 10

es2 = ESS %>% 
  select(trstprl, stfdem, trstplt, ipstrgv)

es2$ipstrgv= as.factor(es2$ipstrgv)
es2$trstprl = as.numeric(as.character(es2$trstprl))
es2$trstplt = as.numeric(as.character(es2$trstplt))
es2$stfdem = as.numeric(as.character(es2$stfdem))

es2 = es2 %>% 
  filter(trstprl != 77) %>% 
  filter(trstprl != 88) %>% 
  filter(trstprl != 99)
es2 = es2 %>% 
  filter(ipstrgv != 7) %>%
  filter(ipstrgv != 8) %>%
  filter(ipstrgv != 9) 
es2 = es2 %>% 
  filter(stfdem != 77) %>%
  filter(stfdem != 88) %>% 
  filter(stfdem != 99) 
es2 = es2 %>% 
  filter(trstplt != 77) %>% 
  filter(trstplt != 88) %>% 
  filter(trstplt != 99) 
  

es2 <- es2[!is.na(es2$trstprl),]
es2 <- es2[!is.na(es2$ipstrgv),]
es2 <- es2[!is.na(es2$trstplt),]
es2 <- es2[!is.na(es2$stfdem),]

Exploring the data

So, first of all, we should have a glance at the characteristics of our dataset with the function summary.

summary(es2)

##     trstprl           stfdem          trstplt          ipstrgv   
##  Min.   : 0.000   Min.   : 0.000   Min.   : 0.000   2      :998  
##  1st Qu.: 3.000   1st Qu.: 4.000   1st Qu.: 2.000   1      :729  
##  Median : 5.000   Median : 6.000   Median : 4.000   3      :421  
##  Mean   : 4.538   Mean   : 5.423   Mean   : 3.775   4      :233  
##  3rd Qu.: 6.000   3rd Qu.: 7.000   3rd Qu.: 5.000   5      :125  
##  Max.   :10.000   Max.   :10.000   Max.   :10.000   6      : 26  
##                                                     (Other):  0

Seems legit. Now we need to understand the variables in our dataset graphically.

For that we will need to create:

Box plot, to show the continuous variables and spot outliers.
Density plot, to check if the distribution of our continuous variables is close to normal.
Barplot, to see if the categorical variable is representative.
Scatter plot, to visualize the linear relationship between the variables.

Using Boxplot to check for outliers

par(mfrow=c(1, 3))
boxplot(es2$trstprl, main="Trust in country's parliament", sub=paste("Outlier values:", paste(unique(boxplot.stats(es2$trstprl)$out), collapse = ", ")))
boxplot(es2$trstplt, main="Trust in politicians", sub=paste("Outlier values:", paste(unique(boxplot.stats(es2$trstplt)$out), collapse = ", ")))
boxplot(es2$stfdem, main="Satisfaction with democracy", sub=paste("Outlier values:", paste(unique(boxplot.stats(es2$stfdem)$out), collapse = ", ")))

It can be seen that there are virtually no outliers except for one point in trust in politicians: the value 10 (note that boxplot.stats()$out returns the outlying values themselves, not row numbers). Moreover, it can be seen that trust in politicians has the lowest median level of trust.
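
The single outlier in trust in politicians follows directly from the 1.5 × IQR rule that boxplots use. A minimal sketch with the quartiles copied from `summary(es2)` above; note that `boxplot()` works with hinges, which can differ slightly from these quartiles, so this is an approximation:

```r
# Quartiles copied from summary(es2) above
quartiles <- data.frame(
  variable = c("trstprl", "trstplt", "stfdem"),
  q1 = c(3, 2, 4),
  q3 = c(6, 5, 7)
)

# Tukey's rule: points beyond 1.5 * IQR from the box are drawn as outliers
iqr <- quartiles$q3 - quartiles$q1
quartiles$lower_fence <- quartiles$q1 - 1.5 * iqr
quartiles$upper_fence <- quartiles$q3 + 1.5 * iqr

# Does the scale maximum of 10 fall outside the upper fence?
quartiles$ten_is_outlier <- 10 > quartiles$upper_fence
print(quartiles)
```

Only for `trstplt` is the upper fence (9.5) below the scale maximum, so a score of 10 is flagged there and nowhere else. All lower fences are negative, so a score of 0 is never flagged.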

Using Histograms to check if continuous variables are close to normal

# The first plot is named parl rather than par to avoid masking graphics::par()
parl = ggplot(data = es2, aes(x = trstprl))  + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "orange") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in parliament")
dem = ggplot(data = es2, aes(x = stfdem)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "purple") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Satisfaction with democracy")
polit = ggplot(data = es2, aes(x = trstplt)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "grey") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in politicians")

plot_grid(parl, polit, dem)

As can be seen from the histograms, the distributions of trust in parliament and satisfaction with democracy are reasonably close to normal. As for trust in politicians, its distribution is not normal. However, we can surely work with that.

Using Barplots to check if categorical variable is representative

levels(es2$ipstrgv)[1] <- "Very important"
levels(es2$ipstrgv)[2] <- "Important"
levels(es2$ipstrgv)[3] <- "Quite important"
levels(es2$ipstrgv)[4] <- "Little important"
levels(es2$ipstrgv)[5] <- "Not really important"
levels(es2$ipstrgv)[6] <- "Not important at all"
es2$ipstrgv <- factor(es2$ipstrgv,ordered= F,exclude = NA)
ggplot(data = es2, aes(x = ipstrgv)) + geom_bar(aes(y = (..count..)/sum(..count..)), fill = "pink") +  scale_y_continuous(labels=scales::percent) + ylab("Relative frequencies") + ggtitle("important for government to be strong") + coord_flip()

We can see that the groups are not of a comparable size, but, surely, we can continue our work.
Most Irish respondents believe that the government needs to be strong and to provide safety.

Using Scatterplots to visualise the relationship

w = ggplot(data = es2, aes(x = trstprl, y = stfdem)) + geom_point() + geom_smooth(method = lm, fill="blue", color="blue", se = FALSE)  + ggtitle("Trust in parliament by satisfaction with democracy") + xlab("Trust in parliament") + ylab("Satisfaction with democracy")
w

e = ggplot(data = es2, aes(x = trstplt, y = stfdem)) + geom_point() + geom_smooth(method = lm, fill="blue", color="blue", se = FALSE)  + ggtitle("Trust in politicians by satisfaction with democracy") + xlab("Trust in politicians") + ylab("Satisfaction with democracy")
e

Our scatterplots show that:

there is a positive correlation between satisfaction with democracy and trust in parliament
there is a positive correlation between satisfaction with democracy and trust in politicians

4.1. Looking at correlation coefficients

Last but not least, we would like to look at how our continuous variables are related. For that let us have a look at this fine graph:

es3 = es2 %>% 
  select( - ipstrgv)
cor2 = cor(es3)
sjp.corr(es3, show.legend = TRUE)

From what we can see, all the relationships between our variables are pretty decent and positive in direction.
What is interesting is that the highest correlation coefficient is between trust in politicians and trust in parliament. The presented values confirm the picture on the scatterplots.
It is also worth noting that all correlations are highly significant (p < .001).

Conducting Linear Regression Models

Since we have seen the linear relationship visually in the scatterplots and confirmed it by computing the correlations, it is time to build the models.

Linear regression model with 1 predictor

First, we look at the model with one predictor. Here we want to see how satisfaction with democracy can be predicted by trust in parliament. We construct a table and look at what it means:

model1 = lm( stfdem ~ trstprl, data = es2)
sjPlot::tab_model(model1)

	stfdem
Predictors	Estimates	CI	p
(Intercept)	3.22	3.06 – 3.38	<0.001
trstprl	0.49	0.45 – 0.52	<0.001
Observations	2532
R² / adjusted R²	0.265 / 0.265

The p-value is significant here (p < .001).
Adjusted R-squared = 0.265, which means that 26.5% of the variation in satisfaction with democracy can be explained by our model.
And now we look at the estimates to be able to construct the equation that looks like this:

\[stfdem = 3.22 + 0.49 * trstprl \]

The intercept is equal to 3.22; it is the predicted value of satisfaction with democracy when trust in parliament is equal to 0.
With each one-point increase in trust in parliament, predicted satisfaction with democracy rises by 0.49.
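
The equation can be turned into concrete predictions. A minimal sketch using the rounded coefficients from the table above; `predict_stfdem()` is a hypothetical helper, and with the fitted object one would normally call `predict(model1, newdata = ...)` instead:

```r
# Coefficients copied from the model1 table above (rounded to two decimals)
intercept <- 3.22
slope     <- 0.49

predict_stfdem <- function(trstprl) intercept + slope * trstprl

trust_levels <- c(0, 5, 10)
predicted <- predict_stfdem(trust_levels)
print(data.frame(trstprl = trust_levels, predicted_stfdem = predicted))
```

So a respondent with no trust in parliament is predicted to score 3.22 on satisfaction with democracy, while a respondent with full trust is predicted to score 8.12.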

Linear regression model with 2 predictors

Now we add another predictor to our model. We add trust in politicians to see if the additional variable helps us to predict satisfaction with democracy better.

model2 = lm( stfdem ~ trstprl + trstplt , data = es2)
sjPlot::tab_model(model2)

	stfdem
Predictors	Estimates	CI	p
(Intercept)	3.06	2.89 – 3.22	<0.001
trstprl	0.37	0.33 – 0.41	<0.001
trstplt	0.19	0.15 – 0.22	<0.001
Observations	2532
R² / adjusted R²	0.290 / 0.290

The p-values are significant here (p < .001).
Adjusted R-squared = 0.29, which means that 29% of the variation in satisfaction with democracy can be explained by the model.
And now we look at the estimates to be able to construct the equation that looks like this:

\[stfdem = 3.06 + 0.37 * trstprl + 0.19 * trstplt \]

The intercept is equal to 3.06; it is the predicted value of satisfaction with democracy when trust in parliament and trust in politicians are both equal to 0.
With each one-point increase in trust in parliament, predicted satisfaction with democracy rises by 0.37, holding trust in politicians constant.
With each one-point increase in trust in politicians, predicted satisfaction with democracy rises by 0.19, holding trust in parliament constant.

Linear regression model with 3 predictors

Finally, we add a variable ipstrgv(important that government is strong and ensures safety) to our model

model3 = lm( stfdem ~ trstprl + trstplt + ipstrgv, data = es2)
sjPlot::tab_model(model3)

	stfdem
Predictors	Estimates	CI	p
(Intercept)	3.22	3.03 – 3.41	<0.001
trstprl	0.37	0.33 – 0.41	<0.001
trstplt	0.19	0.15 – 0.23	<0.001
Important	-0.24	-0.41 – -0.06	0.008
Quite important	-0.35	-0.57 – -0.13	0.002
Little important	-0.28	-0.55 – -0.01	0.043
Not really important	-0.07	-0.42 – 0.28	0.687
Not important at all	0.04	-0.67 – 0.76	0.908
Observations	2532
R² / adjusted R²	0.294 / 0.292

Now, here we have lots of interesting stuff.

The p-values are significant in most cases: both trust variables have p < .001, “Important” has p = 0.008 and “Quite important” has p = 0.002. Let us take a look at the categories with larger p-values. “Little important” has a p-value of 0.043, which is really close to the 0.05 level; “Not really important” and “Not important at all” have p-values much larger than 0.05, so these categories do not differ significantly from the reference category.
Adjusted R-squared = 0.292, which means that 29.2% of the variation in satisfaction with democracy can be explained by this model.
And now we look at the estimates to be able to construct the equation that looks like this:

\[stfdem = 3.22 + 0.37 * trstprl + 0.19 * trstplt - 0.24 * important - 0.35 * quite.important - 0.28 * little.important - 0.07 * Not.really.important + 0.04 * Not.important.at.all\]

The intercept is equal to 3.22; it is the predicted value of satisfaction with democracy when trust in parliament and trust in politicians are equal to 0 and the person believes that it is very important for the government to be strong and provide security (the reference category).
With each one-point increase in trust in parliament, satisfaction with democracy rises by 0.37.
With each one-point increase in trust in politicians, satisfaction with democracy rises by 0.19.
If a respondent considers a strong government important, satisfaction with democracy is 0.24 lower than in the “Very important” group.
If a respondent considers a strong government quite important, satisfaction with democracy is 0.35 lower than in the “Very important” group.
If a respondent considers a strong government only a little important, satisfaction with democracy is 0.28 lower than in the “Very important” group.
If a respondent considers a strong government not really important, satisfaction with democracy is 0.07 lower than in the “Very important” group (not significant).
If a respondent considers a strong government not important at all, satisfaction with democracy is 0.04 higher than in the “Very important” group (not significant).
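
The category estimates above are all comparisons with a reference level because `lm()` turns a factor into dummy variables and absorbs the first level into the intercept. A minimal sketch with a made-up factor (`importance` is hypothetical and has only three of the six levels, for brevity):

```r
# Hypothetical factor with the same ordering logic as ipstrgv
importance <- factor(
  c("Very important", "Important", "Quite important", "Important"),
  levels = c("Very important", "Important", "Quite important")
)

# Design matrix used by lm(): the first level is absorbed into the intercept
X <- model.matrix(~ importance)
print(X)
```

The first row ("Very important") has zeros in both dummy columns, so its prediction is the intercept alone; every other category adds its own coefficient to that baseline.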

Comparing Models

ANOVA helps us to compare nested models, that is, models that are identical except that one of them includes one or more additional variables that the other does not take into account.

anova(model1, model2)

As we can see here, the p-value is much less than 0.05, which means that the model with the lower RSS fits the data significantly better.
Thus, in this case, the model with 2 predictors is better.

anova(model2, model3)

As we can see here, the p-value is less than 0.05, which means that the model with the lower RSS fits the data significantly better.
Thus, in this case, the model with 3 predictors is better.
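
The comparison that `anova()` performs for two nested linear models is an F-test on the reduction in the residual sum of squares (RSS). A minimal sketch on simulated data; the variables `x1`, `x2`, `y` and the models `small` and `large` are hypothetical and unrelated to the ESS data:

```r
set.seed(42)
# Hypothetical data: y depends on x1 and, more weakly, on x2
n  <- 200
x1 <- rnorm(n)
x2 <- rnorm(n)
y  <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

small <- lm(y ~ x1)
large <- lm(y ~ x1 + x2)

# F statistic for nested models, computed from the residual sums of squares
rss_small <- sum(residuals(small)^2)
rss_large <- sum(residuals(large)^2)
df_small  <- df.residual(small)
df_large  <- df.residual(large)
f_manual  <- ((rss_small - rss_large) / (df_small - df_large)) / (rss_large / df_large)

# The same comparison with anova()
f_anova <- anova(small, large)[["F"]][2]
print(c(manual = f_manual, anova = f_anova))
```

The larger model always has an RSS that is at least as low; the F-test asks whether the drop in RSS is bigger than what the extra parameters would achieve by chance.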

Adding Interaction to the Model

We've added an interaction to the best model according to ANOVA (model3 with 3 predictors).

model4 = lm( stfdem ~ trstprl + trstplt + ipstrgv + trstprl * trstplt , data = es2)
sjPlot::tab_model(model4)

	stfdem
Predictors	Estimates	CI	p
(Intercept)	3.04	2.80 – 3.28	<0.001
trstprl	0.42	0.36 – 0.47	<0.001
trstplt	0.26	0.19 – 0.33	<0.001
Important	-0.24	-0.42 – -0.07	0.007
Quite important	-0.36	-0.58 – -0.14	0.001
Little important	-0.30	-0.57 – -0.03	0.029
Not really important	-0.07	-0.41 – 0.28	0.709
Not important at all	0.07	-0.65 – 0.79	0.847
trstprl:trstplt	-0.01	-0.03 – -0.00	0.015
Observations	2532
R² / adjusted R²	0.295 / 0.293

Now we have a huge table with lots of numbers, and they look scary! Let’s interpret it all step by step.

We added an interaction between the “Trust to parliament” and “Trust to politicians” variables.
Let us have a look at p-values. Hmm, they all seem quite significant, except for the “Not really important” and “Not important at all” categories.
Let us look at R-squared and adjusted R-squared. They are 0.295 and 0.293 respectively. And it is good for us, since our model explains about 30% of the variance of satisfaction with democracy!
What is more, this model is slightly better than the previous one, without the interaction. The values for R-squared and adjusted R-squared were 0.294 and 0.292, and now we have a 0.001 increase, yey!

Let us now have a look at the estimates and interpret the significant ones:

The intercept is equal to 3.04. If a person trusts neither politicians nor the parliament (both equal to 0) and thinks that it is very important for the government to be strong and provide security, the predicted satisfaction with democracy is 3.04.
The estimate for trust in parliament is 0.42. If a person does not trust politicians (trust equal to 0), satisfaction with democracy rises by 0.42 points with each one-point increase in trust in parliament.
The estimate for trust in politicians is 0.26. If a person does not trust the parliament (trust equal to 0), satisfaction with democracy rises by 0.26 points with each one-point increase in trust in politicians.
The estimate for important is -0.24. If a person thinks that it is important for the government to be strong and provide security, satisfaction with democracy is 0.24 lower than for a person who thinks it is very important, with both trust variables held constant.
The estimate for quite important is -0.36. If a person thinks that it is quite important for the government to be strong and provide security, satisfaction with democracy is 0.36 lower than in the “Very important” group.
The estimate for little important is -0.30. If a person thinks that it is little important for the government to be strong and provide security, satisfaction with democracy is 0.30 lower than in the “Very important” group.
The estimate for the interaction between trust in parliament and trust in politicians is -0.01. With each one-point increase in trust in politicians, the effect of trust in parliament on satisfaction with democracy becomes weaker by 0.01 (and vice versa).
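
The interaction term means that the slope of trust in parliament is no longer a single number: it depends on the level of trust in politicians. A minimal sketch using the rounded coefficients from the table above (`slope_trstprl()` is a hypothetical helper, and because the interaction estimate is rounded to -0.01 the results are approximate):

```r
# Coefficients copied from the model4 table above (rounded to two decimals)
b_trstprl     <- 0.42
b_interaction <- -0.01

# Slope of trust in parliament at a given level of trust in politicians
slope_trstprl <- function(trstplt) b_trstprl + b_interaction * trstplt

trust_in_politicians <- c(0, 5, 10)
slopes <- slope_trstprl(trust_in_politicians)
print(data.frame(trstplt = trust_in_politicians, slope_of_trstprl = slopes))
```

The slope falls from 0.42 for respondents with no trust in politicians to about 0.32 for respondents with full trust, so the effect of trust in parliament stays clearly positive over the whole range.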

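And here is an equation for this model, with the coefficients taken from the table above:

\[stfdem = 3.04 + 0.42 * trstprl + 0.26 * trstplt - 0.24 * important - 0.36 * quite.important - 0.30 * little.important - 0.07 * Not.really.important + 0.07 * Not.important.at.all - 0.01 * trstprl * trstplt\]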

anova(model3, model4)

As we can see here, the p-value is less than 0.05, which means that the model with the lower RSS fits the data significantly better.
Thus, in this case, the interaction model is the best one.

Creating Interaction Plot

Now let us construct the interaction plot to visualize our interaction somehow.

plot_model(model4, type = "int", terms = "trstplt", mdrt.values = "minmax")

From the plot we can conclude that the higher the trust in parliament, the higher the satisfaction with democracy. As a moderator here we take the trust in politicians variable, namely, its highest and lowest values. We can see that the lines do not cross, but come closer at the right end of the graph. This means we have an interaction effect, but it is not very strong.

plot_model(model4, type = "int", terms = "trstplt", mdrt.values = "quart")

Maybe it will be better if we pick the quartiles? Well, the shaded areas overlap. So, this plot also suggests that the interaction effect is weak, even though its coefficient is statistically significant in the model (p = 0.015).

Checking Linear Regression Assumptions

Linear regression makes several assumptions about the data, such as :

Linearity of the data
Normality of residuals
Homogeneity of residuals variance
Independence of residuals error terms

par(mfrow = c(2, 2))
plot(model4)

Linearity assumption: the Residuals vs. Fitted plot shows a roughly horizontal line without distinct patterns, which is surely a good thing (the relationship is approximately linear).
On the Q-Q plot the points follow the straight dashed line, which is a nice indicator of normally distributed residuals.
The Scale-Location and Residuals vs. Leverage plots show a red horizontal line with equally, though in a funny way, spread points. This is consistent with homoscedasticity.
On the Residuals vs. Leverage plot we can spot only a couple of outliers.

Conclusion

Based on our long and stressful analysis, after having modeled a mathematical function and checked its assumptions, we can make the following conclusions:

Trust in politicians and trust in parliament are related. Together they are the main elements of our model, since they have the most significant effect on satisfaction with democracy, which we already suspected from the last project.
After constructing an interaction plot for the relation between trust in parliament and satisfaction with democracy, where trust in politicians is the moderator, we can say that there is an interaction effect, but it is small.
After having checked the assumptions, we can conclude that they hold and our model is beautiful and describes the data in a good way.
We can also conclude that satisfaction with democracy is positively associated with both kinds of trust, while respondents in the “Important”, “Quite important” and “Little important” categories of the strong-government variable are less satisfied than those in the “Very important” reference category; that is why our alternative hypothesis is confirmed.

In very simple words: People are more satisfied with democracy if they trust the parliament and politicians. Compared with people for whom a strong government that provides security is very important, those who find it important, quite important or only a little important are slightly less satisfied with democracy.

The final formula is:

\[stfdem = 3.04 + 0.42 * trstprl + 0.26 * trstplt - 0.24 * important - 0.36 * quite.important - 0.30 * little.important - 0.07 * Not.really.important + 0.07 * Not.important.at.all - 0.01 * trstprl * trstplt\]

Using these variables and our model, one can estimate the satisfaction with democracy of a resident of Ireland, keeping in mind that the model explains about 29% of its variance.

References

van der Meer, Tom W. G. (2017) “Political Trust and the ‘Crisis of Democracy’.” Oxford Research Encyclopedia of Politics.
Killerby, P., & Council, H. C. (2005). “Trust me, I’m from the government”: the complex relationship between trust in government and quality of governance. Social Policy Journal of New Zealand, 25, 1.
Schneider I. (2017). Can We Trust Measures of Political Trust? Assessing Measurement Equivalence in Diverse Regime Types. Social indicators research, 133(3), 963-984.
Gail Pacheco, Thomas Lange, (2010) “Political participation and life satisfaction: a cross-European analysis”, International Journal of Social Economics, Vol. 37 Issue: 9, pp.686-702.

2BK - final project

2BK

25.05.2019
