Introduction

This small study was conducted by students from the Higher School of Economics during the introductory course of Data Analysis in Sociology for several month. Its main goal was to examine the difference between social & institutional forms of trust, the news consumption & Internet usage in Netherlands and use these variables for linear regression modelling.

The report presents all our work thus containing code chunks, graphs, descriptive tables, and interpretations. For a simplicity, it is organized in a sequence of 4 separate projects dedicated to the topics of politics and trust. So, with all respect and pleasure, we invite you to join us on this fascinating journey.

Libraries & Data

To start with, all libraries we used are presented below:

library(dplyr) 
library(kableExtra) 
library(foreign)
library(ggplot2)
library(mudata2)
library(corrplot)
library(sjPlot)
library(psych)
library(car)
library(effsize)
library(magrittr)
library(knitr)
library(scales)
library(sjstats) 
library(rcompanion)
library(DescTools) 
library(tidyverse) 
library(ggplot2)
library(cowplot)
library(gridExtra)
library(pwr)

The data for our purposes was taken from the European Social Survey (ESS) - a large ongoing survey conducted since 2001 in more than 30 countries. Particularly, we used the set of Round 9, Netherlands, and the subsets called “Media and Social Trust” and “Politics” mainly.

data <- read.spss("ESS9e01_2.sav",use.value.labels=T, to.data.frame=T)
NL <- data %>% 
  filter(cntry == "Netherlands") %>% 
  select(gndr, agea, nwspol, netustm, ppltrst, pplfair, trstplc, trstprl, trstlgl, sgnptit, netusoft, psppsgva, stfgov) %>%  
  filter(sgnptit != 7 | sgnptit != 8| sgnptit != 9 )

Project 1: looking through the data

Variables’ description

To understand the particular variables we are interested in, we came up with 2 descriptive tables. The first of them demonstrates the variables’ names, decodes the meanings (as the original names are constructed from several words and are usually obscure) and derive the levels of measurement. As some variables may have different scale type according to the context, this notion brings the clearness and introduce how we treat them. To briefly note, there are categorical and continuous variables. Another table is a logical extension of the previous one: originally presented to the audience in an old article (Stevens, 1946), the scale classification was accompanied by the central tendency measures specific to each scale. So, there are values of mean, median, and mode for the relevant ratio & ordinal variables.

NL1<- NL%>%
na.omit(NL)

Label = c("gndr", "agea", "netusoft", "netustm", "nwspol", "sgnptit", "ppltrst", "pplfair", "trstplc", "trstprl", "trstlgl")
Meaning = c("Gender", "Age", "Frequency of Internet usage", "Daily duraton of Internet usage", "Daily duration of news consumption", "Petition signing", "The degree of social trust", "The degree of believe in individual fair", "The degree of trust in the police", "The degree of trust in the parliament", "The degree of trust in the court")
Level_of_Measurement = c("Ratio", "Ordininal", "Ordinal", "Ratio", "Ratio", "Nominal", "Interval", "Interval", "Interval", "Interval", "Interval")
data.frame(Label, Meaning, Level_of_Measurement, stringsAsFactors = FALSE) %>% kable() %>% kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
Label Meaning Level_of_Measurement
gndr Gender Ratio
agea Age Ordininal
netusoft Frequency of Internet usage Ordinal
netustm Daily duraton of Internet usage Ratio
nwspol Daily duration of news consumption Ratio
sgnptit Petition signing Nominal
ppltrst The degree of social trust Interval
pplfair The degree of believe in individual fair Interval
trstplc The degree of trust in the police Interval
trstprl The degree of trust in the parliament Interval
trstlgl The degree of trust in the court Interval
mode <- function(x) { 
  ux <- unique(x) 
  ux[which.max(tabulate(match(x, ux)))] 
} 

NL1$agea = as.numeric(as.character(NL1$agea)) 
vector_agea <- c(mean(NL1$agea), median(NL1$agea), mode(NL1$agea)) 
names(vector_agea) <- c("Mean", "Median", "Mode")

NL1$netustm = as.numeric(as.character(NL1$netustm)) 
vector_netustm <- c(mean(NL1$netustm), median(NL1$netustm), mode(NL1$netustm)) 
names(vector_netustm) <- c("Mean", "Median", "Mode")

NL1$nwspol = as.numeric(as.character(NL1$nwspol)) 
vector_nwspol <- c(mean(NL1$nwspol), median(NL1$nwspol), mode(NL1$nwspol)) 
names(vector_nwspol) <- c("Mean", "Median", "Mode")

NL1$ppltrst = as.numeric(NL1$ppltrst) 
vector_ppltrst <- c(mean(NL1$ppltrst), median(NL1$ppltrst), mode(NL1$ppltrst)) 
names(vector_ppltrst) <- c("Mean", "Median", "Mode")

NL1$pplfair = as.numeric(NL1$pplfair)
vector_fair <- c(mean(NL1$pplfair), median(NL1$pplfair), mode(NL1$pplfair))
names <- c("Mean", "Median", "Mode")

NL1$trstplc = as.numeric(NL1$trstplc) 
vector_trstplc <- c(mean(NL1$trstplc), median(NL1$trstplc), mode(NL1$trstplc)) 
names(vector_trstplc) <- c("Mean", "Median", "Mode")

NL1$trstprl = as.numeric(NL1$trstprl) 
vector_trstprl <- c(mean(NL1$trstprl), median(NL1$trstprl), mode(NL1$trstprl)) 
names(vector_trstprl) <- c("Median", "Mode")

NL1$trstlgl = as.numeric(NL1$trstlgl) 
vector_trstlgl <- c(mean(NL1$trstlgl), median(NL1$trstlgl), mode(NL1$trstlgl)) 
names(vector_trstlgl) <- c("Median", "Mode")

data.frame(vector_agea, vector_netustm, vector_nwspol, vector_ppltrst, vector_fair, vector_trstplc, vector_trstprl, vector_trstlgl, stringsAsFactors = TRUE) %>%  knitr::kable() %>%  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE) 
vector_agea vector_netustm vector_nwspol vector_ppltrst vector_fair vector_trstplc vector_trstprl vector_trstlgl
Mean 46.17999 229.4874 66.87689 7.310295 7.75234 8.084953 7.038157 7.763139
Median 48.00000 180.0000 60.00000 8.000000 8.00000 8.000000 7.000000 8.000000
Mode 52.00000 120.0000 60.00000 8.000000 8.00000 9.000000 8.000000 8.000000

Graphs based on the Central Tendency Measures

Next, we are going to look at 3 descriptive charts for some of the variables using their central tendency measures and distributions before further usage. Let’s start with the histogram on daily Internet usage. This variable is definitely not about any politics but we found it interesting to study in relation to trust or other concepts. The research question we were asked is (1) “how much time people spend on the Internet per day in Netherlands?”.

As the distribution is skewed to the right, the conclusion we have is that most of the people in Netherlands usually spend no more than 1.5 hours on the Internet per day. Nevertheless, the distribution tail remains relatively high straight to 2.5 hours.

Going further, we want to compare the distributions of the institutional and social forms of trust. The institutional trust is basically the trust to any formalized institution - in our case, the list of them includes the court, the police, and the parliament. For comparison through the distribution, we have chosen the police as the one with the biggest values of Central Tendency Measures. The social trust, in its turn, is the degree to which an individual feels comfortable with other citizens. The question we may be curious about in this case is (2) “Do people trust in the police more than to each other?”.

ggplot(NL1, aes(x = trstplc), binwidth = 15, color = "black", alpha = 0.3) + 
  geom_bar() + 
  ggtitle("Distribution of the police trust") + 
  xlab("Police trust") + ylab("Number of people") + 
  theme_classic() + xlim(c(0, 10)) + 
  geom_vline(aes(xintercept = 8, color = "median")) + 
  geom_vline(aes(xintercept = 8.085855, color = "mean"))

ggplot(NL1, aes(x = ppltrst), binwidth = 15, color = "black", alpha = 0.3) + 
  geom_bar() + ggtitle("Distribution of the social trust") + 
  xlab("Social trust") + 
  ylab("Number of people") + 
  theme_classic() + 
  xlim(c(0, 10)) + 
  geom_vline(aes(xintercept = 8, color = "median")) + 
  geom_vline(aes(xintercept = 7.284307, color = "mean"))

The reason why we built these barplots is that the interplay between the mentioned categories of trust says a lot about the attitude to the government structure on the one hand and attitude to the nationals on the other. In simpler words, it is about the powers of the state and civil society. Surprisingly, the trust to the police in Netherlands is higher than the social trust. Well, the policemen in Netherlands are really respected and probably do their job properly.

Exploratory graphs

In the final section of this subproject, we present 5 hypotheses and 5 respective charts.

To start analyzing the political life in Netherlands, we will look at the petition signing as the political action. We assume that the Internet (partly described before) affects it, and our question (3) sounds like “Is there a correlation between frequency of Internet use and signing petitions?”.

NL %>%
  filter(!is.na(sgnptit)) %>% 
  ggplot(aes(netusoft, fill = sgnptit)) + 
  geom_bar(position = "fill") +
  labs(title ="Participation in Signing Petitions due to Frequency of Internet Usage", x= "Frequency of Internet Usage", y = "Share of Population")+
  coord_flip()+
  guides(fill=guide_legend("Signed Petition\n Last 12 Month"))+
  theme_classic()

To find an answer, we came up with the stacked barplot based on the relevant variables (netusoft and sgnptit). Overall, people in every category are likely not to sign petitions, however, people who use the Internet every day and most days signed petitions in the last 12 months more often than those who use the Internet less frequently.

Next, we decided to include age into the analysis and extend our perception of Internet usage in Netherlands. The question (4) is, “How much time different age groups spend on the Internet in a day?”. To create this chart, we first split the age groups into 4 categories.

NL$agea = as.numeric(as.character(NL$agea))
NL$agea1[NL$agea <= 25] <- "young (up to 25 y.o.)" 
NL$agea1[NL$agea > 25 & NL$agea <= 45] <- "adult (26-45 y.o.)" 
NL$agea1[NL$agea > 45 & NL$agea <= 65] <- "old adult (46-65 y.o.)" 
NL$agea1[NL$agea > 65] <- "old (more than 65 y.o)" 
NL$netustm = as.numeric(as.character(NL$netustm))
NL$agea1 <- factor(NL$agea1 , levels=c("young (up to 25 y.o.)", "adult (26-45 y.o.)", "old adult (46-65 y.o.)", "old (more than 65 y.o)"))

NL%>%  filter(!is.na(agea1)) %>% 
  ggplot() + geom_boxplot(aes(agea1, netustm), fill="#E667AF", col="#85004B", alpha = 0.5) + xlab("") +
  ylab("Time spent on the internet in minutes per day") +
  ggtitle("Time spent on the Internet by age groups") +
  theme_classic()

The boxplot shows that with the age growth the time spent on the Internet decreases. The range of minutes spent on the internet per day is bigger for younger people and lower for older people. However, there are outliers among the “old” age group, who spend more time on the Internet than the rest of the sample.

Developing an understanding of the age groups, we take a look at their media consumption and the time spent on the Internet. The research question (5) looks similar: “What are the relationships between Internet usage and political media consumption by different age groups?”.

NL$netustm = as.numeric(as.character(NL$netustm))
NL$nwspol = as.numeric(as.character(NL$nwspol))
NL %>% filter(!is.na(nwspol) & !(is.na(netustm)) & !(is.na(agea1))) %>% ggplot(aes(nwspol, netustm, color = agea1)) +
geom_point() +
geom_smooth( method = "lm")+
scale_x_continuous( limits=c(0, 600)) +
labs(title = "Time spent on Internet & Media",
subtitle = "in minutes per day", x = "Political media consumption per day", y = "Internet usage per day")+
theme_classic() +
scale_color_discrete(name = "Age group") 

In order to visualize these relationships, we used the scatterplot and added aged trend lines. Let’s consider the groups separately. Generally, people from the “old” age group consume political media more often than other age groups while their Internet usage is the lowest. The “young” age group uses the Internet more frequently than others, while its political media consumption is the lowest. The results from “adult” and “old adult” age groups are distributed on a graph without any visible trend. So, we can say that young people use the Internet mostly not for political news, while old people consume political news outside the Internet.

Another human attribute we study with regard to Internet is gender. So, (6) “how much time males and females spent on the internet?”.

NL$gndr <- relevel(NL$gndr, ref = "Female")
ggplot(NL, aes(x=as.numeric(netustm), color=gndr, fill=gndr)) +
  geom_histogram(aes(y=..density..), position = "identity", alpha = 0.5, binwidth = 3) +
  labs(title ="Time Spent on the Internet", x= "Time Spent in Minutes per Day", y = "Destiny") +
  guides(fill=guide_legend("Gender"), color = guide_legend("Gender"))+
  theme_classic()

In fact, the respective histogram is not really illustrative. The majority of both females and males spend approximately one hour on Internet per day.

The last chart we prepared for this work is a barplot that illustrates the personal and institutional forms of trust. We already mentioned this topic briefly but here we want to compare the social trust not only to the police one. Thus, the question is (7) “How much do people trust in political and social institutes?”. To note, we use 2 distinct variables for the social trust.

sorts_of_trust <- c("social", "social fair", "police", "parliament", "court") 
means <- c(mean(NL1$ppltrst), mean(NL1$pplfair), mean(NL1$trstplc), mean(NL1$trstprl), mean(NL1$trstlgl)) 
kind_of_trust <- c("personal", "personal", "institutional", "institutional", "institutional") 
data.frame(sorts_of_trust, means, kind_of_trust) %>% ggplot() + geom_bar(aes(sorts_of_trust, means, fill = kind_of_trust), stat = "identity") + labs(title = "Social trust & trust in institutions", subtitle = "decimal scale", x = "", y = "Mean value") + theme_classic() + scale_fill_discrete(name = "Kind of trust")

The conclusion is the following: there is no significant difference between institutional and personal trust in the Netherlands. The mean value of all kinds of trust is from 7 to 8. Although, we may see the hierarchy of the institutional trust: the police is the most trusted, court possesses a bit less, and the trust to the parliament is weak.

These graphs told us a rich story! Hope, you are still here.

Project 2: Tests!

In this part, we focused on more detailed questions dedicated to our topic. In fact, our goal was to get to know where and how to apply the statistical tests in practice, so let’s take a look at our findings.

Chi-square

The trust in the court as a branch of power is essential. People tend to ask for their help only when there are no other ways as this institution controls the others. On the other side, the petition signing as a political action might be used against the court’s verdict or when the judges do not want to consider a case. The relation between the institution and this expression of political will was thus chosen to be examined.

Originally in ESS, it was measured from 0 to 10 where 0 is “not trust at all” and 10 is “Complete trust” what is not really convenient. So, we recoded this variable using 3 categories: “small” for points 0-3, “middle” for the points 4-7, and “high”: 8-10.

NL$trstlgl <- as.numeric(as.character(NL$trstlgl)) 

NL$trstlgl1[NL$trstlgl <= 3 & NL$trstlgl >= 0] <- "Small" 
NL$trstlgl1[NL$trstlgl > 3 & NL$trstlgl <= 7] <- "Medium" 
NL$trstlgl1[NL$trstlgl > 7 & NL$trstlgl <= 10] <- "High"

NL_legal_pet <- NL %>%
  select(trstlgl1, sgnptit) %>%
  na.omit(NL)

The contingency table presented below shows that we are free to perform the chi-squared test: the number of observations for each cell is bigger than 5.

NL_legal_pet$sgnptit <- factor(NL_legal_pet$sgnptit, labels = c("Not signed", "Signed"), ordered = F, exclude = NA)
conttable_legal = table(NL_legal_pet$trstlgl1, NL_legal_pet$sgnptit) 
kable(conttable_legal) %>% kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
Not signed Signed
High 180 394
Medium 238 670
Small 27 91

For further exploration, we used a stacked barplot format. It may already present the existing relationship: The higher level of trust is surrounded by a higher level of petition participation. It contains values of share for each “group of trust” and some statistical information.

plot_xtab(NL_legal_pet$trstlgl1, NL_legal_pet$sgnptit, type = "bar", margin ="row",
          bar.pos = "stack", title = "Trust to court & petitions", title.wtd.suffix = NULL,
          axis.titles = c("", ""), axis.labels = NULL, legend.title = "Petition",
          legend.labels = c("Signed", "Not signed"), weight.by = NULL, rev.order = FALSE,
          show.values = TRUE, show.n = TRUE, show.prc = TRUE, show.total = TRUE,
          show.legend = TRUE, show.summary = TRUE, summary.pos = "r", drop.empty = TRUE,
          string.total = "Total", wrap.title = 50, wrap.labels = 4,
          wrap.legend.title = 20, wrap.legend.labels = 10, geom.size = 0.7,
          geom.spacing = 0.1,
          smooth.lines = TRUE, grid.breaks = 0.2, expand.grid = FALSE,
          ylim = NULL, y.offset = 1,
          coord.flip = TRUE) + theme_classic()

Let’s now conduct a chi-squared test. The null hypothesis is follow: there is no relation between the degree of trust to the court and petition participation. An alternative hypothesis - there is a relation between the degree of trust to the court and petition participation.

Due to a small p-value (0.04541 < 0.05), the null hypothesis should be rejected: so,we may conclude that there is a relation among analysed variables.

Another plots look at the residuals. The corrplot shows the positive (blue) and negative (red) associations. Assocplot also proofs our previous results: the red rectangles are for negative association, and the black ones are for positive.

kable(chi_lgl$stdres)
Not signed Signed
High 2.367975 -2.367975
Medium -1.637204 1.637204
Small -1.242149 1.242149
corrplot(chi_lgl$stdres, is.cor = FALSE)

assocplot(t(conttable_legal), main = "Residuals")

To conclude, there is a relationship between our variables but their nature should be further explained: the people who trust the courts more have a higher chance to not sign the petitions (than if we observed the independent variables). The reason for this interplay might be that “trusters” do not really interested in petitions and are generally fine about the actual political layout.

T-test

In this section of the analysis, we consider the relation between a nwspol (the time a person spends on consuming news about politics and current affairs in minutes) and gender variables. The first one is the ratio, while the second is the nominal one.

The null hypothesis: there is no relation between gender and news consumption. An alternative hypothesis is males on average spend more time-consuming political news than females. It can be explained by exposed gender roles when women look more for internal family life and men are more open to the outside world affairs.

To start with, we need to upload the data, select variables, and filter NAs:

NL_t <- NL %>% 
  select(gndr, nwspol)

NL_t$nwspol= as.numeric(as.character(NL_t$nwspol))
NL_t = NL_t %>%
  filter(!is.na(nwspol))

Than we visualized the variables related to the t-test through the boxplot:

ggplot(NL_t, aes(x = gndr, y = nwspol))+ 
  geom_boxplot() +
  stat_summary(fun = mean, geom = "point", shape = 4, size = 4) +
  xlab("Gender ") + 
  ylab("Time spent on consuming political news in minutes") +
  ggtitle("Time Spent on consuming news by gender") +
  theme_classic() +
  ylim(0,500)

Another step is to check the normality assumptions. We performed this procedure in 2 ways: (1) by using skews and kutrosis and (2) by building a histogram.

describeBy(NL_t, NL_t$gndr)
## 
##  Descriptive statistics by group 
## group: Female
##        vars   n  mean    sd median trimmed   mad min max range skew kurtosis
## gndr*     1 837  1.00  0.00      1     1.0  0.00   1   1     0  NaN      NaN
## nwspol    2 837 63.63 71.94     55    52.2 37.06   0 915   915 4.63    37.25
##          se
## gndr*  0.00
## nwspol 2.49
## ------------------------------------------------------------ 
## group: Male
##        vars   n  mean    sd median trimmed   mad min max range skew kurtosis
## gndr*     1 829  2.00  0.00      2    2.00  0.00   2   2     0  NaN      NaN
## nwspol    2 829 74.68 70.02     60   64.75 44.48   0 930   930 4.23    35.34
##          se
## gndr*  0.00
## nwspol 2.43

The values for asymmetry and kurtosis between -2 and +2 are considered acceptable in order to prove normal univariate distribution (George, 2011). Coming from this assumption, we can conclude that this distribution cannot be accepted to be close to normal.

NL_t$gndr <- relevel(NL_t$gndr, ref = "Female")
ggplot(NL_t, aes(x = nwspol, color = gndr, fill = gndr)) +
  geom_histogram(aes(y=..density..), position = "identity", alpha = 0.5, binwidth = 20) +
  labs(title = "Consumption of political news by gender",x = "Consumption of political news in minutes per week", y = "Density") +
  guides(fill=guide_legend("gender"), color = guide_legend("gender"))+
  theme_classic()

In both groups, there are more than 300 observations (females ~ 840 and males ~ 830), so according to the CLT, we can care less about the normality of distributions for both groups. From the histogram we can see that on average males have more time spending on consuming news. Both distributions are not normal and have outliers (skewed to the right).

Finally, let’s conduct a t-test:

options(scipen = 99) 
leveneTest(NL_t$nwspol~ NL_t$gndr)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    1  0.0115 0.9146
##       1664
bartlett.test(NL_t$nwspol~ NL_t$gndr)
## 
##  Bartlett test of homogeneity of variances
## 
## data:  NL_t$nwspol by NL_t$gndr
## Bartlett's K-squared = 0.61209, df = 1, p-value = 0.434
t.test(NL_t$nwspol~ NL_t$gndr, var.equal = T)
## 
##  Two Sample t-test
## 
## data:  NL_t$nwspol by NL_t$gndr
## t = -3.1758, df = 1664, p-value = 0.001522
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -17.86996  -4.22429
## sample estimates:
## mean in group Female   mean in group Male 
##             63.63321             74.68034

Both tests for homogeneity of variances show that the data have a very high probability to occur if the null hypothesis is true. Thus, the variances are really close to be equal.

The conclusions are: 1. on average, males spend more time-consuming news for 74.68034 min/week, then females who spend 63.63321 min/week. 2.The t-statistic t(1664) = - 3.1758 (p-value ~ 0.001), which means that the observed difference in means is statistically significant across the two groups(higher for males).

For sure, we conducted a double-check with non-parametric test:

wilcox.test(nwspol ~ gndr, data = NL_t)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  nwspol by gndr
## W = 292743, p-value = 0.0000000238
## alternative hypothesis: true location shift is not equal to 0

Wilcoxon rank sum test equals 292743 at p-value < 0.001 which means that males and females spend different time consuming political news and the difference is statistically significant.

And an effect size:

cohen.d(NL_t$nwspol ~ NL_t$gndr)
## 
## Cohen's d
## 
## d estimate: -0.155613 (negligible)
## 95 percent confidence interval:
##       lower       upper 
## -0.25186685 -0.05935925

With a Cohen’s d of approximately 0.16, 56.2% of the “treatment” group(males) will be above the mean of the “control” group (females), 93.8% of the two groups will overlap, there is a 54.4% chance that a person picked at random from the treatment group will have a higher score than a person picked at random from the control group.

ANOVA

For the last test in this section we chose a relations between the age and the Internet usage. We already examined this relations in the first project and here we are going to deepen our ideas.

Data transformation:

NL$netustm <- as.numeric(as.character(NL$netustm))

To start, we will consider the plot from the previous project again. It clearly demonstrates that With growing age the time spent on the Internet decreases. The range of minutes spent on the internet per day is bigger for younger people and lower for older people. However, there outliers among “old” age group, who spend more time on the Interent than the rest of the sample.

NL$netustm = as.numeric(as.character(NL$netustm))
NL$agea1 <- factor(NL$agea1 , levels=c("young (up to 25 y.o.)", "adult (26-45 y.o.)", "old adult (46-65 y.o.)", "old (more than 65 y.o)"))

NL%>%  filter(!is.na(agea1)) %>% 
  ggplot() + geom_boxplot(aes(agea1, netustm), fill="#E667AF", col="#85004B", alpha = 0.5)+
  xlab("") + 
  ylab("Time spent on the internet in minutes per day") +
  ggtitle("Time spent on the Internet by age groups") +
  theme_classic()

NExt, we need to check the homogenity of variances:

leveneTest(NL$netustm ~ NL$agea1)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value              Pr(>F)    
## group    3  22.404 0.00000000000003505 ***
##       1455                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

P-value is very small, therefore we can assume that the variances among different age groups are not homogenous. Now it is time for ANOVA test:

oneway.test(NL$netustm ~ NL$agea1, var.equal = F)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  NL$netustm and NL$agea1
## F = 144.7, num df = 3.00, denom df = 678.52, p-value <
## 0.00000000000000022
aov.out <- aov(NL$netustm ~ NL$agea1)
summary(aov.out)
##               Df  Sum Sq Mean Sq F value              Pr(>F)    
## NL$agea1       3  571766  190589   104.7 <0.0000000000000002 ***
## Residuals   1455 2647986    1820                                
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 210 observations deleted due to missingness

For one-way analysis of means p-value< 0.001, so we can say, that the differences of means of time spending on the Interent per day across different age groups is statistically significant.

Cheking normality of residuals shows that skew is < 2 but kurtosis is > 2, while the p-value in shapiro test is > 0.05:

plot(aov.out, 2) 

layout(matrix(1:4, 2, 2)) 
plot(aov.out) 

anova.res <- residuals(object = aov.out) 
describe(anova.res) 
##    vars    n mean    sd median trimmed   mad    min    max  range skew kurtosis
## X1    1 1459    0 42.62  -4.96   -2.47 47.44 -90.29 147.04 237.33 0.49    -0.44
##      se
## X1 1.12
shapiro.test(x = anova.res) 
## 
##  Shapiro-Wilk normality test
## 
## data:  anova.res
## W = 0.96881, p-value < 0.00000000000000022

A histogram below shows inperfectly normal distribution. Shapiro test tells that residuals are not normal. However, skew is right and kurtosis is not much bigger than 2. As residuals are not normal, non-parametric test should be used.

hist(anova.res)

kruskal.test(netustm ~ agea1, data = NL) 
## 
##  Kruskal-Wallis rank sum test
## 
## data:  netustm by agea1
## Kruskal-Wallis chi-squared = 277.79, df = 3, p-value <
## 0.00000000000000022

p-value is less than 0.001, thus mean ranks of age groups are not the same. This result confirms basic ANOVA test. Results from non-parametric test were Statistically significant, so the post hoc test should be carried out:

NL$netustm2 <- as.factor(NL$netustm) 
NL$agea12 <- as.factor(NL$agea1) 
DunnTest(netustm2 ~ agea12, data = NL) 
## 
##  Dunn's test of multiple comparisons using rank sums : holm  
## 
##                                               mean.rank.diff
## adult (26-45 y.o.)-young (up to 25 y.o.)           -208.4979
## old adult (46-65 y.o.)-young (up to 25 y.o.)       -370.6157
## old (more than 65 y.o)-young (up to 25 y.o.)       -602.9567
## old adult (46-65 y.o.)-adult (26-45 y.o.)          -162.1177
## old (more than 65 y.o)-adult (26-45 y.o.)          -394.4588
## old (more than 65 y.o)-old adult (46-65 y.o.)      -232.3411
##                                                               pval    
## adult (26-45 y.o.)-young (up to 25 y.o.)           0.0000000008859 ***
## old adult (46-65 y.o.)-young (up to 25 y.o.)  < 0.0000000000000002 ***
## old (more than 65 y.o)-young (up to 25 y.o.)  < 0.0000000000000002 ***
## old adult (46-65 y.o.)-adult (26-45 y.o.)          0.0000000014905 ***
## old (more than 65 y.o)-adult (26-45 y.o.)     < 0.0000000000000002 ***
## old (more than 65 y.o)-old adult (46-65 y.o.)      0.0000000000093 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As F-ration was significant, we can also conduct pairwise comparison.

NL$netustm2 <- as.vector(NL$netustm) 
pairwise.t.test(NL$netustm2, NL$agea12, 
                adjust = "bonferroni")
## 
##  Pairwise comparisons using t tests with pooled SD 
## 
## data:  NL$netustm2 and NL$agea12 
## 
##                        young (up to 25 y.o.) adult (26-45 y.o.)  
## adult (26-45 y.o.)     0.0000000000942       -                   
## old adult (46-65 y.o.) < 0.0000000000000002  0.0000000000942     
## old (more than 65 y.o) < 0.0000000000000002  < 0.0000000000000002
##                        old adult (46-65 y.o.)
## adult (26-45 y.o.)     -                     
## old adult (46-65 y.o.) -                     
## old (more than 65 y.o) 0.0000000000037       
## 
## P value adjustment method: holm

All p-values are less than 0.05 for both tests, thus all pairs show statistically significant differences. And finally, let’s examine the effect size:

epsilonSquared(x = NL$netustm, g = NL$agea1)
## epsilon.squared 
##           0.167

Epsilon squared showed large effect size (0.167 > 0.14), which means that the difference is valuable.

anova_stats(aov.out) 
##        term   df     sumsq     meansq statistic p.value etasq partial.etasq
## 1  NL$agea1    3  571766.3 190588.766   104.724       0 0.178         0.178
## 2 Residuals 1455 2647986.4   1819.922        NA      NA    NA            NA
##   omegasq partial.omegasq epsilonsq cohens.f power
## 1   0.176           0.176     0.176    0.465     1
## 2      NA              NA        NA       NA    NA

As F-ration was statistically significant we can double-check effect size with omega-square. The result is 0.176 > 0.15, which also confirms the large effect size.

Project 3+4: linear models

The variables were chosen according to our topic. Our outcome variable is the satisfaction with the government, and as predictors, we have chosen the forms of institutional trust (police, parliament, and court, remember?). So to say, we check if satisfaction with government correlates with three branches of government. As a categorical variable, we have chosen a subjective indicator of freedom of speech on governmental matters.

Background (Literature)

While exploring the literature on the topic, we found some facts which we wanted to test ourselves: 1) Citizens trust political elites if they are satisfied with the current government. If not, elites receive cynicism (Citrin, 1974); 2) One of the aspects of good governance is freedom of expression (Ott, 2011). So, we wanted to test that in the Netherlands!

Our research questions are: (1) Is there any correlation between satisfaction with government and three branches of government? (2) If yes, which predictor correlates the most?

We hypothesize, that there will be correlations, and they will be from moderate to high as well as positive. We will then add an interaction effect to the model, to see if it is better than the model, without the interaction.

Manipulations with the data

NL3 = NL %>% filter(trstprl != 77 | trstprl != 88 | trstprl != 99| stfgov != 77 | stfgov != 88 | stfgov != 99| trstlgl != 77 | trstlgl != 88 | trstlgl != 99 | trstplc != 77 | trstplc != 88 | trstplc != 99 )
NL3$trstprl <- as.character(NL3$trstprl)
NL3$trstlgl <- as.character(NL3$trstlgl)
NL3$trstplc <- as.character(NL3$trstplc)
NL3$stfgov <- as.character(NL3$stfgov)

NL3$trstprl[NL3$trstprl == "No trust at all"] <- "0"
NL3$trstprl[NL3$trstprl == "Complete trust"] <- "10"

NL3$trstlgl[NL3$trstlgl == "No trust at all"] <- "0"
NL3$trstlgl[NL3$trstlgl == "Complete trust"] <- "10"

NL3$trstplc[NL3$trstplc == "No trust at all"] <- "0"
NL3$trstplc[NL3$trstplc == "Complete trust"] <- "10"

NL3$stfgov[NL3$stfgov == "Extremely dissatisfied"] <- "0"
NL3$stfgov[NL3$stfgov == "Extremely satisfied"] <- "10"
NL3$trstprl <- as.numeric(NL3$trstprl)
NL3$trstlgl <- as.numeric(NL3$trstlgl)
NL3$trstplc <- as.numeric(NL3$trstplc)
NL3$stfgov <- as.numeric(NL3$stfgov)

NL3 <- NL3[!is.na(NL3$stfgov),]
NL3 <- NL3[!is.na(NL3$trstprl),]
NL3 <- NL3[!is.na(NL3$trstlgl),]
NL3 <- NL3[!is.na(NL3$trstplc),]
NL3 <- NL3[!is.na(NL3$psppsgva),]
table(NL3$trstlgl)
## 
##   1   2   3   4   5   6   7   8   9 
##  23  32  57  85 142 242 401 393 165

Table with variables

Label4<- c("`trstprl`", "`trstlgl`", "`trstplc `", "`stfgov`", "`psppsgva`" )
Meaning4<- c("Trust to parliament", "Trust in the legal system", "Trust in the police", "How satisfied with the national government", "Political system allows people to have a say in what government does")
Level_Of_Measurement4<- c("Interval", "Interval", "Interval", "Interval", "Ordinal")
Measurement4<- c("0 - 10","0 - 10", "0 - 10", "0 - 10", "Not at all - Very little - Some - A lot - A great deal")
data.frame(Label4, Meaning4, Level_Of_Measurement4, Measurement4, stringsAsFactors = FALSE) %>% kable() %>% kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
Label4 Meaning4 Level_Of_Measurement4 Measurement4
trstprl Trust to parliament Interval 0 - 10
trstlgl Trust in the legal system Interval 0 - 10
trstplc Trust in the police Interval 0 - 10
stfgov How satisfied with the national government Interval 0 - 10
psppsgva Political system allows people to have a say in what government does Ordinal Not at all - Very little - Some - A lot - A great deal

Exploring the data

NL3 %>% select(stfgov, trstprl, trstlgl, trstplc, psppsgva) %>% summary()
##      stfgov         trstprl          trstlgl         trstplc  
##  Min.   : 0.00   Min.   : 0.000   Min.   :1.000   Min.   : 0  
##  1st Qu.: 5.00   1st Qu.: 5.000   1st Qu.:6.000   1st Qu.: 6  
##  Median : 6.00   Median : 6.000   Median :7.000   Median : 7  
##  Mean   : 5.71   Mean   : 5.918   Mean   :6.621   Mean   : 7  
##  3rd Qu.: 7.00   3rd Qu.: 7.000   3rd Qu.:8.000   3rd Qu.: 8  
##  Max.   :10.00   Max.   :10.000   Max.   :9.000   Max.   :10  
##          psppsgva  
##  Not at all  :164  
##  Very little :490  
##  Some        :668  
##  A lot       :204  
##  A great deal: 14  
## 

Boxplot to check the outliers:

par(mfrow=c(2,2))
boxplot(NL3$trstprl, main="Trust in country's parliament", sub=paste("Outlier rows: ", boxplot.stats(NL3$trstprl)$out))

boxplot(NL3$trstlgl, main="Trust in the legal system", sub=paste("Outlier rows: ", boxplot.stats(NL3$trstlgl)$out))

boxplot(NL3$trstplc, main="Trust in the police", sub=paste("Outlier rows: ", boxplot.stats(NL3$trstplc)$out))

boxplot(NL3$stfgov, main="Satisfaction with national government", sub=paste("Outlier rows: ", boxplot.stats(NL3$stfgov)$out))

As we can see, we do not have that many outliers in each boxplot. 2 dots are in “Trust in country’s parliament” and “Satisfaction with national government”. 3 dots are found in “Trust in the legal system” and “Trust in the police”. We can also see, that “Satisfaction with the government” and “Trust in country’s parliament” both has the lowest mediana - 6.

Further, we check if continious variables are normally disrtibuted using histograms:

p1 = ggplot(data = NL3, aes(x = trstprl)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "orange") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in parliament") + theme_classic()
p2 = ggplot(data = NL3, aes(x = trstlgl)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "purple") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in legal systems") + theme_classic()
p3 = ggplot(data = NL3, aes(x = trstplc)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "pink") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in police")+ theme_classic()
p4 = ggplot(data = NL3, aes(x = stfgov)) + geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 1, fill = "green") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Satisfaction with national government")
plot_grid(p1, p2, p3, p4) + theme_classic()

remove(p1)
remove(p2)
remove(p3)
remove(p4)

As can be seen from the graphs, all the histograms are left-skewed. Graphs on trust in legal systems & police and governmental satisfaction look very similar, most people trust in these institutions more than on average (>5). We can also say, that the graph “Trust in parliament” looks closer to be normally distributed, compared to the other three graphs. Most people are also voting higher than average (>5).

Next, we construct a barplot to see if the categorical variable is representative:

NL3psppsgva <- factor(NL3$psppsgva,ordered= F,exclude = NA) 
ggplot(data = NL3, aes(x = psppsgva)) + 
  geom_bar(aes(y = (..count..)/sum(..count..)), fill = "pink") + 
  scale_y_continuous(labels=scales::percent) + 
  ylab("Relative frequencies") +
  xlab("Observations")+
  ggtitle("Governmental allowance to have a say ") + 
  coord_flip() +
  theme_classic() 

From a barplot it is clear that the majority of respondents in Netherlands belive to have some say in what government does.

Scatterplots that visualize the relationsips:

ggplot(data = NL3, aes(x = trstlgl, y = stfgov)) +
geom_point() + geom_smooth(method = lm, fill="blue", color="blue", se = FALSE)+
ggtitle("Trust in legal system by satisfaction with national government") +
xlab("Trust in legal system") + ylab("Satisfaction with national government") +
theme_classic()

ggplot(data = NL3, aes(x = trstprl, y = stfgov)) +
geom_point() + geom_smooth(method = lm, fill="blue", color="blue", se = FALSE)+
ggtitle("Trust in parlment by satisfaction with national government") +
xlab("Trust in parlament") + ylab("Satisfaction with national government") +
theme_classic()

ggplot(data = NL3, aes(x = trstplc, y = stfgov)) +
geom_point() + geom_smooth(method = lm, fill="blue", color="blue", se = FALSE)+
ggtitle("Trust in police by satisfaction with national government") +
xlab("Trust in police") + ylab("Satisfaction with national government") +
theme_classic()

From the scatterplot, it is seen that all relationships are positive. Since we have outliers and our variables are not normally distributed, we will calculate the correlation coefficient with cor() function, using you spearman’s method:

stfgov_trstprl = cor(NL3$stfgov, NL3$trstprl, use = "complete.obs", method = "spearman")
stfgov_trstlgl = cor(NL3$stfgov, NL3$trstlgl, use = "complete.obs", method = "spearman")
stfgov_trstplc = cor(NL3$stfgov, NL3$trstplc, use = "complete.obs", method = "spearman")
paste("correlation between satisfaction with national government and trust to parliament = ", round(stfgov_trstprl, 2))
## [1] "correlation between satisfaction with national government and trust to parliament =  0.6"
paste("correlation between satisfaction with national government and trust to legal system = ", round(stfgov_trstlgl, 2))
## [1] "correlation between satisfaction with national government and trust to legal system =  0.4"
paste("correlation between satisfaction with national government and trust in the police = ", round(stfgov_trstplc, 2))
## [1] "correlation between satisfaction with national government and trust in the police =  0.33"

Correlation matrix

In the matrix it is shown that correlation coefficients between “Satisfaction with the government”, our outcome variable, and all predictors are medium and positive, however, not identical. The biggest correlation is between “Satisfaction with government” and “trust in parlament”, 0.631, the smallest correlation is between “Satisfaction with government” and “trust in police”,0.388.

As we can see from results, the differences between correlation matrix and spearman’s method are not critical(no more than 0,05).

Linear regression model with 1 predictor variable

Since the highest correlation coefficient was with “trust in parlament” we would build a regression model with it.

  stfgov
Predictors Estimates p
(Intercept) 2.13 <0.001
trstprl 0.61 <0.001
Observations 1540
R2 / R2 adjusted 0.398 / 0.398
## 
## Call:
## lm(formula = stfgov ~ trstprl, data = NL3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.3659 -0.7603  0.2397  0.8452  5.8730 
## 
## Coefficients:
##             Estimate Std. Error t value            Pr(>|t|)    
## (Intercept)  2.12701    0.11728   18.14 <0.0000000000000002 ***
## trstprl      0.60555    0.01897   31.92 <0.0000000000000002 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.33 on 1538 degrees of freedom
## Multiple R-squared:  0.3984, Adjusted R-squared:  0.398 
## F-statistic:  1019 on 1 and 1538 DF,  p-value: < 0.00000000000000022

From the results we can conclude that p-value is significant (<0.001). Adjusted R-squared is 0.398 which means that the model explains 39.8% of variation in “satisfaction with government”. Regression coefficient is ~0.03 smaller than correlation coefficient. Intercept is 2.13 which means that predicted “satisfaction with government” will be 2.13 if “trust in parlament” is 0. With the increase in “trust in parlament” by 1, “satisfaction with government” will increase by 0.61. The equation is the following: stfgov = 2.12 + 0.61*trstprl

Linear regression with all the predictors

  stfgov
Predictors Estimates p
(Intercept) 1.30 <0.001
trstprl 0.49 <0.001
trstlgl 0.03 0.284
trstplc 0.13 <0.001
psppsgva [Very little] 0.40 0.001
psppsgva [Some] 0.52 <0.001
psppsgva [A lot] 0.69 <0.001
psppsgva [A great deal] 0.83 0.024
Observations 1540
R2 / R2 adjusted 0.421 / 0.419
## 
## Call:
## lm(formula = stfgov ~ trstprl + trstlgl + trstplc + psppsgva, 
##     data = NL3)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -6.6448 -0.6836  0.1656  0.7993  5.3745 
## 
## Coefficients:
##                      Estimate Std. Error t value             Pr(>|t|)    
## (Intercept)           1.30086    0.17916   7.261    0.000000000000609 ***
## trstprl               0.49001    0.02562  19.123 < 0.0000000000000002 ***
## trstlgl               0.02714    0.02535   1.071             0.284434    
## trstplc               0.12553    0.02545   4.933    0.000000898380372 ***
## psppsgvaVery little   0.40108    0.12100   3.315             0.000939 ***
## psppsgvaSome          0.51765    0.12594   4.110    0.000041617635056 ***
## psppsgvaA lot         0.69250    0.15434   4.487    0.000007763728277 ***
## psppsgvaA great deal  0.83373    0.36933   2.257             0.024124 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.307 on 1532 degrees of freedom
## Multiple R-squared:  0.4213, Adjusted R-squared:  0.4186 
## F-statistic: 159.3 on 7 and 1532 DF,  p-value: < 0.00000000000000022

The R has chosen the category “Not at all” of the “psppsgva” as the reference point. P-value is significant in all cases (p approximately about .001) excluding the trust to the legal system of the community. The overall distribution looks normal as the residuals are distanced equally from median. The multiple R-squared (as well as the adjusted one) shows that the model predicts 42% of the variation in satisfaction with the national government.

And now we look at the estimates to be able to construct the equation that looks like this: stfdem = 1.3 + 0.49∗trstprl + 0.03∗trstlgl + 0.13∗trstplt + 0.40∗psppsgva[Very little] + 0.52∗psppsgva[Some] + 0.69∗psppsgva[A lot]] + 0.8∗3psppsgv∗[A great deal]

The intercept is equal to 1.3 and refers to the predicted value of satisfaction with the way the country government works. It is the estimate without an accounting of the trust to the social institutions (police, the legal system, and parliament);

With each increase in trust in parliament institution by 1 point, the satisfaction with the country’s governmental work increases by 0.49.

This predictor is the most powerful among institutional ones. Our guess is that people see the parliament as the most representative branch of power, so they tend to rely on its “truthfulness” more than on the others. Moreover, people get in touch with the police or court more often thus having more opportunities to have a negative experiences.

With each increase in trust in court institution by 1 point, the satisfaction with the country’s governmental work increases by 0.03.

With each increase in trust in police institution by 1 point, the satisfaction with the country’s governmental work increases by 0.13.

If a respondent is partly sure that the government takes his or her opinion into the account than the satisfaction with its work rises by 0.40.

If a respondent is not sure at all that the government takes his or her opinion into the account than the satisfaction with its work rises by 0.52.

If a respondent is sure that the government takes his or her opinion into the account than the satisfaction with its work rises by 0.69.

If a respondent is totally sure that the government takes his or her opinion into the account than the satisfaction with its work rises by 0.83. Though the p-value is bigger than 0.001 for this category, its “input” is the biggest among other predictors.

Comparing models

## Analysis of Variance Table
## 
## Model 1: stfgov ~ trstprl
## Model 2: stfgov ~ trstprl + trstlgl + trstplc + psppsgva
##   Res.Df    RSS Df Sum of Sq      F           Pr(>F)    
## 1   1538 2722.0                                         
## 2   1532 2618.7  6    103.32 10.074 0.00000000006067 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

It is seen from the results, that p-value is less than 0,05. So we should look at the RSS value and consider with the least value as the better one. In our case, it is the second model(with all 4 predictors).

Output table:

tab_model(model1, model2, title = "Models comparison", dv.labels = c("First Model", "Second model"), CSS = list(css.depvarhead = '+color: blue;'))
Models comparison
  First Model Second model
Predictors Estimates CI p Estimates CI p
(Intercept) 2.13 1.90 – 2.36 <0.001 1.30 0.95 – 1.65 <0.001
trstprl 0.61 0.57 – 0.64 <0.001 0.49 0.44 – 0.54 <0.001
trstlgl 0.03 -0.02 – 0.08 0.284
trstplc 0.13 0.08 – 0.18 <0.001
psppsgva [Very little] 0.40 0.16 – 0.64 0.001
psppsgva [Some] 0.52 0.27 – 0.76 <0.001
psppsgva [A lot] 0.69 0.39 – 1.00 <0.001
psppsgva [A great deal] 0.83 0.11 – 1.56 0.024
Observations 1540 1540
R2 / R2 adjusted 0.398 / 0.398 0.421 / 0.419

This table allows as to compare the estimated values (it is interesting that they changed after adding more predictors to the model) and presents confidence intervals (again, they changed in the second model and were “redistributed” among all of the predictors). A note that was not mentioned by far is that the adjusted R^2 sensitive to the number of predictors is affected in the second model.

model3 = lm(stfgov ~ trstprl + trstplc + psppsgva + trstlgl * trstplc, data = NL3)
sjPlot:: tab_model(model3)
  stfgov
Predictors Estimates CI p
(Intercept) -0.00 -0.75 – 0.75 1.000
trstprl 0.49 0.44 – 0.54 <0.001
trstplc 0.34 0.22 – 0.46 <0.001
psppsgva [Very little] 0.35 0.12 – 0.59 0.003
psppsgva [Some] 0.48 0.23 – 0.72 <0.001
psppsgva [A lot] 0.69 0.39 – 0.99 <0.001
psppsgva [A great deal] 0.82 0.10 – 1.55 0.025
trstlgl 0.27 0.14 – 0.40 <0.001
trstplc * trstlgl -0.04 -0.06 – -0.02 <0.001
Observations 1540
R2 / R2 adjusted 0.427 / 0.424

Let us now interpret this huge table!

  1. As for interaction we have chosen trust in police and trust in legal systems since the interaction effect between these variables is the most significant;

  2. All the p-values in our table are significant;

  3. The values for R-squared and adjusted R-squared are bigger than those from model 2. They rose from 0,421/0,419 to 0,427/0,424, meaning that the model with interaction explains more than the previous one. This model explains over 40% of the variance of satisfaction with government.

Let us now interpret some important estimates:

  1. Intercept equals 0. This means that if all our predictors: trust in parliament, trust in police, trust in the legal system, right to have a word in what the government does(psppsgva) equal zero, satisfaction with government equals zero as well.

  2. The estimate for “Trust in parliament” is 0.49. This means that if other variables are constant, a person who gained an increase by 1 in “Trust in parliament” will be satisfied with the government on 0.49 more than a person who thinks that they are provided no freedom of speech at all.

  3. The estimate for “Trust in legal systems” is 0.27. This means that if other variables are constant, a person who gained an increase by 1 in “Trust in legal systems” will be satisfied with the government on 0.27 more than a person who thinks that they are provided no freedom of speech at all.

  4. The estimate for “Trust in police” is 0.34. This means that if other variables are constant, a person who gained an increase by 1 in “Trust in police” will be satisfied with the government on 0.34 more than a person who thinks that they are provided no freedom of speech at all.

  5. The estimate for “Right to have a word in what the government does” [Very little] is 0.35. This means that if other variables are constant, a person who thinks that the government provides very little freedom of speech will be satisfied with the government on 0.35 more than a person who thinks that they are provided no freedom of speech at all.

  6. The estimate for “Right to have a word in what the government does” [Some] is 0.48. This means that if other variables are constant, a person who thinks that the government provides some freedom of speech will be satisfied with the government on 0.48 more than a person who thinks that they are provided no freedom of speech at all.

  7. The estimate for “Right to have a word in what the government does” [A lot] is 0.69. This means that if other variables are constant, a person who thinks that the government provides lots of freedom of speech will be satisfied with the government on 0.69 more than a person who thinks that they are provided no freedom of speech at all.

  8. The estimate for “Right to have a word in what the government does” [A great deal] is 0.82. This means that if other variables are constant, a person who thinks that the government provides a great deal of freedom of speech will be satisfied with the government on 0.82 more than a person who thinks that they are provided no freedom of speech at all.

  9. The estimate for “Trust in legal systems” and “Trust in police” is -0.04. It means, that if a person thinks that he has no freedom of speech at all, each increase by 1 in both these predictors will bring -0.04 points to the “Satisfaction with the government”.

Thus, the final formula is: stfgov = 0.49trstprl + 0.27trstlgl + 0.34trstplc + 0.35psppsgva[Very little] + 0.48psppsgva[Some] + 0.69psppsgva[A lot] + 0.82psppsgva[A great deal] - 0.04(trstlgltrstplc)

anova(model2, model3)
## Analysis of Variance Table
## 
## Model 1: stfgov ~ trstprl + trstlgl + trstplc + psppsgva
## Model 2: stfgov ~ trstprl + trstplc + psppsgva + trstlgl * trstplc
##   Res.Df    RSS Df Sum of Sq      F    Pr(>F)    
## 1   1532 2618.7                                  
## 2   1531 2593.3  1    25.304 14.938 0.0001157 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

As we can see from the table, the p-value is 0.00012 (<0,001), so for us to determine the best model, we should look at RSS and consider the model with the least value as the best one. In our case, model 3, with interaction effect, has lower RSS value, meaning that this model is the best!

plot_model(model3, type = "int", terms = "trstplс", mdrt.values = "minmax") +
labs(x = "Trust in legal system", y = "Satisfaction with government", title = "Predicted values of Satisfaction with government", color = "Trust in police")

The plot shows the interaction between our variables. Here relationships between “Satisfaction with the government” and “Trust in the legal system” are presented with the moderation of “Trust in police”. As the plot shows, the relationship is different for those who trust the police and for those who do not. The higher “Trust in the legal system” is the higher the “Satisfaction with the government” is if “Trust in police” is minimal (0). However, if “Trust in police” is maximal (10), the relationship is negative: the higher the “Trust in the legal system” is the lower the “Satisfaction with the government” is. Also, we can see that the graphs intersect somewhere around 7 on the x-axis. This is the point where predictions become insignificant.

plot_model(model3, type = "int", terms = "trustplc", mdrt.values = "meansd")+
labs(x = "Trust in legal system", y = "Satisfaction with government", title = "Predicted values of Satisfaction with government", color = "Trust in police")

Here we can see that the graphs intersect almost in every point.

Conclusion

After the analysis, we can claim that our research question is answered! Here are the results:

  1. Satisfaction with the government is related to trusts in three branches of power: police, parliament and legal system;

  2. Satisfaction with the government is also related to the experienced level of freedom of speech on political matters;

  3. The interaction effect between trust in police and trust in the legal system is significant;

  4. Satisfaction with government correlates the most with trust in parliament.

The final formula is: stfgov = 0.49∗trstprl + 0.27∗trstlgl + 0.34∗trstplc + 0.35∗psppsgva[Very little] + 0.48∗psppsgva[Some] + 0.69∗psppsgva[A lot] + 0.82∗psppsgva[A great deal] - 0.04(trstlgl∗trstplc)

References

  1. Citrin, J. (1974). Comment: The Political Relevance of Trust in Government*. American Political Science Review, 68(3), 973–988. https://doi.org/10.2307/1959141

  2. George, D. (2011). SPSS for windows step by step: A simple study guide and reference, 17.0 update, 10/e. Pearson Education India.

  3. Ott, J. C. (2011). Government and Happiness in 130 Nations: Good Governance Fosters Higher Level and More Equality of Happiness. Social Indicators Research, 102(1), 3–22. https://doi.org/10.1007/s11205-010-9719-z

  4. Stevens, S. S. (1946). On the Theory of Scales of Measurement. Science, New Series, 103(2684), 677–680.