Hello, we are the 2BK team (Bakhareva, Borisenko, Kireeva, Kuzmicheva), and we are happy to present our project on linear regression modeling. We analyze how different factors influence satisfaction with democracy in Ireland (ESS round 8). As predictors we have chosen trust in parliament, trust in politicians, and the importance of a strong government that ensures safety. We explain below why we chose exactly these variables.
As for the contribution:
Since our topic is politics, we looked for articles on it to inspire our analysis of the variables. The articles we found suggested the following:
We would like to know which of these variables predicts satisfaction with democracy best. To find out, we build several models and compare them.
In our analysis, we selected variables that measure trust in politicians and in parliament in Ireland, the importance of a strong government, and satisfaction with democracy. With these variables we build a model that predicts the value of the outcome variable from one or more predictor variables.
Our variables are:
library(knitr)       # kable()
library(kableExtra)  # kable_styling()
library(dplyr)       # %>%
Label <- c("`trstprl`", "`ipstrgv`", "`stfdem`", "`trstplt`")
Meaning <- c("Trust in parliament", "Important that government is strong and ensures safety", "Satisfaction with democracy", "Trust in politicians")
Level_Of_Measurement <- c("Interval", "Ordinal", "Interval", "Interval")
Measurement <- c("0 - 10", "Very much like me - Like me - Somewhat like me - A little like me - Not like me - Not like me at all", "0 - 10", "0 - 10")
df <- data.frame(Label, Meaning, Level_Of_Measurement, Measurement, stringsAsFactors = FALSE)
kable(df) %>%
  kable_styling(bootstrap_options = c("bordered", "responsive", "striped"), full_width = FALSE)

| Label | Meaning | Level_Of_Measurement | Measurement |
|---|---|---|---|
| trstprl | Trust in parliament | Interval | 0 - 10 |
| ipstrgv | Important that government is strong and ensures safety | Ordinal | Very much like me - Like me - Somewhat like me - A little like me - Not like me - Not like me at all |
| stfdem | Satisfaction with democracy | Interval | 0 - 10 |
| trstplt | Trust in politicians | Interval | 0 - 10 |
library(haven)  # read_spss()
library(dplyr)  # select(), filter(), %>%

# ESS data file (assumed to be an extract for Ireland, round 8)
ESS1_8e01 <- read_spss("ESS1-8e01.sav")
es1 = ESS1_8e01
es2 = es1 %>%
  select(trstprl, stfdem, trstplt, ipstrgv)

# The 0-10 scales become numeric, the six-point ipstrgv item becomes a factor
es2$ipstrgv = as.factor(es2$ipstrgv)
es2$trstprl = as.numeric(as.character(es2$trstprl))
es2$trstplt = as.numeric(as.character(es2$trstplt))
es2$stfdem = as.numeric(as.character(es2$stfdem))

# Drop the ESS missing-value codes (refusal, don't know, no answer):
# 77/88/99 on the 0-10 scales and 7/8/9 on ipstrgv, then any remaining NAs
es2 = es2 %>%
  filter(!trstprl %in% c(77, 88, 99),
         !stfdem %in% c(77, 88, 99),
         !trstplt %in% c(77, 88, 99),
         !ipstrgv %in% c(7, 8, 9)) %>%
  filter(!is.na(trstprl), !is.na(ipstrgv), !is.na(trstplt), !is.na(stfdem))

So, first of all, let us look at the main characteristics of our dataset with the function summary():
summary(es2)
## trstprl stfdem trstplt ipstrgv
## Min. : 0.000 Min. : 0.000 Min. : 0.000 1:729
## 1st Qu.: 3.000 1st Qu.: 4.000 1st Qu.: 2.000 2:998
## Median : 5.000 Median : 6.000 Median : 4.000 3:421
## Mean : 4.538 Mean : 5.423 Mean : 3.775 4:233
## 3rd Qu.: 6.000 3rd Qu.: 7.000 3rd Qu.: 5.000 5:125
## Max. :10.000 Max. :10.000 Max. :10.000 6: 26
The summary looks plausible: all three 0-10 variables stay within their range, and the six ipstrgv categories add up to 2532 respondents. Next, we explore the variables graphically and check them for outliers.
We start with boxplots:
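# Three boxplots side by side; the subtitle lists the values flagged as outliers
par(mfrow = c(1, 3))
boxplot(es2$trstprl, main = "Trust in country's parliament", sub = paste("Outlier values:", paste(unique(boxplot.stats(es2$trstprl)$out), collapse = ", ")))
boxplot(es2$trstplt, main = "Trust in politicians", sub = paste("Outlier values:", paste(unique(boxplot.stats(es2$trstplt)$out), collapse = ", ")))
boxplot(es2$stfdem, main = "Satisfaction with democracy", sub = paste("Outlier values:", paste(unique(boxplot.stats(es2$stfdem)$out), collapse = ", ")))

There are virtually no outliers: the only value flagged is 10 in "trust in politicians", i.e. the respondents who fully trust politicians (the "10" in the subtitle is the outlying value, not a row number). Trust in politicians also has the lowest median of the three variables (4, compared with 5 for trust in parliament and 6 for satisfaction with democracy).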
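The outlier rule behind boxplot() and boxplot.stats() is Tukey's: values more than 1.5 times the interquartile range beyond the box are flagged. A minimal sketch of that arithmetic, plugging in the quartiles of trstplt reported by summary(es2); note that boxplot.stats() actually uses hinges, which can differ slightly from these quartiles, and tukey_fences() is a hypothetical helper written only for illustration:

```r
# Hypothetical helper: Tukey fences as used by boxplot() (coef = 1.5 by default)
tukey_fences <- function(q1, q3, k = 1.5) {
  iqr <- q3 - q1
  c(lower = q1 - k * iqr, upper = q3 + k * iqr)
}

# Quartiles of trstplt from summary(es2): 1st Qu. = 2, 3rd Qu. = 5
f <- tukey_fences(2, 5)
print(f)  # lower = -2.5, upper = 9.5

# Which points of the 0-10 answer scale fall outside the fences?
values <- 0:10
flagged <- values[values < f[["lower"]] | values > f[["upper"]]]
print(flagged)  # 10
```

The same calculation for trstprl (quartiles 3 and 6) gives fences of -1.5 and 10.5, and for stfdem (quartiles 4 and 7) fences of -0.5 and 11.5, so no answer on the 0-10 scale can be flagged for these two variables, which matches the boxplots.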
library(ggplot2)
library(cowplot)  # plot_grid()
# Histograms on the density scale with a kernel density curve; the name p_parl avoids masking graphics::par()
p_parl = ggplot(data = es2, aes(x = trstprl)) + geom_histogram(aes(y = after_stat(density)), alpha = 0.7, binwidth = 1, fill = "orange") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in parliament")
p_dem = ggplot(data = es2, aes(x = stfdem)) + geom_histogram(aes(y = after_stat(density)), alpha = 0.7, binwidth = 1, fill = "purple") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Satisfaction with democracy")
p_polit = ggplot(data = es2, aes(x = trstplt)) + geom_histogram(aes(y = after_stat(density)), alpha = 0.7, binwidth = 1, fill = "grey") + geom_density(col = "blue", fill = "white", alpha = 0.1) + xlab("Trust in politicians")
plot_grid(p_parl, p_polit, p_dem)

Trust in parliament and satisfaction with democracy are roughly normally distributed, while trust in politicians departs more clearly from normality. This does not prevent us from using it: linear regression does not assume normally distributed predictors, and its normality assumption concerns the residuals, which we check at the end.
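To illustrate why a skewed predictor is not a problem in itself, here is a small simulation on made-up data (not the ESS sample): the predictor is strongly right-skewed, yet the errors of the linear model are generated as normal, so the residuals are normal by construction and the coefficients are recovered well.

```r
set.seed(1)
n <- 2000
x <- rexp(n)                  # strongly right-skewed predictor
y <- 1 + 0.5 * x + rnorm(n)   # linear model with normal errors
fit <- lm(y ~ x)

# Shapiro-Wilk test: strong evidence against normality for x itself,
# while the residuals come from normal errors by construction
shapiro.test(x)$p.value
shapiro.test(residuals(fit))$p.value
coef(fit)                     # close to the true values 1 and 0.5
```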
levels(es2$ipstrgv)
## [1] "1" "2" "3" "4" "5" "6"
# Relabel the six answer categories, from 1 ("Very much like me") to 6 ("Not like me at all")
levels(es2$ipstrgv) <- c("Very important", "Important", "Quite important",
                         "Little important", "Not really important", "Not important at all")
es2$ipstrgv <- factor(es2$ipstrgv, ordered = FALSE, exclude = NA)
ggplot(data = es2, aes(x = ipstrgv)) + geom_bar(aes(y = after_stat(count / sum(count))), fill = "pink") + scale_y_continuous(labels = scales::percent) + ylab("Relative frequencies") + ggtitle("Importance of a strong government") + coord_flip()
w = ggplot(data = es2, aes(x = trstprl, y = stfdem)) + geom_point() + geom_smooth(method = lm, fill = "blue", color = "blue", se = FALSE) + ggtitle("Satisfaction with democracy by trust in parliament") + xlab("Trust in parliament") + ylab("Satisfaction with democracy")
we = ggplot(data = es2, aes(x = trstplt, y = stfdem)) + geom_point() + geom_smooth(method = lm, fill = "blue", color = "blue", se = FALSE) + ggtitle("Satisfaction with democracy by trust in politicians") + xlab("Trust in politicians") + ylab("Satisfaction with democracy")
plot_grid(w, we)

Our scatterplots show a positive linear relationship between:
- satisfaction with democracy and trust in parliament;
- satisfaction with democracy and trust in politicians.

Let us look at these relationships in a correlation plot:
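library(sjPlot)
# Correlations between the three numeric variables (ipstrgv is a factor, so it is dropped)
es3 = es2 %>%
  select(-ipstrgv)
q = cor(es3)
round(q, 2)
# sjp.corr() comes from older sjPlot versions; if it is unavailable, round(q, 2) shows the same coefficients
sjp.corr(es3, show.legend = TRUE)

Having seen the linear relationships in the scatterplots and in the correlation coefficients, we can now build the models.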
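Correlation and simple regression are closely linked: in a model with a single predictor, the slope equals r times sd(y) / sd(x), and R² equals r². A quick check of both identities on simulated 0-10 style data (made up for illustration, not the ESS sample):

```r
set.seed(42)
x <- sample(0:10, 500, replace = TRUE)   # a 0-10 "trust" scale
y <- 3 + 0.5 * x + rnorm(500, sd = 2)    # outcome with noise
fit <- lm(y ~ x)
r <- cor(x, y)

slope <- unname(coef(fit)["x"])
c(slope = slope, r_sd_ratio = r * sd(y) / sd(x))      # the two numbers coincide
c(r_squared = summary(fit)$r.squared, r2 = r^2)       # the two numbers coincide
```

Applied to model 1 below, its R² of 0.265 corresponds to a correlation of about √0.265 ≈ 0.51 between trust in parliament and satisfaction with democracy.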
First, we look at a model with one predictor: how well can satisfaction with democracy be predicted by trust in parliament? We fit the model and summarize it in a table:
model1 = lm(stfdem ~ trstprl, data = es2)
sjPlot::tab_model(model1)

| | stfdem | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 3.22 | 3.06 – 3.38 | <0.001 |
| trstprl | 0.49 | 0.45 – 0.52 | <0.001 |
| Observations | 2532 | | |
| R² / adjusted R² | 0.265 / 0.265 | | |
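\[ stfdem = 3.22 + 0.49 \cdot trstprl \]

Each additional point of trust in parliament is associated with a 0.49-point higher satisfaction with democracy, and trust in parliament alone explains about 26.5% of the variance in satisfaction with democracy (R² = 0.265).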
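To make the equation concrete, here is a sketch that plugs a few values of trstprl into it, using the rounded coefficients from the table above, so the results are approximate. With the fitted object itself, predict(model1, newdata = data.frame(trstprl = c(0, 5, 10))) would give the exact values.

```r
# Rounded coefficients of model 1 (from the table)
b0 <- 3.22   # intercept
b1 <- 0.49   # slope of trstprl
trstprl <- c(0, 5, 10)
pred <- b0 + b1 * trstprl
print(pred)  # 3.22 5.67 8.12
```

So a respondent with no trust in parliament is predicted to score about 3.2 on satisfaction with democracy, and one with full trust about 8.1.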
Now we add a second predictor, trust in politicians, to see whether it helps us predict satisfaction with democracy better:
model2 = lm(stfdem ~ trstprl + trstplt, data = es2)
sjPlot::tab_model(model2)

| | stfdem | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 3.06 | 2.89 – 3.22 | <0.001 |
| trstprl | 0.37 | 0.33 – 0.41 | <0.001 |
| trstplt | 0.19 | 0.15 – 0.22 | <0.001 |
| Observations | 2532 | | |
| R² / adjusted R² | 0.290 / 0.290 | | |
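\[ stfdem = 3.06 + 0.37 \cdot trstprl + 0.19 \cdot trstplt \]

Adding trust in politicians raises R² from 0.265 to 0.290. The coefficient of trust in parliament drops from 0.49 to 0.37, most likely because the two trust variables overlap: part of the effect that model 1 attributed to trust in parliament is shared with trust in politicians.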
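Why does the coefficient of trust in parliament shrink once trust in politicians enters the model? When two predictors are correlated, the simple model credits one of them with part of the other's effect. A simulation with made-up effect sizes (not estimates from our data) shows this mechanism:

```r
set.seed(7)
n <- 2500
prl <- rnorm(n)                              # stands in for "trust in parliament"
plt <- 0.7 * prl + rnorm(n, sd = 0.7)        # correlated "trust in politicians"
y   <- 0.4 * prl + 0.2 * plt + rnorm(n)      # true effects: 0.4 and 0.2

b_simple <- coef(lm(y ~ prl))[["prl"]]        # expected about 0.4 + 0.2 * 0.7 = 0.54
b_multi  <- coef(lm(y ~ prl + plt))[["prl"]]  # expected about 0.4
c(simple = b_simple, multiple = b_multi)
```

The simple slope absorbs 0.2 * 0.7 from the omitted correlated predictor; once that predictor is included, the slope returns to its own effect.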
Finally, we add ipstrgv (importance of a strong government) to the model.
model3 = lm(stfdem ~ trstprl + trstplt + ipstrgv, data = es2)
sjPlot::tab_model(model3)

| | stfdem | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 3.22 | 3.03 – 3.41 | <0.001 |
| trstprl | 0.37 | 0.33 – 0.41 | <0.001 |
| trstplt | 0.19 | 0.15 – 0.23 | <0.001 |
| Important | -0.24 | -0.41 – -0.06 | 0.008 |
| Quite important | -0.35 | -0.57 – -0.13 | 0.002 |
| Little important | -0.28 | -0.55 – -0.01 | 0.043 |
| Not really important | -0.07 | -0.42 – 0.28 | 0.687 |
| Not important at all | 0.04 | -0.67 – 0.76 | 0.908 |
| Observations | 2532 | | |
| R² / adjusted R² | 0.294 / 0.292 | | |
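\[ stfdem = 3.22 + 0.37 \cdot trstprl + 0.19 \cdot trstplt - 0.24 \cdot \text{Important} - 0.35 \cdot \text{QuiteImportant} - 0.28 \cdot \text{LittleImportant} - 0.07 \cdot \text{NotReallyImportant} + 0.04 \cdot \text{NotImportantAtAll} \]

Each category of ipstrgv enters as a dummy variable (1 if the respondent chose it, 0 otherwise), with "Very important" as the reference category. The first three categories differ significantly from the reference, while "Not really important" and "Not important at all" do not (p = 0.687 and p = 0.908).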
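R turns a factor such as ipstrgv into dummy variables using treatment contrasts, with the first level ("Very important") as the reference category, which is why that level has no row in the table. A sketch with a toy factor, followed by one prediction from the rounded model 3 coefficients for a hypothetical respondent (trstprl = 5, trstplt = 4, "Quite important"):

```r
# Toy factor with the same first levels as ipstrgv
imp <- factor(c("Very important", "Important", "Quite important"),
              levels = c("Very important", "Important", "Quite important"))
mm <- model.matrix(~ imp)
print(mm)   # the reference level "Very important" gets zeros in both dummy columns

# Hypothetical respondent, rounded coefficients of model 3
pred <- 3.22 + 0.37 * 5 + 0.19 * 4 - 0.35   # dummy for "Quite important" = 1
print(pred)  # 5.48
```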
ANOVA lets us compare nested models: models that are identical except that one of them contains additional predictors not included in the other. It tests whether the additional predictors significantly improve the fit.
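For two nested models, anova() computes F = ((RSS of the smaller model - RSS of the bigger model) / number of added parameters) / (RSS of the bigger model / its residual degrees of freedom). A sketch on simulated data (not our ESS models) that computes F by hand and compares it with anova():

```r
set.seed(3)
n <- 300
x1 <- rnorm(n)
x2 <- rnorm(n)
y <- 1 + 0.5 * x1 + 0.3 * x2 + rnorm(n)

m_small <- lm(y ~ x1)        # nested model
m_big   <- lm(y ~ x1 + x2)   # same model plus one extra predictor

rss_small <- sum(residuals(m_small)^2)
rss_big   <- sum(residuals(m_big)^2)
df_extra  <- df.residual(m_small) - df.residual(m_big)   # number of added parameters
F_manual  <- ((rss_small - rss_big) / df_extra) / (rss_big / df.residual(m_big))

F_anova <- anova(m_small, m_big)$F[2]   # the first row has no F (nothing to compare)
c(manual = F_manual, anova = F_anova)
```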
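anova(model1, model2)
anova(model2, model3)
# Model 3 also has the highest R2, but R2 never decreases when predictors are added, so the F-tests above are the fairer comparison

According to ANOVA, the best of these models is model 3 (three predictors), so we added an interaction between the two trust variables to it: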
# trstprl * trstplt expands to trstprl + trstplt + trstprl:trstplt
model4 = lm(stfdem ~ trstprl * trstplt + ipstrgv, data = es2)
sjPlot::tab_model(model4)

| | stfdem | | |
|---|---|---|---|
| Predictors | Estimates | CI | p |
| (Intercept) | 3.04 | 2.80 – 3.28 | <0.001 |
| trstprl | 0.42 | 0.36 – 0.47 | <0.001 |
| trstplt | 0.26 | 0.19 – 0.33 | <0.001 |
| Important | -0.24 | -0.42 – -0.07 | 0.007 |
| Quite important | -0.36 | -0.58 – -0.14 | 0.001 |
| Little important | -0.30 | -0.57 – -0.03 | 0.029 |
| Not really important | -0.07 | -0.41 – 0.28 | 0.709 |
| Not important at all | 0.07 | -0.65 – 0.79 | 0.847 |
| trstprl:trstplt | -0.01 | -0.03 – -0.00 | 0.015 |
| Observations | 2532 | | |
| R² / adjusted R² | 0.295 / 0.293 | | |
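Judging by R², the model with the interaction (model 4) fits only slightly better than the additive model (model 3): adjusted R² rises from 0.292 to 0.293, although the interaction term is significant (p = 0.015). We test the improvement with ANOVA and look at the marginal effects:

anova(model3, model4)  # F-test: does the interaction term improve model 3?
library(margins)
library(sjPlot)
margins(model4)
# Interaction plot at the minimum and maximum of the moderator.
# The plot is not very convincing: the interaction is significant (p = 0.015) but small
plot_model(model4, type = "int", mdrt.values = "minmax")
# The same plot at the quartiles of the moderator
plot_model(model4, type = "int", mdrt.values = "quart")

In other words, the difference is substantial only for the people with the highest and the lowest trust.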
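With the interaction, the slope of trust in parliament is no longer a single number: it equals 0.42 + (-0.01) * trstplt. A sketch with the rounded model 4 coefficients; the interaction estimate is small and heavily rounded (CI -0.03 to -0.00), so treat the numbers as approximate:

```r
b_prl <- 0.42    # trstprl
b_int <- -0.01   # trstprl:trstplt
trstplt <- c(0, 5, 10)
slope_prl <- b_prl + b_int * trstplt   # conditional slope of trstprl
print(slope_prl)  # 0.42 0.37 0.32
```

So one extra point of trust in parliament goes with a slightly smaller gain in satisfaction with democracy among people who already trust politicians a lot.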
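Linear regression makes several assumptions about the data: the relationship between the predictors and the outcome is linear; the residuals are independent, have constant variance (homoscedasticity) and are approximately normally distributed; and no single observation is overly influential. The standard diagnostic plots of model 4 let us check them visually: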
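The diagnostic plots produced by plot(model4) include residuals vs fitted and scale-location panels for judging homoscedasticity by eye. The same assumption can also be tested formally. Below is a hand-written sketch of Koenker's studentized Breusch-Pagan statistic (n times the R² of regressing the squared residuals on the predictors); bp_koenker() is a hypothetical helper and the data are simulated, while in practice a package implementation such as lmtest::bptest() would be used:

```r
set.seed(11)
n <- 500
x <- runif(n, 0, 10)
y_homo   <- 2 + 0.4 * x + rnorm(n)                       # constant error variance
y_hetero <- 2 + 0.4 * x + rnorm(n, sd = 0.2 + 0.3 * x)   # variance grows with x

# Hypothetical helper: Koenker's studentized Breusch-Pagan statistic
bp_koenker <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]   # predictors without the intercept
  e2 <- residuals(fit)^2
  aux <- lm(e2 ~ X)                            # auxiliary regression
  stat <- length(e2) * summary(aux)$r.squared
  c(statistic = stat, p.value = pchisq(stat, df = ncol(X), lower.tail = FALSE))
}

bp_koenker(lm(y_homo ~ x))
bp_koenker(lm(y_hetero ~ x))   # small p-value: heteroscedasticity detected
```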
par(mfrow = c(2, 2))  # 2 x 2 grid for the four diagnostic plots
plot(model4)

Having built the models and checked their assumptions, we draw the following conclusions:
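Trust in politicians and trust in parliament are related, and the effect of one depends slightly on the level of the other (the interaction term). Together they are the core of our model, since they have the strongest effect on satisfaction with democracy. By contrast, the importance of a strong government (ipstrgv) plays only a small role: adding it raises R² only from 0.290 to 0.294. The final formula (model 4) is:

\[ stfdem = 3.04 + 0.42 \cdot trstprl + 0.26 \cdot trstplt - 0.01 \cdot trstprl \cdot trstplt - 0.24 \cdot \text{Important} - 0.36 \cdot \text{QuiteImportant} - 0.30 \cdot \text{LittleImportant} - 0.07 \cdot \text{NotReallyImportant} + 0.07 \cdot \text{NotImportantAtAll} \]

With these variables we can predict satisfaction with democracy in Ireland to a moderate extent: the final model explains about 29% of its variance (R² = 0.295).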