Variables:
http://www.parlgov.org/documentation/codebook/#party:
left/right — Castles/Mair 1983 (left/right), Huber/Inglehart 1995 (left/right), Benoit/Laver 2006 (left/right), CHES 2010 (lrgen 1999 and 2002 and 2006)
state/market — Benoit/Laver 2006 (taxes/spending), CHES 2010 (lrecon 1999 and 2002 and 2006)
liberty/authority — Benoit/Laver 2006 (social), CHES 2010 (galtan 1999 and 2002 and 2006)
EU anti/pro — Ray 1999 (pos96), Benoit/Laver 2006 (euauthority or eulargerstronger or eujoining), CHES 2010 (position 1999 and 2002 and 2006)
This data set describes political parties in governments across the European Union and associated democracies. You are working with a cleaned, pre-processed dataset.
Solve the problems below by answering the listed questions. You can solve the problems in any order and with any correct method.
Write your answers in an Rmd script. Knit it to HTML with your comments included and submit the HTML file.
All the parties are classified into families by their position in an economic (state/market) and a cultural (liberty/authority) left/right dimension. The classification leads to eight party family categories: Communist/Socialist, Green/Ecologist, Social democracy, Liberal, Christian democracy, Agrarian, Conservative, Right-wing.
Compare whether the party families (variable ‘family_name_short’) in these data differ on the left-right scale (variable ‘left_right’).
library(readr)
df <- read_csv("sem6_parlgov.csv")
summary(df)
## X1 country_name_short country_name party_name_short
## Min. : 1.0 Length:1034 Length:1034 Length:1034
## 1st Qu.: 259.2 Class :character Class :character Class :character
## Median : 517.5 Mode :character Mode :character Mode :character
## Mean : 517.5
## 3rd Qu.: 775.8
## Max. :1034.0
## party_name_english family_name_short family_name left_right
## Length:1034 Length:1034 Length:1034 Min. :0.000
## Class :character Class :character Class :character 1st Qu.:3.300
## Mode :character Mode :character Mode :character Median :6.000
## Mean :5.359
## 3rd Qu.:7.400
## Max. :9.825
## state_market liberty_authority eu_anti_pro country_id
## Min. :0.2143 Min. :0.3338 Min. : 0.000 Min. : 1.00
## 1st Qu.:3.5000 1st Qu.:3.5000 1st Qu.: 3.300 1st Qu.:23.00
## Median :5.7000 Median :4.5056 Median : 7.900 Median :41.00
## Mean :4.9005 Mean :5.2013 Mean : 6.414 Mean :40.41
## 3rd Qu.:6.4000 3rd Qu.:7.0000 3rd Qu.: 8.300 3rd Qu.:60.00
## Max. :9.4737 Max. :9.7895 Max. :10.000 Max. :75.00
## party_id family_id EU_memb2000
## Min. : 2.0 Min. : 2.00 Length:1034
## 1st Qu.: 657.2 1st Qu.: 6.00 Class :character
## Median :1425.0 Median :14.00 Mode :character
## Mean :1463.4 Mean :16.23
## 3rd Qu.:2312.5 3rd Qu.:26.00
## Max. :2804.0 Max. :40.00
1. Run a formal overall test to compare group means. Use a parametric test for this.
oneway.test(left_right ~ family_name_short, data = df, var.equal = TRUE)
##
## One-way analysis of means
##
## data: left_right and family_name_short
## F = 1472, num df = 7, denom df = 1026, p-value < 2.2e-16
aov <- aov(df$left_right ~ df$family_name_short)
summary(aov)
## Df Sum Sq Mean Sq F value Pr(>F)
## df$family_name_short 7 5641 805.9 1472 <2e-16 ***
## Residuals 1026 562 0.5
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
plot(aov)
Tukey <- TukeyHSD(aov)
plot(Tukey, las = 2)
2. Are there any pairs of party families that do not differ on the left-right dimension (p = 0.05)? If yes, name them.
No, all the party families differ in their mean left-right position: none of the confidence intervals on the Tukey plot crosses the zero line.
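Reading the Tukey plot by eye can be cross-checked numerically. The sketch below lists the pairs whose adjusted p-value is at least 0.05. It runs on simulated data (the `toy` data frame and its three groups are hypothetical stand-ins, not the ParlGov data); with the real data the same filter would be applied to the `Tukey` object.

```r
# Simulated stand-in for the real data: three hypothetical groups,
# two of which share the same true mean.
set.seed(1)
toy <- data.frame(
  family_name_short = rep(c("a", "b", "c"), each = 50),
  left_right = c(rnorm(50, 2), rnorm(50, 2), rnorm(50, 6))
)

fit <- aov(left_right ~ family_name_short, data = toy)
tukey <- TukeyHSD(fit)

# TukeyHSD() returns one matrix per factor with the columns
# diff, lwr, upr and "p adj"; rows are named like "b-a".
pairs <- tukey$family_name_short

# Keep the pairs that do NOT differ at the 0.05 level.
not_different <- pairs[pairs[, "p adj"] >= 0.05, , drop = FALSE]
print(round(not_different, 3))
```

An empty result would confirm that every pair of groups differs, which is the same information as no interval crossing zero on the plot.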
3. What are the maximal and minimal mean values for a party family on the left-right scale? Name the party families, report the means and standard deviations (round to two digits after the point).
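One possible way to get the group means and standard deviations is sketched below in base R. The `toy` data frame is simulated and only mimics the column names of the real data, so the printed numbers are not the answer to the question.

```r
# Simulated stand-in data with the same column names as the real file.
set.seed(2)
toy <- data.frame(
  family_name_short = rep(c("a", "b", "c"), each = 40),
  left_right = c(rnorm(40, 2, 0.5), rnorm(40, 5, 0.8), rnorm(40, 8, 0.6))
)

# Mean and SD of left_right within each family.
group_stats <- aggregate(
  left_right ~ family_name_short,
  data = toy,
  FUN = function(x) c(mean = mean(x), sd = sd(x))
)

# aggregate() stores both statistics in one matrix column; flatten it.
group_stats <- do.call(data.frame, group_stats)
names(group_stats) <- c("family", "mean", "sd")
group_stats[, c("mean", "sd")] <- round(group_stats[, c("mean", "sd")], 2)

# Families with the minimal and maximal mean.
lowest <- group_stats[which.min(group_stats$mean), ]
highest <- group_stats[which.max(group_stats$mean), ]
print(rbind(lowest, highest))
```

With the real data, `toy` would be replaced by `df`.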
There are several scales that differentiate between political parties in the data set: the left/right scale, the state/market scale, the liberty/authority, and the EU anti/pro scale. Do these scales measure similar or unrelated features?
1. Evaluate whether all the four scales are close to normal distribution. Use the values of skew and kurtosis, and make sure the distributions are bell-shaped to be considered normal. Report your decisions on all four scales.
library(ggplot2)
g1 <- ggplot(df, aes(x = left_right)) +
geom_density() +
theme_bw()
g2 <- ggplot(df, aes(x = state_market)) +
geom_density() +
labs(y="") +
theme_bw()
g3 <- ggplot(df, aes(x = liberty_authority)) +
geom_density() +
theme_bw()
g4 <- ggplot(df, aes(x = eu_anti_pro)) +
geom_density() +
labs(y="") +
theme_bw()
library(gridExtra)
grid.arrange(g1, g2, g3, g4, ncol=2, nrow = 2)
Judging by the density plots, the variables are not normally distributed, so we should use Spearman’s rank correlation.
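The question also asks for skew and kurtosis values, which the density plots do not provide. A minimal sketch with simple moment-based estimators follows; the helper names `skewness` and `excess_kurtosis` are defined here for illustration, and packages such as psych or e1071 offer variants with small-sample corrections that give slightly different numbers. The `toy` data are simulated.

```r
# Moment-based skewness: third central moment over variance^(3/2).
skewness <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^3) / mean((x - m)^2)^1.5
}

# Excess kurtosis: fourth central moment over variance^2, minus 3,
# so that a normal distribution scores about 0.
excess_kurtosis <- function(x) {
  x <- x[!is.na(x)]
  m <- mean(x)
  mean((x - m)^4) / mean((x - m)^2)^2 - 3
}

# Simulated example: one roughly symmetric and one right-skewed variable.
set.seed(3)
toy <- data.frame(symmetric = rnorm(500), skewed = rexp(500))

shape <- data.frame(
  skew = sapply(toy, skewness),
  kurtosis = sapply(toy, excess_kurtosis)
)
print(round(shape, 2))
```

Values near 0 on both measures are consistent with a normal shape; applied to the real data, the call would be `sapply(cor_data, skewness)` and likewise for kurtosis.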
2. Use the proper method to calculate a correlation matrix between the four scales. Report the statistically significant correlations (p = 0.05). Name the direction and magnitude of relationships.
library(dplyr)
cor_data <- df %>% select("left_right", "state_market", "liberty_authority", "eu_anti_pro") %>% na.omit()
cor(cor_data, method = "spearman")
## left_right state_market liberty_authority eu_anti_pro
## left_right 1.00000000 0.7108134 0.8382975 -0.04270487
## state_market 0.71081336 1.0000000 0.4458473 0.42967947
## liberty_authority 0.83829748 0.4458473 1.0000000 -0.20591361
## eu_anti_pro -0.04270487 0.4296795 -0.2059136 1.00000000
#df %>% select(left_right, state_market, liberty_authority, eu_anti_pro) %>% cor.test(method = "spearman")
#cor.test(cor_data, method = "spearman")
#library(corrplot)
#corrplot(cor_data, method="number")
#library(sjPlot)
#tab_corr(cor_data)
library(ggcorrplot)
cor(cor_data, method = "spearman") %>%
round(2) %>%
ggcorrplot(hc.order = TRUE, type = "upper", ggtheme = ggplot2::theme_bw, colors =c("darkturquoise", "white", "#E46726"))
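`cor()` returns only the coefficients, while the question asks which correlations are significant. A sketch of pairwise Spearman tests with base R `cor.test()` is given below. The helper `spearman_pairs` is a name introduced here, and the `toy` data are simulated; on the real data it would be called as `spearman_pairs(cor_data)`.

```r
# Run cor.test() for every pair of columns and collect rho and p-value.
spearman_pairs <- function(data) {
  combos <- combn(names(data), 2)
  out <- data.frame(
    var1 = combos[1, ],
    var2 = combos[2, ],
    rho = NA_real_,
    p_value = NA_real_
  )
  for (i in seq_len(ncol(combos))) {
    test <- cor.test(
      data[[combos[1, i]]], data[[combos[2, i]]],
      method = "spearman",
      exact = FALSE  # ties make the exact p-value unavailable
    )
    out$rho[i] <- unname(test$estimate)
    out$p_value[i] <- test$p.value
  }
  out
}

# Simulated example: y depends on x, z is unrelated noise.
set.seed(4)
x <- rnorm(200)
toy <- data.frame(x = x, y = x + rnorm(200, sd = 0.5), z = rnorm(200))

result <- spearman_pairs(toy)
print(result[result$p_value < 0.05, ])
```

Because six pairs are tested on the four scales, a correction for multiple testing (for example `p.adjust()`) may be worth considering.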
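We can see (magnitudes read from the matrix above):
left_right correlates strongly and positively with liberty_authority (0.84) and state_market (0.71).
state_market correlates moderately and positively with liberty_authority (0.45) and eu_anti_pro (0.43).
liberty_authority correlates weakly and negatively with eu_anti_pro (-0.21).
left_right and eu_anti_pro are practically uncorrelated (-0.04).
The matrix alone carries no p-values, so significance at the 0.05 level has to be checked with pairwise tests.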
There is a variable indicating whether the country was a member of the European Union in 2000 or not (‘EU_memb2000’).
6. Were the party families (variable ‘family_name’) equally represented in the EU members and non-members? Report a formal test to answer that question. If yes, report which party family was less likely to occur in the non-EU countries? If not, name the party family which was represented in the two groups in the least balanced way.
data_chi <- df %>% select("family_name", "EU_memb2000") %>% na.omit()
data_chi$EU_memb2000 <- as.factor(data_chi$EU_memb2000)
data_chi$family_name <- as.factor(data_chi$family_name)
chisq.test(table(data_chi$EU_memb2000, data_chi$family_name))
##
## Pearson's Chi-squared test
##
## data: table(data_chi$EU_memb2000, data_chi$family_name)
## X-squared = 17.267, df = 7, p-value = 0.01575
residuals(chisq.test(table(data_chi$EU_memb2000, data_chi$family_name)))
##
## Agrarian Christian democracy Communist/Socialist Conservative
## No 0.76861787 -1.21931170 -2.37851961 0.40563696
## Yes -0.66414172 1.05357395 2.05521385 -0.35049982
##
## Green/Ecologist Liberal Right-wing Social democracy
## No -0.04005708 0.71551578 0.71049194 0.98408811
## Yes 0.03461223 -0.61825765 -0.61391668 -0.85032367
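No, the party families are not equally represented among EU members and non-members (X-squared = 17.267, df = 7, p = 0.016). The family represented in the least balanced way is Communist/Socialist: it has the largest absolute residuals (-2.38 for non-members, 2.06 for members), so it occurs less often than expected in non-EU countries.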
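Picking the least balanced family can also be done programmatically instead of by scanning the residual table. The sketch below uses a small made-up contingency table (groups `A`, `B`, `C` are hypothetical); with the real data, `tab` would be `table(data_chi$EU_memb2000, data_chi$family_name)`.

```r
# Made-up 2 x 3 table of counts, rows = EU membership, columns = groups.
tab <- as.table(rbind(
  No  = c(A = 30, B = 10, C = 20),
  Yes = c(A = 30, B = 40, C = 20)
))

test <- chisq.test(tab)

# Pearson residuals: (observed - expected) / sqrt(expected).
res <- test$residuals

# For each column, take the largest absolute residual across the rows.
imbalance <- apply(abs(res), 2, max)
least_balanced <- names(which.max(imbalance))
print(round(res, 2))
print(least_balanced)  # "B"
```

`test$stdres` holds standardized residuals, which are sometimes preferred because they are approximately standard normal under independence.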
You need to predict the party’s position on the state-market scale (variable ‘state_market’) where 0 means “the state should regulate the economy” and 10 means “the state should be minimal in the economy” (from Benoit/Laver 2006 and CHES 2010). Use the party’s left-right position and the stance towards the EU for this (variable ‘eu_anti_pro’, 0 means ‘totally against’ and 10 means ‘totally pro-EU’).
library(dplyr)
df1 <- df %>% select("state_market", "eu_anti_pro", "left_right")
df1_nona <- df1 %>% na.omit()
7. How much variance (in percent) in state_market can be explained with the party’s left-right position and the stance towards the EU in the model? (round to 0 digits after the point)
model1 <- lm(state_market ~ eu_anti_pro + left_right, data = df1)
summary(model1)
##
## Call:
## lm(formula = state_market ~ eu_anti_pro + left_right, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.6273 -0.4513 -0.2293 0.3891 4.2486
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.397890 0.087677 -4.538 6.34e-06 ***
## eu_anti_pro 0.263986 0.009851 26.797 < 2e-16 ***
## left_right 0.672724 0.010472 64.238 < 2e-16 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8247 on 1031 degrees of freedom
## Multiple R-squared: 0.8262, Adjusted R-squared: 0.8258
## F-statistic: 2450 on 2 and 1031 DF, p-value: < 2.2e-16
The model explains about 83% of the variance in state_market (multiple R-squared = 0.8262).
8. Does the relationship between the party’s left-right position and its state-market position depend on its pro- or anti-EU position? Compare this model to the previous one. Is it significantly better?
model2 <- lm(state_market ~ left_right * eu_anti_pro, data = df1)
summary(model2)
##
## Call:
## lm(formula = state_market ~ left_right * eu_anti_pro, data = df1)
##
## Residuals:
## Min 1Q Median 3Q Max
## -4.5341 -0.3636 -0.0993 0.3198 4.3642
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 0.224848 0.138769 1.620 0.105
## left_right 0.569351 0.020773 27.408 < 2e-16 ***
## eu_anti_pro 0.129336 0.025413 5.089 4.27e-07 ***
## left_right:eu_anti_pro 0.023057 0.004022 5.733 1.30e-08 ***
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.8122 on 1030 degrees of freedom
## Multiple R-squared: 0.8316, Adjusted R-squared: 0.8311
## F-statistic: 1695 on 3 and 1030 DF, p-value: < 2.2e-16
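anova(model1, model2)
Yes: the interaction term left_right:eu_anti_pro is significant (p = 1.30e-08), so the effect of the left-right position depends on the party’s stance towards the EU. The two models differ only by this term, so the F-test from anova() yields the same p-value, and model2 fits significantly better (adjusted R-squared 0.8311 versus 0.8258).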
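When two nested linear models differ by a single term, the F statistic from `anova()` equals the squared t statistic of that term. The sketch below checks this on simulated data; the data-generating coefficients are arbitrary assumptions and the variable names merely mirror the real ones.

```r
# Simulated data with a built-in interaction (coefficients are arbitrary).
set.seed(5)
n <- 300
toy <- data.frame(
  left_right = runif(n, 0, 10),
  eu_anti_pro = runif(n, 0, 10)
)
toy$state_market <- 0.2 + 0.5 * toy$left_right + 0.1 * toy$eu_anti_pro +
  0.03 * toy$left_right * toy$eu_anti_pro + rnorm(n)

m_add <- lm(state_market ~ left_right + eu_anti_pro, data = toy)
m_int <- lm(state_market ~ left_right * eu_anti_pro, data = toy)

# F-test for the nested comparison.
cmp <- anova(m_add, m_int)
f_stat <- cmp$F[2]
p_anova <- cmp[["Pr(>F)"]][2]

# t-test of the single added term in the larger model.
coefs <- summary(m_int)$coefficients
t_int <- coefs["left_right:eu_anti_pro", "t value"]
p_t <- coefs["left_right:eu_anti_pro", "Pr(>|t|)"]

print(c(F = f_stat, t_squared = t_int^2))
print(c(p_anova = p_anova, p_t = p_t))
```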
9. What are the standardized regression coefficients of the left-right and the eu_anti_pro predictors in the largest model? Interpret the bigger of the two coefficients.
library(lm.beta)
lm.beta(model2)
##
## Call:
## lm(formula = state_market ~ left_right * eu_anti_pro, data = df1)
##
## Standardized Coefficients::
## (Intercept) left_right eu_anti_pro
## 0.0000000 0.7060123 0.1704892
## left_right:eu_anti_pro
## 0.2324470
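The larger of the two standardized coefficients belongs to ‘left_right’ (0.71 versus 0.17 for ‘eu_anti_pro’), so it is the stronger predictor: a one standard deviation increase in ‘left_right’ is associated with a 0.71 standard deviation increase in ‘state_market’. Because the model contains an interaction, this coefficient describes the effect of ‘left_right’ at eu_anti_pro = 0 rather than an average effect.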
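To see what a standardized coefficient means, it can be reproduced by hand: multiply the raw slope by sd(x) / sd(y), or refit the model on z-scored variables. The sketch below does both for an additive model on simulated data (`x1`, `x2`, `y` are hypothetical names); it is an illustration of the definition, not a replacement for `lm.beta()`.

```r
# Simulated data for an additive model (coefficients are arbitrary).
set.seed(6)
n <- 200
toy <- data.frame(x1 = rnorm(n, 5, 2), x2 = rnorm(n, 6, 3))
toy$y <- 1 + 0.7 * toy$x1 + 0.25 * toy$x2 + rnorm(n)

fit <- lm(y ~ x1 + x2, data = toy)

# Route 1: rescale the raw slopes by sd(x) / sd(y).
b <- coef(fit)[c("x1", "x2")]
beta_manual <- b * c(sd(toy$x1), sd(toy$x2)) / sd(toy$y)

# Route 2: refit on z-scores; the slopes are the standardized coefficients.
toy_z <- as.data.frame(scale(toy))
fit_z <- lm(y ~ x1 + x2, data = toy_z)
beta_refit <- coef(fit_z)[c("x1", "x2")]

print(round(rbind(beta_manual, beta_refit), 4))
```

Both routes agree, which shows that a standardized coefficient is the expected change in the outcome, in standard deviations, per one standard deviation change in the predictor.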
10. Draw the interaction plot for the largest model.
It is close to zero (the left end of the red line).
First, look at the left part of the graph and compare the levels of the red and blue lines. The blue line is higher, meaning that left-wing pro-EU parties favour less state regulation than left-wing anti-EU parties.
The highest point of the graph is on the right side of the blue line: right-wing pro-EU parties.
library(sjPlot)
#plot_model(model2)
plot_model(model2, type = "int")
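The lines in the interaction plot can be reconstructed from the coefficients of model2 reported above. The sketch below computes the slope of left_right at several values of eu_anti_pro and the predicted state_market at the ends of the lines. The helpers `simple_slope` and `predict_sm` are names introduced here; the evaluation points (0 and 10 for eu_anti_pro, 0 and 9.825 for left_right) are the observed minima and maxima from `summary(df)`, and it is an assumption that the plot draws its two lines at those moderator values.

```r
# Coefficients copied from summary(model2).
b0    <- 0.224848  # intercept
b_lr  <- 0.569351  # left_right
b_eu  <- 0.129336  # eu_anti_pro
b_int <- 0.023057  # left_right:eu_anti_pro

# Slope of left_right at a given value of eu_anti_pro.
simple_slope <- function(eu) b_lr + b_int * eu

# Predicted state_market for given left_right and eu_anti_pro.
predict_sm <- function(lr, eu) b0 + b_lr * lr + b_eu * eu + b_int * lr * eu

slopes <- sapply(c(0, 5, 10), simple_slope)
names(slopes) <- c("eu = 0", "eu = 5", "eu = 10")
print(round(slopes, 3))

# Predictions at the ends of the two lines.
ends <- outer(c(0, 9.825), c(0, 10), predict_sm)
dimnames(ends) <- list(left_right = c("0", "9.825"), eu_anti_pro = c("0", "10"))
print(round(ends, 2))
```

The slope grows from about 0.57 for fully anti-EU parties to about 0.80 for fully pro-EU parties, and the prediction for left_right = 0 and eu_anti_pro = 0 is the intercept, about 0.22.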