2BK team: Bakhareva, Borisenko, Kireeva, Kuzmicheva
24/02/2019
Hello. We are 2BK. Our topic is “Politics”. The country we have chosen for studying is Ireland (round 8). Team members are Bakhareva Anastasia, Borisenko Iana, Kireeva Irina, Kuzmicheva Daria. We have focused on the results of the surveys connected both with politics and personal information on Ireland.
As for individual contribution, there it is done as follows:
Anastasia Bakhareva: Constucting stacked barplot в„–2, The conclusion about the groups representativeness, construction of histogram and conclusion about distribution by histogram , creating report
Iana Borisenko: Constructing contigency table, constucting stacked barplot в„–1, assocplot building and conclusion about residuals, overall conclusion, creating report
Irina Kireeva: Manipulations with data set, creating new variable “politics13$lr”, boxplot conduction and conclusion, descriptive statistics for T-test, Wilcoxon test conduction and conclusions
Daria Kuzmicheva: Describing variables with different measurement scales, Chi-test conduction and conclusion, conclusion about distribution by skew Рё kurtosis, Q-Q plot construction and conclusions
First of all, we run all the libraries necessary for the analysis.
library(readr)
library(ggplot2)
library(dplyr)
library(sjlabelled)
library(sjmisc)
library(sjstats)
library(ggeffects)
library(sjPlot)
library(knitr)
library(psych)
library(foreign)Next, we load out dataset. It is a combination of data connected with both politics and personal information.
politics_gender <- read_spss("politics_gender.sav")##
|
| | 0%
|
| | 1%
|
|= | 1%
|
|= | 2%
|
|== | 2%
|
|== | 3%
|
|== | 4%
|
|=== | 4%
|
|=== | 5%
|
|==== | 6%
|
|==== | 7%
|
|===== | 7%
|
|===== | 8%
|
|====== | 8%
|
|====== | 9%
|
|====== | 10%
|
|======= | 10%
|
|======= | 11%
|
|======== | 12%
|
|======== | 13%
|
|========= | 13%
|
|========= | 14%
|
|========== | 15%
|
|========== | 16%
|
|=========== | 16%
|
|=========== | 17%
|
|=========== | 18%
|
|============ | 18%
|
|============ | 19%
|
|============= | 19%
|
|============= | 20%
|
|============= | 21%
|
|============== | 21%
|
|============== | 22%
|
|=============== | 22%
|
|=============== | 23%
|
|=============== | 24%
|
|================ | 24%
|
|================ | 25%
|
|================= | 25%
|
|================= | 26%
|
|================= | 27%
|
|================== | 27%
|
|================== | 28%
|
|=================== | 29%
|
|=================== | 30%
|
|==================== | 30%
|
|==================== | 31%
|
|===================== | 32%
|
|===================== | 33%
|
|====================== | 33%
|
|====================== | 34%
|
|====================== | 35%
|
|======================= | 35%
|
|======================= | 36%
|
|======================== | 36%
|
|======================== | 37%
|
|======================== | 38%
|
|========================= | 38%
|
|========================= | 39%
|
|========================== | 39%
|
|========================== | 40%
|
|========================== | 41%
|
|=========================== | 41%
|
|=========================== | 42%
|
|============================ | 42%
|
|============================ | 43%
|
|============================ | 44%
|
|============================= | 44%
|
|============================= | 45%
|
|============================== | 45%
|
|============================== | 46%
|
|============================== | 47%
|
|=============================== | 47%
|
|=============================== | 48%
|
|================================ | 49%
|
|================================ | 50%
|
|================================= | 50%
|
|================================= | 51%
|
|================================== | 52%
|
|================================== | 53%
|
|=================================== | 53%
|
|=================================== | 54%
|
|=================================== | 55%
|
|==================================== | 55%
|
|==================================== | 56%
|
|===================================== | 56%
|
|===================================== | 57%
|
|===================================== | 58%
|
|====================================== | 58%
|
|====================================== | 59%
|
|======================================= | 59%
|
|======================================= | 60%
|
|======================================= | 61%
|
|======================================== | 61%
|
|======================================== | 62%
|
|========================================= | 62%
|
|========================================= | 63%
|
|========================================= | 64%
|
|========================================== | 64%
|
|========================================== | 65%
|
|=========================================== | 65%
|
|=========================================== | 66%
|
|=========================================== | 67%
|
|============================================ | 67%
|
|============================================ | 68%
|
|============================================= | 69%
|
|============================================= | 70%
|
|============================================== | 70%
|
|============================================== | 71%
|
|=============================================== | 72%
|
|=============================================== | 73%
|
|================================================ | 73%
|
|================================================ | 74%
|
|================================================ | 75%
|
|================================================= | 75%
|
|================================================= | 76%
|
|================================================== | 76%
|
|================================================== | 77%
|
|================================================== | 78%
|
|=================================================== | 78%
|
|=================================================== | 79%
|
|==================================================== | 79%
|
|==================================================== | 80%
|
|==================================================== | 81%
|
|===================================================== | 81%
|
|===================================================== | 82%
|
|====================================================== | 82%
|
|====================================================== | 83%
|
|====================================================== | 84%
|
|======================================================= | 84%
|
|======================================================= | 85%
|
|======================================================== | 86%
|
|======================================================== | 87%
|
|========================================================= | 87%
|
|========================================================= | 88%
|
|========================================================== | 89%
|
|========================================================== | 90%
|
|=========================================================== | 90%
|
|=========================================================== | 91%
|
|=========================================================== | 92%
|
|============================================================ | 92%
|
|============================================================ | 93%
|
|============================================================= | 93%
|
|============================================================= | 94%
|
|============================================================== | 95%
|
|============================================================== | 96%
|
|=============================================================== | 96%
|
|=============================================================== | 97%
|
|=============================================================== | 98%
|
|================================================================ | 98%
|
|================================================================ | 99%
|
|=================================================================| 99%
|
|=================================================================| 100%
Here we filter our data in order to delete all the observation useless for the analysis.
politics <- politics_gender %>%
select(agea, lrscale, sgnptit, vote)
politics <- politics[!is.na(politics$agea),]
politics <- politics[!is.na(politics$lrscale),]
politics <- politics[!is.na(politics$sgnptit),]
politics <- politics[!is.na(politics$vote),]politics13 <- politics %>%
filter(lrscale != 77) %>%
filter(lrscale != 88) %>%
filter(lrscale != 99 )
politics13 <- politics13 %>%
filter(sgnptit != 7) %>%
filter(sgnptit != 8) %>%
filter(sgnptit != 9)
politics13 <- politics13 %>%
filter(agea != 999)First, we modify one of the variables to make it comfortable for manipulations. Then, we update our dataset.
politics13$lr <- ifelse(politics13$lrscale <= 3, "Left",
ifelse(politics13$lrscale >= 7, "Right", "Middle"))
politics13 <- politics13 %>%
select(- lrscale)Now, let`s look at the number of observations and the number of variables.
dim(politics13)## [1] 2231 4
Then, there is a description of chosen variables presented.
Label <- c("`sgnptit`", "`lr`", "`agea`", "`vote`" )
Meaning <- c("Signed petition last 12 months", "Placement on left right scale", "Age", "Voted last national election")
Level_Of_Measurement <- c("Nominal", "Nominal", "Ratio", "Nominal")
Test <- c("Chi-squared test", "Chi-squared test", "T-test for independet variables", "T-test for independet variables")
df <- data.frame(Label, Meaning, Level_Of_Measurement,Test, stringsAsFactors = FALSE)
kable(df)| Label | Meaning | Level_Of_Measurement | Test |
|---|---|---|---|
sgnptit |
Signed petition last 12 months | Nominal | Chi-squared test |
lr |
Placement on left right scale | Nominal | Chi-squared test |
agea |
Age | Ratio | T-test for independet variables |
vote |
Voted last national election | Nominal | T-test for independet variables |
Firstly, we select variables necessary for chi-square test. Next, there is a contigency table presented.
politics_chi <- politics13 %>%
select(lr, sgnptit)
politics_chi$sgnptit <- factor(politics_chi$sgnptit, labels = c("Yes", "No"), ordered= F,exclude = NA)
ContigencyTable <- table(politics_chi$lr, politics_chi$sgnptit)
kable(ContigencyTable)| Yes | No | |
|---|---|---|
| Left | 141 | 188 |
| Middle | 281 | 1036 |
| Right | 95 | 490 |
In order to check whether our categories are successful to run chi-square test, we are going to create stacked barplots and analyze them.
ggplot() +
geom_bar(data = politics_chi, aes(x = lr, fill = sgnptit), position = "fill")+
coord_flip()+
xlab("Party affiliation") +
ylab("Percentage of people") +
ggtitle("Participation in signing petitions due to party affiliation")sjp.xtab(politics_chi$lr, politics_chi$sgnptit, type = "bar", margin ="row",
bar.pos = "stack", title = "Participation in signing petitions due to party affiliation", title.wtd.suffix = NULL,
axis.titles = NULL, axis.labels = NULL, legend.title = NULL,
legend.labels = NULL, weight.by = NULL, rev.order = FALSE,
show.values = TRUE, show.n = TRUE, show.prc = TRUE, show.total = TRUE,
show.legend = TRUE, show.summary = TRUE, summary.pos = "r",
string.total = "Total", wrap.title = 50, wrap.labels = 15,
wrap.legend.title = 20, wrap.legend.labels = 20, geom.size = 0.7,
geom.spacing = 0.1, geom.colors = "Paired", dot.size = 3,
smooth.lines = FALSE, grid.breaks = 0.2, expand.grid = FALSE,
ylim = NULL, vjust = "bottom", hjust = "left", y.offset = NULL,
coord.flip = TRUE)With the introduction of the Internet, signing petitions has become available online. That is, the study of the largerst online petitions` platform (Change.org) has shown that this source is strongly biased toward liberal causes.
In Ireland, the half of the political parties presented are right(conservatism) or centre(social democracy, liberal conservatism, populism) and another half is left(socialism, respublicanism), so citizens have a wide spectrum of different views to share. Accordingly, in order to find out whether this distribution of the petition signatories due to their political preferences is random, we decided to build a Chi- square test.
The following hypotheses were approved for this:
So, let`s run chi-square test.
colnames(ContigencyTable) <- c("Petition +", "Petition -")
rownames(ContigencyTable) <- c("L", "R", "C")
chi.test <- chisq.test(ContigencyTable)
chi.test##
## Pearson's Chi-squared test
##
## data: ContigencyTable
## X-squared = 89.895, df = 2, p-value < 2.2e-16
Next, we have to look at residuals.
kable(chi.test$stdres) #for residuals| Petition + | Petition - | |
|---|---|---|
| L | 9.164265 | -9.164265 |
| R | -2.468595 | 2.468595 |
| C | -4.627590 | 4.627590 |
assocplot(t(ContigencyTable), main="Residuals and number of observations" )On the plot of residuals, we can see the confirmation of our conclusion on the Chi-test: the difference in the number of petitioners who belong to different political parties is too big to say that the variables are independent of each other. Especially distinguished are the liberals, in whose ranks the number of signatories of the petition for indicator 10 is greater than expected, if these variables were independent; as well as the “right” ones, where the indicator 6 is less than the expected number of people who signed the petitions, if these variables were independent.
Thus, we were convinced that, apparently, since the Chi-square test and the difference in the residuals indicate a lack of evidence in favor of the independence of these data, we can assert that the political preferences of the respondents and their desire to sign or not to sign petitions of any kinds are related.
Here we start with filtering data to delete values useless for our test.
politics_ttest <- politics13 %>%
select(agea, vote) %>%
filter(vote != 3) %>%
filter(vote != 7) %>%
filter(vote != 8)Next, Let’s compare mean values with the help of boxplot.
politics_ttest$vote <- factor(politics_ttest$vote, labels = c("Yes", "No"), ordered= F,exclude = NA)
ggplot() +
geom_boxplot(data = politics_ttest, aes(x = vote, y = agea), fill="#A44200", col="#A44200", alpha = 0.5) +
scale_y_continuous(limits = c(0,100)) +
xlab("Voted last national election") +
ylab("Age") +
ggtitle("Participation in the election due to age")There is the first way to check normality presented.
describeBy(politics_ttest, politics_ttest$vote)##
## Descriptive statistics by group
## group: Yes
## vars n mean sd median trimmed mad min max range skew
## agea 1 1689 54.93 16.41 55 54.98 19.27 19 92 73 -0.03
## vote* 2 1689 1.00 0.00 1 1.00 0.00 1 1 0 NaN
## kurtosis se
## agea -0.77 0.4
## vote* NaN 0.0
## --------------------------------------------------------
## group: No
## vars n mean sd median trimmed mad min max range skew
## agea 1 408 40.69 15.87 38 39.19 14.83 16 96 80 0.87
## vote* 2 408 2.00 0.00 2 2.00 0.00 2 2 0 NaN
## kurtosis se
## agea 0.44 0.79
## vote* NaN 0.00
Skewness is a measure of the symmetry in a distribution. The normal distribution is symmetrical, so skew should be equal to 0 in normal distribution. In the group of voters skew equals to 7.49, and in the group of non-voters the skew is higher, 8.71. The distribution of age is more symmetrical in the group of voters, but still it is far away from normal. However, both of skews are greater than 1, so both of the groups have a high positive skewness (right).
Kurtosis tells us, whether the distribution is peaked or plain. The kurtosis of the age in voters group equals to 55.29, and in the non-voters group kurtosis equals to 75.76. That means that the distribution of the first group (voters) is less sharp than the distribution of the second group.
Next, we check normality with the help of histogram.
library(ggplot2)
ggplot(politics_ttest, aes(x = agea, fill = vote)) +
geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 3) +
geom_density(col = "yellow", fill = "white", alpha = 0.1) +
geom_vline(aes(xintercept = mean(politics_ttest$agea), color = 'mean'), linetype="dashed", size=1) +
geom_vline(aes(xintercept = median(politics_ttest$agea), color = 'median'), linetype="longdash", size=1) +
scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd")) +
xlab("Age") +
ylab("Density") +
ggtitle("Age distribution of voters and non-voters")Finally, we check normality with the help of Q-Q Plot.
#creating subgroups based on voting / non-voting
voteplus <- subset(politics_ttest[politics_ttest$vote == "Yes",])
voteminus <- subset(politics_ttest[politics_ttest$vote == "No",])
par(mfrow = c(1,2))
# y is limited from 18 because it is age at which the Irish are allowed to vote
qqnorm(voteplus$agea, ylim = c(18, 100), main = "Normal Q-Q Plot for vote+"); qqline(voteplus$agea,ylim = c(18 ,100), col= 2)
qqnorm(voteminus$agea, ylim = c(18 ,100), main = "Normal Q-Q Plot for vote-"); qqline(voteminus$agea, col= 2, ylim = c(18 ,100))Age is sometimes mentioned as one of the factor which can influence voting behaviour, but it seems that every country should be studied as a unique case.
As in the case with political preferences and signed petitions, we would like to find out whether voting behavior in Ireland is related to the respondent’s age, so, we should conduct a T-test. The following hypotheses were approved for this:
Now we are going to run T-test.
t.test(politics_ttest$agea ~ politics_ttest$vote)##
## Welch Two Sample t-test
##
## data: politics_ttest$agea by politics_ttest$vote
## t = 16.147, df = 634.29, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## 12.50096 15.96259
## sample estimates:
## mean in group Yes mean in group No
## 54.92540 40.69363
wilcox.test(agea ~ vote, data = politics_ttest)##
## Wilcoxon rank sum test with continuity correction
##
## data: agea by vote
## W = 509930, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0
Thus, by operating on the data and having conducted several statistical tests, we can confidently assert the following: