2BK team: Bakhareva, Borisenko, Kireeva, Kuzmicheva
15/03/2019
Hello. We are 2BK. Our topic is “Politics”. The country we have chosen to study is Ireland (round 8). The team members are Anastasia Bakhareva, Iana Borisenko, Irina Kireeva and Daria Kuzmicheva. We focus on the survey questions about politics and about respondents' personal characteristics in Ireland.
Individual contributions are as follows:
Anastasia Bakhareva: smth
Iana Borisenko: smth
Irina Kireeva: smth
Daria Kuzmicheva: smth
library(dplyr)
library(ggplot2)
library(tidyverse)
library(psych)
library(magrittr)
library(knitr)
library(kableExtra)
library(readr)
politics_media <- read_csv("~/2nd round of hell/politics_media.csv")
# Keep only the two variables used in the analysis
politics = politics_media %>%
  select(stflife, polintr)
# Drop the missing-value codes of stflife (77, 88, 99)
politics = politics %>%
  filter(stflife != 77) %>%
  filter(stflife != 88) %>%
  filter(stflife != 99)
# Drop the missing-value codes of polintr (7, 8, 9)
politics.1 = politics %>%
  select(stflife, polintr) %>%
  filter(polintr != 7) %>%
  filter(polintr != 8) %>%
  filter(polintr != 9)
The chosen variables are described below.
Label <- c("`polintr`", "`stflife`")
Meaning <- c("How interested in politics", "How satisfied with life as a whole")
Level_Of_Measurement <- c("Ordinal", "Interval")
Measurement <- c("Very - Quite - Hardly - Not at all", "0 - 10")
df <- data.frame(Label, Meaning, Level_Of_Measurement, Measurement, stringsAsFactors = FALSE)
kable(df) %>%
kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| Label | Meaning | Level_Of_Measurement | Measurement |
|---|---|---|---|
| polintr | How interested in politics | Ordinal | Very - Quite - Hardly - Not at all |
| stflife | How satisfied with life as a whole | Interval | 0 - 10 |
politics.2 = politics.1 %>%
select(polintr, stflife)
politics.2$polintr <- ifelse(politics.2$polintr == 1, "Very interested",
                      ifelse(politics.2$polintr == 2, "Quite interested",
                      ifelse(politics.2$polintr == 3, "Hardly interested", "Not interested")))
politics.2$stflife <- as.numeric(as.character(politics.2$stflife))
politics.2$polintr <- as.factor(politics.2$polintr)
politics.3 <- data.frame(politics.2$polintr,politics.2$stflife)
str(politics.3)
## 'data.frame': 2749 obs. of 2 variables:
## $ politics.2.polintr: Factor w/ 4 levels "Hardly interested",..: 1 1 3 1 1 1 3 2 1 3 ...
## $ politics.2.stflife: num 4 6 6 4 6 5 7 4 5 7 ...
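As the `str()` output shows, `as.factor()` orders the levels alphabetically ("Hardly interested" comes first), not from most to least interested. A minimal sketch of an alternative, assuming a small made-up data frame `toy` in place of the survey extract: the missing-value codes are dropped with `%in%`, and `factor()` is given explicit `levels` and `labels` so that the categories keep their substantive order.

```r
# Toy data standing in for the survey extract (values are made up)
toy <- data.frame(
  polintr = c(1, 2, 3, 4, 8, 2, 7),
  stflife = c(8, 7, 88, 6, 5, 99, 9)
)

# Drop the missing-value codes of both variables in one step
keep <- !(toy$stflife %in% c(77, 88, 99)) & !(toy$polintr %in% c(7, 8, 9))
toy <- toy[keep, ]

# Recode polintr with the levels in substantive order, not alphabetical
toy$polintr <- factor(
  toy$polintr,
  levels = 1:4,
  labels = c("Very interested", "Quite interested",
             "Hardly interested", "Not interested")
)

levels(toy$polintr)
```

With this ordering, tables and plots list the groups from "Very interested" to "Not interested"; the ANOVA F test itself does not depend on the order of the levels.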
# The missing-value codes of stflife were already removed above;
# the filters are repeated here as a safeguard and now refer to the
# columns of the data frame being filtered.
politics.11 = politics.2 %>%
  filter(stflife != 88) %>%
  filter(stflife != 77) %>%
  filter(stflife != 99)
describeBy(politics.11$stflife, politics.11$polintr, mat = TRUE) %>% # return the statistics as a data frame
  select(polintr = group1, N=n, Mean=mean, SD=sd, Median=median, Min=min, Max=max,
         Skew=skew, Kurtosis=kurtosis, st.error = se) %>%
  kable(align=c("lrrrrrrrr"), digits=2, row.names = FALSE,
        caption="Satisfaction with life by interest in politics") %>%
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
| polintr | N | Mean | SD | Median | Min | Max | Skew | Kurtosis | st.error |
|---|---|---|---|---|---|---|---|---|---|
| Hardly interested | 733 | 7.23 | 1.90 | 8 | 0 | 10 | -0.97 | 1.42 | 0.07 |
| Not interested | 744 | 7.12 | 2.00 | 7 | 0 | 10 | -0.87 | 1.05 | 0.07 |
| Quite interested | 964 | 7.29 | 1.80 | 7 | 0 | 10 | -1.02 | 2.00 | 0.06 |
| Very interested | 308 | 7.54 | 1.87 | 8 | 0 | 10 | -0.99 | 1.35 | 0.11 |
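The `st.error` column is the standard deviation divided by the square root of the group size. A minimal check, using the rounded `N` and `SD` values copied from the table above (the vector names `n` and `sds` are illustrative; because the inputs are rounded, a computation on the raw data may differ in the last digit):

```r
# N and SD copied from the table above (rounded values)
n   <- c(Hardly = 733, Not = 744, Quite = 964, Very = 308)
sds <- c(Hardly = 1.90, Not = 2.00, Quite = 1.80, Very = 1.87)

# Standard error of the mean for each group: SD / sqrt(N)
se <- sds / sqrt(n)
round(se, 2)
```

The smallest group ("Very interested") has the largest standard error, so its mean is estimated least precisely.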
By Ann: nothing, but there should be smth
par(mar = c(3,10,0,3))
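barplot(table(politics.11$polintr)/nrow(politics.11)*100, horiz = T, xlim = c(0,60), las = 2)
By Ann: Three of the groups are of comparable size (733 to 964 respondents), while "Very interested" is smaller (308). Within each group the continuous variable is close to normally distributed (skew around -1, kurtosis at most 2), which is acceptable for ANOVA.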
ggplot()+
geom_boxplot(data = politics.11, aes(x = polintr, y = stflife), fill="pink", col="purple", alpha = 0.5) +
ylim(c(0,10)) +
xlab("How interested in politics") +
ylab("Level of Life satisfaction") +
ggtitle("Life satisfaction by interest in politics")
By Ann: From this boxplot, we see that life satisfaction is distributed fairly normally across the political interest groups and that it is slightly higher among respondents who are very interested in politics (mean 7.54, against 7.12 to 7.29 in the other groups).
library(car)
leveneTest(politics.11$stflife ~ politics.11$polintr)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    3  1.7685  0.151
##       2745
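By Ann: The p-value (0.151) is above .05, so the hypothesis of equal variances across the groups is not rejected, and the homogeneity-of-variance assumption of ANOVA can be kept.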
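To make clearer what the test measures: Levene's test with `center = median` is a one-way ANOVA on the absolute deviations of each observation from its group median. A minimal base-R sketch on made-up data (the vectors `y` and `g` are illustrative, not the survey data):

```r
# Toy data: three groups of five made-up scores
y <- c(4, 6, 5, 7, 8,  3, 9, 5, 10, 2,  6, 6, 7, 5, 6)
g <- factor(rep(c("a", "b", "c"), each = 5))

# Absolute deviation of each score from its own group median
dev <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the deviations: its F test is the
# median-centred Levene test of equal variances
levene <- anova(lm(dev ~ g))
levene
```

A large p-value in this table means that the typical spread around the median is similar in all groups, which is how the result above is read.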
oneway.test(politics.11$stflife ~ politics.11$polintr, var.equal = T)
##
## One-way analysis of means
##
## data: politics.11$stflife and politics.11$polintr
## F = 3.8028, num df = 3, denom df = 2745, p-value = 0.009808
aov.out <- aov(politics.11$stflife ~ politics.11$polintr) # another function of ANOVA which should be used here
summary(aov.out)
##                       Df Sum Sq Mean Sq F value  Pr(>F)
## politics.11$polintr    3     41  13.562   3.803 0.00981 **
## Residuals           2745   9790   3.566
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
By Ann: F(3, 2745) = 3.80, p = .0098 (p < .01) means that mean life satisfaction is not equal across the political interest groups: at least one group mean differs from the others.
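The F value is the ratio of the between-group mean square to the residual mean square, and the p-value is the upper tail of the F distribution with the two degrees of freedom. A minimal sketch that reproduces both from the rounded numbers in the ANOVA table above (variable names are illustrative; because the inputs are rounded, the result matches the table only to about three digits):

```r
# Values copied from the summary(aov.out) table above
ms_between <- 13.562   # Mean Sq for polintr
ms_within  <- 3.566    # Mean Sq for Residuals
df_between <- 3
df_within  <- 2745

# F statistic: between-group variance relative to within-group variance
f_value <- ms_between / ms_within

# p-value: probability of an F at least this large if all group means are equal
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(f_value, 2)
signif(p_value, 2)
```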
layout(matrix(1:4, 2, 2))
plot(aov.out)
Conclusion. By Ann: If the model assumptions hold, the red line in the two upper graphs (Residuals vs Fitted and Scale-Location) is roughly straight and horizontal, and the points in the Q-Q plot lie along a straight line.
By skew and kurtosis
anova.res <- residuals(object = aov.out)
describe(anova.res)
##    vars    n mean   sd median trimmed  mad   min  max range  skew kurtosis
## X1    1 2749    0 1.89  -0.12    0.14 1.48 -7.54 2.88 10.42 -0.96      1.5
##      se
## X1 0.04
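Conclusion. By Ann: Skew and kurtosis should both be below 2 in absolute value. Here the skew is -0.96 and the kurtosis is 1.5, so by this criterion the residuals are close enough to normal.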
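For reference, skew and kurtosis can be computed from the central moments of the data. The sketch below uses the simple textbook estimators in a hypothetical helper `skew_kurt()`; `describe()` from `psych` uses a closely related estimator, so its values can differ slightly in small samples.

```r
# Moment-based skew and excess kurtosis (simple textbook estimators)
skew_kurt <- function(x) {
  d  <- x - mean(x)   # deviations from the mean
  m2 <- mean(d^2)     # second central moment (variance with divisor n)
  c(skew     = mean(d^3) / m2^1.5,
    kurtosis = mean(d^4) / m2^2 - 3)  # 0 for a normal distribution
}

# Symmetric toy data: skew is zero, tails are lighter than normal
x <- c(-2, -1, 0, 1, 2)
skew_kurt(x)
```

A negative skew, as in the residuals above, means a longer tail on the low side: a few respondents report much lower life satisfaction than their group mean.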
By Shapiro test
shapiro.test(x = anova.res)
##
## Shapiro-Wilk normality test
##
## data: anova.res
## W = 0.93435, p-value < 2.2e-16
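Conclusion. By Ann: If the p-value is > .05, normality of the residuals is not rejected. Here the p-value is < 2.2e-16, so the Shapiro-Wilk test rejects normality. With 2749 observations the test detects even small departures from normality, so this result should be read together with the skew, kurtosis and plots.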
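The decision rule can be tried on two small illustrative vectors (made up for this sketch, not the survey residuals): evenly spaced normal quantiles, which look almost perfectly normal, and cubed integers, which are strongly right-skewed.

```r
# Illustrative data, not the ANOVA residuals
near_normal <- qnorm(ppoints(50))  # evenly spaced normal quantiles
skewed      <- (1:50)^3            # strongly right-skewed values

# shapiro.test() returns a list; the p-value is stored in $p.value
p_normal <- shapiro.test(near_normal)$p.value
p_skewed <- shapiro.test(skewed)$p.value

# Decision rule at the .05 level: TRUE means normality is not rejected
c(near_normal = p_normal > 0.05, skewed = p_skewed > 0.05)
```

The same comparison, `shapiro.test(anova.res)$p.value > 0.05`, applies to the residuals of the model.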
By histogram
hist(anova.res, main = "Distribution of residuals", xlab = "Residuals", col = "pink", border = "#BC6B97")