Identifying topic and describing individual contribution

Hello. We are 2BK. Our topic is “Politics”, and the country we have chosen to study is Ireland (round 8). The team members are Anastasia Bakhareva, Iana Borisenko, Irina Kireeva, and Daria Kuzmicheva. We focus on survey results for Ireland that concern both politics and respondents' personal information.

Throughout the report, our hypotheses and conclusions are marked in this way.

Individual contributions are as follows:

Anastasia Bakhareva: smth

Iana Borisenko: smth

Irina Kireeva: smth

Daria Kuzmicheva: smth

Preparing data for analysis

library(dplyr)
library(ggplot2)
library(tidyverse)
library(psych)
library(magrittr)
library(knitr)
library(kableExtra)
library(readr)

politics_media <- read_csv("~/2nd round of hell/politics_media.csv")

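# Keep only the two variables used in the analysis
politics <- politics_media %>% 
  select(stflife, polintr)

# Drop the stflife codes 77, 88 and 99, which are not answers on the 0-10 scale
politics <- politics %>%
  filter(!is.na(stflife), !stflife %in% c(77, 88, 99))

# Drop the polintr codes 7, 8 and 9, which are not one of the four answer categories
politics.1 <- politics %>% 
  filter(!is.na(polintr), !polintr %in% c(7, 8, 9))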
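
To make the filtering step easier to check, here is a minimal base-R sketch on a toy data frame. The values in `toy` are hypothetical and are not taken from the survey file; only the excluded codes match the filters above.

```r
# Toy data (hypothetical values, not the survey data)
toy <- data.frame(
  stflife = c(7, 88, 5, 99, 10, 77, 8),
  polintr = c(1, 2, 8, 3, 4, 2, 9)
)

# Codes excluded by the filters above
stflife_excluded <- c(77, 88, 99)
polintr_excluded <- c(7, 8, 9)

# Keep rows where both variables hold a substantive answer
keep <- !(toy$stflife %in% stflife_excluded) &
  !(toy$polintr %in% polintr_excluded)
toy_clean <- toy[keep, ]

nrow(toy_clean)
```

Only the rows with a valid value in both columns remain, which is the same logic the `filter()` calls apply to the real data.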

Manipulating & Describing variables

The chosen variables are described below.

Label <- c("`polintr`", "`stflife`") 
Meaning <- c("How interested in politics", "How satisfied with life as a whole")
Level_Of_Measurement <- c("Ordinal", "Interval")
Measurement <- c("Very - Quite - Hardly - Not at all", "0 - 10")
df <- data.frame(Label, Meaning, Level_Of_Measurement, Measurement, stringsAsFactors = FALSE)
kable(df) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label	Meaning	Level_Of_Measurement	Measurement
`polintr`	How interested in politics	Ordinal	Very - Quite - Hardly - Not at all
`stflife`	How satisfied with life as a whole	Interval	0 - 10

politics.2 = politics.1 %>% 
  select(polintr, stflife)
politics.2$polintr <- ifelse (politics.2$polintr == 1, "Very interested",
                    ifelse(politics.2$polintr == 2, "Quite interested", 
                    ifelse(politics.2$polintr == 3, "Hardly interested", "Not interested")))
politics.2$stflife <-  as.numeric(as.character(politics.2$stflife))
politics.2$polintr <- as.factor(politics.2$polintr)
politics.3 <- data.frame(politics.2$polintr,politics.2$stflife)
str(politics.3)

## 'data.frame':    2749 obs. of  2 variables:
##  $ politics.2.polintr: Factor w/ 4 levels "Hardly interested",..: 1 1 3 1 1 1 3 2 1 3 ...
##  $ politics.2.stflife: num  4 6 6 4 6 5 7 4 5 7 ...
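
As the `str()` output shows, `as.factor()` sorts the levels alphabetically ("Hardly interested" comes first). If the substantive order from "Very" to "Not" is wanted in tables and plots, one possible alternative is `factor()` with explicit levels and labels. This is only a sketch on hypothetical codes; the names `codes`, `interest_labels` and `polintr_f` are ours and do not appear in the analysis.

```r
# Hypothetical numeric codes, coded like the raw polintr variable (1 to 4)
codes <- c(3, 1, 4, 2, 3)

interest_labels <- c("Very interested", "Quite interested",
                     "Hardly interested", "Not interested")

# factor() with explicit levels and labels keeps the substantive order
polintr_f <- factor(codes, levels = 1:4, labels = interest_labels)

levels(polintr_f)
table(polintr_f)
```

Using this in the report would change the order of the rows in the descriptive table and of the boxes in the plots, but not the test results.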

Descriptive statistics across the groups

# stflife was already cleaned above; the filter is repeated as a safeguard
politics.11 <- politics.2 %>% 
  filter(!is.na(stflife), !stflife %in% c(77, 88, 99))


describeBy(politics.11$stflife, politics.11$polintr, mat = TRUE) %>% # returns the group statistics as a data frame
  select(polintr = group1, N = n, Mean = mean, SD = sd, Median = median, Max = max, 
         Skew = skew, Kurtosis = kurtosis, st.error = se) %>% 
  kable(align = c("lrrrrrrrr"), digits = 2, row.names = FALSE,
        caption = "Satisfaction with life by interest in politics") %>% 
  kable_styling(bootstrap_options = c("bordered", "responsive", "striped"), full_width = FALSE)

Satisfaction with life by interest in politics
polintr	N	Mean	SD	Median	Max	Skew	Kurtosis	st.error
Hardly interested	733	7.23	1.90	8	10	-0.97	1.42	0.07
Not interested	744	7.12	2.00	7	10	-0.87	1.05	0.07
Quite interested	964	7.29	1.80	7	10	-1.02	2.00	0.06
Very interested	308	7.54	1.87	8	10	-0.99	1.35	0.11

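Mean life satisfaction is similar in all four groups, ranging from 7.12 (not interested) to 7.54 (very interested), with standard deviations between 1.80 and 2.00. Skew is negative in every group (from -0.87 to -1.02), so the answers cluster at the upper end of the scale.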

Looking at groups

par(mar = c(3,10,0,3))
barplot(table(politics.11$polintr)/nrow(politics.11)*100, horiz = T, xlim = c(0,60), las = 2)

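By Ann: Three of the groups are of comparable size (733 to 964 respondents), while the "Very interested" group is noticeably smaller (308). Within each group, the skew and kurtosis of life satisfaction do not exceed 2 in absolute value, which is acceptable for ANOVA.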

Creating boxplot

ggplot()+
  geom_boxplot(data = politics.11, aes(x = polintr, y = stflife), fill="pink", col="purple", alpha = 0.5) +
  ylim(c(0,10)) +
  xlab("How interested in politics") + 
  ylab("Level of Life satisfaction") +
  ggtitle("Life satisfaction by interest in politics")

By Ann: From this boxplot, we see that life satisfaction is distributed in much the same way across the political-interest groups, with a median of 7 or 8 in each, and that the level of life satisfaction is slightly higher among the very interested respondents.

Homogeneity of variances

library(car)
leveneTest(politics.11$stflife ~ politics.11$polintr)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    3  1.7685  0.151
##       2745

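By Ann: The p-value (0.151) is above .05, so we do not reject the null hypothesis that the variances are equal across the groups. The homogeneity assumption holds, and ANOVA with equal variances can be used.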
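
To show what the median-centred Levene test computes, here is a minimal base-R sketch: a one-way ANOVA on the absolute deviations of each observation from its group median. The data in `y` and `g` are hypothetical, and the variable names are ours; this illustrates the idea and is not the internal code of `leveneTest()`.

```r
# Toy data (hypothetical values): one numeric outcome, one grouping factor
y <- c(4, 6, 7, 5, 8, 9, 3, 7, 10, 6, 6, 8)
g <- factor(rep(c("A", "B", "C"), each = 4))

# Absolute deviation of each observation from its group median
abs_dev <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the deviations: its F test is the Levene statistic
levene_fit <- anova(lm(abs_dev ~ g))
levene_F <- levene_fit[["F value"]][1]
levene_p <- levene_fit[["Pr(>F)"]][1]

c(F = levene_F, p = levene_p)
```

A large p-value here means the spread around the group medians is similar in all groups, which is how the result above is read.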

ANOVA test

The hypothesis goes here!

oneway.test(politics.11$stflife ~ politics.11$polintr, var.equal = T)

## 
##  One-way analysis of means
## 
## data:  politics.11$stflife and politics.11$polintr
## F = 3.8028, num df = 3, denom df = 2745, p-value = 0.009808

aov.out <- aov(politics.11$stflife ~ politics.11$polintr) # aov() fits the same one-way ANOVA and returns a model object for the residual checks below
summary(aov.out)

##                       Df Sum Sq Mean Sq F value  Pr(>F)   
## politics.11$polintr    3     41  13.562   3.803 0.00981 **
## Residuals           2745   9790   3.566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By Ann: F(3, 2745) = 3.80, p = .0098 (p < .01) means that mean life satisfaction is not equal across the political-interest groups: at least one group mean differs from the others.
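
As a quick check of how the F value and p-value in the `aov()` summary relate to each other, the arithmetic can be reproduced from the printed mean squares. The numbers are copied from the output above, so the result matches only up to rounding; the variable names are ours.

```r
# Values read off the aov() summary above
ms_between <- 13.562   # Mean Sq for polintr
ms_within  <- 3.566    # Mean Sq for residuals
df_between <- 3
df_within  <- 2745

# F is the ratio of between-group to within-group mean squares
f_value <- ms_between / ms_within

# The upper-tail probability of the F distribution gives the p-value
p_value <- pf(f_value, df_between, df_within, lower.tail = FALSE)

round(c(F = f_value, p = p_value), 4)
```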

Normality of residuals

By plots

layout(matrix(1:4, 2, 2))
plot(aov.out)

Conclusion. By Ann: If the residuals are normally distributed, the red line in the two upper graphs is roughly straight, and the points in the Q-Q plot follow a straight line.

By skew and kurtosis

anova.res <- residuals(object = aov.out) 
describe(anova.res)

##    vars    n mean   sd median trimmed  mad   min  max range  skew kurtosis
## X1    1 2749    0 1.89  -0.12    0.14 1.48 -7.54 2.88 10.42 -0.96      1.5
##      se
## X1 0.04

Conclusion. By Ann: Skew (-0.96) and kurtosis (1.5) are both below 2 in absolute value, which is acceptable for a normal distribution.

By Shapiro test

shapiro.test(x = anova.res)

## 
##  Shapiro-Wilk normality test
## 
## data:  anova.res
## W = 0.93435, p-value < 2.2e-16

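Conclusion. By Ann: If the p-value is > .05, normality is not rejected. Here p < 2.2e-16, so the test rejects the normality of the residuals. With 2749 observations, however, the test flags even small deviations from normality.

By histogram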

hist(anova.res, main = "Distribution of residuals", xlab = "Residuals", col = "pink", border = "#BC6B97")

Conclusion
Overall conclusion. By Ann: Skew and kurtosis are within the acceptable range for a normal distribution, whereas the Shapiro-Wilk test rejects normality. Because the visual analysis agrees with skew and kurtosis, and the Shapiro-Wilk test is very sensitive in a sample of this size, we conclude that the normality assumption holds approximately.