ANOVA

2BK team: Bakhareva, Borisenko, Kireeva, Kuzmicheva

15/03/2019

Identifying the topic and describing individual contributions

Hello. We are team 2BK, and our topic is “Politics”. The country we chose to study is Ireland (round 8). The team members are Anastasia Bakhareva, Iana Borisenko, Irina Kireeva, and Daria Kuzmicheva. We focus on survey questions about politics and about respondents' personal information in Ireland.

Individual contributions are as follows:

Anastasia Bakhareva: smth

Iana Borisenko: smth

Irina Kireeva: smth

Daria Kuzmicheva: smth

Preparing data for analysis

library(dplyr)
library(ggplot2)
library(tidyverse)
library(psych)
library(magrittr)
library(knitr)
library(kableExtra)
library(readr)
politics_media <- read_csv("~/2nd round of hell/politics_media.csv")
# Keep the two variables of interest
politics <- politics_media %>% 
  select(stflife, polintr)

# Drop the missing-value codes of stflife (77, 88, 99)
politics <- politics %>%
  filter(stflife != 77, stflife != 88, stflife != 99)

# Drop the missing-value codes of polintr (7, 8, 9)
politics.1 <- politics %>% 
  filter(polintr != 7, polintr != 8, polintr != 9)

Manipulating & Describing variables

The table below describes the chosen variables.

Label <- c("`polintr`", "`stflife`") 
Meaning <- c("How interested in politics", "How satisfied with life as a whole")
Level_Of_Measurement <- c("Ordinal", "Interval")
Measurement <- c("Very - Quite - Hardly - Not at all", "0 - 10")
df <- data.frame(Label, Meaning, Level_Of_Measurement, Measurement, stringsAsFactors = FALSE)
kable(df) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
Label     Meaning                              Level_Of_Measurement   Measurement
polintr   How interested in politics           Ordinal                Very - Quite - Hardly - Not at all
stflife   How satisfied with life as a whole   Interval               0 - 10
politics.2 <- politics.1 %>% 
  select(polintr, stflife)
# Recode the numeric polintr codes (1-4) into text labels
politics.2$polintr <- ifelse (politics.2$polintr == 1, "Very interested",
                    ifelse(politics.2$polintr == 2, "Quite interested", 
                    ifelse(politics.2$polintr == 3, "Hardly interested", "Not interested")))
# Make sure stflife is numeric and polintr is a factor (levels are sorted alphabetically)
politics.2$stflife <-  as.numeric(as.character(politics.2$stflife))
politics.2$polintr <- as.factor(politics.2$polintr)
# Collect both variables in a data frame and check its structure
politics.3 <- data.frame(politics.2$polintr,politics.2$stflife)
str(politics.3)
## 'data.frame':    2749 obs. of  2 variables:
##  $ politics.2.polintr: Factor w/ 4 levels "Hardly interested",..: 1 1 3 1 1 1 3 2 1 3 ...
##  $ politics.2.stflife: num  4 6 6 4 6 5 7 4 5 7 ...
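
The `str()` output lists "Hardly interested" first because `as.factor()` sorts the levels alphabetically. A minimal sketch, assuming one wants the levels in questionnaire order instead: `factor()` with explicit `levels` and `labels` does the recoding and the ordering in one step. The vector `codes` is a made-up example, not the survey data.

```r
# Hypothetical polintr codes (1 = very interested ... 4 = not interested)
codes <- c(1, 3, 4, 2, 3)

# Labels listed in questionnaire order
labels <- c("Very interested", "Quite interested",
            "Hardly interested", "Not interested")

# levels = 1:4 fixes the order, labels attaches the text to each code
polintr_f <- factor(codes, levels = 1:4, labels = labels)

levels(polintr_f)   # levels follow the order of `labels`, not the alphabet
table(polintr_f)    # counts are reported in the same order
```

With levels ordered this way, tables and plots would run from the most to the least interested group rather than alphabetically.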

Values descriptives across the groups

# The missing-value codes 77, 88 and 99 were already removed during data preparation;
# this filter is kept as a safeguard and should leave the data unchanged
politics.11 <- politics.2 %>% 
  filter(stflife != 77, stflife != 88, stflife != 99)


describeBy(politics.11$stflife, politics.11$polintr, mat = TRUE) %>% # return the result as a data frame
  select(polintr = group1, N=n, Mean=mean, SD=sd, Median=median, Min=min, Max=max, 
                Skew=skew, Kurtosis=kurtosis, st.error = se) %>% 
  kable(align=c("lrrrrrrrrr"), digits=2, row.names = FALSE,
        caption="Satisfaction with life by interest in politics") %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)
Satisfaction with life by interest in politics
polintr             N     Mean   SD     Median   Min   Max   Skew    Kurtosis   st.error
Hardly interested   733   7.23   1.90   8        0     10    -0.97   1.42       0.07
Not interested      744   7.12   2.00   7        0     10    -0.87   1.05       0.07
Quite interested    964   7.29   1.80   7        0     10    -1.02   2.00       0.06
Very interested     308   7.54   1.87   8        0     10    -0.99   1.35       0.11

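By Ann: Mean life satisfaction is similar in all four groups, ranging from 7.12 (not interested) to 7.54 (very interested). The standard deviations are close to each other (1.80 to 2.00), and the distributions are moderately left-skewed (skew between -0.87 and -1.02).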

Looking at groups

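# Widen the left margin so the group labels fit
par(mar = c(3,10,0,3))
# Share of each group in percent, drawn as horizontal bars
barplot(table(politics.11$polintr)/nrow(politics.11)*100, horiz = TRUE, xlim = c(0,60), las = 2)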

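By Ann: The groups are not equal in size: "Quite interested" is the largest (964 respondents, about 35%) and "Very interested" the smallest (308, about 11%). Even the smallest group has several hundred observations, which is enough for comparing group means.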
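
The shares shown in the bar plot can be checked by hand from the group sizes in the descriptive table. A short sketch; the vector `counts` is typed in from that table rather than computed from the data.

```r
# Group sizes copied from the descriptive table (column N)
counts <- c("Hardly interested" = 733, "Not interested" = 744,
            "Quite interested" = 964, "Very interested" = 308)

# Percentage share of each group, rounded to one decimal place
shares <- round(counts / sum(counts) * 100, 1)
shares
```

This is the same calculation as `table(politics.11$polintr)/nrow(politics.11)*100` in the plotting code, only starting from the counts instead of the raw data.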

Creating boxplot

ggplot()+
  geom_boxplot(data = politics.11, aes(x = polintr, y = stflife), fill="pink", col="purple", alpha = 0.5) +
  ylim(c(0,10)) +
  xlab("How interested in politics") + 
  ylab("Level of Life satisfaction") +
  ggtitle("Life satisfaction by interest in politics")

By Ann: From this boxplot, we see that life satisfaction is distributed similarly across the political-interest groups, with a longer tail toward low values in each group, and that it is slightly higher among the very interested respondents.

Homogeneity of variances

library(car)
leveneTest(politics.11$stflife ~ politics.11$polintr)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    3  1.7685  0.151
##       2745

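By Ann: Levene's test gives F(3, 2745) = 1.77, p = 0.151. Since p > .05, we do not reject the null hypothesis of equal variances across the groups, so the homogeneity assumption holds and an ANOVA with equal variances is appropriate.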
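
For intuition about what `leveneTest()` computes: with `center = median`, the test amounts to a one-way ANOVA on the absolute deviations of each observation from its group median. A minimal base-R sketch on simulated data (the vectors `y` and `g` are hypothetical, not the survey data):

```r
set.seed(1)

# Hypothetical data: three groups of 30 observations with equal spread
g <- factor(rep(c("a", "b", "c"), each = 30))
y <- rnorm(90, mean = 7, sd = 2)

# Absolute deviation of each observation from the median of its own group
abs_dev <- abs(y - ave(y, g, FUN = median))

# One-way ANOVA on the deviations: a large F would signal unequal variances
levene_by_hand <- anova(lm(abs_dev ~ g))
levene_by_hand
```

The degrees of freedom follow the same pattern as in the output above: number of groups minus one, and number of observations minus number of groups.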

ANOVA test

oneway.test(politics.11$stflife ~ politics.11$polintr, var.equal = TRUE)
## 
##  One-way analysis of means
## 
## data:  politics.11$stflife and politics.11$polintr
## F = 3.8028, num df = 3, denom df = 2745, p-value = 0.009808
aov.out <- aov(politics.11$stflife ~ politics.11$polintr) # aov() fits the same model and stores the residuals needed for the diagnostics
summary(aov.out)
##                       Df Sum Sq Mean Sq F value  Pr(>F)   
## politics.11$polintr    3     41  13.562   3.803 0.00981 **
## Residuals           2745   9790   3.566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

By Ann: F(3, 2745) = 3.80, p = .0098. Since p < .01, mean life satisfaction differs significantly across the political-interest groups, that is, at least one group mean differs from the others.
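
The F value and p-value in the `aov()` table can be reproduced by hand: F is the ratio of the between-group mean square to the residual mean square, and the p-value is the upper tail of the F distribution with the two degrees of freedom. A short sketch using the rounded numbers from the printed table, so the result matches it only up to rounding:

```r
# Mean squares copied from the summary(aov.out) table
ms_between <- 13.562   # Mean Sq for politics.11$polintr
ms_within  <- 3.566    # Mean Sq for Residuals

# F statistic: between-group variance relative to within-group variance
f_value <- ms_between / ms_within

# p-value: probability of an F at least this large under the null hypothesis
p_value <- pf(f_value, df1 = 3, df2 = 2745, lower.tail = FALSE)

c(F = f_value, p = p_value)
```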

Normality of residuals

  1. By plots
layout(matrix(1:4, 2, 2))
plot(aov.out)

     Conclusion. By Ann: If the assumptions hold, the red line is roughly flat in the two upper plots (Residuals vs Fitted and Scale-Location), and the points follow the straight diagonal line in the Q-Q plot.

  2. By skew and kurtosis

anova.res <- residuals(object = aov.out) 
describe(anova.res) 
##    vars    n mean   sd median trimmed  mad   min  max range  skew kurtosis
## X1    1 2749    0 1.89  -0.12    0.14 1.48 -7.54 2.88 10.42 -0.96      1.5
##      se
## X1 0.04
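     Conclusion. By Ann: Skew (-0.96) and kurtosis (1.5) are both below 2 in absolute value, which is within the usual rule of thumb for an approximately normal distribution.

  3. By Shapiro test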

shapiro.test(x = anova.res)
## 
##  Shapiro-Wilk normality test
## 
## data:  anova.res
## W = 0.93435, p-value < 2.2e-16
     Conclusion. By Ann: A p-value > .05 would be consistent with normally distributed residuals. Here W = 0.934 and p < 2.2e-16, so the test rejects normality.

  4. By histogram

hist(anova.res, main = "Distribution of residuals", xlab = "Residuals", col = "pink", border = "#BC6B97")

     Conclusion
  Overall conclusion. By Ann: Skew and kurtosis are within the acceptable range for a normal distribution, while the Shapiro-Wilk test rejects normality. With 2749 observations, the Shapiro-Wilk test flags even small deviations, and the visual analysis agrees with the skew and kurtosis values, so we conclude that the normality assumption holds approximately.