ANOVA

2BK team: Bakhareva, Borisenko, Kireeva, Kuzmicheva

15/03/2019

Identifying the topic and describing individual contributions

Hello. We are 2BK. Our topic is “Politics”. The country we have chosen to study is Ireland (round 8). The team members are Bakhareva Anastasia, Borisenko Iana, Kireeva Irina, Kuzmicheva Daria. We have focused on the survey results connected with politics in Ireland.

While discussing Ireland as the object of our research, we found that it is a country with a high level of welfare. It ranks sixth on the Human Development Index, which is impressive. However, this index, which is composed of life expectancy, education, and per capita income indicators, does not cover the political aspects of a country. Since our research topic is politics, we were concerned about this and decided to explore how involvement in politics relates to life satisfaction in Ireland. We expected that the people most interested in politics would have the highest level of life satisfaction compared with people who are less interested in political processes.

Individual contributions were distributed as follows:

Preparing data for analysis

Our research question is “Do Irish people who are interested in politics to different extents have the same level of life satisfaction?”

To explore this question, we took the following data from the all-countries file:

library(tidyverse)   # also attaches dplyr, ggplot2 and readr
library(psych)
library(magrittr)
library(knitr)
library(kableExtra)
library(foreign)
library(haven)

# Read the survey data and keep the two variables of interest
politics_media <- read_sav("ESS1-8e01.sav")
politics <- politics_media %>%
  select(stflife, polintr)

# Drop the missing-answer codes of stflife (77, 88, 99)
politics <- politics %>%
  filter(stflife != 77, stflife != 88, stflife != 99)

# Drop the missing-answer codes of polintr (7, 8, 9)
politics.1 <- politics %>%
  filter(polintr != 7, polintr != 8, polintr != 9)
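
The filters above only remove missing-answer codes, so they assume that the loaded file already contains only Irish respondents from round 8. If the file really holds all countries and rounds, a country and round filter would be needed first. Below is a minimal sketch on toy data, assuming the identifiers are called `cntry` and `essround`; the toy values are invented for illustration and are not taken from the survey.

```r
# Toy stand-in for the survey file; all values are made up for illustration.
toy <- data.frame(
  cntry    = c("IE", "IE", "IE", "GB", "IE", "IE"),
  essround = c(8, 8, 7, 8, 8, 8),
  stflife  = c(7, 88, 6, 9, 10, 5),
  polintr  = c(1, 2, 3, 4, 8, 4)
)

# Codes that stand for missing answers in the two variables.
stflife_missing <- c(77, 88, 99)
polintr_missing <- c(7, 8, 9)

# Keep Ireland, round 8, and only substantive answers.
ireland8 <- subset(
  toy,
  cntry == "IE" & essround == 8 &
    !stflife %in% stflife_missing &
    !polintr %in% polintr_missing,
  select = c(stflife, polintr)
)

print(ireland8)
```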

Manipulating & Describing variables

The chosen variables are described below.

Label <- c("`polintr`", "`stflife`") 
Meaning <- c("How interested in politics", "How satisfied with life as a whole")
Level_Of_Measurement <- c("Ordinal", "Interval")
Measurement <- c("Very - Quite - Hardly - Not at all", "0 - 10")
df <- data.frame(Label, Meaning, Level_Of_Measurement, Measurement, stringsAsFactors = FALSE)
kable(df) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

Label    Meaning                             Level_Of_Measurement  Measurement
polintr  How interested in politics          Ordinal               Very - Quite - Hardly - Not at all
stflife  How satisfied with life as a whole  Interval              0 - 10

politics.2 <- politics.1 %>%
  select(polintr, stflife)

# Replace the numeric polintr codes with text labels (code 4 becomes "Not interested")
politics.2$polintr <- ifelse(politics.2$polintr == 1, "Very interested",
                      ifelse(politics.2$polintr == 2, "Quite interested",
                      ifelse(politics.2$polintr == 3, "Hardly interested", "Not interested")))

# Store life satisfaction as a number and political interest as a factor
politics.2$stflife <- as.numeric(as.character(politics.2$stflife))
politics.2$polintr <- as.factor(politics.2$polintr)

# Check the structure of the prepared data
politics.3 <- data.frame(politics.2$polintr, politics.2$stflife)
str(politics.3)
## 'data.frame':    2749 obs. of  2 variables:
##  $ politics.2.polintr: Factor w/ 4 levels "Hardly interested",..: 1 1 3 1 1 1 3 2 1 3 ...
##  $ politics.2.stflife: num  4 6 6 4 6 5 7 4 5 7 ...
# Order the factor levels from least to most interested
politics.2$polintr <- factor(politics.2$polintr, levels = c("Not interested", "Hardly interested", "Quite interested", "Very interested"))
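
The recoding above is done in two passes: nested `ifelse()` calls first, then a reordering of the levels. As a possible alternative, a single `factor()` call can map codes to labels and fix the order at once. This is only a sketch, assuming the codes are plain numbers (a labelled SPSS column would first need converting with `as.numeric()`); the `codes` vector is invented for illustration.

```r
# Example numeric codes (1 = very interested ... 4 = not at all interested).
codes <- c(1, 3, 4, 2, 2, 1)

# Map the codes to labels and set the order from least to most interested.
interest <- factor(
  codes,
  levels = c(4, 3, 2, 1),
  labels = c("Not interested", "Hardly interested",
             "Quite interested", "Very interested")
)

print(levels(interest))
print(table(interest))
```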

Descriptive statistics across the groups

# The missing-answer codes (77, 88, 99) were already removed from stflife above,
# so the prepared data can be used as it is
politics.11 <- politics.2


describeBy(politics.11$stflife, politics.11$polintr, mat = TRUE) %>% # create a data frame
  select(polintr = group1, N = n, Mean = mean, SD = sd, Median = median, Min = min, Max = max,
         Skew = skew, Kurtosis = kurtosis, st.error = se) %>%
  kable(align = "lrrrrrrrrr", digits = 2, row.names = FALSE,
        caption = "Satisfaction with life by interest in politics") %>%
  kable_styling(bootstrap_options = c("bordered", "responsive", "striped"), full_width = FALSE)

Satisfaction with life by interest in politics

polintr            N    Mean  SD    Median  Min  Max  Skew   Kurtosis  st.error
Not interested     744  7.12  2.00  7       0    10   -0.87  1.05      0.07
Hardly interested  733  7.23  1.90  8       0    10   -0.97  1.42      0.07
Quite interested   964  7.29  1.80  7       0    10   -1.02  2.00      0.06
Very interested    308  7.54  1.87  8       0    10   -0.99  1.35      0.11

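From this table we can see that three of the groups are of comparable size (733 to 964 respondents), while the "Very interested" group is noticeably smaller (308 respondents). The group means range from 7.12 to 7.54.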

Looking at groups

Next, we look at the group sizes to make sure that no group is too small for the comparison.

par(mar = c(3,10,0,3))
barplot(table(politics.11$polintr)/nrow(politics.11)*100, horiz = T, xlim = c(0,60), las = 2)

The barplot, which shows the share of each group in percent, gives the same picture as the table: three groups are of comparable size, and the "Very interested" group is the smallest.
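
The shares shown in the barplot can also be computed directly from the group sizes. The sketch below uses the N column of the descriptive table; the object names `n` and `share` are chosen here only for illustration.

```r
# Group sizes copied from the descriptive table above.
n <- c("Not interested" = 744, "Hardly interested" = 733,
       "Quite interested" = 964, "Very interested" = 308)

# Share of each group in percent, rounded to one decimal.
share <- round(100 * prop.table(n), 1)
print(share)

# Ratio of the largest group to the smallest one.
print(max(n) / min(n))
```

The largest group is about three times the size of the smallest one, which is worth keeping in mind when reading the pairwise comparisons later on.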

Creating a boxplot

ggplot()+
  geom_boxplot(data = politics.2, aes(x = polintr, y = stflife), fill="pink", col="purple", alpha = 0.5) +
  ylim(c(0,10)) +
  xlab("How interested in politics") + 
  ylab("Level of Life satisfaction") +
  ggtitle("Life satisfaction by the level of interest in politics")

Conclusion: From the boxplot we can see that life satisfaction is distributed similarly across the groups, although there are several outliers. Note that a boxplot displays medians, not means: according to the descriptive table, the median is 8 for the hardly interested and very interested groups and 7 for the other two, while the mean is highest for the very interested group (7.54).

Homogeneity of variances

The next step is to check the assumptions of the ANOVA test. First, let's look at the homogeneity of variances with the help of Levene's test.

library(car)
leveneTest(politics.11$stflife ~ politics.11$polintr)
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    3  1.7685  0.151
##       2745

Conclusion: From the results of Levene's test it can be seen that the p-value (0.151) is higher than the significance level of 0.05. This means that there is no evidence that the variances of the groups differ significantly. Therefore, we can assume homogeneity of variances across the groups of political interest.
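
To make clearer what the test with `center = median` measures, here is a minimal sketch of the same idea done by hand: a one-way ANOVA on the absolute deviations of each observation from its group median. It uses the built-in `PlantGrowth` data rather than our survey data, so the numbers are not comparable with the output above.

```r
# Built-in example data: plant weight in three treatment groups.
data(PlantGrowth)

# Absolute deviation of each observation from its group median.
group_median <- ave(PlantGrowth$weight, PlantGrowth$group, FUN = median)
abs_dev <- abs(PlantGrowth$weight - group_median)

# A one-way ANOVA on these deviations gives the F statistic and p-value.
levene_by_hand <- anova(lm(abs_dev ~ PlantGrowth$group))
print(levene_by_hand)
```

A large p-value here means that the typical spread around the median is similar in all groups.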

ANOVA test

oneway.test(politics.11$stflife ~ politics.11$polintr, var.equal = T)
## 
##  One-way analysis of means
## 
## data:  politics.11$stflife and politics.11$polintr
## F = 3.8028, num df = 3, denom df = 2745, p-value = 0.009808
aov.out <- aov(politics.11$stflife ~ politics.11$polintr)
summary(aov.out)
##                       Df Sum Sq Mean Sq F value  Pr(>F)   
## politics.11$polintr    3     41  13.562   3.803 0.00981 **
## Residuals           2745   9790   3.566                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: As the p-value (0.0098) is less than the chosen significance level of 0.05, we reject the hypothesis of equal means and conclude that the mean level of life satisfaction differs between at least some of the groups of political interest.
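
The F value in the table is the ratio of the two mean squares: 13.562 / 3.566 ≈ 3.80. The sketch below shows how such a value arises from the between-group and within-group sums of squares. It uses the built-in `PlantGrowth` data as a stand-in, so only the steps, not the numbers, carry over to our analysis.

```r
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

k <- nlevels(g)   # number of groups
n <- length(y)    # total sample size

grand_mean  <- mean(y)
group_means <- tapply(y, g, mean)
group_sizes <- tapply(y, g, length)

# Between-group and within-group sums of squares.
ss_between <- sum(group_sizes * (group_means - grand_mean)^2)
ss_within  <- sum((y - ave(y, g))^2)

# F is the ratio of the two mean squares.
f_stat  <- (ss_between / (k - 1)) / (ss_within / (n - k))
p_value <- pf(f_stat, k - 1, n - k, lower.tail = FALSE)

print(c(F = f_stat, p = p_value))
```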

Normality of residuals

  1. By plots
layout(matrix(1:4, 2, 2))
plot(aov.out)

Conclusion: We can see that on the upper two graphs the red line is pretty straight. The line on the Q-Q plot is not as straight, which points to some departure from normality. Still, on the basis of these graphs, we can conclude that the distribution of residuals is approximately normal.

  2. By skew and kurtosis
anova.res <- residuals(object = aov.out) 
describe(anova.res) 
##    vars    n mean   sd median trimmed  mad   min  max range  skew kurtosis
## X1    1 2749    0 1.89  -0.12    0.14 1.48 -7.54 2.88 10.42 -0.96      1.5
##      se
## X1 0.04

Conclusion: The absolute values of skew (0.96) and kurtosis (1.5) are below 2, so by this rule of thumb the distribution of residuals is approximately normal.
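
For reference, skew and excess kurtosis can be computed from the central moments of the residuals. The sketch below is an assumption about which variant is reported: it rescales the moments with the sample standard deviation, which we believe corresponds to the default in `describe()`, but this was not checked against the package. It uses the built-in `PlantGrowth` data, not our residuals.

```r
# Example residuals from a one-way ANOVA on a built-in data set.
data(PlantGrowth)
res <- residuals(aov(weight ~ group, data = PlantGrowth))

n  <- length(res)
m2 <- mean((res - mean(res))^2)   # second central moment
m3 <- mean((res - mean(res))^3)   # third central moment
m4 <- mean((res - mean(res))^4)   # fourth central moment

# Moment-based skewness and excess kurtosis, rescaled so that the
# sample standard deviation (n - 1 in the denominator) is used.
skew     <- (m3 / m2^1.5) * ((n - 1) / n)^1.5
kurtosis <- (m4 / m2^2) * ((n - 1) / n)^2 - 3

print(c(skew = skew, kurtosis = kurtosis))
```

Both values are 0 for a perfectly normal distribution; a negative skew means a longer left tail.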

  3. By Shapiro test
shapiro.test(x = anova.res)
## 
##  Shapiro-Wilk normality test
## 
## data:  anova.res
## W = 0.93435, p-value < 2.2e-16

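Conclusion: The p-value is extremely small, which indicates a non-normal distribution of residuals (!). With a sample as large as ours (n = 2749), however, this test detects even small deviations from normality.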

  4. By histogram
hist(anova.res, main = "Distribution of residuals", xlab = "Residuals", col = "pink", border = "#BC6B97")

Conclusion: By looking at the histogram, we can conclude that the residuals are approximately normally distributed, with a somewhat longer left tail, as the negative skew above suggests.

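Overall conclusion: The plots, the histogram, and the skew and kurtosis values suggest that the residuals are approximately normal, while the Shapiro test rejects normality. Since the Shapiro test is very sensitive in large samples, we treat the normality assumption as approximately satisfied and additionally check the results with a non-parametric test below.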

Post-hoc Tukey test

In the ANOVA test, a significant p-value indicates that the means of some groups differ, but it does not show which pairs of groups these are. To find this out, a post-hoc test can be conducted to determine whether the mean differences between specific pairs of groups are statistically significant.

As the variances across the groups can be treated as equal, we chose the Tukey test for that.

par(mar = c(5, 15, 3, 1)) 
Tukey <- TukeyHSD(aov.out) 
plot(Tukey, las = 2, col = "red" )

Conclusion: The test results show that only the difference between the very interested and not interested groups is significant, since the confidence interval for the difference between the means of these two groups is the only one that does not cross the “0” line.
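
The rule for reading the Tukey plot can be checked in code: a pair differs significantly when its confidence interval lies entirely on one side of zero, which matches an adjusted p-value below 0.05 at the default 95% level. The sketch below illustrates this on the built-in `PlantGrowth` data; the object names are chosen for illustration only.

```r
data(PlantGrowth)
fit   <- aov(weight ~ group, data = PlantGrowth)
tukey <- TukeyHSD(fit)$group   # matrix with columns diff, lwr, upr, p adj

# A pair differs significantly when its interval lies entirely on one
# side of zero, i.e. both bounds have the same sign.
excludes_zero <- tukey[, "lwr"] > 0 | tukey[, "upr"] < 0

sig_by_interval <- rownames(tukey)[excludes_zero]
sig_by_p        <- rownames(tukey)[tukey[, "p adj"] < 0.05]

print(sig_by_interval)
print(sig_by_p)
```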

Non-parametric test (Kruskal-Wallis)

As can be seen from the boxplot, there are some outliers. Therefore, we want to double-check our results using a non-parametric test.

kruskal.test(politics.11$stflife ~ politics.11$polintr)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  politics.11$stflife by politics.11$polintr
## Kruskal-Wallis chi-squared = 12.764, df = 3, p-value = 0.005176

Conclusion: At the 5% significance level, the test confirms the results of the ANOVA test, since the p-value (0.0052) is less than 0.05.
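
The Kruskal-Wallis statistic works on ranks instead of raw values, which is why outliers affect it less. A minimal sketch of the calculation, including the correction for tied values, is given below. It uses the built-in `PlantGrowth` data as a stand-in for our survey data.

```r
data(PlantGrowth)
y <- PlantGrowth$weight
g <- PlantGrowth$group

n <- length(y)
r <- rank(y)   # tied values receive their average rank

rank_sums   <- tapply(r, g, sum)
group_sizes <- tapply(r, g, length)

# Uncorrected H statistic.
h <- 12 / (n * (n + 1)) * sum(rank_sums^2 / group_sizes) - 3 * (n + 1)

# Correction for tied values.
ties <- table(y)
h <- h / (1 - sum(ties^3 - ties) / (n^3 - n))

# H is compared with a chi-squared distribution with (groups - 1) df.
p_value <- pchisq(h, df = nlevels(g) - 1, lower.tail = FALSE)
print(c(H = h, p = p_value))
```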

Dunn’s test

library(DescTools) 
DunnTest(politics.11$stflife ~ politics.11$polintr)
## 
##  Dunn's test of multiple comparisons using rank sums : holm  
## 
##                                    mean.rank.diff   pval    
## Hardly interested-Not interested        51.233292 0.4132    
## Quite interested-Not interested         57.533139 0.3912    
## Very interested-Not interested         188.485145 0.0022 ** 
## Quite interested-Hardly interested       6.299848 0.8690    
## Very interested-Hardly interested      137.251854 0.0476 *  
## Very interested-Quite interested       130.952006 0.0476 *  
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Conclusion: The results of Dunn's test, which compares mean ranks, show that besides the pair of very interested and not interested people (p = 0.0022) there are two more pairs of groups whose differences are statistically significant. These are: very interested and hardly interested (p = 0.0476), and very interested and quite interested (p = 0.0476).

The remaining pairs of groups with different levels of political interest do not differ significantly.
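
The p-values in the Dunn output are adjusted with the Holm method, as the header of the output states. The sketch below shows the steps of this adjustment on hypothetical unadjusted p-values, which are invented for illustration and are not the raw values behind our output. Identical adjusted values, such as the two 0.0476 entries above, can arise from the monotonicity step.

```r
# Hypothetical unadjusted p-values for six pairwise comparisons.
p_raw <- c(0.0004, 0.012, 0.016, 0.14, 0.20, 0.87)

m   <- length(p_raw)
ord <- order(p_raw)

# Step 1: multiply the i-th smallest p-value by (m - i + 1).
scaled <- p_raw[ord] * (m - seq_len(m) + 1)

# Step 2: enforce monotonicity and cap the values at 1.
adjusted_sorted <- pmin(1, cummax(scaled))

# Step 3: put the values back into the original order.
p_holm <- numeric(m)
p_holm[ord] <- adjusted_sorted

print(p_holm)
```

The same result is returned by `p.adjust(p_raw, method = "holm")`.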

Total conclusion:

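So, answering our research question, we can argue that some groups of Irish people with different levels of interest in politics have different levels of life satisfaction. To be more precise, according to Dunn's test the following pairs of groups differ significantly: very interested and not interested (p = 0.0022), very interested and hardly interested (p = 0.0476), very interested and quite interested (p = 0.0476). The Tukey test confirms only the first of these pairs.

People who are quite interested in politics and people who are hardly interested in politics do not differ significantly in life satisfaction (p = 0.8690). The same goes for these pairs of groups: hardly interested and not interested (p = 0.4132), quite interested and not interested (p = 0.3912).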

After all these tests and analyses, we can conclude that Irish people who are interested in politics to different extents indeed do not have the same level of life satisfaction. Moreover, our expectations are met: the people with the highest political interest are the most satisfied with life (mean 7.54, compared with 7.12 to 7.29 in the other groups).

That’s all, thank you for your attention