11 March 2018

First work

Goal of our work

We took the 2012 dataset for Ukraine (UA) to study how people evaluate their government and the actions of politicians. Future work will compare these attitudes across different years to show how they changed. Later we may even be able to say how Maidan came about: whether it was driven by the people (if their attitudes got worse) or created artificially by someone (if people's attitudes did not change). For now we describe one particular dataset and show the variables that we need in our work:

Variable | Qualitative_or_Quantitative | Level_of_measurement | Continuous_or_Discrete
polintr: How interested in politics | Quantitative | Ordinal | Discrete
trstprl: Trust in country's parliament | Quantitative | Ordinal | Discrete
trstlgl: Trust in the legal system | Quantitative | Ordinal | Discrete
trstplc: Trust in the police | Quantitative | Ordinal | Discrete
trstplt: Trust in politicians | Quantitative | Ordinal | Discrete
trstprt: Trust in political parties | Quantitative | Ordinal | Discrete
vote: Voted last national election | Qualitative | Nominal | Discrete
contplt: Contacted politician or government official last 12 months | Qualitative | Nominal | Discrete
pbldmn: Taken part in lawful public demonstration last 12 months | Qualitative | Nominal | Discrete
implvdm: How important for you to live in democratically governed country | Quantitative | Ordinal | Discrete
dmcntov: How democratic [country] is overall | Quantitative | Ordinal | Discrete
stflife: How satisfied with life as a whole | Quantitative | Ordinal | Discrete
stfgov: How satisfied with the national government | Quantitative | Ordinal | Discrete

Working with the data

First step: load the ESS6UA dataset into the environment and create a smaller data set with only the variables that we need, which is more comfortable to work with:

getwd()
## [1] "C:/Users/Meatya Glotov/Documents/Arrr/ADishe/ADPortfolio"
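# Read the ESS6UA file (SPSS format)
politics <- haven::read_sav("ESS6UA.sav")
# Keep only the variables used in this work
politics <- dplyr::select(politics, polintr, trstprl, trstlgl, trstplc, trstplt, trstprt, vote, contplt, pbldmn, implvdm, dmcntov, stflife, stfgov)

# Drop respondents with a missing answer on any of the selected variables
politics <- na.omit(politics)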

Second step: load the packages that we need in our work.

library(ggplot2)
library(dplyr)
library(knitr)
library(rmarkdown)

We also define a "Mode" function, which returns the most frequent value of a variable, so that we can mark it on graphs.

Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}
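A small usage sketch of the function may help. The answer vector below is hypothetical and is not taken from ESS6UA; it only shows what "Mode" returns, including the case of a tie.

```r
# Mode as defined in this document: the most frequent value of a vector.
Mode <- function(x) {
  ux <- unique(x)
  ux[which.max(tabulate(match(x, ux)))]
}

# Hypothetical answers on a 0-10 trust scale (illustration only).
trust <- c(0, 0, 1, 3, 0, 5, 1, 0, 2)

mode_trust <- Mode(trust)                    # most frequent answer
mode_text  <- Mode(c("yes", "no", "yes"))    # also works for character vectors
mode_tie   <- Mode(c(2, 1, 1, 2))            # on a tie, the value seen first wins

print(list(mode_trust, mode_text, mode_tie))
```

Because of the tie rule, the result for a variable with two equally frequent answers depends on the order of the rows.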

Summary statistics for single variables

##     polintr         trstprl          trstlgl          trstplc      
##  Min.   :1.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:2.000   1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.: 0.000  
##  Median :3.000   Median : 1.000   Median : 1.000   Median : 1.000  
##  Mean   :2.787   Mean   : 1.776   Mean   : 1.786   Mean   : 1.922  
##  3rd Qu.:3.000   3rd Qu.: 3.000   3rd Qu.: 3.000   3rd Qu.: 3.000  
##  Max.   :4.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##     trstplt          trstprt            vote          contplt     
##  Min.   : 0.000   Min.   : 0.000   Min.   :1.000   Min.   :1.000  
##  1st Qu.: 0.000   1st Qu.: 0.000   1st Qu.:1.000   1st Qu.:2.000  
##  Median : 1.000   Median : 1.000   Median :1.000   Median :2.000  
##  Mean   : 1.719   Mean   : 1.908   Mean   :1.245   Mean   :1.921  
##  3rd Qu.: 3.000   3rd Qu.: 3.000   3rd Qu.:1.000   3rd Qu.:2.000  
##  Max.   :10.000   Max.   :10.000   Max.   :3.000   Max.   :2.000  
##      pbldmn         implvdm          dmcntov          stflife      
##  Min.   :1.000   Min.   : 0.000   Min.   : 0.000   Min.   : 0.000  
##  1st Qu.:2.000   1st Qu.: 6.000   1st Qu.: 2.000   1st Qu.: 3.000  
##  Median :2.000   Median : 8.000   Median : 4.000   Median : 5.000  
##  Mean   :1.975   Mean   : 7.373   Mean   : 4.002   Mean   : 4.996  
##  3rd Qu.:2.000   3rd Qu.:10.000   3rd Qu.: 6.000   3rd Qu.: 7.000  
##  Max.   :2.000   Max.   :10.000   Max.   :10.000   Max.   :10.000  
##      stfgov     
##  Min.   : 0.00  
##  1st Qu.: 1.00  
##  Median : 2.00  
##  Mean   : 2.45  
##  3rd Qu.: 4.00  
##  Max.   :10.00

Single variables with central tendency measures

This chart illustrates how much people in Ukraine trust the parliament, where trust varies from 0 (no trust at all) to 10 (complete trust). According to this graph, we can conclude that on average (the mean is red) Ukrainians do not trust the parliament. The median (blue) lies even further to the left, so at least half of the respondents chose the lowest levels of trust or do not trust the parliament at all.

This graph shows how important it is for Ukrainians to live in a democratically governed country, where importance varies from 0 (not at all important) to 10 (extremely important). On average democracy is important for people (mean 7.4, median 8), and many respondents consider it extremely important.
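Charts of this kind can be sketched as follows. This is only an illustration in base R graphics with simulated answers, not the original plotting code (the document loads ggplot2, and the real variables come from ESS6UA); the vector name and the probabilities are assumptions.

```r
set.seed(1)
# Simulated 0-10 answers, skewed towards low trust (hypothetical data).
trust <- sample(0:10, 500, replace = TRUE,
                prob = c(30, 20, 12, 10, 8, 7, 5, 3, 2, 2, 1))

centre <- c(mean = mean(trust), median = median(trust))

# One bar per answer category, then vertical lines for the two measures.
hist(trust, breaks = seq(-0.5, 10.5, by = 1), col = "grey80",
     main = "Trust in parliament (simulated)",
     xlab = "0 = no trust at all, 10 = complete trust")
abline(v = centre["mean"], col = "red", lwd = 2)     # mean in red
abline(v = centre["median"], col = "blue", lwd = 2)  # median in blue
legend("topright", legend = names(centre), col = c("red", "blue"), lwd = 2)

print(centre)
```

For a right-skewed variable like this one the median sits to the left of the mean, which is the pattern described for trust in parliament.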

Meaningful binary combinations of variables - categorical by categorical

These boxplots suggest a correlation between life satisfaction and the evaluation of democracy in the country. We need to learn more about it and confirm this correlation with statistical tests.

This graph is rather complicated. It suggests that people who voted in the last election and are less satisfied with the national government attend public demonstrations more often.

In conclusion, we want to note that this work can be really important for research on how revolutions evolve and whether they are created by the people's will or artificially by some small group of people.

Second work

t-test

Variables:

dmcntov: How democratic [country] is overall - "0" means not democratic at all, "10" means fully democratic

gndr: Gender

Hypothesis

We assume that there is a significant difference between how males and females evaluate how democratic Ukraine is:

H0 - there is no significant difference between genders.

H1 - there is a significant difference between genders.

First, we test our variable for normality:

## 
##  Shapiro-Wilk normality test
## 
## data:  as.numeric(DA2$dmcntov)
## W = 0.96228, p-value < 0.00000000000000022

The QQ-plot and the Shapiro-Wilk normality test show that our variable is not normally distributed.
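The normality check can be sketched like this. The data are simulated and only stand in for DA2$dmcntov; the probabilities are an assumption chosen to give a skewed 0-10 variable.

```r
set.seed(2)
# Simulated 0-10 ratings (hypothetical stand-in for the survey variable).
rating <- sample(0:10, 1000, replace = TRUE,
                 prob = c(30, 20, 12, 10, 8, 7, 5, 3, 2, 2, 1))

# Shapiro-Wilk test; shapiro.test() accepts between 3 and 5000 observations.
sw <- shapiro.test(rating)
print(sw)

# QQ-plot: integer answers produce a staircase instead of a straight line.
qqnorm(rating)
qqline(rating, col = "red")
```

With only eleven possible answers and many ties, a rejection of normality is expected for almost any survey scale of this type.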

Second, we test the homogeneity of variances between the groups:

## 
##  Bartlett test of homogeneity of variances
## 
## data:  DA2$dmcntov by DA2$gndr
## Bartlett's K-squared = 0.063856, df = 1, p-value = 0.8005
## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    1  0.7668 0.3813
##       1958

The p-values of Levene's test (0.3813) and Bartlett's test (0.8005) are both above the significance level of 0.05, so we can assume that the variances are equal.
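Both checks can be sketched in base R. Bartlett's test is in the stats package; Levene's test with center = median is written out here by hand as a one-way ANOVA on absolute deviations from the group medians, which is what that variant does. The data frame and its column names are hypothetical.

```r
set.seed(3)
# Hypothetical data: two groups with the same spread.
d <- data.frame(
  score = c(rnorm(200, mean = 3.7, sd = 2.4), rnorm(300, mean = 4.2, sd = 2.4)),
  group = factor(rep(c("Male", "Female"), times = c(200, 300)))
)

# Bartlett's test of homogeneity of variances.
bt <- bartlett.test(score ~ group, data = d)

# Levene's test (center = median): ANOVA on |score - group median|.
d$abs_dev <- abs(d$score - ave(d$score, d$group, FUN = median))
lev <- anova(lm(abs_dev ~ group, data = d))

p_values <- c(bartlett = bt$p.value, levene = lev[["Pr(>F)"]][1])
print(p_values)
```

If both p-values are above 0.05, the equal-variance version of the t-test is a reasonable choice.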

## 
##  Two Sample t-test
## 
## data:  DA2$dmcntov by DA2$gndr
## t = -4.2878, df = 1958, p-value = 0.00001893
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -0.7281001 -0.2710835
## sample estimates:
##   mean in group Male mean in group Female 
##             3.722449             4.222041

With p-value = 0.00001893 we reject H0: males and females differ in how democratic they consider Ukraine overall. On average, females rate the country as more democratic (4.22) than males (3.72).
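A call that gives output of this form can be sketched as follows. The data are simulated: the group sizes (735 males, 1225 females) are assumed from the contingency table in the chi-square part, and the means and standard deviation are illustrative values, so the numbers will not match the output above.

```r
set.seed(4)
# Simulated ratings with assumed group sizes; not the real ESS6UA data.
d <- data.frame(
  dmcntov = c(rnorm(735, mean = 3.7, sd = 2.5), rnorm(1225, mean = 4.2, sd = 2.5)),
  gndr = factor(rep(c("Male", "Female"), times = c(735, 1225)),
                levels = c("Male", "Female"))
)

# Two-sample t-test with pooled variance, as justified by the variance tests.
tt <- t.test(dmcntov ~ gndr, data = d, var.equal = TRUE)
print(tt)
```

With var.equal = TRUE the degrees of freedom equal n1 + n2 - 2, which is where df = 1958 in the output comes from.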

Distribution in boxplots

chi-square

Variables:

vote: Voted last national election: "1" means "Yes", "2" means "No".

gndr: Gender: "1" means "Male", "2" means "Female".

Hypothesis:

We assume that females and males voted differently, that is, gender had an influence on voting behaviour in the last national election.

H0 - there is no significant association between gender and voting behaviour.

H1 - there is a significant association between gender and voting behaviour.

Chi-square test:

##      Gender
## Voted Male Female
##   Yes  543    976
##   No   192    249
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  DA2$vote and DA2$gndr
## X-squared = 8.5204, df = 1, p-value = 0.003512

With p-value = 0.003512 we can reject H0 and say that there is a significant association between gender and voting. From that we can assume that gender may have had an influence on voting behaviour in the last national election.
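The test can be reproduced from the counts in the table alone. The sketch below types the four counts in by hand (the object name votes is an assumption) instead of using DA2.

```r
# Counts from the contingency table above: rows = voted, columns = gender.
votes <- matrix(c(543, 192, 976, 249), nrow = 2,
                dimnames = list(Voted = c("Yes", "No"),
                                Gender = c("Male", "Female")))

# For 2x2 tables chisq.test() applies Yates' continuity correction by default.
res <- chisq.test(votes)
print(res)

# Share of voters and non-voters within each gender.
shares <- round(prop.table(votes, margin = 2), 3)
print(shares)
```

The column shares show the direction of the association, which the test statistic itself does not.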

Pearson residuals:
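The residuals can be recomputed from the same four counts. This is a sketch with the counts typed in by hand; each residual is (observed - expected) / sqrt(expected).

```r
# Same contingency table as in the chi-square test (rows = voted, columns = gender).
votes <- matrix(c(543, 192, 976, 249), nrow = 2,
                dimnames = list(Voted = c("Yes", "No"),
                                Gender = c("Male", "Female")))

res <- chisq.test(votes)

expected  <- res$expected    # counts expected under independence
residuals <- res$residuals   # Pearson residuals, one per cell

print(round(expected, 1))
print(round(residuals, 2))
```

A positive residual means more respondents in the cell than independence would predict. Here the cell with the largest absolute residual is males who did not vote.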

Third work

Variables:

stfgov: How satisfied with the national government: from 0 (extremely dissatisfied) to 10 (extremely satisfied).

prtcldua: Which party feel closer to, Ukraine: the party that the respondent feels closest to.

Information about variables

Summary:

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00    1.00    2.00    2.88    5.00   10.00

Size of groups:

## Fatherland    Freedom Communists       UDAR    Regions      Other 
##        245        119         94        140        274         21

Means of each group:

## Fatherland    Freedom Communists       UDAR    Regions      Other 
##   2.048980   1.478992   2.563830   2.314286   4.726277   1.619048

Variances of each group:

## Fatherland    Freedom Communists       UDAR    Regions      Other 
##   3.841853   2.505911   4.291581   4.634327   5.532860   3.247619
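Group sizes, means and variances like those above can be obtained with table() and tapply(). The sketch uses simulated data with three hypothetical groups, so its numbers are unrelated to the real ones.

```r
set.seed(5)
# Simulated party preference and satisfaction scores (illustration only).
party  <- factor(sample(c("Fatherland", "Freedom", "Regions"), 300, replace = TRUE))
stfgov <- sample(0:10, 300, replace = TRUE)

group_n    <- table(party)                # size of each group
group_mean <- tapply(stfgov, party, mean) # mean of each group
group_var  <- tapply(stfgov, party, var)  # variance of each group

print(group_n)
print(group_mean)
print(group_var)
```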

Distribution in boxplots:

From these boxplots we see that supporters of the Party of Regions are far more satisfied with the national government than the other groups, but still not very much (around 5). It is interesting that supporters of the governing party are not really satisfied with their own government.

Hypothesis

We assume that there is a significant difference between adherents of different political parties in their satisfaction with the national government:

H0 - there is no significant difference between adherents of different political parties.

H1 - there is a significant difference between adherents of different political parties.

Check normality:

## 
##  Shapiro-Wilk normality test
## 
## data:  as.numeric(politics$stfgov)
## W = 0.91427, p-value < 0.00000000000000022

The QQ-plot and the Shapiro-Wilk normality test show that our variable is not normally distributed.

Check equality of variances:

## Levene's Test for Homogeneity of Variance (center = median)
##        Df F value     Pr(>F)    
## group   5  5.7439 0.00003163 ***
##       887                       
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
##  Bartlett test of homogeneity of variances
## 
## data:  politics$stfgov by politics$prtcldua
## Bartlett's K-squared = 26.356, df = 5, p-value = 0.00007611

The p-values of Levene's test (0.00003163) and Bartlett's test (0.00007611) are both below the significance level of 0.05, so we conclude that the variances are unequal. Therefore we set "var.equal = FALSE".

OneWay-ANOVA:

oneway.test(politics$stfgov~politics$prtcldua, var.equal = FALSE)
## 
##  One-way analysis of means (not assuming equal variances)
## 
## data:  politics$stfgov and politics$prtcldua
## F = 61.338, num df = 5.00, denom df = 165.21, p-value <
## 0.00000000000000022

The p-value of the one-way ANOVA (< 0.00000000000000022) is below the significance level of 0.05, so we reject H0 in favour of H1 and conclude that adherents of different political parties differ in their satisfaction with the national government. But because the distribution is not normal, it is better to also use the non-parametric Kruskal-Wallis test.

Non-parametric test:

kruskal.test(politics$stfgov~politics$prtcldua)
## 
##  Kruskal-Wallis rank sum test
## 
## data:  politics$stfgov by politics$prtcldua
## Kruskal-Wallis chi-squared = 240.63, df = 5, p-value <
## 0.00000000000000022

The p-value of the Kruskal-Wallis test is also < 0.00000000000000022, so it leads to the same conclusion as the ANOVA.

Tukey Test

##   Tukey multiple comparisons of means
##     95% family-wise confidence level
## 
## Fit: aov(formula = stfgov ~ prtcldua, data = politics)
## 
## $prtcldua
##               diff         lwr         upr     p adj
## FD-FL   -0.5699880 -1.23498096  0.09500497 0.1411635
## CM-FL    0.5148502 -0.20721392  1.23691431 0.3224786
## U-FL     0.2653061 -0.36522450  0.89583675 0.8361185
## PR-FL    2.6772978  2.15400079  3.20059477 0.0000000
## Othr-FL -0.4299320 -1.78316008  0.92329613 0.9446633
## CM-FD    1.0848382  0.26358839  1.90608799 0.0023718
## U-FD     0.8352941  0.09324054  1.57734770 0.0169702
## PR-FD    3.2472858  2.59389892  3.90067263 0.0000000
## Othr-FD  0.1400560 -1.26859615  1.54870820 0.9997518
## U-CM    -0.2495441 -1.04314628  0.54405813 0.9469801
## PR-CM    2.1624476  1.45105787  2.87383730 0.0000000
## Othr-CM -0.9447822 -2.38125768  0.49169334 0.4161937
## PR-U     2.4119917  1.79371372  3.03026960 0.0000000
## Othr-U  -0.6952381 -2.08795278  0.69747658 0.7113682
## Othr-PR -3.1072298 -4.45479239 -1.75966712 0.0000000

According to this output, the differences Freedom-Fatherland, Communists-Fatherland, Udar-Fatherland, Other-Fatherland, Other-Freedom, Udar-Communists, Other-Communists and Other-Udar are not significant, while Regions-Fatherland, Communists-Freedom, Udar-Freedom, Regions-Freedom, Regions-Communists, Regions-Udar and Other-Regions are significant at the 0.05 level.

Significant difference    Not significant difference
Regions-Fatherland        Freedom-Fatherland
Communists-Freedom        Communists-Fatherland
Udar-Freedom              Udar-Fatherland
Regions-Freedom           Other-Fatherland
Regions-Communists        Other-Freedom
Regions-Udar              Udar-Communists
Other-Regions             Other-Communists
                          Other-Udar
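Sorting the pairs by hand is error-prone, so the split can also be taken from the TukeyHSD result directly. A minimal sketch with simulated data and three hypothetical groups (A, B, C), where only C is given a clearly higher mean:

```r
set.seed(6)
# Simulated satisfaction scores for three hypothetical parties.
d <- data.frame(
  party  = factor(rep(c("A", "B", "C"), each = 60)),
  stfgov = c(rnorm(60, mean = 2.0, sd = 2),
             rnorm(60, mean = 2.2, sd = 2),
             rnorm(60, mean = 4.7, sd = 2))
)

fit <- aov(stfgov ~ party, data = d)
tk  <- TukeyHSD(fit, conf.level = 0.95)$party   # matrix: diff, lwr, upr, p adj

# Split the pairs by adjusted p-value.
significant     <- rownames(tk)[tk[, "p adj"] <  0.05]
not_significant <- rownames(tk)[tk[, "p adj"] >= 0.05]

print(significant)
print(not_significant)
```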

Our result was somewhat obvious, because the Party of Regions was the governing party in 2012, so its supporters differ significantly from supporters of the other parties in satisfaction with the national government. But now we have shown it statistically.

Omega squared:

## [1] 0.2656263

And omega-squared = 0.2656263 shows that the chosen political party explains about 27% of the variance in satisfaction with the national government. By the conventional benchmarks (0.01 small, 0.06 medium, 0.14 large) this is a large effect.
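This value can be checked from the group sizes, means and variances printed earlier, without the raw data. The sketch assumes the usual one-way ANOVA definition, omega-squared = (SS_between - df_between * MS_within) / (SS_total + MS_within); the variable names are our own.

```r
# Group sizes, means and variances copied from the output in this section.
n <- c(Fatherland = 245, Freedom = 119, Communists = 94,
       UDAR = 140, Regions = 274, Other = 21)
m <- c(2.048980, 1.478992, 2.563830, 2.314286, 4.726277, 1.619048)
v <- c(3.841853, 2.505911, 4.291581, 4.634327, 5.532860, 3.247619)

grand_mean <- sum(n * m) / sum(n)

# Sums of squares of the one-way ANOVA, rebuilt from the summaries.
ss_between <- sum(n * (m - grand_mean)^2)
ss_within  <- sum((n - 1) * v)
df_between <- length(n) - 1
df_within  <- sum(n) - length(n)
ms_within  <- ss_within / df_within

omega_sq <- (ss_between - df_between * ms_within) /
            (ss_between + ss_within + ms_within)
eta_sq   <- ss_between / (ss_between + ss_within)

print(round(c(omega_sq = omega_sq, eta_sq = eta_sq), 4))
```

Omega-squared is always a little smaller than eta-squared, because it corrects for the number of groups and the within-group noise.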