Chi-squared and t-test

2BK team: Bakhareva, Borisenko, Kireeva, Kuzmicheva

24/02/2019

Indentifying topic and describing individual contribution

Hello. We are 2BK. Our topic is “Politics”. The country we have chosen for studying is Ireland (round 8). Team members are Bakhareva Anastasia, Borisenko Iana, Kireeva Irina, Kuzmicheva Daria. We have focused on the results of the surveys connected both with politics and personal information on Ireland.

As for individual contribution, there it is done as follows:

Anastasia Bakhareva: Constucting stacked barplot в„–2, The conclusion about the groups representativeness, construction of histogram and conclusion about distribution by histogram , creating report

Iana Borisenko: Constructing contigency table, constucting stacked barplot в„–1, assocplot building and conclusion about residuals, overall conclusion, creating report

Irina Kireeva: Manipulations with data set, creating new variable “politics13$lr”, boxplot conduction and conclusion, descriptive statistics for T-test, Wilcoxon test conduction and conclusions

Daria Kuzmicheva: Describing variables with different measurement scales, Chi-test conduction and conclusion, conclusion about distribution by skew Рё kurtosis, Q-Q plot construction and conclusions

Preparing data for analysis.

First of all, we run all the libraries necessary for the analysis.

library(readr)
library(ggplot2)
library(dplyr)
library(sjlabelled)
library(sjmisc) 
library(sjstats) 
library(ggeffects)  
library(sjPlot)
library(knitr)
library(psych)
library(foreign)

Next, we load out dataset. It is a combination of data connected with both politics and personal information.

politics_gender <- read_spss("politics_gender.sav")
## 
  |                                                                       
  |                                                                 |   0%
  |                                                                       
  |                                                                 |   1%
  |                                                                       
  |=                                                                |   1%
  |                                                                       
  |=                                                                |   2%
  |                                                                       
  |==                                                               |   2%
  |                                                                       
  |==                                                               |   3%
  |                                                                       
  |==                                                               |   4%
  |                                                                       
  |===                                                              |   4%
  |                                                                       
  |===                                                              |   5%
  |                                                                       
  |====                                                             |   6%
  |                                                                       
  |====                                                             |   7%
  |                                                                       
  |=====                                                            |   7%
  |                                                                       
  |=====                                                            |   8%
  |                                                                       
  |======                                                           |   8%
  |                                                                       
  |======                                                           |   9%
  |                                                                       
  |======                                                           |  10%
  |                                                                       
  |=======                                                          |  10%
  |                                                                       
  |=======                                                          |  11%
  |                                                                       
  |========                                                         |  12%
  |                                                                       
  |========                                                         |  13%
  |                                                                       
  |=========                                                        |  13%
  |                                                                       
  |=========                                                        |  14%
  |                                                                       
  |==========                                                       |  15%
  |                                                                       
  |==========                                                       |  16%
  |                                                                       
  |===========                                                      |  16%
  |                                                                       
  |===========                                                      |  17%
  |                                                                       
  |===========                                                      |  18%
  |                                                                       
  |============                                                     |  18%
  |                                                                       
  |============                                                     |  19%
  |                                                                       
  |=============                                                    |  19%
  |                                                                       
  |=============                                                    |  20%
  |                                                                       
  |=============                                                    |  21%
  |                                                                       
  |==============                                                   |  21%
  |                                                                       
  |==============                                                   |  22%
  |                                                                       
  |===============                                                  |  22%
  |                                                                       
  |===============                                                  |  23%
  |                                                                       
  |===============                                                  |  24%
  |                                                                       
  |================                                                 |  24%
  |                                                                       
  |================                                                 |  25%
  |                                                                       
  |=================                                                |  25%
  |                                                                       
  |=================                                                |  26%
  |                                                                       
  |=================                                                |  27%
  |                                                                       
  |==================                                               |  27%
  |                                                                       
  |==================                                               |  28%
  |                                                                       
  |===================                                              |  29%
  |                                                                       
  |===================                                              |  30%
  |                                                                       
  |====================                                             |  30%
  |                                                                       
  |====================                                             |  31%
  |                                                                       
  |=====================                                            |  32%
  |                                                                       
  |=====================                                            |  33%
  |                                                                       
  |======================                                           |  33%
  |                                                                       
  |======================                                           |  34%
  |                                                                       
  |======================                                           |  35%
  |                                                                       
  |=======================                                          |  35%
  |                                                                       
  |=======================                                          |  36%
  |                                                                       
  |========================                                         |  36%
  |                                                                       
  |========================                                         |  37%
  |                                                                       
  |========================                                         |  38%
  |                                                                       
  |=========================                                        |  38%
  |                                                                       
  |=========================                                        |  39%
  |                                                                       
  |==========================                                       |  39%
  |                                                                       
  |==========================                                       |  40%
  |                                                                       
  |==========================                                       |  41%
  |                                                                       
  |===========================                                      |  41%
  |                                                                       
  |===========================                                      |  42%
  |                                                                       
  |============================                                     |  42%
  |                                                                       
  |============================                                     |  43%
  |                                                                       
  |============================                                     |  44%
  |                                                                       
  |=============================                                    |  44%
  |                                                                       
  |=============================                                    |  45%
  |                                                                       
  |==============================                                   |  45%
  |                                                                       
  |==============================                                   |  46%
  |                                                                       
  |==============================                                   |  47%
  |                                                                       
  |===============================                                  |  47%
  |                                                                       
  |===============================                                  |  48%
  |                                                                       
  |================================                                 |  49%
  |                                                                       
  |================================                                 |  50%
  |                                                                       
  |=================================                                |  50%
  |                                                                       
  |=================================                                |  51%
  |                                                                       
  |==================================                               |  52%
  |                                                                       
  |==================================                               |  53%
  |                                                                       
  |===================================                              |  53%
  |                                                                       
  |===================================                              |  54%
  |                                                                       
  |===================================                              |  55%
  |                                                                       
  |====================================                             |  55%
  |                                                                       
  |====================================                             |  56%
  |                                                                       
  |=====================================                            |  56%
  |                                                                       
  |=====================================                            |  57%
  |                                                                       
  |=====================================                            |  58%
  |                                                                       
  |======================================                           |  58%
  |                                                                       
  |======================================                           |  59%
  |                                                                       
  |=======================================                          |  59%
  |                                                                       
  |=======================================                          |  60%
  |                                                                       
  |=======================================                          |  61%
  |                                                                       
  |========================================                         |  61%
  |                                                                       
  |========================================                         |  62%
  |                                                                       
  |=========================================                        |  62%
  |                                                                       
  |=========================================                        |  63%
  |                                                                       
  |=========================================                        |  64%
  |                                                                       
  |==========================================                       |  64%
  |                                                                       
  |==========================================                       |  65%
  |                                                                       
  |===========================================                      |  65%
  |                                                                       
  |===========================================                      |  66%
  |                                                                       
  |===========================================                      |  67%
  |                                                                       
  |============================================                     |  67%
  |                                                                       
  |============================================                     |  68%
  |                                                                       
  |=============================================                    |  69%
  |                                                                       
  |=============================================                    |  70%
  |                                                                       
  |==============================================                   |  70%
  |                                                                       
  |==============================================                   |  71%
  |                                                                       
  |===============================================                  |  72%
  |                                                                       
  |===============================================                  |  73%
  |                                                                       
  |================================================                 |  73%
  |                                                                       
  |================================================                 |  74%
  |                                                                       
  |================================================                 |  75%
  |                                                                       
  |=================================================                |  75%
  |                                                                       
  |=================================================                |  76%
  |                                                                       
  |==================================================               |  76%
  |                                                                       
  |==================================================               |  77%
  |                                                                       
  |==================================================               |  78%
  |                                                                       
  |===================================================              |  78%
  |                                                                       
  |===================================================              |  79%
  |                                                                       
  |====================================================             |  79%
  |                                                                       
  |====================================================             |  80%
  |                                                                       
  |====================================================             |  81%
  |                                                                       
  |=====================================================            |  81%
  |                                                                       
  |=====================================================            |  82%
  |                                                                       
  |======================================================           |  82%
  |                                                                       
  |======================================================           |  83%
  |                                                                       
  |======================================================           |  84%
  |                                                                       
  |=======================================================          |  84%
  |                                                                       
  |=======================================================          |  85%
  |                                                                       
  |========================================================         |  86%
  |                                                                       
  |========================================================         |  87%
  |                                                                       
  |=========================================================        |  87%
  |                                                                       
  |=========================================================        |  88%
  |                                                                       
  |==========================================================       |  89%
  |                                                                       
  |==========================================================       |  90%
  |                                                                       
  |===========================================================      |  90%
  |                                                                       
  |===========================================================      |  91%
  |                                                                       
  |===========================================================      |  92%
  |                                                                       
  |============================================================     |  92%
  |                                                                       
  |============================================================     |  93%
  |                                                                       
  |=============================================================    |  93%
  |                                                                       
  |=============================================================    |  94%
  |                                                                       
  |==============================================================   |  95%
  |                                                                       
  |==============================================================   |  96%
  |                                                                       
  |===============================================================  |  96%
  |                                                                       
  |===============================================================  |  97%
  |                                                                       
  |===============================================================  |  98%
  |                                                                       
  |================================================================ |  98%
  |                                                                       
  |================================================================ |  99%
  |                                                                       
  |=================================================================|  99%
  |                                                                       
  |=================================================================| 100%

Here we filter our data in order to delete all the observation useless for the analysis.

politics <- politics_gender %>% 
  select(agea, lrscale, sgnptit, vote)

politics <- politics[!is.na(politics$agea),]
politics <- politics[!is.na(politics$lrscale),]
politics <- politics[!is.na(politics$sgnptit),]
politics <- politics[!is.na(politics$vote),]
politics13 <- politics %>% 
  filter(lrscale != 77) %>% 
  filter(lrscale != 88) %>% 
  filter(lrscale != 99 ) 

politics13 <- politics13 %>%
  filter(sgnptit != 7) %>%
  filter(sgnptit != 8) %>%
  filter(sgnptit != 9) 

politics13 <- politics13 %>% 
  filter(agea != 999)

Describing variables

First, we modify one of the variables to make it comfortable for manipulations. Then, we update our dataset.

politics13$lr <- ifelse(politics13$lrscale <= 3, "Left",
                    ifelse(politics13$lrscale >= 7, "Right", "Middle"))
politics13 <- politics13 %>% 
  select(- lrscale)

Now, let`s look at the number of observations and the number of variables.

dim(politics13)
## [1] 2231    4

Then, there is a description of chosen variables presented.

Label <- c("`sgnptit`", "`lr`", "`agea`", "`vote`" ) 
Meaning <- c("Signed petition last 12 months", "Placement on left right scale", "Age", "Voted last national election")
Level_Of_Measurement <- c("Nominal", "Nominal", "Ratio", "Nominal")
Test <- c("Chi-squared test", "Chi-squared test", "T-test for independet variables", "T-test for independet variables")
df <- data.frame(Label, Meaning, Level_Of_Measurement,Test, stringsAsFactors = FALSE)
kable(df)
Label Meaning Level_Of_Measurement Test
sgnptit Signed petition last 12 months Nominal Chi-squared test
lr Placement on left right scale Nominal Chi-squared test
agea Age Ratio T-test for independet variables
vote Voted last national election Nominal T-test for independet variables

Exploratory analysis of categories

Firstly, we select variables necessary for chi-square test. Next, there is a contigency table presented.

politics_chi <- politics13 %>% 
  select(lr, sgnptit)
politics_chi$sgnptit <- factor(politics_chi$sgnptit, labels = c("Yes", "No"), ordered= F,exclude = NA)
ContigencyTable <- table(politics_chi$lr, politics_chi$sgnptit)
kable(ContigencyTable)
Yes No
Left 141 188
Middle 281 1036
Right 95 490

In order to check whether our categories are successful to run chi-square test, we are going to create stacked barplots and analyze them.

Stacked barplot №1

ggplot() +
  geom_bar(data = politics_chi, aes(x = lr, fill = sgnptit), position = "fill")+
  coord_flip()+
  xlab("Party affiliation") + 
  ylab("Percentage of people") +
  ggtitle("Participation in signing petitions due to party affiliation")

Stacked barplot №2

sjp.xtab(politics_chi$lr, politics_chi$sgnptit, type = "bar", margin ="row",
  bar.pos = "stack", title = "Participation in signing petitions due to party affiliation", title.wtd.suffix = NULL,
  axis.titles = NULL, axis.labels = NULL, legend.title = NULL,
  legend.labels = NULL, weight.by = NULL, rev.order = FALSE,
  show.values = TRUE, show.n = TRUE, show.prc = TRUE, show.total = TRUE,
  show.legend = TRUE, show.summary = TRUE, summary.pos = "r",
  string.total = "Total", wrap.title = 50, wrap.labels = 15,
  wrap.legend.title = 20, wrap.legend.labels = 20, geom.size = 0.7,
  geom.spacing = 0.1, geom.colors = "Paired", dot.size = 3,
  smooth.lines = FALSE, grid.breaks = 0.2, expand.grid = FALSE,
  ylim = NULL, vjust = "bottom", hjust = "left", y.offset = NULL,
  coord.flip = TRUE)

Chi-square test

With the introduction of the Internet, signing petitions has become available online. That is, the study of the largerst online petitions` platform (Change.org) has shown that this source is strongly biased toward liberal causes. Graph

In Ireland, the half of the political parties presented are right(conservatism) or centre(social democracy, liberal conservatism, populism) and another half is left(socialism, respublicanism), so citizens have a wide spectrum of different views to share. Accordingly, in order to find out whether this distribution of the petition signatories due to their political preferences is random, we decided to build a Chi- square test.

The following hypotheses were approved for this:

So, let`s run chi-square test.

colnames(ContigencyTable) <- c("Petition +", "Petition -")
rownames(ContigencyTable) <- c("L", "R", "C")
chi.test <- chisq.test(ContigencyTable)
chi.test
## 
##  Pearson's Chi-squared test
## 
## data:  ContigencyTable
## X-squared = 89.895, df = 2, p-value < 2.2e-16

Next, we have to look at residuals.

kable(chi.test$stdres) #for residuals
Petition + Petition -
L 9.164265 -9.164265
R -2.468595 2.468595
C -4.627590 4.627590
assocplot(t(ContigencyTable), main="Residuals and number of observations" )

Exploring data for T-test

Here we start with filtering data to delete values useless for our test.

politics_ttest <- politics13 %>% 
  select(agea, vote) %>% 
  filter(vote != 3) %>% 
  filter(vote != 7) %>% 
  filter(vote != 8)

Next, Let’s compare mean values with the help of boxplot.

politics_ttest$vote <- factor(politics_ttest$vote, labels = c("Yes", "No"), ordered= F,exclude = NA)
ggplot() +
  geom_boxplot(data = politics_ttest, aes(x = vote, y = agea), fill="#A44200", col="#A44200", alpha = 0.5) +
  scale_y_continuous(limits = c(0,100)) +
  xlab("Voted last national election") + 
  ylab("Age") +
  ggtitle("Participation in the election due to age")

Cheking normality of distribution

There is the first way to check normality presented.

describeBy(politics_ttest, politics_ttest$vote)
## 
##  Descriptive statistics by group 
## group: Yes
##       vars    n  mean    sd median trimmed   mad min max range  skew
## agea     1 1689 54.93 16.41     55   54.98 19.27  19  92    73 -0.03
## vote*    2 1689  1.00  0.00      1    1.00  0.00   1   1     0   NaN
##       kurtosis  se
## agea     -0.77 0.4
## vote*      NaN 0.0
## -------------------------------------------------------- 
## group: No
##       vars   n  mean    sd median trimmed   mad min max range skew
## agea     1 408 40.69 15.87     38   39.19 14.83  16  96    80 0.87
## vote*    2 408  2.00  0.00      2    2.00  0.00   2   2     0  NaN
##       kurtosis   se
## agea      0.44 0.79
## vote*      NaN 0.00

Next, we check normality with the help of histogram.

library(ggplot2)
ggplot(politics_ttest, aes(x = agea, fill = vote)) +
      geom_histogram(aes(y=..density..), position = "identity", alpha = 0.7, binwidth = 3) +
  geom_density(col = "yellow", fill = "white", alpha = 0.1) +
  geom_vline(aes(xintercept = mean(politics_ttest$agea), color = 'mean'), linetype="dashed", size=1) +
  geom_vline(aes(xintercept = median(politics_ttest$agea), color = 'median'), linetype="longdash", size=1) +
  scale_color_manual(name = "Measurement", values = c(median = "#cb3f68", mean = "#824acd")) +
  xlab("Age") + 
  ylab("Density") +
  ggtitle("Age distribution of voters and non-voters")

Finally, we check normality with the help of Q-Q Plot.

#creating subgroups based on voting / non-voting
voteplus <- subset(politics_ttest[politics_ttest$vote == "Yes",]) 
voteminus <- subset(politics_ttest[politics_ttest$vote == "No",])
par(mfrow = c(1,2))
# y is limited from 18 because it is age at which the Irish are allowed to vote
qqnorm(voteplus$agea, ylim = c(18, 100), main = "Normal Q-Q Plot for vote+"); qqline(voteplus$agea,ylim = c(18 ,100), col= 2)
qqnorm(voteminus$agea, ylim = c(18 ,100), main = "Normal Q-Q Plot for vote-"); qqline(voteminus$agea, col= 2, ylim = c(18 ,100))

Conducting T-test

Age is sometimes mentioned as one of the factor which can influence voting behaviour, but it seems that every country should be studied as a unique case.

As in the case with political preferences and signed petitions, we would like to find out whether voting behavior in Ireland is related to the respondent’s age, so, we should conduct a T-test. The following hypotheses were approved for this:

Now we are going to run T-test.

t.test(politics_ttest$agea ~ politics_ttest$vote)
## 
##  Welch Two Sample t-test
## 
## data:  politics_ttest$agea by politics_ttest$vote
## t = 16.147, df = 634.29, p-value < 2.2e-16
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  12.50096 15.96259
## sample estimates:
## mean in group Yes  mean in group No 
##          54.92540          40.69363

Double-checking results with non-parametric test

wilcox.test(agea ~ vote, data = politics_ttest)
## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  agea by vote
## W = 509930, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

Overall conclusion

Thus, by operating on the data and having conducted several statistical tests, we can confidently assert the following: