Areas of members’ responsibilities:
While the country of our interest is still Germany, in this particular project we use data from two themes, namely Social Demographics and Climate Change, in order to investigate the connection between them. The data was collected in 2016.
library(foreign)
library(ggplot2)
library(corrplot)
library(gridExtra)
library(sjPlot)
library(knitr)
# Loading the survey data from the SPSS file (value labels become factor levels)
data <- read.spss("ESS8DE.sav", use.value.labels = TRUE, to.data.frame = TRUE)
Let’s begin with the chi-squared test of independence, which requires two categorical variables. We chose gender and respondents’ attitudes towards increasing taxes on fossil fuels, such as oil, gas and coal, in order to reduce climate change.
# Removing respondents with a missing answer on the fossil-fuel tax question (inctxff)
data1 <- data[!is.na(data$inctxff),]
# Short names for the two variables used in the analysis
gender <- data1$gndr
att <- data1$inctxff
Here are the frequencies of each category of the two variables:
## Male Female
## 1493 1321
## Strongly in favour Somewhat in favour
## 241 816
## Neither in favour nor against Somewhat against
## 668 798
## Strongly against
## 291
Before running the test, we visualize the two variables to get a first impression of their relationship. Below are a stacked bar chart showing the gender composition of each attitude category and a grouped bar chart showing the frequencies of answers in each category by gender.
#Creating plots (plot_xtab() and plot_grpfrq() replace the older sjp.xtab() and sjp.grpfrq())
set_theme(legend.pos = "top", legend.inside = TRUE, axis.textsize = 0.8, title.align = "center")
# Stacked bars: gender composition within each attitude category (row percentages)
plot1 <- plot_xtab(att, gender, bar.pos = "stack", legend.title = "Gender",
                   axis.titles = "Attitude towards tax increase", title = "Gender composition of taxation attitude",
                   show.total = FALSE, margin = "row", geom.colors = "Pastel2")
# Grouped bars: number of answers in each category by gender
plot2 <- plot_grpfrq(att, gender, type = "bar", legend.title = "Gender", geom.spacing = -1,
                     axis.titles = "Attitude towards tax increase", title = "Attitude distribution by gender",
                     show.prc = FALSE, geom.colors = "Pastel2")
grid.arrange(plot1, plot2, ncol = 2)
As can be seen from the graphs, the shares of men and women differ in all five options. The only option in which women form the majority is “Neither in favour nor against”, while the biggest gender gap is observed in the “Strongly against” category. The second biggest gap is in the “Strongly in favour” category, while in the “Somewhat in favour” and “Somewhat against” categories the shares differ by roughly 10 percentage points. “Somewhat against” was the most popular option among men and “Neither in favour nor against” among women. “Strongly in favour” was the least chosen option for both men and women.
Overall, women tend to choose the “Neither in favour nor against” option and remain neutral, while men are more likely to express an opinion and choose a side.
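The shares described above can be checked numerically. Below is a minimal R sketch, assuming the observed counts from the contingency table in this section (re-entered by hand so the snippet runs without ESS8DE.sav); the names `shares` and `gap` are illustrative only. Column proportions of the gender-by-attitude table correspond to the row percentages (`margin = "row"`) of the stacked bar chart, where attitude is on the rows.

```r
# Observed counts re-entered from the contingency table (rows: gender, columns: attitude)
ct <- matrix(c(143, 440, 281, 446, 183,
                98, 376, 387, 352, 108),
             nrow = 2, byrow = TRUE,
             dimnames = list(gender = c("Male", "Female"),
                             att = c("Strongly in favour", "Somewhat in favour",
                                     "Neither in favour nor against",
                                     "Somewhat against", "Strongly against")))

# Share of men and women within each attitude category (each column sums to 1)
shares <- prop.table(ct, margin = 2)
round(100 * shares, 1)

# Male share minus female share, in percentage points
gap <- 100 * (shares["Male", ] - shares["Female", ])
round(gap, 1)
```

With these counts, “Neither in favour nor against” is the only column where the female share exceeds 50%, and the largest gap is in “Strongly against”.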
We then check that the data meet the necessary assumptions of the chi-square test.
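One of the usual assumptions, that every expected cell count is large enough (a common rule of thumb is at least 5), can be checked directly in code. A minimal sketch, assuming the counts re-entered from the contingency table in this section; `expected` is an illustrative name. Each expected count is the row total times the column total divided by the sample size.

```r
# Observed counts re-entered from the contingency table (rows: gender, columns: attitude)
ct <- matrix(c(143, 440, 281, 446, 183,
                98, 376, 387, 352, 108),
             nrow = 2, byrow = TRUE,
             dimnames = list(gender = c("Male", "Female"),
                             att = c("Strongly in favour", "Somewhat in favour",
                                     "Neither in favour nor against",
                                     "Somewhat against", "Strongly against")))

# Expected counts under independence: row total * column total / N
expected <- outer(rowSums(ct), colSums(ct)) / sum(ct)
round(expected, 2)

# Rule of thumb: all expected counts should be at least 5
min(expected)
all(expected >= 5)
```

The same matrix is returned by `chisq.test(ct)$expected`; here the smallest expected count is about 113, far above the threshold.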
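Here are the hypotheses for the test:
H0: gender and attitude towards the tax increase are independent.
H1: gender and attitude towards the tax increase are dependent.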
To check whether the chosen variables are dependent, we apply Pearson’s chi-squared test. For this we first create a contingency table of observed frequencies.
# Contingency table of observed frequencies (gender x attitude)
ct <- table(gender, att)
kable(ct)
|  | Strongly in favour | Somewhat in favour | Neither in favour nor against | Somewhat against | Strongly against |
|---|---|---|---|---|---|
| Male | 143 | 440 | 281 | 446 | 183 |
| Female | 98 | 376 | 387 | 352 | 108 |
Then, we run Pearson’s chi-squared test using the function chisq.test().
test <- chisq.test(ct)
test
##
## Pearson's Chi-squared test
##
## data: ct
## X-squared = 50.32, df = 4, p-value = 3.096e-10
We obtained a chi-square statistic of 50.32 with (2 - 1) × (5 - 1) = 4 degrees of freedom and a p-value of 3.096e-10 (0.0000000003096). The critical value of χ² with 4 degrees of freedom at the 0.05 significance level is 9.49. The obtained statistic far exceeds this critical value, and the p-value is far below 0.05: if the null hypothesis (H0) of independence were true, the probability of obtaining results as extreme as, or more extreme than, those observed would be extremely low. We therefore reject the null hypothesis and conclude that the variables are dependent.
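The numbers above can be reproduced from the definition of Pearson’s statistic, the sum over all cells of (O - E)² / E. A minimal sketch, assuming the counts re-entered from the contingency table; `x2`, `dof`, `crit` and `p_value` are illustrative names.

```r
# Observed counts re-entered from the contingency table (rows: gender, columns: attitude)
ct <- matrix(c(143, 440, 281, 446, 183,
                98, 376, 387, 352, 108),
             nrow = 2, byrow = TRUE,
             dimnames = list(gender = c("Male", "Female"),
                             att = c("Strongly in favour", "Somewhat in favour",
                                     "Neither in favour nor against",
                                     "Somewhat against", "Strongly against")))

# Expected counts under independence
expected <- outer(rowSums(ct), colSums(ct)) / sum(ct)

# Pearson's statistic: sum of (O - E)^2 / E over all cells
x2 <- sum((ct - expected)^2 / expected)

# Degrees of freedom: (rows - 1) * (columns - 1)
dof <- (nrow(ct) - 1) * (ncol(ct) - 1)

# Critical value at the 0.05 level and upper-tail p-value
crit <- qchisq(0.95, dof)
p_value <- pchisq(x2, dof, lower.tail = FALSE)

c(statistic = x2, df = dof, critical = crit, p.value = p_value)
```

The values agree with the `chisq.test()` output above. No continuity correction is involved, because `chisq.test()` applies Yates’ correction only to 2 × 2 tables.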
Since the chi-square statistic is significant, we take a look at the residuals. First we tabulate the expected and observed frequencies, and then the residuals.
Expected frequencies:

|  | Strongly in favour | Somewhat in favour | Neither in favour nor against | Somewhat against | Strongly against |
|---|---|---|---|---|---|
| Male | 127.8653 | 432.9382 | 354.4151 | 423.3881 | 154.3934 |
| Female | 113.1347 | 383.0618 | 313.5849 | 374.6119 | 136.6066 |
Observed frequencies:

|  | Strongly in favour | Somewhat in favour | Neither in favour nor against | Somewhat against | Strongly against |
|---|---|---|---|---|---|
| Male | 143 | 440 | 281 | 446 | 183 |
| Female | 98 | 376 | 387 | 352 | 108 |
Standardized residuals:

|  | Strongly in favour | Somewhat in favour | Neither in favour nor against | Somewhat against | Strongly against |
|---|---|---|---|---|---|
| Male | 2.042912 | 0.5878676 | -6.517588 | 1.894942 | 3.548674 |
| Female | -2.042912 | -0.5878676 | 6.517588 | -1.894942 | -3.548674 |
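The expected counts and residuals in the tables above can be reproduced by hand. A minimal sketch, assuming the counts re-entered from the observed-frequency table; `pearson` and `stdres` are illustrative names. The residual values in the last table match the standardized (adjusted) residuals that `chisq.test()` returns as `$stdres`, rather than the Pearson residuals in `$residuals` (for example, the Pearson residual for men in “Strongly in favour” is about 1.34, not 2.04).

```r
# Observed counts re-entered from the contingency table (rows: gender, columns: attitude)
ct <- matrix(c(143, 440, 281, 446, 183,
                98, 376, 387, 352, 108),
             nrow = 2, byrow = TRUE,
             dimnames = list(gender = c("Male", "Female"),
                             att = c("Strongly in favour", "Somewhat in favour",
                                     "Neither in favour nor against",
                                     "Somewhat against", "Strongly against")))

n <- sum(ct)
row_p <- rowSums(ct) / n   # share of each gender
col_p <- colSums(ct) / n   # share of each attitude category
expected <- outer(rowSums(ct), colSums(ct)) / n

# Pearson residuals: (O - E) / sqrt(E)
pearson <- (ct - expected) / sqrt(expected)

# Standardized residuals: (O - E) / sqrt(E * (1 - row share) * (1 - column share))
stdres <- (ct - expected) / sqrt(expected * outer(1 - row_p, 1 - col_p))

round(stdres, 3)
```

Because the table has only two rows, the two standardized residuals in each column are equal in magnitude and opposite in sign, which is why the male and female rows mirror each other.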
According to the table, several standardized residuals have an absolute value greater than 2, so the corresponding cells deviate noticeably from what independence would predict. To take a closer look at the residuals, we draw an association plot.
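The association plot itself is not reproduced here. A minimal sketch of how one can be drawn with `assocplot()` from base R’s graphics package, assuming the counts re-entered from the contingency table; whether the original figure was produced this way is an assumption. Each cell becomes a bar whose height is proportional to the Pearson residual (O - E) / sqrt(E) and whose width is proportional to sqrt(E), so its area is proportional to O - E.

```r
# Observed counts re-entered from the contingency table (rows: gender, columns: attitude)
ct <- matrix(c(143, 440, 281, 446, 183,
                98, 376, 387, 352, 108),
             nrow = 2, byrow = TRUE,
             dimnames = list(gender = c("Male", "Female"),
                             att = c("Strongly in favour", "Somewhat in favour",
                                     "Neither in favour nor against",
                                     "Somewhat against", "Strongly against")))

# Bars above the baseline (first colour) mark cells observed more often than
# expected under independence; bars below it (second colour) mark fewer
assocplot(ct, col = c("black", "red"),
          xlab = "Gender", ylab = "Attitude towards tax increase")
```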
We observe that male respondents have more observations than expected under independence in every category except “Neither in favour nor against”. For women it is the other way around: they have more observations than expected in “Neither in favour nor against” and fewer than expected in the four remaining categories. The gap between observed and expected counts is small in the “Somewhat in favour” category (standardized residual of about 0.6), but noticeably larger in its opposite category, “Somewhat against” (about 1.9).
The same pattern can be seen in the correlation plot below. There is a strong positive association between female respondents and the “Neither in favour nor against” category, while for male respondents this is the only category with a negative association.
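The correlation plot can be sketched with `corrplot()` from the corrplot package loaded at the start. This assumes the plot shows the Pearson residuals stored in `chisq.test(...)$residuals`; the report does not say which residuals were plotted, and `$stdres` could be used instead. Setting `is.corr = FALSE` tells corrplot that the values are not correlations bounded by [-1, 1].

```r
# Observed counts re-entered from the contingency table (rows: gender, columns: attitude)
ct <- matrix(c(143, 440, 281, 446, 183,
                98, 376, 387, 352, 108),
             nrow = 2, byrow = TRUE,
             dimnames = list(gender = c("Male", "Female"),
                             att = c("Strongly in favour", "Somewhat in favour",
                                     "Neither in favour nor against",
                                     "Somewhat against", "Strongly against")))

test <- chisq.test(ct)

# Positive residuals (more observations than expected) and negative ones
# are drawn in contrasting colours; skipped if corrplot is not installed
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(test$residuals, is.corr = FALSE)
}
```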
Overall, we conclude that the chosen variables are dependent: attitude towards increasing taxes on fossil fuels, such as oil, gas and coal, to reduce climate change is associated with the respondent’s gender. In particular, women tend to choose the “Neither in favour nor against” option and stay neutral, while men more often take one of the two sides and oppose the tax increase more often than we would expect under independence.