Chi-squared test

Variables

We begin with the chi-squared test, which requires two categorical variables. We chose gender and people’s attitudes towards increasing taxes on fossil fuels (such as oil, gas and coal) in order to reduce climate change.

# Keep only respondents who answered the fossil-fuel tax question (drop NAs)
data1 <- data[!is.na(data$inctxff), ]
# Extract the two variables used in the test
gender <- data1$gndr
att <- data1$inctxff

Here is a short description of our variables:

Gender - a categorical, nominal variable. Its values and counts are shown below.

##   Male Female 
##   1493   1321

People’s attitudes towards increasing taxes on fossil fuels - a categorical, ordinal variable. Its scale and counts are shown below.

##            Strongly in favour            Somewhat in favour 
##                           241                           816 
## Neither in favour nor against              Somewhat against 
##                           668                           798 
##              Strongly against 
##                           291

Visualization and description of data

Before testing, we visualize the two variables to see what to expect. Below are a stacked bar plot showing the gender composition of each attitude category and a grouped bar chart showing the frequency of each answer by gender.

# Plotting packages: set_theme() and the sjp.* functions come from sjPlot, grid.arrange() from gridExtra
library(sjPlot)
library(gridExtra)
set_theme(legend.pos = "top", legend.inside = TRUE, axis.textsize = 0.8, title.align = "center")

# Stacked bars: gender composition within each attitude category
plot1 <- sjp.xtab(att, gender, bar.pos = "stack", legend.title = "Gender",
                  axis.titles = "Attitude towards tax increase", title = "Gender composition of taxation attitude",
                  show.total = FALSE, margin = "row", geom.colors = "Pastel2")

# Grouped bars: answer frequencies by gender
plot2 <- sjp.grpfrq(att, gender, type = "bar", legend.title = "Gender", geom.spacing = -1,
                    axis.titles = "Attitude towards tax increase", title = "Attitude distribution by gender",
                    show.prc = FALSE, geom.colors = "Pastel2")

# Show both plots side by side
grid.arrange(plot1, plot2, ncol = 2)

As the graphs show, the proportions of men and women differ in all five categories. “Neither in favour nor against” is the only category in which women outnumber men. The largest gap is in the “Strongly against” category, followed by “Strongly in favour”, while in the two “Somewhat” categories the shares of men and women differ by around 10 percentage points. “Somewhat against” was the most popular option among men and “Neither in favour nor against” among women. “Strongly in favour” was the least chosen option for both men and women.

Overall, women tend to choose the neutral “Neither in favour nor against” option, while men are more likely to take a side.
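The percentage gaps described above can be checked numerically. The sketch below is not part of the original analysis: it rebuilds the cross-tabulation by hand from the counts reported in this section (the object names `ct`, `share` and `gap` are illustrative) and computes the share of each gender within every attitude category.

```r
# Counts copied from the contingency table reported in this section
ct <- matrix(
  c(143, 440, 281, 446, 183,
     98, 376, 387, 352, 108),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Male", "Female"),
    att = c("Strongly in favour", "Somewhat in favour",
            "Neither in favour nor against",
            "Somewhat against", "Strongly against")
  )
)

# Share of each gender within every attitude category (columns sum to 1)
share <- prop.table(ct, margin = 2)
print(round(100 * share, 1))

# Male share minus female share, in percentage points
gap <- round(100 * (share["Male", ] - share["Female", ]), 1)
print(gap)
```

A negative entry in `gap` marks a category in which women outnumber men; the entry with the largest absolute value marks the widest gender gap.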

Assumptions and hypotheses

Next, we check that the assumptions of the chi-squared test are met:

The data in the contingency table are counts (not percentages)
All expected cell counts are at least 5 (the smallest here is about 113)
Each observation contributes to one group only
The groups are independent
The variables under study are categorical
The sample is assumed to be reasonably random
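The cell-count assumption can be verified directly from the expected frequencies. This is a minimal sketch, assuming the table is re-entered by hand from the counts reported in this section; `ct` and `expected` are illustrative names.

```r
# Counts copied from the contingency table reported in this section
ct <- matrix(
  c(143, 440, 281, 446, 183,
     98, 376, 387, 352, 108),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Male", "Female"),
    att = c("Strongly in favour", "Somewhat in favour",
            "Neither in favour nor against",
            "Somewhat against", "Strongly against")
  )
)

# Expected counts under independence: row total * column total / grand total
expected <- chisq.test(ct)$expected

print(min(expected))        # smallest expected count in the table
print(all(expected >= 5))   # TRUE when the rule of thumb is satisfied
```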

Here are hypotheses for the test:

H0: In the population, the two categorical variables (gender and taxation attitude) are independent.
H1: In the population, the two categorical variables (gender and taxation attitude) are dependent.

Test

To check whether the chosen variables are dependent, we apply Pearson’s chi-squared test. This requires a contingency table of observed frequencies.

# Cross-tabulate gender (rows) by attitude (columns); kable() comes from knitr
ct <- table(gender, att)
kable(ct)

	Strongly in favour	Somewhat in favour	Neither in favour nor against	Somewhat against	Strongly against
Male	143	440	281	446	183
Female	98	376	387	352	108

Then, we run Pearson’s chi-squared test using the chisq.test() function.

test <- chisq.test(ct)
test

## 
##  Pearson's Chi-squared test
## 
## data:  ct
## X-squared = 50.32, df = 4, p-value = 3.096e-10

We obtained a chi-squared statistic of 50.32 with 4 degrees of freedom and a p-value of 3.096e-10 (0.0000000003096). The critical value of χ² for 4 degrees of freedom at the 0.05 significance level is 9.49. The statistic far exceeds this critical value, and the p-value is far below 0.05: if the null hypothesis (H0, the variables are independent) were true, results at least as extreme as those observed would be extremely unlikely. We therefore reject the null hypothesis in favour of H1: there is strong evidence that the variables are dependent.
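The critical value and the p-value quoted above can be reproduced from the chi-squared distribution. The sketch below takes the reported statistic and degrees of freedom as given; the variable names are illustrative.

```r
stat  <- 50.32   # chi-squared statistic reported by chisq.test()
df    <- 4       # (2 rows - 1) * (5 columns - 1)
alpha <- 0.05    # significance level

# Critical value: the (1 - alpha) quantile of the chi-squared distribution
crit <- qchisq(1 - alpha, df = df)

# p-value: upper-tail probability of a value at least as large as the statistic
p <- pchisq(stat, df = df, lower.tail = FALSE)

print(round(crit, 2))
print(p)
print(stat > crit && p < alpha)   # TRUE means H0 is rejected at level alpha
```

Because the statistic is rounded to two decimals here, the p-value may differ slightly in its last digits from the one printed by chisq.test().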

Since the chi-squared statistic is significant, we look at the residuals to see which cells drive the result. Below are the expected and observed frequencies, followed by the residuals.

Expected frequencies
	Strongly in favour	Somewhat in favour	Neither in favour nor against	Somewhat against	Strongly against
Male	127.8653	432.9382	354.4151	423.3881	154.3934
Female	113.1347	383.0618	313.5849	374.6119	136.6066

Observed frequencies
	Strongly in favour	Somewhat in favour	Neither in favour nor against	Somewhat against	Strongly against
Male	143	440	281	446	183
Female	98	376	387	352	108

Standardized residuals
	Strongly in favour	Somewhat in favour	Neither in favour nor against	Somewhat against	Strongly against
Male	2.042912	0.5878676	-6.517588	1.894942	3.548674
Female	-2.042912	-0.5878676	6.517588	-1.894942	-3.548674
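The code that produced these three tables is not shown in the source. One plausible way to obtain them is from the components of the object returned by chisq.test(). The residual values above appear to match the `stdres` component (standardized residuals) rather than `residuals` (Pearson residuals); this is an inference from the numbers, not something the source states. The table is re-entered by hand here so the sketch runs on its own.

```r
# Counts copied from the contingency table reported in this section
ct <- matrix(
  c(143, 440, 281, 446, 183,
     98, 376, 387, 352, 108),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Male", "Female"),
    att = c("Strongly in favour", "Somewhat in favour",
            "Neither in favour nor against",
            "Somewhat against", "Strongly against")
  )
)

test <- chisq.test(ct)

print(round(test$expected, 4))   # expected counts under independence
print(test$observed)             # observed counts (same as ct)
print(round(test$stdres, 6))     # standardized residuals
print(round(test$residuals, 6))  # Pearson residuals, shown for comparison
```

Standardized residuals are approximately standard normal under independence, which is why an absolute value above 2 is commonly read as a notable deviation.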

According to the table, the residuals for “Strongly in favour”, “Neither in favour nor against” and “Strongly against” have an absolute value greater than 2. We therefore draw an association plot to take a closer look at the residuals.
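The plotting call for the association plot is not included in the source. A minimal sketch using base R's assocplot() is given below; the choice of function and the colours are assumptions. Note that assocplot() draws Pearson residuals, so bar heights will not equal the standardized residuals tabulated above, although the signs agree.

```r
# Counts copied from the contingency table reported in this section
ct <- matrix(
  c(143, 440, 281, 446, 183,
     98, 376, 387, 352, 108),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Male", "Female"),
    att = c("Strongly in favour", "Somewhat in favour",
            "Neither in favour nor against",
            "Somewhat against", "Strongly against")
  )
)

# Bars above the baseline: more observations than expected under independence;
# bars below the baseline: fewer observations than expected
assocplot(ct, col = c("steelblue", "tomato"),
          xlab = "Gender", ylab = "Attitude towards tax increase")

# Signs of the Pearson residuals that the plot is built from
pearson <- chisq.test(ct)$residuals
print(sign(pearson))
```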

For male respondents, every cell except “Neither in favour nor against” contains more observations than we would expect if the variables were independent. For women it is the other way around: the four remaining categories contain fewer observations than expected. The difference between observed and expected counts is smallest in the “Somewhat in favour” category (residual of about 0.59 in absolute value), while its opposite category, “Somewhat against”, shows a noticeably larger one (about 1.89).

The correlation plot drawn below shows the same pattern. There is a strong positive association between female respondents and the “Neither in favour nor against” category, while for males it is the only category with a negative association.
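The source does not show how the correlation plot was produced. One common approach, assumed here, is to pass the residual matrix to corrplot() from the corrplot package with `is.corr = FALSE`, since the values are residuals rather than correlations. The sketch skips the plot when the package is not installed.

```r
# Counts copied from the contingency table reported in this section
ct <- matrix(
  c(143, 440, 281, 446, 183,
     98, 376, 387, 352, 108),
  nrow = 2, byrow = TRUE,
  dimnames = list(
    gender = c("Male", "Female"),
    att = c("Strongly in favour", "Somewhat in favour",
            "Neither in favour nor against",
            "Somewhat against", "Strongly against")
  )
)

# Standardized residuals: positive = more observations than expected
stdres <- chisq.test(ct)$stdres

# Assumption: the corrplot package is used; draw the plot only if it is available
if (requireNamespace("corrplot", quietly = TRUE)) {
  corrplot::corrplot(stdres, is.corr = FALSE, tl.col = "black")
}
```

Circle size reflects the magnitude of each residual and colour reflects its sign, so the “Neither in favour nor against” column stands out for both genders.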

Conclusion

Overall, we conclude that the chosen variables are dependent: the attitude towards increasing taxes on fossil fuels (such as oil, gas and coal) in order to reduce climate change depends on the respondent’s gender. In particular, women tend to choose the neutral “Neither in favour nor against” option, while men more often take one of the two sides, with a tendency to oppose the tax increase more than we would expect if the variables were independent.

Project 2. Chi-squared and t-test