library(ggplot2)
library(dplyr)
library(foreign)
setwd("C:/Users/TEMP.SPB.100/Desktop")  # folder that contains ESS8FI.sav
ESS <- read.spss("ESS8FI.sav", use.value.labels = TRUE, to.data.frame = TRUE)
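# Keep gender, worry about climate change and years of education; drop incomplete rows
ESS1 <- dplyr::select(ESS, c("gndr", "wrclmch", "eduyrs"))
ESS1 <- na.omit(ESS1)
# as.numeric() on a factor returns level codes, not values; converting via
# as.character() first keeps the actual years (no change if eduyrs is already numeric)
ESS1$eduyrs <- as.numeric(as.character(ESS1$eduyrs))
# wrclmch is read as a labelled factor; its level positions are assumed to follow the answer codes 1 to 5
ESS1$wrclmch <- as.numeric(ESS1$wrclmch)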

Conducting the t-test

Hypotheses

H0: mean years of education do not differ between people who are concerned about climate change and people who are not. H1: mean years of education differ between these two groups. To check these hypotheses, we used a t-test. First, we converted the variable that measures concern about climate change (wrclmch, answers coded 1 to 5) into two categories: 1-2 as «Not concerned» and 3-5 as «Concerned».

ESS1 = ESS1 %>% mutate(wrclmch1 = factor(wrclmch > 2,
                                         labels = c("Not concerned", "Concerned"))) 
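The recoding above relies on how factor() orders the levels of a logical vector: FALSE sorts before TRUE, so the first label goes to answers 1-2 and the second to answers 3-5. A minimal check on a hypothetical toy vector of the five answer codes (not ESS data); the variable names here are illustrative only:

```r
# Hypothetical toy vector with the five answer codes of wrclmch
codes <- c(1, 2, 3, 4, 5)

# Same recoding as in the report: codes > 2 become TRUE; levels sort as FALSE, TRUE
concern <- factor(codes > 2, labels = c("Not concerned", "Concerned"))
print(table(codes, concern))

# If a subsample contained only answers above 2, factor() would see a single level
# and the two labels would raise an error; fixing the levels explicitly avoids that
only_high <- factor(c(3, 5) > 2, levels = c(FALSE, TRUE),
                    labels = c("Not concerned", "Concerned"))
print(levels(only_high))
```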

Shapiro-Wilk test

shapiro.test(ESS1$eduyrs)
## 
##  Shapiro-Wilk normality test
## 
## data:  ESS1$eduyrs
## W = 0.98535, p-value = 4.968e-13

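We conducted a Shapiro-Wilk test. H0 states that the distribution is normal; H1 states that it is not. The test returned W = 0.985 with p-value = 4.968e-13, which is less than 0.05, so we reject H0: the distribution of years of education is not exactly normal. With 1901 observations, however, the test flags even small departures from normality, so we also inspect the distribution visually.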
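The decision rule of the Shapiro-Wilk test can be written out explicitly. A minimal sketch, assuming a significance level of 0.05; normality_decision() is a hypothetical helper, and the right-skewed test data are simulated, not taken from the ESS:

```r
# Hypothetical helper: Shapiro-Wilk decision at level alpha (H0 = data are normal)
normality_decision <- function(x, alpha = 0.05) {
  # shapiro.test() accepts between 3 and 5000 non-missing values
  p <- shapiro.test(x)$p.value
  if (p < alpha) "reject H0: not normal" else "do not reject H0"
}

set.seed(1)
skewed <- rexp(500)                # clearly non-normal (right-skewed) simulated data
print(normality_decision(skewed))  # a small p-value leads to rejecting normality
```

Note that a small p-value only says the data are not exactly normal; it does not say how large the departure is, which is why a visual check is still useful.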

ggplot() +
  geom_histogram(data = ESS1, aes(x = eduyrs), binwidth = 1, fill="#008080", col="#483D8B", alpha = 0.5)+
  ggtitle("Distribution of years of full-time education completed") + 
  theme_bw()

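To check the shape of the distribution visually, we used a histogram. The distribution appears approximately normal, so a log transformation is not needed. In addition, with 394 and 1507 observations in the two groups, the t-test is robust to moderate departures from normality.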
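A normal Q-Q plot is a common complement to the histogram for this visual check. A minimal sketch in base R on simulated whole-number "years" (hypothetical data standing in for eduyrs, not the ESS values):

```r
# Hypothetical stand-in for eduyrs: whole-number years, roughly bell-shaped
set.seed(42)
years <- round(rnorm(500, mean = 12, sd = 3))

# Q-Q plot: sample quantiles against theoretical normal quantiles;
# points close to the reference line indicate an approximately normal shape
qqnorm(years, main = "Normal Q-Q plot of simulated years of education")
qqline(years)

# The same comparison without drawing: correlation of the Q-Q points
qq <- qqnorm(years, plot.it = FALSE)
print(cor(qq$x, qq$y))  # close to 1 for approximately normal data
```

With ggplot2 (version 3.0.0 or later), the same plot for the report's data would be `ggplot(ESS1, aes(sample = eduyrs)) + stat_qq() + stat_qq_line()`.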

Equality of variances

var.test(eduyrs ~ wrclmch1, ESS1, 
         alternative = "two.sided")
## 
##  F test to compare two variances
## 
## data:  eduyrs by wrclmch1
## F = 0.9222, num df = 393, denom df = 1506, p-value = 0.3236
## alternative hypothesis: true ratio of variances is not equal to 1
## 95 percent confidence interval:
##  0.790961 1.083140
## sample estimates:
## ratio of variances 
##          0.9222044

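The F test gives p-value = 0.3236 > 0.05, so we cannot reject the hypothesis that the variances in the two groups are equal (the ratio of variances is 0.92, with a 95% confidence interval from 0.79 to 1.08). The Welch t-test used below does not assume equal variances, so it remains valid either way.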
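The F statistic reported by var.test() is simply the ratio of the two sample variances, first group over second; with the formula interface the first group is the first factor level ("Not concerned"), which is why F = 0.92 < 1 here. A minimal sketch on hypothetical toy groups a and b (not ESS data):

```r
# Hypothetical toy groups
a <- c(9, 10, 12, 11, 13, 8)
b <- c(12, 14, 11, 15, 13, 16, 12)

vt <- var.test(a, b)

# F = var(first group) / var(second group)
f_by_hand <- var(a) / var(b)
print(all.equal(unname(vt$statistic), f_by_hand))

# Two-sided p-value: twice the smaller tail of F(n_a - 1, n_b - 1)
df_a <- length(a) - 1
df_b <- length(b) - 1
p_by_hand <- 2 * min(pf(f_by_hand, df_a, df_b),
                     pf(f_by_hand, df_a, df_b, lower.tail = FALSE))
print(all.equal(vt$p.value, p_by_hand))
```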

# Welch t-test; naming the result t would mask the base function t()
t_res <- t.test(eduyrs ~ wrclmch1, data = ESS1, var.equal = FALSE)
t_res
## 
##  Welch Two Sample t-test
## 
## data:  eduyrs by wrclmch1
## t = -6.646, df = 634.12, p-value = 6.494e-11
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
##  -1.888175 -1.026864
## sample estimates:
## mean in group Not concerned     mean in group Concerned 
##                    10.64467                    12.10219

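Then we used a Welch t-test to check our hypotheses. The p-value is 6.494e-11, far below 0.05, so we reject H0 in favour of H1: mean years of education differ between the two groups. People who are concerned about climate change have on average about 1.46 more years of full-time education (12.10 vs 10.64 years; 95% confidence interval for the difference from 1.03 to 1.89 years).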
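The Welch statistic divides the difference in group means by an unpooled standard error and uses the Welch-Satterthwaite approximation for the degrees of freedom, which is why df = 634.12 is not an integer. With the formula interface the difference is first level minus second level ("Not concerned" minus "Concerned"), hence the negative t. A minimal sketch on hypothetical toy groups x and y (not ESS data), checked against t.test():

```r
# Hypothetical toy groups
x <- c(9, 10, 12, 11, 13, 8)
y <- c(12, 14, 11, 15, 13, 16, 12)

res <- t.test(x, y, var.equal = FALSE)

# Squared standard errors of each group mean (no pooling of variances)
se2_x <- var(x) / length(x)
se2_y <- var(y) / length(y)

# Welch statistic: difference in means over the unpooled standard error
t_by_hand <- (mean(x) - mean(y)) / sqrt(se2_x + se2_y)

# Welch-Satterthwaite degrees of freedom (non-integer in general)
df_by_hand <- (se2_x + se2_y)^2 /
  (se2_x^2 / (length(x) - 1) + se2_y^2 / (length(y) - 1))

# Two-sided p-value from the t distribution
p_by_hand <- 2 * pt(-abs(t_by_hand), df_by_hand)

print(all.equal(unname(res$statistic), t_by_hand))
print(all.equal(unname(res$parameter), df_by_hand))
print(all.equal(res$p.value, p_by_hand))
```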

ggplot() +
  geom_boxplot(data = ESS1, aes(x = wrclmch1, y = eduyrs ), col = "#E52B50", fill = "#F0F8FF") + 
  ylab("Years of full-time education") + 
  ggtitle("Years of full-time education by concern about climate change") + 
  theme_bw()

We created a boxplot. It shows that people who are concerned about climate change tend to have spent somewhat more years in education: the box for the concerned group spans roughly 9-15 years, compared with roughly 8-13 years for those who are not concerned. There are also outliers: some people with 20-25 years of education are not concerned, while some people with more than 25 years of education are concerned about climate change.
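By default, ggplot2's geom_boxplot() draws individual points for values lying more than 1.5 times the interquartile range (IQR) beyond the box, so "outliers" here has that specific meaning. A minimal sketch of the rule on a hypothetical toy vector of years (not ESS data), using quantile() with its default type 7:

```r
# Hypothetical toy vector of years of education
yrs <- c(8, 9, 10, 11, 12, 12, 13, 14, 25)

# Lower and upper quartiles (the edges of the box)
q <- quantile(yrs, c(0.25, 0.75), names = FALSE)
iqr <- q[2] - q[1]

# Values beyond 1.5 * IQR from the box are plotted individually as outliers
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr
outliers <- yrs[yrs < lower | yrs > upper]
print(outliers)  # 25
```

Here the quartiles are 10 and 13, so anything above 17.5 or below 5.5 years would be drawn as a separate point.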

Conducting the chi-square test

Hypotheses

H0: Concern about climate change is not associated with gender. H1: Concern about climate change is associated with gender.

ggplot()+
  geom_bar(data = ESS1, aes(x=wrclmch1))+
  ggtitle("Number of people who are and are not concerned about climate change")+ 
  theme_bw()

ggplot()+
  geom_bar(data = ESS1, aes(x=gndr))+
  ggtitle("Number of female and male respondents")+ 
  theme_bw()

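These bar charts show that both categories of each variable contain many observations. Strictly, the chi-square test requires an expected count of at least 5 in every cell of the contingency table; with the counts reported below, the smallest expected count is about 197, so this condition is clearly met.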
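The expected counts can be computed directly from the table margins: row total times column total divided by the grand total. A minimal sketch that rebuilds the 2 x 2 table from the observed counts printed in the chisq.test() output of this report (260, 691, 134, 816); on the real data the same matrix is available as ch$expected:

```r
# Observed counts copied from the chisq.test() output in this report
obs <- matrix(c(260, 691, 134, 816), nrow = 2,
              dimnames = list(concern = c("Not concerned", "Concerned"),
                              gender  = c("Male", "Female")))

# Expected count under independence: row total * column total / grand total
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)
print(round(expected, 1))

# Rule of thumb for the chi-square approximation: every expected count >= 5
print(all(expected >= 5))
```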

ch = chisq.test(ESS1$wrclmch1, ESS1$gndr)
ch
## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  ESS1$wrclmch1 and ESS1$gndr
## X-squared = 49.86, df = 1, p-value = 1.651e-12

The p-value is 1.651e-12, well below 0.05, so we reject H0: concern about climate change is associated with gender. (For a 2 x 2 table, chisq.test() applies Yates' continuity correction by default.)
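The reported statistic can be reproduced by hand: with Yates' correction, each absolute difference between observed and expected counts is reduced by 0.5 before squaring. A minimal sketch using the observed counts printed in this report, compared with chisq.test():

```r
# Observed counts copied from the chisq.test() output in this report
obs <- matrix(c(260, 691, 134, 816), nrow = 2,
              dimnames = list(concern = c("Not concerned", "Concerned"),
                              gender  = c("Male", "Female")))
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Yates' continuity correction: shrink each |O - E| by 0.5 before squaring
x2_yates <- sum((abs(obs - expected) - 0.5)^2 / expected)

# A 2 x 2 table has (2 - 1) * (2 - 1) = 1 degree of freedom
p_yates <- pchisq(x2_yates, df = 1, lower.tail = FALSE)

print(round(x2_yates, 2))  # 49.86, as in the output above
print(all.equal(x2_yates, unname(chisq.test(obs)$statistic)))
```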

Pearson residuals

df_resid = as.data.frame(ch$residuals)
df_resid
##   ESS1.wrclmch1 ESS1.gndr      Freq
## 1 Not concerned      Male  4.480003
## 2     Concerned      Male -2.290708
## 3 Not concerned    Female -4.482360
## 4     Concerned    Female  2.291913
df_count = as.data.frame(ch$observed)
df_count
##   ESS1.wrclmch1 ESS1.gndr Freq
## 1 Not concerned      Male  260
## 2     Concerned      Male  691
## 3 Not concerned    Female  134
## 4     Concerned    Female  816
ggplot() + 
  geom_raster(data = df_resid, aes(x = ESS1.gndr, y = ESS1.wrclmch1, fill = Freq), hjust = 0.5, vjust = 0.5) + 
  scale_fill_gradient2("Pearson residuals", low = "#2166ac", mid = "#f7f7f7", high = "#b2182b", midpoint = 0) +
  geom_text(data = df_count, aes(x = ESS1.gndr, y = ESS1.wrclmch1, label = Freq)) +
  xlab("Gender") +
  ylab("Degree of concernment") +
  theme_bw()

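Positive residuals mean more respondents in a cell than expected under independence; negative residuals mean fewer. There are notably more men who are not concerned about climate change than expected (residual 4.48) and fewer women who are not concerned than expected (-4.48). Correspondingly, fewer men (-2.29) and more women (2.29) than expected are concerned about climate change.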
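Each Pearson residual is (observed - expected) / sqrt(expected); values beyond about 2 in absolute value are commonly read as notable departures from independence. A minimal sketch reproducing the residuals above from the observed counts printed in this report (chisq.test() also returns standardized residuals as stdres, not used here):

```r
# Observed counts copied from the chisq.test() output in this report
obs <- matrix(c(260, 691, 134, 816), nrow = 2,
              dimnames = list(concern = c("Not concerned", "Concerned"),
                              gender  = c("Male", "Female")))
expected <- outer(rowSums(obs), colSums(obs)) / sum(obs)

# Pearson residual for each cell: (observed - expected) / sqrt(expected)
pearson <- (obs - expected) / sqrt(expected)
print(round(pearson, 2))

# Same values as the residuals component returned by chisq.test()
print(all.equal(unname(pearson), unname(chisq.test(obs)$residuals)))
```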