1 Project 1

1.1 Research question

1.1.1 How Greek people feel about religion?

We are team Люпин (Koniakhina Liza, Dydykina Julia and Kovyazina Polina), and this is the first step of our analysis of religion in Greece, the country we have chosen for in-depth study. The study aims to identify key features of Greeks’ attitudes towards religion: how religious they are, how comfortable Greece is for a religious person, which religions are the most widespread, how common discrimination on religious grounds is, and so on. For fuller context we also examine related indicators, such as “how important it is to follow traditions and habits”. We are interested in this topic because, according to the literature, religion significantly shapes the social context, and it is often its negative aspects (discrimination on the basis of religious affiliation, the desire to relentlessly follow outdated ideas and habits, etc.) that manifest themselves most vividly [3]. In our study we want to find out how religion manifests itself in Greek society.

1.2 Loading dataset & packages needed

library(foreign)
ESS <- read.spss("/Users/elizavetakoniakhina/Desktop/Data analysis/ESS10/ESS10.sav", use.value.labels =  T, to.data.frame = T)
library(dplyr)
library(kableExtra)
library(car)
library(ggplot2)
library(corrplot)
library(sjPlot)
#install.packages("psych")
library(psych)
#install.packages("effsize")
library(effsize)
#install.packages("apa")
library(apa)
library(effectsize)
library(DescTools)
library(sjstats)
#install.packages("pwr")
library(pwr)
#install.packages("rcompanion")
library(rcompanion)
library(rstatix)
library(ggstatsplot)
library(haven)

1.3 Clearing data

At this step we create a new dataset that includes only the variables we find suitable for the analysis. We also keep only the rows where the variable cntry (country of the respondent) equals Greece, because we are interested only in respondents from this country. This leaves 2799 observations, meaning that there were 2799 respondents from Greece. We also turn the ratio variables into real numeric variables. In the original dataset all variables are stored as factors, for which descriptive statistics cannot be computed, so we first convert them to character and then to numbers. For the same reason we recode some values: the extreme points of the scales (0 and 10) were stored as text labels, so we replace them with their numeric equivalents.

greece <- ESS %>%
  filter(cntry == "Greece") %>%
  select(rlgdnagr, rlgblge, dscrrlg, rlgblg, imptrad, rlgatnd, pray, ipfrule, wpestop, wpestopc, rlgdgr, yrbrn, agea, gndr, netustm, volunfp, freehms, nwspol, imueclt)

greece$wpestop <- as.character(greece$wpestop)
greece$wpestop[greece$wpestop == "Extremely important for democracy in general"] = "10"
greece$wpestop[greece$wpestop == "Not at all important for democracy in general"] = "0"
greece$wpestop <- as.integer(greece$wpestop)
greece$wpestopc <- as.character(greece$wpestopc)
greece$wpestopc[greece$wpestopc == "Does not apply at all"] = "0"
greece$wpestopc[greece$wpestopc == "Applies completely"] = "10"
greece$wpestopc <- as.integer(greece$wpestopc)
greece$rlgdgr <- as.character(greece$rlgdgr)
greece$rlgdgr[greece$rlgdgr == "Not at all religious"] = "0"
greece$rlgdgr[greece$rlgdgr == "Very religious"] = "10"
greece$rlgdgr <- as.integer(greece$rlgdgr)
greece$yrbrn <- as.numeric(as.character(greece$yrbrn))
greece$agea <- as.integer(as.character(greece$agea))
greece$netustm <- as.integer(as.character(greece$netustm))
greece$nwspol <- as.integer(as.character(greece$nwspol))

greece$imueclt <- as.character(greece$imueclt)
greece$imueclt[greece$imueclt == "Cultural life undermined"] = "0"
greece$imueclt[greece$imueclt == "Cultural life enriched"] = "10"
greece$imueclt <- as.integer(greece$imueclt)
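
The recoding pattern above repeats the same three steps for every 0 to 10 scale, so it could be wrapped in a small helper. The following is only a sketch: `recode_scale` and the toy factor are hypothetical and are not part of the original script.

```r
# Hypothetical helper (not in the original script): convert an 11-point
# factor whose end points are stored as text labels into integers 0-10.
recode_scale <- function(x, low_label, high_label) {
  x <- as.character(x)
  x[x == low_label] <- "0"
  x[x == high_label] <- "10"
  # Anything that is still not a number (e.g. "Refusal") becomes NA
  suppressWarnings(as.integer(x))
}

# Toy factor standing in for a column such as greece$rlgdgr
toy <- factor(c("Not at all religious", "3", "Very religious", "Refusal", NA))
toy_recoded <- recode_scale(toy, "Not at all religious", "Very religious")
toy_recoded
```

With such a helper, each scale variable would need one call instead of four lines, and the labels for the two end points stay visible at the call site.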

1.4 Describing variable types

In this section we create a table with the variables, their meaning and their type.

labels = c("rlgdnagr", "rlgblge", "dscrrlg", "rlgblg", "imptrad", "rlgatnd", "pray", "ipfrule", "wpestop", "wpestopc", "rlgdgr", "yrbrn", "agea", "gndr", "netustm", "volunfp", "freehms")
meaning = c("Religion or denomination belonging to at present, Greece", "Ever belonging to particular religion or denomination", "Discrimination of respondent's group: religion", "Belonging to particular religion or denomination", "Important to follow traditions and customs", "How often attend religious services apart from special occasions", "How often pray apart from at religious services", "Important to do what is told and follow rules", "The will of the people cannot be stopped", "In country the will of the people cannot be stopped", "How religious are you", "Year of birth of a respondent", "Age of a respondent",  "Gender of a respondent", "How much time does a respondent spend using Internet", "Volunteered for non-profit or charitable organization", "Gays and lesbians are free to live life as they wish")
type_of_variable = c("nominal", "binary", "binary", "binary", "ordinal", "ordinal", "ordinal", "ordinal", "ratio", "ratio", "ratio", "interval", "interval", "binary", "ratio", "binary", "ordinal")
variable_type <- data.frame(labels, meaning, type_of_variable, stringsAsFactors = F)
kable(variable_type) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

labels	meaning	type_of_variable
rlgdnagr	Religion or denomination belonging to at present, Greece	nominal
rlgblge	Ever belonging to particular religion or denomination	binary
dscrrlg	Discrimination of respondent’s group: religion	binary
rlgblg	Belonging to particular religion or denomination	binary
imptrad	Important to follow traditions and customs	ordinal
rlgatnd	How often attend religious services apart from special occasions	ordinal
pray	How often pray apart from at religious services	ordinal
ipfrule	Important to do what is told and follow rules	ordinal
wpestop	The will of the people cannot be stopped	ratio
wpestopc	In country the will of the people cannot be stopped	ratio
rlgdgr	How religious are you	ratio
yrbrn	Year of birth of a respondent	interval
agea	Age of a respondent	interval
gndr	Gender of a respondent	binary
netustm	How much time does a respondent spend using Internet	ratio
volunfp	Volunteered for non-profit or charitable organization	binary
freehms	Gays and lesbians are free to live life as they wish	ordinal

1.5 Descriptive statistics

In this part of our work we compute descriptive statistics for our numeric variables: mean, median, mode, maximum, minimum and standard deviation. All of them except the mode are computed with base R functions. Base R has no function for the statistical mode (the base mode() function returns the storage type of an object, not its most frequent value), so we use Mode() from the DescTools package, which can also drop NAs via na.rm = TRUE. This matters because our variables clearly contain a lot of NAs. The results are presented in a table where rows are variables and columns are statistics.

mean_rlgdgr <- mean(greece$rlgdgr, na.rm = TRUE)
median_rlgdgr <- median(greece$rlgdgr, na.rm = TRUE)
mode_rlgdgr <- Mode(greece$rlgdgr, na.rm=TRUE)
max_rlgdgr <-max(greece$rlgdgr, na.rm = TRUE)
min_rlgdgr <- min(greece$rlgdgr, na.rm = TRUE)
sd_rlgdgr <-sd(greece$rlgdgr, na.rm = TRUE)

mean_wpestopc <- mean(greece$wpestopc, na.rm = TRUE)
median_wpestopc <- median(greece$wpestopc, na.rm = TRUE)
mode_wpestopc <- Mode(greece$wpestopc, na.rm=TRUE)
max_wpestopc <-max(greece$wpestopc, na.rm = TRUE)
min_wpestopc <- min(greece$wpestopc, na.rm = TRUE)
sd_wpestopc <-sd(greece$wpestopc, na.rm = TRUE)

mean_wpestop <- mean(greece$wpestop, na.rm = TRUE)
median_wpestop <- median(greece$wpestop, na.rm = TRUE)
mode_wpestop <- Mode(greece$wpestop, na.rm=TRUE)
max_wpestop <-max(greece$wpestop, na.rm = TRUE)
min_wpestop <- min(greece$wpestop, na.rm = TRUE)
sd_wpestop <-sd(greece$wpestop, na.rm = TRUE)

mean_yrbrn <- mean(greece$yrbrn, na.rm = TRUE)
median_yrbrn <- median(greece$yrbrn, na.rm = TRUE)
mode_yrbrn <- Mode(greece$yrbrn, na.rm=TRUE)
max_yrbrn <-max(greece$yrbrn, na.rm = TRUE)
min_yrbrn <- min(greece$yrbrn, na.rm = TRUE)
sd_yrbrn <-sd(greece$yrbrn, na.rm = TRUE)


Mean = c(mean_rlgdgr, mean_wpestopc, mean_wpestop, mean_yrbrn)
Median = c(median_rlgdgr, median_wpestopc, median_wpestop, median_yrbrn)
Mode = c(mode_rlgdgr, mode_wpestopc, mode_wpestop, mode_yrbrn)
Max = c(max_rlgdgr, max_wpestopc, max_wpestop, max_yrbrn)
Min = c(min_rlgdgr, min_wpestopc, min_wpestop, min_yrbrn)
Sd = c(sd_rlgdgr, sd_wpestopc, sd_wpestop, sd_yrbrn)
results <- data.frame(Mean,Median,Mode, Max, Min, Sd, stringsAsFactors = F)
rownames(results) <- c("rlgdgr", "wpestopc", "wpestop", "yrbrn")
kable(results) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

	Mean	Median	Mode	Max	Min	Sd
rlgdgr	6.294181	7	7	10	0	2.327025
wpestopc	4.423203	4	5	10	0	2.388620
wpestop	8.627204	9	10	10	0	1.424331
yrbrn	1971.366485	1971	1960	2006	1933	16.966668
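
The same table can be built without one block of assignments per variable. The sketch below is an illustration under assumptions: `stat_mode` and `describe_var` are hypothetical helper names, the mode is computed in base R rather than with DescTools, and the toy data frame only stands in for the survey columns.

```r
# Hypothetical helpers; the names are illustrative, not from the original script.
stat_mode <- function(x) {
  x <- x[!is.na(x)]
  counts <- table(x)
  # Return every value tied for the highest count
  as.numeric(names(counts)[counts == max(counts)])
}

describe_var <- function(x) {
  c(
    Mean   = mean(x, na.rm = TRUE),
    Median = median(x, na.rm = TRUE),
    Mode   = stat_mode(x)[1],  # first mode if there are ties
    Max    = max(x, na.rm = TRUE),
    Min    = min(x, na.rm = TRUE),
    Sd     = sd(x, na.rm = TRUE)
  )
}

# Toy data standing in for columns such as greece$rlgdgr
toy <- data.frame(a = c(7, 7, 8, 5, NA, 10, 7, 3),
                  b = c(4, 5, 5, NA, 2, 5, 6, 1))

# One row per variable, one column per statistic
results_toy <- t(sapply(toy, describe_var))
results_toy
```

Applied to the real data, the call would look like `t(sapply(greece[c("rlgdgr", "wpestopc", "wpestop", "yrbrn")], describe_var))`, assuming those columns have already been converted to numbers.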

The table shows descriptive statistics for several of our variables. Let’s take a closer look, starting with respondents’ religiosity (rlgdgr). The mean, median and mode are close to each other (between 6 and 7 on a scale from 0 to 10), so we can say that our respondents in Greece are quite religious, but not fanatical. For the next variable (wpestopc), the mean, median and mode lie between 4 and 5, so respondents lean towards the view that in their country the will of the people can still be stopped, but their position is not firm. The similar variable wpestop asks how important it is for democracy in general that the will of the people cannot be stopped, and here the results are quite different. Almost all respondents believe that this is extremely important for democracy (median 9, mode 10), and the comparatively small standard deviation shows that the answers were close to unanimous. As for year of birth, people of very different generations were interviewed: the difference between the minimum (1933) and the maximum (2006) is 73 years. The mean and the median year of birth are both about 1971, so the typical respondent is middle-aged rather than young.

1.6 Graphs

In this part of our project we create several graphs to look at relationships between variables or simply at their distributions. A description is provided under each graph.

1.6.1 Scatter plot

greece %>%
  filter(!is.na(wpestop), !is.na(rlgdgr)) %>%
  ggplot()+
  geom_point(aes(x = rlgdgr, y =wpestop), color = "#619EFD")+
  labs(title = "Relationship between the religiosity of people and their attitude towards\nthe statement 'the will of people cannot be stopped'", y = "Do you agree with the statement?", x = "religiosity")+
  scale_x_continuous(breaks = c(0,1,2,3,4,5,6,7,8,9,10))+
  theme_minimal()

1.6.1.1 Description:

In this graph we investigated whether people with varying degrees of religiosity agree that the will of the people cannot be stopped. Most respondents agreed with this statement regardless of their degree of religiosity. However, the graph also shows some outliers, for example a person who strongly disagrees with the statement but is quite religious. Later we can check and analyse this relationship with statistical tests.

1.6.2 Histogram for religiosity

greece %>%
  filter(!is.na(rlgdgr)) %>%
  ggplot(aes(x= rlgdgr))+
  geom_histogram(aes(y = ..density..), fill = "lightgreen", binwidth = 1)+
  geom_density(alpha=0.5, fill="#619EFD")+
  labs(title = "Frequency of people's level of religiosity", x = "Religiosity", y = "Density")+
  geom_vline(aes(xintercept=mean(rlgdgr, na.rm = T), color="mean"), size=1)+
  geom_vline(aes(xintercept=median(rlgdgr, na.rm = T), color="median"), size=1)+
  geom_vline(aes(xintercept=Mode(rlgdgr, na.rm = T), color="mode"), linetype = "dashed", size=1)+
  scale_color_manual(name = "Statistic:", values = c(mean = "blue", median = "coral", mode = "black"))+
  theme_minimal()

1.6.2.1 Description:

This graph shows how religiosity is distributed in Greece. Greece can be called quite a religious country: the bulk of respondents are above the “5” mark, that is, they rate themselves above the midpoint of the scale. At the same time, we cannot say that the religiosity of Greeks is fanatical, since far fewer people place themselves above “8”. This agrees with research on Greece and religiosity: “it is a country where the links between religion and national identity and church and state are strong enough to cloud expectations of changes at the nexus of religion and human rights”[1]; thus religion is quite important in Greece. For a better representation we overlaid a density curve on the histogram and marked the descriptive statistics (mean, median and mode) computed for this variable.

The distribution of this variable looks roughly normal but is skewed to the left. This is seen both from the graph and from the comparison of the mean and the median (the mean is smaller than the median).

1.6.3 Histogram for the year of birth of respondent

greece %>%
  filter(!is.na(yrbrn)) %>%
  ggplot(aes(x= yrbrn))+
  geom_histogram(aes(y = ..density..), fill = "lightgreen", binwidth = 1)+
  geom_density(alpha=0.5, fill="#619EFD")+
  geom_vline(aes(xintercept=mean(yrbrn, na.rm = T), color="mean"), size=1)+
  geom_vline(aes(xintercept=median(yrbrn, na.rm = T), color="median"), size=1)+
  geom_vline(aes(xintercept=Mode(yrbrn, na.rm = T), color="mode"), linetype = "dashed", size=1)+
  scale_color_manual(name = "Statistic:", values = c(mean = "blue", median = "coral", mode = "black"))+
  theme_minimal()+
  labs(title = "Frequency of variable 'year of birth of a respondent'", x = "Year of birth")

1.6.3.1 Description:

From this histogram it is actually hard to tell whether this variable is normally distributed. For now we only suppose that it is, and we will check normality in further projects. What we can say is that there is no noticeable skew: the mean and the median are practically the same.

1.6.4 Histogram for the variable “wpestop”

greece %>%
  filter(!is.na(wpestop)) %>%
  ggplot(aes(x= wpestop))+
  geom_histogram( fill = "lightgreen", binwidth = 1)+
  geom_vline(aes(xintercept=mean(wpestop, na.rm = T), color="mean"), size=1)+
  geom_vline(aes(xintercept=median(wpestop, na.rm = T), color="median"), size=1)+
  geom_vline(aes(xintercept=Mode(wpestop, na.rm = T), color="mode"), linetype = "dashed", size=1)+
  scale_color_manual(name = "Statistic:", values = c(mean = "blue", median = "coral", mode = "black"))+
  theme_minimal()+
  labs(title = "Frequency of occurrence of variable wpestop")

1.6.4.1 Description

Here it is still hard to tell whether the variable is normally distributed. However, we can tell that it is skewed to the left: most answers are at the top of the scale, but there are some people who strongly disagree with the statement about the will of the people.

1.6.5 Histogram for frequency of variable “wpestopc”

greece %>%
  filter(!is.na(wpestopc)) %>%
  ggplot(aes(x= wpestopc))+
  geom_histogram( fill = "lightgreen", binwidth = 1)+
  geom_vline(aes(xintercept=mean(wpestopc, na.rm = T), color="mean"), size=1)+
  geom_vline(aes(xintercept=median(wpestopc, na.rm = T), color="median"), size=1)+
  geom_vline(aes(xintercept=Mode(wpestopc, na.rm = T), color="mode"), linetype = "dashed", size=1)+
  scale_color_manual(name = "Statistic:", values = c(mean = "blue", median = "coral", mode = "black"))+
  theme_minimal()+
  labs(title = "Frequency of occurrence of variable wpestopc")

1.6.5.1 Description

In this case it is also hard to state whether the data are normally distributed. For now we will assume normality and check it later with proper tests. We can say that the data are slightly skewed to the right (the mean of the variable is larger than the median).

1.6.6 Bar chart

greece %>%
  filter(!is.na(rlgdnagr)) %>%
  ggplot()+
  geom_bar(aes(x= rlgdnagr, fill = rlgdnagr))+
  coord_flip() +
  labs(title = "The distribution of different religions\namong Greek people", y = "Number of people", x = "Religions")+
  scale_fill_brewer(palette = "PuBuGn")+
  theme_minimal()+
  theme(legend.position = "none")

1.6.6.1 Description:

This graph clearly shows which religion prevails. Greek Orthodoxy far outweighs all the other religions, as it is the state religion.

1.6.7 Boxplot

greece %>%
  filter(!is.na(imptrad), !is.na(rlgdgr)) %>%
  ggplot()+
  geom_boxplot(aes(x= imptrad, y = rlgdgr, fill = imptrad))+
  labs(title = "Religiosity of people according to their attitude towards\npreserving traditions and customs from religion or family", x = "Attitude towards preserving traditions",  y= "Level of religiosity")+
  scale_fill_brewer(palette = "PuBuGn")+
  theme_minimal()+
  theme(legend.position = "none")

1.6.7.1 Description:

This graph shows quite a diverse picture across groups. First, the medians differ between the groups. The lowest median is in the group of those who do not preserve traditions. This is quite predictable: religiosity is partly about preserving traditions, so people with a low level of religiosity are less likely to preserve them. The highest median is in the group of people who do preserve traditions, which is also expected. What is interesting is the spread in the group of people who do not preserve traditions: despite the low median level of religiosity, most values lie between 0 and about 7 points. So for some people preserving traditions is not only about religion, and even religious people may not preserve traditions.

1.6.8 Stacked bar chart

greece %>%
  filter(!is.na(dscrrlg), !is.na(rlgdnagr)) %>%
  ggplot()+
  geom_bar(aes(x = rlgdnagr, fill = dscrrlg), position = "fill")+
  coord_flip()+
  labs(title = "Discrimination of people\ndepending on their religion", x = "Religions", y = "Proportion of people", fill = "Presence of discrimination")+
  scale_fill_brewer(palette = "PuBuGn")+
  theme_minimal()

1.6.8.1 Description:

According to our data, there are indeed manifestations of discrimination against religious groups in Greece. The religions that are discriminated against are mainly those that differ significantly from the state religion of Greece. Consider Islam, which shows a pronounced share of discrimination: Greeks emphasize the cultural differences between people who practice Islam and people who practice Greek Orthodoxy. For example, according to one survey, “words such as ‘burqa’, ‘Islamic veil’ and ‘jihad’ create a negative impression among 67%, 62% and 53% of respondents respectively”[2]. Greeks also consider Islam a more dangerous religion: “1 in 2 (51%) Greeks believe that Islam leads more easily to violence than other religions, including Christianity” [2].

1.7 Summary

In this project our team took a first step in analysing religion in Greece. We looked at the variables that we found relevant for our study and found some interesting patterns (such as the one discussed in the description of the scatter plot), which we plan to examine with statistical tests later.

1.8 References:

1 - Giordan, G., Zrinščak, S. Global Eastern Orthodoxy. 1st ed. Switzerland: Springer Nature Switzerland AG, 2020. 264 p. (https://doi.org/10.1007/978-3-030-28687-3)

2 - Greek public opinion on Islam and construction of a mosque in Athens. Electronic text // Public Issue: [website]. URL: https://www.publicissue.gr/en/wp-content/uploads/2010/12/islam-2010.pdf

3 - Dr, S. N. Religion and Its Role in Society / S. N. Dr. Print text // IOSR Journal Of Humanities And Social Science. 2015. Volume 20, Issue 11, Ver. IV (Nov. 2015). P. 82-85.

2 Project 2

2.1 Chi-square test

2.1.1 Intro

We begin with the chi-square test. With this test we want to find out whether there is a relationship between the variable rlgblg (whether a person belongs to a certain religion) and the variable volunfp (whether a person has volunteered for a non-profit or charitable organization lately). These two variables are of high interest to us because the link between religiosity and volunteering has been studied before, and studies more often report no direct connection between belonging to a religion and taking part in volunteer activities. According to one study [1], a relationship between religion and volunteering does exist, but it is indirect and quite complex: religious people are not always engaged in volunteer activities.

2.1.2 Checking assumptions

levels(greece$rlgblg)

## [1] "Yes" "No"

levels(greece$volunfp)

## [1] "Yes" "No"

The first assumption of the chi-square test is that both variables are categorical. The output above shows that both of our variables are indeed factors with two categories each.

After this, we create a contingency table of our two variables, where rows are the levels of rlgblg (whether a person belongs to a certain religion) and columns are the levels of volunfp (whether a person volunteered for a charitable organization), in order to check the assumption about the frequency of data in the cells.

table(greece$rlgblg, greece$volunfp) %>%
  kable()

	Yes	No
Yes	183	2339
No	46	190

We see that we have enough data in the cells to conduct the chi-square test.

One more assumption is the independence of observations. As we know, the ESS does not survey the same people before and after some event or experiment, so we can assume that each observation corresponds to a different person.

Another assumption is that the categories are mutually exclusive, meaning that each respondent fits into only one level of each variable. As we saw above, each of our variables has only two levels, Yes and No, which cannot overlap. The data are mutually exclusive.

2.1.3 Chi-square test & hypotheses

Let’s formulate our null and alternative hypotheses:

H0: The observed and the expected frequencies of the variables rlgblg and volunfp are the same, so there is no relationship between rlgblg and volunfp.

HA: There is a statistically significant difference between the observed and expected frequencies of the variables rlgblg and volunfp, so the variables are related.

Now let’s run the chi-square test and see what we can say about the relationship between these two variables.

chisq.test(greece$rlgblg, greece$volunfp)

## 
##  Pearson's Chi-squared test with Yates' continuity correction
## 
## data:  greece$rlgblg and greece$volunfp
## X-squared = 40.841, df = 1, p-value = 1.651e-10
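
Because the contingency table is printed above, the test can be re-run from the four counts alone, without access to the ESS file. The sketch below assumes only those counts; the object names `tab` and `res` are illustrative.

```r
# Counts copied from the contingency table above
# (rows: rlgblg, columns: volunfp)
tab <- matrix(c(183, 46, 2339, 190), nrow = 2,
              dimnames = list(rlgblg = c("Yes", "No"),
                              volunfp = c("Yes", "No")))

# For a 2x2 table chisq.test() applies Yates' continuity correction by default
res <- chisq.test(tab)
res$statistic
res$p.value

# Share of volunteers within each group (row proportions)
round(prop.table(tab, margin = 1), 3)
```

The row proportions are a useful companion to the p-value, because they show the direction of the difference between the two groups.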

2.1.4 Inspecting expected values

After running the chi-square test we get a very small p-value, which means that we should reject the null hypothesis. But let’s first check whether the assumption about the minimum expected count (at least 5) in each cell is met, to understand whether we can trust our test.

chisq.test(greece$rlgblg, greece$volunfp)$expected %>%
  kable()

	Yes	No
Yes	209.40464	2312.5954
No	19.59536	216.4046

Each cell has an expected count above 5, so we can trust our test. We now move on to see which categories contributed most to the result. To do this, we look at the standardized residuals.

2.2 Inspecting standardized residuals

chisq.test(greece$rlgblg, greece$volunfp)$stdres %>%
  kable()

	Yes	No
Yes	-6.514048	6.514048
No	6.514048	-6.514048

In our case every cell contributed strongly to the result of the test (each standardized residual is about 6.5 in absolute value), meaning:

we have far fewer people who belong to a religion and volunteer than expected
we have far more people who belong to a religion and do not volunteer than expected
we have far more people who do not belong to any religion and volunteer than expected
we have far fewer people who do not belong to any religion and do not volunteer than expected
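
The expected counts and standardized residuals reported above can also be reproduced by hand from the observed counts. This is a sketch using the standard formulas: the expected count is row total times column total divided by n, and the standardized residual is (observed - expected) / sqrt(expected * (1 - row share) * (1 - column share)). The variable names are illustrative.

```r
# Observed counts from the contingency table (rows: rlgblg, columns: volunfp)
tab <- matrix(c(183, 46, 2339, 190), nrow = 2,
              dimnames = list(rlgblg = c("Yes", "No"),
                              volunfp = c("Yes", "No")))

n     <- sum(tab)
row_p <- rowSums(tab) / n   # share of each rlgblg level
col_p <- colSums(tab) / n   # share of each volunfp level

# Expected counts under independence
expected <- outer(row_p, col_p) * n

# Standardized residuals
stdres <- (tab - expected) / sqrt(expected * outer(1 - row_p, 1 - col_p))

round(expected, 2)
round(stdres, 3)
```

In a 2x2 table all four standardized residuals have the same absolute value, which is why the table above shows the same number with alternating signs.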

We therefore reject the null hypothesis and accept the alternative: there is a relationship between belonging to a religion and volunteering, and this relationship is inverse. People who belong to a religion volunteered less than expected, while people who do not belong to any religion volunteered more than expected. In the contingency table, about 7.3% of those who belong to a religion volunteered (183 of 2522), compared with about 19.5% of those who do not (46 of 236). This means that volunteering is not particularly common among Orthodox people in Greece (Orthodoxy being the main religion of this country). A possible explanation comes from one of the studies, which found that very few religions (according to that study, only the Catholic Church) draw the attention of parishioners to additional voluntary activities such as volunteering. Since the Orthodox Church prevails in Greece, we can understand why religious people there are not particularly inclined to volunteer [1].

2.2.1 Plot

Finally, let’s plot our variables and see how different the frequencies are:

plot_xtab(greece$rlgblg, greece$volunfp, 
          type = "bar", 
          margin = "row", 
          bar.pos = "stack", 
          legend.title = "Volunteering", 
          axis.titles = c("Belonging to religion", ""))

From the plot we can see, first of all, that there are more people who identify themselves as belonging to a religion, and that in each group most people do not volunteer, regardless of whether they belong to a religion. At the same time, the share of volunteers differs between the two groups, which is consistent with the relationship found by the test.

2.3 T-test

Moving on to the t-test! With this test we want to check whether the level of religiosity (variable rlgdgr) differs between genders (variable gndr). There are many studies on differences in the degree of religiosity between males and females. According to these articles, women place more value on safety, show stronger parental instincts, etc. [2]. This allows us to suggest that women are more religious than men, and we conduct a t-test to see whether this assumption is correct.

2.3.1 Skew and kurtosis

First, let’s look at our data and see what we can say about its normality. We use the function describeBy and look at the skew and kurtosis of rlgdgr in each group, as these two indicators show whether there are problems with the normality of our data.

describeBy(greece$rlgdgr, group = greece$gndr, mat = TRUE) %>%
  select(Var = group1, N = n, Mean = mean, SD = sd, Median = median, Min = min, Max = max, 
                Skew = skew, Kurtosis = kurtosis, st.error = se) %>% 
  kable()

	Var	N	Mean	SD	Median	Min	Max	Skew	Kurtosis	st.error
X11	Male	1329	5.851768	2.420389	6	0	10	-0.6140330	0.1836024	0.0663931
X12	Female	1455	6.698282	2.161395	7	0	10	-0.8028286	1.0211574	0.0566634

In general, the female group has larger absolute values of skew and kurtosis, but for both groups these values are within the acceptable range for normality (±2 for both). Let’s construct a histogram of the variable.

2.3.2 Histogram

ggplot(greece, aes(x = rlgdgr, fill = gndr))+
  geom_histogram(position = "identity", alpha = 0.3, binwidth = 1) +
  theme_minimal()+
  labs(x = "Degree of religiosity", y = "Frequencies", title = "Histogram for religiosity and gender", fill = "Gender")

From the histogram we can tell that both the female and the male group have a left-skewed distribution and that the female group has higher kurtosis. In general, though, the data are quite close to normal. Moreover, we have around 3000 values and more than 300 in each group, so we can rely on the central limit theorem and treat the mild non-normality as unproblematic.

2.3.3 Boxplot and homogeneity of variances

Let’s now create a boxplot and see whether the variances in the groups are similar.

ggplot(greece, aes(x = gndr, y = rlgdgr))+
  geom_boxplot()+
  labs(title = "Gender & level of religiosity", x = "Gender", y = "Religiosity degree")+
  theme_minimal()

From the boxplot the variances do not look equal, so let’s check this formally with two tests, Bartlett’s and Levene’s. In both tests H0 states that the variances in the groups are the same, and the alternative hypothesis states that there is a statistically significant difference between the variances.

bartlett.test(greece$rlgdgr ~ greece$gndr)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  greece$rlgdgr by greece$gndr
## Bartlett's K-squared = 17.796, df = 1, p-value = 2.458e-05

leveneTest(greece$rlgdgr ~ greece$gndr)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value    Pr(>F)    
## group    1  17.938 2.357e-05 ***
##       2782                      
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

In both tests the p-value is very small, so we reject the null hypothesis and accept the alternative: the variances in the groups are statistically different.
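
To make the Levene output less of a black box: the median-centered version of the test (the `center = median` variant shown in the output) is a one-way ANOVA on the absolute deviations of each value from its group median. The sketch below shows this on simulated toy data, because the ESS file is not assumed to be available; all names and numbers in it are made up for illustration.

```r
# Toy data: two groups with clearly different spread (not the ESS data)
set.seed(1)
toy <- data.frame(
  group = factor(rep(c("a", "b"), each = 50)),
  value = c(rnorm(50, sd = 1), rnorm(50, sd = 3))
)

# Absolute deviation of every value from the median of its own group
group_median <- ave(toy$value, toy$group, FUN = median)
toy$abs_dev  <- abs(toy$value - group_median)

# One-way ANOVA on the deviations: its F test is the Levene statistic
levene_fit <- anova(lm(abs_dev ~ group, data = toy))
levene_fit
```

A small p-value here means that the typical distance from the group median differs between groups, which is another way of saying that the spreads differ.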

2.3.4 Results of exploratory analysis

So far we have:

the distributions of the data in the groups are close to normal, and because each group has more than 300 values, the central limit theorem lets us ignore the remaining non-normality
the variances are not equal
we want to compare two independent groups, so a t-test for independent samples (Welch’s version, which does not assume equal variances) should be applied

2.3.5 Hypotheses and t-test

Let’s run the t-test! We want to test whether the difference in means (male minus female) is less than 0, so we set the argument alternative = “less”. We chose this one-sided alternative because we found an article which states that women are more religious than men: “Preliminary analyses showed that sex was significantly related to the three religiosity variables (church attendance, frequency of prayer, belief salience), with women being more religious than men. Consistent with previous research, correlations suggested that church attendance and belief salience were associated with better life satisfaction”[4]

H0: the difference between the means of the variable rlgdgr (degree of religiosity) in the male and female groups is 0

HA: the difference between the means in the male group and the female group is less than 0

t.test(greece$rlgdgr ~ greece$gndr, var.equal = FALSE, alternative = "less")

## 
##  Welch Two Sample t-test
## 
## data:  greece$rlgdgr by greece$gndr
## t = -9.6982, df = 2672.3, p-value < 2.2e-16
## alternative hypothesis: true difference in means between group Male and group Female is less than 0
## 95 percent confidence interval:
##        -Inf -0.7028917
## sample estimates:
##   mean in group Male mean in group Female 
##             5.851768             6.698282
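
The Welch statistic and its degrees of freedom can be recomputed from the group summaries in the describeBy table, which is a useful check on how the test works. The sketch below assumes only the means, standard deviations and group sizes printed earlier; the variable names are illustrative.

```r
# Group summaries copied from the describeBy table above
m <- c(Male = 5.851768, Female = 6.698282)  # means
s <- c(Male = 2.420389, Female = 2.161395)  # standard deviations
n <- c(Male = 1329,     Female = 1455)      # group sizes

# Squared standard error of each group mean
se2 <- s^2 / n

# Welch t statistic for the difference (Male - Female)
t_stat <- unname((m["Male"] - m["Female"]) / sqrt(sum(se2)))

# Welch-Satterthwaite degrees of freedom
df <- unname(sum(se2)^2 / sum(se2^2 / (n - 1)))

# One-sided p-value for alternative = "less"
p_less <- pt(t_stat, df)

c(t = t_stat, df = df)
p_less
```

The values agree with the t.test output above up to rounding of the inputs.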

We received a very small p-value, so we reject the null hypothesis and conclude that the difference in mean religiosity between males and females is less than 0; that is, women are more religious than men. But the distribution of our variable was not perfectly normal, so let’s double-check the result of the t-test with a non-parametric test.

2.3.6 Double-checking with non-parametric test

wilcox.test(greece$rlgdgr ~ greece$gndr, data = greece)

## 
##  Wilcoxon rank sum test with continuity correction
## 
## data:  greece$rlgdgr by greece$gndr
## W = 764298, p-value < 2.2e-16
## alternative hypothesis: true location shift is not equal to 0

The result is the same: the p-value is very small, so the difference in religiosity between the two groups is statistically significant. But how large is this difference? Let’s calculate the effect size.
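The call above runs a two-sided test, whereas the t-test was one-sided. A one-sided version can be requested with alternative = "less", which in the formula interface asks whether the first factor level is shifted below the second. The sketch below uses made-up toy data, not the ESS sample, and assumes that Male is the first level of the grouping factor, as in the t-test output.

```r
# Toy data (hypothetical values, not the ESS sample): two small groups, no ties
toy <- data.frame(
  score  = c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10),
  gender = factor(rep(c("Male", "Female"), each = 5),
                  levels = c("Male", "Female"))
)

# Two-sided test, the default used in the call above
two_sided <- wilcox.test(score ~ gender, data = toy)

# One-sided test: is the first level (Male) shifted below the second (Female)?
one_sided <- wilcox.test(score ~ gender, data = toy, alternative = "less")

c(two_sided = two_sided$p.value, one_sided = one_sided$p.value)
```

With the real data the same argument could be added to the call above, so that the non-parametric check tests the same directional hypothesis as the t-test.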

2.3.7 Effect size

psych::cohen.d(greece$rlgdgr, greece$gndr)

## Call: psych::cohen.d(x = greece$rlgdgr, group = greece$gndr)
## Cohen d statistic of difference between two means
##      lower effect upper
## [1,]  0.29   0.37  0.44
## 
## Multivariate (Mahalanobis) distance between groups
## [1] 0.37
## r equivalent of difference between two means
## data 
## 0.18

interpret_cohens_d(0.37, rules = "cohen1988")

## [1] "small"
## (Rules: cohen1988)

We got a Cohen’s d of 0.37. According to Cohen’s (1988) rules this is a small effect. However, we are social scientists, and in social science effects of this size are common and can be meaningful; 0.37 is also close to 0.4, which can be considered a medium effect. Thus, we can state that there is a statistically significant difference between the religiosity of men and women, and it is not that small!
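For readers who want to see what stands behind these numbers, here is a minimal base R sketch of Cohen’s d with a pooled standard deviation, together with a common conversion from d to an r-type effect size. The helper names cohens_d_pooled() and d_to_r() are hypothetical, the toy values are made up, and the conversion assumes roughly equal group sizes; psych::cohen.d may differ in details.

```r
# Cohen's d for two groups using the pooled standard deviation
cohens_d_pooled <- function(x, g) {
  keep <- !is.na(x) & !is.na(g)                      # drop missing values
  groups <- split(x[keep], droplevels(factor(g[keep])))
  stopifnot(length(groups) == 2)                     # exactly two groups expected
  n <- lengths(groups)
  m <- vapply(groups, mean, numeric(1))
  v <- vapply(groups, var, numeric(1))
  s_pooled <- sqrt(((n[1] - 1) * v[1] + (n[2] - 1) * v[2]) / (n[1] + n[2] - 2))
  unname((m[2] - m[1]) / s_pooled)                   # second group minus first
}

# Convert d to an r-type effect size (equal group sizes assumed)
d_to_r <- function(d) d / sqrt(d^2 + 4)

# Toy check (hypothetical values): means 2 and 4, pooled SD 1
cohens_d_pooled(c(1, 2, 3, 3, 4, 5), rep(c("a", "b"), each = 3))

# r equivalent of the d reported above
round(d_to_r(0.37), 2)
```

Under these assumptions the conversion of d = 0.37 gives about 0.18, which agrees with the "r equivalent" line in the output above.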

2.4 Anova

The next test that we want to run is ANOVA. Here we want to see whether people who pray with different frequency apart from religious services (variable pray) differ in age (variable agea). We decided to test these variables because there is research [3] on age and private prayer which shows that people tend to pray more apart from religious services as they age, so we want to see whether this also holds for people in Greece.

First of all, let’s check some of the ANOVA assumptions. We start by looking at the variables themselves:

describeBy(greece$agea, greece$pray, mat = TRUE) %>%
    select(Var = group1, N = n, Mean = mean, SD = sd, Median = median, Min = min, Max = max, 
                Skew = skew, Kurtosis = kurtosis, st.error = se) %>% 
  kable()

	Var	N	Mean	SD	Median	Min	Max	Skew	Kurtosis	st.error
X11	Every day	787	55.70648	16.74465	56	16	89	-0.1240224	-0.8961411	0.5968822
X12	More than once a week	517	52.45068	16.79886	52	15	89	-0.0295332	-0.7750071	0.7388128
X13	Once a week	277	51.67870	16.60806	51	18	89	-0.0018454	-0.9178224	0.9978814
X14	At least once a month	259	46.21236	15.62389	45	18	89	0.2561993	-0.4463882	0.9708212
X15	Only on special holy days	217	44.81567	16.53260	43	17	89	0.4110129	-0.5719453	1.1223062
X16	Less often	478	45.45607	15.77054	45	17	89	0.1571551	-0.6515749	0.7213275
X17	Never	171	45.67836	16.32993	45	16	81	0.1758495	-0.8626348	1.2487808

In this table we are again interested in the skew and kurtosis: in every group they lie between -2 and 2, which indicates that the distribution of age within the groups is approximately normal. Now let’s look at the variances.

2.4.1 Boxplot and homogeneity of variances

greece%>%
  filter(!is.na(pray)) %>%
  ggplot(aes(x = pray, y = agea))+
  geom_boxplot()+
  theme_minimal()+
  theme(axis.text.x = element_text(angle=25))+
  labs(x = "Pray", y = "Age of respondent", title = "Boxplot for age and how often\npray apart from religious services")

From the boxplot we can see that people who pray every day tend to be the oldest, while those who pray only on special holy days tend to be the youngest. Let’s check the homogeneity of variances with two tests.

bartlett.test(greece$agea ~ greece$pray)

## 
##  Bartlett test of homogeneity of variances
## 
## data:  greece$agea by greece$pray
## Bartlett's K-squared = 3.999, df = 6, p-value = 0.6768

leveneTest(greece$agea ~ greece$pray)

## Levene's Test for Homogeneity of Variance (center = median)
##         Df F value Pr(>F)
## group    6  1.0756 0.3746
##       2699

In both cases we got a high p-value (0.68 and 0.37), which means that we cannot reject the null hypothesis of equal variances, so we treat the variances as homogeneous. Now it is time for the F-test.

2.4.2 F-test

H0: there is no difference in means of variable agea between the groups of variable pray.

HA: there is at least one pair with statistically significant difference in means.

oneway.test(greece$agea ~ greece$pray, var.equal = TRUE)

## 
##  One-way analysis of means
## 
## data:  greece$agea and greece$pray
## F = 31.871, num df = 6, denom df = 2699, p-value < 2.2e-16

aov(greece$agea ~ greece$pray) -> anov
summary(anov)

##               Df Sum Sq Mean Sq F value Pr(>F)    
## greece$pray    6  51587    8598   31.87 <2e-16 ***
## Residuals   2699 728111     270                   
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 93 observations deleted due to missingness

In both cases we get the same result: a very small p-value, so we reject the null hypothesis and conclude that there is at least one pair of groups whose means differ. Before moving on, let’s check the normality of residuals.
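As a check on the output above, the F statistic can be reproduced by hand from the sums of squares and degrees of freedom in the ANOVA table (rounded values, so the result is approximate):

\[ F = \frac{SS_{between}/df_{between}}{SS_{within}/df_{within}} = \frac{51587/6}{728111/2699} \approx \frac{8597.8}{269.8} \approx 31.87 \]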

2.4.3 Normality of residuals

plot(anov, 2)

anova_res <- residuals(object = anov)
describe(anova_res) %>%
  kable()

	vars	n	mean	sd	median	trimmed	mad	min	max	range	skew	kurtosis	se
X1	1	2706	0	16.40648	-0.450677	-0.0952724	18.93573	-39.70648	44.18433	83.89081	0.0459171	-0.7565277	0.3153925

Although the Q-Q plot shows some deviations at both ends, the vast majority of residuals lie along the line. The skew and kurtosis are also both smaller than 2 in absolute value, which indicates an approximately normal distribution.

2.4.4 Post-hoc test

Now that we have agreed that the distribution of residuals is approximately normal, we want to see exactly which groups contribute to the significant difference. For this we have to perform a post-hoc test. We established earlier that our variances are equal, so in our case we use Tukey’s post-hoc test.

par(mar = c(5, 19, 3, 0))
TukeyHSD(anov)->post.hoc
plot(post.hoc, las = 2)

Tukey’s post-hoc test shows that many pairs differ significantly. First we list the pairs that do not differ significantly, that is, those whose intervals cross the dashed line:

once a week/more than once a week
only on special holy days/at least once a month
less often/at least once a month
never/ at least once a month
less often/only on special holy days
never/only on special holy days
never/less often

And these are the pairs that have a statistically significant difference in age. The plot alone does not show the direction of each difference clearly, but the group means in the descriptive table above do: in every pair listed here, the group that prays more often is older on average.

More than once a week-Every day
Once a week-Every day
At least once a month-Every day
Only on special holy days-Every day
Less often-Every day
Never-Every day
At least once a month-More than once a week
Only on special holy days-More than once a week
Less often-More than once a week
Never-More than once a week
At least once a month-Once a week
Only on special holy days-Once a week
Less often-Once a week
Never-Once a week

To make sure that our result is robust, we also run a non-parametric test.

2.4.5 Non-parametric test

Let’s run the Kruskal-Wallis test.

H0: the mean ranks of the groups of variable pray are the same

HA: the mean ranks of groups are different

kruskal.test(greece$agea, greece$pray)

## 
##  Kruskal-Wallis rank sum test
## 
## data:  greece$agea and greece$pray
## Kruskal-Wallis chi-squared = 171.53, df = 6, p-value < 2.2e-16

The p-value is very small, so we reject the null hypothesis, which means that the non-parametric test leads to the same conclusion. Let’s run a non-parametric post-hoc test.

2.4.6 Non-parametric post-hoc test

DunnTest(agea ~ pray, data = greece)

## 
##  Dunn's test of multiple comparisons using rank sums : holm  
## 
##                                                 mean.rank.diff    pval    
## More than once a week-Every day                     -144.54817 0.00973 ** 
## Once a week-Every day                               -177.94681 0.00973 ** 
## At least once a month-Every day                     -429.61735 3.1e-13 ***
## Only on special holy days-Every day                 -497.74128 1.9e-15 ***
## Less often-Every day                                -457.36719 < 2e-16 ***
## Never-Every day                                     -446.61575 2.2e-10 ***
## Once a week-More than once a week                    -33.39864 1.00000    
## At least once a month-More than once a week         -285.06918 2.5e-05 ***
## Only on special holy days-More than once a week     -353.19310 3.6e-07 ***
## Less often-More than once a week                    -312.81901 4.7e-09 ***
## Never-More than once a week                         -302.06757 0.00014 ***
## At least once a month-Once a week                   -251.67054 0.00213 ** 
## Only on special holy days-Once a week               -319.79447 8.2e-05 ***
## Less often-Once a week                              -279.42037 3.0e-05 ***
## Never-Once a week                                   -268.66894 0.00406 ** 
## Only on special holy days-At least once a month      -68.12393 1.00000    
## Less often-At least once a month                     -27.74983 1.00000    
## Never-At least once a month                          -16.99840 1.00000    
## Less often-Only on special holy days                  40.37409 1.00000    
## Never-Only on special holy days                       51.12553 1.00000    
## Never-Less often                                      10.75144 1.00000    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Here the same pairs differ significantly and the same pairs do not, so the non-parametric post-hoc test agrees with Tukey’s post-hoc test.
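The header of the DunnTest output shows that the p-values were adjusted with the Holm method, which is also the likely reason why several of them are reported as exactly 1 (adjusted values are capped at 1). As a rough illustration of what this adjustment does, base R’s p.adjust() can be applied to a few raw p-values. The numbers below are made up and are not taken from our data.

```r
# Hypothetical raw p-values from three pairwise comparisons
p_raw <- c(0.01, 0.04, 0.03)

# Holm: sort the p-values, multiply the smallest by 3, the next by 2,
# the largest by 1, and keep the adjusted values non-decreasing (max 1)
p_holm <- p.adjust(p_raw, method = "holm")
p_holm
```

Here the adjusted values are 0.03, 0.06 and 0.06, so only the first comparison would stay significant at the 0.05 level.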

2.4.7 Effect size

The last thing left to calculate is the effect size for both the parametric and the non-parametric ANOVA. For the parametric ANOVA the effect size is calculated with the function anova_stats:

anova_stats(anov)

## term        |   df |     sumsq |   meansq | statistic | p.value | etasq | partial.etasq | omegasq | partial.omegasq | epsilonsq | cohens.f | power
## --------------------------------------------------------------------------------------------------------------------------------------------------
## greece$pray |    6 | 51587.086 | 8597.848 |    31.871 |  < .001 | 0.066 |         0.066 |   0.064 |           0.064 |     0.064 |    0.266 |     1
## Residuals   | 2699 | 7.281e+05 |  269.771 |           |         |       |               |         |                 |           |          |

Of all these values we look at omega squared: 0.064 indicates a moderate effect size. Now let’s calculate the effect size for the non-parametric test with the function kruskal_effsize:

kruskal_effsize(greece, pray ~ agea)

## # A tibble: 1 × 5
##   .y.       n effsize method  magnitude
## * <chr> <int>   <dbl> <chr>   <ord>    
## 1 pray   2799  0.0658 eta2[H] moderate

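Here we got almost the same effect size, which again indicates a moderate effect. Note that in kruskal_effsize the outcome should stand on the left side of the formula (agea ~ pray, as in the Kruskal-Wallis test above), while the call shown has the two variables swapped. Thus, there is a statistically significant difference between the groups, but the effect is not very large.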
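The effect sizes can also be cross-checked by hand from numbers already reported above. The sketch below recomputes eta squared and omega squared from the ANOVA table and an eta squared based on the Kruskal-Wallis H statistic, using the usual formula (H - k + 1) / (n - k). It assumes that agea is the outcome and pray the grouping variable, and that the Kruskal-Wallis test used the same 2706 complete cases as the ANOVA; both are our assumptions, not output of the packages.

```r
# Values copied from the ANOVA table and the Kruskal-Wallis output above
ss_between <- 51587.086   # sum of squares for pray
ss_within  <- 728111      # residual sum of squares
df_between <- 6
df_within  <- 2699
h_stat     <- 171.53      # Kruskal-Wallis chi-squared

k <- df_between + 1       # number of pray groups (7)
n <- df_within + k        # complete cases in the ANOVA (2706), assumed for KW too

ms_within <- ss_within / df_within
eta_sq    <- ss_between / (ss_between + ss_within)
omega_sq  <- (ss_between - df_between * ms_within) /
             (ss_between + ss_within + ms_within)
eta_sq_h  <- (h_stat - k + 1) / (n - k)

round(c(eta_sq = eta_sq, omega_sq = omega_sq, eta_sq_h = eta_sq_h), 3)
```

The first two values reproduce the etasq (0.066) and omegasq (0.064) columns of anova_stats. The hand-computed eta squared for the Kruskal-Wallis test is about 0.061, slightly lower than the value printed above but still at the lower edge of the moderate range.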

2.5 Conclusions

After performing the chi-square test, we found that there is indeed a relationship between belonging to a religion and volunteering in non-profit organizations
After conducting a t-test, we found that the level of religiosity differs by gender: women are more religious than men
After conducting ANOVA, we found statistically significant age differences between groups defined by the frequency of praying apart from religious services (variable pray)

2.6 References:

1 - Wilson, J., & Janoski, T. (1995). The Contribution of Religion to Volunteer Work. Special Topics, General. 57. https://digitalcommons.unomaha.edu/slcestgen/57

2 - Modiri, F., & Azadarmaki, T. (2013). Gender and Religiosity. Journal of Applied Sociology, 24(3), 1-14.

3 - Markides, K. S. (1983). Aging, Religiosity, and Adjustment: A Longitudinal Analysis. Journal of Gerontology, 38(5), 621–625. https://doi.org/10.1093/geronj/38.5.621

4 - Leondari, A., & Gialamas, V. (2009). Religiosity and psychological well‐being. International Journal of Psychology, 44(4), 241–248. https://doi.org/10.1080/00207590701700529

3 Project 3

3.1 Introduction + choice of variables

Hello! This is team Люпин and this is our third project for the Data Analysis course. In this project, we want to analyze religious people’s views on social issues and see how this relationship is connected with some other factors (in particular, volunfp). There are articles suggesting that, in Greece, religious people have strong opinions on immigration [1], so we chose the opinion on this social issue as our main outcome variable. We would also like to explore the use of mass media as a factor in forming these opinions, so we include the netustm and nwspol variables along with the rlgdgr variable. We also look into the volunfp variable, since we expect that people who volunteer have a more positive attitude toward migrants, as they might be kinder and more understanding of their problems.

So, our research question is:

How does religiosity correspond with the opinions of Greek people on the social issue of migration?

Our research hypotheses would be:

The amount of time the person watches political news corresponds with their attitude towards migrants.

The amount of time the person spends on the internet corresponds with their attitude towards migrants.

The more religious the person is, the less they think in favor of migrants.

People who volunteer tend to think in favor of migrants more than people who don’t.

3.2 Describing variables

In this section we create a table with the types of the variables and their meaning.

labels2 = c("rlgdgr", "nwspol", "netustm", "imueclt", "volunfp")
meaning2 = c("How religious are you", "News about politics and current affairs, watching, reading or listening, in minutes", "Internet use, how much time on typical day, in minutes", "Country's cultural life undermined or enriched by immigrants", "Volunteered for non-profit or charitable organization")
type_of_variable2 = c("quasi-interval", "Ratio", "Ratio", "quasi-interval", "Binary")
variable_type <- data.frame(labels2, meaning2, type_of_variable2, stringsAsFactors = F)
kable(variable_type) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

labels2	meaning2	type_of_variable2
rlgdgr	How religious are you	quasi-interval
nwspol	News about politics and current affairs, watching, reading or listening, in minutes	Ratio
netustm	Internet use, how much time on typical day, in minutes	Ratio
imueclt	Country’s cultural life undermined or enriched by immigrants	quasi-interval
volunfp	Volunteered for non-profit or charitable organization	Binary

Then, we decided to explore the distribution of the variables and their mean, median and mode.

hist(greece$imueclt, col = "#619EFD", main = "The distribution of the imueclt variable")

hist(greece$rlgdgr, col = "#619EFD", main = "The distribution of the rlgdgr variable")

hist(greece$nwspol, col = "#619EFD", main = "The distribution of the nwspol variable")

hist(greece$netustm, col = "#619EFD", main = "The distribution of the netustm variable")

We can see that, although all the distributions roughly resemble a bell shape, some of them (netustm and nwspol) have long right tails, so they are more skewed and further from normal than the others.

Then, we look at the descriptive statistics.

mean_imueclt <- mean(greece$imueclt, na.rm = TRUE)
median_imueclt <- median(greece$imueclt, na.rm = TRUE)
mode_imueclt <- Mode(greece$imueclt, na.rm=TRUE)

mean_rlgdgr <- mean(greece$rlgdgr, na.rm = TRUE)
median_rlgdgr <- median(greece$rlgdgr, na.rm = TRUE)
mode_rlgdgr <- Mode(greece$rlgdgr, na.rm=TRUE)

mean_nwspol <- mean(greece$nwspol, na.rm = TRUE)
median_nwspol <- median(greece$nwspol, na.rm = TRUE)
mode_nwspol <- Mode(greece$nwspol, na.rm=TRUE)

mean_netustm <- mean(greece$netustm, na.rm = TRUE)
median_netustm <- median(greece$netustm, na.rm = TRUE)
mode_netustm <- Mode(greece$netustm, na.rm=TRUE)

Mean = c(mean_imueclt, mean_rlgdgr, mean_nwspol, mean_netustm)
Median = c(median_imueclt, median_rlgdgr, median_nwspol, median_netustm)
Mode = c(mode_imueclt, mode_rlgdgr, mode_nwspol, mode_netustm)
results <- data.frame(Mean,Median,Mode, stringsAsFactors = F)
rownames(results) <- c("imueclt", "rlgdgr", "nwspol", "netustm")
kable(results) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

	Mean	Median	Mode
imueclt	4.186865	4	5
rlgdgr	6.294181	7	7
nwspol	106.673874	60	60
netustm	195.464621	170	120

Let’s take a closer look at the table. On the basis of the descriptive statistics we can conclude that Greek residents tend to believe that migrants undermine local culture rather than enrich it: the mean (4.19) and the median (4) are both below the midpoint of the 0-10 scale. Next, let us turn to religiosity. People are quite religious (above the midpoint of the scale), but at the same time they are not fanatical: the mean is 6.29, and the median and mode are 7. For the news variable there are extreme values that differ from the majority of answers, as shown by the large gap between the mean (106.7) and the median (60). At the same time, people use the Internet actively, and here the gap between the mean (195.5) and the median (170) is smaller. What is also interesting is that people spend more time on the Internet than on following political news.

3.3 Correlation matrix

Then, we build a correlation matrix.

corgreece = data.frame(greece$imueclt,greece$rlgdgr,greece$nwspol,greece$netustm)

cor(corgreece, use="pairwise.complete.obs", method="spearman")

##                greece.imueclt greece.rlgdgr greece.nwspol greece.netustm
## greece.imueclt    1.000000000    -0.1906570    0.01391719    0.003663789
## greece.rlgdgr    -0.190656970     1.0000000    0.13212990   -0.102690452
## greece.nwspol     0.013917193     0.1321299    1.00000000    0.084149743
## greece.netustm    0.003663789    -0.1026905    0.08414974    1.000000000

For this project, we only look at the first column. Our outcome variable imueclt is practically uncorrelated with nwspol and netustm (coefficients close to 0) and has a weak negative correlation with rlgdgr (about -0.19). So, we take rlgdgr as our predictor.

3.4 Boxplot

Next, let’s graphically explore our categorical predictor: volunfp.

boxplot(data=greece, imueclt~volunfp, col = "#619EFD", main = "The distribution of imueclt by volunfp")

We can see that the median is slightly higher for people who volunteer. One possible explanation is that people who believe that immigration undermines the cultural life of the country are less inclined to volunteer for non-profit and charitable organizations, while those who believe that immigration enriches cultural life are more willing to support reconciliation and interaction between different cultures and communities. Perhaps this also reflects the influence of attitudes towards cultural life on participation in the arts.

3.5 Regression models

Then, we build two models: with only rlgdgr as a predictor (model1) and with rlgdgr and volunfp as predictors (model2). Then, to compare the model fit, we use r-squared and anova() and get the following results:

greece_regression <- greece %>% select(imueclt, rlgdgr, volunfp)

greece_regression <- drop_na(greece_regression)

model1 <- lm(data=greece_regression, imueclt~rlgdgr)
model2 <- lm(data=greece_regression, imueclt~rlgdgr+volunfp)

summary(model1)

## 
## Call:
## lm(formula = imueclt ~ rlgdgr, data = greece_regression)
## 
## Residuals:
##    Min     1Q Median     3Q    Max 
## -5.375 -1.616  0.143  1.574  6.522 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  5.37462    0.11971   44.90   <2e-16 ***
## rlgdgr      -0.18971    0.01787  -10.61   <2e-16 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.162 on 2716 degrees of freedom
## Multiple R-squared:  0.03982,    Adjusted R-squared:  0.03947 
## F-statistic: 112.6 on 1 and 2716 DF,  p-value: < 2.2e-16

summary(model2)

## 
## Call:
## lm(formula = imueclt ~ rlgdgr + volunfp, data = greece_regression)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.2771 -1.6228  0.1934  1.5610  6.5610 
## 
## Coefficients:
##             Estimate Std. Error t value Pr(>|t|)    
## (Intercept)  6.01757    0.17729  33.941  < 2e-16 ***
## rlgdgr      -0.18380    0.01784 -10.303  < 2e-16 ***
## volunfpNo   -0.74051    0.15115  -4.899 1.02e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.153 on 2715 degrees of freedom
## Multiple R-squared:  0.04823,    Adjusted R-squared:  0.04753 
## F-statistic:  68.8 on 2 and 2715 DF,  p-value: < 2.2e-16

anova(model1, model2)

## Analysis of Variance Table
## 
## Model 1: imueclt ~ rlgdgr
## Model 2: imueclt ~ rlgdgr + volunfp
##   Res.Df   RSS Df Sum of Sq      F    Pr(>F)    
## 1   2716 12700                                  
## 2   2715 12588  1    111.28 24.001 1.019e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

The adjusted R-squared is higher in the second model (0.048 vs 0.039) and the p-value of the F-test is far below 0.05, so we can say that the second model fits significantly better.
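The F statistic in the anova() table can be reproduced by hand from the residual sums of squares of the two nested models (a check on the output above, using its rounded values):

\[ F = \frac{(RSS_1 - RSS_2)/(df_1 - df_2)}{RSS_2/df_2} = \frac{111.28/1}{12588/2715} \approx \frac{111.28}{4.636} \approx 24.0 \]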

3.6 Interpretation

sjPlot::tab_model(model2)

	imueclt
Predictors	Estimates	CI	p
(Intercept)	6.02	5.67 – 6.37	<0.001
rlgdgr	-0.18	-0.22 – -0.15	<0.001
volunfp [No]	-0.74	-1.04 – -0.44	<0.001
Observations	2718
R² / R² adjusted	0.048 / 0.048

The adjusted R-squared is 0.048, so about 5% of the variance in the imueclt variable is explained by the model.

This is a small share: the model leaves most of the variance unexplained.

The equation:

\[ imueclt = 6.02 - 0.18 \cdot rlgdgr - 0.74 \cdot volunfp[No] \]

The intercept is 6.02 -> the predicted imueclt is 6.02 when rlgdgr is 0 and volunfp == Yes (the reference category).
The rlgdgr coefficient is -0.18 -> when rlgdgr increases by 1, imueclt decreases by 0.18, holding volunfp constant.
The volunfp coefficient is -0.74 -> imueclt is lower by 0.74 when volunfp == No than when volunfp == Yes, holding rlgdgr constant.
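To make the interpretation concrete, the fitted equation can be evaluated for specific respondents. The sketch below is only an illustration: the helper predict_imueclt() is hypothetical, it hard-codes the coefficients from summary(model2) above, and the example value of rlgdgr (7, the median) is our choice.

```r
# Coefficients copied from summary(model2) above
b0       <- 6.01757    # intercept: rlgdgr = 0, volunfp = Yes (reference level)
b_rlgdgr <- -0.18380   # change per one point of religiosity
b_no     <- -0.74051   # shift for volunfp = No relative to Yes

# Hypothetical helper: predicted imueclt for given predictor values
predict_imueclt <- function(rlgdgr, volunteers) {
  # !volunteers is TRUE (counted as 1) for non-volunteers
  b0 + b_rlgdgr * rlgdgr + b_no * (!volunteers)
}

# Two respondents with median religiosity (7): one volunteer, one not
pred_yes <- predict_imueclt(rlgdgr = 7, volunteers = TRUE)
pred_no  <- predict_imueclt(rlgdgr = 7, volunteers = FALSE)
round(c(volunteer = pred_yes, non_volunteer = pred_no), 2)
```

The two predictions (about 4.73 and 3.99) differ by exactly the volunfp coefficient. With the fitted object available, predict(model2, newdata = ...) would give the same values without copying coefficients by hand.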

3.7 Conclusion

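It seems that our assumptions were only partly correct. Religiosity and volunteering are both significantly related to the attitude towards migrants in the expected direction, but the relationship is weak (the model explains about 5% of the variance), and the time spent on news and on the internet is practically uncorrelated with the outcome. It seems to us that there could be other factors explaining the attitude towards migrants, such as political views or some socio-demographic and economic characteristics.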

3.8 References

1 - Karyotis, G., & Patrikios, S. (2010). Religion, securitization and anti-immigration attitudes: The case of Greece. Journal of Peace Research, 47(1), 43-57.

4 Project 4

Hello! This is team Люпин and, sadly, this is our last project for the Data Analysis course :( In this project we will build models with interaction and analyse them.

In this project we will describe how religiosity, internet use and agreement with the statement that the will of the people cannot be stopped affect the level of agreement with the statement that migrants enrich the cultural life of Greece.

So, our research question is: does the level of agreement with the statement that migrants enrich the cultural life of Greece depend on people’s level of religiosity and on their attitude towards the statement that “the will of the people cannot be stopped”?

Our main hypothesis is:

People with a higher level of religiosity believe more strongly that migrants enrich the culture of Greece and also believe more strongly that the will of the people cannot be stopped, compared with less religious people.

4.1 Variables used

mean_imueclt <- mean(greece$imueclt, na.rm = TRUE)
median_imueclt <- median(greece$imueclt, na.rm = TRUE)
mode_imueclt <- Mode(greece$imueclt, na.rm=TRUE)
sd_imueclt <- sd(greece$imueclt, na.rm = T)
min_imueclt <- min(greece$imueclt, na.rm = T)
max_imueclt <- max(greece$imueclt, na.rm = T)

mean_rlgdgr <- mean(greece$rlgdgr, na.rm = TRUE)
median_rlgdgr <- median(greece$rlgdgr, na.rm = TRUE)
mode_rlgdgr <- Mode(greece$rlgdgr, na.rm=TRUE)
sd_rlgdgr <- sd(greece$rlgdgr, na.rm = T)
min_rlgdgr <- min(greece$rlgdgr, na.rm = T)
max_rlgdgr <- max(greece$rlgdgr, na.rm = T)

mean_wpestop <- mean(greece$wpestop, na.rm = TRUE)
median_wpestop <- median(greece$wpestop, na.rm = TRUE)
mode_wpestop <- Mode(greece$wpestop, na.rm=TRUE)
sd_wpestop <-sd(greece$wpestop, na.rm = T)
min_wpestop <- min(greece$wpestop, na.rm = T)
max_wpestop <- max(greece$wpestop, na.rm = T)

mean_netustm <- mean(greece$netustm, na.rm = TRUE)
median_netustm <- median(greece$netustm, na.rm = TRUE)
mode_netustm <- Mode(greece$netustm, na.rm=TRUE)
sd_netustm <- sd(greece$netustm, na.rm = T)
min_netustm <- min(greece$netustm, na.rm = T)
max_netustm <- max(greece$netustm, na.rm = T)

Mean = c(mean_imueclt, mean_rlgdgr, mean_wpestop, mean_netustm)
Median = c(median_imueclt, median_rlgdgr, median_wpestop, median_netustm)
Mode = c(mode_imueclt, mode_rlgdgr, mode_wpestop, mode_netustm)
Sd = c(sd_imueclt, sd_rlgdgr, sd_wpestop, sd_netustm)
Min = c(min_imueclt, min_rlgdgr, min_wpestop, min_netustm)
Max = c(max_imueclt, max_rlgdgr, max_wpestop, max_netustm)
results <- data.frame(Mean,Median,Mode, Sd, Min, Max, stringsAsFactors = F)
rownames(results) <- c("imueclt", "rlgdgr", "wpestop", "netustm")
kable(results) %>% 
  kable_styling(bootstrap_options=c("bordered", "responsive","striped"), full_width = FALSE)

	Mean	Median	Mode	Sd	Max
imueclt	4.186865	4	5	2.210252	10
rlgdgr	6.294181	7	7	2.327025	10
wpestop	8.627204	9	10	1.424331	10
netustm	195.464621	170	120	136.999033	1200

As we see from the table, people in Greece mostly think that migrants undermine culture rather than enrich it (the median of imueclt is 4). People in Greece are quite religious, with a median level of religiosity of 7. People also tend to agree that the will of the people cannot be stopped, as the median of wpestop is 9. However, most people do not use the internet that much: the median of netustm is 170 minutes, while the maximum is 1200!

4.2 Model with interaction

mod = lm(imueclt~rlgdgr*wpestop + netustm, data = greece)
summary(mod)

## 
## Call:
## lm(formula = imueclt ~ rlgdgr * wpestop + netustm, data = greece)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -5.6798 -1.4679  0.2709  1.5531  6.6279 
## 
## Coefficients:
##                  Estimate Std. Error t value Pr(>|t|)    
## (Intercept)     1.8494977  0.8318691   2.223 0.026308 *  
## rlgdgr          0.4747813  0.1335205   3.556 0.000386 ***
## wpestop         0.3836468  0.0911896   4.207 2.70e-05 ***
## netustm        -0.0001021  0.0003525  -0.290 0.772119    
## rlgdgr:wpestop -0.0704328  0.0147948  -4.761 2.07e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 2.124 on 1974 degrees of freedom
##   (820 observations deleted due to missingness)
## Multiple R-squared:  0.03807,    Adjusted R-squared:  0.03612 
## F-statistic: 19.53 on 4 and 1974 DF,  p-value: 8.913e-16

So, in this model we received the following statistically significant results:

the estimate for the degree of religiosity (rlgdgr) is 0.47. Because the model includes an interaction, this is the effect of religiosity when wpestop equals 0: for such respondents, each additional point of religiosity increases the level of agreement that migrants enrich the cultural life of Greece by 0.47 (statistically significant, p < 0.001)
the estimate for the level of agreement with the statement that the will of the people cannot be stopped (wpestop) is 0.38. This is the effect of wpestop when rlgdgr equals 0: each additional point of wpestop increases the level of agreement with the statement that migrants enrich the cultural life of Greece by 0.38
the interaction estimate is -0.07: the effect of wpestop decreases by 0.07 with each additional point of religiosity and, equivalently, the effect of religiosity decreases by 0.07 with each additional point of wpestop. Thus:
with complete disagreement with the statement that the will of the people cannot be stopped (wpestop = 0), the level of agreement with the statement that migrants enrich the cultural life increases by 0.47 with each additional point of religiosity. At wpestop = 1, it increases by 0.40 (0.47 + (-0.07)*1) with each additional point of religiosity
the multiple R-squared equals 0.038, which means that our model explains only around 4% of the variance of the outcome, which is not much. Still, the very small p-value of the F-test means that the model as a whole is statistically significant
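Because of the interaction term, the effect of religiosity is not a single number but depends on wpestop. The sketch below computes this conditional (simple) slope at a few values of wpestop from the coefficients in summary(mod) above. The values 0, 1, 9 and 10 are chosen for illustration (9 is the median of wpestop in the table of variables), and the variable names are hypothetical.

```r
# Coefficients copied from summary(mod) above
b_rlgdgr      <- 0.4747813    # slope of rlgdgr when wpestop = 0
b_interaction <- -0.0704328   # change in that slope per point of wpestop

# Simple slope of rlgdgr at selected values of wpestop
wpestop_values <- c(0, 1, 9, 10)
simple_slopes  <- b_rlgdgr + b_interaction * wpestop_values
names(simple_slopes) <- paste0("wpestop=", wpestop_values)
round(simple_slopes, 2)

# Value of wpestop at which the slope of rlgdgr changes sign
crossover <- -b_rlgdgr / b_interaction
round(crossover, 2)
```

The slope is 0.47 at wpestop = 0 and 0.40 at wpestop = 1, but it changes sign at about wpestop = 6.7 and is negative at 9 and 10. So for respondents near the median of wpestop the model implies a negative effect of religiosity.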

4.3 Visualization of model

car::avPlots(mod, col = "black", lwd = 2)

With this visualization we can see how each of our predictors is related to our outcome variable (imueclt) after the other predictors are taken into account. The slope of the blue line indicates the direction of the relationship (positive/negative) for each variable.

We can see here that the level of religiosity and the level of agreement with the statement that the will of the people cannot be stopped both have positive relationships with the level of agreement that migrants enrich cultural life, while our interaction has a negative relationship with our outcome variable.

4.4 Comparing models

In this part of our project we compare the additive model with the same predictors but without the interaction and our model with moderation.

mod_ad = lm(imueclt~rlgdgr+wpestop + netustm, data = greece)

anova(mod_ad, mod)

## Analysis of Variance Table
## 
## Model 1: imueclt ~ rlgdgr + wpestop + netustm
## Model 2: imueclt ~ rlgdgr * wpestop + netustm
##   Res.Df    RSS Df Sum of Sq      F   Pr(>F)    
## 1   1975 9007.9                                 
## 2   1974 8905.7  1    102.25 22.664 2.07e-06 ***
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

sjPlot::tab_model(mod_ad, mod, show.ci = F)

	imueclt		imueclt
Predictors	Estimates	p	Estimates	p
(Intercept)	5.46	<0.001	1.85	0.026
rlgdgr	-0.15	<0.001	0.47	<0.001
wpestop	-0.02	0.602	0.38	<0.001
netustm	-0.00	0.485	-0.00	0.772
rlgdgr × wpestop			-0.07	<0.001
Observations	1979		1979
R² / R² adjusted	0.027 / 0.026		0.038 / 0.036

We compare our models with the anova test. The RSS of the second model, with the interaction, is smaller and the F-test is significant (p < 0.001), so this model fits our data better. The R^2 is also bigger for the model with interaction (0.038 vs 0.027), which means that it explains more of the variance. So, indeed, comparing these 2 models, the one with the interaction is better.

4.5 Interaction plots

#install.packages("interactions")
library(interactions)
interact_plot(mod, pred="rlgdgr",
                      modx = "wpestop")

The graph shows a negative relationship between the level of religiosity and the level of agreement with the statement that migrants enrich the cultural life.

sjPlot::plot_model(mod, type = "int") + 
    xlab("Level of religiosity") +
    ylab("Migrants enrich culture")

The graph shows how the level of religiosity affects the level of agreement with the statement about migrants and culture at different levels of agreement with the statement that the will of the people cannot be stopped. It should be noted that the results are statistically significant only up to a level of religiosity of around 6.

4.6 Final conclusion

Formula:

\[ imueclt = 1.85 + 0.47 \cdot rlgdgr + 0.38 \cdot wpestop - 0.0001 \cdot netustm - 0.07 \cdot rlgdgr \cdot wpestop \]

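Based on the analysis, we found that our hypothesis was not confirmed. The positive coefficient of rlgdgr (0.47) applies only to respondents with wpestop = 0. Because the interaction term is negative (-0.07), the effect of religiosity on agreement that migrants enrich the culture becomes weaker as agreement that the will of the people cannot be stopped grows, and it turns negative at high values of wpestop, where most respondents are (the median of wpestop is 9). So religiosity and faith in the will of the people do not reinforce each other in the way we supposed.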

Because there is little research on such a narrow topic, we, sadly, cannot relate our results to the literature.

This was our final project on the data analysis course! Hope you enjoyed our projects!

haven::write_sav(greece, "greece_final.sav")

Team Люпин!

Final project Data Analysis Course

team Люпин