library(foreign) # To read SPSS data -> read.spss
library(ggplot2) # Plotting system
library(plyr) # mapvalues function
library(gridExtra) # Grid plotting for ggplot2
## Loading required package: grid
The European Social Survey (ESS) is an academically driven cross-national survey that has been conducted every two years across Europe since 2001. The survey measures the attitudes, beliefs and behaviour patterns of diverse populations in more than thirty nations. Its sampling methodology follows these key principles:
The data used in this study corresponds to the year 2012. The study collected information on 586 variables across 24 countries, resulting in a data set that contains 44,243 observations.
The data can be downloaded in the formats of widely used statistical software packages (SAS, SPSS and STATA). In this case the SPSS file was downloaded and loaded into R using the ‘foreign’ package, which coerced some of the observations into NAs, namely those of respondents who refused to answer, did not know or did not answer. Reading the file also produces the warnings shown below. Although the missing responses could, perhaps, be useful to spot some relationships, given the scope of this study they will not be taken into consideration.
ess_raw <- read.spss('raw_data/ESS6e01_2.sav',
                     use.value.labels = TRUE,
                     to.data.frame = TRUE)
## Warning: raw_data/ESS6e01_2.sav: Unrecognized record type 7, subtype 18
## encountered in system file
## re-encoding from CP1252
## Warning: duplicated levels in factors are deprecated
## Warning: duplicated levels in factors are deprecated
# Create a data frame and include only the selected variables for the study
ess <- data.frame(ess_raw[c("cntry","rlgdgr", "freehms")])
# Omit NA values
ess <- na.omit(ess)
This observational study attempts to answer the question: is there a relationship between religiosity and homosexual intolerance? This will be done by investigating the correlation (or lack thereof) between the following variables:
# Order the factors within the variable
ess$rlgdgr <- ordered(ess$rlgdgr,
                      levels = c("Not at all religious", "1", "2", "3", "4", "5",
                                 "6", "7", "8", "9", "Very religious"))
# Transform to an ordinal scale from 0 to 10
ess$rlgdgr <- mapvalues(ess$rlgdgr,
                        from = c("Not at all religious", "Very religious"),
                        to = c("0", "10"))
# Order the factors within the variable
ess$freehms <- ordered(ess$freehms,
                       levels = c("Agree strongly", "Agree", "Neither agree nor disagree",
                                  "Disagree", "Disagree strongly"))
# Transform to an ordinal scale from 1 to 5
ess$freehms <- mapvalues(ess$freehms,
                         from = c("Agree strongly", "Agree", "Neither agree nor disagree",
                                  "Disagree", "Disagree strongly"),
                         to = c("1", "2", "3", "4", "5"))
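The recoding above can be checked on a small vector. The following is a minimal sketch in base R, assuming a handful of made-up answers (the `answers` vector is hypothetical); it renames the levels with `levels<-` rather than `mapvalues`, which keeps the level order intact:

```r
# Hypothetical answers that use the ESS labels for 'rlgdgr'
answers <- c("Not at all religious", "5", "Very religious", "9", "1")

# Same level order as in the study: the two text labels are the end points
lv <- c("Not at all religious", as.character(1:9), "Very religious")
x <- factor(answers, levels = lv, ordered = TRUE)

# Rename the levels to "0".."10"; the position of each level is unchanged
levels(x) <- as.character(0:10)

x
# Convert via character to recover the numeric score of each answer
scores <- as.integer(as.character(x))
scores
```

Going through `as.character` before `as.integer` matters: calling `as.integer` directly on the factor would return the internal level codes (1 to 11) instead of the scores 0 to 10.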
As the samples were collected to “be representative of all persons aged 15 and over (no upper age limit) resident within private households in each country, regardless of their nationality, citizenship or language”, the results can be generalized to the whole population aged 15 and over. One should be very careful when generalizing, however, as the pooled data merges observations from countries that are likely to be very different. Arguably, there are significant differences from country to country that can result in significantly biased interpretations of the European population as a whole.
The data, loaded initially for all variables, has been limited to include only the variables of interest plus the country. The 2,568 NA values (less than 5% of the total) have been omitted to facilitate the study, which results in a data set containing 41,757 valid observations.
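Because `na.omit` drops a whole row whenever any of its variables is missing, the number of NA values and the number of rows removed need not coincide. A minimal sketch on a made-up data frame (the values below are hypothetical and are not taken from the ESS file):

```r
# Hypothetical toy data: three columns named after the study variables
toy <- data.frame(
  cntry   = c("AA", "AA", "BB", "BB", "BB"),
  rlgdgr  = c("3", NA, "7", NA, "0"),
  freehms = c("Agree", NA, NA, "Disagree", "Agree")
)

# Individual missing cells across the whole data frame
na_values <- sum(is.na(toy))

# Rows removed by na.omit, i.e. rows with at least one NA
rows_dropped <- nrow(toy) - nrow(na.omit(toy))

na_values
rows_dropped
```

Here one row is missing both answers, so the count of NA values is larger than the count of rows dropped.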
p1 <- ggplot(ess, aes(x=rlgdgr)) +
  geom_bar(fill="#7dd1fa") +
  scale_y_continuous(limits=c(0,7000), breaks=seq(0,7000, 1000)) +
  xlab('0 = Not religious at all 10 = Very religious') +
  ylab('Number of respondents') +
  ggtitle('How religious are you?')
# freehms is a factor, so geom_bar (one bar per level) replaces geom_histogram
p2 <- ggplot(ess, aes(x=freehms)) +
  geom_bar(fill="#ffb980") +
  scale_y_continuous(limits=c(0,15250), breaks=seq(0,15000, 2500)) +
  xlab('1 = Agree strongly 5 = Disagree strongly') +
  ylab('Number of respondents') +
  ggtitle('Homosexuals free to live life as they wish')
grid.arrange(p1, p2, ncol=2)
The distribution of religiosity from 0 to 10 is bimodal, with peaks at 0 (not religious at all) and 5 (moderately religious), and rather flat otherwise. Homosexual intolerance is unimodal and strongly right-skewed, with most respondents choosing the values 1 and 2, “agree strongly” and “agree” respectively.
However, the distributions of both variables vary greatly from country to country and, in most cases, do not resemble the European aggregate shown in the charts above.
ggplot(ess, aes(x=rlgdgr)) +
  geom_bar(fill="#7dd1fa") +
  xlab('0 = Not religious at all 10 = Very religious') +
  ylab('Number of respondents') +
  ggtitle('How religious are you? - Country split') +
  facet_wrap(~cntry, ncol=6)
# freehms is a factor, so geom_bar (one bar per level) replaces geom_histogram
ggplot(ess, aes(x=freehms)) +
  geom_bar(fill="#ffb980") +
  xlab('1 = Agree strongly 5 = Disagree strongly') +
  ylab('Number of respondents') +
  ggtitle('Homosexuals free to live life as they wish - Country split') +
  facet_wrap(~cntry, ncol=6)
The proportion contingency table provides some insight into the relationship between the variables, which becomes clearer by visualizing it.
ess_table = table(ess$freehms, ess$rlgdgr)
round(prop.table(ess_table)* 100, 2)
##
##        0    1    2    3    4    5    6    7    8    9   10
##   1 7.33 2.53 2.98 2.92 2.07 4.66 2.91 3.17 2.51 0.91 1.20
##   2 5.11 1.89 2.62 3.10 2.65 6.26 3.79 4.25 3.71 1.38 1.50
##   3 1.79 0.72 0.80 1.00 0.89 2.28 1.43 1.68 1.62 0.75 0.94
##   4 0.94 0.40 0.49 0.55 0.47 1.30 0.85 1.05 1.12 0.56 0.93
##   5 0.81 0.28 0.32 0.47 0.37 1.20 0.68 0.89 1.02 0.61 1.34
percent_table = prop.table(ess_table, 2)
barplot(as.matrix(percent_table),
        col=c("#ffb97f", "#dcba99", "#b9bcb3", "#96becc", "#73c0e6"),
        xlab="How religious are you",
        ylab="Homosexuals free to live life as they wish",
        main="Views towards homosexuals by religiosity level",
        legend.text=TRUE,
        args.legend=list(x="left", bg="white")
)
The plot shows, for each level of religiosity, the proportion of respondents giving each answer, that is, the column proportions of the contingency table. The x axis runs along the levels of religiosity and the y axis along the views towards homosexuals. The proportion of respondents who strongly agreed with the statement “Homosexuals free to live life as they wish” appears to grow as the religiosity level approaches zero, “not religious at all”. Conversely, it shrinks as the religiosity level approaches 10, “very religious”.
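The difference between overall percentages and column proportions can be illustrated with the two extreme religiosity levels. This is a minimal sketch that copies the counts for levels 0 and 10 from the summary table printed by the chi-square test output further down; the object names `counts` and `col_prop` are only illustrative:

```r
# Counts copied from the summary table in the test output of this document:
# rows are the freehms answers 1 to 5, columns are religiosity levels 0 and 10
counts <- matrix(c(3060, 2134, 746, 392, 338,
                    501,  627, 393, 388, 558),
                 nrow = 5,
                 dimnames = list(freehms = as.character(1:5),
                                 rlgdgr  = c("0", "10")))

# margin = 2 divides every cell by its column total,
# so each religiosity level sums to 1
col_prop <- prop.table(counts, margin = 2)

round(col_prop * 100, 1)
```

Within level 0 roughly 46% of respondents agree strongly, against roughly 20% within level 10, which is the pattern the stacked bars display.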
The relationship between religiosity and homosexual intolerance can be evaluated using the chi-square independence test. The conditions for the applicability of the test are met:
The hypotheses for the test can be established as follows:
# Load inference function from the course repository
source("http://bit.ly/dasi_inference")
inference(y=ess$freehms,
          x=ess$rlgdgr,
          est="proportion",
          type="ht",
          method="theoretical",
          alternative="greater",
          siglevel=0.01)
## Response variable: categorical, Explanatory variable: categorical
## Chi-square test of independence
##
## Summary statistics:
##      x
## y         0     1     2     3     4     5     6     7     8     9    10
##   1    3060  1055  1243  1220   865  1945  1216  1322  1048   380   501
##   2    2134   791  1095  1295  1105  2615  1582  1774  1549   575   627
##   3     746   300   336   418   372   954   599   703   678   315   393
##   4     392   165   204   231   195   542   353   440   467   233   388
##   5     338   117   132   196   154   503   286   371   426   255   558
##   Sum  6670  2428  3010  3360  2691  6559  4036  4610  4168  1758  2467
##      x
## y       Sum
##   1   13855
##   2   15142
##   3    5814
##   4    3610
##   5    3336
##   Sum 41757
## H_0: Response and explanatory variable are independent.
## H_A: Response and explanatory variable are dependent.
## Check conditions: expected counts
##    x
## y         0      1      2      3      4      5      6      7      8      9
##   1  2213.1  805.6  998.7 1114.8  892.9 2176.3 1339.2 1529.6 1383.0  583.3
##   2  2418.7  880.5 1091.5 1218.4  975.8 2378.4 1463.5 1671.7 1511.4  637.5
##   3   928.7  338.1  419.1  467.8  374.7  913.2  562.0  641.9  580.3  244.8
##   4   576.6  209.9  260.2  290.5  232.6  567.0  348.9  398.6  360.3  152.0
##   5   532.9  194.0  240.5  268.4  215.0  524.0  322.4  368.3  333.0  140.4
##    x
## y        10
##   1   818.5
##   2   894.6
##   3   343.5
##   4   213.3
##   5   197.1
##
## Pearson's Chi-squared test
##
## data: y_table
## X-squared = 2413, df = 40, p-value < 2.2e-16
The inference function for the chi-square test of independence returns a p-value below 2.2e-16, which is far smaller than the 1% significance level applied and leads us to reject the null hypothesis. There is a statistically significant association between religiosity and views towards homosexuals.
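The same test can be run without the course helper. The sketch below rebuilds the contingency table from the counts printed in the summary statistics above and passes it to base R's `chisq.test`; the object names are illustrative, and the result should only be read as a cross-check of the output above:

```r
# Observed counts copied from the summary statistics above
# (rows: freehms 1 to 5, columns: rlgdgr 0 to 10)
counts <- matrix(
  c(3060, 1055, 1243, 1220,  865, 1945, 1216, 1322, 1048, 380, 501,
    2134,  791, 1095, 1295, 1105, 2615, 1582, 1774, 1549, 575, 627,
     746,  300,  336,  418,  372,  954,  599,  703,  678, 315, 393,
     392,  165,  204,  231,  195,  542,  353,  440,  467, 233, 388,
     338,  117,  132,  196,  154,  503,  286,  371,  426, 255, 558),
  nrow = 5, byrow = TRUE,
  dimnames = list(freehms = as.character(1:5),
                  rlgdgr  = as.character(0:10))
)

test <- chisq.test(counts)
test$parameter        # degrees of freedom: (5 - 1) * (11 - 1)
test$statistic        # Pearson's X-squared
test$p.value < 0.01   # compare against the 1% significance level

# Expected count under independence: row total * column total / grand total
manual <- sum(counts["1", ]) * sum(counts[, "0"]) / sum(counts)
all.equal(unname(test$expected["1", "0"]), manual)
```

The manual calculation for the first cell shows where the "expected counts" in the output come from: they are what each cell would hold if the answer to one question told us nothing about the answer to the other.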
The chi-square test of independence provides strong evidence to reject the null hypothesis and supports the alternative hypothesis: religiosity and views towards homosexuals are dependent. There is a correlation between the answers given to the statements “how religious are you” and “gays and lesbians free to live life as they wish”. Homosexual tolerance seems to increase as religiosity decreases, and vice versa. This, however, does not allow us to infer a causal link: the study is observational, and there might be other factors influencing the variables that have not been accounted for.
ggplot(ess, aes(x=rlgdgr, y=freehms)) +
  geom_jitter(alpha=0.1, colour='#73c0e6') +
  xlab("How religious are you") +
  ylab("Homosexuals free to live life as they wish")
Although not as intuitive as the visualization of the proportion contingency table, this correlation is also hinted at by the distribution of all observations across the levels of homosexual intolerance by the levels of religiosity. Each blue dot corresponds to a single observation. The number of respondents seems to increase towards the lower levels of both homosexual intolerance and religiosity.
Please note that this conclusion was reached by testing the relationship between the variables for the European countries included in the study as a whole. It cannot be concluded that the same relationship exists within each of the countries in the study. Future explorations could be aimed at analyzing the relationship within each country.
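One possible starting point for such a per-country analysis is to split the data by country and run the same test on each subset. The sketch below uses synthetic, randomly generated data with two made-up country codes, so its p-values say nothing about the real survey; with the actual data, `toy` would be replaced by the `ess` data frame:

```r
set.seed(42)  # reproducible synthetic data

# Hypothetical data: two invented countries with random, unrelated answers
n_per_country <- 2000
toy <- data.frame(
  cntry   = rep(c("AA", "BB"), each = n_per_country),
  rlgdgr  = factor(sample(0:10, 2 * n_per_country, replace = TRUE), levels = 0:10),
  freehms = factor(sample(1:5,  2 * n_per_country, replace = TRUE), levels = 1:5)
)

# One chi-square test of independence per country
p_values <- sapply(split(toy, toy$cntry), function(d) {
  chisq.test(table(d$freehms, d$rlgdgr))$p.value
})

p_values
```

With 24 countries, some of the tests would be expected to come out significant by chance alone, so the significance level would likely need an adjustment for multiple comparisons, and smaller national samples may not satisfy the expected-count condition in every cell.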
This correlation study was carried out as the final project for the course on Data Analysis and Statistical Inference. The whole project and data can be found on GitHub.