Religiosity and views towards homosexuality

library(foreign) # To read SPSS data -> read.spss
library(ggplot2) # Plotting system
library(plyr) # mapvalues function
library(gridExtra) # Grid plotting for ggplot2
## Loading required package: grid

Introduction

The European Social Survey (ESS) is an academically driven cross-national survey that has been conducted every two years across Europe since 2001. The survey measures the attitudes, beliefs and behaviour patterns of diverse populations in more than thirty nations. Its sampling methodology follows these key principles:

  • Samples must be representative of all persons aged 15 and over (no upper age limit) resident within private households in each country, regardless of their nationality, citizenship or language
  • Individuals are selected by strict random probability methods at every stage
  • Sampling frames of individuals, households and addresses may be used
  • All countries must aim for a minimum ‘effective achieved sample size’ of 1,500 (or 800 in countries with ESS populations of less than 2 million) after discounting for design effects
  • Quota sampling is not permitted at any stage
  • Substitution of non-responding households or individuals (whether ‘refusals’, ‘non-contacts’ or ‘ineligibles’) is not permitted at any stage
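
The ‘effective achieved sample size’ in the list above is commonly understood as the achieved sample size divided by the design effect. The following is a minimal sketch under that assumption; the function name `effective_n` and the figures are hypothetical and are not taken from the ESS documentation.

```r
# Effective sample size: achieved interviews divided by the design effect.
# `effective_n` and the example figures are illustrative assumptions.
effective_n <- function(n_achieved, deff) {
  stopifnot(n_achieved > 0, deff > 0)
  n_achieved / deff
}

# A hypothetical country with 2,250 interviews and a design effect of 1.5
n_eff <- effective_n(2250, 1.5)
n_eff >= 1500  # TRUE when the 1,500 minimum is met
```

A design effect above 1 means that clustering or unequal selection probabilities make the sample less informative than a simple random sample of the same size, so more interviews are needed to reach the target.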

The data used in this study corresponds to the year 2012. The study collected information on 586 variables across 24 countries, resulting in a data set that contains 44,243 observations.

The data can be downloaded in formats for widely used statistical software packages (SAS, SPSS and Stata). In this case the SPSS file was downloaded and loaded into R using the ‘foreign’ package, which coerced some of the observations into NA values, namely those of respondents who refused to answer, did not know or did not answer. Reading the file also raises the warnings shown below. Although the missing answers could, perhaps, reveal some relationships, they fall outside the scope of this study and will not be taken into consideration.

ess_raw <- read.spss('raw_data/ESS6e01_2.sav',
                     use.value.labels = TRUE,
                     to.data.frame = TRUE)
## Warning: raw_data/ESS6e01_2.sav: Unrecognized record type 7, subtype 18
## encountered in system file
## re-encoding from CP1252
## Warning: duplicated levels in factors are deprecated
## Warning: duplicated levels in factors are deprecated
# Create a data frame and include only the selected variables for the study
ess <- data.frame(ess_raw[c("cntry","rlgdgr", "freehms")])

# Omit NA values
ess <- na.omit(ess)

Data

This observational study will attempt to answer the question: is there a relationship between religiosity and homosexual intolerance? This will be done by investigating the correlation (or lack thereof) between the following variables:

  • rlgdgr - The explanatory variable, “how religious are you” is a categorical variable that follows an ordinal scale from 0, not at all religious, to 10, very religious. It contains a total of 43,820 valid observations and 423 missing cases.
# Order the factors within the variable
ess$rlgdgr <- ordered(ess$rlgdgr,
                      levels = c( "Not at all religious", "1", "2", "3", "4", "5",
                                  "6", "7", "8", "9", "Very religious"))

# Transform to an ordinal scale from 0 to 10
ess$rlgdgr <- mapvalues(ess$rlgdgr,
                        from=c("Not at all religious", "Very religious"),
                        to=c("0", "10"))
  • freehms - The response variable, “gays and lesbians free to live life as they wish” is a categorical variable that follows an ordinal scale from 1, agree strongly, to 5, disagree strongly. It contains a total of 42,098 valid observations and 2,145 missing cases. Note that it is originally ordered by increasing degree of intolerance, 1 being the most tolerant and 5 the least.
# Order the factors within the variable
ess$freehms <- ordered(ess$freehms,
                       levels = c("Agree strongly", "Agree", "Neither agree nor disagree",
                                  "Disagree", "Disagree strongly"))

# Transform to an ordinal scale from 1 to 5
ess$freehms <- mapvalues(ess$freehms,
                         from=c("Agree strongly", "Agree", "Neither agree nor disagree",
                                "Disagree", "Disagree strongly"),
                         to=c("1", "2", "3", "4", "5"))
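
As an alternative to relabelling the levels one by one, the integer codes of an ordered factor already give the position of each answer on the scale. The sketch below uses a small toy vector rather than the ESS data, and the object names are hypothetical.

```r
# Toy answers on the five-point agreement scale (not ESS data)
lvls <- c("Agree strongly", "Agree", "Neither agree nor disagree",
          "Disagree", "Disagree strongly")
toy <- ordered(c("Agree", "Disagree strongly", "Agree strongly"),
               levels = lvls)

# as.integer() returns the position of each value within `levels`
toy_codes <- as.integer(toy)

# Rebuild an ordered factor on the 1 to 5 scale
toy_recoded <- ordered(toy_codes, levels = 1:5)
```

For a scale that starts at 0, such as rlgdgr, the same idea applies after subtracting 1 from the integer codes.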

As samples have been collected to “be representative of all persons aged 15 and over (no upper age limit) resident within private households in each country, regardless of their nationality, citizenship or language”, we can generalize to the whole population aged 15 and over. One should be very careful, however, when generalizing the results, as the full data set merges data from countries that are likely to be very different. Arguably, there are significant differences from country to country that can result in significantly biased interpretations of the European population as a whole.


Exploratory data analysis

The data, loaded initially for all variables, has been limited to include only the variables of interest plus the country. The two variables contain 2,568 NA values between them. Omitting every row with at least one NA removes 2,486 rows (about 5.6% of the total) and results in a data set containing 41,757 valid observations.
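
The number of NA values and the number of rows dropped by `na.omit` can differ, because a row with missing answers in both variables is removed only once. A minimal sketch on a toy data frame (hypothetical values, not ESS data):

```r
# Toy data: row 2 misses rlgdgr, row 3 misses both, row 4 misses freehms
toy <- data.frame(cntry   = c("AA", "AA", "BB", "BB"),
                  rlgdgr  = c("5", NA, NA, "10"),
                  freehms = c("Agree", "Agree", NA, NA))

n_missing_values <- sum(is.na(toy))                 # NA cells in the table
n_dropped_rows   <- nrow(toy) - nrow(na.omit(toy))  # rows with any NA
```

Here there are four missing values but only three dropped rows.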

p1 <- ggplot(ess, aes(x=rlgdgr)) +
  geom_bar(fill="#7dd1fa") +
  scale_y_continuous(limits=c(0,7000), breaks=seq(0,7000, 1000)) +
  xlab('0 = Not religious at all 10 = Very religious') +
  ylab('Number of respondents') +
  ggtitle('How religious are you?')

# geom_bar counts the levels of a factor; recent ggplot2 versions
# reject geom_histogram on a discrete x variable
p2 <- ggplot(ess, aes(x=freehms)) +
  geom_bar(fill="#ffb980") +
  scale_y_continuous(limits=c(0,15250), breaks=seq(0,15000, 2500)) +
  xlab('1 = Agree strongly 5 = Disagree strongly') +
  ylab('Number of respondents') +
  ggtitle('Homosexuals free to live life as they wish')
  

grid.arrange(p1, p2, ncol=2)

plot of chunk full histograms

The distribution of religiosity from 0 to 10 is bimodal, with peaks at 0, not religious at all, and 5, moderately religious. The distribution is rather flat otherwise. Homosexual intolerance is unimodal and strongly right-skewed, with most respondents choosing the values 1 and 2, “agree strongly” and “agree” respectively.

However, the distributions of both variables vary greatly from country to country and, in most cases, do not resemble the European aggregate shown in the histograms above.

ggplot(ess, aes(x=rlgdgr)) +
  geom_bar(fill="#7dd1fa") +
  xlab('0 = Not religious at all 10 = Very religious') +
  ylab('Number of respondents') +
  ggtitle('How religious are you? - Country split') +
  facet_wrap(~cntry, ncol=6)

plot of chunk religiosity_country

ggplot(ess, aes(x=freehms)) +
  geom_bar(fill="#ffb980") +  # bar counts for a factor variable
  xlab('1 = Agree strongly 5 = Disagree strongly') +
  ylab('Number of respondents') +
  ggtitle('Homosexuals free to live life as they wish - Country split') +
  facet_wrap(~cntry, ncol=6)

plot of chunk views_towards_homosexuals_country

The proportion contingency table provides some insight into the relationship between the variables, which becomes clearer by visualizing it.

ess_table = table(ess$freehms, ess$rlgdgr)

round(prop.table(ess_table)* 100, 2)
##    
##        0    1    2    3    4    5    6    7    8    9   10
##   1 7.33 2.53 2.98 2.92 2.07 4.66 2.91 3.17 2.51 0.91 1.20
##   2 5.11 1.89 2.62 3.10 2.65 6.26 3.79 4.25 3.71 1.38 1.50
##   3 1.79 0.72 0.80 1.00 0.89 2.28 1.43 1.68 1.62 0.75 0.94
##   4 0.94 0.40 0.49 0.55 0.47 1.30 0.85 1.05 1.12 0.56 0.93
##   5 0.81 0.28 0.32 0.47 0.37 1.20 0.68 0.89 1.02 0.61 1.34
percent_table = prop.table(ess_table, 2)

barplot(as.matrix(percent_table),
        col=c("#ffb97f", "#dcba99", "#b9bcb3", "#96becc", "#73c0e6"),
        xlab="How religious are you",
        ylab="Homosexuals free to live life as they wish",
        main="Views towards homosexuals by religiosity level",
        legend.text=TRUE,
        args.legend=list(x="left", bg="white")
        )

plot of chunk proportion_contingency_table

The plot shows, for each level of religiosity, the proportion of respondents giving each answer. The x axis runs along the levels of religiosity and the y axis shows the share of each view towards homosexuals within that level. The proportion of respondents who strongly agreed with the statement “Homosexuals free to live life as they wish” appears to grow as the religiosity level approaches zero, “not religious at all”. Conversely, it shrinks as respondents declare themselves “very religious”.
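
The bars rely on column-wise proportions, that is, `prop.table` with `margin = 2`, so each religiosity level sums to 1. As a small illustration, the sketch below applies the same call to the counts for the two extreme religiosity levels, 0 and 10, copied from the summary statistics reported in the Inference section. The object names are hypothetical.

```r
# Counts for religiosity 0 and 10, rows are freehms answers 1 to 5
counts <- matrix(c(3060, 2134, 746, 392, 338,
                    501,  627, 393, 388, 558),
                 nrow = 5,
                 dimnames = list(freehms = as.character(1:5),
                                 rlgdgr  = c("0", "10")))

# margin = 2 divides every cell by its column total
col_share <- prop.table(counts, margin = 2)
round(col_share * 100, 1)
```

Comparing the first and last rows of `col_share` shows the two opposite trends described above within a single table.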


Inference

The relationship between religiosity and homosexual intolerance can be evaluated using the chi-square independence test. The conditions for the applicability of the test are met:

  • Sampled observations are independent.
  • Sampling has been done without replacement, and the sample is far smaller than 10% of the population.
  • Each case contributes to exactly one cell in the contingency table (see above).
  • Each cell has at least 5 expected cases (see the expected counts in the test output).
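
The expected-count condition can be checked by hand: under independence, the expected count of a cell is its row total times its column total divided by the grand total. The sketch below uses the marginal totals reported in the summary statistics of the test output; the object names are hypothetical.

```r
# Marginal totals taken from the summary statistics of the test output
row_totals <- c(13855, 15142, 5814, 3610, 3336)             # freehms 1 to 5
col_totals <- c(6670, 2428, 3010, 3360, 2691, 6559,
                4036, 4610, 4168, 1758, 2467)                # rlgdgr 0 to 10
n <- sum(row_totals)

# Expected counts under independence: (row total * column total) / n
expected <- outer(row_totals, col_totals) / n

min(expected) >= 5  # TRUE when every cell satisfies the condition
```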

The hypotheses for the test can be established as follows:

  • Null hypothesis: Religiosity and homosexual intolerance are independent. Homosexual intolerance does not vary by levels of religiosity.
  • Alternative hypothesis: Religiosity and homosexual intolerance are dependent. Homosexual intolerance does vary by levels of religiosity.
# Load inference function from the course repository
source("http://bit.ly/dasi_inference")
inference(y=ess$freehms,
          x=ess$rlgdgr,
          est="proportion",
          type="ht",
          method="theoretical",
          alternative="greater",
          siglevel=0.01)
## Response variable: categorical, Explanatory variable: categorical
## Chi-square test of independence
## 
## Summary statistics:
##      x
## y         0     1     2     3     4     5     6     7     8     9    10
##   1    3060  1055  1243  1220   865  1945  1216  1322  1048   380   501
##   2    2134   791  1095  1295  1105  2615  1582  1774  1549   575   627
##   3     746   300   336   418   372   954   599   703   678   315   393
##   4     392   165   204   231   195   542   353   440   467   233   388
##   5     338   117   132   196   154   503   286   371   426   255   558
##   Sum  6670  2428  3010  3360  2691  6559  4036  4610  4168  1758  2467
##      x
## y       Sum
##   1   13855
##   2   15142
##   3    5814
##   4    3610
##   5    3336
##   Sum 41757
## H_0: Response and explanatory variable are independent.
## H_A: Response and explanatory variable are dependent.
## Check conditions: expected counts
##    x
## y        0     1      2      3     4      5      6      7      8     9
##   1 2213.1 805.6  998.7 1114.8 892.9 2176.3 1339.2 1529.6 1383.0 583.3
##   2 2418.7 880.5 1091.5 1218.4 975.8 2378.4 1463.5 1671.7 1511.4 637.5
##   3  928.7 338.1  419.1  467.8 374.7  913.2  562.0  641.9  580.3 244.8
##   4  576.6 209.9  260.2  290.5 232.6  567.0  348.9  398.6  360.3 152.0
##   5  532.9 194.0  240.5  268.4 215.0  524.0  322.4  368.3  333.0 140.4
##    x
## y      10
##   1 818.5
##   2 894.6
##   3 343.5
##   4 213.3
##   5 197.1
## 
##  Pearson's Chi-squared test
## 
## data:  y_table
## X-squared = 2413, df = 40, p-value < 2.2e-16

plot of chunk inference_function

The inference function returns a p-value below 2.2e-16 for the chi-square test of independence. This is far below the 1% significance level applied, which leads us to reject the null hypothesis. There is an association between religiosity and views towards homosexuals.
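
The course's `inference` function is loaded from an external URL. As a cross-check that does not depend on it, the same test can be sketched with base R's `chisq.test`, here fed with the observed counts copied from the summary statistics above. The object names are hypothetical.

```r
# Observed counts: rows are freehms 1 to 5, columns are rlgdgr 0 to 10
counts <- matrix(c(
  3060, 1055, 1243, 1220,  865, 1945, 1216, 1322, 1048, 380, 501,
  2134,  791, 1095, 1295, 1105, 2615, 1582, 1774, 1549, 575, 627,
   746,  300,  336,  418,  372,  954,  599,  703,  678, 315, 393,
   392,  165,  204,  231,  195,  542,  353,  440,  467, 233, 388,
   338,  117,  132,  196,  154,  503,  286,  371,  426, 255, 558),
  nrow = 5, byrow = TRUE,
  dimnames = list(freehms = as.character(1:5),
                  rlgdgr  = as.character(0:10)))

res <- chisq.test(counts)
res$parameter        # degrees of freedom: (5 - 1) * (11 - 1) = 40
res$p.value < 0.01   # compare against the 1% significance level
```

On the real data frame the equivalent call would be `chisq.test(table(ess$freehms, ess$rlgdgr))`.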


Conclusion

The chi-square test of independence provides strong evidence to reject the null hypothesis and supports the alternative hypothesis: religiosity and views towards homosexuals are dependent. There is an association between the answers given to the statements “how religious are you” and “gays and lesbians free to live life as they wish”. Homosexual tolerance seems to increase as religiosity decreases, and vice versa. This, however, does not allow us to infer a causal link. There might be other factors influencing the variables that have not been accounted for.

ggplot(ess, aes(x=rlgdgr, y=freehms)) +
  geom_jitter(alpha=0.1, colour='#73c0e6') +
  xlab("How religious are you") +
  ylab("Homosexuals free to live life as they wish")

plot of chunk jitter_plot

Although not as intuitive as the visualization of the proportion contingency table, this relationship is also hinted at by the distribution of all observations across the levels of homosexual intolerance and religiosity. Each blue dot corresponds to a single observation. The number of respondents seems to increase towards the lower levels of both homosexual intolerance and religiosity.


Please note that this conclusion was reached by testing the relationship between the variables for the European countries included in the study as a whole. It cannot be assumed that the same relationship holds within each individual country. Future explorations could be aimed at analyzing the relationship within each country.
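
A per-country analysis could start by splitting the data by country and running one test per subset. The sketch below shows only the mechanics, on simulated data with two made-up country codes. The values are random and carry no information about the ESS results, and all object names are hypothetical.

```r
set.seed(42)  # simulated answers, for illustration only

toy <- data.frame(
  cntry   = rep(c("AA", "BB"), each = 2000),
  rlgdgr  = sample(0:10, 4000, replace = TRUE),
  freehms = sample(1:5, 4000, replace = TRUE))

# One chi-square test of independence per country
p_by_country <- sapply(split(toy, toy$cntry), function(d) {
  chisq.test(table(d$freehms, d$rlgdgr))$p.value
})
```

With the real data, `toy` would be replaced by `ess`. The expected-count condition would then need to be checked again for each country, since the per-country samples are much smaller than the pooled one.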

This correlation study was carried out as the final project for the course on Data Analysis and Statistical Inference. The whole project and data can be found on GitHub.