I created a small data set using Pew Research Center 2014 survey data describing the relationship between income and religion in the United States. The data show the proportion of sampled individuals from each religious tradition that falls into each of four income bands: less than $30K, $30K - $49,999, $50K - $99,999, and $100K or more.
Data source: Pew Research Center. (2014). “Income distribution by religious group”. Accessed from https://www.pewforum.org/religious-landscape-study/income-distribution/
I start by reading in the data and checking its structure.
library(tidyr)
library(dplyr)
library(ggplot2)
religion <- read.csv('https://raw.githubusercontent.com/chrosemo/data607_fall19_project2/master/pew.csv', header=TRUE)
religion
## Religious.tradition Less.than..30.000 X.30.000..49.999
## 1 Buddhist 36% 18%
## 2 Catholic 36% 19%
## 3 Evangelical Protestant 35% 22%
## 4 Hindu 17% 13%
## 5 Historically Black Protestant 53% 22%
## 6 Jehovah's Witness 48% 25%
## 7 Jewish 16% 15%
## 8 Mainline Protestant 29% 20%
## 9 Mormon 27% 20%
## 10 Muslim 34% 17%
## 11 Orthodox Christian 18% 17%
## 12 Unaffiliated (religious "nones") 33% 20%
## X.50.000..99.999 X.100.000.or.more Sample.Size
## 1 32% 13% 233
## 2 26% 19% 6,137
## 3 28% 14% 7,462
## 4 34% 36% 172
## 5 17% 8% 1,704
## 6 22% 4% 208
## 7 24% 44% 708
## 8 28% 23% 5,208
## 9 33% 20% 594
## 10 29% 20% 205
## 11 36% 29% 155
## 12 26% 21% 6,790
str(religion)
## 'data.frame': 12 obs. of 6 variables:
## $ Religious.tradition: Factor w/ 12 levels "Buddhist","Catholic",..: 1 2 3 4 5 6 7 8 9 10 ...
## $ Less.than..30.000 : Factor w/ 11 levels "16%","17%","18%",..: 9 9 8 2 11 10 1 5 4 7 ...
## $ X.30.000..49.999 : Factor w/ 8 levels "13%","15%","17%",..: 4 5 7 1 7 8 2 6 6 3 ...
## $ X.50.000..99.999 : Factor w/ 10 levels "17%","22%","24%",..: 7 4 5 9 1 2 3 5 8 6 ...
## $ X.100.000.or.more : Factor w/ 11 levels "13%","14%","19%",..: 1 3 2 8 11 9 10 6 4 4 ...
## $ Sample.Size : Factor w/ 12 levels "1,704","155",..: 6 9 11 3 1 5 12 7 8 4 ...
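The str() output shows that every column was read as a factor. Calling as.numeric() directly on a factor returns its internal level codes rather than the printed values, which is why the values need to pass through text first. A minimal sketch of the pitfall, using three sample sizes copied from the table above (the object names are my own):

```r
# Three sample sizes as they appear in the raw file, stored as a factor
sample_size <- factor(c("233", "6,137", "7,462"))

# Level codes (the position of each value among the factor levels), not the values
codes <- as.integer(sample_size)

# gsub() coerces the factor to character, so the commas can be stripped
# and the text converted to the intended numbers
parsed <- as.numeric(gsub(",", "", sample_size))

codes
parsed
```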
After renaming the income columns and converting the others, I gather the income-specific columns into an ‘Income.range’ column containing the income bands and a ‘Proportion’ column containing the values. Multiplying ‘Proportion’ by ‘Sample.Size’ would give estimated counts of sampled individuals, although the code here does not store that column.
# Give the income columns readable labels (new name = old name)
religion <- religion %>%
  rename('< $30,000' = 'Less.than..30.000',
         '$30,000 - $49,999' = 'X.30.000..49.999',
         '$50,000 - $99,999' = 'X.50.000..99.999',
         '> $100,000' = 'X.100.000.or.more')
religion$Religious.tradition <- as.character(religion$Religious.tradition)
# Strip thousands separators before converting to numeric
religion$Sample.Size <- as.double(gsub(',', '', religion$Sample.Size))
# Reshape wide to long: one row per tradition and income band
religion <- gather(religion, Income.range, Proportion, -Religious.tradition, -Sample.Size)
religion$Income.range <- factor(religion$Income.range, ordered=TRUE, levels=c('> $100,000','$50,000 - $99,999', '$30,000 - $49,999','< $30,000'))
# Strip the percent sign and rescale to a 0-1 proportion
religion$Proportion <- as.numeric(gsub('%', '', religion$Proportion))/100
head(religion)
## Religious.tradition Sample.Size Income.range Proportion
## 1 Buddhist 233 < $30,000 0.36
## 2 Catholic 6137 < $30,000 0.36
## 3 Evangelical Protestant 7462 < $30,000 0.35
## 4 Hindu 172 < $30,000 0.17
## 5 Historically Black Protestant 1704 < $30,000 0.53
## 6 Jehovah's Witness 208 < $30,000 0.48
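Estimated counts of sampled individuals per tradition and income band can be approximated as Sample.Size times Proportion. A minimal sketch, using three rows copied from the head() output above; the column name Estimated.count is my own label, and the results are approximate because the published percentages are rounded to whole numbers.

```r
# Three rows copied from the head() output above
religion_sample <- data.frame(
  Religious.tradition = c("Buddhist", "Catholic", "Hindu"),
  Sample.Size = c(233, 6137, 172),
  Income.range = "< $30,000",
  Proportion = c(0.36, 0.36, 0.17),
  stringsAsFactors = FALSE
)

# Estimated.count is a hypothetical column name; round to whole respondents
religion_sample$Estimated.count <-
  round(religion_sample$Sample.Size * religion_sample$Proportion)

religion_sample
```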
With no specific analysis requested, I carry out an exploratory data analysis. I start by checking the mean proportion falling in each income band. Averaging across the 12 religious traditions, with each tradition weighted equally regardless of sample size, approximately 31.8 percent fall below $30,000, approximately 19.0 percent fall between $30,000 and $49,999, approximately 27.9 percent fall between $50,000 and $99,999, and approximately 20.9 percent fall at or above $100,000.
Considering specific religions, Historically Black Protestant (0.53) and Jehovah’s Witness (0.48) have the highest proportions falling below $30,000, followed by Buddhist and Catholic, tied at 0.36, while Jewish (0.16), Hindu (0.17), and Orthodox Christian (0.18) have the lowest proportions. In the $100,000 and above band, Jewish (0.44), Hindu (0.36), and Orthodox Christian (0.29) have the highest proportions, and Jehovah’s Witness (0.04), Historically Black Protestant (0.08), and Buddhist (0.13) have the lowest proportions.
tapply(religion$Proportion, religion$Income.range, mean)
## > $100,000 $50,000 - $99,999 $30,000 - $49,999 < $30,000
## 0.2091667 0.2791667 0.1900000 0.3183333
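The means above give each tradition equal weight, so a group of 155 respondents counts as much as one of 7,462. One alternative is to weight each tradition by its sample size. The sketch below does this for the below-$30,000 band only, with values copied from the table at the top. It is a rough illustration rather than a population estimate, and the object names are my own.

```r
# Below-$30,000 proportions and sample sizes, in the table's row order
prop_under_30k <- c(0.36, 0.36, 0.35, 0.17, 0.53, 0.48,
                    0.16, 0.29, 0.27, 0.34, 0.18, 0.33)
sample_size <- c(233, 6137, 7462, 172, 1704, 208,
                 708, 5208, 594, 205, 155, 6790)

# Unweighted mean: matches the tapply() result for this band
unweighted <- mean(prop_under_30k)

# Mean weighted by sample size: large traditions count for more
weighted <- weighted.mean(prop_under_30k, w = sample_size)

c(unweighted = unweighted, weighted = weighted)
```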
religion %>% arrange(Proportion) %>% group_by(Income.range) %>% top_n(3, Proportion)
## # A tibble: 13 x 4
## # Groups: Income.range [4]
## Religious.tradition Sample.Size Income.range Proportion
## <chr> <dbl> <ord> <dbl>
## 1 Evangelical Protestant 7462 $30,000 - $49,999 0.22
## 2 Historically Black Protestant 1704 $30,000 - $49,999 0.22
## 3 Jehovah's Witness 208 $30,000 - $49,999 0.25
## 4 Orthodox Christian 155 > $100,000 0.290
## 5 Mormon 594 $50,000 - $99,999 0.33
## 6 Hindu 172 $50,000 - $99,999 0.34
## 7 Buddhist 233 < $30,000 0.36
## 8 Catholic 6137 < $30,000 0.36
## 9 Orthodox Christian 155 $50,000 - $99,999 0.36
## 10 Hindu 172 > $100,000 0.36
## 11 Jewish 708 > $100,000 0.44
## 12 Jehovah's Witness 208 < $30,000 0.48
## 13 Historically Black Protestant 1704 < $30,000 0.53
religion %>% arrange(Proportion) %>% group_by(Income.range) %>% top_n(-3, Proportion)
## # A tibble: 13 x 4
## # Groups: Income.range [4]
## Religious.tradition Sample.Size Income.range Proportion
## <chr> <dbl> <ord> <dbl>
## 1 Jehovah's Witness 208 > $100,000 0.04
## 2 Historically Black Protestant 1704 > $100,000 0.08
## 3 Hindu 172 $30,000 - $49,999 0.13
## 4 Buddhist 233 > $100,000 0.13
## 5 Jewish 708 $30,000 - $49,999 0.15
## 6 Jewish 708 < $30,000 0.16
## 7 Hindu 172 < $30,000 0.17
## 8 Muslim 205 $30,000 - $49,999 0.17
## 9 Orthodox Christian 155 $30,000 - $49,999 0.17
## 10 Historically Black Protestant 1704 $50,000 - $99,999 0.17
## 11 Orthodox Christian 155 < $30,000 0.18
## 12 Jehovah's Witness 208 $50,000 - $99,999 0.22
## 13 Jewish 708 $50,000 - $99,999 0.24
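Both tables above contain 13 rows rather than 12 because tied values are all kept: Buddhist and Catholic share 0.36 in the lowest band, and Muslim and Orthodox Christian share 0.17 in the $30,000 - $49,999 band. A base R sketch that gives the same tie handling for the below-$30,000 band, with values copied from the table at the top (the object names are my own):

```r
# Below-$30,000 proportions for all 12 traditions
low_band <- data.frame(
  Religious.tradition = c("Buddhist", "Catholic", "Evangelical Protestant",
                          "Hindu", "Historically Black Protestant",
                          "Jehovah's Witness", "Jewish", "Mainline Protestant",
                          "Mormon", "Muslim", "Orthodox Christian", "Unaffiliated"),
  Proportion = c(0.36, 0.36, 0.35, 0.17, 0.53, 0.48,
                 0.16, 0.29, 0.27, 0.34, 0.18, 0.33),
  stringsAsFactors = FALSE
)

# Rank from largest to smallest; tied values share the smallest rank,
# so both 0.36 rows receive rank 3 and both are kept
keep <- rank(-low_band$Proportion, ties.method = "min") <= 3
top_with_ties <- low_band[keep, ]

# Sort the kept rows from largest to smallest proportion
top_with_ties <- top_with_ties[order(-top_with_ties$Proportion), ]
top_with_ties
```

Requesting the top three therefore returns four rows for this band.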
I finish by visualizing the data as a horizontal stacked bar chart. The chart shows the same clear differences in income across religious traditions, though the data omit variables (race/ethnicity, education, etc.) that correlate with income and/or religion.
ggplot(data = religion,
       mapping = aes(x = Religious.tradition, y = Proportion, fill = Income.range)) +
  # Draw each tradition as one bar, rescaled to a total length of 1
  geom_bar(position = 'fill', stat = 'identity') +
  # Use the same 'fill' position for the labels so each sits mid-segment
  geom_text(aes(label = Proportion), position = position_fill(vjust = 0.5)) +
  coord_flip()
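Because the published percentages are rounded, the four proportions for a tradition do not always sum to exactly 1. With position = 'fill', each bar is rescaled to a total of 1, so segment lengths can differ slightly from the printed labels. A rough base R sketch of that rescaling, using two traditions copied from the table at the top (the Total and Rescaled columns are my own names):

```r
# Two traditions copied from the table at the top, bands in column order
bands <- data.frame(
  Religious.tradition = rep(c("Buddhist", "Catholic"), each = 4),
  Proportion = c(0.36, 0.18, 0.32, 0.13,   # Buddhist: sums to 0.99
                 0.36, 0.19, 0.26, 0.19),  # Catholic: sums to 1.00
  stringsAsFactors = FALSE
)

# Sum of the published proportions within each tradition
bands$Total <- ave(bands$Proportion, bands$Religious.tradition, FUN = sum)

# Rescaled share, so each tradition's segments sum to exactly 1
bands$Rescaled <- bands$Proportion / bands$Total

bands
```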