Data Description

I created a small data set from Pew Research Center's 2014 Religious Landscape Study describing the relationship between income and religion in the United States. The data show the proportion of sampled individuals from each religious tradition that falls into each of four income bands: less than $30,000, $30,000 - $49,999, $50,000 - $99,999, and $100,000 or more.

Data source: Pew Research Center. (2014). “Income distribution by religious group.” Accessed from https://www.pewforum.org/religious-landscape-study/income-distribution/


Loading the data

I start by reading in the data and checking its structure.

library(tidyr)
library(dplyr)
library(ggplot2)
religion <- read.csv('https://raw.githubusercontent.com/chrosemo/data607_fall19_project2/master/pew.csv', header=TRUE)
religion
##                 Religious.tradition Less.than..30.000 X.30.000..49.999
## 1                          Buddhist               36%              18%
## 2                          Catholic               36%              19%
## 3            Evangelical Protestant               35%              22%
## 4                             Hindu               17%              13%
## 5     Historically Black Protestant               53%              22%
## 6                 Jehovah's Witness               48%              25%
## 7                            Jewish               16%              15%
## 8               Mainline Protestant               29%              20%
## 9                            Mormon               27%              20%
## 10                           Muslim               34%              17%
## 11               Orthodox Christian               18%              17%
## 12 Unaffiliated (religious "nones")               33%              20%
##    X.50.000..99.999 X.100.000.or.more Sample.Size
## 1               32%               13%         233
## 2               26%               19%       6,137
## 3               28%               14%       7,462
## 4               34%               36%         172
## 5               17%                8%       1,704
## 6               22%                4%         208
## 7               24%               44%         708
## 8               28%               23%       5,208
## 9               33%               20%         594
## 10              29%               20%         205
## 11              36%               29%         155
## 12              26%               21%       6,790
str(religion)
## 'data.frame':    12 obs. of  6 variables:
##  $ Religious.tradition: Factor w/ 12 levels "Buddhist","Catholic",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Less.than..30.000  : Factor w/ 11 levels "16%","17%","18%",..: 9 9 8 2 11 10 1 5 4 7 ...
##  $ X.30.000..49.999   : Factor w/ 8 levels "13%","15%","17%",..: 4 5 7 1 7 8 2 6 6 3 ...
##  $ X.50.000..99.999   : Factor w/ 10 levels "17%","22%","24%",..: 7 4 5 9 1 2 3 5 8 6 ...
##  $ X.100.000.or.more  : Factor w/ 11 levels "13%","14%","19%",..: 1 3 2 8 11 9 10 6 4 4 ...
##  $ Sample.Size        : Factor w/ 12 levels "1,704","155",..: 6 9 11 3 1 5 12 7 8 4 ...
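
The str() output shows every column stored as a factor, which is the read.csv() default before R 4.0.0. As a minimal sketch (the inline CSV text and its header spellings are assumptions made for illustration, not the contents of pew.csv), setting stringsAsFactors = FALSE and check.names = FALSE keeps the values as character strings and leaves the headers unmodified:

```r
# Hypothetical two-row CSV; the header spellings are assumed for illustration
csv_text <- paste(
  '"Religious tradition","Less than $30,000","Sample Size"',
  '"Buddhist","36%","233"',
  '"Catholic","36%","6,137"',
  sep = "\n"
)

# stringsAsFactors = FALSE keeps text as character rather than factor;
# check.names = FALSE stops read.csv from rewriting headers via make.names()
raw <- read.csv(text = csv_text, header = TRUE,
                stringsAsFactors = FALSE, check.names = FALSE)

str(raw)
```

Either way the percent signs and thousands separators still have to be stripped before the values can be treated as numbers.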


Tidying the data

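After renaming the income columns and converting Religious.tradition to character and Sample.Size to numeric, I gather the income-specific columns into an ‘Income.range’ column holding the income bands and a ‘Proportion’ column holding the values. I then store Income.range as an ordered factor and convert Proportion from percent strings to decimals. An estimated count of sampled individuals in each band can be derived as Sample.Size times Proportion.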

religion <- religion %>%
  rename('< $30,000' = 'Less.than..30.000', '$30,000 - $49,999' = 'X.30.000..49.999',
         '$50,000 - $99,999' = 'X.50.000..99.999', '> $100,000' = 'X.100.000.or.more')
religion$Religious.tradition <- as.character(religion$Religious.tradition)
# Strip thousands separators before converting to numeric
religion$Sample.Size <- as.double(gsub(',', '', religion$Sample.Size))
religion <- gather(religion, Income.range, Proportion, -Religious.tradition, -Sample.Size)
religion$Income.range <- factor(religion$Income.range, ordered=TRUE, levels=c('> $100,000','$50,000 - $99,999', '$30,000 - $49,999','< $30,000'))
# Strip the percent sign and rescale to a 0-1 proportion
religion$Proportion <- as.numeric(gsub('%', '', religion$Proportion))/100
head(religion)
##             Religious.tradition Sample.Size Income.range Proportion
## 1                      Buddhist         233    < $30,000       0.36
## 2                      Catholic        6137    < $30,000       0.36
## 3        Evangelical Protestant        7462    < $30,000       0.35
## 4                         Hindu         172    < $30,000       0.17
## 5 Historically Black Protestant        1704    < $30,000       0.53
## 6             Jehovah's Witness         208    < $30,000       0.48
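
The head() output above has no estimated-count column. As a minimal sketch of how one could be derived, assuming the count is simply Sample.Size multiplied by Proportion and rounded to whole respondents (the column name Estimated.count is hypothetical):

```r
# Three rows copied from the tidied data shown above
tidy <- data.frame(
  Religious.tradition = c("Buddhist", "Catholic", "Hindu"),
  Sample.Size = c(233, 6137, 172),
  Income.range = "< $30,000",
  Proportion = c(0.36, 0.36, 0.17),
  stringsAsFactors = FALSE
)

# Estimated.count is a hypothetical column name; rounding is an assumption
tidy$Estimated.count <- round(tidy$Sample.Size * tidy$Proportion)
tidy
```

Because the published proportions are themselves rounded to whole percentages, these counts are approximations rather than the survey's actual cell sizes.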


Exploring the data

With no specific analysis requested, I carry out an exploratory analysis. I start with the mean proportion in each income band. Taking the unweighted mean across the 12 religious traditions, on average approximately 31.8 percent of a tradition's respondents fall below $30,000, 19.0 percent fall between $30,000 and $49,999, 27.9 percent fall between $50,000 and $99,999, and 20.9 percent fall at $100,000 or above. Because each tradition counts equally regardless of its sample size, these figures describe the average tradition rather than the pooled sample.
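
The band means reported here give every tradition the same weight. As a sketch of the alternative, assuming one wants each tradition weighted by its sample size, base R's weighted.mean() can be applied to the below-$30,000 band (values copied from the table printed in the loading step; the object names are illustrative):

```r
# Below-$30,000 proportions and sample sizes for all 12 traditions
below_30k <- data.frame(
  Religious.tradition = c("Buddhist", "Catholic", "Evangelical Protestant", "Hindu",
                          "Historically Black Protestant", "Jehovah's Witness",
                          "Jewish", "Mainline Protestant", "Mormon", "Muslim",
                          "Orthodox Christian", "Unaffiliated"),
  Sample.Size = c(233, 6137, 7462, 172, 1704, 208, 708, 5208, 594, 205, 155, 6790),
  Proportion = c(0.36, 0.36, 0.35, 0.17, 0.53, 0.48, 0.16, 0.29, 0.27, 0.34, 0.18, 0.33),
  stringsAsFactors = FALSE
)

# Every tradition counts equally
unweighted <- mean(below_30k$Proportion)

# Larger samples contribute proportionally more
weighted <- weighted.mean(below_30k$Proportion, w = below_30k$Sample.Size)

c(unweighted = unweighted, weighted = weighted)
```

The weighted figure comes out somewhat higher (roughly 0.34 against 0.318), since the small Hindu, Jewish, and Orthodox Christian samples, which have low proportions in this band, carry less weight.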

Considering specific traditions, Historically Black Protestant (0.53), Jehovah’s Witness (0.48), and Buddhist and Catholic (0.36 each) have the highest proportions below $30,000, while Jewish (0.16), Hindu (0.17), and Orthodox Christian (0.18) have the lowest. In the $100,000-and-above band, Jewish (0.44), Hindu (0.36), and Orthodox Christian (0.29) have the highest proportions, and Jehovah’s Witness (0.04), Historically Black Protestant (0.08), and Buddhist (0.13) have the lowest.

tapply(religion$Proportion, religion$Income.range, mean)
##        > $100,000 $50,000 - $99,999 $30,000 - $49,999         < $30,000 
##         0.2091667         0.2791667         0.1900000         0.3183333
religion %>% arrange(Proportion) %>% group_by(Income.range) %>% top_n(3, Proportion)
## # A tibble: 13 x 4
## # Groups:   Income.range [4]
##    Religious.tradition           Sample.Size Income.range      Proportion
##    <chr>                               <dbl> <ord>                  <dbl>
##  1 Evangelical Protestant               7462 $30,000 - $49,999      0.22 
##  2 Historically Black Protestant        1704 $30,000 - $49,999      0.22 
##  3 Jehovah's Witness                     208 $30,000 - $49,999      0.25 
##  4 Orthodox Christian                    155 > $100,000             0.290
##  5 Mormon                                594 $50,000 - $99,999      0.33 
##  6 Hindu                                 172 $50,000 - $99,999      0.34 
##  7 Buddhist                              233 < $30,000              0.36 
##  8 Catholic                             6137 < $30,000              0.36 
##  9 Orthodox Christian                    155 $50,000 - $99,999      0.36 
## 10 Hindu                                 172 > $100,000             0.36 
## 11 Jewish                                708 > $100,000             0.44 
## 12 Jehovah's Witness                     208 < $30,000              0.48 
## 13 Historically Black Protestant        1704 < $30,000              0.53
religion %>% arrange(Proportion) %>% group_by(Income.range) %>% top_n(-3, Proportion)
## # A tibble: 13 x 4
## # Groups:   Income.range [4]
##    Religious.tradition           Sample.Size Income.range      Proportion
##    <chr>                               <dbl> <ord>                  <dbl>
##  1 Jehovah's Witness                     208 > $100,000              0.04
##  2 Historically Black Protestant        1704 > $100,000              0.08
##  3 Hindu                                 172 $30,000 - $49,999       0.13
##  4 Buddhist                              233 > $100,000              0.13
##  5 Jewish                                708 $30,000 - $49,999       0.15
##  6 Jewish                                708 < $30,000               0.16
##  7 Hindu                                 172 < $30,000               0.17
##  8 Muslim                                205 $30,000 - $49,999       0.17
##  9 Orthodox Christian                    155 $30,000 - $49,999       0.17
## 10 Historically Black Protestant        1704 $50,000 - $99,999       0.17
## 11 Orthodox Christian                    155 < $30,000               0.18
## 12 Jehovah's Witness                     208 $50,000 - $99,999       0.22
## 13 Jewish                                708 $50,000 - $99,999       0.24
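
Both calls return 13 rows rather than 12 because top_n() keeps ties: Buddhist and Catholic share 0.36 in the lowest band, and Muslim and Orthodox Christian share 0.17 in the $30,000 - $49,999 band. As a hedged sketch, dplyr 1.0.0 and later provide slice_max() and slice_min() with a with_ties argument. Whether dropping ties is appropriate here is a judgment call, and the data frame below is a five-row subset copied from the source table:

```r
library(dplyr)

# Below-$30,000 band for five traditions, copied from the source table
band <- data.frame(
  Religious.tradition = c("Buddhist", "Catholic", "Historically Black Protestant",
                          "Jehovah's Witness", "Jewish"),
  Proportion = c(0.36, 0.36, 0.53, 0.48, 0.16),
  stringsAsFactors = FALSE
)

# Default behaviour keeps ties, so asking for 3 rows can return 4
with_ties <- band %>% slice_max(Proportion, n = 3)

# with_ties = FALSE returns exactly 3 rows, dropping one of the tied rows
no_ties <- band %>% slice_max(Proportion, n = 3, with_ties = FALSE)

nrow(with_ties)
nrow(no_ties)
```

Keeping ties, as the analysis above does, avoids arbitrarily excluding a tradition whose proportion equals the cutoff.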


I finish by visualizing the data as a horizontal stacked bar chart. The chart again shows clear differences in income across religious traditions. Variables omitted here (race/ethnicity, education, etc.) also correlate with income and/or religion, so the differences should not be read as effects of religion alone.

ggplot(data = religion) +
  geom_bar(
    mapping = aes(x=Religious.tradition, y=Proportion, fill=Income.range),
    position = 'fill',
    stat = 'identity'
  ) +
  # Group labels by income band and use position_fill so they match the rescaled bars
  geom_text(aes(x=Religious.tradition, y=Proportion, label=Proportion, group=Income.range),
    position = position_fill(vjust=0.5)) +
  coord_flip()
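
Because the published percentages are rounded, the four bands need not sum to exactly 1 within a tradition, which is one reason position = 'fill' is useful: it rescales every bar to the same length. A quick check on three traditions, with values copied from the source table and illustrative object names:

```r
# Rounded proportions for three traditions, four income bands each
bands <- data.frame(
  Religious.tradition = rep(c("Buddhist", "Hindu", "Jewish"), each = 4),
  Proportion = c(0.36, 0.18, 0.32, 0.13,
                 0.17, 0.13, 0.34, 0.36,
                 0.16, 0.15, 0.24, 0.44),
  stringsAsFactors = FALSE
)

# Sum the four bands within each tradition
totals <- tapply(bands$Proportion, bands$Religious.tradition, sum)
totals
```

Totals that fall slightly short of 1 mean the printed labels are the original rounded proportions, not the rescaled segment lengths drawn in the chart.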