As suggested by the classmate who posted this dataset (Shovan Biswas), the goal is to tidy it into three variables: religion, income, and frequency. The sample size is based on the distribution of religions across the United States. Beyond tidying the data, further analysis can be done, such as:

(1) For each income level, identify which religion has the highest and the lowest percentage of adults.
(2) Determine which income level shows the greatest variation across religions.

Load the original data file from GitHub.

IncomeReligious <- read.csv("https://raw.githubusercontent.com/SieSiongWong/DATA-607/master/Income%20Distirbution%20by%20Religious%20Group.csv", header=TRUE, sep=",")

Load the packages required to tidy and transform the data.

library(dplyr)
library(tidyr)
library(ggplot2)
library(reshape2)
library(stringr)

Review the dataset.

IncomeReligious
##                 Religious.tradition Less.than..30.000 X.30.000..49.999
## 1                          Buddhist               36%              18%
## 2                          Catholic               36%              19%
## 3            Evangelical Protestant               35%              22%
## 4                             Hindu               17%              13%
## 5     Historically Black Protestant               53%              22%
## 6                 Jehovah's Witness               48%              25%
## 7                            Jewish               16%              15%
## 8               Mainline Protestant               29%              20%
## 9                            Mormon               27%              20%
## 10                           Muslim               34%              17%
## 11               Orthodox Christian               18%              17%
## 12 Unaffiliated (religious "nones")               33%              20%
##    X.50.000..99.999 X.100.000.or.more Sample.Size
## 1               32%               13%         233
## 2               26%               19%       6,137
## 3               28%               14%       7,462
## 4               34%               36%         172
## 5               17%                8%       1,704
## 6               22%                4%         208
## 7               24%               44%         708
## 8               28%               23%       5,208
## 9               33%               20%         594
## 10              29%               20%         205
## 11              36%               29%         155
## 12              26%               21%       6,790
str(IncomeReligious)
## 'data.frame':    12 obs. of  6 variables:
##  $ Religious.tradition: Factor w/ 12 levels "Buddhist","Catholic",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ Less.than..30.000  : Factor w/ 11 levels "16%","17%","18%",..: 9 9 8 2 11 10 1 5 4 7 ...
##  $ X.30.000..49.999   : Factor w/ 8 levels "13%","15%","17%",..: 4 5 7 1 7 8 2 6 6 3 ...
##  $ X.50.000..99.999   : Factor w/ 10 levels "17%","22%","24%",..: 7 4 5 9 1 2 3 5 8 6 ...
##  $ X.100.000.or.more  : Factor w/ 11 levels "13%","14%","19%",..: 1 3 2 8 11 9 10 6 4 4 ...
##  $ Sample.Size        : Factor w/ 12 levels "1,704","155",..: 6 9 11 3 1 5 12 7 8 4 ...
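
The `str()` output above shows that every column, including the percentages and the sample sizes, was read as a factor, so the values need converting before any arithmetic. The following is a minimal base R sketch of that conversion, using a few values copied from the printed data; the names `pct_raw`, `size_raw`, `pct_num`, and `size_num` are illustrative and not part of the original analysis.

```r
# A few values copied from the printed dataset, stored as factors
# to mimic how read.csv() returned them here.
pct_raw  <- factor(c("36%", "17%", "53%"))
size_raw <- factor(c("233", "6,137", "7,462"))

# Calling as.numeric() directly on a factor returns its internal level codes,
# not the displayed values, so convert to character first.
pct_num  <- as.numeric(sub("%", "", as.character(pct_raw), fixed = TRUE))
size_num <- as.numeric(gsub(",", "", as.character(size_raw), fixed = TRUE))

pct_num
size_num
```

In R 4.0.0 and later, `read.csv()` no longer converts strings to factors by default, but the `%` sign and the thousands separator still have to be stripped before the values can be treated as numbers.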

Clean the data.

## Remove the sample size column, which is not needed for this analysis.
IncomeReligious2 <- subset(IncomeReligious, select=-Sample.Size)

## Rename the columns 1:5.
IncomeReligious2 <- IncomeReligious2 %>% rename("Religion"="Religious.tradition", "Less than $30k"="Less.than..30.000", "$30k-$49,999"="X.30.000..49.999", "$50k-$99,999"="X.50.000..99.999", "Over $100k"="X.100.000.or.more")

Reshape the clean data.

## Convert the dataset to long form by gathering columns 2 to 5 into rows.
IncomeReligious2 <- IncomeReligious2 %>% gather(Income, value,2:5)

## Rename the value column to Frequency.
IncomeReligious2 <- rename(IncomeReligious2, "Frequency"="value")

## Convert the Frequency column from percentage strings to numeric values.
IncomeReligious2 <- IncomeReligious2 %>% mutate(Frequency=as.numeric(str_extract(Frequency, "[[:digit:]]+")))

## Sort the rows by religion.
IncomeReligious2 <- IncomeReligious2 %>% arrange(Religion)
IncomeReligious2
##                            Religion         Income Frequency
## 1                          Buddhist Less than $30k        36
## 2                          Buddhist   $30k-$49,999        18
## 3                          Buddhist   $50k-$99,999        32
## 4                          Buddhist     Over $100k        13
## 5                          Catholic Less than $30k        36
## 6                          Catholic   $30k-$49,999        19
## 7                          Catholic   $50k-$99,999        26
## 8                          Catholic     Over $100k        19
## 9            Evangelical Protestant Less than $30k        35
## 10           Evangelical Protestant   $30k-$49,999        22
## 11           Evangelical Protestant   $50k-$99,999        28
## 12           Evangelical Protestant     Over $100k        14
## 13                            Hindu Less than $30k        17
## 14                            Hindu   $30k-$49,999        13
## 15                            Hindu   $50k-$99,999        34
## 16                            Hindu     Over $100k        36
## 17    Historically Black Protestant Less than $30k        53
## 18    Historically Black Protestant   $30k-$49,999        22
## 19    Historically Black Protestant   $50k-$99,999        17
## 20    Historically Black Protestant     Over $100k         8
## 21                Jehovah's Witness Less than $30k        48
## 22                Jehovah's Witness   $30k-$49,999        25
## 23                Jehovah's Witness   $50k-$99,999        22
## 24                Jehovah's Witness     Over $100k         4
## 25                           Jewish Less than $30k        16
## 26                           Jewish   $30k-$49,999        15
## 27                           Jewish   $50k-$99,999        24
## 28                           Jewish     Over $100k        44
## 29              Mainline Protestant Less than $30k        29
## 30              Mainline Protestant   $30k-$49,999        20
## 31              Mainline Protestant   $50k-$99,999        28
## 32              Mainline Protestant     Over $100k        23
## 33                           Mormon Less than $30k        27
## 34                           Mormon   $30k-$49,999        20
## 35                           Mormon   $50k-$99,999        33
## 36                           Mormon     Over $100k        20
## 37                           Muslim Less than $30k        34
## 38                           Muslim   $30k-$49,999        17
## 39                           Muslim   $50k-$99,999        29
## 40                           Muslim     Over $100k        20
## 41               Orthodox Christian Less than $30k        18
## 42               Orthodox Christian   $30k-$49,999        17
## 43               Orthodox Christian   $50k-$99,999        36
## 44               Orthodox Christian     Over $100k        29
## 45 Unaffiliated (religious "nones") Less than $30k        33
## 46 Unaffiliated (religious "nones")   $30k-$49,999        20
## 47 Unaffiliated (religious "nones")   $50k-$99,999        26
## 48 Unaffiliated (religious "nones")     Over $100k        21
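
The same wide-to-long reshape can also be written with `pivot_longer()`, which the tidyr documentation recommends over the superseded `gather()`. The sketch below is an alternative, not the method used above; it uses only two rows copied from the printed data to stay short, and the names `wide` and `long` are illustrative.

```r
library(tidyr)

# Two rows copied from the printed dataset, with the renamed income columns.
wide <- data.frame(
  Religion         = c("Buddhist", "Hindu"),
  `Less than $30k` = c("36%", "17%"),
  `$30k-$49,999`   = c("18%", "13%"),
  `$50k-$99,999`   = c("32%", "34%"),
  `Over $100k`     = c("13%", "36%"),
  check.names = FALSE,
  stringsAsFactors = FALSE
)

# Turn every column except Religion into (Income, Frequency) pairs.
long <- pivot_longer(wide, cols = -Religion,
                     names_to = "Income", values_to = "Frequency")

# Strip the percent sign and convert to numeric.
long$Frequency <- as.numeric(sub("%", "", long$Frequency, fixed = TRUE))

long
```

`pivot_longer()` emits the rows grouped by the original row order, so the result is already ordered by religion and names the value column in the same step.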

Analyze the clean data.

## Figure 1: bar plot of the % of adults at each household income level, by religion.
IncomeReligious2$Income <- factor(IncomeReligious2$Income, levels=c("Less than $30k", "$30k-$49,999", "$50k-$99,999", "Over $100k"))

ggplot(IncomeReligious2, aes(x = Religion, y = Frequency, fill = Income)) + geom_bar(stat="identity", position = position_stack(reverse = FALSE)) + xlab("Religion") + ylab("% of Adults Household Income") + scale_fill_brewer(palette = "Set2") + coord_flip() + theme(legend.position = "top") +  geom_text(aes(label=Frequency), position = position_stack(vjust = .5), size = 3)

## Figure 2: box plot of the % distribution across religions at each income level.
ggplot(IncomeReligious2, aes(x=reorder(factor(Income), Frequency, FUN=median),y=Frequency,fill=factor(Income))) + geom_boxplot() + labs(title="% Income Distribution by Religions") + ylab("% Income Distribution") + theme(legend.position = "none", axis.title.x = element_blank(), axis.text.x=element_text(angle=45)) + theme(plot.title = element_text(hjust=0.5)) + theme(axis.text.x = element_text(margin = margin(t = 25, r = 20, b = 0, l = 0)))

Conclusions:

From Figure 1, we can see that at the “Less than $30k” income level, the Historically Black Protestant group has the highest percentage (53%) and the Jewish group has the lowest (16%). At the “$30k-$49,999” level, the Jehovah’s Witness group has the highest percentage (25%) and the Hindu group has the lowest (13%). At the “$50k-$99,999” level, the Orthodox Christian group has the highest percentage (36%) and the Historically Black Protestant group has the lowest (17%). At the “Over $100k” level, the Jewish group has the highest percentage (44%) and the Jehovah’s Witness group has the lowest (4%).
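
The highest and lowest group at each income level can also be read off numerically rather than from the plot. The following is a base R sketch, assuming the percentages are transcribed by hand from the dataset printed earlier; the matrix `pct` and the data frame `extremes` are illustrative names, not objects from the analysis above.

```r
religions <- c("Buddhist", "Catholic", "Evangelical Protestant", "Hindu",
               "Historically Black Protestant", "Jehovah's Witness", "Jewish",
               "Mainline Protestant", "Mormon", "Muslim", "Orthodox Christian",
               'Unaffiliated (religious "nones")')

# One column per income level; matrix() fills column by column.
pct <- matrix(
  c(36, 36, 35, 17, 53, 48, 16, 29, 27, 34, 18, 33,   # Less than $30k
    18, 19, 22, 13, 22, 25, 15, 20, 20, 17, 17, 20,   # $30k-$49,999
    32, 26, 28, 34, 17, 22, 24, 28, 33, 29, 36, 26,   # $50k-$99,999
    13, 19, 14, 36,  8,  4, 44, 23, 20, 20, 29, 21),  # Over $100k
  ncol = 4,
  dimnames = list(religions,
                  c("Less than $30k", "$30k-$49,999", "$50k-$99,999", "Over $100k"))
)

# For each income level, find the religion with the largest and smallest share.
extremes <- data.frame(
  Income     = colnames(pct),
  Highest    = religions[apply(pct, 2, which.max)],
  HighestPct = apply(pct, 2, max),
  Lowest     = religions[apply(pct, 2, which.min)],
  LowestPct  = apply(pct, 2, min),
  row.names = NULL,
  stringsAsFactors = FALSE
)

extremes
```

`which.max()` and `which.min()` return the first position in case of ties, so a tie at an extreme would need separate handling.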
From Figure 2, we can see that the “Over $100k” and “Less than $30k” income levels show the widest variation in percentages across religions. As Figure 1 shows, this is mainly because, at the over $100k level, the Jewish group has a much higher percentage than the other groups. Similarly, at the below $30k level, the Historically Black Protestant and Jehovah’s Witness groups have much higher percentages than the other groups.
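
The spread that the box plot shows can be put into numbers with a few summary statistics per income level. This is a base R sketch under the same assumption that the percentages are transcribed from the printed dataset; `pct` and `spread` are illustrative names, and the choice of range, interquartile range, and standard deviation as measures of variation is one reasonable option among several.

```r
# Percentages per income level (columns), one value per religion (rows),
# in the order the religions appear in the printed dataset.
pct <- matrix(
  c(36, 36, 35, 17, 53, 48, 16, 29, 27, 34, 18, 33,   # Less than $30k
    18, 19, 22, 13, 22, 25, 15, 20, 20, 17, 17, 20,   # $30k-$49,999
    32, 26, 28, 34, 17, 22, 24, 28, 33, 29, 36, 26,   # $50k-$99,999
    13, 19, 14, 36,  8,  4, 44, 23, 20, 20, 29, 21),  # Over $100k
  ncol = 4,
  dimnames = list(NULL,
                  c("Less than $30k", "$30k-$49,999", "$50k-$99,999", "Over $100k"))
)

# Summarise the variation across religions within each income level.
spread <- data.frame(
  Income = colnames(pct),
  Range  = apply(pct, 2, function(x) max(x) - min(x)),
  IQR    = apply(pct, 2, IQR),
  SD     = round(apply(pct, 2, sd), 1),
  row.names = NULL,
  stringsAsFactors = FALSE
)

# Show the income levels from widest to narrowest range.
spread[order(-spread$Range), ]
```

The range is sensitive to a single outlying group, while the interquartile range matches the height of each box in the plot, so comparing the two indicates whether the variation at a level is driven by one or two groups.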