Original


Data Visualization Online @ WPI 2018


Objective

The original chart is a horizontal stacked bar chart showing how many people belong to each religion in the 20 most populous nations. The religious groups are Christian, Hindu, Muslim, Buddhist, Jewish, folk religions, other religions and the unaffiliated. The nations are ordered by total population, from China (the most populous) down to DR Congo (the least populous of the 20). Each bar shows a nation's total population split into its religious sub-populations. The main objective is to show how the proportions of the religious groups differ between nations.

#Issues Identified

Several parts of the graph can be changed to make the information easier to perceive, including for readers with color blindness. The chart combines two categorical variables, and one of them (religion) has eight levels, so it is very difficult to choose colors that work for every type of color blindness. We therefore focused on the most common type, red-green color blindness.

The following two improvements are suggested:

  1. Representation: Although the graph compares religion and population across nations, it conveys little information because it is hard to compare one country's religious community with another's. We therefore built a mosaic chart with visible boundaries between the tiles, together with axis labels showing each nation's share of the total population and each religion's share within a nation.

  2. Title and axis names: A title and axis labels are important for interpreting a chart. Both were missing from the original, so they were added.
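In a mosaic chart each country gets a column whose width is proportional to its share of the combined population, and each column is split into tiles whose heights are the religions' shares within that country. The sketch below computes that layout in base R on hypothetical toy data (countries "A" and "B" are made up). ggmosaic's geom_mosaic() does the equivalent internally; this is only an illustration of the idea, not ggmosaic's code.

```r
# Hypothetical toy data in long format (population in millions)
toy <- data.frame(
  country    = c("A", "A", "B", "B"),
  religion   = c("Christian", "Muslim", "Christian", "Muslim"),
  population = c(30, 10, 5, 5),
  stringsAsFactors = FALSE
)

# Column widths: each country's share of the combined population
totals <- sapply(split(toy$population, toy$country), sum)
width  <- totals / sum(totals)
xmax   <- cumsum(width)
xmin   <- xmax - width

# Tile heights: each religion's share within its own country, stacked from 0 to 1
toy$height <- toy$population / ave(toy$population, toy$country, FUN = sum)
toy$ymax   <- ave(toy$height, toy$country, FUN = cumsum)
toy$ymin   <- toy$ymax - toy$height

# Give every tile the horizontal extent of its country's column
toy$xmin <- unname(xmin[toy$country])
toy$xmax <- unname(xmax[toy$country])

toy
```

Passing these columns to ggplot2::geom_rect() with fill = religion would draw a basic version of the chart.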

Reference

https://medium.com/@currankelleher/data-visualization-online-wpi-2018-f662bf32908d

Code

The following code was used to fix the issues identified in the original.

install.packages(c("ggplot2", "colourpicker", "devtools", "colorspace", "ggmosaic",
                   "readr", "dplyr", "tidyr", "reshape2", "data.table"),
                 repos = "http://cran.us.r-project.org")
# colorblindr is not available on CRAN, so it is installed from GitHub
# (after the GitHub version of cowplot, which it uses)
devtools::install_github("wilkelab/cowplot")
devtools::install_github("clauswilke/colorblindr")

library(readr)
library(dplyr)
library(tidyr)
library(ggplot2)
library(reshape2)
library(colourpicker)
library(ggmosaic)
library(data.table)
library(devtools)
library(colorblindr)
library(usethis)

Data Reference

https://medium.com/@currankelleher/data-visualization-online-wpi-2018-f662bf32908d

## # A tibble: 6 x 3
##   country religion       population
##   <chr>   <chr>               <dbl>
## 1 China   Christian        68410000
## 2 China   Muslim           24690000
## 3 China   Unaffiliated    700680000
## 4 China   Hindu               20000
## 5 China   Buddhist        244130000
## 6 China   Folk Religions  294320000
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 160 obs. of  3 variables:
##  $ country   : chr  "China" "China" "China" "China" ...
##  $ religion  : chr  "Christian" "Muslim" "Unaffiliated" "Hindu" ...
##  $ population: num  6.84e+07 2.47e+07 7.01e+08 2.00e+04 2.44e+08 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   country = col_character(),
##   ..   religion = col_character(),
##   ..   population = col_double()
##   .. )
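The code that reads the data is not shown above. The printed structure (160 rows covering 20 countries and 8 religions, with columns country, religion and population) suggests a long-format table. A minimal sketch, assuming a CSV file with those columns; the file name is hypothetical, so the six China rows printed above stand in for the full file as inline CSV text.

```r
# In the full analysis the table would be read from a file, for example
# (the file name here is hypothetical):
# religionByCountryTop20 <- readr::read_csv("religionByCountryTop20.csv")

# Stand-in: the six China rows printed above, supplied as inline CSV text
china <- read.csv(text = "country,religion,population
China,Christian,68410000
China,Muslim,24690000
China,Unaffiliated,700680000
China,Hindu,20000
China,Buddhist,244130000
China,Folk Religions,294320000", stringsAsFactors = FALSE)

# Check the expected long format: one row per country/religion pair
stopifnot(identical(names(china), c("country", "religion", "population")))
stopifnot(is.numeric(china$population))
stopifnot(!anyDuplicated(china[, c("country", "religion")]))

str(china)
```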
# total population of each country, used below to order the countries

ord <- religionByCountryTop20 %>% group_by(country) %>% summarise(sum = sum(population))

ord$country <- ord$country %>%
  factor(levels = ord$country[order(ord$sum)], ordered = TRUE)

str(ord)
## Classes 'tbl_df', 'tbl' and 'data.frame':    20 obs. of  2 variables:
##  $ country: Ord.factor w/ 20 levels "DR Congo"<"Thailand"<..: 13 16 20 1 5 7 6 19 17 4 ...
##  $ sum    : num  1.49e+08 1.95e+08 1.34e+09 6.60e+07 8.11e+07 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   country = col_character(),
##   ..   religion = col_character(),
##   ..   population = col_double()
##   .. )
# convert population to millions

religionByCountryTop20$population <- religionByCountryTop20$population/1e6
# convert country and religion to factors, as they are categorical variables

religionByCountryTop20$country <- as.factor(religionByCountryTop20$country)

religionByCountryTop20$religion <- as.factor(religionByCountryTop20$religion)

str(religionByCountryTop20)
## Classes 'spec_tbl_df', 'tbl_df', 'tbl' and 'data.frame': 160 obs. of  3 variables:
##  $ country   : Factor w/ 20 levels "Bangladesh","Brazil",..: 3 3 3 3 3 3 3 3 8 8 ...
##  $ religion  : Factor w/ 8 levels "Buddhist","Christian",..: 2 6 8 4 1 3 7 5 2 6 ...
##  $ population: num  68.41 24.69 700.68 0.02 244.13 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   country = col_character(),
##   ..   religion = col_character(),
##   ..   population = col_double()
##   .. )
# to arrange countries in increasing order of their population

order <- religionByCountryTop20 %>% group_by(country) %>%summarise(sum = sum(population))

order$country<- order$country %>%
  factor(levels = order$country[order(order$sum)],ordered=TRUE)

str(order)
## Classes 'tbl_df', 'tbl' and 'data.frame':    20 obs. of  2 variables:
##  $ country: Ord.factor w/ 20 levels "DR Congo"<"Thailand"<..: 13 16 20 1 5 7 6 19 17 4 ...
##  $ sum    : num  148.7 195 1341.3 66 81.1 ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   country = col_character(),
##   ..   religion = col_character(),
##   ..   population = col_double()
##   .. )
# sort country in terms of population
religionByCountryTop20$country <- religionByCountryTop20$country %>%
  factor(levels = sort(order$country) ,ordered=TRUE)
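The two steps above (totals per country, then factor levels sorted by those totals) can also be expressed with base R's reorder(), which sets factor levels from a summary of another variable. A minimal sketch on hypothetical toy data; applied to the real data the call would take the form reorder(factor(country), population, FUN = sum, order = TRUE).

```r
# Hypothetical toy data (population in millions)
toy <- data.frame(
  country    = c("A", "A", "B", "B", "C", "C"),
  religion   = c("Christian", "Muslim", "Christian", "Muslim", "Christian", "Muslim"),
  population = c(10, 5, 1, 2, 40, 20),
  stringsAsFactors = FALSE
)

# reorder() sets the factor levels by a summary of population per country;
# FUN = sum uses the country totals (A = 15, B = 3, C = 60), smallest first,
# and order = TRUE returns an ordered factor like the one built above
toy$country <- reorder(factor(toy$country), toy$population, FUN = sum, order = TRUE)

levels(toy$country)
```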
# creating the original visualization in R

p1 <- ggplot(data = religionByCountryTop20, aes(x = reorder(country,-population), y = population, fill = religion))

p2 <- p1 + geom_bar(stat = "identity",colour="grey")+ coord_flip()
# create a country x religion table of population values

table <- acast(religionByCountryTop20, country~religion, value.var='population' )

attributes(table)
## $dim
## [1] 20  8
## 
## $dimnames
## $dimnames[[1]]
##  [1] "DR Congo"      "Thailand"      "Turkey"        "Iran"         
##  [5] "Egypt"         "Germany"       "Ethiopia"      "Vietnam"      
##  [9] "Philippines"   "Mexico"        "Japan"         "Russia"       
## [13] "Bangladesh"    "Nigeria"       "Pakistan"      "Brazil"       
## [17] "Indonesia"     "United States" "India"         "China"        
## 
## $dimnames[[2]]
## [1] "Buddhist"        "Christian"       "Folk Religions"  "Hindu"          
## [5] "Jewish"          "Muslim"          "Other Religions" "Unaffiliated"
margin.table(table,1) #Row marginals
##      DR Congo      Thailand        Turkey          Iran         Egypt 
##         65.97         69.11         72.74         73.96         81.11 
##       Germany      Ethiopia       Vietnam   Philippines        Mexico 
##         82.31         82.93         87.85         93.25        113.40 
##         Japan        Russia    Bangladesh       Nigeria      Pakistan 
##        126.54        142.96        148.69        158.42        173.58 
##        Brazil     Indonesia United States         India         China 
##        194.95        239.88        310.39       1224.60       1341.33
margin.table(table,2) #Column marginals
##        Buddhist       Christian  Folk Religions           Hindu 
##          384.79         1105.84          354.67          996.72 
##          Jewish          Muslim Other Religions    Unaffiliated 
##            6.36         1070.93           46.21          918.45
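acast() from reshape2 is one way to build the country-by-religion table; base R's xtabs() is another. A small sketch on hypothetical toy data, showing the same row and column marginals and the row proportions that prop.table() produces next:

```r
# Hypothetical toy data (population in millions)
toy <- data.frame(
  country    = c("A", "A", "B", "B"),
  religion   = c("Christian", "Muslim", "Christian", "Muslim"),
  population = c(30, 10, 5, 5),
  stringsAsFactors = FALSE
)

# xtabs() sums population into a country x religion contingency table
tab_toy <- xtabs(population ~ country + religion, data = toy)

margin.table(tab_toy, 1)  # total per country:  A = 40, B = 10
margin.table(tab_toy, 2)  # total per religion: Christian = 35, Muslim = 15
prop.table(tab_toy, 1)    # each row rescaled so that it sums to 1
```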
tab <- prop.table(table, 1)
# convert the row-proportion table to a data frame so the proportions can be merged with the population values
t1 <-as.data.frame(tab)

col_names <-t1 %>% select (Buddhist:Unaffiliated) %>% colnames()




setDT(t1, keep.rownames = TRUE)[]
##                rn     Buddhist   Christian Folk Religions        Hindu
##  1:      DR Congo 0.000000e+00 0.958162801   0.0074276186 4.547522e-04
##  2:      Thailand 9.321372e-01 0.008681812   0.0008681812 1.012878e-03
##  3:        Turkey 5.499038e-04 0.004399230   0.0002749519 0.000000e+00
##  4:          Iran 0.000000e+00 0.001487290   0.0000000000 2.704164e-04
##  5:         Egypt 0.000000e+00 0.050795216   0.0000000000 0.000000e+00
##  6:       Germany 2.551330e-03 0.686915320   0.0004859677 9.719354e-04
##  7:      Ethiopia 0.000000e+00 0.627878934   0.0256843121 0.000000e+00
##  8:       Vietnam 1.636881e-01 0.081616392   0.4524758110 0.000000e+00
##  9:   Philippines 8.579088e-04 0.926219839   0.0153351206 0.000000e+00
## 10:        Mexico 0.000000e+00 0.951587302   0.0006172840 0.000000e+00
## 11:         Japan 3.620989e-01 0.016042358   0.0035561878 2.370792e-04
## 12:        Russia 1.189144e-03 0.732722440   0.0021684387 2.098489e-04
## 13:    Bangladesh 4.842289e-03 0.001883113   0.0034972090 9.092743e-02
## 14:       Nigeria 6.312334e-05 0.492677692   0.0144552455 0.000000e+00
## 15:      Pakistan 1.152206e-04 0.015842839   0.0001728310 1.918424e-02
## 16:        Brazil 1.282380e-03 0.888945884   0.0284175430 0.000000e+00
## 17:     Indonesia 7.170252e-03 0.098632650   0.0031265633 1.688344e-02
## 18: United States 1.150166e-02 0.783079352   0.0020297046 5.766938e-03
## 19:         India 7.553487e-03 0.025420545   0.0047689041 7.951576e-01
## 20:         China 1.820059e-01 0.051001618   0.2194240045 1.491057e-05
##           Jewish       Muslim Other Religions Unaffiliated
##  1: 0.000000e+00 0.0147036532    0.0015158405 0.0177353342
##  2: 0.000000e+00 0.0545507162    0.0000000000 0.0027492403
##  3: 2.749519e-04 0.9806158922    0.0020621391 0.0118229310
##  4: 0.000000e+00 0.9947268794    0.0020281233 0.0014872904
##  5: 0.000000e+00 0.9492047836    0.0000000000 0.0000000000
##  6: 2.794314e-03 0.0578301543    0.0012149192 0.2472360588
##  7: 0.000000e+00 0.3458338358    0.0000000000 0.0006029181
##  8: 0.000000e+00 0.0018212863    0.0039840637 0.2964143426
##  9: 0.000000e+00 0.0552278820    0.0013941019 0.0009651475
## 10: 6.172840e-04 0.0000000000    0.0001763668 0.0470017637
## 11: 0.000000e+00 0.0015805279    0.0465465465 0.5699383594
## 12: 1.608842e-03 0.0999580302    0.0000000000 0.1621432569
## 13: 0.000000e+00 0.8981101621    0.0002017621 0.0005380321
## 14: 0.000000e+00 0.4879434415    0.0005681101 0.0042923873
## 15: 0.000000e+00 0.9644544302    0.0001152206 0.0001152206
## 16: 5.642472e-04 0.0002051808    0.0015388561 0.0790459092
## 17: 0.000000e+00 0.8717692179    0.0014173754 0.0010005003
## 18: 1.833178e-02 0.0089242566    0.0061213312 0.1642449821
## 19: 8.165932e-06 0.1438755512    0.0225053079 0.0007104361
## 20: 0.000000e+00 0.0184071034    0.0067694005 0.5223770437
ta <- t1 %>% gather(all_of(col_names), key = "religion", value = "Proportion")


colnames(ta) <- c("country","religion","Proportion")


religionByCountryTop20Final <- merge(ta,religionByCountryTop20,by = c("country","religion"))


prop_labels <- round(tab, 2)  # tab already holds the row proportions
prop_labels
##               Buddhist Christian Folk Religions Hindu Jewish Muslim
## DR Congo          0.00      0.96           0.01  0.00   0.00   0.01
## Thailand          0.93      0.01           0.00  0.00   0.00   0.05
## Turkey            0.00      0.00           0.00  0.00   0.00   0.98
## Iran              0.00      0.00           0.00  0.00   0.00   0.99
## Egypt             0.00      0.05           0.00  0.00   0.00   0.95
## Germany           0.00      0.69           0.00  0.00   0.00   0.06
## Ethiopia          0.00      0.63           0.03  0.00   0.00   0.35
## Vietnam           0.16      0.08           0.45  0.00   0.00   0.00
## Philippines       0.00      0.93           0.02  0.00   0.00   0.06
## Mexico            0.00      0.95           0.00  0.00   0.00   0.00
## Japan             0.36      0.02           0.00  0.00   0.00   0.00
## Russia            0.00      0.73           0.00  0.00   0.00   0.10
## Bangladesh        0.00      0.00           0.00  0.09   0.00   0.90
## Nigeria           0.00      0.49           0.01  0.00   0.00   0.49
## Pakistan          0.00      0.02           0.00  0.02   0.00   0.96
## Brazil            0.00      0.89           0.03  0.00   0.00   0.00
## Indonesia         0.01      0.10           0.00  0.02   0.00   0.87
## United States     0.01      0.78           0.00  0.01   0.02   0.01
## India             0.01      0.03           0.00  0.80   0.00   0.14
## China             0.18      0.05           0.22  0.00   0.00   0.02
##               Other Religions Unaffiliated
## DR Congo                 0.00         0.02
## Thailand                 0.00         0.00
## Turkey                   0.00         0.01
## Iran                     0.00         0.00
## Egypt                    0.00         0.00
## Germany                  0.00         0.25
## Ethiopia                 0.00         0.00
## Vietnam                  0.00         0.30
## Philippines              0.00         0.00
## Mexico                   0.00         0.05
## Japan                    0.05         0.57
## Russia                   0.00         0.16
## Bangladesh               0.00         0.00
## Nigeria                  0.00         0.00
## Pakistan                 0.00         0.00
## Brazil                   0.00         0.08
## Indonesia                0.00         0.00
## United States            0.01         0.16
## India                    0.02         0.00
## China                    0.01         0.52
religionByCountryTop20Final$country <- religionByCountryTop20Final$country %>%
  factor(levels = sort(order$country) ,ordered=TRUE)

ord$prop <- ord$sum / sum(ord$sum)


religionByCountryTop20Final <- merge(ord,religionByCountryTop20Final,by = c("country"))
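The steps from prop.table() through the two merge() calls give every country/religion row its within-country Proportion and the country's overall share prop. The same two columns can be computed in one grouped dplyr pipeline; the sketch below uses hypothetical toy data, and the column name country_total is our own (the code above calls it sum). It is an alternative, not the code used above.

```r
library(dplyr)

# Hypothetical toy data in the same long format (population in millions)
toy <- data.frame(
  country    = c("A", "A", "B", "B"),
  religion   = c("Christian", "Muslim", "Christian", "Muslim"),
  population = c(30, 10, 5, 5),
  stringsAsFactors = FALSE
)

final_toy <- toy %>%
  group_by(country) %>%
  mutate(
    country_total = sum(population),            # total population of the country
    Proportion    = population / country_total  # religion's share within the country
  ) %>%
  ungroup() %>%
  mutate(prop = country_total / sum(population)) # country's share of all countries combined

final_toy
```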

Reconstruction

The following plot solves the major issues in the original plot.

Here we use ggplot2, together with the ggmosaic extension, to visualize religion by country for the 20 most populous countries.

# create a mosaic plot of the population proportions

p4 <- ggplot(religionByCountryTop20Final)

p5 <- p4 + geom_mosaic(aes(x = product(country), weight = population, fill = religion))

# hide the default x-axis text; reference labels are drawn manually below
p6 <- p5 + theme(axis.text.x = element_blank())
# Adding proportion limits for the religion proportion (0.1 to 1 along y, at x = 0)
p7 <- p6 + annotate("text", x = 0, y = seq(0.1, 1, by = 0.1),
                    label = round(seq(0.1, 1, by = 0.1), 1))
# Adding proportion limits for the country population (labels 0.1 to 0.9, placed at x = 1 - label)
p11 <- p7 + annotate("text", x = 1 - seq(0.1, 0.9, by = 0.1), y = 1,
                     label = round(seq(0.1, 0.9, by = 0.1), 1))
# (II) Adding Proper Title and Naming the Axes

p12 <- p11 +labs(title = "Religions of Largest 20 Countries Population Proportion",y = "Religion",x = "Country")
p12 <- p12+coord_flip()
p12 <- p12+theme(axis.text.x=element_blank(),axis.ticks.x=element_blank())
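# (III) Color adjustment
# colourpicker::plotHelper() is an interactive gadget for choosing colors;
# it is left out here so that the script also runs non-interactively
p13 <- p12 + scale_fill_manual(values = c("#CDAD00", "#1874CD", "#EE4000", "#FFC125",
                                          "#FFAAAA", "#FF8247", "#5E5E5E", "#FF1493"))
p13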

#Color Saturation

We adjusted the hue and saturation so that even people with red-green color blindness (for example protanomaly), the most common form of color blindness, can distinguish between the categories. With the original colors, some categories would be hard to tell apart for people with color blindness, which makes the intended message difficult to perceive. The cvd_grid() function from colorblindr shows the adjusted plot as it would appear with deuteranomaly, protanomaly and tritanomaly, and fully desaturated; the adjusted plot remains clear, in particular in the tritanomaly panel.

p14<-cvd_grid(p13)

p14
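cvd_grid() gives a visual check. A numeric complement, sketched here under our own assumptions, uses the colorspace package (installed above) to simulate the palette under each deficiency and report the smallest CIELAB distance between any two colors. The helper name min_lab_distance() and the use of the minimum distance as a score are our own choices, not part of colorblindr; larger values mean the two most similar categories are easier to tell apart.

```r
library(colorspace)

# The eight fill colors passed to scale_fill_manual() for p13
pal <- c("#CDAD00", "#1874CD", "#EE4000", "#FFC125",
         "#FFAAAA", "#FF8247", "#5E5E5E", "#FF1493")

# Hypothetical helper: smallest CIELAB (Delta E 1976) distance between any two
# colors, a rough measure of how easily the most similar pair can be told apart
min_lab_distance <- function(cols) {
  lab <- coords(as(hex2RGB(cols), "LAB"))
  min(dist(lab))
}

# Simulated appearance under each color-vision deficiency (default severity = 1,
# i.e. the full deficiency) and with all color removed
simulated <- list(
  original    = pal,
  protan      = protan(pal),
  deutan      = deutan(pal),
  tritan      = tritan(pal),
  desaturated = desaturate(pal)
)

sapply(simulated, min_lab_distance)
```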