Visualization Proportion-Stacked bars

Pie charts
- work well when the goal is to emphasize simple fractions, such as one-half, one-third, or one-quarter
- work well when we have very small datasets
- are not effective for comparing proportions with each other

Stacked bars
- work for side-by-side comparisons of multiple conditions or in a time series

Side-by-side bar charts
- work when we want to directly compare the individual fractions to each other
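A minimal ggplot2 sketch of the two alternatives to stacked bars described above: a pie chart for one simple split, and side-by-side (dodged) bars for comparing individual fractions. It uses the Female/Male counts from the aggregated BRFSS2018 table in this section; the object names (`counts`, `p_pie`, `p_side`) are hypothetical, and this illustrates the chart types rather than reproducing the author's code.

```r
library(ggplot2)

# Female/Male respondent counts per income bracket, copied from the
# aggregated BRFSS2018 table (brackets listed in increasing order)
counts <- data.frame(
  GENDER = rep(c("Female", "Male"), times = 5),
  INCOME = rep(c("<$20K", "$20K-<$35K", "$35K-<$50K", "$50K-<$75K", "$75K+"), each = 2),
  Count  = c(24445, 15579, 29608, 21938, 20421, 18256, 23800, 23327, 47630, 54568),
  stringsAsFactors = FALSE
)
counts$INCOME <- factor(counts$INCOME, levels = unique(counts$INCOME))

# Share of each gender within its bracket, in percent
counts$Percentage <- ave(counts$Count, counts$INCOME,
                         FUN = function(x) 100 * x / sum(x))

# Pie chart: one simple split (the lowest bracket), drawn as a single
# 100% bar wrapped into polar coordinates
p_pie <- ggplot(subset(counts, INCOME == "<$20K"),
                aes(x = "", y = Percentage, fill = GENDER)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  theme_void()

# Side-by-side bars: the two genders are dodged next to each other within
# every bracket, so the individual fractions can be compared directly
p_side <- ggplot(counts, aes(x = INCOME, y = Percentage, fill = GENDER)) +
  geom_col(position = position_dodge()) +
  labs(x = "Salary Bracket", y = "Relative proportion (%)")
```

Printing `p_pie` or `p_side` draws the chart; the pie only stays readable because it shows a single two-way split.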

library(tidyverse)

## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --

## v ggplot2 3.3.0     v purrr   0.3.3
## v tibble  2.1.3     v dplyr   0.8.4
## v tidyr   1.0.2     v stringr 1.4.0
## v readr   1.3.1     v forcats 0.4.0

## Warning: package 'ggplot2' was built under R version 3.6.3

## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag()    masks stats::lag()

library(ggpol)
library(plyr)   # used for ddply(); attached after dplyr, hence the warning below and the explicit dplyr:: calls

## ------------------------------------------------------------------------------

## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)

## ------------------------------------------------------------------------------

## 
## Attaching package: 'plyr'

## The following objects are masked from 'package:dplyr':
## 
##     arrange, count, desc, failwith, id, mutate, rename, summarise,
##     summarize

## The following object is masked from 'package:purrr':
## 
##     compact

# Load the BRFSS 2018 extract (absolute path on the author's machine; adjust it to your copy)
BRFSS <- read.csv(file = "C:/Users/user/Documents/Data Science/School Programs/Data analytic, Big Datam and Predictive Analytics/Capstone Project/Submission/R codes/BRFSS2018-V4.csv", header = TRUE, sep = ",")

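Stacked bars work best when each stack has only two segments: one segment starts at 0% and the other ends at 100%, so both can be read against a fixed edge of the bar.

Example: examining the difference in earnings between male and female respondents (data source: BRFSS2018). For each income bracket, the chart shows what share of respondents is female and what share is male.

Here is what the aggregated data look like: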

df_GenderIncome <- BRFSS %>% 
  group_by(GENDER, INCOME) %>% 
  dplyr::summarise(Count = n()) %>%                                        # respondents per GENDER x INCOME cell
  ddply(.(INCOME), transform, Percentage = Count * 100 / sum(Count)) %>%   # gender share within each bracket
  ddply(.(INCOME), transform, pos = cumsum(Percentage) - 0.5 * Percentage) # segment midpoint within each bracket

df_GenderIncome

##    GENDER     INCOME Count Percentage      pos
## 1  Female $20K-<$35K 29608   57.43996 28.71998
## 2    Male $20K-<$35K 21938   42.56004 78.71998
## 3  Female $35K-<$50K 20421   52.79882 26.39941
## 4    Male $35K-<$50K 18256   47.20118 76.39941
## 5  Female $50K-<$75K 23800   50.50184 25.25092
## 6    Male $50K-<$75K 23327   49.49816 75.25092
## 7  Female      $75K+ 47630   46.60561 23.30280
## 8    Male      $75K+ 54568   53.39439 73.30280
## 9  Female      <$20K 24445   61.07585 30.53793
## 10   Male      <$20K 15579   38.92415 80.53793
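The same per-bracket percentages can be computed with dplyr alone, which sidesteps the plyr/dplyr masking problem flagged by plyr's startup warning. This is a minimal sketch, not the author's code: `cells` is a hypothetical stand-in rebuilt from the counts in the table, and `pos` is computed within each bracket so it stays on the 0-100 scale of the y axis.

```r
library(dplyr)

# Hypothetical stand-in for BRFSS after counting: one row per GENDER x INCOME
# cell, with the counts from the table above
cells <- data.frame(
  GENDER = rep(c("Female", "Male"), times = 5),
  INCOME = rep(c("$20K-<$35K", "$35K-<$50K", "$50K-<$75K", "$75K+", "<$20K"), each = 2),
  Count  = c(29608, 21938, 20421, 18256, 23800, 23327, 47630, 54568, 24445, 15579),
  stringsAsFactors = FALSE
)
# With the raw survey rows, this table would come from something like
# BRFSS %>% dplyr::count(GENDER, INCOME, name = "Count")

df_dplyr <- cells %>%
  group_by(INCOME) %>%                               # one group per income bracket
  dplyr::mutate(
    Percentage = Count * 100 / sum(Count),           # gender share within the bracket
    pos = cumsum(Percentage) - 0.5 * Percentage      # segment midpoint, in row order
  ) %>%
  ungroup()

df_dplyr
```

A grouped `mutate()` replaces both `ddply(.(INCOME), transform, ...)` steps, so plyr is no longer needed for this table.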

ggplot(data = df_GenderIncome) +
  geom_col(aes(x = INCOME,                 # geom_col() is geom_bar(stat = "identity")
               y = Percentage,
               fill = GENDER)) +
  labs(y = "Relative proportion (%)", 
       x = "Salary Bracket") +
  # order brackets from lowest to highest income (default sorting puts "<$20K" last, as in the table)
  scale_x_discrete(limits = c("<$20K", "$20K-<$35K", "$35K-<$50K", "$50K-<$75K", "$75K+")) +
  scale_fill_manual(values = c("#A9A9A9", "#FF8C00")) +                  # Female = grey, Male = orange
  scale_y_continuous(breaks = seq(0, 100, 25)) +
  geom_hline(yintercept = 50, linetype = "dashed", color = "blue") +     # 50% = equal shares
  theme(axis.text = element_text(size = 12, color = "black"),
        axis.title = element_text(size = 12, face = "bold"),
        plot.caption = element_text(color = "grey", size = 12, face = "italic"),
        legend.text = element_text(colour = "black", size = 12),
        legend.position = "right")
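The `pos` column in the aggregated table holds segment midpoints, presumably for labels. ggplot2 can compute these positions itself: `position_stack(vjust = 0.5)` places each label at the centre of its own segment, and because the text layer inherits the `fill` grouping it is stacked in the same order as the bars. This is a sketch, not the author's code; the data frame `shares` is a hypothetical rebuild from the table's counts.

```r
library(ggplot2)

# Gender counts per income bracket, copied from the aggregated table
shares <- data.frame(
  GENDER = rep(c("Female", "Male"), times = 5),
  INCOME = rep(c("<$20K", "$20K-<$35K", "$35K-<$50K", "$50K-<$75K", "$75K+"), each = 2),
  Count  = c(24445, 15579, 29608, 21938, 20421, 18256, 23800, 23327, 47630, 54568),
  stringsAsFactors = FALSE
)
# Keep the brackets in increasing order on the x axis
shares$INCOME <- factor(shares$INCOME, levels = unique(shares$INCOME))
# Share of each gender within its bracket (each bracket sums to 100)
shares$Percentage <- ave(shares$Count, shares$INCOME,
                         FUN = function(x) 100 * x / sum(x))

p_labels <- ggplot(shares, aes(x = INCOME, y = Percentage, fill = GENDER)) +
  geom_col() +
  # vjust = 0.5 puts each label at the middle of its own stacked segment
  geom_text(aes(label = sprintf("%.1f%%", Percentage)),
            position = position_stack(vjust = 0.5)) +
  scale_fill_manual(values = c("#A9A9A9", "#FF8C00")) +
  geom_hline(yintercept = 50, linetype = "dashed", color = "blue") +
  labs(x = "Salary Bracket", y = "Relative proportion (%)")
```

With only two segments per bar, each label sits either above or below the dashed 50% line, which makes the direction of the imbalance easy to read.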


Minh Trung DANG

11/01/2021