Pie charts
- work well when the goal is to emphasize simple fractions, such as one-half, one-third, or one-quarter
- work well when we have very small datasets
- are not effective for comparing proportions
Stacked bars
Side-by-side bar charts
- work well when the goal is to compare the individual fractions directly with each other
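As a minimal sketch of the pie-chart case, the following draws a single simple fraction. The data frame `df_pie`, its group names, and its values (one-quarter versus three-quarters) are made up purely for illustration and are not taken from BRFSS. In ggplot2, a pie chart is one stacked bar drawn in polar coordinates.

```r
library(ggplot2)

# Hypothetical shares, chosen only to show a simple fraction (1/4 vs 3/4)
df_pie <- data.frame(
  group = c("A", "B"),
  share = c(25, 75)
)

p_pie <- ggplot(df_pie, aes(x = "", y = share, fill = group)) +
  geom_col(width = 1) +        # a single stacked bar ...
  coord_polar(theta = "y") +   # ... wrapped into a circle
  theme_void()                 # drop axes, which carry no meaning in a pie
print(p_pie)
```

With only two slices the one-quarter share is easy to read at a glance. That advantage fades as the number of slices grows.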
library(tidyverse)
## -- Attaching packages --------------------------------------- tidyverse 1.3.0 --
## v ggplot2 3.3.0 v purrr 0.3.3
## v tibble 2.1.3 v dplyr 0.8.4
## v tidyr 1.0.2 v stringr 1.4.0
## v readr 1.3.1 v forcats 0.4.0
## Warning: package 'ggplot2' was built under R version 3.6.3
## -- Conflicts ------------------------------------------ tidyverse_conflicts() --
## x dplyr::filter() masks stats::filter()
## x dplyr::lag() masks stats::lag()
library(ggplot2)
library(ggpol)
library(plyr)
## ------------------------------------------------------------------------------
## You have loaded plyr after dplyr - this is likely to cause problems.
## If you need functions from both plyr and dplyr, please load plyr first, then dplyr:
## library(plyr); library(dplyr)
## ------------------------------------------------------------------------------
##
## Attaching package: 'plyr'
## The following objects are masked from 'package:dplyr':
##
## arrange, count, desc, failwith, id, mutate, rename, summarise,
## summarize
## The following object is masked from 'package:purrr':
##
## compact
BRFSS <- read.csv(file="C:/Users/user/Documents/Data Science/School Programs/Data analytic, Big Datam and Predictive Analytics/Capstone Project/Submission/R codes/BRFSS2018-V4.csv", header = TRUE, sep = ",")
Stacked bars work well when each stack contains only two bars.
Example: examining the difference in income between men and women (data source: BRFSS2018).
Here is what the aggregated data look like:
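# Count respondents in each gender/income cell, then express each count as a
# percentage of its salary bracket (the two genders sum to 100 per bracket).
df_GenderIncome <- BRFSS %>%
  group_by(GENDER, INCOME) %>%
  dplyr::summarise(Count = n()) %>%
  ddply(.(INCOME), transform, Percentage = Count * 100 / sum(Count)) %>%
  # plyr is attached after dplyr, so this is plyr::mutate(): pos accumulates
  # over all rows, not within each bracket (pos is not used in the plot below)
  mutate(pos = cumsum(Percentage) - (0.5 * Percentage))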
df_GenderIncome
## GENDER INCOME Count Percentage pos
## 1 Female $20K-<$35K 29608 57.43996 28.71998
## 2 Male $20K-<$35K 21938 42.56004 78.71998
## 3 Female $35K-<$50K 20421 52.79882 126.39941
## 4 Male $35K-<$50K 18256 47.20118 176.39941
## 5 Female $50K-<$75K 23800 50.50184 225.25092
## 6 Male $50K-<$75K 23327 49.49816 275.25092
## 7 Female $75K+ 47630 46.60561 323.30280
## 8 Male $75K+ 54568 53.39439 373.30280
## 9 Female <$20K 24445 61.07585 430.53793
## 10 Male <$20K 15579 38.92415 480.53793
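Because plyr is attached after dplyr in this session, the `mutate()` call above resolves to plyr's version, so `pos` runs from about 29 up to about 481 instead of restarting in each salary bracket. The following is a minimal dplyr-only sketch of the same aggregation, assuming the intent was a position that stays within each 0-100 stack. It rebuilds the counts from the table above so that it runs without the CSV file; the names `counts` and `shares` are illustrative.

```r
library(dplyr)

# Counts copied from the aggregated table above (BRFSS2018)
counts <- data.frame(
  GENDER = rep(c("Female", "Male"), times = 5),
  INCOME = rep(c("$20K-<$35K", "$35K-<$50K", "$50K-<$75K", "$75K+", "<$20K"),
               each = 2),
  Count  = c(29608, 21938, 20421, 18256, 23800, 23327, 47630, 54568,
             24445, 15579)
)

# Explicit dplyr:: prefixes avoid the masking caused by loading plyr later
shares <- counts %>%
  dplyr::group_by(INCOME) %>%
  dplyr::mutate(
    Percentage = Count * 100 / sum(Count),        # share within the bracket
    pos = cumsum(Percentage) - 0.5 * Percentage   # restarts in each bracket
  ) %>%
  dplyr::ungroup()

shares
```

Note that ggplot2 stacks the first fill level on top by default, so for text labels `geom_text()` with `position = position_stack(vjust = 0.5)` is usually simpler than a hand-computed `pos`.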
# Stacked bars: one bar per salary bracket, split by gender
ggplot(data = df_GenderIncome) +
  geom_bar(aes(x = INCOME,
               y = Percentage,
               fill = GENDER),
           stat = "identity") +
  labs(y = "Relative proportion (%)",
       x = "Salary Bracket") +
  scale_fill_manual(values = c("#A9A9A9", "#FF8C00")) +
  scale_y_continuous(breaks = seq(0, 100, 25)) +
  # Dashed reference line at 50%, where both genders have equal shares
  geom_hline(yintercept = 50, linetype = "dashed", color = "blue") +
  theme(axis.text = element_text(size = 12, color = "black"),
        axis.title = element_text(size = 12, face = "bold"),
        plot.caption = element_text(color = "grey", size = 12, face = "italic"),
        legend.text = element_text(colour = "black", size = 12),
        legend.position = "right")
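For comparison, the same shares can be drawn as side-by-side bars, which makes it easier to compare the individual fractions directly. This is a minimal sketch rather than part of the original analysis. The data frame `df_side` is an illustrative name, its percentages are copied from the aggregated table above, and the reordering of the brackets from lowest to highest is an assumption about the preferred axis order.

```r
library(ggplot2)

brackets <- c("<$20K", "$20K-<$35K", "$35K-<$50K", "$50K-<$75K", "$75K+")

# Percentages copied from the aggregated table above (BRFSS2018)
df_side <- data.frame(
  INCOME = factor(rep(brackets, each = 2), levels = brackets),
  GENDER = rep(c("Female", "Male"), times = 5),
  Percentage = c(61.07585, 38.92415, 57.43996, 42.56004, 52.79882,
                 47.20118, 50.50184, 49.49816, 46.60561, 53.39439)
)

p_side <- ggplot(df_side, aes(x = INCOME, y = Percentage, fill = GENDER)) +
  geom_col(position = "dodge") +   # bars next to each other, not stacked
  scale_fill_manual(values = c("#A9A9A9", "#FF8C00")) +
  labs(x = "Salary Bracket", y = "Relative proportion (%)")
print(p_side)
```

Setting the factor levels explicitly also moves the "<$20K" bracket to the left of the axis; with the default alphabetical ordering it sorts last.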