Setup

Load packages

library(dplyr)
library(ggplot2)
library(scales)
library(knitr)
library(RColorBrewer)

Load data

load("brfss2013.RData")

Creating the Plot

First, I create a data frame for the plot, using filter() to remove the rows with NA values. I then group the data by the “income2” and “sex” variables, so that the percentages add up to 100% within each “income2” category for each sex. The next step is to count the observations in each marital status within each group, because I want to fill the bars by marital status, facet the plot by sex, and display one bar per income group.

# Creating the data for plot1
plot1_data <- brfss2013 %>%                                   # the data frame to be created
  filter(!is.na(marital), !is.na(income2), !is.na(sex)) %>%   # filtering: remove the NA values
  group_by(income2, sex) %>%                                  # grouping by the variables of interest
  arrange(marital) %>%                                        # arranging/sorting by marital
  count(sex, marital) %>%                                     # counting rows per marital status in each group
  mutate(percent = n/sum(n))                                  # share of each status within its income2/sex group

plot1_data
## # A tibble: 96 x 5
## # Groups:   income2, sex [16]
##    income2           sex    marital                             n percent
##    <fct>             <fct>  <fct>                           <int>   <dbl>
##  1 Less than $10,000 Male   Married                          1436  0.174 
##  2 Less than $10,000 Male   Divorced                         2038  0.248 
##  3 Less than $10,000 Male   Widowed                           537  0.0652
##  4 Less than $10,000 Male   Separated                         409  0.0497
##  5 Less than $10,000 Male   Never married                    3473  0.422 
##  6 Less than $10,000 Male   A member of an unmarried couple   341  0.0414
##  7 Less than $10,000 Female Married                          2279  0.134 
##  8 Less than $10,000 Female Divorced                         5007  0.294 
##  9 Less than $10,000 Female Widowed                          3296  0.193 
## 10 Less than $10,000 Female Separated                        1352  0.0793
## # ... with 86 more rows
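
As a quick sanity check of the grouping logic, the same steps can be run on a small toy data frame. This is only a sketch: the toy data frame and its values are made up for illustration and are not part of brfss2013. If the grouping behaves as described, the shares should add up to 1 (that is, 100%) inside every income2/sex group.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013 (all values are made up)
toy <- tibble(
  income2 = c("low", "low", "low", "low", "high", "high", "high", NA),
  sex     = c("Male", "Male", "Female", "Female", "Male", "Male", "Female", "Male"),
  marital = c("Married", "Divorced", "Married", "Married",
              "Married", "Married", NA, "Married")
)

toy_pct <- toy %>%
  filter(!is.na(marital), !is.na(income2), !is.na(sex)) %>%  # drop rows with any NA
  group_by(income2, sex) %>%                                 # same grouping as plot1_data
  count(marital) %>%                                         # rows per marital status in each group
  mutate(percent = n / sum(n))                               # share within each income2/sex group

# One row per income2/sex group; every total should be 1
group_totals <- toy_pct %>% summarise(total = sum(percent))
group_totals
```

Because count() keeps the grouping set by group_by(), the sum(n) inside mutate() is taken per income2/sex group rather than over the whole table.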

Because the percent variable has already been computed, I use geom_col() instead of geom_bar(): I am not counting the observations of a categorical variable, but plotting the percent values (a numerical variable) directly. It is important not to confuse the percent variable with the percent() function from the scales package. Inside geom_text(), the percent() function is used to format the values of the percent variable as percentage labels.
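
The two distinctions above can be seen on a tiny example. This is a minimal sketch: the shares data frame and its values are hypothetical and only serve to illustrate the behaviour.

```r
library(ggplot2)
library(scales)

# Hypothetical pre-computed shares (made-up values)
shares <- data.frame(
  group   = c("A", "B", "C"),
  percent = c(0.5, 0.3, 0.2)     # a numeric column that happens to be named "percent"
)

# scales::percent() is a function: it formats proportions as text labels
labels <- scales::percent(shares$percent, accuracy = 1)
labels

# geom_col() takes the bar height from the y aesthetic
p_col <- ggplot(shares, aes(x = group, y = percent)) +
  geom_col()

# geom_bar() with its default stat counts rows instead,
# so here every bar has height 1 (one row per group)
p_bar <- ggplot(shares, aes(x = group)) +
  geom_bar()
```

Printing p_col and p_bar side by side shows why geom_col() is the appropriate choice once the percentages have already been calculated.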

# creating the plot1
plot1_marital_income <- 
  plot1_data %>%
  ggplot(data = ., mapping = aes(x = income2, y = percent, fill = marital)) +
  geom_col() +
  geom_text(mapping = aes(label = percent(percent)),              # converting the values to percent
            size = 3,                                             # size of the font
            position = position_stack(vjust = 0.5)) +             # positioning in the middle
  scale_fill_brewer(palette = "Set3") +                           # coloring the plot
  facet_grid(.~sex) +
  labs(x = "Income",                                              # labelling the x axis
       y = "Percentage",                                          # labelling the y axis
       title = "Percentage of Marital Status by Income",          # title
       fill = "Marital Status") +                                 # legend title
  scale_y_continuous(labels = scales::percent_format()) +         # showing the y axis numbers as percentages
  theme(
    axis.text.x = element_text(angle = 90,                        # rotating the x axis text
                               vjust = 0.5),                      # adjusting the position
    axis.title.x = element_text(face = "bold"),                   # bold x axis title/label
    axis.title.y = element_text(face = "bold"),                   # bold y axis title/label
    plot.title = element_text(hjust = 0.5),                       # centering the plot title
    legend.title = element_text(face = "bold")                    # bold legend title
  )

plot1_marital_income