library(dplyr)
library(ggplot2)
library(scales)
library(knitr)
library(RColorBrewer)
load("brfss2013.RData")
First of all, I create a data frame for the plot, removing the NA values with filter(). After that, I group the data by the “income2” and “sex” variables, so the 100% is computed inside each income × sex combination rather than over the whole sample. The next step is to count the respondents in each marital status within each of those groups, because I want to fill the plot with marital status, facet the graphic by sex, and display it by income group.
# Creating the data for plot1
plot1_data <- brfss2013 %>%                                  # the data frame to be created
  filter(!is.na(marital), !is.na(income2), !is.na(sex)) %>%  # filtering: remove the NA values
  group_by(income2, sex) %>%                                  # grouping by the variables of interest
  arrange(marital) %>%                                        # sorting by marital (optional: count() already sorts its output)
  count(sex, marital) %>%                                     # number of respondents per marital status within each group
  mutate(percent = n / sum(n))                                # share of each marital status within its income2 x sex group
plot1_data
## # A tibble: 96 x 5
## # Groups: income2, sex [16]
## income2 sex marital n percent
## <fct> <fct> <fct> <int> <dbl>
## 1 Less than $10,000 Male Married 1436 0.174
## 2 Less than $10,000 Male Divorced 2038 0.248
## 3 Less than $10,000 Male Widowed 537 0.0652
## 4 Less than $10,000 Male Separated 409 0.0497
## 5 Less than $10,000 Male Never married 3473 0.422
## 6 Less than $10,000 Male A member of an unmarried couple 341 0.0414
## 7 Less than $10,000 Female Married 2279 0.134
## 8 Less than $10,000 Female Divorced 5007 0.294
## 9 Less than $10,000 Female Widowed 3296 0.193
## 10 Less than $10,000 Female Separated 1352 0.0793
## # ... with 86 more rows
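To make the "100% inside each category" idea concrete, here is a minimal sketch on a small made-up tibble. The `toy`, `toy_pct` and `check` names and all values are hypothetical, not BRFSS data. Because the data are grouped by `income2` and `sex` before `count()`, `n / sum(n)` is taken within each income × sex cell, so the shares in every cell should add up to exactly 1.

```r
library(dplyr)

# Hypothetical toy data standing in for brfss2013 (values are made up)
toy <- tibble(
  income2 = c("Low", "Low", "Low", "Low", "High", "High", "High"),
  sex     = c("Male", "Male", "Female", "Female", "Male", "Male", "Female"),
  marital = c("Married", "Divorced", "Married", "Married",
              "Married", "Never married", "Divorced")
)

toy_pct <- toy %>%
  group_by(income2, sex) %>%     # one "100%" per income2 x sex cell
  count(marital) %>%             # rows per marital status inside each cell
  mutate(percent = n / sum(n))   # share of each status within its cell

# Check: the shares add up to 1 inside every income2 x sex group
check <- toy_pct %>%
  summarise(total = sum(percent), .groups = "drop")
print(check)
```

If the `group_by()` step were dropped, `sum(n)` would be the whole sample and the percentages would no longer sum to 1 per bar.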
Because I created the percent variable, I use geom_col() instead of geom_bar(): I'm not counting the categorical variables and their distribution (which is what geom_bar() does), but plotting the percent values (a numerical variable) directly as bar heights. It is important not to confuse the percent variable with the percent() function. The percent() function from the scales package was used inside geom_text() to convert the values of the percent variable into percentage-formatted labels.
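The difference between the two geoms, and between the `percent` column and `scales::percent()`, can be seen on a tiny hypothetical example. The `raw` and `shares` data frames are made up for illustration. `layer_data()` returns the data ggplot2 actually draws. With `geom_bar()` the heights are row counts. With `geom_col()` they are the supplied shares.

```r
library(ggplot2)
library(scales)

# Hypothetical raw data: one row per respondent
raw <- data.frame(status = c("A", "A", "B"))
# Hypothetical summary data: one pre-computed share per status
shares <- data.frame(status = c("A", "B"), percent = c(2/3, 1/3))

# geom_bar() uses stat_count(): bar height = number of rows per x value
p_bar <- ggplot(raw, aes(x = status)) + geom_bar()

# geom_col() uses stat_identity(): bar height = the y value supplied
p_col <- ggplot(shares, aes(x = status, y = percent)) + geom_col()

bar_heights <- layer_data(p_bar)$y   # counts of rows
col_heights <- layer_data(p_col)$y   # the shares themselves

# scales::percent() (the function) turns the percent column (the data) into text
labels <- scales::percent(shares$percent)

print(bar_heights)
print(col_heights)
print(labels)
```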
# creating plot1
plot1_marital_income <-
  plot1_data %>%
  ggplot(data = ., mapping = aes(x = income2, y = percent, fill = marital)) +
  geom_col() +
  geom_text(mapping = aes(label = scales::percent(percent)),  # converting the values to percent labels
            size = 3,                                           # font size
            position = position_stack(vjust = 0.5)) +           # centering each label in its segment
  scale_fill_brewer(palette = "Set3") +                         # coloring the plot
  facet_grid(. ~ sex) +                                         # one panel per sex
  labs(x = "Income",                                            # labeling x axis
       y = "Percentage",                                        # labeling y axis
       title = "Percentage of Marital Status by Income",        # title
       fill = "Marital Status") +                               # legend
  scale_y_continuous(labels = scales::percent_format()) +       # changing the y axis number format
  theme(
    axis.text.x = element_text(angle = 90,                      # rotating the x axis text
                               vjust = 0.5),                    # adjusting the position
    axis.title.x = element_text(face = "bold"),                 # bold x axis title
    axis.title.y = element_text(face = "bold"),                 # bold y axis title
    plot.title = element_text(hjust = 0.5),                     # centering the plot title
    legend.title = element_text(face = "bold")                  # bold legend title
  )
plot1_marital_income
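The labels in plot1 are centered with `position_stack(vjust = 0.5)`. The following is a minimal sketch of what that does, using a single hypothetical bar split into three made-up shares (`seg`, `label_y` and `manual_mid` are illustrative names). By default ggplot2 stacks the first factor level on top, so bottom-up the order is C, B, A. Each label then sits at its segment's cumulative top minus half the segment height.

```r
library(ggplot2)

# Hypothetical single bar split into three shares (levels A, B, C)
seg <- data.frame(
  x = "bar",
  fill = factor(c("A", "B", "C"), levels = c("A", "B", "C")),
  share = c(0.5, 0.3, 0.2)
)

p <- ggplot(seg, aes(x = x, y = share, fill = fill)) +
  geom_col() +
  geom_text(aes(label = fill), position = position_stack(vjust = 0.5))

# y positions ggplot2 computed for the text layer (layer 2)
label_y <- layer_data(p, 2)$y

# Manual midpoints: stack bottom-up in reverse level order (C, B, A),
# then take each segment's top minus half its height
bottom_up <- rev(seg$share)
manual_mid <- cumsum(bottom_up) - bottom_up / 2

print(sort(label_y))
print(sort(manual_mid))
```

With `vjust = 1` (the default) the labels would instead sit at the top edge of each segment.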