In this code-through, I will show you a couple of ways to use the IRS’ Pub 78 data set to create barplots.
Specifically, I will demonstrate how to use the filter() and group_by() functions to narrow the data set. I will also demonstrate how to create new data frames based on filtered values. These new data frames will be used to create barplots with ggplot().
sessionInfo()
## R version 4.1.0 (2021-05-18)
## Platform: x86_64-apple-darwin17.0 (64-bit)
## Running under: macOS Catalina 10.15.7
##
## Matrix products: default
## BLAS: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRblas.dylib
## LAPACK: /Library/Frameworks/R.framework/Versions/4.1/Resources/lib/libRlapack.dylib
##
## locale:
## [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## loaded via a namespace (and not attached):
## [1] digest_0.6.27 R6_2.5.0 jsonlite_1.7.2 magrittr_2.0.1
## [5] evaluate_0.14 rlang_0.4.11 stringi_1.6.2 jquerylib_0.1.4
## [9] bslib_0.2.5.1 rmarkdown_2.9 tools_4.1.0 stringr_1.4.0
## [13] xfun_0.24 yaml_2.2.1 compiler_4.1.0 htmltools_0.5.1.1
## [17] knitr_1.33 sass_0.4.0
The Internal Revenue Service (IRS) is the U.S. federal agency responsible for collecting taxes and enforcing tax laws. The Pub 78 data set contains information about organizations’ eligibility to receive tax-deductible charitable contributions. It can be downloaded from https://apps.irs.gov/pub/epostcard/data-download-pub78.zip. At the time of writing, it contained 949,651 observations of 6 variables.
This data set is valuable because it allows people to analyze nonprofit organizations at the national level, the state level, and the city level using variables like deductibility type.
After you download and unzip the archive, the data file is saved on your local device with the default name “data-download-pub78.txt”.
You can pass file.choose() as the file = argument to select the file interactively.
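One thing worth knowing before reading pipe-delimited text: by default, read.table() treats ' and " as quote characters and # as the start of a comment, and it converts ID-like columns to numbers. The sketch below uses two made-up rows (not real Pub 78 records) to show options that turn those behaviors off. Whether the actual Pub 78 file is affected has not been checked here, so treat this as an optional safeguard rather than a required step.

```r
# Two made-up rows in a pipe-delimited layout (illustration only)
sample_lines <- c(
  "012345678|Bob's Food Bank|Phoenix|AZ|United States|PC",
  "987654321|Friends of Trail #9|Tucson|AZ|United States|PF"
)

sample_dat <- read.table(
  text = sample_lines,
  sep = "|",
  quote = "",               # treat ' and " as ordinary characters
  comment.char = "",        # treat # as an ordinary character
  colClasses = "character"  # read every column as text, keeping leading zeros
)

dim(sample_dat)  # number of rows and columns
sample_dat$V1    # first column, still stored as text
```

The same three arguments can be added to a read.table() call that uses file = instead of text =.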
dat <- read.table(file = "data-download-pub78.txt",
                  sep = "|")  # Field Separator

head(dat)  # Preview Data

I used the fields from the IRS (2021) Tax Exempt Organization Search to determine new column names.
To change column names, use colnames().
colnames(dat) <- c("ein", "org.name", "city",
                   "state", "country", "deductibility.code")

head(dat)  # With Updated Column Names

The following packages are required to replicate this code-through:
library(dplyr) # Data Manipulation
library(viridis) # Colorblind-Friendly Colors
library(ggplot2) # Data Visualization
library(pander) # Pandoc-Formatted Output

Verify installed packages: Use installed.packages() to see which packages are already installed.
Install missing packages: Use install.packages() to install any that are missing.
Remember to load the required libraries each time you restart R.
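The two checks above can be combined into a few lines. This is a minimal sketch; the variable names are my own, and the install line is left commented out so that nothing is downloaded unless you choose to run it.

```r
# Packages used in this code-through
required_pkgs <- c("dplyr", "viridis", "ggplot2", "pander")

# Names of all packages currently installed
installed_pkgs <- rownames(installed.packages())

# Required packages that are not installed yet
missing_pkgs <- setdiff(required_pkgs, installed_pkgs)
missing_pkgs  # character(0) means nothing is missing

# Run this line only if missing_pkgs is not empty:
# install.packages(missing_pkgs)
```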
Package Reference Manuals:
dplyr (Wickham, François, Henry & Müller, 2021)
viridis (Garnier et al., 2021)
ggplot2 (Wickham, 2021)
pander (Daróczi & Tsegelskyi, 2021)
Other Helpful Resources:
The R Graph Gallery’s barplot section is helpful when creating barplots.
To examine data only from the United States, use filter().
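# All organizations located in the United States
dat %>%
  filter(country == "United States")

# Number of U.S. organizations by state, largest first
dat %>%
  filter(country == "United States") %>%
  group_by(state) %>%
  count() %>%
  arrange(desc(n))

# Number of U.S. organizations by city, largest first
dat %>%
  filter(country == "United States") %>%
  group_by(city) %>%
  count() %>%
  arrange(desc(n))

# Number of U.S. organizations by deductibility code, largest first
dat %>%
  filter(country == "United States") %>%
  group_by(deductibility.code) %>%
  count() %>%
  arrange(desc(n))

I’m going to focus on creating barplots related to Arizona (AZ) nonprofits. To examine data only from Arizona, use filter(state == "AZ").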
dat %>%
  filter(state == "AZ")

After filtering on the state variable, you can group by other variables in the data set.
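The filter-then-group pattern can be sketched on a small made-up data frame, which makes it easier to see what each step returns. The toy rows and the names toy and toy_counts are illustrative assumptions, not part of the Pub 78 data.

```r
library(dplyr)

# Made-up rows for illustration only (not real Pub 78 records)
toy <- data.frame(
  org.name = c("Org A", "Org B", "Org C", "Org D", "Org E"),
  city     = c("Phoenix", "Tucson", "Phoenix", "Denver", "Phoenix"),
  state    = c("AZ", "AZ", "AZ", "CO", "AZ")
)

toy_counts <- toy %>%
  filter(state == "AZ") %>%  # keep only the Arizona rows
  group_by(city) %>%         # form one group per city
  count() %>%                # count rows per group; the count column is named n
  arrange(desc(n))           # put the largest groups first

toy_counts
```

Here Phoenix appears three times and Tucson once among the AZ rows, so those are the two counts returned. dplyr also offers count(city, sort = TRUE) as a shorthand that gives the same counts in one step.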
You can use group_by(city) after filtering to gather information related to the city variable.
dat %>%
  filter(state == "AZ") %>%
  group_by(city) %>%
  count() %>%
  arrange(desc(n))

To serve as the basis for my barplot, I created a new data frame from the top 10 results of grouping by city.
# Making New Data Frame
az_top10_cities <- data.frame(
  city = c("Phoenix", "Tucson", "Scottsdale", "Mesa", "Chandler",
           "Tempe", "Gilbert", "Glendale", "Flagstaff", "Peoria"),
  number = c(3789, 2347, 1339, 908, 627,
             579, 540, 487, 402, 384)
)

# Creating Barplot
b <- ggplot(az_top10_cities, aes(x = city, y = number, fill = city)) +
  geom_bar(stat = "identity") +
  scale_fill_viridis_d(option = "inferno") +
  theme(legend.position = "none")

# Adding Labels to Barplot
b + labs(title = "The 10 AZ Cities with the Most Nonprofits",
         x = "City", y = "Number of Nonprofits",
         caption = "Source: IRS Pub 78 Data")

You can use group_by(deductibility.code) after filtering to gather information related to the deductibility variable.
dat %>%
  filter(state == "AZ") %>%
  group_by(deductibility.code) %>%
  count() %>%
  arrange(desc(n))

To serve as the basis for my barplot, I created a new data frame from all of the results of grouping by deductibility.
# Making New Data Frame
az_cities_by_ded <- data.frame(
  deductibility.code = c("PC", "PF", "SOUNK", "POF", "SO",
                         "EO", "GROUP", "EO,LODGE", "UNKWN",
                         "EO,GROUP,LODGE", "EO,GROUP"),
  number = c(15206, 1390, 184, 119, 72,
             70, 35, 14, 10, 3, 2)
)

# Creating Barplot
h.b <- ggplot(az_cities_by_ded,
              aes(x = deductibility.code, y = number, fill = deductibility.code)) +
  geom_bar(stat = "identity") +
  coord_flip() +
  scale_fill_viridis_d(option = "inferno") +
  theme(legend.position = "none")

# Adding Labels to Barplot
h.b + labs(title = "AZ Nonprofits by Deductibility",
           x = "Deductibility Type", y = "Number of Nonprofits",
           caption = "Source: IRS Pub 78 Data")
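As an alternative to typing the counts into a new data frame by hand, the result of a counting pipeline can be passed to ggplot() directly. The sketch below uses made-up counts (the city_counts data frame and its values are assumptions for illustration); with the real data, the output of the filter(), group_by(), and count() steps would take its place. It also uses reorder() so the bars are sorted by count rather than alphabetically.

```r
library(dplyr)
library(ggplot2)

# Made-up counts standing in for the output of a counting pipeline
city_counts <- data.frame(
  city = c("City A", "City B", "City C", "City D"),
  n    = c(12, 40, 7, 25)
)

# Keep the three largest groups without retyping any values
top_cities <- city_counts %>%
  arrange(desc(n)) %>%
  head(3)

# reorder(city, -n) orders the bars from the largest count to the smallest
p <- ggplot(top_cities, aes(x = reorder(city, -n), y = n)) +
  geom_col() +  # geom_col() is equivalent to geom_bar(stat = "identity")
  labs(x = "City", y = "Number of Nonprofits")

p
```

Building the plot data this way keeps the chart in sync with the data set if the IRS file is updated, at the cost of displaying names exactly as they are stored in the file.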
This code-through cites the following sources:
* Daróczi G, Tsegelskyi R (2021). pander: An R ‘Pandoc’ Writer. https://cloud.r-project.org/web/packages/pander/pander.pdf.
* Garnier S, Ross N, Rudis R, Camargo PA, Sciaini M, Scherer C (2021). viridis: Colorblind-Friendly Color Maps for R. https://cloud.r-project.org/web/packages/viridis/viridis.pdf.
* IRS (2021). Pub 78 Data. https://apps.irs.gov/pub/epostcard/data-download-pub78.zip.
* IRS (2021). Tax Exempt Organization Search. https://apps.irs.gov/app/eos/charitySearch.
* IRS (2021). Tax Exempt Organization Search: Deductibility Status Codes. https://www.irs.gov/charities-non-profits/tax-exempt-organization-search-deductibility-status-codes.
* R Graph Gallery. Barplot. https://www.r-graph-gallery.com/barplot.html.
* Wickham H, François R, Henry L, Müller K (2021). dplyr: A Grammar of Data Manipulation. https://cloud.r-project.org/web/packages/dplyr/dplyr.pdf.
* Wickham H (2016). ggplot2: Elegant Graphics for Data Analysis. https://cloud.r-project.org/web/packages/ggplot2/ggplot2.pdf.