The dataset I used for these data visualizations is the Baltimore FY2022 Budget. It represents the fiscal year 2022 budget of Baltimore City and contains detailed financial information about the planned funds and how they are allocated to different parts of the city, which allows for analysis of the city’s priorities.
The variables I used from this dataset are the following: service_name, agency_name, fund_name, object_name, subobject_name, and fy22_budget.
For this assignment, these are the libraries used, followed by the preliminary code for reading in the dataset:
# Libraries:
library(plyr)
library(dplyr)
library(ggplot2)
library(scales)
library(RColorBrewer)
library(ggthemes)
library(data.table)
library(plotly)
# "True" dataframe with all columns
dfTrue <- read.csv("~/RProjects/Data Visualization /Sources/Baltimore_City_Fiscal_2022_Budget_Data.csv")
cat("Total number of budget line items:", nrow(dfTrue), "\n")
## Total number of budget line items: 20147
cat("Number of unique agencies:", length(unique(dfTrue$agency_name)), "\n") # cat() rather than print(): cat() concatenates several values, while print() prints a single object
## Number of unique agencies: 53
cat("Number of unique fund types:", length(unique(dfTrue$fund_name)), "\n")
## Number of unique fund types: 11
cat("Number of unique object types:", length(unique(dfTrue$object_name)), "\n\n")
## Number of unique object types: 10
print("FY22 Budget Summary")
## [1] "FY22 Budget Summary"
summary(dfTrue$fy22_budget)
##      Min.   1st Qu.    Median      Mean   3rd Qu.      Max.
## -19837221      1157      5280    190772     32104 677526135
# This is some of the preliminary code for the visualizations that will be used.
dfBudget <- fread("~/RProjects/Data Visualization /Sources/Baltimore_City_Fiscal_2022_Budget_Data.csv",
                  select = c("service_name", "fy22_budget"))
budgetgrouped <- group_by(dfBudget, service_name)
budgetsum <- summarise(budgetgrouped, totalbudget = sum(fy22_budget))
budgetsumsorted <- arrange(budgetsum, desc(totalbudget))
agencytotal <- group_by(dfTrue, agency_name)
agencysum <- summarise(agencytotal, total = sum(fy22_budget))
agencysorted <- arrange(agencysum, desc(total))
top5 <- head(agencysorted$agency_name, 5)
p2 <- filter(dfTrue, agency_name %in% top5)
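The group, sum, sort, and take-the-top steps above can be checked on a small made-up table. The sketch below is only an illustration: the agency names and amounts are hypothetical, and it uses base R equivalents (aggregate, order, head) so that it runs without any packages.

```r
# Toy data: hypothetical agencies and amounts, not taken from the Baltimore file
toy <- data.frame(
  agency_name = c("A", "A", "B", "C", "C", "C"),
  fy22_budget = c(100, 50, 400, 10, 20, 30)
)

# Sum the line items within each agency (the group_by + summarise step)
toy_sum <- aggregate(fy22_budget ~ agency_name, data = toy, FUN = sum)

# Sort from largest to smallest total (the arrange(desc()) step)
toy_sorted <- toy_sum[order(-toy_sum$fy22_budget), ]

# Keep the names of the two largest agencies (the head() step)
toy_top2 <- as.character(head(toy_sorted$agency_name, 2))
print(toy_top2)
```

Here agency B (400) and agency A (150) are kept, and C (60) is dropped, which is the same logic that produces top5 from the real data.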
This bar chart shows the five city services with the lowest total budgets in the Baltimore FY2022 budget. The x-axis shows the service name and the y-axis shows the total budget. Apart from having the five lowest total budgets, these services do not appear to have anything in common. The Workers’ Compensation Practice is the lowest, receiving less than $75,000.
ggplot(tail(budgetsumsorted, 5), aes(x = reorder(service_name, -totalbudget), y = totalbudget)) +
  geom_bar(colour = "black", fill = "red", stat = "identity") +
  labs(title = "The 5 Services with the Lowest Budget in Baltimore (2022)", x = "Service", y = "Budget") +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_y_continuous(labels = comma)
This stacked bar chart shows the top five agencies by total budget, with each bar divided by fund type. The x-axis shows the total budget and the y-axis shows the agency names. This chart allows for comparison of the total budgets as well as the contribution of each fund type to each agency. Public Works has the most diverse budget by fund and the highest total budget. M-R: American Rescue Plan Act is the second highest, but all of its budget comes from the Federal fund. M-R: Baltimore City Public Schools receives the smallest total budget of the five, and all of its budget comes from the General fund.
stackbarchart <- ggplot(p2, aes(x = agency_name, y = fy22_budget, fill = fund_name)) +
  geom_col() +
  coord_flip() +
  scale_y_continuous(labels = comma) +
  labs(title = "Top 5 Agency Budgets by Fund Source in Baltimore (2022)", x = "Agency Name", y = "Total Budget", fill = "Fund Type") +
  theme(plot.title = element_text(hjust = 0.5))
stackbarchart
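The contribution of each fund type to an agency can also be expressed as a share of that agency's total, which is what the segment lengths in a stacked bar show visually. The sketch below is a minimal illustration on hypothetical agencies and amounts, using base R's xtabs and prop.table rather than the dplyr code above.

```r
# Hypothetical agency/fund amounts, for illustration only
toy <- data.frame(
  agency_name = c("X", "X", "Y", "Y", "Z"),
  fund_name   = c("General", "Federal", "General", "State", "Federal"),
  fy22_budget = c(60, 40, 90, 10, 200)
)

# Agency-by-fund table of totals; combinations that never occur are 0
totals <- xtabs(fy22_budget ~ agency_name + fund_name, data = toy)

# Share of each agency's budget that comes from each fund (each row sums to 1)
shares <- prop.table(totals, margin = 1)
print(round(shares, 2))
```

An agency such as Z, funded from a single source, has a share of 1 in one column and 0 elsewhere, which corresponds to a bar with only one color.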
This pie chart breaks down the FY2022 budget by subobject category. Five specific subobjects are labeled, with the rest falling under the category “Other”. The specified subobjects are the five highest in terms of total budget. For readability, the percentage of each slice is displayed only if it is over 5% of the total budget. Permanent Full-time receives 13.8%, which is a massive amount for an individual subobject.
# The five subobjects that get their own slice; everything else becomes "Other"
top_subobjects <- c("FICA - Medicare Only", "Workers' Comp - Direct",
                    "Social Security-City Share (Regular)", "Survivor Benefits",
                    "Permanent Full-time")
object_df <- dfTrue %>%
  select(subobject_name, fy22_budget) %>%
  mutate(myObject = ifelse(subobject_name %in% top_subobjects,
                           as.character(subobject_name), "Other")) %>%
  group_by(myObject) %>%
  summarise(n = n(), total_budget = sum(fy22_budget), .groups = 'drop') %>%
  # Percent of the total budget (not of the line-item count), matching the chart labels
  mutate(percent = round(100 * total_budget / sum(total_budget), 1))
ggplot(object_df, aes(x = "", y = total_budget, fill = myObject)) +
  geom_col(width = 1) +
  coord_polar(theta = "y") +
  geom_text(aes(label = ifelse(total_budget / sum(total_budget) > 0.05,
                               paste0(round(100 * total_budget / sum(total_budget), 1), "%"), "")),
            position = position_stack(vjust = 0.5)) +
  labs(title = "FY2022 Budget by Subobject Category", fill = "Category") +
  theme_void() + # theme_void() went better with the pie chart than theme_light()
  theme(plot.title = element_text(hjust = 0.5))
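The labeling rule used in the pie chart (show a percentage only when the slice is over 5% of the total) can be worked through on a small example. The categories and amounts below are hypothetical and only serve to show the arithmetic.

```r
# Hypothetical category totals, for illustration only
toy <- data.frame(
  category     = c("Other", "Salaries", "Benefits", "Supplies"),
  total_budget = c(800, 150, 40, 10)
)

# Share of the overall budget for each category
share <- toy$total_budget / sum(toy$total_budget)

# Label a slice only when it is more than 5% of the total
toy$label <- ifelse(share > 0.05, paste0(round(100 * share, 1), "%"), "")
print(toy)
```

In this toy table only the 80% and 15% slices get a label, while the 4% and 1% slices are left blank, so small slices do not produce overlapping text.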
This heatmap shows the budget allocation of the top five agencies by fund type. The redder a cell is, the higher its budget amount, while empty white cells indicate that the agency did not receive funding from the corresponding fund type. The visualization shows that M-R: American Rescue Plan Act has by far the darkest red cell, with over $680,000,000, indicating that it receives the largest budget allocation.
heatmap_df <- fread("~/RProjects/Data Visualization /Sources/Baltimore_City_Fiscal_2022_Budget_Data.csv",
                    select = c("fund_name", "agency_name", "fy22_budget")) %>%
  filter(agency_name %in% top5) %>%
  group_by(agency_name, fund_name) %>%
  summarise(total = sum(fy22_budget), .groups = 'drop')
p4 <- ggplot(heatmap_df, aes(x = fund_name, y = agency_name, fill = total)) +
  geom_tile(color = "black") +
  geom_text(aes(label = comma(total))) +
  coord_equal(ratio = 1) +
  labs(title = "Heatmap of Top 5 Agencies based on Fund Type",
       x = "Fund Type",
       y = "Agency",
       fill = "Total Budget") +
  theme_minimal() +
  theme(plot.title = element_text(hjust = 0.5)) +
  scale_fill_gradient(low = "white", high = "red", labels = comma) +
  guides(fill = guide_legend(reverse = TRUE, override.aes = list(colour = "black")))
p4
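The empty cells in the heatmap come from agency and fund combinations that have no rows in the summarised data, so no tile is drawn for them. One way to list those combinations explicitly is sketched below in base R; the agencies, funds, and totals are hypothetical.

```r
# Hypothetical agency/fund totals; agency "B" has no State funding
toy <- data.frame(
  agency_name = c("A", "A", "B"),
  fund_name   = c("General", "State", "General"),
  total       = c(100, 20, 300),
  stringsAsFactors = FALSE
)

# Every possible agency/fund pair
grid <- expand.grid(
  agency_name = unique(toy$agency_name),
  fund_name   = unique(toy$fund_name),
  stringsAsFactors = FALSE
)

# Left join: pairs that are absent from the data get NA as their total
full <- merge(grid, toy, by = c("agency_name", "fund_name"), all.x = TRUE)

# The pairs with no funding, i.e. the cells that would be empty
missing_pairs <- full[is.na(full$total), c("agency_name", "fund_name")]
print(missing_pairs)
```

This is only a check on which cells are empty; it does not change how the heatmap itself is drawn.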
This trellis chart uses facet_wrap to create a separate panel for each of three major fund types: General, Federal, and State. Within each panel, bars represent the total budget allocated to each object type (e.g., Salaries, Contractual Services). The y-axis scales independently for each panel using scales = "free_y", allowing meaningful comparisons within each fund type despite their very different budget magnitudes. This visualization shows that State receives the least budget allocation and that General receives the most. Grants, Subsidies, and Contributions from the Federal fund and Salaries from the General fund stand out as outliers, with both being over $700,000,000. The third closest is Grants, Subsidies, and Contributions from the General fund with a little over $400,000,000. This chart reveals how spending priorities differ depending on the source of funding.
# Total the budget by fund and object type and keep only the three funds; .groups = 'drop' removes the grouping instead of keeping it
trellis_df <- fread("~/RProjects/Data Visualization /Sources/Baltimore_City_Fiscal_2022_Budget_Data.csv",
                    select = c("fund_name", "object_name", "fy22_budget")) %>%
  group_by(fund_name, object_name) %>%
  summarise(total = sum(fy22_budget), .groups = 'drop') %>%
  filter(fund_name %in% c("General", "Federal", "State")) %>%
  data.frame()
ggplot(trellis_df, aes(x = object_name, y = total, fill = fund_name)) +
  geom_bar(stat = "identity", position = "dodge") +
  theme_light() +
  theme(plot.title = element_text(hjust = 0.5),
        # Angled, smaller labels make the x-axis more readable; it was clumped together originally
        axis.text.x = element_text(angle = 45, hjust = 1, size = 7)) +
  scale_y_continuous(labels = comma) +
  labs(title = "Budget by Object Type per Fund (2022)",
       x = "Object Type", y = "Total Budget", fill = "Fund Type") +
  # scales = "free_y" gives each panel its own y-axis, as the description states
  facet_wrap(~ fund_name, ncol = 3, scales = "free_y")
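To see why an independent y-axis per panel matters when funds differ a lot in size, the sketch below compares how tall each bar would be, relative to its axis, under a shared axis and under a per-panel axis. The numbers are hypothetical, and the calculation only approximates what scales = "free_y" does (ggplot2 also adds some padding to each axis).

```r
# Hypothetical fund/object totals with very different magnitudes
toy <- data.frame(
  fund_name   = c("General", "General", "State", "State"),
  object_name = c("Salaries", "Supplies", "Salaries", "Supplies"),
  total       = c(700, 50, 8, 2),
  stringsAsFactors = FALSE
)

# Largest bar in each panel
panel_max <- tapply(toy$total, toy$fund_name, max)

# Shared y-axis: every bar is drawn against the overall maximum
shared_height <- toy$total / max(toy$total)

# Independent y-axis: each bar is drawn against its own panel's maximum
free_height <- unname(toy$total / panel_max[toy$fund_name])

print(round(cbind(shared = shared_height, free = free_height), 2))
```

With a shared axis the State bars take up about 1% of the panel height or less and are hard to read, while with independent axes the largest State bar fills its panel. The trade-off is that bar heights can no longer be compared directly across panels, only through the axis labels.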
This report uses the Baltimore City FY2022 budget dataset and five data visualizations to understand its contents. The bar chart showed that the five lowest-funded services had little in common and that the Workers’ Compensation Practice received less than 75,000 dollars. The stacked bar chart showed that Public Works has both the highest total budget and the most diverse funding sources, and that three of the five agencies relied heavily on the General fund. The pie chart showed that Permanent Full-time receives 13.8% of the budget, which is a massive amount for a specific subobject category. That being said, Other still dominated with 83.3% of the total budget, indicating that the budget is spread out. The heatmap showed that M-R: American Rescue Plan Act received the highest agency fund allocation with over 680,000,000 dollars. The trellis chart showed that the General fund has the highest overall budget based on object type. Together, these charts help to show where Baltimore’s priorities are in the 2022 fiscal year, and they can be compared to future or past datasets to understand whether Baltimore is changing or maintaining its spending habits.