For this project I have been tasked with cleaning, wrangling, and ultimately visualizing data related to Cincinnati vendor payments. Within this project, we take a closer look at analyzing both the nature of vendors in Cincinnati, as well as their differences.
Columns & Variables
1.1M
14
58 (Check_No) check numbers
The data set is rather large, containing nearly 1.1M rows. The rows represent recorded payments made to vendors by the City of Cincinnati from 2014-2025.
## [1] "FISCAL_YEAR" "ACCT_PERIOD" "DEPT_CODE"
## [4] "DEPT_DESC" "FUND_CODE" "FUND_DESC"
## [7] "EXP_ACCT_CAT" "EXP_ACCT_CAT_DESC" "TRANS_ID"
## [10] "TRANS_LINE_NO" "RECORD_DATE" "CHECK_NO"
## [13] "AMOUNT" "VENDOR_NAME"
Below is a description of each variable in the
VendorPayments dataset:
Question:
How has vendor spending changed across departments over time, and which
departments consistently spend the most?
top_depts <- VendorPayments %>%
group_by(DEPT_DESC) %>%
summarise(Total_Spend = sum(AMOUNT, na.rm = TRUE)) %>%
slice_max(Total_Spend, n = 5) %>%
pull(DEPT_DESC)
VendorPayments %>%
filter(DEPT_DESC %in% top_depts) %>%
group_by(FISCAL_YEAR, DEPT_DESC) %>%
summarise(Annual_Spend = sum(AMOUNT, na.rm = TRUE), .groups = "drop") %>%
ggplot(aes(x = FISCAL_YEAR, y = Annual_Spend, color = DEPT_DESC)) +
geom_line(size = 1.2) +
scale_y_continuous(labels = label_dollar()) +
theme_economist() +
labs(
title = "Top 5 Departments by Vendor Spending Over Time",
x = "Fiscal Year",
y = "Total Spend",
color = "Department"
) +
theme(
legend.position = "bottom",
legend.title = element_text(size = 10),
legend.text = element_text(size = 8),
legend.box = "horizontal"
) +
guides(color = guide_legend(nrow = 2, byrow = TRUE))
The highest spending departments are the Department of Sewers and Finance, Risk Management. The Department Of Water Works, however, has been steadily climbing since 2016. From this chart, we can see that Finance & Risk Management has the most stable spending. Spending for The Southwest Ohio Regional Transit Authority (SORTA) was at a steady climb from 2016 through 2019, however in the three years following, their spending decreased drastically. The most likely reason for this decreased spending is budget cuts and reduced operations. In fact, this interpretation aligns with the budget cuts of Cincinnati’s transporation system during those years. In 2019, SORTA had to dip into reserves to fund their system, due to a severe budget deficit. In 2020, a new funding structure was introduced for Cincinnati’s Metro System.
Question:
Are certain months (Acct_Periods) consistently associated with spikes in
vendor spending?
VendorPayments %>%
group_by(FISCAL_YEAR, ACCT_PERIOD) %>%
summarise(Monthly_Spend = sum(AMOUNT, na.rm = TRUE), .groups = "drop") %>%
mutate(Month = factor(ACCT_PERIOD, levels = 1:12, labels = month.abb)) %>%
ggplot(aes(x = Month, y = Monthly_Spend, fill = as.factor(FISCAL_YEAR))) +
geom_col(position = "dodge") +
scale_fill_viridis_d(option = "C") +
scale_y_continuous(labels = label_dollar()) +
labs(
title = "Monthly Vendor Spending Trends by Fiscal Year",
x = "Month",
y = "Spend",
fill = "Fiscal Year"
) +
theme_minimal()
Vendor trends seem to follow a fairly similar monthly spending pattern. The following is an interpretation of the trends:
Spending is, usually, highest during the months of February through June.
Spending is almost always lowest at the beginning of the fiscal year (January)
June 30 is the end of the fiscal year (government entities), which may account for the spending slowdown from the end of June through August.
There may be spending lag regarding newly approved budgets after the new fiscal year kicks off.
Question: Who are Cincinnati’s top 10 vendors by total payment, and are they missing check numbers?
format_compact <- function(x) {
ifelse(x >= 1e9, paste0(round(x / 1e9, 1), "B"),
ifelse(x >= 1e6, paste0(round(x / 1e6, 1), "M"),
ifelse(x >= 1e3, paste0(round(x / 1e3, 1), "K"), x)))
}
VendorPayments %>%
group_by(VENDOR_NAME) %>%
summarise(
Total_Paid = sum(AMOUNT, na.rm = TRUE),
Missing_Checks = sum(is.na(CHECK_NO)),
.groups = "drop"
) %>%
slice_max(Total_Paid, n = 10) %>%
mutate(VENDOR_NAME = reorder(VENDOR_NAME, Total_Paid)) %>%
ggplot(aes(x = VENDOR_NAME, y = Total_Paid)) +
geom_segment(aes(xend = VENDOR_NAME, y = 0, yend = Total_Paid), color = "gray70") +
geom_point(aes(size = Missing_Checks), color = "steelblue") +
geom_text_repel(aes(label = paste0("$", format_compact(Total_Paid))),
size = 3.5, nudge_y = 1e6) +
coord_flip() +
labs(
title = "Top 10 Vendors by Total Payment (City of Cincinnati)",
x = "Vendor",
y = "Total Paid",
size = "# Missing Check Numbers"
) +
scale_y_continuous(labels = format_compact) +
theme_minimal()
The top 2 paid vendors in the City of Cincinnati are “Community Insurance CO” and “SORTA”.
Top 10 (ordered)
The vendor with the most missing check numbers by payments is the miscellaneous vendor. These misc. vendors are those who don’t align with a specific department, business, or spending category. Because of this, it is understandable how they have the mot number of msising check numbers, as their data accuracy and accountability may be lesser than that of the other higher spending vendors.