Cincinnati Vendor Payments Visualization Project

For this project I have been tasked with cleaning, wrangling, and ultimately visualizing data related to Cincinnati vendor payments. Within this project, we take a closer look at analyzing both the nature of vendors in Cincinnati, as well as their differences.

Columns & Variables

Number of Rows:

1.1M

Number of Columns:

14

Missing Values:

58 (Check_No) check numbers

Dataset Description:

The data set is rather large, containing nearly 1.1M rows. The rows represent recorded payments made to vendors by the City of Cincinnati from 2014-2025.

##  [1] "FISCAL_YEAR"       "ACCT_PERIOD"       "DEPT_CODE"        
##  [4] "DEPT_DESC"         "FUND_CODE"         "FUND_DESC"        
##  [7] "EXP_ACCT_CAT"      "EXP_ACCT_CAT_DESC" "TRANS_ID"         
## [10] "TRANS_LINE_NO"     "RECORD_DATE"       "CHECK_NO"         
## [13] "AMOUNT"            "VENDOR_NAME"

Variable Descriptions

Below is a description of each variable in the VendorPayments dataset:

  • FISCAL_YEAR: The fiscal year in which the transaction occurred (numeric).
  • ACCT_PERIOD: Accounting period (e.g., month) of the fiscal year (numeric).
  • DEPT_CODE: A short code identifying the department responsible for the transaction (text).
  • DEPT_DESC: Full description or name of the department (text).
  • FUND_CODE: Code representing the funding source for the transaction (text).
  • FUND_DESC: Full description of the fund associated with the transaction (text).
  • EXP_ACCT_CAT: Categorical code for the type of expense (text).
  • EXP_ACCT_CAT_DESC: Description of the expense category (text).
  • TRANS_ID: Unique identifier for the transaction (text).
  • TRANS_LINE_NO: Line number within the transaction (numeric).
  • RECORD_DATE: The date the transaction was recorded (floating timestamp).
  • CHECK_NO: Check number used for payment, if applicable (text, contains missing values).
  • AMOUNT: Dollar amount of the transaction (numeric).
  • VENDOR_NAME: Name of the vendor who received the payment (text).

Data Analysis

Question 1: Department Spending Over Time

Question:
How has vendor spending changed across departments over time, and which departments consistently spend the most?

Code:

top_depts <- VendorPayments %>%
  group_by(DEPT_DESC) %>%
  summarise(Total_Spend = sum(AMOUNT, na.rm = TRUE)) %>%
  slice_max(Total_Spend, n = 5) %>%
  pull(DEPT_DESC)

VendorPayments %>%
  filter(DEPT_DESC %in% top_depts) %>%
  group_by(FISCAL_YEAR, DEPT_DESC) %>%
  summarise(Annual_Spend = sum(AMOUNT, na.rm = TRUE), .groups = "drop") %>%
  ggplot(aes(x = FISCAL_YEAR, y = Annual_Spend, color = DEPT_DESC)) +
  geom_line(size = 1.2) +
  scale_y_continuous(labels = label_dollar()) +
  theme_economist() +
  labs(
    title = "Top 5 Departments by Vendor Spending Over Time",
    x = "Fiscal Year",
    y = "Total Spend",
    color = "Department"
  ) +
  theme(
    legend.position = "bottom",
    legend.title = element_text(size = 10),
    legend.text = element_text(size = 8),
    legend.box = "horizontal"
  ) +
  guides(color = guide_legend(nrow = 2, byrow = TRUE))

Interpretation:

The highest spending departments are the Department of Sewers and Finance, Risk Management. The Department Of Water Works, however, has been steadily climbing since 2016. From this chart, we can see that Finance & Risk Management has the most stable spending. Spending for The Southwest Ohio Regional Transit Authority (SORTA) was at a steady climb from 2016 through 2019, however in the three years following, their spending decreased drastically. The most likely reason for this decreased spending is budget cuts and reduced operations. In fact, this interpretation aligns with the budget cuts of Cincinnati’s transporation system during those years. In 2019, SORTA had to dip into reserves to fund their system, due to a severe budget deficit. In 2020, a new funding structure was introduced for Cincinnati’s Metro System.

Question 3: Top 10 Vendors & Their Missing Checks

Question: Who are Cincinnati’s top 10 vendors by total payment, and are they missing check numbers?

Code:

format_compact <- function(x) {
  ifelse(x >= 1e9, paste0(round(x / 1e9, 1), "B"),
         ifelse(x >= 1e6, paste0(round(x / 1e6, 1), "M"),
                ifelse(x >= 1e3, paste0(round(x / 1e3, 1), "K"), x)))
}

VendorPayments %>%
  group_by(VENDOR_NAME) %>%
  summarise(
    Total_Paid = sum(AMOUNT, na.rm = TRUE),
    Missing_Checks = sum(is.na(CHECK_NO)),
    .groups = "drop"
  ) %>%
  slice_max(Total_Paid, n = 10) %>%
  mutate(VENDOR_NAME = reorder(VENDOR_NAME, Total_Paid)) %>%
  ggplot(aes(x = VENDOR_NAME, y = Total_Paid)) +
  geom_segment(aes(xend = VENDOR_NAME, y = 0, yend = Total_Paid), color = "gray70") +
  geom_point(aes(size = Missing_Checks), color = "steelblue") +
  geom_text_repel(aes(label = paste0("$", format_compact(Total_Paid))),
                  size = 3.5, nudge_y = 1e6) +
  coord_flip() +
  labs(
    title = "Top 10 Vendors by Total Payment (City of Cincinnati)",
    x = "Vendor",
    y = "Total Paid",
    size = "# Missing Check Numbers"
  ) +
  scale_y_continuous(labels = format_compact) +
  theme_minimal()

Interpretation:

The top 2 paid vendors in the City of Cincinnati are “Community Insurance CO” and “SORTA”.

Top 10 (ordered)

  1. Community Insurance CO
  2. SORTA
  3. PRUS Construction
  4. BOE (Board of Ed)
  5. ADELTA, Inc.
  6. Duke Energy
  7. Misc
  8. Rick & Ballauer Excavating Co., Inc.
  9. Building Crafts, Inc.
  10. Hamilton County

The vendor with the most missing check numbers by payments is the miscellaneous vendor. These misc. vendors are those who don’t align with a specific department, business, or spending category. Because of this, it is understandable how they have the mot number of msising check numbers, as their data accuracy and accountability may be lesser than that of the other higher spending vendors.