MOCO spending Project 3

Author

Jake Abdow

Published

May 9, 2023

a. Intro

For Project 3, I decided to use the “county spending” data set from dataMontgomery the Counties program dedicated to “making county public data freely available in an open and consumable format.”(Open, 2023) While there was no ReadMe file, after researching dataMontgomery webpage I found the forum which included the process in how they create data sets. According to the Open Data Operations Manual, “Executive Branch departments, offices, and agencies collectively employ more than 600 applications, databases, and spreadsheets from which”data” may be derived.” Once applications have been collected, they are subjected “quantitative prioritization process” with the goal-scoring data sets to represent their value relative to each other resulting in a ranking system that lets dataMontomgeromy “manage the inventory pipeline and get the highest value data sets published first.”

This data set has over 27 variables, and information about the amount spent for Montgomery County government services, departments, and programs includes characters, numeric, and time data types. It is important to note that this data does not include agency spending. I started cleaning my data by removing the columns and variables I knew I wouldn’t be using for my project or were blank. This was also necessary to do the size of the data set. Moving on to ensure no missing values, I checked for NAs using the filter function, and !is.na. Next, to create my various visualizations, I grouped my data by Department, Service, Program, Fiscal.Year in various ways and using summarize(Total_Amount = sum(Amount)) to get spending totals. Additionally, I performed numerous filters to remove columns such as General Government, which represented around 60 percent of total spending and made it hard to have any good visualization. Lastly, I use Select to get the top 15 departments for my animation. I needed to adjust for scientific notions, so I used options(screen = 999) because my Total_Amounts were so large.

As a Business/Finance major, this data set was extremely intriguing, especially since it covers Montgomery County, where I live. I can see exactly where the government is allocating its funds. For example, looking at education spending, there is relevant information about Montgomery College and its funding from the county, which has a real impact on students, including myself. In addition, as a resident paying taxes, I want to ensure funds are being utilized properly.

b. Background research about this topic.

Montgomery County spending has been on the rise. According to Wtop, last year, “the Montgomery County Council approved a $6.3 billion operating budget”, an increase from the previous 5.9 billion in 2021. The budget is supposed to cover various sectors/expenses, including Education, public safety, health and human services, transportation, recreation, and debt services. The operating budget allocates “$2.9 billion for the county’s schools” and Education, which is nearly half of the budget, including a boost to mental health services and school safety programs. The Public Safety fund includes “$296 million for the county’s police department” and $251.8 million for Fire and rescue services.” Transportation spending includes funding for the “restoration of Ride On and Call-n-Ride service to pre-pandemic levels.” (this can be seen in my visualization) Most of the country’s budget is working toward the goal of continuing efforts tied to economic recovery from the pandemic.

Works Cited

Ryan, Kate. “Montgomery Co. Council Oks $6.3 Billion Preliminary Budget.” WTOP News, 20 May 2022, wtop.com/montgomery-county/2022/05/preliminary-budget-vote-in-montgomery-county-commits-6-3b-for-fy-2023/.

“Open Data Operations Manual.” dataMontgomery, www.montgomerycountymd.gov/open/Resources/Files/FY20-Open-Data-Operations-Manual-FINAL.pdf. Accessed 3 May 2023.

#install.packages("rlang", version = "1.1.0")

library(tidyverse)

Warning: package 'tidyverse' was built under R version 4.2.3

── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
✔ dplyr     1.1.0     ✔ readr     2.1.4
✔ forcats   1.0.0     ✔ stringr   1.5.0
✔ ggplot2   3.4.1     ✔ tibble    3.1.8
✔ lubridate 1.9.2     ✔ tidyr     1.3.0
✔ purrr     1.0.1     
── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
✖ dplyr::filter() masks stats::filter()
✖ dplyr::lag()    masks stats::lag()
ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(plotly)


Attaching package: 'plotly'

The following object is masked from 'package:ggplot2':

    last_plot

The following object is masked from 'package:stats':

    filter

The following object is masked from 'package:graphics':

    layout

library(treemap)
library(viridis)

Loading required package: viridisLite

library(highcharter)

Registered S3 method overwritten by 'quantmod':
  method            from
  as.zoo.data.frame zoo

library(gganimate)

Warning: package 'gganimate' was built under R version 4.2.3

library(magic)

Loading required package: abind

library(patchwork)

Warning: package 'patchwork' was built under R version 4.2.3

library(RColorBrewer)
library(networkD3)

Warning: package 'networkD3' was built under R version 4.2.3


Attaching package: 'networkD3'

The following object is masked from 'package:highcharter':

    JS

library(highcharter) 
library(rlang)

Warning: package 'rlang' was built under R version 4.2.3


Attaching package: 'rlang'

The following objects are masked from 'package:purrr':

    %@%, flatten, flatten_chr, flatten_dbl, flatten_int, flatten_lgl,
    flatten_raw, invoke, splice

Set working directory and load data

setwd("C:/Users/jakea/OneDrive/Desktop/MC 2022/DATA-110")
CS <- read.csv("County_Spending.csv")
head(CS)

  Fiscal.Year.Period Fiscal.Year                           Service
1                  9        2016 Housing and Community Development
2                  6        2021                General Government
3                  7        2017                General Government
4                  8        2017 Housing and Community Development
5                 10        2017         Health and Human Services
6                  2        2017                General Government
                                    Department
1                Housing and Community Affairs
2                                      Finance
3 Technology and Enterprise Business Solutions
4                Housing and Community Affairs
5                    Health and Human Services
6                    Non-Departmental Accounts
                            Program                          Fund
1     Multi-Family Housing Programs Montgomery Housing Initiative
2 Economic Development Fund Program                    Grant Fund
3         Chief Information Officer                  General Fund
4     Multi-Family Housing Programs Montgomery Housing Initiative
5                   Trauma Services                  General Fund
6                         Utilities                  General Fund
            Category                   Expense.Category Account.Code
1 Operating Expenses                  Public Assistance        65116
2 Operating Expenses                              Loans        65214
3 Operating Expenses Other Supplies/Materials/Equipment        62896
4 Operating Expenses                  Public Assistance        65130
5 Operating Expenses                             Travel        64010
6 Operating Expenses                          Utilities        65014
                                      Invoice.Description
1                                       Public Assistance
2 Grants-Convertible To Loans (Economic Development Fund)
3                                          Parking Meters
4               Senior Rental Assistance Program Benefits
5                                Metropolitan Area Travel
6                                             Electricity
                               Vendor Vendor.Number        Zip Contract.Number
1                          1126022265    2301153206      75240      2301153206
2    HEALTH RESEARCH AND ANALYSIS LLC         65969      20814                
3  P-CARD PURCHASES (JP MORGAN CHASE)         50766 60197-4471                
4 MCG HHS - RENTAL ASSISTANCE PROGRAM         38715      20850                
5                       Cabrera, Mona         38562                           
6                               PEPCO         34863 20068-0001                
  PO.Number PO.Line         Invoice.Number Invoice.Line
1 737461995      NA             1993278089           10
2                NA             REOPEN4474            1
3                NA          2298496846001            1
4                NA             HHS|233002           10
5                NA                1321180            4
6                NA 5502240899521000136638            1
  Invoice.Distribution.Line   Amount Invoice.Date Payment.Method Payment.Date
1                         1  1546.00   03/24/2016          CHECK   03/25/2016
2                         1  5000.00   12/23/2020            EFT   03/09/2021
3                         1     5.32   11/30/2016            EFT   01/11/2017
4                         1   200.00   02/17/2017          CHECK   02/21/2017
5                         1     1.39   04/06/2017          CHECK   04/24/2017
6                         1 10798.79   08/04/2016            EFT   08/09/2016
  Payment.Number Payment.Status
1     3232348497     RECONCILED
2         631826     RECONCILED
3         461215     RECONCILED
4        2276923     RECONCILED
5        2287101     RECONCILED
6         447544     RECONCILED

Clean CS dataset by removing columns that would not be used for my project and Filter for any NA values.

CS <- CS %>% 
  select(-Contract.Number, -PO.Number, -PO.Line, -Invoice.Line, -Invoice.Distribution.Line, -Invoice.Date, -Payment.Date) 

CS <- CS %>% 
  filter(!is.na(Fiscal.Year) & !is.na(Service) & !is.na(Department) & !is.na(Program) &  !is.na(Fund) & !is.na(Category) & !is.na(Expense.Category) & !is.na(Account.Code) & !is.na(Invoice.Description) & !is.na(Vendor) & !is.na(Vendor.Number) & !is.na(Zip) & !is.na(Invoice.Number) & !is.na(Amount) & !is.na(Payment.Method) & !is.na(Payment.Status) & !is.na(Payment.Number))

Create all the datasets nessacary for visualizations

CSTotal <- CS %>% # total spending for county each year
  group_by(Fiscal.Year) %>%
  summarize(Total_Amount = sum(Amount))
CS_department_total <- CS %>% # total spending per department
  group_by(Department) %>%
  summarize(Total_Amount = sum(Amount))
CS_service_total <- CS %>% # total spending per service
  group_by(Service) %>%
  summarize(Total_Amount = sum(Amount))
CS_department_year <- CS %>% # total spending per department each year
  group_by(Department, Fiscal.Year) %>%
  summarize(Total_Amount = sum(Amount)) %>% 
  filter(Department != "Countywide Generic") %>% # remove Countywide Generic because it is to much of an out linear
  filter(Department != "Debt Service") %>%
  filter(Department %in% c("Human Resources", "Transportation", "Non-Departmental Accounts", "General Services", "Liquor Control", "Health and Human Services", "Montgomery College", "Environmental Protection", "Finance", "Housing and Community Affairs", "Alcohol Beverage Services", "Technology and Enterprise Business Solutions", "Fire and Rescue Service", "Police", "Recreation"))

`summarise()` has grouped output by 'Department'. You can override using the
`.groups` argument.

CS_service_year <- CS %>%  # total spending per service each year
  group_by(Service, Fiscal.Year) %>%
  summarize(Total_Amount = sum(Amount)) %>% 
  filter(Service != "General Government") %>% # remove General Government because it is to much of an out linear
  filter(Service != "Debt Service") %>% # remove Debt Services because it is to much of an out linear
  filter(Service != "Not Defined") # remove Not Defined because no information

`summarise()` has grouped output by 'Service'. You can override using the
`.groups` argument.

CS_important <- CS %>%
  group_by(Department, Service, Program, Fiscal.Year) %>%
  summarize(Total_Amount = sum(Amount))

`summarise()` has grouped output by 'Department', 'Service', 'Program'. You can
override using the `.groups` argument.

CS_sankey <- CS %>% # data set need for sankey graph
  group_by(Department, Service, Program, Fiscal.Year) %>%
  summarize(Total_Amount = sum(Amount)) %>%
  filter(Service %in% c("General Government")) # only look at general government because we took out as out linear, debt services fund department debt services so not interest in looking deeper

`summarise()` has grouped output by 'Department', 'Service', 'Program'. You can
override using the `.groups` argument.

CS_sankey_CG <- CS %>% # data set need for sankey graph
  group_by(Department, Service, Program, Fiscal.Year) %>%
  summarize(Total_Amount = sum(Amount)) %>%
  filter(Department %in% c("Countywide Generic")) # only look at general government because we took out as out linear, debt services fund department debt services so not interest in looking deeper

`summarise()` has grouped output by 'Department', 'Service', 'Program'. You can
override using the `.groups` argument.

CS_GeneralG <- CS %>% 
  filter(Service %in% c("General Government")) 
CS_GeneralG_total_department <- CS_GeneralG %>%
  group_by(Department) %>%
  summarize(Total_Amount = sum(Amount))
CS_GeneralG_total_department_time <- CS_GeneralG %>%
  group_by(Fiscal.Year, Department) %>%
  summarize(Total_Amount = sum(Amount))

`summarise()` has grouped output by 'Fiscal.Year'. You can override using the
`.groups` argument.

Removes scenitific notion

options(scipen = 999)

Statiscal Analysis Correltation between Fiscal.Year and Total_Amount

# Create plot with customizations
stats <- ggplot(CSTotal, aes(x = Fiscal.Year, y = Total_Amount, color = Total_Amount)) +
  geom_point() +
  geom_smooth(method='lm',  color = "red") +
  scale_color_gradientn(colors = brewer.pal(9, "Reds")) +
  scale_y_continuous(labels = scales::dollar_format(prefix = "$")) +
  labs(title = "Correlation between Fiscal Year and Total Amount",
       x = "Fiscal Year",
       y = "Total Amount",
       caption = "Source: dataMontgomery") +
  theme_dark() 
stats

`geom_smooth()` using formula = 'y ~ x'

Looking at the correlation between Fiscal Year and Total Amount is a help analysis because it can help predict trends. In this case the trend of spending is positive and we see an obvious increase in spending overtime=. This trend is not always as linear and could be even more valuable in other scenarios. You can also view the trend of this years spending and how much we have already used for this years budget of 6.3 billion.

Facet wrap of each serivce and total spending per year

# Create the plot
ggplot(CS_service_year, aes(x = Fiscal.Year, y = Total_Amount, fill = Total_Amount)) +
  geom_col() +
  scale_fill_gradientn(colors = brewer.pal(9, "Blues")) +
  scale_y_continuous(labels = scales::label_number( big.mark = ",", prefix = "$")) +
  labs(title = "Total Spending Per Service Each Year From 2014-2023",
       x = "Fiscal Year",
       y = "Total Amount ",
       fill = "Total Amount",
       caption = "Source: dataMontgomery") +
  theme_bw() +
  facet_wrap("Service")

Here I broke down spending by Services in Montgomery County. I had to remove General Government and Debt Services because they were out linears and make the visual hard to read. CIP and Culture and Recreation were the two sectors with the least spent. While, Transportation and Health and Human Services have received the most. An interesting note is that we see during the spike of covid 19, spending for Health Service was heavily increased but has resided to normal levels

Facet wrap of each serivce and total spending per year

# Create the plot
ggplot(CS_department_year, aes(x = Fiscal.Year, y = Total_Amount, fill = Total_Amount)) +
  geom_col() +
  scale_fill_gradientn(colors = brewer.pal(9, "Blues")) +
  scale_y_continuous(labels = scales::label_number( big.mark = ",", prefix = "$")) +
  labs(title = "Total Spending for top 15 Department From 2014-2023",
       x = "Fiscal Year",
       y = "Total Amount ",
       fill = "Total Amount",
       caption = "Source: dataMontgomery") +
  theme_bw() +
  facet_wrap("Department")

In this facet_warp plot I broke down spending by the top 15 department based on total spending because there are more than two dozen. Montgomery College’s funds are shown which is relevant to us students. I noticed the Alcohol Beverage Services department has seen a dramatic decrease in spending, while, at the same time Liquor Control spending dramatic increased. In addition I saw the Human Resources department has seen a significant trend in increased spending, which may be why the county has needed to approve a bigger budget for that department

Animation of Total Spending for each Service minus General Goverment and Debt Services from 2014-2023

ggplot(CS_service_year, aes(y = Service, x = Total_Amount, fill = Total_Amount)) +
  geom_col() +
  scale_x_continuous(labels = scales::dollar_format(prefix = "$"), expand = c(0, 0)) +
  scale_fill_gradientn(colors = viridis(8)) +
  labs(title = "Total Spending by Service (Year: {frame_time})", 
       x = "Total Amount", 
       y = NULL, fill = "Total Amount") +
  theme_classic() +
  transition_time(Fiscal.Year) +
  ease_aes('linear') +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) # make text readable at on the x axis is not would be horizontal

The animation shows the same relationship between Service spending over time as my previous visualization but is no longer static.

Animation of Total Spending for top 15 Departments minus Countywide Genric and Debt Services from 2014-2023

ggplot(CS_department_year, aes(y = Department, x = Total_Amount, fill = Total_Amount)) +
  geom_col() +
  scale_x_continuous(labels = scales::dollar_format(prefix = "$"), expand = c(0, 0)) +
  scale_fill_gradientn(colors = brewer.pal(9, "Blues")) +
  labs(title = "Total Spending for top 15 Departments (Year: {frame_time})", 
       x = "Total Amount", 
       y = NULL, fill = "Total Amount") +
  theme_classic() +
  transition_time(Fiscal.Year) +
  ease_aes('linear') +
  coord_flip() +
  theme(axis.text.x = element_text(angle = 90, vjust = 0.5, hjust = 1)) # make text readable at on the x axis is not would be horizontal

Create a node data frame and Add a column to the CS_sankey data frame to indicate the ID of the source and target nodes

nodes <- data.frame(name = unique(c(CS_important$Service, CS_important$Department)))
CS_important$IDsource <- match(CS_important$Service, nodes$name) - 1
CS_important$IDtarget <- match(CS_important$Department, nodes$name) - 1

Create Sankey diagram to show the flow of cash from Services to each Department (cluttered and hard to read titles but interesting visual as you see everything)

sankeyNetwork(Links = CS_important, Nodes = nodes, Source = "IDsource", Target = "IDtarget", Value = "Total_Amount", NodeID = "name",sinksRight = FALSE, fontSize = 10)

Links is a tbl_df. Converting to a plain data frame.

Looking at the Sankey Diagram can be useful in a couple ways. First we are able to get a simple visualization that is easily understood. We see what Services feed into the different departments and if any two services fund the same department. For example General government and Public Safety both fund the State’s Attorney department.

Same plot but using plot ly instead of networkd3 to add interactivity, color, etc

nodes <- data.frame(name = unique(c(CS_important$Service, CS_important$Department)))

CS_important$IDsource <- match(CS_important$Service, nodes$name) - 1
CS_important$IDtarget <- match(CS_important$Department, nodes$name) - 1

plot_ly(type = "sankey", orientation = "h", node = list(label = nodes$name), link = list(source = CS_important$IDsource, target = CS_important$IDtarget, value = CS_important$Total_Amount)) %>%
  layout(title = "Flow of Cash from Services to Departments",
        height = 1500,  # Adjust the height of the plot
         width = 800)  # Adjust the width of the plot

Warning: Specifying width/height in layout() is now deprecated.
Please specify in ggplotly() or plot_ly()

Just looking at the flow of money from the general goverment

flows <- data.frame(
  source = CS_sankey$Service, 
  target = CS_sankey$Department, 
  value = CS_sankey$Total_Amount)

nodes <- data.frame(name = unique(c(flows$source, flows$target)))
flows$source_id <- match(flows$source, nodes$name) - 1
flows$target_id <- match(flows$target, nodes$name) - 1
plot_ly(type = "sankey", orientation = "l", node = list( label = nodes$name, pad = 15, thickness = 20,line = list(color = "black", width = 0.3)), link = list(source = flows$source_id, target = flows$target_id, value = flows$value)) %>% 
  layout(title = "Flow of Capitial from General Goverment to Departments",  font = list(size = 10), hovertemplate = ("Source: %{source.name}<br>Target: %{target.name}<br>Value: %{value}"))

Warning: 'layout' objects don't have these attributes: 'hovertemplate'
Valid attributes include:
'_deprecated', 'activeshape', 'annotations', 'autosize', 'autotypenumbers', 'calendar', 'clickmode', 'coloraxis', 'colorscale', 'colorway', 'computed', 'datarevision', 'dragmode', 'editrevision', 'editType', 'font', 'geo', 'grid', 'height', 'hidesources', 'hoverdistance', 'hoverlabel', 'hovermode', 'images', 'legend', 'mapbox', 'margin', 'meta', 'metasrc', 'modebar', 'newshape', 'paper_bgcolor', 'plot_bgcolor', 'polar', 'scene', 'selectdirection', 'selectionrevision', 'separators', 'shapes', 'showlegend', 'sliders', 'smith', 'spikedistance', 'template', 'ternary', 'title', 'transition', 'uirevision', 'uniformtext', 'updatemenus', 'width', 'xaxis', 'yaxis', 'barmode', 'bargap', 'mapType'

Next, using the Sankey Diagram to break down each service individually is extremely beneficial at see where money is allocate in different Services. Viewing all the services is cluttered and hard to read. By just viewing one service we clearly see the flow of money from General Government. Furthermore, customizing the sankey diagram to adjust the size of each connecting to be based on the total amount spent see what departments are getting the most money.

c. What the visualization represents?

My visualization represents the spending patterns of the Montgomery County government. My first few visualizations represent and show the total spending per year for Montgomery County and the services or departments. The first is a scatter plot with a geom_smooth showing the correlation between fixed year and the total spending for the county as a whole. Nothing too surprising in the visualization can just view what years are above average vs. below and also see the overall trend of spending. Next, using facet_wrap, I was able to break down my visualization to view the trend of spending for each service for the county. Noticeable General government and County Wide Generic represent 60 percent of both service and department spending. I did remove this as it made it hard to visualize any other patterns. An interesting observation was that Health and Human Services saw a spike in spending during 2020 and 2021. Obviously, this was due to Covid 19, but it is interesting to see the effects on spending. Furthermore, it seems as if transportation had an influx of capital till that same period, and funds were allocated elsewhere. Moving on, I created animations to show the same change in spending. Here you notice the volatile changes in spending as well as data like Education, which has a consistent increase each year. The following visualizations show the flow of the capital using different Sankey diagrams. Here we view what departments get the most money allocated from each service, specifically the general government service. Housing and Community Development was behind Countywide Generic as the most funded department. Something I wish I could have included in my visual was my roller coaster animation. I originally planned to use the roller coaster data set from the shared Google drive to show coasters following a track based on their max height, max length, speed and the duration of the ride. While I successfully created my own track using present coordinates and based one coaster animation on duration. I could not figure out a way to base the coordinates off each coaster as I was only given max values and had to estimate/assume to the rest. This resulted in me only have one coaster in my animation and was not partially useful.