Introduction

" Montgomery is a city in Hamilton County, Ohio, United States, settled in 1796. The town was a coach stop on the Cincinnati-Zanesville Road, later known as the Montgomery Pike, with an inn, two taverns, a grist mill and a carding mill to process its agricultural products.

The colorful landscape, tree-lined streets and miles of brick paver sidewalks in Montgomery lend themselves to the charming “Americana” feel of this well-preserved, historic community. It is home to wonderfully quaint old buildings that are a reminder of a quieter, slower time in the community. The treasured Montgomery Heritage District, located in and around Montgomery Road, is a diversified and unique destination for shopping, dining, and entertainment with storefront shops, nationally-recognized and fine ethnic restaurants, galleries, services and performance venues. " - “http://www.montgomeryohio.org/

This project is to answer a few questions basis the spend data of the government. The dataset includes County spending data for Montgomery County government.

The idea is to analyse the data, see the trends in the spend over the years, the mode of payemnts being used the most, the departments which spend the most, ROC and CV of the transactions made by all departments and other interesting facts about the governemnet spend data.

Synopsis

After spending a lot of time understanding, validating and cleaning the data; data is analysed to answer different questions on the spending of all the departments of the government of Montgomery county.

The data used is the County spending data for Montgomery County government. It does not include agency spending. The data is cleaned to pick the columns that are required to make it meaningful and relevant for the analysis as all the data fields are not required for the answers that the analysis demand.

Packages used

Packages loaded

library(readr)
library(dplyr)
library(tibble)
library(ggplot2)
library(knitr)
library(DT)
library(scales)

Purpose

readr - The goal of readr is to provide a fast and friendly way to read tabular data into R.

dplyr- It is used for data manipulation. It provides some great, easy-to-use functions that are very handy when performing exploratory data analysis and manipulation.

tibble - Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (i.e. converting character vectors to factors).

ggplot 2 - This packageoffers a powerful graphics language for creating elegant and complex plots. Grouping can be represented by color, symbol, size, and transparency. It is a wonderfulfor tool for visualizing data.

knitr - Used to format the table using kable function

DT - It creates an HTML widget to display R data objects with DataTables.

scales - Used to scale axis in the plots

Data Preparation

Data Import

url <- "http://data.montgomerycountymd.gov/api/views/vpf9-6irq/rows.csv?accessType=DOWNLOAD"
county_spending <- read.csv(url,stringsAsFactors = TRUE)
county_spending <- county_spending %>% as_tibble()

Data Overview

Column Name Description Data Type
Fiscal Year Period Our fiscal year runs July 1st through June 30th. A period is equal to one month (Example: period 1 is July, period 2 is August, period 3 is September, etc.) Number
Fiscal Year The County’s fiscal year (such as 2015) represents a 12 month period used for annual financial reporting beginning July 1st through June 30th Number
Service High Organization level grouping, such as Public Safety and Community Development Text
Department The name of the Department Text
Program Subdivision of Departments made up of a primary service, function, or set of activities which address a specific responsibility or goal within an agency’s mission. (Example: Operations, Budget, Aquatics) Text
Fund Accounting device that the County uses to keep track of specific sources of funding and spending for particular purposes (Example: General Fund, Grant Fund; CIP Fund) Text
Category Represents how data is presented on the County’s financial statements (expense, revenue, liability, asset, owner’s equity or fund balance) Text
Expense Category The highest level grouping of account codes that the County has spent funds on. For example, the “Travel” expense Text
Account Code A unique account identification number (related to Account Name field) Text
Invoice Description Lowest level of description regarding how the County expends monies (Example: local travel) Text
Vendor Supplier name Text
Vendor Number A unique supplier identification number Text
Zip The zip code for the supplier’s address to which the supplier’s payment is sent Text
Contract Number A unique number used to identify a County contract Text
PO Number A unique number used to identify a purchase order issued to a vendor Text
PO Line The line number for a specific item or service that is being purchased on the Purchase Order (PO). (For example, if five different items are purchased, then there will be five PO lines.) Text
Invoice Number A number assigned to an invoice, used to uniquely identify a supplier’s invoice Text
Invoice Line A line number from a supplier invoice, for the County’s purchase of a good or a service Text
Invoice Distribution Line The line number indicating which Department(s) or Program(s) the invoice is charged to. (For example, if five different departments are being charged for an item, then there will be five distribution lines) Text
Amount The dollar amount associated with an invoice distribution line Number
Invoice Date The Invoice date is the date listed in ERP as the invoice date and may contain a null value Date & Time
Payment Method Type of payment, such as check, EFT, credit card (MCG JPM SUA) or wire transactions Text
Payment Date The date the check was written or the payment was made Date & Time
Payment Number The identifying number on the paper check, electronic payment, or credit card transactions Number
Payment Status Status of a payment through the banking system, such as Negotiable, Void, or Reconciled Text

Source

Source

The data is taken from https://www.data.gov/

Click to Download

Data Insight

The data is last updated on October 25, 2016 and is updated on a quarterly basis. It has 25 rows and 837606 rows.

The data is owned by MCG ESB Service and updated by Department of Finance.

Data Pecularity

The data has 588654 blank values.

Finding blank values in each column
county_spending[county_spending == ''] <- NA
na_count <-sapply(county_spending, function(y) sum(length(which(is.na(y)))))
na_count <- data.frame(na_count)
kable(na_count)
na_count
Fiscal.Year.Period 0
Fiscal.Year 0
Service 0
Department 0
Program 0
Fund 0
Category 0
Expense.Category 0
Account.Code 0
Invoice.Description 0
Vendor 0
Vendor.Number 0
Zip 82157
Contract.Number 708046
PO.Number 482044
PO.Line 588654
Invoice.Number 0
Invoice.Line 0
Invoice.Distribution.Line 0
Amount 0
Invoice.Date 1609
Payment.Method 0
Payment.Date 0
Payment.Number 0
Payment.Status 0
Looking at datatype of the data

While importing the data the data was converted to factor values by keeping stringasfactor option TRUE so it needs to be checked if all the data fields are in the right format.

kable(str(county_spending))
## Classes 'tbl_df', 'tbl' and 'data.frame':    837606 obs. of  25 variables:
##  $ Fiscal.Year.Period       : int  6 11 8 12 6 6 10 7 6 5 ...
##  $ Fiscal.Year              : int  2014 2014 2015 2014 2016 2016 2015 2014 2014 2014 ...
##  $ Service                  : Factor w/ 11 levels "CIP","Culture and Recreation",..: 6 6 6 6 10 10 6 2 6 6 ...
##  $ Department               : Factor w/ 47 levels "Board of Appeals",..: 23 23 28 23 18 18 28 45 39 28 ...
##  $ Program                  : Factor w/ 314 levels "24-Hours Crisis Center",..: 126 126 300 126 115 268 300 71 227 300 ...
##  $ Fund                     : Factor w/ 33 levels "Bethesda Parking",..: 9 9 15 9 11 11 15 3 12 15 ...
##  $ Category                 : Factor w/ 2 levels "Not Defined",..: 2 2 1 2 2 2 1 2 2 1 ...
##  $ Expense.Category         : Factor w/ 58 levels "Advertising",..: 22 22 25 50 36 40 25 9 9 25 ...
##  $ Account.Code             : int  65624 65628 66010 64100 62016 63634 66010 60537 60076 66010 ...
##  $ Invoice.Description      : Factor w/ 681 levels "800 Line Charges",..: 4 461 56 325 130 416 56 144 515 56 ...
##  $ Vendor                   : Factor w/ 16627 levels "1000822890","1001219301",..: 8529 15662 5128 4136 4889 8304 10298 7890 540 5128 ...
##  $ Vendor.Number            : num  10759 54288 5645 40341 34954 ...
##  $ Zip                      : Factor w/ 3331 levels "","00000","01040",..: 1337 120 44 1191 1052 1337 1753 1242 25 44 ...
##  $ Contract.Number          : Factor w/ 3821 levels "","0363200064AA",..: NA NA NA NA NA NA NA NA 3368 NA ...
##  $ PO.Number                : Factor w/ 54690 levels "","1000015","1000052",..: 8759 11719 NA NA 13102 NA NA 9272 54525 NA ...
##  $ PO.Line                  : num  1 3 NA NA 1 NA NA 1 NA NA ...
##  $ Invoice.Number           : Factor w/ 656765 levels "_0000025034_010114-123114",..: 81365 200522 518897 481074 653925 559184 560850 566535 389722 518030 ...
##  $ Invoice.Line             : num  1 3 1 2 1 1 3 1 1 1 ...
##  $ Invoice.Distribution.Line: num  1 1 1 1 1 1 1 1 1 1 ...
##  $ Amount                   : num  1576281 25616 27394 195 148 ...
##  $ Invoice.Date             : Factor w/ 1463 levels "","01/01/2013",..: 1403 552 158 659 1376 1402 348 51 1306 1268 ...
##  $ Payment.Method           : Factor w/ 4 levels "CHECK","EFT",..: 2 2 2 1 3 1 1 1 1 2 ...
##  $ Payment.Date             : Factor w/ 831 levels "01/01/2014","01/02/2014",..: 809 292 96 344 13 812 216 48 771 741 ...
##  $ Payment.Number           : num  368883 380202 403394 2053265 6009472 ...
##  $ Payment.Status           : Factor w/ 2 levels "NEGOTIABLE","RECONCILED": 2 2 2 2 1 2 2 2 2 2 ...

We notice that we need to change the format of date type of columns.

Data Cleaning

Following steps are taken to clean the data:

  • The columns which are not relevant are disposed.
  • Changed data types
    • Column - Invoice.Date is changed from Factor to Date Type.
    • Column - Fiscal_Year is changed from Integer to Factor Type.

Data Summary

Data Description of data fields chosen

Column Name Description Data Type
Fiscal Year Period Our fiscal year runs July 1st through June 30th. A period is equal to one month (Example: period 1 is July, period 2 is August, period 3 is September, etc.) Number
Fiscal Year The County’s fiscal year (such as 2015) represents a 12 month period used for annual financial reporting beginning July 1st through June 30th Number
Service High Organization level grouping, such as Public Safety and Community Development Text
Department The name of the Department Text
Program Subdivision of Departments made up of a primary service, function, or set of activities which address a specific responsibility or goal within an agency’s mission. (Example: Operations, Budget, Aquatics) Text
Fund Accounting device that the County uses to keep track of specific sources of funding and spending for particular purposes (Example: General Fund, Grant Fund; CIP Fund) Text
Expense Category The highest level grouping of account codes that the County has spent funds on. For example, the “Travel” expense Text
Amount The dollar amount associated with an invoice distribution line Number
Invoice Date The Invoice date is the date listed in ERP as the invoice date and may contain a null value Date & Time
Payment Method Type of payment, such as check, EFT, credit card (MCG JPM SUA) or wire transactions Text

Finding more about the data

The maximum spend in the data $ 2.6110^{8} in a single transaction. The minimum spend in the data $ -6.235908110^{5} in a single transaction. The Average spend per transaction is 1.480987110^{4}.

Exploratory Data Analysis

Spend Analysis(CV)

In this section, the analysis on the CV(Charge Volume) is done to answer various questions and to provide insights through the observations.

The departments spending over the years

Insight

Department of Countrywide Generic and Debt Services are > the ones which have the highest spend over all the years.

What do these departments spned on?

Countywide Generic

Debt Service

The top 10 departments spending the most in 2016

Now we look at the top 10 departments which have the hight spend in the latest Fiscal Year.

The top 10 departments spending the most over the years

The data for the Fiscal Year 2017 is excluded as it is the rolling year and the data in this year is incomplete.

The spending trend of these departments over the years

Insight

The spedning trends of all the Departments have been almost uniform and according to the budget allocated. There are no sudden drop or rise for almost all the top spending departments over the years we have tested the data for.

The top 5 expenses of Montgomery Government

Insight

The major spend is made on the contract and services followed by Insurance , Debt service and Liquor Purchases. The trend is similar over the years and thus we can conclude that the spending of the Montgomary government is budgeted, well planned and fixed.

Transaction Analysis(ROC)

This section, aims for thehe analysis on the ROC(number of transactions) made by the departments and to see the if they follow the similar treand as the CV.

Departments making the highest number of transactionsover the years

ROC of these departments over the years

Insight

The top 10 departments making the maximum transactions differ from the ones with the maximum spend. The trend in the number of transactions over the years also almost constant for almost all departments except Housing and Community Affairs where the number of transactions are increasing over the years.

Payment Analysis

We now compare all the methods of payments used in this section.

Insight

The check is the most widely used mode of payment in the year 2014 and 2015 whereas 2016 and 2017 show a change of trend with Electronic Fund Transfer being used more than checks

Summary

The data very clearly shows the smooth working of Montgomery County Government from the year 2014 to till date. The CV and ROC trends are constant over the years and the same departments carry the highest rank in spending and transacting from the past 4 years which very clearly indicates the on budget and planned nature of the governmnet expenses.

The methodology was simply to compare the highest spending and transacting departments over the years and check the uniformity in the data. The trends of the top 10 departments ( ROC and CV) were constant and thus gave conformity on the smooth functioning of the government.