Introduction & Project Significance

Project Background and Meaning

As a resident and student in the Maryland community, I have a deep interest in understanding how public capital allocation drives regional growth, business scaling, and job creation. Publicly funded programs—ranging from manufacturing tax credits to small business loans—form the financial spine of our state’s local economy. This project explores the data to determine whether financial allocations strictly mirror tangible outputs like job metrics, or if they are heavily influenced by geographic or temporal factors.

The Data Source and Collection Methodology

The data utilized in this analysis is the Maryland Commerce Consolidated Finance Tracker, which is directly maintained and officially published by the Maryland Department of Commerce. The department tracks comprehensive historical records of approved financial transactions, including loans, grants, and tax credits issued to various entities. The administrative records are gathered sequentially as businesses apply for and receive approvals for specific state incentive programs. No external experimental design or sampling was used; this represents the full administrative population of state-approved commerce incentives within the recorded timeframe.

Definition of Variables

To analyze this dataset, I will focus on six primary variables:

  • Approved Amount (Quantitative): The total dollar value of the financial incentive approved for the recipient. This serves as the primary dependent variable.
  • New Jobs (Quantitative): The number of projected or newly created jobs promised by the business as part of the funding agreement.
  • Retained Jobs (Quantitative): The number of pre-existing jobs that the funding explicitly helps preserve within Maryland.
  • Fiscal Year (Quantitative): The state fiscal year during which the transaction was authorized. Maryland's fiscal year runs from July 1 through June 30, so it does not coincide with the calendar year.
  • County (Categorical): The specific Maryland county where the recipient business operates.
  • Incentive Type (Categorical): The broad administrative classification of the aid (e.g., Grant, Loan, Tax Credit, or Loan/Grant combination).

Core Questions for Exploration

  1. To what extent do job creation metrics (New Jobs and Retained Jobs) and the passage of time (Fiscal Year) statistically predict the total volume of financial capital (Approved Amount) allocated to an enterprise?
  2. How do state financial allocations and preferred incentive vehicles vary across geographic boundaries like Maryland counties?

Outside Background Research

Economic research indicates that targeted state-level economic development incentives are frequently deployed to counteract localized unemployment or catalyze growth in high-value industries. According to Bartik (2019) in “Making Sense of Incentives: Taming Business Incentives to Promote Prosperity,” the efficacy of state-level business incentives depends heavily on whether funding scales proportionally with high-quality job creation, rather than serving as passive windfalls for established firms.

Furthermore, historical spatial analyses of economic funding show that regions containing robust urban centers or specialized industrial corridors (such as the I-270 technology corridor in Montgomery County or defense hubs in Anne Arundel County) often hold structural advantages in capital allocation over their rural counterparts. This project aims to test whether Maryland’s data shows a clear, statistically verifiable link between public dollars spent and job outcomes.

Loading Necessary Libraries

Before beginning data processing, we load our programmatic environment with packages tailored for advanced data manipulation, formatting, and interactive visualization.

# Loading standard tidyverse libraries for data handling
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ readr     2.1.6
## ✔ forcats   1.0.1     ✔ stringr   1.6.0
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.1
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.2
## ✔ purrr     1.2.1     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(scales)
## 
## Attaching package: 'scales'
## 
## The following object is masked from 'package:purrr':
## 
##     discard
## 
## The following object is masked from 'package:readr':
## 
##     col_factor
library(ggthemes)
## Warning: package 'ggthemes' was built under R version 4.5.3
# Loading Plotly for interactive 3D modeling
library(plotly)
## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

Data Loading and Rigorous Cleaning

We use explicit dplyr commands to coerce the numeric fields (stripping any commas left in numbers stored as text) and to select the variables of interest. Row-dropping helpers such as na.omit() and drop_na() are not used. Missing job counts are instead replaced with zero, which treats an unreported figure as no jobs promised and is an assumption rather than an observed value. The only rows removed are those whose Approved Amount is missing or not positive.

# Step 1: Load the raw CSV dataset using read_csv()
raw_data <- read_csv("Maryland_Commerce_Consolidated_Finance_Tracker_Data_20260506.csv")
## Warning: One or more parsing issues, call `problems()` on your data frame for details,
## e.g.:
##   dat <- vroom(...)
##   problems(dat)
## Rows: 7300 Columns: 22
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (12): Recipient, Program Name Level One, Program Name Level Two, Program...
## dbl  (3): Fiscal Year, New Jobs, Promised Trainees
## num  (7): Approved Amount, Loan Guarantee Amount, Total Project Costs, Retai...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
# Step 2: Clean the numerical string variables and filter variables using dplyr
cleaned_data <- raw_data %>%
  mutate(
    Approved_Amount = as.numeric(gsub(",", "", `Approved Amount`)),
    New_Jobs_Num    = as.numeric(gsub(",", "", `New Jobs`)),
    Retained_Jobs_Num = as.numeric(gsub(",", "", `Retained Jobs`)),
    Fiscal_Year_Num = as.numeric(`Fiscal Year`)
  ) %>%
  mutate(
    New_Jobs_Num = ifelse(is.na(New_Jobs_Num), 0, New_Jobs_Num),
    Retained_Jobs_Num = ifelse(is.na(Retained_Jobs_Num), 0, Retained_Jobs_Num)
  ) %>%
  filter(!is.na(Approved_Amount) & Approved_Amount > 0) %>%
  select(Fiscal_Year_Num, Recipient, Approved_Amount, New_Jobs_Num, Retained_Jobs_Num, County, `Incentive Type`) 

# Step 3: Summarize data by county to verify data properties
county_summary <- cleaned_data %>%
  group_by(County) %>%
  summarize(
    Total_Funding = sum(Approved_Amount),
    Total_New_Jobs = sum(New_Jobs_Num),
    Transaction_Count = n()
  ) %>%
  arrange(desc(Total_Funding))

# Display the top rows of the cleaned data structure
head(cleaned_data)
## # A tibble: 6 × 7
##   Fiscal_Year_Num Recipient       Approved_Amount New_Jobs_Num Retained_Jobs_Num
##             <dbl> <chr>                     <dbl>        <dbl>             <dbl>
## 1            2017 Northrop Grumm…         7500000            0             10000
## 2            2019 Northrop Grumm…         7500000            0             10000
## 3            2020 Northrop Grumm…         7500000            0             10000
## 4            2021 Northrop Grumm…         7500000            0             10000
## 5            2021 Northrop Grumm…         7500000            0             10000
## 6            2016 20/20 Genesyst…          976707            0                15
## # ℹ 2 more variables: County <chr>, `Incentive Type` <chr>
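
Because missing job counts are converted to zero, it can help to record how many values were imputed and how many rows the Approved_Amount filter removed. The following is a minimal sketch on a small made-up data frame. The `toy` object and all of its values are placeholders rather than records from the tracker, and the column names only mirror the ones used above.

```r
# Toy stand-in for raw_data; values are invented for illustration only
toy <- data.frame(
  Approved_Amount = c(50000, NA, 0, 120000, 75000),
  New_Jobs        = c(10, NA, 5, NA, 0),
  Retained_Jobs   = c(NA, 20, 0, 15, NA)
)

# Count missing values per column before any imputation
na_counts <- colSums(is.na(toy))

# Rows that survive the same filter used in the cleaning step
keep <- !is.na(toy$Approved_Amount) & toy$Approved_Amount > 0
rows_dropped <- sum(!keep)

# Among the kept rows, how many job counts would be set to zero
imputed_new      <- sum(is.na(toy$New_Jobs[keep]))
imputed_retained <- sum(is.na(toy$Retained_Jobs[keep]))

print(na_counts)
cat("Rows dropped by the amount filter:", rows_dropped, "\n")
cat("New Jobs values imputed as 0:", imputed_new, "\n")
cat("Retained Jobs values imputed as 0:", imputed_retained, "\n")
```

In the run shown in this report, read_csv() reported 7,300 rows, while the regression summary later in the report lists 7,222 residual degrees of freedom for 4 coefficients, which is 7,226 observations. That leaves 74 rows excluded, either by the amount filter or by lm()'s default removal of rows with missing predictors.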

Statistical Modeling: Multiple Linear Regression

To explore our first research question, we construct a Multiple Linear Regression model predicting Approved_Amount based on New_Jobs_Num, Retained_Jobs_Num, and Fiscal_Year_Num.

Model Construction

# Fitting the multiple linear regression model
finance_model <- lm(Approved_Amount ~ New_Jobs_Num + Retained_Jobs_Num + Fiscal_Year_Num, data = cleaned_data)

# Extracting summary statistics
model_summary <- summary(finance_model)
print(model_summary)
## 
## Call:
## lm(formula = Approved_Amount ~ New_Jobs_Num + Retained_Jobs_Num + 
##     Fiscal_Year_Num, data = cleaned_data)
## 
## Residuals:
##      Min       1Q   Median       3Q      Max 
## -9033944  -104632   -92561   -52118 22886662 
## 
## Coefficients:
##                     Estimate Std. Error t value Pr(>|t|)    
## (Intercept)       -3.333e+06  5.280e+06  -0.631    0.528    
## New_Jobs_Num       4.127e+03  2.192e+02  18.834   <2e-16 ***
## Retained_Jobs_Num  6.441e+02  1.973e+01  32.640   <2e-16 ***
## Fiscal_Year_Num    1.703e+03  2.613e+03   0.652    0.514    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 616800 on 7222 degrees of freedom
## Multiple R-squared:  0.1691, Adjusted R-squared:  0.1687 
## F-statistic: 489.8 on 3 and 7222 DF,  p-value: < 2.2e-16

The Statistical Equation

Based on our calculated empirical coefficients, the formal multiple linear regression equation for this model is:

\[\widehat{\text{Approved Amount}} = -3{,}333{,}000 + (4{,}127 \times \text{New Jobs}) + (644.10 \times \text{Retained Jobs}) + (1{,}703 \times \text{Fiscal Year})\]
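
To make the equation concrete, the sketch below plugs a made-up applicant into it. The helper `predict_award()` and the applicant's figures are hypothetical. The coefficients are the rounded values printed in the regression summary, so the result is only approximate.

```r
# Rounded coefficients copied from the regression summary above
coefs <- c(intercept = -3333000, new_jobs = 4127,
           retained_jobs = 644.10, fiscal_year = 1703)

# Hypothetical helper: predicted award for one applicant
predict_award <- function(new_jobs, retained_jobs, fiscal_year) {
  unname(coefs["intercept"] +
    coefs["new_jobs"] * new_jobs +
    coefs["retained_jobs"] * retained_jobs +
    coefs["fiscal_year"] * fiscal_year)
}

# Made-up applicant: 50 new jobs, 100 retained jobs, fiscal year 2020
example_prediction <- predict_award(50, 100, 2020)
print(example_prediction)  # about 377,820 with these rounded coefficients
```

The intercept and the Fiscal Year coefficient are each rounded to four significant figures and the year is a large number, so hand calculations like this one can differ from the fitted model by roughly a thousand dollars. For exact values, predict(finance_model, newdata = ...) uses the unrounded coefficients.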

Model Diagnostic Plots

To test our regression assumptions, we generate standard diagnostic visualizations.

# Arrange the four diagnostic plots in a 2 x 2 grid
par(mfrow = c(2, 2))
plot(finance_model, col = "#2c3e50")

par(mfrow = c(1, 1)) # Reset grid

Statistical Evaluation and Narrative Analysis

  • P-Values and Significance: Both New_Jobs_Num and Retained_Jobs_Num have p-values below \(2 \times 10^{-16}\), marked by the *** codes, so both are statistically significant at any conventional level. Holding the other variables constant, each additional New Job a company promises is associated with an increase of approximately $4,127 in the expected approved amount, and each Retained Job with roughly $644.10. By contrast, the temporal predictor Fiscal_Year_Num has a p-value of 0.514, so the model finds no evidence of a linear trend in award size over time. This is an absence of evidence, not proof that time has no effect.
  • Adjusted \(R^2\) Value: The Adjusted \(R^2\) for this model is 0.1687, meaning that approximately 16.87% of the variation in approved amounts is explained by the job metrics and fiscal year together. The model as a whole is highly significant (F-statistic \(p < 2.2 \times 10^{-16}\)), but the remaining 83.13% of the variance is left unexplained. Plausible contributors include unmodeled factors such as industry classification (NAICS codes), physical infrastructure requirements, and county-level policy changes, along with ordinary random variation.
  • Diagnostics: The diagnostic plots show a small number of very large awards (massive individual industrial investments) that act as high-leverage outliers. They account for the heavy right tail in the residuals visible in the Normal Q-Q and Scale-Location plots, an expected pattern in administrative funding data that includes large corporate allocations. Because the residuals are clearly non-normal and unevenly spread, the reported standard errors and p-values should be read as approximate.
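
One way to move from eyeballing the diagnostic plots to naming the influential records is Cook's distance. The sketch below is an illustration on synthetic data, not the tracker. The 4 / n cutoff is a common rule of thumb assumed here for convenience, not a formal test.

```r
# Synthetic data: 20 ordinary points on a noisy line plus one extreme point
x <- c(1:20, 100)
y <- c(2 * (1:20) + sin(1:20), 1000)
toy_fit <- lm(y ~ x)

# Cook's distance measures how much each observation shifts the fit
cooks <- cooks.distance(toy_fit)

# Rule of thumb (an assumption, not a formal test): flag D > 4 / n
threshold <- 4 / length(cooks)
flagged <- which(cooks > threshold)

# Show the flagged observations and their distances
print(round(cooks[flagged], 2))
```

Applied to finance_model, the same two calls would list the awards that most strongly pull the fitted coefficients, which could then be inspected individually or compared against a refit without them.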

Advanced Data Visualizations

Visualization 1: Interactive 3D Scatter Plot (Something New)

We implement an interactive 3D scatter plot via Plotly. This visual enables users to rotate, hover, and zoom to explore the multi-dimensional relationships between job metrics and funding simultaneously.

# Creating the custom interactive 3D visualization
interactive_3d <- plot_ly(
  data = cleaned_data,
  x = ~New_Jobs_Num,
  y = ~Retained_Jobs_Num,
  z = ~Approved_Amount,
  type = 'scatter3d',
  mode = 'markers',
  marker = list(
    size = 5,
    color = ~Approved_Amount, 
    colorscale = 'Viridis', # Utilizing non-default design palette
    opacity = 0.8
  ),
  text = ~paste("Recipient:", Recipient, "<br>County:", County)
) %>%
  layout(
    title = list(text = "<b>Interactive Multi-Dimensional Job & Funding Space</b>", y = 0.95),
    scene = list(
      xaxis = list(title = "New Jobs Created"),
      yaxis = list(title = "Retained Jobs"),
      zaxis = list(title = "Approved Amount ($)")
    ),
    # layout() has no `caption` attribute, so the source note is added as an annotation
    annotations = list(
      list(
        text = "Source: Maryland Department of Commerce",
        xref = "paper", yref = "paper",
        x = 0, y = 0,
        showarrow = FALSE
      )
    )
  )

# Render the interactive plot
interactive_3d

Narrative Description of Visualization 1

This interactive 3D scatter plot places each award by its New Jobs, Retained Jobs, and Approved Amount, so awards can be compared with one another along all three dimensions at once. The color scale maps directly to the magnitude of the award using the non-default ‘Viridis’ color scheme. A clear pattern emerges: the vast majority of awards cluster near the origin, with modest job counts and modest dollar amounts, while rare mega-projects (such as major aerospace manufacturers) stretch far out along the axes as visible outliers, capturing millions in state funds.

Visualization 2: Geographic Capital Allocations (Tableau Public Integration)

As requested, a comprehensive map tracking the spatial distribution of these economic allocations across Maryland was built in Tableau Public.

The interactive dashboard is accessible via the following direct link: https://public.tableau.com/views/MARYLANDMAP/Sheet1?:language=en-US&publish=yes&:sid=&:redirect=auth&:display_count=n&:origin=viz_share_link

Narrative Description of Visualization 2

The Tableau Public visualization maps out the geographical concentration of economic development spending across Maryland’s counties.

The dashboard reveals a distinct concentration of funding along the urban and technology corridors. Counties like Montgomery and Anne Arundel capture a substantial share of both total capital outlays and promised jobs. In contrast, rural regions on the Eastern Shore and Western Maryland display lower frequencies of commerce transactions, indicating that state aid allocation often follows pre-existing corporate clusters.
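
The geographic pattern can also be tabulated directly in R. The sketch below computes each county's share of approved dollars and the mix of incentive types within each county. The `toy` data frame, its county labels, and its amounts are placeholders. In cleaned_data the type column is named `Incentive Type`, so it would need backticks inside the formula.

```r
# Toy stand-in for cleaned_data; counties and amounts are placeholders
toy <- data.frame(
  County = c("County A", "County A", "County B", "County B", "County C"),
  Incentive_Type = c("Grant", "Loan", "Grant", "Tax Credit", "Loan"),
  Approved_Amount = c(400, 100, 200, 200, 100)
)

# Total approved dollars by county and incentive type
funding_table <- xtabs(Approved_Amount ~ County + Incentive_Type, data = toy)

# Each county's share of all approved dollars
county_share <- rowSums(funding_table) / sum(funding_table)

# Within each county, the mix of incentive types (each row sums to 1)
type_mix <- prop.table(funding_table, margin = 1)

print(round(county_share, 2))
print(round(type_mix, 2))
```

Sorting county_share in decreasing order gives a quick numeric check on the concentration visible in the map.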

Concluding Project Essay

Summary of Patterns, Discoveries, and Surprises

The empirical exploration of the Maryland Commerce Tracker dataset yielded several insightful findings. The primary discovery is that while job creation metrics (New Jobs and Retained Jobs) are statistically significant predictors of funding scale, they account for only about 16.87% of the variation in approved amounts.

The biggest surprise was the scale of corporate outliers. A small number of very large transactions appear to account for a disproportionate share of the state’s total approved dollars. This was clearly exposed by the residual diagnostic plots and the interactive 3D Plotly visual, where a handful of records sit far above the main cluster of points. Geographically, capital deployment heavily favors counties with established aerospace, technology, or industrial sectors.
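
The degree of concentration can be put into a single number: the share of total dollars held by the largest awards. The sketch below uses a hypothetical helper, `top_share()`, and made-up amounts. It is an illustration of the calculation, not a result from the tracker.

```r
# Hypothetical helper: share of total dollars held by the largest awards
top_share <- function(amounts, top_fraction = 0.01) {
  amounts <- amounts[!is.na(amounts)]
  # Number of awards in the top group, at least one
  n_top <- max(1, round(length(amounts) * top_fraction))
  largest <- sort(amounts, decreasing = TRUE)[seq_len(n_top)]
  sum(largest) / sum(amounts)
}

# Made-up example: 99 awards of 1 and a single award of 901
toy_amounts <- c(rep(1, 99), 901)
toy_result <- top_share(toy_amounts, top_fraction = 0.01)
print(toy_result)  # 0.901
```

Calling top_share(cleaned_data$Approved_Amount, 0.01) would report the share of approved dollars going to the top 1% of awards in the actual data.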

Reflections on Scope and Technical Constraints

If given additional analytical scope, I would have integrated localized demographic census files matching each county record to see if funding correlates with local poverty indexes or population density.
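
A minimal sketch of how such a county-level join might look is given below. Both tables are invented for illustration: the `census` data frame, its column names (`Poverty_Rate`, `Population`), and every value are assumptions, not real census figures.

```r
# County totals in the shape of county_summary; values are placeholders
county_totals <- data.frame(
  County = c("County A", "County B", "County C"),
  Total_Funding = c(500, 400, 100)
)

# Hypothetical census extract; column names and values are invented
census <- data.frame(
  County = c("County A", "County B", "County D"),
  Poverty_Rate = c(0.08, 0.12, 0.15),
  Population = c(1000, 500, 200)
)

# Left join keeps every funded county, even without a census match
joined <- merge(county_totals, census, by = "County", all.x = TRUE)

# Funding per resident; NA where no census row matched
joined$Funding_Per_Capita <- joined$Total_Funding / joined$Population

print(joined)
```

Counties without a match keep NA in the census columns, which makes mismatched county spellings easy to spot before any correlation or regression is run on the joined table.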

References and Bibliography