This project analyzes service efficiency at Maryland Motor Vehicle Administration (MVA) branches by examining how customer volume and wait times vary across locations and over time. The dataset includes information on customers served and average wait times for multiple MVA branches across fiscal years FY23, FY24, and FY25.
The dataset contains both categorical and quantitative variables. Categorical variables include branch location and time period (month), while quantitative variables include the number of customers served and average wait times.
The data was collected and published by the Maryland Motor Vehicle Administration (MVA) as part of its public service performance reporting. The dataset represents aggregated operational statistics collected from branch-level service records. However, limited documentation is available describing the exact data collection methodology.
The goal of this project is to answer the following research question:
How do customer volume and monthly trends influence wait times at MVA branches?
I chose this dataset because it reflects real-world public service operations and provides insight into how demand and efficiency interact in government service systems. Understanding these patterns can help identify whether increased customer demand is associated with longer wait times or whether some months show improved operational efficiency despite higher traffic.
library(readr)
library(ggplot2)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(tidyr)
library(highcharter)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
df <- read_csv("MVA_Customers_Served_&_Wait_Time_by_Branch_20260415.csv")
## Rows: 25 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Branch
## dbl (9): FY23 Wait Time, FY24 Wait Time, FY25 Wait Time, July 2025 Wait Time...
## num (9): FY23 Customers Served, FY24 Customers Served, FY25 Customers Served...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
In this step, I clean the dataset to keep only the variables needed for analysis. I select the branch name, monthly customer counts, and monthly wait times. I also convert all numeric variables to proper numeric format and remove missing values to ensure accurate calculations.
mva_clean <- df %>%
select(
Branch,
`July 2025 Customers Served`,
`August 2025 Customers Served`,
`September 2025 Customers Served`,
`October 2025 Customers Served`,
`November 2025 Customers Served`,
`December 2025 Customers Served`,
`July 2025 Wait Time`,
`August 2025 Wait Time`,
`September 2025 Wait Time`,
`October 2025 Wait Time`,
`November 2025 Wait Time`,
`December 2025 Wait Time`
) %>%
mutate(across(-Branch, as.numeric)) %>%
filter(
!is.na(`July 2025 Customers Served`) &
!is.na(`August 2025 Customers Served`) &
!is.na(`September 2025 Customers Served`) &
!is.na(`October 2025 Customers Served`) &
!is.na(`November 2025 Customers Served`) &
!is.na(`December 2025 Customers Served`) &
!is.na(`July 2025 Wait Time`) &
!is.na(`August 2025 Wait Time`) &
!is.na(`September 2025 Wait Time`) &
!is.na(`October 2025 Wait Time`) &
!is.na(`November 2025 Wait Time`) &
!is.na(`December 2025 Wait Time`)
) %>%
mutate(
Avg_Customers =
(
`July 2025 Customers Served` +
`August 2025 Customers Served` +
`September 2025 Customers Served` +
`October 2025 Customers Served` +
`November 2025 Customers Served` +
`December 2025 Customers Served`
) / 6,
Total_Customers =
`July 2025 Customers Served` +
`August 2025 Customers Served` +
`September 2025 Customers Served` +
`October 2025 Customers Served` +
`November 2025 Customers Served` +
`December 2025 Customers Served`,
Avg_Wait =
(
`July 2025 Wait Time` +
`August 2025 Wait Time` +
`September 2025 Wait Time` +
`October 2025 Wait Time` +
`November 2025 Wait Time` +
`December 2025 Wait Time`
) / 6
)
head(mva_clean)
## # A tibble: 6 × 16
## Branch July 2025 Customers …¹ August 2025 Customer…² September 2025 Custo…³
## <chr> <dbl> <dbl> <dbl>
## 1 Annapolis 9307 9533 8265
## 2 Baltimor… 12480 12245 11471
## 3 Bel Air 5413 5465 5324
## 4 Beltsvil… 10079 9998 9340
## 5 Columbia 5992 5918 5889
## 6 Cumberla… 2021 2348 2179
## # ℹ abbreviated names: ¹`July 2025 Customers Served`,
## # ²`August 2025 Customers Served`, ³`September 2025 Customers Served`
## # ℹ 12 more variables: `October 2025 Customers Served` <dbl>,
## # `November 2025 Customers Served` <dbl>,
## # `December 2025 Customers Served` <dbl>, `July 2025 Wait Time` <dbl>,
## # `August 2025 Wait Time` <dbl>, `September 2025 Wait Time` <dbl>,
## # `October 2025 Wait Time` <dbl>, `November 2025 Wait Time` <dbl>, …
This visualization compares total customers served and average wait times across months. The goal is to identify whether higher customer volume corresponds to longer wait times and to observe monthly patterns in service efficiency.
plot_data <- data.frame(
Month = factor(
rep(c("July", "August", "September",
"October", "November", "December"), 2),
levels = c("July", "August", "September",
"October", "November", "December")
),
Value = c(
sum(mva_clean$`July 2025 Customers Served`),
sum(mva_clean$`August 2025 Customers Served`),
sum(mva_clean$`September 2025 Customers Served`),
sum(mva_clean$`October 2025 Customers Served`),
sum(mva_clean$`November 2025 Customers Served`),
sum(mva_clean$`December 2025 Customers Served`),
mean(mva_clean$`July 2025 Wait Time`) * 10000,
mean(mva_clean$`August 2025 Wait Time`) * 10000,
mean(mva_clean$`September 2025 Wait Time`) * 10000,
mean(mva_clean$`October 2025 Wait Time`) * 10000,
mean(mva_clean$`November 2025 Wait Time`) * 10000,
mean(mva_clean$`December 2025 Wait Time`) * 10000
),
Type = rep(c("Customers Served", "Wait Time"), each = 6)
)
ggplot(plot_data,
aes(x = Month,
y = Value,
fill = Type)) +
geom_bar(stat = "identity",
position = "dodge") +
labs(
title = "Maryland MVA Customers Served and Wait Time",
x = "Month",
y = "Value",
fill = "Category",
caption = "Source: Maryland MVA"
) +
scale_fill_manual(
values = c(
"Customers Served" = "skyblue",
"Wait Time" = "darkorange"
)
) +
theme_classic()
This visualization shows that customer volume varies across months, while wait times do not always follow the same pattern. Some months with higher customer counts also show higher wait times, suggesting a possible relationship between demand and service delay.
To better explore the relationship between customer volume and wait times across MVA branches, I created an interactive scatter plot. This visualization allows users to compare how average customer demand relates to average wait time at each branch.
highchart() %>%
hc_add_series(
data = mva_clean,
type = "scatter",
hcaes(x = Avg_Customers,
y = Avg_Wait,
name = Branch)
) %>%
hc_title(text = "Customer Volume vs Wait Time by Branch") %>%
hc_xAxis(title = list(text = "Average Customers Served")) %>%
hc_yAxis(title = list(text = "Average Wait Time")) %>%
hc_tooltip(
pointFormat =
"Branch: {point.name}<br>
Customers: {point.x}<br>
Wait Time: {point.y}"
)
This is an interactive scatter plot created using Highcharter. It allows users to hover over each point to view branch-level customer volume and wait time values.
Here is the Visualization link:
https://public.tableau.com/app/profile/jason.gomes/viz/MVAWaittimeAnalysis/Sheet1?publish=yes
To statistically test the relationship between customer volume and wait time, a linear regression model is used. The dependent variable is wait time, and the independent variable is customer volume, with branch included as a categorical factor.
model <- lm(
Avg_Wait ~ `July 2025 Customers Served` +
`December 2025 Customers Served`,
data = mva_clean
)
summary(model)
##
## Call:
## lm(formula = Avg_Wait ~ `July 2025 Customers Served` + `December 2025 Customers Served`,
## data = mva_clean)
##
## Residuals:
## Min 1Q Median 3Q Max
## -3.3716 -0.9015 -0.3003 1.2915 2.6707
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) 4.4968317 0.3312599 13.575 3.6e-12 ***
## `July 2025 Customers Served` 0.0007914 0.0007737 1.023 0.317
## `December 2025 Customers Served` -0.0009704 0.0009577 -1.013 0.322
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 1.525 on 22 degrees of freedom
## Multiple R-squared: 0.06581, Adjusted R-squared: -0.01912
## F-statistic: 0.7749 on 2 and 22 DF, p-value: 0.4729
A multiple linear regression was used to see if customer volume affects wait times at Maryland MVA branches. Average wait time was the dependent variable, while July 2025 customers served and December 2025 customers served were the predictor variables. The results showed very weak relationships between customer volume and wait time. The p-values for both predictors (0.317 and 0.322) and the overall model (0.4729) were all greater than 0.05, meaning the results were not statistically significant. The R-squared value was 0.06581 and the adjusted R-squared value was -0.01912, which means the model explains very little of the variation in wait times. This suggests that other factors, such as staffing or branch efficiency, likely have a bigger impact on wait times than customer volume.
#Conclusion This project analyzed customer volume and wait times at Maryland MVA branches. The goal was to see whether serving more customers leads to longer wait times. The charts showed that customer volume and wait times change across different months, but they do not always follow the same pattern. The regression analysis showed a weak relationship between customer volume and wait time, and the results were not statistically significant. Overall, the project suggests that customer volume alone does not strongly explain wait times. Other factors, such as branch operations, staffing levels, or efficiency, may have a bigger impact. This project shows how data analysis and visualization can be used to better understand public service performance.