DATA 110 Project 2: MVA Wait Time Analysis

Introduction

This project analyzes service efficiency at Motor Vehicle Administration (MVA) branches by examining how customer volume and time-related factors relate to wait times. The dataset reports customers served and wait times for each branch across fiscal years FY23 through FY25, along with monthly figures. It contains both categorical and quantitative variables: the categorical variables are Branch, Fiscal Year, and Month, and the quantitative variables are Customers Served and Wait Time. The data was obtained from an official MVA dataset and cleaned by keeping the branch and fiscal-year columns and dropping rows with missing values. I chose this dataset because it reflects real-world service efficiency and customer experience, and understanding how demand affects wait times is important for improving public services.

Research Question

How does the number of customers affect wait times at MVA branches?

Load Libraries

library(readr)
library(ggplot2)
library(tidyr)
library(shiny)
library(dplyr)

library(here)

Load Data

df <- read_csv("MVA_Customers_Served_&_Wait_Time_by_Branch_20260415.csv")

## Rows: 25 Columns: 19
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Branch
## dbl (9): FY23 Wait Time, FY24 Wait Time, FY25 Wait Time, July 2025 Wait Time...
## num (9): FY23 Customers Served, FY24 Customers Served, FY25 Customers Served...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Data Cleaning

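# Keep the branch name and the fiscal-year columns (FY23 to FY25),
# then drop any branch with a missing value in those columns
df_clean <- df %>%
  select(
    Branch,
    `FY23 Customers Served`, `FY23 Wait Time`,
    `FY24 Customers Served`, `FY24 Wait Time`,
    `FY25 Customers Served`, `FY25 Wait Time`
  ) %>%
  drop_na()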
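
The fiscal-year columns selected above are in wide format, with the year embedded in each column name. As a rough sketch of one way to restructure them (not necessarily the approach used for this project), `tidyr::pivot_longer()` can split each name into a fiscal year and a measure. The branch names and numbers below are made up for illustration.

```r
library(tidyr)

# Hypothetical toy data mimicking the column layout of the MVA file;
# branch names and values are invented for illustration only.
toy <- data.frame(
  Branch = c("Branch A", "Branch B"),
  `FY23 Customers Served` = c(12000, 30000),
  `FY23 Wait Time` = c(5.1, 3.9),
  `FY24 Customers Served` = c(12500, 29000),
  `FY24 Wait Time` = c(4.8, 4.2),
  `FY25 Customers Served` = c(13000, 31000),
  `FY25 Wait Time` = c(4.5, 4.0),
  check.names = FALSE
)

# Split names like "FY23 Wait Time" into a fiscal year plus a measure.
# The special ".value" entry keeps each measure as its own column.
toy_long <- pivot_longer(
  toy,
  cols = -Branch,
  names_to = c("fiscal_year", ".value"),
  names_pattern = "^(FY[0-9]+) (.+)$"
)

print(toy_long)
```

In this long layout each row is one branch in one fiscal year, which makes it easier to facet or color a single plot by year instead of repeating the plotting code.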

To better understand the relationship between the number of customers served and wait times, scatter plots are created for each fiscal year (FY23, FY24, and FY25). These visualizations help identify patterns, trends, and differences across branches and time periods.

FY23 Scatter Plot

ggplot(df, aes(x = `FY23 Customers Served`, y = `FY23 Wait Time`, color = Branch)) +
  geom_point() +
  labs(
    title = "FY23 Customers vs Wait Time by Branch",
    x = "Customers Served",
    y = "Wait Time",
    color = "Branch"
  ) +
  theme_minimal()

FY24 Scatter Plot

ggplot(df, aes(x = `FY24 Customers Served`, y = `FY24 Wait Time`, color = Branch)) +
  geom_point() +
  labs(
    title = "FY24 Customers vs Wait Time by Branch",
    x = "Customers Served",
    y = "Wait Time",
    color = "Branch"
  ) +
  theme_minimal()

FY25 Scatter Plot

ggplot(df, aes(x = `FY25 Customers Served`, y = `FY25 Wait Time`, color = Branch)) +
  geom_point() +
  labs(
    title = "FY25 Customers vs Wait Time by Branch",
    x = "Customers Served",
    y = "Wait Time",
    color = "Branch"
  ) +
  theme_minimal()
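
A scatter plot can be paired with a single number that summarizes how closely the points follow a line. The sketch below computes the Pearson correlation between customers served and wait time for each fiscal year using base R. It runs on invented values, so the numbers it prints say nothing about the real MVA branches.

```r
# Hypothetical toy data with the same column naming as the MVA file;
# the values are invented for illustration only.
toy <- data.frame(
  Branch = c("A", "B", "C", "D", "E"),
  `FY23 Customers Served` = c(12000, 30000, 45000, 8000, 60000),
  `FY23 Wait Time` = c(5.1, 3.9, 6.2, 4.4, 5.0),
  `FY24 Customers Served` = c(12500, 29000, 47000, 9000, 58000),
  `FY24 Wait Time` = c(4.8, 4.2, 5.9, 4.0, 5.3),
  check.names = FALSE
)

years <- c("FY23", "FY24")

# For each fiscal year, correlate the two matching columns.
correlations <- sapply(years, function(fy) {
  cor(toy[[paste(fy, "Customers Served")]],
      toy[[paste(fy, "Wait Time")]])
})

print(round(correlations, 3))
```

Values near 0 indicate widely scattered points, while values near 1 or -1 indicate a tight linear pattern.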

Multiple Linear Regression

The model predicts FY25 wait time from the number of customers served in FY23, FY24, and FY25. These predictors were chosen to examine how customer volume across fiscal years relates to wait time. Using several of them satisfies the multiple linear regression requirement and tests whether customer trends over time influence wait times.

model <- lm(`FY25 Wait Time` ~ 
              `FY23 Customers Served` + 
              `FY24 Customers Served` + 
              `FY25 Customers Served`,
            data = df)

summary(model)

## 
## Call:
## lm(formula = `FY25 Wait Time` ~ `FY23 Customers Served` + `FY24 Customers Served` + 
##     `FY25 Customers Served`, data = df)
## 
## Residuals:
##     Min      1Q  Median      3Q     Max 
## -3.3685 -1.2859 -0.0792  1.0461  3.1236 
## 
## Coefficients:
##                           Estimate Std. Error t value Pr(>|t|)    
## (Intercept)              4.583e+00  3.780e-01  12.125 6.01e-11 ***
## `FY23 Customers Served`  5.432e-05  1.239e-04   0.439    0.665    
## `FY24 Customers Served` -8.493e-05  1.726e-04  -0.492    0.628    
## `FY25 Customers Served`  3.505e-05  8.982e-05   0.390    0.700    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 1.74 on 21 degrees of freedom
## Multiple R-squared:  0.04746,    Adjusted R-squared:  -0.08862 
## F-statistic: 0.3487 on 3 and 21 DF,  p-value: 0.7904

The regression results show no detectable relationship between customer volume and FY25 wait time. The coefficients are small in magnitude, but because they are expressed per customer served, their size alone says little. The more telling evidence is that every predictor has a p-value well above 0.05 (0.63 to 0.70), so none is statistically significant. The multiple R² is 0.047 and the adjusted R² is negative (-0.089), meaning the model explains almost none of the variation in wait times. The overall F-test is also not significant (p = 0.79). This suggests that customer volume alone is not a strong predictor of wait times, and other factors may influence wait times at MVA branches.
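
One caution when reading coefficients like these: their size depends on the units of the predictor, so a tiny per-customer slope is not by itself a measure of strength. The sketch below, which uses invented numbers rather than the MVA data, fits the same simple regression twice, once per customer and once per thousand customers. The variable names are assumptions made for the example.

```r
# Hypothetical values for illustration; not taken from the MVA data.
toy <- data.frame(
  customers = c(12000, 30000, 45000, 8000, 60000, 25000),
  wait      = c(5.1, 3.9, 6.2, 4.4, 5.0, 3.5)
)
toy$customers_k <- toy$customers / 1000  # same predictor, in thousands

fit_raw <- lm(wait ~ customers, data = toy)
fit_k   <- lm(wait ~ customers_k, data = toy)

# The slope per thousand customers is 1000 times the slope per customer...
slope_raw <- unname(coef(fit_raw)["customers"])
slope_k   <- unname(coef(fit_k)["customers_k"])

# ...but the t statistic (and therefore the p-value) is unchanged.
t_raw <- summary(fit_raw)$coefficients["customers", "t value"]
t_k   <- summary(fit_k)$coefficients["customers_k", "t value"]

print(c(slope_raw = slope_raw, slope_k = slope_k))
print(c(t_raw = t_raw, t_k = t_k))
```

Rescaling changes how the slope reads but not the evidence for it, which is why the p-values and R² are the better guides to the strength of the relationship.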

Shiny App

ui <- fluidPage(
  titlePanel("MVA Wait Time"),
  sidebarLayout(
    sidebarPanel(
      selectInput("branch", "Choose Branch:",
                  choices = unique(df$Branch)),
      selectInput("year", "Choose Fiscal Year:",
                  choices = c("FY23","FY24","FY25"))
    ),

    mainPanel(
      plotOutput("plot")
    )
  )
)

server <- function(input, output) {

  output$plot <- renderPlot({

    # Build the column names for the selected fiscal year
    x_col <- paste(input$year, "Customers Served")
    y_col <- paste(input$year, "Wait Time")

    # The data appears to hold one row per branch, so filtering to a
    # single branch would leave one point and no trend line. Plot every
    # branch for context and highlight the selected one instead.
    data_selected <- df %>% filter(Branch == input$branch)

    ggplot(df, aes(x = .data[[x_col]], y = .data[[y_col]])) +
      geom_point(color = "grey60", size = 2) +
      geom_smooth(method = "lm", formula = y ~ x, se = FALSE) +
      geom_point(data = data_selected, color = "blue", size = 4) +
      labs(
        title = paste("Branch:", input$branch, "-", input$year),
        x = "Customers Served",
        y = "Wait Time"
      ) +
      theme_minimal()
  })
}

shinyApp(ui, server)

The Shiny application allows users to interactively explore the relationship between customers served and wait times by selecting different branches and fiscal years. This provides a dynamic way to analyze how patterns vary across locations and time.
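
Because the column names follow a fixed pattern, the columns for a selected fiscal year can be looked up by name. The helper below is a hypothetical illustration of that idea (`year_columns` is not part of the app), shown on an invented one-row data frame.

```r
# Hypothetical helper (not part of the app above): build the two column
# names for a fiscal year from the naming pattern used in the dataset.
year_columns <- function(year) {
  c(x = paste(year, "Customers Served"),
    y = paste(year, "Wait Time"))
}

# Invented one-row example with the same column layout.
toy <- data.frame(
  Branch = "Branch A",
  `FY24 Customers Served` = 12500,
  `FY24 Wait Time` = 4.8,
  check.names = FALSE
)

# Look the columns up by name for the chosen year.
cols <- year_columns("FY24")
x <- toy[[cols[["x"]]]]
y <- toy[[cols[["y"]]]]

print(c(x = x, y = y))
```

A lookup like this keeps working if more fiscal years are added to the year selector, as long as the new columns follow the same naming pattern.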

Conclusion

This project looked at how the number of customers affects wait times at MVA branches using data from FY23, FY24, and FY25. The scatter plots suggested at most a slight positive relationship: branches serving more customers may have somewhat longer wait times, but the points were widely scattered, so the pattern was weak.

The multiple linear regression results showed that customer volume is not a strong predictor of wait time. No predictor had a p-value below 0.05, the overall F-test was not significant (p = 0.79), and the adjusted R² was negative (-0.089), which means the model explains almost none of the variation in FY25 wait times. This suggests that other factors, such as staffing or efficiency, may have a bigger impact.

A Shiny app was also created to make the data interactive. It allows users to choose different branches and fiscal years to see how customer numbers and wait times are related. This helps make the data easier to explore and understand.

Overall, this analysis found no statistically significant effect of customer volume on wait times. Any effect appears to be small, and other variables are likely more important. Future analysis could include more data to better understand what drives wait times.

Jason Gomes

2026-04-17
