Data 101 Project 3

Author

Ameer Adegun

Introduction

Research Question: What factors predict the number of days it takes to resolve a standing water violation?

To answer this question, I used the Standing Water Violations dataset from the Montgomery County Open Data Portal: https://data.montgomerycountymd.gov/Consumer-Housing/Standing-Water/mx9q-5uj7/about_data. This dataset tracks housing code violations related to standing water, including when each was filed and closed and how it was resolved.

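Each row is one violation record. A case number can appear on more than one row, as cases 193814 and 193261 do in the data preview below. The key variables used in this analysis are:

days_to_resolve (quantitative): the number of days between the filing date and the closing date, computed during cleaning

disposition (categorical): how the case was resolved

city (categorical): the city where the violation was located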

I will use multiple linear regression because days_to_resolve is a continuous outcome variable and I want to understand the influence of multiple predictors simultaneously.

Data Analysis

library(tidyverse)
setwd("C:/Users/SwagD/Desktop/Data 101")

data <- read.csv("Standing_Water_20260423.csv")

head(data)
  Case.Number Date.Filed Date.Assigned Date.Closed          Disposition
1      194172 2025/08/13    2025/08/13  2025/08/21 Violations Corrected
2      193814 2025/07/30    2025/07/30                  Citation issued
3      193814 2025/07/30    2025/07/30                  Citation issued
4      193261 2025/07/08    2025/07/08  2025/09/22 Violations Corrected
5      193261 2025/07/08    2025/07/08  2025/09/22 Violations Corrected
6      193206 2025/07/03    2025/07/03  2025/08/07 Violations Corrected
       Street.Address    Unit.Number          City Zip.Code
1   4919 DOWNLAND TER                        OLNEY    20832
2 10713 RISINGDALE CT                   GERMANTOWN    20876
3 10713 RISINGDALE CT                   GERMANTOWN    20876
4 7103 PINEHURST PKWY                  CHEVY CHASE    20815
5 7103 PINEHURST PKWY                  CHEVY CHASE    20815
6  8811 COLESVILLE RD Not Applicable SILVER SPRING    20910
  Service.Request.Number Service.Request.Created.Date
1             1599710330                   2025/08/13
2             1598648491                   2025/07/30
3             1598619108                   2025/07/29
4             1596840018                   2025/07/08
5             1598686955                   2025/07/30
6             1596690451                   2025/07/07
  Service.Request.Closed.Time Service.Request.Status Violation.ID
1                  2025/08/21                 Closed       728339
2                                                          727427
3                                                          727427
4                  2025/07/16                              725958
5                  2025/08/07                              725958
6                                                          726059
  Inspection.Date  Corrected Location.Description                 Action
1      2025/08/18 2025/08/21             Exterior Treat and/or Eliminate
2      2025/08/01 2025/09/09             Exterior Treat and/or Eliminate
3      2025/08/01 2025/09/09             Exterior Treat and/or Eliminate
4      2025/07/09 2025/09/22             Exterior Treat and/or Eliminate
5      2025/07/09 2025/09/22             Exterior Treat and/or Eliminate
6      2025/07/10 2025/08/04               Garage Treat and/or Eliminate
  Code.Reference      Condition            Item Latitude Longitude    Location
1    26-9(a)(12) Standing Water Public Nuisance        0         0 POINT (0 0)
2    26-9(a)(12) Standing Water Public Nuisance        0         0 POINT (0 0)
3    26-9(a)(12) Standing Water Public Nuisance        0         0 POINT (0 0)
4    26-9(a)(12) Standing Water Public Nuisance        0         0 POINT (0 0)
5    26-9(a)(12) Standing Water Public Nuisance        0         0 POINT (0 0)
6    26-9(a)(12) Standing Water Public Nuisance        0         0 POINT (0 0)

Before the analysis, I clean the data, starting with standardizing the column names.

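# Lowercase the column names and replace dots, spaces, parentheses, and hyphens with underscores
names(data) <- tolower(gsub("[(). \\-]", "_", names(data)))

# Parse the dates, compute the resolution time in days, and convert the predictors to factors.
# Note: read.csv() loads blank cells as "" rather than NA, so the is.na() checks do not catch them.
# Rows with a blank closing date still drop out at the final filter, because their
# days_to_resolve is NA.
clean_data <- data |>
  filter(!is.na(date_closed), !is.na(date_filed), !is.na(disposition), !is.na(city)) |>
  mutate(
    date_filed      = as.Date(date_filed),
    date_closed     = as.Date(date_closed),
    days_to_resolve = as.numeric(date_closed - date_filed),
    disposition     = as.factor(disposition),
    city            = as.factor(city)
  ) |>
  # Keep only cases with a valid, non-negative resolution time
  filter(days_to_resolve >= 0)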
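
The blank-cell behavior matters for the cleaning step, so here is a minimal sketch of it. The CSV text and the object names raw and toy are made up for illustration and are not part of the project data. It shows that read.csv() keeps a blank cell in a text column as an empty string by default, and that the na.strings argument turns it into NA at import time.

```r
# Toy CSV text standing in for the real file; the values are illustrative only.
csv_text <- "date_filed,date_closed,disposition
2025/07/30,,Citation issued
2025/08/13,2025/08/21,Violations Corrected"

# Default import: the blank closing date is read as "" and is not NA.
raw <- read.csv(text = csv_text)
sum(is.na(raw$date_closed))

# Treat empty strings as missing values when reading the file.
toy <- read.csv(text = csv_text, na.strings = "")
sum(is.na(toy$date_closed))
```

Applying na.strings = "" to the real file would change which rows pass the filter (for example, rows with a blank disposition would be dropped), so the results in this report would need to be rerun.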

With the dataset cleaned, I next look at summary statistics of resolution time for each disposition type.

clean_data |>
  group_by(disposition) |>
  summarize(
    mean_days   = mean(days_to_resolve),
    median_days = median(days_to_resolve),
    max_days    = max(days_to_resolve),
    min_days    = min(days_to_resolve)
  )
# A tibble: 12 × 5
   disposition                           mean_days median_days max_days min_days
   <fct>                                     <dbl>       <dbl>    <dbl>    <dbl>
 1 ""                                         11          11         11       11
 2 "ADU Class III Denial"                    391         391        391      391
 3 "ADU Class III Passed Inspection"         496         496        496      496
 4 "Change of Ownership"                     465         404.      1177       21
 5 "Citation issued"                         346.        421        536       82
 6 "Referred to a Montgomery County Age…      48.5        48.5       54       43
 7 "Reoccupied"                             1772.       1905       1978      355
 8 "TP Annual Inspection Completed"           77          77         77       77
 9 "Triennial - No Violations Found"        1154        1154       1154     1154
10 "Triennial Completed"                     527.        594       1091      150
11 "Violation Unfounded"                     599.        562       1301       83
12 "Violations Corrected"                    268.         94       1642        0

The bar chart below shows the mean resolution time for each disposition type.

clean_data |>
  group_by(disposition) |>
  summarize(mean_days = mean(days_to_resolve)) |>
  ggplot(aes(x = reorder(disposition, -mean_days), y = mean_days, fill = disposition)) +
  geom_col(color = "black") +
  # "Paired" supports up to 12 colors, one per disposition ("Set1" stops at 9)
  scale_fill_brewer(palette = "Paired") +
  labs(
    title   = "Average Days to Resolve by Disposition Type",
    x       = "Disposition",
    y       = "Average Days to Resolve",
    caption = "Source: Montgomery County Open Data Portal"
  ) +
  theme_minimal() +
  theme(legend.position = "none",
        axis.text.x = element_text(angle = 45, hjust = 1))

Justification of Approach

I chose multiple linear regression because the outcome variable, days_to_resolve, is numeric and I want to estimate how several predictors relate to it at the same time.

I chose disposition and city as predictors because disposition describes how the violation was resolved, which plausibly affects how long resolution takes, and city may reflect differences in inspector workload or resources across locations.

The five assumptions I will check are linearity, independence, normality of residuals, homoscedasticity, and no multicollinearity.

Statistical Analysis

# Fit the multiple linear regression model
model <- lm(days_to_resolve ~ disposition + city, data = clean_data)

summary(model)

Call:
lm(formula = days_to_resolve ~ disposition + city, data = clean_data)

Residuals:
     Min       1Q   Median       3Q      Max 
-1233.59  -173.21     0.00    62.13  1470.73 

Coefficients: (1 not defined because of singularities)
                                                                    Estimate
(Intercept)                                                          -322.90
dispositionADU Class III Denial                                       626.45
dispositionADU Class III Passed Inspection                            592.23
dispositionChange of Ownership                                        675.79
dispositionCitation issued                                            565.11
dispositionReferred to a Montgomery County Agency - no jurisdiction   166.59
dispositionReoccupied                                                1824.05
dispositionTP Annual Inspection Completed                             399.90
dispositionTriennial - No Violations Found                           1250.23
dispositionTriennial Completed                                        628.10
dispositionViolation Unfounded                                        841.94
dispositionViolations Corrected                                       423.90
cityBETHESDA                                                           87.45
cityBROOKEVILLE                                                       213.32
cityBURTONSVILLE                                                       65.43
cityCABIN JOHN                                                        -40.00
cityCHEVY CHASE                                                       -15.93
cityDAMASCUS                                                          -46.60
cityDICKERSON                                                         -58.00
cityGAITHERSBURG                                                      182.95
cityGERMANTOWN                                                         20.27
cityKENSINGTON                                                        333.02
cityMONTGOMERY VILLAGE                                                114.80
cityOLNEY                                                             -66.00
cityPOTOMAC                                                           333.90
cityROCKVILLE                                                         140.93
citySILVER SPRING                                                     226.67
cityTAKOMA PARK                                                           NA
cityWASHINGTON GROVE                                                   72.06
                                                                    Std. Error
(Intercept)                                                             370.68
dispositionADU Class III Denial                                         430.22
dispositionADU Class III Passed Inspection                              427.84
dispositionChange of Ownership                                          325.09
dispositionCitation issued                                              352.82
dispositionReferred to a Montgomery County Agency - no jurisdiction     373.61
dispositionReoccupied                                                   312.33
dispositionTP Annual Inspection Completed                               476.69
dispositionTriennial - No Violations Found                              319.68
dispositionTriennial Completed                                          306.22
dispositionViolation Unfounded                                          337.12
dispositionViolations Corrected                                         304.13
cityBETHESDA                                                            218.60
cityBROOKEVILLE                                                         277.87
cityBURTONSVILLE                                                        241.38
cityCABIN JOHN                                                          367.08
cityCHEVY CHASE                                                         223.70
cityDAMASCUS                                                            250.76
cityDICKERSON                                                           367.08
cityGAITHERSBURG                                                        229.85
cityGERMANTOWN                                                          219.07
cityKENSINGTON                                                          225.88
cityMONTGOMERY VILLAGE                                                  244.79
cityOLNEY                                                               236.95
cityPOTOMAC                                                             218.12
cityROCKVILLE                                                           216.92
citySILVER SPRING                                                       213.63
cityTAKOMA PARK                                                             NA
cityWASHINGTON GROVE                                                    305.19
                                                                    t value
(Intercept)                                                          -0.871
dispositionADU Class III Denial                                       1.456
dispositionADU Class III Passed Inspection                            1.384
dispositionChange of Ownership                                        2.079
dispositionCitation issued                                            1.602
dispositionReferred to a Montgomery County Agency - no jurisdiction   0.446
dispositionReoccupied                                                 5.840
dispositionTP Annual Inspection Completed                             0.839
dispositionTriennial - No Violations Found                            3.911
dispositionTriennial Completed                                        2.051
dispositionViolation Unfounded                                        2.497
dispositionViolations Corrected                                       1.394
cityBETHESDA                                                          0.400
cityBROOKEVILLE                                                       0.768
cityBURTONSVILLE                                                      0.271
cityCABIN JOHN                                                       -0.109
cityCHEVY CHASE                                                      -0.071
cityDAMASCUS                                                         -0.186
cityDICKERSON                                                        -0.158
cityGAITHERSBURG                                                      0.796
cityGERMANTOWN                                                        0.093
cityKENSINGTON                                                        1.474
cityMONTGOMERY VILLAGE                                                0.469
cityOLNEY                                                            -0.279
cityPOTOMAC                                                           1.531
cityROCKVILLE                                                         0.650
citySILVER SPRING                                                     1.061
cityTAKOMA PARK                                                          NA
cityWASHINGTON GROVE                                                  0.236
                                                                    Pr(>|t|)
(Intercept)                                                         0.384109
dispositionADU Class III Denial                                     0.145975
dispositionADU Class III Passed Inspection                          0.166885
dispositionChange of Ownership                                      0.038137
dispositionCitation issued                                          0.109838
dispositionReferred to a Montgomery County Agency - no jurisdiction 0.655858
dispositionReoccupied                                               9.29e-09
dispositionTP Annual Inspection Completed                           0.401912
dispositionTriennial - No Violations Found                          0.000104
dispositionTriennial Completed                                      0.040761
dispositionViolation Unfounded                                      0.012822
dispositionViolations Corrected                                     0.163970
cityBETHESDA                                                        0.689297
cityBROOKEVILLE                                                     0.443015
cityBURTONSVILLE                                                    0.786453
cityCABIN JOHN                                                      0.913270
cityCHEVY CHASE                                                     0.943246
cityDAMASCUS                                                        0.852648
cityDICKERSON                                                       0.874515
cityGAITHERSBURG                                                    0.426414
cityGERMANTOWN                                                      0.926309
cityKENSINGTON                                                      0.141020
cityMONTGOMERY VILLAGE                                              0.639289
cityOLNEY                                                           0.780708
cityPOTOMAC                                                         0.126434
cityROCKVILLE                                                       0.516189
citySILVER SPRING                                                   0.289174
cityTAKOMA PARK                                                           NA
cityWASHINGTON GROVE                                                0.813449
                                                                       
(Intercept)                                                            
dispositionADU Class III Denial                                        
dispositionADU Class III Passed Inspection                             
dispositionChange of Ownership                                      *  
dispositionCitation issued                                             
dispositionReferred to a Montgomery County Agency - no jurisdiction    
dispositionReoccupied                                               ***
dispositionTP Annual Inspection Completed                              
dispositionTriennial - No Violations Found                          ***
dispositionTriennial Completed                                      *  
dispositionViolation Unfounded                                      *  
dispositionViolations Corrected                                        
cityBETHESDA                                                           
cityBROOKEVILLE                                                        
cityBURTONSVILLE                                                       
cityCABIN JOHN                                                         
cityCHEVY CHASE                                                        
cityDAMASCUS                                                           
cityDICKERSON                                                          
cityGAITHERSBURG                                                       
cityGERMANTOWN                                                         
cityKENSINGTON                                                         
cityMONTGOMERY VILLAGE                                                 
cityOLNEY                                                              
cityPOTOMAC                                                            
cityROCKVILLE                                                          
citySILVER SPRING                                                      
cityTAKOMA PARK                                                        
cityWASHINGTON GROVE                                                   
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 299.7 on 511 degrees of freedom
Multiple R-squared:  0.6937,    Adjusted R-squared:  0.6775 
F-statistic: 42.86 on 27 and 511 DF,  p-value: < 2.2e-16
r_squared     <- summary(model)$r.squared
adj_r_squared <- summary(model)$adj.r.squared

cat("R-squared:", round(r_squared, 3),
    "\nAdjusted R-squared:", round(adj_r_squared, 3))
R-squared: 0.694 
Adjusted R-squared: 0.678

Checking Assumptions

I checked linearity and homoscedasticity using residuals vs fitted values. The ideal is a random scatter around zero with no pattern.

plot(model$fitted.values, model$residuals,
     main = "Residuals vs Fitted",
     xlab = "Fitted Values",
     ylab = "Residuals")
abline(h = 0)

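For independence, I assume each row is a separate standing water violation. This assumption is only approximate: some case numbers appear on more than one row in the data preview (for example, 193814 and 193261), and rows from the same case are not independent of each other.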
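
A quick way to see how often a case repeats is to count duplicated case numbers. The sketch below uses a small made-up data frame (toy_cases and its values are illustrative assumptions, loosely modeled on the preview rows) and keeps one row per case. It is one possible check, not a step that was run for this report.

```r
# Toy data: case 193261 appears twice, as it does in the head() preview.
toy_cases <- data.frame(
  case_number     = c(194172, 193261, 193261, 193206),
  days_to_resolve = c(8, 76, 76, 35)
)

# Number of rows that repeat an earlier case number.
n_repeated <- sum(duplicated(toy_cases$case_number))
n_repeated

# Keep only the first row for each case number.
one_per_case <- toy_cases[!duplicated(toy_cases$case_number), ]
nrow(one_per_case)
```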

The normality of residuals was checked using a Q-Q plot. Points should follow the diagonal line.

qqnorm(model$residuals)
qqline(model$residuals)

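For multicollinearity, disposition and city measure different things (the type of outcome versus the location), so I did not expect strong overlap between them. However, the model summary reports one coefficient, cityTAKOMA PARK, as not defined because of singularities. This means that city’s indicator is perfectly determined by the other predictors in this sample, so the assumption is not fully met.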
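
The sketch below shows how this kind of perfect overlap can be detected from a fitted lm object. The data frame toy_fit_data is made up for illustration: every case in city "Z" has disposition "C", and every disposition "C" case is in city "Z", so the two indicator columns are identical and lm() cannot estimate both.

```r
# Toy data where the city "Z" indicator is identical to the disposition "C" indicator.
toy_fit_data <- data.frame(
  days        = c(10, 20, 30, 40, 50, 60),
  disposition = factor(c("A", "A", "B", "B", "C", "C")),
  city        = factor(c("X", "X", "X", "Y", "Z", "Z"))
)

fit <- lm(days ~ disposition + city, data = toy_fit_data)

# Coefficients that lm() could not estimate are returned as NA.
aliased <- names(coef(fit))[is.na(coef(fit))]
aliased

# The rank of the design matrix is smaller than the number of coefficients.
fit$rank < length(coef(fit))
```

Running the same two checks on the project model would list which terms are aliased there.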

# RMSE: root mean squared error of the fitted values, in days
model_residuals <- model$residuals
rmse <- sqrt(mean(model_residuals^2))
rmse
[1] 291.8282
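
The RMSE is slightly smaller than the residual standard error of 299.7 in the model summary because the RMSE divides the squared residuals by the number of observations, while the residual standard error divides by the residual degrees of freedom. The sketch below checks this using only numbers printed in this report. The variable names are mine, and the observation count is inferred as residual degrees of freedom plus the number of estimated coefficients.

```r
rmse_reported <- 291.8282   # RMSE printed above
df_residual   <- 511        # residual degrees of freedom from summary(model)
n_coef        <- 28         # 27 estimated slopes plus the intercept
n_obs         <- df_residual + n_coef

# Rescale from dividing by n to dividing by the residual degrees of freedom.
rse_check <- rmse_reported * sqrt(n_obs / df_residual)
round(rse_check, 1)   # 299.7
```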

Discussion of Results

The linear regression model looked at what factors predicted how long it takes to resolve a standing water violation in Montgomery County. The outcome variable was days_to_resolve.

The R-squared value tells us how much of the variation in resolution time is explained by the model: 0.694, or about 69%. The adjusted R-squared, 0.678, accounts for the number of predictors and is a better measure of fit. The RMSE, about 292 days, measures the typical size of the model’s prediction errors.
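
To make the adjustment concrete, the sketch below recomputes the adjusted R-squared from the values in the model summary using the standard formula 1 - (1 - R²)(n - 1) / (residual degrees of freedom). The variable names are mine, and the observation count is inferred from the summary output rather than read from the data.

```r
r2_reported <- 0.6937   # Multiple R-squared from summary(model)
df_residual <- 511      # residual degrees of freedom
n_coef      <- 28       # 27 estimated slopes plus the intercept
n_obs       <- df_residual + n_coef

# Penalize R-squared for the number of estimated coefficients.
adj_r2_check <- 1 - (1 - r2_reported) * (n_obs - 1) / df_residual
round(adj_r2_check, 4)   # 0.6775
```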

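Each disposition coefficient tells us how many more or fewer days that resolution type takes compared with the reference category, holding city constant. Likewise, each city coefficient tells us how much slower or faster violations are resolved there compared with the reference city. A positive coefficient means resolution takes longer; a negative one means it is faster. For example, Reoccupied cases take an estimated 1,824 days longer than the reference disposition (p < 0.001). The reference disposition here is the blank category, which appears to contain very few cases (its mean, median, minimum, and maximum are all 11 days), and this likely contributes to the large standard errors on the disposition estimates.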
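
R uses the first factor level as the reference category, and the blank string sorts first. The sketch below, using a made-up factor toy_disposition, shows how to check the reference level and how relevel() can set a different one. "Violations Corrected" is chosen here only as an example of a more interpretable baseline.

```r
# Toy factor with a blank category, as in the cleaned data.
toy_disposition <- factor(c("", "Violations Corrected", "Reoccupied",
                            "Violations Corrected"))

# The first level is the reference category in lm().
levels(toy_disposition)[1]

# Make a named category the reference instead.
releveled <- relevel(toy_disposition, ref = "Violations Corrected")
levels(releveled)[1]
```

Refitting the model after releveling would change the individual coefficients and their p-values, but not the fitted values or the R-squared.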

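The main takeaway is that how a violation gets resolved is associated with resolution time much more clearly than where it is located: five disposition coefficients are significant at the 0.05 level, while no city coefficient is. If particular disposition types or cities consistently take longer, resources could be directed there to speed up resolution and improve housing conditions for residents.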

Conclusion

This analysis used multiple linear regression to see what factors predict the number of days it takes to resolve a standing water violation. The predictors were disposition type and city. The model provides insight into which resolution outcomes and locations are associated with faster or slower resolution times. Future research could include additional predictors such as inspector workload or seasonal patterns to improve the model’s explanatory power.