Final Project DATA 110

Image

Driving Under the Influence

Source of Image

Source: Altshuler Law Firm. “Driving Under the Influence of Narcotics.” Altshuler Law, https://www.altshulerlaw.com/blog/driving-under-the-influence-of-narcotics/

Background Research

Maryland law clearly defines DUI and DWI offenses and outlines as strict legal consequences for reckless driving.

According to the Maryland Motor Vehicle Administration, a blood alcohol concentration (BAC) of 0.08 or higher is considered legally impaired for drivers over 21, while drivers under 21 are prohibited from having any alcohol in their system.

If a driver is arrested for DUI, they face immediate license suspension, fines, and long-term penalties as well as have that data permanently on their driving record.

The severity of penalties increase such as second DUI conviction which leads to longer jail sentences, higher fines, and longer license revocations. Refusing a chemical test can result in automatic license suspension which ranges from 270 days to two years depending on the type of prior offense.

These strict laws show how DUI enforcement in Maryland is heavily regulated and that consequences escalate based on the behavior and prior history.

This background is very important for my analysis because it shows that DUI severity is not only a data variable on my dataset but an outcome which is influenced by policies and behavior of the driver.

Source:

Maryland Motor Vehicle Administration. “Driving Under the Influence (DUI) & Driving While Impaired (DWI).” MVA Maryland,

https://mva.maryland.gov/your-mva-guide/drivers-revoked-or-suspended-licenses/driving-under-influence-dui

Introduction

This project analyzes DUI (Driving Under the Influence) incidents in Montgomery County, Maryland by using data from the Montgomery County Police Department Open Data Portal.

This dataset is a collection of real DUI incident reports from the Montgomery County Police Department. Each entry represents a specific crash or incident where alcohol and drug impairment was involved and documented by law enforcement.

The dataset I have it includes key details about where the incident happened like the City and specific place type such as the roadway or intersection as well as the the exact geographic coordinates which are latitude and longitude.

It also is about the severity of the incident which is measured by the number of people who are injured and affected.

The dataset comes directly from police records and the data reflects real-world traffic safety conditions in Montgomery County.

It also helps show patterns where DUI incidents tend to happen and whether certain locations or areas are associated with more severe outcomes.

The reason why I chose this topic is because DUI incidents are a major public safety issue which can affect communities.

I wanted to better understand if certain areas in Montgomery County experience more severe DUI incidents in comparison to others and how I can use data to see those patterns.

This topic is meaningful to me because it connects public safety concerns with statistical analysis and geographic trends.

Research Question

How do location (City and Place type) and geographic position (Latitude and Longitude) influence the severity of DUI incidents which are measured by the number of victims, in Montgomery County?

Dataset:

The data was collected by the Montgomery County Police Department (MCPD) and published through the Montgomery County Open Data Portal.

Each row represents a single DUI incident recorded by police officers. It contains 26 variables and 19,984 observations and includes both categorical and quantitative variables.

The Variables I am using are:

City (categorical) - Location of the incident such as Rockville or Germantown

Place (categorical) - The type of envionment and the place where the incident occured

Victims (quantative) - The Number of individuals who were involved in the incident

Latitude (quantative) - Numerical geographic coordinate representing north-south position. Higher values mean farther north in the county.

Longitude (quantative) - Numerical geographic coordinates which represent east and west positions. More negative values generally means more west

Loading the Libraries and Dataset

# Citations/Disclaimer: This code follows what learned from course/class notes

# In this chunk I am going to be loading my libraries for my data cleaning, visualization, and analysis

# I am going to use tidyverse for data wrangling and ggplot2 visualizations.

# I am going to be using plotly to create interactive visualizations.

library(readr)
library(tidyverse)

## Warning: package 'tidyverse' was built under R version 4.5.2

## Warning: package 'ggplot2' was built under R version 4.5.2

## Warning: package 'dplyr' was built under R version 4.5.2

## Warning: package 'forcats' was built under R version 4.5.2

## Warning: package 'lubridate' was built under R version 4.5.2

## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.2.0     ✔ purrr     1.1.0
## ✔ forcats   1.0.1     ✔ stringr   1.5.1
## ✔ ggplot2   4.0.2     ✔ tibble    3.3.0
## ✔ lubridate 1.9.5     ✔ tidyr     1.3.1
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors

library(ggplot2)
library(plotly)

## Warning: package 'plotly' was built under R version 4.5.3

## 
## Attaching package: 'plotly'
## 
## The following object is masked from 'package:ggplot2':
## 
##     last_plot
## 
## The following object is masked from 'package:stats':
## 
##     filter
## 
## The following object is masked from 'package:graphics':
## 
##     layout

dui <- read_csv("Driving_Under_the_Influence_from_Crime_Data_20260506.csv")

## Rows: 19983 Columns: 26
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (17): Dispatch Date / Time, Crime Name1, Crime Name2, Crime Name3, Polic...
## dbl  (9): Incident ID, Offence Code, CR Number, NIBRS Code, Victims, Zip Cod...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

Exploratory Data Analysis

# Citations/Disclaimer: This code follows what learned from course/class notes

# I am using the function head in order to see the first couple of rows of  my dataset

# I am using the function dim which displays the number of rows and columns 

# I am using the function colnames to see what variables are in my dataset

# I am using the function str in order to see the structure and the data types fir the variables 

head(dui)

## # A tibble: 6 × 26
##   `Incident ID` `Offence Code` `CR Number` `Dispatch Date / Time` `NIBRS Code`
##           <dbl>          <dbl>       <dbl> <chr>                         <dbl>
## 1     201573624           5404   260019134 05/04/2026 02:31:08 AM           90
## 2     201573490           5404   260018974 05/02/2026 07:50:31 PM           90
## 3     201573622           5404   260018878 05/02/2026 02:50:28 AM           90
## 4     201573460           5404   260018836 05/01/2026 07:26:27 PM           90
## 5     201573298           5404   260018696 05/01/2026 02:12:06 AM           90
## 6     201573383           5404   260018687 04/30/2026 11:13:01 PM           90
## # ℹ 21 more variables: Victims <dbl>, `Crime Name1` <chr>, `Crime Name2` <chr>,
## #   `Crime Name3` <chr>, `Police District Name` <chr>, `Block Address` <chr>,
## #   City <chr>, State <chr>, `Zip Code` <dbl>, Agency <chr>, Place <chr>,
## #   Sector <chr>, Beat <chr>, PRA <chr>, `Address Number` <dbl>,
## #   Start_Date_Time <chr>, End_Date_Time <chr>, Latitude <dbl>,
## #   Longitude <dbl>, `Police District Number` <chr>, Location <chr>

dim(dui)

## [1] 19983    26

colnames(dui)

##  [1] "Incident ID"            "Offence Code"           "CR Number"             
##  [4] "Dispatch Date / Time"   "NIBRS Code"             "Victims"               
##  [7] "Crime Name1"            "Crime Name2"            "Crime Name3"           
## [10] "Police District Name"   "Block Address"          "City"                  
## [13] "State"                  "Zip Code"               "Agency"                
## [16] "Place"                  "Sector"                 "Beat"                  
## [19] "PRA"                    "Address Number"         "Start_Date_Time"       
## [22] "End_Date_Time"          "Latitude"               "Longitude"             
## [25] "Police District Number" "Location"

str(dui)

## spc_tbl_ [19,983 × 26] (S3: spec_tbl_df/tbl_df/tbl/data.frame)
##  $ Incident ID           : num [1:19983] 2.02e+08 2.02e+08 2.02e+08 2.02e+08 2.02e+08 ...
##  $ Offence Code          : num [1:19983] 5404 5404 5404 5404 5404 ...
##  $ CR Number             : num [1:19983] 2.6e+08 2.6e+08 2.6e+08 2.6e+08 2.6e+08 ...
##  $ Dispatch Date / Time  : chr [1:19983] "05/04/2026 02:31:08 AM" "05/02/2026 07:50:31 PM" "05/02/2026 02:50:28 AM" "05/01/2026 07:26:27 PM" ...
##  $ NIBRS Code            : num [1:19983] 90 90 90 90 90 90 90 90 90 90 ...
##  $ Victims               : num [1:19983] 1 1 1 1 1 1 1 1 1 1 ...
##  $ Crime Name1           : chr [1:19983] "Crime Against Society" "Crime Against Society" "Crime Against Society" "Crime Against Society" ...
##  $ Crime Name2           : chr [1:19983] "Driving Under the Influence" "Driving Under the Influence" "Driving Under the Influence" "Driving Under the Influence" ...
##  $ Crime Name3           : chr [1:19983] "DRIVING UNDER THE INFLUENCE LIQUOR" "DRIVING UNDER THE INFLUENCE LIQUOR" "DRIVING UNDER THE INFLUENCE LIQUOR" "DRIVING UNDER THE INFLUENCE LIQUOR" ...
##  $ Police District Name  : chr [1:19983] "GERMANTOWN" "MONTGOMERY VILLAGE" "MONTGOMERY VILLAGE" "WHEATON" ...
##  $ Block Address         : chr [1:19983] NA NA NA NA ...
##  $ City                  : chr [1:19983] "GERMANTOWN" "MONTGOMERY VILLAGE" "GAITHERSBURG" "SILVER SPRING" ...
##  $ State                 : chr [1:19983] "MD" "MD" "MD" "MD" ...
##  $ Zip Code              : num [1:19983] 20874 20886 20877 20906 20876 ...
##  $ Agency                : chr [1:19983] "MCPD" "MCPD" "GPD" "MCPD" ...
##  $ Place                 : chr [1:19983] "Street - In vehicle" "Street - In vehicle" "Street - In vehicle" "Street - In vehicle" ...
##  $ Sector                : chr [1:19983] "N" "R" "P" "L" ...
##  $ Beat                  : chr [1:19983] "5N1" "6R2" "6P3" "4L1" ...
##  $ PRA                   : chr [1:19983] "595" "422" "419" "524" ...
##  $ Address Number        : num [1:19983] NA NA NA NA NA NA NA NA NA NA ...
##  $ Start_Date_Time       : chr [1:19983] "05/04/2026 02:31:00 AM" "05/02/2026 07:50:00 PM" "05/02/2026 02:50:00 AM" "05/01/2026 07:26:00 PM" ...
##  $ End_Date_Time         : chr [1:19983] NA NA NA NA ...
##  $ Latitude              : num [1:19983] 39.2 39.2 39.1 39.1 39.2 ...
##  $ Longitude             : num [1:19983] -77.3 -77.2 -77.2 -77.1 -77.2 ...
##  $ Police District Number: chr [1:19983] "5D" "6D" "6D" "4D" ...
##  $ Location              : chr [1:19983] "POINT (-77.2537 39.1751)" "POINT (-77.2142 39.1506)" "POINT (-77.1684 39.1469)" "POINT (-77.0773 39.0527)" ...
##  - attr(*, "spec")=
##   .. cols(
##   ..   `Incident ID` = col_double(),
##   ..   `Offence Code` = col_double(),
##   ..   `CR Number` = col_double(),
##   ..   `Dispatch Date / Time` = col_character(),
##   ..   `NIBRS Code` = col_double(),
##   ..   Victims = col_double(),
##   ..   `Crime Name1` = col_character(),
##   ..   `Crime Name2` = col_character(),
##   ..   `Crime Name3` = col_character(),
##   ..   `Police District Name` = col_character(),
##   ..   `Block Address` = col_character(),
##   ..   City = col_character(),
##   ..   State = col_character(),
##   ..   `Zip Code` = col_double(),
##   ..   Agency = col_character(),
##   ..   Place = col_character(),
##   ..   Sector = col_character(),
##   ..   Beat = col_character(),
##   ..   PRA = col_character(),
##   ..   `Address Number` = col_double(),
##   ..   Start_Date_Time = col_character(),
##   ..   End_Date_Time = col_character(),
##   ..   Latitude = col_double(),
##   ..   Longitude = col_double(),
##   ..   `Police District Number` = col_character(),
##   ..   Location = col_character()
##   .. )
##  - attr(*, "problems")=<externalptr>

Data Cleaning

# Citations/Disclaimer: This code and analysis follows what learned from course/class notes

# I am cleaning the dataset by converting key variables and removing missing values for accuracy  

# I am focusing on Latitude, Longitude, Victims, City, and Place because they are important for my research question about the DUI location patterns 

# I am using the function colSums to try to check the missing values in my dataset



dui_clean <- dui %>%
  mutate(
    Latitude = as.numeric(gsub(",", "", Latitude)),
    Longitude = as.numeric(gsub(",", "", Longitude)),
    Victims = as.numeric(gsub(",", "", Victims))
    
  ) %>%
  
  filter(
    !is.na(City),
    !is.na(Place),
    !is.na(Victims),
    !is.na(Latitude),
    !is.na(Longitude)
  )

colSums(is.na(dui_clean))

##            Incident ID           Offence Code              CR Number 
##                      0                      0                      0 
##   Dispatch Date / Time             NIBRS Code                Victims 
##                   3631                      0                      0 
##            Crime Name1            Crime Name2            Crime Name3 
##                      0                      0                      0 
##   Police District Name          Block Address                   City 
##                     56                  10663                      0 
##                  State               Zip Code                 Agency 
##                    249                    232                      0 
##                  Place                 Sector                   Beat 
##                      0                      0                      0 
##                    PRA         Address Number        Start_Date_Time 
##                      0                  10661                      0 
##          End_Date_Time               Latitude              Longitude 
##                  14717                      0                      0 
## Police District Number               Location 
##                      0                      0

Summary Statistics

# Citations/Disclaimer: This code and analysis follows what learned from course/class notes

# I am using the summary() function to see the statisitcs for the Victims, Latitude, and Longitude

# This also helps me understand the distribution of DUI as well as the variables of location

# Using mean I am calculating the average number of victims per DUI incident

# I am also looking for the maximum number of victims in a single DUI incident

summary(dui_clean$Victims)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##       1       1       1       1       1       1

summary(dui_clean$Latitude)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##    0.00   39.04   39.09   38.41   39.14   39.34

summary(dui_clean$Longitude)

##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  -77.52  -77.20  -77.12  -75.78  -77.05    0.00

mean(dui_clean$Victims, na.rm = TRUE)

## [1] 1

max(dui_clean$Victims, na.rm = TRUE)

## [1] 1

# Citations/Disclaimer: This code and analysis follows what learned from course/class notes

# I am grouping the data by City to see the locations which have the highest number of DUI incidents.

# This helps me identify geographic patterns across Montgomery County.

dui_city <- dui_clean %>%
  group_by(City) %>%
  summarize(count = n())

dui_city

## # A tibble: 87 × 2
##    City        count
##    <chr>       <int>
##  1 20877           1
##  2 4               1
##  3 ADELPHI         1
##  4 ASHTON         25
##  5 ASPEN HILL      8
##  6 BARNESVILLE     1
##  7 BEALLSVILLE     2
##  8 BEALSVILLE      1
##  9 BEHESDA         1
## 10 BELTSVILLE      1
## # ℹ 77 more rows

# I am also grouping the data by Place to understand the types of locations where DUIs occur most often.

# This helps me see the common locations

dui_place <- dui_clean %>%
  group_by(Place) %>%
  summarize(count = n())

dui_place

## # A tibble: 55 × 2
##    Place                        count
##    <chr>                        <int>
##  1 Auto Dealership                  1
##  2 Bank - ATM                       5
##  3 Bar/Night Club                  11
##  4 Church/Synagogue/Temple          2
##  5 Commercial - Office Building     6
##  6 Construction Site                2
##  7 Convenience Store               14
##  8 Field/ Open Space                1
##  9 Gas Station                     47
## 10 Golf Course                      2
## # ℹ 45 more rows

Multiple Linear Regression

# Citations/Disclaimer: This code follows what learned from course/class notes

# I am building a multiple linear regression model to see if Latitude and Longitude are related to DUI severity

# DUI is measured by the number of victims which are in each incident 

# I am using coefficients, p-values, and adjusted R-squared in order to evaluate the relationship between location and victim count.


model <- lm(Victims ~ Latitude + Longitude, data = dui_clean)

summary(model)

## 
## Call:
## lm(formula = Victims ~ Latitude + Longitude, data = dui_clean)
## 
## Residuals:
##        Min         1Q     Median         3Q        Max 
## -8.000e-15 -4.000e-15 -3.000e-15 -3.000e-15  6.292e-11 
## 
## Coefficients:
##              Estimate Std. Error   t value Pr(>|t|)    
## (Intercept) 1.000e+00  2.390e-14 4.184e+13   <2e-16 ***
## Latitude    2.147e-14  6.534e-14 3.290e-01    0.742    
## Longitude   1.084e-14  3.312e-14 3.270e-01    0.744    
## ---
## Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
## 
## Residual standard error: 4.452e-13 on 19974 degrees of freedom
## Multiple R-squared:    0.5,  Adjusted R-squared:  0.4999 
## F-statistic:  9987 on 2 and 19974 DF,  p-value: < 2.2e-16

Regression Equation

Victims = 1 + (2.147 * 10^-14) Latitude + (1.084 * 10^-14) Longitude

Regression Analysis

I found that the regression model was statistically significant since it has an F-statistic of 9987 and a p-value less than 2.2e−16. The adjusted R² value was 0.4999 which suggests that the model explains about 50% of the variation in Victims.

When I look at the individual predictors, Latitude (p = 0.742) and Longitude (p = 0.744) are not statistically significant since both p-values are greater than 0.05. This means there isn’t enough evidence that either variable has a real relationship with Victims. So, I fail to reject the null hypothesis for both predictors.

Despite my model being statistically significant the individual variables do not show strong linear relationships with Victims and the significance ends up being influenced by the large sample size rather than just the effect of the predictors

Pivot for Tableu (Using what learned in Assingment 9)

# I am creating a pivot table in order to summarize DUI incidents by City and Place

# I am counting the number of incidents for each Place

# I am reshaping the data in order for Place to become a column and City to become a row

# I am replacing missing values with 0 in order for my table to complete

dui_pivot_city_place <- dui_clean %>%
  group_by(City, Place) %>%
  summarise(count = n(), .groups = "drop") %>%
  pivot_wider(
    names_from = Place,
    values_from = count,
    values_fill = 0
  )

# I am renaming variables to make them clearer and more readable for Tableau analysis.

# # City is renamed to City_Name and Place is renamed to Location_Type for easier interpretation.

# I am exporting the cleaned dataset as a CSV file for use in Tableau.


dui_final <- dui_clean |>
  rename(
    City_Name = City,
    Location_Type = Place
  )

write.csv(dui_final, "dui_final.csv", row.names = FALSE)

Tableau Visualization 1

https://public.tableau.com/app/profile/arnav.shah7897/viz/ArnavShahFinalProjectTableauVisualization/Sheet4?publish=yes

Tableau Visualization Analysis

The Tableau visualization shows the number of DUI victims across different address ranges in Montgomery County. The highest victim total appears between the ranges of 1200 - 2500 and around 4900 - 5000. Some locations end up reaching to about 190 victims. Many higher address that ranges above 5500 have about fewer than 40 victims.

The chart also shows how DUI incidents are clustered rather than evenly distributed in the county. This connects back to my research question by showing the way how location influences DUI severity and the victim counts.

My regression model found that Latitude and Longitude were not individually significant predictors. However my visualization reveals geographically of how certain locations have more DUI incidents.

Interactive Plot Visualization 2

# Citations/Disclaimer: This code follows what we learned from class notes

# In this chunk I am creating a scatterplot in order to visualize DUI locations in Montgomery County

# Longitude and Latitude are used to map geographic patterns of incidents

# I am coloring points by City in order to compare where incidents are concentrated

interactive_plot <- ggplot(dui_clean, aes(
  x = Longitude,
  y = Latitude,
  color = City
)) +
  geom_point(alpha = 0.8, size = 1.5) +
  
  scale_color_brewer(palette = "Set3") +
  
  theme_minimal() +
  
  labs(
    title = "Interactive DUI Incident Location Map",
    x = "Longitude",
    y = "Latitude",
    color = "City",
    caption = "Source: Montgomery County Police Department Open Data"
  )

ggplotly(interactive_plot)

## Warning in RColorBrewer::brewer.pal(n, pal): n too large, allowed maximum for palette Set3 is 12
## Returning the palette you asked for with that many colors

Interactive Plot Analysis

This interactive visualization I used to try to help answer my research question about how geographic position influences DUI incidents in Montgomery County. Most DUI incidents are concentrated between about -77.3 and -76.9 longitude and 39.0 to 39.2 latitude which shows how incidents are often clustered within specific areas of the county rather than being evenly distributed

Each point represents one DUI incident. Colors are separate incidents by city. This explains how location plays a role in where DUI incidents occur most frequently.

Conclusion

In conclusion I found that DUI incidents in Montgomery County are clustered in certain geographic areas rather than being evenly spread out. My visualizations showed incidents within certain latitude, longitude, and address ranges, which suggests that location does relate to where DUIs happen more frequently.

My regression results showed that Latitude and Longitude were not significant predictors of the number of victims. This tells me that even though location patterns exist, geographic coordinates alone do not explain the severity of DUI incidents.

I also enjoyed working on this project since this issue is relevant to everyday life. I also liked connecting my data analysis with public safety and seeing the patterns through the maps and visualizations.

References

Dataset Source

Montgomery County Police Department. Driving Under the Influence from Crime Data. Montgomery County Open Data Portal.

https://data.montgomerycountymd.gov/Public-Safety/Driving-Under-the-Influence-from-Crime-Data/bmwa-mwqy/about_data

Background Research Source

Maryland Motor Vehicle Administration. Driving Under the Influence (DUI) & Driving While Impaired (DWI). MVA Maryland. https://mva.maryland.gov/your-mva-guide/drivers-revoked-or-suspended-licenses/driving-under-influence-dui

Image Source

Altshuler Law Firm. Driving Under the Influence of Narcotics. Altshuler Law.

https://www.altshulerlaw.com/blog/driving-under-the-influence-of-narcotics/