For the Final Project, I decided to explore and analyze the Traffic Crash Reports (CPD) data set. This data set contains records of traffic crashes in Cincinnati to which the Cincinnati Police Department (CPD) responded. The data set is available from the City of Cincinnati's open data site and was last updated on February 18, 2025.

Import Necessary Packages and Load Data Set

library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr     1.1.4     ✔ readr     2.1.5
## ✔ forcats   1.0.0     ✔ stringr   1.5.1
## ✔ ggplot2   3.5.1     ✔ tibble    3.2.1
## ✔ lubridate 1.9.3     ✔ tidyr     1.3.1
## ✔ purrr     1.0.2     
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag()    masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
## here() starts at C:/Users/court/Downloads/Descriptive Analytics
library(knitr)
data <- read_csv(here('Final Project/Traffic_Crash_Reports.csv'))
## Rows: 388353 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (21): ADDRESS_X, COMMUNITY_COUNCIL_NEIGHBORHOOD, CPD_NEIGHBORHOOD, SNA_N...
## dbl  (7): LATITUDE_X, LONGITUDE_X, AGE, CRASHSEVERITYID, LOCALREPORTNO, ROAD...
## 
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
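The column specification above lists only chr and dbl columns, so the two date columns (CRASHDATE and DATECRASHREPORTED) are most likely read as text and would need parsing before any time-based analysis. Below is a minimal sketch in base R. The sample strings and the format `"%m/%d/%Y %I:%M:%S %p"` are assumptions for illustration, not taken from the file; check a few raw values before reusing the format string.

```r
# Invented sample values; the real CRASHDATE format may differ (assumption).
raw_dates <- c("02/14/2025 05:30:00 PM", "01/03/2024 08:05:00 AM", NA)

# Parse the text into date-times. Cincinnati is in the US Eastern time zone.
parsed <- as.POSIXct(raw_dates,
                     format = "%m/%d/%Y %I:%M:%S %p",
                     tz = "America/New_York")

# Missing inputs stay NA; values that do not match the format also become NA,
# so a jump in the NA count after parsing signals a wrong format string.
sum(is.na(parsed))
format(parsed, "%Y-%m-%d %H:%M")
```

Comparing `sum(is.na(parsed))` with the NA count of the raw column is a quick way to confirm the assumed format fits the data.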

Data Set Description

Each row in this data set is a record of a response by the Cincinnati Police Department (CPD) to an accident. The table below describes each column in the data set.

col_names <- names(data)
col_description <- c('Location of the accident',
                     'Approximate latitude coordinates of the location of the accident',
                     'Approximate longitude coordinates of the location of the accident',
                     'Age of the person involved in the accident',
                     'Community Council neighborhood in which this accident occurred',
                     'CPD Neighborhood in which this accident occurred',
                     'SNA Neighborhood in which this accident occurred',
                     'Date the crash occurred',
                     'Type of right of way that the crash occurred in (e.g., intersection, on-ramp, driveway)',
                     'Classifies the crash as resulting in a fatality, injury or property damage',
                     'ID that classifies the crash as resulting in a fatality, injury or property damage',
                     'Date the crash was reported to CPD',
                     'Day of the week the crash occurred on',
                     'Gender of the person involved in the accident',
                     'Category of the injuries sustained by the person involved in the accident',
                     'Unique identifier to distinguish each crash record',
                     'Light conditions of the roadway at the time of the accident',
                     'Report number (unique identifier for each individual crash)',
                     'The manner in which the crash occurred',
                     'Condition of the roadway at the time of the accident',
                     'Contour of the roadway at the location of the accident',
                     'Road surface type',
                     'Class of the road on which the accident occurred',
                     'Road class description',
                     'Type of unit/automobile that was involved in the accident',
                     'Description of the person\'s involvement in the accident: driver, occupant or pedestrian',
                     'Type of weather conditions at the time of the accident',
                     'Zip code of the location of the accident')

# Pair each column name with its description (matched by position)
col_info <- data.frame(
  Column_Names = col_names,
  Description = col_description
)
kable(col_info)

|Column_Names                   |Description |
|:------------------------------|:-----------|
|ADDRESS_X                      |Location of the accident |
|LATITUDE_X                     |Approximate latitude coordinates of the location of the accident |
|LONGITUDE_X                    |Approximate longitude coordinates of the location of the accident |
|AGE                            |Age of the person involved in the accident |
|COMMUNITY_COUNCIL_NEIGHBORHOOD |Community Council neighborhood in which this accident occurred |
|CPD_NEIGHBORHOOD               |CPD Neighborhood in which this accident occurred |
|SNA_NEIGHBORHOOD               |SNA Neighborhood in which this accident occurred |
|CRASHDATE                      |Date the crash occurred |
|CRASHLOCATION                  |Type of right of way that the crash occurred in (e.g., intersection, on-ramp, driveway) |
|CRASHSEVERITY                  |Classifies the crash as resulting in a fatality, injury or property damage |
|CRASHSEVERITYID                |ID that classifies the crash as resulting in a fatality, injury or property damage |
|DATECRASHREPORTED              |Date the crash was reported to CPD |
|DAYOFWEEK                      |Day of the week the crash occurred on |
|GENDER                         |Gender of the person involved in the accident |
|INJURIES                       |Category of the injuries sustained by the person involved in the accident |
|INSTANCEID                     |Unique identifier to distinguish each crash record |
|LIGHTCONDITIONSPRIMARY         |Light conditions of the roadway at the time of the accident |
|LOCALREPORTNO                  |Report number (unique identifier for each individual crash) |
|MANNEROFCRASH                  |The manner in which the crash occurred |
|ROADCONDITIONSPRIMARY          |Condition of the roadway at the time of the accident |
|ROADCONTOUR                    |Contour of the roadway at the location of the accident |
|ROADSURFACE                    |Road surface type |
|ROADCLASS                      |Class of the road on which the accident occurred |
|ROADCLASSDESC                  |Road class description |
|UNITTYPE                       |Type of unit/automobile that was involved in the accident |
|TYPEOFPERSON                   |Description of the person's involvement in the accident: driver, occupant or pedestrian |
|WEATHER                        |Type of weather conditions at the time of the accident |
|ZIP                            |Zip code of the location of the accident |
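The table above pairs names and descriptions purely by position, so a reordered or added column in a future download would silently mislabel rows. One possible safeguard, sketched here on a hypothetical three-column subset, is to key each description by its column name and look it up. The `stopifnot()` check and the shortened vectors are illustrative assumptions, not part of the original analysis.

```r
# Hypothetical three-column stand-in for names(data).
col_names <- c("ADDRESS_X", "AGE", "ZIP")

# Descriptions keyed by column name, deliberately listed out of order.
col_description <- c(
  ZIP       = "Zip code of the location of the accident",
  ADDRESS_X = "Location of the accident",
  AGE       = "Age of the person involved in the accident"
)

# Fail early if a column lacks a description or a description has no column.
stopifnot(setequal(col_names, names(col_description)))

# Look descriptions up by name so the row order follows col_names.
col_info <- data.frame(
  Column_Names = col_names,
  Description  = unname(col_description[col_names])
)
col_info
```

With this layout the table stays correct even if the source file changes its column order, and a missing description raises an error instead of producing a shifted table.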

Descriptive Analytics

# Dimensions of the Data Set
dim(data)
## [1] 388353     28
# Number of Rows in the Data Set
nrow(data)
## [1] 388353
# Number of Columns in the Data Set
ncol(data)
## [1] 28
# Total Missing Values
sum(is.na(data))
## [1] 598477
# Missing Values Per Column
colSums(is.na(data))
##                      ADDRESS_X                     LATITUDE_X 
##                             44                             62 
##                    LONGITUDE_X                            AGE 
##                             64                          48887 
## COMMUNITY_COUNCIL_NEIGHBORHOOD               CPD_NEIGHBORHOOD 
##                              0                              0 
##               SNA_NEIGHBORHOOD                      CRASHDATE 
##                              0                             32 
##                  CRASHLOCATION                  CRASHSEVERITY 
##                         194332                             17 
##                CRASHSEVERITYID              DATECRASHREPORTED 
##                             17                             46 
##                      DAYOFWEEK                         GENDER 
##                             30                          44484 
##                       INJURIES                     INSTANCEID 
##                            379                              0 
##         LIGHTCONDITIONSPRIMARY                  LOCALREPORTNO 
##                             43                              0 
##                  MANNEROFCRASH          ROADCONDITIONSPRIMARY 
##                             43                             43 
##                    ROADCONTOUR                    ROADSURFACE 
##                             43                             43 
##                      ROADCLASS                  ROADCLASSDESC 
##                         150898                         150992 
##                       UNITTYPE                   TYPEOFPERSON 
##                            379                            367 
##                        WEATHER                            ZIP 
##                             43                           7189
# Duplicate Rows
sum(duplicated(data))
## [1] 4
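The duplicate count above only flags rows that repeat an earlier row exactly. A minimal sketch of removing them, and of counting distinct reports afterwards, is shown on an invented toy table. That one LOCALREPORTNO can span several person-level rows is an assumption suggested by columns such as AGE and TYPEOFPERSON; the source does not confirm it.

```r
# Toy stand-in for the crash data; all values are invented for illustration.
toy <- data.frame(
  LOCALREPORTNO = c(1001, 1001, 1002, 1002, 1003),
  TYPEOFPERSON  = c("DRIVER", "OCCUPANT", "DRIVER", "DRIVER", "PEDESTRIAN"),
  AGE           = c(34, 8, 51, 51, 27)
)

# duplicated() marks every row that exactly repeats an earlier row.
n_dupes <- sum(duplicated(toy))

# Keep only the first occurrence of each row.
deduped <- toy[!duplicated(toy), ]

# Distinct report numbers that remain after de-duplication.
n_reports <- length(unique(deduped$LOCALREPORTNO))

c(duplicates = n_dupes, rows_kept = nrow(deduped), reports = n_reports)
```

If the assumption holds, counting crashes should use the number of distinct report numbers rather than the number of rows.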

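This data set will need substantial wrangling because of its missing values: CRASHLOCATION is empty in 194,332 of the 388,353 rows (about half), ROADCLASS and ROADCLASSDESC are each missing in roughly 151,000 rows, and AGE and GENDER are each missing in more than 44,000 rows. Even so, it should still provide insight into recent accidents across Cincinnati.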
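To gauge how much wrangling each column needs, the raw missing-value counts can be expressed as a share of rows and sorted. The sketch below uses a small invented table; `toy` and its values are assumptions for illustration, and on the real data the same calls would be applied to `data`.

```r
# Toy stand-in for the crash data; values are invented for illustration only.
toy <- data.frame(
  AGE           = c(34, NA, 51, NA),
  CRASHLOCATION = c(NA, "INTERSECTION", NA, NA),
  ZIP           = c(45202, 45219, 45202, 45211)
)

# is.na() returns a logical matrix; colMeans() gives the fraction of TRUE
# values per column, which is the share of missing entries.
pct_missing <- round(100 * colMeans(is.na(toy)), 1)

# Sort so the columns with the most missing values come first.
pct_missing <- sort(pct_missing, decreasing = TRUE)
pct_missing
```

Columns near the top of this ranking are candidates for being dropped or recoded with an explicit "unknown" level, while columns near the bottom may only need a handful of rows removed.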