For the Final Project, I decided to explore and analyze the Traffic Crash Reports (CPD) data set. This data set contains 388,353 records of crashes reported in Cincinnati that the Cincinnati Police Department (CPD) responded to. It is available through the City of Cincinnati's open data portal and was last updated on February 18, 2025.
library(tidyverse)
## ── Attaching core tidyverse packages ──────────────────────── tidyverse 2.0.0 ──
## ✔ dplyr 1.1.4 ✔ readr 2.1.5
## ✔ forcats 1.0.0 ✔ stringr 1.5.1
## ✔ ggplot2 3.5.1 ✔ tibble 3.2.1
## ✔ lubridate 1.9.3 ✔ tidyr 1.3.1
## ✔ purrr 1.0.2
## ── Conflicts ────────────────────────────────────────── tidyverse_conflicts() ──
## ✖ dplyr::filter() masks stats::filter()
## ✖ dplyr::lag() masks stats::lag()
## ℹ Use the conflicted package (<http://conflicted.r-lib.org/>) to force all conflicts to become errors
library(here)
## here() starts at C:/Users/court/Downloads/Descriptive Analytics
library(knitr)
data <- read_csv(here('Final Project/Traffic_Crash_Reports.csv'))
## Rows: 388353 Columns: 28
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (21): ADDRESS_X, COMMUNITY_COUNCIL_NEIGHBORHOOD, CPD_NEIGHBORHOOD, SNA_N...
## dbl (7): LATITUDE_X, LONGITUDE_X, AGE, CRASHSEVERITYID, LOCALREPORTNO, ROAD...
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
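The message above suggests specifying column types explicitly. A minimal sketch of what that could look like follows, using a small made-up CSV in place of the real file. The choice to read `ZIP` as character is an assumption made for illustration (it preserves any leading zeros), not something the data set's documentation confirms.

```r
library(readr)

# Toy CSV standing in for Traffic_Crash_Reports.csv (all values are made up).
csv_text <- I("AGE,ZIP,WEATHER\n34,45202,CLEAR\n,04521,RAIN\n")

# Declaring col_types pins each column's type and suppresses the
# column specification message.
crashes <- read_csv(
  csv_text,
  col_types = cols(
    AGE = col_double(),
    ZIP = col_character(),      # assumption: treat zip codes as text
    .default = col_character()  # everything else stays character
  )
)

crashes
```

With the types declared up front, a change in the source file (for example, a stray text value in a numeric column) surfaces as a parsing problem instead of silently changing the guessed type.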
Each row in this data set is a record of a response by the Cincinnati Police Department (CPD) to an accident. The table below describes each column in the data set.
col_names <- names(data)
col_description <- c('Location of the accident',
'Approximate latitude coordinate of the location of the accident',
'Approximate longitude coordinate of the location of the accident',
'Age of the person involved in the accident',
'Community Council neighborhood in which the accident occurred',
'CPD neighborhood in which the accident occurred',
'SNA neighborhood in which the accident occurred',
'Date the crash occurred',
'Type of right of way in which the crash occurred (e.g., intersection, on-ramp, driveway)',
'Classifies the crash as resulting in a fatality, injury, or property damage',
'ID that classifies the crash as resulting in a fatality, injury, or property damage',
'Date the crash was reported to CPD',
'Day of the week on which the crash occurred',
'Gender of the person involved in the accident',
'Category of the injuries sustained by the person involved in the accident',
'Unique identifier to distinguish each crash record',
'Light conditions of the roadway at the time of the accident',
'Report number (unique identifier for each individual crash)',
'The manner in which the crash occurred',
'Condition of the roadway at the time of the accident',
'Contour of the roadway at the location of the accident',
'Road surface type',
'Class of the road on which the accident occurred',
'Road class description',
'Type of unit/automobile that was involved in the accident',
'Role of the person involved in the accident: driver, occupant, or pedestrian',
'Type of weather conditions at the time of the accident',
'Zip code of the location of the accident')
col_info <- data.frame(
Column_Names = col_names,
Description = col_description
)
kable(col_info)
| Column_Names | Description |
|---|---|
| ADDRESS_X | Location of the accident |
| LATITUDE_X | Approximate latitude coordinate of the location of the accident |
| LONGITUDE_X | Approximate longitude coordinate of the location of the accident |
| AGE | Age of the person involved in the accident |
| COMMUNITY_COUNCIL_NEIGHBORHOOD | Community Council neighborhood in which the accident occurred |
| CPD_NEIGHBORHOOD | CPD neighborhood in which the accident occurred |
| SNA_NEIGHBORHOOD | SNA neighborhood in which the accident occurred |
| CRASHDATE | Date the crash occurred |
| CRASHLOCATION | Type of right of way in which the crash occurred (e.g., intersection, on-ramp, driveway) |
| CRASHSEVERITY | Classifies the crash as resulting in a fatality, injury, or property damage |
| CRASHSEVERITYID | ID that classifies the crash as resulting in a fatality, injury, or property damage |
| DATECRASHREPORTED | Date the crash was reported to CPD |
| DAYOFWEEK | Day of the week on which the crash occurred |
| GENDER | Gender of the person involved in the accident |
| INJURIES | Category of the injuries sustained by the person involved in the accident |
| INSTANCEID | Unique identifier to distinguish each crash record |
| LIGHTCONDITIONSPRIMARY | Light conditions of the roadway at the time of the accident |
| LOCALREPORTNO | Report number (unique identifier for each individual crash) |
| MANNEROFCRASH | The manner in which the crash occurred |
| ROADCONDITIONSPRIMARY | Condition of the roadway at the time of the accident |
| ROADCONTOUR | Contour of the roadway at the location of the accident |
| ROADSURFACE | Road surface type |
| ROADCLASS | Class of the road on which the accident occurred |
| ROADCLASSDESC | Road class description |
| UNITTYPE | Type of unit/automobile that was involved in the accident |
| TYPEOFPERSON | Role of the person involved in the accident: driver, occupant, or pedestrian |
| WEATHER | Type of weather conditions at the time of the accident |
| ZIP | Zip code of the location of the accident |
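The table above pairs names and descriptions by position, so it stays correct only while the column order of the file does not change. One possible safeguard, sketched below on a made-up three-column data frame, is to key the descriptions by column name. The names `description_lookup` and `toy` are hypothetical and used only for this illustration.

```r
# Descriptions keyed by column name, so the pairing does not depend
# on the order in which the columns appear in the file.
description_lookup <- c(
  ADDRESS_X = "Location of the accident",
  AGE       = "Age of the person involved in the accident",
  ZIP       = "Zip code of the location of the accident"
)

# Toy stand-in for the crash data, with columns in a different order
# (values are made up).
toy <- data.frame(ZIP = "45202", ADDRESS_X = "1XX MAIN ST", AGE = 34)

# Stop early if any column has no description.
undescribed <- setdiff(names(toy), names(description_lookup))
stopifnot(length(undescribed) == 0)

# Look up each description by name rather than by position.
col_info_by_name <- data.frame(
  Column_Names = names(toy),
  Description  = unname(description_lookup[names(toy)])
)

col_info_by_name
```

The explicit check also guards against a quieter failure: `data.frame()` recycles a shorter vector when its length divides the longer one evenly, so a truncated description vector would not always raise an error.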
# Dimensions of the Data Set
dim(data)
## [1] 388353 28
# Number of Rows in the Data Set
nrow(data)
## [1] 388353
# Number of Columns in the Data Set
ncol(data)
## [1] 28
# Total Missing Values
sum(is.na(data))
## [1] 598477
# Missing Values Per Column
colSums(is.na(data))
## ADDRESS_X LATITUDE_X
## 44 62
## LONGITUDE_X AGE
## 64 48887
## COMMUNITY_COUNCIL_NEIGHBORHOOD CPD_NEIGHBORHOOD
## 0 0
## SNA_NEIGHBORHOOD CRASHDATE
## 0 32
## CRASHLOCATION CRASHSEVERITY
## 194332 17
## CRASHSEVERITYID DATECRASHREPORTED
## 17 46
## DAYOFWEEK GENDER
## 30 44484
## INJURIES INSTANCEID
## 379 0
## LIGHTCONDITIONSPRIMARY LOCALREPORTNO
## 43 0
## MANNEROFCRASH ROADCONDITIONSPRIMARY
## 43 43
## ROADCONTOUR ROADSURFACE
## 43 43
## ROADCLASS ROADCLASSDESC
## 150898 150992
## UNITTYPE TYPEOFPERSON
## 379 367
## WEATHER ZIP
## 43 7189
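Raw counts are hard to compare at a glance, so it can help to express them as a share of all rows. For instance, the 194,332 missing `CRASHLOCATION` values amount to roughly half of the 388,353 rows. The sketch below shows one way to compute this with base R on a small made-up data frame (`toy` is a hypothetical stand-in for the crash data).

```r
# Toy stand-in for the crash data (values are made up).
toy <- data.frame(
  AGE           = c(34, NA, 51, NA),
  CRASHLOCATION = c(NA, NA, "INTERSECTION", NA),
  WEATHER       = c("CLEAR", "RAIN", "CLEAR", "SNOW")
)

# is.na() on a data frame returns a logical matrix; the column means
# of that matrix are the fraction of missing values per column.
pct_missing <- round(100 * colMeans(is.na(toy)), 1)

# Show the columns with the largest gaps first.
sort(pct_missing, decreasing = TRUE)
```

Sorting by percentage makes it easier to separate columns that are almost complete from those that may be too sparse to analyze directly.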
# Duplicate Rows
sum(duplicated(data))
## [1] 4
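Once duplicates have been counted, they can be removed before further analysis. A minimal sketch on made-up data follows, assuming that keeping the first copy of each fully identical row is the desired behavior. `toy` is a hypothetical stand-in for the crash data.

```r
library(dplyr)

# Toy stand-in for the crash data; rows 2 and 3 are identical
# (values are made up).
toy <- data.frame(
  INSTANCEID = c("a1", "a2", "a2", "a3"),
  AGE        = c(34, 51, 51, NA)
)

# duplicated() flags every repeat of an earlier row.
n_dupes <- sum(duplicated(toy))

# distinct() keeps the first occurrence of each unique row.
deduped <- distinct(toy)

n_dupes
nrow(deduped)
```

Both functions compare entire rows here, so two records that differ in even one column are kept as separate rows.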
While this data set will need substantial wrangling because of its missing values, it should still provide insight into recent accidents across Cincinnati.
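As a rough illustration of what that wrangling might involve, the sketch below drops rows that lack a crash date and relabels missing categorical values. Both choices are assumptions made for this example, as are the `"UNKNOWN"` label, the date strings, and the made-up `toy` data. The appropriate treatment depends on the question being asked.

```r
library(dplyr)
library(tidyr)

# Toy stand-in for the crash data (values are made up).
toy <- data.frame(
  CRASHDATE = c("2024-01-05", NA, "2024-02-11", "2024-03-02"),
  WEATHER   = c("CLEAR", "RAIN", NA, "SNOW"),
  AGE       = c(34, 51, NA, 27)
)

cleaned <- toy %>%
  # A record without a crash date cannot be placed in time, so drop it.
  drop_na(CRASHDATE) %>%
  # Keep the record, but make a missing category explicit.
  mutate(WEATHER = replace_na(WEATHER, "UNKNOWN"))

cleaned
```

Numeric gaps such as `AGE` are left as `NA` here, so summary functions can handle them explicitly with `na.rm = TRUE` rather than through an imputed value.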